Jan Beulich [Thu, 13 Jun 2024 14:54:17 +0000 (16:54 +0200)]
x86/EPT: avoid marking non-present entries for re-configuring
For non-present entries EMT, like most other fields, is meaningless to
hardware. Make the logic in ept_set_entry() setting the field (and iPAT)
conditional upon dealing with a present entry, leaving the value at 0
otherwise. This has two effects for epte_get_entry_emt() which we'll
want to leverage subsequently:
1) The call moved here now won't be issued with INVALID_MFN anymore (a
respective BUG_ON() is being added).
2) Neither of the other two calls could now be issued with a truncated
form of INVALID_MFN anymore (as long as there's no bug anywhere
marking an entry present when that was populated using INVALID_MFN).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 13 Jun 2024 14:53:34 +0000 (16:53 +0200)]
x86/EPT: correct special page checking in epte_get_entry_emt()
mfn_valid() granularity is (currently) 256Mb. Therefore the start of a
1Gb page passing the test doesn't necessarily mean all parts of such a
range would also pass. Yet using the result of mfn_to_page() on an MFN
which doesn't pass mfn_valid() checking is liable to result in a crash
(the invocation of mfn_to_page() alone is presumably "just" UB in such a
case).
Fixes: ca24b2ffdbd9 ("x86/hvm: set 'ipat' in EPT for special pages") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
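The granularity mismatch described above can be sketched as follows. This is only an illustration, not the code from the patch: chunk_valid(), range_valid(), the valid_chunks bitmap and the frame-count constants are hypothetical stand-ins, assuming 4KiB frames and the 256Mb mfn_valid() granularity quoted in the message. The point is that a 1Gb range needs one validity check per 256Mb chunk rather than a single check of its first frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: 4KiB frames, 256MiB validity granularity. */
#define FRAMES_PER_CHUNK (1UL << 16)  /* 256MiB / 4KiB */
#define FRAMES_PER_1G    (1UL << 18)  /* 1GiB / 4KiB */

/* Hypothetical stand-in for the validity map: one bit per 256MiB chunk. */
static uint64_t valid_chunks;

/* Stand-in for mfn_valid(): true if the chunk containing mfn is valid. */
static bool chunk_valid(unsigned long mfn)
{
    unsigned long chunk = mfn / FRAMES_PER_CHUNK;

    return chunk < 64 && ((valid_chunks >> chunk) & 1);
}

/* Check every chunk touched by [mfn, mfn + nr), not just the first one. */
static bool range_valid(unsigned long mfn, unsigned long nr)
{
    unsigned long end = mfn + nr;
    unsigned long cur;

    assert(nr != 0);

    /* Step to the first frame of the next chunk on each iteration. */
    for ( cur = mfn; cur < end; cur = (cur | (FRAMES_PER_CHUNK - 1)) + 1 )
        if ( !chunk_valid(cur) )
            return false;

    return true;
}
```

With only the first three chunks marked valid, chunk_valid(0) succeeds while range_valid(0, FRAMES_PER_1G) fails, which is the case the start-of-page test alone would miss.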
Jens Wiklander [Mon, 10 Jun 2024 06:53:43 +0000 (08:53 +0200)]
xen/arm: ffa: support notification
Add support for FF-A notifications, currently limited to an SP (Secure
Partition) sending an asynchronous notification to a guest.
Guests and Xen itself are made aware of pending notifications with an
interrupt. The interrupt handler triggers a tasklet to retrieve the
notifications using the FF-A ABI and deliver them to their destinations.
Update ffa_partinfo_domain_init() to return an error code, like
ffa_notif_domain_init().
Jens Wiklander [Mon, 10 Jun 2024 06:53:42 +0000 (08:53 +0200)]
xen/arm: add and call tee_free_domain_ctx()
Add tee_free_domain_ctx() to the TEE mediator framework.
tee_free_domain_ctx() is called from arch_domain_destroy() to allow late
freeing of the d->arch.tee context. This will simplify access to
d->arch.tee for domains retrieved with rcu_lock_domain_by_id().
Jens Wiklander [Mon, 10 Jun 2024 06:53:41 +0000 (08:53 +0200)]
xen/arm: add and call init_tee_secondary()
Add init_tee_secondary() to the TEE mediator framework and call it from
start_secondary() late enough that per-cpu interrupts can be configured
on CPUs as they are initialized. This is needed in later patches.
Jens Wiklander [Mon, 10 Jun 2024 06:53:40 +0000 (08:53 +0200)]
xen/arm: allow dynamically assigned SGI handlers
Updates so request_irq() can be used with a dynamically assigned SGI irq
as input. This prepares for a later patch where an FF-A schedule
receiver interrupt handler is installed for an SGI generated by the
secure world.
From the Arm Base System Architecture v1.0C [1]:
"The system shall implement at least eight Non-secure SGIs, assigned to
interrupt IDs 0-7."
gic_route_irq_to_xen() doesn't call gic_set_irq_type() for SGIs since they are
always edge triggered.
gic_interrupt() is updated to route the dynamically assigned SGIs to
do_IRQ() instead of do_sgi(). The latter still handles the statically
assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.
Jens Wiklander [Mon, 10 Jun 2024 06:53:39 +0000 (08:53 +0200)]
xen/arm: ffa: simplify ffa_handle_mem_share()
Simplify ffa_handle_mem_share() by removing the start_page_idx and
last_page_idx parameters from get_shm_pages() and check that the number
of pages matches expectations at the end of get_shm_pages().
Jens Wiklander [Mon, 10 Jun 2024 06:53:38 +0000 (08:53 +0200)]
xen/arm: ffa: use ACCESS_ONCE()
Replace read_atomic() with ACCESS_ONCE() to match the intended use, that
is, to prevent the compiler from (via optimization) reading shared
memory more than once.
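A minimal sketch of the idea, assuming a GCC/Clang-style __typeof__ extension. ACCESS_ONCE_SKETCH, struct shared_hdr and clamp_count() are hypothetical names for illustration; Xen's own ACCESS_ONCE() definition may differ in detail. The volatile access forces exactly one load, so the value that is checked is also the value that is used, even if another party rewrites the shared memory in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: read x through a volatile lvalue, exactly once. */
#define ACCESS_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical layout of a header living in memory shared with a guest. */
struct shared_hdr {
    uint32_t count;
};

/* Snapshot the shared field once, then validate and use the snapshot. */
static uint32_t clamp_count(struct shared_hdr *shared, uint32_t max)
{
    uint32_t count = ACCESS_ONCE_SKETCH(shared->count);

    return count > max ? max : count;
}
```

Without the single snapshot, a compiler would be free to reload shared->count after the comparison, and the reloaded value could exceed max.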
Jens Wiklander [Mon, 10 Jun 2024 06:53:37 +0000 (08:53 +0200)]
xen/arm: ffa: refactor ffa_handle_call()
Refactors the large switch block in ffa_handle_call() to use common code
for the simple case where it's either an error code or success with no
further parameters.
Jan Beulich [Wed, 12 Jun 2024 12:31:21 +0000 (14:31 +0200)]
x86/physdev: replace physdev_{,un}map_pirq() checking against DOMID_SELF
It's hardly ever correct to check for just DOMID_SELF, as guests have
ways to figure out their domain IDs and hence could instead use those as
inputs to respective hypercalls. Note, however, that for ordinary DomU-s
the adjustment is relaxing things rather than tightening them, since
- as a result of XSA-237 - the respective XSM checks would have rejected
self (un)mapping attempts for other than the control domain.
Since in physdev_map_pirq() handling overall is a little easier this
way, move obtaining of the domain pointer into the caller. Doing the
same for physdev_unmap_pirq() is just to keep both consistent in this
regard.
Fixes: 0b469cd68708 ("Interrupt remapping to PIRQs in HVM guests") Fixes: 9e1a3415b773 ("x86: fixes after emuirq changes") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:30:40 +0000 (14:30 +0200)]
x86/irq: limit interrupt movement done by fixup_irqs()
The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such mask can have all bits set,
and hence is unlikely to be a subset of the input mask. For example if an
interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
an all set CPU mask would cause that interrupt to be shuffled around
unconditionally.
What fixup_irqs() cares about is evacuating interrupts from CPUs not set on the
input CPU mask, and for that purpose it should check whether the interrupt is
assigned to a CPU not present in the input mask. Assume that ->arch.cpu_mask
is a subset of the ->affinity mask, and keep the current logic that resets the
->affinity mask if the interrupt has to be shuffled around.
Doing the affinity movement based on ->arch.cpu_mask requires removing the
special handling to ->arch.cpu_mask done for high priority vectors, otherwise
the adjustment done to cpu_mask makes them always skip the CPU interrupt
movement.
While there also adjust the comment as to the purpose of fixup_irqs().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
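The difference between the two checks can be sketched with plain bitmasks. This is an illustration only: cpumask_sketch_t, mask_subset() and needs_move() are hypothetical names, with one bit per CPU standing in for Xen's cpumask type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU mask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_sketch_t;

/* True if every CPU in a is also in b. */
static bool mask_subset(cpumask_sketch_t a, cpumask_sketch_t b)
{
    return (a & ~b) == 0;
}

/*
 * An interrupt has to be moved if any CPU in 'targets' is missing from
 * 'keep', the mask of CPUs that remain usable.
 */
static bool needs_move(cpumask_sketch_t targets, cpumask_sketch_t keep)
{
    return !mask_subset(targets, keep);
}
```

With keep = 0x7, passing an all-ones affinity mask to needs_move() always reports a move, whereas passing the mask of CPUs the interrupt is actually assigned to (say 0x1) does not. That is the unnecessary shuffling the message describes.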
Roger Pau Monné [Wed, 12 Jun 2024 12:30:06 +0000 (14:30 +0200)]
x86/irq: describe how the interrupt CPU movement works
The logic to move interrupts across CPUs is complex; attempt to provide a
comment that describes the expected behavior so users of the interrupt system
have more context about the usage of the arch_irq_desc structure fields.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:29:31 +0000 (14:29 +0200)]
x86/smp: do not use shorthand IPI destinations in CPU hot{,un}plug contexts
Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
so from a cpu_hotplug_{begin,done}() region the function will still
return success, because a CPU taking the rwlock in read mode after
having taken it in write mode is allowed. Such corner case makes using
get_cpu_maps() alone not enough to prevent using the shorthand in CPU
hotplug regions.
Introduce a new helper to detect whether the current caller is between a
cpu_hotplug_{begin,done}() region and use it in send_IPI_mask() to restrict
shorthand usage.
Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Wed, 12 Jun 2024 08:52:56 +0000 (10:52 +0200)]
MAINTAINERS: alter EFI section
To get past the recurring friction on the approach to take wrt
workarounds needed for various firmware flaws, I'm stepping down as the
maintainer of our code interfacing with EFI firmware. Two new
maintainers are being introduced in my place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
This tests if QEMU works in PVH dom0. QEMU in dom0 requires enabling TUN
in the kernel, so do that too.
Add it to both x86 runners, similar to the PVH domU test.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Mon, 10 Jun 2024 11:29:25 +0000 (13:29 +0200)]
x86/pvh: declare PVH dom0 supported with caveats
PVH dom0 is functionally very similar to PVH domU except for the domain
builder and the added set of hypercalls available to it.
The main concern with declaring it "Supported" is the lack of some features
when compared to classic PV dom0, hence switch its status to supported with
caveats. List the known missing features; there might be more features missing
or not working as expected apart from the ones listed.
Note there's some (limited) PVH dom0 testing on both osstest and gitlab.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Mon, 10 Jun 2024 08:34:05 +0000 (10:34 +0200)]
x86/domain: deviate violation of MISRA C Rule 20.12
MISRA C Rule 20.12 states: "A macro parameter used as an operand to
the # or ## operators, which is itself subject to further macro replacement,
shall only be used as an operand to these operators".
In this case, in builds where CONFIG_COMPAT=y, the fpu_ctxt
macro is used both as a regular macro argument and as an operand for
stringification in the expansion of CHECK_FIELD_.
This is deviated using a SAF-x-safe comment.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
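The pattern that Rule 20.12 flags can be reproduced in a few lines. The names below (struct regs, FIELD_INFO, field_named) are hypothetical and are not taken from the Xen sources; only the shape of the problem is the same. The parameter f is stringified in one place, where it is not macro-expanded, and used normally in another, where it is expanded first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct regs {
    int real_field;
};

/* Hypothetical alias macro: the argument is itself subject to replacement. */
#define fpu_field real_field

struct field_info {
    const char *name;
    size_t size;
};

/*
 * 'f' is an operand of # (not expanded: yields "fpu_field") and also an
 * ordinary operand (expanded: yields real_field). This mixed use is what
 * Rule 20.12 reports.
 */
#define FIELD_INFO(s, f) { #f, sizeof(((struct s *)0)->f) }

static const struct field_info info = FIELD_INFO(regs, fpu_field);

/* Helper to compare the recorded name against an expected string. */
static int field_named(const struct field_info *fi, const char *name)
{
    return strcmp(fi->name, name) == 0;
}
```

Here info.name is "fpu_field" while info.size is the size of real_field. The mixed use is intentional in this sketch, which is the kind of case a deviation comment documents.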
Roger Pau Monné [Mon, 10 Jun 2024 08:33:22 +0000 (10:33 +0200)]
x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count
When adjusting move_cleanup_count to account for CPUs that are offline also
adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
those again and create an imbalance in move_cleanup_count.
Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Nicola Vetrini [Sat, 1 Jun 2024 10:16:56 +0000 (12:16 +0200)]
xen: fix MISRA regressions on rule 20.9 and 20.12
Commit ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of
rearrangements") introduced new violations on previously clean rules 20.9 and
20.12 (clean on ARM only, right now).
The first is introduced because CONFIG_CC_IS_CLANG in xen/self-tests.h is not
defined in the configuration under analysis. Using "defined()" instead avoids
relying on the preprocessor's behaviour upon encountering an undefined identifier
and addresses the violation.
The violation of Rule 20.12 is due to "val" being used both as an ordinary argument
in macro RUNTIME_CHECK, and as an operand of the stringification operator.
No functional change.
Fixes: ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of rearrangements") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 24 May 2024 19:37:50 +0000 (20:37 +0100)]
xen/bitops: Rearrange the top of xen/bitops.h
The #include <asm/bitops.h> can move to the top of the file now that
generic_ffs()/generic_fls() have been untangled.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Sat, 9 Mar 2024 02:44:56 +0000 (02:44 +0000)]
xen/bitops: Clean up ffs64()/fls64() definitions
Implement ffs64() and fls64() as plain static inlines, dropping the ifdefary
and intermediate generic_f?s64() forms.
Add tests for all interesting bit positions at 32bit boundaries.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 14 Mar 2024 23:31:11 +0000 (23:31 +0000)]
x86/bitops: Improve arch_ffs() in the general case
The asm in arch_ffs() is safe but inefficient.
CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input. It is safe to pre-load the destination register with -1 and drop the
conditional logic.
However, it is common to find ffs() in a context where the optimiser knows
that x is non-zero even if the value isn't known precisely. In this case,
it's safe to drop the preload of -1 too.
There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:
  add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-92 (-92)
  Function                     old     new   delta
  mask_write                   121     113      -8
  xmem_pool_alloc             1076    1056     -20
  test_bitops                  390     358     -32
  pt_update_contig_markers    1236    1204     -32
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 31 Jan 2024 18:31:16 +0000 (18:31 +0000)]
xen/bitops: Implement ffs() in common logic
Perform constant-folding unconditionally, rather than having it implemented
inconsistently between architectures.
Confirm the expected behaviour with compile time and boot time tests.
For non-constant inputs, use arch_ffs() if provided but fall back to
generic_ffsl() if not. In particular, RISC-V doesn't have a builtin that
works in all configurations.
For x86, rename ffs() to arch_ffs() and adjust the prototype.
For PPC, __builtin_ctz() is 1/3 of the size of the transform to
generic_fls(). Drop the definition entirely. ARM too benefits in the general
case by using __builtin_ctz(), but less dramatically because it is using
optimised asm().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
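The dispatch described above can be sketched as below. It assumes the GCC/Clang builtins __builtin_constant_p() and __builtin_ffs(); ffs_sketch(), arch_ffs_sketch and generic_ffs_sketch() are hypothetical names, and the simple loop stands in for whatever generic fallback an architecture without its own helper would use.

```c
#include <assert.h>

/* Hypothetical generic fallback: 1-based index of the lowest set bit. */
static unsigned int generic_ffs_sketch(unsigned int x)
{
    unsigned int r = 1;

    if ( !x )
        return 0;

    while ( !(x & 1) )
    {
        x >>= 1;
        r++;
    }

    return r;
}

/* An architecture may provide its own helper; otherwise use the fallback. */
#ifndef arch_ffs_sketch
#define arch_ffs_sketch generic_ffs_sketch
#endif

static inline unsigned int ffs_sketch(unsigned int x)
{
    /* Constant inputs are folded by the compiler on every architecture. */
    if ( __builtin_constant_p(x) )
        return __builtin_ffs(x);

    return arch_ffs_sketch(x);
}
```

Both paths agree on the contract: 0 for a zero input, otherwise the 1-based position of the least significant set bit, so ffs_sketch(1) is 1 and ffs_sketch(0x80000000u) is 32.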
Andrew Cooper [Fri, 24 May 2024 12:36:25 +0000 (13:36 +0100)]
xen/bitops: Implement generic_ffsl()/generic_flsl() in lib/
generic_ffs()/generic_fls*() being static inline is the cause of lots of the
complexity between the common and arch-specific bitops.h.
They appear to be static inline for constant-folding reasons (ARM), but there
are better ways to achieve the same effect.
It is presumptuous that an unrolled binary search is the right algorithm to
use on all microarchitectures. Indeed, it's not for the eventual users, but
that can be addressed at a later point.
It is also nonsense to implement the int form as the base primitive and
construct the long form from 2x int in 64-bit builds, when it's just one extra
step to operate at the native register width.
Therefore, implement generic_ffsl()/generic_flsl() in lib/. They're not
actually needed in x86/ARM/PPC by the end of the cleanup (i.e. the functions
will be dropped by the linker), and they're only expected to be needed by RISC-V
on hardware which lacks the Zbb extension.
Implement generic_fls() in terms of generic_flsl() for now, but this will be
cleaned up in due course.
Provide basic runtime testing using __constructor inside the lib/ file. This
is important, as it means testing runs if and only if generic_f?sl() are used
elsewhere in Xen.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
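For illustration, an unrolled binary search operating at the native register width might look like the sketch below. generic_ffsl_sketch() is a hypothetical name and this is not the code added to lib/; it only shows why the long form needs just one extra step compared to the int form.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sketch: 1-based index of the lowest set bit, 0 for x == 0. */
static unsigned int generic_ffsl_sketch(unsigned long x)
{
    unsigned int r = 1;

    if ( !x )
        return 0;

#if ULONG_MAX > 0xffffffffUL
    /* The single extra step needed when long is 64 bits wide. */
    if ( !(x & 0xffffffffUL) )
    {
        x >>= 32;
        r += 32;
    }
#endif
    if ( !(x & 0xffff) )
    {
        x >>= 16;
        r += 16;
    }
    if ( !(x & 0xff) )
    {
        x >>= 8;
        r += 8;
    }
    if ( !(x & 0xf) )
    {
        x >>= 4;
        r += 4;
    }
    if ( !(x & 0x3) )
    {
        x >>= 2;
        r += 2;
    }
    if ( !(x & 0x1) )
        r += 1;

    return r;
}
```

Each step halves the window that still has to be searched, so the result is found in at most six tests on a 64-bit build.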
Andrew Cooper [Fri, 8 Mar 2024 23:45:08 +0000 (23:45 +0000)]
xen/bitops: Cleanup and new infrastructure ahead of rearrangements
* Rename __attribute_pure__ to just __pure before it gains users.
* Introduce __constructor which is going to be used in lib/, and is
unconditionally cf_check.
* Identify the areas of xen/bitops.h which are a mess.
* Introduce xen/self-tests.h as helpers for compile and boot time testing.
This provides a statement of the ABI, and a confirmation that arch-specific
implementations behave as expected.
* Introduce HIDE() in macros.h. While it's only used in self-tests.h for
now, we're going to consolidate similar constructs in due course.
Sadly Clang 7 and older isn't happy with the compile time checks. Skip them,
and just rely on the runtime checks.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 14 Mar 2024 20:38:44 +0000 (20:38 +0000)]
xen/bitops: Delete find_first_set_bit()
No more users.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 30 May 2024 17:58:18 +0000 (18:58 +0100)]
arch/irq: Centralise no_irq_type
Having no_irq_type defined per arch, but using common callbacks is a mess, and
is particularly hard to bootstrap a new architecture with.
Now that the ack()/end() hooks have been exported suitably, move the
definition of no_irq_type into common/irq.c, and make it const too for good
measure.
No functional change, but a whole lot less tangled.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Oleksii Kurochko [Wed, 29 May 2024 19:55:02 +0000 (21:55 +0200)]
xen/riscv: Update Kconfig in preparation for a full Xen build
Disable unnecessary configs for two cases:
1. By utilizing EXTRA_FIXED_RANDCONFIG for randconfig builds (GitLab CI jobs).
2. By using tiny64_defconfig for non-randconfig builds.
Only configs which lead to compilation issues were disabled.
Remove lines related to disablement of configs which don't affect
compilation:
-# CONFIG_SCHED_CREDIT is not set
-# CONFIG_SCHED_RTDS is not set
-# CONFIG_SCHED_NULL is not set
-# CONFIG_SCHED_ARINC653 is not set
-# CONFIG_TRACEBUFFER is not set
-# CONFIG_HYPFS is not set
-# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set
Update argo.c to include asm/p2m.h directly, rather than relying on a transitive
dependency through asm/domain.h. Update asm/p2m.h to include xen/errno.h,
rather than rely on it having been included already.
CONFIG_XSM=n as it requires an introduction of:
* boot_module_find_by_kind()
* BOOTMOD_XSM
* struct bootmodule
* copy_from_paddr()
The mentioned things aren't introduced now.
CONFIG_BOOT_TIME_CPUPOOLS requires an introduction of cpu_physical_id() and
acpi_disabled, so it is disabled for now.
PERF_COUNTERS requires asm/perf.h and asm/perfc-defn.h, so it is
also disabled for now, as RISC-V hasn't introduced these headers yet.
LIVEPATCH isn't ready for RISC-V either and it can be overridden by randconfig,
so to avoid compilation errors for randconfig it is disabled for now.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Fix up common/argo.c rather than inserting a transitive dependency] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Thu, 30 May 2024 07:53:18 +0000 (09:53 +0200)]
x86/hvm: allow XENMEM_machine_memory_map
For HVM based control domains XENMEM_machine_memory_map must be available so
that the `e820_host` xl.cfg option can be used.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Sat, 9 Mar 2024 02:22:53 +0000 (02:22 +0000)]
xen/bitops: Replace find_first_set_bit() with ffs()/ffsl() - 1
find_first_set_bit() is a Xen-ism which has undefined behaviour with a 0
input. ffs()/ffsl() are well defined with an input of 0, and are found outside
of Xen too.
timer_sanitize_int_route(), pt_update_contig_markers() and
set_iommu_ptes_present() are all already operating on unsigned int data, so
switch straight to ffs().
The ffsl() in pvh_populate_memory_range() needs coercion to unsigned to keep
the typecheck in min() happy in the short term.
_init_heap_pages() is comparing the LSB of two different addresses, so the -1
cancels off both sides of the expression.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
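The relationship between the two helpers can be sketched with compiler builtins. The function names below are hypothetical, and __builtin_ctzl()/__builtin_ffsl() are GCC/Clang builtins used here as stand-ins for the Xen implementations.

```c
#include <assert.h>

/* Sketch of the old helper: 0-based index of the lowest set bit. */
static unsigned int find_first_set_bit_sketch(unsigned long x)
{
    assert(x != 0); /* the result is undefined for a zero input */

    return __builtin_ctzl(x);
}

/* Sketch of ffsl(): 1-based index of the lowest set bit, 0 for x == 0. */
static unsigned int ffsl_sketch(unsigned long x)
{
    return __builtin_ffsl(x);
}

/*
 * Comparing the lowest set bits of two nonzero values: the "- 1" would
 * appear on both sides of the comparison, so it can be omitted.
 */
static int lsb_is_lower(unsigned long a, unsigned long b)
{
    return ffsl_sketch(a) < ffsl_sketch(b);
}
```

For any nonzero x, find_first_set_bit_sketch(x) equals ffsl_sketch(x) - 1, which is the substitution made throughout the patch.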
Andrew Cooper [Fri, 24 May 2024 12:36:15 +0000 (13:36 +0100)]
xen/page_alloc: Coerce min(flsl(), foo) expressions to being unsigned
This is in order to maintain bisectability through the subsequent changes,
where flsl() changes sign-ness non-atomically by architecture.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 30 May 2024 10:02:16 +0000 (11:02 +0100)]
tools: (Actually) drop libsystemd as a dependency
When reinstating some of systemd.m4 between v1 and v2, I reintroduced a little
too much. While {c,o}xenstored are indeed no longer linked against
libsystemd, ./configure still looks for it.
Drop this too.
Fixes: ae26101f6bfc ("tools: Drop libsystemd as a dependency") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Wed, 29 May 2024 14:11:19 +0000 (16:11 +0200)]
xen/x86: remove foreign mappings from the p2m on teardown
Iterate over the p2m up to the maximum recorded gfn and remove any foreign
mappings, in order to drop the underlying page references and thus not keep
extra page references if a domain is destroyed while still having foreign
mappings on its p2m.
The logic is similar to the one used on Arm.
Note that foreign mappings cannot be created by guests that have altp2m or
nested HVM enabled, as p2ms different than the host one are not currently
scrubbed when destroyed in order to drop references to any foreign maps.
It's unclear whether the right solution is to take an extra reference when
foreign maps are added to p2ms different than the host one, or just rely on the
host p2m already having a reference. The mapping being removed from the host
p2m should cause it to be dropped on all domain p2ms.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 29 May 2024 14:10:04 +0000 (16:10 +0200)]
xen: enable altp2m at create domain domctl
Enabling it using an HVM param is fragile, and complicates the logic when
deciding whether options that interact with altp2m can also be enabled.
Leave the HVM param value for consumption by the guest, but prevent it from
being set. Enabling is now done using an additional altp2m specific field in
xen_domctl_createdomain.
Note that albeit only currently implemented in x86, altp2m could be implemented
in other architectures, hence why the field is added to xen_domctl_createdomain
instead of xen_arch_domainconfig.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Christian Lindig <christian.lindig@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> # hypervisor Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> # tools/libs/ Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 28 May 2024 14:11:54 +0000 (15:11 +0100)]
xen: Introduce CONFIG_SELF_TESTS
... and move x86's stub_selftest() under this new option.
There is value in having these tests included in release builds too.
It will shortly be used to gate the bitops unit tests on all architectures.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Wed, 29 May 2024 07:57:28 +0000 (09:57 +0200)]
x86: address violations of MISRA C Rule 8.4
Rule 8.4 states: "A compatible declaration shall be visible when an
object or function with external linkage is defined."
These variables are only referenced from assembly code, so they need to
be extern and there is negligible risk of them being used improperly
without noticing.
As a result, they can be exempted using a comment-based deviation.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Wed, 29 May 2024 07:56:57 +0000 (09:56 +0200)]
x86/MCE: optional build of AMD/Intel MCE code
Separate Intel/AMD-specific MCE code using CONFIG_{INTEL,AMD} config options.
Now we can avoid building mcheck code if support for a specific platform is
intentionally disabled by configuration.
Also global variables lmce_support & cmci_support from Intel-specific
mce_intel.c have to be moved to common mce.c, as they get checked in common code.
Sergiy Kibrik [Wed, 29 May 2024 07:56:15 +0000 (09:56 +0200)]
x86/MCE: add default switch case in init_nonfatal_mce_checker()
A default switch case block is wanted here to handle situations such as
an unexpected c->x86_vendor value: without it no mcheck init is done, but a
misleading message still gets logged anyway.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Wed, 29 May 2024 07:54:22 +0000 (09:54 +0200)]
x86/intel: move vmce_has_lmce() routine to header
Moving this function out of mce_intel.c will make it possible to disable
build of Intel MCE code later on, because the function gets called from
common x86 code.
Also replace boilerplate code that checks for MCG_LMCE_P flag with
vmce_has_lmce(), which might contribute to readability a bit.
Andrew Cooper [Tue, 28 May 2024 15:29:11 +0000 (16:29 +0100)]
x86/svm: Rework VMCB_ACCESSORS() to use a plain type name
This avoids having a function call in a typeof() expression.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Tue, 28 May 2024 06:52:27 +0000 (08:52 +0200)]
x86/traps: address violation of MISRA C Rule 8.4
Rule 8.4 states: "A compatible declaration shall be visible when
an object or function with external linkage is defined".
The function do_general_protection is either used in asm code
or only within this unit, so there is no risk of this getting
out of sync with its definition, but the function must remain
extern.
Therefore, this function is deviated using a comment-based deviation.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Tue, 28 May 2024 06:52:15 +0000 (08:52 +0200)]
CHANGELOG: Mention libxl blktap/tapback support
Add entry for backendtype=tap support in libxl. blktap needs some
changes to work with libxl, which haven't been merged. They are
available from this PR: https://github.com/xapi-project/blktap/pull/394
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Henry Wang [Thu, 23 May 2024 07:40:39 +0000 (15:40 +0800)]
tools: Introduce the "xl dt-overlay attach" command
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach". Slightly rework
the command option parsing logic.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Henry Wang [Thu, 23 May 2024 07:40:36 +0000 (15:40 +0800)]
xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. It is OK to change the sysctl behavior
and break compatibility, as this feature is experimental.
Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.
The hypervisor first checks that the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.
Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).
xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported. For now return errors for not-1:1 mapped domains.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:35 +0000 (15:40 +0800)]
xen/arm/gic: Allow adding interrupt to running VMs
Currently, adding physical interrupts is only allowed at
domain creation time. For use cases such as dynamic device
tree overlay addition, adding a physical IRQ to
running domains should be allowed.
Drop the above-mentioned domain creation check. Since this
will introduce interrupt state unsync issues for cases when the
interrupt is active or pending in the guest, simply reject the
operation in these cases. Do it for both new and old
vGIC implementations.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:34 +0000 (15:40 +0800)]
tools/arm: Introduce the "nr_spis" xl config entry
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.
Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This would help to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.
Update the doc and the golang bindings accordingly.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Henry Wang [Thu, 23 May 2024 07:40:33 +0000 (15:40 +0800)]
xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need a way to specify that the domain will need the IOMMU
mapping at domain construction time.
Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:32 +0000 (15:40 +0800)]
tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.
Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.
Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support") Suggested-by: Anthony PERARD <anthony@xenproject.org> Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
George Dunlap [Fri, 26 Apr 2024 13:17:33 +0000 (14:17 +0100)]
tools/xenalyze: Ignore HVM_EMUL events harder
To unify certain common sanity checks, checks are done very early in
processing based only on the top-level type.
Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
assumptions about how the top-level types worked. Namely, traces of
this type will show up outside of HVM contexts: in idle domains and in
PV domains.
Make an explicit exception for TRC_HVM_EMUL types in a number of places:
- Pass the record info pointer to toplevel_assert_check, so that it
can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
checks
- Don't attempt to set the vcpu data_type in hvm_process for
TRC_HVM_EMUL records.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Thu, 25 Apr 2024 12:03:58 +0000 (13:03 +0100)]
x86/hvm/trace: Use a different trace type for AMD processors
A long-standing usability sub-optimality with xenalyze is the
necessity to specify `--svm-mode` when analyzing AMD processors. This
fundamentally comes about because the same trace event ID is used for
both VMX and SVM, but the contents of the trace must be interpreted
differently.
Instead, allocate separate trace events for VMX and SVM vmexits in
Xen; this will allow all readers to properly interpret the meaning of
the vmexit reason.
In xenalyze, first remove the redundant call to init_hvm_data();
there's no way to get to hvm_vmexit_process() without it being already
initialized by the set_vcpu_type call in hvm_process().
Replace this with set_hvm_exit_reson_data(), and move setting of
hvm->exit_reason_* into that function.
Modify hvm_process and hvm_vmexit_process to handle all four potential
values appropriately.
If SVM entries are encountered, set opt.svm_mode so that other
SVM-specific functionality is triggered.
Remove the `--svm-mode` command-line option, since it's now redundant.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Henry Wang [Thu, 21 Mar 2024 03:57:06 +0000 (11:57 +0800)]
xen/arm: Set correct per-cpu cpu_core_mask
In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.
cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm supports neither physical
CPU hotplug nor NUMA, and we assume there is no multithreading. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment, which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
George Dunlap [Fri, 26 Apr 2024 14:18:25 +0000 (15:18 +0100)]
tools/xentrace: Remove xentrace_format
xentrace_format was always of limited utility, since trace records
across pcpus were processed out of order; it was superseded by xenalyze
over a decade ago.
But for several releases, the `formats` file it has depended on for
proper operation has not even been included in `make install` (which
generally means it doesn't get picked up by distros either); yet
nobody has seemed to complain.
Simply remove xentrace_format, and point people to xenalyze instead.
NB that there is no man page for xenalyze, so the "see also" on the
xentrace man page is simply removed for now.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Olaf Hering <olaf@aepfle.de>
Andrew Cooper [Thu, 25 Apr 2024 09:46:40 +0000 (10:46 +0100)]
tools: Drop libsystemd as a dependency
There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends. Drop the dependency.
We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.
Rerun autogen.sh, and mark the dependency as removed in the build containers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 25 Apr 2024 09:26:58 +0000 (10:26 +0100)]
tools/{c,o}xenstored: Don't link against libsystemd
Use the local freestanding wrapper instead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 16 May 2024 17:59:00 +0000 (18:59 +0100)]
tools: Import stand-alone sd_notify() implementation from systemd
... in order to avoid linking against the whole of libsystemd.
Only minimal changes to the upstream copy, to function as a drop-in
replacement for sd_notify() and as a header-only library.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 16 May 2024 17:50:26 +0000 (18:50 +0100)]
LICENSES: Add MIT-0 (MIT No Attribution)
We are about to import code licensed under MIT-0. It's compatible for us to
use, so identify it as a permitted license.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Commit 634cfc8beb ("Make MEM_ACCESS configurable") intended to make
MEM_ACCESS configurable on Arm to reduce the code size when the user
doesn't need it.
However, this didn't cover the arch specific code. None of the code
in arm/mem_access.c is necessary when MEM_ACCESS=n, so it can be
compiled out. This requires providing stubs for the functions
called by the common code.
Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
vpci: add initial support for virtual PCI bus topology
Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.
Please note that at the moment only function 0 of a multifunction
device can be passed through.
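As a rough illustration of the topology described above, the sketch below hands out the first free device slot on virtual bus 0 and encodes it in the conventional segment/bus/device/function layout (device number in bits 3-7 of the low byte, function in bits 0-2). The function name and the bitmap argument are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * 'used_slots' has one bit per device number on virtual bus 0.
 * Returns the virtual SBDF, or -1 if all 32 device slots are taken.
 */
int32_t assign_virtual_sbdf(uint32_t *used_slots)
{
    for ( unsigned int dev = 0; dev < 32; dev++ )
    {
        if ( *used_slots & (UINT32_C(1) << dev) )
            continue;

        *used_slots |= UINT32_C(1) << dev;

        /* Segment 0, bus 0, function 0: only the device field is set. */
        return (int32_t)(dev << 3);
    }

    return -1;
}
```

A single PCI bus has a 5-bit device number, which is where the limit of 32 devices comes from.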
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
vpci/header: emulate PCI_COMMAND register for guests
Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.
According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.
Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.
PCI_COMMAND_IO (bit 0)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if an I/O BAR is exposed to the guest.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
don't yet support I/O BARs for domUs.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_MEMORY (bit 1)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if a Memory BAR is exposed to the guest.
Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
regions.
Xen domU: For devices assigned to DomUs, memory decoding will be
disabled at the time of initialization.
PCI_COMMAND_MASTER (bit 2)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_SPECIAL (bit 3)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_INVALIDATE (bit 4)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_VGA_PALETTE (bit 5)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_PARITY (bit 6)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_WAIT (bit 7)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: hardwire to 0
QEMU: res_mask
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_SERR (bit 8)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_FAST_BACK (bit 9)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_INTX_DISABLE (bit 10)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU checks if INTx was mapped
for a device. If it is not, then guest can't control
PCI_COMMAND_INTX_DISABLE bit.
Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
Xen dom0: We allow dom0 to control this bit freely.
Bits 11-15
PCIe 6.1: RsvdP
PCI LB 3.0: Reserved
QEMU: res_mask
Xen domU: rsvdp_mask
Xen dom0: We allow dom0 to control these bits freely.
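The domU handling of the RsvdP bits listed above can be sketched as follows. This is only an illustration under stated assumptions: the bit values are the standard PCI command register bit positions, while DOMU_RSVDP_MASK, PCI_COMMAND_RSVD_11_15 and both helper functions are hypothetical names. The sketch deliberately ignores the extra handling of PCI_COMMAND_MEMORY (BAR mapping) and PCI_COMMAND_INTX_DISABLE (MSI(X) check).

```c
#include <stdint.h>

/* Standard PCI command register bits (bit n == 1 << n). */
#define PCI_COMMAND_IO          0x0001
#define PCI_COMMAND_PARITY      0x0040
#define PCI_COMMAND_WAIT        0x0080
#define PCI_COMMAND_SERR        0x0100
#define PCI_COMMAND_FAST_BACK   0x0200
#define PCI_COMMAND_RSVD_11_15  0xf800 /* hypothetical name for bits 11-15 */

/* Bits the list above marks as rsvdp_mask for domUs. */
#define DOMU_RSVDP_MASK (PCI_COMMAND_IO | PCI_COMMAND_PARITY | \
                         PCI_COMMAND_WAIT | PCI_COMMAND_SERR | \
                         PCI_COMMAND_FAST_BACK | PCI_COMMAND_RSVD_11_15)

/*
 * Hypothetical helper: merge a domU write with the current hardware value.
 * RsvdP bits keep whatever the host programmed; the remaining bits follow
 * the guest's write.
 */
uint16_t domu_cmd_to_hw(uint16_t hw, uint16_t guest)
{
    return (uint16_t)((hw & DOMU_RSVDP_MASK) | (guest & ~DOMU_RSVDP_MASK));
}

/* Hypothetical helper: the domU's view reads RsvdP bits as zero. */
uint16_t domu_cmd_view(uint16_t hw)
{
    return (uint16_t)(hw & ~DOMU_RSVDP_MASK);
}
```

With the host having set PCI_COMMAND_SERR, a guest write of memory decoding plus bus mastering leaves SERR set in hardware while the guest continues to read it as zero.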
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
arm/vpci: honor access size when returning an error
Guest can try to read config space using different access sizes: 8,
16, 32, 64 bits. We need to take this into account when we are
returning an error back to MMIO handler, otherwise it is possible to
provide more data than requested: i.e. guest issues LDRB instruction
to read one byte, but we are writing 0xFFFFFFFFFFFFFFFF in the target
register.
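A minimal sketch of the idea, assuming the access width is available in bytes: build an all-ones value that is only as wide as the guest's access. The helper name is hypothetical and is not the code from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: return an all-ones value
 * exactly as wide as the access. 'size_bytes' is 1, 2, 4 or 8.
 */
uint64_t invalid_read_value(unsigned int size_bytes)
{
    /* A full 64-bit shift would be undefined, so handle 8 bytes apart. */
    if ( size_bytes >= sizeof(uint64_t) )
        return ~UINT64_C(0);

    /* 1 byte -> 0xff, 2 bytes -> 0xffff, 4 bytes -> 0xffffffff. */
    return (UINT64_C(1) << (size_bytes * 8)) - 1;
}
```

For the LDRB case mentioned above, this yields 0xff rather than 0xFFFFFFFFFFFFFFFF.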
Jan Beulich [Thu, 23 May 2024 08:16:52 +0000 (10:16 +0200)]
x86: detect PIT aliasing on ports other than 0x4[0-3]
... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to PIT. Unlike for CMOS/RTC, do detection
pretty early, to avoid disturbing normal operation later on (even if
typically we won't use much of the PIT).
Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects (beyond such that PIT
reads have anyway) in case it does not alias the PIT's.
As to the port 0x61 accesses: Unlike other accesses we do, this masks
off the top four bits (in addition to the bottom two ones), following
Intel chipset documentation saying that these (read-only) bits should
only be written with zero.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Oleksii Kurochko [Fri, 17 May 2024 13:54:55 +0000 (15:54 +0200)]
xen/riscv: introduce atomic.h
Initially the patch was introduced by Bobby, who took the header from
the Linux kernel.
The following changes were done on top of Bobby's changes:
- atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
to use __*xchg_generic()
- drop casts in write_atomic() as they are unnecessary
- drop introduction of WRITE_ONCE() and READ_ONCE().
Xen provides ACCESS_ONCE()
- remove zero-length array access in read_atomic()
- drop defines similar to pattern:
#define atomic_add_return_relaxed atomic_add_return_relaxed
- move not RISC-V specific functions to asm-generic/atomics-ops.h
- drop atomic##prefix##_{cmp}xchg_{relaxed, acquire, release}() as they
are not used in Xen.
- update the definition of atomic##prefix##_{cmp}xchg according to
{cmp}xchg() implementation in Xen.
- some ATOMIC_OP() macros were updated:
- drop size argument for ATOMIC_OP which defines atomic##prefix##_xchg()
and atomic##prefix##_cmpxchg().
- drop c_op argument for ATOMIC_OPS which defines ATOMIC_OPS(and, and),
ATOMIC_OPS( or, or), ATOMIC_OPS(xor, xor), ATOMIC_OPS(add, add, +),
ATOMIC_OPS(sub, add, -) as c_op is always "+" for them.
- drop "" from definition of __atomic_{acquire/release}_fence.
The current implementation is the same as 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V can combine acquire and release into the SC
instruction, which saves a fence instruction and gives better
performance. Here is the related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes
the LR/SC sequence sequentially consistent, meaning that
it cannot be reordered with earlier or later memory
operations from the same hart.
Software should not set the rl bit on an LR instruction unless
the aq bit is also set, nor should software set the aq bit on an
SC instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering
than those with both bits clear, but may result in lower
performance.
Also, the way of transforming ".rl + full barrier" to ".aqrl" was approved
by (the author of the RVWMO spec) [2]
Oleksii Kurochko [Fri, 17 May 2024 13:54:54 +0000 (15:54 +0200)]
xen/riscv: introduce cmpxchg.h
The header was taken from Linux kernel 6.4.0-rc1.
Additionally, the following changes were made:
* add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
access.
* replace tabs with spaces
* replace __* variables with *__
* introduce generic version of xchg_* and cmpxchg_*.
* drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
* drop barriers and use instruction suffixes instead ( .aq, .rl, .aqrl )
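The emulation of sub-word {cmp}xchg mentioned in the list above can be illustrated in portable C. The sketch below is not the code from the patch: it assumes a little-endian machine (as RISC-V is), relies on the GCC/Clang __atomic builtins, and uses a hypothetical function name. It performs a 1-byte compare-and-exchange by operating on the aligned 32-bit word that contains the byte.

```c
#include <stdint.h>

/*
 * Illustration only: emulate an 8-bit compare-and-exchange with a 32-bit
 * one. Returns the byte value observed at 'ptr' before the operation.
 */
uint8_t cmpxchg_u8_emulated(volatile uint8_t *ptr, uint8_t old_val,
                            uint8_t new_val)
{
    /* Aligned 32-bit word containing the target byte. */
    volatile uint32_t *word =
        (volatile uint32_t *)((uintptr_t)ptr & ~(uintptr_t)3);
    /* Little-endian: byte offset N lives at bits [8N+7:8N]. */
    unsigned int shift = (unsigned int)((uintptr_t)ptr & 3) * 8;
    uint32_t mask = UINT32_C(0xff) << shift;
    uint32_t cur = __atomic_load_n(word, __ATOMIC_RELAXED);

    for ( ; ; )
    {
        uint8_t seen = (uint8_t)((cur & mask) >> shift);
        uint32_t desired;

        if ( seen != old_val )
            return seen;    /* Comparison failed: nothing is written. */

        desired = (cur & ~mask) | ((uint32_t)new_val << shift);

        /* On failure 'cur' is refreshed with the current word; retry. */
        if ( __atomic_compare_exchange_n(word, &cur, desired, 0,
                                         __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) )
            return old_val;
    }
}
```

The neighbouring bytes in the word are carried over unchanged, and the loop retries whenever another writer modifies any part of the word in the meantime.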
The implementation of the 4- and 8-byte cases was updated according to the spec:
```
....
Linux Construct RVWMO AMO Mapping
...
atomic <op> amo<op>.{w|d}.aqrl
Linux Construct RVWMO LR/SC Mapping
...
atomic <op> loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
Table A.5: Mappings from Linux memory primitives to RISC-V primitives
```
The current implementation is the same as 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V can combine acquire and release into the SC
instruction, which saves a fence instruction and gives better
performance. Here is the related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes
the LR/SC sequence sequentially consistent, meaning that
it cannot be reordered with earlier or later memory
operations from the same hart.
Software should not set the rl bit on an LR instruction unless
the aq bit is also set, nor should software set the aq bit on an
SC instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering
than those with both bits clear, but may result in lower
performance.
Also, the way of transforming ".rl + full barrier" to ".aqrl" was approved
by (the author of the RVWMO spec) [2]
Otherwise it's not possible to call functions described in hvm/vlapic.h from the
inline functions of hvm/hvm.h.
This is because a static inline in vlapic.h depends on hvm.h, and pulls it
transitively through vpt.h. The ultimate cause is having hvm.h included in any
of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
No functional change.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 23 May 2024 08:03:33 +0000 (10:03 +0200)]
iommu/x86: print RMRR/IVMD ranges using full addresses
It's easier to correlate with the physical memory map if the addresses are
fully printed, instead of using frame numbers.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 23 May 2024 08:03:14 +0000 (10:03 +0200)]
xen/livepatch: make .livepatch.funcs read-only for in-tree tests
This matches the flags of the .livepatch.funcs section when generated using
livepatch-build-tools, which only sets the SHT_ALLOC flag.
Also constify the definitions of the livepatch_func variables in the tests
themselves, in order to better match the resulting output. Note that just
making those variables constant is not enough to force the generated sections
to be read-only.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Nicola Vetrini [Tue, 21 May 2024 14:01:17 +0000 (16:01 +0200)]
x86_64/cpu_idle: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
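To illustrate the hazard the rule guards against, here is a small sketch. The macros and functions are hypothetical and are not taken from the Xen tree.

```c
/* Hypothetical macros, for illustration only. */

/* Non-compliant: the parameter 'x' is expanded without parentheses. */
#define SCALE_BAD(x)  (x * 2)

/* Compliant with Rule 20.7: the parameter expansion is parenthesized. */
#define SCALE_GOOD(x) ((x) * 2)

int scale_bad(int a, int b)
{
    /* Expands to (a + b * 2), which is not what the caller meant. */
    return SCALE_BAD(a + b);
}

int scale_good(int a, int b)
{
    /* Expands to ((a + b) * 2). */
    return SCALE_GOOD(a + b);
}
```

With a = 1 and b = 2, scale_bad() returns 5 while scale_good() returns 6.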
Nicola Vetrini [Tue, 21 May 2024 14:00:47 +0000 (16:00 +0200)]
x86_64/uaccess: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
xlat_malloc_init is touched for consistency, despite the construct
being already deviated.
Nicola Vetrini [Tue, 21 May 2024 14:00:20 +0000 (16:00 +0200)]
x86/hvm: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Nicola Vetrini [Tue, 21 May 2024 13:59:50 +0000 (15:59 +0200)]
x86/vpmu: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Henry Wang [Tue, 21 May 2024 13:59:14 +0000 (15:59 +0200)]
xen/common/dt-overlay: Fix lock issue when add/remove the device
If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ----[ Xen-4.19-unstable arm64 debug=y Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN) [<00000a0000257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN) [<00000a00002573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN) [<00000a000020797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN) [<00000a0000207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN) [<00000a0000208460>] dt_overlay_sysctl+0x428/0xc68
(XEN) [<00000a00002707f8>] arch_do_sysctl+0x1c/0x2c
(XEN) [<00000a0000230b40>] do_sysctl+0x96c/0x9ec
(XEN) [<00000a0000271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN) [<00000a0000273490>] do_trap_guest_sync+0x448/0x63c
(XEN) [<00000a000025c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ****************************************
This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon as
getting hold of overlay_node.
A similar issue is observed when adding the dtbo:
(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)'
failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) ----[ Xen-4.19-unstable arm64 debug=y Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN) [<00000a00002594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN) [<00000a0000259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN) [<00000a0000267db4>] handle_device+0x68/0x1e8
(XEN) [<00000a0000208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN) [<00000a000027342c>] arch_do_sysctl+0x24/0x38
(XEN) [<00000a0000231ac8>] do_sysctl+0x9ac/0xa34
(XEN) [<00000a0000274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN) [<00000a0000276330>] do_trap_guest_sync+0x478/0x688
(XEN) [<00000a000025e480>] entry.o#guest_sync_slowpath+0xa8/0xd8
This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().
Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities") Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monné [Tue, 21 May 2024 07:15:03 +0000 (09:15 +0200)]
xen/x86: pretty print interrupt CPU affinity masks
Print the CPU affinity masks as numeric ranges instead of plain hexadecimal
bitfields.
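As a stand-alone illustration of the output format (not the code from the patch, which relies on Xen's own printing facilities), the sketch below turns a 64-bit mask into a comma-separated list of ranges. The function name and signature are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical helper: format the set bits of 'mask' as ranges, e.g.
 * bits 0, 2, 3 and 7 become "0,2-3,7". Returns the number of characters
 * written, excluding the terminating NUL. Formatting stops early if the
 * buffer is too small, in which case the text may be truncated.
 */
size_t mask_to_ranges(uint64_t mask, char *buf, size_t len)
{
    size_t used = 0;
    unsigned int bit = 0;

    if ( len == 0 )
        return 0;
    buf[0] = '\0';

    while ( bit < 64 )
    {
        unsigned int first, last;
        int n;

        /* Skip clear bits. */
        if ( !(mask & (UINT64_C(1) << bit)) )
        {
            bit++;
            continue;
        }

        /* Find the end of this run of set bits. */
        first = last = bit;
        while ( last + 1 < 64 && (mask & (UINT64_C(1) << (last + 1))) )
            last++;

        if ( first == last )
            n = snprintf(buf + used, len - used, "%s%u",
                         used ? "," : "", first);
        else
            n = snprintf(buf + used, len - used, "%s%u-%u",
                         used ? "," : "", first, last);

        if ( n < 0 || (size_t)n >= len - used )
            break;          /* Out of space: stop formatting. */

        used += (size_t)n;
        bit = last + 1;
    }

    return used;
}
```

For example, the mask 0x8d (bits 0, 2, 3 and 7) is formatted as "0,2-3,7", which is easier to read than the raw hexadecimal value on systems with many CPUs.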
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 20 Sep 2021 12:40:21 +0000 (13:40 +0100)]
xen/trace: Drop old trace API
With all users updated to the new API, drop the old API. This includes all of
asm/hvm/trace.h, which allows us to drop some includes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Tue, 21 Sep 2021 18:55:47 +0000 (19:55 +0100)]
xen/trace: Remove final {__,}trace_var() users in favour of the new API
The cycles parameter (which gets removed as a consequence) determines whether
trace() or trace_time() is used.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Fri, 17 Sep 2021 23:31:27 +0000 (00:31 +0100)]
xen: Switch to new TRACE() API
(Almost) no functional change.
* In irq_move_cleanup_interrupt(), use the 'me' local variable rather than
calling smp_processor_id() again. This manifests as a minor code
improvement.
* In vlapic_update_timer() and lapic_rearm(), introduce a new 'timer_period'
local variable to simplify the expressions used for both the trace and
create_periodic_time() calls.
All other differences in the compiled binary are to do with line numbers
changing.
Some conversion notes:
* HVMTRACE_LONG_[234]D() and TRACE_2_LONG_[234]D() were latently buggy. They
blindly discard extra parameters, but luckily no users are impacted. They
are also obfuscated wrappers, depending on exactly one or two parameters
being TRC_PAR_LONG() to compile successfully.
* HVMTRACE_LONG_1D() behaves unlike its named companions, and takes exactly
one 64bit parameter which it splits manually. Its one user,
vmx_cr_access()'s LMSW path, gets adjusted.
* TRACE_?D() and TRACE_2_LONG_*() change to TRACE_TIME() as cycles is always
enabled.
* HVMTRACE_ND() is opencoded for VMENTRY/VMEXIT records to include cycles.
These are converted to TRACE_TIME(), with the old modifier parameter
expressed as an OR at the callsite. One callsite, svm_vmenter_helper() had
a nested tb_init_done check, which is dropped. (The optimiser also spotted
this, which is why it doesn't manifest as a binary difference.)
* All uses of *LONG() are either opencoded or swapped to using a struct, to
avoid MISRA issues.
* All HVMTRACE_?D() change to TRACE() as cycles is explicitly skipped.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Mon, 20 Sep 2021 13:07:43 +0000 (14:07 +0100)]
xen/sched: Clean up trace handling
There is no need for bitfields anywhere - use more sensible types. There is
also no need to cast 'd' to (unsigned char *) before passing it to a function
taking void *. Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Fri, 17 Sep 2021 15:28:19 +0000 (16:28 +0100)]
xen/rt: Clean up trace handling
Most uses of bitfields and __packed are unnecessary. There is also no need to
cast 'd' to (unsigned char *) before passing it to a function taking void *.
Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>