Daniel P. Smith [Wed, 9 Apr 2025 13:32:26 +0000 (15:32 +0200)]
x86/boot: introduce domid field to struct boot_domain
boot_domain stores the domid until it is used to create (and allocate)
struct domain. d->domain_id is not available early enough.
boot_domain domids are initialized to DOMID_INVALID. If not overridden
by device tree, domids of DOMID_INVALID are assigned a valid value. The
domid will be optionally parsed from the device tree configuration.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Alejandro Vallejo <agarciav@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
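The defaulting rule in the domid commit above can be sketched roughly as follows. This is an illustration only, not the Xen code: `struct boot_domain_sketch`, `boot_domain_init()`, `resolve_domid()` and the `next_free` parameter are hypothetical names, and the DOMID_INVALID value is assumed to match Xen's public headers.

```c
#include <stdint.h>

typedef uint16_t domid_t;

/* Assumed to match the value in Xen's public headers. */
#define DOMID_INVALID ((domid_t)0x7FF4U)

/* Hypothetical stand-in for the real struct boot_domain. */
struct boot_domain_sketch {
    domid_t domid;   /* held here until struct domain exists */
};

/* Initialise with the "not yet chosen" marker. */
static void boot_domain_init(struct boot_domain_sketch *bd)
{
    bd->domid = DOMID_INVALID;
}

/*
 * Keep a domid supplied by configuration (e.g. device tree); otherwise
 * fall back to the caller-provided next free value.
 */
static domid_t resolve_domid(struct boot_domain_sketch *bd, domid_t next_free)
{
    if ( bd->domid == DOMID_INVALID )
        bd->domid = next_free;

    return bd->domid;
}
```

A domid set before resolve_domid() runs is left untouched; only the DOMID_INVALID marker is replaced.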
Daniel P. Smith [Wed, 9 Apr 2025 13:32:02 +0000 (15:32 +0200)]
x86/boot: introduce boot domain
To begin moving toward allowing the hypervisor to construct more than one
domain at boot, a container is needed for a domain's build information.
Introduce a new header, <xen/asm/bootdomain.h>, that contains the initial
struct boot_domain that encapsulates the build information for a domain.
Add a kernel and ramdisk boot module reference along with a struct domain
reference to the new struct boot_domain. This allows a struct boot_domain
reference to be the only parameter necessary to pass down through the domain
construction call chain.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Alejandro Vallejo <agarciav@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
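As a rough picture of the container introduced by the boot domain commit above, the sketch below passes a single pointer down a two-level call chain. All type and function names here are hypothetical stand-ins; the real struct boot_domain and its users may look different.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Xen's boot module and domain types. */
struct boot_module_sketch { const char *name; };
struct domain_sketch { int id; };

/* One container per domain being built. */
struct boot_domain_sketch {
    struct boot_module_sketch *kernel;
    struct boot_module_sketch *ramdisk;   /* may be NULL */
    struct domain_sketch *d;
};

/* Lower layers only need the container, not a long parameter list. */
static int load_kernel(const struct boot_domain_sketch *bd)
{
    return (bd->kernel != NULL && bd->d != NULL) ? 0 : -1;
}

static int construct_domain(const struct boot_domain_sketch *bd)
{
    return load_kernel(bd);
}
```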
Jan Beulich [Wed, 9 Apr 2025 13:30:15 +0000 (15:30 +0200)]
libxc/PM: correct (not just) error handling in xc_get_cpufreq_para()
From their introduction all xc_hypercall_bounce_pre() uses, when they
failed, would properly cause exit from the function including cleanup,
yet without informing the caller of the failure. Purge the unlock_1
label for being both pointless and mis-named.
An earlier attempt to switch to the usual split between return value and
errno wasn't quite complete.
HWP work made the cleanup of the "available governors" array
conditional, neglecting the fact that the condition used may not be the
condition that was used to allocate the buffer (as the structure field
is updated upon getting back EAGAIN). Since cleanup can be done even if
no buffer was allocated, drop the conditional there again.
Fixes: 4513025a8790 ("libxc: convert sysctl interfaces over to hypercall buffers")
Amends: 73367cf3b4b4 ("libxc: Fix xc_pm API calls to return negative error and stash error in errno") Fixes: 31e264c672bc ("pmstat&xenpm: Re-arrage for cpufreq union") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
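The two conventions touched on in the libxc commit above (return -1 with the error code in errno, and unconditional cleanup) can be sketched as below. `get_para_sketch()` and its parameters are hypothetical and only model the control flow; this is not the body of xc_get_cpufreq_para().

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 0 on success, or -1 with errno set.
 * 'fail_with' simulates an error code coming back from a lower layer.
 */
static int get_para_sketch(size_t gov_len, int fail_with)
{
    char *governors = NULL;

    if ( gov_len )
    {
        governors = malloc(gov_len);
        if ( !governors )
        {
            errno = ENOMEM;
            return -1;
        }
    }

    /* free(NULL) is a no-op, so no condition is needed here. */
    free(governors);

    if ( fail_with )
    {
        errno = fail_with;   /* the error code goes in errno ... */
        return -1;           /* ... and the return value is just -1 */
    }

    return 0;
}
```

Because the cleanup does not depend on whether a buffer was allocated, it cannot get out of sync with the allocation condition.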
Andrew Cooper [Wed, 9 Apr 2025 10:36:40 +0000 (11:36 +0100)]
x86/ucode: Extend warning about disabling digest check too
This was missed by accident.
Fixes: b63951467e96 ("x86/ucode: Extend AMD digest checks to cover Zen5 CPUs") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
The current implementation of PVH dom0 relies on vPCI to trap and handle
accesses to the MMCFG area. Previous implementation of PVH dom0 (v1)
didn't have vPCI, and as a classic PV dom0, relied on the MMCFG range being
RO. As such hvm_emulate_one_mmio() had to special case write accesses to
the MMCFG area.
With PVH dom0 using vPCI, and the MMCFG accesses being fully handled there,
hvm_emulate_one_mmio() should never handle accesses to MMCFG, making the
code effectively unreachable.
Remove it and leave an ASSERT to make sure MMCFG accesses never get into
hvm_emulate_one_mmio(). As a result of the removal of one of the users of
mmcfg_intercept_write(), the function can now be moved into the same
translation unit where it's solely used, allowing it to be made static and
effectively built only when PV support is enabled.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Mon, 31 Mar 2025 16:56:01 +0000 (18:56 +0200)]
automation/dockers: add to README how to rebuild all containers
Document in the README how to rebuild all containers. This is helpful when
populating a local docker registry for testing purposes.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Tue, 8 Apr 2025 16:09:15 +0000 (17:09 +0100)]
x86/ucode: Extend AMD digest checks to cover Zen5 CPUs
AMD have updated the SB-7033 advisory to include Zen5 CPUs. Extend the digest
check to cover Zen5 too.
In practice, cover everything until further notice.
Observant readers may be wondering where the update to the digest list is. At
the time of writing, no Zen5 patches are available via a verifiable channel.
xen: x86: irq: initialize irq desc in create_irq()
While building xen with GCC 14.2.1 with "-fcondition-coverage" option
or with "-Og", the compiler produces a false positive warning:
arch/x86/irq.c: In function ‘create_irq’:
arch/x86/irq.c:281:11: error: ‘desc’ may be used uninitialized [-Werror=maybe-uninitialized]
281 | ret = init_one_irq_desc(desc);
| ^~~~~~~~~~~~~~~~~~~~~~~
arch/x86/irq.c:269:22: note: ‘desc’ was declared here
269 | struct irq_desc *desc;
| ^~~~
cc1: all warnings being treated as errors
make[2]: *** [Rules.mk:252: arch/x86/irq.o] Error 1
While we have signed/unsigned comparison both in "for" loop and in
"if" statement, this still can't lead to use of uninitialized "desc",
as either loop will be executed at least once, or the function will
return early. So this is a clearly false positive warning due to a
bug [1] in GCC.
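A reduced, hypothetical version of the pattern in the create_irq() commit above is sketched here: the pointer is only assigned inside the loop, and initialising it to NULL is one way to silence the warning without changing behaviour. `find_free()` and `struct irq_desc_sketch` are illustrative names, not Xen code.

```c
#include <stddef.h>

struct irq_desc_sketch { int in_use; };

/*
 * 'desc' is assigned only inside the loop, so some compilers cannot prove
 * it is set before use. The NULL initialiser is never actually consumed:
 * either the loop runs at least once, or the function returns early.
 */
static struct irq_desc_sketch *find_free(struct irq_desc_sketch *table,
                                         int start, unsigned int count)
{
    struct irq_desc_sketch *desc = NULL;
    int irq;

    for ( irq = start; irq < (int)count; irq++ )
    {
        desc = &table[irq];
        if ( !desc->in_use )
            break;
    }

    if ( irq >= (int)count )
        return NULL;          /* nothing free: early return */

    desc->in_use = 1;
    return desc;
}
```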
Jan Beulich [Tue, 8 Apr 2025 07:38:36 +0000 (09:38 +0200)]
Config.mk: correct gcc5 check
Passing the -dumpversion option to gcc may print only the major version
(on my system, 4.x.y printed major and minor; in today's numbering scheme
that is indeed just 5 for 5.x, which in turn is what my secondary system
compiler prints).
Fixes: 40458f752550 ("Xen: Update compiler baseline checks") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools/libxl: search PATH for QEMU if `QEMU_XEN_PATH` is not absolute
`QEMU_XEN_PATH` will be configured as `qemu-system-i386` with no clue where, if
`--with-system-qemu` is set without giving a path (as matched in the case `yes`
but not `*`). However, the existence of the executable is checked with `access()`,
which does not search $PATH and only resolves the name relative to the current
directory. Since `qemu-system-i386` (or any other configured value) may be
executed from PATH later, look it up in PATH and return the full path
for the caller to check against.
Signed-off-by: Hongbo <hehongbo@mail.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
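The PATH lookup described in the libxl commit above could look roughly like the sketch below. `find_in_path()` is a hypothetical helper, not the function added to libxl; it takes the PATH string as a parameter and, as a simplification, skips empty PATH elements.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Look for 'name' in each directory of the colon-separated 'path' list.
 * On success, write the full path to 'out' and return 0; otherwise
 * return -1. Empty list elements are skipped in this sketch.
 */
static int find_in_path(const char *name, const char *path,
                        char *out, size_t outlen)
{
    const char *p = path;

    while ( p && *p )
    {
        size_t len = strcspn(p, ":");
        int n = snprintf(out, outlen, "%.*s/%s", (int)len, p, name);

        if ( len && n > 0 && (size_t)n < outlen && access(out, X_OK) == 0 )
            return 0;

        p += len;
        if ( *p == ':' )
            p++;
    }

    return -1;
}
```

A caller would typically pass getenv("PATH") and then check the returned full path exactly as it would check an absolute one.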
CPUID leaf 0x2 emits one-byte descriptors in its four output registers
EAX, EBX, ECX, and EDX. For these descriptors to be valid, the most
significant bit (MSB) of each register must be clear.
Leaf 0x2 parsing at intel.c only validated the MSBs of EAX, EBX, and
ECX, but left EDX unchecked.
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 1881148215c6 Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
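The validity rule from the CPUID leaf 0x2 commit above is sketched below for all four registers. `leaf2_filter()` is a hypothetical helper operating on already-fetched register values; it does not execute CPUID itself.

```c
#include <stdint.h>

/*
 * 'regs' holds EAX, EBX, ECX, EDX from CPUID leaf 0x2, in that order.
 * A register whose bit 31 is set carries no valid descriptors, so zero it.
 * Returns the number of registers that were kept.
 */
static unsigned int leaf2_filter(uint32_t regs[4])
{
    unsigned int i, valid = 0;

    for ( i = 0; i < 4; i++ )   /* all four, including EDX */
    {
        if ( regs[i] & (UINT32_C(1) << 31) )
            regs[i] = 0;
        else
            valid++;
    }

    return valid;
}
```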
xen: vm_event: do not do vm_event_op for an invalid domain
A privileged domain can issue XEN_DOMCTL_vm_event_op with
op->domain == DOMID_INVALID. In this case vm_event_domctl()
function will get NULL as the first parameter and this will
cause a hypervisor panic, as it tries to dereference this pointer.
Fix the issue by checking that a valid domain is passed in.
Fixes: 48b84249459f ("xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()") Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
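The shape of the vm_event check above is a plain NULL guard ahead of the first dereference. The sketch below uses a hypothetical handler and domain type, and the -ESRCH error code is an assumption rather than something stated in the commit message.

```c
#include <errno.h>
#include <stddef.h>

struct domain_sketch { int id; };

/* Hypothetical handler: reject a missing domain before touching it. */
static int vm_event_op_sketch(struct domain_sketch *d)
{
    if ( d == NULL )
        return -ESRCH;   /* assumed error code for "no such domain" */

    return d->id >= 0 ? 0 : -EINVAL;
}
```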
In order to close a race window for Xenstore live update when using
the new unique_id of domains, the migration stream needs to contain
this unique_id for each domain known by Xenstore.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
CI: adjust resolving network interface into PCI device
Change how PCI device lookup is done to also handle USB devices, in
which case the USB controller is used. Instead of taking basename of the
'device' symlink, resolve the full path (example:
/sys/devices/pci0000:00/0000:00:09.0/usb4/4-7/4-7:1.0) and take the
first part after pci0000:00. Theoretically it could be a bridge, but VM
has flat PCI topology.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
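The path handling from the PCI lookup commit above is shown here in C purely as an illustration of the string logic; the CI change itself presumably lives in a shell script. `first_pci_component()` is a hypothetical helper, and the "pci0000:00" root is hard-coded as in the example path.

```c
#include <string.h>

/*
 * Copy the path component that follows "/pci0000:00/" in 'path' into 'out'.
 * Returns 0 on success, -1 if the marker is absent or 'out' is too small.
 */
static int first_pci_component(const char *path, char *out, size_t outlen)
{
    static const char marker[] = "/pci0000:00/";
    const char *p = strstr(path, marker);
    size_t len;

    if ( !p )
        return -1;

    p += sizeof(marker) - 1;
    len = strcspn(p, "/");
    if ( len == 0 || len >= outlen )
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

For the example path in the commit message this yields 0000:00:09.0, i.e. the USB controller rather than the USB interface.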
CI: wait for the network interface in PCI passthrough tests
The network driver initializes asynchronously, and it may not be ready
yet by the time the startup script is called. This is especially the
case for USB network adapter (where the PCI device is the USB
controller) in the upcoming runner.
Don't bother with a separate timeout - the test timeout will cover this part
too.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 3 Apr 2025 14:37:23 +0000 (15:37 +0100)]
x86/AMD: Convert wrmsr_amd_safe() to use asm goto()
Bloat-o-meter reports:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-29 (-29)
Function old new delta
_probe_mask_msr 99 94 -5
init_amd 2418 2394 -24
but this under-reports because .fixup doesn't contain sized/typed symbols.
This also drops two "mov -EFAULT, %reg; jmp ...;" sequences too, so the net
saving is -50.
wrmsr_amd_safe()'s return value is only checked against 0 (if at all), and
because of this, the compiler can now avoid manifesting the 0/-EFAULT
constants entirely, and the %[fault] label simply lands on the right basic
block.
Convert to Xen style while rewriting.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 3 Apr 2025 10:49:02 +0000 (11:49 +0100)]
xen/link: Drop .fixup section from non-x86 architectures
The fixup section is only used by x86, and we're working to remove it there
too. Logic in the fixup section is unconnected to its origin site, and
interferes with backtraces/etc.
Remove the section from the architectures which don't use it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
tools/libxl: do not use `-c -E` compiler options together
It makes no sense to request preprocessor-only output and also request
object file generation. Fix the _libxl.api-for-check target to only use
-E (preprocessor output).
Also Clang 20.0 reports an error if both options are used.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Fixes: 2862bf5b6c81 ('libxl: enforce prohibitions of internal callers') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Improve error handling in VMX wrappers by switching to `asm goto()` where
possible.
No functional change.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/210 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/emulate: Remove HAVE_AS_RDRAND and HAVE_AS_RDSEED
The new toolchain baseline knows the RDRAND and RDSEED instructions; no need
to carry the workaround in the code.
Fix up arch_get_random() too.
No functional change.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/208 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The new toolchain baseline knows both the XSAVEOPT and CLWB instructions.
It knows CLFLUSHOPT too, so fix up those.
No functional change.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/205 Signed-off-by: "Alexander M. Merritt" <alexander@edera.dev> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 7 Apr 2025 10:16:43 +0000 (12:16 +0200)]
x86emul: replace _BYTES_PER_LONG
We can now easily use __SIZEOF_LONG__ instead. For this to also work in
the test harness, move hvmloader's STR() to common-macros.h.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
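The combination of __SIZEOF_LONG__ and a STR() macro in the x86emul commit above relies on two-level stringification. The macro below is a minimal sketch written for this note, not a copy of the one moved into common-macros.h, and `long_bytes()` is a hypothetical helper for checking the result.

```c
#include <stdlib.h>

/* Two-level stringification so that the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* __SIZEOF_LONG__ is predefined by GCC and Clang. */
static const char long_bytes_str[] = STR(__SIZEOF_LONG__);

/* Convert the stringified size back to a number, for checking only. */
static int long_bytes(void)
{
    return atoi(long_bytes_str);
}
```

With a single-level macro the result would be the literal text "__SIZEOF_LONG__" instead of the number, which is why the extra level is needed.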
The new toolchain baseline knows the CRC32 instructions; no need to carry the
workaround in the code.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/206 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The new toolchain baseline knows the INVPCID instruction; no need to carry the
workaround in the code.
No functional change.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/209 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The new toolchain baseline knows the {RD,WR}{F,G}SBASE instructions; no need
to carry the workaround in the code.
No functional change.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/207 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The new toolchain baseline knows the VMX instructions; no need to carry the
workaround in the code.
Inline __vmxoff() into its single caller.
Update formatting in the wrappers to be consistent.
No functional change.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/202 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Michal Orzel [Wed, 2 Apr 2025 08:42:33 +0000 (10:42 +0200)]
xen/arm: Drop process_shm_chosen()
There's no benefit in having process_shm_chosen() next to process_shm().
The former is just a helper to pass "/chosen" node to the latter for
hwdom case. Drop process_shm_chosen() and instead use process_shm()
passing NULL as node parameter, which will result in searching for and
using /chosen to find shm node (the DT full path search is done in
process_shm() to avoid expensive lookup if !CONFIG_STATIC_SHM). This
will simplify future handling of hw/control domain separation.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Michal Orzel [Wed, 2 Apr 2025 08:42:32 +0000 (10:42 +0200)]
xen/arm: Don't call process_shm_chosen() during ACPI boot
Static shared memory requires device-tree boot. At the moment, booting
with ACPI enabled and CONFIG_STATIC_SHM=y results in a data abort when
dereferencing node in process_shm() because dt_host is always NULL.
Fixes: 09c0a8976acf ("xen/arm: enable statically shared memory on Dom0") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Thu, 3 Apr 2025 07:39:52 +0000 (09:39 +0200)]
x86/boot: re-order .init.data contributions
Putting a few bytes ahead of page tables isn't very efficient; there's
a gap almost worth a full page. To avoid re-ordering of items in the
source file, simply put the few small items in sub-section 1, for them
to end up after the page tables, followed (in the final binary) by non-
page-aligned items from other CUs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Jan Beulich [Thu, 3 Apr 2025 07:39:35 +0000 (09:39 +0200)]
x86/CPU: don't hard-code MTRR availability
In particular if we're running virtualized, the underlying hypervisor
(which may be another Xen) may not surface MTRRs, and offer PAT only.
Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 3 Apr 2025 07:39:13 +0000 (09:39 +0200)]
x86/MTRR: hook mtrr_bp_restore() back up
Unlike stated in the offending commit's description,
load_system_tables() wasn't the only thing left to retain from the
earlier restore_rest_processor_state(). Note that MTRR state was still
reloaded via mtrr_aps_sync_end(), but that happens quite a bit later in
the resume process.
While there also do Misra-related tidying for the function itself: The
function being used from assembly only means it doesn't need to have a
declaration, but wants to be asmlinkage.
Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely") Reported-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 3 Apr 2025 07:38:41 +0000 (09:38 +0200)]
x86/MTRR: constrain AP sync and BSP restore
mtrr_set_all() has quite a bit of overhead, which is entirely useless
when set_mtrr_state() really does nothing. Furthermore, with
mtrr_state.def_type never initialized from hardware, post_set()'s
unconditional writing of the MSR would leave us running in UC
mode after the sync.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
This is Intel i7-7567U in NUC 7i7BNH. This one is an older one, with no
firmware updates (last update from 2023) and no microcode updates
either. While this firmware supports UEFI, network boot works only in
legacy mode - thus legacy is used here (via iPXE, instead of grub2.efi).
Testing legacy boot path may be a useful thing on its own.
Add the same set of tests as on ADL runner.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Evtchn fifos are not needed on smaller systems; the older interface is
lightweight and sufficient. Also, event_fifo causes runtime anonymous
memory allocations, which are undesirable. Additionally, it exposes an
extra interface to the guest, which is also undesirable unless
necessary.
Make it possible to disable evtchn fifo.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Michal Orzel <michal.orzel@amd.com>
The new toolchain baseline knows the STAC/CLAC instructions,
no need to carry the workaround in the code.
Resolves: https://gitlab.com/xen-project/xen/-/work_items/203 Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 1 Apr 2025 22:56:46 +0000 (23:56 +0100)]
x86/vmx: Use asm goto() in _vmx_cpu_up()
With the new toolchain baseline, we can make use of asm goto() in certain
places, and the VMXON invocation is one example.
This removes the logic to set up rc (including a fixup section where backtraces
have no connection to the invoking function), the logic to decode it,
including the default case which was dead but not visibly-so to the compiler.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 28 Mar 2025 10:04:31 +0000 (10:04 +0000)]
xen/lzo: Remove more remnants of TMEM
This logic was inserted by commit 447f613c5404 ("lzo: update LZO compression
to current upstream version") but was only relevant for the TMEM logic, so
should have been deleted in commit c492e19fdd05 ("xen: remove tmem from
hypervisor").
Fixes: c492e19fdd05 ("xen: remove tmem from hypervisor") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Michal Orzel [Wed, 2 Apr 2025 10:10:13 +0000 (12:10 +0200)]
xen/arm: Include xen/vmap.h in mm.c
As reported by ECLAIR scan, MISRA requires declaration to be visible
(R8.4). This is not the case for ioremap().
Fixes: 2cd02c27d327 ("arm/mpu: Implement stubs for ioremap_attr on MPU") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen: simplify bitmap_to_xenctl_bitmap for little endian
The little endian implementation of bitmap_to_xenctl_bitmap leads to
unnecessary xmallocs and xfrees. Given that Xen only supports little
endian architectures, it is worth optimizing.
This patch removes the need for the xmalloc on little endian
architectures.
Remove clamp_last_byte as it is only called once and only needs to
modify one byte. Inline it.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
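The idea behind the little-endian simplification above is that the in-memory bitmap already has the byte-array layout, so only the last byte needs clamping. The sketch below copies into a local buffer rather than to a guest, assumes a little-endian host, and uses the hypothetical name `bitmap_to_bytes_le()`.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy 'nbits' bits of 'bitmap' into 'out' as a byte array.
 * Assumes a little-endian host, where the in-memory layout of the
 * unsigned long array already matches the byte-array format.
 * 'out' must hold at least (nbits + 7) / 8 bytes.
 */
static size_t bitmap_to_bytes_le(const unsigned long *bitmap,
                                 unsigned int nbits, uint8_t *out)
{
    size_t nbytes = (nbits + 7) / 8;

    if ( nbytes == 0 )
        return 0;

    memcpy(out, bitmap, nbytes);

    /* Clear the bits beyond 'nbits' in the final byte. */
    if ( nbits % 8 )
        out[nbytes - 1] &= (uint8_t)((1U << (nbits % 8)) - 1);

    return nbytes;
}
```

No temporary allocation is needed because only one byte is ever modified, and that happens in the destination buffer.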
ARM MPU system doesn't need to use paging memory pool, as MPU memory
mapping table at most takes only one 4KB page, which is enough to
manage the maximum 255 MPU memory regions, for all EL2 stage 1
translation and EL1 stage 2 translation.
Introduce ARCH_PAGING_MEMPOOL Kconfig common symbol, selected for Arm
MMU systems and x86. Remove stubs from RISC-V now that the common code
provides them and the functions are not going to be used.
Wrap the code inside 'construct_domU' that deals with p2m paging
allocation in a new function 'domain_p2m_set_allocation', protected
by ARCH_PAGING_MEMPOOL. This is done to avoid polluting
the former function with #ifdefs and to improve readability.
Introduce arch_{get,set}_paging_mempool_size stubs for architecture
with !ARCH_PAGING_MEMPOOL.
Remove 'struct paging_domain' from Arm 'struct arch_domain' when the
field is not required.
Implement ioremap_attr() stub for MPU system; the
implementation of ioremap() is the same between MMU
and MPU system, and it relies on ioremap_attr(), so
move the definition from mmu/pt.c to arm/mm.c.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
The MPU system requires static memory to work, select that
when building this memory management subsystem.
While there, provide a restriction for the ARM_EFI Kconfig
parameter to be built only when !MPU, as the EFI stub is not
used because there is no implementation of UEFI services for
armv8-r.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Introduce frame_table in order to provide the implementation of
virt_to_page for MPU system, move the MMU variant in mmu/mm.h.
Introduce FRAMETABLE_NR that is required for 'pdx_group_valid' in
pdx.c, but leave the initialisation of the frame table to a later
stage.
Define FRAMETABLE_SIZE for MPU to support up to 1TB of ram at this
stage, as the only current implementation of armv8-r aarch64, which
is cortex R82, can support 1TB or 256TB (r82 TRM r3p1
ID_AA64MMFR0_EL1.PARange).
Take the occasion to sort alphabetically the headers following
the Xen code style and add the emacs footer in mpu/mm.c.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
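To give a feel for the 1TB figure in the frame table commit above, the arithmetic can be written out as below. The 4KB page size and both helper names are assumptions made for this sketch; the real FRAMETABLE_SIZE and FRAMETABLE_NR definitions may be expressed differently.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12                    /* assumed 4KB pages */
#define RAM_LIMIT_SKETCH  (UINT64_C(1) << 40)   /* 1TB */

/* Number of page frames needed to cover 'bytes' of RAM. */
static uint64_t frames_for(uint64_t bytes)
{
    return bytes >> PAGE_SHIFT_SKETCH;
}

/* Frame table footprint, given the size of one per-page entry. */
static uint64_t frametable_bytes(uint64_t bytes, uint64_t entry_size)
{
    return frames_for(bytes) * entry_size;
}
```

Under these assumptions 1TB corresponds to 2^28 frames; the footprint then scales linearly with whatever the per-page entry size turns out to be.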
xen/arm: Implement virt/maddr conversion in MPU system
virt_to_maddr and maddr_to_virt are used widely in Xen code. So
even though there is no VMSA in an MPU system, we keep the interface in MPU
to avoid changing the existing common code.
In order to do that, move the virt_to_maddr() and maddr_to_virt()
definitions to mmu/mm.h, move the include of memory management
subsystems (MMU/MPU) to a different place because the mentioned
helpers need visibility of some macros in asm/mm.h.
Finally implement virt_to_maddr() and maddr_to_virt() for MPU systems
under mpu/mm.h, the MPU version of virt/maddr conversion is simple since
VA==PA.
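With VA==PA, the two conversions in the MPU commit above reduce to casts. The sketch below uses hypothetical `_sketch` names and a stand-in address type, and leaves out any range checks or assertions the real helpers may carry.

```c
#include <stdint.h>

typedef uint64_t paddr_t;   /* stand-in for Xen's machine address type */

/* With no address translation, VA == PA, so both helpers are plain casts. */
static inline paddr_t virt_to_maddr_sketch(const void *va)
{
    return (paddr_t)(uintptr_t)va;
}

static inline void *maddr_to_virt_sketch(paddr_t ma)
{
    return (void *)(uintptr_t)ma;
}
```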
arm/mpu: Add HYPERVISOR_VIRT_START and avoid a check in xen.lds.S
The define HYPERVISOR_VIRT_START is required by the common code,
even if MPU system doesn't use virtual memory, define it in
mpu/layout.h in order to reuse existing code.
Disable a check in the linker script for arm for !MMU systems.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Jan Beulich [Tue, 1 Apr 2025 10:48:23 +0000 (12:48 +0200)]
x86emul: make test harness build again as 32-bit binary
Adding Q suffixes to FXSAVE/FXRSTOR did break the 32-bit build. Don't go
back though, as the hand-coded 0x48 there weren't quite right either for
the 32-bit case (they might well cause confusion when looking at the
disassembly). Instead arrange for the compiler to DCE respective asm()-s,
by short-circuiting REX_* to zero.
Fixes: 5a33ea2800c1 ("x86emul: drop open-coding of REX.W prefixes") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Currently, only the device tree method is available to locate and perform
pre-initialization steps for the interrupt controller (at the moment, only
one interrupt controller is going to be supported). When `acpi_disabled`
is true, the system will scan for a node with the "interrupt-controller"
property and then call `device_init()` to validate if it is an expected
interrupt controller and if yes then save this node for further usage.
If `acpi_disabled` is false, the system will panic, as ACPI support is not
yet implemented for RISC-V.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Introduce preinitialization stuff for the RISC-V Advanced Platform-Level
Interrupt Controller (APLIC) in Xen:
- Implementing the APLIC pre-initialization function (`aplic_preinit()`),
ensuring that only one APLIC instance is supported in S mode.
- Initializing APLIC's corresponding DT node.
- Declaring the DT device match table for APLIC.
- Setting `aplic_info.hw_version` during its declaration.
- Declaring an APLIC device.
Since Microchip originally developed aplic.c [1], an internal discussion
with them led to the decision to use the MIT license instead of the default
GPL-2.0-only.
automation/RISC-V: select APLIC and IMSIC to handle both wired interrupts and MSIs
By default, the `aia` option is set to "none" which selects the SiFive PLIC for
handling wired interrupts. However, since PLIC is now considered obsolete and
will not be supported by Xen now, APLIC and IMSIC are selected instead to manage
both wired interrupts and MSIs.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
preinit_xen_time() does two things:
1. Parse timebase-frequency property of /cpus node to initialize cpu_khz
variable.
2. Initialize boot_clock_cycles with the current time counter value to
have starting point for Xen.
timebase-frequency is read as a uint32_t because it is unlikely that the
timer will run at more than 4 GHz. If timebase-frequency exceeds 4 GHz,
a panic() is triggered, since dt_property_read_u32() will return 0 if
the size of the timebase-frequency property is greater than the size of
the output variable.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
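The timebase-frequency handling above can be modelled as below. Both functions are hypothetical: `read_u32_prop()` is a simplified reader that rejects any property that is not exactly one 32-bit cell, standing in for dt_property_read_u32(), and a return of 0 from `timebase_khz()` marks the point where the real code would panic().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical property reader: device tree cells are big-endian, and the
 * read fails unless the property is exactly one 32-bit cell.
 */
static bool read_u32_prop(const uint8_t *data, size_t len, uint32_t *out)
{
    if ( len != sizeof(uint32_t) )
        return false;

    *out = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) | data[3];
    return true;
}

/* Returns the frequency in kHz, or 0 where the real code would panic(). */
static uint32_t timebase_khz(const uint8_t *data, size_t len)
{
    uint32_t hz;

    if ( !read_u32_prop(data, len, &hz) )
        return 0;

    return hz / 1000;
}
```

A 64-bit property therefore never produces a silently truncated frequency; it is rejected outright.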
Andrew Cooper [Fri, 28 Mar 2025 17:18:51 +0000 (17:18 +0000)]
x86emul: Fix blowfish build in 64bit-clean environments
In a 64bit-clean environment, blowfish fails:
make[6]: Leaving directory
'/builddir/build/BUILD/xen-4.19.1/tools/tests/x86_emulator'
In file included from /usr/include/features.h:535,
from /usr/include/bits/libc-header-start.h:33,
from /usr/include/stdint.h:26,
from
/usr/lib/gcc/x86_64-xenserver-linux/12/include/stdint.h:9,
from blowfish.c:18:
/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-32.h: No such
file or directory
7 | # include <gnu/stubs-32.h>
| ^~~~~~~~~~~~~~~~
compilation terminated.
make[6]: *** [testcase.mk:15: blowfish.bin] Error 1
because of lack of glibc-i386-devel or equivalent. It's non-fatal, but
reduces the content in test_x86_emulator, which we do care about running.
Instead, convert all emulator testcases to being freestanding builds, reusing
the tools/firmware/include/ headers.
This in turn requires making firmware's stdint.h compatible with 64bit builds.
We now have compiler types for every standard type we use.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Xen does not currently support boot modules that span multiple banks: at
least one of the regions gets freed twice. The first time from
setup_mm->populate_boot_allocator, then again from
discard_initial_modules->fw_unreserved_regions. With a high number of
banks, it can be difficult to arrange the boot modules in a way that
avoids spanning across multiple banks.
This small patch merges neighboring regions, to make dealing with them
more efficient, and to make it easier to load boot modules.
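Merging neighboring regions, as described above, amounts to a single pass over a sorted list. The sketch below is a generic version with hypothetical names (`struct region`, `merge_adjacent()`); it assumes the input is sorted by start address and free of overlaps, which the real bank handling may or may not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

struct region { uint64_t start, size; };

/*
 * Merge regions that touch end-to-start. 'r' must be sorted by start
 * address and non-overlapping. Returns the new number of regions.
 */
static size_t merge_adjacent(struct region *r, size_t n)
{
    size_t i, out = 0;

    if ( n == 0 )
        return 0;

    for ( i = 1; i < n; i++ )
    {
        if ( r[out].start + r[out].size == r[i].start )
            r[out].size += r[i].size;     /* extend the current region */
        else
            r[++out] = r[i];              /* start a new one */
    }

    return out + 1;
}
```

Once two contiguous banks are merged, a module that previously straddled their boundary lies inside a single region.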
gcc 14 (with patch "Add condition coverage (MC/DC)") introduced 9th
gcov counter. Also this version can call new merge function
__gcov_merge_ior(), so we need a new stub for it.
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
INVALID_PERCPU_AREA depends solely on the system state.
If the system is suspended, this area is not freed, and during resume, an error
occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
set and park_offline_cpus remains 0:
Jan Beulich [Mon, 31 Mar 2025 07:21:12 +0000 (09:21 +0200)]
x86/P2M: synchronize fast and slow paths of p2m_get_page_from_gfn()
Handling of both grants and foreign pages was different between the two
paths.
While permitting access to grants would be desirable, doing so would
require more involved handling; undo that for the time being. In
particular the page reference obtained would prevent the owning domain
from changing e.g. the page's type (after the grantee has released the
last reference of the grant). Instead perhaps another reference on the
grant would need obtaining. Which in turn would require determining
which grant that was.
Foreign pages in any event need permitting on both paths.
Introduce a helper function to be used on both paths, such that
respective checking differs in just the extra "to be unshared" condition
on the fast path.
While there adjust the sanity check for foreign pages: Don't leak the
reference on release builds when on a debug build the assertion would
have triggered. (Thanks to Roger for the suggestion.)
Fixes: 80ea7af17269 ("x86/mm: Introduce get_page_from_gfn()") Fixes: 50fe6e737059 ("pvh dom0: add and remove foreign pages") Fixes: cbbca7be4aaa ("x86/p2m: make p2m_get_page_from_gfn() handle grant case correctly") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 31 Mar 2025 07:20:25 +0000 (09:20 +0200)]
trace: convert init_trace_bufs() to constructor
There's no need for each arch to invoke it directly, and there's no need
for having a stub either. With the present placement of the calls to
init_constructors() it can easily be a constructor itself.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Anthony PERARD [Thu, 27 Mar 2025 10:34:01 +0000 (10:34 +0000)]
CI: Change pipeline name for scheduled pipeline
This description is already displayed on the web UI of the list of
pipeline, but using it as "name" will make it available in webhooks as
well and can be used by a bot.
This doesn't change the behavior for other pipeline types, where the
variable isn't set.
Signed-off-by: Anthony PERARD <anthony.perard@vates.tech> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Michal Orzel [Tue, 25 Mar 2025 11:00:29 +0000 (12:00 +0100)]
tools/arm: Fix nr_spis handling v2
We are missing a way to detect whether a user provided a value for
nr_spis equal to 0 or did not provide any value (default is also 0) which
can cause issues when calculated nr_spis is > 0 and the value from domain
config is 0. Fix it by setting default value for nr_spis to newly added
LIBXL_NR_SPIS_DEFAULT i.e. UINT32_MAX (max supported nr of SPIs is 960
anyway).
Fixes: 55d62b8d4636 ("tools/arm: Reject configuration with incorrect nr_spis value") Reported-by: Luca Fancellu <luca.fancellu@arm.com> Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
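The sentinel-default idea in the nr_spis fix above can be sketched as follows. This is a minimal illustration, not the libxl code: `NR_SPIS_DEFAULT` mirrors the LIBXL_NR_SPIS_DEFAULT value named in the message, while `resolve_nr_spis()` and its rule of rejecting an explicit value smaller than the calculated one are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel meaning "the user gave no value" (mirrors LIBXL_NR_SPIS_DEFAULT). */
#define NR_SPIS_DEFAULT UINT32_MAX

/*
 * Hypothetical helper: choose the final nr_spis.
 * cfg_nr_spis is the value from the domain config, calculated is the
 * number of SPIs the toolstack worked out on its own.
 * Returns false when an explicit user value is too small (assumed policy).
 */
static bool resolve_nr_spis(uint32_t cfg_nr_spis, uint32_t calculated,
                            uint32_t *out)
{
    if (cfg_nr_spis == NR_SPIS_DEFAULT) {
        /* Nothing provided: fall back to the calculated value. */
        *out = calculated;
        return true;
    }

    /* An explicit 0 is now distinguishable from "not provided". */
    if (cfg_nr_spis < calculated)
        return false;

    *out = cfg_nr_spis;
    return true;
}
```

With a default of 0 the first two cases could not be told apart; the sentinel is safe because it lies far above the 960 SPIs that are supported.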
Anthony PERARD [Wed, 26 Mar 2025 14:29:04 +0000 (14:29 +0000)]
kconfig/randconfig: Remove non-existing config
CONFIG_GCOV_FORMAT_AUTODETECT has been removed in 767e6c5fd55b.
Fixes: 767e6c5fd55b ("kconfig/gcov: remove gcc version choice from kconfig") Signed-off-by: Anthony PERARD <anthony.perard@vates.tech> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 20 Mar 2025 14:05:58 +0000 (14:05 +0000)]
Xen: Update compiler baseline checks
We have checks in both xen/compiler.h, and Config.mk. Both are incomplete.
The check in Config.mk sees $(CC) in system and cross-compiler form, so cannot
express anything more than the global baseline. Change it to simply 5.1.
In xen/compiler.h, rewrite the expression for clarity/brevity.
Include a GCC 12.2 check for RISCV, and include a Clang 11 baseline check.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
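A header-side baseline check of the kind described above might look roughly like this. It is a sketch built on the standard predefined compiler macros (`__GNUC__`, `__clang_major__`, `__riscv`), not the actual contents of xen/compiler.h; the `COMPILER_VER()` helper is a name invented for the example.

```c
/* Hypothetical helper: fold major/minor into one comparable number. */
#define COMPILER_VER(major, minor) ((major) * 100 + (minor))

#if defined(__clang__)
/* Clang also defines __GNUC__, so it has to be tested first. */
# if COMPILER_VER(__clang_major__, __clang_minor__) < COMPILER_VER(11, 0)
#  error "Clang older than 11 is below the baseline"
# endif
#elif defined(__GNUC__)
# if defined(__riscv) && \
     COMPILER_VER(__GNUC__, __GNUC_MINOR__) < COMPILER_VER(12, 2)
#  error "RISC-V builds need GCC 12.2 or newer"
# elif COMPILER_VER(__GNUC__, __GNUC_MINOR__) < COMPILER_VER(5, 1)
#  error "GCC older than 5.1 is below the global baseline"
# endif
#endif
```

The per-architecture refinement lives in the header because only there is the target known; a makefile-level check sees a single $(CC) and can only enforce the global minimum.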
Andrew Cooper [Fri, 30 Aug 2024 13:25:28 +0000 (14:25 +0100)]
ARM/vgic: Use for_each_set_bit() in vgic_mmio_write_sgir()
The bitmap_for_each() expression only inspects the bottom 8 bits of targets.
Change its type to uint8_t and use for_each_set_bit() which is more efficient
over scalars.
GICD_SGI_TARGET_LIST_MASK is 2 bits wide. Two cases discard the prior
calculation of targets, and one case exits early.
Therefore, move the GICD_SGI_TARGET_MASK calculation into the only case which
wants it, and use MASK_EXTR() to simplify the expression.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
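Iterating the set bits of a scalar, as opposed to walking a generic bitmap, can be sketched as below. This is not Xen's for_each_set_bit() implementation; `collect_targets()` is a hypothetical helper, and the loop relies on the GCC/Clang builtin `__builtin_ctz()`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: record the index of every set bit in an 8-bit
 * SGI target list, lowest bit first. Returns the number of bits found.
 */
static unsigned int collect_targets(uint8_t targets, unsigned int out[8])
{
    unsigned int remaining = targets;
    unsigned int n = 0;

    while (remaining) {
        /* Index of the lowest set bit; remaining is non-zero here. */
        out[n++] = (unsigned int)__builtin_ctz(remaining);
        /* Clear that bit and continue with the rest. */
        remaining &= remaining - 1;
    }

    return n;
}
```

The loop runs once per set bit rather than once per bit position, which is where the efficiency gain over a bitmap walk comes from.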
Andrew Cooper [Wed, 26 Mar 2025 15:26:56 +0000 (15:26 +0000)]
ARM/vgic: Fix out-of-bounds accesses in vgic_mmio_write_sgir()
The switch() statement is over bits 24:25 (unshifted) of the guest provided
value. This makes case 0x3: dead, and not an implementation of the 4th
possible state.
A guest which writes (0x3 << 24) | (0xff << 16) to this register will skip the
early exit, then enter bitmap_for_each() with targets not bound by nr_vcpus.
If the guest has fewer than 8 vCPUs, bitmap_for_each() will read off the end
of d->vcpu[] and use the resulting vcpu pointer to ultimately derive irq, and
perform out-of-bounds writes.
Fix this by changing case 0x3 to default.
Fixes: 08c688ca6422 ("ARM: new VGIC: Add SGIR register handler") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
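The dead case label described above can be reproduced in isolation. The sketch below uses made-up names (`FILTER_MASK`, `classify_buggy()`, `classify_fixed()`) and simply returns which arm was taken, with -1 meaning no arm matched and execution would carry on past the switch.

```c
#include <stdint.h>

/* Bits 24:25 of the written value, left unshifted (hypothetical name). */
#define FILTER_MASK (0x3u << 24)

/* Buggy shape: the last label is compared against an unshifted value. */
static int classify_buggy(uint32_t val)
{
    switch (val & FILTER_MASK) {
    case 0x0u << 24: return 0;
    case 0x1u << 24: return 1;
    case 0x2u << 24: return 2;
    case 0x3u:       return 3;  /* dead: the masked value is never 0x3 */
    }
    return -1;                  /* the 4th state skips the early exit */
}

/* Fixed shape: anything not handled explicitly takes the early exit. */
static int classify_fixed(uint32_t val)
{
    switch (val & FILTER_MASK) {
    case 0x0u << 24: return 0;
    case 0x1u << 24: return 1;
    case 0x2u << 24: return 2;
    default:         return 3;
    }
}
```

For the value from the message, `(0x3 << 24) | (0xff << 16)`, the buggy variant matches no label at all, which corresponds to reaching the target loop with an unchecked target list.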
Oleksii Kurochko [Thu, 27 Mar 2025 11:23:10 +0000 (12:23 +0100)]
xen/riscv: add H extension to -march
H provides additional instructions and CSRs that control the new stage of
address translation and support hosting a guest OS in virtual S-mode
(VS-mode).
According to the Unprivileged Architecture (version 20240411) specification:
```
Table 74 summarizes the standardized extension names. The table also defines
the canonical order in which extension names must appear in the name string,
with top-to-bottom in table indicating first-to-last in the name string, e.g.,
RV32IMACV is legal, whereas RV32IMAVC is not.
```
According to Table 74, the h extension is placed last in the one-letter
extensions name part of the ISA string.
`h` is a standalone extension based on the patch [1] but it wasn't so
before.
The minimal supported GCC version to build Xen for RISC-V is 12.2.0. That
version still treats h as a prefix for hypervisor extensions, whose names
must be more than 1 letter long. A workaround (using `hh` as the H
extension name) is therefore implemented, as otherwise the following
compilation error will occur:
error: '-march=rv64gc_h_zbb_zihintpause': name of hypervisor extension
must be more than 1 letter
After GCC version 13.1.0, the commit [1] introducing H extension support
allows us to drop the workaround with `hh` as hypervisor extension name
and use only one h in -march.
Jan Beulich [Thu, 27 Mar 2025 11:22:39 +0000 (12:22 +0100)]
Arm/domctl: correct XEN_DOMCTL_vuart_op error return value
copy_to_guest() returns the number of bytes not copied; that's not what
the function should return to its caller though. Convert to returning
-EFAULT instead.
Fixes: 86039f2e8c20 ("xen/arm: vpl011: Add a new domctl API to initialize vpl011") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Michal Orzel <michal.orzel@amd.com>
Jan Beulich [Thu, 27 Mar 2025 11:22:06 +0000 (12:22 +0100)]
x86/pmstat: correct get_cpufreq_para()'s error return value
copy_to_guest() returns the number of bytes not copied; that's not what
the function should return to its caller though. Convert to returning
-EFAULT instead.
Fixes: 7542c4ff00f2 ("Add user PM control interface") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
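The pattern behind the two copy_to_guest() fixes above can be sketched with a stand-in copy routine. `mock_copy_to_guest()` is not the Xen primitive; it only imitates the "returns the number of bytes not copied" contract, and `op_buggy()` / `op_fixed()` are hypothetical callers.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the guest copy: copies at most `avail` bytes and returns
 * how many bytes were NOT copied (0 on full success).
 */
static unsigned long mock_copy_to_guest(void *dst, size_t avail,
                                        const void *src, size_t len)
{
    size_t n = len < avail ? len : avail;

    memcpy(dst, src, n);
    return len - n;
}

/* Buggy shape: leaks the leftover byte count to the caller as "rc". */
static int op_buggy(void *dst, size_t avail, const void *src, size_t len)
{
    return (int)mock_copy_to_guest(dst, avail, src, len);
}

/* Fixed shape: any shortfall is reported as -EFAULT. */
static int op_fixed(void *dst, size_t avail, const void *src, size_t len)
{
    return mock_copy_to_guest(dst, avail, src, len) ? -EFAULT : 0;
}
```

A positive leftover count is easy for a caller to misread as success or as some unrelated status, whereas a negative errno value is unambiguous.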
Jan Beulich [Thu, 27 Mar 2025 11:21:08 +0000 (12:21 +0100)]
x86/PVH: account for module command line length
As per observation in practice, initrd->cmdline_pa is not normally zero.
Hence so far we always appended at least one byte. That alone may
already render insufficient the "allocation" made by find_memory().
Things would be worse when there's actually a (perhaps long) command
line.
Skip setup when the command line is empty. Amend the "allocation" size
by padding and actual size of module command line. Along these lines
also skip initrd setup when the initrd is zero size.
Fixes: 0ecb8eb09f9f ("x86/pvh: pass module command line to dom0") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Roger Pau Monne [Fri, 14 Mar 2025 12:37:46 +0000 (13:37 +0100)]
automation/cirrus-ci: add smoke tests for the FreeBSD builds
Introduce a basic set of smoke tests using the XTF selftest image, and run
them on QEMU. Use the matrix keyword to create a different task for each
XTF flavor on each FreeBSD build.
Roger Pau Monne [Sat, 15 Mar 2025 08:35:12 +0000 (09:35 +0100)]
automation/cirrus-ci: use matrix keyword to generate per-version build tasks
Move the current logic to use the matrix keyword to generate a task for
each version of FreeBSD we want to build Xen on. The matrix keyword
however cannot be used in YAML aliases, so it needs to be explicitly used
inside of each task, which creates a bit of duplication. At least abstract
the FreeBSD minor version numbers to avoid repetition of image names.
Note that the full build uses matrix over an env variable instead of using
it directly in image_family. This is so that the alias can also be set
based on the FreeBSD version, in preparation for adding further tasks that
will depend on the full build having finished.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Tue, 25 Mar 2025 17:55:33 +0000 (17:55 +0000)]
x86/elf: Remove ASM_CALL_CONSTRAINT from elf_core_save_regs()
I was mistaken about when ASM_CALL_CONSTRAINT is applicable. It is not
applicable for plain pushes/pops, so remove it from the flags logic.
Clarify the description of ASM_CALL_CONSTRAINT to be explicit about unwinding
using framepointers.
Fixes: 0754534b8a38 ("x86/elf: Improve code generation in elf_core_save_regs()") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>