xenbits.xensource.com Git - xen.git/log
xen.git
4 weeks ago arm/mpu: Create the skeleton for MPU compilation
Luca Fancellu [Tue, 1 Apr 2025 08:58:58 +0000 (09:58 +0100)]
arm/mpu: Create the skeleton for MPU compilation

This commit introduces the skeleton for the MPU memory management
subsystem that allows the compilation on Arm64.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 weeks ago xen: introduce Kconfig ARCH_PAGING_MEMPOOL
Penny Zheng [Tue, 1 Apr 2025 08:58:57 +0000 (09:58 +0100)]
xen: introduce Kconfig ARCH_PAGING_MEMPOOL

An Arm MPU system doesn't need a paging memory pool, as the MPU memory
mapping table takes at most one 4KB page, which is enough to
manage the maximum of 255 MPU memory regions for all EL2 stage 1
translation and EL1 stage 2 translation.
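
The sizing claim above can be checked with a little arithmetic. The sketch below assumes each MPU region entry is a pair of 64-bit values (a base and a limit), i.e. 16 bytes; the struct and macro names are illustrative and are not taken from the Xen tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one MPU region described by a base and a limit value. */
struct mpu_region_sketch {
    uint64_t base;   /* assumed 64-bit base register value */
    uint64_t limit;  /* assumed 64-bit limit register value */
};

#define SKETCH_MAX_MPU_REGIONS 255u
#define SKETCH_PAGE_SIZE       4096u

/* Total bytes needed for a table holding the maximum number of regions. */
static inline size_t mpu_table_bytes(void)
{
    return SKETCH_MAX_MPU_REGIONS * sizeof(struct mpu_region_sketch);
}

/* Fails at compile time if the assumed table would not fit in one page. */
_Static_assert(SKETCH_MAX_MPU_REGIONS * sizeof(struct mpu_region_sketch)
               <= SKETCH_PAGE_SIZE, "region table must fit in one page");
```

Under that 16-byte assumption the full table needs 255 * 16 bytes, which leaves a little room to spare in a 4KB page.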

Introduce ARCH_PAGING_MEMPOOL Kconfig common symbol, selected for Arm
MMU systems and x86. Remove the stubs from RISC-V now that the common code
provides them and the functions are not going to be used.

Wrap the code inside 'construct_domU' that deals with p2m paging
allocation in a new function 'domain_p2m_set_allocation', protected
by ARCH_PAGING_MEMPOOL. This prevents polluting the former function
with #ifdefs and improves readability.

Introduce arch_{get,set}_paging_mempool_size stubs for architectures
with !ARCH_PAGING_MEMPOOL.

Remove 'struct paging_domain' from Arm 'struct arch_domain' when the
field is not required.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com> # arm
Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> # riscv
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 weeks ago arm/mpu: Implement stubs for ioremap_attr on MPU
Luca Fancellu [Tue, 1 Apr 2025 08:58:56 +0000 (09:58 +0100)]
arm/mpu: Implement stubs for ioremap_attr on MPU

Implement ioremap_attr() stub for MPU system; the
implementation of ioremap() is the same between MMU
and MPU system, and it relies on ioremap_attr(), so
move the definition from mmu/pt.c to arm/mm.c.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 weeks ago arm/mpu: Kconfig symbols for MPU build
Luca Fancellu [Tue, 1 Apr 2025 08:58:55 +0000 (09:58 +0100)]
arm/mpu: Kconfig symbols for MPU build

The MPU system requires static memory to work, select that
when building this memory management subsystem.

While there, provide a restriction for the ARM_EFI Kconfig
parameter to be built only when !MPU: the EFI stub is not
used as there is no implementation of UEFI services for
armv8-r.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 weeks ago xen/arm: Introduce frame_table and virt_to_page
Luca Fancellu [Tue, 1 Apr 2025 08:58:54 +0000 (09:58 +0100)]
xen/arm: Introduce frame_table and virt_to_page

Introduce frame_table in order to provide the implementation of
virt_to_page for MPU system, move the MMU variant in mmu/mm.h.

Introduce FRAMETABLE_NR that is required for 'pdx_group_valid' in
pdx.c, but leave the initialisation of the frame table to a later
stage.
Define FRAMETABLE_SIZE for MPU to support up to 1TB of RAM at this
stage, as the only current implementation of Armv8-R AArch64, which
is the Cortex-R82, can support 1TB or 256TB (R82 TRM r3p1
ID_AA64MMFR0_EL1.PARange).

Take the occasion to sort alphabetically the headers following
the Xen code style and add the emacs footer in mpu/mm.c.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 weeks ago xen/arm: Implement virt/maddr conversion in MPU system
Penny Zheng [Tue, 1 Apr 2025 08:58:53 +0000 (09:58 +0100)]
xen/arm: Implement virt/maddr conversion in MPU system

virt_to_maddr and maddr_to_virt are used widely in Xen code. So
even though there is no VMSA in MPU systems, we keep the interface in MPU
to avoid changing the existing common code.

In order to do that, move the virt_to_maddr() and maddr_to_virt()
definitions to mmu/mm.h, and move the include of the memory management
subsystems (MMU/MPU) to a different place because the mentioned
helpers need visibility of some macros in asm/mm.h.

Finally implement virt_to_maddr() and maddr_to_virt() for MPU systems
under mpu/mm.h. The MPU version of virt/maddr conversion is simple since
VA==PA.
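
As a minimal sketch of what an identity conversion looks like when VA==PA: the function names and the paddr_t stand-in below are illustrative only, and the real helpers in mpu/mm.h may carry extra checks.

```c
#include <stdint.h>

typedef uint64_t paddr_t; /* stand-in for a physical address type */

/* With no address translation, a virtual address is its own machine address. */
static inline paddr_t sketch_virt_to_maddr(const void *va)
{
    return (paddr_t)(uintptr_t)va;
}

/* The reverse direction is the same cast in the other order. */
static inline void *sketch_maddr_to_virt(paddr_t ma)
{
    return (void *)(uintptr_t)ma;
}
```

Keeping both helpers, even as plain casts, lets common code call them unchanged on MMU and MPU builds.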

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 weeks ago arm/mpu: Add HYPERVISOR_VIRT_START and avoid a check in xen.lds.S
Luca Fancellu [Tue, 1 Apr 2025 08:58:52 +0000 (09:58 +0100)]
arm/mpu: Add HYPERVISOR_VIRT_START and avoid a check in xen.lds.S

The define HYPERVISOR_VIRT_START is required by the common code.
Even if an MPU system doesn't use virtual memory, define it in
mpu/layout.h in order to reuse existing code.

Disable a check in the linker script for arm for !MMU systems.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 weeks ago xen/arm: mpu: Use new-style annotations
Michal Orzel [Wed, 2 Apr 2025 09:02:29 +0000 (11:02 +0200)]
xen/arm: mpu: Use new-style annotations

When purging old-style annotations, MPU code was left unmodified. Fix
it.

Fixes: 221c66f4f2a4 ("Arm: purge ENTRY(), ENDPROC(), and ALIGN")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago x86emul: make test harness build again as 32-bit binary
Jan Beulich [Tue, 1 Apr 2025 10:48:23 +0000 (12:48 +0200)]
x86emul: make test harness build again as 32-bit binary

Adding Q suffixes to FXSAVE/FXRSTOR did break the 32-bit build. Don't go
back though, as the hand-coded 0x48 there weren't quite right either for
the 32-bit case (they might well cause confusion when looking at the
disassembly). Instead arrange for the compiler to DCE respective asm()-s,
by short-circuiting REX_* to zero.

Fixes: 5a33ea2800c1 ("x86emul: drop open-coding of REX.W prefixes")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks ago x86: annotate page tables also with type
Jan Beulich [Tue, 1 Apr 2025 10:47:44 +0000 (12:47 +0200)]
x86: annotate page tables also with type

Use infrastructure from xen/linkage.h instead of the custom legacy
macros that we're in the process of phasing out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks ago xen/riscv: introduce intc_preinit()
Oleksii Kurochko [Tue, 1 Apr 2025 10:46:47 +0000 (12:46 +0200)]
xen/riscv: introduce intc_preinit()

Currently, only the device tree method is available to locate and perform
pre-initialization steps for the interrupt controller (at the moment, only
one interrupt controller is going to be supported). When `acpi_disabled`
is true, the system will scan for a node with the "interrupt-controller"
property and then call `device_init()` to check whether it is an expected
interrupt controller and, if so, save this node for further usage.

If `acpi_disabled` is false, the system will panic, as ACPI support is not
yet implemented for RISC-V.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago xen/riscv: implement basic aplic_preinit()
Oleksii Kurochko [Tue, 1 Apr 2025 10:46:07 +0000 (12:46 +0200)]
xen/riscv: implement basic aplic_preinit()

Introduce pre-initialization support for the RISC-V Advanced Platform-Level
Interrupt Controller (APLIC) in Xen:
 - Implement the APLIC pre-initialization function (`aplic_preinit()`),
   ensuring that only one APLIC instance is supported in S mode.
 - Initialize APLIC's corresponding DT node.
 - Declare the DT device match table for APLIC.
 - Set `aplic_info.hw_version` during its declaration.
 - Declare an APLIC device.

Since Microchip originally developed aplic.c [1], an internal discussion
with them led to the decision to use the MIT license instead of the default
GPL-2.0-only.

[1] https://gitlab.com/xen-project/people/olkur/xen/-/commit/7cfb4bd4748ca268142497ac5c327d2766fb342d

Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago automation/RISC-V: select APLIC and IMSIC to handle both wired interrupts and MSIs
Oleksii Kurochko [Tue, 1 Apr 2025 10:45:26 +0000 (12:45 +0200)]
automation/RISC-V: select APLIC and IMSIC to handle both wired interrupts and MSIs

By default, the `aia` option is set to "none", which selects the SiFive PLIC for
handling wired interrupts. However, since PLIC is now considered obsolete and
will not be supported by Xen, APLIC and IMSIC are selected instead to manage
both wired interrupts and MSIs.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks ago xen/riscv: introduce preinit_xen_time()
Oleksii Kurochko [Tue, 1 Apr 2025 10:44:51 +0000 (12:44 +0200)]
xen/riscv: introduce preinit_xen_time()

preinit_xen_time() does two things:
1. Parse the timebase-frequency property of the /cpus node to initialize the
   cpu_khz variable.
2. Initialize boot_clock_cycles with the current time counter value to
   have a starting point for Xen.

timebase-frequency is read as a uint32_t because it is unlikely that the
timer will run at more than 4 GHz. If timebase-frequency exceeds 4 GHz,
a panic() is triggered, since dt_property_read_u32() will return 0 if
the size of the timebase-frequency property is greater than the size of
the output variable.
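
The size check described above can be sketched as follows. This is a standalone model, not the Xen code: sketch_read_u32() and sketch_cpu_khz() are hypothetical names, the property is passed as a raw big-endian byte buffer, and the failure case returns 0 where the real code panics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: succeeds only if the property is exactly 4 bytes. */
static bool sketch_read_u32(const uint8_t *prop, size_t len, uint32_t *out)
{
    if ( len != sizeof(uint32_t) )
        return false;   /* e.g. an 8-byte value does not fit in a u32 */

    /* Device tree property cells are stored big-endian. */
    *out = ((uint32_t)prop[0] << 24) | ((uint32_t)prop[1] << 16) |
           ((uint32_t)prop[2] << 8) | (uint32_t)prop[3];
    return true;
}

/* Returns the frequency in kHz, or 0 when the property cannot be used. */
static uint32_t sketch_cpu_khz(const uint8_t *prop, size_t len)
{
    uint32_t hz;

    if ( !sketch_read_u32(prop, len, &hz) )
        return 0;       /* the commit describes a panic() at this point */

    return hz / 1000;
}
```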

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago Arm: purge ENTRY(), ENDPROC(), and ALIGN
Jan Beulich [Tue, 1 Apr 2025 10:43:35 +0000 (12:43 +0200)]
Arm: purge ENTRY(), ENDPROC(), and ALIGN

They're no longer used. This also makes it unnecessary to #undef two of
them in the linker script.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com> # arm
5 weeks ago Arm32: use new-style entry annotations in head.S
Jan Beulich [Tue, 1 Apr 2025 10:43:06 +0000 (12:43 +0200)]
Arm32: use new-style entry annotations in head.S

Locally override SYM_PUSH_SECTION() to retain the intended section
association.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com> # arm
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago Arm32: use new-style entry annotations for MMU code
Jan Beulich [Tue, 1 Apr 2025 10:42:39 +0000 (12:42 +0200)]
Arm32: use new-style entry annotations for MMU code

Locally override SYM_PUSH_SECTION() to retain the intended section
association.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com> # arm
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago CI: Add yet another HW runner
Marek Marczykowski-Górecki [Fri, 14 Mar 2025 03:06:26 +0000 (04:06 +0100)]
CI: Add yet another HW runner

This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7.

This one has working S3, so add a test for it here.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks ago x86emul: Fix blowfish build in 64bit-clean environments
Andrew Cooper [Fri, 28 Mar 2025 17:18:51 +0000 (17:18 +0000)]
x86emul: Fix blowfish build in 64bit-clean environments

In a 64bit-clean environment, blowfish fails:

  make[6]: Leaving directory
  '/builddir/build/BUILD/xen-4.19.1/tools/tests/x86_emulator'
  In file included from /usr/include/features.h:535,
                   from /usr/include/bits/libc-header-start.h:33,
                   from /usr/include/stdint.h:26,
                   from
  /usr/lib/gcc/x86_64-xenserver-linux/12/include/stdint.h:9,
                   from blowfish.c:18:
  /usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-32.h: No such
  file or directory
      7 | # include <gnu/stubs-32.h>
        |           ^~~~~~~~~~~~~~~~
  compilation terminated.
  make[6]: *** [testcase.mk:15: blowfish.bin] Error 1

because of the lack of glibc-i386-devel or equivalent.  It's non-fatal, but
reduces the content in test_x86_emulator, which we do care about running.

Instead, convert all emulator testcases to being freestanding builds, reusing
the tools/firmware/include/ headers.

This in turn requires making firmware's stdint.h compatible with 64bit builds.
We now have compiler types for every standard type we use.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago xen/types: Drop compatibility for GCC < 4.4
Andrew Cooper [Fri, 28 Mar 2025 16:49:21 +0000 (16:49 +0000)]
xen/types: Drop compatibility for GCC < 4.4

We now have compiler types for every standard type we use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago xen/arm/efi: merge neighboring banks
Stefano Stabellini [Fri, 28 Mar 2025 21:33:31 +0000 (14:33 -0700)]
xen/arm/efi: merge neighboring banks

When booting from U-Boot bootefi, there can be a high number of
neighboring RAM banks. See for example:

(XEN) RAM: 0000000000000000 - 0000000000bfffff
(XEN) RAM: 0000000000c00000 - 0000000000c00fff
(XEN) RAM: 0000000000c01000 - 0000000000dfffff
(XEN) RAM: 0000000000e00000 - 000000000279dfff
(XEN) RAM: 000000000279e000 - 00000000029fffff
(XEN) RAM: 0000000002a00000 - 0000000008379fff
(XEN) RAM: 000000000837a000 - 00000000083fffff
(XEN) RAM: 0000000008400000 - 0000000008518fff
(XEN) RAM: 0000000008519000 - 00000000085fffff
(XEN) RAM: 0000000008600000 - 0000000008613fff
(XEN) RAM: 0000000008614000 - 00000000097fffff
(XEN) RAM: 0000000009800000 - 00000000098a7fff
(XEN) RAM: 00000000098a8000 - 0000000009dfffff
(XEN) RAM: 0000000009e00000 - 0000000009ea7fff
(XEN) RAM: 0000000009ea8000 - 000000001fffffff
(XEN) RAM: 0000000020000000 - 000000002007ffff
(XEN) RAM: 0000000020080000 - 0000000077b17fff
(XEN) RAM: 0000000077b19000 - 0000000077b2bfff
(XEN) RAM: 0000000077b2c000 - 0000000077c8dfff
(XEN) RAM: 0000000077c8e000 - 0000000077c91fff
(XEN) RAM: 0000000077ca7000 - 0000000077caafff
(XEN) RAM: 0000000077cac000 - 0000000077caefff
(XEN) RAM: 0000000077cd0000 - 0000000077cd2fff
(XEN) RAM: 0000000077cd4000 - 0000000077cd7fff
(XEN) RAM: 0000000077cd8000 - 000000007bd07fff
(XEN) RAM: 000000007bd09000 - 000000007fd5ffff
(XEN) RAM: 000000007fd70000 - 000000007fefffff
(XEN) RAM: 0000000800000000 - 000000087fffffff

Xen does not currently support boot modules that span multiple banks: at
least one of the regions gets freed twice. The first time from
setup_mm->populate_boot_allocator, then again from
discard_initial_modules->fw_unreserved_regions. With a high number of
banks, it can be difficult to arrange the boot modules in a way that
avoids spanning across multiple banks.

This small patch merges neighboring regions, to make dealing with them
more efficient, and to make it easier to load boot modules.
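
The merging idea can be sketched as a pass over a sorted bank array. This is an illustration under stated assumptions, not the patch itself: struct bank_sketch and merge_neighboring_banks() are hypothetical names, and the banks are assumed to be sorted by start address and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bank descriptor; Xen's real structures differ. */
struct bank_sketch {
    uint64_t start;
    uint64_t size;
};

/*
 * Merge banks that touch end-to-start. Assumes the array is sorted by
 * start address and that banks do not overlap. Returns the new bank count.
 */
static size_t merge_neighboring_banks(struct bank_sketch *banks, size_t nr)
{
    size_t out = 0;

    if ( nr == 0 )
        return 0;

    for ( size_t i = 1; i < nr; i++ )
    {
        if ( banks[out].start + banks[out].size == banks[i].start )
            banks[out].size += banks[i].size;   /* contiguous: extend */
        else
            banks[++out] = banks[i];            /* gap: start a new bank */
    }

    return out + 1;
}
```

Applied to the first three banks in the listing above, this yields a single bank covering 0x0 to 0xdfffff, while a bank that starts after a hole (such as the one at 0x77b19000) stays separate.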

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago xen/arm32: Allow ARM_PA_BITS_40 only if !MPU
Michal Orzel [Sun, 30 Mar 2025 18:03:07 +0000 (20:03 +0200)]
xen/arm32: Allow ARM_PA_BITS_40 only if !MPU

Armv8-R AArch32 does not support LPAE, because PMSAv8-32
supports 32-bit physical addresses only.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago xen/arm32: Move MM specific registers to enable_mmu
Ayan Kumar Halder [Sun, 30 Mar 2025 18:03:04 +0000 (20:03 +0200)]
xen/arm32: Move MM specific registers to enable_mmu

All the memory management specific registers are initialized in enable_mmu.

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
5 weeks ago xen: gcov: add support for gcc 14
Volodymyr Babchuk [Mon, 31 Mar 2025 07:22:11 +0000 (09:22 +0200)]
xen: gcov: add support for gcc 14

gcc 14 (with patch "Add condition coverage (MC/DC)") introduced a 9th
gcov counter. This version can also call a new merge function,
__gcov_merge_ior(), so we need a new stub for it.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago xen/percpu: don't initialize percpu on resume
Mykyta Poturai [Mon, 31 Mar 2025 07:21:50 +0000 (09:21 +0200)]
xen/percpu: don't initialize percpu on resume

Invocation of the CPU_UP_PREPARE notification
on ARM64 during resume causes a crash:

(XEN) [  315.807606] Error bringing CPU1 up: -16
(XEN) [  315.811926] Xen BUG at common/cpu.c:258
[...]
(XEN) [  316.142765] Xen call trace:
(XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
(XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
(XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
(XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
(XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
(XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
(XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
(XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
(XEN) [  316.202383]    [<0000000000000008>] 0000000000000008

Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
INVALID_PERCPU_AREA depends solely on the system state.

If the system is suspended, this area is not freed, and during resume, an error
occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
set and park_offline_cpus remains 0:

    if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
        return park_offline_cpus ? 0 : -EBUSY;

The same crash can occur on x86 if park_offline_cpus is set
to 0 during Xen resume.
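
A small standalone model of the failure and of one plausible shape for the fix is sketched below. All names are hypothetical; the only part taken from the message is the quoted check, and skipping initialisation on resume is an assumption based on the subject line.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_INVALID_AREA (-1L)   /* stand-in for INVALID_PERCPU_AREA */

enum sketch_state { SKETCH_ACTIVE, SKETCH_RESUME };

/* Mirrors the quoted check: a populated slot is an error unless parking. */
static int sketch_init_area(long *offset, bool park_offline_cpus)
{
    if ( *offset != SKETCH_INVALID_AREA )
        return park_offline_cpus ? 0 : -EBUSY;

    *offset = 0x1000;   /* pretend an area was allocated */
    return 0;
}

/* CPU_UP_PREPARE-style hook that leaves the preserved area alone on resume. */
static int sketch_cpu_up_prepare(long *offset, bool park, enum sketch_state st)
{
    if ( st == SKETCH_RESUME )
        return 0;       /* area survived suspend; nothing to initialise */

    return sketch_init_area(offset, park);
}
```

In this model, calling the initialiser directly on a slot that survived suspend returns -EBUSY, which corresponds to the "Error bringing CPU1 up" line in the log above.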

Fixes: f75780d26b2f ("xen: move per-cpu area management into common code")
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago x86/P2M: synchronize fast and slow paths of p2m_get_page_from_gfn()
Jan Beulich [Mon, 31 Mar 2025 07:21:12 +0000 (09:21 +0200)]
x86/P2M: synchronize fast and slow paths of p2m_get_page_from_gfn()

Handling of both grants and foreign pages was different between the two
paths.

While permitting access to grants would be desirable, doing so would
require more involved handling; undo that for the time being. In
particular the page reference obtained would prevent the owning domain
from changing e.g. the page's type (after the grantee has released the
last reference of the grant). Instead perhaps another reference on the
grant would need obtaining. Which in turn would require determining
which grant that was.

Foreign pages in any event need permitting on both paths.

Introduce a helper function to be used on both paths, such that
respective checking differs in just the extra "to be unshared" condition
on the fast path.

While there adjust the sanity check for foreign pages: Don't leak the
reference on release builds when on a debug build the assertion would
have triggered. (Thanks to Roger for the suggestion.)

Fixes: 80ea7af17269 ("x86/mm: Introduce get_page_from_gfn()")
Fixes: 50fe6e737059 ("pvh dom0: add and remove foreign pages")
Fixes: cbbca7be4aaa ("x86/p2m: make p2m_get_page_from_gfn() handle grant case correctly")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 weeks ago trace: convert init_trace_bufs() to constructor
Jan Beulich [Mon, 31 Mar 2025 07:20:25 +0000 (09:20 +0200)]
trace: convert init_trace_bufs() to constructor

There's no need for each arch to invoke it directly, and there's no need
for having a stub either. With the present placement of the calls to
init_constructors() it can easily be a constructor itself.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago Arm32: use new-style entry annotations for entry code
Jan Beulich [Mon, 31 Mar 2025 07:18:19 +0000 (09:18 +0200)]
Arm32: use new-style entry annotations for entry code

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com> # arm
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago Arm32: use new-style entry annotations for library code
Jan Beulich [Mon, 31 Mar 2025 07:17:46 +0000 (09:17 +0200)]
Arm32: use new-style entry annotations for library code

No functional change, albeit all globals now become hidden, and aliasing
symbols (__aeabi_{u,}idiv) lose their function-ness and size.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com> # arm
Acked-by: Julien Grall <jgrall@amazon.com>
5 weeks ago CI: Change pipeline name for scheduled pipeline
Anthony PERARD [Thu, 27 Mar 2025 10:34:01 +0000 (10:34 +0000)]
CI: Change pipeline name for scheduled pipeline

This description is already displayed on the web UI of the list of
pipelines, but using it as "name" will make it available in webhooks as
well, where it can be used by a bot.

This doesn't change the behavior for other pipeline types, where the
variable isn't set.

Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks ago tools/arm: Fix nr_spis handling v2
Michal Orzel [Tue, 25 Mar 2025 11:00:29 +0000 (12:00 +0100)]
tools/arm: Fix nr_spis handling v2

We are missing a way to detect whether a user provided a value for
nr_spis equal to 0 or did not provide any value (default is also 0) which
can cause issues when calculated nr_spis is > 0 and the value from domain
config is 0. Fix it by setting default value for nr_spis to newly added
LIBXL_NR_SPIS_DEFAULT i.e. UINT32_MAX (max supported nr of SPIs is 960
anyway).
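
The sentinel approach can be sketched as below. This is a simplified model rather than the libxl code: SKETCH_NR_SPIS_DEFAULT and resolve_nr_spis() are hypothetical names, and rejecting an explicit value smaller than the calculated one is an assumption about the intended policy.

```c
#include <stdint.h>

#define SKETCH_NR_SPIS_DEFAULT UINT32_MAX  /* sentinel: "user gave no value" */

/*
 * Resolve the effective SPI count. Returns 0 on success, -1 if the user
 * explicitly asked for fewer SPIs than the configuration requires.
 */
static int resolve_nr_spis(uint32_t cfg, uint32_t calculated, uint32_t *out)
{
    if ( cfg == SKETCH_NR_SPIS_DEFAULT )
    {
        *out = calculated;  /* nothing requested: use the computed value */
        return 0;
    }

    if ( cfg < calculated )
        return -1;          /* explicit value (including 0) is too small */

    *out = cfg;
    return 0;
}
```

Because the sentinel lies outside the valid range, an explicit 0 and an omitted value are no longer indistinguishable.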

Fixes: 55d62b8d4636 ("tools/arm: Reject configuration with incorrect nr_spis value")
Reported-by: Luca Fancellu <luca.fancellu@arm.com>
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
5 weeks ago kconfig/randconfig: Remove non-existing config
Anthony PERARD [Wed, 26 Mar 2025 14:29:04 +0000 (14:29 +0000)]
kconfig/randconfig: Remove non-existing config

CONFIG_GCOV_FORMAT_AUTODETECT has been removed in 767e6c5fd55b.

Fixes: 767e6c5fd55b ("kconfig/gcov: remove gcc version choice from kconfig")
Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks ago CHANGELOG: Minimum toolchain requirements for x86 and ARM
Andrew Cooper [Thu, 20 Mar 2025 14:10:58 +0000 (14:10 +0000)]
CHANGELOG: Minimum toolchain requirements for x86 and ARM

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
5 weeks ago Xen: Update compiler baseline checks
Andrew Cooper [Thu, 20 Mar 2025 14:05:58 +0000 (14:05 +0000)]
Xen: Update compiler baseline checks

We have checks in both xen/compiler.h, and Config.mk.  Both are incomplete.

The check in Config.mk sees $(CC) in system and cross-compiler form, so cannot
express anything more than the global baseline.  Change it to simply 5.1.

In xen/compiler.h, rewrite the expression for clarity/brevity.

Include a GCC 12.2 check for RISCV, and include a Clang 11 baseline check.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago ARM/vgic: Use for_each_set_bit() in vgic-mmio*
Andrew Cooper [Fri, 30 Aug 2024 13:25:28 +0000 (14:25 +0100)]
ARM/vgic: Use for_each_set_bit() in vgic-mmio*

These are all loops over a scalar value, and don't need to call general bitop
helpers behind the scenes.

Clamp data to the width of the access in dispatch_mmio_write(), rather than
doing so in every handler.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks ago ARM/vgic: Use for_each_set_bit() in vgic_mmio_write_sgir()
Andrew Cooper [Fri, 30 Aug 2024 13:25:28 +0000 (14:25 +0100)]
ARM/vgic: Use for_each_set_bit() in vgic_mmio_write_sgir()

The bitmap_for_each() expression only inspects the bottom 8 bits of targets.
Change its type to uint8_t and use for_each_set_bit() which is more efficient
over scalars.

GICD_SGI_TARGET_LIST_MASK is 2 bits wide.  Two cases discard the prior
calculation of targets, and one case exits early.

Therefore, move the GICD_SGI_TARGET_MASK calculation into the only case which
wants it, and use MASK_EXTR() to simplify the expression.

No functional change.
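
To illustrate the two ideas above (extracting a field with a mask, then walking the set bits of a small scalar), here is a standalone sketch. The macro is modeled on the MASK_EXTR() idiom but named differently, the loop is an open-coded stand-in for for_each_set_bit(), and __builtin_ctz() is a GCC/Clang builtin rather than a Xen helper.

```c
#include <stdint.h>

/* Extract the field selected by mask m from v, shifted down to bit 0. */
#define SKETCH_MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

/* CPU target list in bits 23:16 of the written value. */
#define SKETCH_SGI_TARGET_MASK (0xffu << 16)

/* Visit each set bit of an 8-bit target mask; returns how many were seen. */
static unsigned int visit_targets(uint8_t targets, unsigned int *ids)
{
    unsigned int n = 0;

    while ( targets )
    {
        ids[n++] = (unsigned int)__builtin_ctz(targets); /* lowest set bit */
        targets &= targets - 1;                          /* clear that bit */
    }

    return n;
}
```

Working on a uint8_t directly means the loop runs once per set bit and never touches a bitmap in memory.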

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
5 weeks ago ARM/vgic: Fix out-of-bounds accesses in vgic_mmio_write_sgir()
Andrew Cooper [Wed, 26 Mar 2025 15:26:56 +0000 (15:26 +0000)]
ARM/vgic: Fix out-of-bounds accesses in vgic_mmio_write_sgir()

The switch() statement is over bits 24:25 (unshifted) of the guest provided
value.  This makes case 0x3: dead, and not an implementation of the 4th
possible state.

A guest which writes (0x3 << 24) | (0xff << 16) to this register will skip the
early exit, then enter bitmap_for_each() with targets not bound by nr_vcpus.

If the guest has fewer than 8 vCPUs, bitmap_for_each() will read off the end
of d->vcpu[] and use the resulting vcpu pointer to ultimately derive irq, and
perform out-of-bounds writes.

Fix this by changing case 0x3 to default.
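
The dead-case problem can be reproduced in isolation. The sketch below is a simplified model with hypothetical names; it only classifies a written value and does not touch any vCPU state.

```c
#include <stdint.h>

#define SKETCH_TARGET_LIST_MASK (0x3u << 24)  /* filter field, bits 25:24 */

enum sgi_outcome { SGI_HANDLED, SGI_REJECTED, SGI_UNCHECKED };

/* Buggy shape: the switch is over the unshifted field, so 0x3 never matches. */
static enum sgi_outcome classify_buggy(uint32_t val)
{
    switch ( val & SKETCH_TARGET_LIST_MASK )
    {
    case 0x0u << 24:   /* use the target list */
    case 0x1u << 24:   /* all but self */
    case 0x2u << 24:   /* self only */
        return SGI_HANDLED;
    case 0x3:          /* dead: the masked value has no bits below bit 24 */
        return SGI_REJECTED;
    }

    return SGI_UNCHECKED;  /* filter value 3 reaches the loop unvalidated */
}

/* Fixed shape: anything not explicitly listed takes the default path. */
static enum sgi_outcome classify_fixed(uint32_t val)
{
    switch ( val & SKETCH_TARGET_LIST_MASK )
    {
    case 0x0u << 24:
    case 0x1u << 24:
    case 0x2u << 24:
        return SGI_HANDLED;
    default:           /* reserved encoding: exit early */
        return SGI_REJECTED;
    }
}
```

With the guest value from the description, (0x3 << 24) | (0xff << 16), the buggy variant skips the intended early exit while the fixed one rejects the write.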

Fixes: 08c688ca6422 ("ARM: new VGIC: Add SGIR register handler")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks ago xen/riscv: add H extension to -march
Oleksii Kurochko [Thu, 27 Mar 2025 11:23:10 +0000 (12:23 +0100)]
xen/riscv: add H extension to -march

H provides additional instructions and CSRs that control the new stage of
address translation and support hosting a guest OS in virtual S-mode
(VS-mode).

According to the Unprivileged Architecture (version 20240411) specification:
```
Table 74 summarizes the standardized extension names. The table also defines
the canonical order in which extension names must appear in the name string,
with top-to-bottom in table indicating first-to-last in the name string, e.g.,
RV32IMACV is legal, whereas RV32IMAVC is not.
```
According to Table 74, the h extension is placed last in the one-letter
extensions name part of the ISA string.

`h` became a standalone extension with the patch [1]; it was not one
before.
The minimal supported GCC version to build Xen for RISC-V is 12.2.0.
That version still treats h as a prefix for a hypervisor extension name
and requires that name to be more than 1 letter long. A workaround
(using `hh` as the H extension name) is therefore implemented, as
otherwise the following compilation error will occur:
 error: '-march=rv64gc_h_zbb_zihintpause': name of hypervisor extension
        must be more than 1 letter

After GCC version 13.1.0, the commit [1] introducing H extension support
allows us to drop the workaround with `hh` as hypervisor extension name
and use only one h in -march.

[1] https://github.com/gcc-mirror/gcc/commit/0cd11d301013af50a3fae0694c909952e94e20d5#diff-d6f7db0db31bfb339b01bec450f1b905381eb4730cc5ab2b2794971e34647d64R148

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 weeks ago Arm/domctl: correct XEN_DOMCTL_vuart_op error return value
Jan Beulich [Thu, 27 Mar 2025 11:22:39 +0000 (12:22 +0100)]
Arm/domctl: correct XEN_DOMCTL_vuart_op error return value

copy_to_guest() returns the number of bytes not copied; that's not what
the function should return to its caller though. Convert to returning
-EFAULT instead.
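
The conversion pattern looks roughly like this. The copy primitive below is a stand-in written for the example (a NULL destination simulates a fault); only the "bytes not copied" return convention is taken from the message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for a guest copy primitive: returns the number of bytes NOT
 * copied (0 on full success). A NULL destination simulates a fault.
 */
static size_t sketch_copy_to_guest(void *dst, const void *src, size_t len)
{
    if ( dst == NULL )
        return len;

    memcpy(dst, src, len);
    return 0;
}

/* Caller-facing helper: report 0 or a negative errno, never a byte count. */
static int sketch_copy_out(void *dst, const void *src, size_t len)
{
    return sketch_copy_to_guest(dst, src, len) ? -EFAULT : 0;
}
```

Returning the raw count would hand the caller a positive number that looks like neither success nor a conventional error code.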

Fixes: 86039f2e8c20 ("xen/arm: vpl011: Add a new domctl API to initialize vpl011")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
5 weeks ago x86/pmstat: correct get_cpufreq_para()'s error return value
Jan Beulich [Thu, 27 Mar 2025 11:22:06 +0000 (12:22 +0100)]
x86/pmstat: correct get_cpufreq_para()'s error return value

copy_to_guest() returns the number of bytes not copied; that's not what
the function should return to its caller though. Convert to returning
-EFAULT instead.

Fixes: 7542c4ff00f2 ("Add user PM control interface")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks ago x86/PVH: account for module command line length
Jan Beulich [Thu, 27 Mar 2025 11:21:08 +0000 (12:21 +0100)]
x86/PVH: account for module command line length

As per observation in practice, initrd->cmdline_pa is not normally zero.
Hence so far we always appended at least one byte. That alone may
already render insufficient the "allocation" made by find_memory().
Things would be worse when there's actually a (perhaps long) command
line.

Skip setup when the command line is empty. Amend the "allocation" size
by padding and actual size of module command line. Along these lines
also skip initrd setup when the initrd is zero size.

Fixes: 0ecb8eb09f9f ("x86/pvh: pass module command line to dom0")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
5 weeks ago automation/cirrus-ci: add smoke tests for the FreeBSD builds
Roger Pau Monne [Fri, 14 Mar 2025 12:37:46 +0000 (13:37 +0100)]
automation/cirrus-ci: add smoke tests for the FreeBSD builds

Introduce a basic set of smoke tests using the XTF selftest image, and run
them on QEMU.  Use the matrix keyword to create a different task for each
XTF flavor on each FreeBSD build.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks ago automation/cirrus-ci: store XTF and Xen build artifacts
Roger Pau Monne [Fri, 14 Mar 2025 12:01:36 +0000 (13:01 +0100)]
automation/cirrus-ci: store XTF and Xen build artifacts

In preparation for adding some smoke tests that will consume those outputs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks ago automation/cirrus-ci: build XTF
Roger Pau Monne [Fri, 14 Mar 2025 11:16:19 +0000 (12:16 +0100)]
automation/cirrus-ci: build XTF

In preparation for using the XTF selftests to smoke test the FreeBSD based
Xen builds.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks ago automation/cirrus-ci: use matrix keyword to generate per-version build tasks
Roger Pau Monne [Sat, 15 Mar 2025 08:35:12 +0000 (09:35 +0100)]
automation/cirrus-ci: use matrix keyword to generate per-version build tasks

Move the current logic to use the matrix keyword to generate a task for
each version of FreeBSD we want to build Xen on.  The matrix keyword
however cannot be used in YAML aliases, so it needs to be explicitly used
inside of each task, which creates a bit of duplication.  At least abstract
the FreeBSD minor version numbers to avoid repetition of image names.

Note that the full build uses matrix over an env variable instead of using
it directly in image_family.  This is so that the alias can also be set
based on the FreeBSD version, in preparation for adding further tasks that
will depend on the full build having finished.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 weeks agox86/bitops: Account for POPCNT errata on earlier Intel CPUs
Andrew Cooper [Tue, 25 Mar 2025 18:02:03 +0000 (18:02 +0000)]
x86/bitops: Account for POPCNT errata on earlier Intel CPUs

Manually break the false dependency for the benefit of cases such as
bitmap_weight() which is a reasonable hotpath.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 weeks agox86/elf: Remove ASM_CALL_CONSTRAINT from elf_core_save_regs()
Andrew Cooper [Tue, 25 Mar 2025 17:55:33 +0000 (17:55 +0000)]
x86/elf: Remove ASM_CALL_CONSTRAINT from elf_core_save_regs()

I was mistaken about when ASM_CALL_CONSTRAINT is applicable.  It is not
applicable for plain pushes/pops, so remove it from the flags logic.

Clarify the description of ASM_CALL_CONSTRAINT to be explicit about unwinding
using framepointers.

Fixes: 0754534b8a38 ("x86/elf: Improve code generation in elf_core_save_regs()")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks agox86/emul: Emulate %cr8 accesses
Andrew Cooper [Mon, 17 Mar 2025 17:48:51 +0000 (17:48 +0000)]
x86/emul: Emulate %cr8 accesses

Petr reports:

  (XEN) MMIO emulation failed (1): d12v1 64bit @ 0010:fffff8057ba7dfbf -> 45 0f 20 c2 ...

during introspection.

This is MOV %cr8, which is wired up for hvm_mov_{to,from}_cr(); the VMExit
fastpaths, but not for the full emulation slowpaths.

Xen's handling of %cr8 turns out to be quite wrong.  At a minimum, we need
storage for %cr8 separate to APIC_TPR, and to alter intercepts based on
whether the vLAPIC is enabled or not.  But that's more work than there is time
for in the short term, so make a stopgap fix.

Extend hvmemul_{read,write}_cr() with %cr8 cases.  Unlike hvm_mov_to_cr(),
hardware hasn't filtered out invalid values (#GP checks are ahead of
intercepts), so introduce X86_CR8_VALID_MASK.
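
As a rough illustration of the validity check described above, here is a
minimal sketch.  The mask value 0xf is an assumption based on only CR8[3:0]
(the task priority) being architecturally defined, and cr8_write_ok() is a
hypothetical helper, not a function from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: only CR8[3:0] (the task priority) is defined. */
#define X86_CR8_VALID_MASK 0xfULL

/*
 * Hypothetical helper: returns true if a guest-supplied value may be
 * written to %cr8, false if the emulator should raise #GP instead.
 */
static bool cr8_write_ok(uint64_t val)
{
    /* Any bit set outside the valid mask makes the write invalid. */
    return (val & ~X86_CR8_VALID_MASK) == 0;
}
```

On the emulation slowpath nothing has filtered the value beforehand, so a
check of this kind has to run before the value is stored.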

Reported-by: Petr Beneš <w1benny@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks agox86/emul: Rearrange the logic in hvmemul_{read,write}_cr()
Andrew Cooper [Mon, 24 Mar 2025 21:44:30 +0000 (21:44 +0000)]
x86/emul: Rearrange the logic in hvmemul_{read,write}_cr()

In hvmemul_read_cr(), make the TRACE()/X86EMUL_OKAY path common in preparation
for adding a %cr8 case.  Use a local 'val' variable instead of always
operating on a dereferenced pointer.

In both, calculate curr once.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 weeks agoCI: Update build tests based on new minimum toolchain requirements
Andrew Cooper [Thu, 20 Mar 2025 14:13:56 +0000 (14:13 +0000)]
CI: Update build tests based on new minimum toolchain requirements

Drop CentOS 7 entirely.  It's way too old now.

Ubuntu 22.04 is the oldest Ubuntu with a suitable version of Clang, so swap
the 16.04 clang builds for 22.04.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
5 weeks agox86/PVH: expose OEMx ACPI tables to Dom0
Jan Beulich [Wed, 26 Mar 2025 11:32:03 +0000 (12:32 +0100)]
x86/PVH: expose OEMx ACPI tables to Dom0

What they contain we don't know, but we can't sensibly hide them. On my
Skylake system OEM1 (with a description of "INTEL  CPU EIST") is what
contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
needed for cpufreq. (_PSD interestingly are in an SSDT there.)

Further OEM2 there has a description of "INTEL  CPU  HWP", while OEM4
has "INTEL  CPU  CST". Pretty clearly all three need exposing for
cpufreq and cpuidle to work.

Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 weeks agox86/pmstat: fold two allocations in get_cpufreq_para()
Jan Beulich [Wed, 26 Mar 2025 11:31:33 +0000 (12:31 +0100)]
x86/pmstat: fold two allocations in get_cpufreq_para()

There's little point in allocating two uint32_t[] arrays separately.
We'll need the bigger of the two anyway, and hence we can use that
bigger one also for transiently storing the smaller number of items.

While there also drop j (we can use i twice) and adjust the type of
the remaining two variables on that line.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 weeks agoxenpm: sanitize allocations in show_cpufreq_para_by_cpuid()
Jan Beulich [Wed, 26 Mar 2025 11:30:57 +0000 (12:30 +0100)]
xenpm: sanitize allocations in show_cpufreq_para_by_cpuid()

malloc(), when passed zero size, may return NULL (the behavior is
implementation defined). Mirror the ->gov_num check to the other two
allocations as well. Don't chance then actually using a NULL in
print_cpufreq_para().
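
The zero-size concern can be sketched as follows.  alloc_u32_array() is a
hypothetical helper written for illustration only; it is not the code in
xenpm, which checks the counts inline.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate an array of n uint32_t, treating n == 0 as
 * "nothing to allocate" rather than asking the allocator for zero bytes.
 * Returns 0 on success and -1 on allocation failure; *out is NULL if n == 0.
 */
static int alloc_u32_array(uint32_t **out, size_t n)
{
    *out = NULL;

    if ( n == 0 )
        return 0;   /* success, but callers must not dereference *out */

    *out = calloc(n, sizeof(**out));

    return *out ? 0 : -1;
}
```

With the zero case handled up front, a NULL result after a non-zero request
unambiguously means the allocation failed.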

Fixes: 75e06d089d48 ("xenpm: add cpu frequency control interface, through which user can")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
5 weeks agolib/inflate.c: remove dead code
Ariel Otilibili [Wed, 26 Mar 2025 11:30:35 +0000 (12:30 +0100)]
lib/inflate.c: remove dead code

This is a follow up from a discussion in Xen:

The if-statement tests that `res` is non-zero; meaning the case zero is
never reached.

Link: https://lore.kernel.org/all/7587b503-b2ca-4476-8dc9-e9683d4ca5f0@suse.com/
Link: https://lkml.kernel.org/r/20241219092615.644642-2-ariel.otilibili-anieli@eurecom.fr
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ariel Otilibili <ariel.otilibili-anieli@eurecom.fr>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 41c761dede6e
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agoarinc653: move next_switch_time access under lock
Jan Beulich [Tue, 25 Mar 2025 08:23:48 +0000 (09:23 +0100)]
arinc653: move next_switch_time access under lock

Even before its recent movement to the scheduler's private data
structure it looks to have been wrong to update the field under lock,
but then read it with the lock no longer held.

Coverity-ID: 1644500
Fixes: 9f0c658baedc ("arinc: add cpu-pool support to scheduler")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Nathan Studer <nathan.studer@dornerworks.com>
6 weeks agox86/irq: introduce APIC_VECTOR_VALID()
Denis Mukhin [Tue, 25 Mar 2025 08:22:59 +0000 (09:22 +0100)]
x86/irq: introduce APIC_VECTOR_VALID()

Add new macro APIC_VECTOR_VALID() to validate the interrupt vector
range as per [1]. This macro replaces hardcoded checks against the
open-coded value 16 in LAPIC and virtual LAPIC code and simplifies
the code a bit.

[1] Intel SDM volume 3A
    Chapter "ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER"
    Section "Valid Interrupt Vectors"
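
For illustration, a macro of this kind might look like the sketch below.
The definition shown is an assumption derived from the description (vectors
below 16 are not valid), and lapic_accept_vector() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed definition: vectors 0-15 are reserved, 16-255 are usable. */
#define APIC_VECTOR_VALID(v) ((v) >= 16)

/* Hypothetical caller: reject an out-of-range vector before using it. */
static bool lapic_accept_vector(unsigned int vector)
{
    return vector <= 255 && APIC_VECTOR_VALID(vector);
}
```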

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 weeks agodocs: Add some details on XenServer PCI devices
Frediano Ziglio [Tue, 25 Mar 2025 08:22:43 +0000 (09:22 +0100)]
docs: Add some details on XenServer PCI devices

Describe the usage of devices 5853:0002 and 5853:C000.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
6 weeks agoRevert "x86: make Viridian support optional"
Jan Beulich [Mon, 24 Mar 2025 13:36:57 +0000 (14:36 +0100)]
Revert "x86: make Viridian support optional"

This reverts commit e0cf36bf295b40cac71af26b35eedee216e156ff. It
introduced not just UBSAN failures, but apparently actual NULL
de-references.

6 weeks agox86: make Viridian support optional
Sergiy Kibrik [Mon, 24 Mar 2025 11:55:39 +0000 (12:55 +0100)]
x86: make Viridian support optional

Add config option HVM_VIRIDIAN that covers viridian code within HVM.
Calls to viridian functions guarded by is_viridian_domain() and related macros.
Having this option may be beneficial by reducing code footprint for systems
that are not using Hyper-V.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 weeks agoprocess/release: mention MAINTAINERS adjustments
Jan Beulich [Mon, 24 Mar 2025 11:55:24 +0000 (12:55 +0100)]
process/release: mention MAINTAINERS adjustments

For many major releases I've been updating ./MAINTAINERS _after_ the
respective branch was handed over to me. That update, however, is
relevant not only from the .1 minor release onwards, but right from the
.0 release. Hence it ought to be done as one of the last things before
tagging the tree for the new major release.

See the seemingly unrelated parts (as far as the commit subject goes) of
e.g. 9d465658b405 ("update Xen version to 4.20.1-pre") for an example.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
6 weeks agox86/svm: use nsvm_efer_svm_enabled() to check guest's EFER.SVME
Sergiy Kibrik [Mon, 24 Mar 2025 11:55:00 +0000 (12:55 +0100)]
x86/svm: use nsvm_efer_svm_enabled() to check guest's EFER.SVME

There's a macro for this, might improve readability a bit & save a bit of space.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/PVH: don't open-code elf_round_up()
Jan Beulich [Mon, 24 Mar 2025 11:54:27 +0000 (12:54 +0100)]
x86/PVH: don't open-code elf_round_up()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agox86/traps: Introduce early_traps_init() and simplify setup
Andrew Cooper [Sat, 28 Dec 2024 14:56:40 +0000 (14:56 +0000)]
x86/traps: Introduce early_traps_init() and simplify setup

Something I overlooked when last cleaning up exception handling is that a TSS
is not necessary if IST isn't configured, and IST isn't necessary until we're
running guest code.

Introduce early_traps_init(), and rearrange the existing logic between this
and traps_init() later on boot, to allow deferring TSS and IST setup.

In early_traps_init(), load the IDT and invalidate TR/LDTR; this is sufficient
system-table setup to make exception handling work.  The setup of the BSP's
per-cpu variables stays early too; they're used on certain error paths.

Move load_system_tables() later into traps_init().  Note that it already
contains enable_each_ist(), so this call is simply dropped.

This removes some complexity prior to having exception support, and lays the
groundwork to not even allocate a TSS when using FRED.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/traps: Move trap_init() into traps-setup.c
Andrew Cooper [Mon, 17 Mar 2025 18:48:18 +0000 (18:48 +0000)]
x86/traps: Move trap_init() into traps-setup.c

... and rename to traps_init() for consistency.  Move the declaration from
asm/system.h into asm/traps.h.

This also involves moving init_ler() and variables.  Move the declaration of
ler_msr from asm/msr.h to asm/traps.h.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/traps: Move percpu_traps_init() into traps-setup.c
Andrew Cooper [Tue, 31 Dec 2024 15:56:34 +0000 (15:56 +0000)]
x86/traps: Move percpu_traps_init() into traps-setup.c

Move the declaration from asm/system.h into asm/traps.h.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/traps: Move cpu_init() out of trap_init()
Andrew Cooper [Mon, 6 Jan 2025 06:36:34 +0000 (06:36 +0000)]
x86/traps: Move cpu_init() out of trap_init()

cpu_init() doesn't particularly belong in trap_init().  This brings the BSP
more in line with the APs.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/boot: Simplify the expression for extra allocation space
Andrew Cooper [Wed, 19 Mar 2025 12:12:37 +0000 (12:12 +0000)]
x86/boot: Simplify the expression for extra allocation space

The expression for one parameter of find_memory() is already complicated and
about to become more so.  Break it out into a new variable, and express it in
an easier-to-follow way.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
6 weeks agoxen/compiler: Fix the position of the visibility pragma
Andrew Cooper [Tue, 18 Mar 2025 13:32:50 +0000 (13:32 +0000)]
xen/compiler: Fix the position of the visibility pragma

This needs to be ahead of everything.  Right now, it is after xen/init.h being
included for -DINIT_SECTIONS_ONLY

  # 1 "./include/xen/compiler.h" 1
  # 83 "./include/xen/compiler.h"
  # 1 "./include/xen/init.h" 1
  # 62 "./include/xen/init.h"
  typedef int (*initcall_t)(void);
  typedef void (*exitcall_t)(void);
  # 72 "./include/xen/init.h"
  void do_presmp_initcalls(void);
  void do_initcalls(void);
  # 84 "./include/xen/compiler.h" 2
  # 122 "./include/xen/compiler.h"
  #pragma GCC visibility push(hidden)

Fixes: 84c4461b7d3a ("Force out-of-line instances of inline functions into .init.text in init-only code")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 weeks agokconfig/randconfig: enable UBSAN for randconfig
Roger Pau Monne [Wed, 12 Mar 2025 17:51:43 +0000 (18:51 +0100)]
kconfig/randconfig: enable UBSAN for randconfig

Introduce an additional Kconfig check to only offer the option if the
compiler supports -fsanitize=undefined.

We no longer use Travis CI, so the original motivation for not enabling
UBSAN might no longer be present.  Regardless, the option won't be present in
the first place if the compiler doesn't support -fsanitize=undefined.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agox86/vga: fix mapping of the VGA text buffer
Roger Pau Monne [Mon, 17 Mar 2025 17:51:21 +0000 (18:51 +0100)]
x86/vga: fix mapping of the VGA text buffer

The call to ioremap_wc() in video_init() will always fail, because
video_init() is called ahead of vm_init_type(), and so the underlying
__vmap() call will fail to allocate the linear address space.

Fix by reverting to the previous behavior and use __va() for the VGA text
buffer, as it's below the 1MB boundary, and thus always mapped in the
directmap.

Fixes: 81d195c6c0e2 ('x86: introduce ioremap_wc()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/mkelf32: account for offset when detecting note segment placement
Roger Pau Monne [Wed, 5 Mar 2025 17:08:13 +0000 (18:08 +0100)]
x86/mkelf32: account for offset when detecting note segment placement

mkelf32 attempts to check that the program header defined NOTE segment falls
inside of the LOAD segment, as the build-id should be loaded for Xen at
runtime to check.

However the current code doesn't take into account the LOAD program header
segment offset when calculating overlap with the NOTE segment.  This
results in incorrect detection, and the following build error:

arch/x86/boot/mkelf32 --notes xen-syms ./.xen.elf32 0x200000 \
               `nm xen-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$/0x\1/p'`
Expected .note section within .text section!
Offset 4244776 not within 2910364!

When xen-syms has the following program headers:

Program Header:
    LOAD off    0x0000000000200000 vaddr 0xffff82d040200000 paddr 0x0000000000200000 align 2**21
         filesz 0x00000000002c689c memsz 0x00000000003f7e20 flags rwx
    NOTE off    0x000000000040c528 vaddr 0xffff82d04040c528 paddr 0x000000000040c528 align 2**2
         filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--

Account for the program header offset of the LOAD segment when checking
whether the NOTE segment is contained within.  Also fix the logic to
ensure the NOTE segment is fully contained within the LOAD segment.
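
A minimal sketch of the containment check, using the file offsets and sizes
from the program headers quoted above.  struct seg and note_within_load()
are hypothetical names for illustration, not the mkelf32 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a program header: file offset and size. */
struct seg {
    uint64_t offset;
    uint64_t filesz;
};

/* True if 'note' lies entirely within the file range covered by 'load'. */
static bool note_within_load(const struct seg *load, const struct seg *note)
{
    return note->offset >= load->offset &&
           note->offset + note->filesz <= load->offset + load->filesz;
}
```

With the LOAD segment at offset 0x200000 (filesz 0x2c689c) and the NOTE
segment at offset 0x40c528 (filesz 0x24), the NOTE range ends at 0x40c54c,
which is below the LOAD end of 0x4c689c.  Comparing the NOTE offset against
the LOAD filesz alone, without the LOAD offset, is what produced the
"Offset 4244776 not within 2910364" error.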

Fixes: a353cab905af ('build_id: Provide ld-embedded build-ids')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/boot: clarify comment about trampoline_setup usage
Roger Pau Monne [Tue, 4 Mar 2025 14:28:11 +0000 (15:28 +0100)]
x86/boot: clarify comment about trampoline_setup usage

Clarify that trampoline_setup is only used for EFI when booted using the
multiboot2 entry point.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agoautomation/console.exp: do not assume expect is always at /usr/bin/
Roger Pau Monne [Mon, 17 Mar 2025 09:31:07 +0000 (10:31 +0100)]
automation/console.exp: do not assume expect is always at /usr/bin/

Instead use env to find the location of expect.

Additionally do not use the -f flag, as it's only meaningful when passing
arguments on the command line, which we never do for console.exp.  From the
expect 5.45.4 man page:

> The -f flag prefaces a file from which to read commands from.  The flag
> itself is optional as it is only useful when using the #! notation (see
> above), so  that other arguments may be supplied on the command line.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 weeks agoautomation/cirrus-ci: store Xen Kconfig before doing a build
Roger Pau Monne [Fri, 14 Mar 2025 10:55:48 +0000 (11:55 +0100)]
automation/cirrus-ci: store Xen Kconfig before doing a build

In case the build fails or gets stuck, store the Kconfig file ahead of
starting the build.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 weeks agoautomation/cirrus-ci: update FreeBSD to 13.5
Roger Pau Monne [Fri, 14 Mar 2025 10:49:28 +0000 (11:49 +0100)]
automation/cirrus-ci: update FreeBSD to 13.5

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 weeks agoautomation/cirrus-ci: add timestamps
Roger Pau Monne [Fri, 14 Mar 2025 10:44:45 +0000 (11:44 +0100)]
automation/cirrus-ci: add timestamps

Such timestamps can still be disabled from the Web UI using a tick box.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 weeks agox86/shadow: fix UB pointer arithmetic in sh_mfn_is_a_page_table()
Roger Pau Monne [Tue, 18 Mar 2025 08:20:59 +0000 (09:20 +0100)]
x86/shadow: fix UB pointer arithmetic in sh_mfn_is_a_page_table()

UBSAN complains with:

UBSAN: Undefined behaviour in arch/x86/mm/shadow/private.h:515:30
pointer operation overflowed ffff82e000000000 to ffff82dfffffffe0
[...]
Xen call trace:
    [<ffff82d040303782>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xc0
    [<ffff82d040304bc3>] F __ubsan_handle_pointer_overflow+0xcb/0x100
    [<ffff82d040471b2d>] F arch/x86/mm/shadow/guest_2.c#sh_page_fault__guest_2+0x1e350
    [<ffff82d0403b206b>] F svm_vmexit_handler+0xdf3/0x2450
    [<ffff82d0402049c0>] F svm_stgi_label+0x5/0x15

Fix by moving the call to mfn_to_page() after the check of whether the
passed gmfn is valid.  This avoids the call to mfn_to_page() with an
INVALID_MFN parameter.
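
The ordering can be sketched with a toy frame table.  Everything below
(the table size, struct layout and helper names) is a simplified assumption
for illustration and does not reflect the real shadow code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FRAMES   8UL
#define INVALID_MFN (~0UL)

struct page_info {
    unsigned int is_page_table;
};

/* Toy stand-in for the frame table; the real one is far larger. */
static struct page_info frame_table[NR_FRAMES];

static bool mfn_valid(unsigned long mfn)
{
    return mfn < NR_FRAMES;
}

static bool mfn_is_a_page_table(unsigned long gmfn)
{
    const struct page_info *page;

    /* Check first: forming frame_table + INVALID_MFN would already be UB. */
    if ( !mfn_valid(gmfn) )
        return false;

    page = &frame_table[gmfn];   /* only computed for in-range indices */

    return page->is_page_table != 0;
}
```

The point is that the pointer is formed only after the validity check, so
no out-of-range pointer arithmetic is ever evaluated.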

While there make the page local variable const, it's not modified by the
function.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agox86/xlat: fix UB pointer arithmetic in COMPAT_ARG_XLAT_VIRT_BASE
Roger Pau Monne [Tue, 18 Mar 2025 08:20:41 +0000 (09:20 +0100)]
x86/xlat: fix UB pointer arithmetic in COMPAT_ARG_XLAT_VIRT_BASE

UBSAN complains with:

UBSAN: Undefined behaviour in common/compat/memory.c:90:9
pointer operation overflowed ffff820080000000 to 0000020080000000
[...]
Xen call trace:
    [<ffff82d040303782>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xc0
    [<ffff82d040304bc3>] F __ubsan_handle_pointer_overflow+0xcb/0x100
    [<ffff82d0402a6259>] F compat_memory_op+0xf1/0x4d20
    [<ffff82d04041532d>] F hvm_memory_op+0x55/0xe0
    [<ffff82d040416150>] F hvm_hypercall+0xae8/0x21b0
    [<ffff82d0403b24ca>] F svm_vmexit_handler+0x1252/0x2450
    [<ffff82d0402049c0>] F svm_stgi_label+0x5/0x15

Adjust the calculations in COMPAT_ARG_XLAT_VIRT_BASE to subtract from the
per-domain area to obtain the mirrored linear address in the 4th slot,
instead of overflowing the per-domain linear address.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agox86/wait: prevent duplicated assembly labels
Roger Pau Monne [Fri, 14 Mar 2025 09:40:49 +0000 (10:40 +0100)]
x86/wait: prevent duplicated assembly labels

When enabling UBSAN with clang, the following error is triggered during the
build:

common/wait.c:154:9: error: symbol '.L_wq_resume' is already defined
  154 |         "push %%rbx; push %%rbp; push %%r12;"
      |         ^
<inline asm>:1:121: note: instantiated into assembly here
    1 |         push %rbx; push %rbp; push %r12;push %r13; push %r14; push %r15;sub %esp,%ecx;cmp $4096, %ecx;ja .L_skip;mov %rsp,%rsi;.L_wq_resume: rep movsb;mov %rsp,%rsi;.L_skip:pop %r15; pop %r14; pop %r13;pop %r12; pop %rbp; pop %rbx
      |                                                                                                                                ^
common/wait.c:154:9: error: symbol '.L_skip' is already defined
  154 |         "push %%rbx; push %%rbp; push %%r12;"
      |         ^
<inline asm>:1:159: note: instantiated into assembly here
    1 |         push %rbx; push %rbp; push %r12;push %r13; push %r14; push %r15;sub %esp,%ecx;cmp $4096, %ecx;ja .L_skip;mov %rsp,%rsi;.L_wq_resume: rep movsb;mov %rsp,%rsi;.L_skip:pop %r15; pop %r14; pop %r13;pop %r12; pop %rbp; pop %rbx
      |                                                                                                                                                                      ^
2 errors generated.

The inline assembly block in __prepare_to_wait() is duplicated, thus
leading to multiple definitions of the otherwise unique labels inside the
assembly block.  GCC extended-asm documentation notes the possibility of
duplicating asm blocks:

> Under certain circumstances, GCC may duplicate (or remove duplicates of)
> your assembly code when optimizing. This can lead to unexpected duplicate
> symbol errors during compilation if your asm code defines symbols or
> labels. Using ‘%=’ (see AssemblerTemplate) may help resolve this problem.

Work around the issue by latching esp to a local variable; this prevents
clang duplicating the inline asm blocks.
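
For reference, the '%=' approach mentioned in the GCC documentation quoted
above (which is not what this patch does) can be sketched as follows.
tagged_increment() is a hypothetical function; its asm body contains only a
label and no instructions, and it assumes a GNU-style assembler.

```c
#include <assert.h>

/*
 * Force inlining so the asm statement is instantiated once per call site,
 * which is one way an asm body ends up duplicated within one object file.
 */
static inline __attribute__((always_inline)) int tagged_increment(int x)
{
    /*
     * "%=" expands to a number unique to each instance of the asm
     * statement, so every copy defines a differently named local label.
     * A fixed label name here would be defined once per inlined copy.
     */
    __asm__ __volatile__ ( ".Ltagged_%=:" : "+r" (x) );

    return x + 1;
}
```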

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 weeks agox86/msi: always propagate MSI register writes from __setup_msi_irq()
Roger Pau Monne [Tue, 18 Mar 2025 08:31:35 +0000 (09:31 +0100)]
x86/msi: always propagate MSI register writes from __setup_msi_irq()

After 8e60d47cf011 writes from __setup_msi_irq() will no longer be
propagated to the MSI registers if the IOMMU IRTE was already allocated.
Given the purpose of __setup_msi_irq() is MSI initialization, always
propagate the write to the hardware, regardless of whether the IRTE was
already allocated.

No functional change expected, as the write should always be propagated in
__setup_msi_irq(), but make it explicit on the write_msi_msg() call.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 weeks agox86/msi: always propagate MSI writes when not in active system mode
Roger Pau Monne [Mon, 17 Mar 2025 14:40:11 +0000 (15:40 +0100)]
x86/msi: always propagate MSI writes when not in active system mode

Relax the limitation on MSI register writes, and only apply it when the
system is in active state.  For example AMD IOMMU drivers rely on using
set_msi_affinity() to force an MSI register write on resume from
suspension.

The original patch intention was to reduce the number of MSI register
writes when the system is in active state.  Leave the other states to
always perform the writes, as it's safer given the existing code, and it's
expected to not make a difference performance wise.

For such propagation to work even when the IRT index is not updated the MSI
message must be adjusted in all success cases for AMD IOMMU, not just when
the index has been newly allocated.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Fixes: 8e60d47cf011 ('x86/iommu: avoid MSI address and data writes if IRT index hasn't changed')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
6 weeks agox86/setup: correct off-by-1 in module mapping
Jan Beulich [Thu, 20 Mar 2025 07:51:55 +0000 (08:51 +0100)]
x86/setup: correct off-by-1 in module mapping

If a module's length is an exact multiple of PAGE_SIZE, the 2nd argument
passed to set_pdx_range() would be one larger than intended. Use
PFN_{UP,DOWN}() there instead.
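
To illustrate the rounding, here is a small sketch.  The macros are the
usual page-frame helpers assuming 4k pages; end_pfn_off_by_one() is only a
guess at the shape of an expression with the described behaviour, not the
code that was replaced.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Exclusive end PFN of [start, start + size), rounding up partial pages. */
static unsigned long end_pfn(uint64_t start, uint64_t size)
{
    return PFN_UP(start + size);
}

/* Hypothetical buggy shape: round down, then unconditionally add one. */
static unsigned long end_pfn_off_by_one(uint64_t start, uint64_t size)
{
    return PFN_DOWN(start + size) + 1;
}
```

For a module at 0x100000 that is exactly two pages long, end_pfn() yields
0x102 while the add-one form yields 0x103.  The two agree whenever the end
address is not page aligned.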

Fixes: cd7cc5320bb2 ("x86/boot: add start and size fields to struct boot_module")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 weeks agoxen/console: fix trailing whitespaces
Denis Mukhin [Thu, 20 Mar 2025 07:51:14 +0000 (08:51 +0100)]
xen/console: fix trailing whitespaces

Remove trailing whitespaces in the console driver.

No functional change.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 weeks agoxen: Update toolchain requirements to GCC 5.1/Binutils 2.25 or Clang/LLVM 11
Andrew Cooper [Fri, 7 Mar 2025 17:29:10 +0000 (17:29 +0000)]
xen: Update toolchain requirements to GCC 5.1/Binutils 2.25 or Clang/LLVM 11

GCC 4.1.2 is from 2007, and Binutils 2.16 is a similar vintage.  Clang 3.5 is
from 2014.  Supporting toolchains this old is a massive development and
testing burden.

Set a minimum baseline of GCC 5.1 across the board, along with Binutils 2.25
which is the same age.  These were chosen *3 years ago* as Linux's minimum
requirements because even back then, they were ubiquitous in distros.  Choose
Clang/LLVM 11 as a baseline for similar reasons; the Linux commit making this
change two years ago cites a laundry list of code generation bugs.

This will allow us to retire a lot of compatibility logic, and start using new
features previously unavailable because of no viable compatibility option.

Merge the ARM 32bit and 64bit sections now they're the same.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
6 weeks agoxen/arinc653: call xfree() with local IRQ enabled
Anderson Choi [Tue, 18 Mar 2025 07:34:15 +0000 (16:34 +0900)]
xen/arinc653: call xfree() with local IRQ enabled

xen panic is observed with the following configuration.

1. Debug xen build (CONFIG_DEBUG=y)
2. dom1 of an ARINC653 domain
3. shutdown dom1 with xl command

$ xl shutdown <domain_name>

(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
(XEN) ****************************************

panic was triggered since xfree() was called with local IRQ disabled and
therefore assertion failed.

Fix this by calling xfree() after local IRQ is enabled.

Fixes: 19049f8d796a sched: fix locking in a653sched_free_vdata()
Signed-off-by: Anderson Choi <anderson.choi@boeing.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Nathan Studer <nathan.studer@dornerworks.com>
6 weeks agox86/mm: Fix IS_ALIGNED() check in IS_LnE_ALIGNED()
Andrew Cooper [Wed, 19 Mar 2025 02:58:18 +0000 (02:58 +0000)]
x86/mm: Fix IS_ALIGNED() check in IS_LnE_ALIGNED()

The current CI failures turn out to be a latent bug triggered by a narrow set
of properties of the initrd and the host memory map, which CI encountered by
chance.

One step during boot involves constructing directmap mappings for modules.
With some probing at the point of creation, it is observed that there's a 4k
mapping missing towards the end of the initrd.

  (XEN) === Mapped Mod1 [000000039400100000000003be1ff6dc] to Directmap
  (XEN) Probing paddr 394001000, va ffff830394001000
  (XEN) Probing paddr 3be1ff6db, va ffff8303be1ff6db
  (XEN) Probing paddr 3bdffffff, va ffff8303bdffffff
  (XEN) Probing paddr 3be001000, va ffff8303be001000
  (XEN) Probing paddr 3be000000, va ffff8303be000000
  (XEN) Early fatal page fault at e008:ffff82d04032014c (cr2=ffff8303be000000, ec=0000)

The conditions for this bug appear to be a map_pages_to_xen() call with a start
address of exactly 4k beyond a 2M boundary, some number of full 2M pages, then
a tail needing 4k pages.

Anyway, the condition for spotting superpage boundaries in map_pages_to_xen()
is wrong.  The IS_ALIGNED() macro expects a power of two for the alignment
argument, and subtracts 1 itself.

Fixing this causes the failing case to now boot.
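
The effect of passing a mask instead of a power of two can be sketched as
below.  The IS_ALIGNED() definition and the buggy call shape are assumptions
inferred from the description, and the helpers work in units of 4k page
frames for a 2M superpage.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed shape: 'a' must be a power of two; the macro subtracts 1 itself. */
#define IS_ALIGNED(val, a) (((val) & ((a) - 1)) == 0)

#define L2_PFNS (1UL << 9)   /* 4k page frames per 2M superpage */

/* Correct use: pass the alignment itself, giving a mask of 0x1ff. */
static bool l2_aligned(unsigned long pfn)
{
    return IS_ALIGNED(pfn, L2_PFNS);
}

/* Buggy use: passing (alignment - 1) yields a mask of 0x1fe. */
static bool l2_aligned_buggy(unsigned long pfn)
{
    return IS_ALIGNED(pfn, L2_PFNS - 1);
}
```

With the buggy form bit 0 of the frame number is ignored, so frame 0x394001
(paddr 394001000, i.e. 4k beyond a 2M boundary) is wrongly treated as
superpage aligned.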

Fixes: 97fb6fcf26e8 ("x86/mm: introduce helpers to detect super page alignment")
Debugged-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 weeks agoCHANGELOG.md: Mention PCI passthrough for HVM domUs
Jiqian Chen [Tue, 18 Mar 2025 08:48:00 +0000 (09:48 +0100)]
CHANGELOG.md: Mention PCI passthrough for HVM domUs

PCI passthrough is already supported for HVM domUs when dom0 is PVH
on x86. The last related patch on Qemu side was merged after Xen4.20
release. So mention this feature in Xen4.21 entry.

But SR-IOV is not yet supported on PVH dom0, add a note for it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
7 weeks agotools/xenstored: use xenmanage_poll_changed_domain()
Juergen Gross [Tue, 18 Mar 2025 08:47:45 +0000 (09:47 +0100)]
tools/xenstored: use xenmanage_poll_changed_domain()

Instead of checking each known domain after having received a
VIRQ_DOM_EXC event, use the new xenmanage_poll_changed_domain()
function for directly getting the domid of a domain having changed
its state.

A test doing "xl shutdown" of 1000 guests has shown to reduce the
consumed cpu time of xenstored by 6% with this change applied.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
7 weeks agotools/xenstored: use unique_id to identify new domain with same domid
Juergen Gross [Tue, 18 Mar 2025 08:47:15 +0000 (09:47 +0100)]
tools/xenstored: use unique_id to identify new domain with same domid

Use the new unique_id of a domain in order to detect that a domain
has been replaced with another one reusing the domain-id of the old
domain.

While changing the related code, switch from "dom_invalid" to
"dom_valid" in order to avoid double negation and use "bool" as type
for it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
7 weeks agosymbols: sanitize a few variable's types
Jan Beulich [Tue, 18 Mar 2025 08:46:38 +0000 (09:46 +0100)]
symbols: sanitize a few variable's types

Parameter and return types of symbols_expand_symbol() make clear that
xensyms_read()'s next_offset doesn't need to be 64-bit.

xensyms_read()'s first parameter type makes clear that the function's
next_symbols doesn't need to be 64-bit.

symbols_num_syms'es type makes clear that iteration locals in
symbols_lookup() don't need to be unsigned long (i.e. 64-bit on 64-bit
architectures).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks agosymbols: don't over-align generated data
Jan Beulich [Tue, 18 Mar 2025 08:44:57 +0000 (09:44 +0100)]
symbols: don't over-align generated data

x86 is one of the few architectures where .align has the same meaning as
.balign; most other architectures (Arm, PPC, and RISC-V in particular)
give it the same meaning as .p2align. Aligning every one of these items
to 256 bytes (on all 64-bit architectures except x86-64) is clearly too
much.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
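
The difference between the two directive meanings in the commit above can be shown with a little arithmetic. The helper names below are hypothetical, and the operand value of 8 is an assumption chosen because it is consistent with the 256-byte figure quoted in the message.

```c
/* .balign N: the operand is the alignment in bytes. */
unsigned long balign_bytes(unsigned int operand)
{
    return operand;
}

/* .p2align N: the operand is log2 of the alignment in bytes. */
unsigned long p2align_bytes(unsigned int operand)
{
    return 1UL << operand;
}
```

With an operand of 8, the .balign reading gives 8 bytes, while the .p2align reading gives 256 bytes, which is the over-alignment the commit removes.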
7 weeks agotools: Mark ACPI SDTs as NVS in the PVH build path
Alejandro Vallejo [Tue, 18 Mar 2025 08:44:18 +0000 (09:44 +0100)]
tools: Mark ACPI SDTs as NVS in the PVH build path

Commit cefeffc7e583 marked ACPI tables as NVS in the hvmloader path
because SeaBIOS may otherwise just mark it as RAM. There is, however,
yet another reason to do it even in the PVH path. Xen's incarnation of
AML relies on having access to some ACPI tables (e.g. _STA of Processor
objects relies on reading the processor online bit in its MADT entry).

This is problematic if the OS tries to reclaim ACPI memory for page
tables as it's needed for runtime and can't be reclaimed after the OSPM
is up and running.

Fixes: de6d188a519f ("hvmloader: flip "ACPI data" to "ACPI NVS" type for ACPI table region")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 weeks agox86/hvm: Use for_each_set_bit() in hvm_emulate_writeback()
Andrew Cooper [Tue, 11 Jun 2024 19:03:32 +0000 (20:03 +0100)]
x86/hvm: Use for_each_set_bit() in hvm_emulate_writeback()

... which is more concise than the opencoded form, and more efficient when
compiled.

Furthermore, now that find_{first,next}_bit() are no longer in use, the
seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although
they do need to remain unsigned int because of __set_bit() elsewhere.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
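
As an illustration of the loop style the commit above switches to, here is a minimal sketch of iterating over the set bits of a mask. FOR_EACH_SET_BIT and sum_set_bit_indices() are hypothetical stand-ins, not Xen's for_each_set_bit(); the sketch assumes a GCC or clang compiler for __builtin_ctz() and a 32-bit unsigned int mask.

```c
/*
 * Hypothetical stand-in for a for_each_set_bit()-style loop: visit the
 * index of each set bit in 'mask', lowest first, then clear that bit.
 */
#define FOR_EACH_SET_BIT(bit, mask)                               \
    for ( unsigned int m_ = (mask);                               \
          m_ && ((bit) = (unsigned int)__builtin_ctz(m_), 1);     \
          m_ &= m_ - 1 )

/* Example user: add up the indices of all set bits in 'mask'. */
unsigned int sum_set_bit_indices(unsigned int mask)
{
    unsigned int bit, sum = 0;

    FOR_EACH_SET_BIT ( bit, mask )
        sum += bit;

    return sum;
}
```

Because the loop works on a plain integer rather than on a bitmap in memory, the mask no longer has to be an unsigned long for the iteration itself.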
7 weeks agox86/boot: Fix zap_low_mappings() to map less of the trampoline
Andrew Cooper [Tue, 31 Dec 2024 16:52:39 +0000 (16:52 +0000)]
x86/boot: Fix zap_low_mappings() to map less of the trampoline

Regular data access into the trampoline is via the directmap.

As now discussed quite extensively in asm/trampoline.h, the trampoline is
arranged so that only the AP and S3 paths need an identity mapping, and that
they fit within a single page.

Right now, PFN_UP(trampoline_end - trampoline_start) is 2, causing more than
expected of the trampoline to be mapped.  Cut it down to just the single page it
ought to be.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
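
The page-count arithmetic in the commit above can be sketched as below. The macros assume 4KB pages and follow the usual round-up definition; the SKETCH_ prefix and the function name are hypothetical, and any size passed in is purely illustrative rather than the real trampoline size.

```c
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Round a byte count up to a whole number of pages. */
#define SKETCH_PFN_UP(x) \
    (((x) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)

/* Number of pages needed to cover 'bytes' bytes starting at a page boundary. */
unsigned long pages_covering(unsigned long bytes)
{
    return SKETCH_PFN_UP(bytes);
}
```

Any size between one byte more than a page and two full pages rounds up to 2, which is why mapping by total trampoline size covers more than the single page the AP and S3 paths need.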
7 weeks agox86/debug: Move activate_debugregs() into debug.c
Andrew Cooper [Fri, 3 Jan 2025 15:19:49 +0000 (15:19 +0000)]
x86/debug: Move activate_debugregs() into debug.c

We have since gained a better location for it to live.

Fix up the includes while doing so.  I don't recall why we had kernel.h but
it's definitely stale now.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 weeks agox86/irq: use NR_ISA_IRQS instead of open-coded value
Denis Mukhin [Sat, 15 Mar 2025 01:00:51 +0000 (01:00 +0000)]
x86/irq: use NR_ISA_IRQS instead of open-coded value

Replace the open-coded value 16 with the NR_ISA_IRQS symbol to enhance
readability.

No functional changes.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 weeks agox86/irq: rename NR_ISAIRQS to NR_ISA_IRQS
Denis Mukhin [Sat, 15 Mar 2025 01:00:47 +0000 (01:00 +0000)]
x86/irq: rename NR_ISAIRQS to NR_ISA_IRQS

Rename NR_ISAIRQS to NR_ISA_IRQS to enhance readability.

No functional changes.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 weeks agox86/hvm: add HVM-specific Kconfig
Denis Mukhin [Sat, 15 Mar 2025 01:19:49 +0000 (01:19 +0000)]
x86/hvm: add HVM-specific Kconfig

Add a separate menu for configuring HVM build-time settings to better
organize HVM-specific options.

HVM options will now appear in a dedicated sub-menu in the menuconfig
tool.

Also, make AMD_SVM config dependent on AMD config and INTEL_VMX on INTEL
respectively.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks agox86/ioremap: prevent additions against the NULL pointer
Roger Pau Monne [Thu, 13 Mar 2025 11:19:48 +0000 (12:19 +0100)]
x86/ioremap: prevent additions against the NULL pointer

This was reported by clang UBSAN as:

UBSAN: Undefined behaviour in arch/x86/mm.c:6297:40
applying zero offset to null pointer
[...]
Xen call trace:
    [<ffff82d040303662>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xc0
    [<ffff82d040304aa3>] F __ubsan_handle_pointer_overflow+0xcb/0x100
    [<ffff82d0406ebbc0>] F ioremap_wc+0xc8/0xe0
    [<ffff82d0406c3728>] F video_init+0xd0/0x180
    [<ffff82d0406ab6f5>] F console_init_preirq+0x3d/0x220
    [<ffff82d0406f1876>] F __start_xen+0x68e/0x5530
    [<ffff82d04020482e>] F __high_start+0x8e/0x90

Fix bt_ioremap() and ioremap{,_wc}() to not add the offset if the returned
pointer from __vmap() is NULL.

Fixes: d0d4635d034f ('implement vmap()')
Fixes: f390941a92f1 ('x86/DMI: fix table mapping when one lives above 1Mb')
Fixes: 81d195c6c0e2 ('x86: introduce ioremap_wc()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
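
The shape of the fix in the commit above can be sketched as follows. This is not the Xen code: add_page_offset() and SKETCH_PAGE_MASK are hypothetical names, 4KB pages are assumed, and the 'va' argument stands in for whatever a mapping helper such as __vmap() returned.

```c
#include <stddef.h>

#define SKETCH_PAGE_MASK 0xfffUL  /* assumes 4KB pages */

/*
 * Given the page-aligned virtual address of a mapping (or NULL if the
 * mapping failed) and the physical address that was requested, return a
 * pointer to the requested byte, or NULL on failure.
 */
void *add_page_offset(void *va, unsigned long pa)
{
    /* Pointer arithmetic on NULL is undefined, even with a zero offset. */
    if ( !va )
        return NULL;

    return (char *)va + (pa & SKETCH_PAGE_MASK);
}
```

Checking before the addition also keeps a failed mapping recognisable to callers, since NULL plus a non-zero offset would no longer compare equal to NULL.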
7 weeks agoxen/ubsan: expand pointer overflow message printing
Roger Pau Monne [Thu, 13 Mar 2025 11:02:50 +0000 (12:02 +0100)]
xen/ubsan: expand pointer overflow message printing

Add messages about operations against the NULL pointer, or that result in
a NULL pointer.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>