xenbits.xensource.com Git - xen.git/log
Andrew Cooper [Thu, 20 Mar 2025 14:13:56 +0000 (14:13 +0000)]
CI: Update build tests based on new minimum toolchain requirements

Drop CentOS 7 entirely.  It's way too old now.

Ubuntu 22.04 is the oldest Ubuntu with a suitable version of Clang, so swap
the 16.04 clang builds for 22.04.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
Jan Beulich [Wed, 26 Mar 2025 11:32:03 +0000 (12:32 +0100)]
x86/PVH: expose OEMx ACPI tables to Dom0

What they contain we don't know, but we can't sensibly hide them. On my
Skylake system OEM1 (with a description of "INTEL  CPU EIST") is what
contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
needed for cpufreq. (_PSD interestingly are in an SSDT there.)

Further OEM2 there has a description of "INTEL  CPU  HWP", while OEM4
has "INTEL  CPU  CST". Pretty clearly all three need exposing for
cpufreq and cpuidle to work.

Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
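A minimal sketch of the kind of allow-list check this implies, assuming a hypothetical acpi_table_allowed() helper that receives a table's 4-byte signature. Apart from the OEMx rule, the listed signatures are illustrative and are not Xen's actual whitelist.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical allow-list of fixed 4-character ACPI table signatures. */
static const char *const allowed_sigs[] = { "APIC", "FACP", "DSDT", "SSDT" };

/*
 * Return true if a table with the given 4-byte signature may be exposed.
 * OEM-defined tables ("OEM1", "OEM2", ...) are allowed as a class, since
 * their contents (e.g. _PCT/_PPC/_PSS methods) cannot be known up front.
 */
static bool acpi_table_allowed(const char sig[4])
{
    if ( memcmp(sig, "OEM", 3) == 0 )
        return true;

    for ( size_t i = 0; i < sizeof(allowed_sigs) / sizeof(allowed_sigs[0]); i++ )
        if ( memcmp(sig, allowed_sigs[i], 4) == 0 )
            return true;

    return false;
}
```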
Jan Beulich [Wed, 26 Mar 2025 11:31:33 +0000 (12:31 +0100)]
x86/pmstat: fold two allocations in get_cpufreq_para()

There's little point in allocating two uint32_t[] arrays separately.
We'll need the bigger of the two anyway, and hence we can use that
bigger one also for transiently storing the smaller number of items.

While there also drop j (we can use i twice) and adjust the type of
the remaining two variables on that line.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
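The folding idea, sketched under assumptions: copy_two_lists() is a hypothetical helper, and copying from the input arrays stands in for however the real code gathers each list. One buffer sized for the larger count serves both lists, and one index variable is reused.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy two lists of different lengths out through a single scratch buffer
 * sized for the larger one, instead of allocating one array per list.
 */
static int copy_two_lists(const uint32_t *a, unsigned int na,
                          const uint32_t *b, unsigned int nb,
                          uint32_t *out_a, uint32_t *out_b)
{
    unsigned int i, n = na > nb ? na : nb;
    uint32_t *buf;

    if ( !n )
        return 0;

    buf = malloc(n * sizeof(*buf));
    if ( !buf )
        return -1;

    /* First list: gather into the scratch buffer, then copy out. */
    for ( i = 0; i < na; i++ )
        buf[i] = a[i];
    memcpy(out_a, buf, na * sizeof(*buf));

    /* Reuse the same buffer (and the same index variable) for the second. */
    for ( i = 0; i < nb; i++ )
        buf[i] = b[i];
    memcpy(out_b, buf, nb * sizeof(*buf));

    free(buf);
    return 0;
}
```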
Jan Beulich [Wed, 26 Mar 2025 11:30:57 +0000 (12:30 +0100)]
xenpm: sanitize allocations in show_cpufreq_para_by_cpuid()

malloc(), when passed zero size, may return NULL (the behavior is
implementation defined). Mirror the ->gov_num check to the other two
allocations as well, and don't risk actually using a NULL pointer in
print_cpufreq_para().

Fixes: 75e06d089d48 ("xenpm: add cpu frequency control interface, through which user can")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
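A sketch of the guarded-allocation pattern described above, using a hypothetical alloc_array() helper: a zero count is not an error and yields NULL, which the caller must then not dereference; only a NULL result for a non-zero count signals failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * malloc(0) may legitimately return NULL, so only treat NULL as an
 * allocation failure when a non-zero count was requested.
 */
static int alloc_array(uint32_t **p, unsigned int count)
{
    *p = NULL;
    if ( !count )
        return 0;          /* nothing to allocate; caller must not deref */

    *p = malloc(count * sizeof(**p));
    return *p ? 0 : -1;
}
```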
Ariel Otilibili [Wed, 26 Mar 2025 11:30:35 +0000 (12:30 +0100)]
lib/inflate.c: remove dead code

This is a follow-up from a discussion in Xen:

The if-statement tests that `res` is non-zero, meaning the zero case is
never reached.

Link: https://lore.kernel.org/all/7587b503-b2ca-4476-8dc9-e9683d4ca5f0@suse.com/
Link: https://lkml.kernel.org/r/20241219092615.644642-2-ariel.otilibili-anieli@eurecom.fr
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ariel Otilibili <ariel.otilibili-anieli@eurecom.fr>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 41c761dede6e
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 25 Mar 2025 08:23:48 +0000 (09:23 +0100)]
arinc653: move next_switch_time access under lock

Even before its recent move to the scheduler's private data structure,
it looks to have been wrong to update the field under the lock but then
read it with the lock no longer held.

Coverity-ID: 1644500
Fixes: 9f0c658baedc ("arinc: add cpu-pool support to scheduler")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Nathan Studer <nathan.studer@dornerworks.com>
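A sketch of the locking discipline, assuming a simple test-and-set spinlock in place of Xen's spinlock_t and a hypothetical struct sched_priv_sketch. The point is that the reader takes its snapshot of next_switch_time before dropping the lock, rather than reading the field afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Minimal test-and-set spinlock standing in for Xen's spinlock_t. */
typedef struct { atomic_flag f; } spinlock_sketch_t;

static void spin_lock_sketch(spinlock_sketch_t *l)
{
    while ( atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire) )
        ; /* busy-wait */
}

static void spin_unlock_sketch(spinlock_sketch_t *l)
{
    atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct sched_priv_sketch {
    spinlock_sketch_t lock;
    int64_t next_switch_time;
};

/* Writer: update the field with the lock held. */
static void set_next_switch(struct sched_priv_sketch *p, int64_t t)
{
    spin_lock_sketch(&p->lock);
    p->next_switch_time = t;
    spin_unlock_sketch(&p->lock);
}

/* Reader: snapshot under the lock; callers only use the local copy. */
static int64_t get_next_switch(struct sched_priv_sketch *p)
{
    int64_t t;

    spin_lock_sketch(&p->lock);
    t = p->next_switch_time;          /* read while still holding the lock */
    spin_unlock_sketch(&p->lock);

    return t;
}
```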
Denis Mukhin [Tue, 25 Mar 2025 08:22:59 +0000 (09:22 +0100)]
x86/irq: introduce APIC_VECTOR_VALID()

Add new macro APIC_VECTOR_VALID() to validate the interrupt vector
range as per [1]. This macro replaces hardcoded checks against the
open-coded value 16 in LAPIC and virtual LAPIC code and simplifies
the code a bit.

[1] Intel SDM volume 3A
    Chapter "ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER"
    Section "Valid Interrupt Vectors"

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
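A sketch of such a macro and one use of it. The definition below follows the SDM rule that vectors 0-15 are reserved, leaving 16-255 as valid; Xen's actual definition may be spelled differently, and lvt_vector_ok() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Vectors 0-15 are reserved; only 16-255 are valid APIC vectors. */
#define APIC_VECTOR_VALID(x) ((x) >= 16)

static bool lvt_vector_ok(unsigned int vector)
{
    /* Vector fields are 8 bits wide, so only the low byte matters. */
    return APIC_VECTOR_VALID(vector & 0xff);
}
```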
Frediano Ziglio [Tue, 25 Mar 2025 08:22:43 +0000 (09:22 +0100)]
docs: Add some details on XenServer PCI devices

Describe the usage of devices 5853:0002 and 5853:C000.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Jan Beulich [Mon, 24 Mar 2025 13:36:57 +0000 (14:36 +0100)]
Revert "x86: make Viridian support optional"

This reverts commit e0cf36bf295b40cac71af26b35eedee216e156ff. It
introduced not just UBSAN failures, but apparently actual NULL
de-references.

Sergiy Kibrik [Mon, 24 Mar 2025 11:55:39 +0000 (12:55 +0100)]
x86: make Viridian support optional

Add config option HVM_VIRIDIAN that covers viridian code within HVM.
Calls to viridian functions are guarded by is_viridian_domain() and
related macros. Having this option may be beneficial, as it reduces the
code footprint for systems that are not using Hyper-V.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 24 Mar 2025 11:55:24 +0000 (12:55 +0100)]
process/release: mention MAINTAINERS adjustments

For many major releases I've been updating ./MAINTAINERS _after_ the
respective branch was handed over to me. That update, however, is
relevant not only from the .1 minor release onwards, but right from the
.0 release. Hence it ought to be done as one of the last things before
tagging the tree for the new major release.

See the seemingly unrelated parts (as far as the commit subject goes) of
e.g. 9d465658b405 ("update Xen version to 4.20.1-pre") for an example.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Sergiy Kibrik [Mon, 24 Mar 2025 11:55:00 +0000 (12:55 +0100)]
x86/svm: use nsvm_efer_svm_enabled() to check guest's EFER.SVME

There's a macro for this; using it may improve readability a bit and save a bit of space.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 24 Mar 2025 11:54:27 +0000 (12:54 +0100)]
x86/PVH: don't open-code elf_round_up()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Sat, 28 Dec 2024 14:56:40 +0000 (14:56 +0000)]
x86/traps: Introduce early_traps_init() and simplify setup

Something I overlooked when last cleaning up exception handling is that a TSS
is not necessary if IST isn't configured, and IST isn't necessary until we're
running guest code.

Introduce early_traps_init(), and rearrange the existing logic between this
and traps_init() later on boot, to allow deferring TSS and IST setup.

In early_traps_init(), load the IDT and invalidate TR/LDTR; this is sufficient
system-table setup to make exception handling work.  The setup of the BSP's
per-cpu variables stays early too; they're used on certain error paths.

Move load_system_tables() later into traps_init().  Note that it already
contains enable_each_ist(), so this call is simply dropped.

This removes some complexity prior to having exception support, and lays the
groundwork to not even allocate a TSS when using FRED.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 17 Mar 2025 18:48:18 +0000 (18:48 +0000)]
x86/traps: Move trap_init() into traps-setup.c

... and rename to traps_init() for consistency.  Move the declaration from
asm/system.h into asm/traps.h.

This also involves moving init_ler() and variables.  Move the declaration of
ler_msr from asm/msr.h to asm/traps.h.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 31 Dec 2024 15:56:34 +0000 (15:56 +0000)]
x86/traps: Move percpu_traps_init() into traps-setup.c

Move the declaration from asm/system.h into asm/traps.h.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Jan 2025 06:36:34 +0000 (06:36 +0000)]
x86/traps: Move cpu_init() out of trap_init()

cpu_init() doesn't particularly belong in trap_init().  This brings the BSP
more in line with the APs.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 19 Mar 2025 12:12:37 +0000 (12:12 +0000)]
x86/boot: Simplify the expression for extra allocation space

The expression for one parameter of find_memory() is already complicated and
about to become more so.  Break it out into a new variable, and express it in
an easier-to-follow way.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Andrew Cooper [Tue, 18 Mar 2025 13:32:50 +0000 (13:32 +0000)]
xen/compiler: Fix the position of the visibility pragma

This needs to be ahead of everything.  Right now, it comes after xen/init.h,
which is included for -DINIT_SECTIONS_ONLY:

  # 1 "./include/xen/compiler.h" 1
  # 83 "./include/xen/compiler.h"
  # 1 "./include/xen/init.h" 1
  # 62 "./include/xen/init.h"
  typedef int (*initcall_t)(void);
  typedef void (*exitcall_t)(void);
  # 72 "./include/xen/init.h"
  void do_presmp_initcalls(void);
  void do_initcalls(void);
  # 84 "./include/xen/compiler.h" 2
  # 122 "./include/xen/compiler.h"
  #pragma GCC visibility push(hidden)

Fixes: 84c4461b7d3a ("Force out-of-line instances of inline functions into .init.text in init-only code")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Wed, 12 Mar 2025 17:51:43 +0000 (18:51 +0100)]
kconfig/randconfig: enable UBSAN for randconfig

Introduce an additional Kconfig check to only offer the option if the
compiler supports -fsanitize=undefined.

We no longer use Travis CI, so the original motivation for not enabling
UBSAN may no longer apply.  Regardless, the option won't be present in
the first place if the compiler doesn't support -fsanitize=undefined.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Mon, 17 Mar 2025 17:51:21 +0000 (18:51 +0100)]
x86/vga: fix mapping of the VGA text buffer

The call to ioremap_wc() in video_init() will always fail, because
video_init() is called ahead of vm_init_type(), and so the underlying
__vmap() call will fail to allocate the linear address space.

Fix by reverting to the previous behavior and using __va() for the VGA text
buffer, as it's below the 1MB boundary, and thus always mapped in the
directmap.

Fixes: 81d195c6c0e2 ('x86: introduce ioremap_wc()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
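A sketch of why __va() works here, assuming a directmap base of 0xffff830000000000 (consistent with the "Probing paddr 394001000, va ffff830394001000" boot log in the x86/mm entry of this log) and the conventional 0xB8000 colour text buffer address. sketch_va() is hypothetical and, for the purpose of the example, only accepts addresses below 1MB.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECTMAP_BASE  0xffff830000000000ULL /* assumed directmap start */
#define VGA_TEXT_PADDR  0xb8000ULL            /* legacy colour text buffer */
#define ONE_MB          0x100000ULL

/*
 * __va()-style translation: memory below 1MB is always covered by the
 * directmap, so no vmap()/ioremap() allocation is needed this early.
 */
static uint64_t sketch_va(uint64_t paddr)
{
    assert(paddr < ONE_MB);
    return DIRECTMAP_BASE + paddr;
}
```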
Roger Pau Monne [Wed, 5 Mar 2025 17:08:13 +0000 (18:08 +0100)]
x86/mkelf32: account for offset when detecting note segment placement

mkelf32 attempts to check that the NOTE segment defined by the program
headers falls inside of the LOAD segment, as the build-id must be loaded
at runtime for Xen to check it.

However the current code doesn't take into account the LOAD program header
segment offset when calculating overlap with the NOTE segment.  This
results in incorrect detection, and the following build error:

arch/x86/boot/mkelf32 --notes xen-syms ./.xen.elf32 0x200000 \
               `nm xen-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$/0x\1/p'`
Expected .note section within .text section!
Offset 4244776 not within 2910364!

When xen-syms has the following program headers:

Program Header:
    LOAD off    0x0000000000200000 vaddr 0xffff82d040200000 paddr 0x0000000000200000 align 2**21
         filesz 0x00000000002c689c memsz 0x00000000003f7e20 flags rwx
    NOTE off    0x000000000040c528 vaddr 0xffff82d04040c528 paddr 0x000000000040c528 align 2**2
         filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--

Account for the program header offset of the LOAD segment when checking
whether the NOTE segment is contained within it.  Also fix the logic to
ensure the NOTE segment is fully contained within the LOAD segment.

Fixes: a353cab905af ('build_id: Provide ld-embedded build-ids')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
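A sketch of the corrected containment test, with a simplified model of the old check for comparison. The function names are hypothetical; the test values are the LOAD and NOTE offsets and sizes from the program headers above (0x40c528 is 4244776 and 0x2c689c is 2910364, the two numbers in the error message).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Is the NOTE segment [note_off, note_off + note_sz) fully inside the file
 * image of the LOAD segment [load_off, load_off + load_filesz)?  Both are
 * file offsets, so the LOAD segment's own offset must be included.
 */
static bool note_within_load(uint64_t load_off, uint64_t load_filesz,
                             uint64_t note_off, uint64_t note_sz)
{
    return note_off >= load_off &&
           note_off + note_sz <= load_off + load_filesz;
}

/* Simplified flawed check: NOTE offset compared against the LOAD size only. */
static bool note_within_load_buggy(uint64_t load_filesz, uint64_t note_off)
{
    return note_off < load_filesz;
}
```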
Roger Pau Monne [Tue, 4 Mar 2025 14:28:11 +0000 (15:28 +0100)]
x86/boot: clarify comment about trampoline_setup usage

Clarify that trampoline_setup is only used for EFI when booted using the
multiboot2 entry point.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Mon, 17 Mar 2025 09:31:07 +0000 (10:31 +0100)]
automation/console.exp: do not assume expect is always at /usr/bin/

Instead use env to find the location of expect.

Additionally do not use the -f flag, as it's only meaningful when passing
arguments on the command line, which we never do for console.exp.  From the
expect 5.45.4 man page:

> The -f flag prefaces a file from which to read commands from.  The flag
> itself is optional as it is only useful when using the #! notation (see
> above), so  that other arguments may be supplied on the command line.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Roger Pau Monne [Fri, 14 Mar 2025 10:55:48 +0000 (11:55 +0100)]
automation/cirrus-ci: store Xen Kconfig before doing a build

In case the build fails or gets stuck, store the Kconfig file ahead of
starting the build.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Roger Pau Monne [Fri, 14 Mar 2025 10:49:28 +0000 (11:49 +0100)]
automation/cirrus-ci: update FreeBSD to 13.5

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Roger Pau Monne [Fri, 14 Mar 2025 10:44:45 +0000 (11:44 +0100)]
automation/cirrus-ci: add timestamps

Such timestamps can still be disabled from the Web UI using a tick box.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Roger Pau Monne [Tue, 18 Mar 2025 08:20:59 +0000 (09:20 +0100)]
x86/shadow: fix UB pointer arithmetic in sh_mfn_is_a_page_table()

UBSAN complains with:

UBSAN: Undefined behaviour in arch/x86/mm/shadow/private.h:515:30
pointer operation overflowed ffff82e000000000 to ffff82dfffffffe0
[...]
Xen call trace:
    [<ffff82d040303782>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xc0
    [<ffff82d040304bc3>] F __ubsan_handle_pointer_overflow+0xcb/0x100
    [<ffff82d040471b2d>] F arch/x86/mm/shadow/guest_2.c#sh_page_fault__guest_2+0x1e350
    [<ffff82d0403b206b>] F svm_vmexit_handler+0xdf3/0x2450
    [<ffff82d0402049c0>] F svm_stgi_label+0x5/0x15

Fix by moving the call to mfn_to_page() after the check of whether the
passed gmfn is valid.  This avoids calling mfn_to_page() with an
INVALID_MFN parameter.

While there, make the page local variable const, as it's not modified by
the function.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
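A sketch of the reordering, with a hypothetical miniature frame table and type flag. Forming &frame_table[mfn] for an out-of-range MFN is undefined behaviour on its own, so the validity check has to come before the pointer is computed, not just before it is dereferenced.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_MFN_SKETCH    (~0UL)  /* hypothetical sentinel */
#define PGT_PAGE_TABLE_SKETCH 1UL     /* hypothetical type flag */

struct page_info_sketch { unsigned long type; };

/* Hypothetical frame table covering a tiny "machine". */
static struct page_info_sketch frame_table[16];

static bool mfn_valid_sketch(unsigned long mfn)
{
    return mfn < sizeof(frame_table) / sizeof(frame_table[0]);
}

/* Validate the MFN before forming &frame_table[mfn]. */
static bool mfn_is_a_page_table(unsigned long mfn)
{
    const struct page_info_sketch *pg;

    if ( !mfn_valid_sketch(mfn) )
        return false;

    pg = &frame_table[mfn];
    return pg->type & PGT_PAGE_TABLE_SKETCH;
}
```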
Roger Pau Monne [Tue, 18 Mar 2025 08:20:41 +0000 (09:20 +0100)]
x86/xlat: fix UB pointer arithmetic in COMPAT_ARG_XLAT_VIRT_BASE

UBSAN complains with:

UBSAN: Undefined behaviour in common/compat/memory.c:90:9
pointer operation overflowed ffff820080000000 to 0000020080000000
[...]
Xen call trace:
    [<ffff82d040303782>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xc0
    [<ffff82d040304bc3>] F __ubsan_handle_pointer_overflow+0xcb/0x100
    [<ffff82d0402a6259>] F compat_memory_op+0xf1/0x4d20
    [<ffff82d04041532d>] F hvm_memory_op+0x55/0xe0
    [<ffff82d040416150>] F hvm_hypercall+0xae8/0x21b0
    [<ffff82d0403b24ca>] F svm_vmexit_handler+0x1252/0x2450
    [<ffff82d0402049c0>] F svm_stgi_label+0x5/0x15

Adjust the calculations in COMPAT_ARG_XLAT_VIRT_BASE to subtract from the
per-domain area to obtain the mirrored linear address in the 4th slot,
instead of overflowing the per-domain linear address.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
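A worked version of the arithmetic, using the two addresses from the UBSAN report and plain uint64_t values, on which wrap-around is well-defined (unlike pointers). The constant names are hypothetical and the commit does not spell out the macro's exact terms: adding 256 PML4 slots (2^47 bytes) to the per-domain address only reaches the mirror in slot 4 by wrapping past 2^64, whereas subtracting the base of the upper canonical half gets there directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERDOMAIN_ADDR   0xffff820080000000ULL /* from the UBSAN report */
#define MIRROR_ADDR      0x0000020080000000ULL /* from the UBSAN report */
#define UPPER_HALF_BASE  0xffff800000000000ULL /* first upper canonical addr */
#define SLOTS_256        (1ULL << 47)           /* 256 PML4 slots of 512GiB */

/* Overflowing form: returns true if base + delta wrapped past 2^64. */
static bool add_wraps(uint64_t base, uint64_t delta, uint64_t *res)
{
    *res = base + delta;       /* well-defined on uint64_t, unlike pointers */
    return *res < base;
}

/* Non-overflowing form: returns true if base - delta borrowed below 0. */
static bool sub_borrows(uint64_t base, uint64_t delta, uint64_t *res)
{
    *res = base - delta;
    return *res > base;
}
```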
Roger Pau Monne [Fri, 14 Mar 2025 09:40:49 +0000 (10:40 +0100)]
x86/wait: prevent duplicated assembly labels

When enabling UBSAN with clang, the following error is triggered during the
build:

common/wait.c:154:9: error: symbol '.L_wq_resume' is already defined
  154 |         "push %%rbx; push %%rbp; push %%r12;"
      |         ^
<inline asm>:1:121: note: instantiated into assembly here
    1 |         push %rbx; push %rbp; push %r12;push %r13; push %r14; push %r15;sub %esp,%ecx;cmp $4096, %ecx;ja .L_skip;mov %rsp,%rsi;.L_wq_resume: rep movsb;mov %rsp,%rsi;.L_skip:pop %r15; pop %r14; pop %r13;pop %r12; pop %rbp; pop %rbx
      |                                                                                                                                ^
common/wait.c:154:9: error: symbol '.L_skip' is already defined
  154 |         "push %%rbx; push %%rbp; push %%r12;"
      |         ^
<inline asm>:1:159: note: instantiated into assembly here
    1 |         push %rbx; push %rbp; push %r12;push %r13; push %r14; push %r15;sub %esp,%ecx;cmp $4096, %ecx;ja .L_skip;mov %rsp,%rsi;.L_wq_resume: rep movsb;mov %rsp,%rsi;.L_skip:pop %r15; pop %r14; pop %r13;pop %r12; pop %rbp; pop %rbx
      |                                                                                                                                                                      ^
2 errors generated.

The inline assembly block in __prepare_to_wait() is duplicated, thus
leading to multiple definitions of the otherwise unique labels inside the
assembly block.  GCC extended-asm documentation notes the possibility of
duplicating asm blocks:

> Under certain circumstances, GCC may duplicate (or remove duplicates of)
> your assembly code when optimizing. This can lead to unexpected duplicate
> symbol errors during compilation if your asm code defines symbols or
> labels. Using ‘%=’ (see AssemblerTemplate) may help resolve this problem.

Work around the issue by latching esp into a local variable; this prevents
clang from duplicating the inline asm block.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Tue, 18 Mar 2025 08:31:35 +0000 (09:31 +0100)]
x86/msi: always propagate MSI register writes from __setup_msi_irq()

After 8e60d47cf011, writes from __setup_msi_irq() are no longer
propagated to the MSI registers if the IOMMU IRTE was already allocated.
Given the purpose of __setup_msi_irq() is MSI initialization, always
propagate the write to the hardware, regardless of whether the IRTE was
already allocated.

No functional change expected, as the write should always be propagated in
__setup_msi_irq(), but make it explicit on the write_msi_msg() call.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Mon, 17 Mar 2025 14:40:11 +0000 (15:40 +0100)]
x86/msi: always propagate MSI writes when not in active system mode

Relax the limitation on MSI register writes, and only apply it when the
system is in active state.  For example, the AMD IOMMU driver relies on
set_msi_affinity() to force an MSI register write on resume from
suspension.

The original patch's intention was to reduce the number of MSI register
writes when the system is in active state.  Leave the other states to
always perform the writes, as it's safer given the existing code, and it's
not expected to make a performance difference.

For such propagation to work even when the IRT index is not updated, the
MSI message must be adjusted in all success cases for AMD IOMMU, not just
when the index has been newly allocated.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Fixes: 8e60d47cf011 ('x86/iommu: avoid MSI address and data writes if IRT index hasn't changed')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
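A sketch of the resulting write policy across the two MSI commits, with hypothetical state names and parameters: writes are always propagated outside the active state or when the caller forces them (as __setup_msi_irq() now does), and are skipped in the active state only when the IRT index did not change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of system states. */
enum sys_state_sketch { STATE_BOOT, STATE_ACTIVE, STATE_SUSPEND, STATE_RESUME };

/*
 * force:        caller wants the write regardless (e.g. initial setup).
 * irte_changed: the IOMMU interrupt remapping index was (re)allocated.
 */
static bool msi_write_needed(enum sys_state_sketch s, bool force,
                             bool irte_changed)
{
    if ( force || s != STATE_ACTIVE )
        return true;          /* boot, suspend/resume paths: always write */

    return irte_changed;      /* active: skip redundant writes */
}
```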
Jan Beulich [Thu, 20 Mar 2025 07:51:55 +0000 (08:51 +0100)]
x86/setup: correct off-by-1 in module mapping

If a module's length is an exact multiple of PAGE_SIZE, the 2nd argument
passed to set_pdx_range() would be one larger than intended. Use
PFN_{UP,DOWN}() there instead.

Fixes: cd7cc5320bb2 ("x86/boot: add start and size fields to struct boot_module")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
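A sketch of the off-by-one, assuming 4k pages and PFN_DOWN()/PFN_UP() defined in the usual way. end_pfn_buggy() is a hypothetical reconstruction of a "last PFN plus one" expression, not the removed code; it only disagrees with PFN_UP() when the end address is page-aligned.

```c
#include <assert.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical buggy form: "last PFN, plus one" as an exclusive end. */
static unsigned long end_pfn_buggy(unsigned long start, unsigned long size)
{
    return PFN_DOWN(start + size) + 1;
}

/* Exclusive end PFN: round the end address up to a page boundary. */
static unsigned long end_pfn(unsigned long start, unsigned long size)
{
    return PFN_UP(start + size);
}
```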
Denis Mukhin [Thu, 20 Mar 2025 07:51:14 +0000 (08:51 +0100)]
xen/console: fix trailing whitespaces

Remove trailing whitespaces in the console driver.

No functional change.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 7 Mar 2025 17:29:10 +0000 (17:29 +0000)]
xen: Update toolchain requirements to GCC 5.1/Binutils 2.25 or Clang/LLVM 11

GCC 4.1.2 is from 2007, and Binutils 2.16 is a similar vintage.  Clang 3.5 is
from 2014.  Supporting toolchains this old is a massive development and
testing burden.

Set a minimum baseline of GCC 5.1 across the board, along with Binutils 2.25
which is the same age.  These were chosen *3 years ago* as Linux's minimum
requirements because even back then, they were ubiquitous in distros.  Choose
Clang/LLVM 11 as a baseline for similar reasons; the Linux commit making this
change two years ago cites a laundry list of code generation bugs.

This will allow us to retire a lot of compatibility logic, and start using new
features previously unavailable because of no viable compatibility option.

Merge the ARM 32bit and 64bit sections now that they're the same.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Anderson Choi [Tue, 18 Mar 2025 07:34:15 +0000 (16:34 +0900)]
xen/arinc653: call xfree() with local IRQ enabled

A Xen panic is observed with the following configuration:

1. Debug xen build (CONFIG_DEBUG=y)
2. dom1 of an ARINC653 domain
3. shutdown dom1 with xl command

$ xl shutdown <domain_name>

(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714
(XEN) ****************************************

The panic was triggered because xfree() was called with local IRQs
disabled, so the assertion failed.

Fix this by calling xfree() after local IRQs are re-enabled.

Fixes: 19049f8d796a ("sched: fix locking in a653sched_free_vdata()")
Signed-off-by: Anderson Choi <anderson.choi@boeing.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Nathan Studer <nathan.studer@dornerworks.com>
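A sketch of the fix, modelling the IRQ state with a flag and xfree() with a hypothetical xfree_sketch() that asserts, in the spirit of the failing check, that IRQs are enabled. The vdata is unlinked while "IRQs are off" and only freed once they are back on; this assumes the real lock is taken with an IRQ-disabling variant, as the assertion failure suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Models the local-IRQ state; real code would query the CPU flags. */
static bool irqs_enabled = true;

static void local_irq_disable_sketch(void) { irqs_enabled = false; }
static void local_irq_enable_sketch(void)  { irqs_enabled = true; }

/* Freeing requires IRQs to be enabled, like the failing ASSERT(). */
static void xfree_sketch(void *p)
{
    assert(irqs_enabled);
    free(p);
}

struct vdata_sketch { int id; };
struct sched_sketch { struct vdata_sketch *entry; };

/* Unlink with "IRQs off" (lock held), free only after re-enabling them. */
static void free_vdata(struct sched_sketch *s)
{
    struct vdata_sketch *victim;

    local_irq_disable_sketch();       /* lock taken, IRQs off */
    victim = s->entry;
    s->entry = NULL;
    local_irq_enable_sketch();        /* lock dropped, IRQs restored */

    xfree_sketch(victim);             /* would trip the assert if moved up */
}
```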
Andrew Cooper [Wed, 19 Mar 2025 02:58:18 +0000 (02:58 +0000)]
x86/mm: Fix IS_ALIGNED() check in IS_LnE_ALIGNED()

The current CI failures turn out to be a latent bug triggered by a narrow set
of properties of the initrd and the host memory map, which CI encountered by
chance.

One step during boot involves constructing directmap mappings for modules.
With some probing at the point of creation, it is observed that there's a 4k
mapping missing towards the end of the initrd.

  (XEN) === Mapped Mod1 [000000039400100000000003be1ff6dc] to Directmap
  (XEN) Probing paddr 394001000, va ffff830394001000
  (XEN) Probing paddr 3be1ff6db, va ffff8303be1ff6db
  (XEN) Probing paddr 3bdffffff, va ffff8303bdffffff
  (XEN) Probing paddr 3be001000, va ffff8303be001000
  (XEN) Probing paddr 3be000000, va ffff8303be000000
  (XEN) Early fatal page fault at e008:ffff82d04032014c (cr2=ffff8303be000000, ec=0000)

The conditions for this bug appear to be a map_pages_to_xen() call with a
start address exactly 4k beyond a 2M boundary, some number of full 2M pages,
then a tail needing 4k pages.

Anyway, the condition for spotting superpage boundaries in map_pages_to_xen()
is wrong.  The IS_ALIGNED() macro expects a power of two for the alignment
argument, and subtracts 1 itself.

Fixing this causes the failing case to now boot.

Fixes: 97fb6fcf26e8 ("x86/mm: introduce helpers to detect super page alignment")
Debugged-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
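A sketch of the bug, assuming an IS_ALIGNED() with the semantics described above and PAGETABLE_ORDER of 9 (512 4k pages per 2M superpage). The *_L2E_* macro names are hypothetical. With 511 passed as the "alignment", the mask becomes 510, so PFN 0x394001 (the first page of Mod1 in the log, 4k past a 2M boundary) wrongly looks superpage-aligned.

```c
#include <assert.h>

/* 'a' must be a power of two; the macro subtracts 1 itself. */
#define IS_ALIGNED(val, a) (!((val) & ((a) - 1)))

#define PAGETABLE_ORDER 9   /* 512 4k pages per 2M superpage */

/* Buggy: passes a mask (511) where a power of two (512) is expected. */
#define IS_L2E_ALIGNED_BUGGY(pfn) IS_ALIGNED(pfn, (1UL << PAGETABLE_ORDER) - 1)

/* Fixed: pass the alignment itself. */
#define IS_L2E_ALIGNED(pfn)       IS_ALIGNED(pfn, 1UL << PAGETABLE_ORDER)
```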
Jiqian Chen [Tue, 18 Mar 2025 08:48:00 +0000 (09:48 +0100)]
CHANGELOG.md: Mention PCI passthrough for HVM domUs

PCI passthrough is already supported for HVM domUs when dom0 is PVH
on x86. The last related patch on the QEMU side was merged after the
Xen 4.20 release, so mention this feature in the Xen 4.21 entry.

SR-IOV is not yet supported on PVH dom0, though, so add a note about it.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Juergen Gross [Tue, 18 Mar 2025 08:47:45 +0000 (09:47 +0100)]
tools/xenstored: use xenmanage_poll_changed_domain()

Instead of checking each known domain after having received a
VIRQ_DOM_EXC event, use the new xenmanage_poll_changed_domain()
function for directly getting the domid of a domain having changed
its state.

In a test doing "xl shutdown" of 1000 guests, this change reduced the
CPU time consumed by xenstored by 6%.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Juergen Gross [Tue, 18 Mar 2025 08:47:15 +0000 (09:47 +0100)]
tools/xenstored: use unique_id to identify new domain with same domid

Use the new unique_id of a domain in order to detect that a domain
has been replaced with another one reusing the domain-id of the old
domain.

While changing the related code, switch from "dom_invalid" to
"dom_valid" in order to avoid double negation and use "bool" as type
for it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
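A sketch of the idea, with a hypothetical per-domain record: the unique_id field's type and the same_domain() helper are assumptions, not xenstored's actual code. A domid alone can be reused once a domain is destroyed; pairing it with a unique ID tells the old domain apart from its successor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record kept per known domain. */
struct known_domain {
    uint16_t domid;
    uint64_t unique_id;
};

/* Same domain only if both the (reusable) domid and the unique ID match. */
static bool same_domain(const struct known_domain *known,
                        uint16_t domid, uint64_t unique_id)
{
    return known->domid == domid && known->unique_id == unique_id;
}
```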
Jan Beulich [Tue, 18 Mar 2025 08:46:38 +0000 (09:46 +0100)]
symbols: sanitize a few variable's types

Parameter and return types of symbols_expand_symbol() make clear that
xensyms_read()'s next_offset doesn't need to be 64-bit.

xensyms_read()'s first parameter type makes clear that the function's
next_symbols doesn't need to be 64-bit.

symbols_num_syms'es type makes clear that iteration locals in
symbols_lookup() don't need to be unsigned long (i.e. 64-bit on 64-bit
architectures).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 18 Mar 2025 08:44:57 +0000 (09:44 +0100)]
symbols: don't over-align generated data

x86 is one of the few architectures where .align has the same meaning as
.balign; most other architectures (Arm, PPC, and RISC-V in particular)
give it the same meaning as .p2align. Aligning every one of these items
to 256 bytes (on all 64-bit architectures except x86-64) is clearly too
much.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
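A small worked illustration, assuming the generated data used `.align 8`, which is what the 256-byte figure implies (2^8 = 256). align_bytes() is hypothetical and only models the two assembler interpretations of the directive.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Byte alignment requested by ".align n": a byte count where it behaves
 * like ".balign" (e.g. x86), or a power of two where it behaves like
 * ".p2align" (e.g. Arm, PPC, RISC-V).
 */
static unsigned long align_bytes(unsigned int n, bool p2align_semantics)
{
    return p2align_semantics ? 1UL << n : n;
}
```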
Alejandro Vallejo [Tue, 18 Mar 2025 08:44:18 +0000 (09:44 +0100)]
tools: Mark ACPI SDTs as NVS in the PVH build path

Commit cefeffc7e583 marked ACPI tables as NVS in the hvmloader path
because SeaBIOS may otherwise just mark them as RAM. There is, however,
yet another reason to do it even in the PVH path: Xen's incarnation of
AML relies on having access to some ACPI tables (e.g. _STA of Processor
objects relies on reading the processor online bit in its MADT entry).

This is problematic if the OS tries to reclaim ACPI memory for page
tables, as that memory is needed at runtime and can't be reclaimed after
the OSPM is up and running.

Fixes: de6d188a519f ("hvmloader: flip "ACPI data" to "ACPI NVS" type for ACPI table region")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
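Background for the E820 side of this, as a sketch: types 1-4 are the standard E820/ACPI address range types, while reusable_after_acpi_init() is a hypothetical helper. ACPI reclaim ranges may be reused once the OSPM has read the tables; NVS ranges must be preserved, which is what AML reading the MADT at runtime needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard E820 / ACPI address range types. */
enum e820_type {
    E820_RAM      = 1,
    E820_RESERVED = 2,
    E820_ACPI     = 3,   /* ACPI reclaim: reusable once tables are parsed */
    E820_NVS      = 4,   /* ACPI NVS: must be preserved */
};

/* Could an OS reuse this range (e.g. for page tables) after ACPI init? */
static bool reusable_after_acpi_init(enum e820_type t)
{
    return t == E820_RAM || t == E820_ACPI;
}
```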
Andrew Cooper [Tue, 11 Jun 2024 19:03:32 +0000 (20:03 +0100)]
x86/hvm: Use for_each_set_bit() in hvm_emulate_writeback()

... which is more concise than the open-coded form, and more efficient when
compiled.

Furthermore, now that find_{first,next}_bit() are no longer in use, the
seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although
they do need to remain unsigned int because of __set_bit() elsewhere.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
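A sketch of what a for_each_set_bit() loop does, assuming an unsigned int mask and the GCC/Clang builtin __builtin_ctz(). The macro and dirty_segs() are hypothetical, not Xen's implementation: each iteration yields the lowest set bit and then clears it with m &= m - 1, so only set bits are visited.

```c
#include <assert.h>

/* Visit set bits from least to most significant. */
#define for_each_set_bit_sketch(bit, val)                  \
    for ( unsigned int m_ = (val);                         \
          m_ && (((bit) = __builtin_ctz(m_)), 1);          \
          m_ &= m_ - 1 )

/* Collect the indices of the "dirty" segment registers in a mask. */
static unsigned int dirty_segs(unsigned int mask, unsigned int out[32])
{
    unsigned int n = 0, seg;

    for_each_set_bit_sketch ( seg, mask )
        out[n++] = seg;

    return n;
}
```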
Andrew Cooper [Tue, 31 Dec 2024 16:52:39 +0000 (16:52 +0000)]
x86/boot: Fix zap_low_mappings() to map less of the trampoline

Regular data access into the trampoline is via the directmap.

As now discussed quite extensively in asm/trampoline.h, the trampoline is
arranged so that only the AP and S3 paths need an identity mapping, and that
they fit within a single page.

Right now, PFN_UP(trampoline_end - trampoline_start) is 2, causing more of
the trampoline than expected to be mapped.  Cut it down to just the single
page it ought to be.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
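A worked sketch of the arithmetic, assuming 4 KiB pages and a `PFN_UP()` that rounds a byte count up to whole pages (the usual definition; `whole_trampoline_pages()` is a hypothetical helper): a trampoline even one byte larger than a page yields 2, even though only its first page needs the identity mapping.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Number of pages needed to cover 'bytes' bytes, rounding up. */
#define PFN_UP(bytes) (((bytes) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Pages that would be mapped if the whole trampoline were identity-mapped. */
static inline unsigned long whole_trampoline_pages(unsigned long start,
                                                   unsigned long end)
{
    assert(end >= start);
    return PFN_UP(end - start);
}
```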
7 weeks ago x86/debug: Move activate_debugregs() into debug.c
Andrew Cooper [Fri, 3 Jan 2025 15:19:49 +0000 (15:19 +0000)]
x86/debug: Move activate_debugregs() into debug.c

We have since gained a better location for it to live.

Fix up the includes while doing so.  I don't recall why we had kernel.h but
it's definitely stale now.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago x86/irq: use NR_ISA_IRQS instead of open-coded value
Denis Mukhin [Sat, 15 Mar 2025 01:00:51 +0000 (01:00 +0000)]
x86/irq: use NR_ISA_IRQS instead of open-coded value

Replace the open-coded value 16 with the NR_ISA_IRQS symbol to enhance
readability.

No functional changes.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago x86/irq: rename NR_ISAIRQS to NR_ISA_IRQS
Denis Mukhin [Sat, 15 Mar 2025 01:00:47 +0000 (01:00 +0000)]
x86/irq: rename NR_ISAIRQS to NR_ISA_IRQS

Rename NR_ISAIRQS to NR_ISA_IRQS to enhance readability.

No functional changes.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago x86/hvm: add HVM-specific Kconfig
Denis Mukhin [Sat, 15 Mar 2025 01:19:49 +0000 (01:19 +0000)]
x86/hvm: add HVM-specific Kconfig

Add a separate menu for configuring HVM build-time settings to better
organize HVM-specific options.

HVM options will now appear in a dedicated sub-menu in the menuconfig
tool.

Also, make the AMD_SVM option depend on AMD, and INTEL_VMX on INTEL.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks ago x86/ioremap: prevent additions against the NULL pointer
Roger Pau Monne [Thu, 13 Mar 2025 11:19:48 +0000 (12:19 +0100)]
x86/ioremap: prevent additions against the NULL pointer

This was reported by clang UBSAN as:

UBSAN: Undefined behaviour in arch/x86/mm.c:6297:40
applying zero offset to null pointer
[...]
Xen call trace:
    [<ffff82d040303662>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xc0
    [<ffff82d040304aa3>] F __ubsan_handle_pointer_overflow+0xcb/0x100
    [<ffff82d0406ebbc0>] F ioremap_wc+0xc8/0xe0
    [<ffff82d0406c3728>] F video_init+0xd0/0x180
    [<ffff82d0406ab6f5>] F console_init_preirq+0x3d/0x220
    [<ffff82d0406f1876>] F __start_xen+0x68e/0x5530
    [<ffff82d04020482e>] F __high_start+0x8e/0x90

Fix bt_ioremap() and ioremap{,_wc}() to not add the offset if the pointer
returned by __vmap() is NULL.

Fixes: d0d4635d034f ('implement vmap()')
Fixes: f390941a92f1 ('x86/DMI: fix table mapping when one lives above 1Mb')
Fixes: 81d195c6c0e2 ('x86: introduce ioremap_wc()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
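A minimal sketch of the fixed pattern. The mapper callbacks, `ioremap_sketch()` and its 4 KiB page assumption are hypothetical stand-ins for `__vmap()` and the real `ioremap{,_wc}()`; the point is only that the sub-page offset is added after checking for NULL, since in C even adding 0 to a null pointer is undefined behaviour, which is what clang's UBSAN flagged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __vmap(): may fail and return NULL. */
typedef void *(*map_fn_t)(uintptr_t pa_page, size_t nr_pages);

/* Only add the sub-page offset when the mapping actually succeeded. */
void *ioremap_sketch(map_fn_t map, uintptr_t pa, size_t len)
{
    uintptr_t offs = pa & 0xfffUL;   /* offset within a 4 KiB page */
    char *va = map(pa & ~(uintptr_t)0xfff, (offs + len + 0xfff) >> 12);

    return va ? va + offs : NULL;    /* never compute NULL + offs */
}

/* Example mappers for illustration: one always fails, one succeeds. */
static void *map_fail(uintptr_t pa_page, size_t nr_pages)
{
    (void)pa_page;
    (void)nr_pages;
    return NULL;
}

static char fake_window[2 * 4096];

static void *map_fake(uintptr_t pa_page, size_t nr_pages)
{
    (void)pa_page;
    return nr_pages <= 2 ? fake_window : NULL;
}
```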
7 weeks ago xen/ubsan: expand pointer overflow message printing
Roger Pau Monne [Thu, 13 Mar 2025 11:02:50 +0000 (12:02 +0100)]
xen/ubsan: expand pointer overflow message printing

Add messages about operations against the NULL pointer, or that result in
a NULL pointer.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks ago x86/dom0: placate GCC 12 compile-time errors with UBSAN and PVH_GUEST
Roger Pau Monne [Thu, 13 Mar 2025 10:08:05 +0000 (11:08 +0100)]
x86/dom0: placate GCC 12 compile-time errors with UBSAN and PVH_GUEST

When building Xen with GCC 12 with UBSAN and PVH_GUEST both enabled the
compiler emits the following errors:

arch/x86/setup.c: In function '__start_xen':
arch/x86/setup.c:1504:19: error: 'consider_modules' reading 40 bytes from a region of size 4 [-Werror=stringop-overread]
 1504 |             end = consider_modules(s, e, reloc_size + mask,
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1505 |                                    bi->mods, bi->nr_modules, -1);
      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/setup.c:1504:19: note: referencing argument 4 of type 'const struct boot_module[0]'
arch/x86/setup.c:686:24: note: in a call to function 'consider_modules'
  686 | static uint64_t __init consider_modules(
      |                        ^~~~~~~~~~~~~~~~
arch/x86/setup.c:1535:19: error: 'consider_modules' reading 40 bytes from a region of size 4 [-Werror=stringop-overread]
 1535 |             end = consider_modules(s, e, size, bi->mods,
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1536 |                                    bi->nr_modules + relocated, j);
      |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/setup.c:1535:19: note: referencing argument 4 of type 'const struct boot_module[0]'
arch/x86/setup.c:686:24: note: in a call to function 'consider_modules'
  686 | static uint64_t __init consider_modules(
      |                        ^~~~~~~~~~~~~~~~

This seems to be the result of some function manipulation done by UBSAN
triggering GCC's stringop-related errors.  Placate the errors by declaring
the function parameter as `const struct boot_module *` instead of `const
struct boot_module[]`.

Note that GCC 13 seems to be fixed, and doesn't trigger the error when
using `[]`.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
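For reference, the two parameter spellings declare the same type in C, because an array parameter is adjusted to a pointer, so the change should not alter the generated code. A small sketch, with a hypothetical `struct boot_module` layout and `total_size()` helper (not Xen's `consider_modules()`), showing that a prototype using `[]` and a definition using `*` are compatible:

```c
#include <assert.h>
#include <stddef.h>

struct boot_module {
    unsigned long start, size;   /* illustrative fields only */
};

/* Array spelling: the parameter is adjusted to 'const struct boot_module *'. */
unsigned long total_size(const struct boot_module mods[], size_t nr);

/* Pointer spelling: a compatible redeclaration of the same function. */
unsigned long total_size(const struct boot_module *mods, size_t nr)
{
    unsigned long sum = 0;

    for ( size_t i = 0; i < nr; i++ )
        sum += mods[i].size;

    return sum;
}
```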
7 weeks ago xen/ubsan: provide helper for clang's -fsanitize=function
Roger Pau Monne [Wed, 12 Mar 2025 12:35:53 +0000 (13:35 +0100)]
xen/ubsan: provide helper for clang's -fsanitize=function

clang's -fsanitize=function relies on the presence of
__ubsan_handle_function_type_mismatch() to report indirect calls of a
function through a function pointer of the wrong type.

Implement the helper, inspired by the LLVM ubsan library implementation.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks ago xen: mem_access: conditionally compile vm_event.c & monitor.c
Stefano Stabellini [Fri, 14 Mar 2025 05:25:17 +0000 (07:25 +0200)]
xen: mem_access: conditionally compile vm_event.c & monitor.c

Extend the coverage of the CONFIG_VM_EVENT option and make the build of VM
event and monitoring support optional. Also make the MEM_PAGING option depend
on VM_EVENT, to document that mem_paging relies on vm_event.
This reduces code size on Arm when the option isn't enabled.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago x86:monitor: control monitor.c build with CONFIG_VM_EVENT option
Sergiy Kibrik [Fri, 14 Mar 2025 05:23:14 +0000 (07:23 +0200)]
x86:monitor: control monitor.c build with CONFIG_VM_EVENT option

Replace the more general CONFIG_HVM option with CONFIG_VM_EVENT, which is
more relevant and specific to monitoring. This only clarifies, at the build
level, which subsystem this file belongs to.

No functional change here, as VM_EVENT depends on HVM.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago xen: kconfig: rename MEM_ACCESS -> VM_EVENT
Sergiy Kibrik [Fri, 14 Mar 2025 05:21:09 +0000 (07:21 +0200)]
xen: kconfig: rename MEM_ACCESS -> VM_EVENT

Use the more generic CONFIG_VM_EVENT name instead of CONFIG_MEM_ACCESS
throughout the Xen code. This reflects the fact that vm_event is a
higher-level feature, with mem_access & monitor depending on it.

Suggested-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
7 weeks ago x86/elf: Improve code generation in elf_core_save_regs()
Andrew Cooper [Sun, 29 Dec 2024 14:06:18 +0000 (14:06 +0000)]
x86/elf: Improve code generation in elf_core_save_regs()

A CALL with 0 displacement is handled specially, which is why this logic
works even with CET Shadow Stacks active.  Nevertheless, a RIP-relative LEA
is the more usual way of doing this in 64bit code.

Retrieving the flags modifies the stack pointer, so the asm needs to state a
dependency on the stack pointer.  Despite its name, ASM_CALL_CONSTRAINT is
the way to do this.

read_sreg() forces the answer through a register, causing code generation of
the form:

    mov    %gs, %eax
    mov    %eax, %eax
    mov    %rax, 0x140(%rsi)

Encode the reads directly with a memory operand.  This results in a 16bit
store instead of a 64bit store, but the backing memory is zeroed.

While cleaning this up, drop one piece of trailing whitespace.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago VT-d: Adjust diagnostics in set_msi_source_id()
Andrew Cooper [Fri, 14 Mar 2025 09:52:02 +0000 (09:52 +0000)]
VT-d: Adjust diagnostics in set_msi_source_id()

Use %pd, and print the unknown value.  As it's an enum, it's a signed type.

Also drop one piece of trailing whitespace.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
7 weeks ago VT-d: have set_msi_source_id() return a success indicator
Jan Beulich [Fri, 14 Mar 2025 09:18:34 +0000 (10:18 +0100)]
VT-d: have set_msi_source_id() return a success indicator

Handling possible internal errors by just emitting a (debug-build-only)
log message isn't quite enough. Return error codes in those cases, and
have the caller propagate them.

Drop a pointless return path, rather than "inventing" an error code for
it.

While touching the function declarator anyway, also constify its first
parameter.

Fixes: 476bbccc811c ("VT-d: fix MSI source-id of interrupt remapping")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks ago VT-d: move obtaining of MSI/HPET source ID
Jan Beulich [Fri, 14 Mar 2025 09:18:12 +0000 (10:18 +0100)]
VT-d: move obtaining of MSI/HPET source ID

This was the original attempt to address XSA-467, until it was found
that IRQs can already be off from higher up the call stack. Nevertheless,
moving code out of locked regions is generally desirable; after all, some
of the callers don't disable interrupts or acquire other locks.

Hence, despite this not addressing the original report:

Data collection depends solely on the passed-in PCI device. Furthermore,
since the function only writes to a local variable, we can pull the
invocation of set_msi_source_id() (and also set_hpet_source_id()) ahead
of acquiring the (IRQ-safe) lock.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 weeks ago docs: specify numerical values of Xenstore commands
Juergen Gross [Fri, 14 Mar 2025 09:17:47 +0000 (10:17 +0100)]
docs: specify numerical values of Xenstore commands

In docs/misc/xenstore.txt all Xenstore commands are specified, but
the specifications lack the numerical values of the commands.

Add a table with all commands, their values, and a potential remark
(e.g. whether the command is optional).

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
7 weeks ago xen/sched: fix arinc653 to not use variables across cpupools
Juergen Gross [Fri, 14 Mar 2025 09:17:11 +0000 (10:17 +0100)]
xen/sched: fix arinc653 to not use variables across cpupools

a653sched_do_schedule() uses two function-local static variables, which
results in incorrect behavior when more than one cpupool uses the arinc653
scheduler.

Fix that by moving those variables to the scheduler private data.

Fixes: 22787f2e107c ("ARINC 653 scheduler")
Reported-by: Choi Anderson <Anderson.Choi@boeing.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Nathan Studer <nathan.studer@dornerworks.com>
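The general fix pattern, as a hypothetical sketch (the field and function names are invented and do not match the arinc653 scheduler): state held in a function-local `static` is shared by every scheduler instance, whereas state in per-instance private data is independent per cpupool.

```c
#include <assert.h>

/* Before: one copy of 'slot' shared by every caller, i.e. every cpupool. */
unsigned int next_slot_shared(unsigned int nr_slots)
{
    static unsigned int slot;

    slot = (slot + 1) % nr_slots;
    return slot;
}

/* After: the state lives in per-instance private data. */
struct sched_priv_sketch {
    unsigned int slot;               /* hypothetical field */
};

unsigned int next_slot(struct sched_priv_sketch *prv, unsigned int nr_slots)
{
    prv->slot = (prv->slot + 1) % nr_slots;
    return prv->slot;
}
```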
7 weeks ago xen/page: fix return type of online_page()
Penny Zheng [Fri, 14 Mar 2025 09:16:28 +0000 (10:16 +0100)]
xen/page: fix return type of online_page()

Fix the return type of online_page(): it should be int, so that the
correct error value can be returned.

Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 weeks ago x86/shadow: replace p2m_is_valid() uses
Jan Beulich [Thu, 13 Mar 2025 09:24:15 +0000 (10:24 +0100)]
x86/shadow: replace p2m_is_valid() uses

The justification for dropping p2m_mmio_dm from p2m_is_valid() was wrong
for two of the shadow mode uses.

In _sh_propagate() we want to create special L1 entries for p2m_mmio_dm
pages. Hence we need to make sure we don't bail early for that type.

In _sh_page_fault() we want to handle p2m_mmio_dm by forwarding to
(internal or external) emulation. Pull the !p2m_is_mmio() check out of
the || expression (as otherwise it would need adding to the lhs as
well).

In both cases, p2m_is_valid() in combination with p2m_is_grant() still
doesn't cover foreign mappings. Hence use p2m_is_any_ram() plus (as
necessary) p2m_mmio_* instead.

Fixes: be59cceb2dbb ("x86/P2M: don't include MMIO_DM in p2m_is_valid()")
Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com>
7 weeks ago tools/libxl: Skip missing PCI GSIs
Jason Andryuk [Thu, 13 Mar 2025 09:23:52 +0000 (10:23 +0100)]
tools/libxl: Skip missing PCI GSIs

A PCI device may not have a legacy IRQ.  In that case, we don't need to
do anything, so don't fail in libxl__arch_hvm_map_gsi() and
libxl__arch_hvm_unmap_gsi().

This requires an updated pciback that returns -ENOENT.

Fixes: f97f885c7198 ("tools: Add new function to do PIRQ (un)map on PVH dom0")
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
7 weeks ago tools/ctrl: Silence missing GSI in xc_pcidev_get_gsi()
Jason Andryuk [Thu, 13 Mar 2025 09:23:42 +0000 (10:23 +0100)]
tools/ctrl: Silence missing GSI in xc_pcidev_get_gsi()

It is valid for a PCI device to not have a legacy IRQ.  In that case, do
not print an error to keep the logs clean.

This relies on pciback being updated to return -ENOENT for a missing
GSI.

Fixes: b93e5981d258 ("tools: Add new function to get gsi from dev")
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
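Both GSI changes above follow the same convention, sketched here with a hypothetical `map_gsi_sketch()`: a negative errno value means failure, except that `-ENOENT` (as returned by an updated pciback, per the commit messages) means the device has no legacy IRQ, so the caller quietly does nothing.

```c
#include <assert.h>
#include <errno.h>

/*
 * 'rc' is what a GSI lookup returned: >= 0 is a GSI, < 0 is -errno.
 * Sets *mapped to 1 only when there was a GSI to act on.
 */
int map_gsi_sketch(int rc, int *mapped)
{
    *mapped = 0;

    if ( rc == -ENOENT )
        return 0;          /* no legacy IRQ: nothing to do, don't log */
    if ( rc < 0 )
        return rc;         /* genuine failure: propagate it */

    *mapped = 1;           /* a real implementation would map GSI 'rc' */
    return 0;
}
```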
7 weeks ago libxl: avoid infinite loop in libxl__remove_directory()
Jan Beulich [Thu, 13 Mar 2025 09:23:10 +0000 (10:23 +0100)]
libxl: avoid infinite loop in libxl__remove_directory()

Infinitely retrying the rmdir() invocation makes little sense. While the
original observation was the log filling the disk (due to repeated
"Directory not empty" errors, in turn occurring for unclear reasons),
the loop needs breaking out of even if no error message were being logged
(much as is done in the similar loops in libxl__remove_file() and
libxl__remove_file_or_directory()).

Fixes: c4dcbee67e6d ("libxl: provide libxl__remove_file et al")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
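A bounded-retry loop of the kind the commit describes, as a sketch: it assumes, as in the similar libxl loops the commit mentions, that only `EINTR` is worth retrying and that a missing directory counts as success. `remove_dir_sketch()` is hypothetical and is not libxl's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Remove an empty directory, retrying only when interrupted by a signal. */
int remove_dir_sketch(const char *path)
{
    for ( ;; )
    {
        if ( !rmdir(path) || errno == ENOENT )
            return 0;          /* removed, or already gone */
        if ( errno != EINTR )
            break;             /* e.g. ENOTEMPTY: report once, don't spin */
    }

    perror(path);              /* a single error message, not one per retry */
    return -1;
}
```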
7 weeks ago xen/arm: Restrict Kconfig configuration for LLC coloring
Luca Fancellu [Wed, 12 Mar 2025 13:52:50 +0000 (13:52 +0000)]
xen/arm: Restrict Kconfig configuration for LLC coloring

The Xen LLC coloring feature can only be used with the MMU subsystem, so
move the code that selects it from ARM_64 to MMU and add the ARM_64
dependency.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
7 weeks ago xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH
Luca Fancellu [Wed, 12 Mar 2025 13:52:49 +0000 (13:52 +0000)]
xen/passthrough: Provide stub functions when !HAS_PASSTHROUGH

When Xen is built without HAS_PASSTHROUGH, some parts of the Arm code
call iommu_* functions whose implementation lives under
xen/drivers/passthrough, which is not built.

So provide stubs for these functions in order to build Xen when
!HAS_PASSTHROUGH, which is the case, for example, on systems with MPU
support.

For gnttab_need_iommu_mapping() in the Arm part, modify the macro
to use IS_ENABLED for the HAS_PASSTHROUGH Kconfig.

Fixes: 0388a5979b21 ("xen/arm: mpu: Introduce choice between MMU and MPU")
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 weeks ago tools/arm: Reject configuration with incorrect nr_spis value
Michal Orzel [Wed, 12 Mar 2025 10:16:19 +0000 (11:16 +0100)]
tools/arm: Reject configuration with incorrect nr_spis value

If the nr_spis value calculated by the toolstack is bigger than the
value provided by the user, we silently ignore the latter. This is not
consistent with the approach we have in Xen on Arm, where we try to reject
incorrect configurations. Also, the documentation for nr_spis is
incorrect, as it gives 991 as the maximum number of SPIs, whereas it should
be 960, i.e. (1020 - 32) rounded down to the nearest multiple of 32.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
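The 960 figure can be checked directly. This assumes the usual GIC INTID layout (INTIDs 0 to 31 are SGIs/PPIs, SPIs start at 32, and INTIDs 1020 and above are special); the macro and function names below are illustrative only.

```c
#include <assert.h>

#define GIC_INTID_LIMIT 1020u    /* first special INTID (hypothetical name) */
#define NR_LOCAL_IRQS   32u      /* SGIs + PPIs */

/* Round x down to a multiple of 32. */
#define ROUNDDOWN_32(x) ((x) & ~31u)

unsigned int max_nr_spis(void)
{
    /* (1020 - 32) = 988, rounded down to a multiple of 32 = 960 */
    return ROUNDDOWN_32(GIC_INTID_LIMIT - NR_LOCAL_IRQS);
}
```

Adding the 32 local IRQs back gives 992 IRQs in total.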
7 weeks ago xen/arm: Improve handling of nr_spis
Michal Orzel [Wed, 12 Mar 2025 10:16:18 +0000 (11:16 +0100)]
xen/arm: Improve handling of nr_spis

At the moment, we only print a warning about the max number of IRQs
supported by the GIC being bigger than that of the vGIC for the hardware
domain. This check is not hwdom-specific, and should be made common.
Also, if the user doesn't specify nr_spis for dom0less domUs, we should
take into account the max number of IRQs supported by the vGIC if it's
smaller than the GIC's.

Introduce a VGIC_MAX_IRQS macro and use it instead of the hardcoded value
992. Introduce a VGIC_DEF_NR_SPIS macro to store the default number of
vGIC SPIs. Fix the calculation of nr_spis for dom0less domUs and make the
GIC/vGIC max IRQs comparison common.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
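A sketch of the default sizing described above, assuming it amounts to taking the smaller of the host GIC's and the vGIC's IRQ counts and subtracting the 32 SGIs/PPIs. `default_nr_spis()` is hypothetical; only `VGIC_MAX_IRQS` and its value 992 come from the commit message.

```c
#include <assert.h>

#define NR_LOCAL_IRQS 32u
#define VGIC_MAX_IRQS 992u

/* Default SPI count when the user didn't specify nr_spis. */
unsigned int default_nr_spis(unsigned int gic_nr_irqs)
{
    unsigned int nr_irqs = gic_nr_irqs < VGIC_MAX_IRQS ? gic_nr_irqs
                                                       : VGIC_MAX_IRQS;

    assert(nr_irqs >= NR_LOCAL_IRQS);
    return nr_irqs - NR_LOCAL_IRQS;
}
```

A host GIC with 288 IRQs gives 256 SPIs, while one reporting 1020 IRQs is capped by the vGIC at 960.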
7 weeks ago xen/arm: fix iomem_ranges cfg in map_range_to_domain()
Grygorii Strashko [Tue, 18 Feb 2025 11:22:53 +0000 (13:22 +0200)]
xen/arm: fix iomem_ranges cfg in map_range_to_domain()

Currently, the following code in map_range_to_domain()

 res = rangeset_add_range(mr_data->iomem_ranges,
                          paddr_to_pfn(addr),
                          paddr_to_pfn_aligned(addr + len - 1));
 where
  paddr_to_pfn_aligned(paddr) is defined as paddr_to_pfn(PAGE_ALIGN(paddr))

calculates the iomem range end address by rounding it up to the next Xen
page, with the incorrect assumption that the iomem range end address passed
to rangeset_add_range() is exclusive, while it is expected to be inclusive.

For example, if the requested range is [00e6140000:00e6141004], then it is
expected to add the [e6140:e6141] range (num_pages=2) to the
mr_data->iomem_ranges rangeset, but it will add [e6140:e6142] (num_pages=3)
instead.

To fix it, drop PAGE_ALIGN() from the iomem range end address calculation
formula and just use paddr_to_pfn(addr + len - 1).

Fixes: 57d4d7d4e8f3b ("arm/asm/setup.h: Update struct map_range_data to add rangeset.")
Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
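The off-by-one can be reproduced in isolation. This sketch uses hypothetical helpers and assumes 4 KiB pages: `addr + len - 1` is the inclusive last byte, so its frame number is already the inclusive end frame, and rounding it up again adds an extra page. The same arithmetic applies to the `iomem_permit_access()` fix in the following commit.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_ALIGN(a) (((a) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

static inline uint64_t paddr_to_pfn(uint64_t pa)
{
    return pa >> PAGE_SHIFT;
}

/* Buggy: treats the inclusive last byte as if it were an exclusive end. */
uint64_t last_pfn_buggy(uint64_t addr, uint64_t len)
{
    return paddr_to_pfn(PAGE_ALIGN(addr + len - 1));
}

/* Fixed: the frame of the last byte is the inclusive end frame. */
uint64_t last_pfn(uint64_t addr, uint64_t len)
{
    return paddr_to_pfn(addr + len - 1);
}
```

For [00e6140000:00e6141004] (addr 0xe6140000, len 0x1005), the fixed helper yields 0xe6141 and the buggy one 0xe6142. Page-aligned ranges such as [0x1000:0x1fff] are affected as well.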
7 weeks ago xen/arm: fix iomem permissions cfg in map_range_to_domain()
Grygorii Strashko [Tue, 18 Feb 2025 11:22:52 +0000 (13:22 +0200)]
xen/arm: fix iomem permissions cfg in map_range_to_domain()

Currently, the following code in map_range_to_domain()

    res = iomem_permit_access(d, paddr_to_pfn(addr),
                    paddr_to_pfn(PAGE_ALIGN(addr + len - 1)));

calculates the iomem range end address by rounding it up to the next Xen
page, with the incorrect assumption that the iomem range end address passed
to iomem_permit_access() is exclusive, while it is expected to be inclusive.
This gives the control domain (Dom0) access to an MMIO range that is one
page larger than intended.

For example, if the requested range is [00e6140000:00e6141004], then it is
expected to add the [e6140:e6141] range (num_pages=2) to the domain's
iomem_caps rangeset, but it will add [e6140:e6142] (num_pages=3) instead.

To fix it, drop PAGE_ALIGN() from the iomem range end address calculation
formula.

Fixes: 33233c2758345 ("arch/arm: domain build: let dom0 access I/O memory of mapped devices")
Signed-off-by: Grygorii Strashko <grygorii_strashko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
8 weeks ago x86/iommu: avoid MSI address and data writes if IRT index hasn't changed
Roger Pau Monne [Fri, 7 Mar 2025 09:16:01 +0000 (10:16 +0100)]
x86/iommu: avoid MSI address and data writes if IRT index hasn't changed

Attempt to reduce the MSI entry writes, and the associated checks of
whether memory decoding and MSI-X are enabled for the PCI device, when the
MSI data hasn't changed.

When using Interrupt Remapping, the MSI entry will contain an index into
the remapping table, and it's in that remapping table where the MSI vector
and destination CPU are stored.  As such, when using interrupt remapping,
changes to the interrupt affinity shouldn't result in changes to the MSI
entry, and the MSI entry update can be avoided.

Signal from the IOMMU update_ire_from_msi hook whether the MSI data or
address fields have changed, and thus need writing to the device registers.
The hook returns 1 to signal this; returning 0 means the MSI fields haven't
been updated, and thus no write is required.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
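A hypothetical model of the return convention described above. The struct, the helpers and the placement of the index in `data` are all invented for illustration (real remapped MSI formats differ): the hook returns 1 when address or data changed, 0 when they did not, and a negative value on error, and the caller only writes the device when it gets 1.

```c
#include <assert.h>
#include <stdint.h>

struct msi_msg_sketch {
    uint64_t address;
    uint32_t data;
};

/* Rewrite *msg for IRTE 'index'; return 1 if address/data changed, else 0. */
int update_ire_sketch(struct msi_msg_sketch *msg, uint32_t index)
{
    struct msi_msg_sketch old = *msg;

    msg->data = index;   /* illustrative encoding only */

    return old.address != msg->address || old.data != msg->data;
}

/* Caller: only touch the device registers when the hook says so. */
int set_affinity_sketch(struct msi_msg_sketch *msg, uint32_t index,
                        unsigned int *device_writes)
{
    int rc = update_ire_sketch(msg, index);

    if ( rc > 0 )
        ++*device_writes;   /* stand-in for writing the MSI registers */

    return rc < 0 ? rc : 0;
}
```

An affinity change that keeps the same IRTE index returns 0 from the hook, so no device write happens.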
8 weeks ago x86/hvm: check return code of hvm_pi_update_irte when binding
Roger Pau Monne [Mon, 10 Mar 2025 17:13:52 +0000 (18:13 +0100)]
x86/hvm: check return code of hvm_pi_update_irte when binding

Consume the return code from hvm_pi_update_irte(), and propagate the error
back to the caller if hvm_pi_update_irte() fails.

Fixes: 35a1caf8b6b5 ('pass-through: update IRTE according to guest interrupt config changes')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 weeks ago x86/vmx: fix posted interrupts usage of msi_desc->msg field
Roger Pau Monne [Mon, 10 Mar 2025 15:49:29 +0000 (16:49 +0100)]
x86/vmx: fix posted interrupts usage of msi_desc->msg field

The current usage of msi_desc->msg in vmx_pi_update_irte() will make the
field contain a translated MSI message, instead of the expected
untranslated one.  This breaks dump_msi(), which uses the data in
msi_desc->msg to print the interrupt details.

Fix this by introducing a dummy local msi_msg, and using it with
iommu_update_ire_from_msi().  vmx_pi_update_irte() relies on the MSI
message not changing, so there's no need to propagate the resulting msi_msg
to the hardware, and the contents can be ignored.

Additionally add a comment to clarify that msi_desc->msg must always
contain the untranslated MSI message.

Fixes: a5e25908d18d ('VT-d: introduce new fields in msi_desc to track binding with guest interrupt')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 weeks ago xen/page_alloc: Simplify domain_adjust_tot_pages
Alejandro Vallejo [Tue, 4 Mar 2025 11:10:00 +0000 (11:10 +0000)]
xen/page_alloc: Simplify domain_adjust_tot_pages

The logic has too many levels of indirection and it's very hard to
understand in its current form. Split it between the corner case where
the adjustment is bigger than the current claim and the rest, to avoid 5
auxiliary variables.

Also make a functional change to prevent negative adjustments from
re-increasing the claim. This has the nice side effect of avoiding
taking the heap lock here on every free.

While at it, fix an incorrect field name in a nearby comment.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
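A simplified, single-threaded model of the claim accounting after this change. The struct, global and function names are hypothetical and the heap lock is omitted; the model only shows that frees leave the claim alone and that allocations consume the claim without driving it below zero.

```c
#include <assert.h>

struct dom_sketch {
    unsigned long tot_pages;
    unsigned long outstanding_pages;   /* pages still claimed */
};

static unsigned long outstanding_claims;  /* sum of all domains' claims */

void adjust_tot_pages_sketch(struct dom_sketch *d, long pages)
{
    unsigned long adj;

    d->tot_pages += pages;   /* unsigned wrap-around handles pages < 0 */

    /* Frees (pages <= 0) never touch the claim, so no lock is needed. */
    if ( !d->outstanding_pages || pages <= 0 )
        return;

    /* An allocation consumes the claim, but never past zero. */
    adj = (unsigned long)pages < d->outstanding_pages
          ? (unsigned long)pages : d->outstanding_pages;
    d->outstanding_pages -= adj;
    outstanding_claims -= adj;
}
```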
8 weeks ago x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD
Roger Pau Monne [Fri, 21 Feb 2025 11:34:49 +0000 (12:34 +0100)]
x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD

MSR_FAM10H_MMIO_CONF_BASE reports the base of the MCFG range on AMD
systems.  Linux before 6.14 unconditionally attempts to read the MSR
without a safe MSR accessor, and since Xen doesn't allow access to it,
Linux reports the following error:

unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
Call Trace:
 xen_read_msr+0x1e/0x30
 amd_get_mmconfig_range+0x2b/0x80
 quirk_amd_mmconfig_area+0x28/0x100
 pnp_fixup_device+0x39/0x50
 __pnp_add_device+0xf/0x150
 pnp_add_device+0x3d/0x100
 pnpacpi_add_device_handler+0x1f9/0x280
 acpi_ns_get_device_callback+0x104/0x1c0
 acpi_ns_walk_namespace+0x1d0/0x260
 acpi_get_devices+0x8a/0xb0
 pnpacpi_init+0x50/0x80
 do_one_initcall+0x46/0x2e0
 kernel_init_freeable+0x1da/0x2f0
 kernel_init+0x16/0x1b0
 ret_from_fork+0x30/0x50
 ret_from_fork_asm+0x1b/0x30

Such access is conditional on the presence of a device with PnP ID
"PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
function.  Note that prior to commit 3fac3734c43a, MSR accesses when
running as a PV guest always used the safe variant, and thus silently
handled the #GP.

Fix by allowing access to the MSR on AMD systems for the hardware domain.

Write attempts to the MSR will still result in #GP for all domain types.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 weeks ago x86/IDT: Fix IDT generation for INT $0x80
Andrew Cooper [Tue, 11 Mar 2025 21:13:33 +0000 (21:13 +0000)]
x86/IDT: Fix IDT generation for INT $0x80

When PV is enabled, entry_int80 needs to be DPL3, not DPL0.

This, combined with a QEMU bug which incorrectly calculates the error
code (fix submitted separately), causes the XSA-259 PoC to fail with:

  --- Xen Test Framework ---
  Environment: PV 64bit (Long mode 4 levels)
  XSA-259 PoC
  Error: Unexpected fault 0x800d0802, #GP[IDT[256]]
  Test result: ERROR

Fixes: 3da2149cf4dc ("x86/IDT: Generate bsp_idt[] at build time")
Reported-by: Luca Fancellu <luca.fancellu@arm.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
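For context, a gate's DPL is the least privileged CPL allowed to reach it with a software `INT n`. With DPL 0, `INT $0x80` issued from ring 3 (where 64-bit PV guests run) raises #GP instead. A small sketch of the x86-64 IDT gate type/attribute byte; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Type/attribute byte of an x86-64 IDT gate descriptor:
 *   bit 7     P    (present)
 *   bits 6:5  DPL  (least privileged CPL allowed to use INT n)
 *   bit 4     0
 *   bits 3:0  type (0xE = 64-bit interrupt gate)
 */
static inline uint8_t idt_gate_attr(unsigned int dpl)
{
    assert(dpl <= 3);
    return (uint8_t)(0x80u | (dpl << 5) | 0xEu);
}
```

A DPL0 interrupt gate encodes as 0x8E and a DPL3 one as 0xEE.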
8 weeks ago docs: add explanation for 'Resolves:'
Denis Mukhin [Tue, 11 Mar 2025 07:28:26 +0000 (07:28 +0000)]
docs: add explanation for 'Resolves:'

The 'Resolves:' tag may be used when a patch addresses a ticket logged
in GitLab, so that the ticket is automatically closed when the patch is
merged.

Add documentation for the tag.

Resolves: https://gitlab.com/xen-project/xen/-/issues/199
Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 weeks ago MISRA: Rephrase the deviation for Directive 4.10
Andrew Cooper [Tue, 4 Mar 2025 23:48:54 +0000 (23:48 +0000)]
MISRA: Rephrase the deviation for Directive 4.10

The use of "legitimately" mixes the concepts of "it was designed to do this"
and "it was correct to do this".

The latter in particular can go stale.  "intended" is a better way of phrasing
this.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 weeks ago x86/P2M: don't include MMIO_DM in p2m_is_valid()
Jan Beulich [Tue, 11 Mar 2025 08:55:47 +0000 (09:55 +0100)]
x86/P2M: don't include MMIO_DM in p2m_is_valid()

MMIO_DM specifically marks pages which aren't valid, much like INVALID
does. Dropping the type from the predicate
- (conceptually) corrects _sh_propagate(), where the comment says that
  "something valid" is needed (the only call path not passing in RAM_RW
  would pass in INVALID_GFN along with MMIO_DM),
- is benign to the use in sh_page_fault(), where the subsequent
  mfn_valid() check would otherwise cause the same bail-out code path to
  be taken,
- is benign to all three uses in p2m_pt_get_entry(), as MMIO_DM entries
  will only ever yield non-present entries, which are being checked for
  earlier,
- is benign to sh_unshadow_for_p2m_change(), for the same reason,
- is benign to gnttab_transfer() with EPT not in use, again because
  MMIO_DM entries will only ever yield non-present entries, and
  INVALID_MFN is returned for those anyway by p2m_pt_get_entry(),
- for gnttab_transfer() with EPT in use (conceptually) corrects the
  corner case of a page first being subject to XEN_DMOP_set_mem_type
  converting a RAM type to MMIO_DM (which retains the MFN in the entry),
  and then being subject to GNTTABOP_transfer, except that steal_page()
  would later make the operation fail unconditionally anyway.

While there also drop the unused (and otherwise now redundant)
p2m_has_emt().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
8 weeks ago x86/P2M: correct old entry checking in p2m_remove_entry()
Jan Beulich [Tue, 11 Mar 2025 08:55:20 +0000 (09:55 +0100)]
x86/P2M: correct old entry checking in p2m_remove_entry()

Using p2m_is_valid() isn't quite right here. With it expanding to
RAM+MMIO, the subsequent p2m_mmio_direct check effectively reduces its use
to RAM+MMIO_DM. Yet MMIO_DM entries, which are never marked present in the
page tables, won't pass the mfn_valid() check. It is, however, quite
plausible (and supported by the rest of the function) to permit
"removing" hole entries, i.e. in particular to convert MMIO_DM to
INVALID. That leaves the original check to be against RAM (plus MFN
validity), while HOLE then instead wants INVALID_MFN to be passed in.

Furthermore, grant and foreign entries (together with RAM becoming
ANY_RAM) as well as BROKEN want the MFN checking, too.

All other types (i.e. MMIO_DIRECT and POD) want rejecting here rather
than skipping, for needing handling / accounting elsewhere.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
8 weeks ago PCI: drop pci_segments_init()
Jan Beulich [Tue, 11 Mar 2025 08:54:19 +0000 (09:54 +0100)]
PCI: drop pci_segments_init()

Have callers invoke pci_add_segment() directly instead: with radix tree
initialization moved out of the function, its name no longer quite
describes what it actually does.

On x86 move the logic into __start_xen() itself, to reduce the risk of
re-introducing ordering issues like the one which was addressed by
26fe09e34566 ("radix-tree: introduce RADIX_TREE{,_INIT}()").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
8 weeks ago automation/cirrus-ci: store xen/.config as an artifact
Roger Pau Monne [Mon, 10 Mar 2025 17:41:57 +0000 (18:41 +0100)]
automation/cirrus-ci: store xen/.config as an artifact

Always store xen/.config as an artifact, renamed to xen-config to match
the naming used in the Gitlab CI tests.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 weeks ago x86/apic: remove delivery and destination mode fields from drivers
Roger Pau Monne [Thu, 6 Mar 2025 08:07:31 +0000 (09:07 +0100)]
x86/apic: remove delivery and destination mode fields from drivers

All local APIC drivers use physical destination and fixed delivery modes,
so remove the fields from the genapic struct and simplify the logic.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 weeks ago docs: fix INTRODUCE description in xenstore.txt
Juergen Gross [Thu, 6 Mar 2025 07:47:52 +0000 (08:47 +0100)]
docs: fix INTRODUCE description in xenstore.txt

The description of the Xenstore INTRODUCE command still references
xend. Fix that.

The <evtchn> description starts with a grammatically wrong
sentence. Fix that.

While at it, make clear that the Xenstore implementation is allowed
to ignore the specified gfn and use the Xenstore reserved grant id
GNTTAB_RESERVED_XENSTORE instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 7 Mar 2025 14:24:42 +0000 (14:24 +0000)]
xen/watchdog: Identify which domain watchdog fired

When a watchdog fires, the domain is crashed and can't dump any state.

Xen allows a domain to have two separate watchdogs.  Therefore, for a
domain running multiple watchdogs (e.g. one based around network, one
for disk), it is important for diagnostics to know which watchdog
fired.

As the printk() is in a timer callback, this is a bit awkward to
arrange, but there are 12 spare bits in the bottom of the domain
pointer owing to its alignment.

Reuse these bits to encode the watchdog id too, so the one which fired
is identified when the domain is crashed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Fri, 7 Mar 2025 16:38:26 +0000 (16:38 +0000)]
xen/domain: Initialise the domain handle before inserting into the domlist

As soon as the domain is in the domlist, it can be queried via various
means, ahead of being fully constructed.  Ensure it has the toolstack-given
UUID prior to becoming visible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 7 Mar 2025 15:16:29 +0000 (15:16 +0000)]
CI: Drop the now-obsolete 11-riscv64.dockerfile

Fixes: bd9bda50553b ("automation: drop debian:11-riscv64 container")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Mon, 8 Mar 2021 23:31:11 +0000 (23:31 +0000)]
tools/libs: Make uselibs.mk more legible

A few blank lines go a very long way.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Jiqian Chen [Mon, 24 Feb 2025 03:24:33 +0000 (11:24 +0800)]
vpci: Add resizable bar support

Some devices, like AMDGPU, support the resizable BAR capability,
but Xen's vPCI doesn't support this feature, so they fail
to resize BARs, which then causes probing to fail.

According to the PCIe spec, each BAR that supports resizing has
two registers, PCI_REBAR_CAP and PCI_REBAR_CTRL. So, add
handlers to support resizing BARs.

Note that Xen will only trap PCI_REBAR_CTRL, as PCI_REBAR_CAP
is a read-only register and the hardware domain already gets
access to it without needing any setup.

Link: https://gitlab.com/xen-project/xen/-/issues/87
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Roger Pau Monné <roger.pau@cirtrix.com>
Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jason Andryuk [Mon, 10 Mar 2025 08:53:51 +0000 (09:53 +0100)]
xen/arm: Factor out construct_hwdom()

Factor out construct_hwdom() from construct_dom0().  This will be
re-used by the dom0less code when building a domain with the hardware
capability.

iommu_hwdom_init(d) is moved into construct_hwdom() which moves it after
kernel_probe().  kernel_probe() doesn't seem to depend on its setting.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Denis Mukhin [Mon, 10 Mar 2025 08:53:11 +0000 (09:53 +0100)]
xen/consoled: clean up console handling for PV shim

There are a few places in the Xen console driver which check the pv_shim
console under CONFIG_PV_SHIM or CONFIG_X86.

Instead of inconsistent #ifdef-ing, introduce and use consoled_is_enabled() in
switch_serial_input() and __serial_rx().

The PV shim case is fixed in __serial_rx(): it should be under the
'pv_shim && pv_console' check.

The signatures of consoled_guest_{rx,tx}() have changed so that errors can
be logged at the call sites.

Also, move get_initial_domain_id() to an arch-independent header since it is
now required by the console driver.

Lastly, add the missing SPDX-License-Identifier to xen/consoled.h.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Mon, 10 Mar 2025 08:52:54 +0000 (09:52 +0100)]
tools/libs/store: use single_with_domid() in xs_get_domain_path()

xs_get_domain_path() can be simplified by using single_with_domid().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Alejandro Vallejo [Mon, 10 Mar 2025 08:52:39 +0000 (09:52 +0100)]
tools/hvmloader: Replace LAPIC_ID() with cpu_to_apicid[]

Replace uses of the LAPIC_ID() macro with accesses to the
cpu_to_apicid[] lookup table. This table contains the APIC IDs of each
vCPU as probed at runtime rather than assuming a predefined relation.

Moved smp_initialise() ahead of apic_setup() in order to initialise
cpu_to_apicid ASAP and avoid using it uninitialised. Note that bringing
up the APs doesn't need the APIC in hvmloader because it always runs
virtualized and uses the PV interface.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Alejandro Vallejo [Mon, 10 Mar 2025 08:52:30 +0000 (09:52 +0100)]
tools/hvmloader: Retrieve APIC IDs from the APs themselves

Make it so the APs expose their own APIC IDs in a lookup table (LUT). We
can use that LUT to populate the MADT, decoupling the algorithm that
relates CPU IDs and APIC IDs from hvmloader.

Modified the printf to also print the APIC ID of each CPU, and fixed a
(benign) wrong format specifier being used for the vcpu id.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Stefano Stabellini [Sat, 8 Mar 2025 00:57:44 +0000 (16:57 -0800)]
docs: hardware runners setup

Document how to set up a new hardware runner.

Signed-off-by: Victor Lira <VictorM.Lira@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Thu, 6 Mar 2025 23:21:07 +0000 (23:21 +0000)]
x86/e820: Remove opencoded vendor/feature checks

We've already scanned features by the time init_e820() is called.  Remove the
cpuid() calls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 28 Nov 2024 00:47:37 +0000 (00:47 +0000)]
x86/vlapic: Drop vlapic->esr_lock

The exact behaviour of LVTERR interrupt generation is implementation
specific.

 * Newer Intel CPUs generate an interrupt when pending_esr becomes
   nonzero.

 * Older Intel and all AMD CPUs generate an interrupt when any
   individual bit in pending_esr becomes nonzero.

Neither vendor documents their behaviour very well.  Xen implements
the per-bit behaviour and has done since support was added.

Importantly, the per-bit behaviour can be expressed using the atomic
operations available in the x86 architecture, whereas the
former (interrupt only on pending_esr becoming nonzero) cannot.

With vlapic->hw.pending_esr held outside of the main LAPIC register page,
it's much easier to use atomic operations.

Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().

The only interesting change is that vlapic_error() now needs to take a
single bit only, rather than a mask, but this is fine for all current
callers and foreseeable changes.

No change from a guest's perspective.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>