Andrew Cooper [Tue, 29 Apr 2025 13:52:08 +0000 (14:52 +0100)]
x86/vpic: Improve bitops usage
* For vpic_get_priority(), introduce a common ror8() helper in plain C. One
thing that I can't persuade the compiler to realise is that a non-zero
value rotated is still non-zero, so use __builtin_clz() to help the
optimiser out.
* vpic_ioport_write() can be simplified to just for_each_set_bit(), which
avoids spilling pending to the stack each loop iteration. Changing pending
from unsigned int to uint8_t isn't even strictly necessary given the
underlying types of vpic->isr and vpic->irr, but done so for clarity.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
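As a rough illustration of the first point, an 8-bit rotate-right helper can be written in plain C as below. This is a minimal sketch rather than the code from the patch: the signature of ror8() is assumed, and the masking of the shift counts is one possible way of keeping both shifts well defined.

```c
#include <stdint.h>

/* Rotate an 8-bit value right by n bit positions (n is taken modulo 8). */
static inline uint8_t ror8(uint8_t x, unsigned int n)
{
    n &= 7;
    /* (8 - n) & 7 keeps the left-shift count in 0..7, so n == 0 is safe. */
    return (uint8_t)((x >> n) | (x << ((8 - n) & 7)));
}
```

A rotation only moves bits around, so a non-zero input always yields a non-zero output, which is the property the commit message says the compiler does not deduce by itself.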
Andrew Cooper [Mon, 28 Apr 2025 16:39:18 +0000 (17:39 +0100)]
x86/vmx: Fix label name in vmwrite_safe()
This condition is called VMFail(valid) in the SDM.
No functional change.
Fixes: fc3db01db6fb ("x86/vmx: Rework VMX wrappers using `asm goto()`") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 2 May 2025 07:44:49 +0000 (09:44 +0200)]
x86/alternatives: allow replacement code snippets to be re-used
In a number of cases we use ALTERNATIVE_2 with both replacement insns /
insn sequences being identical. Avoid emitting the same code twice, and
instead alias the necessary helper labels to the existing ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 2 May 2025 07:44:03 +0000 (09:44 +0200)]
mm: move paddr_to_pdx()
There's nothing arch-specific about it.
While there, on x86 visually separate the vmap_to_*() macros from those
covered by the earlier comment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Fri, 2 May 2025 07:43:23 +0000 (09:43 +0200)]
{hyper,multi}call: further limit arguments to just 5
Multicall compat translation and hypercall continuation handling can
also be shrunk to the processing of just (up to) 5 arguments.
Take the opportunity to
- make exceeding the limit noisy in hypercall_create_continuation(),
- use speculation-safe array access in hypercall_create_continuation(),
- avoid a Misra C:2012 Rule 19.1 violation in xlat_multicall_entry(),
- further tidy xlat_multicall_entry() and __trace_multicall_call()
style-wise.
Amends: 2f531c122e95 ("x86: limit number of hypercall parameters to 5") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> # arm
Several APIs take an architecture-dependent set of flags in an unsigned int,
but this needs to be a wider type to support PPC.
The new type pte_attr_t has been introduced for this purpose, so switch to it
in map_pages_to_xen(), __vmap() and modify_xen_mappings{,_lite}().
No functional change.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Wed, 30 Apr 2025 06:47:49 +0000 (08:47 +0200)]
cpufreq: don't leave stale statistics pointer
Error paths of cpufreq_statistic_init() correctly free the base
structure pointer, but the per-CPU variable would still hold it, mis-
guiding e.g. cpufreq_statistic_update(). Defer installing of the pointer
there until the structure was fully populated.
Fixes: 755af07edba1 ("x86/cpufreq: don't use static array for large per-CPU data structures") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
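The "populate first, publish last" pattern described above can be sketched as follows. All names here (struct stats, installed_stats, stats_init() and its simulate_failure parameter) are hypothetical stand-ins for illustration; the real code works on a per-CPU variable rather than a single global.

```c
#include <stdlib.h>

struct stats {
    unsigned int *counts;   /* per-state counters */
    unsigned int nr;
};

/* Stand-in for the per-CPU pointer; hypothetical, models a single CPU. */
static struct stats *installed_stats;

/* Returns 0 on success, -1 on failure. simulate_failure is for illustration. */
static int stats_init(unsigned int nr, int simulate_failure)
{
    struct stats *s = calloc(1, sizeof(*s));

    if ( !s )
        return -1;

    s->counts = simulate_failure ? NULL : calloc(nr, sizeof(*s->counts));
    if ( !s->counts )
    {
        free(s);            /* nothing was published, so nothing goes stale */
        return -1;
    }
    s->nr = nr;

    installed_stats = s;    /* publish only once fully populated */
    return 0;
}
```

Because the shared pointer is written only on the success path, a failed initialisation leaves it untouched and later readers never see freed memory.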
Jan Beulich [Wed, 30 Apr 2025 06:46:49 +0000 (08:46 +0200)]
x86: drop underscore-prefixed {maddr,virt} <=> page conversion macros
Unlike the ones converting to/from frame numbers, these don't have type-
safe overrides, and they also can't gain any within our present type
system. Unsurprisingly we also don't have any uses of the underscore-
prefixed variants.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 30 Apr 2025 06:46:21 +0000 (08:46 +0200)]
x86emul: avoid UB shifts in FLDENV/FRSTOR handling
16-bit quantities, no matter whether expressed as uint16_t or as
bitfield, will be promoted to plain int before doing any arithmetic on
them. Shifting such values by 16 will therefore shift into the sign bit,
which is UB if that bit becomes set. To account for all reads and all
writes accessing opposite members of the same union, introduce yet more
local variables to reduce the shift counts to 12.
Fixes: be55ed744ed8 ("x86emul: support FLDENV and FRSTOR") Reported-by: Fabian Specht <f.specht@tum.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
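The promotion hazard can be shown in isolation. Note that the patch itself reduces the shift counts to 12 with extra local variables; the sketch below instead uses a different, more generic remedy (converting to an unsigned 32-bit type before shifting), and combine16() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hazard: with 32-bit int, 'hi << 16' promotes hi to (signed) int, so a
 * value with bit 15 set is shifted into the sign bit, which is UB.
 * Converting to uint32_t first makes the shift well defined.
 */
static inline uint32_t combine16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}
```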
x86/hvm: only register the r/o subpage ops when needed
MMIO operation handlers can be expensive to process, hence attempt to
register only those that will be needed by the domain.
Subpage r/o MMIO regions are added exclusively at boot, further limit their
addition to strictly before the initial domain gets created, so by the time
initial domain creation happens Xen knows whether subpage is required or
not. This allows only registering the MMIO handler when there are
subpage regions to handle.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/hvm: fix handling of accesses to partial r/o MMIO pages
The current logic to handle accesses to MMIO pages partially read-only is
based on the (now removed) logic used to handle accesses to the r/o MMCFG
region(s) for PVH v1 dom0. However that has issues when running on AMD
hardware, as in that case the guest linear address that triggered the fault
is not provided as part of the VM exit. This caused
mmio_ro_emulated_write() to always fail before calling
subpage_mmio_write_emulate() when running on AMD and called from an HVM
context.
Take a different approach and convert the handling of partial read-only
MMIO page accesses into an HVM MMIO ops handler, as that's the more natural
way to handle this kind of emulation for HVM domains.
This allows getting rid of hvm_emulate_one_mmio() and its single call site
in hvm_hap_nested_page_fault(). As part of the fix r/o MMIO accesses are
now handled by handle_mmio_with_translation(), re-using the same logic that
was used for other read-only types part of p2m_is_discard_write(). The
usage of emulation for faulting p2m_mmio_direct types is limited to
addresses in the r/o MMIO range. The page present check is dropped as type
p2m_mmio_direct must have the present bit set in the PTE.
Note a small adjustment is needed to the `pf-fixup` dom0 PVH logic: avoid
attempting to fixup faults resulting from write accesses to read-only MMIO
regions, as handling of those accesses is now done by handle_mmio().
Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/io: provide helpers for multi size MMIO accesses
Several handlers have the same necessity of reading or writing from or to
an MMIO region using 1, 2, 4 or 8 byte accesses. So far this has been
open-coded in the function itself. Instead provide a new set of handlers
that encapsulate the accesses.
Since the added helpers are not architecture specific, introduce a new
generic io.h header.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
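A size-dispatching accessor pair of the kind described might look like the sketch below. The names mmio_read_sketch() and mmio_write_sketch() are hypothetical, as is the choice to return all ones for an unsupported size; the helpers actually added to io.h may differ.

```c
#include <stdint.h>

/* Read 'size' bytes (1, 2, 4 or 8) from an MMIO-style address. */
static inline uint64_t mmio_read_sketch(const volatile void *addr,
                                        unsigned int size)
{
    switch ( size )
    {
    case 1: return *(const volatile uint8_t *)addr;
    case 2: return *(const volatile uint16_t *)addr;
    case 4: return *(const volatile uint32_t *)addr;
    case 8: return *(const volatile uint64_t *)addr;
    default: return ~(uint64_t)0;   /* unsupported size (assumed behaviour) */
    }
}

/* Write the low 'size' bytes (1, 2, 4 or 8) of 'data' to the address. */
static inline void mmio_write_sketch(volatile void *addr, unsigned int size,
                                     uint64_t data)
{
    switch ( size )
    {
    case 1: *(volatile uint8_t *)addr = (uint8_t)data; break;
    case 2: *(volatile uint16_t *)addr = (uint16_t)data; break;
    case 4: *(volatile uint32_t *)addr = (uint32_t)data; break;
    case 8: *(volatile uint64_t *)addr = data; break;
    default: break;                 /* unsupported size: ignored (assumed) */
    }
}
```

Callers then pass the access width through instead of each open-coding the same switch.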
The current implementation of __vmread() is void and returns the result via
pointer argument which leads to excess code in some places.
Introduce a new vmread() function, and implement __vmread() in terms of it.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Denis Mukhin <dmukhin@ford.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 22 Apr 2025 11:30:19 +0000 (12:30 +0100)]
xen/link: Include .debug_str_offsets in DWARF2_DEBUG_SECTIONS
Building Xen with Clang-17 yields the following warning:
ld: warning: orphan section `.debug_str_offsets' from `prelink.o' being placed in section `.debug_str_offsets'
ld: ./.xen.efi.0xffff82d040000000.0:/4: section below image base
ld: ./.xen.efi.0xffff82d040000000.1:/4: section below image base
ld: warning: orphan section `.debug_str_offsets' from `prelink.o' being placed in section `.debug_str_offsets'
ld: xen.efi:/4: section below image base
Set the alignment to 4 as it holds 4-byte values, despite the fact that Clang
appears to only use 1.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 28 Apr 2025 07:48:14 +0000 (09:48 +0200)]
x86: constrain sub-page access length in mmio_ro_emulated_write()
Without doing so we could trigger the ASSERT_UNREACHABLE() in
subpage_mmio_write_emulate(). A comment there actually says this
validation would already have been done ...
Fixes: 8847d6e23f97 ("x86/mm: add API for marking only part of a MMIO page read only") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Jason Andryuk [Thu, 24 Apr 2025 21:23:26 +0000 (17:23 -0400)]
xen/vpci: Fix msix existing mapping printk
The format string lacks a space, so mfn and type run together:
(XEN) d0v0 0000:06:00.7: existing mapping (mfn: 753037type: 0) at 0x1 clobbers MSIX MMIO area
Add a space. Additionally, move the format string to a single long line
to improve grep-ability.
Fixes: 677053fac17a ("vpci/msix: carve p2m hole for MSIX MMIO regions") Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
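How a split format string loses a space can be reproduced in a few lines. The format strings below are abbreviated approximations of the message quoted above, not the exact strings from the source, and format_msg() is a hypothetical wrapper around snprintf().

```c
#include <stdio.h>

/* Adjacent string literals are concatenated with nothing in between. */
#define FMT_BROKEN "existing mapping (mfn: %lx" \
                   "type: %d)"

/* Fixed: one long, grep-able literal with the missing space added. */
#define FMT_FIXED  "existing mapping (mfn: %lx type: %d)"

static int format_msg(char *buf, size_t len, const char *fmt,
                      unsigned long mfn, int type)
{
    return snprintf(buf, len, fmt, mfn, type);
}
```

Keeping the format on one line also means that searching the tree for the text of a log message finds its source.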
x86/hvmloader: fix usage of NULL with cpuid_count()
The commit that added support for retrieving the APIC IDs from the APs
introduced several usages of cpuid() with NULL parameters, which is not
handled by the underlying implementation. For GCC I expect this results in
writes to the physical address at 0, however when using Clang 19.1.2 the
generated code in smp.o for the whole file is:
tools/firmware/hvmloader/smp.o: file format elf32-i386
This shows that using a NULL pointer results in undefined behavior, with
Clang refusing to generate further code after it.
Fix by using a temporary variable in cpuid_count() in place of any NULL
parameter.
Fixes: 9ad0db58c7e2 ('tools/hvmloader: Retrieve APIC IDs from the APs themselves') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
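The shape of the fix can be sketched without inline assembly. Here raw_query() is a hypothetical stand-in for the low-level wrapper that stores to all four outputs unconditionally, and query_count() plays the role of cpuid_count(); the values written are placeholders, not real CPUID data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the raw wrapper: always writes all outputs. */
static void raw_query(uint32_t leaf, uint32_t subleaf,
                      uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a = leaf;
    *b = subleaf;
    *c = leaf + 1;
    *d = subleaf + 1;
}

/* Callers may pass NULL for any output they do not care about. */
static void query_count(uint32_t leaf, uint32_t subleaf,
                        uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    uint32_t tmp;   /* scratch slot that absorbs unwanted outputs */

    raw_query(leaf, subleaf,
              a ? a : &tmp, b ? b : &tmp, c ? c : &tmp, d ? d : &tmp);
}
```

With this arrangement no store ever goes through a NULL pointer, so the compiler has no undefined behaviour to reason from.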
xen: fix buffer over-read in bitmap_to_xenctl_bitmap()
There's an off-by-one when calculating the last byte in the input array to
bitmap_to_xenctl_bitmap(), which causes bitmaps whose size is a multiple of 8
to over-read and incorrectly use a byte past the end of the array.
Fixes: 288c4641c80d ('xen: simplify bitmap_to_xenctl_bitmap for little endian') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
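The arithmetic behind this class of off-by-one is easy to check in isolation. The helpers below are hypothetical and are not taken from bitmap_to_xenctl_bitmap(); they only show why the last valid byte is at (nbits - 1) / 8 rather than nbits / 8.

```c
#include <stddef.h>

/* Number of bytes needed to hold nbits bits. */
static inline size_t bitmap_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}

/* Index of the last byte holding valid bits; nbits must be non-zero. */
static inline size_t last_byte_index(size_t nbits)
{
    /* 'nbits / 8' would point one byte too far for multiples of 8. */
    return (nbits - 1) / 8;
}
```

For a 16-bit bitmap, nbits / 8 gives index 2, while the array only has bytes 0 and 1.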
Jason Andryuk [Fri, 18 Apr 2025 21:05:50 +0000 (17:05 -0400)]
tools/libxl: Switch irq to unsigned int
The PCI device irq is read with fscanf(%u). Switch the irq variable to
unsigned int to match.
Linux driver/pci/pci-sysfs.c:irq_show() uses %u to print the value.
However, unsigned int irq doesn't compile because of:
error: pointer targets in passing argument 4 of 'xc_physdev_map_pirq' differ in signedness [-Werror=pointer-sign]
Add int pirq to provide the desired type instead of re-using irq.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
Jason Andryuk [Fri, 18 Apr 2025 21:05:49 +0000 (17:05 -0400)]
tools/libxl: Skip invalid IRQs
A PCI device's irq field is an 8-bit number. A value of 0xff indicates
that the device IRQ is not connected. Additionally, the Linux ACPI code
can convert these 0xff values to IRQ_NOTCONNECTED(0x80000000) because
"0x80000000 is guaranteed to be outside the available range of
interrupts and easy to distinguish from other possible incorrect
values." When the hypercall to assign that IRQ fails, device
passthrough as a whole fails.
Add checking for a valid IRQ and skip the IRQ handling for PCI devices
outside that range. This allows for passthrough of devices without
legacy IRQs.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
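A validity check along these lines might look like the sketch below. The macro and function names are hypothetical, and the exact range libxl accepts is an assumption: the message only states that 0xff and 0x80000000 mean "not connected", so this sketch rejects anything at or above 0xff and leaves open whether 0 should be rejected as well.

```c
#include <stdbool.h>

/* 8-bit "not connected" marker from the PCI irq field. */
#define PCI_IRQ_NOT_CONNECTED_SKETCH 0xffu

static inline bool pci_irq_usable(unsigned int irq)
{
    /* Also rejects 0x80000000 (IRQ_NOTCONNECTED), which is >= 0xff. */
    return irq < PCI_IRQ_NOT_CONNECTED_SKETCH;
}
```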
Anthony PERARD [Mon, 14 Apr 2025 14:17:14 +0000 (16:17 +0200)]
tools/tests: Fix newly introduced Makefile
Fix a few issues with this new directory:
- clean generated files
- and ignore those generated files
- include the dependency files generated by `gcc`.
- rework prerequisites:
"test-rangeset.o" also needs the generated files "list.h" and
"rangeset.h". Technically, both only need "harness.h" which needs
the generated headers, but that's a bit simpler and the previous
point will add the dependency on "harness.h" automatically.
This last point fixes an issue where `make` might decide to build
"test-rangeset.o" before the other files are ready.
Fixes: 7bf777b42cad ("tootls/tests: introduce unit tests for rangesets") Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>
Hongbo [Sun, 30 Mar 2025 16:03:04 +0000 (00:03 +0800)]
tools/libxl: search PATH for QEMU if `QEMU_XEN_PATH` is not absolute
`QEMU_XEN_PATH` will be configured as `qemu-system-i386` with no clue where, if
`--with-system-qemu` is set without giving a path (as matched in the case `yes`
but not `*`). However, the existence of the executable is checked by `access()`,
which will not look anywhere in $PATH, only in the current directory. And since it
is possible for `qemu-system-i386` (or any other configured values) to be
executed from PATH later, we'd better find that in PATH and return the full path
for the caller to check against.
Signed-off-by: Hongbo <hehongbo@mail.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
[Initialise `saveptr` to NULL] Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>
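A PATH lookup of the kind described can be sketched with strtok_r() and access(). The function find_in_path() below is a hypothetical helper, not the one added to libxl; it assumes a POSIX environment and, because strtok_r() skips empty fields, it ignores empty PATH elements.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Look for an executable called 'name' in the colon-separated list
 * 'path_list'. On success, write the full path to 'out' and return 0;
 * otherwise return -1.
 */
static int find_in_path(const char *name, const char *path_list,
                        char *out, size_t outlen)
{
    char *copy, *dir, *saveptr = NULL;
    int rc = -1;

    if ( !name || !path_list )
        return -1;

    copy = strdup(path_list);       /* strtok_r() modifies its input */
    if ( !copy )
        return -1;

    for ( dir = strtok_r(copy, ":", &saveptr); dir;
          dir = strtok_r(NULL, ":", &saveptr) )
    {
        int n = snprintf(out, outlen, "%s/%s", dir, name);

        if ( n < 0 || (size_t)n >= outlen )
            continue;               /* candidate did not fit; skip it */

        if ( access(out, X_OK) == 0 )
        {
            rc = 0;
            break;
        }
    }

    free(copy);
    return rc;
}
```

A caller would typically pass getenv("PATH") as path_list and only fall back to this search when the configured name is not an absolute path.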
Andrew Cooper [Sun, 20 Apr 2025 00:46:57 +0000 (01:46 +0100)]
x86/alternatives: Simplify _apply_alternatives() now altcall is separate
With altcall handled separately, the special case in _apply_alternatives() is
unused and can be dropped. The force parameter (used to signify the seal
pass) can be removed too.
In turn, nmi_apply_alternatives() no longer needs to call
_apply_alternatives() on the second pass.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 19 Apr 2025 19:44:31 +0000 (20:44 +0100)]
x86/altcall: Switch to simpler scheme
With all the infrastructure in place, switch from using ALTERNATIVE() to
simply populating .alt_call_sites.
Before, _apply_alternatives() would devirtualise in two passes; the first
being opportunistic, and the second (signified by the force parameter) sealing
any call with a still-NULL function pointer.
Now, all devirtualising is performed together, at the point in time of the
second pass previously. The call to seal_endbr64() needs delaying until after
apply_alt_calls() is complete, or we have a narrow window with real indirect
branches and no ENDBR64 instructions.
Under the hood, the following changes are happening:
The changes aren't quite equal because inlining is affected by the smaller
asm() block. Nevertheless, the metadata is held in 1/3 of the space, and
there are no CALL instructions held in the replacement section any more.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sun, 20 Apr 2025 01:32:26 +0000 (02:32 +0100)]
xen/livepatch: Support new altcall scheme
The new altcall scheme uses an .alt_call_sites section. Wire this up in very
much the same way as the .altinstructions section, although there is less
sanity checking necessary.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Sat, 19 Apr 2025 22:05:52 +0000 (23:05 +0100)]
x86/altcall: Introduce new simpler scheme
Encoding altcalls as regular alternatives leads to an unreasonable amount of
complexity in _apply_alternatives().
Introduce apply_alt_calls(), and an .alt_call_sites section which simply
tracks the source address (relative, to save on space). That's literally all
that is needed in order to devirtualise the function pointers.
apply_alt_calls() is mostly as per _apply_alternatives(), except the size is
known to be 6 bytes. Drop the logic for JMP *RIPREL, as there's no support
for tailcall optimisations, nor a feasible plan on how to introduce support.
Pad with a redundant prefix to avoid needing a separate NOP on the end.
Wire it up in nmi_apply_alternatives(), although the section is empty at this
juncture so nothing happens in practice.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 19 Apr 2025 23:44:17 +0000 (00:44 +0100)]
x86/alternatives: Rework information passing into nmi_apply_alternatives()
nmi_apply_alternatives() is soon going to need to dispatch to multiple
functions, and a force parameter is not a good way of passing information.
Introduce ALT_INSNS and ALT_CALLS to pass in at the top level to select the
operation(s) desired. They represent what will happen when we've separated
the altcalls out of the general alternative instructions infrastructure,
although in the short term we still need to synthesise the force parameter for
_apply_alternatives().
Move two externs to reduce their scope a little.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 19 Apr 2025 21:38:23 +0000 (22:38 +0100)]
x86/altcall: Rename alternative_branches() to boot_apply_alt_calls()
The alternatives APIs are not great; rename alternative_branches() to be more
precise. Centralise the declaration in xen/alternative-call.h, in the
expectation that x86 won't be the only user in the long term.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Wed, 5 Mar 2025 10:53:20 +0000 (11:53 +0100)]
xen: remove -N from the linker command line
It's unclear why -N is being used in the first place. It was added by
commit 4676bbf96dc8 back in 2002 without any justification.
When building a PE image it's actually detrimental to forcefully set the
.text section as writable. The GNU LD man page contains the following
warning regarding the -N option:
> Note: Although a writable text section is allowed for PE-COFF targets, it
> does not conform to the format specification published by Microsoft.
Remove the usage of -N uniformly on all architectures, assuming that the
addition was simply done as a copy and paste of the original x86 linking
rune.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Julien Grall <jgrall@amazon.com>
x86/intel: workaround several MONITOR/MWAIT errata
There are several errata on Intel regarding the usage of the MONITOR/MWAIT
instructions, all having in common that stores to the monitored region
might not wake up the CPU.
Fix them by forcing the sending of an IPI for the affected models.
The Ice Lake issue has been reproduced internally on XenServer hardware,
and the fix does seem to prevent it. The symptom was APs getting stuck in
the idle loop immediately after bring up, which in turn prevented the BSP
from making progress. This would happen before the watchdog was
initialized, and hence the whole system would get stuck.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
A randconfig job failed with the following issue:
riscv64-linux-gnu-ld: Xen too large for early-boot assumptions
The reason is that enabling the UBSAN config increased the size of
the Xen binary.
Increase XEN_VIRT_SIZE to reserve enough space, allowing both UBSAN
and GCOV to be enabled together, with some slack for future growth.
Additionally, add checks to verify that XEN_VIRT_START is 1GB-aligned
and XEN_VIRT_SIZE is 2MB-aligned to reduce the number of page tables
needed for the initial mapping. In the future, when 2MB mappings are
used for .text (rx), .rodata (r), and .data (rw), this will also help
reduce TLB pressure.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
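Build-time alignment checks of this kind can be expressed with static assertions. In the sketch below the EXAMPLE_* values are placeholders chosen for illustration and do not reflect the actual RISC-V layout; the macro names are likewise hypothetical.

```c
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)
#define GB(x) ((uint64_t)(x) << 30)

/* True when val is a multiple of align (align must be a power of two). */
#define IS_ALIGNED_SKETCH(val, align) (((val) & ((align) - 1)) == 0)

/* Placeholder values for illustration only; not Xen's actual layout. */
#define EXAMPLE_VIRT_START GB(3)
#define EXAMPLE_VIRT_SIZE  MB(16)

_Static_assert(IS_ALIGNED_SKETCH(EXAMPLE_VIRT_START, GB(1)),
               "start must be 1GB-aligned");
_Static_assert(IS_ALIGNED_SKETCH(EXAMPLE_VIRT_SIZE, MB(2)),
               "size must be 2MB-aligned");
```

A misaligned constant then fails the build instead of silently needing extra page tables at boot.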
Jan Beulich [Wed, 23 Apr 2025 07:39:44 +0000 (09:39 +0200)]
x86/EFI: correct mkreloc header (field) reading
With us now reading the full combined optional and NT headers, the
subsequent reading of (and seeking to) NT header fields is wrong. Since
PE32 and PE32+ NT headers are different anyway (beyond the image base
oddity extending across both headers), switch to using a union. This
then allows fetching the image base more directly.
Additionally add checking to map_section(), which would have caught at
least the wrong (zero) image size that we previously used.
Fixes: f7f42accbbbb ("x86/efi: Use generic PE/COFF structures") Reported-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Condition coverage, also known as MC/DC (modified condition/decision
coverage) is a coverage metric that tracks separate outcomes in
boolean expressions.
This patch adds CONFIG_CONDITION_COVERAGE option to enable MC/DC for
GCC. Clang is not supported right now because Xen can't emit version
10 of LLVM profile data, where MC/DC support was added.
Also, use the opportunity to convert COV_FLAGS to cov-cflags-y, which
reduces the number of ifeqs in Rules.mk. Otherwise this patch would have had
to add another nesting level with "ifeq ($(CONFIG_CONDITION_COVERAGE),y)".
Add test runner script qemu-xtf.sh which allows any XTF x86 test to be
easily executed. Test runner is invoked from the qemu-smoke* jobs with the
hardcoded parameters.
Each x86 XTF job lead time is reduced a bit since only the test-related code
is built, not the entire XTF project.
Add .gitignore to avoid committing test artifacts by mistake.
Andrew Cooper [Mon, 21 Apr 2025 15:31:17 +0000 (16:31 +0100)]
x86/alternative: Clean up headers
alternative.h doesn't need lib.h now that macros.h exists. Furthermore, STR()
is already the prevailing style, so convert the final __stringify() to drop
stringify.h too.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Tue, 22 Apr 2025 09:25:48 +0000 (11:25 +0200)]
xenstored: Remove setjmp.h
Use of setjmp/longjmp was removed in 2006, but the include remained.
Remove it now.
Fixes: 1bac3b49cdd4 ("Import the current version of talloc from the Samba 3 source base") Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Tue, 22 Apr 2025 09:25:23 +0000 (11:25 +0200)]
compat/memory: avoid UB shifts in XENMEM_exchange handling
Add an early basic check, yielding the same error code as the more
thorough one the main handler would produce.
Fixes: b8a7efe8528a ("Enable compatibility mode operation for HYPERVISOR_memory_op") Reported-by: Manuel Andreas <manuel.andreas@tum.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Daniel P. Smith [Tue, 22 Apr 2025 09:24:57 +0000 (11:24 +0200)]
x86/boot: add cmdline to struct boot_domain
Add a container for the "cooked" command line for a domain. This
provides for the backing memory to be directly associated with the
domain being constructed. This is done in anticipation that the domain
construction path may need to be invoked multiple times, thus ensuring
each instance has a distinct memory allocation.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Alejandro Vallejo <agarciav@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Denis Mukhin <dmukhin@ford.com>
Jan Beulich [Tue, 22 Apr 2025 09:24:20 +0000 (11:24 +0200)]
x86emul: also clip repetition count for STOS
Like MOVS, INS, and OUTS, STOS also has a special purpose hook, where
the hook function may legitimately have the same expectation as to the
request not straddling address space start/end.
Fixes: 5dfe4aa4eeb6 ("x86_emulate: Do not request emulation of REP instructions beyond the") Reported-by: Fabian Specht <f.specht@tum.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 22 Apr 2025 09:23:41 +0000 (11:23 +0200)]
Arm: rename smp_clear_cpu_maps()
The function has lost all clearing operations. Use the commonly
available name (declared in xen/smp.h), that x86 also uses. This then
also addresses a Misra C:2012 rule 8.6 violation (not really covered
by the deviation we have).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Move code for processing DT IOMMU specifier to a separate helper.
This helper will be re-used for adding PCI devices by the subsequent
patches as we will need exactly the same actions for processing
DT PCI-IOMMU specifier.
Jason Andryuk [Wed, 16 Apr 2025 21:29:11 +0000 (17:29 -0400)]
xen/arm: Add capabilities to dom0less
Add a capabilities property to dom0less to allow building a
disaggregated system. Only a single hardware domain and single xenstore
domain can be specified. Multiple control domains are possible.
Introduce bootfdt.h to contain these constants.
When using the hardware or xenstore capabilities, adjust the grant and
event channel limits similar to dom0.
For a hardware domain, disallow specifying "vpl011", "nr_spis",
"multiboot,device-tree" and "passthrough" nodes. Also, require an IOMMU
when not direct-mapped.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jason Andryuk [Wed, 16 Apr 2025 21:29:09 +0000 (17:29 -0400)]
tools/init-dom0less: Only seed legacy xenstore grants
The hardware domain is unable to seed a control domain, but we want the
control domain to use xenstore. Rely on the hypervisor to seed dom0less
grant table entries for Xenstore, so this seeding is unnecessary.
However, that only works for the new xenstore late init. The legacy
protocol which uses init-dom0less to populate the page still needs to
seed the grant.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jason Andryuk [Wed, 16 Apr 2025 21:29:08 +0000 (17:29 -0400)]
xen/arm: dom0less seed xenstore grant table entry
xenstored maps other domains' xenstore pages. Currently this relies on
init-dom0less or xl to seed the grants from Dom0. With split
hardware/control/xenstore domains, this is problematic since we don't
want the hardware domain to be able to map other domains' resources
without their permission. Instead have the hypervisor seed the grant
table entry for every dom0less domain. The grant is then accessible as
normal.
C xenstored uses grants, so it can map the xenstore pages from a
non-dom0 xenstore domain. OCaml xenstored uses foreign mappings, so it
can only run from a privileged domain (dom0).
Add a define to indicate the late alloc xenstore PFN, to better indicate
what is being checked. Use UINT64_MAX instead of ~0ULL as the HVM_PARAM
field is a uint64_t. UINT64_MAX is not defined, so add it.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jason Andryuk [Wed, 16 Apr 2025 21:29:07 +0000 (17:29 -0400)]
xen/arm: dom0less delay xenstore initialization
To allocate the xenstore event channel and initialize the grant table
entry, the xenstore domid is needed. A dom0 is created before the domUs,
so it is normally available through hardware_domain. With capabilities
and dom0less, the xenstore domain may not be created first.
Keep the population of the page and HVM_PARAM_STORE_PFN in the normal
domain construction, but delay event channel creation and grant seeding
to after all domUs are created. HVM_PARAM_STORE_PFN now serves as an
indication to set up xenstore since the device tree is no longer
immediately available. 0 means no xenstore. ~0ULL means legacy so only
the event channel needs setup, and any other value means to seed the
page.
dom0 needs to set xs_domid when it is serving as the xenstore domain.
The domain running xenstored needs to be the handler for VIRQ_DOM_EXC,
so set that as well - it otherwise defaults to hardware domain.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
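The three-way meaning of the HVM_PARAM_STORE_PFN value can be sketched as a small classifier. The enum and function names are hypothetical; only the mapping of values to actions is taken from the description above.

```c
#include <stdint.h>

enum xs_setup {
    XS_NONE,        /* 0: no xenstore for this domain */
    XS_LEGACY,      /* ~0ULL: only the event channel needs setting up */
    XS_SEED_PAGE,   /* any other value: seed the page */
};

static inline enum xs_setup classify_store_pfn(uint64_t pfn)
{
    if ( pfn == 0 )
        return XS_NONE;
    if ( pfn == UINT64_MAX )    /* same value as ~0ULL */
        return XS_LEGACY;
    return XS_SEED_PAGE;
}
```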
Jason Andryuk [Wed, 16 Apr 2025 21:29:06 +0000 (17:29 -0400)]
xen/arm: dom0less hwdom construction
When creating a hardware domain, have the dom0less code call
construct_hwdom() which is shared with the dom0 code. The hardware
domain requires building that best matches the dom0 build path. Re-use
it to keep them in sync.
The device tree node of the dom0less config is now passed into
construct_hwdom(). dom0 uses /chosen for process_shm while a hwdom will
use the value from its dom0less device tree node.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Daniel P. Smith [Wed, 16 Apr 2025 21:29:05 +0000 (17:29 -0400)]
xen: introduce hardware domain create flag
Add and use a new internal create domain flag to specify the hardware
domain. This removes the hardcoding of domid 0 as the hardware domain.
This allows more flexibility with domain creation.
The assignment of d->cdf is moved later so CDF_hardware is added for the
late_hwdom case. Also old_hwdom has the flag removed to reflect the
change.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Fri, 28 Mar 2025 11:19:23 +0000 (11:19 +0000)]
x86: Drop asm/byteorder.h
With the common code moved fully onto xen/byteorder.h, clean up the dregs.
It turns out that msi.h has not needed byteorder.h since the use of
__{BIG,LITTLE}_ENDIAN_BITFIELD was dropped in commit d58f3941ce3f ("x86/MSI:
use standard C types in structures/unions").
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Lin Liu [Thu, 21 Oct 2021 02:52:39 +0000 (02:52 +0000)]
xen/decompressors: Remove use of *_to_cpup() helpers
These wrappers simply hide a dereference, which adds to the cognitive complexity
of reading the code. As such, they're not going to be included in the new
byteswap infrastructure.
No functional change.
Signed-off-by: Lin Liu <lin.liu@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Lin Liu [Thu, 21 Oct 2021 02:52:39 +0000 (03:52 +0100)]
xen/device-tree: Remove use of *_to_cpup() helpers
These wrappers simply hide a dereference, which adds to the cognitive complexity
of reading the code. As such, they're not going to be included in the new
byteswap infrastructure.
No functional change.
Signed-off-by: Lin Liu <lin.liu@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Bertrand Marquis <bertrand.marquis@arm.com>
Lin Liu [Wed, 20 Oct 2021 04:29:46 +0000 (04:29 +0000)]
xen/lib: Switch to xen/byteorder.h
In divmod.c, additionally swap xen/lib.h for xen/macros.h as only ABS() is
needed.
In find-next-bit.c, ext2 has nothing to do with this logic. It was a local
modification when the logic was imported from Linux, because Xen didn't have a
suitable helper at the time.
The new infrastructure does have a suitable primitive, so use it.
No functional change.
Signed-off-by: Lin Liu <lin.liu@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Lin Liu [Mon, 9 May 2022 05:47:10 +0000 (01:47 -0400)]
xen: Implement common byte{order,swap}.h
The current swab??() infrastructure is unnecessarily complicated, and can be
replaced entirely with compiler builtins.
All supported compilers provide __BYTE_ORDER__ and __builtin_bswap??().
Nothing in Xen cares about the values of __{BIG,LITTLE}_ENDIAN; just that one
of them is defined. Therefore, centralise their definitions in xen/config.h.
Signed-off-by: Lin Liu <lin.liu@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
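A minimal sketch of how byte-swap and byte-order helpers can be built purely from compiler builtins, as the message above describes. The `example_*` names are illustrative assumptions and do not reflect the contents of the actual xen/byteorder.h or xen/byteswap.h; only `__BYTE_ORDER__`, `__ORDER_*_ENDIAN__` and `__builtin_bswap??()` are taken from the compiler.

```c
#include <stdint.h>

/* Unconditional swaps, expressed directly with compiler builtins. */
static inline uint16_t example_bswap16(uint16_t v) { return __builtin_bswap16(v); }
static inline uint32_t example_bswap32(uint32_t v) { return __builtin_bswap32(v); }
static inline uint64_t example_bswap64(uint64_t v) { return __builtin_bswap64(v); }

/* Endian-aware conversions: swap only when the target order differs. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define example_cpu_to_be32(x) example_bswap32(x)
# define example_be32_to_cpu(x) example_bswap32(x)
# define example_cpu_to_le32(x) ((uint32_t)(x))
# define example_le32_to_cpu(x) ((uint32_t)(x))
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define example_cpu_to_be32(x) ((uint32_t)(x))
# define example_be32_to_cpu(x) ((uint32_t)(x))
# define example_cpu_to_le32(x) example_bswap32(x)
# define example_le32_to_cpu(x) example_bswap32(x)
#else
# error "Unknown byte order"
#endif
```

Because the selection is made once from `__BYTE_ORDER__`, no per-architecture header is needed in this sketch.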
Jan Beulich [Thu, 17 Apr 2025 08:01:19 +0000 (10:01 +0200)]
x86/HVM: update repeat count upon nested lin->phys failure
For the X86EMUL_EXCEPTION case the repeat count must be correctly
propagated back. Since for the recursive invocation we use a local
helper variable, its value needs copying to the caller's one.
While there also correct the off-by-1 range in the comment ahead of the
function (strictly speaking for the "DF set" case we'd need to put
another, different range there as well).
Fixes: 53f87c03b4ea ("x86emul: generalize exception handling for rep_* hooks") Reported-by: Manuel Andreas <manuel.andreas@tum.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 17 Apr 2025 08:00:41 +0000 (10:00 +0200)]
Arm: cpu_*_map adjustments
First, they all start out zeroed. There's no point doing an initial
cpumask_clear() on them.
Next, only cpu_online_map may be altered post-boot, and even that only
rarely. Add respective placement attributes.
Finally, cpu_present_map really isn't anything more than an alias of
cpu_possible_map. Avoid the copying, and have the linker provide the
symbol (if needed in the first place; it is needed right now as
common code references the symbol).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
EFI: Avoid crash calling PrintErrMesg from efi_multiboot2
Although the code is compiled with the -fpic option, data is not position
independent. This causes data pointers to become invalid if the
code is not relocated properly, which is what happens for
efi_multiboot2, which is called by the multiboot entry code.
Tested by adding
PrintErrMesg(L"Test message", EFI_BUFFER_TOO_SMALL);
in efi_multiboot2 before calling efi_arch_edd (this function
can potentially call PrintErrMesg).
After the patch:
Booting `XenServer (Serial)'Booting `XenServer (Serial)'
Test message: Buffer too small
BdsDxe: loading Boot0000 "UiApp" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(462CAA21-7614-4503-836E-8AB6F4662331)
BdsDxe: starting Boot0000 "UiApp" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(462CAA21-7614-4503-836E-8AB6F4662331)
Andrew Cooper [Tue, 15 Apr 2025 12:49:01 +0000 (13:49 +0100)]
CI: Build with --prefix=/usr rather than setting LD_LIBRARY_PATH
This moves executables too.
I'm not sure why xilinx-smoke-dom0-x86_64.sh was overriding PATH too, as
/usr/local is clearly in PATH given the other tests, but drop that too.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
is_xen_heap_page() and is_xen_heap_mfn() are not implemented for arm32 MPU.
Thus, introduce the stubs for these functions in asm/mpu/mm.h and move the
original code to asm/mmu/mm.h (as it is used for arm32 MMU based system).
xen/arm32: Create the same boot-time MPU regions as arm64
Create Boot-time MPU protection regions (similar to Armv8-R AArch64) for
Armv8-R AArch32.
Also, define *_PRBAR macros for arm32. The only difference from arm64 is that
XN is 1-bit for arm32.
Define the system registers and macros in mpu/cpregs.h.
Introduce WRITE_SYSREG_ASM() to write to system registers in assembly.
x86/mm: account for the offset when performing subpage r/o MMIO access
The current logic in subpage_mmio_write_emulate() doesn't take into account
the page offset, and always performs the writes at offset 0 (start of the
page).
Fix this by accounting for the offset before performing the write.
Fixes: 8847d6e23f97 ('x86/mm: add API for marking only part of a MMIO page read only') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
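The class of bug described above can be sketched as follows. This is not the code of subpage_mmio_write_emulate(); the function name, the plain byte buffer standing in for the mapped MMIO page, and the 4 KiB page size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE 4096u

static_assert((EXAMPLE_PAGE_SIZE & (EXAMPLE_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two for the mask below");

/*
 * Hypothetical emulated write: 'mapped_page' stands in for the mapping of
 * the page containing 'addr'.  The offset within the page is derived from
 * the address and added to the mapping base, rather than always writing
 * at the start of the page.
 */
static bool example_subpage_write(uint8_t *mapped_page, uint64_t addr,
                                  const void *data, unsigned int len)
{
    unsigned int offset = addr & (EXAMPLE_PAGE_SIZE - 1);

    /* Refuse writes that would run past the end of the page. */
    if ( len > EXAMPLE_PAGE_SIZE - offset )
        return false;

    memcpy(mapped_page + offset, data, len);
    return true;
}
```

A write to address 0x1008 therefore lands at byte 8 of the mapping, whereas logic that ignored the offset would have modified byte 0.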
xen/config.h: Move BITS_PER_* definitions from asm/config.h to xen/config.h
BITS_PER_* values can be defined in a common way using compiler-provided macros.
Thus, these definitions are moved to xen/config.h to reduce duplication across
architectures.
Additionally, *_BYTEORDER macros are removed, as BITS_PER_* values now come
directly from the compiler environment.
The arch_fls() implementation for Arm and PPC is updated to use BITS_PER_INT
instead of a hardcoded value of 32.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
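A rough sketch of deriving bit widths from compiler-provided macros, and of an fls() built on them instead of a hardcoded 32. The `EXAMPLE_*` and `example_fls` names are hypothetical, and the exact definitions in xen/config.h may differ; `__CHAR_BIT__`, `__SIZEOF_INT__` and `__SIZEOF_LONG__` are predefined by GCC and Clang.

```c
#include <assert.h>
#include <limits.h>

/* Widths taken from the compiler environment, not per-arch headers. */
#define EXAMPLE_BITS_PER_BYTE  __CHAR_BIT__
#define EXAMPLE_BITS_PER_INT   (EXAMPLE_BITS_PER_BYTE * __SIZEOF_INT__)
#define EXAMPLE_BITS_PER_LONG  (EXAMPLE_BITS_PER_BYTE * __SIZEOF_LONG__)

static_assert(EXAMPLE_BITS_PER_BYTE == CHAR_BIT,
              "compiler macro should agree with <limits.h>");

/*
 * Find last set: 1-based index of the most significant set bit, or 0 when
 * no bit is set.  __builtin_clz() is undefined for 0, hence the check.
 */
static inline unsigned int example_fls(unsigned int x)
{
    return x ? EXAMPLE_BITS_PER_INT - __builtin_clz(x) : 0;
}
```

Using the derived width keeps the helper correct on any target where `unsigned int` is not 32 bits wide.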
This avoids the need to re-compress it in every test job. This saves minutes
of wallclock time.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>