Oleksii Kurochko [Thu, 10 Oct 2024 08:54:46 +0000 (10:54 +0200)]
xen/riscv: implement virt_to_maddr()
Implement the virt_to_maddr() function to convert virtual addresses
to machine addresses. The function includes checks for valid address
ranges, specifically the direct mapping region (DIRECTMAP_VIRT_START)
and Xen's linkage (XEN_VIRT_START) region. If the virtual address
falls outside of these regions, an assertion will trigger.
To implement this, the phys_offset variable is made accessible
outside of riscv/mm.c.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
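The two-region check described above can be sketched as follows. This is only an illustration: the layout constants, the assumption that the direct map starts at machine address 0, and the name virt_to_maddr_sketch() are all hypothetical and not taken from the patch. Only phys_offset, DIRECTMAP_VIRT_START and XEN_VIRT_START are names from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants, chosen only for this illustration. */
#define DIRECTMAP_VIRT_START 0xffffffc000000000ULL
#define DIRECTMAP_SIZE       0x0000002000000000ULL
#define XEN_VIRT_START       0xffffffffc0000000ULL
#define XEN_VIRT_SIZE        0x0000000000200000ULL

/* Assumed meaning: (machine load address of Xen) - XEN_VIRT_START. */
static uint64_t phys_offset;

static uint64_t virt_to_maddr_sketch(uint64_t va)
{
    /* Direct mapping region: assumed here to map machine address 0 upwards. */
    if (va >= DIRECTMAP_VIRT_START &&
        va - DIRECTMAP_VIRT_START < DIRECTMAP_SIZE)
        return va - DIRECTMAP_VIRT_START;

    /* Anything else must lie in Xen's linkage region, or the assertion fires. */
    assert(va >= XEN_VIRT_START && va - XEN_VIRT_START < XEN_VIRT_SIZE);

    /* Unsigned wrap-around makes this correct even if phys_offset "is negative". */
    return va + phys_offset;
}
```

With phys_offset set for a load address of 0x80200000, a linkage address of XEN_VIRT_START + 0x1000 would translate to 0x80201000 in this sketch.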
Jan Beulich [Thu, 10 Oct 2024 08:54:15 +0000 (10:54 +0200)]
x86: restore semicolon after explicit DS prefix
It's not unnecessary (as the earlier commit claimed): The integrated
assembler of Clang up to 11 complains about an "invalid operand for
instruction".
Fixes: b42cf31d1165 ("x86: use alternative_input() in cache_flush()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Sat, 13 Jul 2024 16:50:30 +0000 (17:50 +0100)]
CI: Stop building QEMU in general
We spend an awful lot of CI time building QEMU, even though most changes don't
touch the subset of tools/libs/ used by QEMU. Some numbers taken at a time
when CI was otherwise quiet:
                       With     Without
  Alpine:              13m38s   6m04s
  Debian 12:           10m05s   8m10s
  OpenSUSE Tumbleweed: 11m40s   7m54s
  Ubuntu 24.04:        14m56s   8m06s
which is a >50% improvement in wallclock time in some cases.
The only build we have that needs QEMU is alpine-3.18-gcc-debug. This is the
build deployed and used by the QubesOS ADL-* and Zen3p-* jobs.
Xilinx-x86_64 deploys it too, but is PVH-only and doesn't use QEMU.
QEMU is also built by CirrusCI for FreeBSD (fully Clang/LLVM toolchain).
This should help quite a lot with Gitlab CI capacity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Wed, 9 Oct 2024 07:56:43 +0000 (09:56 +0200)]
types: replace remaining uses of s64
... and move the type itself to linux-compat.h. An exception being
arch/arm/arm64/cpufeature.c and arch/arm/include/asm/arm64/cpufeature.h,
which are to use linux-compat.h instead (the former by including the
latter).
While doing so
- correct the type of union uu's uq field in lib/divmod.c,
- switch a few adjacent types as well, for (a little bit of)
consistency.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monné [Wed, 9 Oct 2024 07:55:38 +0000 (09:55 +0200)]
x86/msr: add log messages to MSR state load error paths
Some error paths in the MSR state loading logic don't contain error messages,
which makes debugging them quite hard without adding extra patches to print the
information.
Add two new log messages to the MSR state load path that print information
about the entry that failed to load, for both PV and HVM.
While there also adjust XEN_DOMCTL_set_vcpu_msrs to return -ENXIO in case the
MSR is unhandled or can't be loaded, so it matches the error code used by HVM
MSR loading (and it's less ambiguous than -EINVAL).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Matthew Barnes [Wed, 9 Oct 2024 07:54:48 +0000 (09:54 +0200)]
x86/APIC: Switch flat driver to use phys dst for ext ints
External interrupts via logical delivery mode in xAPIC do not benefit
from targeting multiple CPUs and instead simply bloat up the vector
space.
However the xAPIC flat driver currently uses logical delivery for
external interrupts.
This patch switches the xAPIC flat driver to use physical destination
mode for external interrupts, instead of logical destination mode.
This patch also applies the following non-functional changes:
- Remove now unused logical flat functions
- Expand GENAPIC_FLAT and GENAPIC_PHYS macros, and delete them.
Resolves: https://gitlab.com/xen-project/xen/-/issues/194 Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Tue, 8 Oct 2024 12:37:53 +0000 (14:37 +0200)]
x86/domctl: fix maximum number of MSRs in XEN_DOMCTL_{get,set}_vcpu_msrs
Since the addition of the MSR_AMD64_DR{0-3}_ADDRESS_MASK MSRs to the
msrs_to_send array, the calculation of the maximum number of MSRs that
the hypercall can handle is off by 4.
Remove the addition of 4 to the maximum number of MSRs that
XEN_DOMCTL_{set,get}_vcpu_msrs supports, as those are already part of the
array.
A further adjustment could be to subtract 4 from the maximum size if the DBEXT
CPUID feature is not exposed to the guest, but guest_{rd,wr}msr() will already
perform that check when fetching or loading the MSRs. The maximum array size
is used to indicate to the caller the size of the buffer it needs to allocate
in the get case, and as early input sanitisation in the set case; using a
buffer size slightly larger than required is not an issue.
Fixes: 86d47adcd3c4 ('x86/msr: Handle MSR_AMD64_DR{0-3}_ADDRESS_MASK in the new MSR infrastructure') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Refactor the code to avoid an implicit fallthrough and address
a violation of MISRA C:2012 Rule 16.3: "An unconditional `break'
statement shall terminate every switch-clause".
Add defensive code after unreachable program points.
This also meets the requirements to deviate violations of MISRA C:2012
Rule 16.3: "An unconditional `break' statement shall terminate every
switch-clause".
Signed-off-by: Federico Serafini <federico.serafini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
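The shape of such a refactoring can be sketched as below. The enum, the function, and the return values are hypothetical and not taken from the patch. The plain assert() merely stands in for Xen's ASSERT_UNREACHABLE(), and the statements after it are the "defensive code" for builds where assertions are compiled out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations, used only to give the switch something to do. */
enum op { OP_READ, OP_WRITE, OP_RESET };

static int handle_op(enum op op, int *state)
{
    int rc;

    assert(state != NULL);

    switch (op) {
    case OP_READ:
        rc = *state;
        break;              /* every clause ends in an unconditional break */

    case OP_WRITE:
        *state = 1;
        rc = 0;
        break;

    case OP_RESET:
        *state = 0;
        rc = 0;
        break;

    default:
        assert(0 && "unreachable");  /* stand-in for ASSERT_UNREACHABLE() */
        rc = -1;                     /* defensive value if assertions are off */
        break;
    }

    return rc;
}
```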
Jan Beulich [Tue, 8 Oct 2024 12:36:27 +0000 (14:36 +0200)]
ioreq: don't wrongly claim "success" in ioreq_send_buffered()
Returning a literal number is a bad idea anyway when all other returns
use IOREQ_STATUS_* values. The function is dead on Arm, and mapping to
X86EMUL_OKAY is surely wrong on x86.
Fixes: f6bf39f84f82 ("x86/hvm: add support for broadcast of buffered ioreqs...") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Tue, 2 Jul 2024 16:40:11 +0000 (17:40 +0100)]
CI: Drop bin86/dev86 from archlinux container
These packages have moved out of main to AUR, and are not easily accessible
any more. Drop them, because they're only needed for RomBIOS which is very
legacy these days.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Juergen Gross [Sat, 5 Oct 2024 15:15:47 +0000 (17:15 +0200)]
build: move xenlibs-dependencies make definition to uselibs.mk
In order to be able to use the xenlibs-dependencies macro from stubdom
build, move it to tools/libs/uselibs.mk, which is included from
current users and stubdom/Makefile.
No functional change intended.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Roger Pau Monné [Mon, 7 Oct 2024 09:10:21 +0000 (11:10 +0200)]
x86/dpci: do not leak pending interrupts on CPU offline
The current dpci logic relies on a softirq being executed as a side effect of
the cpu_notifier_call_chain() call in the code path that offlines the target
CPU. However the call to cpu_notifier_call_chain() won't trigger any softirq
processing, and even if it did, such processing should be done after all
interrupts have been migrated off the current CPU, otherwise new pending dpci
interrupts could still appear.
Currently the ASSERT() in the cpu callback notifier is fairly easy to trigger
by doing CPU offline from a PVH dom0.
Solve this by instead moving out any dpci interrupts pending processing once
the CPU is dead. This might introduce more latency than attempting to drain
before the CPU is put offline, but it's less complex, and CPU online/offline is
not a common action. Any extra introduced latency should be tolerable.
Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Use agreed syntax for pseudo-keyword fallthrough to meet the
requirements to deviate a violation of MISRA C:2012 Rule 16.3:
"An unconditional `break' statement shall terminate every
switch-clause".
No functional change.
Signed-off-by: Federico Serafini <federico.serafini@bugseng.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
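A minimal sketch of a fallthrough pseudo-keyword follows, assuming a compiler that offers the GNU fallthrough statement attribute and falling back to an empty statement otherwise. The macro definition and pages_for_order() are illustrative assumptions, not the definition used in the tree.

```c
#include <assert.h>

/* Assumption: use the statement attribute where the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)   /* harmless fallback */
#endif

/* Hypothetical helper: number of pages for an allocation order of 0..2. */
static unsigned int pages_for_order(unsigned int order)
{
    unsigned int pages = 0;

    switch (order) {
    case 2:
        pages += 2;
        fallthrough;        /* intentional, and visible to tools and readers */
    case 1:
        pages += 1;
        fallthrough;
    case 0:
        pages += 1;
        break;

    default:
        assert(0 && "unexpected order");
        break;
    }

    return pages;
}
```

Marking each intentional fall-through explicitly is what lets a checker tell it apart from a forgotten break.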
xen/gnttab: address a violation of MISRA C Rule 13.6
guest_handle_ok()'s expansion contains a sizeof() involving its
first argument guest_handle_cast().
The expansion of the latter, in turn, contains a variable
initialization.
Since MISRA considers the initialization (even of a local variable)
a side effect, the chain of expansions mentioned above violates
MISRA C:2012 Rule 13.6 (The operand of the `sizeof' operator shall not
contain any expression which has potential side effect).
Refactor the code to address the rule violation.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Federico Serafini <federico.serafini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
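The general pattern behind this kind of fix can be illustrated as below: do the conversion that involves an initialization in its own statement, and hand only a plain identifier to the macro that applies sizeof. range_fits() and check_entries() are hypothetical stand-ins, not the real guest handle macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range check that applies sizeof to its first argument. */
#define range_fits(ptr, nr, limit) ((nr) <= (limit) / sizeof(*(ptr)))

static int check_entries(void *raw, size_t nr, size_t limit_bytes)
{
    /* The conversion (with its initialization) happens here, on its own... */
    uint64_t *entries = raw;

    assert(raw != NULL);

    /* ...so the operand of sizeof is a side-effect-free identifier. */
    return range_fits(entries, nr, limit_bytes);
}
```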
guest_handle_ok()'s expansion contains a sizeof() involving its
first argument which is guest_handle_cast().
The expansion of the latter, in turn, contains a variable
initialization.
Since MISRA considers the initialization (even of a local variable)
a side effect, the chain of expansions mentioned above violates
MISRA C:2012 Rule 13.6 (The operand of the `sizeof' operator shall not
contain any expression which has potential side effect).
Refactor the code to address the rule violation.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Federico Serafini <federico.serafini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Fri, 4 Oct 2024 13:27:02 +0000 (14:27 +0100)]
CI: Fix builds following qemu-xen update
A recent update to qemu-xen has bumped the build requirements, with Python 3.8
being the new baseline but also needing the 'ensurepip' and 'tomllib/tomli'
packages.
* Ubuntu/Debian package 'ensurepip' separately, but it can be obtained by
installing the python3-venv package.
* 'tomllib' was added to the python standard library in Python 3.11, but
previously it was a separate package named 'tomli'.
In terms of changes required to build QEMU:
* Ubuntu 24.04 (Noble) has Python 3.12 so only needs python3-venv
* Ubuntu 22.04 (Jammy) has Python 3.10 but does have a python3-tomli package
that QEMU is happy with.
* FreeBSD has Python 3.9, but Python 3.11 is available.
In terms of exclusions:
* Ubuntu 20.04 (Focal) has Python 3.8, but lacks any kind of tomli package.
* Fedora 29 (Python 3.7), OpenSUSE Leap 15.6 (Python 3.6), and Ubuntu
18.04/Bionic (Python 3.6) are now too old.
Detecting tomllib/tomli is more than can fit in build's one-liner, so break it
out into a proper script.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 18 Sep 2020 15:50:15 +0000 (16:50 +0100)]
x86: Introduce X86_ET_* constants in x86-defns.h
The FRED spec architecturalises the Event Type encoding, previously exposed
only in VMCB/VMCS fields.
Introduce the constants in x86-defns.h, making them a bit more concise, and
retire enum x86_event_type.
Take the opportunity to introduce X86_ET_OTHER. Its absence appears to be a
bug in Introspection's Monitor Trap Flag support, when considering VECTORING
events during another VMExit.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
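For orientation, the event type encoding used in the VMCS/VMCB event injection fields looks roughly like this. Only X86_ET_OTHER is named in the commit message; the other constant spellings and the helper et_is_software() are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Event type encodings; value 1 is reserved. Names other than
 * X86_ET_OTHER are assumed spellings for this illustration. */
#define X86_ET_EXT_INTR    0 /* External interrupt */
#define X86_ET_NMI         2 /* NMI */
#define X86_ET_HW_EXC      3 /* Hardware exception */
#define X86_ET_SW_INT      4 /* Software interrupt (INT $n) */
#define X86_ET_PRIV_SW_EXC 5 /* Privileged software exception (ICEBP/INT1) */
#define X86_ET_SW_EXC      6 /* Software exception (INT3, INTO) */
#define X86_ET_OTHER       7 /* Other event, e.g. Monitor Trap Flag */

/* Hypothetical helper: does the event originate from a software instruction? */
static bool et_is_software(unsigned int type)
{
    assert(type <= X86_ET_OTHER);

    return type == X86_ET_SW_INT ||
           type == X86_ET_PRIV_SW_EXC ||
           type == X86_ET_SW_EXC;
}
```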
Andrew Cooper [Wed, 2 Oct 2024 19:59:19 +0000 (20:59 +0100)]
x86/boot: Convert remaining uses of the legacy ALIGN
There are only two remaining standalone uses of the legacy ALIGN macro.
Drop these by switching the .incbin's over to using FUNC()/END() which has
alignment handled internally. While the incbin's aren't technically one
single function, they behave as if they are.
Finally, expand ALIGN inside the legacy ENTRY() macro in order to remove ALIGN
itself.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
TEST_TIMEOUT is set as a CI/CD project variable, as it should be, to
match the capability and speed of the testing infrastructure.
As it turns out, TEST_TIMEOUT defined in test.yaml cannot override
TEST_TIMEOUT defined as CI/CD project variable. As a consequence, today
the TEST_TIMEOUT setting in test.yaml for the Xilinx jobs is ignored.
Instead, rename TEST_TIMEOUT to TEST_TIMEOUT_OVERRIDE in test.yaml and
check for TEST_TIMEOUT_OVERRIDE first in console.exp.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Thu, 3 Oct 2024 14:03:38 +0000 (15:03 +0100)]
x86/boot: Don't use INC to set defaults
__efi64_mb2_start() makes some bold assumptions about the efi_platform and
skip_realmode booleans. Set them to 1 explicitly, which is more robust.
Make the comment a little more concise.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Frediano Ziglio <frediano.ziglio@cloud.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
xen: move per-cpu area management into common code
Centralize per-cpu area management to reduce code duplication and
enhance maintainability across architectures.
The per-cpu area management code, which is largely common among
architectures, is moved to a shared implementation in
xen/common/percpu.c. This change includes:
* Remove percpu.c from the X86 and Arm architectures.
* For x86, define INVALID_PERCPU_AREAS and PARK_OFFLINE_CPUS_VAR.
* Drop the declaration of __per_cpu_offset[] from stubs.c in
PPC and RISC-V to facilitate the build of the common per-cpu code.
No functional changes for x86.
For Arm add support of CPU_RESUME_FAILED, CPU_REMOVE and freeing of
percpu in the case when system_state != SYS_STATE_suspend, however,
there is no change in behavior for Arm at this time.
Move the asm-generic/percpu.h definitions to xen/percpu.h, except for
__per_cpu_start[] and __per_cpu_data_end[], which are moved to
common/percpu.c as they are only used in common/percpu.c.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Frediano Ziglio [Tue, 1 Oct 2024 10:22:38 +0000 (11:22 +0100)]
x86/boot: Rewrite EFI/MBI2 code partly in C
No need to have it coded in assembly.
Declare efi_multiboot2 in a new header to reuse between implementations
and caller.
Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Wed, 2 Oct 2024 18:01:26 +0000 (19:01 +0100)]
stubdom: Fix newlib build with GCC-14
Based on a fix from OpenSUSE, but adjusted to be Clang-compatible too. Pass
-Wno-implicit-function-declaration library-wide rather than using local GCC
pragmas.
Fix of copy_past_newline() to avoid triggering -Wstrict-prototypes.
Jan Beulich [Wed, 2 Oct 2024 06:52:18 +0000 (08:52 +0200)]
x86: prefer RDTSCP in rdtsc_ordered()
If available, its use is supposed to be cheaper than LFENCE+RDTSC, and
is virtually guaranteed to be cheaper than MFENCE+RDTSC.
Update commentary (and indentation) while there.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Michal Orzel [Tue, 24 Sep 2024 08:29:23 +0000 (09:29 +0100)]
docs: fusa: Add Assumption of Use (AOU)
AoU are the assumptions that Xen relies on other components (e.g. platform,
domains) to fulfill its requirements. In our case, platform means
a combination of hardware, firmware and bootloader.
We have defined AoU in the intro.rst and added AoU for the generic
timer.
Also, fixed a requirement to denote that Xen shall **not** expose the
system counter frequency via the "clock-frequency" device tree property.
The reason being the device tree documentation strongly discourages the
use of this property. Further, if the "clock-frequency" is exposed, then
it overrides the value programmed in the CNTFRQ_EL0 register.
So, the frequency shall be exposed via the CNTFRQ_EL0 register only and
consequently there is an assumption on the platform to program the
register correctly.
Andrew Cooper [Tue, 1 Oct 2024 12:00:13 +0000 (13:00 +0100)]
x86/pv: Rename pv.iobmp_limit to iobmp_nr and clarify behaviour
Ever since its introduction in commit 013351bd7ab3 ("Define new event-channel
and physdev hypercalls") in 2006, the public interface was named nr_ports
while the internal field was called iobmp_limit.
Rename the internal field to iobmp_nr to match the public interface, and
clarify that, when nonzero, Xen will read 2 bytes.
There isn't a perfect parallel with a real TSS, but iobmp_nr being 0 is the
paravirt "no IOPB" case, and it is important that no read occurs in this case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 30 Sep 2024 15:20:29 +0000 (16:20 +0100)]
x86/pv: Handle #PF correctly when reading the IO permission bitmap
The switch statement in guest_io_okay() is a very expensive way of
pre-initialising x with ~0, and performing a partial read into it.
However, the logic isn't correct either.
In a real TSS, the CPU always reads two bytes (like here), and any TSS limit
violation turns silently into no-access. But, in-limit accesses trigger #PF
as usual. AMD document this property explicitly, and while Intel don't (so
far as I can tell), they do behave consistently with AMD.
Switch from __copy_from_guest_offset() to __copy_from_guest_pv(), like
everything else in this file. This removes code generation setting up
copy_from_user_hvm() (in the likely path even), and safety LFENCEs from
evaluate_nospec().
Change the logic to raise #PF if __copy_from_guest_pv() fails, rather than
disallowing the IO port access. This brings the behaviour better in line with
normal x86.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
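The bitmap check itself can be sketched as follows, under simplifying assumptions: the bitmap is an ordinary in-memory array rather than guest memory, so the #PF path is not modelled, and the caller provides at least one byte beyond the last byte covering iobmp_nr ports, so the 2-byte read stays in bounds. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum io_check { IO_OKAY, IO_DENIED };

/*
 * Decide whether an access of 'bytes' bytes at 'port' is permitted.
 * A set bit in the bitmap means "no access", as in a real TSS.
 */
static enum io_check io_port_allowed(const uint8_t *iobmp,
                                     unsigned int iobmp_nr,
                                     unsigned int port, unsigned int bytes)
{
    uint16_t x;
    unsigned int mask;

    assert(bytes == 1 || bytes == 2 || bytes == 4);

    /* iobmp_nr == 0 is the "no bitmap" case: deny without reading anything. */
    if (iobmp_nr == 0 || port + bytes > iobmp_nr)
        return IO_DENIED;

    /* Always read two bytes, so a range straddling a byte boundary is covered. */
    x = (uint16_t)(iobmp[port >> 3] | (iobmp[(port >> 3) + 1] << 8));

    /* One bit per port, starting at bit (port & 7) of the value read. */
    mask = (1u << bytes) - 1;

    return ((x >> (port & 7)) & mask) ? IO_DENIED : IO_OKAY;
}
```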
Andrew Cooper [Mon, 30 Sep 2024 15:09:51 +0000 (16:09 +0100)]
x86/pv: Rework guest_io_okay() to return X86EMUL_*
In order to fix a bug with guest_io_okay() (subsequent patch), rework
guest_io_okay() to take in an emulation context, and return X86EMUL_* rather
than a boolean.
For the failing case, take the opportunity to inject #GP explicitly, rather
than returning X86EMUL_UNHANDLEABLE. There is a logical difference between
"we know what this is, and it's #GP", vs "we don't know what this is".
There is no change in practice as emulation is the final step on general #GP
resolution, but returning X86EMUL_UNHANDLEABLE would be a latent bug if a
subsequent action were to appear.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 1 Oct 2024 07:47:32 +0000 (09:47 +0200)]
x86/MSR: improve code gen for rdmsr_safe() and rdtsc()
To fold two 32-bit outputs from the asm()-s into a single 64-bit value
the compiler needs to emit a zero-extension insn for the low half. Both
RDMSR and RDTSC clear the upper halves of their output registers anyway,
though. So despite that zero-extending insn (a simple MOV) being cheap,
we can do better: Without one, by declaring the local variables as 64-
bit ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/traps: Re-enable interrupts after reading cr2 in the #PF handler
Hitting a page fault clobbers %cr2, so if a page fault is handled while
handling a previous page fault then %cr2 will hold the address of the
latter fault rather than the former. In particular, if a debug key
handler happens to trigger during #PF and before %cr2 is read, and that
handler itself encounters a #PF, then %cr2 will be corrupt for the outer #PF
handler.
This patch makes the page fault path delay re-enabling IRQs until %cr2
has been read in order to ensure it stays consistent.
A similar argument holds in additional cases, but they happen to be safe:
* %dr6 inside #DB: Safe because IST exceptions don't re-enable IRQs.
* MSR_XFD_ERR inside #NM: Safe because AMX isn't used in #NM handler.
While in the area, remove redundant q suffix to a movq in entry.S and
the space after the comma.
Fixes: a4cd20a19073 ("[XEN] 'd' key dumps both host and guest state.") Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Taking a fault on a non-byte-granular insn means that the "number of
bytes not handled" return value would need extra care in calculating, if
we want callers to be able to derive e.g. exception context (to be
injected to the guest) - CR2 for #PF in particular - from the value. To
simplify things rather than complicating them, reduce inline assembly to
just byte-granular string insns. On recent CPUs that's also supposed to
be more efficient anyway.
For singular element accessors, however, alignment checks are added,
hence slightly complicating the code. Misaligned (user) buffer accesses
will now be forwarded to copy_{from,to}_guest_ll().
Naturally copy_{from,to}_unsafe_ll() accessors end up being adjusted the
same way, as they're produced by mere re-processing of the same code.
Otoh copy_{from,to}_unsafe() aren't similarly adjusted, but have their
comments made to match reality; down the road we may want to change their
return types, e.g. to bool.
Fixes: 76974398a63c ("Added user-memory accessing functionality for x86_64") Fixes: 7b8c36701d26 ("Introduce clear_user and clear_guest") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools: Add new function to do PIRQ (un)map on PVH dom0
When dom0 is PVH and a device is passed through to domU, xl uses
the gsi number of the device to do a pirq mapping (see
pci_add_dm_done->xc_physdev_map_pirq), but the gsi number is
read from the file /sys/bus/pci/devices/<sbdf>/irq, which confuses
irq and gsi: they are in different spaces and are not equal,
so the mapping fails.
To solve this issue, get the real gsi and add a new function
xc_physdev_map_pirq_gsi to get a free pirq for the gsi.
Note: the existing function xc_physdev_map_pirq is not used because
it doesn't support allocating a free pirq; moreover, to avoid
changing it and affecting its callers, xc_physdev_map_pirq_gsi
is added instead.
Besides, PVH dom0 doesn't have the PIRQs flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. So the grant call chain
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission
requires a pirq to be passed in, so it is not suitable for granting
irq permission on a PVH dom0 that doesn't have PIRQs.
To solve this issue, use another hypercall,
XEN_DOMCTL_gsi_permission, to grant permission for the irq
(translated from the gsi) to domU when dom0 has no PIRQs.
On PVH dom0, when passing through a device to domU, QEMU and the xl
tools want to use the gsi number to do the pirq mapping (see QEMU code
xen_pt_realize->xc_physdev_map_pirq, and xl code
pci_add_dm_done->xc_physdev_map_pirq), but in the current code the gsi
number is read from the file /sys/bus/pci/devices/<sbdf>/irq. That is
wrong, because irq is not equal to gsi: they are in different
spaces, so the pirq mapping fails.
In the current code there is also no way for userspace to get the gsi.
For the above purpose, add a new function to get the gsi; the
corresponding ioctl is implemented on the Linux kernel side.
x86/irq: allow setting IRQ permissions from GSI instead of pIRQ
Some domains are not aware of the pIRQ abstraction layer that maps
interrupt sources into Xen space interrupt numbers. pIRQs values are
only exposed to domains that have the option to route physical
interrupts over event channels.
This creates issues for PCI-passthrough from a PVH domain, as some of
the passthrough related hypercalls use pIRQ as references to physical
interrupts on the system. One of such interfaces is
XEN_DOMCTL_irq_permission, used to grant or revoke access to
interrupts, takes a pIRQ as the reference to the interrupt to be
adjusted.
Since PVH doesn't manage interrupts in terms of pIRQs, introduce a new
hypercall that allows setting interrupt permissions based on GSI value
rather than pIRQ.
Note the GSI hypercall parameter is translated to an IRQ value (in
case there are ACPI overrides) before doing the checks.
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/riscv: introduce and initialize SBI RFENCE extension
Introduce functions to work with the SBI RFENCE extension for issuing
various fence operations to remote CPUs.
Add the sbi_init() function along with auxiliary functions and macro
definitions for proper initialization and checking the availability of
SBI extensions. Currently, this is implemented only for RFENCE.
Introduce sbi_remote_sfence_vma() to send SFENCE_VMA instructions to
a set of target HARTs. This will support the implementation of
flush_xen_tlb_range_va().
Integrate __sbi_rfence_v02 from Linux kernel 6.6.0-rc4 with minimal
modifications:
- Adapt to Xen code style.
- Use cpuid_to_hartid() instead of cpuid_to_hartid_map[].
- Update BIT(...) to BIT(..., UL).
- Rename __sbi_rfence_v02_call to sbi_rfence_v02_real and
remove the unused arg5.
- Handle NULL cpu_mask to execute rfence on all CPUs by calling
sbi_rfence_v02_real(..., 0UL, -1UL,...) instead of creating hmask.
- Change the type of start_addr and size to vaddr_t and size_t.
- Add an explanatory comment about when batching can and cannot occur,
and why batching happens in the first place.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
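The batching mentioned in the last bullet can be sketched as follows. This is a simplified model: hart IDs are grouped into (hbase, hmask) pairs that each fit one 64-bit mask, and instead of issuing an SBI call the sketch records each pair. The struct, the function name batch_harts() and the 64-bit mask width are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_BITS 64u

/* One recorded "call": bit n of hmask selects hart (hbase + n). */
struct rfence_call { uint64_t hmask; uint64_t hbase; };

/* Returns the number of calls needed; records up to max_out of them. */
static size_t batch_harts(const uint64_t *hartids, size_t n,
                          struct rfence_call *out, size_t max_out)
{
    uint64_t hmask = 0, hbase = 0, htop = 0;
    size_t ncalls = 0;

    assert(hartids != NULL && (out != NULL || max_out == 0));

    for (size_t i = 0; i < n; i++) {
        uint64_t hartid = hartids[i];

        if (hmask) {
            if (hartid + MASK_BITS <= htop || hbase + MASK_BITS <= hartid) {
                /* The new hart cannot share a mask: flush the current batch. */
                if (ncalls < max_out)
                    out[ncalls] = (struct rfence_call){ hmask, hbase };
                ncalls++;
                hmask = 0;
            } else if (hartid < hbase) {
                /* Shift the mask so that it is based at the lower hart ID. */
                hmask <<= hbase - hartid;
                hbase = hartid;
            }
        }

        if (!hmask) {
            hbase = hartid;
            htop = hartid;
        } else if (hartid > htop) {
            htop = hartid;
        }

        hmask |= UINT64_C(1) << (hartid - hbase);
    }

    if (hmask) {
        /* Flush whatever is left in the final batch. */
        if (ncalls < max_out)
            out[ncalls] = (struct rfence_call){ hmask, hbase };
        ncalls++;
    }

    return ncalls;
}
```

In this model, batching cannot happen when two hart IDs are 64 or more apart, since no single mask can cover both.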
xen/riscv: introduce functionality to work with CPU info
Introduce struct pcpu_info to store pCPU-related information.
Initially, it includes only processor_id and hart id, but it
will be extended to include guest CPU information and
temporary variables for saving/restoring vCPU registers.
Add set_processor_id() function to set processor_id stored in
pcpu_info.
Define smp_processor_id() to provide accurate information,
replacing the previous "dummy" value of 0.
Initialize tp registers to point to pcpu_info[0].
Set processor_id to 0 for logical CPU 0 and store the physical
CPU ID in pcpu_info[0].
Introduce helpers for getting/setting hart_id (physical CPU id
in RISC-V terms) from Xen CPU id.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
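A rough sketch of the data structure and accessors described above is shown below. An ordinary pointer variable stands in for the tp register, and NR_CPUS, the field types and set_cpuid_to_hartid() are assumptions for illustration. Only struct pcpu_info, processor_id, set_processor_id(), smp_processor_id() and cpuid_to_hartid() are names taken from the commit messages in this log.

```c
#include <assert.h>

#define NR_CPUS 4   /* hypothetical limit for this sketch */

struct pcpu_info {
    unsigned int processor_id;   /* Xen (logical) CPU id */
    unsigned long hart_id;       /* physical CPU id in RISC-V terms */
};

static struct pcpu_info pcpu_info[NR_CPUS];

/* Stand-in for the tp register, initialised to point to pcpu_info[0]. */
static struct pcpu_info *tp_stand_in = &pcpu_info[0];

static void set_processor_id(unsigned int id)
{
    tp_stand_in->processor_id = id;
}

static unsigned int smp_processor_id(void)
{
    return tp_stand_in->processor_id;
}

static unsigned long cpuid_to_hartid(unsigned int cpuid)
{
    assert(cpuid < NR_CPUS);
    return pcpu_info[cpuid].hart_id;
}

/* Assumed name for the setter counterpart. */
static void set_cpuid_to_hartid(unsigned int cpuid, unsigned long hartid)
{
    assert(cpuid < NR_CPUS);
    pcpu_info[cpuid].hart_id = hartid;
}
```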
Set up fixmap mappings and the L0 page table for fixmap support.
Modify the PTEs (xen_fixmap[]) directly in arch_pmap_map() instead
of using set_fixmap() which is expected to be implemented using
map_pages_to_xen(), which, in turn, is expected to use
arch_pmap_map() during early boot, resulting in a loop.
Define new macros in riscv/config.h for calculating
the FIXMAP_BASE address, including BOOT_FDT_VIRT_{START, SIZE},
XEN_VIRT_SIZE, and XEN_VIRT_END.
Update the check for Xen size in riscv/xen.lds.S to use
XEN_VIRT_SIZE instead of a hardcoded constant.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Xen's implementation of PSR only supports Intel CPUs right now, hence it can be
made dependent on the CONFIG_INTEL build option.
Since platform implementation is not limited to a single vendor, an intermediate
option CONFIG_X86_PSR is introduced, which is selected by CONFIG_INTEL.
When !X86_PSR, the PSR-related sysctls XEN_SYSCTL_psr_cmt_op &
XEN_SYSCTL_psr_alloc are off as well.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 30 Sep 2024 08:05:25 +0000 (10:05 +0200)]
x86: introduce x86_seg_sys
To represent the USER-MSR bitmap access, a new segment type needs
introducing, behaving like x86_seg_none in terms of address treatment,
but behaving like a system segment for page walk purposes (implicit
supervisor-mode access).
While there also add x86_seg_none handling to the test harness's
read() hook, as will be needed for MSR-LIST support.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Thu, 26 Sep 2024 12:53:50 +0000 (12:53 +0000)]
blkif: Fix alignment description for discard request
The discard feature has another xenstore node to describe the size
of the blocks that can be discarded, "discard-granularity", which
defaults to "sector-size" when absent, as noted in the properties and in
note 4. So discard requests should be aligned on this value.
Fixes: 221f2748e8da ("blkif: reconcile protocol specification with in-use implementations") Signed-off-by: Anthony PERARD <anthony.perard@vates.tech> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
alternatives is used both at boot time, and when loading livepatch payloads.
While for the former it makes sense to panic, it's not useful for the latter, as
for livepatches it's possible to fail to load the livepatch if alternatives
cannot be resolved and continue operating normally.
Relax the BUGs in _apply_alternatives() to instead return an error code. The
caller will figure out whether the failures are fatal and panic.
Print an error message to provide some user-readable information about what
went wrong.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
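The change of contract can be sketched as below: the patching routine reports a failure and the caller decides how severe it is. The struct layout, the NR_FEATURES bound, the error codes and the function names are assumptions for illustration, and the real checks in _apply_alternatives() may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define NR_FEATURES 64u   /* hypothetical size of the known feature set */

/* Hypothetical, reduced view of one alternatives entry. */
struct alt_entry {
    unsigned int feature;    /* feature bit selecting the replacement */
    unsigned int orig_len;   /* bytes available at the patch site */
    unsigned int repl_len;   /* bytes of replacement code */
};

/* Return 0 on success, or a negative error code instead of crashing. */
static int apply_alternatives_sketch(const struct alt_entry *a, size_t n)
{
    assert(a != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (a[i].feature >= NR_FEATURES) {
            fprintf(stderr, "alt %zu: feature %u out of range\n",
                    i, a[i].feature);
            return -EINVAL;
        }
        if (a[i].repl_len > a[i].orig_len) {
            fprintf(stderr, "alt %zu: replacement too long\n", i);
            return -ENOSPC;
        }
    }

    return 0;
}

/* Livepatch-style caller: a failure only rejects the payload. A boot-time
 * caller would instead treat a non-zero result as fatal and panic. */
static int livepatch_load_sketch(const struct alt_entry *a, size_t n)
{
    int rc = apply_alternatives_sketch(a, n);

    if (rc)
        fprintf(stderr, "payload rejected (%d), system keeps running\n", rc);

    return rc;
}
```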
The check against the expected Xen build ID should be done ahead of attempting
to apply the alternatives contained in the livepatch.
If the CPUID in the alternatives patching data is out of the scope of the
running Xen featureset the BUG() in _apply_alternatives() will trigger thus
bringing the system down. Note the layout of struct alt_instr could also
change between versions. It's also possible for struct exception_table_entry
to have changed format, hence leading to other kinds of errors if parsing of the
payload is done ahead of checking if the Xen build-id matches.
Move the Xen build ID check as early as possible. To do so introduce a new
check_xen_buildid() function that parses and checks the Xen build-id before
moving the payload. Since the expected Xen build-id is used early to
detect whether the livepatch payload could be loaded, there's no reason to
store it in the payload struct, as a non-matching Xen build-id won't get the
payload populated in the first place.
Note printing the expected Xen build ID as part of dumping the payload
information is no longer done: all loaded payloads would have Xen build IDs
matching the running Xen, otherwise they would have failed to load.
Fixes: 879615f5db1d ('livepatch: Always check hypervisor build ID upon livepatch upload') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
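The ordering argument can be sketched as follows: compare the build IDs first, and only touch the rest of the payload if they match. check_xen_buildid() is named in the commit message, but the signature, the struct and the load_payload_sketch() caller used here are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of a build ID note. */
struct buildid {
    const unsigned char *id;
    size_t len;
};

/* Return 0 if the payload was built against the running hypervisor. */
static int check_xen_buildid_sketch(const struct buildid *expected,
                                    const struct buildid *running)
{
    assert(expected != NULL && running != NULL);

    if (expected->len != running->len ||
        memcmp(expected->id, running->id, running->len) != 0)
        return -EINVAL;

    return 0;
}

/* The check runs before anything version-dependent is parsed. */
static int load_payload_sketch(const struct buildid *expected,
                               const struct buildid *running, int *parsed)
{
    int rc = check_xen_buildid_sketch(expected, running);

    if (rc)
        return rc;      /* alternatives, exception tables etc. never touched */

    *parsed = 1;        /* stand-in for the remaining payload processing */
    return 0;
}
```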
xen/livepatch: simplify and unify logic in prepare_payload()
The following sections: .note.gnu.build-id, .livepatch.xen_depends and
.livepatch.depends are mandatory and ensured to be present by
check_special_sections() before prepare_payload() is called.
Simplify the logic in prepare_payload() by introducing a generic function to
parse the sections that contain a buildid. Note the function assumes the
buildid related section to always be present.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The Elf loading logic will initially use the `data` section field to stash a
pointer to the temporary loaded data (from the buffer allocated in
livepatch_upload()), which is later relocated and the new pointer stashed in
`load_addr`.
Remove this dual field usage and use an `addr` field uniformly. Initially it will
point to the temporary buffer, until relocation happens, at which point the
pointer will be updated to the relocated address.
This avoids leaving a dangling pointer in the `data` field once the temporary
buffer is freed by livepatch_upload().
Note the `addr` field cannot retain the const attribute from the previous
`data` field, as there's logic that performs manipulations against the loaded
sections, like applying relocations or sorting the exception table.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Michal Orzel <michal.orzel@amd.com>
xen/livepatch: remove useless check for duplicated sections
The current check for duplicated sections in a payload is not effective. The
check is done inside a loop that iterates over the section names, so it is
logically impossible for a bit in the bitmap to be set more than once.
The usage of a bitmap in check_patching_sections() has been replaced with a
boolean, since the function just cares that at least one of the special
sections is present.
No functional change intended, as the check was useless.
Fixes: 29f4ab0b0a4f ('xsplice: Implement support for applying/reverting/replacing patches.') Fixes: 76b3d4098a92 ('livepatch: Do not enforce ELF_LIVEPATCH_FUNC section presence') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
xen: introduce common macros for per-CPU sections definition
Introduce PERCPU_BSS macro which manages:
* Alignment of the section start
* Insertion of per-CPU data sections
* Alignment and start/end markers for per-CPU data
This change simplifies the linker script maintenance and ensures a unified
approach for per-CPU sections across different architectures.
Refactor the linker scripts for Arm, PPC, and x86 architectures by using
the common macro PERCPU_BSS defined in xen/xen.lds.h to handle per-CPU
data sections.
No functional changes.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Julien Grall <jgrall@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen/ucode: Make Intel's microcode_sanity_check() stricter
The SDM states that data size must be a multiple of 4, but Xen doesn't check
this property.
This is liable to cause later failures, but should be checked explicitly.
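The explicit check amounts to a one-line alignment test. The sketch below assumes a bare 32-bit size value and a hypothetical helper name; it is not the actual microcode_sanity_check() code and ignores the other header fields that function validates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accept only data sizes that are a multiple of 4. */
static bool demo_data_size_aligned(uint32_t data_size)
{
    return (data_size % 4) == 0;
}
```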
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Matthew Barnes [Mon, 23 Sep 2024 14:35:59 +0000 (15:35 +0100)]
x86/APIC: Remove x2APIC pure cluster mode
With the introduction of mixed x2APIC mode (using cluster addressing for
IPIs and physical for external interrupts) the use of pure cluster mode
doesn't have any benefit.
Remove the mode itself, leaving only the code required for logical
addressing when sending IPIs.
Resolves: https://gitlab.com/xen-project/xen/-/issues/189 Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Tue, 24 Sep 2024 12:23:29 +0000 (14:23 +0200)]
x86/vLAPIC: prevent undue recursion of vlapic_error()
With the error vector set to an illegal value, the function invoking
vlapic_set_irq() would bring execution back here, with the non-recursive
lock already held. Avoid the call in this case, merely further updating
ESR (if necessary).
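The shape of the fix can be modelled in a few lines. Everything below is a hypothetical toy model (structure, function names and the lock flag are assumptions, not the vLAPIC code). It relies only on two architectural facts: vectors 0-15 are illegal, and bit 6 of the ESR reports a received illegal vector. With an illegal error vector the model records the condition in the ESR and skips the delivery call that would otherwise lead back into the error path.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ESR_RECV_ILLEGAL (1u << 6) /* "received illegal vector" ESR bit */

/* Hypothetical toy model of the relevant local APIC state. */
struct demo_lapic {
    uint32_t esr;          /* accumulated error bits */
    uint8_t error_vector;  /* vector programmed for error interrupts */
    bool lock_held;        /* models the non-recursive lock */
    unsigned int injected; /* interrupts successfully delivered */
    unsigned int reentered;/* times the error path was entered with lock held */
};

static void demo_error(struct demo_lapic *l, uint32_t errmask);

/* Delivering an illegal vector reports an error, i.e. calls back. */
static void demo_set_irq(struct demo_lapic *l, uint8_t vector)
{
    if ( vector < 16 )
        demo_error(l, DEMO_ESR_RECV_ILLEGAL);
    else
        l->injected++;
}

static void demo_error(struct demo_lapic *l, uint32_t errmask)
{
    if ( l->lock_held )
    {
        l->reentered++;    /* a real non-recursive lock would deadlock here */
        return;
    }

    l->lock_held = true;
    l->esr |= errmask;

    if ( l->error_vector >= 16 )
        demo_set_irq(l, l->error_vector);
    else
        l->esr |= DEMO_ESR_RECV_ILLEGAL; /* update ESR only, no delivery */

    l->lock_held = false;
}
```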
This is XSA-462 / CVE-2024-45817.
Fixes: 5f32d186a8b1 ("x86/vlapic: don't silently accept bad vectors") Reported-by: Federico Serafini <federico.serafini@bugseng.com> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Nikola Jelic [Mon, 23 Sep 2024 17:50:08 +0000 (19:50 +0200)]
x86/efi: Use generic PE/COFF structures
Adapted x86 efi parser and mkreloc utility to use generic PE header
(efi/pe.h), instead of locally defined structures for each component.
Signed-off-by: Nikola Jelic <nikola.jelic@rt-rk.com> Signed-off-by: Milan Djokic <milan.djokic@rt-rk.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Jan Beulich [Tue, 24 Sep 2024 08:34:35 +0000 (10:34 +0200)]
x86: enable long section names for xen.efi
While for our present .data.read_mostly it may be deemed tolerable that
the name is truncated to .data.re, for the planned .init.trampoline an
abbreviation to .init.tr would end up pretty meaningless. Engage the
long section names extension that GNU ld has had support for already in
2.22 (which we consider the baseline release for xen.efi building).
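The truncation mentioned above follows from the 8-byte name field in a PE/COFF section header. The helper below is a hypothetical illustration (not part of the Xen build) that cuts a name the way a header without the long section names extension would store it.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_COFF_NAME_LEN 8 /* size of the section header name field */

/*
 * Copy at most 8 bytes of the name, as a section header without long-name
 * support would hold it. The output buffer has one extra byte so that the
 * result is always NUL-terminated for printing or comparison.
 */
static void demo_coff_short_name(const char *name,
                                 char out[DEMO_COFF_NAME_LEN + 1])
{
    size_t n = strlen(name);

    if ( n > DEMO_COFF_NAME_LEN )
        n = DEMO_COFF_NAME_LEN;

    memset(out, 0, DEMO_COFF_NAME_LEN + 1);
    memcpy(out, name, n);
}
```

Feeding it ".data.read_mostly" and ".init.trampoline" gives ".data.re" and ".init.tr" respectively, matching the names quoted in the commit message.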
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Frediano Ziglio <frediano.ziglio@cloud.com>
x86/mwait-idle: add dependency on general Intel CPU support
Currently the mwait_idle driver in Xen only implements support for Intel CPUs.
Thus in order to reduce dead code in non-Intel build configurations it can
be made explicitly dependent on the CONFIG_INTEL option.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 16 Sep 2024 11:56:06 +0000 (12:56 +0100)]
x86/boot: Drop stale comment about zeroing the stack
This used to be true, but was altered by commit 37786b23b027 ("x86/cet: Remove
writeable mapping of the BSPs shadow stack") which moved cpu0_stack into
.init.bss.stack_aligned.
Fixes: 37786b23b027 ("x86/cet: Remove writeable mapping of the BSPs shadow stack") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic()
The functions {read,write}{b,w,l,q}_cpu() do not need to be memory-ordered
atomic operations in Xen, based on their definitions for other architectures.
Therefore, {read,write}{b,w,l,q}_cpu() can be used instead of
{read,write}{b,w,l,q}(), allowing the caller to decide if additional
fences should be applied before or after {read,write}_atomic().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 23 Sep 2024 14:31:49 +0000 (16:31 +0200)]
ubsan: use linux-compat.h
Instead of replacing the s64 (and later also u64) uses, keep the file as
little modified as possible from its Linux origin. (Sadly the two cast
adjustments are needed to avoid compiler warnings.)
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The Xen community is already informally following both rules. Let's make
it explicit. Both rules have zero violations, only cautions. While we
want to go down to zero cautions in time, adding both rules to rules.rst
enables us to immediately make both rules gating in the ECLAIR job part
of gitlab-ci.