Juergen Gross [Mon, 5 Feb 2024 10:49:50 +0000 (11:49 +0100)]
tools/xenstored: add early_init() function
Some xenstored initialization needs to be done in the daemon case only,
so split it out into a new early_init() function, which is a stub in the
stubdom case.
Remove the call of talloc_enable_leak_report_full(), as it serves no
real purpose: the daemon only ever exits due to a crash, in which case
a log of talloc()ed memory hardly has any value.
Juergen Gross [Mon, 5 Feb 2024 10:49:47 +0000 (11:49 +0100)]
tools/xenstored: rename xenbus_evtchn()
Rename the xenbus_evtchn() function to get_xenbus_evtchn() in order to
avoid two externally visible symbols with the same name when
Xenstore-stubdom is being built with a Mini-OS with CONFIG_XENBUS set.
Cyril Rébert [Sun, 4 Feb 2024 10:19:40 +0000 (11:19 +0100)]
tools/xentop: fix sorting bug for some columns
Sorting doesn't work on the VBD_OO, VBD_RD, VBD_WR and VBD_RSECT columns.
Fix by adjusting variable names in the compare functions.
Bug fix only. No functional change.
Fixes: 91c3e3dc91d6 ("tools/xentop: Display '-' when stats are not available.") Signed-off-by: Cyril Rébert (zithro) <slack@rabbit.lu> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
We have two copies of __bitmap_weight() that differ by whether they make
hweight32() or hweight64() calls, yet we already have hweight_long() which is
the form that __bitmap_weight() wants.
Fix hweight_long() to return unsigned int like all the other hweight helpers,
and fix __bitmap_weight() to use unsigned integers.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 2 Feb 2024 17:57:37 +0000 (17:57 +0000)]
x86/ucode: Remove accidentally introduced tabs
Fixes: cf7fe8b72dea ("x86/ucode: Fix stability of the raw CPU Policy rescan") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 5 Feb 2024 09:48:11 +0000 (10:48 +0100)]
x86/CPU: convert vendor hook invocations to altcall
While not performance critical, these hook invocations still want
converting: This way all pre-filled struct cpu_dev instances can become
__initconst_cf_clobber, thus allowing a further 8 ENDBR to be eliminated
during the 2nd phase of alternatives patching (besides moving previously
resident data to .init.*).
Since all use sites need touching anyway, take the opportunity and also
address a Misra C:2012 Rule 5.5 violation: Rename the this_cpu static
variable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
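As background for this and the similar altcall conversions below, a minimal
user-space sketch of the idea (names are illustrative; in Xen the patchable
call sites use the alternative_call()/alternative_vcall() macros):

    #include <stdio.h>

    struct cpu_dev {
        void (*c_init)(unsigned int cpu);
    };

    static void vendor_init(unsigned int cpu)
    {
        printf("init cpu %u\n", cpu);
    }

    static const struct cpu_dev vendor_dev = { .c_init = vendor_init };

    int main(void)
    {
        const struct cpu_dev *dev = &vendor_dev;

        /* Indirect call: with CET-IBT, vendor_init() must keep its ENDBR
         * landing pad forever, and the hook table must stay resident. */
        dev->c_init(0);

        /* In Xen the site instead becomes
         *     alternative_vcall(dev->c_init, 0);
         * which the 2nd phase of alternatives patching rewrites into a
         * direct "call vendor_init", so the table can become
         * __initconst_cf_clobber and the ENDBR can be zapped. */
        return 0;
    }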
Jan Beulich [Mon, 5 Feb 2024 09:45:31 +0000 (10:45 +0100)]
x86/guest: finish conversion to altcall
While .setup() and .e820_fixup() don't need fiddling with, for being run
only very early, both .ap_setup() and .resume() want converting too:
This way both pre-filled struct hypervisor_ops instances can become
__initconst_cf_clobber, thus allowing up to 5 more ENDBR (configuration
dependent) to be eliminated during the 2nd phase of alternatives patching.
While fiddling with section annotations here, also move "ops" itself to
.data.ro_after_init.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 5 Feb 2024 09:44:46 +0000 (10:44 +0100)]
x86: arrange for ENDBR zapping from <vendor>_ctxt_switch_masking()
While altcall is already used for them, the functions want announcing in
.init.rodata.cf_clobber, even if the resulting static variables aren't
otherwise used.
While doing this also move ctxt_switch_masking to .data.ro_after_init.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Jan 2024 19:57:01 +0000 (19:57 +0000)]
x86: Remove gdbstub
In 13y of working on Xen, I've never seen it used. The implementation
was introduced (commit b69f92f3012e, Jul 28 2004) with known issues such as:
/* Resuming after we've stopped used to work, but more through luck
than any actual intention. It doesn't at the moment. */
which appear to have gone unfixed for the 20 years since.
Nowadays there are more robust ways of inspecting crashed state, such as a
kexec crash kernel, or running Xen in a VM.
This will allow us to clean up some hooks around the codebase which are
proving awkward for other tasks.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 30 Jan 2024 09:14:00 +0000 (10:14 +0100)]
x86/spec-ctrl: Expose BHI_CTRL to guests
The CPUID feature bit signals the presence of the BHI_DIS_S control in
SPEC_CTRL MSR, first available in Intel AlderLake and Sapphire Rapids CPUs.
Xen already knows how to context switch MSR_SPEC_CTRL properly between guest
and hypervisor context.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 30 Jan 2024 09:13:59 +0000 (10:13 +0100)]
x86/spec-ctrl: Expose RRSBA_CTRL to guests
The CPUID feature bit signals the presence of the RRSBA_DIS_{U,S} controls in
SPEC_CTRL MSR, first available in Intel AlderLake and Sapphire Rapids CPUs.
Xen already knows how to context switch MSR_SPEC_CTRL properly between guest
and hypervisor context.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 30 Jan 2024 09:13:58 +0000 (10:13 +0100)]
x86/spec-ctrl: Expose IPRED_CTRL to guests
The CPUID feature bit signals the presence of the IPRED_DIS_{U,S} controls in
SPEC_CTRL MSR, first available in Intel AlderLake and Sapphire Rapids CPUs.
Xen already knows how to context switch MSR_SPEC_CTRL properly between guest
and hypervisor context.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Wed, 31 Jan 2024 10:42:48 +0000 (10:42 +0000)]
tools/ocaml: Bump minimum version to OCaml 4.05
Char.lowercase got removed in OCaml 5.0 (it has been deprecated since 2014),
so oxenstored doesn't build any more.
Char.lowercase_ascii has existed since OCaml 4.03, so that is the new
minimum version for oxenstored.
However, OCaml 4.05 is the oldest new-enough version found in common distros,
so pick this as a baseline.
Signed-off-by: Edwin Török <edwin.torok@cloud.com> Acked-by: Christian Lindig <christian.lindig@cloud.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
[Update CHANGELOG.md] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 31 Jan 2024 17:05:47 +0000 (17:05 +0000)]
xen/bitmap: Consistently use unsigned bits values
Right now, most of the static inline helpers take an unsigned nbits quantity,
and most of the library functions take a signed quantity. Because
BITMAP_LAST_WORD_MASK() is expressed as a divide, the compiler is forced to
emit two different paths to get the correct semantics for signed division.
Swap all signed bit-counts to being unsigned bit-counts for the simple cases.
This includes the return value of bitmap_weight().
Bloat-o-meter for a random x86 build reports:
add/remove: 0/0 grow/shrink: 8/19 up/down: 167/-413 (-246)
which all comes from the compiler not emitting "dead" logic paths for negative bit
counts.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 13:59:07 +0000 (13:59 +0000)]
x86/boot: Add braces in reloc.c
107 lines is an unreasonably large switch statement to live inside a
brace-less for loop. Drop the comment that's clumsily trying to cover the
fact that this logic has wrong-looking indentation.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 20:44:34 +0000 (20:44 +0000)]
xen/sched: Fix UB shift in compat_set_timer_op()
Tamas reported this UBSAN failure from fuzzing:
(XEN) ================================================================================
(XEN) UBSAN: Undefined behaviour in common/sched/compat.c:48:37
(XEN) left shift of negative value -2147425536
(XEN) ----[ Xen-4.19-unstable x86_64 debug=y ubsan=y Not tainted ]----
...
(XEN) Xen call trace:
(XEN) [<ffff82d040307c1c>] R ubsan.c#ubsan_epilogue+0xa/0xd9
(XEN) [<ffff82d040308afb>] F __ubsan_handle_shift_out_of_bounds+0x11a/0x1c5
(XEN) [<ffff82d040307758>] F compat_set_timer_op+0x41/0x43
(XEN) [<ffff82d04040e4cc>] F hvm_do_multicall_call+0x77f/0xa75
(XEN) [<ffff82d040519462>] F arch_do_multicall_call+0xec/0xf1
(XEN) [<ffff82d040261567>] F do_multicall+0x1dc/0xde3
(XEN) [<ffff82d04040d2b3>] F hvm_hypercall+0xa00/0x149a
(XEN) [<ffff82d0403cd072>] F vmx_vmexit_handler+0x1596/0x279c
(XEN) [<ffff82d0403d909b>] F vmx_asm_vmexit_handler+0xdb/0x200
Left-shifting any negative value is strictly undefined behaviour in C, and
the two parameters here come straight from the guest.
The fuzzer happened to choose lo 0xf, hi 0x8000e300.
Switch everything to be unsigned values, making the shift well defined.
As GCC documents:
As an extension to the C language, GCC does not use the latitude given in
C99 and C11 only to treat certain aspects of signed '<<' as undefined.
However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such
cases.
this was deemed not to need an XSA.
Note: The unsigned -> signed conversion for do_set_timer_op()'s s_time_t
parameter is also well defined. C makes it implementation defined, and GCC
defines it as reduction modulo 2^N to be within range of the new type.
Fixes: 2942f45e09fb ("Enable compatibility mode operation for HYPERVISOR_sched_op and HYPERVISOR_set_timer_op.") Reported-by: Tamas K Lengyel <tamas@tklengyel.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
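A sketch of the shape of the fix (simplified; not the literal code in
common/sched/compat.c):

    #include <stdint.h>

    typedef int64_t s_time_t;

    static s_time_t assemble_timeout(uint32_t lo, uint32_t hi)
    {
        /* The unsigned shift and OR are fully defined for any
         * guest-supplied halves; the final conversion to the signed
         * s_time_t is implementation-defined, which GCC defines as
         * reduction modulo 2^64. */
        return (s_time_t)(((uint64_t)hi << 32) | lo);
    }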
Andrew Cooper [Tue, 30 Jan 2024 18:13:14 +0000 (18:13 +0000)]
x86/hvm: Fix UBSAN failure in do_hvm_op() printk
Tamas reported this UBSAN failure from fuzzing:
(XEN) ================================================================================
(XEN) UBSAN: Undefined behaviour in common/vsprintf.c:64:19
(XEN) negation of -9223372036854775808 cannot be represented in type 'long long int':
(XEN) ----[ Xen-4.19-unstable x86_64 debug=y ubsan=y Not tainted ]----
...
(XEN) Xen call trace:
(XEN) [<ffff82d040307c1c>] R ubsan.c#ubsan_epilogue+0xa/0xd9
(XEN) [<ffff82d04030805d>] F __ubsan_handle_negate_overflow+0x99/0xce
(XEN) [<ffff82d04028868f>] F vsprintf.c#number+0x10a/0x93e
(XEN) [<ffff82d04028ac74>] F vsnprintf+0x19e2/0x1c56
(XEN) [<ffff82d04030a47a>] F console.c#vprintk_common+0x76/0x34d
(XEN) [<ffff82d04030a79e>] F printk+0x4d/0x4f
(XEN) [<ffff82d04040c42b>] F do_hvm_op+0x288e/0x28f5
(XEN) [<ffff82d04040d385>] F hvm_hypercall+0xad2/0x149a
(XEN) [<ffff82d0403cd072>] F vmx_vmexit_handler+0x1596/0x279c
(XEN) [<ffff82d0403d909b>] F vmx_asm_vmexit_handler+0xdb/0x200
The problem is an unsigned -> signed conversion because of a bad
formatter (%ld trying to format an unsigned long).
We could fix it by swapping to %lu, but this is a useless printk() even in
debug builds, so just drop it completely.
Reported-by: Tamas K Lengyel <tamas@tklengyel.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
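A minimal user-space analogue of the mismatch (values chosen for
illustration):

    #include <stdio.h>

    int main(void)
    {
        unsigned long value = 1UL << 63;

        printf("%lu\n", value); /* correct formatter for an unsigned long */

        /* "%ld" would make the value print as LONG_MIN; Xen's number()
         * negates negative values to print their magnitude, and negating
         * LONG_MIN is the overflow UBSAN caught. */
        return 0;
    }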
Julien Grall [Thu, 1 Feb 2024 17:35:22 +0000 (17:35 +0000)]
xen/arm: Properly clean update to init_ttbr and smp_up_cpu
Recent rework to the secondary boot code modified how init_ttbr and
smp_up_cpu are accessed. Rather than directly accessing them, we
are using a pointer to them.
The helper clean_dcache() is expected to take the variable as parameter
and then clean its content. As we now pass a pointer to the variable,
we will clean the area storing the address rather than the content itself.
Switch to use clean_dcache_va_range() to avoid casting the pointer.
Fixes: a5ed59e62c6f ("arm/mmu: Move init_ttbr to a new section .data.idmap") Fixes: 9a5114074b04 ("arm/smpboot: Move smp_up_cpu to a new section .data.idmap") Reported-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Julien Grall <jgrall@amazon.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
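The distinction, as a sketch (simplified; clean_dcache() is a macro
operating on the named variable):

    clean_dcache(ptr);                        /* cleans the pointer variable
                                                 itself, i.e. the bytes
                                                 holding the address */
    clean_dcache_va_range(ptr, sizeof(*ptr)); /* cleans the pointed-to data */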
Jan Beulich [Thu, 1 Feb 2024 15:21:04 +0000 (16:21 +0100)]
IOMMU: iommu_use_hap_pt() implies CONFIG_HVM
Allow the compiler a little more room on DCE by moving the compile-time-
constant condition into the predicate (from the one place where it was
added in an open-coded fashion for XSA-450).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jason Andryuk [Thu, 1 Feb 2024 15:19:36 +0000 (16:19 +0100)]
xenpm: Print message for disabled commands
xenpm get-cpufreq-states currently just prints no output when cpufreq is
disabled or HWP is running. Have it print an appropriate message. The
cpufreq disabled one mirrors the cpuidle disabled one.
cpufreq disabled:
$ xenpm get-cpufreq-states
Either Xen cpufreq is disabled or no valid information is registered!
Under HWP:
$ xenpm get-cpufreq-states
P-State information not supported. Try 'get-cpufreq-average' or 'start'.
Also allow xenpm to handle EOPNOTSUPP from the pmstat hypercalls.
EOPNOTSUPP is returned in some cases when HWP is active, which allows
differentiating that from cpufreq being disabled.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 1 Feb 2024 15:18:28 +0000 (16:18 +0100)]
x86/PoD: simplify / improve p2m_pod_cache_add()
Avoid recurring MFN -> page or page -> MFN translations. Drop the pretty
pointless local variable "p". Make sure the MFN logged in a debugging
error message is actually the offending one. Return negative errno
values rather than -1 (presently no caller really cares, but imo this
should change). Adjust style.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Tue, 30 Jan 2024 13:29:15 +0000 (14:29 +0100)]
VT-d: Fix "else" vs "#endif" misplacement
In domain_pgd_maddr() the "#endif" is misplaced with respect to "else". This
generates incorrect logic when CONFIG_HVM is compiled out, as the "else" body
is executed unconditionally.
Rework the logic to use IS_ENABLED() instead of explicit #ifdef-ary, as it's
clearer to follow. This in turn involves adjusting p2m_get_pagetable() to
compile when CONFIG_HVM is disabled.
This is XSA-450 / CVE-2023-46840.
Fixes: 033ff90aa9c1 ("x86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only") Reported-by: Teddy Astie <teddy.astie@vates.tech> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
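The shape of the bug, illustratively (not the literal domain_pgd_maddr()
code):

    #ifdef CONFIG_HVM
        if ( iommu_use_hap_pt(d) )
        {
            /* share the P2M pagetables with the IOMMU */
        }
        else
    #endif
        {
            /* use separate IOMMU pagetables; with the "#endif" placed above
             * the "else", a !CONFIG_HVM build runs this block
             * unconditionally -- which is the bug being fixed */
        }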
Roger Pau Monné [Tue, 30 Jan 2024 13:28:01 +0000 (14:28 +0100)]
pci: fail device assignment if phantom functions cannot be assigned
The current behavior is that no error is reported if (some) phantom functions
fail to be assigned during device add or assignment, so the operation succeeds
even if some phantom functions are not correctly set up.
This can lead to devices possibly being successfully assigned to a domU while
some of the device phantom functions are still assigned to dom0. Even when the
device is assigned to domIO before being assigned to a domU, phantom functions
might fail to be assigned to domIO, and also fail to be assigned to the domU,
leaving them assigned to dom0.
Since the device can generate requests using the IDs of those phantom
functions, given the scenario above a device in such state would be in control
of a domU, but still capable of generating transactions that use a context ID
targeting dom0 owned memory.
Modify device assign in order to attempt to deassign the device if phantom
functions failed to be assigned.
Note that device addition is not modified in the same way, as in that case the
device is assigned to a trusted domain, and hence partial assign can lead to
device malfunction but not a security issue.
This is XSA-449 / CVE-2023-46839.
Fixes: 4e9950dc1bd2 ('IOMMU: add phantom function support') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
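A sketch of the corrected assignment flow (helper names are hypothetical):

    int rc = assign_one(d, pdev->sbdf);              /* the real function */

    for ( unsigned int fn = 1; !rc && fn < nr_phantom_fns(pdev); fn++ )
        rc = assign_one(d, phantom_sbdf(pdev, fn));  /* phantom functions */

    if ( rc )
        deassign_device(d, pdev);    /* roll back, don't leave functions
                                        assigned to dom0 */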
Roger Pau Monne [Wed, 24 Jan 2024 17:29:52 +0000 (18:29 +0100)]
x86/iommu: switch hwdom IOMMU to use a rangeset
The current loop that iterates from 0 to the maximum RAM address in order to
set up the IOMMU mappings is highly inefficient, and it will get worse as the
amount of RAM increases. It's also not accounting for any reserved regions
past the last RAM address.
Instead of iterating over memory addresses, iterate over the memory map regions
and use a rangeset in order to keep track of which ranges need to be identity
mapped in the hardware domain physical address space.
On an AMD EPYC 7452 with 512GiB of RAM, the time to execute
arch_iommu_hwdom_init() in nanoseconds is:
x old
+ new
    N           Min           Max        Median           Avg        Stddev
x   5  2.2364154e+10  2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08
+   5       1025012       1033036       1026188     1028276.2     3623.1194
Difference at 95.0% confidence
-2.26214e+10 +/- 4.42931e+08
-99.9955% +/- 9.05152e-05%
(Student's t, pooled s = 3.03701e+08)
Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s.
Note there's a change for HVM domains (ie: PVH dom0) that get switched to
create the p2m mappings using map_mmio_regions() instead of
p2m_add_identity_entry(), so that ranges can be mapped with a single function
call if possible. Note that the interface of map_mmio_regions() doesn't
allow creating read-only mappings, but so far there are no such mappings
created for PVH dom0 in arch_iommu_hwdom_init().
No change intended in the resulting mappings that a hardware domain gets.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 25 Jan 2024 13:26:26 +0000 (14:26 +0100)]
x86/iommu: remove regions not to be mapped
Introduce the code to remove regions not to be mapped from the rangeset
that will be used to setup the IOMMU page tables for the hardware domain.
This change also introduces two new functions: remove_xen_ranges() and
vpci_subtract_mmcfg() that copy the logic in xen_in_range() and
vpci_is_mmcfg_address() respectively and remove the ranges that would otherwise
be intercepted by the original functions.
Note that the rangeset is still not populated.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 29 Jan 2024 08:23:43 +0000 (09:23 +0100)]
x86: purge NMI_IO_APIC
Even going back to 3.2 source code, I can't spot how this watchdog mode
could ever have been enabled in Xen. The only effect its presence had
for all the years was the retaining of a dead string literal.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 29 Jan 2024 08:22:35 +0000 (09:22 +0100)]
x86/APIC: purge {GET,SET}_APIC_DELIVERY_MODE()
The few uses we have can easily be replaced, eliminating the need for
redundant APIC_DM_* and APIC_MODE_* constants. Therefore also purge all
respective APIC_MODE_* constants, introducing APIC_DM_MASK anew instead.
This is further relevant since we have a different set of APIC_MODE_*,
which could otherwise end up confusing.
No functional change intended.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
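Illustratively (constant values per the architectural local APIC register
layout; the exact Xen call sites differ):

    #define APIC_DM_MASK 0x00700u  /* delivery mode field, bits 8-10 */
    #define APIC_DM_NMI  0x00400u

    /* Before: GET_APIC_DELIVERY_MODE(val) == APIC_MODE_NMI
     * After: mask and compare directly: */
    static int is_nmi_mode(unsigned int val)
    {
        return (val & APIC_DM_MASK) == APIC_DM_NMI;
    }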
Jan Beulich [Mon, 29 Jan 2024 08:21:16 +0000 (09:21 +0100)]
NUMA: no need for asm/numa.h when !NUMA
There's no point in every architecture carrying the same stubs for the
case when NUMA isn't enabled (or even supported). Move all of that to
xen/numa.h; replace explicit uses of asm/numa.h in common code. Make
inclusion of asm/numa.h dependent upon NUMA=y.
Drop the no longer applicable "implement NUMA support" comments - in a
!NUMA section this simply makes no sense.
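A sketch of the resulting arrangement in xen/numa.h (simplified, not the
full set of helpers):

    #ifdef CONFIG_NUMA
    # include <asm/numa.h>
    #else
    /* Single-node stubs shared by all architectures. */
    static inline unsigned int cpu_to_node(unsigned int cpu) { return 0; }
    static inline unsigned int num_online_nodes(void) { return 1; }
    #endif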
Roger Pau Monné [Fri, 26 Jan 2024 14:54:18 +0000 (15:54 +0100)]
x86/entry: fix jump into restore_all_guest without %rbx correctly set
e047b8d0fa05 went too far when limiting where the vCPU pointer is obtained. While the
code in ist_dispatch_done does indeed only need the vCPU pointer when PV32 is
enabled, the !PV32 path will end up jumping into restore_all_guest which does
require rbx == vCPU pointer.
Fix by moving the fetching of the vCPU pointer to be done outside of the PV32
code block.
Fixes: e047b8d0fa05 ('x86/entry: replace two GET_CURRENT() uses') Reported-by: Edwin Torok <edwin.torok@cloud.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Thu, 25 Jan 2024 18:36:27 +0000 (18:36 +0000)]
xen/arm64: head: Use PRINT_ID() for secondary CPU MMU-off boot code
With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen resides.
An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.
Right now, most of the early printk messages are using PRINT() which
will add the message in .rodata. This is unlikely to be within the
same page as the rest of the idmap.
So replace all the PRINT() that can be reachable by the secondary
CPU with MMU-off with PRINT_ID().
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Julien Grall [Thu, 25 Jan 2024 18:33:50 +0000 (18:33 +0000)]
arm/smpboot: Move smp_up_cpu to a new section .data.idmap
With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen resides.
An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.
Right now, smp_up_cpu is used by secondary CPUs to wait for their turn to
boot before the MMU is on. Yet it is currently in .data which is
unlikely to be within the same page as the rest of the idmap.
Move smp_up_cpu to the recently created section .data.idmap. The idmap is
currently part of the text section and therefore will be mapped read-only
executable. This means that we need to temporarily remap
smp_up_cpu in order to update it.
Introduce a new function set_smp_up_cpu() for this purpose so the code
is not duplicated between when opening and closing the gate.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Julien Grall [Thu, 25 Jan 2024 18:32:38 +0000 (18:32 +0000)]
arm/mmu: Move init_ttbr to a new section .data.idmap
With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen resides.
An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.
Right now, init_ttbr is used by secondary CPUs to find their page-tables
before the MMU is on. Yet it is currently in .data which is unlikely
to be within the same page as the rest of the idmap.
Create a new section .data.idmap that will be used for variables
accessed by the early boot code. The first one is init_ttbr.
The idmap is currently part of the text section and therefore will
be mapped read-only executable. This means that we need to temporarily
remap init_ttbr in order to update it.
Introduce a new function set_init_ttbr() for this purpose so the code
is not duplicated between arm64 and arm32.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Andrew Cooper [Fri, 10 Feb 2023 21:20:42 +0000 (21:20 +0000)]
x86/entry: Avoid register spilling in cr4_pv32_restore()
cr4_pv32_restore() needs two registers. Right now, it spills %rdx and
clobbers %rax.
However, %rcx is free to use at all callsites. Annotate CR4_PV32_RESTORE with
our usual clobber comments, and swap %rdx for %rcx in the non-fatal paths.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 25 Jan 2024 09:30:41 +0000 (10:30 +0100)]
tools: don't expose XENFEAT_hvm_pirqs by default
The HVM pirq feature allows routing interrupts from both physical and emulated
devices over event channels; this was done as a performance improvement. However
its usage is fully undocumented, and the only reference implementation is in
Linux. It defeats the purpose of local APIC hardware virtualization, because
when it is used interrupts bypass the local APIC altogether.
It has also been reported to not work properly with certain devices: at least
when using some AMD GPUs, Linux attempts to route interrupts over event
channels, but Xen doesn't correctly detect such routing, which leads to the
hypervisor complaining with:
(XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15
When Linux attempts to route MSIs over event channels, the entry delivery
mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
inject the interrupt following the native MSI path, and the ExtINT delivery
mode is not supported.
Disable HVM PIRQs by default and provide a per-domain option in xl.cfg to
enable the feature. Also, for backwards compatibility, keep the feature enabled
for any resumed domains that don't have an explicit selection.
Note that the only user of the feature (Linux) is also able to handle native
interrupts fine, as the feature was already not used if Xen reported local APIC
hardware virtualization active.
Link: https://github.com/QubesOS/qubes-issues/issues/7971 Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Thu, 25 Jan 2024 09:30:40 +0000 (10:30 +0100)]
x86/hvm: make X86_EMU_USE_PIRQ optional
Allow selecting X86_EMU_USE_PIRQ for HVM guests, so it's no longer mandated to
be always on.
There's no restriction in Xen that forces such a feature to be always on for HVM
guests; PVH guests, for example, don't support it at all. Hence allow the
toolstack to select whether to enable it on a per-domain basis.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 23 Jan 2024 20:24:22 +0000 (20:24 +0000)]
x86/ucode: Fix stability of the raw CPU Policy rescan
Always run microcode_update_helper() on the BSP, so that the updated Raw CPU
policy doesn't get non-BSP topology details included.
Have calculate_raw_cpu_policy() clear the instantaneous XSTATE sizes. The
value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
to report.
When CPUID Masking is active, it affects CPUID instructions issued by Xen
too. Transiently disable masking to get a clean scan.
Fixes: 694d79ed5aac ("x86/ucode: Refresh raw CPU policy after microcode load") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Thu, 25 Jan 2024 15:11:49 +0000 (16:11 +0100)]
pmstat: Limit hypercalls under HWP
When HWP is active, the cpufreq P-state information is not updated. In
that case, return -EOPNOTSUPP instead of bogus, incomplete info.
Similarly, set_cpufreq_para() is not applicable when HWP is active.
Many of the options already checked the governor and were inaccessible,
but SCALING_MIN/MAX_FREQ was still accessible (though it would do
nothing). Add an earlier HWP check to handle all cases.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 25 Jan 2024 15:10:58 +0000 (16:10 +0100)]
x86/entry: replace two GET_CURRENT() uses
Now that we have %r14 set up using GET_STACK_END() in a number of
places, in two places we can eliminate the redundancy of GET_CURRENT()
also invoking that macro. In handle_ist_exception() actually go a step
farther and avoid using %rbx altogether when retrieving the processor
ID: Obtain the current vCPU pointer only in the PV32-specific code
actually needing it.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 25 Jan 2024 15:10:06 +0000 (16:10 +0100)]
x86/NMI: refine "watchdog stuck" log message
Observing
"Testing NMI watchdog on all CPUs: 0 stuck"
it felt like it's not quite right, but I still read it as "no CPU stuck;
all good", when really the system suffered from what 6bdb965178bb
("x86/intel: ensure Global Performance Counter Control is setup
correctly") works around. Convert this to
"Testing NMI watchdog on all CPUs: {0} stuck"
or, with multiple CPUs having an issue, e.g.
"Testing NMI watchdog on all CPUs: {0,40} stuck"
to make more obvious that a lone number is not a count of CPUs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 22 Jan 2024 14:50:10 +0000 (14:50 +0000)]
x86/entry: Fix ELF metadata for NMI and handle_ist_exception
handle_ist_exception isn't part of the NMI handler, just like handle_exception
isn't part of #PF.
Fixes: b3a9037550df ("x86: annotate entry points with type and size") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 28 Oct 2021 19:03:21 +0000 (20:03 +0100)]
x86/kexec: Drop compatibility_mode_far
LJMP is (famously?) incompatible between Intel and AMD CPUs, and while we're
using one of the compatible forms, we've got a good stack and LRET is the far
more common way of doing this.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 23 Jan 2024 11:03:23 +0000 (12:03 +0100)]
IRQ: generalize [gs]et_irq_regs()
Move functions (and their data) to common code, and invoke the functions
on Arm as well. This is in preparation of dropping the register
parameters from handler functions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
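The helpers follow the familiar per-CPU pattern (a sketch matching the
shape of the code, not a verbatim quote):

    static DEFINE_PER_CPU(struct cpu_user_regs *, irq_regs);

    struct cpu_user_regs *get_irq_regs(void)
    {
        return this_cpu(irq_regs);
    }

    struct cpu_user_regs *set_irq_regs(struct cpu_user_regs *new_regs)
    {
        struct cpu_user_regs *old_regs = this_cpu(irq_regs);

        this_cpu(irq_regs) = new_regs;
        return old_regs;
    }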
Michal Orzel [Tue, 23 Jan 2024 11:02:44 +0000 (12:02 +0100)]
lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps to $(targets)
At the moment, trying to run xencov read/reset (calling SYSCTL_coverage_op
under the hood) results in a crash. This is due to a profiler trying to
access data in the .init.* sections (libfdt for Arm and libelf for x86)
that are stripped after boot. Normally, the build system compiles any
*.init.o file without COV_FLAGS. However, these two libraries are
handled differently as sections will be renamed to init after linking.
To override COV_FLAGS to empty for these libraries, lib{fdt,elf}.o were
added to nocov-y. This worked until e321576f4047 ("xen/build: start using
if_changed") that added lib{fdt,elf}-temp.o and their deps to extra-y.
This way, even though these objects appear as prerequisites of
lib{fdt,elf}.o and the settings should propagate to them, make can also
build them as a prerequisite of __build, in which case COV_FLAGS would
still have the unwanted flags. Fix it by switching to $(targets) instead.
Also, for libfdt, append libfdt.o to nocov-y only if CONFIG_OVERLAY_DTB
is not set. Otherwise, there is no section renaming and we should be able
to run the coverage.
Fixes: e321576f4047 ("xen/build: start using if_changed") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 22 Jan 2024 12:55:11 +0000 (13:55 +0100)]
RISC-V: annotate entry points with type and size
Use the generic framework in xen/linkage.h. No change in generated code
except that, of course, the converted symbols become hidden ones and gain
a valid size.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Mon, 22 Jan 2024 12:54:34 +0000 (13:54 +0100)]
Arm: annotate entry points with type and size
Use the generic framework in xen/linkage.h. No change in generated code
except for the changed padding value (noticeable when config.gz isn't a
multiple of 4 in size). Plus of course the converted symbols change to
be hidden ones.
Note that ASM_INT() is switched to DATA(), not DATA_LOCAL(), as the only
use site wants the symbol global anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 22 Jan 2024 12:52:13 +0000 (13:52 +0100)]
x86: also mark assembler globals hidden
Let's have assembler symbols be consistent with C ones. In principle
there are (a few) cases where gas can produce smaller code this way,
just that for now there's a gas bug causing smaller code to be emitted
even when that shouldn't be the case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 22 Jan 2024 12:51:31 +0000 (13:51 +0100)]
x86: annotate entry points with type and size
Use the generic framework in xen/linkage.h.
For switch_to_kernel() and restore_all_guest() so far implicit alignment
(from being first in their respective sections) is being made explicit
(as in: using FUNC() without 2nd argument). Whereas for
{,compat}create_bounce_frame() and autogen_entrypoints[] alignment is
newly arranged for.
Except for the added/adjusted alignment padding (including their
knock-on effects) no change in generated code/data. Note that the basis
for support of weak definitions is added despite them not having any use
right now.
Note that ASM_INT() is switched to DATA(), not DATA_LOCAL(), as the only
use site wants the symbol global anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 22 Jan 2024 12:50:40 +0000 (13:50 +0100)]
common: assembly entry point type/size annotations
Recent gas versions generate minimalistic Dwarf debug info for items
annotated as functions and having their sizes specified [1]. Furthermore
generating live patches wants items properly annotated. "Borrow" Arm's
END() and (remotely) derive other annotation infrastructure from
Linux'es, for all architectures to use.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
[1] https://sourceware.org/git?p=binutils-gdb.git;a=commitdiff;h=591cc9fbbfd6d51131c0f1d4a92e7893edcc7a28
Jan Beulich [Mon, 22 Jan 2024 12:41:07 +0000 (13:41 +0100)]
x86/MCE: switch some callback invocations to altcall
While not performance critical, these hook invocations still would
better be converted: This way all pre-filled (and newly introduced)
struct mce_callback instances can become __initconst_cf_clobber, thus
allowing another 9 ENDBR to be eliminated during the 2nd phase of
alternatives patching.
While this means registering callbacks a little earlier, doing so is
perhaps even advantageous, for having pointers be non-NULL earlier on.
Only one set of callbacks would only ever be registered anyway, and
neither of the respective initialization functions can (subsequently)
fail.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jan 2024 12:40:32 +0000 (13:40 +0100)]
x86/MCE: separate BSP-only initialization
Several function pointers are registered over and over again, when
setting them once on the BSP suffices. Arrange for this in the vendor
init functions and mark involved registration functions __init.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jan 2024 12:40:00 +0000 (13:40 +0100)]
x86/PV: avoid indirect call for I/O emulation quirk hook
This way ioemul_handle_proliant_quirk() won't need ENDBR anymore.
While touching this code, also
- arrange for it to not be built at all when !PV,
- add "const" to the last function parameter and bring the definition
in sync with the declaration (for Misra).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jan 2024 12:39:23 +0000 (13:39 +0100)]
x86/MTRR: avoid several indirect calls
The use of (supposedly) vendor-specific hooks is a relic from the days
when it was still possible to build Xen as a 32-bit binary. There's no
expectation that a new need for such an abstraction would arise. Convert
mtrr_if to a mere boolean and all prior calls through it to direct ones,
thus allowing 6 ENDBR to be eliminated from .text.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 22 Dec 2023 21:06:16 +0000 (21:06 +0000)]
xen/livepatch: Make check_for_livepatch_work() faster in the common case
When livepatching is enabled, this function is used all the time. Really do
check the fastpath first, and annotate it likely() as this is the right answer
100% of the time (to many significant figures). This cuts out 3 pointer
dereferences in the "nothing to do" path.
However, GCC still needs some help persuading it not to set up the full stack
frame (6 spilled registers, 3 slots of locals) even on the fastpath.
Create a new check_for_livepatch_work() with the fastpath only, and make the
"new" do_livepatch_work() noinline. This causes the fastpath to need no stack
frame, making it faster still.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
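The resulting shape, as a sketch (the per-CPU flag name is illustrative):

    static void noinline do_livepatch_work(void)
    {
        /* Slow path: full stack frame and the actual patching logic. */
    }

    void check_for_livepatch_work(void)
    {
        /* Fast path: almost always nothing pending -- needs no frame. */
        if ( likely(!this_cpu(work_to_do)) )
            return;

        do_livepatch_work();
    }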
Andrew Cooper [Fri, 27 Oct 2023 16:02:21 +0000 (17:02 +0100)]
x86/vmx: Disallow the use of inactivity states
Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
enter the vCPU. Luckily for us, nested-virt is explicitly unsupported for
security bugs.
The inactivity states are HLT, SHUTDOWN and WAIT-FOR-SIPI, and as noted by the
SDM in Vol3 27.7 "Special Features of VM Entry":
If VM entry ends with the logical processor in an inactive activity state,
the VM entry generates any special bus cycle that is normally generated when
that activity state is entered from the active state.
Also,
Some activity states unconditionally block certain events.
I.e. A VMEntry with ACTIVITY=SHUTDOWN will initiate a platform reset, while a
VMEntry with ACTIVITY=WAIT-FOR-SIPI will really block everything other than
SIPIs.
Both of these activity states are for the TXT ACM to use, not for regular
hypervisors, and Xen doesn't support dropping the HLT intercept either.
There are two paths in Xen which operate on ACTIVITY_STATE.
1) The vmx_{get,set}_nonreg_state() helpers for VM-Fork.
As regular VMs can't use any inactivity states, this is just duplicating
the 0 from construct_vmcs(). Retain the ability to query activity_state,
but crash the domain on any attempt to set an inactivity state.
2) Nested virt, because of ACTIVITY_STATE in vmcs_gstate_field[].
Explicitly hide the inactivity states in the guest's view of MSR_VMX_MISC,
and remove ACTIVITY_STATE from vmcs_gstate_field[].
In virtual_vmentry(), we should trigger a VMEntry failure for the use of
any inactivity states, but there's no support for that in the code at all
so leave a TODO for when we finally start working on nested-virt in
earnest.
Reported-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Andrew Cooper [Wed, 1 Nov 2023 13:32:55 +0000 (13:32 +0000)]
x86/vmx: Fix IRQ handling for EXIT_REASON_INIT
When receiving an INIT, a prior bugfix tried to ignore the INIT and continue
onwards.
Unfortunately it's not safe to return at that point in vmx_vmexit_handler().
Just out of context in the first hunk is a local_irqs_enabled() which is
depended-upon by the return-to-guest path, causing the following checklock
failure in debug builds:
(XEN) Error: INIT received - ignoring
(XEN) CHECKLOCK FAILURE: prev irqsafe: 0, curr irqsafe 1
(XEN) Xen BUG at common/spinlock.c:132
(XEN) ----[ Xen-4.19-unstable x86_64 debug=y Tainted: H ]----
...
(XEN) Xen call trace:
(XEN) [<ffff82d040238e10>] R check_lock+0xcd/0xe1
(XEN) [<ffff82d040238fe3>] F _spin_lock+0x1b/0x60
(XEN) [<ffff82d0402ed6a8>] F pt_update_irq+0x32/0x3bb
(XEN) [<ffff82d0402b9632>] F vmx_intr_assist+0x3b/0x51d
(XEN) [<ffff82d040206447>] F vmx_asm_vmexit_handler+0xf7/0x210
Luckily, this is benign in release builds. Accidentally having IRQs disabled
when trying to take an IRQs-on lock isn't a deadlock-vulnerable pattern.
Drop the problematic early return. In hindsight, it's wrong to skip other
normal VMExit steps.
Fixes: b1f11273d5a7 ("x86/vmx: Don't spuriously crash the domain when INIT is received") Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 17 Jan 2024 09:43:02 +0000 (10:43 +0100)]
x86/HPET: avoid an indirect call
When this code was written, indirect branches still weren't considered
much of a problem (besides being a little slower). Instead of a function
pointer, pass a boolean to _disable_pit_irq(), thus allowing two
ENDBR (one of them in .text) to be eliminated.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jan 2024 09:42:27 +0000 (10:42 +0100)]
cpufreq: finish genapic conversion to altcall
Even functions used on infrequently executed paths want converting: This
way all pre-filled struct cpufreq_driver instances can become
__initconst_cf_clobber, thus allowing another 15 ENDBR to be eliminated
during the 2nd phase of alternatives patching.
For acpi-cpufreq's optionally populated .get hook make sure alternatives
patching can actually see the pointer. See also the code comment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jan 2024 09:41:52 +0000 (10:41 +0100)]
x86/APIC: finish genapic conversion to altcall
While .probe() doesn't need fiddling with for being run only very early,
init_apic_ldr() wants converting too despite not being on a frequently
executed path: This way all pre-filled struct genapic instances can
become __initconst_cf_clobber, thus allowing 15 more ENDBR to be eliminated
during the 2nd phase of alternatives patching.
While fiddling with section annotations here, also move "genapic" itself
to .data.ro_after_init.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Wed, 17 Jan 2024 09:40:52 +0000 (10:40 +0100)]
x86/intel: ensure Global Performance Counter Control is setup correctly
When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
MSR contains per-counter enable bits that is ANDed with the enable bit in the
counter EVNTSEL MSR in order for a PMC counter to be enabled.
So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
bits being set by default, but at least on some Intel Sapphire and Emerald
Rapids this is no longer the case, and Xen reports:
Testing NMI watchdog on all CPUs: 0 40 stuck
The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
relevant enable bit in PERF_GLOBAL_CTRL not being set.
Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
general-purpose PMCs are enabled. Doing so brings the state of the package-BSP
PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
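A sketch of the adjustment (simplified; not the literal Xen hunk):

    static void __init check_perf_global_ctrl(unsigned int nr_gp_counters)
    {
        uint64_t ctrl;
        const uint64_t mask = (1ULL << nr_gp_counters) - 1;

        rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, ctrl);
        if ( (ctrl & mask) != mask )   /* firmware left counters disabled */
            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, ctrl | mask);
    }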
Michal Orzel [Mon, 15 Jan 2024 12:48:59 +0000 (13:48 +0100)]
xen/arm64: head: Allow to use early printk while on 1:1 mapping
Take an example from commit 1ec3fe1f664f ("xen/arm32: head: Improve
logging in head.S") to add support for printing early boot messages
while running on identity mapping:
- define PRINT_SECT() macro to be able to specify a section for storing
a string. PRINT() will use .rodata.str and PRINT_ID() - .rodata.idmap.
This is necessary, because when running on identity mapping, the
strings need to be part of the first page that is mapped,
- move loading a runtime virtual UART address right after enabling MMU
(the corresponding steps repeated in {primary,secondary}_switched are
now consolidated in a single place),
- move early printk 'hex' string into .rodata.idmap and replace 'adr'
instruction in asm_putn with 'adr_l' to extend the addressable range,
- remove the now-unused RODATA_STR() macro.
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Michal Orzel [Mon, 15 Jan 2024 12:48:58 +0000 (13:48 +0100)]
xen/arm32: head: Move earlyprintk 'hex' to .rodata.idmap
Thanks to 1ec3fe1f664f ("xen/arm32: head: Improve logging in head.S"),
we can now use PRINT_ID() macro to print messages when running on
identity mapping. For that, all the strings need to be part of the first
page that is mapped. This is not the case for a 'hex' string (used by
asm_putn when printing register values), which currently resides in
.rodata.str. Move it to .rodata.idmap to allow making use of print_reg
macro from anywhere (mostly to aid debugging).
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Roger Pau Monné [Mon, 15 Jan 2024 11:20:11 +0000 (12:20 +0100)]
CirrusCI: drop FreeBSD 12
Went EOL by the end of December 2023, and the pkg repos have been shut down.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:19:41 +0000 (12:19 +0100)]
x86/vPMU: drop regs parameter from interrupt functions
The vendor functions don't use the respective parameters at all. In
vpmu_do_interrupt() there's only a very limited area where the
outer context's state would be needed, retrievable by get_irq_regs().
This is in preparation of dropping the register parameters from direct
APIC vector handler functions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:18:43 +0000 (12:18 +0100)]
x86/vPIC: check values loaded from state save record
Loading is_master from the state save record can lead to out-of-bounds
accesses via at least the two container_of() uses by vpic_domain() and
__vpic_lock(). Make sure the value is consistent with the instance being
loaded.
For ->int_output (which for whatever reason isn't a 1-bit bitfield),
besides bounds checking also take ->init_state into account.
For ELCR follow what vpic_intercept_elcr_io()'s write path and
vpic_reset() do, i.e. don't insist on the internal view of the value to
be saved.
Move the instance range check as well, leaving just an assertion in the
load handler.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:18:10 +0000 (12:18 +0100)]
x86/vPIT: check values loaded from state save record
In particular pit_latch_status() and speaker_ioport_read() perform
calculations which assume in-bounds values. Several of the state save
record fields can hold wider ranges, though. Refuse to load values which
cannot result from normal operation, except mode, the init state of
which (see also below) cannot otherwise be reached.
Note that ->gate can legitimately be zero only for channel 2;
enforce that as well.
Adjust pit_reset()'s writing of ->mode as well, to not unduly affect
the value pit_latch_status() may calculate. The chosen mode of 7 is
still one which cannot be established by writing the control word. Note
that with or without this adjustment effectively all switch() statements
using mode as the control expression aren't quite right when the PIT is
still in that init state; there is an apparent assumption that before
these can sensibly be invoked, the guest would init the PIT (i.e. in
particular set the mode).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:16:56 +0000 (12:16 +0100)]
x86/HVM: split restore state checking from state loading
..., at least as reasonably feasible without making a check hook
mandatory (in particular strict vs relaxed/zero-extend length checking
can't be done early this way).
Note that only one of the two uses of "real" hvm_load() is accompanied
with a "checking" one. The other directly consumes hvm_save() output,
which ought to be well-formed. This means that while input data related
checks don't need repeating in the "load" function when already done by
the "check" one (albeit assertions to this effect may be desirable),
domain state related checks (e.g. has_xyz(d)) will be required in both
places.
With the split arch_hvm_{check,load}(), also invoke the latter only
after downing all the vCPU-s.
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:15:56 +0000 (12:15 +0100)]
NUMA: limit first_valid_mfn exposure
Address the TODO regarding first_valid_mfn by making the variable static
when NUMA=y, thus also addressing a Misra C:2012 rule 8.4 concern (on
x86). To carry this out, introduce two new IS_ENABLED()-like macros
conditionally inserting "static". One less macro expansion layer is
sufficient though (I might guess that some early form of IS_ENABLED()
pasted CONFIG_ onto the incoming argument, at which point the extra
layer would have been necessary), and part of the existing helper macros
can be re-used.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
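One way such a conditionally-inserted "static" can be built (macro names
here are hypothetical, not the ones the commit introduces):

    #define SECOND_ARG_(a, b, ...)   b
    #define STATIC_IF_PLACEHOLDER_1  , static
    #define STATIC_IF__(x)           SECOND_ARG_(x, , )
    #define STATIC_IF_(val)          STATIC_IF__(STATIC_IF_PLACEHOLDER_##val)
    #define STATIC_IF(cfg)           STATIC_IF_(cfg)

    /* With CONFIG_NUMA defined to 1 this expands to
     *     static unsigned long first_valid_mfn;
     * while without it the "static" vanishes, keeping the symbol
     * externally visible for the !NUMA case. */
    STATIC_IF(CONFIG_NUMA) unsigned long first_valid_mfn;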
Jan Beulich [Mon, 15 Jan 2024 11:12:00 +0000 (12:12 +0100)]
x86emul: support SM4
Since the insns here and in particular their memory access patterns
follow the usual scheme, I didn't think it was necessary to add a
contrived test specifically for them.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:11:22 +0000 (12:11 +0100)]
x86emul: support SM3
Since the insns here and in particular their memory access patterns
follow the usual scheme, I didn't think it was necessary to add a
contrived test specifically for them.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 15 Jan 2024 11:09:42 +0000 (12:09 +0100)]
x86emul: support AVX-VNNI-INT16
These are close relatives of the AVX-VNNI and AVX-VNNI-INT8 ISA
extensions. Since the insns here and in particular their memory access
patterns follow the usual scheme (and especially the word variants of
AVX-VNNI), I didn't think it was necessary to add a contrived test
specifically for them.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Fri, 12 Jan 2024 11:54:31 +0000 (11:54 +0000)]
xen/arm32: head: Improve logging in head.S
The sequence to enable the MMU on arm32 is quite complex as we may need
to jump to a temporary mapping to map Xen.
Recently, we had one bug in the logic (see f5a49eb7f8b3 ("xen/arm32:
head: Add mising isb in switch_to_runtime_mapping()")) and it was
a pain to debug because there was no logging.
In order to improve the logging in the MMU switch we need to add
support for early printk while running on the identity mapping
and also on the temporary mapping.
For the identity mapping, we have only the first page of Xen mapped.
So all the strings should reside in the first page. For that purpose
a new macro PRINT_ID is introduced.
For the temporary mapping, the fixmap is already linked in the temporary
area (and so does the UART). So we just need to update the register
storing the UART address (i.e. r11) to point to the UART temporary
mapping.
Take the opportunity to introduce mov_w_on_cond in order to
conditionally execute mov_w and avoid branches.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Shawn Anastasio [Thu, 11 Jan 2024 23:24:22 +0000 (17:24 -0600)]
xen/arm: bootfdt: Harden handling of malformed mem reserve map
The early_print_info routine in bootfdt.c incorrectly stores the result
of a call to fdt_num_mem_rsv() in an unsigned int, which results in the
negative error code being interpreted incorrectly in a subsequent loop
in the case where the device tree is malformed. Fix this by properly
checking the return code for an error and calling panic().
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Javi Merino [Thu, 11 Jan 2024 12:09:27 +0000 (12:09 +0000)]
xen/common: Don't dereference overlay_node after checking that it is NULL
In remove_nodes(), overlay_node is dereferenced when printing the
error message even though it is known to be NULL. Return without
printing, as an error message is already printed by the caller.
The semantic patch that spots this code is available in
Julien Grall [Fri, 12 Jan 2024 10:45:09 +0000 (10:45 +0000)]
xen/arm32: head: Rework how the fixmap and early UART mapping are prepared
Since commit 5e213f0f4d2c ("xen/arm32: head: Widen the use of the
temporary mapping"), boot_second (used to cover regions like Xen and
the fixmap) will not be mapped if the identity mapping overlaps.
So it is ok to prepare the fixmap table and link it in boot_second
earlier. With that, the fixmap can also be used earlier via the
temporary mapping.
Therefore split setup_fixmap() in two:
* The table is now linked in create_page_tables() because
the boot page tables need to be recreated for every CPU.
* The early UART mapping is only added for the boot CPU0 as the
fixmap table is not cleared when secondary CPUs boot.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Roger Pau Monné [Tue, 9 Jan 2024 13:07:49 +0000 (14:07 +0100)]
x86/iommu: introduce a rangeset to perform hwdom IOMMU setup
This change just introduces the boilerplate code in order to use a rangeset
when setting up the hardware domain IOMMU mappings. The rangeset is never
populated in this patch, so it's a non-functional change as far as the
mappings the domain gets are concerned.
Note there will be a change for HVM domains (ie: PVH dom0) when the code
introduced here gets used: the p2m mappings will be established using
map_mmio_regions() instead of p2m_add_identity_entry(), so that ranges can be
mapped with a single function call if possible. Note that the interface of
map_mmio_regions() doesn't allow creating read-only mappings, but so far there
are no such mappings created for PVH dom0 in arch_iommu_hwdom_init().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
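A sketch of the intended flow once populated (API names as in
xen/common/rangeset.c; the callback and bounds are illustrative):

    struct rangeset *map = rangeset_new(d, "iommu hwdom", 0);

    rc = rangeset_add_range(map, start, end);        /* accumulate regions */
    rc = rangeset_remove_range(map, xen_s, xen_e);   /* punch out holes */
    rc = rangeset_report_ranges(map, 0, ~0UL, identity_map_cb, d);
    rangeset_destroy(map);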
Jan Beulich [Tue, 9 Jan 2024 13:07:17 +0000 (14:07 +0100)]
x86/HVM: drop tsc_scaling.setup() hook
This was used by VMX only, and the intended VMCS write can as well
happen from vmx_set_tsc_offset(), invoked (directly or indirectly)
almost immediately after the present call sites of the hook.
vmx_set_tsc_offset() isn't invoked frequently elsewhere, so the extra
VMCS write shouldn't raise performance concerns.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 9 Jan 2024 13:06:34 +0000 (14:06 +0100)]
x86/HVM: hide SVM/VMX when their enabling is prohibited by firmware
... or we fail to enable the functionality on the BSP for other reasons.
The only place where hardware announcing the feature is recorded is the
raw CPU policy/featureset.
Inspired by https://lore.kernel.org/all/20230921114940.957141-1-pbonzini@redhat.com/.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>