xenbits.xensource.com Git - xen.git/log
Juergen Gross [Mon, 5 Feb 2024 10:49:51 +0000 (11:49 +0100)]
tools/xenstored: move systemd handling to posix.c

Move systemd handling to a new late_init() function in posix.c.

This prepares a future removal of the NO_SOCKETS macro.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Mon, 5 Feb 2024 10:49:50 +0000 (11:49 +0100)]
tools/xenstored: add early_init() function

Some xenstored initialization needs to be done in the daemon case only,
so split it out into a new early_init() function being a stub in the
stubdom case.

Remove the call of talloc_enable_leak_report_full(), as it serves no
real purpose: the daemon only ever exits due to a crash, in which case
a log of talloc()ed memory hardly has any value.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Mon, 5 Feb 2024 10:49:47 +0000 (11:49 +0100)]
tools/xenstored: rename xenbus_evtchn()

Rename the xenbus_evtchn() function to get_xenbus_evtchn() in order to
avoid two externally visible symbols with the same name when Xenstore-
stubdom is being built with a Mini-OS with CONFIG_XENBUS set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Juergen Gross [Mon, 5 Feb 2024 10:49:46 +0000 (11:49 +0100)]
tools/helpers: allocate xenstore event channel for xenstore stubdom

In order to prepare support of PV frontends in xenstore-stubdom, add
allocation of a Xenstore event channel to init-xenstore-domain.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Cyril Rébert [Sun, 4 Feb 2024 10:19:40 +0000 (11:19 +0100)]
tools/xentop: fix sorting bug for some columns

Sort doesn't work on columns VBD_OO, VBD_RD, VBD_WR and VBD_RSECT.
Fix by adjusting variable names in the compare functions.
Bug fix only. No functional change.

Fixes: 91c3e3dc91d6 ("tools/xentop: Display '-' when stats are not available.")
Signed-off-by: Cyril Rébert (zithro) <slack@rabbit.lu>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Andrew Cooper [Mon, 5 Feb 2024 14:13:02 +0000 (14:13 +0000)]
x86/cpu: Fix mixed tabs/spaces

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 2 Feb 2024 15:03:15 +0000 (15:03 +0000)]
xen/bitmap: Deduplicate __bitmap_weight() implementations

We have two copies of __bitmap_weight() that differ by whether they make
hweight32() or hweight64() calls, yet we already have hweight_long() which is
the form that __bitmap_weight() wants.

Fix hweight_long() to return unsigned int like all the other hweight helpers,
and fix __bitmap_weight() to use unsigned integers.

No functional change.
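
As a rough illustration of the deduplicated shape (not the actual Xen code), a
single weight routine can be written in terms of one word-sized popcount.  The
names hweight_long_sketch(), bitmap_weight_sketch() and BITS_PER_LONG_SKETCH
are hypothetical stand-ins, and the popcount loop is a portable assumption
rather than Xen's implementation:

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Population count of one word; hypothetical stand-in for hweight_long(). */
static unsigned int hweight_long_sketch(unsigned long w)
{
    unsigned int count = 0;

    while (w) {
        w &= w - 1; /* clear the lowest set bit */
        count++;
    }

    return count;
}

/* Count the set bits among the first nbits bits of the bitmap. */
static unsigned int bitmap_weight_sketch(const unsigned long *bitmap,
                                         unsigned int nbits)
{
    unsigned int full_words = nbits / BITS_PER_LONG_SKETCH;
    unsigned int rem = nbits % BITS_PER_LONG_SKETCH;
    unsigned int weight = 0;

    for (unsigned int k = 0; k < full_words; k++)
        weight += hweight_long_sketch(bitmap[k]);

    /* Only the low "rem" bits of a trailing partial word are valid. */
    if (rem)
        weight += hweight_long_sketch(bitmap[full_words] & ((1UL << rem) - 1));

    return weight;
}
```

Because the helper operates on unsigned long, the same source serves both
32-bit and 64-bit builds without a per-width copy.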

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 2 Feb 2024 17:57:37 +0000 (17:57 +0000)]
x86/ucode: Remove accidentally introduced tabs

Fixes: cf7fe8b72dea ("x86/ucode: Fix stability of the raw CPU Policy rescan")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 5 Feb 2024 09:48:11 +0000 (10:48 +0100)]
x86/CPU: convert vendor hook invocations to altcall

While not performance critical, these hook invocations still want
converting: This way all pre-filled struct cpu_dev instances can become
__initconst_cf_clobber, thus allowing a further 8 ENDBR to be eliminated
during the 2nd phase of alternatives patching (besides moving previously
resident data to .init.*).

Since all use sites need touching anyway, take the opportunity and also
address a Misra C:2012 Rule 5.5 violation: Rename the this_cpu static
variable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 5 Feb 2024 09:45:31 +0000 (10:45 +0100)]
x86/guest: finish conversion to altcall

While .setup() and .e820_fixup() don't need fiddling with for being run
only very early, both .ap_setup() and .resume() want converting too:
This way both pre-filled struct hypervisor_ops instances can become
__initconst_cf_clobber, thus allowing up to 5 more ENDBR to be eliminated
(configuration dependent) during the 2nd phase of alternatives patching.

While fiddling with section annotations here, also move "ops" itself to
.data.ro_after_init.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 5 Feb 2024 09:44:46 +0000 (10:44 +0100)]
x86: arrange for ENDBR zapping from <vendor>_ctxt_switch_masking()

While altcall is already used for them, the functions want announcing in
.init.rodata.cf_clobber, even if the resulting static variables aren't
otherwise used.

While doing this also move ctxt_switch_masking to .data.ro_after_init.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Jan 2024 19:55:18 +0000 (19:55 +0000)]
xen: Remove debugger.h

With x86 having dropped gdbstub, Xen's only debugger has gone.

Drop xen/debugger.h and remove the hooks spread around the codebase.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 26 Jan 2024 19:57:01 +0000 (19:57 +0000)]
x86: Remove gdbstub

In 13y of working on Xen, I've never seen it used.  The implementation
was introduced (commit b69f92f3012e, Jul 28 2004) with known issues such as:

  /* Resuming after we've stopped used to work, but more through luck
     than any actual intention.  It doesn't at the moment. */

which appear to have gone unfixed for the 20 years since.

Nowadays there are more robust ways of inspecting crashed state, such as a
kexec crash kernel, or running Xen in a VM.

This will allow us to clean up some hooks around the codebase which are
proving awkward for other tasks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 30 Jan 2024 09:14:00 +0000 (10:14 +0100)]
x86/spec-ctrl: Expose BHI_CTRL to guests

The CPUID feature bit signals the presence of the BHI_DIS_S control in
SPEC_CTRL MSR, first available in Intel AlderLake and Sapphire Rapids CPUs.

Xen already knows how to context switch MSR_SPEC_CTRL properly between guest
and hypervisor context.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 30 Jan 2024 09:13:59 +0000 (10:13 +0100)]
x86/spec-ctrl: Expose RRSBA_CTRL to guests

The CPUID feature bit signals the presence of the RRSBA_DIS_{U,S} controls in
SPEC_CTRL MSR, first available in Intel AlderLake and Sapphire Rapids CPUs.

Xen already knows how to context switch MSR_SPEC_CTRL properly between guest
and hypervisor context.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 30 Jan 2024 09:13:58 +0000 (10:13 +0100)]
x86/spec-ctrl: Expose IPRED_CTRL to guests

The CPUID feature bit signals the presence of the IPRED_DIS_{U,S} controls in
SPEC_CTRL MSR, first available in Intel AlderLake and Sapphire Rapids CPUs.

Xen already knows how to context switch MSR_SPEC_CTRL properly between guest
and hypervisor context.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Wed, 31 Jan 2024 10:42:48 +0000 (10:42 +0000)]
tools/ocaml: Bump minimum version to OCaml 4.05

Char.lowercase got removed in OCaml 5.0 (it has been deprecated since 2014),
and doesn't build any more.

Char.lowercase_ascii has existed since OCaml 4.03, so that is the new
minimum version for oxenstored.

However, OCaml 4.05 is the oldest new-enough version found in common distros,
so pick this as a baseline.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
[Update CHANGELOG.md]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 31 Jan 2024 17:05:47 +0000 (17:05 +0000)]
xen/bitmap: Consistently use unsigned bits values

Right now, most of the static inline helpers take an unsigned nbits quantity,
and most of the library functions take a signed quantity.  Because
BITMAP_LAST_WORD_MASK() is expressed as a divide, the compiler is forced to
emit two different paths to get the correct semantics for signed division.

Swap all signed bit-counts to being unsigned bit-counts for the simple cases.
This includes the return value of bitmap_weight().

Bloat-o-meter for a random x86 build reports:
  add/remove: 0/0 grow/shrink: 8/19 up/down: 167/-413 (-246)

which all comes from the compiler not emitting "dead" logic paths for negative bit
counts.

No functional change.
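
To make the signed-vs-unsigned point concrete, here is a minimal sketch.  The
helper names and BITS_PER_LONG_SKETCH are hypothetical, and last_word_mask()
only approximates the idea behind BITMAP_LAST_WORD_MASK() rather than
reproducing it.  With an unsigned count the remainder is a simple mask
operation, while a signed count obliges the compiler to preserve C's
truncating semantics for negative inputs:

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Mask selecting the valid bits in the last word of an nbits-wide bitmap. */
static unsigned long last_word_mask(unsigned int nbits)
{
    /* Unsigned remainder by a power of two reduces to a bitwise AND. */
    unsigned int rem = nbits % BITS_PER_LONG_SKETCH;

    return rem ? (1UL << rem) - 1 : ~0UL;
}

/*
 * Signed remainder takes the sign of the dividend, so the compiler must
 * emit extra handling for negative values even if callers never pass one.
 */
static int signed_remainder(int nbits)
{
    return nbits % (int)BITS_PER_LONG_SKETCH;
}
```

The negative-input path in signed_remainder() is the kind of "dead" logic the
bloat-o-meter figures above refer to.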

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 15:06:32 +0000 (15:06 +0000)]
x86/traps: Annotate {l,c}star_enter() as nocall

... as with other declarations which aren't legal to call from C.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 13:59:07 +0000 (13:59 +0000)]
x86/boot: Add braces in reloc.c

107 lines is an unreasonably large switch statement to live inside a
brace-less for loop.  Drop the comment that's clumsily trying to cover the
fact that this logic has wrong-looking indentation.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 20:44:34 +0000 (20:44 +0000)]
xen/sched: Fix UB shift in compat_set_timer_op()

Tamas reported this UBSAN failure from fuzzing:

  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in common/sched/compat.c:48:37
  (XEN) left shift of negative value -2147425536
  (XEN) ----[ Xen-4.19-unstable  x86_64  debug=y ubsan=y  Not tainted ]----
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040307c1c>] R ubsan.c#ubsan_epilogue+0xa/0xd9
  (XEN)    [<ffff82d040308afb>] F __ubsan_handle_shift_out_of_bounds+0x11a/0x1c5
  (XEN)    [<ffff82d040307758>] F compat_set_timer_op+0x41/0x43
  (XEN)    [<ffff82d04040e4cc>] F hvm_do_multicall_call+0x77f/0xa75
  (XEN)    [<ffff82d040519462>] F arch_do_multicall_call+0xec/0xf1
  (XEN)    [<ffff82d040261567>] F do_multicall+0x1dc/0xde3
  (XEN)    [<ffff82d04040d2b3>] F hvm_hypercall+0xa00/0x149a
  (XEN)    [<ffff82d0403cd072>] F vmx_vmexit_handler+0x1596/0x279c
  (XEN)    [<ffff82d0403d909b>] F vmx_asm_vmexit_handler+0xdb/0x200

Left-shifting any negative value is strictly undefined behaviour in C, and
the two parameters here come straight from the guest.

The fuzzer happened to choose lo 0xf, hi 0x8000e300.

Switch everything to be unsigned values, making the shift well defined.

As GCC documents:

  As an extension to the C language, GCC does not use the latitude given in
  C99 and C11 only to treat certain aspects of signed '<<' as undefined.
  However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such
  cases.

this was deemed not to need an XSA.

Note: The unsigned -> signed conversion for do_set_timer_op()'s s_time_t
parameter is also well defined.  C makes it implementation defined, and GCC
defines it as reduction modulo 2^N to be within range of the new type.
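
A minimal sketch of the unsigned approach, using the fuzzer's values; the
function name combine_timeout() is hypothetical and this is not the actual
compat_set_timer_op() body.  Note that 0x8000e300 read as a signed 32-bit
value is -2147425536, the number in the UBSAN report:

```c
#include <stdint.h>

/* Combine two guest-supplied 32-bit halves into one 64-bit timeout. */
static int64_t combine_timeout(uint32_t lo, uint32_t hi)
{
    /* Shifting an unsigned 64-bit value is well defined for any input. */
    uint64_t timeout = ((uint64_t)hi << 32) | lo;

    /*
     * Unsigned -> signed conversion of an out-of-range value is
     * implementation defined; GCC reduces it modulo 2^64.
     */
    return (int64_t)timeout;
}
```

With lo 0xf and hi 0x8000e300 the combined bit pattern is 0x8000e3000000000f,
which is negative once viewed as a signed 64-bit quantity.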

Fixes: 2942f45e09fb ("Enable compatibility mode operation for HYPERVISOR_sched_op and HYPERVISOR_set_timer_op.")
Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 18:13:14 +0000 (18:13 +0000)]
x86/hvm: Fix UBSAN failure in do_hvm_op() printk

Tamas reported this UBSAN failure from fuzzing:

  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in common/vsprintf.c:64:19
  (XEN) negation of -9223372036854775808 cannot be represented in type 'long long int':
  (XEN) ----[ Xen-4.19-unstable  x86_64  debug=y ubsan=y  Not tainted ]----
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040307c1c>] R ubsan.c#ubsan_epilogue+0xa/0xd9
  (XEN)    [<ffff82d04030805d>] F __ubsan_handle_negate_overflow+0x99/0xce
  (XEN)    [<ffff82d04028868f>] F vsprintf.c#number+0x10a/0x93e
  (XEN)    [<ffff82d04028ac74>] F vsnprintf+0x19e2/0x1c56
  (XEN)    [<ffff82d04030a47a>] F console.c#vprintk_common+0x76/0x34d
  (XEN)    [<ffff82d04030a79e>] F printk+0x4d/0x4f
  (XEN)    [<ffff82d04040c42b>] F do_hvm_op+0x288e/0x28f5
  (XEN)    [<ffff82d04040d385>] F hvm_hypercall+0xad2/0x149a
  (XEN)    [<ffff82d0403cd072>] F vmx_vmexit_handler+0x1596/0x279c
  (XEN)    [<ffff82d0403d909b>] F vmx_asm_vmexit_handler+0xdb/0x200

The problem is an unsigned -> signed conversion because of a bad
formatter (%ld trying to format an unsigned long).

We could fix it by swapping to %lu, but this is a useless printk() even in
debug builds, so just drop it completely.

Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2024 22:13:17 +0000 (22:13 +0000)]
xen: Drop superfluous semi-colons

All these cases happen to be benign, but drop them anyway.  This is one step
towards making -Wextra-semi work.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Thu, 1 Feb 2024 17:35:22 +0000 (17:35 +0000)]
xen/arm: Properly clean update to init_ttbr and smp_up_cpu

Recent rework to the secondary boot code modified how init_ttbr and
smp_up_cpu are accessed. Rather than directly accessing them, we
are using a pointer to them.

The helper clean_dcache() is expected to take the variable as a parameter
and then clean its content. As we now pass a pointer to the variable,
we will clean the area storing the address rather than the content itself.

Switch to use clean_dcache_va_range() to avoid casting the pointer.
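
The pitfall can be sketched outside of Xen as follows.  Everything here is a
hypothetical model: CLEAN_VAR_SKETCH() assumes a helper that operates on the
address and size of the variable it is handed, and clean_range_sketch() merely
records its arguments instead of performing cache maintenance:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t init_ttbr_sketch;   /* data that other CPUs will read */

static bool cleaned_target;         /* did the last clean cover the data? */
static size_t cleaned_size;

/* Stand-in for a range-based clean: only records what it was asked to do. */
static void clean_range_sketch(const void *p, size_t size)
{
    cleaned_target = (p == (const void *)&init_ttbr_sketch);
    cleaned_size = size;
}

/* Cleans the storage of whatever variable is named, pointer or not. */
#define CLEAN_VAR_SKETCH(x) clean_range_sketch(&(x), sizeof(x))

static void update_buggy(uint64_t val)
{
    uint64_t *ptr = &init_ttbr_sketch;

    *ptr = val;
    CLEAN_VAR_SKETCH(ptr);  /* covers the local pointer, not the data */
}

static void update_fixed(uint64_t val)
{
    uint64_t *ptr = &init_ttbr_sketch;

    *ptr = val;
    clean_range_sketch(ptr, sizeof(*ptr));  /* covers the pointed-to data */
}
```

Passing the pointer and the size of the pointee explicitly keeps the cleaned
range tied to the data rather than to the variable holding its address.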

Fixes: a5ed59e62c6f ("arm/mmu: Move init_ttbr to a new section .data.idmap")
Fixes: 9a5114074b04 ("arm/smpboot: Move smp_up_cpu to a new section .data.idmap")
Reported-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Thu, 1 Feb 2024 15:21:51 +0000 (16:21 +0100)]
shim: avoid building of vendor IOMMU code

There's no use for IOMMU code in the shim. Disable at least the vendor-
specific code, until eventually IOMMU code can be disabled altogether.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 1 Feb 2024 15:21:04 +0000 (16:21 +0100)]
IOMMU: iommu_use_hap_pt() implies CONFIG_HVM

Allow the compiler a little more room on DCE by moving the compile-time-
constant condition into the predicate (from the one place where it was
added in an open-coded fashion for XSA-450).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Carlo Nonato [Thu, 1 Feb 2024 15:19:51 +0000 (16:19 +0100)]
xen/page_alloc: introduce init_free_page_fields() helper

Introduce a new helper to initialize fields that have different uses for
free pages.

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Thu, 1 Feb 2024 15:19:36 +0000 (16:19 +0100)]
xenpm: Print message for disabled commands

xenpm get-cpufreq-states currently just prints no output when cpufreq is
disabled or HWP is running.  Have it print an appropriate message.  The
cpufreq disabled one mirrors the cpuidle disabled one.

cpufreq disabled:
$ xenpm get-cpufreq-states
Either Xen cpufreq is disabled or no valid information is registered!

Under HWP:
$ xenpm get-cpufreq-states
P-State information not supported.  Try 'get-cpufreq-average' or 'start'.

Also allow xenpm to handle EOPNOTSUPP from the pmstat hypercalls.
EOPNOTSUPP is returned when HWP is active in some cases and allows the
differentiation from cpufreq being disabled.
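
The differentiation could look roughly like the sketch below.  The function
cpufreq_states_message() is hypothetical and is not xenpm's real structure;
the two strings are the ones quoted above, and treating every error other
than EOPNOTSUPP as "cpufreq disabled" is an assumption made for brevity:

```c
#include <errno.h>
#include <stddef.h>

/* Map the error from a P-state query to the message shown to the user. */
static const char *cpufreq_states_message(int err)
{
    switch (err) {
    case 0:
        return NULL; /* success: nothing to report */

    case EOPNOTSUPP:
        /* HWP is active, so P-state statistics are not available. */
        return "P-State information not supported.  "
               "Try 'get-cpufreq-average' or 'start'.";

    default:
        /* Assumption: any other failure is reported as cpufreq disabled. */
        return "Either Xen cpufreq is disabled or "
               "no valid information is registered!";
    }
}
```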

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 1 Feb 2024 15:18:28 +0000 (16:18 +0100)]
x86/PoD: simplify / improve p2m_pod_cache_add()

Avoid recurring MFN -> page or page -> MFN translations. Drop the pretty
pointless local variable "p". Make sure the MFN logged in a debugging
error message is actually the offending one. Return negative errno
values rather than -1 (presently no caller really cares, but imo this
should change). Adjust style.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Tue, 30 Jan 2024 13:29:15 +0000 (14:29 +0100)]
VT-d: Fix "else" vs "#endif" misplacement

In domain_pgd_maddr() the "#endif" is misplaced with respect to "else".  This
generates incorrect logic when CONFIG_HVM is compiled out, as the "else" body
is executed unconditionally.

Rework the logic to use IS_ENABLED() instead of explicit #ifdef-ary, as it's
clearer to follow.  This in turn involves adjusting p2m_get_pagetable() to
compile when CONFIG_HVM is disabled.
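
The class of bug can be illustrated with a deliberately simplified sketch.
This is not the code from domain_pgd_maddr(): CONFIG_HVM_SKETCH, the function
names and the counter are all hypothetical, and a plain constant stands in for
IS_ENABLED():

```c
#define CONFIG_HVM_SKETCH 0 /* hypothetical switch: feature compiled out */

static int allocations;     /* counts how often the fallback body ran */

static void alloc_buggy(int use_shared, int already_allocated)
{
    (void)use_shared;
    (void)already_allocated;

#if CONFIG_HVM_SKETCH
    if (use_shared)
        return;
    else if (!already_allocated)
#endif
    {
        /* With the switch off, no condition guards this block at all. */
        allocations++;
    }
}

static void alloc_fixed(int use_shared, int already_allocated)
{
    /* The constant folds away, but both conditions stay visible. */
    if (CONFIG_HVM_SKETCH && use_shared)
        return;
    else if (!already_allocated)
        allocations++;
}
```

Keeping the condition in ordinary C lets the compiler check both branches in
every configuration, at the cost of requiring the referenced helpers to
compile even when the feature is disabled.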

This is XSA-450 / CVE-2023-46840.

Fixes: 033ff90aa9c1 ("x86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only")
Reported-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 30 Jan 2024 13:28:01 +0000 (14:28 +0100)]
pci: fail device assignment if phantom functions cannot be assigned

The current behavior is that no error is reported if (some) phantom functions
fail to be assigned during device add or assignment, so the operation succeeds
even if some phantom functions are not correctly set up.

This can lead to devices possibly being successfully assigned to a domU while
some of the device phantom functions are still assigned to dom0.  Even when the
device is assigned to domIO before being assigned to a domU, phantom functions
might fail to be assigned to domIO, and also fail to be assigned to the domU,
leaving them assigned to dom0.

Since the device can generate requests using the IDs of those phantom
functions, given the scenario above a device in such state would be in control
of a domU, but still capable of generating transactions that use a context ID
targeting dom0 owned memory.

Modify device assign in order to attempt to deassign the device if phantom
functions failed to be assigned.

Note that device addition is not modified in the same way, as in that case the
device is assigned to a trusted domain, and hence partial assign can lead to
device malfunction but not a security issue.

This is XSA-449 / CVE-2023-46839.

Fixes: 4e9950dc1bd2 ('IOMMU: add phantom function support')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Wed, 24 Jan 2024 17:29:53 +0000 (18:29 +0100)]
x86/iommu: cleanup unused functions

Remove xen_in_range() and vpci_is_mmcfg_address() now that they are unused.

Adjust comments to point to the new functions that replace the existing ones.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Roger Pau Monne [Wed, 24 Jan 2024 17:29:52 +0000 (18:29 +0100)]
x86/iommu: switch hwdom IOMMU to use a rangeset

The current loop that iterates from 0 to the maximum RAM address in order to
set up the IOMMU mappings is highly inefficient, and it will get worse as the
amount of RAM increases.  It's also not accounting for any reserved regions
past the last RAM address.

Instead of iterating over memory addresses, iterate over the memory map regions
and use a rangeset in order to keep track of which ranges need to be identity
mapped in the hardware domain physical address space.
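
The region-based idea can be sketched without Xen's rangeset API.  The struct
and function below are hypothetical, and the input is assumed to be sorted and
non-overlapping.  The amount of work scales with the number of memory map
entries rather than with the number of page frames:

```c
#include <stddef.h>
#include <stdint.h>

/* One memory map entry; PFN bounds are inclusive. */
struct region_sketch {
    uint64_t start_pfn;
    uint64_t end_pfn;
    int needs_mapping;
};

/*
 * Walk the memory map once and collect the ranges to identity map,
 * merging entries that are directly adjacent.  Returns the number of
 * ranges written to out, which must have room for nr entries.
 */
static size_t collect_ranges(const struct region_sketch *mem_map, size_t nr,
                             struct region_sketch *out)
{
    size_t n = 0;

    for (size_t k = 0; k < nr; k++) {
        if (!mem_map[k].needs_mapping)
            continue;

        if (n && out[n - 1].end_pfn + 1 == mem_map[k].start_pfn)
            out[n - 1].end_pfn = mem_map[k].end_pfn; /* extend previous */
        else
            out[n++] = mem_map[k];
    }

    return n;
}
```

Each resulting range can then be handed to a mapping routine in a single call,
instead of testing every address between 0 and the top of RAM.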

On an AMD EPYC 7452 with 512GiB of RAM, the time to execute
arch_iommu_hwdom_init() in nanoseconds is:

x old
+ new
    N           Min           Max        Median           Avg        Stddev
x   5 2.2364154e+10  2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08
+   5       1025012       1033036       1026188     1028276.2     3623.1194
Difference at 95.0% confidence
        -2.26214e+10 +/- 4.42931e+08
        -99.9955% +/- 9.05152e-05%
        (Student's t, pooled s = 3.03701e+08)

Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s.

Note there's a change for HVM domains (ie: PVH dom0) that get switched to
create the p2m mappings using map_mmio_regions() instead of
p2m_add_identity_entry(), so that ranges can be mapped with a single function
call if possible.  Note that the interface of map_mmio_regions() doesn't
allow creating read-only mappings, but so far there are no such mappings
created for PVH dom0 in arch_iommu_hwdom_init().

No change intended in the resulting mappings that a hardware domain gets.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 25 Jan 2024 13:26:26 +0000 (14:26 +0100)]
x86/iommu: remove regions not to be mapped

Introduce the code to remove regions not to be mapped from the rangeset
that will be used to set up the IOMMU page tables for the hardware domain.

This change also introduces two new functions: remove_xen_ranges() and
vpci_subtract_mmcfg() that copy the logic in xen_in_range() and
vpci_is_mmcfg_address() respectively and remove the ranges that would otherwise
be intercepted by the original functions.

Note that the rangeset is still not populated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 29 Jan 2024 08:23:43 +0000 (09:23 +0100)]
x86: purge NMI_IO_APIC

Even going back to 3.2 source code, I can't spot how this watchdog mode
could ever have been enabled in Xen. The only effect its presence had
for all the years was the retaining of a dead string literal.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 29 Jan 2024 08:22:35 +0000 (09:22 +0100)]
x86/APIC: purge {GET,SET}_APIC_DELIVERY_MODE()

The few uses we have can easily be replaced, eliminating the need for
redundant APIC_DM_* and APIC_MODE_* constants. Therefore also purge all
respective APIC_MODE_* constants, introducing APIC_DM_MASK anew instead.
This is further relevant since we have a different set of APIC_MODE_*,
which could otherwise end up confusing.

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 29 Jan 2024 08:21:16 +0000 (09:21 +0100)]
NUMA: no need for asm/numa.h when !NUMA

There's no point in every architecture carrying the same stubs for the
case when NUMA isn't enabled (or even supported). Move all of that to
xen/numa.h; replace explicit uses of asm/numa.h in common code. Make
inclusion of asm/numa.h dependent upon NUMA=y.

Drop the no longer applicable "implement NUMA support" comments - in a
!NUMA section this simply makes no sense.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Mon, 29 Jan 2024 08:20:02 +0000 (09:20 +0100)]
xen/vmap: Check the page has been mapped in vm_init_type()

The function map_pages_to_xen() could fail if it can't allocate the
underlying page tables or (at least on Arm) if the area was already
mapped.

The first error is caught by clear_page() because it would fault.
However, the second error, while very unlikely, is not caught at all.

As this is boot code, use BUG_ON() to check if map_pages_to_xen() has
succeeded.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Elias El Yandouzi <eliasely@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 26 Jan 2024 14:54:18 +0000 (15:54 +0100)]
x86/entry: fix jump into restore_all_guest without %rbx correctly set

e047b8d0fa05 went too far when limiting obtaining the vCPU pointer.  While the
code in ist_dispatch_done does indeed only need the vCPU pointer when PV32 is
enabled, the !PV32 path will end up jumping into restore_all_guest which does
require rbx == vCPU pointer.

Fix by moving the fetching of the vCPU pointer to be done outside of the PV32
code block.

Fixes: e047b8d0fa05 ('x86/entry: replace two GET_CURRENT() uses')
Reported-by: Edwin Torok <edwin.torok@cloud.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Thu, 25 Jan 2024 18:36:27 +0000 (18:36 +0000)]
xen/arm64: head: Use PRINT_ID() for secondary CPU MMU-off boot code

With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen resides.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, most of the early printk messages are using PRINT() which
will add the message in .rodata. This is unlikely to be within the
same page as the rest of the idmap.

So replace all the PRINT() that can be reachable by the secondary
CPU with MMU-off with PRINT_ID().

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Julien Grall [Thu, 25 Jan 2024 18:33:50 +0000 (18:33 +0000)]
arm/smpboot: Move smp_up_cpu to a new section .data.idmap

With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen resides.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, smp_up_cpu is used by secondary CPUs to wait their turn for
booting before the MMU is on. Yet it is currently in .data which is
unlikely to be within the same page as the rest of the idmap.

Move smp_up_cpu to the recently created section .data.idmap. The idmap is
currently part of the text section and therefore will be mapped read-only
executable. This means that we need to temporarily remap
smp_up_cpu in order to update it.

Introduce a new function set_smp_up_cpu() for this purpose so the code
is not duplicated between opening and closing the gate.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Julien Grall [Thu, 25 Jan 2024 18:32:38 +0000 (18:32 +0000)]
arm/mmu: Move init_ttbr to a new section .data.idmap

With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen resides.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, init_ttbr is used by secondary CPUs to find their page-tables
before the MMU is on. Yet it is currently in .data which is unlikely
to be within the same page as the rest of the idmap.

Create a new section .data.idmap that will be used for variables
accessed by the early boot code. The first one is init_ttbr.

The idmap is currently part of the text section and therefore will
be mapped read-only executable. This means that we need to temporarily
remap init_ttbr in order to update it.

Introduce a new function set_init_ttbr() for this purpose so the code
is not duplicated between arm64 and arm32.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Andrew Cooper [Fri, 10 Feb 2023 21:20:42 +0000 (21:20 +0000)]
x86/entry: Avoid register spilling in cr4_pv32_restore()

cr4_pv32_restore() needs two registers.  Right now, it spills %rdx and
clobbers %rax.

However, %rcx is free to use at all callsites.  Annotate CR4_PV32_RESTORE with
our usual clobber comments, and swap %rdx for %rcx in the non-fatal paths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 25 Jan 2024 09:30:41 +0000 (10:30 +0100)]
tools: don't expose XENFEAT_hvm_pirqs by default

The HVM pirq feature allows routing interrupts from both physical and emulated
devices over event channels; this was done as a performance improvement.  However
its usage is fully undocumented, and the only reference implementation is in
Linux.  It defeats the purpose of local APIC hardware virtualization, because
when using it interrupts avoid the usage of the local APIC altogether.

It has also been reported to not work properly with certain devices, at least
when using some AMD GPUs Linux attempts to route interrupts over event
channels, but Xen doesn't correctly detect such routing, which leads to the
hypervisor complaining with:

(XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15

When MSIs are attempted to be routed over event channels the entry delivery
mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
inject the interrupt following the native MSI path, and the ExtINT delivery
mode is not supported.

Disable HVM PIRQs by default and provide a per-domain option in xl.cfg to
enable such feature.  Also for backwards compatibility keep the feature enabled
for any resumed domains that don't have an explicit selection.

Note that the only user of the feature (Linux) is also able to handle native
interrupts fine, as the feature was already not used if Xen reported local APIC
hardware virtualization active.

Link: https://github.com/QubesOS/qubes-issues/issues/7971
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/hvm: make X86_EMU_USE_PIRQ optional
Roger Pau Monne [Thu, 25 Jan 2024 09:30:40 +0000 (10:30 +0100)]
x86/hvm: make X86_EMU_USE_PIRQ optional

Allow selecting X86_EMU_USE_PIRQ for HVM guests, so it's no longer mandated to
be always on.

There's no restriction in Xen that forces such feature to be always on for HVM
guests, as for example PVH guests don't support it.  As such, allow the
toolstack to select whether to enable it on a per-domain basis.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/ucode: Fix stability of the raw CPU Policy rescan
Andrew Cooper [Tue, 23 Jan 2024 20:24:22 +0000 (20:24 +0000)]
x86/ucode: Fix stability of the raw CPU Policy rescan

Always run microcode_update_helper() on the BSP, so the updated Raw CPU
policy doesn't get non-BSP topology details included.

Have calculate_raw_cpu_policy() clear the instantaneous XSTATE sizes.  The
value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
to report.

When CPUID Masking is active, it affects CPUID instructions issued by Xen
too.  Transiently disable masking to get a clean scan.

Fixes: 694d79ed5aac ("x86/ucode: Refresh raw CPU policy after microcode load")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agopmstat: Limit hypercalls under HWP
Jason Andryuk [Thu, 25 Jan 2024 15:11:49 +0000 (16:11 +0100)]
pmstat: Limit hypercalls under HWP

When HWP is active, the cpufreq P-state information is not updated.  In
that case, return -EOPNOTSUPP instead of bogus, incomplete info.

Similarly, set_cpufreq_para() is not applicable when HWP is active.
Many of the options already checked the governor and were inaccessible,
but SCALING_MIN/MAX_FREQ was still accessible (though it would do
nothing).  Add an earlier HWP check to handle all cases.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agox86/entry: replace two GET_CURRENT() uses
Jan Beulich [Thu, 25 Jan 2024 15:10:58 +0000 (16:10 +0100)]
x86/entry: replace two GET_CURRENT() uses

Now that we have %r14 set up using GET_STACK_END() in a number of
places, in two places we can eliminate the redundancy of GET_CURRENT()
also invoking that macro. In handle_ist_exception() actually go a step
farther and avoid using %rbx altogether when retrieving the processor
ID: Obtain the current vCPU pointer only in the PV32-specific code
actually needing it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/NMI: refine "watchdog stuck" log message
Jan Beulich [Thu, 25 Jan 2024 15:10:06 +0000 (16:10 +0100)]
x86/NMI: refine "watchdog stuck" log message

Observing

"Testing NMI watchdog on all CPUs: 0 stuck"

it felt like it's not quite right, but I still read it as "no CPU stuck;
all good", when really the system suffered from what 6bdb965178bb
("x86/intel: ensure Global Performance Counter Control is setup
correctly") works around. Convert this to

"Testing NMI watchdog on all CPUs: {0} stuck"

or, with multiple CPUs having an issue, e.g.

"Testing NMI watchdog on all CPUs: {0,40} stuck"

to make more obvious that a lone number is not a count of CPUs.
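
The brace notation can be sketched as below. This is only an illustration
under stated assumptions, not the code of this change: the helper name
format_watchdog_result(), the bool array input and the "ok" text for the
no-failure case are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the hypervisor's code: render the set of stuck
 * CPUs as "{a,b,...} stuck", or "ok" when no CPU is stuck.
 */
static void format_watchdog_result(const bool *stuck, unsigned int nr_cpus,
                                   char *buf, size_t len)
{
    size_t pos = 0;
    bool any = false;
    unsigned int cpu;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for ( cpu = 0; cpu < nr_cpus; cpu++ )
    {
        int n;

        if ( !stuck[cpu] )
            continue;

        /* Open the brace before the first entry, separate the rest by commas. */
        n = snprintf(buf + pos, len - pos, "%c%u", any ? ',' : '{', cpu);
        if ( n < 0 || (size_t)n >= len - pos )
            return; /* Truncated or failed; give up on the rest. */

        pos += (size_t)n;
        any = true;
    }

    snprintf(buf + pos, len - pos, "%s", any ? "} stuck" : "ok");
}
```

With only CPU 0 marked the buffer holds "{0} stuck", which can no longer be
misread as "zero CPUs stuck".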

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/p2m-pt: fix off by one in entry check assert
Roger Pau Monné [Thu, 25 Jan 2024 15:09:04 +0000 (16:09 +0100)]
x86/p2m-pt: fix off by one in entry check assert

The MMIO RO rangeset overlap check is bogus: the rangeset is inclusive so the
passed end mfn should be the last mfn to be mapped (not last + 1).
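
A minimal sketch of the inclusive-end arithmetic, assuming a hypothetical
struct range and helper names (the real rangeset interface is not reproduced
here): for nr frames starting at mfn, the last frame is mfn + nr - 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive range [s, e], like an entry of an inclusive rangeset. */
struct range {
    unsigned long s, e;
};

/* True if the inclusive range [s, e] overlaps r. */
static bool range_overlaps(const struct range *r,
                           unsigned long s, unsigned long e)
{
    assert(s <= e);
    return s <= r->e && r->s <= e;
}

/* Check nr frames starting at mfn; the end passed on is the LAST frame. */
static bool mapping_hits_range(const struct range *r,
                               unsigned long mfn, unsigned long nr)
{
    assert(nr > 0);
    return range_overlaps(r, mfn, mfn + nr - 1);
}
```

Passing mfn + nr instead would report a false overlap for a mapping that ends
immediately before the range.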

Fixes: 6fa1755644d0 ('amd/npt/shadow: replace assert that prevents creating 2M/1G MMIO entries')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
15 months agox86/entry: Fix ELF metadata for NMI and handle_ist_exception
Andrew Cooper [Mon, 22 Jan 2024 14:50:10 +0000 (14:50 +0000)]
x86/entry: Fix ELF metadata for NMI and handle_ist_exception

handle_ist_exception isn't part of the NMI handler, just like handle_exception
isn't part of #PF.

Fixes: b3a9037550df ("x86: annotate entry points with type and size")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agox86/kexec: Drop compatibility_mode_far
Andrew Cooper [Thu, 28 Oct 2021 19:03:21 +0000 (20:03 +0100)]
x86/kexec: Drop compatibility_mode_far

LJMP is (famously?) incompatible between Intel and AMD CPUs, and while we're
using one of the compatible forms, we've got a good stack and LRET is the far
more common way of doing this.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agoxen/riscv: introduce guest_access.h
Oleksii Kurochko [Tue, 23 Jan 2024 13:49:54 +0000 (14:49 +0100)]
xen/riscv: introduce guest_access.h

All necessary dummy implementations of functions in this header
will be introduced in stubs.c.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
15 months agoxen/riscv: introduce domain.h
Oleksii Kurochko [Tue, 23 Jan 2024 13:49:27 +0000 (14:49 +0100)]
xen/riscv: introduce domain.h

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
15 months agoIRQ: generalize [gs]et_irq_regs()
Jan Beulich [Tue, 23 Jan 2024 11:03:23 +0000 (12:03 +0100)]
IRQ: generalize [gs]et_irq_regs()

Move functions (and their data) to common code, and invoke the functions
on Arm as well. This is in preparation of dropping the register
parameters from handler functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
15 months agolib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps to $(targets)
Michal Orzel [Tue, 23 Jan 2024 11:02:44 +0000 (12:02 +0100)]
lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps to $(targets)

At the moment, trying to run xencov read/reset (calling SYSCTL_coverage_op
under the hood) results in a crash. This is due to a profiler trying to
access data in the .init.* sections (libfdt for Arm and libelf for x86)
that are stripped after boot. Normally, the build system compiles any
*.init.o file without COV_FLAGS. However, these two libraries are
handled differently as sections will be renamed to init after linking.

To override COV_FLAGS to empty for these libraries, lib{fdt,elf}.o were
added to nocov-y. This worked until e321576f4047 ("xen/build: start using
if_changed") that added lib{fdt,elf}-temp.o and their deps to extra-y.
This way, even though these objects appear as prerequisites of
lib{fdt,elf}.o and the settings should propagate to them, make can also
build them as a prerequisite of __build, in which case COV_FLAGS would
still have the unwanted flags. Fix it by switching to $(targets) instead.

Also, for libfdt, append libfdt.o to nocov-y only if CONFIG_OVERLAY_DTB
is not set. Otherwise, there is no section renaming and we should be able
to run the coverage.

Fixes: e321576f4047 ("xen/build: start using if_changed")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
15 months agoPPC: switch entry point annotations to common model
Jan Beulich [Tue, 23 Jan 2024 11:02:05 +0000 (12:02 +0100)]
PPC: switch entry point annotations to common model

Use the generic framework in xen/linkage.h. No change in generated code
except of course the converted symbols change to be hidden ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
15 months agotools/binfile: switch to common annotations model
Jan Beulich [Mon, 22 Jan 2024 12:55:38 +0000 (13:55 +0100)]
tools/binfile: switch to common annotations model

Use DATA() / END() and drop the now redundant .global. No change in
generated data; of course the two symbols now properly gain "hidden"
binding.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
15 months agoRISC-V: annotate entry points with type and size
Jan Beulich [Mon, 22 Jan 2024 12:55:11 +0000 (13:55 +0100)]
RISC-V: annotate entry points with type and size

Use the generic framework in xen/linkage.h. No change in generated code
except of course the converted symbols change to be hidden ones and gain
a valid size.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
15 months agoArm: annotate entry points with type and size
Jan Beulich [Mon, 22 Jan 2024 12:54:34 +0000 (13:54 +0100)]
Arm: annotate entry points with type and size

Use the generic framework in xen/linkage.h. No change in generated code
except for the changed padding value (noticeable when config.gz isn't a
multiple of 4 in size). Plus of course the converted symbols change to
be hidden ones.

Note that ASM_INT() is switched to DATA(), not DATA_LOCAL(), as the only
use site wants the symbol global anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
15 months agox86: also mark assembler globals hidden
Jan Beulich [Mon, 22 Jan 2024 12:52:13 +0000 (13:52 +0100)]
x86: also mark assembler globals hidden

Let's have assembler symbols be consistent with C ones. In principle
there are (a few) cases where gas can produce smaller code this way,
just that for now there's a gas bug causing smaller code to be emitted
even when that shouldn't be the case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
15 months agox86: annotate entry points with type and size
Jan Beulich [Mon, 22 Jan 2024 12:51:31 +0000 (13:51 +0100)]
x86: annotate entry points with type and size

Use the generic framework in xen/linkage.h.

For switch_to_kernel() and restore_all_guest() so far implicit alignment
(from being first in their respective sections) is being made explicit
(as in: using FUNC() without 2nd argument). Whereas for
{,compat}create_bounce_frame() and autogen_entrypoints[] alignment is
newly arranged for.

Except for the added/adjusted alignment padding (including their
knock-on effects) no change in generated code/data. Note that the basis
for support of weak definitions is added despite them not having any use
right now.

Note that ASM_INT() is switched to DATA(), not DATA_LOCAL(), as the only
use site wants the symbol global anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
15 months agocommon: assembly entry point type/size annotations
Jan Beulich [Mon, 22 Jan 2024 12:50:40 +0000 (13:50 +0100)]
common: assembly entry point type/size annotations

Recent gas versions generate minimalistic Dwarf debug info for items
annotated as functions and having their sizes specified [1]. Furthermore
generating live patches wants items properly annotated. "Borrow" Arm's
END() and (remotely) derive other annotation infrastructure from
Linux'es, for all architectures to use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
[1] https://sourceware.org/git?p=binutils-gdb.git;a=commitdiff;h=591cc9fbbfd6d51131c0f1d4a92e7893edcc7a28

15 months agox86/MCE: switch some callback invocations to altcall
Jan Beulich [Mon, 22 Jan 2024 12:41:07 +0000 (13:41 +0100)]
x86/MCE: switch some callback invocations to altcall

While not performance critical, these hook invocations still would
better be converted: This way all pre-filled (and newly introduced)
struct mce_callback instances can become __initconst_cf_clobber, thus
allowing to eliminate another 9 ENDBR during the 2nd phase of
alternatives patching.

While this means registering callbacks a little earlier, doing so is
perhaps even advantageous, for having pointers be non-NULL earlier on.
Only one set of callbacks would ever be registered anyway, and
neither of the respective initialization functions can (subsequently)
fail.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/MCE: separate BSP-only initialization
Jan Beulich [Mon, 22 Jan 2024 12:40:32 +0000 (13:40 +0100)]
x86/MCE: separate BSP-only initialization

Several function pointers are registered over and over again, when
setting them once on the BSP suffices. Arrange for this in the vendor
init functions and mark involved registration functions __init.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/PV: avoid indirect call for I/O emulation quirk hook
Jan Beulich [Mon, 22 Jan 2024 12:40:00 +0000 (13:40 +0100)]
x86/PV: avoid indirect call for I/O emulation quirk hook

This way ioemul_handle_proliant_quirk() won't need ENDBR anymore.

While touching this code, also
- arrange for it to not be built at all when !PV,
- add "const" to the last function parameter and bring the definition
  in sync with the declaration (for Misra).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/MTRR: avoid several indirect calls
Jan Beulich [Mon, 22 Jan 2024 12:39:23 +0000 (13:39 +0100)]
x86/MTRR: avoid several indirect calls

The use of (supposedly) vendor-specific hooks is a relic from the days
when it was still possible to build Xen as a 32-bit binary. There's no
expectation that a new need for such an abstraction would arise. Convert
mtrr_if to a mere boolean and all prior calls through it to direct ones,
thus allowing to eliminate 6 ENDBR from .text.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agocore-parking: use alternative_call()
Jan Beulich [Mon, 22 Jan 2024 12:38:24 +0000 (13:38 +0100)]
core-parking: use alternative_call()

This way we can arrange for core_parking_{performance,power}()'s ENDBR
to also be zapped.

For the decision to be taken before the 2nd alternative patching pass,
the initcall needs to become a pre-SMP one, though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agoxen: Fold exit paths in find_text_region()
Andrew Cooper [Thu, 13 Apr 2023 18:52:10 +0000 (19:52 +0100)]
xen: Fold exit paths in find_text_region()

Despite rcu_read_unlock() being fully inlineable, the optimiser doesn't appear
willing to fold the exit paths.  Rework the logic to do so explicitly.

This compiles to marginally better code in all cases.  No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
15 months agoxen/livepatch: Make check_for_livepatch_work() faster in the common case
Andrew Cooper [Fri, 22 Dec 2023 21:06:16 +0000 (21:06 +0000)]
xen/livepatch: Make check_for_livepatch_work() faster in the common case

When livepatching is enabled, this function is used all the time.  Really do
check the fastpath first, and annotate it likely() as this is the right answer
100% of the time (to many significant figures).  This cuts out 3 pointer
dereferences in the "nothing to do path".

However, GCC still needs some help to persuade it not to set the full stack
frame (6 spilled registers, 3 slots of locals) even on the fastpath.

Create a new check_for_livepatch_work() with the fastpath only, and make the
"new" do_livepatch_work() noinline.  This causes the fastpath to need no stack
frame, making it faster still.
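
The fastpath/slowpath split can be sketched as follows. This is an
illustration only: work_pending, slow_calls and both function names are
hypothetical stand-ins, and the likely() and noinline definitions assume a
GCC-compatible compiler.

```c
#include <assert.h>
#include <stdbool.h>

#define likely(x) __builtin_expect(!!(x), 1)
#define noinline  __attribute__((__noinline__))

static bool work_pending;       /* hypothetical "something to do" flag */
static unsigned int slow_calls; /* counts slowpath entries, for illustration */

/* Slowpath: kept out of line so its stack frame stays out of the caller. */
static noinline void do_work_slowpath(void)
{
    assert(work_pending);
    slow_calls++;
    work_pending = false;
}

/* Fastpath: a single test and return in the overwhelmingly common case. */
static inline void check_for_work(void)
{
    if ( likely(!work_pending) )
        return;

    do_work_slowpath();
}
```

Whether the fastpath really ends up without a stack frame depends on the
compiler and options, so the generated code is worth inspecting.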

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agox86/cpuid: Change cpuid() from a macro to a static inline
Andrew Cooper [Tue, 16 Jan 2024 11:50:38 +0000 (11:50 +0000)]
x86/cpuid: Change cpuid() from a macro to a static inline

Addresses MISRA Rule 5.5.  Introduces others, but let's fix one thing at a
time.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agox86/vmx: Disallow the use of inactivity states
Andrew Cooper [Fri, 27 Oct 2023 16:02:21 +0000 (17:02 +0100)]
x86/vmx: Disallow the use of inactivity states

Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
enter the vCPU.  Luckily for us, nested-virt is explicitly unsupported for
security bugs.

The inactivity states are HLT, SHUTDOWN and WAIT-FOR-SIPI, and as noted by the
SDM in Vol3 27.7 "Special Features of VM Entry":

  If VM entry ends with the logical processor in an inactive activity state,
  the VM entry generates any special bus cycle that is normally generated when
  that activity state is entered from the active state.

Also,

  Some activity states unconditionally block certain events.

I.e. A VMEntry with ACTIVITY=SHUTDOWN will initiate a platform reset, while a
VMEntry with ACTIVITY=WAIT-FOR-SIPI will really block everything other than
SIPIs.

Both of these activity states are for the TXT ACM to use, not for regular
hypervisors, and Xen doesn't support dropping the HLT intercept either.

There are two paths in Xen which operate on ACTIVITY_STATE.

1) The vmx_{get,set}_nonreg_state() helpers for VM-Fork.

   As regular VMs can't use any inactivity states, this is just duplicating
   the 0 from construct_vmcs().  Retain the ability to query activity_state,
   but crash the domain on any attempt to set an inactivity state.

2) Nested virt, because of ACTIVITY_STATE in vmcs_gstate_field[].

   Explicitly hide the inactivity states in the guest's view of MSR_VMX_MISC,
   and remove ACTIVITY_STATE from vmcs_gstate_field[].

   In virtual_vmentry(), we should trigger a VMEntry failure for the use of
   any inactivity states, but there's no support for that in the code at all
   so leave a TODO for when we finally start working on nested-virt in
   earnest.

Reported-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
15 months agox86/vmx: Fix IRQ handling for EXIT_REASON_INIT
Andrew Cooper [Wed, 1 Nov 2023 13:32:55 +0000 (13:32 +0000)]
x86/vmx: Fix IRQ handling for EXIT_REASON_INIT

When receiving an INIT, a prior bugfix tried to ignore the INIT and continue
onwards.

Unfortunately it's not safe to return at that point in vmx_vmexit_handler().
Just out of context in the first hunk is a local_irqs_enabled() which is
depended-upon by the return-to-guest path, causing the following checklock
failure in debug builds:

  (XEN) Error: INIT received - ignoring
  (XEN) CHECKLOCK FAILURE: prev irqsafe: 0, curr irqsafe 1
  (XEN) Xen BUG at common/spinlock.c:132
  (XEN) ----[ Xen-4.19-unstable  x86_64  debug=y  Tainted:     H  ]----
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040238e10>] R check_lock+0xcd/0xe1
  (XEN)    [<ffff82d040238fe3>] F _spin_lock+0x1b/0x60
  (XEN)    [<ffff82d0402ed6a8>] F pt_update_irq+0x32/0x3bb
  (XEN)    [<ffff82d0402b9632>] F vmx_intr_assist+0x3b/0x51d
  (XEN)    [<ffff82d040206447>] F vmx_asm_vmexit_handler+0xf7/0x210

Luckily, this is benign in release builds.  Accidentally having IRQs disabled
when trying to take an IRQs-on lock isn't a deadlock-vulnerable pattern.

Drop the problematic early return.  In hindsight, it's wrong to skip other
normal VMExit steps.

Fixes: b1f11273d5a7 ("x86/vmx: Don't spuriously crash the domain when INIT is received")
Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agox86/vmx: Collect all empty VMExit cases together
Andrew Cooper [Thu, 11 Jan 2024 20:26:53 +0000 (20:26 +0000)]
x86/vmx: Collect all empty VMExit cases together

... rather than having them spread out.  Explain concisely why each is empty.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
15 months agox86/HPET: avoid an indirect call
Jan Beulich [Wed, 17 Jan 2024 09:43:02 +0000 (10:43 +0100)]
x86/HPET: avoid an indirect call

When this code was written, indirect branches still weren't considered
much of a problem (besides being a little slower). Instead of a function
pointer, pass a boolean to _disable_pit_irq(), thus allowing to
eliminate two ENDBR (one of them in .text).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agocpufreq: finish genapic conversion to altcall
Jan Beulich [Wed, 17 Jan 2024 09:42:27 +0000 (10:42 +0100)]
cpufreq: finish genapic conversion to altcall

Even functions used on infrequently executed paths want converting: This
way all pre-filled struct cpufreq_driver instances can become
__initconst_cf_clobber, thus allowing to eliminate another 15 ENDBR
during the 2nd phase of alternatives patching.

For acpi-cpufreq's optionally populated .get hook make sure alternatives
patching can actually see the pointer. See also the code comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/APIC: finish genapic conversion to altcall
Jan Beulich [Wed, 17 Jan 2024 09:41:52 +0000 (10:41 +0100)]
x86/APIC: finish genapic conversion to altcall

While .probe() doesn't need fiddling with for being run only very early,
init_apic_ldr() wants converting too despite not being on a frequently
executed path: This way all pre-filled struct genapic instances can
become __initconst_cf_clobber, thus allowing to eliminate 15 more ENDBR
during the 2nd phase of alternatives patching.

While fiddling with section annotations here, also move "genapic" itself
to .data.ro_after_init.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/intel: ensure Global Performance Counter Control is setup correctly
Roger Pau Monné [Wed, 17 Jan 2024 09:40:52 +0000 (10:40 +0100)]
x86/intel: ensure Global Performance Counter Control is setup correctly

When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
MSR contains per-counter enable bits that are ANDed with the enable bit in the
counter EVNTSEL MSR in order for a PMC counter to be enabled.

So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
bits being set by default, but at least on some Intel Sapphire and Emerald
Rapids this is no longer the case, and Xen reports:

Testing NMI watchdog on all CPUs: 0 40 stuck

The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
relevant enable bit in PERF_GLOBAL_CTRL not being set.

Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
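
The enable logic described above can be modelled in plain C. This is a sketch
on ordinary integers rather than MSR accesses; pmc_counts() and
fix_global_ctrl() are hypothetical names, and the layout assumed is the
architectural one (EVNTSEL enable at bit 22, general-purpose counter n
enabled by bit n of PERF_GLOBAL_CTRL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enable bit of a general-purpose event select register. */
#define EVNTSEL_EN (UINT64_C(1) << 22)

/* A counter only counts if both its global and its local enable are set. */
static bool pmc_counts(uint64_t global_ctrl, uint64_t evntsel,
                       unsigned int idx)
{
    assert(idx < 32);
    return (global_ctrl & (UINT64_C(1) << idx)) && (evntsel & EVNTSEL_EN);
}

/* Set the global enable bits of all nr_gp general-purpose counters. */
static uint64_t fix_global_ctrl(uint64_t global_ctrl, unsigned int nr_gp)
{
    assert(nr_gp > 0 && nr_gp < 32);
    return global_ctrl | ((UINT64_C(1) << nr_gp) - 1);
}
```

With PERF_GLOBAL_CTRL zeroed, pmc_counts() is false even when EVNTSEL0 has
its enable bit set, which matches the stuck watchdog symptom.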

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
15 months agoxen/arm64: head: Allow to use early printk while on 1:1 mapping
Michal Orzel [Mon, 15 Jan 2024 12:48:59 +0000 (13:48 +0100)]
xen/arm64: head: Allow to use early printk while on 1:1 mapping

Take an example from commit 1ec3fe1f664f ("xen/arm32: head: Improve
logging in head.S") to add support for printing early boot messages
while running on identity mapping:
 - define PRINT_SECT() macro to be able to specify a section for storing
   a string. PRINT() will use .rodata.str and PRINT_ID() - .rodata.idmap.
   This is necessary, because when running on identity mapping, the
   strings need to be part of the first page that is mapped,
 - move loading a runtime virtual UART address right after enabling MMU
   (the corresponding steps repeated in {primary,secondary}_switched are
   now consolidated in a single place),
 - move early printk 'hex' string into .rodata.idmap and replace 'adr'
   instruction in asm_putn with 'adr_l' to extend the addressable range,
 - remove RODATA_STR() macro given no use.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
15 months agoxen/arm32: head: Move earlyprintk 'hex' to .rodata.idmap
Michal Orzel [Mon, 15 Jan 2024 12:48:58 +0000 (13:48 +0100)]
xen/arm32: head: Move earlyprintk 'hex' to .rodata.idmap

Thanks to 1ec3fe1f664f ("xen/arm32: head: Improve logging in head.S"),
we can now use PRINT_ID() macro to print messages when running on
identity mapping. For that, all the strings need to be part of the first
page that is mapped. This is not the case for a 'hex' string (used by
asm_putn when printing register values), which currently resides in
.rodata.str. Move it to .rodata.idmap to allow making use of print_reg
macro from anywhere (mostly to aid debugging).

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
15 months agoCirrusCI: drop FreeBSD 12
Roger Pau Monné [Mon, 15 Jan 2024 11:20:11 +0000 (12:20 +0100)]
CirrusCI: drop FreeBSD 12

Went EOL by the end of December 2023, and the pkg repos have been shut down.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/vPMU: drop regs parameter from interrupt functions
Jan Beulich [Mon, 15 Jan 2024 11:19:41 +0000 (12:19 +0100)]
x86/vPMU: drop regs parameter from interrupt functions

The vendor functions don't use the respective parameters at all. In
vpmu_do_interrupt() there's only a very limited area where the
outer context's state would be needed, retrievable by get_irq_regs().

This is in preparation of dropping the register parameters from direct
APIC vector handler functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/vIRQ: split PCI link load state checking from actual loading
Jan Beulich [Mon, 15 Jan 2024 11:19:17 +0000 (12:19 +0100)]
x86/vIRQ: split PCI link load state checking from actual loading

Move the checking into a check hook, and add checking of the padding
fields as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
15 months agox86/vPIC: check values loaded from state save record
Jan Beulich [Mon, 15 Jan 2024 11:18:43 +0000 (12:18 +0100)]
x86/vPIC: check values loaded from state save record

Loading is_master from the state save record can lead to out-of-bounds
accesses via at least the two container_of() uses by vpic_domain() and
__vpic_lock(). Make sure the value is consistent with the instance being
loaded.

For ->int_output (which for whatever reason isn't a 1-bit bitfield),
besides bounds checking also take ->init_state into account.

For ELCR follow what vpic_intercept_elcr_io()'s write path and
vpic_reset() do, i.e. don't insist on the internal view of the value to
be saved.

Move the instance range check as well, leaving just an assertion in the
load handler.
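
A rough sketch of the kind of consistency check meant here. The record
layout, the pic_check() name and the error codes are hypothetical; it is
assumed that instance 0 is the master PIC, and the ->init_state and ELCR
handling are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of a saved PIC record. */
struct pic_record {
    unsigned int is_master;
    unsigned int int_output;
};

/* Validate a record for PIC instance inst before anything is loaded. */
static int pic_check(unsigned int inst, const struct pic_record *r)
{
    assert(r != NULL);

    /* Only two PICs exist. */
    if ( inst > 1 )
        return -ENODEV;

    /* is_master must agree with the instance the record is loaded into. */
    if ( r->is_master != (unsigned int)(inst == 0) )
        return -EINVAL;

    /* int_output is wider than one bit, so bound it explicitly. */
    if ( r->int_output > 1 )
        return -EINVAL;

    return 0;
}
```

Rejecting a mismatched is_master up front keeps later pointer arithmetic
based on that field within the array of PIC instances.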

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
15 months agox86/vPIT: check values loaded from state save record
Jan Beulich [Mon, 15 Jan 2024 11:18:10 +0000 (12:18 +0100)]
x86/vPIT: check values loaded from state save record

In particular pit_latch_status() and speaker_ioport_read() perform
calculations which assume in-bounds values. Several of the state save
record fields can hold wider ranges, though. Refuse to load values which
cannot result from normal operation, except mode, the init state of
which (see also below) cannot otherwise be reached.

Note that ->gate should only be possible to be zero for channel 2;
enforce that as well.

Adjust pit_reset()'s writing of ->mode as well, to not unduly affect
the value pit_latch_status() may calculate. The chosen mode of 7 is
still one which cannot be established by writing the control word. Note
that with or without this adjustment effectively all switch() statements
using mode as the control expression aren't quite right when the PIT is
still in that init state; there is an apparent assumption that before
these can sensibly be invoked, the guest would init the PIT (i.e. in
particular set the mode).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
15 months agox86/HVM: adjust save/restore hook registration for optional check handler
Jan Beulich [Mon, 15 Jan 2024 11:17:37 +0000 (12:17 +0100)]
x86/HVM: adjust save/restore hook registration for optional check handler

Register NULL uniformly as a first step.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months agox86/HVM: split restore state checking from state loading
Jan Beulich [Mon, 15 Jan 2024 11:16:56 +0000 (12:16 +0100)]
x86/HVM: split restore state checking from state loading

..., at least as reasonably feasible without making a check hook
mandatory (in particular strict vs relaxed/zero-extend length checking
can't be done early this way).

Note that only one of the two uses of "real" hvm_load() is accompanied
with a "checking" one. The other directly consumes hvm_save() output,
which ought to be well-formed. This means that while input data related
checks don't need repeating in the "load" function when already done by
the "check" one (albeit assertions to this effect may be desirable),
domain state related checks (e.g. has_xyz(d)) will be required in both
places.

With the split arch_hvm_{check,load}(), also invoke the latter only
after downing all the vCPU-s.
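
The check/load split can be pictured as a two-pass restore. This is a
generic sketch, not the hvm_load() code: struct record, VALUE_MAX and the
three function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical saved record with a single bounded field. */
struct record {
    unsigned int value;
};

#define VALUE_MAX 3u

/* "check" hook: inspects the input only, never touches live state. */
static int rec_check(const struct record *r)
{
    return r->value <= VALUE_MAX ? 0 : -EINVAL;
}

/* "load" hook: input checks are not repeated, merely asserted. */
static void rec_load(unsigned int *state, const struct record *r)
{
    assert(r->value <= VALUE_MAX);
    *state = r->value;
}

/* Pass 1 verifies every record, pass 2 modifies state. */
static int restore(unsigned int *state, const struct record *recs, size_t nr)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
    {
        int rc = rec_check(&recs[i]);

        if ( rc )
            return rc;
    }

    for ( i = 0; i < nr; i++ )
        rec_load(&state[i], &recs[i]);

    return 0;
}
```

A bad record anywhere in the stream then fails the restore before any state
has been altered.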

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
15 months agoNUMA: limit first_valid_mfn exposure
Jan Beulich [Mon, 15 Jan 2024 11:15:56 +0000 (12:15 +0100)]
NUMA: limit first_valid_mfn exposure

Address the TODO regarding first_valid_mfn by making the variable static
when NUMA=y, thus also addressing a Misra C:2012 rule 8.4 concern (on
x86). To carry this out, introduce two new IS_ENABLED()-like macros
conditionally inserting "static". One less macro expansion layer is
sufficient though (I might guess that some early form of IS_ENABLED()
pasted CONFIG_ onto the incoming argument, at which point the extra
layer would have been necessary), and part of the existing helper macros
can be re-used.
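
One way such a macro can be built is sketched below. This is not the
definition added by this change: every macro name and both CONFIG_DEMO_*
options are hypothetical, and the STR() / expands_to() helpers exist only to
make the expansion observable.

```c
#include <assert.h>
#include <string.h>

#define CONFIG_DEMO_ON 1 /* hypothetical enabled option */
/* CONFIG_DEMO_OFF is deliberately left undefined. */

/* An option defined as 1 pastes into PLACEHOLDER_1, i.e. "0," (one extra arg). */
#define PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val

#define STATIC_IF(opt)            STATIC_IF_1(opt)
#define STATIC_IF_1(val)          STATIC_IF_2(PLACEHOLDER_##val)
#define STATIC_IF_2(arg1_or_junk) TAKE_SECOND(arg1_or_junk static, , 0)

/* Becomes "static unsigned long ..." because CONFIG_DEMO_ON is 1. */
STATIC_IF(CONFIG_DEMO_ON) unsigned long demo_internal = 1;
/* Stays externally visible because CONFIG_DEMO_OFF is not defined. */
STATIC_IF(CONFIG_DEMO_OFF) unsigned long demo_exported = 2;

/* Stringify an expansion so it can be compared at run time. */
#define STR_(x) #x
#define STR(x)  STR_(x)

static int expands_to(const char *expansion, const char *expected)
{
    assert(expansion != NULL && expected != NULL);
    return strcmp(expansion, expected) == 0;
}
```

An inverted variant would put "static" in the third argument position
instead, so that it is selected when the option is off.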

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
15 months agoxen/riscv: introduce system.h
Oleksii Kurochko [Mon, 15 Jan 2024 11:12:52 +0000 (12:12 +0100)]
xen/riscv: introduce system.h

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
15 months ago x86emul: support SM4
Jan Beulich [Mon, 15 Jan 2024 11:12:00 +0000 (12:12 +0100)]
x86emul: support SM4

Since the insns here and in particular their memory access patterns
follow the usual scheme, I didn't think it was necessary to add a
contrived test specifically for them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months ago x86emul: support SM3
Jan Beulich [Mon, 15 Jan 2024 11:11:22 +0000 (12:11 +0100)]
x86emul: support SM3

Since the insns here and in particular their memory access patterns
follow the usual scheme, I didn't think it was necessary to add a
contrived test specifically for them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months ago x86emul: support SHA512
Jan Beulich [Mon, 15 Jan 2024 11:10:40 +0000 (12:10 +0100)]
x86emul: support SHA512

Since the insns here don't access memory, I didn't think it was
necessary to extend our SHA test for them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
15 months ago x86emul: support AVX-VNNI-INT16
Jan Beulich [Mon, 15 Jan 2024 11:09:42 +0000 (12:09 +0100)]
x86emul: support AVX-VNNI-INT16

These are close relatives of the AVX-VNNI and AVX-VNNI-INT8 ISA
extensions. Since the insns here and in particular their memory access
patterns follow the usual scheme (and especially the word variants of
AVX-VNNI), I didn't think it was necessary to add a contrived test
specifically for them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
16 months ago xen/arm32: head: Improve logging in head.S
Julien Grall [Fri, 12 Jan 2024 11:54:31 +0000 (11:54 +0000)]
xen/arm32: head: Improve logging in head.S

The sequence to enable the MMU on arm32 is quite complex as we may need
to jump to a temporary mapping to map Xen.

Recently, we had one bug in the logic (see f5a49eb7f8b3 ("xen/arm32:
head: Add mising isb in switch_to_runtime_mapping()")) and it was
a pain to debug because there was no logging.

In order to improve the logging in the MMU switch we need to add
support for early printk while running on the identity mapping
and also on the temporary mapping.

For the identity mapping, we have only the first page of Xen mapped.
So all the strings should reside in the first page. For that purpose
a new macro PRINT_ID is introduced.

For the temporary mapping, the fixmap is already linked in the temporary
area (and so is the UART). So we just need to update the register
storing the UART address (i.e. r11) to point to the UART temporary
mapping.

Take the opportunity to introduce mov_w_on_cond in order to
conditionally execute mov_w and avoid branches.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
16 months ago xen/arm: bootfdt: Harden handling of malformed mem reserve map
Shawn Anastasio [Thu, 11 Jan 2024 23:24:22 +0000 (17:24 -0600)]
xen/arm: bootfdt: Harden handling of malformed mem reserve map

The early_print_info routine in bootfdt.c incorrectly stores the result
of a call to fdt_num_mem_rsv() in an unsigned int, which results in the
negative error code being interpreted incorrectly in a subsequent loop
in the case where the device tree is malformed. Fix this by properly
checking the return code for an error and calling panic().
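
A minimal sketch of the bug class, assuming a stand-in for
fdt_num_mem_rsv() (which returns either a count or a negative error
code). The demo_* functions and the error value are hypothetical; the
real fix calls panic() where this sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fdt_num_mem_rsv(): a count, or a negative error. */
static int demo_num_mem_rsv(int malformed)
{
    return malformed ? -11 : 2;   /* -11 is an arbitrary error value for the demo */
}

/* Buggy pattern: the negative error becomes a huge unsigned loop bound. */
unsigned int demo_loop_bound_buggy(int malformed)
{
    unsigned int nr = demo_num_mem_rsv(malformed);

    return nr;
}

/* Fixed pattern: keep the result signed and check it before use. */
int demo_loop_bound_checked(int malformed, unsigned int *nr_out)
{
    int nr = demo_num_mem_rsv(malformed);

    assert(nr_out != NULL);
    if (nr < 0)
        return -1;                /* the real code would panic() here */

    *nr_out = (unsigned int)nr;
    return 0;
}
```

With the buggy variant, a malformed tree yields a bound close to
UINT_MAX, so a subsequent loop would walk far past the reserve map.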

Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
16 months ago xen/common: Don't dereference overlay_node after checking that it is NULL
Javi Merino [Thu, 11 Jan 2024 12:09:27 +0000 (12:09 +0000)]
xen/common: Don't dereference overlay_node after checking that it is NULL

In remove_nodes(), overlay_node is dereferenced when printing the
error message even though it is known to be NULL.  Return without
printing as an error message is already printed by the caller.
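
The pattern being fixed looks roughly like the sketch below. The
demo_node type and demo_remove_node() are hypothetical and much simpler
than remove_nodes(); they only show why the error path must not touch
the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node type, for illustration only. */
struct demo_node { const char *name; };

/* Returns 0 on success, -1 if the node is missing. */
int demo_remove_node(const struct demo_node *node)
{
    if (node == NULL) {
        /*
         * The buggy variant printed node->name here, dereferencing a
         * pointer just found to be NULL. Return silently instead and
         * let the caller report the failure.
         */
        return -1;
    }

    assert(node->name != NULL);
    printf("removing %s\n", node->name);
    return 0;
}
```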

The semantic patch that spots this code is available in

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Javi Merino <javi.merino@cloud.com>
Reviewed-by: Vikram Garhwal <vikram.garhwal@amd.com>
16 months ago xen/arm32: head: Rework how the fixmap and early UART mapping are prepared
Julien Grall [Fri, 12 Jan 2024 10:45:09 +0000 (10:45 +0000)]
xen/arm32: head: Rework how the fixmap and early UART mapping are prepared

Since commit 5e213f0f4d2c ("xen/arm32: head: Widen the use of the
temporary mapping"), boot_second (used to cover regions like Xen and
the fixmap) will not be mapped if the identity mapping overlaps.

So it is ok to prepare the fixmap table and link it in boot_second
earlier. With that, the fixmap can also be used earlier via the
temporary mapping.

Therefore split setup_fixmap() in two:
    * The table is now linked in create_page_tables() because
      the boot page tables need to be recreated for every CPU.
    * The early UART mapping is only added for the boot CPU0 as the
      fixmap table is not cleared when secondary CPUs boot.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
16 months ago x86/iommu: introduce a rangeset to perform hwdom IOMMU setup
Roger Pau Monné [Tue, 9 Jan 2024 13:07:49 +0000 (14:07 +0100)]
x86/iommu: introduce a rangeset to perform hwdom IOMMU setup

This change just introduces the boilerplate code in order to use a rangeset
when setting up the hardware domain IOMMU mappings.  The rangeset is never
populated in this patch, so it's a non-functional change as far as the mappings
established for the domain are concerned.

Note there will be a change for HVM domains (ie: PVH dom0) when the code
introduced here gets used: the p2m mappings will be established using
map_mmio_regions() instead of p2m_add_identity_entry(), so that ranges can be
mapped with a single function call if possible.  Note that the interface of
map_mmio_regions() doesn't allow creating read-only mappings, but so far there
are no such mappings created for PVH dom0 in arch_iommu_hwdom_init().
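
To illustrate why collecting ranges first allows mapping with fewer
calls, here is a toy range set that merges overlapping or adjacent
ranges as they are added. It is not Xen's rangeset implementation; the
demo_* names, the fixed capacity and the array layout are assumptions
made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RANGES 16

struct demo_range { unsigned long s, e; };   /* inclusive bounds */
struct demo_rangeset {
    struct demo_range r[DEMO_MAX_RANGES];
    size_t nr;
};

/*
 * Add [s, e], merging with overlapping or adjacent entries.
 * Assumes s <= e and all bounds are below ULONG_MAX.
 * Returns 0 on success, -1 if the set is full.
 */
int demo_add_range(struct demo_rangeset *set, unsigned long s, unsigned long e)
{
    size_t i = 0;

    assert(s <= e);

    while (i < set->nr) {
        struct demo_range *cur = &set->r[i];

        if (s <= cur->e + 1 && cur->s <= e + 1) {
            /* Absorb the existing entry into [s, e] and drop it. */
            if (cur->s < s)
                s = cur->s;
            if (cur->e > e)
                e = cur->e;
            *cur = set->r[--set->nr];
            i = 0;   /* the widened range may now touch earlier entries */
        } else {
            i++;
        }
    }

    if (set->nr == DEMO_MAX_RANGES)
        return -1;

    set->r[set->nr].s = s;
    set->r[set->nr].e = e;
    set->nr++;
    return 0;
}
```

Once the set is populated, each remaining entry corresponds to one
contiguous region, so a consumer could issue a single mapping call per
entry rather than one per page.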

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
16 months ago x86/HVM: drop tsc_scaling.setup() hook
Jan Beulich [Tue, 9 Jan 2024 13:07:17 +0000 (14:07 +0100)]
x86/HVM: drop tsc_scaling.setup() hook

This was used by VMX only, and the intended VMCS write can as well
happen from vmx_set_tsc_offset(), invoked (directly or indirectly)
almost immediately after the present call sites of the hook.
vmx_set_tsc_offset() isn't invoked frequently elsewhere, so the extra
VMCS write shouldn't raise performance concerns.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
16 months ago x86/HVM: hide SVM/VMX when their enabling is prohibited by firmware
Jan Beulich [Tue, 9 Jan 2024 13:06:34 +0000 (14:06 +0100)]
x86/HVM: hide SVM/VMX when their enabling is prohibited by firmware

... or we fail to enable the functionality on the BSP for other reasons.
The only place where hardware announcing the feature is recorded is the
raw CPU policy/featureset.

Inspired by https://lore.kernel.org/all/20230921114940.957141-1-pbonzini@redhat.com/.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>