xenbits.xensource.com Git - xen.git/log
xen.git
20 months ago  build: correct gas --noexecstack check
Jan Beulich [Tue, 5 Sep 2023 06:51:50 +0000 (08:51 +0200)]
build: correct gas --noexecstack check

The check was missing an escape for the inner $, thus breaking things
in the unlikely event that the underlying assembler doesn't support this
option.

Fixes: 62d22296a95d ("build: silence GNU ld warning about executable stacks")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: d1f6a58dfdc508c43a51c1865c826d519bf16493
master date: 2023-08-14 09:58:19 +0200

20 months ago  x86/iommu: pass full IO-APIC RTE for remapping table update
Roger Pau Monné [Tue, 5 Sep 2023 06:50:39 +0000 (08:50 +0200)]
x86/iommu: pass full IO-APIC RTE for remapping table update

So that the remapping entry can be updated atomically when possible.

Doing such an update atomically will avoid Xen having to mask the
IO-APIC pin prior to performing any interrupt movements (ie: changing
the destination and vector fields), as the interrupt remapping entry is
always consistent.

This also simplifies some of the logic on both VT-d and AMD-Vi
implementations, as having the full RTE available instead of half of
it avoids possibly having to read and update the missing other half
from hardware.

While there, remove the explicit zeroing of new_ire fields in
ioapic_rte_to_remap_entry() and initialize the variable at definition
so all fields are zeroed.  Note that the fields could also be
initialized with their final values at definition, but I found that
likely too much to be done at this time.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3e033172b0250446bfe119f31c7f0f51684b0472
master date: 2023-08-01 11:48:39 +0200

20 months ago  iommu/vtd: rename io_apic_read_remap_rte() local variable
Roger Pau Monné [Tue, 5 Sep 2023 06:50:05 +0000 (08:50 +0200)]
iommu/vtd: rename io_apic_read_remap_rte() local variable

Preparatory change to unify the IO-APIC pin variable name between
io_apic_read_remap_rte() and amd_iommu_ioapic_update_ire(), so that
the local variable can be made a function parameter with the same name
across vendors.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: a478b38c01b65fa030303f0324a3380d872eb165
master date: 2023-07-28 09:40:42 +0200

20 months ago  x86/ioapic: RTE modifications must use ioapic_write_entry
Roger Pau Monné [Tue, 5 Sep 2023 06:49:37 +0000 (08:49 +0200)]
x86/ioapic: RTE modifications must use ioapic_write_entry

Do not allow writing to RTE registers using io_apic_write, and instead
require changes to RTEs to be performed using ioapic_write_entry.

This is in preparation for passing the full contents of the RTE to the
IOMMU interrupt remapping handlers, so remapping entries for IO-APIC
RTEs can be updated atomically when possible.

While this commit might initially increase the number of MMIO accesses
needed to update an IO-APIC RTE, further changes will benefit from
getting the full RTE value passed to the IOMMU handlers, as the logic
is greatly simplified when the IOMMU handlers can get the complete RTE
value in one go.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ef7995ed1bcd7eac37fb3c3fe56eaa54ea9baf6c
master date: 2023-07-28 09:40:20 +0200
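
As a rough illustration of the rule above (all RTE updates go through a
single whole-entry helper), the following sketch writes a 64-bit RTE as
two 32-bit register accesses.  The names sketch_ioapic_write_entry(),
fake_io_apic_write() and fake_ioapic_regs are hypothetical stand-ins,
not Xen's code, and the register file is simulated in memory.  The
register numbering (0x10 + 2 * pin for the low half, 0x11 + 2 * pin for
the high half) follows the standard IO-APIC layout; writing the high
half first is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SK_IOAPIC_PINS 24u

/* Simulated register file; real hardware is reached via IOREGSEL/IOWIN. */
static uint32_t fake_ioapic_regs[0x10 + 2 * SK_IOAPIC_PINS];

/* Hypothetical stand-in for a single 32-bit IO-APIC register write. */
static void fake_io_apic_write(unsigned int reg, uint32_t val)
{
    fake_ioapic_regs[reg] = val;
}

/*
 * Write a complete RTE for one pin.  Callers always hand over the full
 * 64-bit value, so nothing ever needs to read back the "other half".
 */
static void sketch_ioapic_write_entry(unsigned int pin, uint64_t rte)
{
    assert(pin < SK_IOAPIC_PINS);

    /* High half first (an assumption here), then the low half. */
    fake_io_apic_write(0x11 + 2 * pin, (uint32_t)(rte >> 32));
    fake_io_apic_write(0x10 + 2 * pin, (uint32_t)rte);
}
```

Because the helper takes the whole entry, a later layer (such as an
interrupt remapping handler) can be given the same 64-bit value instead
of reconstructing it from two separate register writes.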

20 months ago  x86/ioapic: add a raw field to RTE struct
Roger Pau Monné [Tue, 5 Sep 2023 06:48:43 +0000 (08:48 +0200)]
x86/ioapic: add a raw field to RTE struct

Further changes will require access to the full RTE as a single value
in order to pass it to IOMMU interrupt remapping handlers.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: cdc48cb5a74b10c2b07a09d2f554756d730bfee3
master date: 2023-07-28 09:39:44 +0200
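
A minimal sketch of what "a raw field" can look like: the RTE bitfields
are overlaid with a single 64-bit value through a union.  The type
union rte_sketch and the helper rte_sketch_build() are hypothetical and
not Xen's definitions.  The field positions follow the standard IO-APIC
redirection entry layout, and the overlay assumes a little-endian,
GCC-style bitfield allocation (lowest bits first).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RTE type: named fields and the same bits as one value. */
union rte_sketch {
    struct {
        uint32_t vector:8, delivery_mode:3, dest_mode:1,
                 delivery_status:1, polarity:1, irr:1, trigger:1,
                 mask:1, :15;
        uint32_t :24, dest:8;
    };
    uint64_t raw;   /* the whole entry, suitable for passing around */
};

static_assert(sizeof(union rte_sketch) == sizeof(uint64_t),
              "RTE must be exactly 64 bits");

/* Build an entry through the named fields and return it as one value. */
static uint64_t rte_sketch_build(uint8_t vector, uint8_t dest, int masked)
{
    union rte_sketch e = { .raw = 0 };

    e.vector = vector;
    e.dest = dest;
    e.mask = !!masked;

    return e.raw;
}
```

The named fields stay available for readability, while .raw gives
callers one value to hand to another layer without any functional
change to how the entry is laid out.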

20 months ago  x86/ioapic: sanitize IO-APIC pins before enabling lapic LVTERR/ESR
Roger Pau Monné [Tue, 5 Sep 2023 06:47:34 +0000 (08:47 +0200)]
x86/ioapic: sanitize IO-APIC pins before enabling lapic LVTERR/ESR

The current logic to init the local APIC and the IO-APIC inits the
local APIC LVTERR/ESR before doing any sanitization of the IO-APIC pin
configuration.  It's already noted in enable_IO_APIC() that Xen
shouldn't trust the IO-APIC being empty at bootup.

At XenServer we have a system where the IO-APIC 0 is handed to Xen
with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
with a vector of 0 (all fields of the RTE are zeroed).  Once the local
APIC LVTERR/ESR is enabled, periodic injections from that pin cause the
local APIC to in turn inject periodic error vectors:

APIC error on CPU0: 00(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector
APIC error on CPU0: 40(40), Received illegal vector

That prevents Xen from booting.

Move the masking of the IO-APIC pins ahead of the setup of the local
APIC.  This has the side effect of also moving the detection of the
pin where the i8259 is connected, as such detection must be done
before masking any pins.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 813da5f0e73b8cbd2ac3c7922506e58c28cd736d
master date: 2023-07-17 10:31:10 +0200

20 months ago  xenalyze: Handle start-of-day ->RUNNING transitions
George Dunlap [Tue, 5 Sep 2023 06:47:14 +0000 (08:47 +0200)]
xenalyze: Handle start-of-day ->RUNNING transitions

A recent xentrace highlighted an unhandled corner case in the vcpu
"start-of-day" logic, if the trace starts after the last running ->
non-running transition, but before the first non-running -> running
transition.  Because start-of-day wasn't handled, vcpu_next_update()
was expecting p->current to be NULL, and tripping out with the
following error message when it wasn't:

vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)

where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.

Instead of calling vcpu_start() piecemeal throughout
sched_runstate_process(), call it at the top of the function if the
vcpu in question is still in RUNSTATE_INIT, so that we can handle all
the cases in one place.

Sketch out at the top of the function all cases which we need to
handle, and what to do in those cases.  Some transitions tell us where
v is running; some transitions tell us about what is (or is not)
running on p; some transitions tell us neither.

If a transition tells us where v is now running, update its state;
otherwise leave it in INIT, in order to avoid having to deal with TSC
skew on start-up.

If a transition tells us what is or is not running on p, update
p->current (either to v or NULL).  Otherwise leave it alone.

If neither, do nothing.

Reifying those rules:

- If we're continuing to run, set v to RUNNING, and use p->first_tsc
  as the runstate time.

- If we're starting to run, set v to RUNNING, and use ri->tsc as the
  runstate time.

- If v is being descheduled, leave v in the INIT state to avoid dealing
  with TSC skew; but set p->current to NULL so that whatever is
  scheduled next won't trigger the assert in vcpu_next_update().

- If a vcpu is waking up (switching from one non-runnable state to
  another non-runnable state), leave v in INIT, and p in whatever
  state it's in (which may be the default domain, or some other vcpu
  which has already run).

While here, fix the comment above vcpu_start; it's called when the
vcpu state is INIT, not when current is the default domain.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: aab4b38b5d77e3c65f44bacd56427a85b7392a11
master date: 2023-06-30 11:25:33 +0100
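
The four start-of-day rules listed in the message above can be sketched
as a single decision function.  This is only an illustration under
assumed, simplified types: struct sk_vcpu, struct sk_pcpu and
sk_start_of_day() are hypothetical and do not mirror xenalyze's real
data structures.  Setting p->current to v in the two "running" cases is
this sketch's reading of "update p->current (either to v or NULL)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sk_runstate { SK_INIT, SK_RUNNING };

struct sk_vcpu { enum sk_runstate state; uint64_t runstate_tsc; };
struct sk_pcpu { struct sk_vcpu *current; uint64_t first_tsc; };

/*
 * Handle one runstate-change record for a vcpu that may still be in
 * the INIT state.  was_running/now_running describe the transition;
 * record_tsc is the timestamp of the trace record.
 */
static void sk_start_of_day(struct sk_vcpu *v, struct sk_pcpu *p,
                            bool was_running, bool now_running,
                            uint64_t record_tsc)
{
    assert(v && p);

    if (v->state != SK_INIT)
        return;                 /* not a start-of-day case */

    if (now_running) {
        /* Continuing to run: use the pcpu's first tsc; else the record's. */
        v->state = SK_RUNNING;
        v->runstate_tsc = was_running ? p->first_tsc : record_tsc;
        p->current = v;
    } else if (was_running) {
        /* Descheduled: leave v in INIT, but nothing runs on p any more. */
        p->current = NULL;
    }
    /* Non-running -> non-running: leave both v and p untouched. */
}
```

Keeping all cases in one place is the point of the change: each branch
corresponds to exactly one bullet of the list above.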

20 months ago  x86/head: check base address alignment
Roger Pau Monné [Tue, 5 Sep 2023 06:46:28 +0000 (08:46 +0200)]
x86/head: check base address alignment

Ensure that the base address is 2M aligned, or else the page table
entries created would be corrupt as reserved bits on the PDE end up
set.

We have encountered a broken firmware where grub2 would end up loading
Xen at a non 2M aligned region when using the multiboot2 protocol, and
that caused a very difficult to debug triple fault.

If the alignment is not as required by the page tables, print an error
message and stop the boot.  Also add a build time check that the
calculation of symbol offsets doesn't break alignment of passed
addresses.

The check could be performed earlier, but so far the alignment is
required by the page tables, and hence it feels more natural that the
check lives near the piece of code that requires it.

Note that when booted as an EFI application from the PE entry point
the alignment check is already performed by
efi_arch_load_addr_check(), and hence there's no need to add another
check at the point where page tables get built in
efi_arch_memory_setup().

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 0946068e7faea22868c577d7afa54ba4970ff520
master date: 2023-05-03 13:36:25 +0200
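
The actual check lives in early boot assembly; the C sketch below only
illustrates the arithmetic.  A base address is 2M aligned when its low
21 bits are all zero.  The names SK_2M_SHIFT, SK_2M_MASK and
sk_base_is_2m_aligned() are hypothetical, and the static_assert is
merely an example of a build-time check, not the one added by the
commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_2M_SHIFT 21
#define SK_2M_MASK  ((UINT64_C(1) << SK_2M_SHIFT) - 1)

/* Example build-time check: the mask must have the form 2^n - 1. */
static_assert((SK_2M_MASK & (SK_2M_MASK + 1)) == 0,
              "alignment mask must be a power of two minus one");

/* True when no bit below the 2M boundary is set in the base address. */
static bool sk_base_is_2m_aligned(uint64_t base)
{
    return (base & SK_2M_MASK) == 0;
}
```

A load address of 0x201000, for example, fails the test because bit 12
is set; per the message above, such stray low bits would end up in
reserved bits of the PDEs.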

20 months ago  xen/vcpu: ignore VCPU_SSHOTTMR_future
Roger Pau Monné [Tue, 5 Sep 2023 06:45:29 +0000 (08:45 +0200)]
xen/vcpu: ignore VCPU_SSHOTTMR_future

The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
When the hypervisor returns -ETIME (timeout in the past) Linux keeps
retrying to set up the timer with a higher timeout instead of
self-injecting a timer interrupt.

On boxes without any hardware assistance for logdirty we have seen HVM
Linux guests < 4.7 with 32 vCPUs give up trying to set up the timer when
logdirty is enabled:

CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 506250 nsec
CE: xen increased min_delta_ns to 759375 nsec
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
Freezing user space processes ...
INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
Task dump for CPU 14:
swapper/14      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14

Thus leading to CPU stalls and a broken system as a result.

Work around this bogus usage by ignoring VCPU_SSHOTTMR_future in
the hypervisor.  Old Linux versions are the only ones known to have
(wrongly) attempted to use the flag, and ignoring it is compatible
with the behavior expected by any guests setting that flag.

Note the usage of the flag has been removed from Linux by commit:

c06b6d70feb3 xen/x86: don't lose event interrupts

Which landed in Linux 4.7.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Henry Wang <Henry.Wang@arm.com> # CHANGELOG
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 19c6cbd90965b1440bd551069373d6fa3f2f365d
master date: 2023-05-03 13:36:05 +0200
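
To make the behavioural change concrete, here is a simplified model of
the before and after.  Everything in it is hypothetical:
SK_SSHOTTMR_FUTURE stands in for VCPU_SSHOTTMR_future, and the two
functions model only the decision described in the message above, not
the real hypercall handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the VCPU_SSHOTTMR_future flag. */
#define SK_SSHOTTMR_FUTURE (1u << 0)

/* Before: a timeout in the past is rejected when the flag is set. */
static int sk_set_timer_old(uint64_t now, uint64_t timeout,
                            uint32_t flags, uint64_t *armed)
{
    assert(armed);

    if ((flags & SK_SSHOTTMR_FUTURE) && timeout < now)
        return -ETIME;

    *armed = timeout;
    return 0;
}

/* After: the flag is ignored and the timer is always armed. */
static int sk_set_timer_new(uint64_t now, uint64_t timeout,
                            uint32_t flags, uint64_t *armed)
{
    assert(armed);
    (void)now;
    (void)flags;

    *armed = timeout;
    return 0;
}
```

With the flag ignored, a guest that asks for a deadline that has already
passed no longer sees -ETIME, so the retry loop described above never
starts.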

20 months ago  tools/vchan: Fix -Wsingle-bit-bitfield-constant-conversion
Andrew Cooper [Tue, 8 Aug 2023 13:53:42 +0000 (14:53 +0100)]
tools/vchan: Fix -Wsingle-bit-bitfield-constant-conversion

Gitlab reports:

  node.c:158:17: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]

        ctrl->blocking = 1;
                       ^ ~
  1 error generated.
  make[4]: *** [/builds/xen-project/people/andyhhp/xen/tools/vchan/../../tools/Rules.mk:188: node.o] Error 1

In Xen 4.18, this was fixed with c/s 99ab02f63ea8 ("tools: convert bitfields
to unsigned type") but this is an ABI change which can't be backported.

Switch 1 for -1 to provide a minimally invasive way to fix the build.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
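
Why -1 works can be shown with a tiny example.  A signed one-bit
bit-field can hold only 0 and -1, so storing -1 is exact, while storing
1 would be truncated to -1 (which is what the diagnostic complains
about).  struct sk_ctrl and sk_set_blocking() are hypothetical; the
field is declared "signed int" here to make the signedness explicit
rather than implementation-defined.

```c
#include <assert.h>

/* Hypothetical control structure with a signed one-bit flag. */
struct sk_ctrl {
    signed int blocking:1;      /* representable values: 0 and -1 */
};

/* Set the flag using a value the field can actually represent. */
static int sk_set_blocking(struct sk_ctrl *c)
{
    assert(c);

    c->blocking = -1;           /* exact; no implicit truncation */

    return c->blocking != 0;    /* still "true" for every caller */
}
```

Any code that only tests the flag for truth behaves the same as before,
which is why the change is described as having no functional effect.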

20 months ago  CI: Resync FreeBSD config with staging
Andrew Cooper [Fri, 17 Feb 2023 11:16:32 +0000 (11:16 +0000)]
CI: Resync FreeBSD config with staging

CI: Update FreeBSD to 13.1

Also print the compiler version before starting.  It's not easy to find
otherwise, and does change from time to time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 5e7667ea2dd33e0e5e0f3a96db37fdb4ecd98fba)

CI: Update FreeBSD to 13.2

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit f872a624cbf92de9944483eea7674ef80ced1380)

CI: Update FreeBSD to 12.4

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit a73560896ce3c513460f26bd1c205060d6ec4f8a)

20 months ago  rombios: Remove the use of egrep
Andrew Cooper [Fri, 18 Aug 2023 10:05:00 +0000 (11:05 +0100)]
rombios: Remove the use of egrep

As the Alpine 3.18 container notes:

  egrep: warning: egrep is obsolescent; using grep -E

Adjust it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 5ddac3c2852ecc120acab86fc403153a2097c5dc)

20 months ago  rombios: Avoid using K&R function syntax
Andrew Cooper [Fri, 18 Aug 2023 09:47:46 +0000 (10:47 +0100)]
rombios: Avoid using K&R function syntax

Clang-15 complains:

  tcgbios.c:598:25: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  void tcpa_calling_int19h()
                          ^
                           void

C2x formally removes K&R syntax.  The declarations for these functions in
32bitprotos.h are already ANSI compatible.  Update the definitions to match.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a562afa5679d4a7ceb9cb9222fec1fea9a61f738)

20 months ago  rombios: Work around GCC issue 99578
Andrew Cooper [Thu, 17 Aug 2023 20:32:53 +0000 (21:32 +0100)]
rombios: Work around GCC issue 99578

GCC 12 objects to pointers derived from a constant:

  util.c: In function 'find_rsdp':
  util.c:429:16: error: array subscript 0 is outside array bounds of 'uint16_t[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds]
    429 |     ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
  cc1: all warnings being treated as errors

This is a GCC bug, but work around it rather than turning array-bounds
checking off generally.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit e35138a2ffbe1fe71edaaaaae71063dc545a8416)

20 months ago  x86emul: rework wrapping of libc functions in test and fuzzing harnesses
Jan Beulich [Fri, 18 Aug 2023 13:04:28 +0000 (15:04 +0200)]
x86emul: rework wrapping of libc functions in test and fuzzing harnesses

Our present approach is working fully behind the compiler's back. This
was found to not work with LTO. Employ ld's --wrap= option instead. Note
that while this makes the build work at least with new enough gcc (it
doesn't with gcc7, for example, due to tool chain side issues afaict),
according to my testing things still won't work when building the
fuzzing harness with afl-cc: While with the gcc7 tool chain I see afl-as
getting invoked, this does not happen with gcc13. Yet without using that
assembler wrapper the resulting binary will look uninstrumented to
afl-fuzz.

While checking the resulting binaries I noticed that we've gained uses
of snprintf() and strstr(), which only just so happen to not cause any
problems. Add wrappers for them as well.

Since we don't have any actual uses of v{,sn}printf(), no definitions of
their wrappers appear (just yet). But I think we want
__wrap_{,sn}printf() to properly use __real_v{,sn}printf() right away,
which means we need declarations of the latter.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 6fba45ca3be1c5d46cddb1eaf371d9e69550b244)

20 months ago  Config.mk: evaluate XEN_COMPILE_ARCH and XEN_OS immediately
Anthony PERARD [Mon, 31 Jul 2023 13:02:34 +0000 (15:02 +0200)]
Config.mk: evaluate XEN_COMPILE_ARCH and XEN_OS immediately

With GNU make 4.4, the number of executions of the commands present in
these $(shell ) constructs increased greatly. This is probably because,
as of make 4.4, exported variables are also added to the environment of
$(shell ) constructs.

So to avoid having these commands run more than necessary, we
replace ?= with an equivalent that uses immediate expansion.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a07414d989cf52e5e84192b78023bee1589bbda4)

20 months ago  build: evaluate XEN_BUILD_* and XEN_DOMAIN immediately
Anthony PERARD [Mon, 31 Jul 2023 13:02:18 +0000 (15:02 +0200)]
build: evaluate XEN_BUILD_* and XEN_DOMAIN immediately

With GNU make 4.4, the number of executions of the commands present in
these $(shell ) constructs increased greatly. This is probably because,
as of make 4.4, exported variables are also added to the environment of
$(shell ) constructs.

Also, `make -d` shows a lot of these:
    Makefile:15: not recursively expanding XEN_BUILD_DATE to export to shell function
    Makefile:16: not recursively expanding XEN_BUILD_TIME to export to shell function
    Makefile:17: not recursively expanding XEN_BUILD_HOST to export to shell function
    Makefile:14: not recursively expanding XEN_DOMAIN to export to shell function

So to avoid having these commands run more than necessary, we
replace ?= with an equivalent that uses immediate expansion.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 0c594c1b57ee2ecec5f70826c53a2cf02a9c2acb)

20 months ago  build: remove TARGET_ARCH, a duplicate of SRCARCH
Anthony PERARD [Wed, 5 Jul 2023 06:29:49 +0000 (08:29 +0200)]
build: remove TARGET_ARCH, a duplicate of SRCARCH

The same command is used to generate the value of both $(TARGET_ARCH)
and $(SRCARCH), as $(ARCH) is an alias for $(XEN_TARGET_ARCH).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ac27b3beb9b7b423d5563768de890c7594c21b4e)

20 months ago  build: remove TARGET_SUBARCH, a duplicate of ARCH
Anthony PERARD [Wed, 5 Jul 2023 06:27:51 +0000 (08:27 +0200)]
build: remove TARGET_SUBARCH, a duplicate of ARCH

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a6ab7dd061338c33faef629cbe52ed1608571d84)

20 months ago  build: define ARCH and SRCARCH later
Anthony PERARD [Wed, 5 Jul 2023 06:25:03 +0000 (08:25 +0200)]
build: define ARCH and SRCARCH later

Defining ARCH and SRCARCH later in xen/Makefile allows switching them
to immediately evaluated variables.

ARCH and SRCARCH depend on values defined in Config.mk.  They aren't
used for e.g. TARGET_SUBARCH or TARGET_ARCH, nor anywhere before they're
needed in a sub-make or a rule.

This will help reduce the number of times the shell rune is run.

With GNU make 4.4, the number of executions of the commands present in
these $(shell ) constructs increased greatly. This is probably because,
as of make 4.4, exported variables are also added to the environment of
$(shell ) constructs.

Also, `make -d` shows a lot of these:
    Makefile:39: not recursively expanding SRCARCH to export to shell function
    Makefile:38: not recursively expanding ARCH to export to shell function

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 58e0a3f3b2c430f8640ef9df67ac857b0008ebc8)

20 months ago  libxl: Use XEN_LIB_DIR to store bootloader from pygrub
Anthony PERARD [Mon, 21 Aug 2023 13:53:47 +0000 (15:53 +0200)]
libxl: Use XEN_LIB_DIR to store bootloader from pygrub

In osstest, the jobs using pygrub on arm64 on the branch linux-linus
started to fail with:
    [Errno 28] No space left on device
    Error writing temporary copy of ramdisk

This is because /var/run is small: when dom0 has only 512MB to work
with, /var/run is only 40MB. The size of both kernel and ramdisk on
these jobs is now about 42MB, so there is not enough space in /var/run.

So, to avoid writing a big binary in ramfs, we will use /var/lib
instead, like we already do when saving the device model state on
migration.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
master commit: ad89640ad766d3cb6c92fc8b6406ca6bbab44136
master date: 2023-08-08 09:45:20 +0200

20 months ago  x86: fix build with old gcc after CPU policy changes
Jan Beulich [Mon, 21 Aug 2023 13:53:17 +0000 (15:53 +0200)]
x86: fix build with old gcc after CPU policy changes

Old gcc won't cope with initializers involving unnamed struct/union
fields.

Fixes: 441b1b2a50ea ("x86/emul: Switch x86_emulate_ctxt to cpu_policy")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 768846690d64bc730c1a1123e8de3af731bb2eb3
master date: 2023-04-19 11:02:47 +0200
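
A sketch of the kind of construct involved, using a hypothetical
struct sk_policy rather than Xen's real CPU policy type.  The members lo
and hi live inside unnamed union/struct fields; naming them in an
initializer is what old gcc is said to reject.  One way to sidestep that
(an assumption of this sketch, not necessarily what the commit does) is
to zero the object and assign the members afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type with members nested in unnamed union/struct fields. */
struct sk_policy {
    union {
        uint32_t raw;
        struct {
            uint16_t lo, hi;
        };
    };
};

static_assert(sizeof(struct sk_policy) == sizeof(uint32_t),
              "unnamed fields must overlay raw exactly");

static struct sk_policy sk_make_policy(uint16_t lo, uint16_t hi)
{
    /*
     * Newer compilers accept: struct sk_policy p = { .lo = lo, .hi = hi };
     * Plain assignment after zeroing avoids such an initializer entirely.
     */
    struct sk_policy p;

    memset(&p, 0, sizeof(p));
    p.lo = lo;
    p.hi = hi;

    return p;
}
```

Member access through unnamed fields (p.lo) is unaffected; only the
initializer form differs between the two variants.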

20 months ago  update Xen version to 4.17.3-pre
Jan Beulich [Mon, 21 Aug 2023 13:52:13 +0000 (15:52 +0200)]
update Xen version to 4.17.3-pre

21 months ago  Update Xen to version 4.17.2 (tag: RELEASE-4.17.2)
Andrew Cooper [Mon, 7 Aug 2023 11:11:56 +0000 (12:11 +0100)]
Update Xen to version 4.17.2

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

21 months ago  x86/spec-ctrl: Mitigate Gather Data Sampling
Andrew Cooper [Wed, 4 Jan 2023 16:32:44 +0000 (16:32 +0000)]
x86/spec-ctrl: Mitigate Gather Data Sampling

This is part of XSA-435 / CVE-2022-40982

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 56d690efd3ca3c68e1d222f259fb3d216206e8e5)

21 months ago  x86/spec-ctrl: Enumerations for Gather Data Sampling
Andrew Cooper [Wed, 4 Jan 2023 17:32:44 +0000 (17:32 +0000)]
x86/spec-ctrl: Enumerations for Gather Data Sampling

GDS_CTRL is introduced by the August 2023 microcode.  GDS_NO is for current
and future processors not susceptible to GDS.

This is part of XSA-435 / CVE-2022-40982

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 9f585f59d90c8d3a1b21369a852b7d7eee8a29b9)

21 months ago  x86/cpu-policy: Hide CLWB by default on SKX/CLX/CPX
Andrew Cooper [Mon, 27 Feb 2023 15:36:49 +0000 (15:36 +0000)]
x86/cpu-policy: Hide CLWB by default on SKX/CLX/CPX

The August 2023 microcode for GDS has an impact on the CLWB instruction.  See
code comments for full details.

This is part of XSA-435 / CVE-2022-40982

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 2dd06b4ea10891750af38e4a0e1efaeb0a9b3518)

21 months ago  x86/spec-ctrl: Mitigate Speculative Return Stack Overflow
Andrew Cooper [Thu, 15 Jun 2023 12:46:29 +0000 (13:46 +0100)]
x86/spec-ctrl: Mitigate Speculative Return Stack Overflow

On native, synthesise the SRSO bits by probing various hardware properties as
given by AMD.

Extend the IBPB-on-entry mitigations to Zen3/4 CPUs.  There is a microcode
prerequisite to make this an effective mitigation.

This is part of XSA-434 / CVE-2023-20569

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 220c06e6fefe2378f40e2a7391f5e265a2aa50f7)

21 months ago  x86/spec-ctrl: Enumerations for Speculative Return Stack Overflow
Andrew Cooper [Wed, 14 Jun 2023 08:13:28 +0000 (09:13 +0100)]
x86/spec-ctrl: Enumerations for Speculative Return Stack Overflow

AMD have specified new CPUID bits relating to SRSO.

 * SRSO_NO indicates that hardware is no longer vulnerable to SRSO.
 * IBPB_BRTYPE indicates that IBPB flushes branch type information too.
 * SBPB indicates support for a relaxed form of IBPB that does not flush
   branch type information.

Current CPUs (Zen4 and older) are not expected to enumerate these bits.
Native software is expected to synthesise them for guests using model and
microcode revision checks.

Two are just status bits, and SBPB is trivial to support for guests by
tweaking the reserved bit calculation in guest_wrmsr() and feature
dependencies.  Expose all by default to guests, so they start showing up when
Xen synthesises them.

While adding feature dependencies for IBPB, fix up an overlooked issue from
XSA-422.  It's inappropriate to advertise that IBPB flushes RET predictions if
IBPB is unavailable itself.

This is part of XSA-434 / CVE-2023-20569

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 2280b0ee2aed6e0fd4af3fa31bf99bc04d038bfe)

21 months ago  x86/spec-ctrl: Rework ibpb_calculations()
Andrew Cooper [Thu, 27 Jul 2023 19:03:28 +0000 (20:03 +0100)]
x86/spec-ctrl: Rework ibpb_calculations()

... in order to make the SRSO mitigations easier to integrate.

 * Check for AMD/Hygon CPUs directly, rather than assuming based on IBPB.
   In particular, Xen supports synthesising the IBPB bit to guests on Intel to
   allow IBPB while dissuading the use of (legacy) IBRS.
 * Collect def_ibpb_entry rather than opencoding the BTC_NO calculation for
   both opt_ibpb_entry_{pv,hvm}.

No functional change.

This is part of XSA-434 / CVE-2023-20569

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 292f68fb77196a35ac92b296792770d0f3190d75)

21 months ago  x86/cpu-policy: Advertise MSR_ARCH_CAPS to guests by default
Andrew Cooper [Wed, 17 May 2023 09:13:36 +0000 (10:13 +0100)]
x86/cpu-policy: Advertise MSR_ARCH_CAPS to guests by default

With xl/libxl now able to control the policy bits for MSR_ARCH_CAPS, it is
safe to advertise to guests by default.  In turn, we don't need the special
case to expose details to dom0.

This advertises MSR_ARCH_CAPS to guests on *all* Intel hardware, even if the
register content ends up being empty.

  - Advertising ARCH_CAPS and not RSBA signals "retpoline is safe here and
    everywhere you might migrate to".  This is important because it avoids the
    guest kernel needing to rely on model checks.

  - Alternatively, levelling for safety across the Broadwell/Skylake divide
    requires advertising ARCH_CAPS and RSBA, meaning "retpoline not safe on
    some hardware you might migrate to".

On Cascade Lake and later hardware, guests can now see RDCL_NO (not vulnerable
to Meltdown) amongst others.  This causes substantial performance
improvements, as guests are no longer applying software mitigations in cases
where they don't need to.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 4b2cdbfe766e5666e6754198946df2dc16f6a642)

21 months ago  libxl: allow building with old gcc again
Jan Beulich [Thu, 3 Aug 2023 15:35:39 +0000 (17:35 +0200)]
libxl: allow building with old gcc again

We can't use initializers of unnamed struct/union members just yet.

Fixes: d638fe233cb3 ("libxl: use the cpuid feature names from cpufeatureset.h")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 1aa5acbbec3f37bf38d78fa96d210053f8e8efd5)

21 months ago  libxl: avoid shadowing of index()
Jan Beulich [Thu, 3 Aug 2023 15:35:26 +0000 (17:35 +0200)]
libxl: avoid shadowing of index()

Because of -Wshadow the build otherwise fails with old enough glibc.

While there, also obey line length limits for msr_add().

Fixes: 6d21cedbaa34 ("libxl: add support for parsing MSR features")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 4f6afde88be3e8960eb311d16ac41d44ab71ed10)

21 months ago  libxl: add support for parsing MSR features
Roger Pau Monne [Tue, 25 Jul 2023 13:05:58 +0000 (15:05 +0200)]
libxl: add support for parsing MSR features

Introduce support for handling MSR features in
libxl_cpuid_parse_config().  The MSR policies are added to the
libxl_cpuid_policy like the CPUID one, which gets passed to
xc_cpuid_apply_policy().

This allows existing users of libxl to provide MSR related features as
key=value pairs to libxl_cpuid_parse_config() without requiring the
usage of a different API.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 6d21cedbaa34b3a3856f964189e911112c732b21)

21 months ago  libxl: use the cpuid feature names from cpufeatureset.h
Roger Pau Monne [Tue, 25 Jul 2023 13:05:57 +0000 (15:05 +0200)]
libxl: use the cpuid feature names from cpufeatureset.h

The current implementation in libxl_cpuid_parse_config() requires
keeping a list of cpuid feature bits that should be mostly in sync
with the contents of cpufeatureset.h.

Avoid such duplication by using the automatically generated list of
cpuid features in INIT_FEATURE_NAMES in order to map feature names to
featureset bits, and then translate from featureset bits into cpuid
leaf, subleaf, register tuple.

Note that the full contents of the previous cpuid translation table
can't be removed.  That's because some feature names allowed by libxl
are not described in the featuresets, or because naming has diverged
and the previous nomenclature is preserved for compatibility reasons.

Should result in no functional change observed by callers, albeit some
new cpuid features will be available as a result of the change.

While there constify cpuid_flags name field.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit d638fe233cb3a45105319df75df0edfed2fde5a5)

21 months agolibxl: split logic to parse user provided CPUID features
Roger Pau Monne [Tue, 25 Jul 2023 13:05:56 +0000 (15:05 +0200)]
libxl: split logic to parse user provided CPUID features

Move the CPUID value parsers out of libxl_cpuid_parse_config() into a
newly created cpuid_add() local helper.  This is in preparation for
also adding MSR feature parsing support.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit e2b1da9b8fda0ed7d3dca7bd15829cfea496973a)

21 months agolibxl: introduce MSR data in libxl_cpuid_policy
Roger Pau Monne [Wed, 26 Jul 2023 07:47:53 +0000 (09:47 +0200)]
libxl: introduce MSR data in libxl_cpuid_policy

Add a new array field to libxl_cpuid_policy in order to store the MSR
policies.

Adding the MSR data in the libxl_cpuid_policy_list type is done so
that existing users can seamlessly pass MSR features as part of the
CPUID data, without requiring the introduction of a separate
domain_build_info field, and a new set of handler functions.

Note that support for parsing the old JSON format is kept, as that's
required in order to restore domains or receive migrations from
previous tool versions.  Differentiation between the old and the new
formats is done based on whether the contents of the 'cpuid' field is
an array or a map JSON object.
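
A minimal sketch of that kind of format check, looking only at the first
significant character of the serialised value.  The enum and
classify_cpuid_json() are hypothetical names; the real code presumably
inspects an already-parsed JSON object rather than raw text.

```c
#include <ctype.h>
#include <stddef.h>

enum cpuid_json_format {
    FMT_INVALID,
    FMT_ARRAY,  /* value starts with '[' */
    FMT_MAP,    /* value starts with '{' */
};

/* Classify the serialised 'cpuid' value by its opening delimiter. */
enum cpuid_json_format classify_cpuid_json(const char *value)
{
    if (value == NULL)
        return FMT_INVALID;

    /* Skip leading whitespace, which JSON permits before any value. */
    while (isspace((unsigned char)*value))
        value++;

    switch (*value) {
    case '[':
        return FMT_ARRAY;
    case '{':
        return FMT_MAP;
    default:
        return FMT_INVALID;
    }
}
```

A caller would then pick the matching parser based on the returned value, and
reject anything that is neither an array nor a map.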

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 5b80cecb747b2176b9e85f6e7aa7be83416d77e1)

21 months agolibxl: change the type of libxl_cpuid_policy_list
Roger Pau Monne [Tue, 25 Jul 2023 13:05:54 +0000 (15:05 +0200)]
libxl: change the type of libxl_cpuid_policy_list

Currently libxl_cpuid_policy_list is an opaque type to the users of
libxl, and internally it's an array of xc_xend_cpuid objects.

Change the type to instead be a structure that contains one array for
CPUID policies, in preparation for it also holding another array for
MSR policies.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 4825d19603580949144ac2ac5cb22df75c9da954)

21 months agolibs/guest: introduce support for setting guest MSRs
Roger Pau Monne [Tue, 25 Jul 2023 13:05:53 +0000 (15:05 +0200)]
libs/guest: introduce support for setting guest MSRs

Like it's done with CPUID, introduce support for passing MSR values to
xc_cpuid_apply_policy().  The chosen format for expressing MSR policy
data matches the current one used for CPUID.  Note that existing
callers of xc_cpuid_apply_policy() can pass NULL as the value for the
newly introduced 'msr' parameter in order to preserve the same
functionality, and in fact that's done in libxl on this patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit ed742cf1b65c822759833027ca5cbb087c506a41)

21 months agox86/cpu-policy: Derive RSBA/RRSBA for guest policies
Andrew Cooper [Wed, 24 May 2023 14:41:21 +0000 (15:41 +0100)]
x86/cpu-policy: Derive RSBA/RRSBA for guest policies

The RSBA bit, "RSB Alternative", means that the RSB may use alternative
predictors when empty.  From a practical point of view, this means "Retpoline
not safe".

Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
statement that IBRS is implemented in hardware (as opposed to the form
retrofitted to existing CPUs in microcode).

The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
property that predictions are tagged with the mode in which they were learnt.
Therefore, it means "when eIBRS is active, the RSB may fall back to
alternative predictors but restricted to the current prediction mode".  As
such, it's a stronger statement than RSBA, but still means "Retpoline not safe".

CPUs are not expected to enumerate both RSBA and RRSBA.

Add feature dependencies for EIBRS and RRSBA.  While technically they're not
linked, absolutely nothing good can come of letting the guest see RRSBA
without EIBRS.  Nor a guest seeing EIBRS without IBRSB.  Furthermore, we use
this dependency to simplify the max derivation logic.

The max policies get RSBA and RRSBA unconditionally set (with the EIBRS
dependency maybe hiding RRSBA).  We can run any VM, even if it has been told
"somewhere you might run, Retpoline isn't safe".

The default policies are more complicated.  A guest shouldn't see both bits,
but it needs to see one if the current host suffers from any form of RSBA, and
which bit it needs to see depends on whether eIBRS is visible or not.
Therefore, the calculation must be performed after sanitise_featureset().
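
The rule for the default policies can be sketched as below.  This is a
simplified model of the description above, not the Xen implementation; the
struct and function names are hypothetical.

```c
#include <stdbool.h>

struct rsb_bits {
    bool rsba;
    bool rrsba;
};

/*
 * Pick the bit a guest should see in its default policy.
 *
 * host_rsba / host_rrsba: whether the host suffers from either form.
 * guest_eibrs: whether eIBRS is visible to the guest after feature
 *              sanitisation.
 */
struct rsb_bits derive_default_rsb_bits(bool host_rsba, bool host_rrsba,
                                        bool guest_eibrs)
{
    struct rsb_bits out = { false, false };

    if (host_rsba || host_rrsba) {
        /* Exactly one bit is exposed, chosen by eIBRS visibility. */
        if (guest_eibrs)
            out.rrsba = true;
        else
            out.rsba = true;
    }

    return out;
}
```

Because the choice depends on the guest's final view of eIBRS, it can only be
made once the featureset has been sanitised.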

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit e0586a4ff514590eec50185e2440b97f9a31cb7f)

21 months agox86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
Andrew Cooper [Thu, 25 May 2023 19:31:22 +0000 (20:31 +0100)]
x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

See the code comment for details.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 36525a964fb629d0bd26e5a1c42de467af7a42a7)

21 months agox86/spec-ctrl: Rename retpoline_safe() to retpoline_calculations()
Andrew Cooper [Fri, 26 May 2023 09:35:47 +0000 (10:35 +0100)]
x86/spec-ctrl: Rename retpoline_safe() to retpoline_calculations()

This is prep work, split out to simplify the diff on the following change.

 * Rename to retpoline_calculations(), and call unconditionally.  It is
   shortly going to synthesise missing enumerations required for guest safety.
 * For the model check switch statement, store the result in a variable and
   break rather than returning directly.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 724c0d94ff79b208312d30676392bfdd693403be)

21 months agox86/spec-ctrl: Use a taint for CET without MSR_SPEC_CTRL
Andrew Cooper [Mon, 5 Jun 2023 10:09:11 +0000 (11:09 +0100)]
x86/spec-ctrl: Use a taint for CET without MSR_SPEC_CTRL

Reword the comment for 'S' to include an incompatible set of features on the
same core.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 3f63f4510422c29fda7ba238b880cbb53eca34fe)

21 months agox86/spec-ctrl: Fix the rendering of FB_CLEAR
Andrew Cooper [Mon, 12 Jun 2023 19:24:00 +0000 (20:24 +0100)]
x86/spec-ctrl: Fix the rendering of FB_CLEAR

FB_CLEAR is a read-only status bit, not a read-write control.  Move it from
"Hardware features" into "Hardware hints".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 921afcbae843bb3f575a8f4a270b8e6cf471f4ca)

21 months agox86/cpu-policy: Rearrange guest_common_default_feature_adjustments()
Andrew Cooper [Fri, 10 Mar 2023 16:23:20 +0000 (16:23 +0000)]
x86/cpu-policy: Rearrange guest_common_default_feature_adjustments()

This is prep work, split out to simplify the diff on the following change.

 * Split the INTEL check out of the IvyBridge RDRAND check, as the former will
   be reused.
 * Use asm/intel-family.h to remove a raw 0x3a model number.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 064f572f96f1558faae0a74cad616ba95ec8ff34)

21 months agox86/spec-ctrl: Update hardware hints
Andrew Cooper [Tue, 30 May 2023 15:03:16 +0000 (16:03 +0100)]
x86/spec-ctrl: Update hardware hints

 * Rename IBRS_ALL to EIBRS.  EIBRS is the term that everyone knows, and this
   makes ARCH_CAPS_EIBRS match the X86_FEATURE_EIBRS form.
 * Print RRSBA too, which is also a hint about behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 94200e1bae07e725cc07238c11569c5cab7befb7)

21 months agox86/spec-ctrl: Remove opencoded MSR_ARCH_CAPS check
Andrew Cooper [Mon, 15 May 2023 18:15:48 +0000 (19:15 +0100)]
x86/spec-ctrl: Remove opencoded MSR_ARCH_CAPS check

MSR_ARCH_CAPS data is now included in featureset information.  Replace
opencoded checks with regular feature ones.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 511b9f286c3dadd041e0d90beeff7d47c9bf3b7a)

21 months agox86/tsx: Remove opencoded MSR_ARCH_CAPS check
Andrew Cooper [Mon, 15 May 2023 18:05:01 +0000 (19:05 +0100)]
x86/tsx: Remove opencoded MSR_ARCH_CAPS check

The current cpu_has_tsx_ctrl tristate is serving double purpose; to signal the
first pass through tsx_init(), and the availability of MSR_TSX_CTRL.

Drop the variable, replacing it with a once boolean, and altering
cpu_has_tsx_ctrl to come out of the feature information.
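
A rough sketch of the "once boolean" pattern, assuming a function that may be
entered several times but has work that must run only on the first pass.  All
names are hypothetical stand-ins and none of this is the real tsx_init().

```c
#include <stdbool.h>

/* Counts how often the first-pass-only work has run (for illustration). */
static unsigned int first_pass_count;

/* Accessor so callers can observe the counter. */
unsigned int first_pass_runs(void)
{
    return first_pass_count;
}

/* Stand-in for an init function that is entered more than once. */
void tsx_init_sketch(void)
{
    static bool once;

    if (!once) {
        once = true;
        /* First-pass-only work goes here. */
        first_pass_count++;
    }

    /*
     * Every-pass work goes here.  Feature availability would be taken
     * from the feature information rather than from the 'once' flag.
     */
}
```

Separating the two concerns means the flag only ever answers "has the first
pass happened?", while availability is a plain feature check.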

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 205a9f970378c31ae3e00b52d59103a2e881b9e0)

21 months agox86/vtx: Remove opencoded MSR_ARCH_CAPS check
Andrew Cooper [Mon, 15 May 2023 15:59:25 +0000 (16:59 +0100)]
x86/vtx: Remove opencoded MSR_ARCH_CAPS check

MSR_ARCH_CAPS data is now included in featureset information.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 8f6bc7f9b72eb7cf0c8c5ae5d80498a58ba0b7c3)

21 months agox86/boot: Expose MSR_ARCH_CAPS data in guest max policies
Andrew Cooper [Fri, 12 May 2023 14:53:35 +0000 (15:53 +0100)]
x86/boot: Expose MSR_ARCH_CAPS data in guest max policies

We already have common and default feature adjustment helpers.  Introduce one
for max featuresets too.

Offer MSR_ARCH_CAPS unconditionally in the max policy, and stop clobbering the
data inherited from the Host policy.  This will be necessary to level a VM
safely for migration.  Annotate the ARCH_CAPS CPUID bit as special.  Note:
ARCH_CAPS is still max-only for now, so will not be inherited by the default
policies.

With this done, the special case for dom0 can be shrunk to just resampling the
Host policy (as ARCH_CAPS isn't visible by default yet).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit bbb289f3d5bdd3358af748d7c567343532ac45b5)

21 months agox86/boot: Record MSR_ARCH_CAPS for the Raw and Host CPU policy
Andrew Cooper [Fri, 12 May 2023 14:37:02 +0000 (15:37 +0100)]
x86/boot: Record MSR_ARCH_CAPS for the Raw and Host CPU policy

Extend x86_cpu_policy_fill_native() with a read of ARCH_CAPS based on the
CPUID information just read, removing the special handling in
calculate_raw_cpu_policy().

Right now, the only use of x86_cpu_policy_fill_native() outside of Xen is the
unit tests.  Getting MSR data in this context is left to whoever first
encounters a genuine need to have it.

Extend generic_identify() to read ARCH_CAPS into x86_capability[], which is
fed into the Host Policy.  This in turn means there's no need to special case
arch_caps in calculate_host_policy().

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 70553000d6b44dd7c271a35932b0b3e1f22c5532)

21 months agox86/cpu-policy: MSR_ARCH_CAPS feature names
Andrew Cooper [Fri, 12 May 2023 17:50:59 +0000 (18:50 +0100)]
x86/cpu-policy: MSR_ARCH_CAPS feature names

Seed the default visibility from the dom0 special case, which for the most
part just exposes the *_NO bits.  EIBRS is the one non-*_NO bit, which is
"just" a status bit to the guest indicating a change in implementation of IBRS
which is already fully supported.

Insert a block dependency from the ARCH_CAPS CPUID bit to the entire content
of the MSR.  This is because MSRs have no structure information similar to
CPUID, and the dependency is used by
x86_cpu_policy_clear_out_of_range_leaves() in order to bulk-clear
inaccessible words.

The overall CPUID bit is still max-only, so all of MSR_ARCH_CAPS is hidden in
the default policies.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ce8c930851a5ca21c4e70f83be7e8b290ce1b519)

21 months agox86/cpu-policy: Infrastructure for MSR_ARCH_CAPS
Andrew Cooper [Fri, 12 May 2023 16:55:21 +0000 (17:55 +0100)]
x86/cpu-policy: Infrastructure for MSR_ARCH_CAPS

Bits through 24 are already defined, meaning that we're not far off needing
the second word.  Put both in right away.

As both halves are present now, the arch_caps field is full width.  Adjust the
unit test, which notices.

The bool bitfield names in the arch_caps union are unused, and somewhat out of
date.  They'll shortly be automatically generated.

Add CPUID and MSR prefixes to the ./xen-cpuid verbose output, now that there
are a mix of the two.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit d9fe459ffad8a6eac2f695adb2331aff83c345d1)

21 months agox86/boot: Adjust MSR_ARCH_CAPS handling for the Host policy
Andrew Cooper [Mon, 15 May 2023 13:14:53 +0000 (14:14 +0100)]
x86/boot: Adjust MSR_ARCH_CAPS handling for the Host policy

We are about to move MSR_ARCH_CAPS into featureset, but the order of
operations (copy raw policy, then copy x86_capability[] in) will end up
clobbering the ARCH_CAPS value.

Some toolstacks use this information to handle TSX compatibility across the
CPUs and microcode versions where support was removed.

To avoid this transient breakage, read from raw_cpu_policy rather than
modifying it in place.  This logic will be removed entirely in due course.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 43912f8dbb1888ffd7f00adb10724c70e71927c4)

21 months agox86/boot: Rework dom0 feature configuration
Andrew Cooper [Fri, 12 May 2023 12:52:39 +0000 (13:52 +0100)]
x86/boot: Rework dom0 feature configuration

Right now, dom0's feature configuration is split between the common
path and a dom0-specific one.  This mostly is by accident, and causes some
very subtle bugs.

First, start by clearly defining init_dom0_cpuid_policy() to be the domain
that Xen builds automatically.  The late hwdom case is still constructed in a
mostly normal way, with the control domain having full discretion over the CPU
policy.

Identifying this highlights a latent bug - the two halves of the MSR_ARCH_CAPS
bodge are asymmetric with respect to the hardware domain.  This means that
shim, or a control-only dom0 sees the MSR_ARCH_CAPS CPUID bit but none of the
MSR content.  This in turn declares the hardware to be retpoline-safe by
failing to advertise the {R,}RSBA bits appropriately.  Restrict this logic to
the hardware domain, although the special case will cease to exist shortly.

For the CPUID Faulting adjustment, the comment in ctxt_switch_levelling()
isn't actually relevant.  Provide a better explanation.

Move the recalculate_cpuid_policy() call outside of the dom0-cpuid= case.
This is no change for now, but will become necessary shortly.

Finally, place the second half of the MSR_ARCH_CAPS bodge after the
recalculate_cpuid_policy() call.  This is necessary to avoid transiently
breaking the hardware domain's view while the handling is cleaned up.  This
special case will cease to exist shortly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ef1987fcb0fdfaa7ee148024037cb5fa335a7b2d)

21 months agox86/cpuid: Calculate FEATURESET_NR_ENTRIES more helpfully
Andrew Cooper [Wed, 10 May 2023 18:58:43 +0000 (19:58 +0100)]
x86/cpuid: Calculate FEATURESET_NR_ENTRIES more helpfully

When adding new featureset words, it is convenient to split the work into
several patches.  However, GCC 12 spotted that the way we prefer to split the
work results in a real (transient) breakage whereby the policy <-> featureset
helpers perform out-of-bounds accesses on the featureset array.

Fix this by having gen-cpuid.py calculate FEATURESET_NR_ENTRIES from the
comments describing the word blocks, rather than from the XEN_CPUFEATURE()
with the greatest value.
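
To illustrate the failure mode, here is a small model of the two ways of
sizing the array.  It assumes 32-bit featureset words; the function names are
hypothetical and this is not what gen-cpuid.py emits.

```c
#define FEATURESET_WORD_BITS 32u

/* Old approach: size the array from the highest defined feature bit. */
unsigned int nr_entries_from_max_feature(unsigned int max_feature_bit)
{
    return max_feature_bit / FEATURESET_WORD_BITS + 1;
}

/*
 * Returns 1 if helpers touching words [0, nr_declared_words) would run
 * past an array sized by the old approach, 0 otherwise.
 */
int helpers_would_overrun(unsigned int max_feature_bit,
                          unsigned int nr_declared_words)
{
    return nr_declared_words > nr_entries_from_max_feature(max_feature_bit);
}
```

If a patch declares word 16 and teaches the helpers about it before any
feature in that word exists, the highest feature still sits in word 15, the
array has 16 entries, and index 16 is out of bounds.  Counting the declared
word blocks instead gives 17 entries from the first patch onwards.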

For simplicity, require that the word blocks appear in order.  This can be
revisited if we find a good reason to have blocks out of order.

No functional change.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 56e2c8e5860090a35d5f0cafe168223a2a7c0e62)

21 months agox86: Remove temporary {cpuid,msr}_policy defines
Andrew Cooper [Wed, 29 Mar 2023 12:07:03 +0000 (13:07 +0100)]
x86: Remove temporary {cpuid,msr}_policy defines

With all code areas updated, drop the temporary defines and adjust all
remaining users.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 994c1553a158ada9db5ab64c9178a0d23c0a42ce)

21 months agolibx86: Update library API for cpu_policy
Andrew Cooper [Mon, 3 Apr 2023 13:18:43 +0000 (14:18 +0100)]
libx86: Update library API for cpu_policy

Adjust the API and comments appropriately.

x86_cpu_policy_fill_native() will eventually contain MSR reads, but leave a
TODO in the short term.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 1b67fccf3b02825f6a036bad06cd17963d0972d2)

tools/libs/guest: Fix build following libx86 changes

I appear to have lost this hunk somewhere...

Fixes: 1b67fccf3b02 ("libx86: Update library API for cpu_policy")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 48d76e6da92f9ef76c8468e299349a2f698362fa)

21 months agotools/fuzz: Rework afl-policy-fuzzer
Andrew Cooper [Mon, 3 Apr 2023 16:14:14 +0000 (17:14 +0100)]
tools/fuzz: Rework afl-policy-fuzzer

With cpuid_policy and msr_policy merged to form cpu_policy, merge the
respective fuzzing logic.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a16dcd48c2db3f6820a15ea482551d289bd9cdec)

21 months agox86/emul: Switch x86_emulate_ctxt to cpu_policy
Andrew Cooper [Mon, 3 Apr 2023 19:03:57 +0000 (20:03 +0100)]
x86/emul: Switch x86_emulate_ctxt to cpu_policy

As with struct domain, retain cpuid as a valid alias for local code clarity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 441b1b2a50ea3656954d75e06d42c96d619ea0fc)

21 months agox86/boot: Merge CPUID policy initialisation logic into cpu-policy.c
Andrew Cooper [Mon, 3 Apr 2023 18:06:02 +0000 (19:06 +0100)]
x86/boot: Merge CPUID policy initialisation logic into cpu-policy.c

Switch to the newer cpu_policy nomenclature.  Do some easy cleanup of
includes.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 8eb56eb959a50bf9afd0fd590ec394e9145970a4)

21 months agox86/boot: Move MSR policy initialisation logic into cpu-policy.c
Andrew Cooper [Mon, 3 Apr 2023 16:48:43 +0000 (17:48 +0100)]
x86/boot: Move MSR policy initialisation logic into cpu-policy.c

Switch to the newer cpu_policy nomenclature.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 4f20f596ce9bd95bde077a1ae0d7e07d20a5f6be)

21 months agox86: Out-of-inline the policy<->featureset convertors
Andrew Cooper [Thu, 30 Mar 2023 17:21:01 +0000 (18:21 +0100)]
x86: Out-of-inline the policy<->featureset convertors

These are already getting over-large for being inline functions, and are only
going to grow further over time.  Out of line them, yielding the following net
delta from bloat-o-meter:

  add/remove: 2/0 grow/shrink: 0/4 up/down: 276/-1877 (-1601)

Switch to the newer cpu_policy terminology while doing so.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 1027df4c00823f8b448e3a6861cc7b6ce61ba4e4)

21 months agox86: Drop struct old_cpu_policy
Andrew Cooper [Wed, 29 Mar 2023 11:01:33 +0000 (12:01 +0100)]
x86: Drop struct old_cpu_policy

With all the complicated callers of x86_cpu_policies_are_compatible() updated
to use a single cpu_policy object, we can drop the final user of struct
old_cpu_policy.

Update x86_cpu_policies_are_compatible() to take (new) cpu_policy pointers,
reducing the amount of internal pointer chasing, and update all callers to
pass their cpu_policy objects directly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 66c5c99656314451ff9520f91cff5bb39fee9fed)

21 months agox86: Merge xc_cpu_policy's cpuid and msr objects
Andrew Cooper [Wed, 29 Mar 2023 11:37:33 +0000 (12:37 +0100)]
x86: Merge xc_cpu_policy's cpuid and msr objects

Right now, they're the same underlying type, containing disjoint information.

Use a single object instead.  Also take the opportunity to rename 'entries' to
'msrs' which is more descriptive, and more in line with nr_msrs being the
count of MSR entries in the API.

test-tsx uses xg_private.h to access the internals of xc_cpu_policy, so needs
updating at the same time.  Take the opportunity to improve the code clarity
by passing a cpu_policy rather than an xc_cpu_policy into some functions.

No practical change.  This undoes the transient doubling of storage space from
earlier patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c9985233ca663fea20fc8807cf509d2e3fef0dca)

21 months agox86: Merge a domain's {cpuid,msr} policy objects
Andrew Cooper [Wed, 29 Mar 2023 10:32:25 +0000 (11:32 +0100)]
x86: Merge a domain's {cpuid,msr} policy objects

Right now, they're the same underlying type, containing disjoint information.

Drop the d->arch.msr pointer, and union d->arch.cpuid to give it a second name
of cpu_policy in the interim.
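
Giving one pointer a second name via a union might look like the sketch
below.  The struct layouts are hypothetical and heavily reduced; only the
aliasing technique is the point.

```c
/* Reduced stand-in for the merged policy object. */
struct cpu_policy {
    unsigned int basic_max_leaf;
};

/* Reduced stand-in for the per-domain arch state. */
struct arch_domain_sketch {
    /*
     * Anonymous union (C11): both members name the same storage, so
     * code using either spelling sees the same pointer.
     */
    union {
        struct cpu_policy *cpuid;
        struct cpu_policy *cpu_policy;
    };
};
```

Existing code can keep using the cpuid name while new code switches to
cpu_policy, without a flag-day rename.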

Merge init_domain_{cpuid,msr}_policy() into a single init_domain_cpu_policy(),
moving the implementation into cpu-policy.c

No practical change.  This undoes the transient doubling of storage space from
earlier patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit bd13dae34809e61e37ba1cd5de893c5c10c46256)

21 months agox86: Merge the system {cpuid,msr} policy objects
Andrew Cooper [Wed, 29 Mar 2023 06:39:44 +0000 (07:39 +0100)]
x86: Merge the system {cpuid,msr} policy objects

Right now, they're the same underlying type, containing disjoint information.

Introduce a new cpu-policy.{h,c} to be the new location for all policy
handling logic.  Place the combined objects in __ro_after_init, which is new
since the original logic was written.

As we're trying to phase out the use of struct old_cpu_policy entirely, rework
update_domain_cpu_policy() to not pointer-chase through system_policies[].

This in turn allows system_policies[] in sysctl.c to become static and reduced
in scope to XEN_SYSCTL_get_cpu_policy.

No practical change.  This undoes the transient doubling of storage space from
earlier patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 6bc33366795d14a21a3244d0f3b63f7dccea87ef)

21 months agox86: Merge struct msr_policy into struct cpu_policy
Andrew Cooper [Tue, 28 Mar 2023 20:24:20 +0000 (21:24 +0100)]
x86: Merge struct msr_policy into struct cpu_policy

As with the cpuid side, use a temporary define to make struct msr_policy still
work.

Note, this means that domains now have two separate struct cpu_policy
allocations with disjoint information, and system policies are in a similar
position, as well as xc_cpu_policy objects in libxenguest.  All of these
duplications will be addressed in the following patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 03812da3754d550dd8cbee7289469069ea6f0073)

21 months agox86: Rename struct cpuid_policy to struct cpu_policy
Andrew Cooper [Tue, 28 Mar 2023 17:55:19 +0000 (18:55 +0100)]
x86: Rename struct cpuid_policy to struct cpu_policy

Also merge lib/x86/cpuid.h entirely into lib/x86/cpu-policy.h

Use a temporary define to make struct cpuid_policy still work.
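
A sketch of how such a temporary define keeps the old spelling compiling,
assuming a heavily reduced struct.  The field and get_max_leaf() are
hypothetical.

```c
/* The renamed structure (reduced to one field for illustration). */
struct cpu_policy {
    unsigned int basic_max_leaf;
};

/* Temporary compatibility define: the old tag now expands to the new one. */
#define cpuid_policy cpu_policy

/* Not-yet-converted code still spells the type the old way. */
unsigned int get_max_leaf(const struct cpuid_policy *p)
{
    return p->basic_max_leaf;
}
```

A forward declaration of the old tag in a file that never sees the define
would declare a distinct, incomplete type, which is why such spots need
renaming by hand.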

There's one forward declaration of struct cpuid_policy in
tools/tests/x86_emulator/x86-emulate.h that isn't covered by the define, and
it's easier to rename that now than to rearrange the includes.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 743e530380a007774017df9dc2d8cb0659040ee3)

21 months agox86: Rename {domctl,sysctl}.cpu_policy.{cpuid,msr}_policy fields
Andrew Cooper [Tue, 28 Mar 2023 19:48:29 +0000 (20:48 +0100)]
x86: Rename {domctl,sysctl}.cpu_policy.{cpuid,msr}_policy fields

These weren't great names to begin with, and using {leaves,msrs} matches up
better with the existing nr_{leaves,msr} parameters anyway.

Furthermore, by renaming these fields we can get away with using some #define
trickery to avoid the struct {cpuid,msr}_policy merge needing to happen in a
single changeset.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 21e3ef57e0406b6b9a783f721f29df8f91a00f99)

xen: Correct comments after renaming xen_{dom,sys}ctl_cpu_policy fields

Fixes: 21e3ef57e040 ("x86: Rename {domctl,sysctl}.cpu_policy.{cpuid,msr}_policy fields")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 6e06d229d538ea51b92dc189546c522f5e903511)

21 months agox86: Rename struct cpu_policy to struct old_cpuid_policy
Andrew Cooper [Tue, 28 Mar 2023 19:31:33 +0000 (20:31 +0100)]
x86: Rename struct cpu_policy to struct old_cpuid_policy

We want to merge struct cpuid_policy and struct msr_policy together, and the
result wants to be called struct cpu_policy.

The current struct cpu_policy, being a pair of pointers, isn't terribly
useful.  Rename the type to struct old_cpu_policy, but it will disappear
entirely once the merge is complete.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c2ec94c370f211d73f336ccfbdb32499f1b05f82)

21 months agox86/sysctl: Retrofit XEN_SYSCTL_cpu_featureset_{pv,hvm}_max
Andrew Cooper [Fri, 10 Mar 2023 19:37:56 +0000 (19:37 +0000)]
x86/sysctl: Retrofit XEN_SYSCTL_cpu_featureset_{pv,hvm}_max

Featuresets are supposed to be disappearing when the CPU policy infrastructure
is complete, but that has taken longer than expected, and isn't going to be
complete imminently either.

In the meantime, Xen does have proper default/max featuresets, and xen-cpuid
can even get them via the XEN_SYSCTL_cpu_policy_* interface, but only knows
how to render them nicely via the featureset interface.

Differences between default and max are a frequent source of errors,
frequently too in secret leading up to an embargo, so extend the featureset
sysctl to allow xen-cpuid to render them all nicely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
(cherry picked from commit 433d012c6c2737ad5a9aaa994355a4140d601852)

21 months agotools/xen-cpuid: Rework the handling of dynamic featuresets
Andrew Cooper [Fri, 10 Mar 2023 19:04:22 +0000 (19:04 +0000)]
tools/xen-cpuid: Rework the handling of dynamic featuresets

struct fsinfo is the vestigial remnant of an older internal design which
didn't survive very long.

Simplify things by inlining get_featureset() and having a single memory
allocation that gets reused.  This in turn changes featuresets[] to be a
simple list of names, so rename it to fs_names[].

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ec3474e1dd42e6f410601f50b6e74fb7c442cfb9)

21 months agolibs/vchan: Fix -Wsingle-bit-bitfield-constant-conversion
Andrew Cooper [Tue, 8 Aug 2023 13:53:42 +0000 (14:53 +0100)]
libs/vchan: Fix -Wsingle-bit-bitfield-constant-conversion

Gitlab reports:

  init.c:348:18: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
          ctrl->is_server = 1;
                          ^ ~
  1 error generated.
  make[6]: *** [/builds/xen-project/people/andyhhp/xen/tools/libs/vchan/../../../tools/Rules.mk:188: init.o] Error 1

In Xen 4.18, this was fixed with c/s 99ab02f63ea8 ("tools: convert bitfields
to unsigned type") but this is an ABI change which can't be backported.

Switch 1 for -1 to provide a minimally invasive way to fix the build.

No functional change.
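
The underlying behaviour can be reproduced with a small sketch.  The struct
and function are hypothetical; the result of storing 1 is
implementation-defined, and the comments assume GCC or Clang.

```c
/* A one-bit signed bitfield can only represent 0 and -1. */
struct ctrl_sketch {
    signed int is_server:1;
};

/* Store a value in the bitfield and read it back. */
int store_and_read(int v)
{
    struct ctrl_sketch c;

    /* With GCC and Clang, storing 1 wraps to -1. */
    c.is_server = v;

    return c.is_server;
}
```

Since the field reads back as -1 either way on these compilers, writing -1
explicitly only silences the diagnostic, and truth tests on the field are
unaffected.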

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

21 months agosubdom: Fix -Werror=address failure in tmp_emulator
Andrew Cooper [Thu, 3 Aug 2023 19:52:08 +0000 (20:52 +0100)]
subdom: Fix -Werror=address failure in tmp_emulator

The opensuse-tumbleweed build jobs currently fail with:

  /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.c: In function 'rsa_private':
  /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.c:56:7: error: the comparison will always evaluate as 'true' for the address of 'p' will never be NULL [-Werror=address]
     56 |   if (!key->p || !key->q || !key->u) {
        |       ^
  In file included from /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.c:17:
  /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.h:28:12: note: 'p' declared here
     28 |   tpm_bn_t p;
        |            ^

This is because all tpm_bn_t's are 1-element arrays (of either a GMP or
OpenSSL BIGNUM flavour).
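
A sketch of why such a member can never compare equal to NULL, using a
hypothetical stand-in for the 1-element array type (the same pattern GMP uses
for mpz_t).

```c
#include <stddef.h>

/* Stand-in bignum: a typedef for a 1-element array of a struct. */
typedef struct {
    int limbs;
} bn_sketch_t[1];

struct rsa_key_sketch {
    bn_sketch_t p;
};

/* Returns 1 if the address of member 'p' is NULL; it never is. */
int member_address_is_null(const struct rsa_key_sketch *key)
{
    /* The array decays to a pointer to its first element. */
    const void *addr = key->p;

    return addr == NULL;
}
```

Because p is storage embedded in the key rather than a pointer, a check like
!key->p tests an address that is always valid, which is what the compiler
objects to.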

Adjust it to compile.  No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
(cherry picked from commit 46c5ef609b09cf51d7535aebbc05816eafca4c8d)

21 months agotools: drop bogus and obsolete ptyfuncs.m4
Olaf Hering [Fri, 12 May 2023 12:26:14 +0000 (12:26 +0000)]
tools: drop bogus and obsolete ptyfuncs.m4

According to openpty(3) it is required to include <pty.h> to get the
prototypes for openpty() and login_tty(). But this is not what the
function AX_CHECK_PTYFUNCS actually does. It makes no attempt to include
the required header.

The two source files which call openpty() and login_tty() already contain
the conditionals to include the required header.

Remove the bogus m4 file to fix build with clang, which complains about
calls to undeclared functions.

Remove usage of INCLUDE_LIBUTIL_H in libxl_bootloader.c, it is already
covered by inclusion of libxl_osdep.h.

Remove usage of PTYFUNCS_LIBS in libxl/Makefile, it is already covered
by UTIL_LIBS from config/StdGNU.mk.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 42abf5b9c53eb1b1a902002fcda68708234152c3)

21 months ago arm: Avoid using solaris syntax for .section directive
Khem Raj [Thu, 3 Aug 2023 14:31:36 +0000 (16:31 +0200)]
arm: Avoid using solaris syntax for .section directive

The assembler from binutils 2.41 rejects ([1], [2]) the following
syntax

.section "name", #alloc

for any target other than ELF SPARC. This means we can't use
it in the Arm code.

So switch to the GNU syntax

.section name [, "flags"[, @type]]

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=11601
[2] https://sourceware.org/binutils/docs-2.41/as.html#Section

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
[jgrall: Reword commit message]
Acked-by: Julien Grall <jgrall@amazon.com>
master commit: dfc490a3740bb7d6889939934afadcb58891fbce
master date: 2023-08-02 22:29:52 +0100

21 months ago amd: disable C6 after 1000 days on Zen2
Roger Pau Monné [Thu, 3 Aug 2023 14:30:49 +0000 (16:30 +0200)]
amd: disable C6 after 1000 days on Zen2

As specified on Errata 1474:

"A core will fail to exit CC6 after about 1044 days after the last
system reset. The time of failure may vary depending on the spread
spectrum and REFCLK frequency."

Detect when running on AMD Zen2 and set up a timer to prevent entering
C6 after 1000 days of uptime.  Take into account the TSC value at boot
in order to account for any time elapsed before Xen has been booted.
Worst case we end up disabling C6 before strictly necessary, but that
would still be safe, and it's better than not taking the TSC value
into account and hanging.

Disable C6 by updating the MSR listed in the revision guide, this
avoids applying workarounds in the CPU idle drivers, as the processor
won't be allowed to enter C6 by the hardware itself.

Print a message once C6 is disabled in order to let the user know.
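
The timer arithmetic can be sketched as below. This is an illustration
only: the function name and both parameters are hypothetical, and it
assumes the TSC has counted at a constant rate since the last reset.

```c
#include <stdint.h>

#define DEMO_SECONDS_PER_DAY 86400ULL
#define DEMO_LIMIT_DAYS      1000ULL

/*
 * Seconds left until 1000 days have passed since the last reset.
 * boot_tsc is the TSC value sampled at boot and tsc_hz its frequency;
 * both are assumptions of this sketch.  Returns 0 when the limit has
 * already been reached, or when the frequency is unknown, so that the
 * caller errs on the side of disabling C6 early.
 */
uint64_t demo_seconds_until_c6_disable(uint64_t boot_tsc, uint64_t tsc_hz)
{
    uint64_t limit = DEMO_LIMIT_DAYS * DEMO_SECONDS_PER_DAY;
    uint64_t elapsed;

    if (tsc_hz == 0)
        return 0;

    /* Time that passed before the hypervisor started running. */
    elapsed = boot_tsc / tsc_hz;

    return elapsed >= limit ? 0 : limit - elapsed;
}
```

Rounding the elapsed time down only shortens the margin by under a
second, which the gap between 1000 and roughly 1044 days absorbs.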

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f7065b24f4fb8813a896b883e6ffd03d67f8a8f2
master date: 2023-07-31 15:05:48 +0200

21 months ago tools/xenstore: fix XSA-417 patch
Juergen Gross [Thu, 3 Aug 2023 14:30:27 +0000 (16:30 +0200)]
tools/xenstore: fix XSA-417 patch

The fix for XSA-417 had a bug: domain_alloc_permrefs() will not return
a negative value in case of an error, but a plain errno value.

Note this is not considered to be a security issue, as the only case
where domain_alloc_permrefs() will return an error is a failed memory
allocation. As a guest should not be able to drive Xenstore out of
memory, this is NOT a problem a guest can trigger at will.
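
The class of bug can be sketched as follows. The functions are
hypothetical and only mimic the return convention described above (0 on
success, a plain positive errno value on failure); the real Xenstore code
may be shaped differently.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical allocator: 0 on success, plain (positive) errno on failure. */
int demo_alloc_permrefs(bool simulate_oom)
{
    return simulate_oom ? ENOMEM : 0;
}

/* Buggy check: errno values are positive, so this never reports failure. */
bool demo_failed_buggy(int ret)
{
    return ret < 0;
}

/* Check matching the convention: any non-zero value is an error. */
bool demo_failed_fixed(int ret)
{
    return ret != 0;
}
```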

Fixes: ab128218225d ("tools/xenstore: fix checking node permissions")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
master commit: 0c53c638e16278078371ce028c74693841d7738a
master date: 2023-07-21 08:32:43 +0200

21 months ago x86: fix early boot output
Jan Beulich [Thu, 3 Aug 2023 14:29:27 +0000 (16:29 +0200)]
x86: fix early boot output

Loading the VGA base address involves sym_esi(), i.e. %esi still needs
to hold the relocation base address. Therefore the address of the
message to output cannot be "passed" in %esi. Put the message offset in
%ecx instead, adding it into %esi _after_ its last use as base address.

Fixes: b28044226e1c ("x86: make Xen early boot code relocatable")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b1c16800e52743d9afd9af62c810f03af16dd942
master date: 2023-07-19 10:22:56 +0200

21 months ago ocaml/libs/xc: Fix NULL dereference with physinfo_arch_caps()
Edwin Török [Thu, 3 Aug 2023 14:29:05 +0000 (16:29 +0200)]
ocaml/libs/xc: Fix NULL dereference with physinfo_arch_caps()

`Tag_cons` is `0` and is meant to be used as the tag argument for
`caml_alloc`/`caml_alloc_small` when constructing a non-empty list.

The empty list is `Val_emptylist` instead, which is really just `Val_int(0)`.

Assigning `0` to a list value like this is equivalent to assigning the naked
pointer `NULL` to the field.  Naked pointers are not valid in OCaml 5; however,
even in OCaml <5.x any attempt to iterate on the list will lead to a segfault.

The list currently only has an opaque type, so no code would have reason to
iterate on it currently, but we shouldn't construct invalid OCaml values that
might lead to a crash when exploring the type.

`Val_emptylist` is available since OCaml 3.01 as a constant.
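
The difference between the two zeroes can be modelled in plain C. The
macros below are local stand-ins that follow the OCaml runtime's tagging
scheme (immediate integers have the low bit set, pointers to blocks do
not); they are not the definitions from `caml/mlvalues.h`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef intptr_t demo_value;

/* Immediate integers are stored shifted left by one with the low bit set. */
#define DEMO_VAL_INT(x)     ((demo_value)(((uintptr_t)(x) << 1) + 1))
/* A value with a clear low bit is treated as a pointer to a heap block. */
#define DEMO_IS_BLOCK(v)    (((v) & 1) == 0)
/* The empty list is the immediate integer 0, i.e. the machine word 1. */
#define DEMO_VAL_EMPTYLIST  DEMO_VAL_INT(0)
/* The cons tag is a block tag, not a value: the plain number 0. */
#define DEMO_TAG_CONS       0

/* True when the runtime would try to follow v as a pointer. */
bool demo_looks_like_pointer(demo_value v)
{
    return DEMO_IS_BLOCK(v);
}
```

Storing `DEMO_TAG_CONS` in a list field therefore yields a word that
looks like a NULL block pointer, while `DEMO_VAL_EMPTYLIST` is a proper
immediate value.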

Fixes: e5ac68a0110c ("x86/hvm: Revert per-domain APIC acceleration support")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
master commit: 99e45548934923f0d2c1d52ae1239ffe4ef17a06
master date: 2023-07-13 11:06:07 +0100

21 months ago xen/arm: Add Cortex-A77 erratum 1508412 handling
Luca Fancellu [Mon, 17 Jul 2023 12:25:46 +0000 (13:25 +0100)]
xen/arm: Add Cortex-A77 erratum 1508412 handling

Cortex-A77 cores (r0p0, r1p0) could deadlock on a sequence of a
store-exclusive or read of PAR_EL1 and a load with device or non-cacheable
memory attributes.
A workaround is available, but it depends on a firmware counterpart.

The proposed workaround from the errata document is to modify the software
running at EL1 and above to include a DMB SY before and after accessing
PAR_EL1.

In conjunction with the above, the firmware needs to use a specific write
sequence to several IMPLEMENTATION DEFINED registers to have the hardware
insert a DMB SY after all load-exclusive and store-exclusive instructions.

Apply the workaround to Xen where PAR_EL1 is read, implementing a helper
function to do that.
Since Xen can be interrupted by IRQs at any moment, add a barrier on
entry/exit when we are running on the affected cores.

A guest without the workaround can deadlock the system, so warn the users
of Xen with the above type of cores to use only trusted guests, by
printing a message on Xen startup.

This is XSA-436 / CVE-2023-34320.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
[stefano: add XSA-436 to commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

21 months ago x86/amd: Fix DE_CFG truncation in amd_check_zenbleed()
Andrew Cooper [Fri, 28 Jul 2023 17:42:12 +0000 (18:42 +0100)]
x86/amd: Fix DE_CFG truncation in amd_check_zenbleed()

This line:

val &= ~chickenbit;

ends up truncating val to 32 bits, and turning off various errata workarounds
in Zen2 systems.
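
A sketch of the truncation, assuming `chickenbit` had a 32-bit unsigned
type while `val` is 64 bits wide (the function names and the bit position
used in testing are illustrative, and `unsigned int` is assumed to be 32
bits):

```c
#include <stdint.h>

/*
 * Narrow mask: ~chickenbit is computed in 32 bits and then zero-extended
 * to 64 bits, so the AND also clears bits 32..63 of val.
 */
uint64_t demo_clear_bit_truncating(uint64_t val, unsigned int chickenbit)
{
    val &= ~chickenbit;
    return val;
}

/* Full-width mask: the complement keeps bits 32..63 set, as intended. */
uint64_t demo_clear_bit_full_width(uint64_t val, uint64_t chickenbit)
{
    val &= ~chickenbit;
    return val;
}
```

The two bodies are textually identical; only the operand type decides
whether the upper half of the register value survives.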

Fixes: f91c5ea97067 ("x86/amd: Mitigations for Zenbleed")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c0dd53b8cbd1e47e9c89873a9265a7170bdc6b4c)

21 months ago automation: qemu-smoke-arm64.sh: Increase RAM size
Michal Orzel [Tue, 27 Sep 2022 09:47:20 +0000 (11:47 +0200)]
automation: qemu-smoke-arm64.sh: Increase RAM size

In the follow-up patch we will add new jobs using debug Xen builds.
Because the debug builds take more space and we might end up in
a situation when there is not enough free space (especially during
a static memory test that reserves some region in the middle), increase
RAM size for QEMU from 1GB to 2GB.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
(cherry picked from commit a0030a83e82a1fb03d6e3b7692678812d5971608)

21 months ago x86/amd: Mitigations for Zenbleed
Andrew Cooper [Mon, 22 May 2023 22:03:00 +0000 (23:03 +0100)]
x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit f91c5ea970675637721bb7f18adaa189837eb783)

21 months ago update qemuu tag
Jan Beulich [Fri, 21 Jul 2023 06:23:52 +0000 (08:23 +0200)]
update qemuu tag

21 months ago tools: Remove the use of K&R functions
Andrew Cooper [Fri, 21 Jul 2023 06:23:19 +0000 (08:23 +0200)]
tools: Remove the use of K&R functions

Clang-15 (as seen in the FreeBSD 14 tests) complains:

  xg_main.c:1248 error: a function declaration without a
  prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  xg_init()
         ^
          void

The error message is a bit confusing but appears to be new as part of
-Wdeprecated-non-prototype which is part of supporting C2x which formally
removes K&R syntax.

Either way, fix the identified function.
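
The shape of the fix, sketched with a hypothetical function name:

```c
/*
 * Old style (flagged by -Wstrict-prototypes):
 *
 *     int demo_init();
 *
 * Before C23, empty parentheses declare a function whose parameters are
 * unspecified, so calls to it are not checked against any prototype.
 */

/* Prototype form: explicitly takes no arguments. */
int demo_init(void);

int demo_init(void)
{
    return 0;
}
```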

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: e2312e41f05c0f2e3b714710bd2551a3cd74cedd
master date: 2023-02-17 11:01:54 +0000

21 months ago xen/x86: Remove the use of K&R functions
Andrew Cooper [Mon, 17 Jul 2023 07:32:07 +0000 (09:32 +0200)]
xen/x86: Remove the use of K&R functions

Clang-15 (as seen in the FreeBSD 14 tests) complains:

  arch/x86/time.c:1364:20: error: a function declaration without a
  prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  s_time_t get_s_time()
                     ^
                      void

The error message is a bit confusing but appears to be new as part of
-Wdeprecated-non-prototype which is part of supporting C2x which formally
removes K&R syntax.

Either way, fix the identified functions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 22b2fa4766728c3057757c00e79da5f7803fff33
master date: 2023-02-17 11:01:54 +0000

21 months ago iommu/vtd: fix address translation for leaf entries
Roger Pau Monné [Mon, 17 Jul 2023 06:29:18 +0000 (08:29 +0200)]
iommu/vtd: fix address translation for leaf entries

Fix two issues related to leaf address lookups in VT-d:

* When translating an address that falls inside of a superpage in the
  IOMMU page tables the fetching of the PTE value wasn't masking off the
  contiguous-related data, which caused the returned data to be
  corrupt as it would contain bits that the caller would interpret as
  part of the address.

* When the requested leaf address wasn't mapped by a superpage the
  returned value wouldn't have any of the low 12 bits set, thus missing
  the permission bits expected by the caller.

Take the opportunity to also adjust the function comment to note that
when returning the full PTE the bits above PADDR_BITS are removed.

Fixes: c71e55501a61 ('VT-d: have callers specify the target level for page table walks')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 82b28deb25f37e8422b14493a2efa2852638206d
master date: 2023-06-19 15:46:03 +0200

21 months ago iommu/amd-vi: fix checking for Invalidate All support in amd_iommu_resume()
Roger Pau Monné [Mon, 17 Jul 2023 06:28:49 +0000 (08:28 +0200)]
iommu/amd-vi: fix checking for Invalidate All support in amd_iommu_resume()

The iommu local variable does not point to a valid amd_iommu element
after the call to for_each_amd_iommu().  Instead check whether any IOMMU
on the system doesn't support Invalidate All in order to perform the
per-domain and per-device flushes.
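
The pattern can be sketched as below, with a hypothetical singly linked
list standing in for the real IOMMU list and iterator macro. The point is
that the decision is accumulated inside the loop and the cursor is never
used once the loop has ended.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an IOMMU descriptor. */
struct demo_iommu {
    bool invalidate_all;          /* unit supports Invalidate All */
    struct demo_iommu *next;
};

/*
 * Returns true if at least one unit lacks Invalidate All, in which case
 * the caller would fall back to per-domain and per-device flushes.
 */
bool demo_need_manual_flush(const struct demo_iommu *head)
{
    bool need = false;
    const struct demo_iommu *it;

    for (it = head; it != NULL; it = it->next)
        if (!it->invalidate_all)
            need = true;

    /* `it` carries no useful information here; only `need` is used. */
    return need;
}
```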

Fixes: 9c46139de889 ('amd iommu: Support INVALIDATE_IOMMU_ALL command.')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 5ecbb779748a56495f2c892f0610d57dd623c7cd
master date: 2023-06-13 14:41:32 +0200

21 months ago x86/microcode: Add missing unlock in microcode_update_helper()
Alejandro Vallejo [Mon, 17 Jul 2023 06:28:19 +0000 (08:28 +0200)]
x86/microcode: Add missing unlock in microcode_update_helper()

microcode_update_helper() may return early while holding
cpu_add_remove_lock, hence preventing any writers from taking it again.

Leave through the `put` label instead so it's properly released.
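
A sketch of the single-exit pattern, using a counter as a hypothetical
stand-in for the lock (none of these names are Xen's):

```c
#include <stdbool.h>

/* Hypothetical lock, modelled as a depth counter for illustration. */
static int demo_lock_depth;

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

int demo_update(bool early_failure)
{
    int ret = 0;

    demo_lock();

    if (early_failure) {
        ret = -1;
        goto put;   /* a bare `return -1;` here would leave the lock held */
    }

    /* ... the actual update work would go here ... */

 put:
    demo_unlock();
    return ret;
}

/* Lets a caller observe whether the lock is still held. */
int demo_current_lock_depth(void)
{
    return demo_lock_depth;
}
```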

Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b35b22acb887f682efe8385b3df165220bc84c86
master date: 2023-06-05 16:11:10 +0100

21 months ago vpci/header: cope with devices not having vpci allocated
Roger Pau Monné [Mon, 17 Jul 2023 06:27:30 +0000 (08:27 +0200)]
vpci/header: cope with devices not having vpci allocated

When traversing the list of pci devices assigned to a domain cope with
some of them not having the vpci struct allocated. It should be
possible for the hardware domain to have read-only devices assigned
that are not handled by vPCI, such support will be added by further
patches.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ee045f3a4a6dddb09f5aa96a50cceaae97d3245f
master date: 2023-05-26 09:18:37 +0200

21 months ago tools: convert bitfields to unsigned type
Olaf Hering [Mon, 17 Jul 2023 06:27:04 +0000 (08:27 +0200)]
tools: convert bitfields to unsigned type

clang complains about the signed type:

implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
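
What the warning is about can be sketched with two hypothetical structs.
A signed one-bit field can only hold 0 and -1, so a stored 1 does not
read back as 1; the exact result is implementation-defined, and GCC and
Clang give -1.

```c
/* Signed one-bit field: representable values are 0 and -1 only. */
struct demo_signed_flag {
    signed int enabled : 1;
};

/* Unsigned one-bit field: representable values are 0 and 1. */
struct demo_unsigned_flag {
    unsigned int enabled : 1;
};

/* Stores v in the signed field and returns what is read back. */
int demo_store_signed(int v)
{
    struct demo_signed_flag f = { 0 };

    f.enabled = v;
    return f.enabled;
}

/* Stores v in the unsigned field and returns what is read back. */
int demo_store_unsigned(unsigned int v)
{
    struct demo_unsigned_flag f = { 0 };

    f.enabled = v;
    return f.enabled;
}
```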

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Backport: Dropped the libxenvchan change, for the original commit saying

"The potential ABI change in libxenvchan is covered by the Xen version
 based SONAME."

which won't hold on stable trees.
master commit: 99ab02f63ea813f2e467a39a7736bf460a3f3495
master date: 2023-05-16 20:03:02 +0100

23 months ago pci: fix pci_get_pdev() to always account for the segment
Roger Pau Monné [Tue, 23 May 2023 12:56:35 +0000 (14:56 +0200)]
pci: fix pci_get_pdev() to always account for the segment

When a domain parameter is provided to pci_get_pdev() the search
function would match against the bdf, without taking the segment into
account.

Fix this and also account for the passed segment.
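
A sketch of the corrected match, using a hypothetical and much reduced
device structure:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a PCI device record. */
struct demo_pdev {
    uint16_t seg;   /* PCI segment (domain) number */
    uint16_t bdf;   /* bus/device/function packed into 16 bits */
};

/*
 * A device is identified by segment and BDF together.  Comparing the BDF
 * alone would match a device with the same BDF on another segment.
 */
bool demo_pdev_matches(const struct demo_pdev *pdev,
                       uint16_t seg, uint16_t bdf)
{
    return pdev->seg == seg && pdev->bdf == bdf;
}
```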

Fixes: 8cf6e0738906 ('PCI: simplify (and thus correct) pci_get_pdev{,_by_domain}()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c7908869ac26961a3919491705e521179ad3fc0e
master date: 2023-05-22 16:11:55 +0200

23 months ago sched/null: avoid crash after failed domU creation
Stewart Hildebrand [Tue, 23 May 2023 12:56:04 +0000 (14:56 +0200)]
sched/null: avoid crash after failed domU creation

When domU creation fails, there is a corner case that may
lead to a crash in the null scheduler when running a debug build of Xen.

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
(XEN) ****************************************

The events leading to the crash are:

* null_unit_insert() was invoked with the unit offline. Since the unit was
  offline, unit_assign() was not called, and null_unit_insert() returned.
* Later during domain creation, the unit was onlined
* Eventually, domain creation failed due to bad configuration
* null_unit_remove() was invoked with the unit still online. Since the unit was
  online, it called unit_deassign() and triggered an ASSERT.

To fix this, only call unit_deassign() when npc->unit is non-NULL in
null_unit_remove.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
master commit: c2eae2614c8f04e384cd3334c3f06f31a6cb5f41
master date: 2023-05-22 16:11:40 +0200

23 months ago iommu/amd-vi: fix assert comparing boolean to enum
Roger Pau Monné [Tue, 23 May 2023 12:55:30 +0000 (14:55 +0200)]
iommu/amd-vi: fix assert comparing boolean to enum

Or else when iommu_intremap is set to iommu_intremap_full the assert
triggers.
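
The general hazard can be sketched as follows. The enum and its values
are hypothetical and only assume a three-state setting in which "full"
has the numeric value 2; the actual assertion in the AMD-Vi code is not
reproduced here.

```c
#include <stdbool.h>

/* Hypothetical three-state setting; "full" is assumed to be 2. */
enum demo_intremap {
    DEMO_INTREMAP_OFF,
    DEMO_INTREMAP_RESTRICTED,
    DEMO_INTREMAP_FULL
};

/* Raw comparison: true is 1, so it can never equal DEMO_INTREMAP_FULL. */
bool demo_compare_raw(bool expected_on, enum demo_intremap mode)
{
    return (int)expected_on == (int)mode;
}

/* Reduce the enum to "on or off" first, then compare like with like. */
bool demo_compare_as_bool(bool expected_on, enum demo_intremap mode)
{
    return expected_on == (mode != DEMO_INTREMAP_OFF);
}
```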

Fixes: 1ba66a870eba ('AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4c507d8a6b6e8be90881a335b0a66eb28e0f7737
master date: 2023-05-12 09:35:36 +0200

23 months ago docs/man: fix xenstore-write synopsis
Yann Dirson [Tue, 23 May 2023 12:55:11 +0000 (14:55 +0200)]
docs/man: fix xenstore-write synopsis

Reported-by: zithro <slack@rabbit.lu>
Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 8b1ac353b4db7c5bb2f82cb6afee9cc641e756a4
master date: 2023-05-09 10:37:29 +0100

23 months ago ns16550: enable memory decoding on MMIO-based PCI console card
Marek Marczykowski-Górecki [Tue, 23 May 2023 12:54:38 +0000 (14:54 +0200)]
ns16550: enable memory decoding on MMIO-based PCI console card

pci_serial_early_init() enables PCI_COMMAND_IO for IO-based UART
devices, add setting PCI_COMMAND_MEMORY for MMIO-based UART devices too.
Note the MMIO-based devices in practice need a "pci" sub-option,
otherwise a few parameters are not initialized (including bar_idx,
reg_shift, reg_width etc). The "pci" is not supposed to be used with
explicit BDF, so do not key setting PCI_COMMAND_MEMORY on explicit BDF
being set. Contrary to the IO-based UART, pci_serial_early_init() will
not attempt to set BAR0 address, even if user provided io_base manually
- in most cases, those are with an offset and the current cmdline syntax
doesn't allow expressing it. Due to this, enable PCI_COMMAND_MEMORY only
if uart->bar is already populated. In similar spirit, this patch does
not support setting BAR0 of the bridge.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: a16fb78515d54be95f81c0d1c0a3a7b954a54d0a
master date: 2023-05-08 14:15:38 +0200

23 months ago tools/libs/guest: assist gcc13's realloc analyzer
Olaf Hering [Tue, 23 May 2023 12:54:17 +0000 (14:54 +0200)]
tools/libs/guest: assist gcc13's realloc analyzer

gcc13 fails to track the allocated memory in backup_ptes:

xg_offline_page.c: In function 'backup_ptes':
xg_offline_page.c:191:13: error: pointer 'orig' may be used after 'realloc' [-Werror=use-after-free]
  191 |             free(orig);

Assist the analyzer by slightly rearranging the code:
In case realloc succeeds, the previous allocation is either extended
or released internally. In case realloc fails, the previous allocation
is left unchanged. Return an error in this case, the caller will
release the currently allocated memory in its error path.
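
The arrangement described above can be sketched like this. The buffer
type and function are hypothetical and are not the code from
xg_offline_page.c.

```c
#include <stdlib.h>

/* Hypothetical growable buffer. */
struct demo_buf {
    int *items;
    size_t capacity;
};

/*
 * Grow the buffer to new_capacity elements.  On failure the old
 * allocation is left untouched in buf->items and -1 is returned, so the
 * caller's error path can free it.  The old pointer is never used once
 * realloc() has succeeded.
 */
int demo_grow(struct demo_buf *buf, size_t new_capacity)
{
    int *tmp;

    if (new_capacity == 0)
        return -1;

    tmp = realloc(buf->items, new_capacity * sizeof(*tmp));
    if (tmp == NULL)
        return -1;

    buf->items = tmp;
    buf->capacity = new_capacity;
    return 0;
}
```

Keeping the result in a temporary until it is known to be non-NULL is
what lets both the reader and the analyzer see that no stale pointer is
freed or dereferenced.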

http://bugzilla.suse.com/show_bug.cgi?id=1210570

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Compile-tested-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 99a9c3d7141063ae3f357892c6181cfa3be8a280
master date: 2023-05-03 15:06:41 +0200

23 months ago x86/mm: replace bogus assertion in paging_log_dirty_op()
Jan Beulich [Tue, 23 May 2023 12:53:39 +0000 (14:53 +0200)]
x86/mm: replace bogus assertion in paging_log_dirty_op()

While I was the one to introduce it, I don't think it is correct: A
bogus continuation call issued by a tool stack domain may find another
continuation in progress. IOW we've been asserting caller controlled
state (which is reachable only via a domctl), and the early (lock-less)
check in paging_domctl() helps in a limited way only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 0956aa2219745a198bb6a0a99e2108a3c09b280e
master date: 2023-05-03 13:38:30 +0200