]> xenbits.xensource.com Git - xen.git/log
xen.git
14 months agoamd-vi: fix IVMD memory type checks
Roger Pau Monné [Tue, 27 Feb 2024 12:54:04 +0000 (13:54 +0100)]
amd-vi: fix IVMD memory type checks

The current code that parses the IVMD blocks is relaxed with regard to the
restriction that such unity regions should always fall into memory ranges
marked as reserved in the memory map.

However the type checks for the IVMD addresses are inverted, and as a result
IVMD ranges falling into RAM areas are accepted.  Note that having such ranges
in the first place is a firmware bug, as IVMD should always fall into reserved
ranges.

Fixes: ed6c77ebf0c1 ('AMD/IOMMU: check / convert IVMD ranges for being / to be reserved')
Reported-by: Ox <oxjo@proton.me>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: oxjo <oxjo@proton.me>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 83afa313583019d9f159c122cecf867735d27ec5
master date: 2024-02-06 11:56:13 +0100

14 months agotools/xentop: fix sorting bug for some columns
Cyril Rébert (zithro) [Tue, 27 Feb 2024 12:53:42 +0000 (13:53 +0100)]
tools/xentop: fix sorting bug for some columns

Sort doesn't work on columns VBD_OO, VBD_RD, VBD_WR and VBD_RSECT.
Fix by adjusting variables names in compare functions.
Bug fix only. No functional change.

Fixes: 91c3e3dc91d6 ("tools/xentop: Display '-' when stats are not available.")
Signed-off-by: Cyril Rébert (zithro) <slack@rabbit.lu>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 29f17d837421f13c0e0010802de1b2d51d2ded4a
master date: 2024-02-05 17:58:23 +0000

15 months agox86/ucode: Fix stability of the raw CPU Policy rescan
Andrew Cooper [Thu, 1 Feb 2024 17:02:24 +0000 (18:02 +0100)]
x86/ucode: Fix stability of the raw CPU Policy rescan

Always run microcode_update_helper() on the BSP, so the the updated Raw CPU
policy doesn't get non-BSP topology details included.

Have calculate_raw_cpu_policy() clear the instantanious XSTATE sizes.  The
value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
to report.

When CPUID Masking is active, it affects CPUID instructions issued by Xen
too.  Transiently disable masking to get a clean scan.

Fixes: 694d79ed5aac ("x86/ucode: Refresh raw CPU policy after microcode load")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cf7fe8b72deaa94157ddf97d4bb391480205e9c2
master date: 2024-01-25 17:46:57 +0000

15 months agox86/p2m-pt: fix off by one in entry check assert
Roger Pau Monné [Thu, 1 Feb 2024 17:01:52 +0000 (18:01 +0100)]
x86/p2m-pt: fix off by one in entry check assert

The MMIO RO rangeset overlap check is bogus: the rangeset is inclusive so the
passed end mfn should be the last mfn to be mapped (not last + 1).

Fixes: 6fa1755644d0 ('amd/npt/shadow: replace assert that prevents creating 2M/1G MMIO entries')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
master commit: 610775d0dd61c1bd2f4720c755986098e6a5bafd
master date: 2024-01-25 16:09:04 +0100

15 months agolib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps to $(targets)
Michal Orzel [Thu, 1 Feb 2024 17:01:27 +0000 (18:01 +0100)]
lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps to $(targets)

At the moment, trying to run xencov read/reset (calling SYSCTL_coverage_op
under the hood) results in a crash. This is due to a profiler trying to
access data in the .init.* sections (libfdt for Arm and libelf for x86)
that are stripped after boot. Normally, the build system compiles any
*.init.o file without COV_FLAGS. However, these two libraries are
handled differently as sections will be renamed to init after linking.

To override COV_FLAGS to empty for these libraries, lib{fdt,elf}.o were
added to nocov-y. This worked until e321576f4047 ("xen/build: start using
if_changed") that added lib{fdt,elf}-temp.o and their deps to extra-y.
This way, even though these objects appear as prerequisites of
lib{fdt,elf}.o and the settings should propagate to them, make can also
build them as a prerequisite of __build, in which case COV_FLAGS would
still have the unwanted flags. Fix it by switching to $(targets) instead.

Also, for libfdt, append libfdt.o to nocov-y only if CONFIG_OVERLAY_DTB
is not set. Otherwise, there is no section renaming and we should be able
to run the coverage.

Fixes: e321576f4047 ("xen/build: start using if_changed")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 79519fcfa0605bbf19d8c02b979af3a2c8afed68
master date: 2024-01-23 12:02:44 +0100

15 months agox86/vmx: Disallow the use of inactivity states
Andrew Cooper [Thu, 1 Feb 2024 17:00:32 +0000 (18:00 +0100)]
x86/vmx: Disallow the use of inactivity states

Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
enter the vCPU.  Luckily for us, nested-virt is explicitly unsupported for
security bugs.

The inactivity states are HLT, SHUTDOWN and WAIT-FOR-SIPI, and as noted by the
SDM in Vol3 27.7 "Special Features of VM Entry":

  If VM entry ends with the logical processor in an inactive activity state,
  the VM entry generates any special bus cycle that is normally generated when
  that activity state is entered from the active state.

Also,

  Some activity states unconditionally block certain events.

I.e. A VMEntry with ACTIVITY=SHUTDOWN will initiate a platform reset, while a
VMEntry with ACTIVITY=WAIT-FOR-SIPI will really block everything other than
SIPIs.

Both of these activity states are for the TXT ACM to use, not for regular
hypervisors, and Xen doesn't support dropping the HLT intercept either.

There are two paths in Xen which operate on ACTIVITY_STATE.

1) The vmx_{get,set}_nonreg_state() helpers for VM-Fork.

   As regular VMs can't use any inactivity states, this is just duplicating
   the 0 from construct_vmcs().  Retain the ability to query activity_state,
   but crash the domain on any attempt to set an inactivity state.

2) Nested virt, because of ACTIVITY_STATE in vmcs_gstate_field[].

   Explicitly hide the inactivity states in the guest's view of MSR_VMX_MISC,
   and remove ACTIVITY_STATE from vmcs_gstate_field[].

   In virtual_vmentry(), we should trigger a VMEntry failure for the use of
   any inactivity states, but there's no support for that in the code at all
   so leave a TODO for when we finally start working on nested-virt in
   earnest.

Reported-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
master commit: 3643bb53a05b7c8fbac072c63bef1538f2a6d0d2
master date: 2024-01-18 20:59:06 +0000

15 months agox86/vmx: Fix IRQ handling for EXIT_REASON_INIT
Andrew Cooper [Thu, 1 Feb 2024 16:59:57 +0000 (17:59 +0100)]
x86/vmx: Fix IRQ handling for EXIT_REASON_INIT

When receiving an INIT, a prior bugfix tried to ignore the INIT and continue
onwards.

Unfortunately it's not safe to return at that point in vmx_vmexit_handler().
Just out of context in the first hunk is a local_irqs_enabled() which is
depended-upon by the return-to-guest path, causing the following checklock
failure in debug builds:

  (XEN) Error: INIT received - ignoring
  (XEN) CHECKLOCK FAILURE: prev irqsafe: 0, curr irqsafe 1
  (XEN) Xen BUG at common/spinlock.c:132
  (XEN) ----[ Xen-4.19-unstable  x86_64  debug=y  Tainted:     H  ]----
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040238e10>] R check_lock+0xcd/0xe1
  (XEN)    [<ffff82d040238fe3>] F _spin_lock+0x1b/0x60
  (XEN)    [<ffff82d0402ed6a8>] F pt_update_irq+0x32/0x3bb
  (XEN)    [<ffff82d0402b9632>] F vmx_intr_assist+0x3b/0x51d
  (XEN)    [<ffff82d040206447>] F vmx_asm_vmexit_handler+0xf7/0x210

Luckily, this is benign in release builds.  Accidentally having IRQs disabled
when trying to take an IRQs-on lock isn't a deadlock-vulnerable pattern.

Drop the problematic early return.  In hindsight, it's wrong to skip other
normal VMExit steps.

Fixes: b1f11273d5a7 ("x86/vmx: Don't spuriously crash the domain when INIT is received")
Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d1f8883aebe00f6a9632d77ab0cd5c6d02c9cbe4
master date: 2024-01-18 20:59:06 +0000

15 months agox86/intel: ensure Global Performance Counter Control is setup correctly
Roger Pau Monné [Thu, 1 Feb 2024 16:59:25 +0000 (17:59 +0100)]
x86/intel: ensure Global Performance Counter Control is setup correctly

When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
MSR contains per-counter enable bits that is ANDed with the enable bit in the
counter EVNTSEL MSR in order for a PMC counter to be enabled.

So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
bits being set by default, but at least on some Intel Sapphire and Emerald
Rapids this is no longer the case, and Xen reports:

Testing NMI watchdog on all CPUs: 0 40 stuck

The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
relevant enable bit in PERF_GLOBAL_CTRL not being set.

Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
general-purpose PMCs are enabled.  Doing so brings the state of the package-BSP
PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 6bdb965178bbb3fc50cd4418d4770a7789956e2c
master date: 2024-01-17 10:40:52 +0100

15 months agoCirrusCI: drop FreeBSD 12
Roger Pau Monné [Thu, 1 Feb 2024 16:58:59 +0000 (17:58 +0100)]
CirrusCI: drop FreeBSD 12

Went EOL by the end of December 2023, and the pkg repos have been shut down.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c2ce3466472e9c9eda79f5dc98eb701bc6fdba20
master date: 2024-01-15 12:20:11 +0100

15 months agox86/amd: Extend CPU erratum #1474 fix to more affected models
Roger Pau Monné [Thu, 1 Feb 2024 16:58:17 +0000 (17:58 +0100)]
x86/amd: Extend CPU erratum #1474 fix to more affected models

Erratum #1474 has now been extended to cover models from family 17h ranges
00-2Fh, so the errata now covers all the models released under Family
17h (Zen, Zen+ and Zen2).

Additionally extend the workaround to Family 18h (Hygon), since it's based on
the Zen architecture and very likely affected.

Rename all the zen2 related symbols to fam17, since the errata doesn't
exclusively affect Zen2 anymore.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 23db507a01a4ec5259ec0ab43d296a41b1c326ba
master date: 2023-12-21 12:19:40 +0000

15 months agoVT-d: Fix "else" vs "#endif" misplacement
Andrew Cooper [Tue, 30 Jan 2024 13:34:40 +0000 (14:34 +0100)]
VT-d: Fix "else" vs "#endif" misplacement

In domain_pgd_maddr() the "#endif" is misplaced with respect to "else".  This
generates incorrect logic when CONFIG_HVM is compiled out, as the "else" body
is executed unconditionally.

Rework the logic to use IS_ENABLED() instead of explicit #ifdef-ary, as it's
clearer to follow.  This in turn involves adjusting p2m_get_pagetable() to
compile when CONFIG_HVM is disabled.

This is XSA-450 / CVE-2023-46840.

Fixes: 033ff90aa9c1 ("x86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only")
Reported-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cc6ba68edf6dcd18c3865e7d7c0f1ed822796426
master date: 2024-01-30 14:29:15 +0100

15 months agopci: fail device assignment if phantom functions cannot be assigned
Roger Pau Monné [Tue, 30 Jan 2024 13:33:48 +0000 (14:33 +0100)]
pci: fail device assignment if phantom functions cannot be assigned

The current behavior is that no error is reported if (some) phantom functions
fail to be assigned during device add or assignment, so the operation succeeds
even if some phantom functions are not correctly setup.

This can lead to devices possibly being successfully assigned to a domU while
some of the device phantom functions are still assigned to dom0.  Even when the
device is assigned domIO before being assigned to a domU phantom functions
might fail to be assigned to domIO, and also fail to be assigned to the domU,
leaving them assigned to dom0.

Since the device can generate requests using the IDs of those phantom
functions, given the scenario above a device in such state would be in control
of a domU, but still capable of generating transactions that use a context ID
targeting dom0 owned memory.

Modify device assign in order to attempt to deassign the device if phantom
functions failed to be assigned.

Note that device addition is not modified in the same way, as in that case the
device is assigned to a trusted domain, and hence partial assign can lead to
device malfunction but not a security issue.

This is XSA-449 / CVE-2023-46839

Fixes: 4e9950dc1bd2 ('IOMMU: add phantom function support')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cb4ecb3cc17b02c2814bc817efd05f3f3ba33d1e
master date: 2024-01-30 14:28:01 +0100

16 months agox86/x2apic: introduce a mixed physical/cluster mode
Roger Pau Monné [Tue, 12 Dec 2023 13:45:52 +0000 (14:45 +0100)]
x86/x2apic: introduce a mixed physical/cluster mode

The current implementation of x2APIC requires to either use Cluster Logical or
Physical mode for all interrupts.  However the selection of Physical vs Logical
is not done at APIC setup, an APIC can be addressed both in Physical or Logical
destination modes concurrently.

Introduce a new x2APIC mode called Mixed, which uses Logical Cluster mode for
IPIs, and Physical mode for external interrupts, thus attempting to use the
best method for each interrupt type.

Using Physical mode for external interrupts allows more vectors to be used, and
interrupt balancing to be more accurate.

Using Logical Cluster mode for IPIs allows fewer accesses to the ICR register
when sending those, as multiple CPUs can be targeted with a single ICR register
write.

A simple test calling flush_tlb_all() 10000 times on a tight loop on AMD EPYC
9754 with 512 CPUs gives the following figures in nano seconds:

x mixed
+ phys
* cluster
    N           Min           Max        Median           Avg        Stddev
x  25 3.5131328e+08 3.5716441e+08 3.5410987e+08 3.5432659e+08     1566737.4
+  12  1.231082e+09  1.238824e+09 1.2370528e+09 1.2357981e+09     2853892.9
Difference at 95.0% confidence
8.81472e+08 +/- 1.46849e+06
248.774% +/- 0.96566%
(Student's t, pooled s = 2.05985e+06)
*  11 3.5099276e+08 3.5561459e+08 3.5461234e+08 3.5415668e+08     1415071.9
No difference proven at 95.0% confidence

So Mixed has no difference when compared to Cluster mode, and Physical mode is
248% slower when compared to either Mixed or Cluster modes with a 95%
confidence.

Note that Xen uses Cluster mode by default, and hence is already using the
fastest way for IPI delivery at the cost of reducing the amount of vectors
available system-wide.

Make the newly introduced mode the default one.

Note the printing of the APIC addressing mode done in connect_bsp_APIC() has
been removed, as with the newly introduced mixed mode this would require more
fine grained printing, or else would be incorrect.  The addressing mode can
already be derived from the APIC driver in use, which is printed by different
helpers.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Henry Wang <Henry.Wang@arm.com>
master commit: e3c409d59ac87ccdf97b8c7708c81efa8069cb31
master date: 2023-11-07 09:59:48 +0000

17 months agoxen/arm: page: Avoid pointer overflow on cache clean & invalidate
Michal Orzel [Tue, 12 Dec 2023 13:27:10 +0000 (14:27 +0100)]
xen/arm: page: Avoid pointer overflow on cache clean & invalidate

On Arm32, after cleaning and invalidating the last dcache line of the top
domheap page i.e. VA = 0xfffff000 (as a result of flushing the page to
RAM), we end up adding the value of a dcache line size to the pointer
once again, which results in a pointer arithmetic overflow (with 64B line
size, operation 0xffffffc0 + 0x40 overflows to 0x0). Such behavior is
undefined and given the wide range of compiler versions we support, it is
difficult to determine what could happen in such scenario.

Modify clean_and_invalidate_dcache_va_range() as well as
clean_dcache_va_range() and invalidate_dcache_va_range() due to similarity
of handling to prevent pointer arithmetic overflow. Modify the loops to
use an additional variable to store the index of the next cacheline.
Add an assert to prevent passing a region that wraps around which is
illegal and would end up in a page fault anyway (region 0-2MB is
unmapped). Lastly, return early if size passed is 0.

Note that on Arm64, we don't have this problem given that the max VA
space we support is 48-bits.

This is XSA-447 / CVE-2023-46837.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
master commit: 190b7f49af6487a9665da63d43adc9d9a5fbd01e
master date: 2023-12-12 14:01:00 +0100

17 months agoxen/sched: fix sched_move_domain()
Juergen Gross [Tue, 12 Dec 2023 13:26:35 +0000 (14:26 +0100)]
xen/sched: fix sched_move_domain()

Do cleanup in sched_move_domain() in a dedicated service function,
which is called either in error case with newly allocated data, or in
success case with the old data to be freed.

This will at once fix some subtle bugs which sneaked in due to
forgetting to overwrite some pointers in the error case.

Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
Reported-by: René Winther Højgaard <renewin@proton.me>
Initial-fix-by: Jan Beulich <jbeulich@suse.com>
Initial-fix-by: George Dunlap <george.dunlap@cloud.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@cloud.com>
master commit: 23792cc0f22cff4e106d838b83aa9ae1cb6ffaf4
master date: 2023-12-07 13:37:25 +0000

17 months agoOnly compile the hypervisor with -Wdeclaration-after-statement
Julien Grall [Tue, 12 Dec 2023 13:26:18 +0000 (14:26 +0100)]
Only compile the hypervisor with -Wdeclaration-after-statement

Right now, all tools and hypervisor will be complied with the option
-Wdeclaration-after-statement. While most of the code in the hypervisor
is controlled by us, for tools we may import external libraries.

The build will fail if one of them are using the construct we are
trying to prevent. This is the case when building against Python 3.12
and Yocto:

| In file included from /srv/storage/alex/yocto/build-virt/tmp/work/core2-64-poky-linux/xen-tools/4.17+stable/recipe-sysroot/usr/include/python3.12/Python.h:44,
|                  from xen/lowlevel/xc/xc.c:8:
| /srv/storage/alex/yocto/build-virt/tmp/work/core2-64-poky-linux/xen-tools/4.17+stable/recipe-sysroot/usr/include/python3.12/object.h: In function 'Py_SIZE':
| /srv/storage/alex/yocto/build-virt/tmp/work/core2-64-poky-linux/xen-tools/4.17+stable/recipe-sysroot/usr/include/python3.12/object.h:233:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
|   233 |     PyVarObject *var_ob = _PyVarObject_CAST(ob);
|       |     ^~~~~~~~~~~
| In file included from /srv/storage/alex/yocto/build-virt/tmp/work/core2-64-poky-linux/xen-tools/4.17+stable/recipe-sysroot/usr/include/python3.12/Python.h:53:
| /srv/storage/alex/yocto/build-virt/tmp/work/core2-64-poky-linux/xen-tools/4.17+stable/recipe-sysroot/usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue':
| /srv/storage/alex/yocto/build-virt/tmp/work/core2-64-poky-linux/xen-tools/4.17+stable/recipe-sysroot/usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
|   121 |     Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
|       |     ^~~~~~~~~~
| cc1: all warnings being treated as errors

Looking at the tools directory, a fair few directory already add
-Wno-declaration-after-statement to inhibit the default behavior.

We have always build the hypervisor with the flag, so for now remove
only the flag for anything but the hypervisor. We can decide at later
time whether we want to relax.

Also remove the -Wno-declaration-after-statement in some subdirectory
as the flag is now unnecessary.

Part of the commit message was take from Alexander's first proposal:

Link: https://lore.kernel.org/xen-devel/20231128174729.3880113-1-alex@linutronix.de/
Reported-by: Alexander Kanavin <alex@linutronix.de>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
xen/hypervisor: Don't use cc-option-add for -Wdeclaration-after-statement

Per Andrew's comment in [1] all the compilers we support should
recognize the flag.

I forgot to address the comment while committing.

[1] fcf00090-304a-49f7-8a61-a54347e90a3b@citrix.com

Signed-off-by: Julien Grall <jgrall@amazon.com>
master commit: 40be6307ec005539635e7b8fcef67e989dc441f6
master date: 2023-12-06 19:12:40 +0000
master commit: d4bfd3899886d0fbe259c20660dadb1e00170f2d
master date: 2023-12-06 19:19:59 +0000

17 months agoxen/domain: fix error path in domain_create()
Stewart Hildebrand [Wed, 6 Dec 2023 09:41:19 +0000 (10:41 +0100)]
xen/domain: fix error path in domain_create()

If rangeset_new() fails, err would not be set to an appropriate error
code. Set it to -ENOMEM.

Fixes: 580c458699e3 ("xen/domain: Call arch_domain_create() as early as possible in domain_create()")
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ff1178062094837d55ef342070e58316c43a54c9
master date: 2023-12-05 10:00:51 +0100

17 months agoxen/sched: fix adding offline cpu to cpupool
Juergen Gross [Wed, 6 Dec 2023 09:40:54 +0000 (10:40 +0100)]
xen/sched: fix adding offline cpu to cpupool

Trying to add an offline cpu to a cpupool can crash the hypervisor,
as the probably non-existing percpu area of the cpu is accessed before
the availability of the cpu is being tested. This can happen in case
the cpupool's granularity is "core" or "socket".

Fix that by testing the cpu to be online.

Fixes: cb563d7665f2 ("xen/sched: support core scheduling for moving cpus to/from cpupools")
Reported-by: René Winther Højgaard <renewin@proton.me>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 06e8d65d33896aa90f5b6d9b2bce7f11433b33c9
master date: 2023-12-05 09:57:38 +0100

17 months agox86emul: avoid triggering event related assertions
Jan Beulich [Wed, 6 Dec 2023 09:40:19 +0000 (10:40 +0100)]
x86emul: avoid triggering event related assertions

The assertion at the end of x86_emulate_wrapper() as well as the ones
in x86_emul_{hw_exception,pagefault}() can trigger if we ignore
X86EMUL_EXCEPTION coming back from certain hook functions. Squash
exceptions when merely probing MSRs, plus on SWAPGS'es "best effort"
error handling path.

In adjust_bnd() add another assertion after the read_xcr(0, ...)
invocation, paralleling the one in x86emul_get_fpu() - XCR0 reads should
never fault when XSAVE is (implicitly) known to be available.

Also update the respective comment in x86_emulate_wrapper().

Fixes: 14a6be89ec04 ("x86emul: correct EFLAGS.TF handling")
Fixes: cb2626c75813 ("x86emul: conditionally clear BNDn for branches")
Fixes: 6eb43fcf8a0b ("x86emul: support SWAPGS")
Reported-by: AFL
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 787d11c5aaf4d3411d4658cff137cd49b0bd951b
master date: 2023-12-05 09:57:05 +0100

17 months agotools/xg: Fix potential memory leak in cpu policy getters/setters
Alejandro Vallejo [Wed, 6 Dec 2023 09:39:55 +0000 (10:39 +0100)]
tools/xg: Fix potential memory leak in cpu policy getters/setters

They allocate two different hypercall buffers, but leak the first
allocation if the second one failed due to an early return that bypasses
cleanup.

Remove the early exit and go through _post() instead. Invoking _post() is
benign even if _pre() failed.

Fixes: 6b85e427098c ('x86/sysctl: Implement XEN_SYSCTL_get_cpu_policy')
Fixes: 60529dfeca14 ('x86/domctl: Implement XEN_DOMCTL_get_cpu_policy')
Fixes: 14ba07e6f816 ('x86/domctl: Implement XEN_DOMCTL_set_cpumsr_policy')
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 1571ff7a987b88b20598a6d49910457f3b2c59f1
master date: 2023-12-01 10:53:07 +0100

17 months agoxen/x86: In x2APIC mode, derive LDR from APIC ID
Alejandro Vallejo [Wed, 6 Dec 2023 09:39:15 +0000 (10:39 +0100)]
xen/x86: In x2APIC mode, derive LDR from APIC ID

Both Intel and AMD manuals agree that in x2APIC mode, the APIC LDR and ID
registers are derivable from each other through a fixed formula.

Xen uses that formula, but applies it to vCPU IDs (which are sequential)
rather than x2APIC IDs (which are not, at the moment). As I understand it,
this is an attempt to tightly pack vCPUs into clusters so each cluster has
16 vCPUs rather than 8, but this is a spec violation.

This patch fixes the implementation so we follow the x2APIC spec for new
VMs, while preserving the behaviour (buggy or fixed) for migrated-in VMs.

While touching that area, remove the existing printk statement in
vlapic_load_fixup() (as the checks it performed didn't make sense in x2APIC
mode and wouldn't affect the outcome) and put another printk as an else
branch so we get warnings trying to load nonsensical LDR values we don't
know about.

Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 90309854fd2440fb08b4c808f47d7670ba0d250d
master date: 2023-11-29 10:05:55 +0100

17 months agolivepatch: do not use .livepatch.funcs section to store internal state
Roger Pau Monné [Wed, 6 Dec 2023 09:38:03 +0000 (10:38 +0100)]
livepatch: do not use .livepatch.funcs section to store internal state

Currently the livepatch logic inside of Xen will use fields of struct
livepatch_func in order to cache internal state of patched functions.  Note
this is a field that is part of the payload, and is loaded as an ELF section
(.livepatch.funcs), taking into account the SHF_* flags in the section
header.

The flags for the .livepatch.funcs section, as set by livepatch-build-tools,
are SHF_ALLOC, which leads to its contents (the array of livepatch_func
structures) being placed in read-only memory:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
[...]
  [ 4] .livepatch.funcs  PROGBITS         0000000000000000  00000080
       0000000000000068  0000000000000000   A       0     0     8

This previously went unnoticed, as all writes to the fields of livepatch_func
happen in the critical region that had WP disabled in CR0.  After 8676092a0f16
however WP is no longer toggled in CR0 for patch application, and only the
hypervisor .text mappings are made write-accessible.  That leads to the
following page fault when attempting to apply a livepatch:

----[ Xen-4.19-unstable  x86_64  debug=y  Tainted:   C    ]----
CPU:    4
RIP:    e008:[<ffff82d040221e81>] common/livepatch.c#apply_payload+0x45/0x1e1
[...]
Xen call trace:
   [<ffff82d040221e81>] R common/livepatch.c#apply_payload+0x45/0x1e1
   [<ffff82d0402235b2>] F check_for_livepatch_work+0x385/0xaa5
   [<ffff82d04032508f>] F arch/x86/domain.c#idle_loop+0x92/0xee

Pagetable walk from ffff82d040625079:
 L4[0x105] = 000000008c6c9063 ffffffffffffffff
 L3[0x141] = 000000008c6c6063 ffffffffffffffff
 L2[0x003] = 000000086a1e7063 ffffffffffffffff
 L1[0x025] = 800000086ca5d121 ffffffffffffffff

****************************************
Panic on CPU 4:
FATAL PAGE FAULT
[error_code=0003]
Faulting linear address: ffff82d040625079
****************************************

Fix this by moving the internal Xen function patching state out of
livepatch_func into an area not allocated as part of the ELF payload.  While
there also constify the array of livepatch_func structures in order to prevent
further surprises.

Note there's still one field (old_addr) that gets set during livepatch load.  I
consider this fine since the field is read-only after load, and at the point
the field gets set the underlying mapping hasn't been made read-only yet.

Fixes: 8676092a0f16 ('x86/livepatch: Fix livepatch application when CET is active')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
xen/livepatch: fix livepatch tests

The current set of in-tree livepatch tests in xen/test/livepatch started
failing after the constify of the payload funcs array, and the movement of the
status data into a separate array.

Fix the tests so they respect the constness of the funcs array and also make
use of the new location of the per-func state data.

Fixes: 82182ad7b46e ('livepatch: do not use .livepatch.funcs section to store internal state')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 82182ad7b46e0f7a3856bb12c7a9bf2e2a4570bc
master date: 2023-11-27 15:16:01 +0100
master commit: 902377b690f42ddf44ae91c4b0751d597f1cd694
master date: 2023-11-29 10:46:42 +0000

17 months agox86/mem_sharing: Release domain if we are not able to enable memory sharing
Frediano Ziglio [Wed, 6 Dec 2023 09:37:13 +0000 (10:37 +0100)]
x86/mem_sharing: Release domain if we are not able to enable memory sharing

In case it's not possible to enable memory sharing (mem_sharing_control
fails) we just return the error code without releasing the domain
acquired some lines above by rcu_lock_live_remote_domain_by_id().

Fixes: 72f8d45d69b8 ("x86/mem_sharing: enable mem_sharing on first memop")
Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
master commit: fbcec32d6d3ea0ac329301925b317478316209ed
master date: 2023-11-27 12:06:13 +0000

17 months agoxen/sched: fix sched_move_domain()
Juergen Gross [Thu, 23 Nov 2023 11:13:53 +0000 (12:13 +0100)]
xen/sched: fix sched_move_domain()

When moving a domain out of a cpupool running with the credit2
scheduler and having multiple run-queues, the following ASSERT() can
be observed:

(XEN) Xen call trace:
(XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
(XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
(XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
(XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
(XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
(XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
(XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
(XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
(XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
(XEN) ****************************************

This is happening as sched_move_domain() is setting a different cpu
for a scheduling unit without telling the scheduler. When this unit is
removed from the scheduler, the ASSERT() will trigger.

In non-debug builds the result is usually a clobbered pointer, leading
to another crash a short time later.

Fix that by swapping the two involved actions (setting another cpu and
removing the unit from the scheduler).

Link: https://github.com/Dasharo/dasharo-issues/issues/488
Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
master commit: 4709ec82917668c2df958ef91b4f21c049c76bee
master date: 2023-11-20 10:49:29 +0100

17 months agox86/spec-ctrl: Add SRSO whitepaper URL
Andrew Cooper [Thu, 23 Nov 2023 11:13:31 +0000 (12:13 +0100)]
x86/spec-ctrl: Add SRSO whitepaper URL

... now that it exists in public.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 78a86b26868c12ae1cc3dd2a8bb9aa5eebaa41fd
master date: 2023-11-07 17:47:34 +0000

17 months agox86/i8259: do not assume interrupts always target CPU0
Roger Pau Monné [Thu, 23 Nov 2023 11:12:47 +0000 (12:12 +0100)]
x86/i8259: do not assume interrupts always target CPU0

Sporadically we have seen the following during AP bringup on AMD platforms
only:

microcode: CPU59 updated from revision 0x830107a to 0x830107a, date = 2023-05-17
microcode: CPU60 updated from revision 0x830104d to 0x830107a, date = 2023-05-17
CPU60: No irq handler for vector 27 (IRQ -2147483648)
microcode: CPU61 updated from revision 0x830107a to 0x830107a, date = 2023-05-17

This is similar to the issue raised on Linux commit 36e9e1eab777e, where they
observed i8259 (active) vectors getting delivered to CPUs different than 0.

On AMD or Hygon platforms adjust the target CPU mask of i8259 interrupt
descriptors to contain all possible CPUs, so that APs will reserve the vector
at startup if any legacy IRQ is still delivered through the i8259.  Note that
if the IO-APIC takes over those interrupt descriptors the CPU mask will be
reset.

Spurious i8259 interrupt vectors however (IRQ7 and IRQ15) can be injected even
when all i8259 pins are masked, and hence would need to be handled on all CPUs.

Continue to reserve PIC vectors on CPU0 only, but do check for such spurious
interrupts on all CPUs if the vendor is AMD or Hygon.  Note that once the
vectors get used by devices detecting PIC spurious interrupts will no longer be
possible, however the device driver should be able to cope with spurious
interrupts.  Such PIC spurious interrupts occurring when the vector is in use
by a local APIC routed source will lead to an extra EOI, which might
unintentionally clear a different vector from ISR.  Note this is already the
current behavior, so assume it's infrequent enough to not cause real issues.

Finally, adjust the printed message to display the CPU where the spurious
interrupt has been received, so it looks like:

microcode: CPU1 updated from revision 0x830107a to 0x830107a, date = 2023-05-17
cpu1: spurious 8259A interrupt: IRQ7
microcode: CPU2 updated from revision 0x830104d to 0x830107a, date = 2023-05-17

Amends: 3fba06ba9f8b ('x86/IRQ: re-use legacy vector ranges on APs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 87f37449d586b4d407b75235bb0a171e018e25ec
master date: 2023-11-02 10:50:59 +0100

17 months agox86/x2apic: remove usage of ACPI_FADT_APIC_CLUSTER
Roger Pau Monné [Thu, 23 Nov 2023 11:12:18 +0000 (12:12 +0100)]
x86/x2apic: remove usage of ACPI_FADT_APIC_CLUSTER

The ACPI FADT APIC_CLUSTER flag mandates that when the interrupt delivery is
Logical mode APIC must be configured for Cluster destination model.  However in
apic_x2apic_probe() such flag is incorrectly used to gate whether Physical mode
can be used.

Since Xen when in x2APIC mode only uses Logical mode together with Cluster
model completely remove checking for ACPI_FADT_APIC_CLUSTER, as Xen always
fulfills the requirement signaled by the flag.

Fixes: eb40ae41b658 ('x86/Kconfig: add option for default x2APIC destination mode')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 26a449ce32cef33f2cb50602be19fcc0c4223ba9
master date: 2023-11-02 10:50:26 +0100

17 months agox86/pv-shim: fix grant table operations for 32-bit guests
David Woodhouse [Thu, 23 Nov 2023 11:11:21 +0000 (12:11 +0100)]
x86/pv-shim: fix grant table operations for 32-bit guests

When switching to call the shim functions from the normal handlers, the
compat_grant_table_op() function was omitted, leaving it calling the
real grant table operations in !PV_SHIM_EXCLUSIVE builds. This leaves a
32-bit shim guest failing to set up its real grant table with the parent
hypervisor.

Fixes: e7db635f4428 ("x86/pv-shim: Don't modify the hypercall table")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 93ec30bc545f15760039c23ee4b97b80c0b3b3b3
master date: 2023-10-31 16:10:14 +0000

17 months agox86/mem_sharing: add missing m2p entry when mapping shared_info page
Tamas K Lengyel [Thu, 23 Nov 2023 11:10:46 +0000 (12:10 +0100)]
x86/mem_sharing: add missing m2p entry when mapping shared_info page

When mapping in the shared_info page to a fork the m2p entry wasn't set
resulting in the shared_info being reset even when the fork reset was called
with only reset_state and not reset_memory. This results in an extra
unnecessary TLB flush.

Fixes: 1a0000ac775 ("mem_sharing: map shared_info page to same gfn during fork")
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 23eb39acf011ef9bbe02ed4619c55f208fbcd39b
master date: 2023-10-31 16:10:14 +0000

17 months agoupdate Xen version to 4.18.1-pre
Jan Beulich [Thu, 23 Nov 2023 11:09:43 +0000 (12:09 +0100)]
update Xen version to 4.18.1-pre

17 months agoSUPPORT.md: Update release notes URL RELEASE-4.18.0
Julien Grall [Thu, 16 Nov 2023 21:44:21 +0000 (21:44 +0000)]
SUPPORT.md: Update release notes URL

Signed-off-by: Julien Grall <julien@xen.org>
17 months agoSUPPORT.md: Define support lifetime
Julien Grall [Wed, 15 Nov 2023 12:16:32 +0000 (12:16 +0000)]
SUPPORT.md: Define support lifetime

Signed-off-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
17 months agoConfig.mk: Fix tag for mini-os
Julien Grall [Wed, 15 Nov 2023 12:23:13 +0000 (12:23 +0000)]
Config.mk: Fix tag for mini-os

Signed-off-by: Julien Grall <jgrall@amazon.com>
17 months agoSet 4.18 version
Julien Grall [Wed, 15 Nov 2023 12:12:02 +0000 (12:12 +0000)]
Set 4.18 version

Signed-off-by: Julien Grall <jgrall@amazon.com>
17 months agoREADME: make heading say 4.18
Julien Grall [Wed, 15 Nov 2023 12:11:27 +0000 (12:11 +0000)]
README: make heading say 4.18

Signed-off-by: Julien Grall <jgrall@amazon.com>
17 months agoConfig.mk: Bump tags to 4.18.0 final
Julien Grall [Wed, 15 Nov 2023 12:09:51 +0000 (12:09 +0000)]
Config.mk: Bump tags to 4.18.0 final

Signed-off-by: Julien Grall <jgrall@amazon.com>
17 months agox86/spec-ctrl: Remove conditional IRQs-on-ness for INT $0x80/0x82 paths
Andrew Cooper [Thu, 26 Oct 2023 13:37:38 +0000 (14:37 +0100)]
x86/spec-ctrl: Remove conditional IRQs-on-ness for INT $0x80/0x82 paths

Before speculation defences, some paths in Xen could genuinely get away with
being IRQs-on at entry.  But XPTI invalidated this property on most paths, and
attempting to maintain it on the remaining paths was a mistake.

Fast forward, and DO_SPEC_CTRL_COND_IBPB (protection for AMD BTC/SRSO) is not
IRQ-safe, running with IRQs enabled in some cases.  The other actions taken on
these paths happen to be IRQ-safe.

Make entry_int82() and int80_direct_trap() unconditionally Interrupt Gates
rather than Trap Gates.  Remove the conditional re-adjustment of
int80_direct_trap() in smp_prepare_cpus(), and have entry_int82() explicitly
enable interrupts when safe to do so.

In smp_prepare_cpus(), with the conditional re-adjustment removed, the
clearing of pv_cr3 is the only remaining action gated on XPTI, and it is out
of place anyway, repeating work already done by smp_prepare_boot_cpu().  Drop
the entire if() condition to avoid leaving an incorrect vestigial remnant.

Also drop comments which make incorrect statements about when its safe to
enable interrupts.

This is XSA-446 / CVE-2023-46836

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit a48bb129f1b9ff55c22cf6d2b589247c8ba3b10e)

17 months agoiommu/amd-vi: use correct level for quarantine domain page tables
Roger Pau Monne [Wed, 11 Oct 2023 11:14:21 +0000 (13:14 +0200)]
iommu/amd-vi: use correct level for quarantine domain page tables

The current setup of the quarantine page tables assumes that the quarantine
domain (dom_io) has been initialized with an address width of
DEFAULT_DOMAIN_ADDRESS_WIDTH (48).

However dom_io being a PV domain gets the AMD-Vi IOMMU page tables levels based
on the maximum (hot pluggable) RAM address, and hence on systems with no RAM
above the 512GB mark only 3 page-table levels are configured in the IOMMU.

On systems without RAM above the 512GB boundary amd_iommu_quarantine_init()
will setup page tables for the scratch page with 4 levels, while the IOMMU will
be configured to use 3 levels only.  The page destined to be used as level 1,
and to contain a directory of PTEs ends up being the address in a PTE itself,
and thus level 1 page becomes the leaf page.  Without the level mismatch it's
level 0 page that should be the leaf page instead.

The level 1 page won't be used as such, and hence it's not possible to use it
to gain access to other memory on the system.  However that page is not cleared
in amd_iommu_quarantine_init() as part of re-initialization of the device
quarantine page tables, and hence data on the level 1 page can be leaked
between device usages.

Fix this by making sure the paging levels setup by amd_iommu_quarantine_init()
match the number configured on the IOMMUs.

Note that IVMD regions are not affected by this issue, as those areas are
mapped taking the configured paging levels into account.

This is XSA-445 / CVE-2023-46835

Fixes: ea38867831da ('x86 / iommu: set up a scratch page in the quarantine domain')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit fe1e4668b373ec4c1e5602e75905a9fa8cc2be3f)

17 months agodocs/sphinx: Fix indexing
Andrew Cooper [Wed, 8 Nov 2023 14:53:23 +0000 (14:53 +0000)]
docs/sphinx: Fix indexing

sphinx-build reports:

  docs/designs/launch/hyperlaunch.rst: WARNING: document isn't included in any toctree
  docs/designs/launch/hyperlaunch-devicetree.rst: WARNING: document isn't included in any toctree
  docs/misc/xen-makefiles/makefiles.rst: WARNING: document isn't included in any toctree
  docs/misra/C-language-toolchain.rst: WARNING: document isn't included in any toctree
  docs/misra/C-runtime-failures.rst: WARNING: document isn't included in any toctree
  docs/misra/documenting-violations.rst: WARNING: document isn't included in any toctree
  docs/misra/exclude-list.rst: WARNING: document isn't included in any toctree
  docs/misra/xen-static-analysis.rst: WARNING: document isn't included in any toctree

Create an up-to-date index.rst in misra/ create an "unsorted docs" section at
the top level to contain everything else.  They can be re-filed at a later
date.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit fb41228ececea948c7953c8c16fe28fd65c6536b)

17 months agodocs/sphinx: Fix syntax issues in exclude-list.rst
Andrew Cooper [Wed, 8 Nov 2023 14:47:10 +0000 (14:47 +0000)]
docs/sphinx: Fix syntax issues in exclude-list.rst

sphinx-build reports:

  docs/misra/exclude-list.rst:50: WARNING: Inline emphasis start-string without end-string.

'*' either needs escaping, or put in a literal block.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit ab03b284b4f2fbf405fcd821105c85e1a38d314d)

17 months agodocs/sphinx: Fix syntax issues in hyperlaunch.rst
Andrew Cooper [Wed, 8 Nov 2023 14:38:33 +0000 (14:38 +0000)]
docs/sphinx: Fix syntax issues in hyperlaunch.rst

sphinx-build reports:

  docs/designs/launch/hyperlaunch.rst:111: WARNING: Title underline too short.
  docs/designs/launch/hyperlaunch.rst:203: WARNING: Unexpected indentation.
  docs/designs/launch/hyperlaunch.rst:216: WARNING: Unexpected indentation.

Nested lists require newlines as delimiters.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 93ad5dd9743f54cbd1f98658de9cd3ddc7a98fb6)

17 months agodocs: Delete kconfig docs to fix licensing violation
Andrew Cooper [Wed, 8 Nov 2023 14:23:46 +0000 (14:23 +0000)]
docs: Delete kconfig docs to fix licensing violation

These 3 Kconfig docs were imported from Linux erroneously.  They are
GPL-2.0-only in Linux, but have no SPDX tag and were placed in such a way to
be included by the blanket statement saying that all RST files are CC-BY-4.0.

We should not be carrying a shadow copy of these docs.  They aren't even wired
into our Sphinx docs, and anyone wanting to refer to Kconfig docs is going to
look at the Linux docs anyway.  These, and more docs can be found at:

  https://www.kernel.org/doc/html/latest/kbuild/

which also have corrections vs the snapshot we took.

Fixes: f80fe2b34f08 ("xen: Update Kconfig to Linux v5.4")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 044503f61c9571f3be06e105d59253f3c5632442)

17 months agodocs/misra: Add missing SPDX tag
Andrew Cooper [Wed, 8 Nov 2023 13:51:37 +0000 (13:51 +0000)]
docs/misra: Add missing SPDX tag

One file is missing an SDPX tag, but is covered by the blanketing license
statement in docs/README.sources saying that RST files are CC-BY-4.0

Fixes: 3c911be55f1c ("docs/misra: document the C dialect and translation toolchain assumptions.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 8ee1a332b4083523c5ca715d13a0ea4c52417940)

18 months agogolang: Fix bindings after XSA-443
Jason Andryuk [Fri, 3 Nov 2023 19:45:51 +0000 (15:45 -0400)]
golang: Fix bindings after XSA-443

The new bootloader_restrict and bootloader_user fields in the libxl idl
change the bindings.  Update them.

Fixes: 1f762642d2ca ("libxl: add support for running bootloader in restricted mode")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: George Dunlap <george.dunlap@cloud.com>
(cherry picked from commit 1f849edc2f9ca7dc2f9ed7b0585c31bd6b81d7ef)

18 months agogolang: Fixup binding for Arm FF-A
Jason Andryuk [Fri, 3 Nov 2023 19:45:50 +0000 (15:45 -0400)]
golang: Fixup binding for Arm FF-A

The new FF-A TEE type changed the go bindings.  Update them.

Fixes: 8abdd8d52862 ("tools: add Arm FF-A mediator")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: George Dunlap <george.dunlap@cloud.com>
(cherry picked from commit 1429f9c5486d94296ada441f14d1b7934885da06)

18 months agox86/time: Fix UBSAN failure in __update_vcpu_system_time()
Andrew Cooper [Wed, 1 Nov 2023 20:19:52 +0000 (20:19 +0000)]
x86/time: Fix UBSAN failure in __update_vcpu_system_time()

As reported:

  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in arch/x86/time.c:1542:32
  (XEN) member access within null pointer of type 'union vcpu_info_t'
  (XEN) ----[ Xen-4.19-unstable  x86_64  debug=y ubsan=y  Not tainted ]----
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040345036>] R common/ubsan/ubsan.c#ubsan_epilogue+0xa/0xd2
  (XEN)    [<ffff82d0403456e8>] F __ubsan_handle_type_mismatch+0x133/0x49b
  (XEN)    [<ffff82d040345b4a>] F __ubsan_handle_type_mismatch_v1+0xfa/0xfc
  (XEN)    [<ffff82d040623356>] F arch/x86/time.c#__update_vcpu_system_time+0x212/0x30f
  (XEN)    [<ffff82d040623461>] F update_vcpu_system_time+0xe/0x10
  (XEN)    [<ffff82d04062389d>] F arch/x86/time.c#local_time_calibration+0x1f7/0x523
  (XEN)    [<ffff82d0402a64b5>] F common/softirq.c#__do_softirq+0x1f4/0x31a
  (XEN)    [<ffff82d0402a67ad>] F do_softirq+0x13/0x15
  (XEN)    [<ffff82d0405a95dc>] F arch/x86/domain.c#idle_loop+0x2e0/0x367
  (XEN)
  (XEN) ================================================================================

GCC 10 at least doesn't consider it valid to derive a pointer from vcpu_info()
prior to checking that the underlying map pointer is good.

Reorder actions so the map pointer is checked first.

Fixes: 20279afd7323 ("x86: split populating of struct vcpu_time_info into a separate function")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 801b804945bb7ccbd760d25637b720d8aac7e004)

18 months agoCHANGELOG.md: Finalize the 4.18 release date 4.18.0-rc5
Henry Wang [Tue, 31 Oct 2023 14:49:24 +0000 (22:49 +0800)]
CHANGELOG.md: Finalize the 4.18 release date

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit e4fdec09bb338d62bd339c3c5ce6284aabeb19ee)

18 months agoCHANGELOG: More 4.18 content
Andrew Cooper [Tue, 31 Oct 2023 13:19:53 +0000 (13:19 +0000)]
CHANGELOG: More 4.18 content

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Henry Wang <Henry.Wang@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit bf51f85f20114ca73b95056d5e808a4845f094ee)

18 months agoCHANGELOG: Reformat
Andrew Cooper [Tue, 31 Oct 2023 13:19:52 +0000 (13:19 +0000)]
CHANGELOG: Reformat

Collect all x86 and ARM changes together instead of having them scattered.
Tweak grammar as necessary.

No change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Henry Wang <Henry.Wang@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit d9c11660781942d7af44287732e2f6783840635e)

18 months agodocs: Fix IOMMU command line docs some more
Andrew Cooper [Tue, 31 Oct 2023 12:02:15 +0000 (12:02 +0000)]
docs: Fix IOMMU command line docs some more

Make the command line docs match the actual implementation, and state that the
default behaviour is selected at compile time.

Fixes: 980d6acf1517 ("IOMMU: make DMA containment of quarantined devices optional")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 850382254b78e07e7ccbf80010c3b43897a158f9)

18 months agoautomation: fix race condition in adl-suspend test
Marek Marczykowski-Górecki [Tue, 31 Oct 2023 02:16:53 +0000 (03:16 +0100)]
automation: fix race condition in adl-suspend test

If system suspends too quickly, the message for the test controller to
wake up the system may be not sent to the console before suspending.
This will cause the test to timeout.

Fix this by calling sync on the console and waiting a bit after printing
the message. The test controller then resumes the system 30s after the
message, so as long as the delay + suspending takes less time it is
okay.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit df43b54590e10b6199e44741b453fcbae2b06d25)

18 months agoTurn off debug by default
Julien Grall [Fri, 27 Oct 2023 13:08:16 +0000 (14:08 +0100)]
Turn off debug by default

Signed-off-by: Julien Grall <jgrall@amazon.com>
18 months agoConfig.mk: switch to named tags (for stable branch)
Julien Grall [Fri, 27 Oct 2023 13:07:09 +0000 (14:07 +0100)]
Config.mk: switch to named tags (for stable branch)

Signed-off-by: Julien Grall <julien@xen.org>
18 months agodocs/arm: Document where Xen should be loaded in memory
Julien Grall [Tue, 24 Oct 2023 10:28:58 +0000 (11:28 +0100)]
docs/arm: Document where Xen should be loaded in memory

In commit 9d267c049d92 ("xen/arm64: Rework the memory layout"),
we decided to require Xen to be loaded below 2 TiB to simplify
the logic to enable the MMU. The limit was decided based on
how known platform boot plus some slack.

We had a recent report that this is not sufficient on the AVA
platform with a old firmware [1]. But the restriction is not
going to change in Xen 4.18. So document the limit clearly
in docs/misc/arm/booting.txt.

[1] https://lore.kernel.org/20231013122658.1270506-3-leo.yan@linaro.org

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoCHANGELOG.md: Set 4.18 release date and tag
Henry Wang [Mon, 23 Oct 2023 09:21:22 +0000 (17:21 +0800)]
CHANGELOG.md: Set 4.18 release date and tag

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
18 months agoCHANGELOG.md: Use "xenbits.xenproject.org" in links
Henry Wang [Mon, 23 Oct 2023 09:21:21 +0000 (17:21 +0800)]
CHANGELOG.md: Use "xenbits.xenproject.org" in links

Compared to "xenbits.xen.org", "xenbits.xenproject.org" appeared
later as a name, with the intention of becoming the canonical one.
Therefore, this commit unifies all the links to use "xenproject"
in the links.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
18 months agoCHANGELOG.md: Mention the MISRA-C improvement in 4.18 dev cycle
Henry Wang [Mon, 23 Oct 2023 09:21:20 +0000 (17:21 +0800)]
CHANGELOG.md: Mention the MISRA-C improvement in 4.18 dev cycle

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
18 months agox86: support data operand independent timing mode 4.18.0-rc4
Jan Beulich [Fri, 20 Oct 2023 13:50:05 +0000 (15:50 +0200)]
x86: support data operand independent timing mode

[1] specifies a long list of instructions which are intended to exhibit
timing behavior independent of the data they operate on. On certain
hardware this independence is optional, controlled by a bit in a new
MSR. Provide a command line option to control the mode Xen and its
guests are to operate in, with a build time control over the default.
Longer term we may want to allow guests to control this.

Since Arm64 supposedly also has such a control, put command line option
and Kconfig control in common files.

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html

Requested-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoCI: (More) Always pull base image when building a container
Andrew Cooper [Thu, 19 Oct 2023 13:56:26 +0000 (14:56 +0100)]
CI: (More) Always pull base image when building a container

Repeat c/s 26ecc08b98fc ("automation: Always pull base image when building a
container") for the other makefile we've got building containers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoiommu/vt-d: fix SAGAW capability parsing
Roger Pau Monne [Wed, 18 Oct 2023 16:07:33 +0000 (18:07 +0200)]
iommu/vt-d: fix SAGAW capability parsing

SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 and 5
level page tables respectively.  According to the Intel VT-d specification, an
IOMMU can report multiple SAGAW bits being set.

Commit 859d11b27912 claims to replace the open-coded find_first_set_bit(), but
it's actually replacing an open coded implementation to find the last set bit.
The change forces the used AGAW to the lowest supported by the IOMMU instead of
the highest one between 1 and 2.

Restore the previous SAGAW parsing by using fls() instead of
find_first_set_bit(), in order to get the highest (supported) AGAW to be used.

However there's a caveat related to the value the AW context entry field must
be set to when using passthrough mode:

"When the Translation-type (TT) field indicates pass-through processing (10b),
this field must be programmed to indicate the largest AGAW value supported by
hardware." [0]

Newer Intel IOMMU implementations support 5 level page tables for the IOMMU,
and signal such support in SAGAW bit 3.

Enabling 5 level paging support (AGAW 3) is too risky at this point in the Xen
4.18 release, so instead put a bodge to unconditionally disable passthough
mode if SAGAW has any bits greater than 2 set.  Ignore bit 0; it's reserved in
current specifications, but had a meaning in the past and is unlikely to be
reused in the future.

Note the message about unhandled SAGAW bits being set is printed
unconditionally, regardless of whether passthrough mode is enabled.  This is
done in order to easily notice IOMMU implementations with not yet supported
SAGAW values.

[0] Intel VT Directed Spec Rev 4.1

Fixes: 859d11b27912 ('VT-d: prune SAGAW recognition')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoiommu: fix quarantine mode command line documentation
Roger Pau Monne [Thu, 19 Oct 2023 10:45:51 +0000 (12:45 +0200)]
iommu: fix quarantine mode command line documentation

With the addition of per-device quarantine page tables the sink page is now
exclusive for each device, and thus writable.  Update the documentation to
reflect the current implementation.

Fixes: 14dd241aad8a ('IOMMU/x86: use per-device page tables for quarantining')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoEFI: reduce memory map logging level
Jan Beulich [Thu, 19 Oct 2023 12:08:22 +0000 (14:08 +0200)]
EFI: reduce memory map logging level

With the release build default now being INFO, the typically long EFI
memory map will want logging at DEBUG level only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
18 months agoautomation: extract QEMU log in relevant hardware tests
Marek Marczykowski-Górecki [Fri, 6 Oct 2023 02:05:19 +0000 (04:05 +0200)]
automation: extract QEMU log in relevant hardware tests

Let it be printed to the console too. QEMU and Linux messages have
different enough format that it should be possible to distinguish them.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoautomation: improve checking for MSI/MSI-X in PCI passthrough tests
Marek Marczykowski-Górecki [Fri, 6 Oct 2023 02:05:18 +0000 (04:05 +0200)]
automation: improve checking for MSI/MSI-X in PCI passthrough tests

Checking /proc/interrupts is unreliable because different drivers set
different names there. Install pciutils and use lspci instead.
In fact, the /proc/interrupts content was confusing enough that
adl-pci-hvm had it wrong (MSI-X is in use there). Fix this too.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoautomation: cleanup test alpine install
Marek Marczykowski-Górecki [Fri, 6 Oct 2023 02:05:17 +0000 (04:05 +0200)]
automation: cleanup test alpine install

Remove parts of initramfs for the test system (domU, and in few tests
dom0 too) that are not not working and are not really needed in this
simple system.

This makes the test log much lighter on misleading error messages.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoautomation: hide timeout countdown in log
Marek Marczykowski-Górecki [Fri, 6 Oct 2023 02:05:16 +0000 (04:05 +0200)]
automation: hide timeout countdown in log

grep+sleep message every 1s makes job log unnecessary hard to read.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoautomation: include real-time view of the domU console log too
Marek Marczykowski-Górecki [Fri, 6 Oct 2023 02:05:15 +0000 (04:05 +0200)]
automation: include real-time view of the domU console log too

Passthrough domU console log to the serial console in real time, not
only after the test. First of all, this gives domU console also in case
of test failure. But also, allows correlation between domU and dom0 or
Xen messages.

To avoid ambiguity, add log prefix with 'sed'.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoconsole: make input work again for pv-shim
Manuel Bouyer [Thu, 19 Oct 2023 07:54:50 +0000 (09:54 +0200)]
console: make input work again for pv-shim

The use of rcu_lock_domain_by_id() right in switch_serial_input() makes
assumptions about domain IDs which don't hold when in shim mode: The
sole (initial) domain there has a non-zero ID. Obtain the real domain ID
in that case (generalized as get_initial_domain_id() returns zero when
not in shim mode).

Note that console_input_domain() isn't altered, for not being used when
in shim mode (or more generally on x86).

Fixes: c2581c58bec9 ("xen/console: skip switching serial input to non existing domains")
Signed-off-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/pvh: fix identity mapping of low 1MB
Roger Pau Monné [Thu, 19 Oct 2023 07:52:43 +0000 (09:52 +0200)]
x86/pvh: fix identity mapping of low 1MB

The mapping of memory regions below the 1MB mark was all done by the PVH dom0
builder code, causing the region to be avoided by the arch specific IOMMU
hardware domain initialization code.  That lead to the IOMMU being enabled
without reserved regions in the low 1MB identity mapped in the p2m for PVH
hardware domains.  Firmware which happens to be missing RMRR/IVMD ranges
describing E820 reserved regions in the low 1MB would transiently trigger IOMMU
faults until the p2m is populated by the PVH dom0 builder:

AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb380 flags 0x20 RW
AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb340 flags 0
AMD-Vi: IO_PAGE_FAULT: 0000:00:13.2 d0 addr 00000000000ea1c0 flags 0
AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb480 flags 0x20 RW
AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb080 flags 0x20 RW
AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb400 flags 0
AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb040 flags 0

Those errors have been observed on the osstest pinot{0,1} boxes (AMD Fam15h
Opteron(tm) Processor 3350 HE).

Rely on the IOMMU arch init code to create any identity mappings for reserved
regions in the low 1MB range (like it already does for reserved regions
elsewhere), and leave the mapping of any holes to be performed by the dom0
builder code.

Fixes: 6b4f6a31ace1 ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/microcode: Disable microcode update handler if DIS_MCU_UPDATE is set
Alejandro Vallejo [Wed, 30 Aug 2023 15:53:26 +0000 (16:53 +0100)]
x86/microcode: Disable microcode update handler if DIS_MCU_UPDATE is set

If IA32_MSR_MCU_CONTROL exists then it's possible a CPU may be unable to
perform microcode updates. This is controlled through the DIS_MCU_LOAD bit
and is intended for baremetal clouds where the owner may not trust the
tenant to choose the microcode version in use. If we notice that bit being
set then simply disable the "apply_microcode" handler so we can't even try
to perform update (as it's known to be silently dropped).

While at it, remove the Intel family check, as microcode loading is
supported on every Intel64 CPU.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86: Read MSR_ARCH_CAPS immediately after early_microcode_init()
Alejandro Vallejo [Wed, 30 Aug 2023 15:53:25 +0000 (16:53 +0100)]
x86: Read MSR_ARCH_CAPS immediately after early_microcode_init()

Move MSR_ARCH_CAPS read code from tsx_init() to early_cpu_init(). Because
microcode updates might make them that MSR to appear/have different values
we also must reload it after a microcode update in early_microcode_init().

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/microcode: Ignore microcode loading interface for revision = -1
Alejandro Vallejo [Wed, 30 Aug 2023 15:53:24 +0000 (16:53 +0100)]
x86/microcode: Ignore microcode loading interface for revision = -1

Some hypervisors report ~0 as the microcode revision to mean "don't issue
microcode updates". Ignore the microcode loading interface in that case.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/microcode: WARN->INFO for the "no ucode loading" log message
Alejandro Vallejo [Wed, 30 Aug 2023 15:53:23 +0000 (16:53 +0100)]
x86/microcode: WARN->INFO for the "no ucode loading" log message

Currently there's a printk statement triggered when no ucode loading
facilities are discovered. This statement should have severity INFO rather
than WARNING because it's not reporting anything wrong. Warnings ought
to be reserved for recoverable system errors.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agotools/pygrub: Fix pygrub's --entry flag for python3
Alejandro Vallejo [Wed, 11 Oct 2023 12:25:20 +0000 (13:25 +0100)]
tools/pygrub: Fix pygrub's --entry flag for python3

string.atoi() has been deprecated since Python 2.0, has a big scary warning
in the python2.7 docs and is absent from python3 altogether. int() does the
same thing and is compatible with both.

See https://docs.python.org/2/library/string.html#string.atoi:

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/amd: Address AMD erratum #1485
Alejandro Vallejo [Fri, 13 Oct 2023 15:38:01 +0000 (16:38 +0100)]
x86/amd: Address AMD erratum #1485

This erratum has been observed to cause #UD exceptions.

Fix adapted off Linux's mailing list:
  https://lore.kernel.org/lkml/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com/T/#u

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoxen/pdx: Make CONFIG_PDX_COMPRESSION a common Kconfig option
Alejandro Vallejo [Tue, 8 Aug 2023 13:02:20 +0000 (14:02 +0100)]
xen/pdx: Make CONFIG_PDX_COMPRESSION a common Kconfig option

Adds a new compile-time flag to allow disabling PDX compression and
compiles out compression-related code/data. It also shorts the pdx<->pfn
conversion macros and creates stubs for masking functions.

While at it, removes the old arch-defined CONFIG_HAS_PDX flag.  Despite the
illusion of choice, it was not optional.

There are ARM and PPC platforms with sparse RAM banks - leave compression
active by default there.  However, there are no known production x86 systems
with sparse RAM banks, so disable compression.  RISC-V platforms are unknown
right now.  These decisions can be revisited if our understanding changes.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoxen/arm: Check return code from recursive calls to scan_pfdt_node()
Michal Orzel [Mon, 16 Oct 2023 12:45:59 +0000 (14:45 +0200)]
xen/arm: Check return code from recursive calls to scan_pfdt_node()

At the moment, we do not check a return code from scan_pfdt_node()
called recursively. This means that any issue that may occur while
parsing and copying the passthrough nodes is hidden and Xen continues
to boot a domain despite errors. This may lead to incorrect device tree
generation and various guest issues (e.g. trap on attempt to access MMIO
not mapped in P2M). Fix it.

Fixes: 669ecdf8d6cd ("xen/arm: copy dtb fragment to guest dtb")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agocxenstored: wait until after reset to notify dom0less domains
George Dunlap [Fri, 13 Oct 2023 23:06:24 +0000 (16:06 -0700)]
cxenstored: wait until after reset to notify dom0less domains

Commit fc2b57c9a ("xenstored: send an evtchn notification on
introduce_domain") introduced the sending of an event channel to the
guest when first introduced, so that dom0less domains waiting for the
connection would know that xenstore was ready to use.

Unfortunately, it was introduced in introduce_domain(), which 1) is
called by other functions, where such functionality is unneeded, and
2) after the main XS_INTRODUCE call, calls domain_conn_reset().  This
introduces a race condition, whereby if xenstored is delayed, a domain
can wake up, send messages to the buffer, only to have them deleted by
xenstore before finishing its processing of the XS_INTRODUCE message.

Move the connect-and-notfy call into do_introduce() instead, after the
domain_conn_rest(); predicated on the state being in the
XENSTORE_RECONNECT state.

(We don't need to check for "restoring", since that value is always
passed as "false" from do_domain_introduce()).

Also take the opportunity to add a missing wmb barrier after resetting
the indexes of the ring in domain_conn_reset.

This change will also remove an extra event channel notification for
dom0 (because the notification is now done by do_introduce which is not
called for dom0.) The extra dom0 event channel notification was only
introduced by fc2b57c9a and was never present before. It is not needed
because dom0 is the one to tell xenstored the connection parameters, so
dom0 has to know that the ring page is setup correctly by the time
xenstored starts looking at it. It is dom0 that performs the ring page
init.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
CC: jgross@suse.com
CC: julien@xen.org
CC: wl@xen.org
18 months agoget_maintainer: Add THE REST for sections with reviewers only
Anthony PERARD [Tue, 17 Oct 2023 07:53:34 +0000 (09:53 +0200)]
get_maintainer: Add THE REST for sections with reviewers only

Sometime, a contributer would like to be CCed on part of the changes,
and it could happen that we end-up with a section that doesn't have
any maintainer, but a Ack from a maintainer would still be needed.

Rework get_maintainer so if there's no maintainers beside THE REST, it
doesn't drop THE REST emails.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoxen/irq: address violations of MISRA C:2012 Rule 8.2
Federico Serafini [Tue, 17 Oct 2023 07:52:51 +0000 (09:52 +0200)]
xen/irq: address violations of MISRA C:2012 Rule 8.2

Add missing parameter names. No functional change.

Signed-off-by: Federico Serafini <federico.serafini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/paging: address a violation of MISRA C:2012 Rule 8.3
Federico Serafini [Tue, 17 Oct 2023 07:52:18 +0000 (09:52 +0200)]
x86/paging: address a violation of MISRA C:2012 Rule 8.3

Make function declaration and definition consistent.
No functional change.

Signed-off-by: Federico Serafini <federico.serafini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agox86/mem_access: address violations of MISRA C:2012 Rule 8.3
Federico Serafini [Tue, 17 Oct 2023 07:51:07 +0000 (09:51 +0200)]
x86/mem_access: address violations of MISRA C:2012 Rule 8.3

Make function declarations and definitions consistent.
No functional change.

Signed-off-by: Federico Serafini <federico.serafini@bugseng.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agoxenalyze: Reduce warnings about leaving a vcpu in INIT
George Dunlap [Mon, 9 Oct 2023 10:19:57 +0000 (11:19 +0100)]
xenalyze: Reduce warnings about leaving a vcpu in INIT

We warn when we see data for a vcpu moving into a non-RUNNING state,
just so that people know why we're ignoring it.  On full traces, this
happens only once.  However, if the trace was limited to a subset of
pcpus, then this will happen every time the domain in question is
woken on that pcpu.

Add a 'delayed_init' flag to the vcpu struct to indicate when a vcpu
has experienced a delayed init.  Print a warning message once when
entering the state, and once when leaving it.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
18 months agoxenalyze: Fix interrupt EIP reporting
George Dunlap [Fri, 6 Oct 2023 15:54:10 +0000 (16:54 +0100)]
xenalyze: Fix interrupt EIP reporting

EIP lists are generalized across several use cases.  For many of them,
it make sense to have a cycle per sample; but not really for interrupt
EIP lists.  For this reason, it normally just passes 0 as for the tsc
value, which will in turn down at the bottom of update_cycles(),
update only the summary.event_count, but nothing else.

The dump_eip() function attempted to handle this by calling the generic
cycle print handler if the summary contained *any* cycles, and by collecting
and printing its own stats, based solely on counts, if not.

Unfortunately, it used the wrong element for this: it collected the
total from samples.count rather samples.event_count; in the case that
there are no cycles, this will always be zero.  It then divided by
this zero value.  This results in output that looked like this:

```
  ffff89d29656                                             :        0  -nan%
  ffff89d298b6                                             :        0  -nan%
  ffff89d298c0                                             :        0  -nan%
```

It's better than nothing, but a lot less informative than one would
like.

Use event_count rather than count for collecting the total, and the
reporting when there are no cycles in the summary information.  This results
in output that looks like this:

```
   ffff89d29656                                             :        2  1.21%
   ffff89d298b6                                             :        1  0.61%
   ffff89d298c0                                             :        1  0.61%
```

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
18 months agoxenalyze: Don't expect an HVM_HANDLER trace for PAUSE vmexits
George Dunlap [Fri, 6 Oct 2023 15:22:34 +0000 (16:22 +0100)]
xenalyze: Don't expect an HVM_HANDLER trace for PAUSE vmexits

Neither vmx nor svm trace anything, nor is there anything obvious
worth tracing.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
18 months agoxenalyze: AMD's VMEXIT_VINTR doesn't need a trace record
George Dunlap [Thu, 5 Oct 2023 16:26:38 +0000 (17:26 +0100)]
xenalyze: AMD's VMEXIT_VINTR doesn't need a trace record

Just like Intel's PENDING_VIRT_INTR, AMD's VINTR doesn't need an HVM
trace record.  Expect that.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
18 months agoxenalyze: Only accumulate data from one vmexit without a handler
George Dunlap [Fri, 6 Oct 2023 14:55:02 +0000 (15:55 +0100)]
xenalyze: Only accumulate data from one vmexit without a handler

If a vmentry/exit arc unexpectedly doesn't have a handler, we throw an
error, and then log the information under HVM event 0; thus those
particular traces within the vmexit reason will have stats gathered,
and will show up with "(no handler)".  This is useful in the event
that there are unusual paths through the hypervisor which don't have
trace points.

However, if there are more than one of these, then although we detect and warn
that this is happening, we subsequently continue to stuff data about all exits
into that one exit, even though we only show it in one place.

Instead, refator things to only allow a single exit reason to be
accumulated into any given event.

Also put a comment explaining what's going on, and how to fix it.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
18 months agoMAINTAINERS: Make Bob Eschleman a reviewer
George Dunlap [Wed, 4 Oct 2023 15:12:41 +0000 (16:12 +0100)]
MAINTAINERS: Make Bob Eschleman a reviewer

Following a conversation with Bob Eschleman, it was agreed that
Bobby would prefer to return to being a Reviewer.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
18 months agoxen/arm: vtimer: Don't read/use the secure physical timer interrupt for ACPI
Julien Grall [Thu, 5 Oct 2023 16:52:41 +0000 (17:52 +0100)]
xen/arm: vtimer: Don't read/use the secure physical timer interrupt for ACPI

Per ACPI 6.5 section 5.2.25 ("Generic Timer Description Table (GTDT)"),
the fields "Secure EL1 Timer GSIV/Flags" are optional and an OS running
in non-secure world is meant to ignore the values.

However, Xen is trying to reserve the value. The ACPI tables for Graviton
2 metal instances will provide the value 0 which is not a correct PPI
(PPIs start at 16) and would have in fact been already reserved by Xen
as this is an SGI. Xen will hit the BUG() and panic().

For the Device-Tree case, I couldn't find a statement suggesting
that the secure physical timer interrupt  is ignored. In fact, I have
found some code in Linux using it as a fallback. That said, it should
never be used.

As I am not aware of any issue when booting using Device-Tree, the
physical timer interrupt is only ignored for ACPI.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
18 months agodocs/misra: add deviations.rst to document additional deviations. 4.18.0-rc3
Nicola Vetrini [Fri, 13 Oct 2023 10:14:53 +0000 (12:14 +0200)]
docs/misra: add deviations.rst to document additional deviations.

This file contains the deviation that are not marked by
a deviation comment, as specified in
docs/misra/documenting-violations.rst.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
18 months agoxen/arm: Validate generic timer frequency
Michal Orzel [Thu, 28 Sep 2023 12:34:35 +0000 (14:34 +0200)]
xen/arm: Validate generic timer frequency

Generic timer dt node property "clock-frequency" (refer Linux dt binding
Documentation/devicetree/bindings/timer/arm,arch_timer.yaml) is used to
override the incorrect value set by firmware in CNTFRQ_EL0. If the value
of this property is 0 (i.e. by mistake), Xen would continue to work and
use the value from the sysreg instead. The logic is thus incorrect and
results in inconsistency when creating timer node for domUs:
 - libxl domUs: clock_frequency member of struct xen_arch_domainconfig
   is set to 0 and libxl ignores setting the "clock-frequency" property,
 - dom0less domUs: "clock-frequency" property is taken from dt_host and
   thus set to 0.

Property "clock-frequency" is used to override the value from sysreg,
so if it is also invalid, there is nothing we can do and we shall panic
to let user know about incorrect setting. Going even further, we operate
under assumption that the frequency must be at least 1KHz (i.e. cpu_khz
not 0) in order for Xen to boot. Therefore, introduce a new helper
validate_timer_frequency() to verify this assumption and use it to check
the frequency obtained either from dt property or from sysreg.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
19 months agox86/pv: Correct the auditing of guest breakpoint addresses
Andrew Cooper [Tue, 19 Sep 2023 11:13:50 +0000 (12:13 +0100)]
x86/pv: Correct the auditing of guest breakpoint addresses

The use of access_ok() is buggy, because it permits access to the compat
translation area.  64bit PV guests don't use the XLAT area, but on AMD
hardware, the DBEXT feature allows a breakpoint to match up to a 4G aligned
region, allowing the breakpoint to reach outside of the XLAT area.

Prior to c/s cda16c1bb223 ("x86: mirror compat argument translation area for
32-bit PV"), the live GDT was within 4G of the XLAT area.

All together, this allowed a malicious 64bit PV guest on AMD hardware to place
a breakpoint over the live GDT, and trigger a #DB livelock (CVE-2015-8104).

Introduce breakpoint_addr_ok() and explain why __addr_ok() happens to be an
appropriate check in this case.

For Xen 4.14 and later, this is a latent bug because the XLAT area has moved
to be on its own with nothing interesting adjacent.  For Xen 4.13 and older on
AMD hardware, this fixes a PV-trigger-able DoS.

This is part of XSA-444 / CVE-2023-34328.

Fixes: 65e355490817 ("x86/PV: support data breakpoint extension registers")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
19 months agox86/svm: Fix asymmetry with AMD DR MASK context switching
Andrew Cooper [Thu, 21 Sep 2023 16:26:23 +0000 (17:26 +0100)]
x86/svm: Fix asymmetry with AMD DR MASK context switching

The handling of MSR_DR{0..3}_MASK is asymmetric between PV and HVM guests.

HVM guests context switch in based on the guest view of DBEXT, whereas PV
guest switch in base on the host capability.  Both guest types leave the
context dirty for the next vCPU.

This leads to the following issue:

 * PV or HVM vCPU has debugging active (%dr7 + mask)
 * Switch out deactivates %dr7 but leaves other state stale in hardware
 * HVM vCPU with debugging activate but can't see DBEXT is switched in
 * Switch in loads %dr7 but leaves the mask MSRs alone

Now, the HVM vCPU is operating in the context of the prior vCPU's mask MSR,
and furthermore in a case where it genuinely expects there to be no masking
MSRs.

As a stopgap, adjust the HVM path to switch in/out the masks based on host
capabilities rather than guest visibility (i.e. like the PV path).  Adjustment
of the of the intercepts still needs to be dependent on the guest visibility
of DBEXT.

This is part of XSA-444 / CVE-2023-34327

Fixes: c097f54912d3 ("x86/SVM: support data breakpoint extension registers")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
19 months agolibxl: limit bootloader execution in restricted mode
Roger Pau Monne [Thu, 28 Sep 2023 10:22:35 +0000 (12:22 +0200)]
libxl: limit bootloader execution in restricted mode

Introduce a timeout for bootloader execution when running in restricted mode.

Allow overwriting the default time out with an environment provided value.

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
19 months agolibxl: add support for running bootloader in restricted mode
Roger Pau Monne [Mon, 25 Sep 2023 12:30:20 +0000 (14:30 +0200)]
libxl: add support for running bootloader in restricted mode

Much like the device model depriv mode, add the same kind of support for the
bootloader.  Such feature allows passing a UID as a parameter for the
bootloader to run as, together with the bootloader itself taking the necessary
actions to isolate.

Note that the user to run the bootloader as must have the right permissions to
access the guest disk image (in read mode only), and that the bootloader will
be run in non-interactive mode when restricted.

If enabled bootloader restrict mode will attempt to re-use the user(s) from the
QEMU depriv implementation if no user is provided on the configuration file or
the environment.  See docs/features/qemu-deprivilege.pandoc for more
information about how to setup those users.

Bootloader restrict mode is not enabled by default as it requires certain
setup to be done first (setup of the user(s) to use in restrict mode).

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
19 months agotools/pygrub: Deprivilege pygrub
Alejandro Vallejo [Mon, 25 Sep 2023 17:32:25 +0000 (18:32 +0100)]
tools/pygrub: Deprivilege pygrub

Introduce a --runas=<uid> flag to deprivilege pygrub on Linux and *BSDs. It
also implicitly creates a chroot env where it drops a deprivileged forked
process. The chroot itself is cleaned up at the end.

If the --runas arg is present, then pygrub forks, leaving the child to
deprivilege itself, and waiting for it to complete. When the child exists,
the parent performs cleanup and exits with the same error code.

This is roughly what the child does:
  1. Initialize libfsimage (this loads every .so in memory so the chroot
     can avoid bind-mounting /{,usr}/lib*
  2. Create a temporary empty chroot directory
  3. Mount tmpfs in it
  4. Bind mount the disk inside, because libfsimage expects a path, not a
     file descriptor.
  5. Remount the root tmpfs to be stricter (ro,nosuid,nodev)
  6. Set RLIMIT_FSIZE to a sensibly high amount (128 MiB)
  7. Depriv gid, groups and uid

With this scheme in place, the "output" files are writable (up to
RLIMIT_FSIZE octets) and the exposed filesystem is immutable and contains
the single only file we can't easily get rid of (the disk).

If running on Linux, the child process also unshares mount, IPC, and
network namespaces before dropping its privileges.

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
19 months agotools/libfsimage: Export a new function to preload all plugins
Alejandro Vallejo [Mon, 25 Sep 2023 17:32:24 +0000 (18:32 +0100)]
tools/libfsimage: Export a new function to preload all plugins

This is work required in order to let pygrub operate in highly deprivileged
chroot mode. This patch adds a function that preloads every plugin, hence
ensuring that a on function exit, every shared library is loaded in memory.

The new "init" function is supposed to be used before depriv, but that's
fine because it's not acting on untrusted data.

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
19 months agotools/pygrub: Open the output files earlier
Alejandro Vallejo [Mon, 25 Sep 2023 17:32:23 +0000 (18:32 +0100)]
tools/pygrub: Open the output files earlier

This patch allows pygrub to get ahold of every RW file descriptor it needs
early on. A later patch will clamp the filesystem it can access so it can't
obtain any others.

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
19 months agotools/pygrub: Small refactors
Alejandro Vallejo [Mon, 25 Sep 2023 17:32:22 +0000 (18:32 +0100)]
tools/pygrub: Small refactors

Small tidy up to ensure output_directory always has a trailing '/' to ease
concatenating paths and that `output` can only be a filename or None.

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
19 months agotools/pygrub: Remove unnecessary hypercall
Alejandro Vallejo [Mon, 25 Sep 2023 17:32:21 +0000 (18:32 +0100)]
tools/pygrub: Remove unnecessary hypercall

There's a hypercall being issued in order to determine whether PV64 is
supported, but since Xen 4.3 that's strictly true so it's not required.

Plus, this way we can avoid mapping the privcmd interface altogether in the
depriv pygrub.

This is part of XSA-443 / CVE-2023-34325

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>