xenbits.xensource.com Git - xen.git/log
21 months ago vpci/header: cope with devices not having vpci allocated
Roger Pau Monné [Mon, 17 Jul 2023 06:32:34 +0000 (08:32 +0200)]
vpci/header: cope with devices not having vpci allocated

When traversing the list of PCI devices assigned to a domain, cope with
some of them not having the vpci struct allocated. It should be
possible for the hardware domain to have read-only devices assigned
that are not handled by vPCI; such support will be added by further
patches.
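
As a rough illustration of the traversal described above, the sketch below walks a device list and skips entries whose vpci pointer is NULL. The structures and the count_bars() helper are hypothetical, simplified stand-ins and are not Xen's actual definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Xen's structures. */
struct vpci { unsigned int nr_bars; };
struct pci_dev {
    struct pci_dev *next;   /* next device assigned to the same domain */
    struct vpci *vpci;      /* NULL when the device is not handled by vPCI */
};

/* Sum a per-device value, skipping devices that have no vpci allocated. */
static unsigned int count_bars(const struct pci_dev *head)
{
    unsigned int total = 0;

    for ( const struct pci_dev *pdev = head; pdev; pdev = pdev->next )
    {
        if ( !pdev->vpci )
            continue;       /* e.g. a read-only device: nothing to inspect */
        total += pdev->vpci->nr_bars;
    }

    return total;
}
```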

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ee045f3a4a6dddb09f5aa96a50cceaae97d3245f
master date: 2023-05-26 09:18:37 +0200

21 months ago tools: convert bitfields to unsigned type
Olaf Hering [Mon, 17 Jul 2023 06:32:19 +0000 (08:32 +0200)]
tools: convert bitfields to unsigned type

clang complains about the signed type:

implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
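
A minimal sketch of why the signedness matters, assuming the usual two's-complement targets: a signed one-bit field cannot represent 1, while an unsigned one can. The struct and function names are hypothetical and only serve the illustration.

```c
/* Hypothetical structs; only the bit-field signedness differs. */
struct flags_signed   { signed int   on : 1; };
struct flags_unsigned { unsigned int on : 1; };

/* A signed one-bit field has no room for the value 1. */
static int store_signed(int v)
{
    struct flags_signed f = { 0 };

    f.on = v;
    return f.on;
}

/* An unsigned one-bit field holds 0 and 1, as intended for a flag. */
static int store_unsigned(int v)
{
    struct flags_unsigned f = { 0 };

    f.on = v;
    return f.on;
}
```

Code that later tests such a field with `== 1` only behaves as expected with the unsigned variant.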

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Backport: Dropped the libxenvchan change, for the original commit saying

"The potential ABI change in libxenvchan is covered by the Xen version
 based SONAME."

which won't hold on stable trees.
master commit: 99ab02f63ea813f2e467a39a7736bf460a3f3495
master date: 2023-05-16 20:03:02 +0100

23 months ago pci: fix pci_get_pdev_by_domain() to always account for the segment
Roger Pau Monné [Tue, 23 May 2023 13:03:41 +0000 (15:03 +0200)]
pci: fix pci_get_pdev_by_domain() to always account for the segment

When a domain parameter is provided to pci_get_pdev_by_domain() the
search function would match against bus and devfn, without taking the
segment into account.

Fix this and also account for the passed segment.
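
The corrected matching can be pictured with the sketch below, which compares segment, bus and devfn together. The struct pdev record and find_pdev() are hypothetical simplifications, not the actual Xen lookup code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device record: segment, bus, device/function. */
struct pdev { uint16_t seg; uint8_t bus; uint8_t devfn; };

static const struct pdev *find_pdev(const struct pdev *devs, size_t n,
                                    uint16_t seg, uint8_t bus, uint8_t devfn)
{
    for ( size_t i = 0; i < n; i++ )
        if ( devs[i].seg == seg &&      /* the comparison that was missing */
             devs[i].bus == bus && devs[i].devfn == devfn )
            return &devs[i];

    return NULL;
}
```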

Fixes: 8cf6e0738906 ('PCI: simplify (and thus correct) pci_get_pdev{,_by_domain}()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c7908869ac26961a3919491705e521179ad3fc0e
master date: 2023-05-22 16:11:55 +0200

23 months ago sched/null: avoid crash after failed domU creation
Stewart Hildebrand [Tue, 23 May 2023 13:03:19 +0000 (15:03 +0200)]
sched/null: avoid crash after failed domU creation

When domU creation fails, there is a corner case that may
lead to a crash in the null scheduler when running a debug build of Xen.

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
(XEN) ****************************************

The events leading to the crash are:

* null_unit_insert() was invoked with the unit offline. Since the unit was
  offline, unit_assign() was not called, and null_unit_insert() returned.
* Later during domain creation, the unit was onlined
* Eventually, domain creation failed due to bad configuration
* null_unit_remove() was invoked with the unit still online. Since the unit was
  online, it called unit_deassign() and triggered an ASSERT.

To fix this, only call unit_deassign() when npc->unit is non-NULL in
null_unit_remove.
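
The guard can be sketched as follows, using hypothetical, simplified types in place of the scheduler's real structures; unit_remove() is an illustrative name, not the function in common/sched/null.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the null scheduler's state. */
struct sched_unit { int id; };
struct null_pcpu { struct sched_unit *unit; };  /* unit assigned to this pCPU */

/* Deassign only if something was assigned; returns true if it did so. */
static bool unit_remove(struct null_pcpu *npc, const struct sched_unit *unit)
{
    if ( npc->unit == NULL )
        return false;   /* insert ran while offline: nothing to undo */

    if ( npc->unit != unit )
        return false;   /* the condition the quoted assertion checks */

    npc->unit = NULL;
    return true;
}
```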

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
master commit: c2eae2614c8f04e384cd3334c3f06f31a6cb5f41
master date: 2023-05-22 16:11:40 +0200

23 months ago iommu/amd-vi: fix assert comparing boolean to enum
Roger Pau Monné [Tue, 23 May 2023 13:02:50 +0000 (15:02 +0200)]
iommu/amd-vi: fix assert comparing boolean to enum

Or else when iommu_intremap is set to iommu_intremap_full the assert
triggers.
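
A sketch of this class of bug, assuming a three-state setting like the one named above: comparing a boolean directly against an enum value fails once the enum takes a value other than 0 or 1. The enum and helper names are hypothetical, and the "fixed" form is only one plausible way to write such a check.

```c
#include <stdbool.h>

/* Assumed to mirror a three-state off/restricted/full setting. */
enum intremap { intremap_off, intremap_restricted, intremap_full };

/* Buggy form: compares a boolean against the raw enum value. */
static bool check_buggy(bool expected, enum intremap mode)
{
    return expected == mode;
}

/* Safer form: reduce the enum to a boolean before comparing. */
static bool check_fixed(bool expected, enum intremap mode)
{
    return expected == (mode != intremap_off);
}
```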

Fixes: 1ba66a870eba ('AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4c507d8a6b6e8be90881a335b0a66eb28e0f7737
master date: 2023-05-12 09:35:36 +0200

23 months ago docs/man: fix xenstore-write synopsis
Yann Dirson [Tue, 23 May 2023 13:02:34 +0000 (15:02 +0200)]
docs/man: fix xenstore-write synopsis

Reported-by: zithro <slack@rabbit.lu>
Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 8b1ac353b4db7c5bb2f82cb6afee9cc641e756a4
master date: 2023-05-09 10:37:29 +0100

23 months ago ns16550: enable memory decoding on MMIO-based PCI console card
Marek Marczykowski-Górecki [Tue, 23 May 2023 13:02:09 +0000 (15:02 +0200)]
ns16550: enable memory decoding on MMIO-based PCI console card

pci_serial_early_init() enables PCI_COMMAND_IO for IO-based UART
devices; also set PCI_COMMAND_MEMORY for MMIO-based UART devices.
Note the MMIO-based devices in practice need a "pci" sub-option,
otherwise a few parameters are not initialized (including bar_idx,
reg_shift, reg_width etc). The "pci" sub-option is not supposed to be used with
explicit BDF, so do not key setting PCI_COMMAND_MEMORY on explicit BDF
being set. Contrary to the IO-based UART, pci_serial_early_init() will
not attempt to set BAR0 address, even if user provided io_base manually
- in most cases, those are with an offset and the current cmdline syntax
doesn't allow expressing it. Due to this, enable PCI_COMMAND_MEMORY only
if uart->bar is already populated. In similar spirit, this patch does
not support setting BAR0 of the bridge.
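
The decision described above might look roughly like the sketch below. The two PCI command register bits use their standard values; struct uart and early_command_bits() are hypothetical simplifications rather than the driver's actual code.

```c
#include <stdint.h>

#define PCI_COMMAND_IO      0x1  /* enable I/O space decoding */
#define PCI_COMMAND_MEMORY  0x2  /* enable memory space decoding */

/* Hypothetical, reduced view of the UART state used by this sketch. */
struct uart { uint32_t bar; int is_mmio; };

/* Return the command register value to write for this UART. */
static uint16_t early_command_bits(const struct uart *u, uint16_t cmd)
{
    if ( !u->is_mmio )
        return cmd | PCI_COMMAND_IO;

    if ( u->bar )       /* only when the BAR is already populated */
        return cmd | PCI_COMMAND_MEMORY;

    return cmd;
}
```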

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: a16fb78515d54be95f81c0d1c0a3a7b954a54d0a
master date: 2023-05-08 14:15:38 +0200

23 months ago tools/libs/guest: assist gcc13's realloc analyzer
Olaf Hering [Tue, 23 May 2023 13:01:53 +0000 (15:01 +0200)]
tools/libs/guest: assist gcc13's realloc analyzer

gcc13 fails to track the allocated memory in backup_ptes:

xg_offline_page.c: In function 'backup_ptes':
xg_offline_page.c:191:13: error: pointer 'orig' may be used after 'realloc' [-Werror=use-after-free]
  191 |             free(orig);

Assist the analyzer by slightly rearranging the code:
In case realloc succeeds, the previous allocation is either extended
or released internally. In case realloc fails, the previous allocation
is left unchanged. Return an error in this case; the caller will
release the currently allocated memory in its error path.
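
The realloc pattern described above can be sketched as follows. struct backup and backup_grow() are hypothetical names standing in for the code in xg_offline_page.c; the sketch assumes new_max is non-zero.

```c
#include <stdlib.h>

/* Hypothetical growable array, standing in for the PTE backup buffer. */
struct backup { unsigned long *entries; size_t max; };

/* Grow the buffer; on failure the old allocation stays owned by 'b'. */
static int backup_grow(struct backup *b, size_t new_max)
{
    unsigned long *tmp = realloc(b->entries, new_max * sizeof(*tmp));

    if ( !tmp )
        return -1;      /* caller frees b->entries in its error path */

    b->entries = tmp;   /* old block was extended or released by realloc */
    b->max = new_max;
    return 0;
}
```

Because the old pointer is never touched after a successful realloc(), there is nothing for the analyzer to flag as a use after free.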

http://bugzilla.suse.com/show_bug.cgi?id=1210570

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Compile-tested-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 99a9c3d7141063ae3f357892c6181cfa3be8a280
master date: 2023-05-03 15:06:41 +0200

23 months ago x86/mm: replace bogus assertion in paging_log_dirty_op()
Jan Beulich [Tue, 23 May 2023 13:01:24 +0000 (15:01 +0200)]
x86/mm: replace bogus assertion in paging_log_dirty_op()

While I was the one to introduce it, I don't think it is correct: A
bogus continuation call issued by a tool stack domain may find another
continuation in progress. IOW we've been asserting caller controlled
state (which is reachable only via a domctl), and the early (lock-less)
check in paging_domctl() helps in a limited way only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 0956aa2219745a198bb6a0a99e2108a3c09b280e
master date: 2023-05-03 13:38:30 +0200

23 months ago xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM
Juergen Gross [Tue, 23 May 2023 13:00:59 +0000 (15:00 +0200)]
xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.
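
A sketch of the intended behaviour, with a hypothetical permission hook in place of xsm_getdomaininfo(): a denied domain is skipped wherever it appears in the scan, including in the last position.

```c
#include <stddef.h>

/* Hypothetical permission hook: 0 on success, negative on denial. */
typedef int (*access_check_fn)(int domid);

/* Copy permitted domain IDs into 'out'; denied domains are skipped. */
static size_t list_domains(const int *domids, size_t n, access_check_fn check,
                           int *out, size_t max)
{
    size_t count = 0;

    for ( size_t i = 0; i < n && count < max; i++ )
    {
        if ( check(domids[i]) != 0 )
            continue;   /* not an error, even for the last domain scanned */
        out[count++] = domids[i];
    }

    return count;
}

/* Example policy for the sketch: odd domain IDs are not accessible. */
static int deny_odd(int domid)
{
    return (domid & 1) ? -1 : 0;
}
```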

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b033eddc9779109c06a26936321d27a2ef4e088b
master date: 2023-05-02 12:04:58 +0200

23 months ago x86/msi: clear initial MSI-X state on boot
Marek Marczykowski-Górecki [Tue, 23 May 2023 13:00:30 +0000 (15:00 +0200)]
x86/msi: clear initial MSI-X state on boot

Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Jason reports on his machine MASKALL persists through a
warm reboot, but is cleared on cold boot. Xen relies on initial state
being MASKALL clear. In particular, pci_reset_msix_state() assumes that if
MASKALL is set, it was Xen setting it due to msix->host_maskall or
msix->guest_maskall. Clearing just MASKALL is risky if ENABLE is set,
so clear them both.
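
In terms of the MSI-X Message Control register, clearing both bits amounts to the sketch below. The two bit masks are the standard positions of MSI-X Enable and Function Mask; msix_initial_control() is a hypothetical helper, and the actual patch may apply the change under additional conditions.

```c
#include <stdint.h>

#define PCI_MSIX_FLAGS_ENABLE   0x8000  /* MSI-X Enable */
#define PCI_MSIX_FLAGS_MASKALL  0x4000  /* Function Mask (MASKALL) */

/* Return the control value with both ENABLE and MASKALL cleared. */
static uint16_t msix_initial_control(uint16_t control)
{
    return control & (uint16_t)~(PCI_MSIX_FLAGS_ENABLE |
                                 PCI_MSIX_FLAGS_MASKALL);
}
```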

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
master commit: 913751d7af6e78d65c1e2adf4887193c827f0c5e
master date: 2023-04-25 12:16:17 +0200

23 months ago x86/extable: hide use of negative offset from array start
Jan Beulich [Tue, 23 May 2023 13:00:05 +0000 (15:00 +0200)]
x86/extable: hide use of negative offset from array start

In COVERAGE=y but DEBUG=n builds (observed by randconfig testing) gcc12
takes issue with the subtraction of 1 from __stop___pre_ex_table[],
considering this an out of bounds access. Not being able to know that
the symbol actually marks the end of an array, the compiler is kind of
right with this diagnosis. Move the subtraction into the function.

Reported-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 353b8cc56862dd808b75c6c96cd780cfee8f28bc
master date: 2023-02-22 13:50:20 +0100

23 months ago update Xen version to 4.16.5-pre
Jan Beulich [Tue, 23 May 2023 12:58:17 +0000 (14:58 +0200)]
update Xen version to 4.16.5-pre

2 years ago update Xen version to 4.16.4 (tag: RELEASE-4.16.4)
Jan Beulich [Thu, 27 Apr 2023 12:54:26 +0000 (14:54 +0200)]
update Xen version to 4.16.4

2 years ago automation: Remove installation of packages from test scripts
Michal Orzel [Tue, 25 Apr 2023 07:22:59 +0000 (09:22 +0200)]
automation: Remove installation of packages from test scripts

Now that these packages are already installed in the respective
containers, we can remove their installation from the test scripts.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 72cfe1c3ad1fae95f4f0ac51dbdd6838264fdd7f
master date: 2022-12-09 14:55:33 -0800

2 years ago xen/ELF: Fix ELF32 PRI formatters
Andrew Cooper [Mon, 24 Apr 2023 11:06:28 +0000 (13:06 +0200)]
xen/ELF: Fix ELF32 PRI formatters

It is rude to hide width formatting inside a PRI* macro, doubly so when it's
only in one bitness of the macro.

However, it's fully buggy when all the users use %#"PRI because then it expands
to the common trap of %#08x which does not do what the author intends.

Switch the 32bit ELF PRI formatters to use plain integer PRI's, just like on
the 64bit side already.  No practical change.
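
The %#08x trap mentioned above can be seen with plain snprintf(): the field width counts the "0x" prefix, so only six digits are zero-padded. The show_hex() helper below is a hypothetical name used purely for the demonstration.

```c
#include <stddef.h>
#include <stdio.h>

/* Format 'v' twice: with the trap width and with the likely intended one. */
static void show_hex(char *trap, char *wanted, size_t len, unsigned int v)
{
    snprintf(trap, len, "%#08x", v);     /* width 8 includes "0x": 6 digits */
    snprintf(wanted, len, "%#010x", v);  /* width 10 leaves 8 digits */
}
```

For v = 0x1234 the first buffer holds "0x001234" and the second "0x00001234".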

Fixes: 7597fabca76e ("livepatch: Include sizes when an mismatch occurs")
Fixes: 380b229634f8 ("xsplice: Implement payload loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: cfa2bb82c01f0c656804cedd8f44eb2a99a2b5bc
master date: 2023-04-19 15:55:29 +0100

2 years ago x86/livepatch: Fix livepatch application when CET is active
Andrew Cooper [Mon, 24 Apr 2023 11:05:52 +0000 (13:05 +0200)]
x86/livepatch: Fix livepatch application when CET is active

Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:

  (XEN) livepatch: lp: Verifying enabled expectations for all functions
  (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
  (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
  (XEN) livepatch: lp: Applying 1 functions
  (XEN) hi_func: Hi! (called 1 times)
  (XEN) Hook executing.
  (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
  (XEN) *** DOUBLE FAULT ***
  <many double faults>

The assertion failure is from a global (system wide) TLB flush initiated by
modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
sure exactly what causes the #DF's, but it doesn't really matter either
because they highlight a latent bug that I'd overlooked with the CET-SS vs
patching work in the first place.

While we're careful to arrange for the patching CPU to avoid encountering
non-shstk memory with transient shstk perms, other CPUs can pick these
mappings up too if they need to re-walk for uarch reasons.

Another bug is that for livepatching, we only disable CET if shadow stacks are
in use.  Running on Intel CET systems when Xen is only using CET-IBT will
crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
still active.

Also, we never went and cleared the dirty bits on .rodata.  This would
matter (for the same reason it matters on .text - it becomes a valid target
for WRSS), but we never actually patch .rodata anyway.

Therefore rework how we do patching for both alternatives and livepatches.

Introduce modify_xen_mappings_lite() with a purpose similar to
modify_xen_mappings(), but stripped down to the bare minimum as it's used in
weird contexts.  Leave all complexity to the caller to handle.

Instead of patching by clearing CR0.WP (and having to jump through some
fragile hoops to disable CET in order to do this), just transiently relax the
permissions on .text via l2_identmap[].

Note that neither alternatives nor livepatching edit .rodata, so we don't need
to relax those permissions at this juncture.

The perms are relaxed globally, but this is safe enough.  Alternatives run
before we boot APs, and Livepatching runs in a quiesced state where the other
CPUs are not doing anything interesting.

This approach is far more robust.

Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 8676092a0f16ca6ad188d3fb270784a2caecf542
master date: 2023-04-18 20:20:26 +0100

2 years ago x86/hvm: Disallow disabling paging in 64bit mode
Andrew Cooper [Mon, 24 Apr 2023 11:05:24 +0000 (13:05 +0200)]
x86/hvm: Disallow disabling paging in 64bit mode

The Long Mode consistency checks exist to "ensure that the processor does not
enter an undefined mode or state that results in unpredictable behavior".  APM
Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
preventing the OS from trying to exit Long mode while in 64bit mode.  This
could leave the CPU in Protected Mode with an %rip above the 4G boundary.

Experimentally, AMD CPUs really do permit this state transition.  An OS which
tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
to be going on behind the scenes ought to result in sane continued execution.

Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
section instructs people to switch to a compatibility mode segment first
before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
Architecture".

Either way, this appears to have been a genuine oversight in the AMD64 spec.

Intel, on the other hand, rejects this state transition with #GP.

Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
4.1.2 "Paging-Mode Enable" was altered from

  If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
  exception (#GP); software should clear CR4.PCIDE before attempting to
  disable paging.

to

  If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
  clear CR0.PG causes a general-protection exception (#GP). Software should
  transition to compatibility mode and clear CR4.PCIDE before attempting to
  disable paging.

which acknowledges this corner case, but there doesn't appear to be any other
discussion even in the relevant Long Mode sections.

So it appears that Intel spotted and addressed the corner case in IA-32e mode,
but were 15 years late to document it.

Xen was written to the AMD spec, and misses the check.  Follow the Intel
behaviour, because it is more sensible and avoids hitting a VMEntry failure.
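
The check being added can be pictured as the predicate below. Bit 31 is the architectural position of CR0.PG; cr0_write_faults() and its in_64bit_mode parameter are hypothetical and do not reflect how Xen's HVM code actually determines the current mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG  (UINT64_C(1) << 31)   /* CR0.PG: paging enable */

/* Hypothetical predicate: should this CR0 write be refused with #GP? */
static bool cr0_write_faults(uint64_t old_cr0, uint64_t new_cr0,
                             bool in_64bit_mode)
{
    bool clears_pg = (old_cr0 & X86_CR0_PG) && !(new_cr0 & X86_CR0_PG);

    /* Leaving Long Mode is only legal from a compatibility mode segment. */
    return clears_pg && in_64bit_mode;
}
```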

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 18c128ba66e6308744850aca96dbffd18f91c29b
master date: 2023-04-14 18:18:20 +0100

2 years ago x86emul: pull permission check ahead for REP INS/OUTS
Jan Beulich [Mon, 24 Apr 2023 11:03:59 +0000 (13:03 +0200)]
x86emul: pull permission check ahead for REP INS/OUTS

Based on observations on a fair range of hardware from both primary
vendors even zero-iteration-count instances of these insns perform the
port related permission checking first.
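
The ordering can be sketched as below: the port permission check runs before the iteration count is considered, so a denied port faults even when the count is zero. The enum, rep_ins() and its parameters are hypothetical and much simpler than the emulator's real interfaces.

```c
#include <stdbool.h>

enum emul_result { EMUL_OKAY, EMUL_EXCEPTION };

/* Hypothetical sketch: permission check first, then the (possibly empty) loop. */
static enum emul_result rep_ins(bool port_allowed, unsigned long count,
                                unsigned long *done)
{
    *done = 0;

    if ( !port_allowed )
        return EMUL_EXCEPTION;  /* raised even when count == 0 */

    *done = count;              /* stand-in for performing the transfers */
    return EMUL_OKAY;
}
```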

Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f41c88a6fca59f99a2eb5e7ed3d90ab7bca08b1b
master date: 2023-03-30 13:07:16 +0200

2 years ago tools/xenstore: fix quota check in transaction_fix_domains()
Juergen Gross [Mon, 24 Apr 2023 11:03:36 +0000 (13:03 +0200)]
tools/xenstore: fix quota check in transaction_fix_domains()

Today, when finalizing a transaction, the node quota is checked to
ensure it is not exceeded after the transaction. This check is always done,
even if the transaction is being performed by a privileged connection,
or if there were no nodes created in the transaction.

Correct that by checking quota only if:
- the transaction is being performed by an unprivileged guest, and
- at least one node was created in the transaction

Reported-by: Julien Grall <julien@xen.org>
Fixes: f2bebf72c4d5 ("xenstore: rework of transaction handling")
Signed-off-by: Juergen Gross <jgross@suse.com>
master commit: f6b801c36bd5e4ab22a9f80c8d57121b62b139af
master date: 2023-03-29 22:02:36 +0100

2 years ago CI: Remove llvm-8 from the Debian Stretch container
Andrew Cooper [Fri, 24 Mar 2023 17:59:56 +0000 (17:59 +0000)]
CI: Remove llvm-8 from the Debian Stretch container

For similar reasons to c/s a6b1e2b80fe20.  While this container is still
build-able for now, all the other problems with explicitly-versioned compilers
remain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7a298375721636290a57f31bb0f7c2a5a38956a4)

2 years ago automation: Remove non-debug x86_32 build jobs
Anthony PERARD [Fri, 24 Feb 2023 17:29:15 +0000 (17:29 +0000)]
automation: Remove non-debug x86_32 build jobs

In the interest of having fewer jobs, we remove the x86_32 build jobs
that do release builds. A debug build is very likely to be enough to find
32bit build issues.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 7b66792ea7f77fb9e587e1e9c530a7c869eecba1)

2 years ago automation: Remove CentOS 7.2 containers and builds
Anthony PERARD [Tue, 21 Feb 2023 16:55:36 +0000 (16:55 +0000)]
automation: Remove CentOS 7.2 containers and builds

We already have a container which tracks the latest CentOS 7, so there
is no need for this one as well.

Also, 7.2 has outdated root certificates, which prevent connections to
websites which use Let's Encrypt.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit ba512629f76dfddb39ea9133ee51cdd9e392a927)

2 years ago automation: Switch arm32 cross builds to run on arm64
Michal Orzel [Tue, 14 Feb 2023 15:38:38 +0000 (16:38 +0100)]
automation: Switch arm32 cross builds to run on arm64

Due to the limited x86 CI resources slowing down the whole pipeline,
switch the arm32 cross builds to be executed on arm64 which is much more
capable. For that, rename the existing debian container dockerfile
from unstable-arm32-gcc to unstable-arm64v8-arm32-gcc and use
arm64v8/debian:unstable as an image. Note that we cannot use the same
container name as we have to keep the backwards compatibility.
Take the opportunity to remove the extra empty line at the end of a file.

Modify the tag of .arm32-cross-build-tmpl to arm64 and update the build
jobs accordingly.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit a35fccc8df93de7154dba87db6e7bcf391e9d51c)

2 years ago CI: Drop automation/configs/
Andrew Cooper [Thu, 29 Dec 2022 15:39:13 +0000 (15:39 +0000)]
CI: Drop automation/configs/

Having 3 extra hypervisor builds on the end of a full build is deeply
confusing to debug if one of them fails, because the .config file presented in
the artefacts is not the one which caused a build failure.  Also, the log
tends to be truncated in the UI.

PV-only is tested as part of PV-Shim in a full build anyway, so doesn't need
repeating.  HVM-only and neither appear frequently in randconfig, so drop all
the logic here to simplify things.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7b20009a812f26e74bdbde2ab96165376b3dad34)

2 years ago bump default SeaBIOS version to 1.16.0
Jan Beulich [Fri, 6 May 2022 12:46:52 +0000 (14:46 +0200)]
bump default SeaBIOS version to 1.16.0

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 944e389daa133dd310d87c4eebacba9f6da76018)

2 years ago ns16550: correct name/value pair parsing for PCI port/bridge
Jan Beulich [Fri, 31 Mar 2023 06:42:02 +0000 (08:42 +0200)]
ns16550: correct name/value pair parsing for PCI port/bridge

First of all these were inverted: "bridge=" caused the port coordinates
to be established, while "port=" controlled the bridge coordinates. And
then the error messages being identical also wasn't helpful. While
correcting this also move both case blocks close together.

Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e692b22230b411d762ac9e278a398e28df474eae
master date: 2023-03-29 14:55:37 +0200

2 years ago vpci/msix: handle accesses adjacent to the MSI-X table
Roger Pau Monné [Fri, 31 Mar 2023 06:41:27 +0000 (08:41 +0200)]
vpci/msix: handle accesses adjacent to the MSI-X table

The handling of the MSI-X table accesses by Xen requires that any
pages part of the MSI-X related tables are not mapped into the domain
physmap.  As a result, any device registers in the same pages as the
start or the end of the MSIX or PBA tables are not currently
accessible, as the accesses are just dropped.

Note the spec forbids such placing of registers, as the MSIX and PBA
tables must be 4K isolated from any other registers:

"If a Base Address register that maps address space for the MSI-X
Table or MSI-X PBA also maps other usable address space that is not
associated with MSI-X structures, locations (e.g., for CSRs) used in
the other address space must not share any naturally aligned 4-KB
address range with one where either MSI-X structure resides."

Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
in the same page as the MSIX tables, and thus won't work on a PVH dom0
without this fix.

In order to cope with this behavior, pass through any accesses that fall
on the same page as the MSIX tables (but don't fall in between) to the
underlying hardware.  Such forwarding also takes care of the PBA
accesses, so it allows removing the code doing this handling in
msix_{read,write}.  Note that as a result accesses to the PBA array
are no longer limited to 4 and 8 byte sizes, there's no access size
restriction for PBA accesses documented in the specification.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

vpci/msix: restore PBA access length and alignment restrictions

Accesses to the PBA array have the same length and alignment
limitations as accesses to the MSI-X table:

"For all accesses to MSI-X Table and MSI-X PBA fields, software must
use aligned full DWORD or aligned full QWORD transactions; otherwise,
the result is undefined."

Introduce such length and alignment checks into the handling of PBA
accesses for vPCI.  This was a mistake of mine for not reading the
specification correctly.

Note that accesses must now be aligned, and hence there's no longer a
need to check that the end of the access falls into the PBA region as
both the access and the region addresses must be aligned.
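
The length and alignment rule quoted above reduces to a small predicate, sketched here under the assumption that lengths are given in bytes. pba_access_allowed() is a hypothetical name, not the function used in the vPCI code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only aligned full DWORD (4-byte) or QWORD (8-byte) accesses. */
static bool pba_access_allowed(uint64_t addr, unsigned int len)
{
    return (len == 4 || len == 8) && !(addr & (len - 1));
}
```

Since both the access and the region are aligned, an access that starts inside the region cannot extend past its end, which is why no separate end-of-access check is needed.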

Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table')
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b177892d2d0e8a31122c218989f43130aeba5282
master date: 2023-03-28 14:20:35 +0200
master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
master date: 2023-03-29 14:56:33 +0200

2 years ago x86/ucode: Fix error paths control_thread_fn()
Andrew Cooper [Fri, 31 Mar 2023 06:40:56 +0000 (08:40 +0200)]
x86/ucode: Fix error paths control_thread_fn()

These two early exits skipped re-enabling the watchdog, restoring the NMI
callback, and clearing the nmi_patch global pointer.  Always execute the tail
of the function on the way out.
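
The shape of the fix is the common single-exit pattern, sketched below with hypothetical state and names: early failures jump to the cleanup tail rather than returning directly.

```c
#include <stdbool.h>

/* Hypothetical state that must be restored on every exit path. */
struct ctl_state { bool watchdog_enabled; bool nmi_callback_set; };

static int control_thread(struct ctl_state *s, bool fail_early)
{
    int ret = 0;

    s->watchdog_enabled = false;    /* setup done before the risky work */
    s->nmi_callback_set = true;

    if ( fail_early )
    {
        ret = -1;
        goto out;                   /* previously a bare 'return' */
    }

    /* ... the main work would happen here ... */

 out:
    s->watchdog_enabled = true;     /* tail runs on success and failure */
    s->nmi_callback_set = false;
    return ret;
}
```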

Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fc2e1f3aad602a66c14b8285a1bd38a82f8fd02d
master date: 2023-03-28 11:57:56 +0100

2 years ago x86/vmx: Don't spuriously crash the domain when INIT is received
Andrew Cooper [Fri, 31 Mar 2023 06:40:27 +0000 (08:40 +0200)]
x86/vmx: Don't spuriously crash the domain when INIT is received

In VMX operation, the handling of INIT IPIs is changed.  Instead of the CPU
resetting, the next VMEntry fails with EXIT_REASON_INIT.  From the TXT spec,
the intent of this behaviour is so that an entity which cares can scrub
secrets from RAM before participating in an orderly shutdown.

Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
and continue blindly onwards anyway.

This patch addresses only the first of these two problems by ignoring the INIT
and continuing without crashing the VM in question.

The second wants addressing too, just as soon as we've figured out something
better to do...

Discovered as collateral damage from when an AP triple faults on S3 resume on
Intel TigerLake platforms.

Link: https://github.com/QubesOS/qubes-issues/issues/7283
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: b1f11273d5a774cc88a3685c96c2e7cf6385e3b6
master date: 2023-03-24 22:49:58 +0000

2 years ago x86/shadow: Fix build with no PG_log_dirty
Andrew Cooper [Fri, 31 Mar 2023 06:39:49 +0000 (08:39 +0200)]
x86/shadow: Fix build with no PG_log_dirty

Gitlab Randconfig found:

  arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
  arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
      'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
   1023 |         count += paging_logdirty_levels();
        |                  ^~~~~~~~~~~~~~~~~~~~~~
        |                  paging_log_dirty_init
  arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]

The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
PV_SHIM_EXCLUSIVE.  Move the declaration outside.

Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
master date: 2023-03-24 12:16:31 +0000

2 years ago x86/nospec: Fix evaluate_nospec() code generation under Clang
Andrew Cooper [Fri, 31 Mar 2023 06:39:32 +0000 (08:39 +0200)]
x86/nospec: Fix evaluate_nospec() code generation under Clang

It turns out that evaluate_nospec() code generation is not safe under Clang.
Given:

  void eval_nospec_test(int x)
  {
      if ( evaluate_nospec(x) )
          asm volatile ("nop #true" ::: "memory");
      else
          asm volatile ("nop #false" ::: "memory");
  }

Clang emits:

  <eval_nospec_test>:
         0f ae e8                lfence
         85 ff                   test   %edi,%edi
         74 02                   je     <eval_nospec_test+0x9>
         90                      nop
         c3                      ret
         90                      nop
         c3                      ret

which is not safe because the lfence has been hoisted above the conditional
jump.  Clang concludes that both barrier_nospec_true()'s have identical side
effects and can safely be merged.

Clang can be persuaded that the side effects are different if there are
different comments in the asm blocks.  This is fragile, but no more fragile
than other aspects of this construct.

Introduce barrier_nospec_false() with a separate internal comment to prevent
Clang merging it with barrier_nospec_true() despite the otherwise-identical
content.  The generated code now becomes:

  <eval_nospec_test>:
         85 ff                   test   %edi,%edi
         74 05                   je     <eval_nospec_test+0x9>
         0f ae e8                lfence
         90                      nop
         c3                      ret
         0f ae e8                lfence
         90                      nop
         c3                      ret

which has the correct number of lfence's, and in the correct place.

Link: https://github.com/llvm/llvm-project/issues/55084
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bc3c133841435829ba5c0a48427e2a77633502ab
master date: 2023-03-24 12:16:31 +0000

2 years ago x86/shadow: fix and improve sh_page_has_multiple_shadows()
Jan Beulich [Fri, 31 Mar 2023 06:38:42 +0000 (08:38 +0200)]
x86/shadow: fix and improve sh_page_has_multiple_shadows()

While no caller currently invokes the function without first making sure
there is at least one shadow [1], we'd better eliminate UB here:
find_first_set_bit() requires input to be non-zero to return a well-
defined result.

Further, using find_first_set_bit() isn't very efficient in the first
place for the intended purpose.
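
One well-defined way to ask "is more than one bit set?" without a find-first-set operation is the classic x & (x - 1) test, sketched below. This is offered as an illustration of the idea; whether the actual patch uses exactly this expression is an assumption.

```c
#include <stdbool.h>

/* True when 'mask' has two or more bits set; well-defined for mask == 0. */
static bool has_multiple_bits(unsigned long mask)
{
    /* Clearing the lowest set bit leaves a non-zero value only if another bit remains. */
    return (mask & (mask - 1)) != 0;
}
```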

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[1] The function has exactly two uses, and both are from OOS code, which
    is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
    guarding the call to sh_unsync(), guarantees at least one shadow.
    Hence even if sh_page_has_multiple_shadows() returned a bogus value
    when invoked for a PV domain, the subsequent is_hvm_vcpu() and
    oos_active checks (the former being redundant with the latter) will
    compensate. (Arguably that oos_active check should come first, for
    both clarity and efficiency reasons.)
master commit: 2896224a4e294652c33f487b603d20bd30955f21
master date: 2023-03-24 11:07:08 +0100

2 years agoVT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)
Marek Marczykowski-Górecki [Fri, 31 Mar 2023 06:38:07 +0000 (08:38 +0200)]
VT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)

If the scope for IGD's IOMMU contains an additional device that doesn't
actually exist, iommu=no-igfx would not disable that IOMMU. In this
particular case (Thinkpad x230) it included 00:02.1, but there is no
such device on this platform. Consider only existing devices for the
"gfx only" check as well as the establishing of IGD DRHD address
(underlying is_igd_drhd(), which is used to determine applicability of
two workarounds).

Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 49de6749baa8d0addc3048defd4ef3e85cb135e9
master date: 2023-03-23 09:16:41 +0100

2 years agoAMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
Jan Beulich [Fri, 31 Mar 2023 06:36:59 +0000 (08:36 +0200)]
AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

An earlier change with the same title (commit 1ba66a870eba) altered only
the path where x2apic_phys was already set to false (perhaps from the
command line). The same of course needs applying when the variable
wasn't modified yet from its initial value.

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0d2686f6b66b4b1b3c72c3525083b0ce02830054
master date: 2023-03-21 09:23:25 +0100

2 years agolibacpi: fix PCI hotplug AML
David Woodhouse [Tue, 21 Mar 2023 12:53:25 +0000 (13:53 +0100)]
libacpi: fix PCI hotplug AML

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.
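
The difference between the two operand orders can be sketched in C. This is
an illustration of the arithmetic only, not the AML itself; the function
names are hypothetical and reg stands in for the value read from \_GPE.PH00.

```c
#include <stdint.h>

/* Status of the even-numbered function: the low nybble. */
static uint8_t status_even(uint8_t reg)
{
    return reg & 0x0F;
}

/* Status of the odd-numbered function: the high nybble. */
static uint8_t status_odd(uint8_t reg)
{
    return reg >> 4;
}

/*
 * The reversed operand order: the constant 4 is shifted by the register
 * value.  C leaves over-wide shifts undefined, so clamp them to zero here.
 */
static uint8_t status_odd_reversed(uint8_t reg)
{
    return reg < 32 ? (uint8_t)(0x04u >> reg) : 0;
}
```

With an all-zero register (an empty slot), status_odd() gives 0x00 while
status_odd_reversed() gives 0x04.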

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+                    ShiftRight (0x4, \_GPE.PH00, Local1)
+                    Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b190af7d3e90f58da5f58044b8dea7261b8b483d
master date: 2023-03-20 17:12:34 +0100

2 years agobunzip: work around gcc13 warning
Jan Beulich [Tue, 21 Mar 2023 12:52:58 +0000 (13:52 +0100)]
bunzip: work around gcc13 warning

While provable that length[0] is always initialized (because symCount
cannot be zero), upcoming gcc13 fails to recognize this and warns about
the unconditional use of the value immediately following the loop.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511.

Reported-by: Martin Liška <martin.liska@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 402195e56de0aacf97e05c80ed367d464ca6938b
master date: 2023-03-14 10:45:28 +0100

2 years agoVT-d: constrain IGD check
Jan Beulich [Tue, 21 Mar 2023 12:52:20 +0000 (13:52 +0100)]
VT-d: constrain IGD check

Marking a DRHD as controlling an IGD isn't very sensible without
checking that at the very least it's a graphics device that lives at
0000:00:02.0. Re-use the reading of the class-code to control both the
clearing of "gfx_only" and the setting of "igd_drhd_address".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: f8c4317295fa1cde1a81779b7e362651c084efb8
master date: 2023-03-14 10:44:08 +0100

2 years agox86/altp2m: help gcc13 to avoid it emitting a warning
Jan Beulich [Tue, 21 Mar 2023 12:51:42 +0000 (13:51 +0100)]
x86/altp2m: help gcc13 to avoid it emitting a warning

Switches of altp2m-s always expect a valid altp2m to be in place (and
indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
The compiler, however, cannot know that, and hence it cannot eliminate
p2m_get_altp2m()'s case of returning (literal) NULL. If then the compiler
decides to special case that code path in the caller, the dereference in
instances of

    atomic_dec(&p2m_get_altp2m(v)->active_vcpus);

can, to the code generator, appear to be NULL dereferences, leading to

In function 'atomic_dec',
    inlined from '...' at ...:
./arch/x86/include/asm/atomic.h:182:5: error: array subscript 0 is outside array bounds of 'int[0]' [-Werror=array-bounds=]

Aid the compiler by adding a BUG_ON() checking the return value of the
problematic p2m_get_altp2m(). Since, with the use of the local variable,
the 2nd p2m_get_altp2m() in each instance will look questionable at first
glance (Why is the local variable not used here?), open-code the only relevant
piece of p2m_get_altp2m() there.

To avoid repeatedly doing these transformations, and also to limit how
"bad" the open-coding really is, convert the entire operation to an
inline helper, used by all three instances (and accepting the redundant
BUG_ON(idx >= MAX_ALTP2M) in two of the three cases).

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: be62b1fc2aa7375d553603fca07299da765a89fe
master date: 2023-03-13 15:16:21 +0100

2 years agocore-parking: fix build with gcc12 and NR_CPUS=1
Jan Beulich [Tue, 21 Mar 2023 12:50:18 +0000 (13:50 +0100)]
core-parking: fix build with gcc12 and NR_CPUS=1

Gcc12 takes issue with core_parking_remove()'s

    for ( ; i < cur_idle_nums; ++i )
        core_parking_cpunum[i] = core_parking_cpunum[i + 1];

complaining that the right hand side array access is past the bounds of
1. Clearly the compiler can't know that cur_idle_nums can only ever be
zero in this case (as the sole CPU cannot be parked).

Arrange for core_parking.c's contents to not be needed altogether, and
then disable its building when NR_CPUS == 1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 4b0422f70feb4b1cd04598ffde805fc224f3812e
master date: 2023-03-13 15:15:42 +0100

2 years agotools/xenmon: Fix xenmon.py for with python3.x
Bernhard Kaindl [Tue, 21 Mar 2023 12:49:47 +0000 (13:49 +0100)]
tools/xenmon: Fix xenmon.py for with python3.x

Fixes for Py3:
* class Delayed(): file not defined; also an error for pylint -E.  Inherit
  object instead for Py2 compatibility.  Fix DomainInfo() too.
* Inconsistent use of tabs and spaces for indentation (in one block)

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3a59443c1d5ae0677a792c660ccd3796ce036732
master date: 2023-02-06 10:22:12 +0000

2 years agotools/python: change 's#' size type for Python >= 3.10
Marek Marczykowski-Górecki [Tue, 21 Mar 2023 12:49:28 +0000 (13:49 +0100)]
tools/python: change 's#' size type for Python >= 3.10

Python < 3.10 by default uses 'int' type for data+size string types
(s#), unless PY_SSIZE_T_CLEAN is defined - in which case it uses
Py_ssize_t. The former behavior was removed in Python 3.10 and now it's
required to define PY_SSIZE_T_CLEAN before including Python.h, and using
Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior is
supported since Python 2.5.

Adjust bindings accordingly.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 897257ba49d0a6ddcf084960fd792ccce9c40f94
master date: 2023-02-06 08:50:13 +0100

2 years agox86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path
Andrew Cooper [Fri, 10 Feb 2023 21:11:14 +0000 (21:11 +0000)]
x86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path

As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
the two hunks visible in the patch, RET's are not safe prior to this point.

CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
compiled in, SMEP or SMAP active), and the RET can be attacked with one of
several known speculative issues.

Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
global variable, which is not safe when XPTI is active before restoring Xen's
full pagetables.

This crash has gone unnoticed because it is only AMD CPUs which permit the
SYSCALL instruction in compatibility mode, and these are not vulnerable to
Meltdown so don't activate XPTI by default.

This is XSA-429 / CVE-2022-42331

Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point")
Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit df5b055b12116d9e63ced59ae5389e69a2a3de48)

2 years agox86/HVM: serialize pinned cache attribute list manipulation
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: serialize pinned cache attribute list manipulation

While the RCU variants of list insertion and removal allow lockless list
traversal (with RCU just read-locked), insertions and removals still
need serializing amongst themselves. To keep things simple, use the
domain lock for this purpose.

This is CVE-2022-42334 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 829ec245cf66560e3b50d140ccb3168e7fb7c945)

2 years agox86/HVM: bound number of pinned cache attribute regions
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: bound number of pinned cache attribute regions

This is exposed via DMOP, i.e. to potentially not fully privileged
device models. With that we may not permit registration of an (almost)
unbounded amount of such regions.

This is CVE-2022-42333 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a5e768640f786b681063f4e08af45d0c4e91debf)

2 years agox86/shadow: account for log-dirty mode when pre-allocating
Jan Beulich [Tue, 21 Mar 2023 11:59:44 +0000 (11:59 +0000)]
x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk of just-made shadows or
shadows being acted upon disappearing under our feet. The number of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)

2 years agox86/ucode/AMD: late load the patch on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 07:17:40 +0000 (08:17 +0100)]
x86/ucode/AMD: late load the patch on every logical thread

Currently late ucode loading is performed only on the first core of CPU
siblings.  But according to the latest recommendation from AMD, late
ucode loading should happen on every logical thread/core on AMD CPUs.

To achieve that, introduce is_cpu_primary() helper which will consider
every logical cpu as "primary" when running on AMD CPUs.  Also include
Hygon in the check for future-proofing.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f1315e48a03a42f78f9b03c0a384165baf02acae
master date: 2023-02-28 14:51:28 +0100

2 years agolibs/guest: Fix leak on realloc failure in backup_ptes()
Edwin Török [Fri, 3 Mar 2023 07:17:23 +0000 (08:17 +0100)]
libs/guest: Fix leak on realloc failure in backup_ptes()

From `man 2 realloc`:

  If realloc() fails, the original block is left untouched; it is not freed or moved.

Found using GCC -fanalyzer:

  |  184 |         backup->entries = realloc(backup->entries,
  |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  |      |         |               | |
  |      |         |               | (91) when ‘realloc’ fails
  |      |         |               (92) ‘old_ptes.entries’ leaks here; was allocated at (44)
  |      |         (90) ...to here
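
A common way to avoid this class of leak is to keep the old pointer until
realloc() has succeeded. The following is a minimal sketch, not the code
from this change; struct entry_buf and entry_buf_grow() are hypothetical
stand-ins for the real structure and function.

```c
#include <stdlib.h>

/* Hypothetical growable buffer, standing in for the real structure. */
struct entry_buf {
    int *entries;
    size_t max;
};

/*
 * Grow buf to hold new_max entries (new_max must be non-zero).
 * On failure the old block is untouched and still owned by buf, so the
 * caller can free it later.  Returns 0 on success, -1 on failure.
 */
static int entry_buf_grow(struct entry_buf *buf, size_t new_max)
{
    int *tmp = realloc(buf->entries, new_max * sizeof(*tmp));

    if (tmp == NULL)
        return -1;      /* buf->entries is still valid, nothing leaked */

    buf->entries = tmp; /* only overwrite the pointer once we know it is safe */
    buf->max = new_max;
    return 0;
}
```

Assigning the result of realloc() straight back to buf->entries would
overwrite the only reference to the original block when the call fails.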

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 275d13184cfa52ebe4336ed66526ce93716adbe0
master date: 2023-02-27 15:51:23 +0000

2 years agolibs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()
Andrew Cooper [Fri, 3 Mar 2023 07:17:04 +0000 (08:17 +0100)]
libs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()

Edwin, with the help of GCC's -fanalyzer, identified that p2m_frame_list_list
gets leaked.  What fanalyzer can't see is that the live_p2m_frame_list_list
and live_p2m_frame_list foreign mappings are leaked too.

Rework the logic so the out path is executed unconditionally, which cleans up
all the intermediate allocations/mappings appropriately.

Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
Reported-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1868d7f22660c8980bd0a7e53f044467e8b63bb5
master date: 2023-02-27 15:51:23 +0000

2 years agotools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable
Bertrand Marquis [Fri, 3 Mar 2023 07:16:45 +0000 (08:16 +0100)]
tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable

Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
the pkg-config file.
This is preventing a conflict in some build systems where PKG_CONFIG
actually contains the path to the pkg-config executable to use, as the
default assignment in libs.mk is using a weak assignment (?=).

This problem has been found when trying to build the latest version of
Xen tools using buildroot.

Fixes: d400dc5729e4 ("tools: tweak tools/libs/libs.mk for being able to support libxenctrl")
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
master date: 2023-02-24 17:44:29 +0000

2 years agoxen: Fix Clang -Wunicode diagnostic when building asm-macros
Andrew Cooper [Fri, 3 Mar 2023 07:15:50 +0000 (08:15 +0100)]
xen: Fix Clang -Wunicode diagnostic when building asm-macros

While trying to work around a different Clang-IAS bug (parent changeset), I
stumbled onto:

  In file included from arch/x86/asm-macros.c:3:
  ./arch/x86/include/asm/spec_ctrl_asm.h:144:19: error: \u used with
  no following hex digits; treating as '\' followed by identifier [-Werror,-Wunicode]
  .L\@_fill_rsb_loop\uniq:
                    ^

It turns out that Clang -E is sensitive to the file extension of the source
file it is processing.  Furthermore, C explicitly permits the use of \u
escapes in identifier names, so the diagnostic would be reasonable in
principle if we were trying to compile the result.

asm-macros should really have been .S from the outset, as it is ultimately
generating assembly, not C.  Rename it, which causes Clang not to complain.

We need to introduce rules for generating a .i file from .S, and substituting
c_flags for a_flags lets us drop the now-redundant -D__ASSEMBLY__.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 53f0d02040b1df08f0589f162790ca376e1c2040
master date: 2023-02-24 17:44:29 +0000

2 years agoxen: Work around Clang-IAS macro \@ expansion bug
Andrew Cooper [Fri, 3 Mar 2023 07:14:57 +0000 (08:14 +0100)]
xen: Work around Clang-IAS macro \@ expansion bug

https://github.com/llvm/llvm-project/issues/60792

It turns out that Clang-IAS does not expand \@ uniquely in a translation
unit, and the XSA-426 change tickles this bug:

  <instantiation>:4:1: error: invalid symbol redefinition
  .L1_fill_rsb_loop:
  ^
  make[3]: *** [Rules.mk:247: arch/x86/acpi/cpu_idle.o] Error 1

Extend DO_OVERWRITE_RSB with an optional parameter so C callers can mix %= in
too, which Clang does seem to expand properly.

Fixes: 63305e5392ec ("x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
master date: 2023-02-24 17:44:29 +0000

2 years agox86: perform mem_sharing teardown before paging teardown
Tamas K Lengyel [Fri, 3 Mar 2023 07:14:25 +0000 (08:14 +0100)]
x86: perform mem_sharing teardown before paging teardown

An assert failure has been observed in p2m_teardown when performing vm
forking and then destroying the forked VM (p2m-basic.c:173). The assert
checks whether the domain's shared pages counter is 0. According to the
patch that originally added the assert (7bedbbb5c31) the p2m_teardown
should only happen after mem_sharing already relinquished all shared pages.

In this patch we flip the order in which relinquish ops are called to avoid
tripping the assert. Conceptually, it makes sense for sharing to be torn
down before paging is torn down.

Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2869349f0cb3a89dcbf1f1b30371f58df6309312
master date: 2023-02-23 12:35:48 +0100

2 years agox86/ucode/AMD: apply the patch early on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 07:14:01 +0000 (08:14 +0100)]
x86/ucode/AMD: apply the patch early on every logical thread

The original issue has been reported on AMD Bulldozer-based CPUs where
ucode loading loses the LWP feature bit in order to gain the IBPB bit.
LWP disabling is per-SMT/CMT core modification and needs to happen on
each sibling thread despite the shared microcode engine. Otherwise,
logical CPUs will end up with different cpuid capabilities.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
Guests running under Xen happen to be not affected because of levelling
logic for the feature masking/override MSRs which causes the LWP bit to
fall out and hides the issue. The latest recommendation from AMD, after
discussing this bug, is to load ucode on every logical CPU.

In Linux kernel this issue has been addressed by e7ad18d1169c
("x86/microcode/AMD: Apply the patch early on every logical thread").
Follow the same approach in Xen.

Introduce SAME_UCODE match result and use it for early AMD ucode
loading. Take this opportunity and move opt_ucode_allow_same out of
compare_revisions() to the relevant callers and also modify the warning
message based on it. Intel's side of things is modified for consistency
but provides no functional change.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
master date: 2023-02-21 15:08:05 +0100

2 years agocredit2: respect credit2_runqueue=all when arranging runqueues
Marek Marczykowski-Górecki [Fri, 3 Mar 2023 07:13:20 +0000 (08:13 +0100)]
credit2: respect credit2_runqueue=all when arranging runqueues

Documentation for credit2_runqueue=all says it should create one queue
for all pCPUs on the host. But since the introduction of
sched_credit2_max_cpus_runqueue, it actually created a separate runqueue
per socket, even if the CPU count is below
sched_credit2_max_cpus_runqueue.

Adjust the condition to skip the sibling check in case of
credit2_runqueue=all.

Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1f5747ee929fbbcae58d7234c6c38a77495d0cfe
master date: 2023-02-15 16:12:42 +0100

2 years agox86/shskt: Disable CET-SS on parts susceptible to fractured updates
Andrew Cooper [Fri, 3 Mar 2023 07:12:24 +0000 (08:12 +0100)]
x86/shskt: Disable CET-SS on parts susceptible to fractured updates

Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
Token".

Architecturally, an event delivery which starts in CPL<3 and switches shadow
stack will first validate the Supervisor Shadow Stack Token (setting the busy
bit), then pushes CS/LIP/SSP.  One example of this is an NMI interrupting Xen.

Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc
between setting the busy bit and completing the event injection renders the
action non-restartable, because when it comes time to restart, the busy bit is
found to be already set.

This is far more easily encountered under virt, yet it is not the fault of the
hypervisor, nor the fault of the guest kernel.  The fault lies somewhere
between the architectural specification, and the uarch behaviour.

Intel have allocated CPUID.7[1].edx[18] CET_SSS to enumerate that supervisor
shadow stacks are safe to use.  Because of how Xen lays out its shadow stacks,
fracturing is not expected to be a problem on native.

Detect this case on boot and default to not using shstk if virtualised.
Specifying `cet=shstk` on the command line will override this heuristic and
enable shadow stacks irrespective.
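
The heuristic described above can be sketched as a small decision function.
This is an illustration under stated assumptions, not the code from this
change: the function name and all four parameters are hypothetical, with
forced standing for an explicit `cet=shstk` on the command line.

```c
#include <stdbool.h>

/*
 * Hypothetical decision helper.
 *   hw_shstk    - the CPU supports shadow stacks at all
 *   cet_sss     - CET_SSS is enumerated (supervisor shadow stacks are safe)
 *   virtualised - we are running under another hypervisor
 *   forced      - the user explicitly asked for shadow stacks
 */
static bool want_shstk(bool hw_shstk, bool cet_sss, bool virtualised,
                       bool forced)
{
    if (!hw_shstk)
        return false;    /* nothing to enable */
    if (forced)
        return true;     /* explicit request overrides the heuristic */
    if (cet_sss)
        return true;     /* hardware says fracturing is not a concern */
    return !virtualised; /* fine on native, default off under virt */
}
```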

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 01e7477d1b081cff4288ff9f51ec59ee94c03ee0
master date: 2023-02-09 18:26:17 +0000

2 years agox86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
Andrew Cooper [Fri, 3 Mar 2023 07:06:44 +0000 (08:06 +0100)]
x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}

We don't actually need ecx yet, but adding it in now will reduce the amount to
which leaf 7 is out of order in a featureset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b4a23bf6293aadecfd03bf9e83974443e2eac9cb
master date: 2023-02-09 18:26:17 +0000

2 years agolibs/util: Fix parallel build between flex/bison and CC rules
Anthony PERARD [Fri, 3 Mar 2023 07:06:23 +0000 (08:06 +0100)]
libs/util: Fix parallel build between flex/bison and CC rules

flex/bison generate two targets, and when those targets are
prerequisites of other rules they are considered independently by make.

We can have a situation where the .c file is out-of-date but not the
.h, after a git checkout for example. In this case, if a rule only has
the .h file as a prerequisite, make will proceed and start to build the
object. In parallel, another target can have the .c file as a
prerequisite and make will find out it needs re-generating and do so,
changing the .h at the same time. This parallel task breaks the first one.

To avoid this scenario, we put both the header and the source as
prerequisites for all objects even if they only need the header.

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bf652a50fb3bb3b1b3d93db6fb79bc28f978fe75
master date: 2023-02-09 18:26:17 +0000

2 years agoautomation: Remove clang-8 from Debian unstable container
Anthony PERARD [Tue, 21 Feb 2023 16:55:38 +0000 (16:55 +0000)]
automation: Remove clang-8 from Debian unstable container

First, apt complains that it isn't the right way to add keys anymore,
but hopefully that's just a warning.

Second, we can't install clang-8:
The following packages have unmet dependencies:
 clang-8 : Depends: libstdc++-8-dev but it is not installable
           Depends: libgcc-8-dev but it is not installable
           Depends: libobjc-8-dev but it is not installable
           Recommends: llvm-8-dev but it is not going to be installed
           Recommends: libomp-8-dev but it is not going to be installed
 libllvm8 : Depends: libffi7 (>= 3.3~20180313) but it is not installable
E: Unable to correct problems, you have held broken packages.

clang on Debian unstable is now version 14.0.6.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a6b1e2b80fe2053b1c9c9843fb086a668513ea36)

2 years agox86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions
Andrew Cooper [Thu, 8 Sep 2022 20:27:58 +0000 (21:27 +0100)]
x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions

This is XSA-426 / CVE-2022-27672

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)

2 years agotools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()
Andrew Cooper [Wed, 1 Feb 2023 11:27:42 +0000 (11:27 +0000)]
tools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()

All caml_alloc_*() functions can throw exceptions, and longjump out of
context.  If this happens, we leak the xch/xce handle.

Reorder the logic to allocate the Ocaml object first.

Fixes: 8b3c06a3e545 ("tools/ocaml/xenctrl: OCaml 5 support, fix use-after-free")
Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d69ccf52ad467ccc22029172a8e61dc621187889)

2 years agotools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released
Andrew Cooper [Tue, 31 Jan 2023 17:19:30 +0000 (17:19 +0000)]
tools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released

The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

More than what the manual says, the intf pointer is (potentially) invalidated
by caml_enter_blocking_section() if another thread happens to perform garbage
collection at just the right (wrong) moment.

Rewrite the logic.  There's no need to stash data in the Ocaml object until
the success path at the very end.

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9e7c74e6f9fd2e44df1212643b80af9032b45b07)

2 years agotools/ocaml/xc: Fix binding for xc_domain_assign_device()
Edwin Török [Thu, 12 Jan 2023 11:38:38 +0000 (11:38 +0000)]
tools/ocaml/xc: Fix binding for xc_domain_assign_device()

The patch adding this binding was plain broken, and unreviewed.  It modified
the C stub to add a 4th parameter without an equivalent adjustment in the
Ocaml side of the bindings.

In 64bit builds, this causes us to dereference whatever dead value is in %rcx
when trying to interpret the rflags parameter.

This has gone unnoticed because Xapi doesn't use this binding (it has its
own), but unbreak the binding by passing RDM_RELAXED unconditionally for
now (matching the libxl default behaviour).

Fixes: 9b34056cb4 ("tools: extend xc_assign_device() to support rdm reservation policy")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 4250683842104f02996428f93927a035c8e19266)

2 years agotools/ocaml/evtchn: Don't reference Custom objects with the GC lock released
Edwin Török [Thu, 12 Jan 2023 17:48:29 +0000 (17:48 +0000)]
tools/ocaml/evtchn: Don't reference Custom objects with the GC lock released

The modification to the _H() macro for Ocaml 5 support introduced a subtle
bug.  From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

Previously, the value was a naked C pointer, so dereferencing it wasn't
"accessing any Ocaml data", but the fix to avoid naked C pointers added a
layer of indirection through an Ocaml Custom object, meaning that the common
pattern of using _H() in a blocking section is unsafe.

In order to fix:

 * Drop the _H() macro and replace it with a static inline xce_of_val().
 * Opencode the assignment into Data_custom_val() in the two constructors.
 * Rename "value xce" parameters to "value xce_val" so we can consistently
   have "xenevtchn_handle *xce" on the stack, and obtain the pointer with the
   GC lock still held.

Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 2636d8ff7a670c4d2485757dbe966e36c259a960)

2 years agotools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag
Andrew Cooper [Tue, 31 Jan 2023 10:59:42 +0000 (10:59 +0000)]
tools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag

caml_alloc() takes units of Wsize (word size), not bytes.  As a consequence,
we're allocating 4 or 8 times too much memory.

Ocaml has a helper, Wsize_bsize(), but it truncates cases which aren't an
exact multiple.  Use a BUILD_BUG_ON() to cover the potential for truncation,
as there's no rounding-up form of the helper.
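
The units problem and the truncation hazard can be sketched in plain C. This
is an illustration only: struct mapping, words_trunc() and words_round_up()
are hypothetical, a word is assumed to be sizeof(void *) bytes, and the C11
_Static_assert stands in for BUILD_BUG_ON().

```c
#include <stddef.h>

/* Hypothetical payload stored in the abstract block. */
struct mapping {
    void *addr;
    size_t len;
};

/* Bytes to words, truncating (the behaviour described above). */
static size_t words_trunc(size_t bytes)
{
    return bytes / sizeof(void *);
}

/* Bytes to words, rounding up (hypothetical; shown only for contrast). */
static size_t words_round_up(size_t bytes)
{
    return (bytes + sizeof(void *) - 1) / sizeof(void *);
}

/* Reject, at build time, payloads that truncation would shrink. */
_Static_assert(sizeof(struct mapping) % sizeof(void *) == 0,
               "payload must be a whole number of words");
```

Passing sizeof(struct mapping) directly where a word count is expected
over-allocates by a factor of sizeof(void *), which is the 4 or 8 times
mentioned above.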

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Fixes: d3e649277a13 ("ocaml: add mmap bindings implementation.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 36eb2de31b6ecb8787698fb1a701bd708c8971b2)

2 years agotools/ocaml/libs: Don't declare stubs as taking void
Edwin Török [Thu, 12 Jan 2023 11:28:29 +0000 (11:28 +0000)]
tools/ocaml/libs: Don't declare stubs as taking void

There is no such thing as an Ocaml function (C stub or otherwise) taking no
parameters.  In the absence of any other parameters, unit is still passed.

This doesn't explode with any ABI we care about, but would malfunction for an
ABI environment such as stdcall.

Fixes: c3afd398ba7f ("ocaml: Add XS bindings.")
Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ff8b560be80b9211c303d74df7e4b3921d2bb8ca)

2 years agotools/oxenstored: validate config file before live update
Edwin Török [Tue, 11 May 2021 15:56:50 +0000 (15:56 +0000)]
tools/oxenstored: validate config file before live update

The configuration file can contain typos or various errors that could prevent
live update from succeeding (e.g. a flag only valid on a different version).
Unknown entries in the config file are normally ignored on startup, so add a
strict --config-test that live-update can use to check that the config file
is valid *for the new binary*.

For compatibility with running old code during live update recognize
--live --help as an equivalent to --config-test.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit e6f07052ce4a0f0b7d4dc522d87465efb2d9ee86)

2 years agotools/ocaml/xb: Drop Xs_ring.write
Edwin Török [Fri, 16 Dec 2022 18:25:20 +0000 (18:25 +0000)]
tools/ocaml/xb: Drop Xs_ring.write

This function is unused (only Xs_ring.write_substring is used), and the
bytes/string conversion here is backwards: the C stub implements the bytes
version and then we use a Bytes.unsafe_of_string to convert a string into
bytes.

However the operation here really is read-only: we read from the string and
write it to the ring, so the C stub should implement the read-only string
version, and if needed we could use Bytes.unsafe_to_string to be able to send
'bytes'. However that is not necessary as the 'bytes' version is dropped above.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 01f139215e678c2dc7d4bb3f9f2777069bb1b091)

2 years agotools/ocaml/xb,mmap: Use Data_abstract_val wrapper
Edwin Török [Fri, 16 Dec 2022 18:25:10 +0000 (18:25 +0000)]
tools/ocaml/xb,mmap: Use Data_abstract_val wrapper

This is not strictly necessary since it is essentially a no-op currently: a
cast to void * and value *, even in OCaml 5.0.

However it does make it clearer that what we have here is not a regular OCaml
value, but one allocated with Abstract_tag or Custom_tag, and follows the
example from the manual more closely:
https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head

It also makes it clearer that these modules have been reviewed for
compat with OCaml 5.0.

We cannot use OCaml finalizers here, because we want exact control over when
to unmap these pages from remote domains.

No functional change.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d2ccc637111d6dbcf808aaffeec7a46f0b1e1c81)

2 years agotools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist
Edwin Török [Tue, 1 Nov 2022 17:59:17 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  Nevertheless, getting domain info in
blocks of 1024 is far more efficient than blocks of 2.
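
A toy cost model shows why the chunk size matters.  It is not Xen code: it
assumes that every call walks the domain list from the start until it reaches
the end of the requested chunk, and entries_visited() is a hypothetical name.

```c
#include <stddef.h>

/*
 * Toy cost model (assumption, see above): returns the total number of list
 * entries visited while fetching n domains in batches of 'chunk'.
 */
static size_t entries_visited(size_t n, size_t chunk)
{
    size_t visited = 0, fetched = 0;

    if ( chunk == 0 )
        return 0;                     /* nothing can be fetched */

    while ( fetched < n )
    {
        size_t batch = (n - fetched < chunk) ? n - fetched : chunk;

        visited += fetched + batch;   /* skip earlier entries, then copy */
        fetched += batch;
    }

    return visited;
}
```

Under this model, fetching 1000 domains visits 250500 entries with a chunk
size of 2, but only 1000 entries with a chunk size of 1024.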

In a scalability testing scenario with ~1000 VMs, a combination of this and
the previous change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 95db09b1b154fb72fad861815ceae1f3fa49fc4e)

2 years agotools/ocaml/xenctrl: Make domain_getinfolist tail recursive
Edwin Török [Tue, 1 Nov 2022 17:59:16 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Make domain_getinfolist tail recursive

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  xenopsd was further observed to be
wasting excessive quantities of time manipulating the list of already-obtained
domains.

Implement a tail recursive `rev_concat` equivalent to `concat |> rev`, and use
it instead of calling `@` multiple times.

An incidental benefit is that the list of domains will now be in domid order,
instead of having pairs of 2 domains changing direction every time.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the subsequent change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit c3b6be714c64aa62b56d0bce96f4b6a10b5c2078)

2 years agolibxl: fix guest kexec - skip cpuid policy
Jason Andryuk [Tue, 7 Feb 2023 16:06:47 +0000 (17:06 +0100)]
libxl: fix guest kexec - skip cpuid policy

When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid.  Calling libxl__cpuid_legacy() on the
existing domain fails since the cpuid policy has already been set, and
the guest isn't rebuilt and doesn't kexec.

xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM

During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
issue.  Before commit 34990446ca91, the libxl__cpuid_legacy() failure
would have been ignored, so kexec would continue.

Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 1e454c2b5b1172e0fc7457e411ebaba61db8fc87
master date: 2023-01-26 10:58:23 +0100

2 years agons16550: fix an incorrect assignment to uart->io_size
Ayan Kumar Halder [Tue, 7 Feb 2023 16:05:56 +0000 (17:05 +0100)]
ns16550: fix an incorrect assignment to uart->io_size

uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
is assigned to it, it should be converted to size in bytes.
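
A minimal sketch of such a conversion follows.  bit_width_to_bytes() is a
hypothetical helper, and rounding up is an assumption made here for widths
that are not a multiple of 8; the actual patch may express this differently.

```c
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Hypothetical helper: register width in bits -> size in bytes, rounded up. */
static uint32_t bit_width_to_bytes(uint32_t bit_width)
{
    return (bit_width + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}
```

With this, a 32-bit wide register yields a size of 4 rather than 32.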

Fixes: 17b516196c ("ns16550: add ACPI support for ARM only")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 352c89f72ddb67b8d9d4e492203f8c77f85c8df1
master date: 2023-01-24 16:54:38 +0100

2 years agox86/shadow: fix PAE check for top-level table unshadowing
Jan Beulich [Tue, 7 Feb 2023 16:05:22 +0000 (17:05 +0100)]
x86/shadow: fix PAE check for top-level table unshadowing

Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
the (loop invariant) one the fault occurred on.

Fixes: 3d5e6a3ff383 ("x86 hvm: implement HVMOP_pagetable_dying")
Fixes: ef3b0d8d2c39 ("x86/shadow: shadow_table[] needs only one entry for PV-only configs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f8fdceefbb1193ec81667eb40b83bc525cb71204
master date: 2023-01-20 09:23:42 +0100

2 years agox86/vmx: Support for CPUs without model-specific LBR
Andrew Cooper [Tue, 7 Feb 2023 16:04:49 +0000 (17:04 +0100)]
x86/vmx: Support for CPUs without model-specific LBR

Ice Lake (server at least) has both architectural LBR and model-specific LBR.
Sapphire Rapids does not have model-specific LBR at all.  I.e. on SPR and
later, model_specific_lbr will always be NULL, so we must make changes to
avoid reliably hitting the domain_crash().

The Arch LBR spec states that CPUs without model-specific LBR implement
MSR_DBG_CTL.LBR by discarding writes and always returning 0.

Do this for any CPU for which we lack model-specific LBR information.
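
The "discard writes, read as 0" behaviour can be sketched as below.  This is
an illustration only: filter_debugctl_write() and its boolean parameter are
hypothetical, and the real handler works on Xen's own vCPU state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the debug control MSR enables last-branch recording. */
#define DEBUGCTL_LBR (UINT64_C(1) << 0)

/*
 * Hypothetical filter applied to a guest write of the debug control MSR.
 * Without model-specific LBR information the LBR bit is silently dropped,
 * so a later read of the stored value returns it as 0.
 */
static uint64_t filter_debugctl_write(uint64_t val, bool have_model_specific_lbr)
{
    if ( !have_model_specific_lbr )
        val &= ~DEBUGCTL_LBR;

    return val;
}
```

Other bits in the written value pass through unchanged; only the LBR bit is
affected.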

Adjust the now-stale comment, now that the Arch LBR spec has created a way to
signal "no model specific LBR" to guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 3edca52ce736297d7fcf293860cd94ef62638052
master date: 2023-01-12 18:42:00 +0000

2 years agox86/vmx: Calculate model-specific LBRs once at start of day
Andrew Cooper [Tue, 7 Feb 2023 16:04:18 +0000 (17:04 +0100)]
x86/vmx: Calculate model-specific LBRs once at start of day

There is no point repeating this calculation at runtime, especially as it is
in the fallback path of the WRMSR/RDMSR handlers.

Move the infrastructure higher in vmx.c to avoid forward declarations,
renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
these are model-specific only.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: e94af0d58f86c3a914b9cbbf4d9ed3d43b974771
master date: 2023-01-12 18:42:00 +0000

2 years agotools: Fix build with recent QEMU, use "--enable-trace-backends"
Anthony PERARD [Tue, 7 Feb 2023 16:03:51 +0000 (17:03 +0100)]
tools: Fix build with recent QEMU, use "--enable-trace-backends"

The configure option "--enable-trace-backend" isn't accepted anymore
and we should use "--enable-trace-backends" instead, which was
introduced in 2014 and allows multiple backends.

"--enable-trace-backends" was introduced by:
    5b808275f3bb ("trace: Multi-backend tracing")
The backward compatible option "--enable-trace-backend" is removed by
    10229ec3b0ff ("configure: remove backwards-compatibility and obsolete options")

As we already use ./configure options that wouldn't be accepted by
older versions of QEMU's configure, we will simply use the new spelling
for the option and avoid trying to detect which spelling to use.

We already make use of "--firmwarepath=" which was introduced by
    3d5eecab4a5a ("Add --firmwarepath to configure")
which already includes the new spelling for "--enable-trace-backends".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
master commit: e66d450b6e0ffec635639df993ab43ce28b3383f
master date: 2023-01-11 10:45:29 +0100

2 years agox86/S3: Restore Xen's MSR_PAT value on S3 resume
Andrew Cooper [Tue, 7 Feb 2023 16:03:09 +0000 (17:03 +0100)]
x86/S3: Restore Xen's MSR_PAT value on S3 resume

There are two paths in the trampoline, and Xen's PAT needs setting up in both,
not just the boot path.

Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4d975798e11579fdf405b348543061129e01b0fb
master date: 2023-01-10 21:21:30 +0000

2 years agox86/time: prevent overflow with high frequency TSCs
Neowutran [Tue, 20 Dec 2022 12:51:42 +0000 (13:51 +0100)]
x86/time: prevent overflow with high frequency TSCs

Make sure tsc_khz is promoted to a 64-bit type before multiplying by
1000 to avoid an 'overflow before widen' bug. Otherwise just above
4.294GHz the value will overflow. Processors with clocks this high are
now in production and require this to work correctly.
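
The "overflow before widen" pattern can be shown in isolation.  The sketch
assumes a 32-bit unsigned frequency in kHz; the function names are
hypothetical and the real variable's type in Xen is not shown here.

```c
#include <stdint.h>

/* Buggy form: the multiplication is done in 32 bits and wraps before the
 * result is widened to 64 bits. */
static uint64_t khz_to_hz_narrow(uint32_t khz)
{
    return khz * 1000u;
}

/* Fixed form: widen first, then multiply in 64 bits. */
static uint64_t khz_to_hz_wide(uint32_t khz)
{
    return (uint64_t)khz * 1000u;
}
```

For 4294968 kHz (just above 4.294GHz) the narrow form wraps to 704, while the
wide form yields the expected 4294968000.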

Signed-off-by: Neowutran <xen@neowutran.ovh>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ad15a0a8ca2515d8ac58edfc0bc1d3719219cb77
master date: 2022-12-19 11:34:16 +0100

2 years agoioreq_broadcast(): accept partial broadcast success
Per Bilse [Tue, 20 Dec 2022 12:50:47 +0000 (13:50 +0100)]
ioreq_broadcast(): accept partial broadcast success

Avoid incorrectly triggering an error when a broadcast buffered ioreq
is not handled by all registered clients, as long as the failure is
strictly because the client doesn't handle buffered ioreqs.

Signed-off-by: Per Bilse <per.bilse@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
master commit: a44734df6c24fadbdb001f051cc5580c467caf7d
master date: 2022-12-07 12:17:30 +0100

2 years agoupdate Xen version to 4.16.4-pre
Jan Beulich [Tue, 20 Dec 2022 12:50:16 +0000 (13:50 +0100)]
update Xen version to 4.16.4-pre

2 years agoupdate Xen version to 4.16.3 RELEASE-4.16.3
Jan Beulich [Mon, 19 Dec 2022 08:08:32 +0000 (09:08 +0100)]
update Xen version to 4.16.3

2 years agotools/oxenstored: Render backtraces more nicely in Syslog
Andrew Cooper [Thu, 1 Dec 2022 21:06:25 +0000 (21:06 +0000)]
tools/oxenstored: Render backtraces more nicely in Syslog

fallback_exception_handler feeds a string with embedded newlines directly into
syslog().  While this is an improvement on getting nothing, syslogd escapes
all control characters it gets, and emits one (long) log line.

Fix the problem generally in the syslog stub.  As we already have a local copy
of the string, split it in place and emit one syslog() call per line.
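
Splitting in place can be sketched as follows.  emit_lines() and
remember_line() are hypothetical names; in a syslog stub the callback would
wrap a call such as syslog(priority, "%s", line).  Skipping empty lines is a
choice made for this sketch, not something the message states.

```c
#include <string.h>

/*
 * Split a writable buffer in place on '\n' and hand each non-empty line to
 * emit().  Returns the number of lines emitted.
 */
static unsigned int emit_lines(char *buf, void (*emit)(const char *line))
{
    unsigned int count = 0;
    char *line = buf;

    while ( line && *line )
    {
        char *newline = strchr(line, '\n');

        if ( newline )
            *newline = '\0';          /* terminate this line in place */

        if ( *line )                  /* skip empty lines */
        {
            emit(line);
            count++;
        }

        line = newline ? newline + 1 : NULL;
    }

    return count;
}

/* Example sink that only remembers the most recent line it was given. */
static const char *last_line;

static void remember_line(const char *line)
{
    last_line = line;
}
```

Because the buffer is modified, this only works on a private copy of the
message, which is why the existing local copy matters.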

Also tweak Logging.msg_of to avoid putting an extra newline on a string which
already ends with one.

Fixes: ee7815f49faf ("tools/oxenstored: Set uncaught exception handler")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d2162d884cba0ff7b2ac0d832f4e044444bda2e1)

2 years agotools/oxenstored/syslog: Avoid potential NULL dereference
Edwin Török [Tue, 8 Nov 2022 14:24:19 +0000 (14:24 +0000)]
tools/oxenstored/syslog: Avoid potential NULL dereference

strdup() may return NULL.  Check for this before passing to syslog().

Drop const from c_msg.  It is bogus, as demonstrated by the need to cast to
void * in order to free the memory.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit acd3fb6d65905f8a185dcb9fe6a330a591b96203)

2 years agotools/oxenstored: Set uncaught exception handler
Edwin Török [Mon, 7 Nov 2022 17:41:36 +0000 (17:41 +0000)]
tools/oxenstored: Set uncaught exception handler

Unhandled exceptions go to stderr by default, but this doesn't typically work
for oxenstored because:
 * daemonize reopens stderr as /dev/null
 * systemd redirects stderr to /dev/null too

Debugging an unhandled exception requires reproducing the issue locally when
using --no-fork, and is not conducive to figuring out what went wrong on a
remote system.

Install a custom handler which also tries to render the backtrace to the
configured syslog facility, and DAEMON|ERR otherwise.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ee7815f49faf743e960dac9e72809eb66393bc6d)

2 years agotools/oxenstored: Log live update issues at warning level
Edwin Török [Tue, 8 Nov 2022 08:57:47 +0000 (08:57 +0000)]
tools/oxenstored: Log live update issues at warning level

During live update, oxenstored tries a best effort approach to recover as many
domains and information as possible even if it encounters errors restoring
some domains.

However, logging about misunderstood input is more severe than simply info.
Log it at warning instead.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 3f02e0a70fe9f8143454b742563433958d4a87f8)

2 years agotools/oxenstored: Keep /dev/xen/evtchn open across live update
Edwin Török [Thu, 3 Nov 2022 15:31:39 +0000 (15:31 +0000)]
tools/oxenstored: Keep /dev/xen/evtchn open across live update

Closing the evtchn handle will unbind and free all local ports.  The new
xenstored would need to rebind all evtchns, which is work that we don't want
or need to be doing during the critical handover period.

However, it turns out that the Windows PV drivers also rebind their local port
across suspend/resume, leaving (o)xenstored with a stale idea of the
remote port to use.  In this case, reusing the established connection is the
only robust option.

Therefore:
 * Have oxenstored open /dev/xen/evtchn without CLOEXEC at start of day.
 * Extend the handover information with the evtchn fd, domexc virq local port,
   and the local port number for each domain connection.
 * Have (the new) oxenstored recover the open handle using Xeneventchn.fdopen,
   and use the provided local ports rather than trying to rebind them.

When this new information isn't present (i.e. live updating from an oxenstored
prior to this change), the best-effort status quo will have to do.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9b224c25293a53fcbe32da68052d861dda71a6f4)

2 years agotools/oxenstored: Rework Domain evtchn handling to use port_pair
Andrew Cooper [Wed, 30 Nov 2022 11:59:34 +0000 (11:59 +0000)]
tools/oxenstored: Rework Domain evtchn handling to use port_pair

Inter-domain event channels are always a pair of local and remote ports.
Right now the handling is asymmetric, caused by the fact that the evtchn is
bound after the associated Domain object is constructed.

First, move binding of the event channel into the Domain.make() constructor.
This means the local port no longer needs to be an option.  It also removes
the final callers of Domain.bind_interdomain.

Next, introduce a new port_pair type to encapsulate the fact that these two
should be updated together, and replace the previous port and remote_port
fields.  This refactoring also changes the Domain.get_port interface (removing
an option) so take the opportunity to name it get_local_port instead.

Also, this fixes a use-after-free risk with Domain.close.  Once the evtchn has
been unbound, the same local port number can be reused for a different
purpose, so explicitly invalidate the ports to prevent their accidental misuse
in the future.

This also cleans up some of the debugging, to always print a port pair.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit df2db174b36eba67c218763ef621c67912202fc6)

2 years agotools/oxenstored: Implement Domain.rebind_evtchn
Andrew Cooper [Wed, 30 Nov 2022 11:55:58 +0000 (11:55 +0000)]
tools/oxenstored: Implement Domain.rebind_evtchn

Generally speaking, the event channel local/remote port is fixed for the
lifetime of the associated domain object.  The exception to this is a
secondary XS_INTRODUCE (defined to re-bind to a new event channel) which pokes
around at the domain object's internal state.

We need to refactor the evtchn handling to support live update, so start by
moving the relevant manipulation into Domain.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit aecdc28d9538ca2a1028ef9bc6550cb171dbbed4)

2 years agotools/oxenstored: Rename some 'port' variables to 'remote_port'
Andrew Cooper [Wed, 30 Nov 2022 03:17:28 +0000 (03:17 +0000)]
tools/oxenstored: Rename some 'port' variables to 'remote_port'

This will make the logic clearer when we plumb local_port through these
functions.

While doing this, rearrange the construct in Domains.create0 to separate the
remote port handling from the interface handling.  (The interface logic is
dubious in several ways, but not altered by this cleanup.)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 31fbee749a75621039ca601eaee7222050a7dd83)

2 years agotools/oxenstored: Bind the DOM_EXC VIRQ in in Event.init()
Andrew Cooper [Tue, 29 Nov 2022 21:05:43 +0000 (21:05 +0000)]
tools/oxenstored: Bind the DOM_EXC VIRQ in in Event.init()

Xenstored always needs to bind the DOM_EXC VIRQ.

Instead of doing it shortly after the call to Event.init(), do it in the
constructor directly.  This removes the need for the field to be a mutable
option.

It will also simplify a future change to support live update.  Rename the
field from virq_port (which could be any VIRQ) to its proper name.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9804a5db435fe40c8ded8cf36c2d2b2281c56f1d)

2 years agotools/oxenstored: Style fixes to Domain
Andrew Cooper [Wed, 30 Nov 2022 14:56:43 +0000 (14:56 +0000)]
tools/oxenstored: Style fixes to Domain

This file has some style problems so severe that they interfere with the
readability of the subsequent bugfix patches.

Fix these issues ahead of time, to make the subsequent changes more readable.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit b45bfaf359e4821b1bf98a4fcd194d7fd176f167)

2 years agotools/ocaml/evtchn: Extend the init() binding with a cloexec flag
Edwin Török [Thu, 3 Nov 2022 14:50:38 +0000 (14:50 +0000)]
tools/ocaml/evtchn: Extend the init() binding with a cloexec flag

For live update, oxenstored wants to clear CLOEXEC on the evtchn handle, so it
survives the execve() into the new oxenstored.

Have the new interface match how cloexec works in other Ocaml standard
libraries.
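
What CLOEXEC controls can be shown with plain POSIX calls.  This is a generic
illustration, not the binding change itself: keep_open_across_exec() is a
hypothetical helper operating on an arbitrary file descriptor.

```c
#include <fcntl.h>

/*
 * Clear FD_CLOEXEC on an open descriptor so that it stays open across
 * execve().  Returns 0 on success, -1 on error (errno is set by fcntl()).
 */
static int keep_open_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if ( flags < 0 )
        return -1;

    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}
```

Opening the handle without CLOEXEC in the first place avoids the window
between open and fcntl(), which is presumably why a flag on init() is
preferable to clearing the bit afterwards.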

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9bafe4a53306e7aa2ce6ffc96f7477c6f329f7a7)

2 years agotools/ocaml/evtchn: Add binding for xenevtchn_fdopen()
Edwin Török [Mon, 14 Nov 2022 13:36:19 +0000 (13:36 +0000)]
tools/ocaml/evtchn: Add binding for xenevtchn_fdopen()

For live update, the new oxenstored needs to reconstruct an evtchn object
around an existing file descriptor.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 7ba68a6c558e1fd811c95cb7215a5cd07a3cc2ea)

2 years agotools/ocaml/evtchn: OCaml 5 support, fix potential resource leak
Edwin Török [Tue, 18 Jan 2022 15:04:48 +0000 (15:04 +0000)]
tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak

There is no binding for xenevtchn_close().  In principle, this is a resource
leak, but the typical usage is as a singleton that lives for the lifetime of
the program.

Ocaml 5 no longer permits storing a naked C pointer in an Ocaml value.

Therefore, use a Custom block.  This allows us to use the finaliser callback
to call xenevtchn_close(), if the Ocaml object goes out of scope.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 22d5affdf0cecfa6faae46fbaec68b8018835220)

2 years agotools/oxenstored: Fix incorrect scope after an if statement
Andrew Cooper [Fri, 11 Nov 2022 18:50:34 +0000 (18:50 +0000)]
tools/oxenstored: Fix incorrect scope after an if statement

A debug statement got inserted into a single-expression if statement.

Insert brackets to give the intended meaning, rather than the actual meaning
where the "let con = Connections..." is outside and executed unconditionally.

This results in some unnecessary ring checks for domains which otherwise have
IO credit.

Fixes: 42f0581a91d4 ("tools/oxenstored: Implement live update for socket connections")
Reported-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ee36179371fd4215a43fb179be2165f65c1cd1cd)

2 years agotools/ocaml/xenstored/store.ml: fix build error
Edwin Török [Wed, 9 Nov 2022 09:48:33 +0000 (10:48 +0100)]
tools/ocaml/xenstored/store.ml: fix build error

Building with Dune in release mode fails with:
```
File "ocaml/xenstored/store.ml", line 464, characters 13-32:
Warning 18: this type-based record disambiguation is not principal.
File "ocaml/xenstored/store.ml", line 1:
Error: Some fatal warnings were triggered (1 occurrences)
```

This is a warning to help keep the code futureproof, quoting from its
documentation:
> Check information path during type-checking, to make sure that all types are
> derived in a principal way. When using labelled arguments and/or polymorphic
> methods, this flag is required to ensure future versions of the compiler will
> be able to infer types correctly, even if internal algorithms change. All
> programs accepted in -principal mode are also accepted in the default mode with
> equivalent types, but different binary signatures, and this may slow down type
> checking; yet it is a good idea to use it once before publishing source code.

Fixes: db471408edd46 "tools/ocaml/xenstored: Fix quota bypass on domain shutdown"
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 124492eff8e4acdaaed939fa9406b108c55fec73)

2 years agotools/ocaml/xenstored: fix live update exception
Edwin Török [Fri, 21 Oct 2022 07:59:25 +0000 (08:59 +0100)]
tools/ocaml/xenstored: fix live update exception

During live update we will load the /tool/xenstored path from the previous binary,
and then try to mkdir /tool again which will fail with EEXIST.
Check for existence of the path before creating it.

The write call to /tool/xenstored should not need any changes
(and we do want to overwrite any previous path, in case it changed).

Prior to 7110192b1df6 live update would work only if the binary path was
specified. With 7110192b1df6 and this change, live update also works when
no binary path is specified in `xenstore-control live-update`.

Fixes: 7110192b1df6 ("tools/oxenstored: Fix Oxenstored Live Update")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit f838b956779ff8a0b94636462f3c6d95c3adeb73)

2 years agotools/oxenstored: Fix Oxenstored Live Update
Andrew Cooper [Wed, 19 Oct 2022 17:12:33 +0000 (18:12 +0100)]
tools/oxenstored: Fix Oxenstored Live Update

tl;dr This hunk was part of the patch emailed to xen-devel, but was missing
from what ultimately got committed.

https://lore.kernel.org/xen-devel/4164cb728313c3b9fc38cf5e9ecb790ac93a9600.1610748224.git.edvin.torok@citrix.com/
is the patch in question, but was part of a series that had threading issues.
I have a vague recollection that I sourced the commits from a local branch,
which clearly wasn't as up-to-date as I had thought.

Either way, it's my fault/mistake, and this hunk should have been part of what
got committed.

Fixes: 00c48f57ab36 ("tools/oxenstored: Start live update process")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit 7110192b1df697be84a50f741651d4c3cb129504)

2 years agox86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled
Jan Beulich [Thu, 8 Dec 2022 09:12:41 +0000 (10:12 +0100)]
x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled

Linux's relatively new use of HVMOP_set_evtchn_upcall_vector has
exposed a problem with the marking of the respective vector as
pending: For quite some time Linux has been checking whether any stale
ISR or IRR bits would still be set while preparing the LAPIC for use.
This check is now triggering on the upcall vector, as the registration,
at least for APs, happens before the LAPIC is actually enabled.

In software-disabled state an LAPIC would not accept any interrupt
requests and hence no IRR bit would newly become set while in this
state. As a result it is also wrong for us to mark the upcall vector as
having a pending request when the vLAPIC is in this state.

To compensate for the "enabled" check added to the assertion logic, add
logic to (conditionally) mark the upcall vector as having a request
pending at the time the LAPIC is being software-enabled by the guest.
Note however that, like for the pt_may_unmask_irq() we already have
there, long term we may need to find a different solution. This will be
especially relevant in case yet better LAPIC acceleration would
eliminate notifications of guest writes to this and other registers.

Fixes: 7b5b8ca7dffd ("x86/upcall: inject a spurious event after setting upcall vector")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: f5d0279839b58cb622f0995dbf9cff056f03082e
master date: 2022-12-06 13:51:49 +0100