xen: sched: reassign vCPUs to pCPUs, when they come back online
When a vcpu that was offline comes back online, we want it to either
be assigned to a pCPU, or go into the wait list.
Detecting that a vcpu is coming back online is a bit tricky. Basically,
if the vcpu is waking up, and is neither assigned to a pCPU, nor in the
wait list, it must be coming back from offline.
When this happens, we put it in the waitqueue, and we "tickle" an idle
pCPU (if any), to go pick it up.
Looking at the patch, it seems that the vcpu wakeup code is getting
complex, and hence that it could potentially introduce latencies.
However, all this new logic is triggered only by the case of a vcpu
coming online, so, basically, the overhead during normal operations is
just an additional 'if()'.
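The detection described above can be sketched roughly as follows. This is a simplified, self-contained model and not the actual scheduler code: struct wake_vcpu, enum wake_action and vcpu_wake() are hypothetical names invented for illustration, and the idle pCPU is passed in as a plain integer (-1 meaning "none").

```c
#include <stdbool.h>

/* Hypothetical, simplified per-vCPU scheduler state. */
struct wake_vcpu {
    int  pcpu;      /* assigned pCPU, or -1 if none */
    bool in_waitq;  /* true if queued in the wait list */
};

/* Hypothetical outcome of a wakeup, for illustration only. */
enum wake_action { WAKE_NORMAL, WAKE_ENQUEUED, WAKE_ENQUEUED_AND_TICKLED };

/*
 * A waking vCPU that is neither on a pCPU nor in the wait list must be
 * coming back from offline: queue it and, if an idle pCPU exists, tickle
 * that pCPU. In every other case only the single extra check is paid.
 */
static enum wake_action vcpu_wake(struct wake_vcpu *v, int idle_pcpu)
{
    if (v->pcpu < 0 && !v->in_waitq) {
        v->in_waitq = true;
        return idle_pcpu >= 0 ? WAKE_ENQUEUED_AND_TICKLED : WAKE_ENQUEUED;
    }
    return WAKE_NORMAL;
}
```

For a vCPU that already has a pCPU, the function falls straight through to WAKE_NORMAL, which corresponds to the "just an additional if()" cost mentioned above.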
Signed-off-by: Dario Faggioli <dario.faggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Roger Pau Monne <roger.pau@citrix.com>
xen: sched: deal with vCPUs being or becoming online or offline
If a vCPU is, or is going, offline we want it to be neither
assigned to a pCPU, nor in the wait list, so:
- if an offline vcpu is inserted (or migrated) it must not
go on a pCPU, nor in the wait list;
- if an offline vcpu is removed, we are sure that it is
neither on a pCPU nor in the wait list already, so we
should just bail, avoiding doing any further action;
- if a vCPU goes offline we need to remove it either from
its pCPU or from the wait list.
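The three rules above can be modelled in a few lines. This is only a sketch under stated assumptions: struct sched_vcpu, enum placement and the three helpers are hypothetical and do not mirror the real scheduler hooks or their signatures.

```c
#include <stdbool.h>

/* Hypothetical placement of a vCPU within the scheduler. */
enum placement { PLACED_NONE, PLACED_ON_PCPU, PLACED_IN_WAITQ };

struct sched_vcpu {
    bool online;
    enum placement where;
};

/* Insert (or migrate): an offline vCPU must stay unplaced. */
static void vcpu_insert(struct sched_vcpu *v, bool pcpu_free)
{
    if (!v->online) {
        v->where = PLACED_NONE;
        return;
    }
    v->where = pcpu_free ? PLACED_ON_PCPU : PLACED_IN_WAITQ;
}

/* Remove: returns false when there is nothing to do (offline vCPU). */
static bool vcpu_remove(struct sched_vcpu *v)
{
    if (!v->online)
        return false;   /* already neither on a pCPU nor in the wait list */
    v->where = PLACED_NONE;
    return true;
}

/* Going offline: drop the vCPU from its pCPU or from the wait list. */
static void vcpu_go_offline(struct sched_vcpu *v)
{
    v->online = false;
    v->where = PLACED_NONE;
}
```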
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Roger Pau Monne <roger.pau@citrix.com>
---
Changes from v1:
* improved wording in changelog and comments
* this patch is the result of the merge of patches 2 and 3 from v1
Gang Wei's Intel email address has been bouncing for some time now, and
the other maintainer is non-responsive to patches [0], so remove
maintainers and declare INTEL(R) TRUSTED EXECUTION TECHNOLOGY (TXT)
orphaned.
Andrew Cooper [Wed, 24 Jul 2019 14:05:16 +0000 (15:05 +0100)]
x86/dmi: Drop trivial callback functions
dmi_check_system() returns the number of matches. Testing this for nonzero
is more efficient than making a function pointer call to a trivial function
to modify a variable.
No functional change, but this results in less compiled code, which is
also (fractionally) quicker to run.
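The pattern can be illustrated with a much simplified stand-in for a match table. Everything here is hypothetical (struct quirk_id, count_matches(), needs_quirk() and the "ExampleVendor" string); it only shows a caller deriving its flag from a match count rather than from a callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for a match table entry. */
struct quirk_id {
    const char *vendor;   /* NULL terminates the table */
};

/* Returns the number of table entries matching the given vendor string. */
static int count_matches(const struct quirk_id *table, const char *vendor)
{
    int n = 0;

    for (; table->vendor != NULL; table++)
        if (strcmp(table->vendor, vendor) == 0)
            n++;
    return n;
}

static const struct quirk_id example_quirks[] = {
    { "ExampleVendor" },
    { NULL },
};

/* The caller sets its flag from the count; no callback is involved. */
static bool needs_quirk(const char *vendor)
{
    return count_matches(example_quirks, vendor) != 0;
}
```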
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/iommu: avoid mapping the interrupt address range for hwdom
Current code only prevents mapping the lapic page into the guest
physical memory map. Expand the range to be 0xFEEx_xxxx as described
in the Intel VT-d specification section 3.13 "Handling Requests to
Interrupt Address Range".
AMD also lists this address range in the AMD SR5690 Databook, section
2.4.4 "MSI Interrupt Handling and MSI to HT Interrupt Conversion".
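As an illustration, a range check for 0xFEEx_xxxx could look like the sketch below. The macro and function names are made up for this example, and 4 KiB pages (a page shift of 12) are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0xFEEx_xxxx, i.e. 0xFEE00000 up to and including 0xFEEFFFFF. */
#define INTR_RANGE_START UINT64_C(0xFEE00000)
#define INTR_RANGE_END   UINT64_C(0xFEEFFFFF)
#define EXAMPLE_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* True if the page frame lies inside the interrupt address range. */
static bool pfn_in_interrupt_range(uint64_t pfn)
{
    return pfn >= (INTR_RANGE_START >> EXAMPLE_PAGE_SHIFT) &&
           pfn <= (INTR_RANGE_END >> EXAMPLE_PAGE_SHIFT);
}
```

The range covers 256 page frames (0xFEE00 through 0xFEEFF), of which the lapic page is only the first.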
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Thu, 25 Jul 2019 10:16:21 +0000 (12:16 +0200)]
iommu / x86: move call to scan_pci_devices() out of vendor code
It's not vendor specific so it doesn't really belong there.
Scanning the PCI topology also really doesn't have much to do with IOMMU
initialization. It doesn't depend on there even being an IOMMU. This patch
moves the call to the beginning of iommu_hardware_setup() but only
places it there because the topology information would be otherwise unused.
Subsequent patches will actually make use of the PCI topology during
(x86) IOMMU initialization.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Acked-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 25 Jul 2019 10:14:52 +0000 (12:14 +0200)]
x86/IOMMU: don't restrict IRQ affinities to online CPUs
In line with "x86/IRQ: desc->affinity should strictly represent the
requested value" the internally used IRQ(s) also shouldn't be restricted
to online ones. Make set_desc_affinity() (set_msi_affinity() then does
by implication) cope with a NULL mask being passed (just like
assign_irq_vector() does), and have IOMMU code pass NULL instead of
&cpu_online_map (when, for VT-d, there's no NUMA node information
available).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Brian Woods <brian.woods@amd.com>
pv_raise_interrupt() is only called for NMIs these days, so the MCE
specific part can be removed. Rename pv_raise_interrupt() to
pv_raise_nmi() and NMI_MCE_SOFTIRQ to NMI_SOFTIRQ.
Additionally there is no need to pin the vcpu which the NMI is delivered
to; that is a leftover of (already removed) MCE handling. So remove the
pinning, too. Note that pinning was introduced by commit 355b0469a8
adding MCE support (with NMI support existing already). MCE using that
pinning was removed with commit 3a91769d6e again without cleaning up the
code.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 19 Oct 2017 10:50:18 +0000 (11:50 +0100)]
passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
DMA-ing to the stack is considered bad practice. In this case, if a
timeout occurs because of a sluggish device which is processing the
request, the completion notification will corrupt the stack of a
subsequent deeper call tree.
Place the poll_slot in a percpu area and DMA to that instead.
Fix the declaration of saddr in struct qinval_entry, to avoid a shift by
two. The requirement here is that the DMA address is dword aligned,
which is covered by poll_slot's type.
This change does not address other issues. Correlating completions
after a timeout with their request is a more complicated change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.
While there, convert {IGD/IOH}_DEV to be a pci_sbdf_t itself instead of
a device number.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Commit 0763cd2687897b55e7 ("xen/sched: don't disable scheduler on cpus
during suspend") removed a lock in restore_vcpu_affinity() which needs
to stay: cpumask_scratch_cpu() must be protected by the scheduler
lock. restore_vcpu_affinity() is being called by thaw_domains(), so
with multiple domains in the system another domain might already be
running and the scheduler might make use of cpumask_scratch_cpu()
already.
Viktor Mitin [Tue, 18 Jun 2019 08:58:51 +0000 (11:58 +0300)]
xen/arm: remove unused dt_device_node parameter
Some of the functions generating nodes (e.g. make_timer_node)
take a dt_device_node parameter, but never use it.
It is actually misused when creating the DT for DomU.
So it is best to remove the parameter.
Igor Druzhinin [Fri, 19 Jul 2019 13:07:48 +0000 (14:07 +0100)]
x86/crash: fix kexec transition breakage
Following 6ff560f7f ("x86/SMP: don't try to stop already stopped CPUs")
an incorrect condition was placed into the kexec transition path,
leaving the crashing CPU always online and breaking entry into the kdump kernel.
Correct it by unifying the condition with smp_send_stop().
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 10:05:27 +0000 (12:05 +0200)]
AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback
Both users will want to know IOMMU properties (specifically the IRTE
size) subsequently. Leverage this to avoid pointless calls to the
callback when IVRS mapping table entries are unpopulated. To avoid
leaking interrupt remapping tables (bogusly) allocated for IOMMUs
themselves, this requires suppressing their allocation in the first
place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
"add" IOMMUs') had done.
Additionally suppress the call for alias entries, as again both users
don't care about these anyway. In fact this eliminates a fair bit of
redundancy from dump output.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Brian Woods <brian.woods@amd.com>
Jan Beulich [Mon, 22 Jul 2019 10:03:46 +0000 (12:03 +0200)]
AMD/IOMMU: process softirqs while dumping IRTs
When there are sufficiently many devices listed in the ACPI tables (no
matter if they actually exist), output may take way longer than the
watchdog would like.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Brian Woods <brian.woods@amd.com>
Jan Beulich [Mon, 22 Jul 2019 09:59:01 +0000 (11:59 +0200)]
AMD/IOMMU: free more memory when cleaning up after error
The interrupt remapping in-use bitmaps were leaked in all cases. The
ring buffers and the mapping of the MMIO space were leaked for any IOMMU
that hadn't been enabled yet.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Brian Woods <brian.woods@amd.com>
Jan Beulich [Mon, 22 Jul 2019 09:50:58 +0000 (11:50 +0200)]
x86/vLAPIC: avoid speculative out of bounds accesses
Array indexes used in the MSR read/write emulation functions as well as
the direct VMX / APIC-V hook are derived from guest controlled values.
Restrict their ranges to limit the side effects of speculative
execution.
Along these lines also constrain the vlapic_lvt_mask[] access.
Remove the unused vlapic_lvt_{vector,dm}() instead of adjusting them.
This is part of the speculative hardening effort.
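The general idea of restricting an index without a conditional branch can be sketched as below. This is an illustrative, portable approximation only: index_mask() and clamp_index() are hypothetical names, the code assumes size fits in the lower half of the unsigned long range, and a compiler is free to transform it, which is why real hardening relies on dedicated, architecture-aware helpers rather than on plain C like this.

```c
#include <limits.h>

/*
 * All-ones when index < size, zero otherwise, computed without a branch.
 * The top bit of (index | (size - 1 - index)) is set exactly when
 * index >= size, provided size does not exceed LONG_MAX.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    unsigned long top = (index | (size - 1UL - index)) >>
                        (sizeof(unsigned long) * CHAR_BIT - 1);

    return top - 1UL;   /* 0 - 1 = all ones, 1 - 1 = zero */
}

/* Out-of-range indexes collapse to 0, so any access stays within bounds. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
    return index & index_mask(index, size);
}
```

An array access would then use array[clamp_index(i, nr_elems)] after the usual architectural bounds check, so that a mispredicted check cannot steer the load outside the array.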
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:48:08 +0000 (11:48 +0200)]
x86/IRQ: move {,_}clear_irq_vector()
This is largely to drop a forward declaration. There's one functional
change - clear_irq_vector() gets marked __init, as its only caller is
check_timer(). Beyond this only a few stray blanks get removed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:47:38 +0000 (11:47 +0200)]
x86/IRQ: eliminate some on-stack cpumask_t instances
Use scratch_cpumask where possible, to avoid creating these possibly
large stack objects. We can't use it in _assign_irq_vector() and
set_desc_affinity(), as these get called in IRQ context.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:45:58 +0000 (11:45 +0200)]
x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
Since the "Cannot set affinity ..." warning is a one time one, avoid
triggering it already at boot time when parking secondary threads and
the serial console uses a (still unconnected at that time) PCI IRQ.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:44:50 +0000 (11:44 +0200)]
x86/IRQ: target online CPUs when binding guest IRQ
fixup_irqs() skips interrupts without action. Hence such interrupts can
retain affinity to just offline CPUs. With "noirqbalance" in effect,
pirq_guest_bind() so far would have left them alone, resulting in a non-
working interrupt.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:44:02 +0000 (11:44 +0200)]
x86/IRQ: fix locking around vector management
All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
fields, and hence ought to be called with the descriptor lock held in
addition to vector_lock. This is currently the case for only
set_desc_affinity() (in the common case) and destroy_irq(), which also
clarifies what the nesting behavior between the locks has to be.
Reflect the new expectation by having these functions all take a
descriptor as parameter instead of an interrupt number.
Also take care of the two special cases of calls to set_desc_affinity():
set_ioapic_affinity_irq() and VT-d's dma_msi_set_affinity() get called
directly as well, and in these cases the descriptor locks hadn't got
acquired till now. For set_ioapic_affinity_irq() this means acquiring /
releasing of the IO-APIC lock can be plain spin_{,un}lock() then.
Drop one of the two leading underscores from all three functions at
the same time.
There's one case left where descriptors get manipulated with just
vector_lock held: setup_vector_irq() assumes its caller to acquire
vector_lock, and hence can't itself acquire the descriptor locks (wrong
lock order). I don't currently see how to address this.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> [VT-d] Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:43:16 +0000 (11:43 +0200)]
x86/IRQ: consolidate use of ->arch.cpu_mask
Mixed meaning was implied so far by different pieces of code -
disagreement was in particular about whether to expect offline CPUs'
bits to possibly be set. Switch to a mostly consistent meaning
(exception being high priority interrupts, which would perhaps better
be switched to the same model as well in due course). Use the field to
record the vector allocation mask, i.e. potentially including bits of
offline (parked) CPUs. This implies that before passing the mask to
certain functions (most notably cpu_mask_to_apicid()) it needs to be
further reduced to the online subset.
The exception of high priority interrupts is also why for the moment
_bind_irq_vector() is left as is, despite looking wrong: It's used
exclusively for IRQ0, which isn't supposed to move off CPU0 at any time.
The prior lack of restricting to online CPUs in set_desc_affinity()
before calling cpu_mask_to_apicid() in particular allowed (in x2APIC
clustered mode) offlined CPUs to end up enabled in an IRQ's destination
field. (I wonder whether vector_allocation_cpumask_flat() shouldn't
follow a similar model, using cpu_present_map in favor of
cpu_online_map.)
For IO-APIC code it was definitely wrong to potentially store, as a
fallback, TARGET_CPUS (i.e. all online ones) into the field, as that
would have caused problems when determining on which CPUs to release
vectors when they've gone out of use. Disable interrupts instead when
no valid target CPU can be established (which code elsewhere should
guarantee to never happen), and log a message in such an unlikely event.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:42:32 +0000 (11:42 +0200)]
x86/IRQ: desc->affinity should strictly represent the requested value
desc->arch.cpu_mask reflects the actual set of target CPUs. Don't ever
fiddle with desc->affinity itself, except to store caller requested
values. Note that assign_irq_vector() now takes a NULL incoming CPU mask
to mean "all CPUs", rather than just "all currently online CPUs".
This way no further affinity adjustment is needed after onlining further
CPUs.
This renders both set_native_irq_info() uses (which weren't using proper
locking anyway) redundant - drop the function altogether.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:41:55 +0000 (11:41 +0200)]
x86/IRQ: deal with move cleanup count state in fixup_irqs()
The cleanup IPI may get sent immediately before a CPU gets removed from
the online map. In such a case the IPI would get handled on the CPU
being offlined no earlier than in the interrupts disabled window after
fixup_irqs()' main loop. This is too late, however, because a possible
affinity change may incur the need for vector assignment, which will
fail when the IRQ's move cleanup count is still non-zero.
To fix this
- record the set of CPUs the cleanup IPIs actually get sent to alongside
setting their count,
- adjust the count in fixup_irqs(), accounting for all CPUs that the
cleanup IPI was sent to, but that are no longer online,
- bail early from the cleanup IPI handler when the CPU is no longer
online, to prevent double accounting.
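The three steps can be modelled with a plain 64-bit CPU bitmap. This is a sketch, not the real implementation: struct irq_move_state and the helper names are hypothetical, and CPU numbers are assumed to be below 64.

```c
#include <stdint.h>

/* Hypothetical per-IRQ bookkeeping for an in-flight vector move. */
struct irq_move_state {
    uint64_t     cleanup_mask;   /* CPUs the cleanup IPI was sent to */
    unsigned int cleanup_count;  /* acknowledgements still outstanding */
};

static unsigned int count_bits(uint64_t m)
{
    unsigned int n = 0;

    for (; m != 0; m &= m - 1)   /* clear the lowest set bit each round */
        n++;
    return n;
}

/* Step 1: record the recipients alongside setting the count. */
static void send_cleanup(struct irq_move_state *s, uint64_t targets)
{
    s->cleanup_mask = targets;
    s->cleanup_count = count_bits(targets);
}

/* Step 2: account for recipients that are no longer online. */
static void fixup_cleanup(struct irq_move_state *s, uint64_t online)
{
    s->cleanup_count -= count_bits(s->cleanup_mask & ~online);
    s->cleanup_mask &= online;
}

/* Step 3: the handler bails early on an offline CPU (no double count). */
static void cleanup_handler(struct irq_move_state *s, unsigned int cpu,
                            uint64_t online)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (!(online & bit))
        return;
    if (s->cleanup_mask & bit) {
        s->cleanup_mask &= ~bit;
        s->cleanup_count--;
    }
}
```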
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:41:02 +0000 (11:41 +0200)]
x86/IRQ: deal with move-in-progress state in fixup_irqs()
The flag being set may prevent affinity changes, as these often imply
assignment of a new vector. When there's no possible destination left
for the IRQ, the clearing of the flag needs to happen right from
fixup_irqs().
Additionally _assign_irq_vector() needs to avoid setting the flag when
there's no online CPU left in what gets put into ->arch.old_cpu_mask.
The old vector can be released right away in this case.
Also extend the log message about broken affinity to include the new
affinity as well, making it possible to notice issues with affinity
changes not actually having taken place. Swap the if/else-if order there
at the same time to reduce the number of conditions checked.
At the same time replace two open coded instances of the new helper
function.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Mon, 22 Jul 2019 09:35:19 +0000 (11:35 +0200)]
tools/libxc: allow controlling the max C-state sub-state
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Make handling in do_pm_op() more homogeneous: Before interpreting
op->cpuid as such, handle all operations not acting on a particular
CPU. Also expose the setting via xenpm.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Mon, 22 Jul 2019 09:34:32 +0000 (11:34 +0200)]
x86: allow limiting the max C-state sub-state
Allow limiting the max C-state sub-state by appending to the max_cstate
command-line parameter. E.g. max_cstate=1,0
The limit only applies to the highest legal C-state. For example:
max_cstate = 1, max_csubstate = 0 ==> C0, C1 okay, but not C1E
max_cstate = 1, max_csubstate = 1 ==> C0, C1 and C1E okay, but not C2
max_cstate = 2, max_csubstate = 0 ==> C0, C1, C1E, C2 okay, but not C3
max_cstate = 2, max_csubstate = 1 ==> C0, C1, C1E, C2 okay, but not C3
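The rule behind this table can be written as a small predicate. The function name and the encoding are assumptions made for this sketch: a state is described as a (state, sub-state) pair, with C1E taken to be sub-state 1 of C1.

```c
#include <stdbool.h>

/*
 * The sub-state limit only applies to the highest permitted C-state;
 * all sub-states of shallower C-states remain allowed.
 */
static bool cstate_allowed(unsigned int state, unsigned int substate,
                           unsigned int max_cstate,
                           unsigned int max_csubstate)
{
    if (state < max_cstate)
        return true;
    if (state > max_cstate)
        return false;
    return substate <= max_csubstate;
}
```

With max_cstate = 2 and max_csubstate = 0, C1E (state 1, sub-state 1) is still permitted because the limit is only checked at state 2.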
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:34:03 +0000 (11:34 +0200)]
x86/AMD: make C-state handling independent of Dom0
At least for more recent CPUs, following what BKDG / PPR suggest for the
BIOS to surface via ACPI we can make ourselves independent of Dom0
uploading respective data.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:32:20 +0000 (11:32 +0200)]
x86/cpuidle: really use C1 for "urgent" CPUs
For one, on recent AMD CPUs entering C1 (if available at all) requires
use of MWAIT, while HLT (i.e. default_idle()) would put the processor
into as deep as CC6. And then even on other vendors' CPUs we should
avoid entering default_idle() when the intended state can be reached
by using the active idle driver's facilities.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:31:38 +0000 (11:31 +0200)]
x86/cpuidle: switch to uniform meaning of "max_cstate="
While the MWAIT idle driver already takes it to mean an actual C state,
the ACPI idle driver so far used it as a list index. The list index,
however, is an implementation detail of Xen and affected by firmware
settings (i.e. not necessarily uniform for a particular system).
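The difference between the two interpretations can be shown with a toy state list. struct cx_entry and deepest_allowed() are hypothetical names for this sketch, and entries are assumed to arrive ordered from shallowest to deepest.

```c
#include <stddef.h>

/* Hypothetical entry of the firmware-provided C-state list. */
struct cx_entry {
    unsigned int type;   /* actual C-state: 1 = C1, 2 = C2, ... */
};

/*
 * Treat max_cstate as an actual C-state rather than as a list index:
 * return the index of the deepest entry whose type does not exceed it,
 * or -1 if no entry qualifies.
 */
static int deepest_allowed(const struct cx_entry *list, size_t n,
                           unsigned int max_cstate)
{
    int best = -1;

    for (size_t i = 0; i < n; i++)
        if (list[i].type <= max_cstate)
            best = (int)i;
    return best;
}
```

If firmware exposes only C1 and C3, a limit of 2 selects the C1 entry here, whereas counting list positions would have let the C3 entry through.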
While touching this code also avoid invoking menu_get_trace_data()
when tracing is not active. For consistency do this also for the
MWAIT driver.
Note that I'm intentionally not adding any sorting logic to set_cx():
Before and after this patch we assume entries to arrive in order, so
this would be an orthogonal change.
Take the opportunity and add minimal documentation for the command line
option.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 22 Jul 2019 09:30:10 +0000 (11:30 +0200)]
x86/shadow: ditch dangerous declarations
This started out with me noticing the latent bug of there being HVM
related declarations in common.c that their producer doesn't see, and
that hence could go out of sync at any time. However, go farther than
fixing just that and move the functions actually using these into hvm.c.
This way the items in question can simply become static, and no separate
declarations are needed at all.
Within the moved code constify and rename or outright delete the struct
vcpu * local variables and re-format a comment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mtrr: Skip cache flushes on CPUs with cache self-snooping
Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in lock
step and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.
On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early when booting, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:
clocksource: timekeeping watchdog on CPU1: Marking clocksource
'tsc-early' as unstable because the skew is too large:
clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last: fffedb90 mask: ffffffff
clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
As per measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per the Section 11.11.8 of the Intel 64 and IA 32
Architectures Software Developer's Manual, it is not necessary to flush
caches if the CPU supports cache self-snooping. Thus, skipping the cache
flushes can reduce by several tens of milliseconds the time needed to
complete the programming of the MTRR registers:
Platform Before After
104-core (208 Threads) Skylake 1437ms 28ms
2-core ( 4 Threads) Haswell 114ms 2ms
Use alternatives patching instead of static_cpu_has() (which we don't
have [yet]).
Interestingly we've been lacking the 2nd wbinvd(), which I'm taking the
liberty of adding here.
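In outline, the change amounts to making both cache flushes conditional on the absence of self-snooping. The sketch below uses an ordinary runtime check and a counter standing in for wbinvd(); the names are hypothetical, and the actual patch uses alternatives patching rather than a branch.

```c
#include <stdbool.h>

static unsigned int flush_count;

/* Stand-in for wbinvd(); only counts how often it would have run. */
static void flush_caches(void)
{
    flush_count++;
}

/* Hypothetical outline of the MTRR programming sequence on one CPU. */
static void program_mtrrs(bool self_snoop)
{
    if (!self_snoop)
        flush_caches();   /* first flush, before changing the MTRRs */

    /* ... disable MTRRs, write the new values, re-enable MTRRs ... */

    if (!self_snoop)
        flush_caches();   /* second flush, after the update */
}
```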
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Processors which have self-snooping capability can handle conflicting
memory types across CPUs by snooping their own caches. However, there exist
CPU models on which having conflicting memory types still leads to
unpredictable behavior, machine check errors, or hangs.
Clear this feature on affected CPUs to prevent its use.
Strip Yonah - as per ark.intel.com it doesn't look to be 64-bit capable.
Call the new function on the boot CPU only. Don't clear the CPU feature
flag itself, as it is exposed to guests (who could otherwise observe it
disappear after migration).
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 19 Jul 2019 11:49:47 +0000 (13:49 +0200)]
x86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled
Disable it by default as it is only an experimental subsystem.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 19 Jul 2019 11:48:38 +0000 (13:48 +0200)]
x86/mem_sharing: copy a page_lock version to be internal to memshr
Patch cf4b30dca0a "Add debug code to detect illegal page_lock and put_page_type
ordering" added extra sanity checking to page_lock/page_unlock for debug builds
with the assumption that no hypervisor path ever locks two pages at once.
This assumption doesn't hold during memory sharing so we copy a version of
page_lock/unlock to be used exclusively in the memory sharing subsystem
without the sanity checks.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 19 Jul 2019 11:47:17 +0000 (13:47 +0200)]
x86/mem_sharing: reorder when pages are unlocked and released
Calling _put_page_type while also holding the page_lock for that page
can cause a deadlock. There may be code-paths still in place where this
is an issue, but for normal sharing purposes this has been tested and
works.
The extra page reference that used to be grabbed at certain points is
dropped because it is no longer needed: with this reordering a reference
is held for as long as necessary, so the extra one is redundant.
The comment being dropped is out of date and hence incorrect.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 18 Jul 2019 16:53:03 +0000 (17:53 +0100)]
xen/trace: Add trace.h to MAINTAINERS
... to match the existing trace.c entry.
Reported-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
To remove a device from a domain, a qmp command is sent to qemu. But it is
handled by qemu asynchronously. Even when the qmp command is reported as done,
the actual handling on the qemu side may happen later.
This behavior causes two problems:
1. Attaching a device back to a domain right after detaching the device from
that domain would fail with error:
libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an
error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device
2. Accesses to PCI configuration space in Qemu may overlap with later device
reset issued by 'xl' or by pciback.
In order to avoid the problems mentioned above, wait for the completion of device
removal by querying all pci devices using qmp command and ensuring the target
device isn't listed. Only retry 5 times to avoid 'xl' potentially being blocked
by qemu.
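The bounded wait can be sketched as a simple retry loop. The names below are hypothetical and do not reflect the libxl API; the query is abstracted as a callback, and a real implementation would also pause between queries instead of retrying back to back.

```c
#include <stdbool.h>

#define MAX_RETRIES 5

/* Hypothetical query: returns true while the device is still listed. */
typedef bool (*device_listed_fn)(void *ctx);

/*
 * Returns true once the device is no longer listed, or false if it is
 * still present after MAX_RETRIES queries.
 */
static bool wait_for_removal(device_listed_fn listed, void *ctx)
{
    for (int i = 0; i < MAX_RETRIES; i++)
        if (!listed(ctx))
            return true;
    return false;
}

/* Example query: reports the device as listed for the next *ctx calls. */
static bool fake_listed(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}
```

Bounding the loop means the caller regains control even if the device never disappears from the list.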
Signed-off-by: Chao Gao <chao.gao@intel.com>
Message-Id: <1562133373-19208-1-git-send-email-chao.gao@intel.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Daniel P. Smith [Thu, 18 Jul 2019 21:11:44 +0000 (22:11 +0100)]
golang/xenlight: Fixing compilation for go 1.11
This deals with two casting issues for compiling under go 1.11:
- explicitly cast to *C.xentoollog_logger for Ctx.logger pointer
- add cast to unsafe.Pointer for the C string cpath
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Mon, 8 Jul 2019 10:56:24 +0000 (06:56 -0400)]
MAINTAINERS: Make myself libxl golang binding maintainer
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 18 Jul 2019 13:29:35 +0000 (14:29 +0100)]
xen/trace: Fix build with !CONFIG_TRACEBUFFER
GCC reports:
In file included from hvm.c:24:0:
/local/xen.git/xen/include/xen/trace.h: In function ‘tb_control’:
/local/xen.git/xen/include/xen/trace.h:60:13: error: ‘ENOSYS’
undeclared (first use in this function)
return -ENOSYS;
^~~~~~
Include xen/errno.h to resolve the issue. While tweaking this, add comments
to the #else and #endif, as they are a fair distance apart.
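The shape of the fix can be sketched as below. The parameter type is a placeholder (a plain void pointer is used here only to keep the example self-contained), and the standard errno.h stands in for xen/errno.h.

```c
#include <errno.h>   /* provides ENOSYS for the stub below */

#ifdef CONFIG_TRACEBUFFER

int tb_control(void *op);

#else /* !CONFIG_TRACEBUFFER */

/* Stub used when trace buffer support is compiled out. */
static inline int tb_control(void *op)
{
    (void)op;
    return -ENOSYS;
}

#endif /* CONFIG_TRACEBUFFER */
```

The comments on #else and #endif make it clear which configuration each half belongs to when the two directives are far apart.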
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 13 Apr 2019 21:03:05 +0000 (22:03 +0100)]
x86/mm: Provide more useful information in diagnostics
* alloc_l?_table() should identify the failure, not just state that there is
one.
* get_page() should use %pd for the two domains, to render system domains in
a more obvious way.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 17 Jul 2019 13:46:08 +0000 (15:46 +0200)]
x86emul: add a PCLMUL/VPCLMUL test case to the harness
Also use this for AVX512_VBMI2 VPSH{L,R}D{,V}{D,Q,W} testing (only the
quad word right shifts get actually used; the assumption is that their
"left" counterparts as well as the double word and word forms then work
as well).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citirx.com>
Jan Beulich [Wed, 17 Jul 2019 13:43:57 +0000 (15:43 +0200)]
x86emul: restore ordering within main switch statement
Incremental additions and/or mistakes have led to some code blocks
sitting in "unexpected" places. Re-sort the case blocks (opcode space;
major opcode; 66/F3/F2 prefix; legacy/VEX/EVEX encoding).
As an exception the opcode space 0x0f EVEX-encoded VPEXTRW is left at
its current place, to keep it close to the "pextr" label.
Pure code movement.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citirx.com>
Jan Beulich [Wed, 17 Jul 2019 13:43:06 +0000 (15:43 +0200)]
x86emul: support GFNI insns
As to the feature dependency adjustment, while strictly speaking SSE is
a sufficient prereq (to have XMM registers), vectors of bytes and qwords
have got introduced only with SSE2. gcc, for example, uses a similar
connection in its respective intrinsics header.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:41:58 +0000 (15:41 +0200)]
x86emul: support VAES insns
As to the feature dependency adjustment, just like for VPCLMULQDQ while
strictly speaking AVX is a sufficient prereq (to have YMM registers),
256-bit vectors of integers have got fully introduced with AVX2 only.
A new test case (also covering AESNI) will be added to the harness by a
subsequent patch.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citirx.com>
Jan Beulich [Wed, 17 Jul 2019 13:41:20 +0000 (15:41 +0200)]
x86emul: support VPCLMULQDQ insns
As to the feature dependency adjustment, while strictly speaking AVX is
a sufficient prereq (to have YMM registers), 256-bit vectors of integers
have got fully introduced with AVX2 only. Sadly gcc can't be used as a
reference here: They don't provide any AVX512-independent built-in at
all.
Along the lines of PCLMULQDQ, since the insns here and in particular
their memory access patterns follow the usual scheme, I didn't think it
was necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:40:42 +0000 (15:40 +0200)]
x86emul: support AVX512_VNNI insns
Along the lines of the 4FMAPS case, convert the 4VNNIW-based table
entries to a decoder adjustment. Because of the current sharing of table
entries between different (implied) opcode prefixes and with the same
major opcodes being used for vp4dpwssd{,s}, which have a different
memory operand size and different Disp8 scaling, the pre-existing table
entries get converted to a decoder override. The table entries will now
represent the insns here, in line with other table entries preferably
representing the prefix-66 insns.
As in a few cases before, since the insns here and in particular their
memory access patterns follow the usual scheme, I didn't think it was
necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:39:54 +0000 (15:39 +0200)]
x86emul: support AVX512_4VNNIW insns
As in a few cases before, since the insns here and in particular their
memory access patterns follow the AVX512_4FMAPS scheme, I didn't think
it was necessary to add contrived tests specifically for them, beyond
the Disp8 scaling ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:39:10 +0000 (15:39 +0200)]
x86emul: support AVX512_4FMAPS insns
A decoder adjustment is needed here because of the current sharing of
table entries between different (implied) opcode prefixes: The same
major opcodes are used for vfmsub{132,213}{p,s}{s,d}, which have a
different memory operand size and different Disp8 scaling.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
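As a rough illustration of why the memory operand size matters for the decoder: with EVEX compressed displacement, the encoded 8-bit displacement is multiplied by a factor N derived from the memory operand size, so two insns sharing a table entry but differing in operand size resolve the same encoded byte to different offsets. The helper name evex_disp8_scale() and the N values used with it are assumptions for illustration only, not code from the emulator.

```c
#include <stdint.h>

/*
 * EVEX compressed displacement: the encoded 8-bit displacement is
 * multiplied by N, a factor derived from the memory operand size.
 * Hypothetical helper, not taken from the emulator sources.
 */
static int32_t evex_disp8_scale(int8_t disp8, unsigned int n)
{
    return (int32_t)disp8 * (int32_t)n;
}
```

For example, an encoded displacement of 1 would address offset 64 with N = 64 but offset 16 with N = 16, hence the need to pick the right N per insn.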
Jan Beulich [Wed, 17 Jul 2019 13:38:35 +0000 (15:38 +0200)]
x86emul: support remaining AVX512_VBMI2 insns
As in a few cases before, since the insns here and in particular their
memory access patterns follow the usual scheme, I didn't think it was
necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:37:54 +0000 (15:37 +0200)]
x86emul: support of AVX512_IFMA insns
Once again take the liberty and also correct the (public interface) name
of the AVX512_IFMA feature flag to match the SDM, on the assumption that
no external consumer has actually been using that flag so far.
As in a few cases before, since the insns here and in particular their
memory access patterns follow the usual scheme, I didn't think it was
necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:37:00 +0000 (15:37 +0200)]
x86emul: support of AVX512* population count insns
Plus the only other AVX512_BITALG one.
As in a few cases before, since the insns here and in particular their
memory access patterns follow the usual scheme, I didn't think it was
necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Xen's internal running status (trace events at pre-defined trace points)
will be saved to trace memory when enabled.
Trace event data and config params can be read/changed
by the system control hypercall at run time.
Can be disabled for a smaller code footprint.
Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com> Acked-by: George Dunlap <george.dunlap@citrix.com> [tracing] Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Jul 2019 13:34:23 +0000 (15:34 +0200)]
dom_cow is needed for mem-sharing only
A couple of adjustments are needed to code checking for dom_cow, but
since there are pretty few it is probably better to adjust those than
to set up and keep around a never used domain.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Wed, 17 Jul 2019 13:33:05 +0000 (15:33 +0200)]
x86/PV: drop page table ownership check from emul-priv-op.c:read_cr()
We have such a check here but nowhere else. It shouldn't have been
added by af909e7e16 ("M2P translation cannot be handled through flat
table with") in the first place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 15 Jul 2019 16:21:02 +0000 (17:21 +0100)]
x86/suspend: Don't save/restore %cr8
%cr8 is an alias of APIC_TASKPRI, which is handled by
lapic_{suspend,resume}() with the rest of the Local APIC state. Saving
and restoring the TPR state in isolation is not a clever idea.
Drop it all.
While editing wakeup_prot.S, trim its include list to just the headers
which are used, which is precisely none of them.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 11 Jul 2019 14:50:17 +0000 (09:50 -0500)]
x86/smpboot: Remove redundant order calculations
The GDT and IDT allocations are all order 0, and not going to change.
Use an explicit 0, instead of calling get_order_from_pages(). This
allows for the removal of the 'order' local variable in both
cpu_smpboot_{alloc,free}().
While making this adjustment, rearrange cpu_smpboot_free() to fold the
two "if ( remove )" clauses. There are no explicit requirements for the
order of free()s.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
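To see why an explicit 0 is equivalent for a single-page allocation, here is a minimal sketch of an order calculation in the style of get_order_from_pages(). The name order_from_pages() is hypothetical and the body is an assumption about how such a helper typically works, not a copy of Xen's implementation.

```c
/*
 * Hypothetical stand-in for an order helper: returns the smallest
 * order such that (1UL << order) >= nr_pages. Assumes nr_pages >= 1.
 */
static unsigned int order_from_pages(unsigned long nr_pages)
{
    unsigned int order = 0;

    nr_pages--;
    while ( nr_pages )
    {
        nr_pages >>= 1;
        order++;
    }

    return order;
}
```

With one page the loop never runs, so the result is always 0 and the call can be replaced by the constant.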
Paul Durrant [Tue, 16 Jul 2019 11:29:02 +0000 (13:29 +0200)]
mm.h: fix BUG_ON() condition in put_page_alloc_ref()
The BUG_ON() was misplaced when this function was introduced in commit
ec83f825 "mm.h: add helper function to test-and-clear _PGC_allocated".
It will fire incorrectly if _PGC_allocated is already clear on entry. Thus
it should be moved after the if statement.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 16 Jul 2019 07:10:36 +0000 (09:10 +0200)]
mm.h: add helper function to test-and-clear _PGC_allocated
The _PGC_allocated flag is set on a page when it is assigned to a domain
along with an initial reference count of at least 1. To clear this
'allocation' reference it is necessary to test-and-clear _PGC_allocated and
then only drop the reference if the test-and-clear succeeds. This is open-
coded in many places. It is also unsafe to test-and-clear _PGC_allocated
unless the caller holds an additional reference.
This patch adds a helper function, put_page_alloc_ref(), to replace all the
open-coded test-and-clear/put_page occurrences. That helper function
incorporates a check that an additional page reference is held and will
BUG() if it is not.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
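The test-and-clear pattern described above can be sketched roughly as follows, with the reference check placed after the test so that it does not fire when the flag is already clear. This is a simplified user-space model using C11 atomics: struct page, PGC_ALLOCATED, PGC_COUNT_MASK, the bool return value and demo_final_count() are all assumptions made for illustration, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the page_info bits. */
#define PGC_ALLOCATED   (1UL << 31)
#define PGC_COUNT_MASK  ((1UL << 31) - 1)

struct page {
    atomic_ulong count_info;   /* flag bit plus reference count */
};

/* Drop one general reference. */
static void put_page(struct page *pg)
{
    atomic_fetch_sub(&pg->count_info, 1);
}

/*
 * Drop the 'allocation' reference if it is still present.
 * Returns true if this call dropped it (return value is illustrative).
 */
static bool put_page_alloc_ref(struct page *pg)
{
    unsigned long old = atomic_fetch_and(&pg->count_info, ~PGC_ALLOCATED);

    if ( !(old & PGC_ALLOCATED) )
        return false;          /* already cleared: nothing to drop */

    /*
     * Check only once we know the allocation reference is being
     * dropped: the caller must hold an additional reference.
     */
    assert((old & PGC_COUNT_MASK) > 1);   /* stand-in for BUG_ON() */
    put_page(pg);

    return true;
}

/* Usage: flag set plus two references (allocation + caller). */
static unsigned long demo_final_count(void)
{
    struct page pg;

    atomic_init(&pg.count_info, PGC_ALLOCATED | 2UL);
    put_page_alloc_ref(&pg);   /* drops the allocation reference */
    put_page_alloc_ref(&pg);   /* flag already clear: no-op */

    return atomic_load(&pg.count_info);
}
```

After both calls only the caller's own reference remains, and the second call returns without touching the count.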
Jan Beulich [Tue, 16 Jul 2019 07:09:44 +0000 (09:09 +0200)]
x86/hvm: make hvmemul_virtual_to_linear()'s reps parameter optional
The majority of callers want just a single iteration handled. Allow this
to be expressed by passing in a NULL pointer, instead of setting up a
local variable just to hold the "1" to pass in here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
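The optional-pointer convention can be sketched as below. span_bytes() and its parameters are hypothetical names chosen for illustration; only the idea that a NULL reps pointer means "one iteration" comes from the description above.

```c
#include <stddef.h>

/*
 * Hypothetical helper: 'reps' may be NULL, meaning the caller wants
 * exactly one iteration handled. Returns the number of bytes covered.
 */
static unsigned long span_bytes(unsigned int bytes_per_rep,
                                const unsigned long *reps)
{
    unsigned long nr = reps ? *reps : 1;

    return nr * bytes_per_rep;
}
```

A single-iteration caller can then write span_bytes(4, NULL) instead of declaring a local variable holding 1.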
EPT differs from NPT and shadow when translating page orders to levels
in the physmap page tables. The EPT page table level for order 0 pages
is 0, while NPT and shadow instead use 1, i.e. EPT page table levels
start at 0 while NPT and shadow levels start at 1.
Fix the p2m_entry_modify call in atomic_write_ept_entry to always add
one to the level, in order to match NPT and shadow usage.
While there, also add a check to ensure p2m_entry_modify is never
called with level == 0. That should allow catching future errors
related to the level parameter.
Fixes: c7a4c088ad1c ('x86/mm: split p2m ioreq server pages special handling into helper') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
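A minimal sketch of the level adjustment, under stated assumptions: all function names here are hypothetical, the order-to-level division by 9 assumes 512-entry page tables, and assert() stands in for the new level check.

```c
#include <assert.h>

#define PAGETABLE_ORDER 9   /* assumption: 512 entries per table */

/* EPT numbering: an order 0 mapping sits at level 0. */
static unsigned int ept_level_from_order(unsigned int page_order)
{
    return page_order / PAGETABLE_ORDER;
}

/* Hypothetical common helper using NPT/shadow numbering (starts at 1). */
static unsigned int check_common_level(unsigned int level)
{
    assert(level >= 1);     /* stand-in for the level != 0 check */
    return level;
}

/* EPT callers add one so that both numbering schemes agree. */
static unsigned int ept_to_common_level(unsigned int ept_level)
{
    return check_common_level(ept_level + 1);
}
```

With the adjustment, an order 0 EPT entry reaches the common helper as level 1, matching what NPT and shadow pass for the same page size.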
Andrew Cooper [Fri, 17 May 2019 10:08:56 +0000 (11:08 +0100)]
tools/xenstored: Drop mapping of the ring via foreign map
This is a vestigial remnant of the pre xenstored stub domain days.
Foreign mapping via MFN is a privileged operation which is not
necessary, because grant details are unconditionally set up during
domain construction. In practice, this means xenstored never uses its
ability to foreign map the ring.
Drop the ability completely, which removes the penultimate use of the
unstable libxc interface.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Fri, 17 May 2019 10:06:16 +0000 (11:06 +0100)]
tools/xenstored: Make gnttab interface mandatory
xenstored currently requires a libxc and evtchn interface, but leaves
the gnttab interface as optional.
gnttab is ubiquitous these days, and in practice mandatory in all cases
where xenstored isn't running as root in dom0 (due to the inability to
foreign map by MFN).
The toolstack has unconditionally set up grant details for many years
now, and longterm it would be good to phase out the use of libxc. This
requires that xenstored map the store ring by grant map, rather than
foreign map.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Wed, 26 Jun 2019 13:37:26 +0000 (14:37 +0100)]
libxl: fix pci device re-assigning after domain reboot
After a reboot of a guest only the first pci device configuration will
be retrieved from Xenstore resulting in loss of any further assigned
passed through pci devices.
The main reason is that all passed through pci devices reside under a
common root device "0" in Xenstore. So when the device list is rebuilt
from Xenstore after a reboot the sub-devices below that root device
need to be selected instead of using the root device number as a
selector.
Fix that by adding a new member to struct libxl_device_type which when
set is used to get the number of devices. Add such a member for pci to
get the correct number of pci devices instead of implying it from the
number of pci root devices (which will always be 1).
While at it fix the type of libxl__device_pci_from_xs_be() to match
the one of the .from_xenstore member of struct libxl_device_type. This
fixes a latent bug checking the return value of a function returning
void.
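The idea of an optional per-type device count hook can be sketched as follows. The structures and names (struct device_type, get_num, struct backend, count_devices) are simplified, hypothetical stand-ins and do not reflect the real libxl definitions.

```c
#include <stddef.h>

/* Hypothetical, much simplified device-type descriptor. */
struct device_type {
    const char *name;
    /* Optional: when set, returns the real number of devices. */
    int (*get_num)(const void *backend);
};

/* Hypothetical backend view: root entries holding sub-devices. */
struct backend {
    int nr_roots;
    int nr_subdevs;
};

static int pci_get_num(const void *be)
{
    return ((const struct backend *)be)->nr_subdevs;
}

static int count_devices(const struct device_type *dt,
                         const struct backend *be)
{
    /* Fall back to the number of root entries if no hook is set. */
    return dt->get_num ? dt->get_num(be) : be->nr_roots;
}
```

With one root entry and three sub-devices, a type that sets the hook reports 3 devices, while a type without it would report 1.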
Andrew Cooper [Thu, 4 Jul 2019 15:13:32 +0000 (16:13 +0100)]
x86/ctxt-switch: Document and improve GDT handling
Calling virt_to_mfn() in the context switch path is a lot
of wasted cycles for a result which is constant after boot.
Begin by documenting how Xen handles the GDTs across context switch.
The loop in write_full_gdt_ptes() is unnecessary, because
NR_RESERVED_GDT_PAGES is 1. Dropping it makes the code substantially
more clear, and with it dropped, write_full_gdt_ptes() becomes more
obviously a poor name, so rename it to update_xen_slot_in_full_gdt().
Furthermore, load_full_gdt() is completely independent of the current
CPU, and load_default_gdt() only needs the current CPU's regular
GDT. (This is a change in behaviour, as previously it may have used the
compat GDT, but either will do.)
Add two extra per-cpu variables which cache the L1e for the regular and compat
GDT, calculated in cpu_smpboot_alloc()/trap_init() as appropriate, so
update_xen_slot_in_full_gdt() doesn't need to waste time performing the same
calculation on every context switch.
One performance scenario of Jürgen's (time to build the hypervisor on
an 8 CPU system, with two single-vCPU MiniOS VMs constantly interrupting
dom0 with events) shows the following, average over 5 measurements:
            elapsed    user    system
Unpatched     66.51  232.93    109.21
Patched       57.00  225.47    105.47
which is a substantial improvement.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Will Abele [Tue, 9 Jul 2019 13:22:23 +0000 (13:22 +0000)]
xen/arm: use correct device tree root node name
The root node of a device tree should not have a node name. This is
specified in section 2.2.1 of version 0.2 of the device tree
specification, available from devicetree.org.
Linux Kernel versions prior to 4.15 misinterpret flattened device trees
with a "/" as the name of the root node as an FDT version older than 16.
Linux then fails to parse the FDT.
Signed-off-by: Will Abele <will.abele@starlab.io> Reviewed-by: Julien Grall <julien.grall@arm.com>
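For illustration, a begin-node entry in the flattened structure block consists of a big-endian FDT_BEGIN_NODE token followed by the NUL-terminated node name, padded to a 4-byte boundary. The sketch below emits such an entry; emit_begin_node() is a hypothetical helper (not a libfdt function) and does no bounds checking.

```c
#include <stdint.h>
#include <string.h>

#define FDT_BEGIN_NODE 0x00000001u

/*
 * Emit a begin-node token followed by the NUL-terminated name, padded
 * to a 4-byte boundary. Returns the number of bytes written.
 * 'buf' must be large enough; this sketch does no bounds checking.
 */
static size_t emit_begin_node(uint8_t *buf, const char *name)
{
    size_t len = strlen(name) + 1;            /* include the NUL */
    size_t padded = (len + 3) & ~(size_t)3;

    /* Tokens are stored big-endian. */
    buf[0] = 0;
    buf[1] = 0;
    buf[2] = 0;
    buf[3] = FDT_BEGIN_NODE;
    memset(buf + 4, 0, padded);
    memcpy(buf + 4, name, len);

    return 4 + padded;
}
```

A root node following the specification would be emitted with an empty name, i.e. emit_begin_node(buf, ""), so the name field holds only NUL bytes rather than "/".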
xen/arm: optee: check if OP-TEE is virtualization-aware
This is a workaround for OP-TEE 3.5. This is the first OP-TEE release
which supports virtualization, but there is no way to tell if
OP-TEE was built with that support enabled. We can probe for it
by calling an SMC that is available only when OP-TEE is built with
virtualization support.
xen/arm: tee: place OP-TEE Kconfig option right after TEE
It is nicer when the options for particular TEE mediators (currently,
OP-TEE only) follow the generic "Enable TEE mediators support"
option in the menuconfig:
    [*] Enable TEE mediators support
    [ ]   Enable OP-TEE mediator
Amit Singh Tomar [Sun, 23 Jun 2019 12:56:31 +0000 (18:26 +0530)]
xen/arm: domain_build: Black list devices using PPIs
Currently, the vGIC is not able to cope with hardware PPIs routed to guests.
One of the solutions to this problem is to completely skip any device
that uses a PPI source while building the domain itself.
This patch goes through all the interrupt sources of a device and skips it
if one of the interrupt sources is a PPI. It fixes Xen boot on i.MX8MQ by
skipping the PMU node.
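The skip decision can be sketched as below. In the GIC interrupt ID space, SGIs are 0-15, PPIs are 16-31 and SPIs start at 32. The helper names irq_is_ppi() and device_uses_ppi() are hypothetical and the interrupt list is passed in as a plain array, whereas the real code walks the device tree node.

```c
#include <stdbool.h>
#include <stddef.h>

/* GIC interrupt IDs: SGIs are 0-15, PPIs are 16-31, SPIs start at 32. */
static bool irq_is_ppi(unsigned int irq)
{
    return irq >= 16 && irq < 32;
}

/* Hypothetical helper: true if the device must be skipped. */
static bool device_uses_ppi(const unsigned int *irqs, size_t nr_irqs)
{
    for ( size_t i = 0; i < nr_irqs; i++ )
        if ( irq_is_ppi(irqs[i]) )
            return true;

    return false;
}
```

A device with interrupts {34, 23} would be skipped because 23 is a PPI, while one using only {32, 40} would be kept.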
Andrew Cooper [Mon, 8 Jul 2019 22:12:06 +0000 (23:12 +0100)]
x86/gnttab: Use explicit instruction size in gnttab_clear_flags()
The OpenSUSE Leap compilers complain about ambiguity:
In file included from grant_table.c:33:
In file included from ...xen/include/xen/grant_table.h:30:
...xen/include/asm/grant_table.h:67:19: error: ambiguous instructions require
an explicit suffix (could be 'andb', 'andw', 'andl', or 'andq')
asm volatile ("lock and %1,%0" : "+m" (*addr) : "ir" ((uint16_t)~mask));
^
<inline asm>:1:2: note: instantiated into assembly here
lock and $-17,(%rsi)
^
Full logs: https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/247600284 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
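A stand-alone sketch of the kind of change the subject line describes: giving the instruction an explicit 'w' suffix so the assembler no longer has to guess the operand size. clear_flags16() and flags_after_clear() are hypothetical names, and the non-x86 fallback using a GCC/Clang atomic builtin is an addition for portability of the example, not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone variant of a clear-flags helper: atomically
 * clear the bits in 'mask' within the 16-bit word at 'addr'.
 */
static inline void clear_flags16(uint16_t mask, uint16_t *addr)
{
#if defined(__x86_64__) || defined(__i386__)
    /* The 'w' suffix pins the access to exactly 2 bytes. */
    __asm__ volatile ( "lock andw %1,%0"
                       : "+m" (*addr)
                       : "ir" ((uint16_t)~mask) );
#else
    /* Portable fallback (GCC/Clang builtin) for other architectures. */
    __atomic_fetch_and(addr, (uint16_t)~mask, __ATOMIC_SEQ_CST);
#endif
}

/* Usage helper: returns 'flags' with the bits in 'mask' cleared. */
static uint16_t flags_after_clear(uint16_t flags, uint16_t mask)
{
    clear_flags16(mask, &flags);
    return flags;
}
```

With the suffix in place the assembler sees "lock andw" rather than a bare "lock and", which removes the ambiguity reported in the error above.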