xenbits.xensource.com Git - people/royger/xen.git/log
5 years ago x86/tlb: use Xen L0 assisted TLB flush when available flush.v9 gitlab/flush.v9
Roger Pau Monne [Thu, 19 Dec 2019 13:16:16 +0000 (14:16 +0100)]
x86/tlb: use Xen L0 assisted TLB flush when available

Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a large number of vCPUs as a Xen guest, and is especially important
when running in shim mode.

The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.

Using x2APIC and ALLBUT shorthand:
real    4m35.973s
user    4m35.110s
sys     36m24.117s

Using L0 assisted flush:
real    1m2.596s
user    4m34.818s
sys     5m16.374s

The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such assisted flush just by filling
the hook.

Note that the Xen implementation completely ignores the dirty CPU mask
and the linear address passed in, and always performs a global TLB
flush on all vCPUs. This is a limitation of the hypercall provided by
Xen. Also note that local TLB flushes are not performed using the
assisted TLB flush, only remote ones.
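
The hook arrangement described above can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the structure layout, the `hypervisor_flush_tlb` wrapper signature and the `xen_like_flush_tlb` stand-in are assumptions, and the hypercall itself is replaced by a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified ops table: only the optional flush hook is shown. */
struct hypervisor_ops {
    const char *name;
    /* Remote TLB flush assisted by the underlying (L0) hypervisor. */
    int (*flush_tlb)(const unsigned long *mask, const void *va,
                     unsigned int flags);
};

static unsigned int global_flushes;   /* counts simulated global flushes */

/*
 * Stand-in for a Xen-style hook: the CPU mask and the linear address are
 * ignored and a flush of every vCPU is requested instead.
 */
static int xen_like_flush_tlb(const unsigned long *mask, const void *va,
                              unsigned int flags)
{
    (void)mask;
    (void)va;
    (void)flags;
    global_flushes++;   /* real code would issue the hypercall here */
    return 0;
}

/* Generic entry point: report -EOPNOTSUPP when no hook has been filled in. */
static int hypervisor_flush_tlb(const struct hypervisor_ops *ops,
                                const unsigned long *mask, const void *va,
                                unsigned int flags)
{
    assert(ops != NULL);

    if ( !ops->flush_tlb )
        return -EOPNOTSUPP;

    return ops->flush_tlb(mask, va, flags);
}
```

A caller that gets -EOPNOTSUPP back would be expected to fall back to the IPI-based flush; another enlightenment only has to fill in the `flush_tlb` member.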

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v5:
 - Clarify commit message.
 - Test for assisted flush at setup, do this for all hypervisors.
 - Return EOPNOTSUPP if assisted flush is not available.

Changes since v4:
 - Adjust order calculation.

Changes since v3:
 - Use an alternative call for the flush hook.

Changes since v1:
 - Add an L0 assisted hook to hypervisor ops.

5 years ago x86/tlb: allow disabling the TLB clock
Roger Pau Monne [Mon, 27 Jan 2020 09:41:24 +0000 (10:41 +0100)]
x86/tlb: allow disabling the TLB clock

The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.

This is not the case however when Xen is running virtualized, and the
underlying hypervisor provides mechanisms to assist in performing TLB
flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.

Offer a boolean in order to signal Xen that the timestamped TLB
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.
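
As a rough illustration of the effect on NEED_FLUSH, the sketch below models the check with a plain global clock. The names `tlb_clock_enabled`, `tlbflush_clock` and `need_flush` are placeholders chosen for this example, and the timestamp comparison is a simplified version of the idea rather than a copy of the Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool tlb_clock_enabled = true;   /* hypothetical name for the boolean */
static uint32_t tlbflush_clock = 1;     /* simplified global flush clock */

/*
 * cpu_stamp:     clock value at the CPU's last TLB flush
 * lastuse_stamp: clock value when the page was last in use
 */
static bool need_flush(uint32_t cpu_stamp, uint32_t lastuse_stamp)
{
    uint32_t curr_time = tlbflush_clock;

    /* Without timestamps nothing can be proven, so always flush. */
    if ( !tlb_clock_enabled )
        return true;

    /* A flush is skipped only if the CPU flushed after the last use. */
    return curr_time == 0 ||
           (cpu_stamp <= lastuse_stamp && lastuse_stamp <= curr_time);
}

/* Small usage example under the assumptions above. */
static void need_flush_example(void)
{
    tlbflush_clock = 10;
    assert(need_flush(3, 5));    /* CPU flushed before the last use */
    assert(!need_flush(7, 5));   /* CPU already flushed since then */

    tlb_clock_enabled = false;
    assert(need_flush(7, 5));    /* clock disabled: always flush */
    tlb_clock_enabled = true;
}
```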

No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/tlb: introduce a flush HVM ASIDs flag
Roger Pau Monne [Mon, 27 Jan 2020 10:23:08 +0000 (11:23 +0100)]
x86/tlb: introduce a flush HVM ASIDs flag

Introduce a specific flag to request an HVM guest linear TLB flush,
which is an ASID/VPID tickle that forces a guest linear to guest
physical TLB flush for all HVM guests.

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(i.e. when using HAP). Note that shadow paging code already takes care
of issuing the necessary flushes when the shadow page tables are
modified.

In order to keep the previous behavior modify all shadow code TLB
flushes to also flush the guest linear to physical TLB if the guest is
HVM. I haven't looked at each specific shadow code TLB flush in order
to figure out whether it actually requires a guest TLB flush or not,
so there might be room for improvement in that regard.

Also perform ASID/VPID flushes when modifying the p2m tables as it's a
requirement for AMD hardware. Finally keep the flush in
switch_cr3_cr4, as it's not clear whether code could rely on
switch_cr3_cr4 also performing a guest linear TLB flush. A following
patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
not be necessary.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v8:
 - Don't flush host TLB on HAP changes.
 - Introduce a helper for shadow changes that only flushes ASIDs/VPIDs
   when the guest is HVM.
 - Introduce a helper for HAP that only flushes ASIDs/VPIDs.

Changes since v7:
 - Do not perform an ASID flush in filtered_flush_tlb_mask: the
   requested flush is related to the page need_tlbflush field and not
   to p2m changes (applies to both callers).

Changes since v6:
 - Add ASID/VPID flushes when modifying the p2m.
 - Keep the ASID/VPID flush in switch_cr3_cr4.

Changes since v5:
 - Rename FLUSH_GUESTS_TLB to FLUSH_HVM_ASID_CORE.
 - Clarify commit message.
 - Define FLUSH_HVM_ASID_CORE to 0 when !CONFIG_HVM.

5 years ago x86/apic: simplify disconnect_bsp_APIC setup of LVT{0/1}
Roger Pau Monne [Thu, 23 Jan 2020 17:37:47 +0000 (18:37 +0100)]
x86/apic: simplify disconnect_bsp_APIC setup of LVT{0/1}

There's no need to read the current values of LVT{0/1} for the
purposes of the function, which seem to be to save the currently
selected vector: in the delivery modes used (ExtINT and NMI) the
vector field is ignored and hence can be set to 0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago x86/hvmloader: round up memory BAR size to 4K
Roger Pau Monne [Tue, 14 Jan 2020 18:06:26 +0000 (19:06 +0100)]
x86/hvmloader: round up memory BAR size to 4K

When placing memory BARs with sizes smaller than 4K multiple memory
BARs can end up mapped to the same guest physical address, and thus
won't work correctly.

Round up all memory BAR sizes to be at least 4K, so that they are
naturally aligned to a page size and thus don't end up sharing a page.
Also add a couple of asserts to the current code to make sure the MMIO
hole is properly sized and aligned.
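
The rounding itself is small enough to sketch. The helper below is an illustration only: `BAR_MIN_SIZE` and `bar_size_round_up` are names invented for this example, and it relies on the fact that PCI BAR sizes are powers of two, so raising a size to 4K also keeps it size-aligned.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_MIN_SIZE 4096u   /* one 4K page; name chosen for this sketch */

/*
 * Memory BAR sizes are powers of two, so making a BAR "at least 4K" is a
 * simple max(): anything smaller becomes exactly one page.
 */
static uint64_t bar_size_round_up(uint64_t bar_sz)
{
    /* Sanity check in the spirit of the added asserts: non-zero power of two. */
    assert(bar_sz != 0 && (bar_sz & (bar_sz - 1)) == 0);

    return bar_sz < BAR_MIN_SIZE ? BAR_MIN_SIZE : bar_sz;
}
```

With this, two 256-byte BARs are each sized as a full page and can no longer be placed in the same guest page by the initial layout.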

Note that the guest can still move the BARs around and create these
collisions, and that BARs not filling up a physical page might leak
access to other MMIO regions placed in the same host physical page.

This is however no worse than what's currently done, and hence should
be considered an improvement over the current state.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jason Andryuk <jandryuk@gmail.com>
---
Changes since v1:
 - Do the round up when sizing the BARs, so that the MMIO hole is
   correctly sized.
 - Add some asserts that the hole is properly sized and size-aligned.
 - Dropped Jason's Tested-by since the code has changed.
---
Jason, can you give this a spin? Thanks.

5 years ago (no commit message)
Roger Pau Monne [Thu, 31 Oct 2019 09:38:27 +0000 (10:38 +0100)]

5 years ago tools/libxc: misc: Mark const the parameter 'params' of xc_set_parameters()
Julien Grall [Mon, 30 Mar 2020 19:21:53 +0000 (20:21 +0100)]
tools/libxc: misc: Mark const the parameter 'params' of xc_set_parameters()

The parameter 'params' of xc_set_parameters() should never be modified.
So mark it as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago tools/libxc: misc: Mark const the parameter 'keys' of xc_send_debug_keys()
Julien Grall [Mon, 30 Mar 2020 19:21:52 +0000 (20:21 +0100)]
tools/libxc: misc: Mark const the parameter 'keys' of xc_send_debug_keys()

OCaml is using a string to describe the parameter 'keys' of
xc_send_debug_keys(). Since OCaml 4.06.01, String_val() will return a
const char * when using -safe-string. This will result in a build
failure because xc_send_debug_keys() expects a char *.

The function should never modify the parameter 'keys' and therefore the
parameter should be const. Unfortunately, this is not directly possible
because DECLARE_HYPERCALL_BOUNCE() is expecting a non-const variable.

A new macro DECLARE_HYPERCALL_BOUNCE_IN() is introduced and will take
care of const parameters. The first user will be xc_send_debug_keys() but
this can be used in more places in the future.

Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xen/public: sysctl: set_parameter.params and debug.keys should be const
Julien Grall [Mon, 30 Mar 2020 19:21:51 +0000 (20:21 +0100)]
xen/public: sysctl: set_parameter.params and debug.keys should be const

The fields set_parameter.params and debug.keys should never be modified
by the hypervisor. So mark them as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago build,arm: Fix deps check of head.o
Anthony PERARD [Tue, 31 Mar 2020 10:30:47 +0000 (11:30 +0100)]
build,arm: Fix deps check of head.o

arm*/head.o isn't in obj-y or extra-y, so make doesn't load the
associated .*.d file (or .*.cmd file when if_changed will be used).
There is a workaround where the .*.d file is added manually into DEPS.

Changing DEPS isn't needed, we can simply add head.o into extra-y and
the dependency files will be loaded.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years ago xen/arm: Configure early printk via Kconfig
Anthony PERARD [Tue, 31 Mar 2020 10:30:46 +0000 (11:30 +0100)]
xen/arm: Configure early printk via Kconfig

At the moment, early printk can only be configured on the make command
line. It is not very handy because a user has to remove the option
every time they run a command other than compiling the
hypervisor.

Furthermore, early printk is one of the few odd ones that are not
using Kconfig.

So it is about time to move it to Kconfig.

The new Kconfig options allow a user to either select a UART driver
to use at boot time and set its parameters, or to select a platform,
which will set the parameters.

If CONFIG_EARLY_PRINTK is present in the environment or on the make
command line, make will return an error.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years ago xen/arm: Rename all early printk macro
Anthony PERARD [Tue, 31 Mar 2020 10:30:45 +0000 (11:30 +0100)]
xen/arm: Rename all early printk macro

We are going to move the generation of the early printk macros into
Kconfig. This means all macros will be prefixed with CONFIG_. We do that
ahead of the change.

We also take the opportunity to better name some variables, which are
used by only one driver and wouldn't make sense for other UART drivers.
Thus,
    - EARLY_UART_REG_SHIFT became CONFIG_EARLY_UART_8250_REG_SHIFT
    - EARLY_PRINTK_VERSION_* became CONFIG_EARLY_UART_SCIF_VERSION_*

The other variables are changed to have the prefix CONFIG_EARLY_UART_
when they change a parameter of the driver. So we have now:
    - CONFIG_EARLY_UART_BAUD_RATE
    - CONFIG_EARLY_UART_BASE_ADDRESS
    - CONFIG_EARLY_UART_INIT

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago x86: compress lines for immediate return
Simran Singhal [Tue, 31 Mar 2020 06:51:21 +0000 (08:51 +0200)]
x86: compress lines for immediate return

Compress two lines into a single line if an immediate return statement is found.
Also remove the variables retval, freq, effective, vector, ovf and now
as they are no longer needed.

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86: remove unnecessary cast on void pointer
Simran Singhal [Tue, 31 Mar 2020 06:50:25 +0000 (08:50 +0200)]
x86: remove unnecessary cast on void pointer

Assignment to a typed pointer is sufficient in C.
No cast is needed.

Also change some u64/u32 to uint64_t/uint32_t.
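
For reference, a minimal example of the C rule this relies on: a `void *` converts implicitly to any object pointer type on assignment. The `struct counter` type and the `bump` helper are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct counter {
    uint32_t hits;
};

/* Callback-style helper receiving its state through an opaque pointer. */
static uint32_t bump(void *opaque)
{
    struct counter *c = opaque;   /* no cast needed: void * converts implicitly */

    assert(c != NULL);

    return ++c->hits;
}
```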

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago SVM: split _np_enable VMCB field
Jan Beulich [Tue, 31 Mar 2020 06:46:44 +0000 (08:46 +0200)]
SVM: split _np_enable VMCB field

The nested paging enable is actually just a single bit within the 64-bit
VMCB field, which is particularly relevant for uses like the one in
nsvm_vcpu_vmentry(). Split the field, adding definitions for a few other
bits at the same time. To be able to generate accessors for bitfields,
VMCB_ACCESSORS() needs the type part broken out, as typeof() can't be
applied to bitfields. Unfortunately this means specification of the same
type in two distinct places.
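
To illustrate why the type has to be passed separately, here is a small sketch of accessor generation for bitfield members. `union np_ctrl`, its bit names and the `CTRL_ACCESSORS` macro are hypothetical stand-ins, not the real VMCB definitions; the point is only that the macro takes the member type as an argument because `typeof()` cannot be applied to a bitfield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit VMCB field; bit names are assumed. */
union np_ctrl {
    uint64_t raw;
    struct {
        bool np:1;    /* nested paging enable */
        bool sev:1;   /* some other control bit */
    };
};

static_assert(sizeof(union np_ctrl) == sizeof(uint64_t),
              "bitfields must not grow the 64-bit field");

/*
 * The member type is spelled out by the caller: typeof(c->name) is not
 * valid for bitfields, so it cannot be derived inside the macro.
 */
#define CTRL_ACCESSORS(name, type)                                \
static inline type ctrl_get_##name(const union np_ctrl *c)        \
{                                                                 \
    return c->name;                                               \
}                                                                 \
static inline void ctrl_set_##name(union np_ctrl *c, type value)  \
{                                                                 \
    c->name = value;                                              \
}

CTRL_ACCESSORS(np, bool)
CTRL_ACCESSORS(sev, bool)
```

The cost is the one the message mentions: `bool` appears both in the union definition and at the `CTRL_ACCESSORS()` use site.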

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago docs/README: Fix a broken url
Ian Jackson [Mon, 30 Mar 2020 13:52:12 +0000 (14:52 +0100)]
docs/README: Fix a broken url

There was a / missing here.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years ago docs etc.: https: Fix references to other Xen pages
Ian Jackson [Mon, 30 Mar 2020 13:51:51 +0000 (14:51 +0100)]
docs etc.: https: Fix references to other Xen pages

Change the url scheme to https.  This is all in-tree references to
xenbits and the main website except for those in Config.mk.

We leave Config.mk alone for now because those urls are used by CI
systems and we need to check that nothing breaks when we change the
download method.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years ago docs etc.: https: Fix references to wiki.xen[project].org
Ian Jackson [Mon, 30 Mar 2020 13:43:06 +0000 (14:43 +0100)]
docs etc.: https: Fix references to wiki.xen[project].org

Change the url scheme to https.  This is all in-tree references to the
Xen wiki.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years ago scripts: Use stat to check lock claim
Jason Andryuk [Thu, 12 Mar 2020 14:54:17 +0000 (10:54 -0400)]
scripts: Use stat to check lock claim

Replace the perl locking check with stat(1).  Stat is able to fstat
stdin (file descriptor 0) when passed '-' as an argument.  This is now
used to check $_lockfd.  stat(1) support for '-' was introduced to
coreutils in 2009.

After A releases its lock, script B will return from flock and execute
stat.  Since the lockfile has been removed by A, stat prints an error to
stderr and exits non-zero.  Redirect stderr to /dev/null to avoid
filling /var/log/xen/xen-hotplug.log with "No such file or directory"
messages.

Placing the stat call inside the "if" condition ensures we only check
the stat output when the command completed successfully.
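
The script itself is shell, but the underlying check can be expressed in C for clarity. The sketch below assumes the claim test compares the device and inode of the already open lock descriptor with those of the file currently found at the lock path; that comparison, and the names `lock_claimed` and `lock_claim_demo`, are assumptions of this example rather than a description of the script.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if the open descriptor still refers to the file currently at path. */
static bool lock_claimed(int fd, const char *path)
{
    struct stat fd_st, path_st;

    assert(path != NULL);

    /* A failing stat() typically means the previous owner removed the file. */
    if ( fstat(fd, &fd_st) != 0 || stat(path, &path_st) != 0 )
        return false;

    return fd_st.st_dev == path_st.st_dev && fd_st.st_ino == path_st.st_ino;
}

/* Usage sketch: returns 0 when both checks behave as described. */
static int lock_claim_demo(void)
{
    char path[] = "/tmp/lock-sketch-XXXXXX";
    int fd = mkstemp(path);
    int rc = 0;

    if ( fd < 0 )
        return -1;

    if ( !lock_claimed(fd, path) )   /* file still in place: claim holds */
        rc = -1;

    if ( unlink(path) != 0 || lock_claimed(fd, path) )
        rc = -1;                     /* after removal the claim must fail */

    close(fd);
    return rc;
}
```

As in the script, the failure of the path lookup is treated as "claim not held" instead of being reported, which mirrors redirecting stat's stderr to /dev/null.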

This change removes the only runtime dependency of the xen toolstack on
perl.

Suggested-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xen/x86: Remove parentheses from return arguments
Simran Singhal [Sun, 29 Mar 2020 06:37:47 +0000 (12:07 +0530)]
xen/x86: Remove parentheses from return arguments

This patch removes unnecessary parentheses from return arguments.

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago tools/python: mismatch between pyxc_methods flags and PyObject definitions
YOUNG, MICHAEL A [Tue, 17 Mar 2020 23:01:43 +0000 (23:01 +0000)]
tools/python: mismatch between pyxc_methods flags and PyObject definitions

pygrub in xen-4.13.0 with python 3.8.2 fails with the error

Traceback (most recent call last):
  File "/usr/libexec/xen/bin/pygrub", line 21, in <module>
    import xen.lowlevel.xc
SystemError: bad call flags

This patch fixes mismatches in tools/python/xen/lowlevel/xc/xc.c
between the flag bits defined in pyxc_methods and the parameters passed
to the corresponding PyObject definitions.

With this patch applied pygrub works as expected.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
5 years ago docs/designs: Add a design document for migration of xenstore data
Paul Durrant [Fri, 27 Mar 2020 13:46:10 +0000 (13:46 +0000)]
docs/designs: Add a design document for migration of xenstore data

This patch proposes extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.

NOTE: doc/misc/xenstore.txt is also amended to replace the <mfn> term
      for the INTRODUCE operation with the <gfn>, since this is what
      it actually is.

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years ago docs/designs: Add a design document for non-cooperative live migration
Paul Durrant [Fri, 27 Mar 2020 13:46:09 +0000 (13:46 +0000)]
docs/designs: Add a design document for non-cooperative live migration

It has become apparent to some large cloud providers that the current
model of cooperative migration of guests under Xen is not usable as it
relies on software running inside the guest, which is likely beyond the
provider's control.
This patch introduces a proposal for non-cooperative live migration,
designed not to rely on any guest-side software.

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years ago automation/gitlab: add https transport support to Debian images
Roger Pau Monne [Fri, 27 Mar 2020 11:49:47 +0000 (12:49 +0100)]
automation/gitlab: add https transport support to Debian images

The LLVM repos have switched from http to https, and trying to access
using http will get redirected to https. Add the apt-transport-https
package to the x86 Debian containers that use the LLVM repos, in order
to support the https transport method.

Note that on Arm we only test with gcc, so don't add the package for
the Debian Arm container.

This fixes the following error seen on the QEMU smoke tests:

E: The method driver /usr/lib/apt/methods/https could not be found.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago x86/nvmx: update exit bitmap when using virtual interrupt delivery
Roger Pau Monne [Fri, 27 Mar 2020 12:45:59 +0000 (13:45 +0100)]
x86/nvmx: update exit bitmap when using virtual interrupt delivery

Force an update of the EOI exit bitmap in nvmx_update_apicv, because
the one performed in vmx_intr_assist might not be reached if the
interrupt is intercepted by nvmx_intr_intercept returning true.

Extract the code to update the exit bitmap from vmx_intr_assist into a
helper and use it in nvmx_update_apicv.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago x86/nvmx: split updating RVI from SVI in nvmx_update_apicv
Roger Pau Monne [Fri, 27 Mar 2020 12:45:58 +0000 (13:45 +0100)]
x86/nvmx: split updating RVI from SVI in nvmx_update_apicv

Updating SVI is required when an interrupt has been injected using the
Ack on exit VMEXIT feature, so that the in service interrupt in the
GUEST_INTR_STATUS matches the vector that is signaled in
VM_EXIT_INTR_INFO.

Updating RVI however is not tied to the Ack on exit feature, as it
signals the next vector to be injected, and hence should always be
updated to the next pending vector, regardless of whether Ack on exit
is enabled.

When not using the Ack on exit feature preserve the previous vector in
SVI, so that it's not lost when RVI is updated to contain the pending
vector to inject.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago x86/ucode: Drop the sanity check for interrupts being disabled
Andrew Cooper [Fri, 27 Mar 2020 12:02:09 +0000 (12:02 +0000)]
x86/ucode: Drop the sanity check for interrupts being disabled

Of the substantial number of things which can go wrong during microcode load,
this is not one.  Loading occurs entirely within the boundary of a single
WRMSR instruction.  It's certainly not a BUG()-worthy condition.

Xen has legitimate reasons to not want interrupts enabled at this point, but
that is to do with organising the system rendezvous.  As these are private low
level helpers invoked only from the microcode core logic, forgo the check
entirely.

While dropping system.h, clean up the processor.h include which was an
oversight in the previous header cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/amd: Fix potential buffer overrun with equiv table handling
Andrew Cooper [Fri, 27 Mar 2020 11:59:02 +0000 (11:59 +0000)]
x86/ucode/amd: Fix potential buffer overrun with equiv table handling

find_equiv_cpu_id() loops until it finds a 0 installed_cpu entry.  Well formed
AMD microcode containers have this property.

Extend the checking in install_equiv_cpu_table() to reject tables which don't
have a sentinel at the end.
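
A sketch of the kind of check involved, under simplifying assumptions: `struct equiv_entry` carries only two of the fields of a real equivalence table entry, and `equiv_table_terminated` and `find_equiv_id` are illustrative names, not the functions touched by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified entry; the real table carries more fields. */
struct equiv_entry {
    uint32_t installed_cpu;   /* 0 marks the end of the table */
    uint16_t equiv_cpu;
};

/* Accept only tables made of whole entries whose last entry is the sentinel. */
static bool equiv_table_terminated(const struct equiv_entry *tbl, size_t bytes)
{
    size_t nr = bytes / sizeof(*tbl);

    if ( tbl == NULL || nr == 0 || bytes % sizeof(*tbl) != 0 )
        return false;

    return tbl[nr - 1].installed_cpu == 0;
}

/*
 * Sentinel-terminated search: only safe on tables accepted by
 * equiv_table_terminated(), otherwise it would read past the buffer.
 */
static uint16_t find_equiv_id(const struct equiv_entry *tbl, uint32_t sig)
{
    assert(tbl != NULL);

    for ( ; tbl->installed_cpu != 0; tbl++ )
        if ( tbl->installed_cpu == sig )
            return tbl->equiv_cpu;

    return 0;
}
```

Rejecting the table at install time keeps the lookup loop simple, since it can keep relying on the terminator instead of carrying a length around.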

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen: Introduce a xmemdup_bytes() helper
Andrew Cooper [Fri, 20 Mar 2020 20:53:58 +0000 (20:53 +0000)]
xen: Introduce a xmemdup_bytes() helper

Use it to simplify the x86 microcode logic, taking the opportunity to drop the
-ENOMEM printks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years ago softirq: adjust comment placement
Juergen Gross [Fri, 27 Mar 2020 10:44:09 +0000 (11:44 +0100)]
softirq: adjust comment placement

With commit cef21210fb133 ("rcu: don't process callbacks when holding
a rcu_read_lock()") the comment in process_pending_softirqs() about
not entering the scheduler should have been moved.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago libx86/CPUID: fix (not just) leaf 7 processing
Jan Beulich [Fri, 27 Mar 2020 10:40:59 +0000 (11:40 +0100)]
libx86/CPUID: fix (not just) leaf 7 processing

For one, subleaves within the respective union shouldn't live in
separate sub-structures. And then x86_cpuid_policy_fill_native() should,
as it did originally, iterate over all subleaves here as well as over
all main leaves. Switch to using a "<= MIN()"-based approach similar to
that used in x86_cpuid_copy_to_buffer(). Also follow this for the
extended main leaves then.
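
The "<= MIN()" pattern can be sketched as below. `struct feat_policy`, `fill_subleaves` and the fake reader are invented for this example; the loop bound is the part being illustrated, clamping the hardware-reported maximum to the last index the array can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MIN(a, b)     ((a) < (b) ? (a) : (b))

/* Hypothetical policy object with room for three subleaves. */
struct feat_policy {
    uint32_t subleaf[3];
};

/* Stand-in for executing CPUID on a given subleaf. */
typedef uint32_t (*read_subleaf_fn)(unsigned int idx);

/*
 * Copy subleaves 0..max_subleaf, clamped so the loop never writes past the
 * end of the array.  Returns the number of subleaves filled.
 */
static unsigned int fill_subleaves(struct feat_policy *p, uint32_t max_subleaf,
                                   read_subleaf_fn read)
{
    unsigned int i;

    assert(p != NULL && read != NULL);

    for ( i = 0; i <= MIN(max_subleaf, ARRAY_SIZE(p->subleaf) - 1); ++i )
        p->subleaf[i] = read(i);

    return i;
}

/* Fake hardware used only for illustration. */
static uint32_t fake_read(unsigned int idx)
{
    return 0x100u + idx;
}
```

Because the bound is inclusive, subleaf 0 is always read, and a reported maximum larger than the array simply stops at the last slot.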

Fixes: 1bd2b750537b ("libx86: Fix 32bit stubdom build of x86_cpuid_policy_fill_native()")
Fixes: 97e4ebdcd765 ("x86/CPUID: support leaf 7 subleaf 1 / AVX512_BF16")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen: x86: make init_intel_cacheinfo() void
Dario Faggioli [Thu, 26 Mar 2020 17:17:32 +0000 (18:17 +0100)]
xen: x86: make init_intel_cacheinfo() void

It seems that we took this code from Linux, back when the function was
'unsigned int' and the return value was used.

But we are currently not doing anything with such value, so let's get
rid of it and make the function void. As an anecdote, that's pretty much
the same that happened in Linux as, since commit 807e9bc8e2fe6 ("x86/CPU:
Move cpu_detect_cache_sizes() into init_intel_cacheinfo()") the function
is void there too.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago SVM: Add union intstat_t for offset 68h in vmcb struct
Pu Wen [Thu, 26 Mar 2020 13:44:30 +0000 (21:44 +0800)]
SVM: Add union intstat_t for offset 68h in vmcb struct

According to chapter "Appendix B Layout of VMCB" in the new version
(v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
GUEST_INTERRUPT_MASK.

The current Xen code uses the whole u64 interrupt_shadow field to set up
the interrupt shadow, which misuses the other bits in VMCB offset 68h
as part of interrupt_shadow, causing svm_get_interrupt_shadow() to
mistake the guest having interrupts enabled as being in an interrupt
shadow.  This has been observed to cause SeaBIOS to hang on boot.

Add union intstat_t for VMCB offset 68h and fix the code to only use
bit 0 as intr_shadow according to the new APM description.
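
The difference between the two checks can be shown with a small model. `intstat_sketch_t`, its member names and the two helper functions are assumptions made for this example and do not reproduce the definitions added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of VMCB offset 68h; member names are assumed. */
typedef union {
    uint64_t raw;
    struct {
        bool intr_shadow:1;       /* guest is in an interrupt shadow */
        bool guest_intr_mask:1;   /* GUEST_INTERRUPT_MASK */
    };
} intstat_sketch_t;

static_assert(sizeof(intstat_sketch_t) == sizeof(uint64_t),
              "the union must stay 64 bits wide");

/* Old behaviour: any set bit in the field is read as "in shadow". */
static bool old_in_shadow(const intstat_sketch_t *s)
{
    return s->raw != 0;
}

/* New behaviour: only the dedicated bit is consulted. */
static bool new_in_shadow(const intstat_sketch_t *s)
{
    return s->intr_shadow;
}
```

With only `guest_intr_mask` set, `old_in_shadow()` reports a shadow while `new_in_shadow()` does not, which is the misreading the message describes.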

Reference:
[1] https://www.amd.com/system/files/TechDocs/24593.pdf

Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/ucode: Document the behaviour of the microcode_ops hooks
Andrew Cooper [Fri, 20 Mar 2020 15:37:28 +0000 (15:37 +0000)]
x86/ucode: Document the behaviour of the microcode_ops hooks

... and struct cpu_signature for good measure.

No comment is passed on the suitability of the behaviour...

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen: Drop raw_smp_processor_id()
Andrew Cooper [Thu, 19 Mar 2020 18:29:06 +0000 (18:29 +0000)]
xen: Drop raw_smp_processor_id()

There is only a single user of raw_smp_processor_id() left in the tree (and it
is unconditionally compiled out).  Drop the alias from all architectures.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/ucode: Fix error paths in apply_microcode()
Andrew Cooper [Fri, 20 Mar 2020 20:03:32 +0000 (20:03 +0000)]
x86/ucode: Fix error paths in apply_microcode()

In the unlikely case that patch application completes, but the resulting
revision isn't expected, sig->rev doesn't get updated to match reality.

It will get adjusted the next time collect_cpu_info() gets called, but in the
meantime Xen might operate on a stale value.  Nothing good will come of this.

Rewrite the logic to always update the stashed revision, before worrying about
whether the attempt was a success or failure.
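
The reordering can be pictured with the following sketch. Everything here is a stand-in: `struct cpu_sig_sketch`, `apply_patch`, the `load_fn` callback simulating "load, then read back the running revision", and the choice of -EIO as the error value are assumptions of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached signature; only the revision matters here. */
struct cpu_sig_sketch {
    uint32_t rev;
};

/* Stand-in for "load the patch, then read back the running revision". */
typedef uint32_t (*load_fn)(uint32_t wanted_rev);

static int apply_patch(struct cpu_sig_sketch *sig, uint32_t wanted_rev,
                       load_fn load)
{
    uint32_t rev;

    assert(sig != NULL && load != NULL);

    rev = load(wanted_rev);

    /* Record what the CPU actually runs before judging the outcome. */
    sig->rev = rev;

    return rev == wanted_rev ? 0 : -EIO;
}

/* Simulated hardware: one load that succeeds, one that ends up elsewhere. */
static uint32_t load_ok(uint32_t wanted_rev)
{
    return wanted_rev;
}

static uint32_t load_unexpected(uint32_t wanted_rev)
{
    return wanted_rev - 1;
}
```

Even on the error path the cached revision matches what was read back, so later decisions are not based on a stale value.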

Take the opportunity to make the printk() messages as consistent as possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/ucode/amd: Fix assertion in compare_patch()
Andrew Cooper [Thu, 19 Mar 2020 15:55:26 +0000 (15:55 +0000)]
x86/ucode/amd: Fix assertion in compare_patch()

This is clearly a typo.

Fixes: 9da23943ccd "microcode: introduce a global cache of ucode patch"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago cpu: sync any remaining RCU callbacks before CPU up/down
Igor Druzhinin [Thu, 26 Mar 2020 11:49:42 +0000 (12:49 +0100)]
cpu: sync any remaining RCU callbacks before CPU up/down

During CPU down operation RCU callbacks are scheduled to finish
off some actions later as soon as CPU is fully dead (the same applies
to CPU up operation in case error path is taken). If in the same grace
period another CPU up operation is performed on the same CPU, RCU callback
will be called later on a CPU in a potentially wrong (already up again
instead of still being down) state leading to eventual state inconsistency
and/or crash.

In order to avoid this, flush RCU callbacks explicitly before starting the
next CPU up/down operation.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago rcu: add assertions to debug build
Juergen Gross [Thu, 26 Mar 2020 11:46:48 +0000 (12:46 +0100)]
rcu: add assertions to debug build

Xen's RCU implementation relies on no softirq handling taking place
while being in an RCU critical section. Add ASSERT()s in debug builds
in order to catch any violations.

For that purpose modify rcu_read_[un]lock() to use a dedicated percpu
counter in addition to preempt_[en|dis]able(), as this allows testing
that condition in __do_softirq() (ASSERT_NOT_IN_ATOMIC() is not
usable there due to __cpu_up() calling process_pending_softirqs()
while holding the cpu hotplug lock).

While at it switch the rcu_read_[un]lock() implementation to static
inline functions instead of macros.
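
A single-CPU model of the idea, for illustration only: the counter is a plain global here instead of a per-CPU variable, the preemption counting is left out, and all function names carry a `_sketch` suffix to make clear they are not the Xen implementations.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int rcu_lock_cnt;   /* per-CPU in Xen; one CPU modelled here */
static unsigned int softirq_runs;   /* counts simulated softirq passes */

static inline void rcu_read_lock_sketch(void)
{
    rcu_lock_cnt++;
}

static inline void rcu_read_unlock_sketch(void)
{
    assert(rcu_lock_cnt > 0);   /* catches unbalanced unlocks */
    rcu_lock_cnt--;
}

/* True when this CPU is outside of any RCU read-side critical section. */
static inline bool rcu_quiesce_allowed_sketch(void)
{
    return rcu_lock_cnt == 0;
}

/* Softirq entry point: must never run inside an RCU critical section. */
static void do_softirq_sketch(void)
{
    assert(rcu_quiesce_allowed_sketch());
    softirq_runs++;
}
```

Using a dedicated counter means the softirq path can test exactly the RCU condition, independently of other reasons for being in atomic context.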

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago rcu: don't process callbacks when holding a rcu_read_lock()
Juergen Gross [Thu, 26 Mar 2020 11:46:11 +0000 (12:46 +0100)]
rcu: don't process callbacks when holding a rcu_read_lock()

Some keyhandlers are calling process_pending_softirqs() while holding
a rcu_read_lock(). This is wrong, as process_pending_softirqs() might
activate rcu calls which should not happen inside a rcu_read_lock().

For that purpose modify process_pending_softirqs() to not allow rcu
callback processing when a rcu_read_lock() is being held.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago rcu: don't use stop_machine_run() for rcu_barrier()
Juergen Gross [Thu, 26 Mar 2020 11:43:23 +0000 (12:43 +0100)]
rcu: don't use stop_machine_run() for rcu_barrier()

Today rcu_barrier() is calling stop_machine_run() to synchronize all
physical cpus in order to ensure all pending rcu calls have finished
when returning.

As stop_machine_run() is using tasklets this requires scheduling of
idle vcpus on all cpus imposing the need to call rcu_barrier() on idle
cpus only in case of core scheduling being active, as otherwise a
scheduling deadlock would occur.

There is no need at all to do the syncing of the cpus in tasklets, as
rcu activity is started in __do_softirq() called whenever softirq
activity is allowed. So rcu_barrier() can easily be modified to use
softirq for synchronization of the cpus no longer requiring any
scheduling activity.

As there already is an RCU softirq, reuse that for the synchronization.

Remove the barrier element from struct rcu_data as it isn't used.

Finally switch rcu_barrier() to return void as it now can never fail.

Partially-based-on-patch-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago atomics: introduce smp_mb__[after|before]_atomic() barriers
Juergen Gross [Thu, 26 Mar 2020 11:42:19 +0000 (12:42 +0100)]
atomics: introduce smp_mb__[after|before]_atomic() barriers

When using atomic variables for synchronization, barriers are needed
to ensure proper data serialization. Introduce smp_mb__before_atomic()
and smp_mb__after_atomic() as in the Linux kernel for that purpose.

Use the same definitions as in the Linux kernel.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years ago x86emul: support AVX512_BF16 insns
Jan Beulich [Thu, 26 Mar 2020 11:39:08 +0000 (12:39 +0100)]
x86emul: support AVX512_BF16 insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86emul: vendor specific SYSENTER/SYSEXIT behavior in long mode
Jan Beulich [Thu, 26 Mar 2020 11:36:30 +0000 (12:36 +0100)]
x86emul: vendor specific SYSENTER/SYSEXIT behavior in long mode

Intel CPUs permit both insns there while AMD ones don't.

While at it also
- drop the ring 0 check from SYSENTER handling - neither Intel's nor
  AMD's insn pages have any indication of #GP(0) getting raised when
  executed from ring 0, and trying it out in practice also confirms
  the check shouldn't be there,
- move SYSENTER segment register writing until after the (in principle
  able to fail) MSR reads.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86emul: vendor specific near indirect branch behavior in 64-bit mode
Jan Beulich [Thu, 26 Mar 2020 11:34:16 +0000 (12:34 +0100)]
x86emul: vendor specific near indirect branch behavior in 64-bit mode

Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: vendor specific direct branch behavior in 64-bit mode
Jan Beulich [Thu, 26 Mar 2020 11:32:07 +0000 (12:32 +0100)]
x86emul: vendor specific direct branch behavior in 64-bit mode

Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: vendor specific near RET behavior in 64-bit mode
Jan Beulich [Thu, 26 Mar 2020 11:29:50 +0000 (12:29 +0100)]
x86emul: vendor specific near RET behavior in 64-bit mode

Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: add wrappers to check for AMD-like behavior
Jan Beulich [Thu, 26 Mar 2020 11:27:36 +0000 (12:27 +0100)]
x86emul: add wrappers to check for AMD-like behavior

These are to aid readability at their use sites, in particular because
we're going to gain more of them.
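
A rough sketch of what such a wrapper and one use site might look like.
The enum, the context structure and near_branch_op_bytes() are
hypothetical, and treating Hygon as AMD-like is an assumption of this
sketch. The use site models the near branch case in 64-bit mode, where
Intel CPUs ignore an operand size override and AMD ones honor it.

```c
#include <stdbool.h>

/* Hypothetical vendor identifiers and emulation context. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON };

struct emul_ctxt {
    enum vendor vendor;
};

/* Wrapper: true when the emulated CPU should follow AMD behavior. */
static bool amd_like(const struct emul_ctxt *ctxt)
{
    return ctxt->vendor == VENDOR_AMD || ctxt->vendor == VENDOR_HYGON;
}

/* Use site: operand size (in bytes) of a near branch in 64-bit mode.
 * Only AMD-like CPUs honor a 0x66 operand size override here. */
static unsigned int near_branch_op_bytes(const struct emul_ctxt *ctxt,
                                         bool opsize_prefix)
{
    return (amd_like(ctxt) && opsize_prefix) ? 2 : 8;
}
```

Keeping the vendor test behind one name means each use site reads as a
statement about behavior rather than about a vendor bit mask.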

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/nvmx: only update SVI when using Ack on exit
Roger Pau Monné [Thu, 26 Mar 2020 11:25:40 +0000 (12:25 +0100)]
x86/nvmx: only update SVI when using Ack on exit

Check whether there's a valid interrupt in VM_EXIT_INTR_INFO in order
to decide whether to update SVI in nvmx_update_apicv. If Ack on exit
is not being used VM_EXIT_INTR_INFO won't have a valid interrupt and
hence SVI shouldn't be updated to signal the interrupt is currently in
service because it won't be Acked.
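
As a sketch of the check described above (not the code from this patch):
bit 31 of the VM-exit interruption information field is its valid bit and
bits 7:0 hold the vector. The macro and function names are local to this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout of the VM-exit interruption information field used here:
 * bit 31 = valid, bits 7:0 = vector. Names are local to this sketch. */
#define INTR_INFO_VALID   (UINT32_C(1) << 31)
#define INTR_INFO_VECTOR  UINT32_C(0xff)

/* Returns true and stores the vector when SVI should be updated. */
static bool svi_update_needed(uint32_t intr_info, uint8_t *vector)
{
    if ( !(intr_info & INTR_INFO_VALID) )
        return false;           /* no valid interrupt: nothing was acked */

    *vector = intr_info & INTR_INFO_VECTOR;
    return true;
}
```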

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoRevert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"
Roger Pau Monné [Thu, 26 Mar 2020 11:25:07 +0000 (12:25 +0100)]
Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"

This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458.

The commit is wrong, as the whole point of nvmx_update_apicv is to
update the guest interrupt status field when the Ack on exit VMEXIT
control feature is enabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agosched: fix cpu offlining with core scheduling
Juergen Gross [Thu, 26 Mar 2020 11:23:59 +0000 (12:23 +0100)]
sched: fix cpu offlining with core scheduling

Offlining a cpu with core scheduling active can result in a hanging
system. The reason is that the scheduling resource and unit of the cpus
to be removed need to be split in order to remove the cpu from its
cpupool and move it to the idle scheduler. In case one of the involved
cpus happens to have received a sched slave event due to a vcpu formerly
having been running on that cpu being woken up again, it can happen that
this cpu will enter sched_wait_rendezvous_in() while its scheduling
resource is just about to be split. It might wait forever for the other
sibling to join, which will never happen due to the resources already
being modified.

This can easily be avoided by:
- resetting the rendezvous counters of the idle unit which is kept
- checking for a new scheduling resource in sched_wait_rendezvous_in()
  after reacquiring the scheduling lock and resetting the counters in
  that case without scheduling another vcpu
- moving schedule resource modifications (in schedule_cpu_rm()) and
  retrieving (schedule(), sched_slave() is fine already, others are not
  critical) into locked regions

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agomm: add 'is_special_page' inline function...
Paul Durrant [Tue, 24 Mar 2020 16:40:50 +0000 (17:40 +0100)]
mm: add 'is_special_page' inline function...

... to cover xenheap and PGC_extra pages.

PGC_extra pages are intended to hold data structures that are associated
with a domain and may be mapped by that domain. They should not be treated
as 'normal' guest pages (i.e. RAM or page tables). Hence, in many cases
where code currently tests is_xen_heap_page() it should also check for
the PGC_extra bit in 'count_info'.

This patch therefore defines is_special_page() to cover both cases and
converts tests of is_xen_heap_page() (or open coded tests of PGC_xen_heap)
to is_special_page() where the page is assigned to a domain.
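
A minimal sketch of the combined test, assuming made-up flag values and
a cut-down struct page_info; the real definitions live in Xen's mm
headers and may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define PGC_xen_heap  (1UL << 0)
#define PGC_extra     (1UL << 1)

struct page_info {
    unsigned long count_info;
};

static bool is_xen_heap_page(const struct page_info *pg)
{
    return pg->count_info & PGC_xen_heap;
}

/* Covers both xenheap pages and PGC_extra pages. */
static bool is_special_page(const struct page_info *pg)
{
    return is_xen_heap_page(pg) || (pg->count_info & PGC_extra);
}
```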

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
5 years agox86 / ioreq: use a MEMF_no_refcount allocation for server pages...
Paul Durrant [Tue, 24 Mar 2020 16:40:09 +0000 (17:40 +0100)]
x86 / ioreq: use a MEMF_no_refcount allocation for server pages...

... now that it is safe to assign them.

This avoids relying on libxl (or whatever toolstack is in use) setting
max_pages up with sufficient 'slop' to allow all necessary ioreq server
pages to be allocated.

Signed-off-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agomm: keep PGC_extra pages on a separate list
Paul Durrant [Tue, 24 Mar 2020 16:37:27 +0000 (17:37 +0100)]
mm: keep PGC_extra pages on a separate list

This patch adds a new page_list_head into struct domain to hold PGC_extra
pages. This avoids them getting confused with 'normal' domheap pages where
the domain's page_list is walked.

A new dump loop is also added to dump_pageframe_info() to unconditionally
dump the 'extra page list'.

Signed-off-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
5 years agosched: fix onlining cpu with core scheduling active
Juergen Gross [Tue, 24 Mar 2020 16:36:44 +0000 (17:36 +0100)]
sched: fix onlining cpu with core scheduling active

When onlining a cpu cpupool_cpu_add() checks whether all siblings of
the new cpu are free in order to decide whether to add it to cpupool0.
In case the added cpu is not the last sibling to be onlined this test
is wrong as it only checks for all online siblings to be free. The
test should include the check for the number of siblings having
reached the scheduling granularity of cpupool0, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agox86/mce: correct the machine check vendor for Hygon
Pu Wen [Tue, 24 Mar 2020 09:56:22 +0000 (10:56 +0100)]
x86/mce: correct the machine check vendor for Hygon

Currently the xl dmesg output on Hygon platforms will be
"(XEN) CPU0: AMD Fam18h machine check reporting enabled",
which is misleading as AMD does not have family 18h (Hygon
negotiated with AMD to confirm that only Hygon has family 18h).

To correct this, add Hygon machine check type and vendor string.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoevtchn: change evtchn port type to evtchn_port_t
Yan Yankovskyi [Tue, 24 Mar 2020 09:50:38 +0000 (10:50 +0100)]
evtchn: change evtchn port type to evtchn_port_t

struct evtchn_set_priority uses uint32_t type for event channel port.
Replace the type with evtchn_port_t. Such change is also done in Linux.
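
Sketched below under the assumption that evtchn_port_t is a 32-bit
typedef and that the structure has just these two fields, in which case
the change is purely about type clarity and leaves the layout alone.

```c
#include <stdint.h>

/* Assumed to match the public header's typedef. */
typedef uint32_t evtchn_port_t;

struct evtchn_set_priority {
    evtchn_port_t port;     /* previously declared as uint32_t */
    uint32_t priority;
};

/* Same width as before, so the structure layout is unchanged. */
_Static_assert(sizeof(evtchn_port_t) == sizeof(uint32_t),
               "evtchn_port_t must stay 32 bits wide");
```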

Signed-off-by: Yan Yankovskyi <yyankovskyi@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/tboot: check return value of dmar_table allocation
Hongyan Xia [Tue, 24 Mar 2020 09:44:22 +0000 (10:44 +0100)]
x86/tboot: check return value of dmar_table allocation

The allocation can just return NULL. Return an error value early instead
of crashing later on.
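
The general shape of such a fix, as a hedged sketch: copy_table() is a
hypothetical helper and malloc() stands in for Xen's own allocator.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a table into freshly allocated memory.
 * Returns 0 on success or -ENOMEM when the allocation fails. */
static int copy_table(const void *src, size_t len, void **out)
{
    void *buf = malloc(len);

    if ( buf == NULL )
        return -ENOMEM;     /* fail early rather than crash on first use */

    memcpy(buf, src, len);
    *out = buf;
    return 0;
}
```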

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild: add -MP to CFLAGS along with -MMD
David Woodhouse [Tue, 24 Mar 2020 09:43:51 +0000 (10:43 +0100)]
build: add -MP to CFLAGS along with -MMD

This causes gcc (yes, and clang) to emit phony targets for each dependency.

This means that when a header file is deleted, the C files which *used*
to include it will no longer stop building with bogus out-of-date
dependencies like this:

  make[5]: *** No rule to make target
  '/home/dwmw2/git/xen/xen/include/asm/hvm/svm/amd-iommu-proto.h',
  needed by 'p2m.o'. Stop.

Based on -MP post-dating -MMD by many years, it is assumed that the
behavior of -MP isn't the default just out of extreme caution. We're
sufficiently convinced that there are no undue side effects of this.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: Fix xl shutdown for HVM without PV drivers
Olaf Hering [Wed, 18 Mar 2020 16:51:51 +0000 (17:51 +0100)]
libxl: Fix xl shutdown for HVM without PV drivers

A return value of zero means no PV drivers. Restore a hunk which was removed.

Fixes commit b183e180bce93037d3ef385a8c2338bbfb7f23d9

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/ucode: Rationalise startup and family/model checks
Andrew Cooper [Thu, 19 Mar 2020 13:54:19 +0000 (13:54 +0000)]
x86/ucode: Rationalise startup and family/model checks

Drop microcode_init_{intel,amd}(), export {intel,amd}_ucode_ops, and use a
switch statement in early_microcode_init() rather than probing each vendor in
turn.  This allows the microcode_ops pointer to become local to core.c.

As there are no external users of microcode_ops, there is no need for
collect_cpu_info() to implement sanity checks.  Move applicable checks to
early_microcode_init() so they are performed once, rather than repeatedly.

The Intel logic guarding the read of MSR_PLATFORM_ID is contrary to the SDM,
which states that the MSR has been architectural since the Pentium Pro
(06-01-xx), and lists no family/model restrictions in the pseudo-code for
microcode loading.  Either way, Xen's 64bit-only nature already makes this
check redundant.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode: Move interface from processor.h to microcode.h
Andrew Cooper [Wed, 18 Mar 2020 20:18:21 +0000 (20:18 +0000)]
x86/ucode: Move interface from processor.h to microcode.h

This reduces the complexity of processor.h, particularly the need to include
public/xen.h.  Substitute processor.h includes for microcode.h in some
sources, and add microcode.h includes in others.

Only 4 of the function declarations are actually called externally.  Move the
vendor init declarations to private.h

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode: Move microcode into its own directory
Andrew Cooper [Wed, 18 Mar 2020 20:02:34 +0000 (20:02 +0000)]
x86/ucode: Move microcode into its own directory

Split the existing asm/microcode.h in half, keeping the per-cpu cpu_sig
available to external users, and moving everything else into private.h

Take the opportunity to trim and clean up the include lists for all 3 source
files, all of which include rather more than necessary.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode: Remove declarations for non-external functions
Andrew Cooper [Wed, 18 Mar 2020 21:34:20 +0000 (21:34 +0000)]
x86/ucode: Remove declarations for non-external functions

Neither microcode_free_patch() nor early_microcode_update_cpu() have external
callers.  Make them static.

early_microcode_update_cpu()'s sole caller is following a use of
microcode_ops, making the error path dead.  Drop it as well.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agolibxl: make creation of xenstore 'suspend event channel' node optional...
Paul Durrant [Thu, 19 Mar 2020 11:47:48 +0000 (11:47 +0000)]
libxl: make creation of xenstore 'suspend event channel' node optional...

... and, if it is not created, make the top level 'device' node in
xenstore writable by the guest instead.

The purpose and semantics of the suspend event channel node are explained
in xenstore-paths.pandoc [1]. It was originally introduced in xend by
commit 17636f47a474 "Teach xc_save to use event-channel-based domain
suspend if available.". Note that, because the top-level frontend
'device' node was created writable by the guest in xend, there was no
need to explicitly create the 'suspend-event-channel' node as a writable
node.

However, libxl creates the 'device' node as read-only by the guest and so
explicit creation of the 'suspend-event-channel' node is necessary to make
it usable. This unfortunately has the side-effect of making some old
Windows PV drivers [2] cease to function. This is because they scan the top
level 'device' node, find the 'suspend' node and expect it to contain the
usual sub-nodes describing a PV frontend. When this is found not to be the
case, enumeration ceases and (because the 'suspend' node is observed before
the 'vbd' node) no system disk is enumerated. Windows will then crash with
bugcheck code 0x7B (missing system disk).

This patch adds a boolean 'xend_suspend_evtchn_compat' field into
libxl_create_info and a similarly named option in xl.cfg to set it.
If the value is true then the xenstore node is not created. Instead the
old xend behaviour of making top level device node writable by the guest is
re-instated. If the value is false (the default) then the current libxl
behaviour persists.

xenstore-paths.pandoc is also modified to say that the suspend event
channel node may not exist and, if it does not exist, then the guest may
create it. A note is also added concerning the writability of the top
level device node.

[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-paths.pandoc;hb=HEAD#l177
[2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/para-virtualized_windows_drivers_guide/sect-para-virtualized_windows_drivers_guide-installing_and_configuring_the_para_virtualized_drivers-installing_the_para_virtualized_drivers

NOTE: While adding the new LIBXL_HAVE_CREATEINFO_... definition into
      libxl.h, this patch corrects the previous stanza which erroneously
      implies libxl_domain_create_info is a function.

Signed-off-by: Paul Durrant <paul@xen.org>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: create domain 'error' node in xenstore
Paul Durrant [Thu, 19 Mar 2020 11:47:47 +0000 (11:47 +0000)]
libxl: create domain 'error' node in xenstore

Several PV drivers (both historically and currently [1]) report errors
by writing text into /local/domain/$DOMID/error. This patch creates the
node in libxl and makes it writable by the domain, and also adds some
text into xenstore-paths.pandoc to state what the node is for.

[1] https://xenbits.xen.org/gitweb/?p=pvdrivers/win/xenvif.git;a=blob;f=src/xenvif/frontend.c;hb=HEAD#l459

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86/mem_sharing: move mem_sharing_domain declaration
Tamas K Lengyel [Wed, 18 Mar 2020 15:31:06 +0000 (09:31 -0600)]
x86/mem_sharing: move mem_sharing_domain declaration

Due to recent reshuffling of header include paths, mem_sharing no longer
compiles. Fix it by moving the mem_sharing_domain declaration to the
location it is used in.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/shim: fix ballooning up the guest
Igor Druzhinin [Wed, 18 Mar 2020 11:55:54 +0000 (12:55 +0100)]
x86/shim: fix ballooning up the guest

args.preempted is meaningless here as it doesn't signal whether the
hypercall was preempted before. Use start_extent instead which is
correct (as long as the hypercall was invoked in a "normal" way).

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agolibfdt: fix undefined behaviour in _fdt_splice()
Jan Beulich [Tue, 17 Mar 2020 15:20:08 +0000 (16:20 +0100)]
libfdt: fix undefined behaviour in _fdt_splice()

Along the lines of commit d0b3ab0a0f46 ("libfdt: Fix undefined behaviour
in fdt_offset_ptr()"), _fdt_splice() similarly may not use pointer
arithmetic to do overflow checks.

[upstream commit 73d6e9ecb4179b510408bc526240f829262df361]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agolibfdt: Fix undefined behaviour in fdt_offset_ptr()
David Gibson [Tue, 17 Mar 2020 15:18:57 +0000 (16:18 +0100)]
libfdt: Fix undefined behaviour in fdt_offset_ptr()

Using pointer arithmetic to generate a pointer outside a known object is,
technically, undefined behaviour in C.  Unfortunately, we were using that
in fdt_offset_ptr() to detect overflows.

To fix this we need to do our bounds / overflow checking on the offsets
before constructing pointers from them.
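
A minimal sketch of the approach, with blob_offset_ptr() and its
parameters being hypothetical names rather than libfdt's interface. All
checks are done on the integer offsets, so no out-of-bounds pointer is
ever formed.

```c
#include <stddef.h>

/* Hypothetical helper: return a pointer to len bytes at offset off inside
 * a blob of total bytes, or NULL if that range does not fit. */
static const void *blob_offset_ptr(const char *blob, size_t total,
                                   size_t off, size_t len)
{
    /* "len > total - off" cannot wrap because off <= total was checked. */
    if ( off > total || len > total - off )
        return NULL;

    return blob + off;
}
```

The pattern to avoid is a test such as "p + len < p", which relies on
pointer arithmetic wrapping and is therefore undefined.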

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[upstream commit d0b3ab0a0f46ac929b4713da46f7fdcd893dd3bd]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agox86: reduce mce.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:18:08 +0000 (16:18 +0100)]
x86: reduce mce.h include dependencies

Drop the public header #include as not needed by the header itself. Add
one that was missing, and move all inside the inclusion guard.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce hvm.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:17:20 +0000 (16:17 +0100)]
x86/HVM: reduce hvm.h include dependencies

Drop #include-s not needed by the header itself, and add smaller scope
ones instead. Put the ones needed into whichever other files actually
need them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce io.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:16:34 +0000 (16:16 +0100)]
x86/HVM: reduce io.h include dependencies

Drop #include-s not needed by the header itself as well as one include
of the header which isn't needed. Put the one needed into the file
actually requiring it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce vlapic.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:15:49 +0000 (16:15 +0100)]
x86/HVM: reduce vlapic.h include dependencies

Drop #include-s not needed by the header itself.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce vioapic.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:14:57 +0000 (16:14 +0100)]
x86/HVM: reduce vioapic.h include dependencies

Drop an #include not needed by the header itself. While verifying the
header (now) builds standalone, I noticed an omission in a public header
which gets taken care of here as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce vpic.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:14:05 +0000 (16:14 +0100)]
x86/HVM: reduce vpic.h include dependencies

Drop an #include not needed by the header itself.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce vpt.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:13:20 +0000 (16:13 +0100)]
x86/HVM: reduce vpt.h include dependencies

Drop #include-s not needed by the header itself.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce vcpu.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:12:26 +0000 (16:12 +0100)]
x86/HVM: reduce vcpu.h include dependencies

Drop #include-s not needed by the header itself. Put the ones needed
into whichever other files actually need them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: reduce domain.h include dependencies
Jan Beulich [Tue, 17 Mar 2020 15:11:33 +0000 (16:11 +0100)]
x86/HVM: reduce domain.h include dependencies

Drop #include-s not needed by the header itself. Put the ones needed
into whichever other files actually need them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/vvmx: Fix deadlock with MSR bitmap merging
Andrew Cooper [Wed, 11 Mar 2020 18:22:37 +0000 (18:22 +0000)]
x86/vvmx: Fix deadlock with MSR bitmap merging

c/s c47984aabead "nvmx: implement support for MSR bitmaps" introduced a use of
map_domain_page() which may get used in the middle of context switch.

This is not safe, and causes Xen to deadlock on the mapcache lock:

  (XEN) Xen call trace:
  (XEN)    [<ffff82d08022d6ae>] R _spin_lock+0x34/0x5e
  (XEN)    [<ffff82d0803219d7>] F map_domain_page+0x250/0x527
  (XEN)    [<ffff82d080356332>] F do_page_fault+0x420/0x780
  (XEN)    [<ffff82d08038da3d>] F x86_64/entry.S#handle_exception_saved+0x68/0x94
  (XEN)    [<ffff82d08031729f>] F __find_next_zero_bit+0x28/0x69
  (XEN)    [<ffff82d080321a4d>] F map_domain_page+0x2c6/0x527
  (XEN)    [<ffff82d08029eeb2>] F nvmx_update_exec_control+0x1d7/0x323
  (XEN)    [<ffff82d080299f5a>] F vmx_update_cpu_exec_control+0x23/0x40
  (XEN)    [<ffff82d08029a3f7>] F arch/x86/hvm/vmx/vmx.c#vmx_ctxt_switch_from+0xb7/0x121
  (XEN)    [<ffff82d08031d796>] F arch/x86/domain.c#__context_switch+0x124/0x4a9
  (XEN)    [<ffff82d080320925>] F context_switch+0x154/0x62c
  (XEN)    [<ffff82d080252f3e>] F common/sched/core.c#sched_context_switch+0x16a/0x175
  (XEN)    [<ffff82d080253877>] F common/sched/core.c#schedule+0x2ad/0x2bc
  (XEN)    [<ffff82d08022cc97>] F common/softirq.c#__do_softirq+0xb7/0xc8
  (XEN)    [<ffff82d08022cd38>] F do_softirq+0x18/0x1a
  (XEN)    [<ffff82d0802a2fbb>] F vmx_asm_do_vmentry+0x2b/0x30

Convert the domheap page into being a xenheap page.

Fixes: c47984aabead - nvmx: implement support for MSR bitmaps
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/APIC: reduce rounding errors in calculations
Jan Beulich [Mon, 16 Mar 2020 16:32:41 +0000 (17:32 +0100)]
x86/APIC: reduce rounding errors in calculations

Dividing by HZ/10 just to subsequently multiply by HZ again in all uses
of the respective variable is pretty pointlessly introducing rounding
(really: truncation) errors. While transforming the respective
expressions it became apparent that "result" would be left unused except
for its use as function return value. As the sole caller of the function
doesn't look at the returned value, simply convert the function to have
"void" return type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/time: reduce rounding errors in calculations
Jan Beulich [Mon, 16 Mar 2020 16:31:35 +0000 (17:31 +0100)]
x86/time: reduce rounding errors in calculations

Plain (unsigned) integer division simply truncates the results. The
overall errors are smaller though if we use proper rounding. (Extend
this to the purely cosmetic aspect of time.c's freq_string(), which
before this change I've frequently observed to report e.g. NN.999MHz
HPET clock speeds.)
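
To illustrate the difference with made-up numbers (div_round() is a
hypothetical helper, not a function from this patch):

```c
/* Divide n by d, rounding to nearest instead of truncating.
 * Requires d > 0 and that n + d / 2 does not overflow. */
static unsigned long div_round(unsigned long n, unsigned long d)
{
    return (n + d / 2) / d;
}
```

For a clock measured at 24999999 Hz, plain division by 1000 yields
24999 kHz (shown as 24.999MHz), while div_round() yields 25000 kHz.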

While adding the rounding logic, also switch to using an unsigned
constant for the other, original half of bus_cycle's calculation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agospinlocks: fix placement of preempt_[dis|en]able()
Juergen Gross [Mon, 16 Mar 2020 10:27:29 +0000 (11:27 +0100)]
spinlocks: fix placement of preempt_[dis|en]able()

In case Xen ever gains preemption support the spinlock coding's
placement of preempt_disable() and preempt_enable() should be outside
of the locked section.
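
A simplified sketch of the intended ordering. The global preempt_count,
the single lock and the sketch_ function names are assumptions made for
this example; Xen's real spinlock code is considerably more involved.

```c
#include <stdatomic.h>

static unsigned int preempt_count;          /* per-CPU in a real system */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

static void sketch_spin_lock(void)
{
    preempt_disable();                      /* before taking the lock */
    while ( atomic_flag_test_and_set_explicit(&lock, memory_order_acquire) )
        ;                                   /* spin until the lock is free */
}

static void sketch_spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
    preempt_enable();                       /* after dropping the lock */
}
```

With this ordering there is no window in which the lock is held while
preemption is still enabled.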

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agorwlocks: call preempt_disable() when taking a rwlock
Juergen Gross [Mon, 16 Mar 2020 10:26:45 +0000 (11:26 +0100)]
rwlocks: call preempt_disable() when taking a rwlock

Similar to spinlocks, preemption should be disabled while holding a
rwlock.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/APIC: adjust types and comments in calibrate_APIC_clock()
Jan Beulich [Mon, 16 Mar 2020 10:26:10 +0000 (11:26 +0100)]
x86/APIC: adjust types and comments in calibrate_APIC_clock()

First and foremost the comment talking about potential underflow being
taken care of by using signed long type variables was true only on
32-bit, which we've not been supporting for quite some time. Drop the
comment and change all involved types to unsigned. Take the opportunity
and also replace bus_cycle's fixed width type.

Additionally there's no point using an "arbitrary (but long enough)
timeout" here. Just use the maximum possible value; Linux does so too,
just as an additional data point.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agokconfig: expose all{yes,no}config targets
Jan Beulich [Mon, 16 Mar 2020 10:25:45 +0000 (11:25 +0100)]
kconfig: expose all{yes,no}config targets

Without having them at least at the xen/Makefile level they're (close
to?) inaccessible. As I'm uncertain about their utility at the top
level, I'm leaving it at that for now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoAMD/IOMMU: fix off-by-one in amd_iommu_get_paging_mode() callers
Jan Beulich [Mon, 16 Mar 2020 10:24:29 +0000 (11:24 +0100)]
AMD/IOMMU: fix off-by-one in amd_iommu_get_paging_mode() callers

amd_iommu_get_paging_mode() expects a count, not a "maximum possible"
value. Prior to b4f042236ae0 dropping the reference, the use of our
misnamed "max_page" in amd_iommu_domain_init() may have led to such a
misunderstanding. In an attempt to avoid such confusion in the future,
rename the function's parameter and - while at it - convert it to an
inline function.
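
The count versus maximum distinction can be shown with a small sketch.
paging_levels() and ENTRIES_PER_TABLE are hypothetical stand-ins, and
512 entries per table is an assumption of this example.

```c
#define ENTRIES_PER_TABLE 512UL   /* assumed: 9 bits translated per level */

/* Number of page table levels needed to map frame_count frames.
 * frame_count is a count and must be non-zero. */
static unsigned int paging_levels(unsigned long frame_count)
{
    unsigned int level = 1;

    while ( frame_count > ENTRIES_PER_TABLE )
    {
        /* Round up to whole tables; those need mapping one level up. */
        frame_count = (frame_count + ENTRIES_PER_TABLE - 1) / ENTRIES_PER_TABLE;
        level++;
    }

    return level;
}
```

With 513 frames (numbered 0 to 512), passing the count gives 2 levels,
whereas passing the highest frame number, 512, would wrongly give 1.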

Also replace a literal 4 by an expression tying it to a wider use
constant, just like amd_iommu_quarantine_init() does.

Fixes: ea38867831da ("x86 / iommu: set up a scratch page in the quarantine domain")
Fixes: b4f042236ae0 ("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agolibxl: fix cleanup bug in initiate_domain_create()
Paweł Marczewski [Fri, 13 Mar 2020 11:25:10 +0000 (11:25 +0000)]
libxl: fix cleanup bug in initiate_domain_create()

In case of errors, we immediately call domcreate_complete()
which cleans up the console_xswait object. Make sure it is initialized
before we start cleanup.

Signed-off-by: Paweł Marczewski <pawel@invisiblethingslab.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibfsimage: fix parentheses in macro parameters
Roger Pau Monne [Fri, 13 Mar 2020 08:45:58 +0000 (09:45 +0100)]
libfsimage: fix parentheses in macro parameters

VERIFY_DN_TYPE and VERIFY_OS_TYPE should use parentheses when
accessing the type parameter. Note that none of the current usages
require this; it's just done for correctness.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibfsimage: fix clang 10 build
Roger Pau Monne [Fri, 13 Mar 2020 08:45:57 +0000 (09:45 +0100)]
libfsimage: fix clang 10 build

clang complains with:

fsys_zfs.c:826:2: error: converting the enum constant to a boolean [-Werror,-Wint-in-bool-context]
        VERIFY_DN_TYPE(dn, DMU_OT_PLAIN_FILE_CONTENTS);
        ^
/wrkdirs/usr/ports/sysutils/xen-tools/work/xen-4.13.0/tools/libfsimage/zfs/../../../tools/libfsimage/zfs/fsys_zfs.h:74:11: note: expanded from macro 'VERIFY_DN_TYPE'
        if (type && (dnp)->dn_type != type) { \
                 ^
1 error generated.

Fix this by not forcing an implicit conversion of the enum into a
boolean and instead comparing with the 0 enumerator.
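
A cut-down sketch of the idea. The enum, its values, the structure and
the TYPE_MISMATCH() macro are hypothetical and only mirror the shape of
the real ZFS definitions.

```c
#include <stdbool.h>

/* Hypothetical object types; OT_NONE plays the role of the 0 enumerator. */
enum obj_type { OT_NONE = 0, OT_PLAIN_FILE_CONTENTS = 1 };

struct dnode {
    enum obj_type dn_type;
};

/* Explicit comparison with the 0 enumerator instead of "type && ...",
 * so no enum constant is implicitly converted to a boolean. */
#define TYPE_MISMATCH(dnp, type) \
    ((type) != OT_NONE && (dnp)->dn_type != (type))

/* True when the dnode has the wanted type, or when any type is accepted. */
static bool check_type(const struct dnode *dn, enum obj_type want)
{
    return !TYPE_MISMATCH(dn, want);
}
```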

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/helpers: xen-init-dom0: Mark clear_domid_history() static
Julien Grall [Thu, 12 Mar 2020 20:24:07 +0000 (20:24 +0000)]
tools/helpers: xen-init-dom0: Mark clear_domid_history() static

xen-init-dom0 is a standalone binary, so all the functions but the
main() should be static.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Cc: paul@xen.org
Acked-by: Wei Liu <wl@xen.org>
5 years agoscripts: Replace tabs in locking.sh
Jason Andryuk [Thu, 12 Mar 2020 14:54:16 +0000 (10:54 -0400)]
scripts: Replace tabs in locking.sh

Replace two stray tabs with spaces to make the file whitespace
consistent.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agorcu: fix rcu_lock_domain()
Juergen Gross [Wed, 11 Mar 2020 12:18:49 +0000 (13:18 +0100)]
rcu: fix rcu_lock_domain()

rcu_lock_domain() misuses the domain structure as rcu lock, which is
working only as long as rcu_read_lock() isn't evaluating the lock.

Fix that by adding an rcu lock to struct domain and using that for
rcu_lock_domain().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agorcu: use rcu softirq for forcing quiescent state
Juergen Gross [Wed, 11 Mar 2020 12:17:41 +0000 (13:17 +0100)]
rcu: use rcu softirq for forcing quiescent state

As rcu callbacks are processed in __do_softirq() there is no need to
use the scheduling softirq for forcing quiescent state. Any other
softirq would do the job and the scheduling one is the most expensive.

So use the already existing rcu softirq for that purpose. For telling
apart why the rcu softirq was raised add a flag for the current usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agomemaccess: reduce include dependencies
Jan Beulich [Tue, 10 Mar 2020 16:06:57 +0000 (17:06 +0100)]
memaccess: reduce include dependencies

The common header doesn't itself need to include public/vm_event.h nor
public/memory.h. Drop their inclusion. This requires using the non-
typedef names in two prototypes and an inline function; by not changing
the callers and function definitions at the same time it'll remain
certain that the build would fail if the typedef itself was changed.
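
The technique, sketched with a made-up structure layout and a
hypothetical handle_event() function: the prototype needs only a forward
declaration of the struct tag, while the definition keeps using the
typedef.

```c
/* --- common header: a forward declaration is all that is needed --- */
struct vm_event_st;

/* Prototype written with the struct tag rather than the typedef name. */
int handle_event(const struct vm_event_st *rsp);

/* --- public header (layout invented for this sketch) --- */
struct vm_event_st {
    unsigned int flags;
};
typedef struct vm_event_st vm_event_response_t;

/* --- source file: the definition still uses the typedef, so changing the
 * typedef to an incompatible type would be caught at build time --- */
int handle_event(const vm_event_response_t *rsp)
{
    return rsp->flags != 0;
}
```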

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years agox86 / p2m: replace page_list check in p2m_alloc_table...
Paul Durrant [Tue, 10 Mar 2020 16:06:09 +0000 (17:06 +0100)]
x86 / p2m: replace page_list check in p2m_alloc_table...

... with a check of domain_tot_pages().

The check of page_list prevents the prior allocation of PGC_extra pages,
whereas what the code is trying to verify is that the toolstack has not
already allocated RAM for the domain.

Signed-off-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agovmevent: reduce include dependencies
Jan Beulich [Tue, 10 Mar 2020 14:38:25 +0000 (15:38 +0100)]
vmevent: reduce include dependencies

There's no need for virtually everything to include public/vm_event.h.
Move its inclusion out of sched.h. This requires using the non-typedef
name in p2m_mem_paging_resume()'s prototype; by not changing the
function definition at the same time it'll remain certain that the build
would fail if the typedef itself was changed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years agoIOMMU: iommu_snoop is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:37:30 +0000 (15:37 +0100)]
IOMMU: iommu_snoop is x86-only

In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor. Provide a #define for the opposite case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agoIOMMU: iommu_qinval is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:36:45 +0000 (15:36 +0100)]
IOMMU: iommu_qinval is x86-only

In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agoIOMMU: iommu_igfx is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:35:57 +0000 (15:35 +0100)]
IOMMU: iommu_igfx is x86-only

In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>