xenbits.xensource.com Git - people/hx242/xen.git/log
4 years ago tools: Commit flex (2.6.4) & bison (3.3.2) output from Debian buster staging2
Ian Jackson [Fri, 12 Jun 2020 15:03:25 +0000 (16:03 +0100)]
tools: Commit flex (2.6.4) & bison (3.3.2) output from Debian buster

These files are in tree so that people can build (including from git)
without needing less-than-a-decade-old flex and bison.

We should update them periodically.  Debian buster has been Debian
stable for a while.  Our CI is running buster.

There should be no significant functional change; it's possible that
there are bugfixes but I have not reviewed the changes.  I *have*
checked that the flex I am using has the fix for CVE-2016-6354.

CC: Paul Durrant <paul@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago Temp
Hongyan Xia [Tue, 23 Jun 2020 16:47:49 +0000 (17:47 +0100)]
Temp

4 years ago tools: Commit autoconf output from Debian buster
Ian Jackson [Fri, 12 Jun 2020 14:31:06 +0000 (15:31 +0100)]
tools: Commit autoconf output from Debian buster

These files are in tree so that people can build (including from git)
without needing recent autotools.

We should update them periodically.  Debian buster has been Debian
stable for a while.  Our CI is running buster.

There should be no significant functional change; it's possible that
there are bugfixes to the configure scripts but I have not reviewed
them.

These files were last changed in
  83c845033dc8bb3a35ae245effb7832b6823174a
  libxl: use vchan for QMP access with Linux stubdomain
where a new feature was added.  However, that commit contains a lot of
extraneous noise in configure compared to its parent.

Compared to 83c845033dc8bb3a35ae245effb7832b6823174a~, this commit
reverts those extraneous changes, leaving precisely the correct
changes.  So one way of looking at the changes we are making now is
that we are undoing accidental changes to the autoconf output.

I used Debian's autoconf 2.69-11 on amd64.

CC: Wei Liu <wl@xen.org>
CC: Nick Rosbrook <rosbrookn@gmail.com>
Reported-by: Nick Rosbrook <rosbrookn@gmail.com>
CC: Paul Durrant <paul@xen.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE
Tamas K Lengyel [Fri, 19 Jun 2020 13:24:55 +0000 (15:24 +0200)]
x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE

While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
observed due to a mm-lock order violation while copying the HVM CPU context
from the parent. This issue has been identified to be due to
hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
call also creates a shared entry in the fork's memory map for the cr3 gfn. The
function later calls hap_update_cr3 while holding the paging_lock, which
results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
the above entry when it grabs the page with the P2M_UNSHARE flag set.

Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
the p2m is properly populated.

Note that the lock order violation is avoided because before the paging_lock is
taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
which don't take the paging_lock and that have no previous lookup. Currently no
other code-path exists leading there with the paging_lock taken, thus no
further adjustments are necessary.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/hvm: check against VIOAPIC_LEVEL_TRIG in hvm_gsi_deassert
Roger Pau Monné [Fri, 19 Jun 2020 13:23:50 +0000 (15:23 +0200)]
x86/hvm: check against VIOAPIC_LEVEL_TRIG in hvm_gsi_deassert

In order to avoid relying on the specific values of
VIOAPIC_{LEVEL/EDGE}_TRIG.

No functional change.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago stubdom/vtpm: add extern to function declarations
Olaf Hering [Wed, 17 Jun 2020 06:08:41 +0000 (07:08 +0100)]
stubdom/vtpm: add extern to function declarations

Code compiled with gcc10 will not link properly due to multiple definition of the same function.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Samuel Thibault <samuel.thibaut@ens-lyon.org>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago xl: Allow shutdown wait for domain death
Jason Andryuk [Wed, 17 Jun 2020 02:36:42 +0000 (03:36 +0100)]
xl: Allow shutdown wait for domain death

`xl shutdown -w` waits for the first of either domain shutdown or death.
Shutdown is the halting of the guest operating system, and death is the
freeing of domain resources.

Allow specifying -w multiple times to wait for only domain death.  This
is useful in scripts so that all resources are free before the script
continues.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/xen-ucode: return correct exit code on failed microcode update
Igor Druzhinin [Wed, 17 Jun 2020 02:19:13 +0000 (03:19 +0100)]
tools/xen-ucode: return correct exit code on failed microcode update

Otherwise it's difficult to know from within automation whether the
operation failed.

While at it, also switch to returning 1 and 2 instead of errno to avoid
incompatibilities between errno and special exit code numbers.
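
As a rough illustration of the exit-code concern (a Python sketch, not the
tool's C code): a parent process only sees the low 8 bits of a child's exit
code, and shells give special meaning to values such as 126, 127 and 128+N,
so a raw errno can be misread. The names below and the meaning given to 1
and 2 are assumptions; the commit only states that 1 and 2 replace errno.

```python
# Hypothetical assignment of the two codes; the commit does not say which
# failure maps to which value.
EXIT_USAGE = 1
EXIT_UPDATE_FAILED = 2


def exit_status_seen_by_parent(code):
    """A POSIX parent only receives the low 8 bits of the exit code."""
    return code & 0xFF


def exit_code_for(update_ok, args_ok=True):
    """Map the outcome to a small, fixed exit code instead of errno."""
    if not args_ok:
        return EXIT_USAGE
    return 0 if update_ok else EXIT_UPDATE_FAILED


if __name__ == "__main__":
    # A value of 256 would be truncated and look like success.
    print(exit_status_seen_by_parent(256))
    print(exit_code_for(update_ok=False))
```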

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>
4 years ago x86/spec-ctrl: Hide RDRAND by default on IvyBridge client
Andrew Cooper [Fri, 12 Jun 2020 12:39:13 +0000 (13:39 +0100)]
x86/spec-ctrl: Hide RDRAND by default on IvyBridge client

To combat the absence of mitigating microcode, arrange to hide RDRAND by
default on IvyBridge client hardware.

Adjust the default feature derivation to hide RDRAND on IvyBridge client
parts, unless `cpuid=rdrand` is explicitly provided.

Adjust the restore path in xc_cpuid_apply_policy() to not hide RDRAND from VMs
which migrated from pre-4.14.

In all cases, individual guests can continue using RDRAND if explicitly
enabled in their config files.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/cpuid: Introduce missing feature adjustment in calculate_pv_def_policy()
Andrew Cooper [Mon, 15 Jun 2020 12:42:11 +0000 (13:42 +0100)]
x86/cpuid: Introduce missing feature adjustment in calculate_pv_def_policy()

This was an accidental asymmetry with the HVM side.

No change in behaviour at this point.

Fixes: 83b387382 ("x86/cpuid: Introduce and use default CPUID policies")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/hvm: Disable MPX by default
Andrew Cooper [Mon, 24 Feb 2020 17:15:56 +0000 (17:15 +0000)]
x86/hvm: Disable MPX by default

Memory Protection eXtension support has been dropped from GCC and Linux, and
will be dropped from future Intel CPUs.

With all other default/max pieces in place, move MPX from default to max.
This means that VMs won't be offered it by default, but can explicitly opt
into using it via cpuid="host,mpx=1" in their vm.cfg file.

The difference as visible to the guest is:

  diff --git a/default b/mpx
  index 0e91765d6b..c8c33cd584 100644
  --- a/default
  +++ b/mpx
  @@ -13,15 +13,17 @@ Native cpuid:
     00000004:00000004 -> 00000000:00000000:00000000:00000000
     00000005:ffffffff -> 00000000:00000000:00000000:00000000
     00000006:ffffffff -> 00000000:00000000:00000000:00000000
  -  00000007:00000000 -> 00000000:009c2fbb:00000000:9c000400
  +  00000007:00000000 -> 00000000:009c6fbb:00000000:9c000400
     00000008:ffffffff -> 00000000:00000000:00000000:00000000
     00000009:ffffffff -> 00000000:00000000:00000000:00000000
     0000000a:ffffffff -> 00000000:00000000:00000000:00000000
     0000000b:ffffffff -> 00000000:00000000:00000000:00000000
     0000000c:ffffffff -> 00000000:00000000:00000000:00000000
  -  0000000d:00000000 -> 00000007:00000240:00000340:00000000
  +  0000000d:00000000 -> 0000001f:00000240:00000440:00000000
     0000000d:00000001 -> 0000000f:00000240:00000000:00000000
     0000000d:00000002 -> 00000100:00000240:00000000:00000000
  +  0000000d:00000003 -> 00000040:000003c0:00000000:00000000
  +  0000000d:00000004 -> 00000040:00000400:00000000:00000000
     40000000:ffffffff -> 40000005:566e6558:65584d4d:4d4d566e
     40000001:ffffffff -> 0004000e:00000000:00000000:00000000
     40000002:ffffffff -> 00000001:40000000:00000000:00000000
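
The changed registers in the diff can be decoded with a little bit
arithmetic. The sketch below (Python; values copied from the diff, helper
name illustrative) lists which bits differ. Bit 14 of leaf 7 EBX is the MPX
feature bit, and bits 3 and 4 of XCR0 (leaf 0xd subleaf 0, EAX) are the
BNDREGS and BNDCSR state components.

```python
# Register values copied from the diff above.
LEAF7_EBX_DEFAULT = 0x009C2FBB
LEAF7_EBX_MPX = 0x009C6FBB
LEAFD_EAX_DEFAULT = 0x00000007
LEAFD_EAX_MPX = 0x0000001F


def changed_bits(before, after):
    """Return the bit positions that differ between two 32-bit values."""
    diff = before ^ after
    return [bit for bit in range(32) if diff & (1 << bit)]


if __name__ == "__main__":
    print(changed_bits(LEAF7_EBX_DEFAULT, LEAF7_EBX_MPX))  # [14]
    print(changed_bits(LEAFD_EAX_DEFAULT, LEAFD_EAX_MPX))  # [3, 4]
```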

Adjust the legacy restore path in libxc to cope safely with pre-4.14 VMs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/gen-cpuid: Distinguish default vs max in feature annotations
Andrew Cooper [Tue, 25 Feb 2020 15:33:31 +0000 (15:33 +0000)]
x86/gen-cpuid: Distinguish default vs max in feature annotations

The toolstack logic can now correctly distinguish a clean boot from a
migrate/restore.

Allow lowercase a/s/h to be used to annotate a non-default feature.

Due to the emulator work prepared earlier in 4.14, this now allows VMs to
explicitly opt in to the TSXLDTRK, MOVDIR{I,64B} and SERIALIZE instructions via
their xl.cfg file, rather than getting them as a matter of default.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/libx[cl]: Plumb bool restore down into xc_cpuid_apply_policy()
Andrew Cooper [Fri, 12 Jun 2020 13:07:10 +0000 (14:07 +0100)]
tools/libx[cl]: Plumb bool restore down into xc_cpuid_apply_policy()

In order to safely disable some features by default, without breaking
migration from 4.13 or older, the CPUID logic needs to distinguish the two
cases.

Plumb a restore boolean down from the two callers of libxl__cpuid_legacy() all
the way down into xc_cpuid_apply_policy().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/libx[cl]: Merge xc_cpuid_set() into xc_cpuid_apply_policy()
Andrew Cooper [Fri, 12 Jun 2020 13:07:10 +0000 (14:07 +0100)]
tools/libx[cl]: Merge xc_cpuid_set() into xc_cpuid_apply_policy()

This reduces the number of CPUID handling entry-points to just one.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/libx[cl]: Move processing loop down into xc_cpuid_set()
Andrew Cooper [Fri, 12 Jun 2020 13:07:10 +0000 (14:07 +0100)]
tools/libx[cl]: Move processing loop down into xc_cpuid_set()

Currently, libxl__cpuid_legacy() passes each element of the policy list to
xc_cpuid_set() individually.  This is wasteful both in terms of the number of
hypercalls made, and the quantity of repeated merging/auditing work performed
by Xen.

Move the loop processing down into xc_cpuid_set(), which allows us to do one
set of hypercalls, rather than one per list entry.

In xc_cpuid_set(), obtain the full host, guest max and current policies to
begin with, and loop over the xend array, processing one leaf at a time.
Replace the linear search with a binary search, seeing as the serialised
leaves are sorted.
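
The idea of replacing the linear search with a binary search can be sketched
as follows. This is Python for illustration only, not the libxc C code; the
tuple layout (leaf, subleaf, registers) and the function name are
assumptions, and the sample values are taken from the MPX diff elsewhere in
this log.

```python
def find_leaf(leaves, leaf, subleaf):
    """Binary-search a list sorted by (leaf, subleaf); return the entry or None."""
    target = (leaf, subleaf)
    lo, hi = 0, len(leaves)
    while lo < hi:
        mid = (lo + hi) // 2
        if leaves[mid][:2] < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(leaves) and leaves[lo][:2] == target:
        return leaves[lo]
    return None


# Hypothetical serialised policy, already sorted by (leaf, subleaf).
LEAVES = [
    (0x00000007, 0, (0x00000000, 0x009C2FBB, 0x00000000, 0x9C000400)),
    (0x0000000D, 0, (0x00000007, 0x00000240, 0x00000340, 0x00000000)),
    (0x0000000D, 1, (0x0000000F, 0x00000240, 0x00000000, 0x00000000)),
    (0x0000000D, 2, (0x00000100, 0x00000240, 0x00000000, 0x00000000)),
]
```

The search only gives correct answers if the input really is sorted, which
is why the ordering of the serialised leaves matters.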

No change in behaviour from the guest's point of view.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tests/cpu-policy: Confirm that CPUID serialisation is sorted
Andrew Cooper [Fri, 12 Jun 2020 15:48:02 +0000 (16:48 +0100)]
tests/cpu-policy: Confirm that CPUID serialisation is sorted

The existing x86_cpuid_copy_to_buffer() does produce sorted results, and we're
about to start relying on this.  Extend the unit tests.

As test_cpuid_serialise_success() is a fairly limited set of synthetic
examples right now, introduce test_cpuid_current() to operate on the full
policy for the current CPU.
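
The property being relied on can be stated compactly. A sketch in Python,
assuming each serialised entry is keyed by a (leaf, subleaf) pair and that
keys must be strictly increasing; the real unit test's exact comparison is
not shown in this commit message, and the function name is illustrative.

```python
def is_strictly_sorted(keys):
    """True if every key is strictly greater than the one before it."""
    return all(a < b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    serialised = [(0x7, 0), (0xD, 0), (0xD, 1), (0xD, 2)]
    print(is_strictly_sorted(serialised))
```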

Tweak the fail() macro to allow for simplified control flow.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/libx[cl]: Introduce struct xc_xend_cpuid for xc_cpuid_set()
Andrew Cooper [Fri, 12 Jun 2020 13:05:44 +0000 (14:05 +0100)]
tools/libx[cl]: Introduce struct xc_xend_cpuid for xc_cpuid_set()

In order to combine the functionality of xc_cpuid_set() with
xc_cpuid_apply_policy(), arrange to pass the data in a single contained
struct, rather than two arrays.

libxl__cpuid_policy is the ideal structure to use, but that would introduce a
reverse dependency between libxc and libxl.  Introduce xc_xend_cpuid (with a
transparent union to provide more useful names for the inputs), and use this
structure in libxl.

The public API has libxl_cpuid_policy as an opaque type referencing
libxl__cpuid_policy.  Drop the inappropriate comment about its internals, and
use xc_xend_cpuid as a differently named opaque backing object.  Users of both
libxl and libxc are not permitted to look at the internals.

No change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/boot: use BASEDIR for include path
Bertrand Marquis [Tue, 16 Jun 2020 08:31:26 +0000 (10:31 +0200)]
x86/boot: use BASEDIR for include path

Use $(BASEDIR)/include instead of $(XEN_ROOT)/xen/include for the
include path to be coherent with the rest of the Makefiles.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago libacpi: widen TPM detection
Jason Andryuk [Tue, 16 Jun 2020 08:31:08 +0000 (10:31 +0200)]
libacpi: widen TPM detection

The hardcoded tpm_signature is too restrictive to detect many TPMs.  For
instance, it doesn't accept a QEMU emulated TPM (VID 0x1014 DID 0x0001).
Make the TPM detection match that in rombios which accepts a wider
range.

With this change, the TPM's TCPA ACPI table is generated and the guest
OS can automatically load the tpm_tis driver.  It also allows seabios to
detect and use the TPM.  However, seabios skips some TPM initialization
when running under Xen, so it will not populate any PCRs unless modified
to run the initialization under Xen.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago libxc: xc_memshr_fork with interrupts blocked
Tamas K Lengyel [Tue, 16 Jun 2020 08:30:48 +0000 (10:30 +0200)]
libxc: xc_memshr_fork with interrupts blocked

Toolstack side for creating forks with interrupt injection blocked.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/mem_sharing: block interrupt injection for forks
Tamas K Lengyel [Tue, 16 Jun 2020 08:29:16 +0000 (10:29 +0200)]
x86/mem_sharing: block interrupt injection for forks

When running VM forks without device models (QEMU), it may
be undesirable for Xen to inject interrupts. When creating such forks from
Windows VMs we have observed the kernel trying to process interrupts
immediately after the fork is executed. However without QEMU running such
interrupt handling may not be possible because it may attempt to interact with
devices that are not emulated by a backend. In the best case scenario such
interrupt handling would only present a detour in the VM forks' execution
flow, but in the worst case, as we actually observed, it can completely stall it.
By disabling interrupt injection a fuzzer can exercise the target code without
interference. For other use-cases this option probably doesn't make sense,
that's why this is not enabled by default.

Forks & memory sharing are only available on Intel CPUs so this only applies
to vmx. Note that this is part of the experimental VM forking feature that's
completely disabled by default and can only be enabled by using
XEN_CONFIG_EXPERT during compile time.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago golang/xenlight: sort cases in switch statement
Nick Rosbrook [Mon, 15 Jun 2020 15:39:42 +0000 (11:39 -0400)]
golang/xenlight: sort cases in switch statement

The xenlight_golang_union_from_C function iterates over a dict to
construct a switch statement that marshals a C keyed union into a Go
type. Because python does not guarantee dict ordering across all
versions, this can result in the switch statement being generated in a
different order depending on the version of python used. For example,
running gengotypes.py with python2.7 and python3.6 will yield different
orderings.

Iterate over sorted(cases.items()) rather than cases.items() to fix
this.

This patch changes the ordering from what was previously checked-in, but
running gengotypes.py with different versions of python will now yield
the same result.
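
A minimal sketch of the fix, not the actual gengotypes.py code: the
generator function, the case names and the emitted Go text below are made up
for illustration. Only the use of sorted(cases.items()) comes from the
commit.

```python
def emit_switch(cases):
    """Emit a Go-like switch with cases in sorted, reproducible order."""
    lines = ["switch key {"]
    for name, go_type in sorted(cases.items()):
        lines.append("case {}:".format(name))
        lines.append("\t// marshal into {}".format(go_type))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Same contents, different insertion order: the output is identical.
    a = {"TypeB": "FooB", "TypeA": "FooA"}
    b = {"TypeA": "FooA", "TypeB": "FooB"}
    print(emit_switch(a) == emit_switch(b))
```

Sorting costs little at generation time and makes the checked-in output
independent of how a given interpreter happens to order dict entries.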

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools: check go compiler version if present
Nick Rosbrook [Fri, 12 Jun 2020 14:31:02 +0000 (15:31 +0100)]
tools: check go compiler version if present

Currently, no minimum go compiler version is required by the configure
scripts. However, the go bindings actually will not build with some
older versions of go. Add a check for a minimum go version of 1.11.1 in
accordance with tools/golang/xenlight/go.mod.
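
The check itself lives in the configure scripts; the following is only a
Python sketch of the comparison, with hypothetical function names. It
assumes version strings in the form printed by `go version`, and compares
components numerically, since a plain string comparison would wrongly rank
1.9 above 1.11.1.

```python
import re

MIN_GO = (1, 11, 1)  # minimum version named in the commit message


def parse_go_version(text):
    """Extract (major, minor, patch) from text such as 'go version go1.11.1 linux/amd64'."""
    m = re.search(r"go(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    # A missing patch component (e.g. go1.9) is treated as 0.
    return tuple(int(part) if part else 0 for part in m.groups())


def go_is_new_enough(text):
    """True if the reported version is at least MIN_GO."""
    version = parse_go_version(text)
    return version is not None and version >= MIN_GO
```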

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/libxc: Drop config_transformed parameter from xc_cpuid_set()
Andrew Cooper [Fri, 12 Jun 2020 10:55:19 +0000 (11:55 +0100)]
tools/libxc: Drop config_transformed parameter from xc_cpuid_set()

libxl is now the sole caller of xc_cpuid_set().  The config_transformed output
is ignored, and this patch trivially highlights the resulting memory leak.

"transformed" config is now properly forwarded on migrate as part of the
general VM state, so delete the transformation logic completely, rather than
trying to adjust just libxl to avoid leaking memory.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years ago x86/passthrough: introduce a flag for GSIs not requiring an EOI or unmask
Roger Pau Monne [Wed, 10 Jun 2020 14:29:23 +0000 (16:29 +0200)]
x86/passthrough: introduce a flag for GSIs not requiring an EOI or unmask

There's no need to set up a timer for GSIs that are edge triggered,
since those don't require any EOI or unmask, and hence couldn't block
other interrupts.

Note this is only used by PVH dom0, which can set up the passthrough of
edge triggered interrupts from the vIO-APIC. One example of such an
interrupt that can be used by a PVH dom0 would be the RTC timer.

While there, introduce an out label to do the unlock and reduce code
duplication.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/passthrough: do not assert edge triggered GSIs for PVH dom0
Roger Pau Monne [Wed, 10 Jun 2020 14:29:22 +0000 (16:29 +0200)]
x86/passthrough: do not assert edge triggered GSIs for PVH dom0

Edge triggered interrupts do not assert the line, so the handling done
in Xen should also avoid asserting it. Asserting the line prevents
further edge triggered interrupts on the same vIO-APIC pin from being
delivered, since the line is not de-asserted.

One case of such kind of interrupt is the RTC timer, which is edge
triggered and available to a PVH dom0. Note this should not affect
domUs, as it only modifies the behavior of IDENTITY_GSI kind of passed
through interrupts.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago CHANGELOG: add revised kdd handshake (supporting Windows 7, 8, and 10)
Paul Durrant [Tue, 9 Jun 2020 16:29:22 +0000 (17:29 +0100)]
CHANGELOG: add revised kdd handshake (supporting Windows 7, 8, and 10)

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
4 years ago CHANGELOG: add 'domid_policy' and domid preservation on migrate
Paul Durrant [Tue, 9 Jun 2020 16:29:21 +0000 (17:29 +0100)]
CHANGELOG: add 'domid_policy' and domid preservation on migrate

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
4 years ago docs: Minor build improvements
Andrew Cooper [Mon, 8 Jun 2020 17:12:44 +0000 (18:12 +0100)]
docs: Minor build improvements

Don't use "set -x" for the figs rule.  It doesn't take effect in the recursive
make environment.

Turn the HTML manpage comments into makefile comments, not shell comments.
This saves 3x shell invocations per manpage.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago tools/libxengnttab: correct size of allocated memory
Juergen Gross [Wed, 20 May 2020 08:35:01 +0000 (10:35 +0200)]
tools/libxengnttab: correct size of allocated memory

The size of the memory allocated for the IOCTL_GNTDEV_MAP_GRANT_REF
ioctl() parameters is calculated incorrectly, which results in too much
memory being allocated.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/spec-ctrl: Update docs with SRBDS workaround
Andrew Cooper [Wed, 10 Jun 2020 17:57:00 +0000 (18:57 +0100)]
x86/spec-ctrl: Update docs with SRBDS workaround

RDRAND/RDSEED can be hidden using cpuid= to mitigate SRBDS if microcode
isn't available.

This is part of XSA-320 / CVE-2020-0543.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years ago xen/hypfs: fix loglvl parameter setting
Juergen Gross [Tue, 9 Jun 2020 15:45:46 +0000 (17:45 +0200)]
xen/hypfs: fix loglvl parameter setting

Writing the runtime parameters loglvl or guest_loglvl omits setting the
new length of the resulting parameter value.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/spec-ctrl: Mitigate the Special Register Buffer Data Sampling sidechannel
Andrew Cooper [Wed, 8 Jan 2020 19:47:46 +0000 (19:47 +0000)]
x86/spec-ctrl: Mitigate the Special Register Buffer Data Sampling sidechannel

See patch documentation and comments.

This is part of XSA-320 / CVE-2020-0543

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/spec-ctrl: CPUID/MSR definitions for Special Register Buffer Data Sampling
Andrew Cooper [Wed, 8 Jan 2020 19:47:46 +0000 (19:47 +0000)]
x86/spec-ctrl: CPUID/MSR definitions for Special Register Buffer Data Sampling

This is part of XSA-320 / CVE-2020-0543

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago tools: fix setting of errno in xenhypfs_read_raw()
Juergen Gross [Tue, 9 Jun 2020 14:48:50 +0000 (16:48 +0200)]
tools: fix setting of errno in xenhypfs_read_raw()

Setting of errno is wrong in xenhypfs_read_raw(), fix it.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Fixes: 86234eafb9529 ("libs: add libxenhypfs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Acked-by: Wei Liu <wl@xen.org>
4 years ago tools: fix error path of xenhypfs_open()
Juergen Gross [Tue, 9 Jun 2020 14:48:49 +0000 (16:48 +0200)]
tools: fix error path of xenhypfs_open()

In case of an error in xenhypfs_open() the error path will cause a
segmentation fault due to a wrong sequence of closing calls.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: 86234eafb9529 ("libs: add libxenhypfs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Acked-by: Wei Liu <wl@xen.org>
4 years ago docs-parse-support-md: Cope with buster's pandoc
Ian Jackson [Tue, 9 Jun 2020 11:26:36 +0000 (12:26 +0100)]
docs-parse-support-md: Cope with buster's pandoc

Provide the implementation for newer pandoc json.

I have done an ad hoc test and this now works on both buster and
stretch and seems to produce the expected support matrix when run
using the example rune (which processes unstable and 4.11).

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago docs-parse-support-md: Prepare for coping with pandoc versions
Ian Jackson [Tue, 9 Jun 2020 11:21:48 +0000 (12:21 +0100)]
docs-parse-support-md: Prepare for coping with pandoc versions

Different pandoc versions generate, and expect, a different toplevel
structure for their json output and input.  Newer pandoc's toplevel
is a hash.  We are going to want to support this.  We can tell what
kind of output we should produce by looking at the input we got (which
itself came from pandoc).  So:

 * Make space for code to read toplevel objects which are not arrays.
   Currently this code is absent and we just die explicitly (rather
   than dying because we tried to use a hashref as an array ref).

 * Move generation of the toplevel json structure out of
   pandoc2html_inline, and abstract it away through a subref which is
   set up when we read the input file.
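
The approach in the two points above can be sketched like this. The real
script is Perl; this Python version is illustrative only and its names are
hypothetical. It assumes the older format is a two-element array of
[metadata, blocks] and the newer one is a hash with a "blocks" key.

```python
import json


def read_pandoc_json(text):
    """Return (blocks, wrap), where wrap rebuilds a toplevel of the same shape as the input."""
    doc = json.loads(text)
    if isinstance(doc, list):
        # Older pandoc: toplevel is [meta, blocks].
        meta, blocks = doc

        def wrap(new_blocks):
            return [meta, new_blocks]
    elif isinstance(doc, dict):
        # Newer pandoc: toplevel is a hash with a "blocks" key.
        blocks = doc["blocks"]

        def wrap(new_blocks):
            out = dict(doc)
            out["blocks"] = new_blocks
            return out
    else:
        raise ValueError("unexpected pandoc toplevel: %s" % type(doc).__name__)
    return blocks, wrap
```

Choosing the output wrapper at read time means the rest of the processing
never needs to know which pandoc version produced the input.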

This is just prep work.  No functional change other than a change to
an error message.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago ioreq: handle pending emulation racing with ioreq server destruction
Paul Durrant [Tue, 9 Jun 2020 10:56:24 +0000 (12:56 +0200)]
ioreq: handle pending emulation racing with ioreq server destruction

When an emulation request is initiated in hvm_send_ioreq() the guest vcpu is
blocked on an event channel until that request is completed. If, however,
the emulator is killed whilst that emulation is pending then the ioreq
server may be destroyed. Thus when the vcpu is awoken the code in
handle_hvm_io_completion() will find no pending request to wait for, but will
leave the internal vcpu io_req.state set to IOREQ_READY and the vcpu shutdown
deferral flag in place (because hvm_io_assist() will never be called). The
emulation request is then completed anyway. This means that any subsequent call
to hvmemul_do_io() will find an unexpected value in io_req.state and will
return X86EMUL_UNHANDLEABLE, which in some cases will result in continuous
re-tries.

This patch fixes the issue by moving the setting of io_req.state and clearing
of shutdown deferral (as well as MSI-X write completion) out of hvm_io_assist()
and directly into handle_hvm_io_completion().

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/Intel: insert Ice Lake and Comet Lake model numbers
Jan Beulich [Tue, 9 Jun 2020 10:55:53 +0000 (12:55 +0200)]
x86/Intel: insert Ice Lake and Comet Lake model numbers

Both match prior generation processors as far as LBR and C-state MSRs
go (SDM rev 072) as well as applicability of the if_pschange_mc erratum
(recent spec updates).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/monitor: revert default behavior when monitoring register write events
Tamas K Lengyel [Tue, 9 Jun 2020 10:54:17 +0000 (12:54 +0200)]
x86/monitor: revert default behavior when monitoring register write events

For the last couple of years we have received numerous reports from users of
monitor vm_events of spurious guest crashes when using events. In particular,
it has been observed that the problem occurs when vm_events are being disabled.
The nature of the guest crash varied widely and it has only occurred
occasionally. This made debugging the issue particularly hard. We had
discussions about this issue even here on the xen-devel mailing list with no
luck figuring it out.

The bug has now been identified as a race-condition between register event
handling and disabling the monitor vm_event interface. The default behavior
regarding emulation of register write events is changed so that they get
postponed until the corresponding vm_event handler decides whether to allow such
write to take place. Unfortunately this can only be implemented by performing the
deny/allow step when the vCPU gets scheduled.

Due to that postponed emulation of the event, if the user decides to pause the
VM in the vm_event handler and then disable events, the entire emulation step
is skipped the next time the vCPU is resumed. Even if the user doesn't pause
during the vm_event handling but exits immediately and disables vm_event, the
situation becomes racy as disabling vm_event may succeed before the guest's
vCPUs get scheduled with the pending emulation task. This has been particularly
the case with VMs that have several vCPUs, as after the VM is unpaused it may
actually take a long time before all vCPUs get scheduled.

In this patch we are reverting the default behavior to always perform emulation
of register write events when the event occurs. Postponing them can be turned
on as an option. In that case the user of the interface still has to take care
of only disabling the interface when it's safe, as it remains buggy.

Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by vm_event
reply').

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Roger Pau Monné <rogerpau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/rtc: provide mediated access to RTC for PVH dom0
Roger Pau Monné [Mon, 8 Jun 2020 16:13:53 +0000 (18:13 +0200)]
x86/rtc: provide mediated access to RTC for PVH dom0

Mediated access to the RTC was provided for PVHv1 dom0 using the PV
code paths (guest_io_{write/read}), but those accesses were never
implemented for PVHv2 dom0. This patch provides such mediated accesses
to the RTC for PVH dom0, just like it's provided for a classic PV
dom0.

Pull out some of the RTC logic from guest_io_{read/write} into
specific helpers that can be used by both PV and HVM guests. The
setup of the handlers for PVH is done in rtc_init, which is already
used to initialize the fully emulated RTC.

Without this a Linux PVH dom0 will read garbage when trying to access
the RTC, and one vCPU will be constantly looping in
rtc_timer_do_work.

Note that such issue doesn't happen on domUs because the ACPI
NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
accesses are not emulated but rather forwarded to the physical
hardware.

No functional change expected for classic PV dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoxen/arm: mm: Access a PT entry before the table is unmapped
Julien Grall [Sun, 7 Jun 2020 15:51:54 +0000 (16:51 +0100)]
xen/arm: mm: Access a PT entry before the table is unmapped

xen_pt_next_level() will retrieve the MFN from the entry right after the
page-table has been unmapped.

After calling xen_unmap_table(), there is no guarantee the mapping will
still be valid. Depending on the implementation, this may result in a
data abort in Xen.

Re-order the code to retrieve the MFN before the table is unmapped.
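
As an illustration only (this is not the code from the patch), the ordering
issue can be modelled in plain C. Here free() stands in for xen_unmap_table(),
and struct fake_entry, make_table() and next_mfn_fixed() are hypothetical
names invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page-table entry; not Xen's lpae_t. */
struct fake_entry {
    uint64_t mfn;
};

/* Demo helper: build a table whose entry i holds MFN base + i. */
struct fake_entry *make_table(unsigned int nr, uint64_t base)
{
    struct fake_entry *t = calloc(nr, sizeof(*t));

    if (t != NULL)
        for (unsigned int i = 0; i < nr; i++)
            t[i].mfn = base + i;

    return t;
}

/*
 * Copy the field out while the table is still mapped, then release it.
 * free() plays the role of the unmap call: once it returns, the pointer
 * must not be dereferenced any more.
 */
uint64_t next_mfn_fixed(struct fake_entry *table, unsigned int idx)
{
    uint64_t mfn;

    assert(table != NULL);
    mfn = table[idx].mfn;   /* read first ... */
    free(table);            /* ... unmap second */

    return mfn;
}
```

Reading table[idx].mfn after the free() call would be the buggy ordering the
commit describes.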

Fixes: 53abb9a1dcd9 ("xen/arm: mm: Rework Xen page-tables walk during update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agogolang/xenlight: remove call to go fmt in gengotypes.py
Nick Rosbrook [Mon, 8 Jun 2020 16:10:39 +0000 (17:10 +0100)]
golang/xenlight: remove call to go fmt in gengotypes.py

Since the golang bindings are now set to be re-generated whenever a
change is made to tools/libxl/libxl_types.idl, the call to go fmt in
gengotypes.py results in a dirty git tree for users without go
installed.

As an immediate fix, just remove the call to go fmt from gengotypes.py.
While here, make sure the DO NOT EDIT comment and package declaration
remain formatted correctly. All other generated code is left
un-formatted for now.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Remove trailing whitespace.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
4 years agoMerge remote-tracking branch 'origin/staging' into staging
Ian Jackson [Mon, 8 Jun 2020 15:26:57 +0000 (16:26 +0100)]
Merge remote-tracking branch 'origin/staging' into staging

4 years agodocs/support-matrix: unbreak docs rendering
Andrew Cooper [Thu, 4 Jun 2020 20:39:37 +0000 (21:39 +0100)]
docs/support-matrix: unbreak docs rendering

The cronjob which renders https://xenbits.xen.org/docs/ has been broken for a
while.  commitish_version() pulls an old version of xen/Makefile out of
history, and uses the xenversion rule.

Currently, this fails with:

  tmp.support-matrix.xen.make:130: scripts/Kbuild.include: No such file or directory

which is because the Makefile legitimately references Kbuild.include with a
relative rather than absolute path.

Rework support-matrix-generate to use sed to extract the major/minor version,
rather than expecting xen/Makefile to be usable in a different tree.

Fixes: 945e80a7301f ("docs: Provide support-matrix-generate, to generate a support matrix in HTML")
Backport: 4.11+
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoVersion changes for 4.14.0-rc
Ian Jackson [Mon, 8 Jun 2020 14:18:20 +0000 (15:18 +0100)]
Version changes for 4.14.0-rc

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoConfig.mk: Nail QEMU_UPSTREAM_REVISION MINIOS_UPSTREAM_REVISION
Ian Jackson [Mon, 8 Jun 2020 14:17:36 +0000 (15:17 +0100)]
Config.mk: Nail QEMU_UPSTREAM_REVISION MINIOS_UPSTREAM_REVISION

We freeze these during release prep, rather than tracking whatever
osstest passed.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agobuild: fix dependency tracking for preprocessed files
Jan Beulich [Mon, 8 Jun 2020 08:25:40 +0000 (10:25 +0200)]
build: fix dependency tracking for preprocessed files

While the issue is more general, I noticed that asm-macros.i was not getting
re-generated as needed. This was due to its .*.d file mentioning
asm-macros.o instead of asm-macros.i. Use -MQ here as well, and while at
it also use -MQ to avoid the somewhat fragile sed-ary on the *.lds
dependency tracking files. While there, further avoid open-coding $(CPP)
and drop the bogus (Arm) / stale (x86) -Ui386.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/svm: do not try to handle recalc NPT faults immediately
Igor Druzhinin [Fri, 5 Jun 2020 15:12:11 +0000 (17:12 +0200)]
x86/svm: do not try to handle recalc NPT faults immediately

A recalculation NPT fault doesn't always require additional handling
in hvm_hap_nested_page_fault(). Moreover, in the general case, if no
explicit handling is done there, the fault is wrongly considered fatal.

This covers a specific case of migration with a vGPU assigned which
uses direct MMIO mappings made by the XEN_DOMCTL_memory_mapping hypercall:
at the moment log-dirty is enabled globally, recalculation is requested
for the whole guest memory, including those mapped MMIO regions,
which causes a page fault to be raised at the first access to them;
but because the MMIO P2M type has no explicit handling in
hvm_hap_nested_page_fault(), the domain is erroneously crashed with an
unhandled SVM violation.

Instead of trying to be opportunistic, use a safer approach and handle
P2M recalculation in a separate NPT fault by attempting to retry after
making the necessary adjustments. This is aligned with Intel behavior
where there are separate VMEXITs for recalculation and EPT violations
(faults) and only faults are handled in hvm_hap_nested_page_fault().
Do it by also unifying do_recalc return code with Intel implementation
where returning 1 means P2M was actually changed.

Since there was previously no case where p2m_pt_handle_deferred_changes()
could return a positive value, it's safe to replace ">= 0" with just "== 0"
in the VMEXIT_NPF handler. finish_type_change() is also not affected by the
change, as it can already deal with a >0 return value of p2m->recalc from the
EPT implementation.
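
A rough sketch of the three-way return convention described above, assuming
only what the message states (positive means the P2M was changed, zero means
nothing was recalculated, negative is an error). The enum and
demo_npf_dispatch() are hypothetical names, not the VMEXIT_NPF handler itself.

```c
#include <assert.h>

enum demo_action { DEMO_RETRY, DEMO_HANDLE_FAULT, DEMO_CRASH };

/*
 * rc models the recalculation result:
 *   > 0  the P2M was actually changed
 *   = 0  nothing needed recalculating
 *   < 0  the recalculation failed
 */
enum demo_action demo_npf_dispatch(int rc)
{
    if (rc == 0)
        return DEMO_HANDLE_FAULT;  /* genuine fault: run the full handler */
    if (rc < 0)
        return DEMO_CRASH;         /* recalculation itself failed */

    return DEMO_RETRY;             /* P2M adjusted: let the guest retry */
}
```

With the old ">= 0" test, the rc > 0 case would have fallen into the fault
handler as well, which is the opportunistic behaviour the patch moves away
from.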

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agolibs/hypfs: use correct zlib name in pc file
Wei Liu [Fri, 5 Jun 2020 11:37:25 +0000 (12:37 +0100)]
libs/hypfs: use correct zlib name in pc file

Its name is "zlib" not "z".

Reported-by: Olaf Hering <olaf@aepfle.de>
Fixes: 86234eafb952 ("libs: add libxenhypfs")
Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/shim: Fix defconfig selection and trim the build further
Andrew Cooper [Wed, 3 Jun 2020 15:56:03 +0000 (16:56 +0100)]
x86/shim: Fix defconfig selection and trim the build further

Several options (TBOOT, XENOPROF, Scheduler) depend on EXPERT to be able to
deselect/configure.

Enabling EXPERT now causes the request of the Credit1 scheduler to be honoured
(rather than giving us Credit2), but take this opportunity to switch to Null,
as the previously problematic issues are now believed to be fixed.

Enabling EXPERT also allows XEN_SHSTK to be selected, and we don't want this
being built for shim.  We also don't want TRACEBUFFER or GDBSX either.

Take this opportunity to swap the disable of HVM_FEP for a general disable of
HVM (likely to have wider implications in the future), and disable ARGO (will
necessarily need plumbing work to function in shim).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoconfig: disable hypervisor filesystem for pv-shim
Juergen Gross [Wed, 3 Jun 2020 11:28:52 +0000 (13:28 +0200)]
config: disable hypervisor filesystem for pv-shim

The pv-shim doesn't need the hypervisor filesystem, so disable it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agofix build with CONFIG_HYPFS_CONFIG enabled
Juergen Gross [Wed, 3 Jun 2020 11:28:07 +0000 (13:28 +0200)]
fix build with CONFIG_HYPFS_CONFIG enabled

Commit 58263ed7713e ("xen: add /buildinfo/config entry to hypervisor
filesystem") added a dependency to .config, but the hypervisor's build
config could have another name via setting KCONFIG_CONFIG.

Fix that by using $(KCONFIG_CONFIG) instead. Additionally reference
the config file via $(XEN_ROOT) instead of a relative path.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agodocs/ucode: Extend runtime microcode loading documentation
Andrew Cooper [Mon, 1 Jun 2020 13:36:28 +0000 (14:36 +0100)]
docs/ucode: Extend runtime microcode loading documentation

Extend the disclaimer about runtime loading.  While we've done our best to
make the mechanism reliable, the safety of late loading does ultimately depend
on the contents of the blobs.

Extend the xen-ucode portion with examples of how to use it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
4 years agox86/ucode: Fix errors with start/end_update()
Andrew Cooper [Mon, 1 Jun 2020 14:37:20 +0000 (15:37 +0100)]
x86/ucode: Fix errors with start/end_update()

c/s 9267a439c "x86/ucode: Document the behaviour of the microcode_ops hooks"
identified several poor behaviours of the start_update()/end_update_percpu()
hooks.

AMD have subsequently confirmed that OSVW don't, and are not expected to,
change across a microcode load, rendering all of this complexity unnecessary.

Instead of fixing up the logic to not leave the OSVW state reset in a number
of corner cases, delete the logic entirely.

This in turn allows for the removal of the poorly-named 'start_update'
parameter to microcode_update_one(), and for svm_host_osvw_{init,reset}() to
become static.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agotools: update configure
Wei Liu [Tue, 2 Jun 2020 17:32:02 +0000 (17:32 +0000)]
tools: update configure

Fixes: e181db8ba4e0 ("m4: use test instead of []")
Signed-off-by: Wei Liu <wl@xen.org>
4 years agom4: use test instead of []
Wei Liu [Tue, 2 Jun 2020 09:01:38 +0000 (10:01 +0100)]
m4: use test instead of []

It is reported that [] was removed by autoconf, which caused the
following error:

  ./configure: line 4681: -z: command not found

Switch to test. That's what is used throughout our configure scripts.
Also put the variable expansion in quotes.

Signed-off-by: Wei Liu <wl@xen.org>
Reported-by: Bertrand Marquis <Bertrand.Marquis@arm.com>
Fixes: 8a6b1665d987 ("configure: also add EXTRA_PREFIX to {CPP/LD}FLAGS")
Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agotools: make libxenhypfs interface more future proof
Juergen Gross [Tue, 2 Jun 2020 06:00:21 +0000 (08:00 +0200)]
tools: make libxenhypfs interface more future proof

As compilers are free to choose the width of an enum they should be
avoided in stable interfaces when declaring a variable. So the
struct xenhypfs_dirent definition should be modified to have explicitly
sized members for type and encoding and the related enums should be
defined separately.

Additionally it is better to have a larger flags member in that struct
with the "writable" indicator occupying only a single bit. This will
make future additions easier.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agotools: check return value of asprintf() in xenhypfs
Juergen Gross [Tue, 2 Jun 2020 06:00:20 +0000 (08:00 +0200)]
tools: check return value of asprintf() in xenhypfs

asprintf() can fail, so check its return value. Additionally fix a
memory leak in xenhypfs.
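
For illustration, a generic pattern for checking asprintf() and releasing the
result. This is a sketch, not the xenhypfs code; demo_join_path() and
demo_path_len() are hypothetical helpers.

```c
/* asprintf() is not ISO C; either macro asks glibc to declare it. */
#define _GNU_SOURCE
#define __STDC_WANT_LIB_EXT2__ 1
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<dir>/<name>". Returns a malloc()ed string the caller must free(),
 * or NULL on failure. When asprintf() fails the output pointer is undefined,
 * so it is never used in that case.
 */
char *demo_join_path(const char *dir, const char *name)
{
    char *path;

    assert(dir != NULL && name != NULL);
    if (asprintf(&path, "%s/%s", dir, name) < 0)
        return NULL;

    return path;
}

/* Returns the length of the joined path, or -1 if allocation failed. */
int demo_path_len(const char *dir, const char *name)
{
    char *path = demo_join_path(dir, name);
    int len;

    if (path == NULL)
        return -1;

    len = (int)strlen(path);
    free(path);             /* released on the success path too: no leak */

    return len;
}
```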

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoCHANGELOG: Add qemu-xen linux device model stubdomains
Jason Andryuk [Tue, 2 Jun 2020 12:03:56 +0000 (12:03 +0000)]
CHANGELOG: Add qemu-xen linux device model stubdomains

Add qemu-xen linux device model stubdomain.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Paul Durrant <paul@xen.org>
4 years agoautomation: implement (rootless) podman support in containerize
Dario Faggioli [Thu, 30 Apr 2020 18:27:39 +0000 (20:27 +0200)]
automation: implement (rootless) podman support in containerize

Right now only docker is supported, when using the containerize script
for building inside containers. Enable podman as well.

Note that podman can be used in rootless mode too, but for that to work
the files /etc/subuid and /etc/subgid must be properly configured.

For instance:

dario@localhost> cat /etc/subuid
dario:100000:65536

dario@localhost:> cat /etc/subgid
dario:100000:65536

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoautomation: openSUSE distro names helpers for containerize
Dario Faggioli [Tue, 2 Jun 2020 12:01:05 +0000 (12:01 +0000)]
automation: openSUSE distro names helpers for containerize

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoautomation: update openSUSE Tumbleweed building dependencies
Dario Faggioli [Thu, 30 Apr 2020 18:27:28 +0000 (20:27 +0200)]
automation: update openSUSE Tumbleweed building dependencies

We need python3 (and the respective -devel package), these days.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agolibxl: stop libxl_domain_info() consuming massive amounts of stack
Paul Durrant [Thu, 28 May 2020 15:13:30 +0000 (16:13 +0100)]
libxl: stop libxl_domain_info() consuming massive amounts of stack

Currently an array of 1024 xc_domaininfo_t is declared on stack. That alone
consumes ~112k. Since libxl_domain_info() creates a new gc this patch simply
uses it to allocate the array instead.
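
A sketch of the general idea, with calloc()/free() standing in for libxl's gc
allocation. struct demo_info is a hypothetical 112-byte record chosen so that
1024 of them add up to the ~112k quoted above; it is not xc_domaininfo_t.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_NR_DOMAINS 1024

/* Hypothetical record, padded to 112 bytes for the size arithmetic. */
struct demo_info {
    unsigned int domid;
    unsigned char pad[108];
};

/*
 * Count populated slots using a heap-allocated scratch array instead of
 * declaring `struct demo_info info[DEMO_NR_DOMAINS]` on the stack.
 * Returns -1 if the allocation fails.
 */
int demo_count_domains(unsigned int nr_present)
{
    struct demo_info *info;
    int count = 0;

    assert(nr_present <= DEMO_NR_DOMAINS);
    info = calloc(DEMO_NR_DOMAINS, sizeof(*info));
    if (info == NULL)
        return -1;

    for (unsigned int i = 0; i < nr_present; i++)
        info[i].domid = i + 1;      /* pretend the hypervisor filled these */

    for (unsigned int i = 0; i < DEMO_NR_DOMAINS; i++)
        if (info[i].domid != 0)
            count++;

    free(info);
    return count;
}
```

With these sizes the array is 1024 * 112 = 114688 bytes, which is a lot to
ask of a thread stack.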

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoINSTALL: remove TODO section
Olaf Hering [Fri, 29 May 2020 13:53:03 +0000 (15:53 +0200)]
INSTALL: remove TODO section

The default value '/' for DESTDIR is unusual, but probably does not hurt.

Fixes commit f2b40dababedcd8355bf3e85d00baf17f9827131
Fixes commit 8e986e5a61efeca92b9b268e77957d45d8316ee4

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agocompilers: always use _Static_assert with clang
Roger Pau Monné [Tue, 2 Jun 2020 11:39:02 +0000 (13:39 +0200)]
compilers: always use _Static_assert with clang

All versions of clang used by Xen support _Static_assert, so use it
unconditionally when building Xen with clang.
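
For readers unfamiliar with the keyword, a small sketch of a build-time check
built on C11 _Static_assert. DEMO_BUILD_BUG_ON and the other names are
hypothetical and only approximate what a BUILD_BUG_ON-style macro does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Fails the build when cond is true; generates no code at run time. */
#define DEMO_BUILD_BUG_ON(cond) _Static_assert(!(cond), "!(" #cond ")")

struct demo_hdr {
    unsigned char tag;
    unsigned char len;
};

DEMO_BUILD_BUG_ON(sizeof(struct demo_hdr) != 2);   /* usable at file scope */

unsigned int demo_bits_per_byte(void)
{
    DEMO_BUILD_BUG_ON(CHAR_BIT < 8);               /* and at block scope */
    return CHAR_BIT;
}

unsigned int demo_hdr_len(const struct demo_hdr *h)
{
    assert(h != NULL);  /* run-time check, for contrast with the above */
    return h->len;
}
```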

No functional change expected.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/cpu: fix build with clang 3.5
Roger Pau Monné [Tue, 2 Jun 2020 11:38:32 +0000 (13:38 +0200)]
x86/cpu: fix build with clang 3.5

Clang 3.5 complains with:

common.c:794:24: error: statement expression not allowed at file scope
                      i < ARRAY_SIZE(this_cpu(tss_page).ist_ssp); ++i )
                                     ^
/build/xen/include/asm/percpu.h:14:7: note: expanded from macro 'this_cpu'
    (*RELOC_HIDE(&per_cpu__##var, get_cpu_info()->per_cpu_offset))
      ^
/build/xen/include/xen/compiler.h:104:3: note: expanded from macro 'RELOC_HIDE'
  ({ unsigned long __ptr;                       \
  ^
/build/xen/include/xen/lib.h:68:69: note: expanded from macro 'ARRAY_SIZE'
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
                                                                    ^
/build/xen/include/xen/compiler.h:85:57: note: expanded from macro '__must_be_array'
  BUILD_BUG_ON_ZERO(__builtin_types_compatible_p(typeof(a), typeof(&a[0])))
                                                        ^
/build/xen/include/xen/lib.h:39:57: note: expanded from macro 'BUILD_BUG_ON_ZERO'
#define BUILD_BUG_ON_ZERO(cond) sizeof(struct { int:-!!(cond); })
                                                        ^

Work around this by defining the tss_page as a local variable. Adjust
other users of the per-cpu tss_page to also use the newly introduced
local variable.

No functional change expected.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agobuild32: don't discard .shstrtab in linker script
Roger Pau Monné [Tue, 2 Jun 2020 11:37:53 +0000 (13:37 +0200)]
build32: don't discard .shstrtab in linker script

LLVM linker doesn't support discarding .shstrtab, and complains with:

ld -melf_i386_fbsd -N -T build32.lds -o reloc.lnk reloc.o
ld: error: discarding .shstrtab section is not allowed

Add explicit .shstrtab, .strtab and .symtab sections to the linker
script after the text section in order to make LLVM LD happy and match
the behavior of GNU LD.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean
Roger Pau Monné [Tue, 2 Jun 2020 11:36:41 +0000 (13:36 +0200)]
x86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean

Clang 10 complains with:

mm.c:1239:10: error: converting the result of '<<' to a boolean always evaluates to true
      [-Werror,-Wtautological-constant-compare]
    if ( _PAGE_GNTTAB && (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
         ^
xen/include/asm/x86_64/page.h:161:25: note: expanded from macro '_PAGE_GNTTAB'
#define _PAGE_GNTTAB (1U<<22)
                        ^

Remove the conversion of _PAGE_GNTTAB to a boolean and instead use a
preprocessor conditional to check if _PAGE_GNTTAB is defined.
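
A minimal sketch of the pattern, assuming a flag macro that may or may not be
defined depending on the build. DEMO_PAGE_GNTTAB and demo_is_grant_mapping()
are illustrative names, not the code in mm.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Remove this line to model a build where the bit does not exist. */
#define DEMO_PAGE_GNTTAB (UINT32_C(1) << 22)

bool demo_is_grant_mapping(uint32_t flags)
{
#ifdef DEMO_PAGE_GNTTAB
    /* Only the bit test remains; no constant is converted to a boolean. */
    return (flags & DEMO_PAGE_GNTTAB) != 0;
#else
    (void)flags;
    return false;
#endif
}
```

The preprocessor decides whether the test exists at all, so the compiler never
sees an expression of the form "constant && ...".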

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoxen/credit2: Fix build following c/s 8e2aa76dc (take 2)
Andrew Cooper [Sat, 30 May 2020 00:52:13 +0000 (01:52 +0100)]
xen/credit2: Fix build following c/s 8e2aa76dc (take 2)

OSSTest reports:

  credit2.c: In function 'cpu_runqueue_siblings_match':
  credit2.c:883:29: error: implicit declaration of function 'cpu_nr_siblings' [-Werror=implicit-function-declaration]
       unsigned int nr_sibls = cpu_nr_siblings(cpu);
                               ^~~~~~~~~~~~~~~
  credit2.c:883:5: error: nested extern declaration of 'cpu_nr_siblings' [-Werror=nested-externs]
       unsigned int nr_sibls = cpu_nr_siblings(cpu);
       ^~~~~~~~
  cc1: all warnings being treated as errors

For whatever reason, cpufeature.h's inclusion is conditional, and missing for
arm32.  Include it explicitly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/CET: Fix build following c/s 43b98e7190
Andrew Cooper [Sat, 30 May 2020 00:41:26 +0000 (01:41 +0100)]
x86/CET: Fix build following c/s 43b98e7190

OSSTest reports:

  x86_64.S: Assembler messages:
  x86_64.S:57: Error: no such instruction: `setssbsy'
  /home/osstest/build.150510.build-amd64/xen/xen/Rules.mk:183: recipe for target 'head.o' failed
  make[4]: Leaving directory '/home/osstest/build.150510.build-amd64/xen/xen/arch/x86/boot'
  make[4]: *** [head.o] Error 1

All use of CET instructions, even those inside alternative blocks, needs to be
behind CONFIG_XEN_SHSTK, as it indicates suitable toolchain support.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/shstk: Activate Supervisor Shadow Stacks
Andrew Cooper [Wed, 22 Apr 2020 12:44:37 +0000 (13:44 +0100)]
x86/shstk: Activate Supervisor Shadow Stacks

With all other plumbing in place, activate shadow stacks when possible.

Note that CET shares similar problems with SMEP/SMAP, with Ring1 being
supervisor to the processor, and that the layout of the shadow stack differs
between an IRET to Ring 1 and Ring 3.  Therefore, we disable PV32 when CET is
enabled.  Compatibility can be maintained if necessary via PV-Shim.

The BSP needs to wait until alternatives have run (to avoid interaction with
CR0.WP), and after the first reset_stack_and_jump() to avoid having a pristine
shadow stack interact in problematic ways with an in-use regular stack.
Activate shadow stack in reinit_bsp_stack().

APs have all infrastructure set up by the booting CPU, so enable shadow stacks
before entering C.  Adjust the logic to call start_secondary rather than jump
to it, so stack traces make more sense.

The crash path needs to turn CET off to avoid interfering with the crash
kernel's environment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/S3: Save and restore Shadow Stack configuration
Andrew Cooper [Wed, 22 Apr 2020 12:44:37 +0000 (13:44 +0100)]
x86/S3: Save and restore Shadow Stack configuration

See code for details.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/entry: Adjust guest paths to be shadow stack compatible
Andrew Cooper [Fri, 24 Apr 2020 13:34:44 +0000 (14:34 +0100)]
x86/entry: Adjust guest paths to be shadow stack compatible

The SYSCALL/SYSENTER/SYSRET paths need to use {SET,CLR}SSBSY.  The IRET to
guest paths must not.  In the SYSRET path, re-position the mov which loads rip
into %rcx so we can use %rcx for CLRSSBSY, rather than spilling another
register to the stack.

While we can in principle detect shadow stack corruption and a failure to
clear the supervisor token busy bit in the SYSRET path (by inspecting the
carry flag following CLRSSBSY), we cannot detect similar problems for the IRET
path (IRET is specified not to fault in this case).

We will double fault at some point later, when next trying to enter Xen, due
to an already-set supervisor shadow stack busy bit.  As SYSRET is an uncommon
path anyway, avoid the added complexity for no appreciable gain.

The IST switch onto the primary stack is not great as we have an instruction
boundary with no shadow stack.  This is the least bad option available.

These paths are not used before shadow stacks are properly established, so can
use alternatives to avoid extra runtime CET detection logic.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/EFI: Avoid mapping EFI system memory as shadow stacks
Andrew Cooper [Fri, 29 May 2020 20:49:13 +0000 (21:49 +0100)]
x86/EFI: Avoid mapping EFI system memory as shadow stacks

Ensure the dirty bit is clear when creating read-only EFI mappings.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/alt: Adjust _alternative_instructions() to not create shadow stacks
Andrew Cooper [Fri, 1 May 2020 17:10:00 +0000 (18:10 +0100)]
x86/alt: Adjust _alternative_instructions() to not create shadow stacks

The current alternatives algorithm clears CR0.WP and writes into .text.  This
has a side effect of the mappings becoming shadow stacks once CET is active.

Adjust _alternative_instructions() to clean up after itself.  This involves
extending the set of bits modify_xen_mappings() can modify to include Dirty
(and Accessed for good measure).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/extable: Adjust extable handling to be shadow stack compatible
Andrew Cooper [Thu, 30 Apr 2020 14:05:24 +0000 (15:05 +0100)]
x86/extable: Adjust extable handling to be shadow stack compatible

When adjusting an IRET frame to recover from a fault, an equivalent
adjustment needs making in the shadow IRET frame.

The adjustment in exception_with_ints_disabled() could in principle be an
alternative block rather than an ifdef, as the only two current users of
_PRE_EXTABLE() are IRET-to-guest instructions.  However, this is not a
fastpath, and this form is more robust to future changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/spec-ctrl: Adjust DO_OVERWRITE_RSB to be shadow stack compatible
Andrew Cooper [Fri, 24 Apr 2020 13:19:52 +0000 (14:19 +0100)]
x86/spec-ctrl: Adjust DO_OVERWRITE_RSB to be shadow stack compatible

The 32 calls need dropping from the shadow stack as well as the regular stack.
To shorten the code, we can use the 32bit forms of RDSSP/INCSSP, but need to
double up the input to INCSSP to counter the operand size based multiplier.
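
The doubling can be checked with a little arithmetic. The sketch below assumes
64-bit mode (each CALL pushes 8 bytes on the shadow stack) and the documented
scaling of the two INCSSP forms (4 bytes per count for the 32bit form, 8 for
the 64bit form). The names are hypothetical.

```c
#include <assert.h>

#define DEMO_SHSTK_SLOT      8u  /* bytes pushed per CALL in 64-bit mode */
#define DEMO_INCSSPD_SCALE   4u  /* 32bit form: SSP += 4 * count */
#define DEMO_INCSSPQ_SCALE   8u  /* 64bit form: SSP += 8 * count */

/* Count to pass to an INCSSP form so that `calls` return addresses drop. */
unsigned int demo_incssp_operand(unsigned int calls, unsigned int scale)
{
    unsigned int bytes = calls * DEMO_SHSTK_SLOT;

    assert(scale != 0 && bytes % scale == 0);
    return bytes / scale;
}
```

For the 32 calls mentioned above this gives a count of 64 for the 32bit form,
against 32 for the 64bit form.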

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/cpu: Adjust reset_stack_and_jump() to be shadow stack compatible
Andrew Cooper [Fri, 24 Apr 2020 13:38:02 +0000 (14:38 +0100)]
x86/cpu: Adjust reset_stack_and_jump() to be shadow stack compatible

We need to unwind up to the supervisor token.  See the comment for details.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/cpu: Adjust enable_nmis() to be shadow stack compatible
Andrew Cooper [Fri, 21 Feb 2020 17:56:57 +0000 (17:56 +0000)]
x86/cpu: Adjust enable_nmis() to be shadow stack compatible

When executing an IRET-to-self, the shadow stack must agree with the regular
stack.  We can't manipulate SSP directly, so have to fake a shadow IRET frame
by executing 3 CALLs, then editing the result to look correct.

This is not a fastpath, is called on the BSP long before CET can be set up,
and may be called on the crash path after CET is disabled.  Use the fact that
INCSSP is allocated from the hint nop space to construct a test for CET being
active which is safe on all processors.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/shstk: Create shadow stacks
Andrew Cooper [Thu, 23 Apr 2020 19:20:59 +0000 (20:20 +0100)]
x86/shstk: Create shadow stacks

Introduce HYPERVISOR_SHSTK pagetable constants, which are Read-Only + Dirty.
Use these in place of _PAGE_RW for memguard_guard_stack(), to create real
shadow stacks on capable hardware.

Supervisor shadow stacks need a token written at the top, which is most easily
done before making the frame read only.

Allocate the shadow IST stack block in struct tss_page.  It doesn't strictly
need to live here, but it is a convenient location (and XPTI-safe, for testing
purposes), and placing it ahead of the TSS doesn't risk colliding with a bad
IO Bitmap offset and turning into some IO port permissions.

Have load_system_tables() set up the shadow IST stack table when setting up
the regular IST in the TSS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/shstk: Rework the stack layout to support shadow stacks
Andrew Cooper [Thu, 23 Apr 2020 19:20:59 +0000 (20:20 +0100)]
x86/shstk: Rework the stack layout to support shadow stacks

We have two free pages in the current stack.  A useful property of shadow
stacks and regular stacks is that they act as each other's guard pages as far
as OoB writes go.  As wild OoB stack reads aren't likely, we don't lose any
meaningful protection from using read-only guard pages in general (rather than
non-present guard pages), but the result is far simpler for Xen as a whole by not
having a feature/mode dependent stack configuration.

Move the regular IST stacks up by one page, to allow their shadow stack page
to be in slot 0.  The primary shadow stack uses slot 5.

As the shadow IST stacks are only 1k large, shuffle the order of IST vectors
to have #DF numerically highest, so there is no chance of a shadow stack
overflow clobbering the supervisor token.

The XPTI code already breaks the MEMORY_GUARD abstraction for stacks by
forcing it to be in effect (i.e. guard page not present).  To avoid having too
many configurations, do away with the concept entirely, and unconditionally
map the pages in their read-only form.

A later change will turn these properly into shadow stacks.  Some of the
comments written here are the intended result, and will become true in the
subsequent change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/traps: Implement #CP handler and extend #PF for shadow stacks
Andrew Cooper [Fri, 21 Feb 2020 17:56:57 +0000 (17:56 +0000)]
x86/traps: Implement #CP handler and extend #PF for shadow stacks

For now, any #CP exception or shadow stack #PF indicates a bug in Xen, but
attempt to recover from #CP if taken in guest context.

This will of course have to change as part of introducing CET-SS support for
PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/shstk: Introduce Supervisor Shadow Stack support
Andrew Cooper [Fri, 21 Feb 2020 17:56:57 +0000 (17:56 +0000)]
x86/shstk: Introduce Supervisor Shadow Stack support

Introduce CONFIG_HAS_AS_CET_SS to determine whether CET Shadow Stack
instructions are supported in the assembler, and CONFIG_XEN_SHSTK as the main
build option.

Introduce cet={no-,}shstk for a user to select whether or not to use shadow
stacks at runtime, and X86_FEATURE_XEN_SHSTK to determine Xen's overall
enablement of shadow stacks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/traps: Factor out extable_fixup() and make printing consistent
Andrew Cooper [Thu, 30 Apr 2020 14:05:24 +0000 (15:05 +0100)]
x86/traps: Factor out extable_fixup() and make printing consistent

UD faults never had any diagnostics printed, and the others were inconsistent.

Don't use dprintk() because identifying traps.c is actively unhelpful in the
message, as it is the location of the fixup, not the fault.  Use the new
vec_name() infrastructure, rather than leaving raw numbers for the log.

  (XEN) Running stub recovery selftests...
  (XEN) Fixup #UD[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
  (XEN) Fixup #GP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
  (XEN) Fixup #SS[0000]: ffff82d07fffd040 [ffff82d07fffd040] -> ffff82d0403ac9d6
  (XEN) Fixup #BP[0000]: ffff82d07fffd041 [ffff82d07fffd041] -> ffff82d0403ac9d6
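
The line format above can be reproduced with a small formatting helper.  This
is only a sketch: format_fixup() is a hypothetical name, and printing the
faulting address twice is an assumption based on the sample output (the
bracketed field may be a symbolic rendering in the real code).

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format one fixup line: exception mnemonic, error code, faulting address
 * (printed twice, see the assumption above) and the fixup target.
 */
static int format_fixup(char *buf, size_t len, const char *name,
                        unsigned int error_code, uint64_t ip, uint64_t fixup)
{
    return snprintf(buf, len,
                    "Fixup #%s[%04x]: %016" PRIx64 " [%016" PRIx64 "] -> %016"
                    PRIx64,
                    name, error_code, ip, ip, fixup);
}
```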

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/traps: Clean up printing in {do_reserved,fatal}_trap()
Andrew Cooper [Thu, 30 Apr 2020 14:05:24 +0000 (15:05 +0100)]
x86/traps: Clean up printing in {do_reserved,fatal}_trap()

For one, they render the vector in a different base.

Introduce X86_EXC_* constants and vec_name() to refer to exceptions by their
mnemonic, which starts bringing the code/diagnostics in line with the Intel
and AMD manuals.

Provide constants for every architecturally defined exception, even those not
implemented by Xen yet, as do_reserved_trap() is a catch-all handler.
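
A rough sketch of the idea follows.  The vector numbers are the
architecturally defined ones; the table layout, the subset of constants shown,
and the "??" fallback string are assumptions for illustration and may differ
from the actual patch.

```c
#include <stddef.h>

/* Architecturally defined exception vectors (subset shown). */
#define X86_EXC_DE   0 /* Divide Error */
#define X86_EXC_DB   1 /* Debug Exception */
#define X86_EXC_NMI  2 /* Non-Maskable Interrupt */
#define X86_EXC_BP   3 /* Breakpoint */
#define X86_EXC_OF   4 /* Overflow */
#define X86_EXC_BR   5 /* BOUND Range */
#define X86_EXC_UD   6 /* Invalid Opcode */
#define X86_EXC_NM   7 /* Device Not Available */
#define X86_EXC_DF   8 /* Double Fault */
#define X86_EXC_TS  10 /* Invalid TSS */
#define X86_EXC_NP  11 /* Segment Not Present */
#define X86_EXC_SS  12 /* Stack-Segment Fault */
#define X86_EXC_GP  13 /* General Protection Fault */
#define X86_EXC_PF  14 /* Page Fault */
#define X86_EXC_MF  16 /* x87 Floating-Point Error */
#define X86_EXC_AC  17 /* Alignment Check */
#define X86_EXC_MC  18 /* Machine Check */
#define X86_EXC_XM  19 /* SIMD Floating-Point Error */
#define X86_EXC_VE  20 /* Virtualisation Exception */
#define X86_EXC_CP  21 /* Control-flow Protection */

/* Map a vector number to its mnemonic; "??" for anything without a name. */
static const char *vec_name(unsigned int vec)
{
    static const char *const names[] = {
        [X86_EXC_DE] = "DE", [X86_EXC_DB] = "DB", [X86_EXC_NMI] = "NMI",
        [X86_EXC_BP] = "BP", [X86_EXC_OF] = "OF", [X86_EXC_BR] = "BR",
        [X86_EXC_UD] = "UD", [X86_EXC_NM] = "NM", [X86_EXC_DF] = "DF",
        [X86_EXC_TS] = "TS", [X86_EXC_NP] = "NP", [X86_EXC_SS] = "SS",
        [X86_EXC_GP] = "GP", [X86_EXC_PF] = "PF", [X86_EXC_MF] = "MF",
        [X86_EXC_AC] = "AC", [X86_EXC_MC] = "MC", [X86_EXC_XM] = "XM",
        [X86_EXC_VE] = "VE", [X86_EXC_CP] = "CP",
    };

    if ( vec < sizeof(names) / sizeof(names[0]) && names[vec] )
        return names[vec];

    return "??";
}
```

A catch-all handler can then print "#GP" instead of a raw 13 (or 0xd),
which sidesteps the question of which base the number is rendered in.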

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/build: fix xen/tools/binfile
Juergen Gross [Fri, 29 May 2020 18:28:00 +0000 (20:28 +0200)]
xen/build: fix xen/tools/binfile

xen/tools/binfile contains a bash specific command (let). This leads
to build failures on systems not using bash as /bin/sh.

Replace "let SHIFT=$OPTIND-1" by "SHIFT=$((OPTIND-1))".

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoarm: Fix build following c/s 8e2aa76dc
Andrew Cooper [Fri, 29 May 2020 21:23:59 +0000 (22:23 +0100)]
arm: Fix build following c/s 8e2aa76dc

The ARM side of cpu_nr_siblings() was missing a return type.  OSSTest
reports:

  /home/osstest/build.150502.build-arm64-xsm/xen/xen/include/asm/cpufeature.h:67:15:
  error: return type defaults to 'int' [-Werror=implicit-int]
   static inline cpu_nr_siblings(unsigned int)
                 ^~~~~~~~~~~~~~~

My local build test then reported:

  /local/xen.git/xen/include/asm/cpufeature.h: In function ‘cpu_nr_siblings’:
  /local/xen.git/xen/include/asm/cpufeature.h:67:1: error: parameter name omitted
   static inline int cpu_nr_siblings(unsigned int)
    ^

Fix it up to match its x86 counterpart.
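
For illustration, the two diagnostics correspond to the two defects in the
rejected form shown in the comment below.  The corrected definition is only a
sketch: the unsigned int return type and the constant body are assumptions,
not a copy of the actual header.

```c
/*
 * Rejected by the compilers quoted above: no return type (implicit int) and
 * an unnamed parameter in a function definition.
 *
 *   static inline cpu_nr_siblings(unsigned int)
 *   {
 *       return 1;
 *   }
 */

/* Accepted: explicit return type and a named parameter. */
static inline unsigned int cpu_nr_siblings(unsigned int cpu)
{
    (void)cpu;  /* unused in this placeholder body */

    return 1;   /* assumed placeholder: one sibling per CPU */
}
```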

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/pv: remove unnecessary toggle_guest_pt() overhead
Jan Beulich [Tue, 5 May 2020 06:16:03 +0000 (08:16 +0200)]
x86/pv: remove unnecessary toggle_guest_pt() overhead

toggle_guest_pt() is called in pairs, to read guest kernel data
structures when emulating a guest userspace action. Hence this doesn't
modify cr3 from the guest's point of view, and therefore doesn't need
any resync on the exit-to-guest path. Therefore move the updating of
->pv_cr3 and ->root_pgt_changed into toggle_guest_mode(), since undoing
the changes during the second of these invocations wouldn't be a safe
thing to do.

While at it, add a comment ahead of toggle_guest_pt() to clarify its
intended usage.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/build: introduce CLANG_FLAGS for testing other CFLAGS
Anthony PERARD [Fri, 29 May 2020 15:43:43 +0000 (16:43 +0100)]
xen/build: introduce CLANG_FLAGS for testing other CFLAGS

Commit 534519f0514f ("xen: Have Kconfig check $(CC)'s version")
introduced the use of CLANG_FLAGS in Kconfig which is used when
testing for other CFLAGS via $(cc-option ...) but CLANG_FLAGS doesn't
exist in the Xen build system. (It's a Linux/Kbuild variable that
hasn't been added yet.)

The missing CLANG_FLAGS isn't an issue for $(cc-option ..) but it
would be when $(as-instr ..) gets imported from Kbuild to test
assembly instructions. We need to know if we are going to use clang's
assembler or not.

CLANG_FLAGS needs to be calculated before we call Kconfig.

So, this patch adds CLANG_FLAGS which may contain two flags which are
needed for further testing of $(CC)'s capabilities:
  -no-integrated-as
    This flag isn't new, but is simply tested earlier so that it can be
    used in Kconfig. The flag is only added for x86 builds, as
    before.
  -Werror=unknown-warning-option
    This one is new. It makes sure that the warning is enabled: it is
    enabled by default, but could be disabled in a particular
    build of clang, see Linux's commit e8de12fb7cde ("kbuild: Check
    for unknown options with cc-option usage in Kconfig and clang")

    It is present in clang 3.0.0, according to Linux's commit
    589834b3a009 ("kbuild: Add -Werror=unknown-warning-option to
    CLANG_FLAGS").

(The "note" that says the flag is only added once wasn't true
when tested on CentOS 6, so the patch uses $(or) to ensure the flag is only
added once.)

Fixes: 534519f0514f ("xen: Have Kconfig check $(CC)'s version")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen: credit2: limit the max number of CPUs in a runqueue
Dario Faggioli [Thu, 28 May 2020 21:29:44 +0000 (23:29 +0200)]
xen: credit2: limit the max number of CPUs in a runqueue

In Credit2 CPUs (can) share runqueues, depending on the topology. For
instance, with per-socket runqueues (the default) all the CPUs that are
part of the same socket share a runqueue.

On platforms with a huge number of CPUs per socket, that could be a
problem. An example is AMD EPYC2 servers, where we can have up to 128
CPUs in a socket.

It is of course possible to define other, still topology-based, runqueue
arrangements (e.g., per-LLC, per-DIE, etc). But that may still result in
runqueues with too many CPUs on other/future platforms. For instance, a
system with 96 CPUs and 2 NUMA nodes will end up having 48 CPUs per
runqueue. Not as bad, but still a lot!

Therefore, let's set a limit to the max number of CPUs that can share a
Credit2 runqueue. The actual value is configurable (at boot time), the
default being 16. If, for instance, there are more than 16 CPUs in a
socket, they'll be split among two (or more) runqueues.

Note: with core scheduling enabled, this parameter sets the max number
of *scheduling resources* that can share a runqueue. Therefore, with
granularity set to core (and assuming 2 threads per core), we will have
at most 16 cores per runqueue, which corresponds to 32 threads. But that
is fine, considering how core scheduling works.
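
The arithmetic above can be sketched as follows.  Both helpers are
hypothetical and only compute bounds; they do not model how Credit2 actually
distributes CPUs among the resulting runqueues.

```c
/*
 * Minimum number of runqueues needed so that no runqueue holds more than
 * max_per_rq scheduling resources (ceiling division).
 */
static unsigned int min_runqueues(unsigned int nr_units,
                                  unsigned int max_per_rq)
{
    if ( max_per_rq == 0 )
        return 0;   /* invalid limit; a caller would reject this */

    return (nr_units + max_per_rq - 1) / max_per_rq;
}

/*
 * With core scheduling the limit counts scheduling resources, so the number
 * of hardware threads per runqueue is scaled by the threads per resource.
 */
static unsigned int max_threads_per_rq(unsigned int max_per_rq,
                                       unsigned int threads_per_unit)
{
    return max_per_rq * threads_per_unit;
}
```

For a 128-CPU socket and the default limit of 16, min_runqueues(128, 16)
gives 8.  With core granularity and 2 threads per core,
max_threads_per_rq(16, 2) gives the 32 threads mentioned above.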

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agoxen: cpupool: add a back-pointer from a scheduler to its pool
Dario Faggioli [Thu, 28 May 2020 21:29:37 +0000 (23:29 +0200)]
xen: cpupool: add a back-pointer from a scheduler to its pool

If we need to know within which pool a particular scheduler
is working, we can do that by querying the cpupool pointer
of any of the sched_resource-s (i.e., ~ any of the CPUs)
assigned to the scheduler itself.

Basically, we pick any sched_resource that we know uses that
scheduler, and we check its *cpupool pointer. If we really
know that the resource uses the scheduler, this is fine, as
it also means the resource is inside the pool we are
looking for.

But, of course, we can't do that for a pool/scheduler that has
not been given any sched_resource yet (or if we do not
know whether or not it has any sched_resource).

To overcome this limitation, add a back pointer from the
scheduler, to its own pool.
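
A minimal sketch of the back-pointer idea is shown here.  The struct layouts,
field names and helper functions are simplified assumptions for illustration,
not the definitions used in Xen.

```c
#include <stddef.h>

struct cpupool;

/* Simplified scheduler instance with a back-pointer to its pool. */
struct scheduler {
    const char *name;
    struct cpupool *cpupool;    /* pool this scheduler instance serves */
};

/* Simplified pool, which already pointed at its scheduler. */
struct cpupool {
    int id;
    struct scheduler *sched;
};

/* Link both directions at the point where the pool gets its scheduler. */
static void cpupool_attach_scheduler(struct cpupool *pool,
                                     struct scheduler *ops)
{
    pool->sched = ops;
    ops->cpupool = pool;
}

/* Works even before any sched_resource has been assigned to the pool. */
static const struct cpupool *scheduler_pool(const struct scheduler *ops)
{
    return ops->cpupool;
}
```

The lookup no longer depends on finding a sched_resource that is known to use
the scheduler, which is the case the indirect approach could not handle.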

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agoxen: credit2: factor runqueue initialization in its own function.
Dario Faggioli [Thu, 28 May 2020 21:29:30 +0000 (23:29 +0200)]
xen: credit2: factor runqueue initialization in its own function.

As it will be useful in later changes. While there, fix
the doc-comment.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agoxen: credit2: factor cpu to runqueue matching in a function
Dario Faggioli [Thu, 28 May 2020 21:29:24 +0000 (23:29 +0200)]
xen: credit2: factor cpu to runqueue matching in a function

Just move the big if() condition into an inline function.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agodocs/xl.cfg: Rewrite cpuid= section
Andrew Cooper [Tue, 1 Sep 2020 15:08:00 +0000 (16:08 +0100)]
docs/xl.cfg: Rewrite cpuid= section

This is partly to adjust the description of 'k' and 's' seeing as they have
changed, but mostly restructuring the information for clarity.

In particular, use indentation to clearly separate the areas discussing libxl
format from xend format.  In addition, extend the xend format section to
discuss subleaf notation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools/libxc: Restore CPUID/MSR data found in the migration stream
Andrew Cooper [Fri, 20 Dec 2019 19:38:26 +0000 (19:38 +0000)]
tools/libxc: Restore CPUID/MSR data found in the migration stream

With all other pieces in place, it is now safe to restore the CPUID and MSR
data in the migration stream, rather than discarding them and using the
higher-level toolstack's compatibility logic.

While this is a small patch, it has large implications for migrated/resumed
domains.  Most obviously, the CPU family/model/stepping data,
cache/tlb/etc. will no longer change behind the guest's back.

Another change is the interpretation of the Xend cpuid strings.  The 'k'
option is not a sensible thing to have ever supported, and 's' is how the
stream will end up behaving.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools/libx[cl]: Plumb 'missing' through static_data_done() up into libxl
Andrew Cooper [Mon, 16 Dec 2019 19:03:14 +0000 (19:03 +0000)]
tools/libx[cl]: Plumb 'missing' through static_data_done() up into libxl

Pre Xen-4.14 streams will not contain any CPUID/MSR information.  There is
nothing libxc can do about this, and it will have to rely on the higher-level
toolstack to provide backwards compatibility.

To facilitate this, extend the static_data_done() callback, highlighting the
missing information, and modify libxl to use it.  At the libxc level, this
requires an arch-specific hook which, for now, always reports CPUID and MSR as
missing.  This will be adjusted in a later change.
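
The plumbing described above could look roughly like this.  All names here
(the MISSING_* flags, struct restore_callbacks, the helper functions) are
hypothetical stand-ins chosen for the sketch and are not the identifiers used
by libxc or libxl.

```c
/* Hypothetical flags describing which static data the stream lacked. */
#define MISSING_CPUID (1u << 0)
#define MISSING_MSR   (1u << 1)

/* Hypothetical callback table handed to the restore code by the toolstack. */
struct restore_callbacks {
    int (*static_data_done)(unsigned int missing, void *data);
    void *data;
};

/* Arch-specific hook: for now, always report CPUID and MSR as missing. */
static unsigned int arch_static_data_missing(void)
{
    return MISSING_CPUID | MISSING_MSR;
}

/* Toolstack side: record what was missing so compat logic can fill it in. */
static int toolstack_static_data_done(unsigned int missing, void *data)
{
    unsigned int *needs_compat = data;

    *needs_compat = missing;

    return 0;
}

/* Restore side: invoked once all static data records have been processed. */
static int static_data_complete(const struct restore_callbacks *cb)
{
    return cb->static_data_done(arch_static_data_missing(), cb->data);
}
```

Because the hook currently reports everything as missing, the toolstack's
compatibility path always runs, which matches the "no overall functional
change" intent of this step.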

No overall functional change - this is just plumbing.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxc/save: Write X86_{CPUID,MSR}_DATA records
Andrew Cooper [Tue, 17 Dec 2019 12:41:02 +0000 (12:41 +0000)]
libxc/save: Write X86_{CPUID,MSR}_DATA records

With the destination side now able to understand X86_{CPUID,MSR}_DATA
records (and compatibly handle their absence), update the sending logic to
obtain and forward this data from Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxc/restore: Handle X86_{CPUID,MSR}_DATA records
Andrew Cooper [Wed, 18 Dec 2019 18:51:01 +0000 (18:51 +0000)]
libxc/restore: Handle X86_{CPUID,MSR}_DATA records

For now, the data are just stashed, and discarded at the end.

A future change will restore the data, once libxl has been adjusted to avoid
clobbering the data.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>