xenbits.xensource.com Git - xen.git/log
xen.git
6 years agox86/msr: Virtualise MSR_SPEC_CTRL.SSBD for guests to use
Andrew Cooper [Fri, 13 Apr 2018 15:42:34 +0000 (15:42 +0000)]
x86/msr: Virtualise MSR_SPEC_CTRL.SSBD for guests to use

Almost all infrastructure is already in place.  Update the reserved bits
calculation in guest_wrmsr(), and offer SSBD to guests by default.
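
A minimal sketch of the reserved-bits idea, assuming a simplified policy
structure. The struct and function names below are hypothetical and are not
Xen's actual code; only the MSR_SPEC_CTRL bit positions are architectural.
A write is refused if it sets any bit the guest's policy does not advertise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in MSR_SPEC_CTRL (index 0x48). */
#define SPEC_CTRL_IBRS   (UINT64_C(1) << 0)
#define SPEC_CTRL_STIBP  (UINT64_C(1) << 1)
#define SPEC_CTRL_SSBD   (UINT64_C(1) << 2)

/* Hypothetical, simplified view of the guest's CPUID policy. */
struct guest_policy {
    bool ibrsb;  /* IBRS/IBPB advertised to the guest */
    bool stibp;  /* STIBP advertised to the guest */
    bool ssbd;   /* SSBD advertised to the guest */
};

/* Returns true if val sets no reserved bits for this guest. */
bool spec_ctrl_write_ok(const struct guest_policy *cp, uint64_t val)
{
    uint64_t rsvd = ~UINT64_C(0);   /* start with everything reserved */

    if (cp->ibrsb)
        rsvd &= ~SPEC_CTRL_IBRS;
    if (cp->stibp)
        rsvd &= ~SPEC_CTRL_STIBP;
    if (cp->ssbd)
        rsvd &= ~SPEC_CTRL_SSBD;    /* the bit this change makes writable */

    return (val & rsvd) == 0;
}
```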

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/Intel: Mitigations for GPZ SP4 - Speculative Store Bypass
Andrew Cooper [Wed, 28 Mar 2018 14:21:39 +0000 (15:21 +0100)]
x86/Intel: Mitigations for GPZ SP4 - Speculative Store Bypass

To combat GPZ SP4 "Speculative Store Bypass", Intel have extended their
speculative sidechannel mitigations specification as follows:

 * A feature bit to indicate that Speculative Store Bypass Disable is
   supported.
 * A new bit in MSR_SPEC_CTRL which, when set, disables memory disambiguation
   in the pipeline.
 * A new bit in MSR_ARCH_CAPABILITIES, which will be set in future hardware,
   indicating that the hardware is not susceptible to Speculative Store Bypass
   sidechannels.

For contemporary processors, this interface will be implemented via a
microcode update.
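
The three pieces of the interface can be sketched as constants plus a small
decision helper.  The bit positions follow the published specification; the
helper names are hypothetical, and the inputs are assumed to have been read
from CPUID leaf 7 (subleaf 0, EDX) and MSR_ARCH_CAPABILITIES by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_SPEC_CTRL          0x00000048u
#define SPEC_CTRL_SSBD         (UINT64_C(1) << 2)   /* disable memory disambiguation */

#define MSR_ARCH_CAPABILITIES  0x0000010au
#define ARCH_CAPS_SSB_NO       (UINT64_C(1) << 4)   /* not susceptible to SSB */

#define CPUID_7_0_EDX_SSBD     (UINT32_C(1) << 31)  /* SSBD feature bit */

/* Hypothetical helper: does the hardware offer the SSBD control at all? */
bool ssbd_supported(uint32_t cpuid_7_0_edx)
{
    return (cpuid_7_0_edx & CPUID_7_0_EDX_SSBD) != 0;
}

/*
 * Hypothetical helper: is setting SSBD meaningful on this hardware?  It is
 * only when the control exists and the hardware does not claim immunity.
 */
bool ssbd_relevant(uint32_t cpuid_7_0_edx, uint64_t arch_caps)
{
    return ssbd_supported(cpuid_7_0_edx) && !(arch_caps & ARCH_CAPS_SSB_NO);
}
```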

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/AMD: Mitigations for GPZ SP4 - Speculative Store Bypass
Andrew Cooper [Thu, 26 Apr 2018 09:56:28 +0000 (10:56 +0100)]
x86/AMD: Mitigations for GPZ SP4 - Speculative Store Bypass

AMD processors will execute loads and stores with the same base register in
program order, which is typically how a compiler emits code.

Therefore, by default no mitigating actions are taken, despite there being
corner cases which are vulnerable to the issue.

For performance testing, or for users with particularly sensitive workloads,
the `spec-ctrl=ssbd` command line option is available to force Xen to disable
Memory Disambiguation on applicable hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agodoc: correct livepatch.markdown syntax
Juergen Gross [Tue, 8 May 2018 06:47:30 +0000 (08:47 +0200)]
doc: correct livepatch.markdown syntax

"make -C docs all" fails due to incorrect markdown syntax in
livepatch.markdown. Correct it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Misc fixes:
 * Insert real URLs
 * Drop trailing whitespace
 * Consistent alignment and indentation for code blocks and lists
 * Consistent capitalisation
 * Consistent use of `` blocks for command line arguments and function names
 * Rearrange things not to leave < and > in the text

No change in content.  The document now reads rather more consistently in HTML
and PDF form.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agoxl: show full value of cpu_khz in xl info output
Olaf Hering [Tue, 3 Apr 2018 11:14:11 +0000 (13:14 +0200)]
xl: show full value of cpu_khz in xl info output

The exact value of cpu_khz can only be obtained via 'xl dmesg', and
therefore can be lost after some time. 'xl info' truncates the value to
full MHz. Adjust the output to show the full khz value.
This helps the host admin to track how a host has calibrated itself. The
value of cpu_khz is used during live migration for the decision if
access to TSC should be emulated.
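
To illustrate the difference, here is a small sketch of the two ways of
printing the value.  The function names and the exact output format are
assumptions made for the example, not a copy of the xl code.

```c
#include <stddef.h>
#include <stdio.h>

/* Old style: integer division drops everything below a full MHz. */
int format_mhz_truncated(char *buf, size_t len, unsigned int cpu_khz)
{
    return snprintf(buf, len, "%u", cpu_khz / 1000);
}

/* New style: keep the remainder, so the full kHz value stays visible. */
int format_mhz_full(char *buf, size_t len, unsigned int cpu_khz)
{
    return snprintf(buf, len, "%u.%03u", cpu_khz / 1000, cpu_khz % 1000);
}
```

The `%03u` padding matters: a remainder of 7 kHz has to print as `.007`,
not `.7`.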

Commit eb5277a30e ("bitkeeper revision 1.959.1.4
(40d04a87acOb29u-5Y5OxMhHvP2x9g)") gives no hint why cpu_mhz instead of
cpu_khz was chosen.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoConfig.mk: Update QEMU to include build fixes
Anthony PERARD [Fri, 18 May 2018 16:17:54 +0000 (17:17 +0100)]
Config.mk: Update QEMU to include build fixes

This tag includes two build fixes:
- dump: Fix build with newer gcc
    Fix build with GCC-8
- Fix libusb-1.0.22 deprecated libusb_set_debug with libusb_set_option

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
6 years agoxen/kbdif: Add features to disable keyboard and pointer 4.11.0-rc5
Oleksandr Andrushchenko [Wed, 2 May 2018 14:49:19 +0000 (17:49 +0300)]
xen/kbdif: Add features to disable keyboard and pointer

It is currently not fully possible to control if and which virtual devices
are created by the frontend, e.g. keyboard and pointer devices
are always created and multi-touch device is created if the
backend advertises multi-touch support. In some cases this
behavior is not desirable and better control over the frontend's
configuration is required.

Add new XenStore feature fields, so it is possible to individually
control the set of exposed virtual devices for each guest OS:
 - set feature-disable-keyboard to 1 if no keyboard device needs
   to be created
 - set feature-disable-pointer to 1 if no pointer device needs
   to be created

Keep old behavior by default.
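
A sketch of how a frontend might act on the two fields, assuming the values
have already been read from XenStore into booleans.  The macro, struct and
function names are hypothetical; only the field strings come from the text
above.

```c
#include <stdbool.h>

/* XenStore field names described above (macro names are illustrative). */
#define FIELD_FEAT_DISABLE_KEYBOARD  "feature-disable-keyboard"
#define FIELD_FEAT_DISABLE_POINTER   "feature-disable-pointer"

/* Hypothetical parsed view of what the backend advertises. */
struct kbd_features {
    bool disable_keyboard;  /* feature-disable-keyboard == 1 */
    bool disable_pointer;   /* feature-disable-pointer == 1 */
    bool multi_touch;       /* backend advertises multi-touch support */
};

/* Which virtual devices the frontend should create. */
struct kbd_devices {
    bool keyboard;
    bool pointer;
    bool multi_touch;
};

struct kbd_devices choose_devices(const struct kbd_features *f)
{
    struct kbd_devices d = {
        .keyboard    = !f->disable_keyboard,
        .pointer     = !f->disable_pointer,
        .multi_touch = f->multi_touch,
    };

    return d;
}
```

With both fields absent (all flags false) the keyboard and pointer are
still created, which matches the old behaviour.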

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoscripts/add_maintainers.pl: New script
Lars Kurth [Fri, 11 May 2018 16:33:00 +0000 (17:33 +0100)]
scripts/add_maintainers.pl: New script

This provides a much better workflow when using git format-patch and
git send-email, with get_maintainer.pl.

The tool covers step 2 of the following workflow

  Step 1: git format-patch ... -o <patchdir> ...
  Step 2: ./scripts/add_maintainers.pl -d <patchdir>
          This overwrites *.patch files in <patchdir>
  Step 3: git send-email -to xen-devel@lists.xenproject.org <patchdir>/*.patch

I manually tested all options and the most common combinations
on Mac.

Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agovpci/msi: fix unbind loop
Roger Pau Monné [Wed, 16 May 2018 14:28:46 +0000 (16:28 +0200)]
vpci/msi: fix unbind loop

The current unbind loop on failure in vpci_msi_enable is wrong and
will only work correctly if the initial pirq is 0. Fix this by adding
a proper bound.
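
The shape of a correctly bounded unwind can be sketched with a small model.
Everything here (the bound[] array, model_bind(), model_unbind(),
enable_vectors()) is hypothetical and only stands in for the real binding
code; the point is that the unwind counts relative to the first pirq rather
than down to zero.

```c
#include <stdbool.h>

#define MAX_PIRQ 64

/* Model state: which pirqs are bound, and which bind should fail. */
bool bound[MAX_PIRQ];
int fail_at = -1;

int model_bind(int pirq)
{
    if (pirq < 0 || pirq >= MAX_PIRQ || pirq == fail_at)
        return -1;
    bound[pirq] = true;
    return 0;
}

void model_unbind(int pirq)
{
    bound[pirq] = false;
}

/* Bind 'vectors' consecutive pirqs starting at first_pirq. */
int enable_vectors(int first_pirq, int vectors)
{
    for (int i = 0; i < vectors; i++) {
        int rc = model_bind(first_pirq + i);

        if (rc) {
            /*
             * Undo only what this call bound: offsets i-1 down to 0,
             * each relative to first_pirq.
             */
            while (i-- > 0)
                model_unbind(first_pirq + i);
            return rc;
        }
    }

    return 0;
}
```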

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Introduce a new `spec-ctrl=` command line argument to replace `bti=`
Andrew Cooper [Thu, 26 Apr 2018 09:52:55 +0000 (10:52 +0100)]
x86/spec_ctrl: Introduce a new `spec-ctrl=` command line argument to replace `bti=`

In hindsight, the options for `bti=` aren't as flexible or useful as expected
(including several options which don't appear to behave as intended).
Changing the behaviour of an existing option is problematic for compatibility,
so introduce a new `spec-ctrl=` in the hopes that we can do better.

One common way of deploying Xen is with a single PV dom0 and all domUs being
HVM domains.  In such a setup, an administrator who has weighed up the risks
may wish to forgo protection against malicious PV domains, to reduce the
overall performance hit.  To cater for this usecase, `spec-ctrl=no-pv` will
disable all speculative protection for PV domains, while leaving all
speculative protection for HVM domains intact.

For coding clarity as much as anything else, the suboptions are grouped by
logical area; those which affect the alternatives blocks, and those which
affect Xen's in-hypervisor settings.  See the xen-command-line.markdown for
full details of the new options.

While changing the command line options, take the time to change how the data
is reported to the user.  The three DEBUG printks are upgraded to unconditional,
as they are all relevant pieces of information, and the old "mitigations:"
line is split in the two logical areas described above.

Sample output from booting with `spec-ctrl=no-pv` looks like:

  (XEN) Speculative mitigation facilities:
  (XEN)   Hardware features: IBRS/IBPB STIBP IBPB
  (XEN)   Compiled-in support: INDIRECT_THUNK
  (XEN)   Xen settings: BTI-Thunk RETPOLINE, SPEC_CTRL: IBRS-, Other: IBPB
  (XEN)   Support for VMs: PV: None, HVM: MSR_SPEC_CTRL RSB
  (XEN)   XPTI (64-bit PV only): Dom0 enabled, DomU enabled

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/cpuid: Improvements to guest policies for speculative sidechannel features
Andrew Cooper [Tue, 1 May 2018 10:59:03 +0000 (11:59 +0100)]
x86/cpuid: Improvements to guest policies for speculative sidechannel features

If Xen isn't virtualising MSR_SPEC_CTRL for guests, IBRSB shouldn't be
advertised.  It is not currently possible to express this via the existing
command line options, but such an ability will be introduced.

Another useful option in some usecases is to offer IBPB without IBRS.  When a
guest kernel is known to be compatible (uses retpoline and knows about the AMD
IBPB feature bit), an administrator with pre-Skylake hardware may wish to hide
IBRS.  This allows the VM to have full protection, without Xen or the VM
needing to touch MSR_SPEC_CTRL, which can reduce the overhead of Spectre
mitigations.

Break the logic common to both PV and HVM CPUID calculations into a common
helper, to avoid duplication.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Explicitly set Xen's default MSR_SPEC_CTRL value
Andrew Cooper [Wed, 9 May 2018 12:59:56 +0000 (13:59 +0100)]
x86/spec_ctrl: Explicitly set Xen's default MSR_SPEC_CTRL value

With the impending ability to disable MSR_SPEC_CTRL handling on a
per-guest-type basis, the first exit-from-guest may not have the side effect
of loading Xen's choice of value.  Explicitly set Xen's default during the BSP
and AP boot paths.

For the BSP however, delay setting a non-zero MSR_SPEC_CTRL default until
after dom0 has been constructed when safe to do so.  Oracle report that this
speeds up boots of some hardware by 50s.

"when safe to do so" is based on whether we are virtualised.  A native boot
won't have any other code running in a position to mount an attack.

Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Split X86_FEATURE_SC_MSR into PV and HVM variants
Andrew Cooper [Tue, 17 Apr 2018 13:15:04 +0000 (14:15 +0100)]
x86/spec_ctrl: Split X86_FEATURE_SC_MSR into PV and HVM variants

In order to separately control whether MSR_SPEC_CTRL is virtualised for PV and
HVM guests, split the feature used to control runtime alternatives into two.
Xen will use MSR_SPEC_CTRL itself if either of these features is active.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Elide MSR_SPEC_CTRL handling in idle context when possible
Andrew Cooper [Mon, 7 May 2018 13:06:16 +0000 (14:06 +0100)]
x86/spec_ctrl: Elide MSR_SPEC_CTRL handling in idle context when possible

If Xen is virtualising MSR_SPEC_CTRL handling for guests, but using 0 as its
own MSR_SPEC_CTRL value, spec_ctrl_{enter,exit}_idle() need not write to the
MSR.
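
A sketch of the idea, using a counter in place of the real WRMSR so the
effect is observable.  The model_* names are hypothetical, and the real
code selects the behaviour at boot rather than testing a variable at run
time; this only shows why the writes are redundant when the value is 0.

```c
#include <stdint.h>

/* Model of the MSR and a count of how often it is written. */
uint64_t model_msr_spec_ctrl;
unsigned int model_wrmsr_count;

/* Xen's own choice of MSR_SPEC_CTRL value (0 means nothing to toggle). */
uint64_t default_xen_spec_ctrl;

void model_wrmsr(uint64_t val)
{
    model_msr_spec_ctrl = val;
    model_wrmsr_count++;
}

/* Going idle: drop Xen's protections, unless there are none to drop. */
void model_enter_idle(void)
{
    if (default_xen_spec_ctrl)
        model_wrmsr(0);
}

/* Leaving idle: restore Xen's value, unless it is already 0. */
void model_exit_idle(void)
{
    if (default_xen_spec_ctrl)
        model_wrmsr(default_xen_spec_ctrl);
}
```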

Requested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Rename bits of infrastructure to avoid NATIVE and VMEXIT
Andrew Cooper [Mon, 30 Apr 2018 13:20:23 +0000 (14:20 +0100)]
x86/spec_ctrl: Rename bits of infrastructure to avoid NATIVE and VMEXIT

In hindsight, using NATIVE and VMEXIT as naming terminology was not clever.
A future change wants to split SPEC_CTRL_EXIT_TO_GUEST into PV and HVM
specific implementations, and using VMEXIT as a term is completely wrong.

Take the opportunity to fix some stale documentation in spec_ctrl_asm.h.  The
IST helpers were missing from the large comment block, and since
SPEC_CTRL_ENTRY_FROM_INTR_IST was introduced, we've gained a new piece of
functionality which currently depends on the fine grain control, which exists
in lieu of livepatching.  Note this in the comment.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Fold the XEN_IBRS_{SET,CLEAR} ALTERNATIVES together
Andrew Cooper [Tue, 17 Apr 2018 13:15:04 +0000 (14:15 +0100)]
x86/spec_ctrl: Fold the XEN_IBRS_{SET,CLEAR} ALTERNATIVES together

Currently, the SPEC_CTRL_{ENTRY,EXIT}_* macros encode Xen's choice of
MSR_SPEC_CTRL as an immediate constant, and choose between IBRS or not by
doubling up the entire alternative block.

There is now a variable holding Xen's choice of value, so use that and
simplify the alternatives.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Merge bti_ist_info and use_shadow_spec_ctrl into spec_ctrl_flags
Andrew Cooper [Tue, 17 Apr 2018 13:15:04 +0000 (14:15 +0100)]
x86/spec_ctrl: Merge bti_ist_info and use_shadow_spec_ctrl into spec_ctrl_flags

All 3 bits of information here are control flags for the entry/exit code
behaviour.  Treat them as such, rather than having two different variables.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Express Xen's choice of MSR_SPEC_CTRL value as a variable
Andrew Cooper [Tue, 17 Apr 2018 13:15:04 +0000 (14:15 +0100)]
x86/spec_ctrl: Express Xen's choice of MSR_SPEC_CTRL value as a variable

At the moment, we have two different encodings of Xen's MSR_SPEC_CTRL value,
which is a side effect of how the Spectre series developed.  One encoding is
via an alias with the bottom bit of bti_ist_info, and can encode IBRS or not,
but not other configurations such as STIBP.

Break Xen's value out into a separate variable (in the top of stack block for
XPTI reasons) and use this instead of bti_ist_info in the IST path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/spec_ctrl: Read MSR_ARCH_CAPABILITIES only once
Andrew Cooper [Thu, 26 Apr 2018 11:21:00 +0000 (12:21 +0100)]
x86/spec_ctrl: Read MSR_ARCH_CAPABILITIES only once

Make it available from the beginning of init_speculation_mitigations(), and
pass it into appropriate functions.  Fix an RSBA typo while moving the
affected comment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agotools/ocaml/libs/xc fix gcc-8 format-truncation warning
John Thomson [Tue, 15 May 2018 01:48:43 +0000 (11:48 +1000)]
tools/ocaml/libs/xc fix gcc-8 format-truncation warning

 CC       xenctrl_stubs.o
xenctrl_stubs.c: In function 'failwith_xc':
xenctrl_stubs.c:65:17: error: 'snprintf' output may be truncated before the last format character [-Werror=format-truncation=]
      "%d: %s: %s", error->code,
                 ^
xenctrl_stubs.c:64:4: note: 'snprintf' output 6 or more bytes (assuming 1029) into a destination of size 1028
    snprintf(error_str, sizeof(error_str),
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      "%d: %s: %s", error->code,
      ~~~~~~~~~~~~~~~~~~~~~~~~~~
      xc_error_code_to_desc(error->code),
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      error->message);
      ~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[8]: *** [/build/xen-git/src/xen/tools/ocaml/libs/xc/../../Makefile.rules:37: xenctrl_stubs.o] Error 1

Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/kbdif: Add string constants for raw pointer
Oleksandr Andrushchenko [Wed, 2 May 2018 14:49:18 +0000 (17:49 +0300)]
xen/kbdif: Add string constants for raw pointer

Add missing string constants for {feature|request}-raw-pointer
to align with the rest of the interface file.

Fixes 7868654ff7fe ("kbdif: Define "feature-raw-pointer" and "request-raw-pointer"")

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs/parse-support-md: Correctly process caveats in multi-status sections
Ian Jackson [Tue, 15 May 2018 14:41:14 +0000 (15:41 +0100)]
docs/parse-support-md: Correctly process caveats in multi-status sections

When SUPPORT.md uses the syntax
  Status, <some thing>: <support status>
the caveats were lost (not footnoted) because they were attached
only to <some thing>.

Caveats occur in running text, so they are necessarily part of a real
section, not an individual status line like that.  So attach them to
the RealSectNode, and look there for them.

Reported-by: Lars Kurth <lars.kurth@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Lars Kurth <Lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs/parse-support-md: Provide $sectnode->{RealSectNode}
Ian Jackson [Tue, 15 May 2018 14:39:03 +0000 (15:39 +0100)]
docs/parse-support-md: Provide $sectnode->{RealSectNode}

No functional change yet.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Lars Kurth <Lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs/parse-support-md: Rename RealSect to RealInSect
Ian Jackson [Tue, 15 May 2018 14:35:00 +0000 (15:35 +0100)]
docs/parse-support-md: Rename RealSect to RealInSect

This makes the distinction between insections and sectnodes clearer.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Lars Kurth <Lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoviridian: fix cpuid leaf 0x40000003
Paul Durrant [Fri, 11 May 2018 14:48:32 +0000 (15:48 +0100)]
viridian: fix cpuid leaf 0x40000003

The response to viridian leaf 3 needs to split a 64-bit mask across EAX and
EBX, with the low order 32 bits in EAX and the high order 32 bits in EBX.
To facilitate this a union of two uint32_t values and the mask (type
HV_PARTITION_PRIVILEGE_MASK) is allocated on stack as follows:

union {
    HV_PARTITION_PRIVILEGE_MASK mask;
    uint32_t lo, hi;
} u;

This, of course, is incorrect as both lo and hi will alias the low order
32 bits of the mask.

This patch wraps lo and hi in an anonymous struct to achieve the desired
effect.
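
For illustration, the two layouts side by side.  A plain uint64_t stands in
for HV_PARTITION_PRIVILEGE_MASK here (an assumption made to keep the
example self-contained), and the lo/hi ordering relies on a little-endian
target such as x86.

```c
#include <stdint.h>

/* Stand-in for HV_PARTITION_PRIVILEGE_MASK, assumed to be 64 bits wide. */
typedef uint64_t privilege_mask_t;

/* Broken: lo and hi are both union members, so both start at offset 0. */
union broken_split {
    privilege_mask_t mask;
    uint32_t lo, hi;
};

/* Fixed: the anonymous struct (C11) lays lo and hi out consecutively. */
union fixed_split {
    privilege_mask_t mask;
    struct {
        uint32_t lo, hi;
    };
};
```

With the fixed layout, u.lo and u.hi can be copied straight into EAX and
EBX respectively.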

NOTE: Fixing this also stops Windows making the HvGetPartitionId hypercall
      which was previously considered erroneous behaviour. Thus the
      hypercall handler is also modified to stop squashing the
      'unimplemented' warning for this hypercall.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agolibacpi: fixes for iasl >= 20180427 4.11.0-rc4
Roger Pau Monné [Wed, 9 May 2018 10:08:12 +0000 (11:08 +0100)]
libacpi: fixes for iasl >= 20180427

New versions of iasl have introduced improved C file generation, as
reported in the changelog:

iASL: Enhanced the -tc option (which creates an AML hex file in C,
suitable for import into a firmware project):
  1) Create a unique name for the table, to simplify use of multiple
SSDTs.
  2) Add a protection #ifdef in the file, similar to a .h header file.

The net effect of that on generated files is:

-unsigned char AmlCode[] =
+#ifndef __SSDT_S4_HEX__
+#define __SSDT_S4_HEX__
+
+unsigned char ssdt_s4_aml_code[] =

The above example is from ssdt_s4.asl.

Fix the build with newer versions of iasl by stripping the '_aml_code'
suffix from the variable name on generated files.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/HVM: guard against emulator driving ioreq state in weird ways
Jan Beulich [Tue, 8 May 2018 17:12:56 +0000 (18:12 +0100)]
x86/HVM: guard against emulator driving ioreq state in weird ways

In the case where hvm_wait_for_io() calls wait_on_xen_event_channel(),
p->state ends up being read twice in succession: once to determine that
state != p->state, and then again at the top of the loop.  This gives a
compromised emulator a chance to change the state back between the two
reads, potentially keeping Xen in a loop indefinitely.

Instead:
* Read p->state once in each of the wait_on_xen_event_channel() tests,
* re-use that value the next time around,
* and insist that the states continue to transition "forward" (with the
  exception of the transition to STATE_IOREQ_NONE).
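
A sketch of the two rules, assuming the state values from the public ioreq
interface.  The helper names are hypothetical and the check is simplified
compared with what the hypervisor does.

```c
#include <stdbool.h>
#include <stdint.h>

/* ioreq states, in the order a request normally moves through them. */
#define STATE_IOREQ_NONE       0
#define STATE_IOREQ_READY      1
#define STATE_IOREQ_INPROCESS  2
#define STATE_IORESP_READY     3

/*
 * Take a single snapshot of the emulator-writable state.  All later
 * decisions use this local copy, never a second read of shared memory.
 */
uint8_t read_state_once(const volatile uint8_t *shared_state)
{
    return *shared_state;
}

/*
 * A transition is acceptable if it moves forward (or stays put), or if it
 * returns to STATE_IOREQ_NONE.  Anything else is treated as misbehaviour.
 */
bool ioreq_transition_ok(uint8_t prev, uint8_t cur)
{
    if (cur > STATE_IORESP_READY)
        return false;

    return cur == STATE_IOREQ_NONE || cur >= prev;
}
```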

This is XSA-262.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agox86/vpt: add support for IO-APIC routed interrupts
Xen Project Security Team [Tue, 8 May 2018 17:12:10 +0000 (18:12 +0100)]
x86/vpt: add support for IO-APIC routed interrupts

And modify the HPET code to make use of it. Currently HPET interrupts
are always treated as ISA and thus injected through the vPIC. This is
wrong because HPET interrupts when not in legacy mode should be
injected from the IO-APIC.

To make things worse, the supported interrupt routing values are set
to [20..23], which clearly falls outside of the ISA range, thus
leading to an ASSERT in debug builds or memory corruption in non-debug
builds because the interrupt injection code will write out of the
bounds of the arch.hvm_domain.vpic array.

Since the HPET interrupt source can change between ISA and IO-APIC
always destroy the timer before changing the mode, or else Xen risks
changing it while the timer is active.

Note that vpt interrupt injection is racy in the sense that the
vIO-APIC RTE entry can be written by the guest in between the call to
pt_irq_masked and hvm_ioapic_assert, or the call to pt_update_irq and
pt_intr_post. Those are not deemed to be security issues, but rather
quirks of the current implementation. In the worst case the guest
might lose interrupts or get multiple interrupt vectors injected for
the same timer source.

This is part of XSA-261.

Address actual and potential compiler warnings. Fix formatting.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/traps: Fix handling of #DB exceptions in hypervisor context
Andrew Cooper [Fri, 23 Mar 2018 17:03:42 +0000 (17:03 +0000)]
x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/traps: Use an Interrupt Stack Table for #DB
Andrew Cooper [Thu, 22 Mar 2018 11:27:03 +0000 (11:27 +0000)]
x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Move exception injection into {,compat_}test_all_events()
Andrew Cooper [Thu, 22 Mar 2018 11:27:03 +0000 (11:27 +0000)]
x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/traps: Fix %dr6 handling in #DB handler
Andrew Cooper [Mon, 26 Mar 2018 08:02:34 +0000 (09:02 +0100)]
x86/traps: Fix %dr6 handling in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manual's guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/domain: Drop the only-written smap_check_policy infrastructure
Andrew Cooper [Tue, 8 May 2018 12:45:45 +0000 (13:45 +0100)]
x86/domain: Drop the only-written smap_check_policy infrastructure

c/s 4c5d78a10d "x86/pagewalk: Re-implement the pagetable walker" dropped the
consumer of smap_policy.  Looking at c/s 31ae587e6f which introduced the
smap_check logic, it exists only to work around a bug in guest_walk_tables()
which was resolved by the aforementioned commit.

Remove the unused variables and associated infrastructure.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodoc: add credit2_cap_period_ms boot parameter description
Juergen Gross [Mon, 7 May 2018 10:16:05 +0000 (12:16 +0200)]
doc: add credit2_cap_period_ms boot parameter description

credit2_cap_period_ms isn't mentioned in xen-command-line.markdown.
Add a description.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agodoc: add architecture qualifier to boot parameter entries
Juergen Gross [Mon, 7 May 2018 10:16:04 +0000 (12:16 +0200)]
doc: add architecture qualifier to boot parameter entries

Many of the architecture specific boot parameters are not qualified
as such. Correct that.  Reorder PKU to be alphabetical.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/pv: Hide more EFER bits from PV guests
Andrew Cooper [Tue, 20 Mar 2018 19:36:40 +0000 (19:36 +0000)]
x86/pv: Hide more EFER bits from PV guests

We don't advertise SVM in CPUID so a PV guest shouldn't be under the
impression that it can use SVM functionality, but despite this, it really
shouldn't see SVME set when reading EFER.

On Intel processors, 32bit PV guests don't see, and can't use SYSCALL.

Introduce EFER_KNOWN_MASK to whitelist the features Xen knows about, and use
this to clamp the guest's view.

Take the opportunity to reuse the mask to simplify svm_vmcb_isvalid(), and
change "undefined" to "unknown" in the print message, as there is at least
EFER.TCE (Translation Cache Extension) defined but unknown to Xen.
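
The whitelist approach can be sketched as follows.  The bit positions are
architectural; which bits belong in the mask, and the name
pv_guest_efer_view(), are assumptions made for the example rather than a
copy of Xen's definitions.

```c
#include <stdint.h>

/* Architectural EFER bits. */
#define EFER_SCE    (UINT64_C(1) << 0)   /* SYSCALL enable */
#define EFER_LME    (UINT64_C(1) << 8)   /* long mode enable */
#define EFER_LMA    (UINT64_C(1) << 10)  /* long mode active */
#define EFER_NX     (UINT64_C(1) << 11)  /* no-execute enable */
#define EFER_SVME   (UINT64_C(1) << 12)  /* SVM enable */
#define EFER_LMSLE  (UINT64_C(1) << 13)  /* long mode segment limit enable */
#define EFER_FFXSE  (UINT64_C(1) << 14)  /* fast FXSAVE/FXRSTOR */
#define EFER_TCE    (UINT64_C(1) << 15)  /* defined by AMD, not whitelisted */

/* Assumed composition of the whitelist. */
#define EFER_KNOWN_MASK \
    (EFER_SCE | EFER_LME | EFER_LMA | EFER_NX | \
     EFER_SVME | EFER_LMSLE | EFER_FFXSE)

/*
 * Hypothetical helper: what a PV guest sees when reading EFER.  Unknown
 * bits are clamped by the whitelist, and SVME is hidden because SVM is not
 * advertised to PV guests.
 */
uint64_t pv_guest_efer_view(uint64_t hw_efer)
{
    return hw_efer & EFER_KNOWN_MASK & ~EFER_SVME;
}
```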

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoSVM: introduce a VM entry helper
Jan Beulich [Mon, 7 May 2018 07:12:16 +0000 (09:12 +0200)]
SVM: introduce a VM entry helper

Neither the register values copying nor the trace entry generation need
doing in assembly. The VMLOAD invocation can also be further deferred
(and centralized). Therefore replace the svm_asid_handle_vmrun()
invocation with an invocation of the new helper.

Similarly move the VM exit side register value copying into
svm_vmexit_handler().

Now that we always make it out to guest context after VMLOAD,
svm_sync_vmcb() no longer overrides vmcb_needs_vmsave, making
svm_vmexit_handler() setting the field early unnecessary.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoSVM: re-work VMCB sync-ing
Jan Beulich [Mon, 7 May 2018 07:11:15 +0000 (09:11 +0200)]
SVM: re-work VMCB sync-ing

While the main problem to be addressed here is the issue of what so far
was named "vmcb_in_sync" starting out with the wrong value (should have
been true instead of false, to prevent performing a VMSAVE without ever
having VMLOADed the vCPU's state), go a step further and make the
sync-ed state a tristate: CPU and memory may be in sync or an update
may be required in either direction. Rename the field and introduce an
enum. Callers of svm_sync_vmcb() now indicate the intended new state
(with a slight "anomaly" when requesting VMLOAD: we could store
vmcb_needs_vmsave in those cases as the callers request, but the VMCB
really is in sync at that point, and hence there's no need to VMSAVE in
case we don't make it out to guest context), and all syncing goes
through that function.

With that, there's no need to VMLOAD the state perhaps multiple times;
all that's needed is loading it once before VM entry.
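
A toy model of the tristate, with counters in place of the real VMLOAD and
VMSAVE instructions.  Apart from vmcb_needs_vmsave, the enumerator names
and all model_* functions are assumptions for illustration; the real
svm_sync_vmcb() has a different interface.

```c
/* Which copy of the vCPU state is current: both, the CPU, or memory. */
enum vmcb_sync_state {
    vmcb_in_sync,
    vmcb_needs_vmsave,   /* CPU state is newer than the in-memory VMCB */
    vmcb_needs_vmload    /* in-memory VMCB is newer than CPU state */
};

struct vcpu_model {
    enum vmcb_sync_state state;
    unsigned int vmloads, vmsaves;   /* instruction counts */
};

/* Bring the in-memory VMCB up to date, saving only when required. */
void model_sync_to_memory(struct vcpu_model *v)
{
    if (v->state == vmcb_needs_vmsave) {
        v->vmsaves++;
        v->state = vmcb_in_sync;
    }
}

/* The in-memory VMCB is about to be modified by software. */
void model_mark_memory_updated(struct vcpu_model *v)
{
    model_sync_to_memory(v);         /* never lose newer CPU state */
    v->state = vmcb_needs_vmload;
}

/* Just before VM entry: load once, however many updates were made. */
void model_before_vmentry(struct vcpu_model *v)
{
    if (v->state == vmcb_needs_vmload) {
        v->vmloads++;
        v->state = vmcb_in_sync;
    }
}

/* After VM exit the guest may have changed the CPU-held state. */
void model_after_vmexit(struct vcpu_model *v)
{
    v->state = vmcb_needs_vmsave;
}
```

Starting a vCPU in vmcb_in_sync means a sync request before the first VM
entry performs no VMSAVE, which is the initial-value problem described
above.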

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs: fix xpti command line option doc
Wei Liu [Fri, 4 May 2018 15:08:04 +0000 (16:08 +0100)]
docs: fix xpti command line option doc

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: use PCID feature
Juergen Gross [Thu, 26 Apr 2018 11:33:18 +0000 (13:33 +0200)]
xen/x86: use PCID feature

Avoid flushing the complete TLB when switching %cr3 for mitigation of
Meltdown by using the PCID feature if available.

We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
2 values for the non-XPTI case:

- guest active and in kernel mode
- guest active and in user mode
- hypervisor active and guest in user mode (XPTI only)
- hypervisor active and guest in kernel mode (XPTI only)

We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
we disable global pages in cr4. A command line parameter controls in
which cases PCID is being used.

As the non-XPTI case has been shown not to perform better with PCID, at least
on some machines, the default is to use PCID only for domains subject to
XPTI.

With PCID enabled we always disable global pages. This avoids having to
either flush the complete TLB or do a cycle through all PCID values
when invalidating a single global page.
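
As a rough illustration of the four cases listed above, a PCID could be
derived from two bits as sketched below. The numbering and all names are
assumptions chosen for this example; the message does not state the values
actually used.

```c
#include <stdbool.h>

/* Illustrative numbering only, not taken from the patch. */
enum {
    PCID_GUEST_USER = 1u << 0,   /* guest is in user mode */
    PCID_XEN_XPTI   = 1u << 1    /* hypervisor active, XPTI domain */
};

static unsigned int pick_pcid(bool xpti, bool guest_user, bool in_hypervisor)
{
    unsigned int pcid = guest_user ? PCID_GUEST_USER : 0;

    /* Only domains subject to XPTI get separate PCIDs for Xen's view. */
    if ( xpti && in_hypervisor )
        pcid |= PCID_XEN_XPTI;

    return pcid;
}
```

With this encoding an XPTI domain uses four distinct values and a non-XPTI
domain uses two, matching the counts given above.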

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: add some cr3 helpers
Juergen Gross [Thu, 26 Apr 2018 11:33:17 +0000 (13:33 +0200)]
xen/x86: add some cr3 helpers

Add some helper macros to access the address and pcid parts of cr3.

Use those helpers where appropriate.
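
For illustration, such helpers might look like the sketch below. The bit
positions follow the architectural %cr3 layout with PCID enabled (PCID in
bits 11:0, page table address in bits 51:12, bit 63 as the "no flush" hint
on a write); the macro and function names are assumptions, not the ones
introduced by this patch.

```c
#include <stdint.h>

#define CR3_PCID_MASK  UINT64_C(0x0000000000000fff)   /* bits 11:0  */
#define CR3_ADDR_MASK  UINT64_C(0x000ffffffffff000)   /* bits 51:12 */
#define CR3_NOFLUSH    (UINT64_C(1) << 63)            /* keep TLB entries */

static inline uint64_t cr3_addr(uint64_t cr3)
{
    return cr3 & CR3_ADDR_MASK;
}

static inline unsigned int cr3_pcid(uint64_t cr3)
{
    return (unsigned int)(cr3 & CR3_PCID_MASK);
}

/* Combine a page-aligned address and a PCID into a %cr3 value. */
static inline uint64_t make_cr3(uint64_t addr, unsigned int pcid)
{
    return (addr & CR3_ADDR_MASK) | (pcid & CR3_PCID_MASK);
}
```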

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: convert pv_guest_cr4_to_real_cr4() to a function
Juergen Gross [Thu, 26 Apr 2018 11:33:16 +0000 (13:33 +0200)]
xen/x86: convert pv_guest_cr4_to_real_cr4() to a function

pv_guest_cr4_to_real_cr4() is becoming more and more complex. Convert
it from a macro to an ordinary function.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: use flag byte for decision whether xen_cr3 is valid
Juergen Gross [Thu, 26 Apr 2018 11:33:15 +0000 (13:33 +0200)]
xen/x86: use flag byte for decision whether xen_cr3 is valid

Today cpu_info->xen_cr3 is either 0 to indicate %cr3 doesn't need to
be switched on entry to Xen, or negative for keeping the value while
indicating not to restore %cr3, or positive in case %cr3 is to be
restored.

Switch to use a flag byte instead of a negative xen_cr3 value in order
to allow %cr3 values with the high bit set in case we want to keep TLB
entries when using the PCID feature.

This reduces the number of branches in interrupt handling and results
in better performance (e.g. parallel make of the Xen hypervisor on my
system was using about 3% less system time).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: disable global pages for domains with XPTI active
Juergen Gross [Thu, 26 Apr 2018 11:33:14 +0000 (13:33 +0200)]
xen/x86: disable global pages for domains with XPTI active

Instead of flushing the TLB from global pages when switching address
spaces with XPTI being active just disable global pages via %cr4
completely when a domain subject to XPTI is active. This avoids the
need for extra TLB flushes as loading %cr3 will remove all TLB
entries.

In order to avoid states with cr3/cr4 having inconsistent values
(e.g. global pages being activated while cr3 already specifies a XPTI
address space) move loading of the new cr4 value to write_ptbase()
(actually to switch_cr3_cr4() called by write_ptbase()).

This requires using switch_cr3_cr4() instead of write_ptbase() when
building dom0 in order to avoid setting cr4 with cr4.smap set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: use invpcid for flushing the TLB
Juergen Gross [Thu, 26 Apr 2018 11:33:13 +0000 (13:33 +0200)]
xen/x86: use invpcid for flushing the TLB

If possible use the INVPCID instruction for flushing the TLB instead of
toggling cr4.pge for that purpose.

While at it remove the dependency on cr4.pge being required for mtrr
loading, as this will be required later anyway.

Add a command line option "invpcid" for controlling the use of
INVPCID (default to true).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: support per-domain flag for xpti
Juergen Gross [Thu, 26 Apr 2018 11:33:12 +0000 (13:33 +0200)]
xen/x86: support per-domain flag for xpti

Instead of switching XPTI globally on or off add a per-domain flag for
that purpose. This allows the xpti boot parameter to be modified to support
running dom0 without Meltdown mitigations. Using "xpti=no-dom0" as boot
parameter will achieve that.

Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
it is pv-domain specific.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: add a function for modifying cr3
Juergen Gross [Thu, 26 Apr 2018 11:33:11 +0000 (13:33 +0200)]
xen/x86: add a function for modifying cr3

Instead of having multiple places with more or less identical asm
statements just have one function doing a write to cr3.

As this function should be named write_cr3() rename the current
write_cr3() function to switch_cr3().

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/xpti: avoid copying L4 page table contents when possible
Juergen Gross [Thu, 26 Apr 2018 11:33:10 +0000 (13:33 +0200)]
x86/xpti: avoid copying L4 page table contents when possible

For mitigation of Meltdown the current L4 page table is copied to the
cpu local root page table each time a 64 bit pv guest is entered.

Copying can be avoided in cases where the guest L4 page table hasn't
been modified while running the hypervisor, e.g. when handling
interrupts or any hypercall not modifying the L4 page table or %cr3.

So add a per-cpu flag indicating whether the copying should be
performed and set that flag only when loading a new %cr3 or modifying
the L4 page table.  This includes synchronization of the cpu local
root page table with other cpus, so add a special synchronization flag
for that case.
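
The idea can be sketched as follows. This is a single-CPU model with
invented names; the cross-CPU synchronization flag mentioned above is left
out, and the copy counter exists only to make the effect visible.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define L4_ENTRIES 512

/* Hypothetical per-CPU state for this sketch. */
struct xpti_cpu {
    uint64_t root_pgt[L4_ENTRIES];   /* CPU-local root page table */
    bool     l4_dirty;               /* guest L4 or %cr3 changed */
    unsigned int copies;             /* number of copies performed */
};

/* Called when a new %cr3 is loaded or the guest L4 is modified. */
static void note_l4_change(struct xpti_cpu *c)
{
    c->l4_dirty = true;
}

/* Called on every entry to a 64-bit PV guest. */
static void enter_guest(struct xpti_cpu *c, const uint64_t *guest_l4)
{
    if ( !c->l4_dirty )
        return;                      /* nothing changed: skip the copy */

    memcpy(c->root_pgt, guest_l4, sizeof(c->root_pgt));
    c->l4_dirty = false;
    c->copies++;
}
```

Interrupts and hypercalls which never call note_l4_change() then return to
the guest without touching the CPU-local table.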

A simple performance check (compiling the hypervisor via "make -j 4")
in dom0 with 4 vcpus shows a significant improvement:

- real time drops from 112 seconds to 103 seconds
- system time drops from 142 seconds to 131 seconds

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: fix return value checks of set_guest_{machinecheck,nmi}_trapbounce 4.11.0-rc3
Jan Beulich [Thu, 3 May 2018 15:35:51 +0000 (17:35 +0200)]
x86: fix return value checks of set_guest_{machinecheck,nmi}_trapbounce

Commit 0142064421 ("x86/traps: move set_guest_{machine,nmi}_trapbounce")
converted the functions' return types from int to bool without also
correcting the checks in assembly code: The ABI does not guarantee sub-
32-bit return values to be promoted to 32 bits.

Take the liberty and also adjust the number of spaces used in the compat
code, such that both code sequences end up similar (they already are in
the non-compat case).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoxen/schedule: Fix races in vcpu migration
George Dunlap [Tue, 1 May 2018 17:13:27 +0000 (18:13 +0100)]
xen/schedule: Fix races in vcpu migration

The current sequence to initiate vcpu migration is inefficient and error-prone:

- The initiator sets VPF_migrating with the lock held, then drops the
  lock and calls vcpu_sleep_nosync(), which immediately grabs the lock
  again

- A number of places unnecessarily check for v->pause_flags in between
  those two

- Every call to vcpu_migrate() must be prefaced with
  vcpu_sleep_nosync() or introduce a race condition; this code
  duplication is error-prone

- In the event that v->is_running is true at the beginning of
  vcpu_migrate(), it's almost certain that vcpu_migrate() will end up
  being called in context_switch() as well; we might as well simply
  let it run there and save the duplicated effort (which will be
  non-negligible).

The result is that Credit1 has several races which result in runqueue
<-> v->processor invariants being violated (triggering ASSERTs in
debug builds and strange bugs in production builds).

Instead, introduce vcpu_migrate_start() to initiate the process.
vcpu_migrate_start() is called with the scheduling lock held.  It not
only sets VPF_migrating, but also calls vcpu_sleep_nosync_locked()
(which will automatically do nothing if there's nothing to do).

Rename vcpu_migrate() to vcpu_migrate_finish().  Check for v->is_running and
pause_flags & VPF_migrating at the top and return if appropriate.

Then the way to initiate migration is consistently:

* Grab lock
* vcpu_migrate_start()
* Release lock
* vcpu_migrate_finish()
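
The sequence above can be modelled as in the following sketch. The struct,
the boolean standing in for the scheduling lock and the placeholder flag
value are assumptions for illustration; the real functions operate on
struct vcpu and the scheduler locks.

```c
#include <assert.h>
#include <stdbool.h>

#define VPF_MIGRATING (1u << 0)      /* placeholder bit value */

struct mig_vcpu {
    unsigned int pause_flags;
    unsigned int processor;
    bool is_running;
    bool lock_held;                  /* stand-in for the scheduling lock */
};

static void sched_lock(struct mig_vcpu *v)
{
    assert(!v->lock_held);
    v->lock_held = true;
}

static void sched_unlock(struct mig_vcpu *v)
{
    assert(v->lock_held);
    v->lock_held = false;
}

/* Must be called with the lock held. */
static void migrate_start(struct mig_vcpu *v)
{
    assert(v->lock_held);
    v->pause_flags |= VPF_MIGRATING;
    /* The real code would call vcpu_sleep_nosync_locked() here. */
}

/* Returns true if the vcpu was moved by this call. */
static bool migrate_finish(struct mig_vcpu *v, unsigned int new_cpu)
{
    /* Still running (context switch will finish it) or nothing pending. */
    if ( v->is_running || !(v->pause_flags & VPF_MIGRATING) )
        return false;

    v->processor = new_cpu;
    v->pause_flags &= ~VPF_MIGRATING;
    return true;
}

static bool migrate(struct mig_vcpu *v, unsigned int new_cpu)
{
    sched_lock(v);
    migrate_start(v);
    sched_unlock(v);
    return migrate_finish(v, new_cpu);
}
```

If the vcpu is still running when migrate() is called, the flag stays set
and a later migrate_finish() call completes the move.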

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoxen: Introduce vcpu_sleep_nosync_locked()
George Dunlap [Wed, 2 May 2018 10:09:18 +0000 (11:09 +0100)]
xen: Introduce vcpu_sleep_nosync_locked()

There are a lot of places which release a lock before calling
vcpu_sleep_nosync(), which then just grabs the lock again.  This is
not only a waste of time, but leads to more code duplication (since
you have to copy-and-paste recipes rather than calling a unified
function), which in turn leads to an increased chance of bugs.

Introduce vcpu_sleep_nosync_locked(), which can be called if you
already hold the schedule lock.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoxen/schedule.c: Fix up whitespace
George Dunlap [Wed, 2 May 2018 10:09:17 +0000 (11:09 +0100)]
xen/schedule.c: Fix up whitespace

Delete tabs and trailing whitespace.

No functional change.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoMAINTAINERS: Add Brian Woods as Designated reviewer to AMD IOMMU and AMD SVM
Lars Kurth [Tue, 1 May 2018 08:03:13 +0000 (09:03 +0100)]
MAINTAINERS: Add Brian Woods as Designated reviewer to AMD IOMMU and AMD SVM

This was discussed on IRC after the April x86 meeting.
On 27/4/18 Juergen gave a RAB via IRC.

Cc: Lars Kurth <lars.kurth@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Brian Woods <brian.woods@amd.com>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agoMAINTAINERS, get_maintainer.pl: Add Designated Reviewer (R:) role
Lars Kurth [Tue, 1 May 2018 08:03:12 +0000 (09:03 +0100)]
MAINTAINERS, get_maintainer.pl: Add Designated Reviewer (R:) role

The syntax has been copied from the Linux Maintainers file. I moved the following Linux
get_maintainer.pl patches to Xen, fixing up some merge issues (and a bug).

The get_maintainer.pl changes were based on the following git commits
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/scripts/get_maintainer.pl?id=
c1c3f2c906e35bcb6e4cdf5b8e077660fead14fe
4f07510df2e8c47fd65b8ffaaf6c5d334d59d598

I also removed code related to
  P: Person (obsolete)
which is in the Linux MAINTAINERS file, but not ours. I may not have
caught all instances though.

I have tested on a number of files using mock entries in MAINTAINERS
using ./scripts/get_maintainer.pl -f ...

I also tested --nor to disable the support and it worked as expected.

Cc: Lars Kurth <lars.kurth@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agox86emul: VMOVNTDQA should raise #GP(0) on mis-alignment
Jan Beulich [Mon, 30 Apr 2018 16:02:47 +0000 (18:02 +0200)]
x86emul: VMOVNTDQA should raise #GP(0) on mis-alignment

Commit 50b73118d5 introduced emulation of the insn without extending the
set of opcodes requiring special alignment related #GP behavior.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agotools: prepend to PKG_CONFIG_PATH when configuring qemu
Stewart Hildebrand [Thu, 26 Apr 2018 17:41:08 +0000 (17:41 +0000)]
tools: prepend to PKG_CONFIG_PATH when configuring qemu

A user may choose to set his/her own PKG_CONFIG_PATH, which is useful in the
case of cross-compiling.  We don't want to completely override the
PKG_CONFIG_PATH, just add to it.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/process/release-checklist.txt: Say to push staging branch
Ian Jackson [Wed, 13 Dec 2017 12:14:28 +0000 (12:14 +0000)]
docs/process/release-checklist.txt: Say to push staging branch

Preparing a real release, not just an RC, involves making commits.
Typically, those will be on staging-$x.  The tag will refer to them,
and the checklist already says to push them to xenbits.

But if the *branch* is not pushed, then people who just "git fetch"
won't get the tag because it refers to commits they don't have.
(Because of the strange rules git has about tag fetching.)
Worse, the same may be true of people who "git clone".

And anyway, those commits *should* be fed to staging-$x.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years agodocs/process/release-checklist.txt: New instructions for disabling debug
Ian Jackson [Fri, 1 Dec 2017 15:17:33 +0000 (15:17 +0000)]
docs/process/release-checklist.txt: New instructions for disabling debug

The old instructions were obsolete.  Here are the details I used when
branching for 4.10.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/traps: Improve code generation for set_ist()
Andrew Cooper [Thu, 22 Mar 2018 11:31:07 +0000 (11:31 +0000)]
x86/traps: Improve code generation for set_ist()

The IST field in an IDT entry is a 3 bit field, with 5 adjacent reserved bits
which we always write as zero.  By expressing this as a byte field in a union,
we turn an invocation of enable_each_ist() from

  4b 8b 14 d3                     mov    (%r11,%r10,8),%rdx
  48 b8 ff ff ff ff f8 ff ff ff   movabs $0xfffffff8ffffffff,%rax
  48 be 00 00 00 00 01 00 00 00   movabs $0x100000000,%rsi
  48 8b 8a 80 00 00 00            mov    0x80(%rdx),%rcx
  48 21 c1                        and    %rax,%rcx
  48 09 f1                        or     %rsi,%rcx
  48 be 00 00 00 00 02 00 00 00   movabs $0x200000000,%rsi
  48 89 8a 80 00 00 00            mov    %rcx,0x80(%rdx)
  48 8b 4a 20                     mov    0x20(%rdx),%rcx
  48 21 c1                        and    %rax,%rcx
  48 23 82 20 01 00 00            and    0x120(%rdx),%rax
  48 09 f1                        or     %rsi,%rcx
  48 89 4a 20                     mov    %rcx,0x20(%rdx)
  48 b9 00 00 00 00 03 00 00 00   movabs $0x300000000,%rcx
  48 09 c8                        or     %rcx,%rax
  48 89 82 20 01 00 00            mov    %rax,0x120(%rdx)

into

  4b 8b 04 d3                     mov    (%r11,%r10,8),%rax
  c6 80 84 00 00 00 01            movb   $0x1,0x84(%rax)
  c6 40 24 02                     movb   $0x2,0x24(%rax)
  c6 80 24 01 00 00 03            movb   $0x3,0x124(%rax)

which is far more simple.  As the IDT is typically live, this is more
obviously safe.
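
The difference between the two forms can be sketched in C as below. The
field layout follows the architectural 64-bit gate descriptor (IST in bits
34:32 of the low quadword, i.e. byte 4 on little-endian x86); the type and
function names are assumptions and not the declarations used in Xen.

```c
#include <stdint.h>

/* Hypothetical view of a 16-byte IDT entry (C11 anonymous members). */
typedef union {
    struct { uint64_t a, b; };
    struct {
        uint16_t addr0;
        uint16_t cs;
        uint8_t  ist;        /* 3-bit IST plus 5 reserved zero bits */
        uint8_t  type_attr;
        uint16_t addr1;
        uint32_t addr2;
        uint32_t rsvd;
    };
} idt_entry_sketch;

/* Old style: 64-bit read, mask, or, write back. */
static void set_ist_rmw(idt_entry_sketch *e, unsigned int ist)
{
    e->a = (e->a & ~(UINT64_C(7) << 32)) | ((uint64_t)ist << 32);
}

/* New style: a single byte store. */
static void set_ist_byte(idt_entry_sketch *e, uint8_t ist)
{
    e->ist = ist;
}
```

Both functions leave the entry in the same state as long as the reserved
bits are zero, but the byte store never rewrites the rest of the entry.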

The net delta for this change is:

  add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-334 (-334)

While making changes here, tidy up the set_ist() declaration.  Drop the
always_inline (I don't recall why I wrote it like that originally) and the ist
parameter need not be unsigned long (although it will be const-propagated in
practice).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/cpuidle: don't init stats lock more than once
Jan Beulich [Fri, 27 Apr 2018 12:35:35 +0000 (14:35 +0200)]
x86/cpuidle: don't init stats lock more than once

Osstest flight 122363, having hit an NMI watchdog timeout, shows CPU1 at

Xen call trace:
   [<ffff82d08023d3f4>] _spin_lock+0x30/0x57
   [<ffff82d0802d9346>] update_last_cx_stat+0x29/0x42
   [<ffff82d0802d96f3>] cpu_idle.c#acpi_processor_idle+0x2ff/0x596
   [<ffff82d080276713>] domain.c#idle_loop+0xa8/0xc3

and CPU0 at

Xen call trace:
   [<ffff82d08023d173>] on_selected_cpus+0xb7/0xde
   [<ffff82d0802dbe22>] powernow.c#powernow_cpufreq_target+0x110/0x1cb
   [<ffff82d080257973>] __cpufreq_driver_target+0x43/0xa6
   [<ffff82d080256b0d>] cpufreq_governor_dbs+0x324/0x37a
   [<ffff82d080257bf2>] __cpufreq_set_policy+0xfa/0x19d
   [<ffff82d080256044>] cpufreq_add_cpu+0x3a1/0x5df
   [<ffff82d0802dbab4>] cpufreq_cpu_init+0x17/0x1a
   [<ffff82d0802567a8>] set_px_pminfo+0x2b6/0x2f7
   [<ffff82d08029f1bf>] do_platform_op+0xe75/0x1977
   [<ffff82d0803712c5>] pv_hypercall+0x1f4/0x440
   [<ffff82d0803784a5>] lstar_enter+0x115/0x120

That is, Dom0's ACPI processor driver is in the process of uploading Px
and Cx data. Looking at the ticket lock state in CPU1's registers, it is
waiting for ticket 0x0000 to have its turn, while the supposed current
owner's ticket is 0x0001, which is an invalid state (and neither of the
other two CPUs holds the lock anyway). Hence I can only conclude that
cpuidle_init_cpu(1) ran on CPU 0 while some other CPU held the lock (the
unlock then put the lock in the state that CPU1 is observing).
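
The suspected sequence can be replayed with a toy ticket lock. This is a
single-threaded, non-atomic model with invented names, meant only to show
how a second initialisation while the lock is held produces the observed
head/tail values.

```c
#include <stdint.h>

/* head: ticket currently being served; tail: next ticket to hand out. */
struct ticket_lock {
    uint16_t head, tail;
};

static void tl_init(struct ticket_lock *l)
{
    l->head = l->tail = 0;
}

static uint16_t tl_take_ticket(struct ticket_lock *l)
{
    return l->tail++;
}

static int tl_may_enter(const struct ticket_lock *l, uint16_t ticket)
{
    return l->head == ticket;
}

static void tl_unlock(struct ticket_lock *l)
{
    l->head++;
}

/* Returns 1 if the waiter ends up stuck as in the report above. */
static int double_init_deadlocks(void)
{
    struct ticket_lock l;
    uint16_t waiter;

    tl_init(&l);
    (void)tl_take_ticket(&l);        /* owner takes ticket 0 and enters */
    tl_init(&l);                     /* erroneous re-init while held */
    tl_unlock(&l);                   /* head becomes 1, tail is still 0 */
    waiter = tl_take_ticket(&l);     /* waiter is handed ticket 0 */

    /* Nobody holds the lock, so head will not move back to 0. */
    return waiter == 0 && l.head == 1 && !tl_may_enter(&l, waiter);
}
```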

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodoc: escape underscores in xen-command-line.markdown 4.11.0-rc2
Juergen Gross [Thu, 26 Apr 2018 16:06:00 +0000 (18:06 +0200)]
doc: escape underscores in xen-command-line.markdown

Some underscores are not escaped in xen-command-line.markdown.
Correct that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodoc: sort entries of boot parameters correctly
Juergen Gross [Thu, 26 Apr 2018 16:03:43 +0000 (18:03 +0200)]
doc: sort entries of boot parameters correctly

Some of the boot parameters in docs/misc/xen-command-line.markdown are
not in the correct alphabetical order. Correct that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86emul: adjust handling of AVX2 gathers
Jan Beulich [Thu, 26 Apr 2018 16:02:37 +0000 (18:02 +0200)]
x86emul: adjust handling of AVX2 gathers

HVM's MMIO cache only has a capacity of three entries. Once running out
of entries, hvmemul_linear_mmio_access() will return
X86EMUL_UNHANDLEABLE. Since gathers are an iterative process anyway,
simply commit the portion of work done in this and hypothetical similar
cases, exiting back to guest context for the insn to be retried.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/process/release-checklist.txt
Ian Jackson [Thu, 26 Apr 2018 10:49:27 +0000 (11:49 +0100)]
docs/process/release-checklist.txt

Clarify what is expected of the release technician for SUPPORT.md: fix
the version number field.  The support dates will be set by the
release manager.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/process/release-technician-checklist.txt: Rename
Ian Jackson [Thu, 26 Apr 2018 10:51:01 +0000 (11:51 +0100)]
docs/process/release-technician-checklist.txt: Rename

This contains instructions and shell runes for the Release Technician,
who is the person doing the technical admin to construct the release,
fork branches, make tags and tarballs, etc.

Rename it to make this clearer.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/support-matrix-generate: use `git log' not `git-log'
Ian Jackson [Wed, 25 Apr 2018 15:46:01 +0000 (16:46 +0100)]
docs/support-matrix-generate: use `git log' not `git-log'

I found this bug when trying to set up the cron job.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/parse-support-md: Do caveats properly (!)
Ian Jackson [Wed, 25 Apr 2018 12:48:27 +0000 (13:48 +0100)]
docs/parse-support-md: Do caveats properly (!)

Each document has its own objects in @insections.  Only the first
RealSect encountered ends up in the main $toplevel_sectlist tree.

This means that trying to unify the Caveats information for all
versions in the RealSect (the $insection) does not work.  The caveats
for all versions that aren't the first one where this section was seen
end up in the @insections array during parsing of that version, but
not recorded in the main tree.

The result was that footnotes would only appear in the output for
versions which were the most recent version where that feature row or
category appeared.  The other footnotes would be lost.

Instead, store HasCaveat in the sectnode.  That means ri_Para needs to
find the sectnode.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/parse-support-md: Break out find_current_sectnode
Ian Jackson [Wed, 25 Apr 2018 12:47:31 +0000 (13:47 +0100)]
docs/parse-support-md: Break out find_current_sectnode

We are going to want to add a call site for this.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86: fix slow int80 path after XPTI additions
Jan Beulich [Wed, 25 Apr 2018 12:39:41 +0000 (14:39 +0200)]
x86: fix slow int80 path after XPTI additions

For the int80 slow path to jump to handle_exception_saved, %r14 needs to
be set up suitably for XPTI purposes. This is because of the difference
in nature between the int80 path (which is synchronous WRT guest
actions) and the exception path which is potentially asynchronous.

This is XSA-259.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agolibxl: Specify format of inserted cdrom
Anthony PERARD [Thu, 8 Mar 2018 18:16:41 +0000 (18:16 +0000)]
libxl: Specify format of inserted cdrom

Without this extra parameter on the QMP command, QEMU will guess the
format of the new file.

This is XSA-258.

Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agox86/SVM: Fix intercepted {RD,WR}MSR for the SYS{CALL,ENTER} MSRs
Andrew Cooper [Tue, 24 Apr 2018 16:08:18 +0000 (17:08 +0100)]
x86/SVM: Fix intercepted {RD,WR}MSR for the SYS{CALL,ENTER} MSRs

By default, the SYSCALL MSRs are not intercepted, and accesses are completed
by hardware.  The SYSENTER MSRs are intercepted for cross-vendor
purposes (albeit needlessly in the common case), and are fully emulated.

However, {RD,WR}MSR instructions which happen to be emulated (FEP,
introspection, or older versions of Xen which intercepted #UD), or when the
MSRs are explicitly intercepted (introspection), will be completed
incorrectly.

svm_msr_read_intercept() appears to return the correct values, but only
because of the default read-everything case (which is going to disappear), and
that in vcpu context, hardware should have the guest values in context.
Update the read path to explicitly sync the VMCB and complete the accesses,
rather than falling all the way through to the default case.

svm_msr_write_intercept() silently discards all updates.  Synchronise the VMCB
for all applicable MSRs, and implement suitable checks.  The actual behaviour
of AMD hardware is to truncate the SYSENTER and SFMASK MSRs at 32 bits, but
this isn't implemented yet to remain compatible with the cross-vendor case.

Drop one bit of trailing whitespace while modifying this area of the code.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/spec_ctrl: Fix typo in ARCH_CAPS decode
Andrew Cooper [Tue, 24 Apr 2018 09:44:02 +0000 (10:44 +0100)]
x86/spec_ctrl: Fix typo in ARCH_CAPS decode

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoxpti: fix bug in double fault handling
Juergen Gross [Mon, 23 Apr 2018 14:13:01 +0000 (16:13 +0200)]
xpti: fix bug in double fault handling

When entering the hypervisor via the double fault handler, resetting
xen_cr3 was missing. This led to switching to pv_cr3 when returning
from the next exception, so repair this in order to allow
exception handling to work even after a double fault.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: correct assertion in destroy_perdomain_mapping()
Jan Beulich [Mon, 23 Apr 2018 14:12:01 +0000 (16:12 +0200)]
x86: correct assertion in destroy_perdomain_mapping()

hvm_domain_initialise() may call this with nr being zero, which triggers
the "does not cross L3 boundary" check.
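
One plausible way for such a check to misfire is sketched below, assuming it
compares the L3 slot of the first and the last byte of the range. The exact
form of the assertion is not given in this message, so the helper names and
the guard are illustrative only.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define L3_SHIFT   30                /* one L3 entry maps 1 GiB */

/* Does [va, va + nr pages) span more than one L3 entry? */
static int crosses_l3(uint64_t va, unsigned int nr)
{
    /* With nr == 0 this is va - 1, which lies before the range. */
    uint64_t last = va + ((uint64_t)nr << PAGE_SHIFT) - 1;

    return (va >> L3_SHIFT) != (last >> L3_SHIFT);
}

/* Guarded variant: an empty range can never cross a boundary. */
static int range_ok(uint64_t va, unsigned int nr)
{
    return !nr || !crosses_l3(va, nr);
}
```

For a 1 GiB aligned va and nr of zero, crosses_l3() reports a crossing even
though no page is touched, while range_ok() accepts the call.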

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agolibxl: fix memory map reported to PVH guests
Roger Pau Monne [Fri, 20 Apr 2018 14:57:19 +0000 (15:57 +0100)]
libxl: fix memory map reported to PVH guests

PVH guests with 4GB of RAM or more get a memory map like the
following:

0x00000000000000 - 0x000000fee00000 RAM
0x000000fee00000 - 0x00000100000000 RESERVED
0x000000fc009000 - 0x000000fc009040 ACPI
0x000000fc000000 - 0x000000fc001000 ACPI
0x000000fc001000 - 0x000000fc009000 ACPI
0x00000100000000 - 0x000001fb200400 RAM

This is wrong because ACPI regions overlap with RAM regions. The cause
of this issue is not setting a big enough MMIO hole and marking the
whole MMIO hole as reserved, when it actually contains several pieces:

 - local APIC page.
 - ACPI tables.
 - HVM special pages.

Of those items only HVM special pages need to be marked as reserved in
order to advise the guest against using them for example for memory
hotplug.

After the fix the layout reported for the same guest is:

0x00000000000000 - 0x000000fc000000 RAM
0x000000feff8000 - 0x000000ff000000 RESERVED
0x000000fc009000 - 0x000000fc009040 ACPI
0x000000fc000000 - 0x000000fc001000 ACPI
0x000000fc001000 - 0x000000fc009000 ACPI
0x00000100000000 - 0x000001fe000400 RAM
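
The overlap can be checked mechanically. The sketch below encodes the two
maps shown above as half-open [start, end) ranges (an assumption about how
the listing is to be read) and tests every pair; the struct and function
names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint64_t start, end;             /* half-open: [start, end) */
    const char *type;
};

static int regions_overlap(const struct region *a, const struct region *b)
{
    return a->start < b->end && b->start < a->end;
}

/* Returns 1 if any two entries of the map overlap. */
static int map_has_overlap(const struct region *map, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
        for ( size_t j = i + 1; j < n; j++ )
            if ( regions_overlap(&map[i], &map[j]) )
                return 1;

    return 0;
}

static const struct region before_fix[] = {
    { 0x00000000000000, 0x000000fee00000, "RAM" },
    { 0x000000fee00000, 0x00000100000000, "RESERVED" },
    { 0x000000fc009000, 0x000000fc009040, "ACPI" },
    { 0x000000fc000000, 0x000000fc001000, "ACPI" },
    { 0x000000fc001000, 0x000000fc009000, "ACPI" },
    { 0x00000100000000, 0x000001fb200400, "RAM" },
};

static const struct region after_fix[] = {
    { 0x00000000000000, 0x000000fc000000, "RAM" },
    { 0x000000feff8000, 0x000000ff000000, "RESERVED" },
    { 0x000000fc009000, 0x000000fc009040, "ACPI" },
    { 0x000000fc000000, 0x000000fc001000, "ACPI" },
    { 0x000000fc001000, 0x000000fc009000, "ACPI" },
    { 0x00000100000000, 0x000001fe000400, "RAM" },
};
```

The first map is flagged because the low RAM range extends past the ACPI
regions at 0xfc000000; the second map has no overlapping pair.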

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoshim: don't let build modify shim.config
Juergen Gross [Fri, 20 Apr 2018 15:47:55 +0000 (17:47 +0200)]
shim: don't let build modify shim.config

Currently building the shim will modify shim.config in case some config
option was added or modified in the hypervisor.

Avoid that by copying shim.config to an intermediate file instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agodocs/parse-support-md: Correctly handle footnotes for non-leaf sections
Ian Jackson [Tue, 17 Apr 2018 13:34:36 +0000 (14:34 +0100)]
docs/parse-support-md: Correctly handle footnotes for non-leaf sections

Non-leaf sections with footnotes must have a row of their own, for
just that section, because footnotes only appear if there is status
information.

In that case, the footnote applies to only the rows for that section
in the markdown document, ie that RealSect.

And of course for a leaf section that is true too.

So for footnotes we always want to use a rowspan of the number of
Status elements in the section.  So (i) calculate this in
count_rows_sectlist and (ii) use it, instead of the total number of
rows including all the subsections', when writing out the footnote
ref.

This bug has been present in this script since the beginning.

Also, while we're here, suppress the rowspan if it would be 1.

Reported-by: Lars Kurth <lars.kurth@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/parse-support.md: Add some newlines to the table output
Ian Jackson [Tue, 17 Apr 2018 14:24:10 +0000 (15:24 +0100)]
docs/parse-support.md: Add some newlines to the table output

This makes the result easier for humans to read.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoSUPPORT.md: Document the new text ordering rule
Ian Jackson [Thu, 12 Apr 2018 18:22:16 +0000 (19:22 +0100)]
SUPPORT.md: Document the new text ordering rule

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoSUPPORT.md: Move descriptions up before Status info
Ian Jackson [Thu, 12 Apr 2018 16:32:32 +0000 (17:32 +0100)]
SUPPORT.md: Move descriptions up before Status info

This turns all the things which were treated as caveats, but which
don't need to be footnoted in the matrix, into descriptions.

For the benefit of the support matrix generator, this patch (or a
version of it) should be backported to 4.10.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoSUPPORT.md, support matrix: Treat commentary before status as description
Ian Jackson [Thu, 12 Apr 2018 16:57:58 +0000 (17:57 +0100)]
SUPPORT.md, support matrix: Treat commentary before status as description

Running text in feature sections in the markdown document currently
might be (i) a caveat, qualifying or clarifying the support statement
(ii) a plain description of the feature.

Caveats can be version-specific and deserve the [*] annotation in the
relevant feature matrix cell.  They must link to SUPPORT.html for the
specific version.

Descriptions are not version specific.  In that case the [*]
annotation is visual noise.  Rather, it is better to make a hyperlink
out of the text which is being expanded on.  The hyperlink can point
to any appropriate version.

There is a question about how to notate this distinction in
SUPPORT.md.  After IRL discussion with George and Lars I propose that
we should put text which helps describe a feature (ie, which expands
on a section heading) after the heading but before the Status
indications; whereas, caveats and supplementary information about
the actual status, should follow the Status block.

This patch implements this distinction in the support matrix
generator.  Only paragraphs containing _only_ italic content count as
descriptive; anything else is treated as a caveat.

In the code:

 * Add a new entry to RealSect, HasDescription

 * When parsing, track whether we are before or after the first Status
   block in a new variable $has_feature.

 * In ri_Para, set HasDescription to the input document index
   when we encounter text before the first feature.

 * When writing a `heading' (ie, the table cell for a feature name)
   look for HasDescription and make an appropriate hyperlink.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/parse-support-md: internals: Rename HasText to HasCaveat
Ian Jackson [Thu, 12 Apr 2018 16:57:43 +0000 (17:57 +0100)]
docs/parse-support-md: internals: Rename HasText to HasCaveat

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agodocs/parse-support-md: internals: Introduce docref_a
Ian Jackson [Thu, 12 Apr 2018 17:06:45 +0000 (18:06 +0100)]
docs/parse-support-md: internals: Introduce docref_a

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/HVM: never retain emulated insn cache when exiting back to guest
Jan Beulich [Mon, 23 Apr 2018 09:01:09 +0000 (11:01 +0200)]
x86/HVM: never retain emulated insn cache when exiting back to guest

Commit 5fcb26e69e ("x86/HVM: don't retain emulated insn cache when
exiting back to guest") didn't go quite far enough: The insn emulator
may itself decide to return X86EMUL_RETRY (currently for certain
CMPXCHG failures and AVX2 gather insns), in which case we'd also exit
back to guest context. Tie the caching to whether we have an I/O
completion pending, instead of x86_emulate()'s return value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/HPET: fix race triggering ASSERT(cpu < nr_cpu_ids)
David Wang [Mon, 23 Apr 2018 09:00:07 +0000 (11:00 +0200)]
x86/HPET: fix race triggering ASSERT(cpu < nr_cpu_ids)

CPUs may share an in-use channel. Hence clearing of a bit from the
cpumask (in hpet_broadcast_exit()) as well as setting one (in
hpet_broadcast_enter()) must not race evaluation of that same cpumask.
Therefore avoid evaluating the cpumask twice in hpet_detach_channel().
Otherwise cpumask_empty() may e.g. return false while the subsequent
cpumask_first() could return nr_cpu_ids, which then triggers the
assertion in cpumask_of() reached through set_channel_irq_affinity().
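
The shape of the fix can be illustrated with a minimal sketch. It uses a
plain atomic 64-bit word as a stand-in for Xen's cpumask; the names
NR_CPU_IDS_SKETCH, mask_first() and pick_next_owner() are hypothetical and
this is not the actual hpet_detach_channel() code.  The point is only that
the shared mask is read once and every decision is taken on that one result.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPU_IDS_SKETCH 64u  /* hypothetical stand-in for nr_cpu_ids */

/*
 * Index of the lowest set bit, or NR_CPU_IDS_SKETCH if the mask is empty
 * (the same "empty yields nr_cpu_ids" convention the message describes).
 */
unsigned int mask_first(uint64_t mask)
{
    unsigned int cpu;

    for ( cpu = 0; cpu < NR_CPU_IDS_SKETCH; cpu++ )
        if ( mask & (UINT64_C(1) << cpu) )
            return cpu;

    return NR_CPU_IDS_SKETCH;
}

/*
 * Racy shape (not shown): test the mask for emptiness, then read it a
 * second time to find the first CPU.  Another CPU may clear its bit in
 * between, so the second read can come back empty.
 *
 * Safe shape: take one snapshot and range-check the result.
 * Returns the next owner CPU, or -1 if no CPU is left.
 */
int pick_next_owner(_Atomic uint64_t *shared_mask)
{
    unsigned int cpu = mask_first(atomic_load(shared_mask));

    if ( cpu >= NR_CPU_IDS_SKETCH )
        return -1;   /* nobody left; a caller would release the channel */

    return (int)cpu; /* always a valid CPU index here */
}
```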

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/spec_ctrl: Updates to retpoline-safety decision making
Andrew Cooper [Tue, 17 Apr 2018 12:48:01 +0000 (12:48 +0000)]
x86/spec_ctrl: Updates to retpoline-safety decision making

All of this is as recommended by the Intel whitepaper:

https://software.intel.com/sites/default/files/managed/1d/46/Retpoline-A-Branch-Target-Injection-Mitigation.pdf

The 'RSB Alternative' bit in MSR_ARCH_CAPABILITIES may be set by a hypervisor
to indicate that the virtual machine may migrate to a processor which isn't
retpoline-safe.  Introduce a shortened name (to reduce code volume), treat it
as authoritative in retpoline_safe(), and print its value along with the other
ARCH_CAPS bits.

The exact processor models which do have RSB semantics which fall back to BTB
predictions are enumerated, and include Kabylake and Coffeelake.  Leave a
printk() in the default case to help identify cases which aren't covered.

The exact microcode versions for Broadwell RSB-safety are taken from the
referenced microcode update file (adjusting for the known-bad microcode
versions).  Despite the exact wording of the text, it is only Broadwell
processors which need a microcode check.

In practice, this means that all Broadwell hardware with up-to-date microcode
will use retpoline in preference to IBRS, which will be a performance
improvement for desktop and server systems which would previously always opt
for IBRS over retpoline.
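
As a rough illustration of the decision order described above, here is a
minimal sketch.  It is not the real retpoline_safe(): the enum cpu_kind
classification, the function name and the min_safe_rev parameter are
assumptions made for the example (the real code switches on processor model
numbers and carries its own microcode table), and returning "not safe" for
an unrecognised model is likewise an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* 'RSB Alternative' bit in MSR_ARCH_CAPABILITIES. */
#define ARCH_CAPS_RSBA (UINT64_C(1) << 2)

/* Hypothetical coarse classification, standing in for a model switch. */
enum cpu_kind {
    CPU_RSB_ONLY,        /* RSB underflow does not fall back to the BTB */
    CPU_BROADWELL,       /* safe only with new enough microcode */
    CPU_RSB_FALLS_BACK,  /* e.g. Kabylake, Coffeelake */
    CPU_UNKNOWN,
};

bool retpoline_safe_sketch(uint64_t arch_caps, enum cpu_kind kind,
                           uint32_t ucode_rev, uint32_t min_safe_rev)
{
    /*
     * Treated as authoritative: a hypervisor sets this to say the VM may
     * migrate to a processor which isn't retpoline-safe.
     */
    if ( arch_caps & ARCH_CAPS_RSBA )
        return false;

    switch ( kind )
    {
    case CPU_RSB_ONLY:
        return true;

    case CPU_BROADWELL:
        /* Only Broadwell needs a microcode check. */
        return ucode_rev >= min_safe_rev;

    case CPU_RSB_FALLS_BACK:
        return false;

    default:
        /* The real code leaves a printk() here to flag uncovered models. */
        return false;
    }
}
```

The adjustment for known-bad microcode versions mentioned above is not
modelled; a caller would fold it into the threshold it passes in.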

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agolibs/gnttab: fix FreeBSD gntdev interface
Roger Pau Monne [Tue, 17 Apr 2018 13:03:41 +0000 (14:03 +0100)]
libs/gnttab: fix FreeBSD gntdev interface

The current interface to the gntdev in FreeBSD is wrong, and worked mostly
by luck before the PTI FreeBSD fixes, when kernel and user-space
were sharing the same page tables.

On FreeBSD ioctls have the size of the passed struct encoded in the ioctl
number, because the generic ioctl handler in the OS takes care of
copying the data from user-space to kernel space, and then calls the
device specific ioctl handler. Thus using ioctl structs with variable
sizes is not possible.

The fix is to turn the array of structs at the end of
ioctl_gntdev_alloc_gref and ioctl_gntdev_map_grant_ref into pointers,
that can be properly accessed from the kernel gntdev driver using the
copyin/copyout functions. Note that this is exactly how it's done for
the privcmd driver.
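
The difference between the two layouts can be sketched as follows.  The
struct and field names here are hypothetical and do not reproduce the real
ioctl_gntdev_alloc_gref or ioctl_gntdev_map_grant_ref definitions; the
sketch only shows why a trailing array cannot work when the argument size
is fixed at compile time, and what the pointer form looks like instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entry payload. */
struct ref_entry_sketch {
    uint64_t index;
    uint32_t gref;
};

/*
 * Old shape: entries trail the header.  sizeof() covers only the header,
 * yet the data the kernel needs grows with 'count', so a size encoded in
 * the ioctl number can never describe the whole argument.
 */
struct alloc_args_old_sketch {
    uint16_t domid;
    uint32_t count;
    struct ref_entry_sketch entries[];  /* variable length */
};

/*
 * New shape: fixed-size struct holding a pointer.  The generic handler
 * copies exactly sizeof() bytes; the driver then fetches the entries
 * itself (copyin/copyout in the kernel).
 */
struct alloc_args_new_sketch {
    uint16_t domid;
    uint32_t count;
    struct ref_entry_sketch *entries;
};

/* Bytes that would really have to be copied for the old layout. */
size_t old_layout_bytes(uint32_t count)
{
    return sizeof(struct alloc_args_old_sketch) +
           (size_t)count * sizeof(struct ref_entry_sketch);
}
```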

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: Use spec_ctrl_{enter,exit}_idle() in the S3/S5 path
Andrew Cooper [Tue, 17 Apr 2018 17:43:49 +0000 (18:43 +0100)]
x86: Use spec_ctrl_{enter,exit}_idle() in the S3/S5 path

The main purpose of this patch is to avoid opencoding the recovery logic at
the end, but it also has the positive side effect of relaxing the SPEC_CTRL
mitigations when working to shut the final CPU down.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/msr: further correct the emulation behaviour of MSR_PRED_CMD
Jan Beulich [Wed, 18 Apr 2018 09:16:37 +0000 (11:16 +0200)]
x86/msr: further correct the emulation behaviour of MSR_PRED_CMD

Following commit a6aa678fa3 ("x86/msr: Correct the emulation behaviour
of MSR_PRED_CMD") we may end up writing the low bit with the wrong
value. While it's unlikely for a guest to want to write zero there, we
should still permit this (without incurring the overhead of an actual
barrier). Correcting this right away will also help whenever further
bits in the MSR might become defined.
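
The intended behaviour can be sketched like this.  It is a minimal
model, not the guest_wrmsr() code: the function name, the result enum and
the barrier_issued out-parameter are made up for the example, and the
actual hardware MSR write is replaced by setting a flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRED_CMD_IBPB (UINT64_C(1) << 0)  /* low bit of MSR_PRED_CMD */

enum wrmsr_result { WRMSR_OKAY, WRMSR_GP_FAULT };

/*
 * *barrier_issued reports whether a barrier would have been sent to
 * hardware for this write.
 */
enum wrmsr_result pred_cmd_write_sketch(uint64_t val, bool *barrier_issued)
{
    *barrier_issued = false;

    /* Any set reserved bit faults, matching production microcode. */
    if ( val & ~PRED_CMD_IBPB )
        return WRMSR_GP_FAULT;

    /*
     * Writing zero is permitted, but must not cost a barrier: only act
     * when the guest actually set the bit.
     */
    if ( val & PRED_CMD_IBPB )
        *barrier_issued = true;  /* real code would write the MSR here */

    return WRMSR_OKAY;
}
```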

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agoMerge remote-tracking branch 'origin/staging' into staging
Ian Jackson [Tue, 17 Apr 2018 17:28:11 +0000 (18:28 +0100)]
Merge remote-tracking branch 'origin/staging' into staging

7 years agoREADME, Xen/Makefile: Xen 4.11 is -RC now 4.11.0-rc1
Ian Jackson [Tue, 17 Apr 2018 16:22:49 +0000 (17:22 +0100)]
README, Xen/Makefile: Xen 4.11 is -RC now

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoConfig.mk: Switch external trees to fixed tags for Xen 4.11.0-rc1
Ian Jackson [Tue, 17 Apr 2018 16:19:33 +0000 (17:19 +0100)]
Config.mk: Switch external trees to fixed tags for Xen 4.11.0-rc1

The minios tag `xen-4.11.0-rc1' was mistakenly made on the wrong
revision.  So we have burned that tag and use xen-4.11.0-rc1.1
instead.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agomktarball: For qemu upstream, use their scripts/archive-source.sh
Ian Jackson [Tue, 17 Apr 2018 16:53:01 +0000 (17:53 +0100)]
mktarball: For qemu upstream, use their scripts/archive-source.sh

qemu upstream uses git submodules.  git archive does not work with git
submodules (and could not work properly with them, because this is one
of the many things it is inherently impossible to do correctly with
git submodules).

qemu upstream have worked around this by providing a rather scary
shell script which attempts to do roughly the right thing.  It's close
enough that we can use it with only minor precautions.

Unfortunately this does mean that `mktarball' now executes the qemu
source code it was using, rather than merely shuffling it about, as it
did previously.  I think this is a less bad ill than copying (and,
effectively, forking) the scary script.

CC: Wei Liu <wei.liu2@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Juergen Gross <jgross@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years agox86/traps: Misc non-functional improvements to set_debugreg()
Andrew Cooper [Fri, 23 Mar 2018 20:26:34 +0000 (20:26 +0000)]
x86/traps: Misc non-functional improvements to set_debugreg()

 * Change 'int i' to being unsigned, and move it into its most narrow scope.
 * Fold the access_ok() checks for %dr{0..3}.  This halves the compiled size
   of the function.
 * Additional newlines in appropriate places.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/pv: Introduce and use x86emul_write_dr()
Andrew Cooper [Fri, 23 Mar 2018 20:26:34 +0000 (20:26 +0000)]
x86/pv: Introduce and use x86emul_write_dr()

set_debugreg() has several bugs:

 * %dr4/5 should function correctly as aliases of %dr6/7 when CR4.DE is clear.
 * Attempting to set the upper 32 bits of %dr6/7 should fail with #GP[0]
   rather than be silently corrected and complete.
 * For emulation, the #UD and #GP[0] cases need properly distinguishing.  Use
   -ENODEV for #UD cases, leaving -EINVAL (bad bits) and -EPERM (not allowed to
   use that valid bit) as before for hypercall callers.
 * A write which clears %dr7.L/G leaves the IO shadow intact, meaning that
   subsequent reads of %dr7 will see stale IO watchpoint configuration.

Implement x86emul_write_dr() as a thin wrapper around set_debugreg().
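
A minimal sketch of the wrapper idea, under stated assumptions:
set_dr_sketch() is a hypothetical stand-in that models only the %dr4/5
aliasing and the upper-32-bit check listed above (not the -EPERM or IO
shadow handling), register numbers are assumed to be in the range 0..7,
and the result enum is invented for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum emul_result { EMUL_OKAY, EMUL_UD, EMUL_GP0 };

/* Hypothetical stand-in for the hypercall-style setter. */
int set_dr_sketch(unsigned int reg, uint64_t val, bool cr4_de)
{
    if ( reg == 4 || reg == 5 )
    {
        if ( cr4_de )
            return -ENODEV;  /* no aliasing with CR4.DE set: #UD case */
        reg += 2;            /* %dr4/5 alias %dr6/7 when CR4.DE is clear */
    }

    /* Upper 32 bits of %dr6/7 must be rejected, not silently corrected. */
    if ( (reg == 6 || reg == 7) && (val >> 32) )
        return -EINVAL;

    return 0;
}

/* Thin wrapper: translate the errno-style result into an exception. */
enum emul_result write_dr_sketch(unsigned int reg, uint64_t val, bool cr4_de)
{
    switch ( set_dr_sketch(reg, val, cr4_de) )
    {
    case 0:
        return EMUL_OKAY;

    case -ENODEV:
        return EMUL_UD;

    default:
        return EMUL_GP0;
    }
}
```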

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/pv: Introduce and use x86emul_read_dr()
Andrew Cooper [Fri, 23 Mar 2018 20:13:50 +0000 (20:13 +0000)]
x86/pv: Introduce and use x86emul_read_dr()

do_get_debugreg() has several bugs:

 * The %cr4.de condition is inverted.  %dr4/5 should be accessible only when
   %cr4.de is disabled.
 * When %cr4.de is enabled, emulation should yield #UD rather than complete
   with zero.
 * Using -EINVAL for errors is a broken ABI, as it overlaps with valid values
   near the top of the address space.

Introduce a common x86emul_read_dr() handler (as we will eventually want to
add HVM support) which separates its success/failure indication from the data
value, and have do_get_debugreg() call into the handler.

The ABI of do_get_debugreg() remains broken, but switches from -EINVAL to
-ENODEV for compatibility with the changes in the following patch.
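
Why separating the status from the data matters can be shown with a small
sketch.  The function name and signature are hypothetical, register
numbers are assumed to be in the range 0..7, and the register file is just
an array; the point is that a debug register may legitimately hold a value
which, read as a signed number, looks like an error code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true on success and stores the value through 'val'.
 * Returns false where the emulator would raise #UD.
 */
bool read_dr_sketch(const uint64_t dr[8], unsigned int reg, bool cr4_de,
                    uint64_t *val)
{
    if ( reg == 4 || reg == 5 )
    {
        if ( cr4_de )
            return false;  /* %dr4/5 only accessible with CR4.DE clear */
        reg += 2;          /* alias of %dr6/7 */
    }

    *val = dr[reg];
    return true;
}
```

With this shape a breakpoint address near the top of the address space
comes back as ordinary data alongside a success indication, instead of
being mistaken for a failure.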

Take the opportunity to add a missing local variable block to x86_emulate.c

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86/msr: Correct the emulation behaviour of MSR_PRED_CMD
Andrew Cooper [Mon, 16 Apr 2018 10:56:00 +0000 (10:56 +0000)]
x86/msr: Correct the emulation behaviour of MSR_PRED_CMD

Experimentally, the behaviour of reserved bits in MSR_PRED_CMD changed between
beta and production microcode, and now raises a #GP fault for set reserved
bits.  The AMD spec for future hardware also specifies this behaviour, and it
is the more sensible behaviour to implement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agomm: fix emfn calculation in init_domheap_pages()
Oleksandr Tyshchenko [Mon, 16 Apr 2018 12:11:09 +0000 (14:11 +0200)]
mm: fix emfn calculation in init_domheap_pages()

The "end" address must be rounded down before shifting,
otherwise we will insert the wrong page range into the heap if the
address isn't page aligned.
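
A small worked example of the rounding, as a sketch: the _sketch helpers
and the 4KiB page size are assumptions for illustration, not the Xen
macros themselves.  The start of a byte range is rounded up and the end is
rounded down, so that only frames lying wholly inside the range are used.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12
#define PAGE_SIZE_SKETCH  (UINT64_C(1) << PAGE_SHIFT_SKETCH)

uint64_t round_pgup_sketch(uint64_t a)
{
    return (a + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}

uint64_t round_pgdown_sketch(uint64_t a)
{
    return a & ~(PAGE_SIZE_SKETCH - 1);
}

/* First frame wholly inside [ps, pe): round the start up. */
uint64_t start_mfn_sketch(uint64_t ps)
{
    return round_pgup_sketch(ps) >> PAGE_SHIFT_SKETCH;
}

/* One past the last frame wholly inside [ps, pe): round the end down. */
uint64_t end_mfn_sketch(uint64_t pe)
{
    return round_pgdown_sketch(pe) >> PAGE_SHIFT_SKETCH;
}
```

For the range 0x1800..0x3800 this gives frames [2, 3), i.e. only the page
at 0x2000.  Rounding the end up instead would give an end frame of 4 and
wrongly include the page at 0x3000, half of which lies outside the range.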

It seems that a copy-paste mistake took place in the following commit:
0c12972e34b20a26f2b42044b98bf12db7ed62b6
xen/mm: Switch some of page_alloc.c to typesafe MFN

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86: check feature flags after resume
Jan Beulich [Mon, 16 Apr 2018 12:10:33 +0000 (14:10 +0200)]
x86: check feature flags after resume

Make sure no previously present features are missing after resume (and
the re-loading of microcode), to avoid later crashes or (likely silent)
hangs / live locks. This doesn't go beyond checking x86_capability[],
but this should be good enough for the immediate need of making sure
that the BTI mitigation MSRs are still available.
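
The kind of check described can be sketched as a subset test over two
feature bitmaps.  This is a minimal illustration only: the function name
is hypothetical and plain uint32_t arrays stand in for x86_capability[]
as saved before suspend and as re-read after resume.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if every feature bit set before suspend is still set after resume.
 * Newly appearing bits are fine; only disappearing ones are a problem.
 */
bool features_retained(const uint32_t *before, const uint32_t *after,
                       size_t nwords)
{
    size_t i;

    for ( i = 0; i < nwords; i++ )
        if ( before[i] & ~after[i] )
            return false;

    return true;
}
```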

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
7 years agox86: suppress BTI mitigations around S3 suspend/resume
Jan Beulich [Mon, 16 Apr 2018 12:09:55 +0000 (14:09 +0200)]
x86: suppress BTI mitigations around S3 suspend/resume

NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
may become available only once we've reloaded microcode. Make
SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
the critical period of time.

Also set the MSR back to its intended value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>