]> xenbits.xensource.com Git - xen.git/log
xen.git
6 years agox86: correct default_xen_spec_ctrl calculation
Jan Beulich [Wed, 4 Jul 2018 10:26:24 +0000 (12:26 +0200)]
x86: correct default_xen_spec_ctrl calculation

Even with opt_msr_sc_{pv,hvm} both false we should set up the variable
as usual, to ensure proper one-time setup during boot and CPU bringup.
This then also brings the code in line with the comment immediately
ahead of the printk() being modified saying "irrespective of guests".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d6239f64713df819278bf048446d3187c6ac4734
master date: 2018-05-29 12:38:52 +0200

6 years agolibxc/x86/PV: don't hand through CPUID leaf 0x80000008 as is
Jan Beulich [Wed, 4 Jul 2018 10:25:40 +0000 (12:25 +0200)]
libxc/x86/PV: don't hand through CPUID leaf 0x80000008 as is

Just like for HVM the feature set should be used for EBX output, while
EAX should be restricted to the low 16 bits and ECX/EDX should be zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 849cc9ac56eff8a8d575ed9f484aad72f383862c
master date: 2018-05-29 10:51:02 +0100

6 years agox86: guard against #NM
Jan Beulich [Thu, 28 Jun 2018 07:56:01 +0000 (09:56 +0200)]
x86: guard against #NM

Just in case we still don't get CR0.TS handling right, prevent a host
crash by honoring exception fixups in do_device_not_available(). This
would in particular cover emulator stubs raising #NM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 00cebd6f22beb6d5fa65ed2d8d1ff9acf59bce61
master date: 2018-06-28 09:08:04 +0200

6 years agox86/HVM: don't cause #NM to be raised in Xen
Jan Beulich [Thu, 28 Jun 2018 07:55:14 +0000 (09:55 +0200)]
x86/HVM: don't cause #NM to be raised in Xen

The changes for XSA-267 did not touch management of CR0.TS for HVM
guests. In fully eager mode this bit should never be set when
respective vCPU-s are active, or else hvmemul_get_fpu() might leave it
wrongly set, leading to #NM in hypervisor context.

{svm,vmx}_enter() and {svm,vmx}_fpu_dirty_intercept() become unreachable
this way. Explicit {svm,vmx}_fpu_leave() invocations need to be guarded
now.

With no CR0.TS management necessary in fully eager mode, there's also no
need anymore to intercept #NM.

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 488efc29e4e996bb3805c982200f65061390cdce
master date: 2018-06-28 09:07:06 +0200

6 years agolibxl: restore passing "readonly=" to qemu for SCSI disks
Ian Jackson [Thu, 28 Jun 2018 07:48:04 +0000 (09:48 +0200)]
libxl: restore passing "readonly=" to qemu for SCSI disks

A read-only check was introduced for XSA-142, commit ef6cb76026 ("libxl:
relax readonly check introduced by XSA-142 fix") added the passing of
the extra setting, but commit dab0539568 ("Introduce COLO mode and
refactor relevant function") dropped the passing of the setting again,
quite likely due to improper re-basing.

Restore the readonly= parameter to SCSI disks.  For IDE disks this is
supposed to be rejected; add an assert.  And there is a bare ad-hoc
disk drive string in libxl__build_device_model_args_new, which we also
update.

This is XSA-266.

Reported-by: Andrew Reimers <andrew.reimers@orionvm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
master commit: dd64d3c41a2d15139c3a35d22d4cb6b78f4c5c59
master date: 2018-06-28 09:05:06 +0200

6 years agolibxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
Ian Jackson [Thu, 28 Jun 2018 07:47:41 +0000 (09:47 +0200)]
libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config

The generated configurations are identical apart from, in some cases,
reordering of the id=%s element.  So, overall, no functional change.

This is part of XSA-266.

Reported-by: Andrew Reimers <andrew.reimers@orionvm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
master commit: 724e5aa31b58d1e430ad36b484cf0ec021497399
master date: 2018-06-28 09:04:55 +0200

6 years agox86: Refine checks in #DB handler for faulting conditions
Andrew Cooper [Thu, 28 Jun 2018 07:46:54 +0000 (09:46 +0200)]
x86: Refine checks in #DB handler for faulting conditions

One of the fix for XSA-260 (c/s 75d6828bc2 "x86/traps: Fix handling of #DB
exceptions in hypervisor context") added some safety checks to help avoid
livelocks of #DB faults.

While a General Detect #DB exception does have fault semantics, hardware
clears %dr7.gd on entry to the handler, meaning that it is actually safe to
return to.  Furthermore, %dr6.gd is guest controlled and sticky (never cleared
by hardware).  A malicious PV guest can therefore trigger the fatal_trap() and
crash Xen.

Instruction breakpoints are more tricky.  The breakpoint match bits in %dr6
are not sticky, but the Intel manual warns that they may be set for
non-enabled breakpoints, so add a breakpoint enabled check.

Beyond that, because of the restriction on the linear addresses PV guests can
set, and the fault (rather than trap) nature of instruction breakpoints
(i.e. can't be deferred by a MovSS shadow), there should be no way to
encounter an instruction breakpoint in Xen context.  However, for extra
robustness, deal with this situation by clearing the breakpoint configuration,
rather than crashing.

This is XSA-265

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 17bf51297220dcd74da29de99320b6b1c72d1fa5
master date: 2018-06-28 09:04:20 +0200

6 years agox86/mm: don't bypass preemption checks
Jan Beulich [Thu, 28 Jun 2018 07:45:42 +0000 (09:45 +0200)]
x86/mm: don't bypass preemption checks

While unlikely, it is not impossible for a multi-vCPU guest to leverage
bypasses of preemption checks to drive Xen into an unbounded loop.

This is XSA-264.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 17608703c65bf080b0a9f024f9b370872b9f2c05
master date: 2018-06-28 09:03:09 +0200

6 years agox86/HVM: account for fully eager FPU mode in emulation
Jan Beulich [Fri, 15 Jun 2018 11:43:43 +0000 (13:43 +0200)]
x86/HVM: account for fully eager FPU mode in emulation

In fully eager mode we must not clear fpu_dirtied, set CR0.TS, or invoke
the fpu_leave() hook. Instead do what the mode's name says: Restore
state right away.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e23d2234e08872ac1c719f3e338994581483440f
master date: 2018-06-15 11:49:06 +0200

6 years agox86/spec-ctrl: Mitigations for LazyFPU
Andrew Cooper [Thu, 7 Jun 2018 16:00:37 +0000 (17:00 +0100)]
x86/spec-ctrl: Mitigations for LazyFPU

Intel Core processors since at least Nehalem speculate past #NM, which is the
mechanism by which lazy FPU context switching is implemented.

On affected processors, Xen must use fully eager FPU context switching to
prevent guests from being able to read FPU state (SSE/AVX/etc) from previously
scheduled vcpus.

This is part of XSA-267 / CVE-2018-3665

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 243435bf67e8159495194f623b9e4d8c90140384)

6 years agox86: Support fully eager FPU context switching
Andrew Cooper [Thu, 7 Jun 2018 16:00:37 +0000 (17:00 +0100)]
x86: Support fully eager FPU context switching

This is controlled on a per-vcpu bases for flexibility.

This is part of XSA-267 / CVE-2018-3665

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 146dfe9277c2b4a8c399b229e00d819065e3167b)

6 years agoxen/x86: use PCID feature
Juergen Gross [Thu, 26 Apr 2018 11:33:18 +0000 (13:33 +0200)]
xen/x86: use PCID feature

Avoid flushing the complete TLB when switching %cr3 for mitigation of
Meltdown by using the PCID feature if available.

We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
2 values for the non-XPTI case:

- guest active and in kernel mode
- guest active and in user mode
- hypervisor active and guest in user mode (XPTI only)
- hypervisor active and guest in kernel mode (XPTI only)

We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
we disable global pages in cr4. A command line parameter controls in
which cases PCID is being used.

As the non-XPTI case has shown not to perform better with PCID at least
on some machines the default is to use PCID only for domains subject to
XPTI.

With PCID enabled we always disable global pages. This avoids having to
either flush the complete TLB or do a cycle through all PCID values
when invalidating a single global page.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: add some cr3 helpers
Juergen Gross [Thu, 26 Apr 2018 11:33:17 +0000 (13:33 +0200)]
xen/x86: add some cr3 helpers

Add some helper macros to access the address and pcid parts of cr3.

Use those helpers where appropriate.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: convert pv_guest_cr4_to_real_cr4() to a function
Juergen Gross [Thu, 26 Apr 2018 11:33:16 +0000 (13:33 +0200)]
xen/x86: convert pv_guest_cr4_to_real_cr4() to a function

pv_guest_cr4_to_real_cr4() is becoming more and more complex. Convert
it from a macro to an ordinary function.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: use flag byte for decision whether xen_cr3 is valid
Juergen Gross [Thu, 26 Apr 2018 11:33:15 +0000 (13:33 +0200)]
xen/x86: use flag byte for decision whether xen_cr3 is valid

Today cpu_info->xen_cr3 is either 0 to indicate %cr3 doesn't need to
be switched on entry to Xen, or negative for keeping the value while
indicating not to restore %cr3, or positive in case %cr3 is to be
restored.

Switch to use a flag byte instead of a negative xen_cr3 value in order
to allow %cr3 values with the high bit set in case we want to keep TLB
entries when using the PCID feature.

This reduces the number of branches in interrupt handling and results
in better performance (e.g. parallel make of the Xen hypervisor on my
system was using about 3% less system time).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: disable global pages for domains with XPTI active
Juergen Gross [Thu, 26 Apr 2018 11:33:14 +0000 (13:33 +0200)]
xen/x86: disable global pages for domains with XPTI active

Instead of flushing the TLB from global pages when switching address
spaces with XPTI being active just disable global pages via %cr4
completely when a domain subject to XPTI is active. This avoids the
need for extra TLB flushes as loading %cr3 will remove all TLB
entries.

In order to avoid states with cr3/cr4 having inconsistent values
(e.g. global pages being activated while cr3 already specifies a XPTI
address space) move loading of the new cr4 value to write_ptbase()
(actually to switch_cr3_cr4() called by write_ptbase()).

This requires to use switch_cr3_cr4() instead of write_ptbase() when
building dom0 in order to avoid setting cr4 with cr4.smap set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: use invpcid for flushing the TLB
Juergen Gross [Thu, 26 Apr 2018 11:33:13 +0000 (13:33 +0200)]
xen/x86: use invpcid for flushing the TLB

If possible use the INVPCID instruction for flushing the TLB instead of
toggling cr4.pge for that purpose.

While at it remove the dependency on cr4.pge being required for mtrr
loading, as this will be required later anyway.

Add a command line option "invpcid" for controlling the use of
INVPCID (default to true).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: support per-domain flag for xpti
Juergen Gross [Thu, 26 Apr 2018 11:33:12 +0000 (13:33 +0200)]
xen/x86: support per-domain flag for xpti

Instead of switching XPTI globally on or off add a per-domain flag for
that purpose. This allows to modify the xpti boot parameter to support
running dom0 without Meltdown mitigations. Using "xpti=no-dom0" as boot
parameter will achieve that.

Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
it is pv-domain specific.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/x86: add a function for modifying cr3
Juergen Gross [Thu, 26 Apr 2018 11:33:11 +0000 (13:33 +0200)]
xen/x86: add a function for modifying cr3

Instead of having multiple places with more or less identical asm
statements just have one function doing a write to cr3.

As this function should be named write_cr3() rename the current
write_cr3() function to switch_cr3().

Suggested-by: Andrew Copper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/xpti: avoid copying L4 page table contents when possible
Juergen Gross [Tue, 29 May 2018 07:33:35 +0000 (09:33 +0200)]
x86/xpti: avoid copying L4 page table contents when possible

For mitigation of Meltdown the current L4 page table is copied to the
cpu local root page table each time a 64 bit pv guest is entered.

Copying can be avoided in cases where the guest L4 page table hasn't
been modified while running the hypervisor, e.g. when handling
interrupts or any hypercall not modifying the L4 page table or %cr3.

So add a per-cpu flag indicating whether the copying should be
performed and set that flag only when loading a new %cr3 or modifying
the L4 page table.  This includes synchronization of the cpu local
root page table with other cpus, so add a special synchronization flag
for that case.

A simple performance check (compiling the hypervisor via "make -j 4")
in dom0 with 4 vcpus shows a significant improvement:

- real time drops from 112 seconds to 103 seconds
- system time drops from 142 seconds to 131 seconds

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86: invpcid support
Wei Liu [Fri, 2 Mar 2018 16:23:38 +0000 (16:23 +0000)]
x86: invpcid support

Provide the functions needed for different modes. Add cpu_has_invpcid.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: move invocations of hvm_flush_guest_tlbs()
Jan Beulich [Tue, 23 Jan 2018 09:43:39 +0000 (10:43 +0100)]
x86: move invocations of hvm_flush_guest_tlbs()

Their need is not tied to the actual flushing of TLBs, but the ticking
of the TLB clock. Make this more obvious by folding the two invocations
into a single one in pre_flush().

Also defer the latching of CR4 in write_cr3() until after pre_flush()
(and hence implicitly until after IRQs are off), making operation
sequence the same in both cases (eliminating the theoretical risk of
pre_flush() altering CR4). This then also improves register allocation,
as the compiler doesn't need to use a callee-saved register for "cr4"
anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/XPTI: fix S3 resume (and CPU offlining in general)
Jan Beulich [Tue, 29 May 2018 07:25:57 +0000 (09:25 +0200)]
x86/XPTI: fix S3 resume (and CPU offlining in general)

We should index an L1 table with an L1 index.

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6b9562dac1746014ab376bd2cf8ba400acf34c6d
master date: 2018-05-28 11:20:26 +0200

6 years agox86/msr: Virtualise MSR_SPEC_CTRL.SSBD for guests to use
Andrew Cooper [Fri, 13 Apr 2018 15:42:34 +0000 (15:42 +0000)]
x86/msr: Virtualise MSR_SPEC_CTRL.SSBD for guests to use

Almost all infrastructure is already in place.  Update the reserved bits
calculation in guest_wrmsr(), and offer SSBD to guests by default.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/Intel: Mitigations for GPZ SP4 - Speculative Store Bypass
Andrew Cooper [Wed, 28 Mar 2018 14:21:39 +0000 (15:21 +0100)]
x86/Intel: Mitigations for GPZ SP4 - Speculative Store Bypass

To combat GPZ SP4 "Speculative Store Bypass", Intel have extended their
speculative sidechannel mitigations specification as follows:

 * A feature bit to indicate that Speculative Store Bypass Disable is
   supported.
 * A new bit in MSR_SPEC_CTRL which, when set, disables memory disambiguation
   in the pipeline.
 * A new bit in MSR_ARCH_CAPABILITIES, which will be set in future hardware,
   indicating that the hardware is not susceptible to Speculative Store Bypass
   sidechannels.

For contemporary processors, this interface will be implemented via a
microcode update.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/AMD: Mitigations for GPZ SP4 - Speculative Store Bypass
Andrew Cooper [Thu, 26 Apr 2018 09:56:28 +0000 (10:56 +0100)]
x86/AMD: Mitigations for GPZ SP4 - Speculative Store Bypass

AMD processors will execute loads and stores with the same base register in
program order, which is typically how a compiler emits code.

Therefore, by default no mitigating actions are taken, despite there being
corner cases which are vulnerable to the issue.

For performance testing, or for users with particularly sensitive workloads,
the `spec-ctrl=ssbd` command line option is available to force Xen to disable
Memory Disambiguation on applicable hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/spec_ctrl: Introduce a new `spec-ctrl=` command line argument to replace `bti=`
Andrew Cooper [Tue, 29 May 2018 07:23:32 +0000 (09:23 +0200)]
x86/spec_ctrl: Introduce a new `spec-ctrl=` command line argument to replace `bti=`

In hindsight, the options for `bti=` aren't as flexible or useful as expected
(including several options which don't appear to behave as intended).
Changing the behaviour of an existing option is problematic for compatibility,
so introduce a new `spec-ctrl=` in the hopes that we can do better.

One common way of deploying Xen is with a single PV dom0 and all domUs being
HVM domains.  In such a setup, an administrator who has weighed up the risks
may wish to forgo protection against malicious PV domains, to reduce the
overall performance hit.  To cater for this usecase, `spec-ctrl=no-pv` will
disable all speculative protection for PV domains, while leaving all
speculative protection for HVM domains intact.

For coding clarity as much as anything else, the suboptions are grouped by
logical area; those which affect the alternatives blocks, and those which
affect Xen's in-hypervisor settings.  See the xen-command-line.markdown for
full details of the new options.

While changing the command line options, take the time to change how the data
is reported to the user.  The three DEBUG printks are upgraded to unilateral,
as they are all relevant pieces of information, and the old "mitigations:"
line is split in the two logical areas described above.

Sample output from booting with `spec-ctrl=no-pv` looks like:

  (XEN) Speculative mitigation facilities:
  (XEN)   Hardware features: IBRS/IBPB STIBP IBPB
  (XEN)   Compiled-in support: INDIRECT_THUNK
  (XEN)   Xen settings: BTI-Thunk RETPOLINE, SPEC_CTRL: IBRS-, Other: IBPB
  (XEN)   Support for VMs: PV: None, HVM: MSR_SPEC_CTRL RSB
  (XEN)   XPTI (64-bit PV only): Dom0 enabled, DomU enabled

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3352afc26c497d26ecb70527db3cb29daf7b1422
master date: 2018-05-16 12:19:10 +0100

6 years agox86/cpuid: Improvements to guest policies for speculative sidechannel features
Andrew Cooper [Tue, 29 May 2018 07:22:56 +0000 (09:22 +0200)]
x86/cpuid: Improvements to guest policies for speculative sidechannel features

If Xen isn't virtualising MSR_SPEC_CTRL for guests, IBRSB shouldn't be
advertised.  It is not currently possible to express this via the existing
command line options, but such an ability will be introduced.

Another useful option in some usecases is to offer IBPB without IBRS.  When a
guest kernel is known to be compatible (uses retpoline and knows about the AMD
IBPB feature bit), an administrator with pre-Skylake hardware may wish to hide
IBRS.  This allows the VM to have full protection, without Xen or the VM
needing to touch MSR_SPEC_CTRL, which can reduce the overhead of Spectre
mitigations.

Break the logic common to both PV and HVM CPUID calculations into a common
helper, to avoid duplication.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cb06b308ec71b23f37a44f5e2351fe2cae0306e9
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Explicitly set Xen's default MSR_SPEC_CTRL value
Andrew Cooper [Tue, 29 May 2018 07:22:27 +0000 (09:22 +0200)]
x86/spec_ctrl: Explicitly set Xen's default MSR_SPEC_CTRL value

With the impending ability to disable MSR_SPEC_CTRL handling on a
per-guest-type basis, the first exit-from-guest may not have the side effect
of loading Xen's choice of value.  Explicitly set Xen's default during the BSP
and AP boot paths.

For the BSP however, delay setting a non-zero MSR_SPEC_CTRL default until
after dom0 has been constructed when safe to do so.  Oracle report that this
speeds up boots of some hardware by 50s.

"when safe to do so" is based on whether we are virtualised.  A native boot
won't have any other code running in a position to mount an attack.

Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cb8c12020307b39a89273d7699e89000451987ab
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Split X86_FEATURE_SC_MSR into PV and HVM variants
Andrew Cooper [Tue, 29 May 2018 07:21:49 +0000 (09:21 +0200)]
x86/spec_ctrl: Split X86_FEATURE_SC_MSR into PV and HVM variants

In order to separately control whether MSR_SPEC_CTRL is virtualised for PV and
HVM guests, split the feature used to control runtime alternatives into two.
Xen will use MSR_SPEC_CTRL itself if either of these features are active.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fa9eb09d446a1279f5e861e6b84fa8675dabf148
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Elide MSR_SPEC_CTRL handling in idle context when possible
Andrew Cooper [Tue, 29 May 2018 07:21:17 +0000 (09:21 +0200)]
x86/spec_ctrl: Elide MSR_SPEC_CTRL handling in idle context when possible

If Xen is virtualising MSR_SPEC_CTRL handling for guests, but using 0 as its
own MSR_SPEC_CTRL value, spec_ctrl_{enter,exit}_idle() need not write to the
MSR.

Requested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 94df6e8588e35cc2028ccb3fd2921c6e6360605e
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Rename bits of infrastructure to avoid NATIVE and VMEXIT
Andrew Cooper [Tue, 29 May 2018 07:20:43 +0000 (09:20 +0200)]
x86/spec_ctrl: Rename bits of infrastructure to avoid NATIVE and VMEXIT

In hindsight, using NATIVE and VMEXIT as naming terminology was not clever.
A future change wants to split SPEC_CTRL_EXIT_TO_GUEST into PV and HVM
specific implementations, and using VMEXIT as a term is completely wrong.

Take the opportunity to fix some stale documentation in spec_ctrl_asm.h.  The
IST helpers were missing from the large comment block, and since
SPEC_CTRL_ENTRY_FROM_INTR_IST was introduced, we've gained a new piece of
functionality which currently depends on the fine grain control, which exists
in lieu of livepatching.  Note this in the comment.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d9822b8a38114e96e4516dc998f4055249364d5d
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Fold the XEN_IBRS_{SET,CLEAR} ALTERNATIVES together
Andrew Cooper [Tue, 29 May 2018 07:19:54 +0000 (09:19 +0200)]
x86/spec_ctrl: Fold the XEN_IBRS_{SET,CLEAR} ALTERNATIVES together

Currently, the SPEC_CTRL_{ENTRY,EXIT}_* macros encode Xen's choice of
MSR_SPEC_CTRL as an immediate constant, and chooses between IBRS or not by
doubling up the entire alternative block.

There is now a variable holding Xen's choice of value, so use that and
simplify the alternatives.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: af949407eaba7af71067f23d5866cd0bf1f1144d
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Merge bti_ist_info and use_shadow_spec_ctrl into spec_ctrl_flags
Andrew Cooper [Tue, 29 May 2018 07:18:50 +0000 (09:18 +0200)]
x86/spec_ctrl: Merge bti_ist_info and use_shadow_spec_ctrl into spec_ctrl_flags

All 3 bits of information here are control flags for the entry/exit code
behaviour.  Treat them as such, rather than having two different variables.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 5262ba2e7799001402dfe139ff944e035dfff928
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Express Xen's choice of MSR_SPEC_CTRL value as a variable
Andrew Cooper [Tue, 29 May 2018 07:18:16 +0000 (09:18 +0200)]
x86/spec_ctrl: Express Xen's choice of MSR_SPEC_CTRL value as a variable

At the moment, we have two different encodings of Xen's MSR_SPEC_CTRL value,
which is a side effect of how the Spectre series developed.  One encoding is
via an alias with the bottom bit of bti_ist_info, and can encode IBRS or not,
but not other configurations such as STIBP.

Break Xen's value out into a separate variable (in the top of stack block for
XPTI reasons) and use this instead of bti_ist_info in the IST path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 66dfae0f32bfbc899c2f3446d5ee57068cb7f957
master date: 2018-05-16 12:19:10 +0100

6 years agox86/spec_ctrl: Read MSR_ARCH_CAPABILITIES only once
Andrew Cooper [Tue, 29 May 2018 07:17:27 +0000 (09:17 +0200)]
x86/spec_ctrl: Read MSR_ARCH_CAPABILITIES only once

Make it available from the beginning of init_speculation_mitigations(), and
pass it into appropriate functions.  Fix an RSBA typo while moving the
affected comment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d6c65187252a6c1810fd24c4d46f812840de8d3c
master date: 2018-05-16 12:19:10 +0100

6 years agoviridian: fix cpuid leaf 0x40000003
Paul Durrant [Fri, 18 May 2018 09:48:43 +0000 (11:48 +0200)]
viridian: fix cpuid leaf 0x40000003

The response to viridian leaf 3 needs to split a 64-bit mask across EAX and
EBX, with the low order 32 bits in EAX and the high order 32 bits in EBX.
To facilitate this a union of two uint32_t values and the mask (type
HV_PARTITION_PRIVILEGE_MASK) is allocated on stack as follows:

union {
    HV_PARTITION_PRIVILEGE_MASK mask;
    uint32_t lo, hi;
} u;

This, of course, is incorrect as both lo and hi will alias the low order
32 bits of the mask.

This patch wraps lo and hi in an anonmymous struct to achieve the desired
effect.

NOTE: Fixing this also stops Windows making the HvGetPartitionId hypercall
      which was previously considered erroneous behaviour. Thus the
      hypercall handler is also modified to stop squashing the
      'unimplemented' warning for this hypercall.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 29fc0493d8eabdd63f5bbff9e3069253053addca
master date: 2018-05-14 12:57:13 +0100

6 years agolibacpi: fixes for iasl >= 20180427
Roger Pau Monné [Fri, 18 May 2018 09:48:18 +0000 (11:48 +0200)]
libacpi: fixes for iasl >= 20180427

New versions of iasl have introduced improved C file generation, as
reported in the changelog:

iASL: Enhanced the -tc option (which creates an AML hex file in C,
suitable for import into a firmware project):
  1) Create a unique name for the table, to simplify use of multiple
SSDTs.
  2) Add a protection #ifdef in the file, similar to a .h header file.

The net effect of that on generated files is:

-unsigned char AmlCode[] =
+#ifndef __SSDT_S4_HEX__
+#define __SSDT_S4_HEX__
+
+unsigned char ssdt_s4_aml_code[] =

The above example is from ssdt_s4.asl.

Fix the build with newer versions of iasl by stripping the '_aml_code'
suffix from the variable name on generated files.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 858dbaaeda33b05c1ac80aea0ba9a03924e09005
master date: 2018-05-09 18:17:51 +0100

6 years agox86/pv: Hide more EFER bits from PV guests
Andrew Cooper [Fri, 18 May 2018 09:47:37 +0000 (11:47 +0200)]
x86/pv: Hide more EFER bits from PV guests

We don't advertise SVM in CPUID so a PV guest shouldn't be under the
impression that it can use SVM functionality, but despite this, it really
shouldn't see SVME set when reading EFER.

On Intel processors, 32bit PV guests don't see, and can't use SYSCALL.

Introduce EFER_KNOWN_MASK to whitelist the features Xen knows about, and use
this to clamp the guests view.

Take the opportunity to reuse the mask to simplify svm_vmcb_isvalid(), and
change "undefined" to "unknown" in the print message, as there is at least
EFER.TCE (Translation Cache Extension) defined but unknown to Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 589263031c04e2ba527783b4e04e8df27d364769
master date: 2018-05-07 11:52:57 +0100

6 years agox86: fix return value checks of set_guest_{machinecheck,nmi}_trapbounce
Jan Beulich [Fri, 18 May 2018 09:46:41 +0000 (11:46 +0200)]
x86: fix return value checks of set_guest_{machinecheck,nmi}_trapbounce

Commit 0142064421 ("x86/traps: move set_guest_{machine,nmi}_trapbounce")
converted the functions' return types from int to bool without also
correcting the checks in assembly code: The ABI does not guarantee sub-
32-bit return values to be promoted to 32 bits.

Take the liberty and also adjust the number of spaces used in the compat
code, such that both code sequences end up similar (they already are in
the non-compat case).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 4611f529c0e39493a3945641cc161967a864d6b5
master date: 2018-05-03 17:35:51 +0200

6 years agoxen/schedule: Fix races in vcpu migration
George Dunlap [Fri, 18 May 2018 09:45:57 +0000 (11:45 +0200)]
xen/schedule: Fix races in vcpu migration

The current sequence to initiate vcpu migration is inefficent and error-prone:

- The initiator sets VPF_migraging with the lock held, then drops the
  lock and calls vcpu_sleep_nosync(), which immediately grabs the lock
  again

- A number of places unnecessarily check for v->pause_flags in between
  those two

- Every call to vcpu_migrate() must be prefaced with
  vcpu_sleep_nosync() or introduce a race condition; this code
  duplication is error-prone

- In the event that v->is_running is true at the beginning of
  vcpu_migrate(), it's almost certain that vcpu_migrate() will end up
  being called in context_switch() as well; we might as well simply
  let it run there and save the duplicated effort (which will be
  non-negligible).

The result is that Credit1 has several races which result in runqueue
<-> v->processor invariants being violated (triggering ASSERTs in
debug builds and strange bugs in production builds).

Instead, introduce vcpu_migrate_start() to initiate the process.
vcpu_migrate_start() is called with the scheduling lock held.  It not
only sets VPF_migrating, but also calls vcpu_sleep_nosync_locked()
(which will automatically do nothing if there's nothing to do).

Rename vcpu_migrate() to vcpu_migrate_finish().  Check for v->is_running and
pause_flags & VPF_migrating at the top and return if appropriate.

Then the way to initiate migration is consistently:

* Grab lock
* vcpu_migrate_start()
* Release lock
* vcpu_migrate_finish()

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
master commit: 9a36de177c16d6423a07ad61f1c7af5274769aae
master date: 2018-05-03 11:56:48 +0100

6 years agoxen: Introduce vcpu_sleep_nosync_locked()
George Dunlap [Fri, 18 May 2018 09:45:27 +0000 (11:45 +0200)]
xen: Introduce vcpu_sleep_nosync_locked()

There are a lot of places which release a lock before calling
vcpu_sleep_nosync(), which then just grabs the lock again.  This is
not only a waste of time, but leads to more code duplication (since
you have to copy-and-paste recipes rather than calling a unified
function), which in turn leads to an increased chance of bugs.

Introduce vcpu_sleep_nosync_locked(), which can be called if you
already hold the schedule lock.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
master commit: da0a5e00de8aa93f2a7482d138dbee9dec2aa5c2
master date: 2018-05-03 11:56:36 +0100

6 years agox86/SVM: Fix intercepted {RD,WR}MSR for the SYS{CALL,ENTER} MSRs
Andrew Cooper [Fri, 18 May 2018 09:44:44 +0000 (11:44 +0200)]
x86/SVM: Fix intercepted {RD,WR}MSR for the SYS{CALL,ENTER} MSRs

By default, the SYSCALL MSRs are not intercepted, and accesses are completed
by hardware.  The SYSENTER MSRs are intercepted for cross-vendor
purposes (albeit needlessly in the common case), and are fully emulated.

However, {RD,WR}MSR instructions which happen to be emulated (FEP,
introspection, or older versions of Xen which intercepted #UD), or when the
MSRs are explicitly intercepted (introspection), will be completed
incorrectly.

svm_msr_read_intercept() appears to return the correct values, but only
because of the default read-everything case (which is going to disappear), and
that in vcpu context, hardware should have the guest values in context.
Update the read path to explicitly sync the VMCB and complete the accesses,
rather than falling all the way through to the default case.

svm_msr_write_intercept() silently discard all updates.  Synchronise the VMCB
for all applicable MSRs, and implement suitable checks.  The actual behaviour
of AMD hardware is to truncate the SYSENTER and SFMASK MSRs at 32 bits, but
this isn't implemented yet to remain compatible with the cross-vendor case.

Drop one bit of trailing whitespace while modifing this area of the code.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: c04c1866e5131e450ddcd114e32401477c60b816
master date: 2018-04-25 13:08:13 +0100

6 years agoxpti: fix bug in double fault handling
Juergen Gross [Fri, 18 May 2018 09:43:35 +0000 (11:43 +0200)]
xpti: fix bug in double fault handling

When entering the hypervisor via the double fault handler resetting
xen_cr3 was missing. This led to switching to pv_cr3 when returning
from the next following exception, so repair this in order to allow
exception handling to work even after a double fault.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d80af845de7a4db01a4a3b4d779e0e0dcb5e738b
master date: 2018-04-23 16:13:01 +0200

6 years agox86/HVM: never retain emulated insn cache when exiting back to guest
Jan Beulich [Fri, 18 May 2018 09:42:59 +0000 (11:42 +0200)]
x86/HVM: never retain emulated insn cache when exiting back to guest

Commit 5fcb26e69e ("x86/HVM: don't retain emulated insn cache when
exiting back to guest") didn't go quite far enough: The insn emulator
may itself decide to return X86EMUL_RETRY (currently for certain
CMPXCHG failures and AVX2 gather insns), in which case we'd also exit
back to guest context. Tie the caching to whether we have an I/O
completion pending, instead of x86_emulate()'s return value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
master commit: 25b0dad541e31bd892d57cbeafe8e0c0bf4e8385
master date: 2018-04-23 11:01:09 +0200

6 years agox86/HPET: fix race triggering ASSERT(cpu < nr_cpu_ids)
David Wang [Fri, 18 May 2018 09:42:29 +0000 (11:42 +0200)]
x86/HPET: fix race triggering ASSERT(cpu < nr_cpu_ids)

CPUs may share an in-use channel. Hence clearing of a bit from the
cpumask (in hpet_broadcast_exit()) as well as setting one (in
hpet_broadcast_enter()) must not race evaluation of that same cpumask.
Therefore avoid evaluating the cpumask twice in hpet_detach_channel().
Otherwise cpumask_empty() may e.g.return false while the subsequent
cpumask_first() could return nr_cpu_ids, which then triggers the
assertion in cpumask_of() reached through set_channel_irq_affinity().

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 8c02a19230502a9522b097ee15742599091064aa
master date: 2018-04-23 11:00:07 +0200

6 years agox86/spec_ctrl: Updates to retpoline-safety decision making
Andrew Cooper [Fri, 18 May 2018 09:41:53 +0000 (11:41 +0200)]
x86/spec_ctrl: Updates to retpoline-safety decision making

All of this is as recommended by the Intel whitepaper:

https://software.intel.com/sites/default/files/managed/1d/46/Retpoline-A-Branch-Target-Injection-Mitigation.pdf

The 'RSB Alternative' bit in MSR_ARCH_CAPABILITIES may be set by a hypervisor
to indicate that the virtual machine may migrate to a processor which isn't
retpoline-safe.  Introduce a shortened name (to reduce code volume), treat it
as authorative in retpoline_safe(), and print its value along with the other
ARCH_CAPS bits.

The exact processor models which do have RSB semantics which fall back to BTB
predictions are enumerated, and include Kabylake and Coffeelake.  Leave a
printk() in the default case to help identify cases which aren't covered.

The exact microcode versions from Broadwell RSB-safety are taken from the
referenced microcode update file (adjusting for the known-bad microcode
versions).  Despite the exact wording of the text, it is only Broadwell
processors which need a microcode check.

In practice, this means that all Broadwell hardware with up-to-date microcode
will use retpoline in preference to IBRS, which will be a performance
improvement for desktop and server systems which would previously always opt
for IBRS over retpoline.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/spec_ctrl: Fix typo in ARCH_CAPS decode

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 1232378bd2fef45f613db049b33852fdf84d7ddf
master date: 2018-04-19 17:28:23 +0100
master commit: 27170adb54a558e11defcd51989326a9beb95afe
master date: 2018-04-24 13:34:12 +0100

6 years agox86/pv: Introduce and use x86emul_write_dr()
Andrew Cooper [Fri, 18 May 2018 09:41:11 +0000 (11:41 +0200)]
x86/pv: Introduce and use x86emul_write_dr()

set_debugreg() has several bugs:

 * %dr4/5 should function correctly as aliases of %dr6/7 when CR4.DE is clear.
 * Attempting to set the upper 32 bits of %dr6/7 should fail with #GP[0]
   rather than be silently corrected and complete.
 * For emulation, the #UD and #GP[0] cases need properly distinguishing.  Use
   -ENODEV for #UD cases, leaving -EINVAL (bad bits) and -EPERM (not allowed to
   use that valid bit) as before for hypercall callers.
 * A write which clears %dr7.L/G leaves the IO shadow intact, meaning that
   subsequent reads of %dr7 will see stale IO watchpoint configuration.

Implement x86emul_write_dr() as a thin wrapper around set_debugreg().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f539ae27061c6811fd5e80e0755bf0514e22b977
master date: 2018-04-17 15:12:36 +0100

6 years agox86/pv: Introduce and use x86emul_read_dr()
Andrew Cooper [Fri, 18 May 2018 09:40:23 +0000 (11:40 +0200)]
x86/pv: Introduce and use x86emul_read_dr()

do_get_debugreg() has several bugs:

 * The %cr4.de condition is inverted.  %dr4/5 should be accessible only when
   %cr4.de is disabled.
 * When %cr4.de is disabled, emulation should yield #UD rather than complete
   with zero.
 * Using -EINVAL for errors is a broken ABI, as it overlaps with valid values
   near the top of the address space.

Introduce a common x86emul_read_dr() handler (as we will eventually want to
add HVM support) which separates its success/failure indication from the data
value, and have do_get_debugreg() call into the handler.

The ABI of do_get_debugreg() remains broken, but switches from -EINVAL to
-ENODEV for compatibility with the changes in the following patch.

Take the opportunity to add a missing local variable block to x86_emulate.c

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 881f8dc4314809293efc6f66f9af49734994bf0e
master date: 2018-04-17 15:12:36 +0100

6 years agox86: suppress BTI mitigations around S3 suspend/resume
Jan Beulich [Fri, 18 May 2018 09:39:38 +0000 (11:39 +0200)]
x86: suppress BTI mitigations around S3 suspend/resume

NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
may become available only once we're reloaded microcode. Make
SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
the critical period of time.

Also set the MSR back to its intended value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86: Use spec_ctrl_{enter,exit}_idle() in the S3/S5 path

The main purpose of this patch is to avoid opencoding the recovery logic at
the end, but also has the positive side effect of relaxing the SPEC_CTRL
mitigations when working to shut the final CPU down.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 710a8ebf2bc111a34bba04d1c85b6d07ed3d9389
master date: 2018-04-16 14:09:55 +0200
master commit: ef3ab46493f650b7e5cca2b2578a99ca0cbff195
master date: 2018-04-19 10:55:59 +0100

6 years agox86: correct ordering of operations during S3 resume
Jan Beulich [Fri, 18 May 2018 09:39:07 +0000 (11:39 +0200)]
x86: correct ordering of operations during S3 resume

Microcode loading needs to happen before re-enabling interrupts, in case
only updated microcode allows the use of e.g. the SPEC_{CTRL,CMD} MSRs.
Otoh it doesn't need to happen at all when we didn't suspend in the
first place. It needs to happen before spin_debug_enable() though, as it
acquires a lock and hence would otherwise make
common/spinlock.c:check_lock() unhappy. As micrcode loading can be
pretty verbose, also make sure it only runs after console_end_sync().

cpufreq_add_cpu() doesn't need calling on the only "goto enable_cpu"
path, which sits ahead of cpufreq_del_cpu().

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: cb2a4a449dfd50af309a333aa805835015fbc8c8
master date: 2018-04-16 14:08:30 +0200

6 years agoupdate Xen version to 4.10.2-pre
Jan Beulich [Fri, 18 May 2018 09:37:41 +0000 (11:37 +0200)]
update Xen version to 4.10.2-pre

7 years agox86/HVM: guard against emulator driving ioreq state in weird ways
Jan Beulich [Tue, 8 May 2018 17:14:59 +0000 (18:14 +0100)]
x86/HVM: guard against emulator driving ioreq state in weird ways

In the case where hvm_wait_for_io() calls wait_on_xen_event_channel(),
p->state ends up being read twice in succession: once to determine that
state != p->state, and then again at the top of the loop.  This gives a
compromised emulator a chance to change the state back between the two
reads, potentially keeping Xen in a loop indefinitely.

Instead:
* Read p->state once in each of the wait_on_xen_event_channel() tests,
* re-use that value the next time around,
* and insist that the states continue to transition "forward" (with the
  exception of the transition to STATE_IOREQ_NONE).

This is XSA-262.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agox86/vpt: add support for IO-APIC routed interrupts
Xen Project Security Team [Tue, 8 May 2018 17:14:42 +0000 (18:14 +0100)]
x86/vpt: add support for IO-APIC routed interrupts

And modify the HPET code to make use of it. Currently HPET interrupts
are always treated as ISA and thus injected through the vPIC. This is
wrong because HPET interrupts when not in legacy mode should be
injected from the IO-APIC.

To make things worse, the supported interrupt routing values are set
to [20..23], which clearly falls outside of the ISA range, thus
leading to an ASSERT in debug builds or memory corruption in non-debug
builds because the interrupt injection code will write out of the
bounds of the arch.hvm_domain.vpic array.

Since the HPET interrupt source can change between ISA and IO-APIC
always destroy the timer before changing the mode, or else Xen risks
changing it while the timer is active.

Note that vpt interrupt injection is racy in the sense that the
vIO-APIC RTE entry can be written by the guest in between the call to
pt_irq_masked and hvm_ioapic_assert, or the call to pt_update_irq and
pt_intr_post. Those are not deemed to be security issues, but rather
quirks of the current implementation. In the worse case the guest
might lose interrupts or get multiple interrupt vectors injected for
the same timer source.

This is part of XSA-261.

Address actual and potential compiler warnings. Fix formatting.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/traps: Fix handling of #DB exceptions in hypervisor context
Andrew Cooper [Tue, 8 May 2018 17:14:35 +0000 (18:14 +0100)]
x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/traps: Use an Interrupt Stack Table for #DB
Andrew Cooper [Tue, 8 May 2018 17:14:35 +0000 (18:14 +0100)]
x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Move exception injection into {,compat_}test_all_events()
Andrew Cooper [Tue, 8 May 2018 17:14:35 +0000 (18:14 +0100)]
x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/traps: Fix %dr6 handing in #DB handler
Andrew Cooper [Tue, 8 May 2018 17:14:35 +0000 (18:14 +0100)]
x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoupdate Xen version to 4.10.1 RELEASE-4.10.1
Jan Beulich [Wed, 2 May 2018 10:01:19 +0000 (12:01 +0200)]
update Xen version to 4.10.1

7 years agoSUPPORT.md: Add missing support lifetime information
Ian Jackson [Thu, 26 Apr 2018 10:04:09 +0000 (11:04 +0100)]
SUPPORT.md: Add missing support lifetime information

Dates are from Wei:

  Supported-Until:

  $ date --date '2017-12-13 + 18 months' +%F
  2019-06-13

  Security-Support-Until:

  $ date --date '2019-06-13 + 18 months' +%F
  2020-12-13

CC: Lars Kurth <lars.kurth@citrix.com>
CC: Juergen Gross <jgross@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoadapt SUPPORT.md to match 4.11
Juergen Gross [Wed, 25 Apr 2018 15:17:07 +0000 (17:17 +0200)]
adapt SUPPORT.md to match 4.11

Some tags have been changed in 4.11. Adapt the 4.10 ones to match in
order to produce an easier to read support HTML table.

Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoSUPPORT.md: Fix a typo
Ian Jackson [Wed, 25 Apr 2018 13:22:29 +0000 (14:22 +0100)]
SUPPORT.md: Fix a typo

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
7 years agoSUPPORT.md: Document the new text ordering rule
Ian Jackson [Thu, 12 Apr 2018 18:22:16 +0000 (19:22 +0100)]
SUPPORT.md: Document the new text ordering rule

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
(cherry picked from commit 2e9aeb6f40eaf13c20231ec91301be74a19152ad)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Move descriptions up before Status info
Ian Jackson [Thu, 12 Apr 2018 16:32:32 +0000 (17:32 +0100)]
SUPPORT.md: Move descriptions up before Status info

This turns all the things which were treated as caveats, but which
don't need to be footnoted in the matrix, into descriptions.

For the benefit of the support matrix generator, this patch (or a
version of it) should be backported to 4.10.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
(cherry picked from commit 67b46e14cb943e27134e9c6d7b41b27bdd8c6ae9)

Merge conflicts resolved:
  - x86/HVM: 4.11 talks about "Status, domU"
  - x86/PVH: 4.11 mentions domO so heading is different too
  - ARM: Heading in 4.11 says just "ARM", in 4.10 "ARM guest"

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agodocs/Makefile: Format SUPPORT.md into the toplevel
Ian Jackson [Fri, 6 Apr 2018 17:13:50 +0000 (18:13 +0100)]
docs/Makefile: Format SUPPORT.md into the toplevel

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit f246d42665a6023c248c5b3e374da5691df63f6f)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agodocs/Makefile: Introduce GENERATE_PANDOC_RULE_RAW
Ian Jackson [Fri, 6 Apr 2018 17:12:37 +0000 (18:12 +0100)]
docs/Makefile: Introduce GENERATE_PANDOC_RULE_RAW

We are going to want to format SUPPORT.md which does not match the
filename patterns in docs/.  So provide a way to make an ad-hoc rule
using pandoc with the standard options.

No functional change in this patch.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit 539f93945cad06fd90784716be1dc8d2624b6f66)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agodocs/gen-html-index: Support documents at the toplevel
Ian Jackson [Fri, 6 Apr 2018 18:09:16 +0000 (19:09 +0100)]
docs/gen-html-index: Support documents at the toplevel

There are none yet.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit 1e4a834a8f5d970e68cff6d9c16710194bc46537)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agodocs/gen-html-index: Extract titles from HTML documents
Ian Jackson [Fri, 6 Apr 2018 18:09:02 +0000 (19:09 +0100)]
docs/gen-html-index: Extract titles from HTML documents

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit 7782db9260d4c6499458de4e8d9866bc0427e143)

[ Combined with: ]

docs/gen-html-index: Make HTML::TreeBuilder::XPath optional again

7782db9260d4 "docs/gen-html-index: Extract titles from HTML documents"
requires HTML::TreeBuilder::XPath.

This is sadly not as widely available as I had hoped.  Work around
this problem by making the use of this module optional: instead of
`use'ing at the toplevel, we `require' it in the eval.  If it's not
present, then the title is simply not extracted and the filename is
used as before, which is tolerable.

Also add some debugging.

Reported-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Tested-by: Doug Goldstein <cardoe@cardoe.com>
(cherry picked from commit 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5)

Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Syntax: Provide a title rather than a spurious empty section
Ian Jackson [Fri, 6 Apr 2018 17:16:35 +0000 (18:16 +0100)]
SUPPORT.md: Syntax: Provide a title rather than a spurious empty section

This commits (more or less) this file to be processed with pandoc,
rather than other markdown processors.  There is, unfortunately, no
widely-accepted way to declare a title for the document.

I tested feeding the document to markdown(1) on Debian jessie and it
reproduced the % line as if it were simple text.  I guess many other
markdown processors will do something similarly tolerable.  My
internet searches did not discover a markdown processor that used
lines starting with % for something else.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit a569c6f815fb6a18c64b8f122f5e2bbecd444432)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Syntax: Fix a typo "States"
Ian Jackson [Fri, 6 Apr 2018 14:20:22 +0000 (15:20 +0100)]
SUPPORT.md: Syntax: Fix a typo "States"

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit ebbd0299089a698c39d4ced966df5831944b4305)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Syntax: Fix some bullet lists
Ian Jackson [Thu, 5 Apr 2018 16:19:31 +0000 (17:19 +0100)]
SUPPORT.md: Syntax: Fix some bullet lists

Continuations of bullet list items must be indented by exactly 4
spaces (according to pandoc_markdown(5) on Debian jessie).

This is most easily achieved by making the bullet list items have two
spaces before the `*'.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
(cherry picked from commit 01143b6273bc35a35afde154b2bb2415941bea89)
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: fix slow int80 path after XPTI additions
Jan Beulich [Wed, 25 Apr 2018 12:45:08 +0000 (14:45 +0200)]
x86: fix slow int80 path after XPTI additions

For the int80 slow path to jump to handle_exception_saved, %r14 needs to
be set up suitably for XPTI purposes. This is because of the difference
in nature between the int80 path (which is synchronous WRT guest
actions) and the exception path which is potentially asynchronous.

This is XSA-259.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 5a5c368faf45ced8a8c6235f4fbf5cdb38ec939f
master date: 2018-04-25 14:39:41 +0200

7 years agolibxl: Specify format of inserted cdrom
Anthony PERARD [Wed, 25 Apr 2018 12:44:34 +0000 (14:44 +0200)]
libxl: Specify format of inserted cdrom

Without this extra parameter on the QMP command, QEMU will guess the
format of the new file.

This is XSA-258.

Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
master commit: d8f65e68a7c1047fad97206a6282c281247fadc2
master date: 2018-04-25 14:38:47 +0200

7 years agox86/msr: Correct the emulation behaviour of MSR_PRED_CMD
Andrew Cooper [Wed, 18 Apr 2018 14:34:37 +0000 (16:34 +0200)]
x86/msr: Correct the emulation behaviour of MSR_PRED_CMD

Experimentally, the behaviour of reserved bits in MSR_PRED_CMD changed between
beta and production microcode, and now raises a #GP fault for set reserved
bits.  The AMD spec for future hardware also specifies this behaviour, and it
is the more sensible behaviour to implement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/msr: further correct the emulation behaviour of MSR_PRED_CMD

Following commit a6aa678fa3 ("x86/msr: Correct the emulation behaviour
of MSR_PRED_CMD") we may end up writing the low bit with the wrong
value. While it's unlikely for a guest to want to write zero there, we
should still permit (this without incurring the overhead of an actual
barrier). Correcting this right away will also help whenever further
bits in the MSR might become defined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a6aa678fa380e9369cc44701a181142322b3a4b0
master date: 2018-04-16 13:18:19 +0100
master commit: a996273d1fc10d14598985703227bfa35a91f681
master date: 2018-04-18 11:16:37 +0200

7 years agox86/VT-x: Fix determination of EFER.LMA in vmcs_dump_vcpu()
Andrew Cooper [Fri, 13 Apr 2018 14:26:00 +0000 (16:26 +0200)]
x86/VT-x: Fix determination of EFER.LMA in vmcs_dump_vcpu()

The LMA setting comes from the entry controls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 82540b66ceb9318aa185f2488cbbbe479694de8f
master date: 2018-04-11 11:06:55 +0100

7 years agox86/HVM: suppress I/O completion for port output
Jan Beulich [Fri, 13 Apr 2018 14:25:25 +0000 (16:25 +0200)]
x86/HVM: suppress I/O completion for port output

We don't break up port requests in case they cross emulation entity
boundaries, and a write to an I/O port is necessarily the last
operation of an instruction instance, so there's no need to re-invoke
the full emulation path upon receiving the result from an external
emulator.

In case we want to properly split port accesses in the future, this
change will need to be reverted, as it would prevent things working
correctly when e.g. the first part needs to go to an external emulator,
while the second part is to be handled internally.

While this addresses the reported problem of Windows paging out the
buffer underneath an in-process REP OUTS, it does not address the wider
problem of the re-issued insn (to the insn emulator) being prone to
raise an exception (#PF) during a replayed, previously successful memory
access (we only record prior MMIO accesses).

Leaving aside the problem tried to be worked around here, I think the
performance aspect alone is a good reason to change the behavior.

Also take the opportunity and change bool_t -> bool as
hvm_vcpu_io_need_completion()'s return type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
master commit: 91afb8139f954a06e564d4915bc7d6a8575e2812
master date: 2018-04-11 10:42:24 +0200

7 years agox86/pv: Fix up erroneous segments for 32bit syscall entry
Andrew Cooper [Fri, 13 Apr 2018 14:24:50 +0000 (16:24 +0200)]
x86/pv: Fix up erroneous segments for 32bit syscall entry

The existing FLAT_KERNEL_SS expands to the correct value, 0xe02b, but is the
wrong constant to use.  Switch to FLAT_USER_SS32.

For compat domains however, the reported values are entirely bogus.
FLAT_USER_SS32 (value 0xe02b) is FLAT_RING3_CS in the 32bit ABI, while
FLAT_USER_CS32 (value 0xe023) is FLAT_RING1_DS with an RPL of 3.

The guests SYSCALL callback is invoked with a broken iret frame, and if left
unmodified by the guest, will fail on the way back out when Xen's iret tries
to load a code segment into %ss.

In practice, this is only a problem for 32bit PV guests on AMD hardware, as
Intel hardware doesn't permit the SYSCALL instruction outside of 64bit mode.

This appears to have been broken ever since 64bit support was added to Xen,
and has gone unnoticed because Linux doesn't use SYSCALL in 32bit builds.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: dba899de14989b3dff78009404ed891da7fefdc1
master date: 2018-04-09 13:12:18 +0100

7 years agox86/XPTI: reduce .text.entry
Jan Beulich [Fri, 13 Apr 2018 14:24:11 +0000 (16:24 +0200)]
x86/XPTI: reduce .text.entry

This exposes less code pieces and at the same time reduces the range
covered from slightly above 3 pages to a little below 2 of them.

The code being moved is unchanged, except for the removal of trailing
blanks, insertion of blanks between operands, and a pointless q suffix
from "retq".

A few more small pieces could be moved, but it seems better to me to
leave them where they are to not make it overly hard to follow code
paths.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 454efb2a31b64b98e3dd55c083ce41b87375faa6
master date: 2018-04-05 15:48:23 +0100

7 years agox86: log XPTI enabled status
Jan Beulich [Fri, 13 Apr 2018 14:23:39 +0000 (16:23 +0200)]
x86: log XPTI enabled status

At the same time also report the state of the two defined
ARCH_CAPABILITIES MSR bits. To avoid further complicating the
conditional around that printk(), drop it (it's a debug level one only
anyway).

Issue the main message without any XENLOG_*, and also drop XENLOG_INFO
from the respective BTI message, to make sure they're visible at default
log level also in release builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
master commit: 442b303cdaf7d774c0be8096fe5dbab68701abd3
master date: 2018-04-05 15:48:23 +0100

7 years agox86: disable XPTI when RDCL_NO
Jan Beulich [Fri, 13 Apr 2018 14:22:57 +0000 (16:22 +0200)]
x86: disable XPTI when RDCL_NO

Use the respective ARCH_CAPABILITIES MSR bit, but don't expose the MSR
to guests yet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: bee0732d2066691d8204e418d10110930ee4d4f8
master date: 2018-04-05 15:48:23 +0100

7 years agox86/pv: Fix the handing of writes to %dr7
Andrew Cooper [Fri, 13 Apr 2018 14:18:21 +0000 (16:18 +0200)]
x86/pv: Fix the handing of writes to %dr7

c/s 65e35549 "x86/PV: support data breakpoint extension registers"
accidentally broke the handing of writes.  The call to activate_debugregs()
doesn't write %dr7 as v->arch.debugreg[7] hasn't been updated yet, and the
break skips the intended write to %dr7.

Remove the break, causing execution to hit the write_debugreg(7, value); in
context at the bottom of the hunk, which in turn causes hardware to be updated
appropriately.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: adf8feba1afa040f3a84a82953e18af02060884a
master date: 2018-03-29 15:12:21 +0100

7 years agoxen/arm: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
Julien Grall [Mon, 12 Mar 2018 13:19:35 +0000 (13:19 +0000)]
xen/arm: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery

A recent update to the ARM SMCCC_ARCH_WORKAROUND_1 specification (see [1])
allows firmware to return a non zero, positive value, to describe that
although the mitigation is implemented at the higher exception level,
the CPU on which the call is made is not affected.

Relax the check on the return value from ARM_WORKAROUND_1 so that we
only error out if the returned value is negative.

[1] https://developer.arm.com/support/security-update/downloads
"Firmware interfaces for mitigating CVE-2017-5715 System Software on Arm
Systems"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 6b270fae7ad462687550a875f714bff18d764416)

7 years agoxen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode
Julien Grall [Fri, 23 Feb 2018 18:57:29 +0000 (18:57 +0000)]
xen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode

32-bit domain is able to select the instruction (ARM vs Thumb) to use
when boot a new vCPU via CPU_ON. This is indicated via bit[0] of the
entry point address (see "T32 support" in PSCI v1.1 DEN0022D). bit[0]
must be cleared when setting the PC.

At the moment, Xen is setting the CPSR.T but never clear bit[0]. Clear
it to match the specification.

At the same time, slighlty rework the code to make clear thumb is only for
32-bit domain. Lastly, take the opportunity to switch is_thumb from int
to bool.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit cd8b749282475caef095ea2f339a01d1ff9714ae)

7 years agoxen/arm: vpsci: Introduce and use PSCI_INVALID_ADDRESS
Julien Grall [Fri, 23 Feb 2018 18:57:28 +0000 (18:57 +0000)]
xen/arm: vpsci: Introduce and use PSCI_INVALID_ADDRESS

PSCI 1.0 added the error return PSCI_INVALID_ADDRESS. It is used to
indicate the entry point address is known to be invalid.

In Xen case, this error could be returned when a 64-bit vCPU is using a
Thumb entry address.

For PSCI 0.1 implementation, return PSCI_INVALID_PARAMETERS instead.

Suggested-by: mirela.simonovic@aggios.com
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: mirela.simonovic@aggios.com
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit e7f46d6b9be02fc18ac9818c064fa8024828b606)

7 years agoxen/arm: psci: Consolidate PSCI version print
Julien Grall [Fri, 23 Feb 2018 18:57:25 +0000 (18:57 +0000)]
xen/arm: psci: Consolidate PSCI version print

Xen is printing the same way the PSCI version for 0.1, 0.2 and later.
The only different is the former is hardcoded.

Furthermore PSCI is now used for other things than SMP bring up. So only
print the PSCI version in psci_init.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit b7d4ba0da52bf88a73c57c929702831189b02075)

7 years agoxen/arm: vpsci: Remove parameter 'ver' from do_common_cpu
Julien Grall [Fri, 23 Feb 2018 18:57:24 +0000 (18:57 +0000)]
xen/arm: vpsci: Remove parameter 'ver' from do_common_cpu

Currently, the behavior of do_common_cpu will slightly change depending
on the PSCI version passed in parameter. Looking at the code, more the
specific 0.2 behavior could move out of the function or adapted for 0.1:

    - x0/r0 can be updated on PSCI 0.1 because general purpose registers
    are undefined upon CPU on. This was deduced from the spec not
    mentioning the state of general purpose registers on CPU on.
    - PSCI 0.1 does not defined PSCI_ALREADY_ON. However, it would be
    safer to bail out if the CPU is already on.

Based on this, the parameter 'ver' is removed and do_psci_cpu_on
(implementation for PSCI 0.1) is adapted to avoid returning
PSCI_ALREADY_ON.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit baba8102bfbd24f6b452342bfbe660be77300fc4)

7 years agoxen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
Julien Grall [Fri, 23 Feb 2018 18:57:23 +0000 (18:57 +0000)]
xen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround

Now that we've standardised on SMCCC v1.1 to perform the branch
prediction invalidation, let's drop the previous band-aid. If vendors
haven't updated their firmware to do SMCCC 1.1, they haven't updated
PSCI either, so we don't loose anything.

This is aligned with the Linux commit 3a0a397ff5ff.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 976319fa3de7f98b558c87b350699fffc278effc)

7 years agoxen/arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
Julien Grall [Fri, 23 Feb 2018 18:57:22 +0000 (18:57 +0000)]
xen/arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 99cd6934c17361456a68edd64348f201e2e00ce1)

7 years agoxen/arm: smccc: Implement SMCCC v1.1 inline primitive
Julien Grall [Fri, 23 Feb 2018 18:57:21 +0000 (18:57 +0000)]
xen/arm: smccc: Implement SMCCC v1.1 inline primitive

One of the major improvement of SMCCC v1.1 is that it only clobbers the
first 4 registers, both on 32 and 64bit. This means that it becomes very
easy to provide an inline version of the SMC call primitive, and avoid
performing a function call to stash the registers that woudl otherwise
be clobbered by SMCCC v1.0.

This patch has been adapted to Xen from Linux commit f2d3b2e8759a. The
changes mades are:
    - Using Xen coding style
    - Remove HVC as not used by Xen
    - Add arm_smccc_res structure

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 16f878bf69ae0ef0dbb7e8c7cc59a86702c7885f)

7 years agoxen/arm: psci: Detect SMCCC version
Julien Grall [Fri, 23 Feb 2018 18:57:20 +0000 (18:57 +0000)]
xen/arm: psci: Detect SMCCC version

PSCI 1.0 and later allows the SMCCC version to be (indirectly) probed
via PSCI_FEATURES. If the PSCI_FEATURES does not exist (PSCI 0.2 or
earlier) and the function returns an error, then we assume SMCCC 1.0
is implemented.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 1c87f28a43b32cc79e139220d5bc97b23b748d8a)

7 years agoxen/arm: smccc: Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR}
Julien Grall [Fri, 23 Feb 2018 18:57:19 +0000 (18:57 +0000)]
xen/arm: smccc: Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR}

Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR} to easily convert
between a 32-bit value and a version number. The encoding is based on
2.2.2 in "Firmware interfaces for mitigation CVE-2017-5715" (ARM DEN 0070A).

Also re-use them to define ARM_SMCCC_VERSION_1_0 and ARM_SMCCC_VERSION_1_1.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 479fab92d356e5a09f5db4596017419cf4023a9f)

7 years agoxen/arm64: Print a per-CPU message with the BP hardening method used
Julien Grall [Fri, 23 Feb 2018 18:57:18 +0000 (18:57 +0000)]
xen/arm64: Print a per-CPU message with the BP hardening method used

This will make easier to know whether BP hardening has been enabled for
a CPU and which method is used.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babcuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 1da1185aa30993bed21185070700343c70343483)

7 years agoxen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1
Julien Grall [Fri, 23 Feb 2018 18:57:17 +0000 (18:57 +0000)]
xen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1

The function SMCCC_ARCH_WORKAROUND_1 will be called by the guest for
hardening the branch predictor. So we want the handling to be as fast as
possible.

As the mitigation is applied on every guest exit, we can check for the
call before saving all the context and return very early.

For now, only provide a fast path for HVC64 call. Because the code rely
on 2 registers, x0 and x1 are saved in advance.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit ce73d72e976cc54f01c85e5c04edd5b9ce7b0906)

7 years agoxen/arm: Adapt smccc.h to be able to use it in assembly code
Julien Grall [Fri, 23 Feb 2018 18:57:16 +0000 (18:57 +0000)]
xen/arm: Adapt smccc.h to be able to use it in assembly code

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 2bd4f05113609ffcba908047576086386311798a)

7 years agoxen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support
Julien Grall [Fri, 23 Feb 2018 18:57:15 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support

SMCCC 1.1 offers firmware-based CPU workarounds. In particular,
SMCCC_ARCH_WORKAROUND_1 provides BP hardening for variant 2 of XSA-254
(CVE-2017-5715).

If the hypervisor has some mitigation for this issue, report that we
deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the hypervisor
workaround on every guest exit.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 52c5d8d3c1657cd8dc1675f8205ca0ecc08b6a51)

7 years agoxen/arm: vsmc: Implement SMCCC 1.1
Julien Grall [Fri, 23 Feb 2018 18:57:14 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC 1.1

The new SMC Calling Convention (v1.1) allows for a reduced overhead when
calling into the firmware, and provides a new feature discovery
mechanism. See "Firmware interfaces for mitigating CVE-2017-5715"
ARM DEN 00070A.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 3af378feac5bb504387dd6bc232b9c64a8a376d9)

7 years agoxen/arm: vpsci: Add support for PSCI 1.1
Julien Grall [Fri, 23 Feb 2018 18:57:13 +0000 (18:57 +0000)]
xen/arm: vpsci: Add support for PSCI 1.1

At the moment, Xen provides virtual PSCI interface compliant with 0.1
and 0.2. Since them, the specification has been updated and the latest
version is 1.1 (see ARM DEN 0022D).

>From an implementation point of view, only PSCI_FEATURES is mandatory.
The rest is optional and can be left unimplemented for now.

At the same time, the compatible for PSCI node have been updated to
expose "arm,psci-1.0".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: mirela.simonovic@aggios.com
(cherry picked from commit c52c5fa092645d6d847c2f36b6c21dfb4c157bd6)

7 years agoxen/arm: psci: Rework the PSCI definitions
Julien Grall [Fri, 23 Feb 2018 18:57:12 +0000 (18:57 +0000)]
xen/arm: psci: Rework the PSCI definitions

Some PSCI functions are only available in the 32-bit version. After
recent changes, Xen always needs to know whether the call was made using
32-bit id or 64-bit id. So we don't emulate reserved one.

With the current naming scheme, it is not easy to know which call
supports 32-bit and 64-bit id. So rework the definitions to encode the
version in the name. From now the functions will be named PSCI_0_2_FNxx
where xx is 32 or 64.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit f30b93b42b7137654a69676a61620f763c4ad3b3)

7 years agoxen/arm: vpsci: Move PSCI function dispatching from vsmc.c to vpsci.c
Julien Grall [Tue, 6 Feb 2018 15:53:25 +0000 (15:53 +0000)]
xen/arm: vpsci: Move PSCI function dispatching from vsmc.c to vpsci.c

At the moment PSCI function dispatching is done in vsmc.c and the
function implementation in vpsci.c. Some bits of the implementation is
even done in vsmc.c (see PSCI_SYSTEM_RESET).

This means that it is difficult to follow the implementation and also
it requires to export functions for each PSCI function.

Therefore move PSCI dispatching in two new functions do_vpsci_0_1_call
and do_vpsci_0_2_call. The former will handle PSCI 0.1 calls while the
latter 0.2 or later calls.

At the same time, a new header vpsci.h was created to contain all
definitions for virtual PSCI and avoid confusion with the host PSCI.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit c9d46c6fba9496478fa9f42c4bbebce8a191527d)

7 years agox86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR
Liran Alon [Tue, 20 Mar 2018 13:23:14 +0000 (14:23 +0100)]
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR

According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."

Before this patch TMR-bit was cleared on LAPIC EOI which is not what
real hardware does. This was also confirmed in KVM upstream commit
a0c9a822bf37 ("KVM: dont clear TMR on EOI").

Behavior after this patch is aligned with both Intel SDM and KVM
implementation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 12a50030a81a14a3c7be672ddfde707b961479ec
master date: 2018-03-15 16:59:52 +0100