x86/msr: Correct the emulation behaviour of MSR_PRED_CMD
Andrew Cooper [Wed, 18 Apr 2018 14:34:37 +0000 (16:34 +0200)]
x86/msr: Correct the emulation behaviour of MSR_PRED_CMD

Experimentally, the behaviour of reserved bits in MSR_PRED_CMD changed between
beta and production microcode, and now raises a #GP fault for set reserved
bits.  The AMD spec for future hardware also specifies this behaviour, and it
is the more sensible behaviour to implement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/msr: further correct the emulation behaviour of MSR_PRED_CMD

Following commit a6aa678fa3 ("x86/msr: Correct the emulation behaviour
of MSR_PRED_CMD") we may end up writing the low bit with the wrong
value. While it's unlikely for a guest to want to write zero there, we
should still permit this (without incurring the overhead of an actual
barrier). Correcting this right away will also help whenever further
bits in the MSR might become defined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a6aa678fa380e9369cc44701a181142322b3a4b0
master date: 2018-04-16 13:18:19 +0100
master commit: a996273d1fc10d14598985703227bfa35a91f681
master date: 2018-04-18 11:16:37 +0200
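
As an illustration, a minimal sketch of the corrected write handling
(names and function shape assumed here, not the verbatim Xen code):

    #include <stdbool.h>
    #include <stdint.h>

    #define PRED_CMD_IBPB (1ULL << 0)

    /* Any set reserved bit raises #GP; the barrier is only issued
     * when bit 0 is actually set, so writing 0 is a permitted no-op. */
    static bool emulate_wrmsr_pred_cmd(uint64_t val, bool *do_ibpb)
    {
        if ( val & ~PRED_CMD_IBPB )
            return false;                /* reserved bit set -> inject #GP */

        *do_ibpb = val & PRED_CMD_IBPB;  /* perform the WRMSR only when set */
        return true;
    }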

x86/VT-x: Fix determination of EFER.LMA in vmcs_dump_vcpu()
Andrew Cooper [Fri, 13 Apr 2018 14:26:00 +0000 (16:26 +0200)]
x86/VT-x: Fix determination of EFER.LMA in vmcs_dump_vcpu()

The LMA setting comes from the entry controls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 82540b66ceb9318aa185f2488cbbbe479694de8f
master date: 2018-04-11 11:06:55 +0100
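
A sketch of what the fix amounts to (control bit name per the Intel SDM;
the helper itself is hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    #define VM_ENTRY_IA32E_MODE (1u << 9)   /* "IA-32e mode guest" control */

    /* Derive the EFER.LMA view for the dump from the VM-entry controls. */
    static bool dumped_efer_lma(uint32_t vm_entry_controls)
    {
        return vm_entry_controls & VM_ENTRY_IA32E_MODE;
    }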

x86/HVM: suppress I/O completion for port output
Jan Beulich [Fri, 13 Apr 2018 14:25:25 +0000 (16:25 +0200)]
x86/HVM: suppress I/O completion for port output

We don't break up port requests in case they cross emulation entity
boundaries, and a write to an I/O port is necessarily the last
operation of an instruction instance, so there's no need to re-invoke
the full emulation path upon receiving the result from an external
emulator.

In case we want to properly split port accesses in the future, this
change will need to be reverted, as it would prevent things working
correctly when e.g. the first part needs to go to an external emulator,
while the second part is to be handled internally.

While this addresses the reported problem of Windows paging out the
buffer underneath an in-process REP OUTS, it does not address the wider
problem of the re-issued insn (to the insn emulator) being prone to
raise an exception (#PF) during a replayed, previously successful memory
access (we only record prior MMIO accesses).

Leaving aside the problem being worked around here, I think the
performance aspect alone is a good reason to change the behavior.

Also take the opportunity to change hvm_vcpu_io_need_completion()'s
return type from bool_t to bool.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
master commit: 91afb8139f954a06e564d4915bc7d6a8575e2812
master date: 2018-04-11 10:42:24 +0200
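
A simplified model of the resulting check (types and field names are
assumed for illustration, not Xen's actual structures):

    #include <stdbool.h>

    enum io_type { IOREQ_TYPE_PIO, IOREQ_TYPE_COPY };
    enum io_dir  { IOREQ_READ, IOREQ_WRITE };

    struct io_req {
        bool pending;            /* result outstanding from an emulator */
        enum io_type type;
        enum io_dir dir;
    };

    /* A port write is necessarily the last operation of the instruction,
     * so its completion never needs to re-enter the emulation path. */
    static bool io_need_completion(const struct io_req *req)
    {
        return req->pending &&
               (req->type != IOREQ_TYPE_PIO || req->dir != IOREQ_WRITE);
    }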

x86/pv: Fix up erroneous segments for 32bit syscall entry
Andrew Cooper [Fri, 13 Apr 2018 14:24:50 +0000 (16:24 +0200)]
x86/pv: Fix up erroneous segments for 32bit syscall entry

The existing FLAT_KERNEL_SS expands to the correct value, 0xe02b, but is the
wrong constant to use.  Switch to FLAT_USER_SS32.

For compat domains however, the reported values are entirely bogus.
FLAT_USER_SS32 (value 0xe02b) is FLAT_RING3_CS in the 32bit ABI, while
FLAT_USER_CS32 (value 0xe023) is FLAT_RING1_DS with an RPL of 3.

The guest's SYSCALL callback is invoked with a broken iret frame, and if left
unmodified by the guest, will fail on the way back out when Xen's iret tries
to load a code segment into %ss.

In practice, this is only a problem for 32bit PV guests on AMD hardware, as
Intel hardware doesn't permit the SYSCALL instruction outside of 64bit mode.

This appears to have been broken ever since 64bit support was added to Xen,
and has gone unnoticed because Linux doesn't use SYSCALL in 32bit builds.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: dba899de14989b3dff78009404ed891da7fefdc1
master date: 2018-04-09 13:12:18 +0100

x86/XPTI: reduce .text.entry
Jan Beulich [Fri, 13 Apr 2018 14:24:11 +0000 (16:24 +0200)]
x86/XPTI: reduce .text.entry

This exposes fewer pieces of code and at the same time reduces the range
covered from slightly above 3 pages to a little below 2 of them.

The code being moved is unchanged, except for the removal of trailing
blanks and of a pointless q suffix from "retq", and the insertion of
blanks between operands.

A few more small pieces could be moved, but it seems better to me to
leave them where they are to not make it overly hard to follow code
paths.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 454efb2a31b64b98e3dd55c083ce41b87375faa6
master date: 2018-04-05 15:48:23 +0100

x86: log XPTI enabled status
Jan Beulich [Fri, 13 Apr 2018 14:23:39 +0000 (16:23 +0200)]
x86: log XPTI enabled status

At the same time also report the state of the two defined
ARCH_CAPABILITIES MSR bits. To avoid further complicating the
conditional around that printk(), drop it (it's a debug level one only
anyway).

Issue the main message without any XENLOG_*, and also drop XENLOG_INFO
from the respective BTI message, to make sure they're visible at default
log level also in release builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
master commit: 442b303cdaf7d774c0be8096fe5dbab68701abd3
master date: 2018-04-05 15:48:23 +0100

x86: disable XPTI when RDCL_NO
Jan Beulich [Fri, 13 Apr 2018 14:22:57 +0000 (16:22 +0200)]
x86: disable XPTI when RDCL_NO

Use the respective ARCH_CAPABILITIES MSR bit, but don't expose the MSR
to guests yet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: bee0732d2066691d8204e418d10110930ee4d4f8
master date: 2018-04-05 15:48:23 +0100
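
In effect (MSR index and bit position per the Intel documentation; the
helper name is made up):

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_ARCH_CAPABILITIES 0x0000010a
    #define ARCH_CAPS_RDCL_NO     (1ULL << 0)  /* not vulnerable to Meltdown */

    /* XPTI is pointless on hardware advertising RDCL_NO. */
    static bool xpti_wanted(uint64_t arch_caps)
    {
        return !(arch_caps & ARCH_CAPS_RDCL_NO);
    }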

x86/pv: Fix the handling of writes to %dr7
Andrew Cooper [Fri, 13 Apr 2018 14:18:21 +0000 (16:18 +0200)]
x86/pv: Fix the handling of writes to %dr7

c/s 65e35549 "x86/PV: support data breakpoint extension registers"
accidentally broke the handling of writes.  The call to activate_debugregs()
doesn't write %dr7 as v->arch.debugreg[7] hasn't been updated yet, and the
break skips the intended write to %dr7.

Remove the break, causing execution to hit the write_debugreg(7, value); in
context at the bottom of the hunk, which in turn causes hardware to be updated
appropriately.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: adf8feba1afa040f3a84a82953e18af02060884a
master date: 2018-03-29 15:12:21 +0100

xen/arm: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
Julien Grall [Mon, 12 Mar 2018 13:19:35 +0000 (13:19 +0000)]
xen/arm: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery

A recent update to the ARM SMCCC_ARCH_WORKAROUND_1 specification (see [1])
allows firmware to return a non-zero, positive value, to describe that
although the mitigation is implemented at the higher exception level,
the CPU on which the call is made is not affected.

Relax the check on the return value from ARM_SMCCC_ARCH_WORKAROUND_1 so
that we only error out if the returned value is negative.

[1] https://developer.arm.com/support/security-update/downloads
"Firmware interfaces for mitigating CVE-2017-5715 System Software on Arm
Systems"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 6b270fae7ad462687550a875f714bff18d764416)
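
A sketch of the relaxed check (the probing helper is a stand-in):

    #include <stdbool.h>

    int probe_smccc_arch_workaround_1(void);   /* firmware probe, assumed */

    static bool has_smccc_arch_workaround_1(void)
    {
        int ret = probe_smccc_arch_workaround_1();

        /* Only a negative value is an error; 0 means implemented, and a
         * positive value means this CPU is unaffected but may still call. */
        return ret >= 0;
    }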

xen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode
Julien Grall [Fri, 23 Feb 2018 18:57:29 +0000 (18:57 +0000)]
xen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode

A 32-bit domain is able to select the instruction set (ARM vs Thumb) to
use when booting a new vCPU via CPU_ON. This is indicated via bit[0] of the
entry point address (see "T32 support" in PSCI v1.1 DEN0022D). bit[0]
must be cleared when setting the PC.

At the moment, Xen sets CPSR.T but never clears bit[0]. Clear it
to match the specification.

At the same time, slightly rework the code to make clear that Thumb is
only for 32-bit domains. Lastly, take the opportunity to switch is_thumb
from int to bool.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit cd8b749282475caef095ea2f339a01d1ff9714ae)
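
A minimal sketch of the described fix (CPSR.T is bit 5; the function
shape is assumed):

    #include <stdbool.h>
    #include <stdint.h>

    #define PSR_THUMB (1u << 5)   /* CPSR.T */

    static void vcpu_set_entry(uint64_t entry, bool is_32bit,
                               uint64_t *pc, uint32_t *cpsr)
    {
        /* bit[0] of the entry point requests Thumb, for 32-bit vCPUs only */
        if ( is_32bit && (entry & 1) )
            *cpsr |= PSR_THUMB;

        *pc = entry & ~(uint64_t)1;   /* clear bit[0] when setting the PC */
    }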

xen/arm: vpsci: Introduce and use PSCI_INVALID_ADDRESS
Julien Grall [Fri, 23 Feb 2018 18:57:28 +0000 (18:57 +0000)]
xen/arm: vpsci: Introduce and use PSCI_INVALID_ADDRESS

PSCI 1.0 added the error return PSCI_INVALID_ADDRESS. It is used to
indicate the entry point address is known to be invalid.

In Xen's case, this error could be returned when a 64-bit vCPU is using a
Thumb entry address.

For the PSCI 0.1 implementation, return PSCI_INVALID_PARAMETERS instead.

Suggested-by: mirela.simonovic@aggios.com
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: mirela.simonovic@aggios.com
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit e7f46d6b9be02fc18ac9818c064fa8024828b606)

xen/arm: psci: Consolidate PSCI version print
Julien Grall [Fri, 23 Feb 2018 18:57:25 +0000 (18:57 +0000)]
xen/arm: psci: Consolidate PSCI version print

Xen prints the PSCI version the same way for 0.1, 0.2 and later. The
only difference is that the former is hardcoded.

Furthermore, PSCI is now used for more than SMP bring-up, so only print
the PSCI version in psci_init.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit b7d4ba0da52bf88a73c57c929702831189b02075)

xen/arm: vpsci: Remove parameter 'ver' from do_common_cpu
Julien Grall [Fri, 23 Feb 2018 18:57:24 +0000 (18:57 +0000)]
xen/arm: vpsci: Remove parameter 'ver' from do_common_cpu

Currently, the behavior of do_common_cpu will slightly change depending
on the PSCI version passed as a parameter. Looking at the code, the
0.2-specific behavior could either move out of the function or be
adapted for 0.1:

    - x0/r0 can be updated on PSCI 0.1 because general purpose registers
    are undefined upon CPU on. This was deduced from the spec not
    mentioning the state of general purpose registers on CPU on.
    - PSCI 0.1 does not define PSCI_ALREADY_ON. However, it would be
    safer to bail out if the CPU is already on.

Based on this, the parameter 'ver' is removed and do_psci_cpu_on
(implementation for PSCI 0.1) is adapted to avoid returning
PSCI_ALREADY_ON.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit baba8102bfbd24f6b452342bfbe660be77300fc4)

xen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
Julien Grall [Fri, 23 Feb 2018 18:57:23 +0000 (18:57 +0000)]
xen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround

Now that we've standardised on SMCCC v1.1 to perform the branch
prediction invalidation, let's drop the previous band-aid. If vendors
haven't updated their firmware to do SMCCC 1.1, they haven't updated
PSCI either, so we don't lose anything.

This is aligned with the Linux commit 3a0a397ff5ff.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 976319fa3de7f98b558c87b350699fffc278effc)

xen/arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
Julien Grall [Fri, 23 Feb 2018 18:57:22 +0000 (18:57 +0000)]
xen/arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 99cd6934c17361456a68edd64348f201e2e00ce1)

xen/arm: smccc: Implement SMCCC v1.1 inline primitive
Julien Grall [Fri, 23 Feb 2018 18:57:21 +0000 (18:57 +0000)]
xen/arm: smccc: Implement SMCCC v1.1 inline primitive

One of the major improvements of SMCCC v1.1 is that it only clobbers the
first 4 registers, both on 32 and 64bit. This means that it becomes very
easy to provide an inline version of the SMC call primitive, and avoid
performing a function call to stash the registers that would otherwise
be clobbered by SMCCC v1.0.

This patch has been adapted to Xen from Linux commit f2d3b2e8759a. The
changes made are:
    - Using Xen coding style
    - Remove HVC as not used by Xen
    - Add arm_smccc_res structure

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 16f878bf69ae0ef0dbb7e8c7cc59a86702c7885f)
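
A minimal aarch64 sketch of such a primitive (simplified to a call with
no arguments; Xen's actual macro-based implementation is more general):

    struct arm_smccc_res {
        unsigned long a0, a1, a2, a3;
    };

    /* SMCCC v1.1 clobbers only x0-x3, so no out-of-line register stashing
     * is needed - the compiler can keep everything else live. */
    static inline void smccc_1_1_smc(unsigned long fid,
                                     struct arm_smccc_res *res)
    {
        register unsigned long x0 asm("x0") = fid;
        register unsigned long x1 asm("x1");
        register unsigned long x2 asm("x2");
        register unsigned long x3 asm("x3");

        asm volatile ( "smc #0"
                       : "+r" (x0), "=r" (x1), "=r" (x2), "=r" (x3)
                       :
                       : "memory" );

        res->a0 = x0;
        res->a1 = x1;
        res->a2 = x2;
        res->a3 = x3;
    }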

xen/arm: psci: Detect SMCCC version
Julien Grall [Fri, 23 Feb 2018 18:57:20 +0000 (18:57 +0000)]
xen/arm: psci: Detect SMCCC version

PSCI 1.0 and later allows the SMCCC version to be (indirectly) probed
via PSCI_FEATURES. If PSCI_FEATURES does not exist (PSCI 0.2 or
earlier) or the probe otherwise returns an error, we assume SMCCC 1.0
is implemented.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 1c87f28a43b32cc79e139220d5bc97b23b748d8a)

xen/arm: smccc: Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR}
Julien Grall [Fri, 23 Feb 2018 18:57:19 +0000 (18:57 +0000)]
xen/arm: smccc: Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR}

Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR} to easily convert
between a 32-bit value and a version number. The encoding is based on
2.2.2 in "Firmware interfaces for mitigating CVE-2017-5715" (ARM DEN 0070A).

Also re-use them to define ARM_SMCCC_VERSION_1_0 and ARM_SMCCC_VERSION_1_1.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 479fab92d356e5a09f5db4596017419cf4023a9f)
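
A sketch of the encoding described above (major version in bits [30:16],
minor in bits [15:0]; treat this as an approximation of the real macros):

    #define SMCCC_VERSION(major, minor) (((major) << 16) | (minor))
    #define SMCCC_VERSION_MAJOR(ver)    (((ver) >> 16) & 0x7fff)
    #define SMCCC_VERSION_MINOR(ver)    ((ver) & 0xffff)

    #define ARM_SMCCC_VERSION_1_0 SMCCC_VERSION(1, 0)
    #define ARM_SMCCC_VERSION_1_1 SMCCC_VERSION(1, 1)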

xen/arm64: Print a per-CPU message with the BP hardening method used
Julien Grall [Fri, 23 Feb 2018 18:57:18 +0000 (18:57 +0000)]
xen/arm64: Print a per-CPU message with the BP hardening method used

This will make it easier to know whether BP hardening has been enabled
for a CPU and which method is used.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 1da1185aa30993bed21185070700343c70343483)

xen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1
Julien Grall [Fri, 23 Feb 2018 18:57:17 +0000 (18:57 +0000)]
xen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1

The function SMCCC_ARCH_WORKAROUND_1 will be called by the guest for
hardening the branch predictor. So we want the handling to be as fast as
possible.

As the mitigation is applied on every guest exit, we can check for the
call before saving all the context and return very early.

For now, only provide a fast path for the HVC64 call. Because the code
relies on 2 registers, x0 and x1 are saved in advance.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit ce73d72e976cc54f01c85e5c04edd5b9ce7b0906)

xen/arm: Adapt smccc.h to be able to use it in assembly code
Julien Grall [Fri, 23 Feb 2018 18:57:16 +0000 (18:57 +0000)]
xen/arm: Adapt smccc.h to be able to use it in assembly code

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 2bd4f05113609ffcba908047576086386311798a)

xen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support
Julien Grall [Fri, 23 Feb 2018 18:57:15 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support

SMCCC 1.1 offers firmware-based CPU workarounds. In particular,
SMCCC_ARCH_WORKAROUND_1 provides BP hardening for variant 2 of XSA-254
(CVE-2017-5715).

If the hypervisor has some mitigation for this issue, report that we
deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the hypervisor
workaround on every guest exit.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 52c5d8d3c1657cd8dc1675f8205ca0ecc08b6a51)

xen/arm: vsmc: Implement SMCCC 1.1
Julien Grall [Fri, 23 Feb 2018 18:57:14 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC 1.1

The new SMC Calling Convention (v1.1) allows for a reduced overhead when
calling into the firmware, and provides a new feature discovery
mechanism. See "Firmware interfaces for mitigating CVE-2017-5715"
ARM DEN 00070A.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 3af378feac5bb504387dd6bc232b9c64a8a376d9)

xen/arm: vpsci: Add support for PSCI 1.1
Julien Grall [Fri, 23 Feb 2018 18:57:13 +0000 (18:57 +0000)]
xen/arm: vpsci: Add support for PSCI 1.1

At the moment, Xen provides a virtual PSCI interface compliant with 0.1
and 0.2. Since then, the specification has been updated, and the latest
version is 1.1 (see ARM DEN 0022D).

From an implementation point of view, only PSCI_FEATURES is mandatory.
The rest is optional and can be left unimplemented for now.

At the same time, the compatible string for the PSCI node has been
updated to expose "arm,psci-1.0".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: mirela.simonovic@aggios.com
(cherry picked from commit c52c5fa092645d6d847c2f36b6c21dfb4c157bd6)

xen/arm: psci: Rework the PSCI definitions
Julien Grall [Fri, 23 Feb 2018 18:57:12 +0000 (18:57 +0000)]
xen/arm: psci: Rework the PSCI definitions

Some PSCI functions are only available in the 32-bit version. After
recent changes, Xen always needs to know whether the call was made using
a 32-bit or 64-bit function ID. So we don't emulate the reserved ones.

With the current naming scheme, it is not easy to know which calls
support 32-bit and 64-bit IDs. So rework the definitions to encode the
version in the name. From now on, the functions will be named
PSCI_0_2_FNxx where xx is 32 or 64.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit f30b93b42b7137654a69676a61620f763c4ad3b3)

xen/arm: vpsci: Move PSCI function dispatching from vsmc.c to vpsci.c
Julien Grall [Tue, 6 Feb 2018 15:53:25 +0000 (15:53 +0000)]
xen/arm: vpsci: Move PSCI function dispatching from vsmc.c to vpsci.c

At the moment PSCI function dispatching is done in vsmc.c and the
function implementation in vpsci.c. Some bits of the implementation are
even done in vsmc.c (see PSCI_SYSTEM_RESET).

This means that it is difficult to follow the implementation, and it
also requires exporting a function for each PSCI call.

Therefore move PSCI dispatching into two new functions, do_vpsci_0_1_call
and do_vpsci_0_2_call. The former will handle PSCI 0.1 calls while the
latter handles 0.2 or later calls.

At the same time, a new header vpsci.h was created to contain all
definitions for virtual PSCI and avoid confusion with the host PSCI.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit c9d46c6fba9496478fa9f42c4bbebce8a191527d)

x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR
Liran Alon [Tue, 20 Mar 2018 13:23:14 +0000 (14:23 +0100)]
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR

According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."

Before this patch, the TMR bit was cleared on LAPIC EOI, which is not
what real hardware does. This was also confirmed in KVM upstream commit
a0c9a822bf37 ("KVM: dont clear TMR on EOI").

Behavior after this patch is aligned with both Intel SDM and KVM
implementation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 12a50030a81a14a3c7be672ddfde707b961479ec
master date: 2018-03-15 16:59:52 +0100
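
A toy model of the acceptance rule quoted from the SDM (register layout
simplified to plain arrays):

    #include <stdbool.h>
    #include <stdint.h>

    struct lapic_state {
        uint32_t irr[8];   /* 256 vectors, 32 bits per word */
        uint32_t tmr[8];
    };

    static void accept_fixed_irq(struct lapic_state *s, uint8_t vec,
                                 bool level_triggered)
    {
        s->irr[vec / 32] |= 1u << (vec % 32);

        /* TMR: set for level-triggered, cleared for edge-triggered. */
        if ( level_triggered )
            s->tmr[vec / 32] |= 1u << (vec % 32);
        else
            s->tmr[vec / 32] &= ~(1u << (vec % 32));
    }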

SUPPORT.md: Specify support for various image formats
George Dunlap [Tue, 20 Mar 2018 13:22:44 +0000 (14:22 +0100)]
SUPPORT.md: Specify support for various image formats

QEMU supports various image formats, but we only provide security
support for raw, qcow, qcow2, and vhd formats.

Rather than duplicate this information under the "x86/Emulated
storage" section, just refer to the "Blkback" section.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: a64893f32810593468ce3e1ae618047b213a2a75
master date: 2018-03-13 11:07:01 +0000

SUPPORT.md: Clarify that the PV keyboard protocol includes mouse support
George Dunlap [Tue, 20 Mar 2018 13:22:22 +0000 (14:22 +0100)]
SUPPORT.md: Clarify that the PV keyboard protocol includes mouse support

s/fo/of/ while we're here.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: e09b90da8065fe249e1d15be86476e48ffb53f5a
master date: 2018-03-13 11:05:43 +0000

cpufreq/ondemand: fix race while offlining CPU
Jan Beulich [Tue, 20 Mar 2018 13:21:45 +0000 (14:21 +0100)]
cpufreq/ondemand: fix race while offlining CPU

Offlining a CPU involves stopping the cpufreq governor. The on-demand
governor will kill the timer before letting generic code proceed, but
since that generally isn't happening on the subject CPU,
cpufreq_dbs_timer_resume() may run in parallel. If that managed to
invoke the timer handler, that handler needs to run to completion before
dbs_timer_exit() may safely exit.

Make the "stoppable" field a tristate, changing it from +1 to -1 around
the timer function invocation, and make dbs_timer_exit() wait for it to
become non-negative (still writing zero if it's +1).

Also adjust coding style in cpufreq_dbs_timer_resume().

Reported-by: Martin Cerveny <martin@c-home.cz>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Martin Cerveny <martin@c-home.cz>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
master commit: 185413355fe331cbc926d48568838227234c9a20
master date: 2018-03-09 17:30:49 +0100
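
A sketch of the tristate protocol using C11 atomics (the real code uses
Xen's own primitives and a per-CPU structure):

    #include <stdatomic.h>

    /* +1 = stoppable, -1 = timer handler running, 0 = stopped */
    static atomic_int stoppable = 1;

    static void timer_handler(void)
    {
        int expect = 1;

        if ( !atomic_compare_exchange_strong(&stoppable, &expect, -1) )
            return;                        /* being (or already) stopped */
        /* ... governor work ... */
        atomic_store(&stoppable, 1);
    }

    static void dbs_timer_exit(void)
    {
        int expect;

        do {
            expect = 1;                    /* wait for a non-negative value */
        } while ( !atomic_compare_exchange_weak(&stoppable, &expect, 0) &&
                  expect != 0 );
    }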

x86: remove CR reads from exit-to-guest path
Jan Beulich [Tue, 20 Mar 2018 13:20:59 +0000 (14:20 +0100)]
x86: remove CR reads from exit-to-guest path

CR3 is - during normal operation - only ever loaded from v->arch.cr3,
so there's no need to read the actual control register. For CR4 we can
generally use the cached value on all synchronous entry and exit paths.
Drop the write_cr3 macro, as the two use sites are probably easier to
follow without its use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 31bf55cb5fe3796cf6a4efbcfc0a9418bb1c783f
master date: 2018-03-06 16:49:36 +0100

x86: slightly reduce Meltdown band-aid overhead
Jan Beulich [Tue, 20 Mar 2018 13:20:29 +0000 (14:20 +0100)]
x86: slightly reduce Meltdown band-aid overhead

I'm not sure why I didn't do this right away: By avoiding the use of
global PTEs in the cloned directmap, there's no need to fiddle with
CR4.PGE on any of the entry paths. Only the exit paths need to flush
global mappings.

The reduced flushing, however, requires that we now have interrupts off
on all entry paths until after the page table switch, so that flush IPIs
can't be serviced while on the restricted pagetables, leaving a window
where a potentially stale guest global mapping can be brought into the
TLB. Along those lines the "sync" IPI after L4 entry updates now needs
to become a real (and global) flush IPI, so that inside Xen we'll also
pick up such changes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86: correct EFLAGS.IF in SYSENTER frame

Commit 9d1d31ad94 ("x86: slightly reduce Meltdown band-aid overhead")
moved the STI past the PUSHF. While this isn't an active problem (as we
force EFLAGS.IF to 1 before exiting to guest context), let's not risk
internal confusion by finding a PV guest frame with interrupts
apparently off.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 9d1d31ad9498e6ceb285d5774e34fed5f648c273
master date: 2018-03-06 16:48:44 +0100
master commit: c4dd58f0cf23cdf119bbccedfb8c24435fc6f3ab
master date: 2018-03-16 17:27:36 +0100

x86/xpti: don't map stack guard pages
Jan Beulich [Tue, 20 Mar 2018 13:19:36 +0000 (14:19 +0100)]
x86/xpti: don't map stack guard pages

Other than for the main mappings, don't even do this in release builds,
as there are no huge page shattering concerns here.

Note that since we don't run on the restricted page tables while HVM
guests execute, the non-present mappings won't trigger the triple fault
issue AMD SVM is susceptible to with our current placement of STGI vs
TR loading.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d303784b68237ff3050daa184f560179dda21b8c
master date: 2018-03-06 16:46:57 +0100

x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
Andrew Cooper [Tue, 20 Mar 2018 13:16:37 +0000 (14:16 +0100)]
x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings

The current XPTI implementation isolates the directmap (and therefore a lot of
guest data), but a large quantity of CPU0's state (including its stack)
remains visible.

Furthermore, an attacker able to read .text is in a vastly superior position
to normal when it comes to fingerprinting Xen for known vulnerabilities, or
scanning for ROP/Spectre gadgets.

Collect together the entrypoints in .text.entry (currently 3x4k frames, but
can almost certainly be slimmed down), and create a common mapping which is
inserted into each per-cpu shadow.  The stubs are also inserted into this
mapping by pointing at the in-use L2.  This allows stubs allocated later (SMP
boot, or CPU hotplug) to work without further changes to the common mappings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/xpti: really hide almost all of Xen image

Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
.data/.rodata/.bss mappings") carefully limited the Xen image cloning to
just entry code, but then overwrote the just allocated and populated L3
entry with the normal one again covering both Xen image and stubs.

Drop the respective code in favor of an explicit clone_mapping()
invocation. This in turn now requires setup_cpu_root_pgt() to run after
stub setup in all cases. Additionally, with (almost) no unintended
mappings left, the BSP's IDT now also needs to be page aligned.

The moving ahead of cleanup_cpu_root_pgt() is not strictly necessary
for functionality, but things are more logical this way, and we retain
cleanup being done in the inverse order of setup.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/traps: Put idt_table[] back into .bss

c/s d1d6fc97d "x86/xpti: really hide almost all of Xen image" accidentally
moved idt_table[] from .bss to .data by virtue of using the page_aligned
section.  We also have .bss.page_aligned, so use that.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
master commit: 422588e88511d17984544c0f017a927de3315290
master date: 2018-02-15 11:08:27 +0000
master commit: d1d6fc97d66cf56847fc0bcc2ddc370707c22378
master date: 2018-03-06 16:46:27 +0100
master commit: 044fedfaa29b5d5774196e3fc7d955a48bfceac4
master date: 2018-03-09 15:42:24 +0000

x86: ignore guest microcode loading attempts
Jan Beulich [Fri, 16 Mar 2018 16:14:51 +0000 (17:14 +0100)]
x86: ignore guest microcode loading attempts

The respective MSRs are write-only, and hence attempts by guests to
write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
impression of WRMSR succeeding") - no longer ignored. Restore the
original behavior for the two affected MSRs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 59c0983e10d70ea2368085271b75fb007811fe52
master date: 2018-03-15 12:44:24 +0100
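
In effect (MSR indexes per the vendor documentation; the helper is
hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_IA32_UCODE_WRITE 0x00000079   /* Intel */
    #define MSR_AMD_PATCHLOADER  0xc0010020   /* AMD */

    /* Writes to the write-only microcode-loading MSRs are swallowed
     * (reported as success) instead of raising #GP. */
    static bool is_ignored_microcode_wrmsr(uint32_t msr)
    {
        return msr == MSR_IA32_UCODE_WRITE || msr == MSR_AMD_PATCHLOADER;
    }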

ocaml: fix arm build
Wei Liu [Wed, 17 Jan 2018 16:43:54 +0000 (16:43 +0000)]
ocaml: fix arm build

ARM doesn't have emulation_flags in the arch_domainconfig.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 81838c9067ab7f4b89d33f90a71225ffff9800ba)

Merge branch 'merge-comet-staging-4.10-v1' into staging-4.10
Wei Liu [Tue, 6 Mar 2018 16:34:26 +0000 (16:34 +0000)]
Merge branch 'merge-comet-staging-4.10-v1' into staging-4.10

libxl/pvh: force PVH guests to use the xenstore shutdown
Roger Pau Monne [Tue, 19 Dec 2017 14:17:52 +0000 (14:17 +0000)]
libxl/pvh: force PVH guests to use the xenstore shutdown

PVH guests are all required to support the xenstore-based shutdown
signalling, since there is no other way for a PVH guest to be
requested to shut down.

For HVM guests we check whether the guest has installed a PV-on-HVM
interrupt callback; that does not make sense for PVH guests.

So for PVH guests, take the PV path: assume that all PVH guests have
suitable xenstore drivers.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 1b33150fe06ab9217f7f12b01bc5e607f4f55658)

x86/HVM: don't give the wrong impression of WRMSR succeeding
Jan Beulich [Tue, 6 Mar 2018 14:56:12 +0000 (15:56 +0100)]
x86/HVM: don't give the wrong impression of WRMSR succeeding

... for non-existent MSRs: wrmsr_hypervisor_regs()'s comment clearly
says that the function returns 0 for unrecognized MSRs, so
{svm,vmx}_msr_write_intercept() should not convert this into success. We
don't want to unconditionally fail the access though, as we can't be
certain the list of handled MSRs is complete enough for the guest types
we care about, so instead mirror what we do on the read paths and probe
the MSR to decide whether to raise #GP.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 1f1d183d49008794b087cf043fc77f724a45af98
master date: 2018-02-27 15:12:23 +0100
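
A sketch of the resulting decision (both helpers are stand-ins for Xen
internals):

    #include <stdbool.h>
    #include <stdint.h>

    int wrmsr_hypervisor_regs(uint32_t msr, uint64_t val); /* 0 = unrecognised */
    bool rdmsr_probe(uint32_t msr, uint64_t *val);         /* false = would #GP */

    /* Returns true on success, false when #GP should be injected. */
    static bool msr_write_intercept(uint32_t msr, uint64_t val)
    {
        uint64_t tmp;

        if ( wrmsr_hypervisor_regs(msr, val) != 0 )
            return true;                    /* recognised and handled */

        /* Unrecognised: mirror the read path and probe before deciding. */
        return rdmsr_probe(msr, &tmp);
    }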

x86/PV: fix off-by-one in I/O bitmap limit check
Jan Beulich [Tue, 6 Mar 2018 14:55:33 +0000 (15:55 +0100)]
x86/PV: fix off-by-one in I/O bitmap limit check

With everyone having their tags below agreeing that putting things the
other way around in the comparison makes things easier to understand, do
that rearrangement while changing the line anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c6527bc66b6dd7a8dadaebb1047c8e52c6c5793c
master date: 2018-02-27 14:10:00 +0100

grant: Release domain lock on 'map' path in cache_flush
George Dunlap [Tue, 6 Mar 2018 14:54:54 +0000 (15:54 +0100)]
grant: Release domain lock on 'map' path in cache_flush

common/grant_table.c:cache_flush() grabs the rcu lock for the current
domain, but only releases it on error paths.

Note that this is not a security issue, as the preempt count is used
exclusively for assertions at the moment.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 156b29fca10fd25065fc501eb4b47cff931086f2
master date: 2018-02-27 11:19:27 +0000

x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
Andrew Cooper [Tue, 6 Mar 2018 14:54:10 +0000 (15:54 +0100)]
x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context

If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
advertised/hidden.

At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
Therefore, default PV guests can read the value from a previously scheduled
HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.

Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.

To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
update of the MSR, and use this function consistently across the codebase.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: cc0e45db277922b5723a7b1d9657d6f744230cf1
master date: 2018-02-27 10:47:23 +0000
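
A sketch of the lazy-update helper (the cached value is per-CPU in the
real code; this standalone version simplifies it to a single variable):

    #include <stdint.h>

    #define MSR_TSC_AUX 0xc0000103

    static uint32_t tsc_aux_cache;   /* per-CPU in reality */

    static inline void wrmsr(uint32_t msr, uint64_t val)
    {
        asm volatile ( "wrmsr"
                       :: "c" (msr), "a" ((uint32_t)val),
                          "d" ((uint32_t)(val >> 32)) );
    }

    /* Skip the WRMSR when the value is already in place. */
    static void wrmsr_tsc_aux(uint32_t val)
    {
        if ( tsc_aux_cache != val )
        {
            wrmsr(MSR_TSC_AUX, val);
            tsc_aux_cache = val;
        }
    }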

x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
Igor Druzhinin [Tue, 6 Mar 2018 14:53:35 +0000 (15:53 +0100)]
x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap

We're noticing a reproducible system boot hang on certain
Skylake platforms where the BIOS is configured in legacy
boot mode with x2APIC disabled. The system stalls immediately
after writing the first SMP initialization sequence into APIC ICR.

The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation for
some reason takes more time than we expect during INIT-SIPI-SIPI
sequence. As a result, the system is constantly moving between
NMI and SMI handler and not making any progress.

To avoid this, initialize the watchdog after SMP bootstrap on
CPU0 and, additionally, protect the NMI handler by moving
IO port access before NMI re-scheduling. The latter should also
help in case of post-boot CPU onlining. Although we're running the
watchdog at a much lower frequency at this point, it's nevertheless
possible we may trigger the issue anyway.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a44f1697968e04fcc6145e3bd51c748b57047240
master date: 2018-02-20 10:16:56 +0100

x86/srat: fix end calculation in nodes_cover_memory()
Jan Beulich [Tue, 6 Mar 2018 14:52:51 +0000 (15:52 +0100)]
x86/srat: fix end calculation in nodes_cover_memory()

Along the lines of commit 7226486767 ("x86/srat: fix the end pfn check
in valid_numa_range()") nodes_cover_memory() also doesn't consistently
use "end": It's set to an inclusive value initially, but then compared
to the exclusive "end" field of struct node and also possibly set to
nodes[j].start, making it exclusive too. Change the initialization to
make the variable consistently exclusive.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: fdbed42649eb064e7c6d1bae2bdd4f46e7b2a160
master date: 2018-02-15 18:17:32 +0100

x86/hvm/dmop: only copy what is needed to/from the guest
Ross Lagerwall [Tue, 6 Mar 2018 14:52:20 +0000 (15:52 +0100)]
x86/hvm/dmop: only copy what is needed to/from the guest

dm_op() fails with -EFAULT if the struct xen_dm_op given by the guest is
smaller than Xen's struct xen_dm_op. This is a problem because DMOP is
meant to be a stable ABI but it breaks whenever the size of struct
xen_dm_op changes.

To fix this, change how the copying to and from the guest is done. When
copying from the guest, first copy the header and inspect the op. Then,
only copy the correct amount needed for that op. When copying to the
guest, don't copy the header. Rather, copy only the correct amount
needed for that particular op.

So now the dm_op() will fail if the guest does not supply enough bytes
for the specific op. It will not fail if the guest supplies too many
bytes for the specific op, but Xen will not copy the extra bytes.

Remove some now unused macros and helper functions.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 85cb15dfe4d13b9b8b0f39a9cb257525c0b74c60
master date: 2018-02-15 18:16:17 +0100
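
A simplified sketch of the two-step copy (sizes, op numbers, and names
invented for illustration):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct dm_op_header { uint32_t op; uint32_t pad; };

    static const size_t payload_size[] = { 8, 16, 24 };  /* per-op sizes */

    static int fetch_dm_op(void *dst, const void *guest_buf,
                           size_t guest_size)
    {
        struct dm_op_header hdr;
        size_t need;

        if ( guest_size < sizeof(hdr) )
            return -1;                          /* -EFAULT in the real code */
        memcpy(&hdr, guest_buf, sizeof(hdr));   /* 1: copy and inspect header */

        if ( hdr.op >= sizeof(payload_size) / sizeof(payload_size[0]) )
            return -1;
        need = sizeof(hdr) + payload_size[hdr.op];
        if ( guest_size < need )
            return -1;                          /* too few bytes for this op */
        memcpy(dst, guest_buf, need);           /* 2: copy only what's needed */
        return 0;
    }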

x86/entry: Use 32bit xors rather than 64bit xors for clearing GPRs
Andrew Cooper [Tue, 6 Mar 2018 14:51:33 +0000 (15:51 +0100)]
x86/entry: Use 32bit xors rather than 64bit xors for clearing GPRs

Intel's Silvermont/Knights Landing architecture treats them as full ALU
operations, rather than zeroing idioms.

No functional change, and no change in code volume (only changing the bit
selection in the REX prefix).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: eb1d3a3f04b85d596862a4c9dcf796e67ab4dc09
master date: 2018-02-15 11:08:27 +0000

x86/emul: Fix the decoding of segment overrides in 64bit mode
Andrew Cooper [Tue, 6 Mar 2018 14:50:56 +0000 (15:50 +0100)]
x86/emul: Fix the decoding of segment overrides in 64bit mode

Explicit segment overrides other than %fs and %gs are documented as ignored by
both Intel and AMD.

In practice, this means that:

 * Explicit uses of %ss don't actually yield #SS[0] for non-canonical
   memory references.
 * Explicit uses of %{e,c,d}s don't override %rbp/%rsp-based memory references
   to yield #GP[0] for non-canonical memory references.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b7dce29d9faf3597d009c853ed1fcbed9f7a7f68
master date: 2018-02-15 11:08:27 +0000
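
A decode sketch of that rule (prefix byte values per the x86 manuals;
the function shape is assumed):

    #include <stdbool.h>

    enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_NONE };

    /* In 64-bit mode only %fs/%gs overrides take effect. */
    static enum seg decode_seg_override(unsigned char prefix, bool mode_64bit)
    {
        switch ( prefix )
        {
        case 0x26: return mode_64bit ? SEG_NONE : SEG_ES;
        case 0x2e: return mode_64bit ? SEG_NONE : SEG_CS;
        case 0x36: return mode_64bit ? SEG_NONE : SEG_SS;
        case 0x3e: return mode_64bit ? SEG_NONE : SEG_DS;
        case 0x64: return SEG_FS;
        case 0x65: return SEG_GS;
        }
        return SEG_NONE;
    }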

x86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST
Andrew Cooper [Tue, 6 Mar 2018 14:50:14 +0000 (15:50 +0100)]
x86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST

DO_OVERWRITE_RSB clobbers %rax, meaning in practice that the bti_ist_info
field gets zeroed.  Older versions of this code had the DO_OVERWRITE_RSB
register selectable, so reintroduce this ability and use it to cause the
INTR_IST path to use %rdx instead.

The use of %dl for the %cs.rpl check means that when an IST interrupt hits
Xen, we try to load 1 into the high 32 bits of MSR_SPEC_CTRL, suffering a #GP
fault instead.

Also, drop an unused label which was a copy/paste mistake.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: a2b08fbed388f18235fda5ba1655c1483ef3e215
master date: 2018-02-14 13:22:15 +0000

x86/srat: fix the end pfn check in valid_numa_range()
Haozhong Zhang [Tue, 6 Mar 2018 14:49:33 +0000 (15:49 +0100)]
x86/srat: fix the end pfn check in valid_numa_range()

... and fix the coding style on the fly.

valid_numa_range(..., epfn << PAGE_SHIFT, ...) and its only caller
memory_add(..., epfn, pxm) interpret epfn inconsistently. The former
interprets epfn as the last pfn, while the latter interprets it as the
last pfn plus one. Fix this inconsistency in valid_numa_range(), since
most of other places use the latter interpretation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 722648676751fda39086f54d961640f88174360b
master date: 2018-02-12 11:08:33 +0000

x86: reduce Meltdown band-aid IPI overhead
Jan Beulich [Tue, 6 Mar 2018 14:48:31 +0000 (15:48 +0100)]
x86: reduce Meltdown band-aid IPI overhead

In case we can detect single-threaded guest processes (by checking
whether we can account for all root page table uses locally on the vCPU
that's running), there's no point in issuing a sync IPI upon an L4 entry
update, as no other vCPU of the guest will have that page table loaded.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a22320e32dca0918ed23799583f470afe4c24330
master date: 2018-02-07 16:31:41 +0100

x86/NMI: invert condition in nmi_show_execution_state()
Jan Beulich [Tue, 6 Mar 2018 14:47:55 +0000 (15:47 +0100)]
x86/NMI: invert condition in nmi_show_execution_state()

We want to decode the symbol when _not_ in guest mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 309e0509b7363a895362fcbeac823562c3e18def
master date: 2018-02-06 17:29:59 +0100

x86/emul: Fix the emulation of invlpga
Andrew Cooper [Tue, 6 Mar 2018 14:46:56 +0000 (15:46 +0100)]
x86/emul: Fix the emulation of invlpga

The instruction requires EFER.SVME set to be usable in the first place.

Furthermore, the emulation doesn't handle ASIDs, so avoid giving the
impression that they work.  Permit ASID 0 which is reserved for non-root
mode (in which case the instruction is identical to invlpg), but raise #UD for
any other ASID.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a91b2ec337a45d5d98e5a4387aa6563bc5cdc4c9
master date: 2018-02-05 18:17:22 +0000
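
The checks boil down to (EFER.SVME is bit 12; the helper is
hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    #define EFER_SVME (1ULL << 12)

    /* Returns false when #UD should be raised. */
    static bool invlpga_permitted(uint64_t efer, uint32_t asid)
    {
        if ( !(efer & EFER_SVME) )
            return false;      /* instruction requires EFER.SVME */

        /* ASID 0 (reserved for non-root mode) degenerates to INVLPG;
         * other ASIDs are not handled, so raise #UD for them. */
        return asid == 0;
    }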

ignores: update .hgignore
Roger Pau Monné [Thu, 1 Mar 2018 14:11:07 +0000 (15:11 +0100)]
ignores: update .hgignore

To add the shim build output and build directory.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 93de8da382480ab78e83e3bfc05cebb5e5865fe3)

ignores: update list of git ignored files
Roger Pau Monné [Thu, 1 Mar 2018 14:10:54 +0000 (15:10 +0100)]
ignores: update list of git ignored files

Add the shim build symbol file and remove the xen-shim binary (which
is no longer created).

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit cfbdaeb2e4b47e47521ade90729ba751eb47f4d1)

firmware/shim: better filtering of intermediate files during Xen tree setup
Jan Beulich [Thu, 1 Mar 2018 14:10:15 +0000 (15:10 +0100)]
firmware/shim: better filtering of intermediate files during Xen tree setup

I have no idea what *.1 is meant to cover. Instead also exclude
preprocessed and non-source assembly files.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 4c45567d3e5a296af2ff8032cd3108f195b5eb44)

firmware/shim: better filtering of dependency files during Xen tree setup
Jan Beulich [Thu, 1 Mar 2018 14:10:02 +0000 (15:10 +0100)]
firmware/shim: better filtering of dependency files during Xen tree setup

I have no idea what *.d1 is supposed to refer to - we only have .*.d
and .*.d2 files (note also the leading dot). Also switch to passing
-name instead of -path to find - that's a requirement for .*.d et al to
work, but would probably have been better from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 3f4bc4cc2f521c572fdc4a30dd1434ef8ecd559c)

build: remove shim related targets
Roger Pau Monné [Fri, 23 Feb 2018 10:05:19 +0000 (11:05 +0100)]
build: remove shim related targets

There's no need to have shim-specific targets, so just use the regular
xen makefile targets in order to build the shim binary.

When the shim is built as part of the firmware directory, install the
stripped Xen binary to the firmware directory and place a binary with
symbols in the debug directory.

The objcopy step of the shim build is also removed in this patch:
since the shim is booted in PVH mode there's no need for the resulting
binary to be in elf32 format. Xen can load PVH kernels with either a
32 or 64bit elf header.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit b38c4e1763baa448ea34c5f337932f351798c9a1)

shim: allow building of just the shim with build-ID-incapable linker
Jan Beulich [Tue, 20 Feb 2018 09:10:59 +0000 (10:10 +0100)]
shim: allow building of just the shim with build-ID-incapable linker

The ELF note the shim build inserts causes mkelf32 to choke on the
second program header. However, the output of mkelf32 isn't really
needed when building inside tools/firmware/ - an attempt to build it is
made solely because of a wrong dependency.

Further changes to the make logic will be needed to also allow building
a shim-enabled "normal" xen with such a linker (as it looks, the --notes
option will need passing not just when the linker supports build ID
generation).

Also drop a stray variable setting from the x86 Makefile.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 682b13c259e531f6848f535032c256ec8fcaca71)

firmware/shim: avoid mkdir error during Xen tree setup
Jan Beulich [Wed, 14 Feb 2018 07:16:00 +0000 (08:16 +0100)]
firmware/shim: avoid mkdir error during Xen tree setup

"mkdir -p" reports a missing operand, as config/ has no subdirs. Oddly
enough this doesn't cause the whole command (and hence the build to
fail), despite the "set -e" now covering the entire set of commands -
perhaps a quirk of the relatively old bash I've seen this with (a few
simple experiments suggest that commands inside () producing a non-
success status would exit the inner shell, but not the outer one).

Add a dummy . argument to the invocation.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit f25dce4a2adf518678280495712d66e627adec1e)

firmware/shim: correctly handle errors during Xen tree setup
Jan Beulich [Tue, 13 Feb 2018 17:19:33 +0000 (18:19 +0100)]
firmware/shim: correctly handle errors during Xen tree setup

"set -e" on a separate Makefile line is meaningless. Glue together all
the lines that this is supposed to cover.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit e139d34a1c4b7775d5855458a325e0e4176bdf7e)

firmware/shim: update Makefile
Wei Liu [Mon, 5 Mar 2018 19:14:48 +0000 (19:14 +0000)]
firmware/shim: update Makefile

The patch named "firmware/shim: fix build process to use POSIX find options"
has diverged between the comet branch and staging.

Update the Makefile to use the rune in staging. This is needed to
cherry-pick further patches.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

x86/shim: don't use 32-bit compare on boolean variable
Jan Beulich [Thu, 1 Feb 2018 10:32:45 +0000 (11:32 +0100)]
x86/shim: don't use 32-bit compare on boolean variable

Current upstream gas silently assumes 32-bit operand size for most
operations where the size can't be inferred from an involved register
(my own one doesn't anymore, which is how I've noticed this). It is pure
luck that the 3 bytes following pvh_boot are currently padding ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 98dc9606868a807206ad0f4c3a45046d4e0e1260)

xen/pvshim: fix GNTTABOP_query_size hypercall forwarding with SMAP
Roger Pau Monne [Fri, 26 Jan 2018 15:29:10 +0000 (15:29 +0000)]
xen/pvshim: fix GNTTABOP_query_size hypercall forwarding with SMAP

Disable SMAP in the shim before bouncing the hypercall, or else L0
will fail to get the hypercall buffer.

Reported-by: Fatih Acar <fatih.acar@gandi.net>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 1124a9a26f05439a3aa31eaea227285e50dc94c0)
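
A sketch of the workaround (CR4.SMAP is bit 21; the accessors and the
bounce call are stand-ins for the shim's internals):

    #define X86_CR4_SMAP (1UL << 21)

    unsigned long read_cr4(void);             /* stand-ins, assumed */
    void write_cr4(unsigned long cr4);
    long bounce_grant_table_op_to_l0(void *buf);

    /* Clear CR4.SMAP around the bounced call so L0 can access the
     * shim's hypercall buffer, then restore the previous value. */
    static long query_size_without_smap(void *buf)
    {
        unsigned long cr4 = read_cr4();
        long ret;

        write_cr4(cr4 & ~X86_CR4_SMAP);
        ret = bounce_grant_table_op_to_l0(buf);
        write_cr4(cr4);

        return ret;
    }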

7 years agoRevert "x86/boot: Map more than the first 16MB"
Wei Liu [Wed, 17 Jan 2018 19:47:05 +0000 (19:47 +0000)]
Revert "x86/boot: Map more than the first 16MB"

This reverts commit 7d6f958d9d18c54017f5ef6e299a08037f035747.

Now we have PVH info relocation support, this change is no longer
needed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 1c8627d2a102274f8afeb9dc548100a60077fc8d)

x86: relocate pvh_info
Wei Liu [Wed, 17 Jan 2018 18:38:02 +0000 (18:38 +0000)]
x86: relocate pvh_info

Modify early boot code to relocate pvh info as well, so that we can be
sure __va in __start_xen works.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 201f852eaf8dd429475e5874d7610f0447e1654a)

xen/shim: stash RSDP address for ACPI driver
Wei Liu [Mon, 22 Jan 2018 16:28:30 +0000 (16:28 +0000)]
xen/shim: stash RSDP address for ACPI driver

It used to be the case that we placed RSDP under 1MB and let Xen search
for it. We moved the placement to under 4GB in 4a5733771, so the
search wouldn't work.

Introduce rsdp_hint to ACPI code and set that variable in
convert_pvh_info.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 8b5314f5d637541cf1b3df83d53bb29d8684b174)

libxl: lower shim related message to level DEBUG
Wei Liu [Thu, 18 Jan 2018 16:48:05 +0000 (16:48 +0000)]
libxl: lower shim related message to level DEBUG

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 4dcfd7d1436c77ee92081a36cf63f569dc4ef725)

x86/shim: use credit scheduler
Wei Liu [Thu, 18 Jan 2018 12:32:35 +0000 (12:32 +0000)]
x86/shim: use credit scheduler

Remove sched=null from shim cmdline and doc

We use the default scheduler (credit1 as of writing). The NULL
scheduler still has bugs to fix.

Update shim.config.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 831cbe98f6d13e24a3e6521ddd35102729e1d505)

x86/guest: clean up guest/xen.h
Wei Liu [Thu, 18 Jan 2018 11:47:52 +0000 (11:47 +0000)]
x86/guest: clean up guest/xen.h

Remove extraneous semicolon. Add blank lines. Remove unused static
inline functions.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 68ccebedb987f06fc9a78b1e3f33918119bd0abb)

libxl: remove whitespaces introduced in 62982da926
Wei Liu [Thu, 18 Jan 2018 11:54:29 +0000 (11:54 +0000)]
libxl: remove whitespaces introduced in 62982da926

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 4a79815ab1fdeffc79e35db20981e83363c985ff)

xen/pvshim: switch shim.c to use typesafe mfn_to_page and virt_to_mfn
Roger Pau Monne [Thu, 18 Jan 2018 10:34:04 +0000 (10:34 +0000)]
xen/pvshim: switch shim.c to use typesafe mfn_to_page and virt_to_mfn

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit dd872a261a5306685879866b055fecb3e09dd80d)

xen/pvshim: fix coding style issues
Roger Pau Monne [Wed, 17 Jan 2018 09:29:35 +0000 (09:29 +0000)]
xen/pvshim: fix coding style issues

Fix a couple of coding style issues.

No code or functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 250d920ddcd8ee40422265691c069a682c939f1e)

xen/pvshim: re-order replace_va_mapping code
Roger Pau Monne [Wed, 17 Jan 2018 09:24:03 +0000 (09:24 +0000)]
xen/pvshim: re-order replace_va_mapping code

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 72eb5d7320192a7999f5ca840b65d4935e202e89)

7 years agoxen/pvshim: identity pin shim vCPUs to pCPUs
Roger Pau Monne [Wed, 17 Jan 2018 09:20:05 +0000 (09:20 +0000)]
xen/pvshim: identity pin shim vCPUs to pCPUs

Since VCPUOP_{up/down} already identity-maps vCPU hotplug to pCPU
hotplug, also identity-pin the vCPUs to the pCPUs in the scheduler.
This prevents vCPU migration and should improve performance.

While there, also use __cpumask_set_cpu instead of cpumask_set_cpu;
there's no need for the locked variant.
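
(A minimal sketch of the pinning, assuming the 4.10-era affinity field
names; not the literal committed hunk.)

    /* For each shim vCPU v: hard-pin it to the matching pCPU.  The mask
     * is still private at this point, so the non-atomic
     * __cpumask_set_cpu() suffices. */
    cpumask_clear(v->cpu_hard_affinity);
    __cpumask_set_cpu(v->vcpu_id, v->cpu_hard_affinity);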

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c9c71e4124b00d33d89aa95527b32527cee5198f)

7 years agotools: fix arm build after bdf693ee61b48
Wei Liu [Wed, 17 Jan 2018 09:50:27 +0000 (09:50 +0000)]
tools: fix arm build after bdf693ee61b48

The ramdisk fields were removed. We should use modules[0] instead.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit db3ae8becc2b4f9f544eafa06a7c858c7cc9f029)

7 years agoDon't build xen-shim for 32 bit build host
Wei Liu [Tue, 16 Jan 2018 18:56:45 +0000 (18:56 +0000)]
Don't build xen-shim for 32 bit build host

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 36c560e7f38130f12a36e8b66b0785fb655fe893)

7 years agoRevert "x86/guest: use the vcpu_info area from shared_info"
Wei Liu [Mon, 5 Mar 2018 15:02:24 +0000 (15:02 +0000)]
Revert "x86/guest: use the vcpu_info area from shared_info"

This reverts commit 69f4d872e524932d392acd80989c5b776baa4522.

The required commit is already present in staging-4.10, so revert the
workaround from the comet branch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/shim: commit shim.config changes for 4.10 branch
Wei Liu [Mon, 5 Mar 2018 15:07:25 +0000 (15:07 +0000)]
x86/shim: commit shim.config changes for 4.10 branch

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoMerge tag '4.10.0-shim-comet-3' into staging-4.10
Wei Liu [Mon, 5 Mar 2018 11:15:16 +0000 (11:15 +0000)]
Merge tag '4.10.0-shim-comet-3' into staging-4.10

Xen 4.10.0 "Comet" shim v3

Fixed trivial merge conflicts of comet and spec ctrl series.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen/arm: Flush TLBs before turning on the MMU to avoid stale entries
Julien Grall [Tue, 27 Feb 2018 11:15:57 +0000 (11:15 +0000)]
xen/arm: Flush TLBs before turning on the MMU to avoid stale entries

We don't know the state of the TLBs when booting Xen. To avoid using
stale entries, it is necessary to flush the TLBs before turning on the
MMU.

Reported-by: Iain Hunter <iain@hunterembedded.co.uk>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 1c473c42199a8f4d70533c202e1c57ecd1dad35b)

7 years agoxen/arm: vgic: Make sure the number of SPIs is a multiple of 32
Julien Grall [Fri, 16 Feb 2018 14:59:56 +0000 (14:59 +0000)]
xen/arm: vgic: Make sure the number of SPIs is a multiple of 32

The vGIC relies on having a pending_irq available for every IRQ
described in the ranks. As each rank describes 32 interrupts, we need to
make sure the number of SPIs is a multiple of 32.
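
(Sketch of the adjustment; ROUNDUP is the hypervisor's rounding helper,
and the exact spot in the vGIC init code is assumed.)

    /* Round up so every 32-interrupt rank is fully backed. */
    d->arch.vgic.nr_spis = ROUNDUP(nr_spis, 32);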

Reported-by: Jeff Kubascik <Jeff.Kubascik@dornerworks.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jarvis Roach <Jarvis.Roach@dornerworks.com>
(cherry picked from commit 23b40df6f098e3bcb2f105a4909860240976e40f)

7 years agox86/hvm: Disallow the creation of HVM domains without Local APIC emulation
Andrew Cooper [Tue, 27 Feb 2018 13:19:50 +0000 (14:19 +0100)]
x86/hvm: Disallow the creation of HVM domains without Local APIC emulation

There are multiple problems, not necessarily limited to:

 * Guests which configure event channels via hvmop_set_evtchn_upcall_vector(),
   or which hit %cr8 emulation, will cause Xen to fall over a NULL vlapic->regs
   pointer.

 * On Intel hardware, disabling the TPR_SHADOW execution control without
   reenabling CR8_{LOAD,STORE} interception means that the guest's %cr8
   accesses interact with the real TPR.  Amongst other things, setting the
   real TPR to 0xf blocks even IPIs from interrupting this CPU.

 * On hardware which sets up the use of Interrupt Posting, including
   IOMMU-Posting, guests run without the appropriate non-root configuration,
   which at a minimum will result in dropped interrupts.

Whether no-LAPIC mode is of any use at all remains to be seen.

This is XSA-256.

Reported-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 0aa6158b674c5d083b75ac8fcd1e7ae92d0c39ae
master date: 2018-02-27 14:08:36 +0100

7 years agognttab: don't blindly free status pages upon version change
Jan Beulich [Tue, 27 Feb 2018 13:19:19 +0000 (14:19 +0100)]
gnttab: don't blindly free status pages upon version change

There may still be active mappings, which would trigger the respective
BUG_ON(). Split the loop into a first pass dealing with the page
attributes and a second pass (run only once the first has fully passed)
freeing the pages. Return an error if any pages still have pending
references.
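
(Hedged sketch of the two-phase shape; page_still_referenced() is a
stand-in for the actual attribute/reference checks, not a real helper.)

    /* Pass 1: refuse, leaving everything intact, if any page is busy. */
    for ( i = 0; i < nr_status_frames(gt); i++ )
        if ( page_still_referenced(gt->status[i]) )
            return -EBUSY;

    /* Pass 2: only now is it safe to free. */
    for ( i = 0; i < nr_status_frames(gt); i++ )
    {
        free_xenheap_page(gt->status[i]);
        gt->status[i] = NULL;
    }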

This is part of XSA-255.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 38bfcc165dda5f4284d7c218b91df9e144ddd88d
master date: 2018-02-27 14:07:12 +0100

7 years agognttab/ARM: don't corrupt shared GFN array
Jan Beulich [Tue, 27 Feb 2018 13:18:34 +0000 (14:18 +0100)]
gnttab/ARM: don't corrupt shared GFN array

... by writing status GFNs to it. Introduce a second array instead.
Also implement gnttab_status_gmfn() properly now that the information is
suitably tracked.
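
(Field names assumed; a sketch of the separation.)

    /* Track shared and status frame GFNs independently. */
    gfn_t *shared_gfn;   /* GFNs of shared grant frames only */
    gfn_t *status_gfn;   /* new: GFNs of v2 status frames */

    #define gnttab_status_gmfn(d, t, i) ((t)->status_gfn[i])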

While touching it anyway, remove a misguided (but luckily benign) upper
bound check from gnttab_shared_gmfn(): we should never access beyond the
bounds of that array.

This is part of XSA-255.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 9d2f8f9c65d4da35437f50ed9e812a2c5ab313e2
master date: 2018-02-27 14:04:44 +0100

7 years agomemory: don't implicitly unpin for decrease-reservation
Jan Beulich [Tue, 27 Feb 2018 13:17:36 +0000 (14:17 +0100)]
memory: don't implicitly unpin for decrease-reservation

It very likely was a mistake (copy-and-paste from domain cleanup code)
to implicitly unpin here: the caller should really unpin itself before
(or after, if it so wishes) requesting the page to be removed.

This is XSA-252.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d798a0952903db9d8ee0a580e03f214d2b49b7d7
master date: 2018-02-27 14:03:27 +0100

7 years agoxen/arm: cpuerrata: Actually check errata on non-boot CPUs
Julien Grall [Wed, 14 Feb 2018 12:22:23 +0000 (12:22 +0000)]
xen/arm: cpuerrata: Actually check errata on non-boot CPUs

The CPU errata framework was introduced in commit 8b01f6364f "xen/arm:
Detect silicon revision and set cap bits accordingly" and was meant to
detect errata present on any CPU (via check_local_cpu_errata). However,
the function checking the MIDR (is_affected_midr_range) mistakenly
always uses the boot CPU's MIDR.

Fix is_affected_midr_range to use the current CPU's MIDR.
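
(Sketch of the fix; structure and macro names follow the Linux-derived
errata framework and should be treated as approximate.)

    static bool is_affected_midr_range(const struct arm_cpu_capabilities *entry)
    {
        /* Was boot_cpu_data.midr.bits: wrong on secondary CPUs. */
        return MIDR_IS_CPU_MODEL_RANGE(current_cpu_data.midr.bits,
                                       entry->midr_model,
                                       entry->midr_range_min,
                                       entry->midr_range_max);
    }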

Reported-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 27196d4cc917d91b5b5daee50173565139ca9c9d)

7 years agoxen/arm: vsmc: Don't implement function IDs that don't exist
Julien Grall [Tue, 6 Feb 2018 15:53:24 +0000 (15:53 +0000)]
xen/arm: vsmc: Don't implement function IDs that don't exist

The current implementation of SMCCC relies on the fact that only the
function number (bits [15:0]) is used to identify what to implement.

However, PSCI calls are only available in the ranges 0x84000000-0x8400001F
and 0xC4000000-0xC400001F. Furthermore, not all SMC32 functions have an
equivalent in the SMC64 range. This is the case for:
    * PSCI_VERSION
    * CPU_OFF
    * MIGRATE_INFO_TYPE
    * SYSTEM_OFF
    * SYSTEM_RESET

Similarly, call count, call UID and revision can only be queried using
SMC32/HVC32 fast calls (see section 6.2 in ARM DEN 0028B).

Xen should only implement identifiers existing in the specification, in
order to avoid potential clashes with later revisions. Therefore rework
the vsmc code to use the whole function identifier rather than only the
function number, as sketched below.
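
(Illustrative dispatch shape only; the constant name is borrowed from
Linux and the handling is elided, so this is not the committed hunk.)

    /* Dispatch on the complete 32-bit identifier, not just bits [15:0]. */
    const uint32_t fid = get_user_reg(regs, 0);

    switch ( fid )
    {
    case PSCI_0_2_FN32_PSCI_VERSION:   /* has no SMC64 counterpart */
        /* ... handle PSCI_VERSION ... */
        return true;
    default:
        return false;   /* unknown ID: framework reports UNKNOWN_FUNCTION */
    }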

At the same time, the new macros for call count, call UID and revision
are renamed to better suit the spec.

Lastly, update SSSC_SMCCC_FUNCTION_COUNT to match the correct number of
functions. Note that the version is not updated, because the number has
always been wrong and nobody could properly use it.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: vpsci: Removing dummy MIGRATE and MIGRATE_INFO_UP_CPU
Julien Grall [Tue, 6 Feb 2018 15:53:23 +0000 (15:53 +0000)]
xen/arm: vpsci: Removing dummy MIGRATE and MIGRATE_INFO_UP_CPU

The PSCI calls MIGRATE and MIGRATE_INFO_UP_CPU are optional and are
implemented as just returning PSCI_NOT_SUPPORTED (aka UNKNOWN_FUNCTION
for SMCCC).

The new SMCCC framework is able to deal with unimplemented functions and
return the proper error code, so remove the implementations of both
functions.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/idle: Clear SPEC_CTRL while idle
Andrew Cooper [Thu, 8 Feb 2018 10:54:52 +0000 (11:54 +0100)]
x86/idle: Clear SPEC_CTRL while idle

On contemporary hardware, setting IBRS/STIBP has a performance impact on
adjacent hyperthreads.  It is therefore recommended to clear the setting
before becoming idle, to avoid an idle core preventing adjacent userspace
execution from running at full performance.

Care must be taken to ensure there are no ret or indirect branch instructions
between spec_ctrl_{enter,exit}_idle() invocations, which are always forced
inline.  Care must also be taken to avoid using spec_ctrl_enter_idle() between
flushing caches and becoming idle, in cases where that matters.
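
(Sketch of the resulting usage pattern around MWAIT; the surrounding
idle-loop details are omitted.)

    spec_ctrl_enter_idle(info);        /* drop IBRS/STIBP for the idle period */
    mwait_idle_with_hints(eax, ecx);   /* no rets/indirect branches in between */
    spec_ctrl_exit_idle(info);         /* restore Xen's SPEC_CTRL setting */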

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4c7e478d597b0346eef3a256cfd6794ac778b608
master date: 2018-01-26 14:10:21 +0000

7 years agox86/cpuid: Offer Indirect Branch Controls to guests
Andrew Cooper [Thu, 8 Feb 2018 10:54:12 +0000 (11:54 +0100)]
x86/cpuid: Offer Indirect Branch Controls to guests

With all infrastructure in place, it is now safe to let guests see and use
these features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: 67c6838ddacfa646f9d1ae802bd0f16a935665b8
master date: 2018-01-26 14:10:21 +0000

7 years agox86/ctxt: Issue a speculation barrier between vcpu contexts
Andrew Cooper [Thu, 8 Feb 2018 10:53:40 +0000 (11:53 +0100)]
x86/ctxt: Issue a speculation barrier between vcpu contexts

Issuing an IBPB command flushes the Branch Target Buffer, so that any poison
left by one vcpu won't remain when beginning to execute the next.

The cost of IBPB is substantial, and it is skipped on the transition to idle,
as Xen's idle code is robust already.  All transitions into vcpu context are
fully serialising in practice (and under consideration for being retroactively
declared architecturally serialising), so a cunning attacker cannot use SP1 to
try and skip the flush.
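
(The barrier itself boils down to one MSR write; the condition here is a
simplification of the actual context-switch logic.)

    /* Flush branch-target state unless switching to the idle vcpu. */
    if ( next != prev && !is_idle_vcpu(next) )
        wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);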

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a2ed643ed783020f885035432e9c0919756921d1
master date: 2018-01-26 14:10:21 +0000

7 years agox86/boot: Calculate the most appropriate BTI mitigation to use
Andrew Cooper [Thu, 8 Feb 2018 10:53:10 +0000 (11:53 +0100)]
x86/boot: Calculate the most appropriate BTI mitigation to use

See the logic and comments in init_speculation_mitigations() for further
details.

There are two controls for RSB overwriting, because in principle there are
cases where it might be safe to forego rsb_native (Off the top of my head,
SMEP active, no 32bit PV guests at all, no use of vmevent/paging subsystems
for HVM guests, but I make no guarantees that this list of restrictions is
exhaustive).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/spec_ctrl: Fix determination of when to use IBRS

The original version of this logic was:

    /*
     * On Intel hardware, we'd like to use retpoline in preference to
     * IBRS, but only if it is safe on this hardware.
     */
    else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
    {
        if ( retpoline_safe() )
            thunk = THUNK_RETPOLINE;
        else
            ibrs = true;
    }

but it was changed by a request during review.  Sadly, the result is buggy as
it breaks the later fallback logic by allowing IBRS to appear as available
when in fact it isn't.

In practice this means that on retpoline-unsafe hardware without IBRS, we
select THUNK_JUMP despite intending to select THUNK_RETPOLINE.

Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2713715305ca516f698d58cec5e0b322c3b2c4eb
master date: 2018-01-26 14:10:21 +0000
master commit: 30cbd0c83ef3d0edac2d5bcc41a9a2b7a843ae58
master date: 2018-02-06 18:32:58 +0000

7 years agox86/entry: Avoid using alternatives in NMI/#MC paths
Andrew Cooper [Thu, 8 Feb 2018 10:52:28 +0000 (11:52 +0100)]
x86/entry: Avoid using alternatives in NMI/#MC paths

This patch is deliberately arranged to be easy to revert if/when alternatives
patching becomes NMI/#MC safe.

For safety, there must be a dispatch serialising instruction in (what is
logically) DO_SPEC_CTRL_ENTRY so that, in the case that Xen needs IBRS set in
context, an attacker can't speculate around the WRMSR and reach an indirect
branch within the speculation window.

Using conditionals opens this attack vector up, so the else clause gets an
LFENCE to force the pipeline to catch up before continuing.  This also covers
the safety of the RSB conditional, as execution is guaranteed to hit either
the WRMSR or the LFENCE.

One downside of not using alternatives is that there is unconditionally an
LFENCE in the IST path in cases where we are not using the features of
IBRS-capable microcode.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3fffaf9c13e9502f09ad4ab1aac3f8b7b9398f6f
master date: 2018-01-26 14:10:21 +0000

7 years agox86/entry: Organise the clobbering of the RSB/RAS on entry to Xen
Andrew Cooper [Thu, 8 Feb 2018 10:51:38 +0000 (11:51 +0100)]
x86/entry: Organise the clobbering of the RSB/RAS on entry to Xen

ret instructions are speculated directly to values recorded in the Return
Stack Buffer/Return Address Stack, as there is no uncertainty in well-formed
code.  Guests can take advantage of this in two ways:

  1) If they can find a path in Xen which executes more ret instructions than
     call instructions.  (At least one in the waitqueue infrastructure,
     probably others.)

  2) Use the fact that the RSB/RAS in hardware is actually a circular stack
     without a concept of empty.  (When it logically empties, stale values
     will start being used.)

To mitigate, overwrite the RSB on entry to Xen with gadgets which will capture
and contain rogue speculation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: e6c0128e9ab25bf66df11377a33ee5584d7f99e3
master date: 2018-01-26 14:10:21 +0000

7 years agox86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
Andrew Cooper [Thu, 8 Feb 2018 10:50:40 +0000 (11:50 +0100)]
x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

We need to be able to either set or clear IBRS in Xen context, as well as
restore appropriate guest values in guest context.  See the documentation in
asm-x86/spec_ctrl_asm.h for details.

With the contemporary microcode, writes to %cr3 are slower when SPEC_CTRL.IBRS
is set.  Therefore, the positioning of SPEC_CTRL_{ENTRY/EXIT}* is important.

Ideally, the IBRS_SET/IBRS_CLEAR hunks might be positioned either side of the
%cr3 change, but that is rather more complicated to arrange, and could still
result in a guest controlled value in SPEC_CTRL during the %cr3 change,
negating the saving if the guest chose to have IBRS set.

Therefore, we optimise for the pre-Skylake case (being far more common in the
field than Skylake and later, at the moment), where we have a Xen-preferred
value of IBRS clear when switching %cr3.

There is a semi-unrelated bugfix, where various asm_defn.h macros have a
hidden dependency on PAGE_SIZE, which results in an assembler error if used in
a .macro definition.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 5e7962901131186d3514528ed57c7a9901a15a3e
master date: 2018-01-26 14:10:21 +0000

7 years agox86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}
Andrew Cooper [Thu, 8 Feb 2018 10:49:32 +0000 (11:49 +0100)]
x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}

For performance reasons, HVM guests should have direct access to these MSRs
when possible.
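
(VT-x side sketch; the intercept helper and policy field names follow
the code of that era and should be treated as assumptions.)

    /* Once the guest may see IBRS/IBPB, let it access the MSRs directly. */
    if ( cp->feat.ibrsb )
    {
        vmx_disable_intercept_for_msr(v, MSR_SPEC_CTRL, MSR_TYPE_R | MSR_TYPE_W);
        vmx_disable_intercept_for_msr(v, MSR_PRED_CMD,  MSR_TYPE_R | MSR_TYPE_W);
    }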

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 5a2fe171144ebcc908ea1fca45058d6010f6a286
master date: 2018-01-26 14:10:21 +0000

7 years agox86/migrate: Move MSR_SPEC_CTRL on migrate
Andrew Cooper [Thu, 8 Feb 2018 10:49:02 +0000 (11:49 +0100)]
x86/migrate: Move MSR_SPEC_CTRL on migrate

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 0cf2a4eb769302b7d7d7835540e7b2f15006df30
master date: 2018-01-26 14:10:21 +0000

7 years agox86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests
Andrew Cooper [Thu, 8 Feb 2018 10:48:22 +0000 (11:48 +0100)]
x86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests

As per the spec currently available here:

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

MSR_ARCH_CAPABILITIES will only come into existence on new hardware, but is
implemented as a straight #GP for now to avoid being leaky when new hardware
arrives.
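
(Sketch of the corresponding guest rdmsr case; gp_fault as the shared
error label is assumed from the surrounding function.)

    case MSR_ARCH_CAPABILITIES:
        /* Not implemented yet, so fault rather than leak on new hardware. */
        goto gp_fault;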

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ea58a679a6190e714a592f1369b660769a48a80c
master date: 2018-01-26 14:10:21 +0000

7 years agox86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
Andrew Cooper [Thu, 8 Feb 2018 10:47:41 +0000 (11:47 +0100)]
x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests

Intel specifies IBRS/IBPB (combined, in a single bit) and STIBP as a
separate bit.  AMD specifies IBPB alone, in a third bit.

AMD's IBPB is a subset of Intel's combined IBRS/IBPB.  For performance
reasons, administrators might wish to express "IBPB only" even on Intel
hardware, so we allow the AMD bit to be used for this purpose.

The behaviour of STIBP is more complicated.

It is our current understanding that STIBP will be advertised on HT-capable
hardware irrespective of whether HT is enabled, but not advertised on
HT-incapable hardware.  However, for ease of virtualisation, STIBP's
functionality is ignored rather than reserved by microcode/hardware on
HT-incapable hardware.

For guest safety, we treat STIBP as special, always override the toolstack
choice, and always advertise STIBP if IBRS is available.  This removes the
corner case where STIBP is not advertised, but the guest is running on
HT-capable hardware where it does matter.

Finally, as a bugfix, update the libxc CPUID logic to understand the e8b
feature leaf, which has the side effect of also offering CLZERO to guests
on applicable hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d297b56682e730d598e2529cc6998151d3b6f6f8
master date: 2018-01-26 14:10:21 +0000

7 years agox86: fix GET_STACK_END
Wei Liu [Thu, 8 Feb 2018 10:45:19 +0000 (11:45 +0100)]
x86: fix GET_STACK_END

AIUI the purpose of having the .if directive is to make GET_STACK_END
work with any general purpose register. The code as-is would produce
the wrong result for r8. Fix it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 8155476765a5bdecea1534b46562cf28e0113a9a
master date: 2018-01-25 11:34:17 +0000