xenbits.xensource.com Git - people/dariof/xen.git/log
Andre Przywara [Tue, 6 Feb 2018 17:08:58 +0000 (17:08 +0000)]
ARM: VGIC: split up gic_dump_info() to cover virtual part separately

Currently gic_dump_info() not only dumps the hardware state of the GIC,
but also the VGIC internal virtual IRQ lists.
Split the latter off and move it into gic-vgic.c to observe the abstraction.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andre Przywara [Tue, 6 Feb 2018 17:08:57 +0000 (17:08 +0000)]
ARM: VGIC: split gic.c to observe hardware/virtual GIC separation

Currently gic.c holds code to handle hardware IRQs as well as code to
bridge VGIC requests to the GIC virtualization hardware.
Despite being named gic.c, this file reaches into the VGIC and uses data
structures describing virtual IRQs.
To improve abstraction, move the VGIC functions into a separate file,
so that gic.c does what it says on the tin.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Tue, 6 Feb 2018 17:08:56 +0000 (17:08 +0000)]
ARM: VGIC: drop unneeded gic_restore_pending_irqs()

In gic_restore_pending_irqs() we push our pending virtual IRQs into the
list registers. This function is called once from gic_inject(), just
before we return to the guest, but also in gic_restore_state(), when
we context-switch a VCPU. Having a closer look it turns out that the
latter call is not needed, since we will always call gic_inject() anyway.
So remove that call (and the forward declaration) to streamline this
interface and make separating the GIC from the VGIC world easier later.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
George Dunlap [Thu, 8 Feb 2018 16:23:50 +0000 (16:23 +0000)]
xen: Disable ARINC653 scheduler by default for non-DEBUG builds

The ARINC653 scheduler is targeted at a very specific niche; typical
users cannot benefit from using it.  Disable it by default for
non-DEBUG builds.  (Enable it for DEBUG builds so that we catch any
build breakages sooner rather than later.)

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
George Dunlap [Thu, 8 Feb 2018 16:23:50 +0000 (16:23 +0000)]
xen: Fix credit1 Kconfig entry

...so that it shows up in the menu and can be disabled.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Wei Liu [Wed, 7 Feb 2018 17:09:34 +0000 (17:09 +0000)]
ocaml/libs/xb: don't generate *.mli automatically

To stay in line with other parts of the ocaml code base.

This requires committing a bunch of mli files in tree.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Wei Liu [Wed, 7 Feb 2018 17:09:33 +0000 (17:09 +0000)]
ocaml/libs/xb: update xb.mli in accordance with df1e4c6e7f8

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Olaf Hering [Wed, 7 Feb 2018 15:11:17 +0000 (16:11 +0100)]
stubdom: install firmware files as data

Remove the executable bits of vtpm files by using _DATA instead of _PROG.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Wed, 7 Feb 2018 15:32:18 +0000 (16:32 +0100)]
kconfig/gcov: rename to coverage

So that it can be used by both gcc and clang. Just add the Kconfig option
and modify the makefiles so that the llvm-coverage-specific code can be
added in a follow-up patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[jb: also change the shim config]

Jan Beulich [Wed, 7 Feb 2018 15:31:41 +0000 (16:31 +0100)]
x86: reduce Meltdown band-aid IPI overhead

In case we can detect single-threaded guest processes (by checking
whether we can account for all root page table uses locally on the vCPU
that's running), there's no point in issuing a sync IPI upon an L4 entry
update, as no other vCPU of the guest will have that page table loaded.
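
A minimal sketch of that test, with loudly hypothetical names (struct
root_pt, struct vcpu_model and both helpers are illustrative, not Xen's
code). It assumes the running vCPU can account for at most two uses of a
root table (as its kernel and user root), and that an IPI is only needed
when the table's total use count exceeds those local uses.

```c
#include <stdbool.h>

/* Hypothetical root (L4) page table: total number of uses that may have
 * it loaded anywhere. */
struct root_pt {
    unsigned int total_uses;
};

/* Hypothetical vCPU: the root tables it currently has in use. */
struct vcpu_model {
    const struct root_pt *guest_table;       /* kernel-mode root table */
    const struct root_pt *guest_table_user;  /* user-mode root table */
};

/* Count how many uses of @pt the running vCPU itself accounts for. */
static unsigned int local_uses(const struct vcpu_model *v,
                               const struct root_pt *pt)
{
    return (v->guest_table == pt) + (v->guest_table_user == pt);
}

/* A sync IPI after an L4 entry update is only needed if some use of the
 * table cannot be accounted for locally, i.e. another vCPU may have it
 * loaded. */
static bool l4_update_needs_sync_ipi(const struct vcpu_model *v,
                                     const struct root_pt *pt)
{
    return pt->total_uses > local_uses(v, pt);
}
```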

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 7 Feb 2018 15:30:24 +0000 (16:30 +0100)]
PCI/passthrough: don't discard Dom0 provided information

Instead of giving, to subsequent code, the appearance of there not
having been any "info" data provided, adjust the conditional guarding
SR-IOV handling.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Michael Young [Wed, 7 Feb 2018 13:59:00 +0000 (13:59 +0000)]
update the minimal ocaml version to 4.02

The ocaml safe-strings patch uses code introduced in ocaml 4.02
so update the minimal version.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Wed, 31 Jan 2018 16:09:39 +0000 (16:09 +0000)]
x86/boot: Make alternative patching NMI-safe

During patching, there is a very slim risk that an NMI or MCE interrupt arrives
in the middle of altering the code in the NMI/MCE paths, in which case the
handler may execute partially patched code and bad things will happen.

The NMI risk can be eliminated by running the patching loop in NMI context, at
which point the CPU will defer further NMIs until patching is complete.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
George Dunlap [Wed, 24 Jan 2018 11:56:31 +0000 (11:56 +0000)]
x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering

The fix for XSA-242 depends on the same cpu never calling
_put_page_type() while holding a page_lock() for that page; doing so
may cause a deadlock under the right conditions.

Furthermore, even before that, there was never any discipline for the
order in which page locks are grabbed; if there are any paths that
grab the locks for two different pages at once, we risk creating the
conditions for a deadlock to occur.

These are believed to be safe, because it is believed that:
1. No hypervisor paths ever lock two pages at once, and
2. We never call _put_page_type() on a page while holding its page lock.

Add a check to debug builds to catch any violations of these
assumptions.
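
A sketch of such a check, assuming a single CPU for brevity (Xen would
keep this state per CPU). The helper names mirror the text, but struct
page and all of the bodies are hypothetical. Like a debug-only ASSERT(),
the checks vanish when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page: only the lock state matters for this sketch. */
struct page {
    int locked;
};

/* The page this CPU currently holds locked (per-CPU in a real hypervisor). */
static struct page *current_locked_page;

static void page_lock(struct page *pg)
{
    /* Assumption 1: no path ever locks two pages at once. */
    assert(current_locked_page == NULL);
    pg->locked = 1;
    current_locked_page = pg;
}

static void page_unlock(struct page *pg)
{
    assert(current_locked_page == pg);
    pg->locked = 0;
    current_locked_page = NULL;
}

static void put_page_type(struct page *pg)
{
    /* Assumption 2: never drop a type reference on the page we hold locked. */
    assert(current_locked_page != pg);
    (void)pg; /* the actual type-count handling is elided in this sketch */
}
```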

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Michael Young [Tue, 6 Feb 2018 21:27:23 +0000 (21:27 +0000)]
make xen ocaml safe-strings compliant

Xen built with ocaml 4.06 gives errors such as
Error: This expression has type bytes but an expression was
        expected of type string
as Bytes and safe-strings, which were introduced in 4.02, are the
default in 4.06.
This patch, which is mostly by Richard W.M. Jones of Red Hat
(from https://bugzilla.redhat.com/show_bug.cgi?id=1526703),
fixes these issues.

v2: drop tools/ocaml/libs/xc/xenctrl.ml from the patch as the
affected code was removed by commit d933f1a53c06002351c1e36d40615e40bd4bf6af
tools/ocaml: Drop coredump infrastructure

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
[ wei: remove trailing whitespaces ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 7 Feb 2018 08:45:53 +0000 (09:45 +0100)]
docs: clarify symlink usage in xen-pv-channel

The previous version simply states that a symlink has to be created
without telling where the symlink should point to.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 7 Feb 2018 08:30:57 +0000 (09:30 +0100)]
docs: fix kernel config option in xen-pv-channel

HVC is shown underlined and the underscores are missing.
Fix it by using underscores.
Remove a stale I.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Tue, 6 Feb 2018 13:45:17 +0000 (13:45 +0000)]
x86/spec_ctrl: Fix determination of when to use IBRS

The original version of this logic was:

    /*
     * On Intel hardware, we'd like to use retpoline in preference to
     * IBRS, but only if it is safe on this hardware.
     */
    else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
    {
        if ( retpoline_safe() )
            thunk = THUNK_RETPOLINE;
        else
            ibrs = true;
    }

but it was changed by a request during review.  Sadly, the result is buggy as
it breaks the later fallback logic by allowing IBRS to appear as available
when in fact it isn't.

In practice, this means that on retpoline-unsafe hardware without IBRS, we
select THUNK_JUMP despite intending to select THUNK_RETPOLINE.
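
A simplified, hypothetical model of the intended selection (the enum and
struct names are illustrative; THUNK_JUMP follows the naming used above,
and the surrounding vendor checks are left out). The final fallback to
THUNK_JUMP is only reachable when IBRS really is in use, which is the
property the reviewed version broke.

```c
#include <stdbool.h>

enum thunk { THUNK_DEFAULT, THUNK_RETPOLINE, THUNK_JUMP };

struct mitigation {
    enum thunk thunk;
    bool ibrs;
};

static struct mitigation choose_mitigation(bool has_ibrsb, bool retpoline_safe)
{
    struct mitigation m = { THUNK_DEFAULT, false };

    if ( has_ibrsb )
    {
        /* Prefer retpoline, but only where it is safe; else use IBRS. */
        if ( retpoline_safe )
            m.thunk = THUNK_RETPOLINE;
        else
            m.ibrs = true;
    }
    else
        /* No IBRS to fall back on: retpoline is the best option left. */
        m.thunk = THUNK_RETPOLINE;

    /* Later fallback: only reached when IBRS is genuinely in use. */
    if ( m.thunk == THUNK_DEFAULT )
        m.thunk = THUNK_JUMP;

    return m;
}
```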

Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Zhongze Liu [Tue, 30 Jan 2018 17:50:18 +0000 (01:50 +0800)]
libxc: add xc_domain_remove_from_physmap to wrap XENMEM_remove_from_physmap

This is for the proposal "Allow setting up shared memory areas between VMs
from xl config file". See:

  https://lists.xen.org/archives/html/xen-devel/2017-08/msg03242.html

The plan is to use XENMEM_add_to_physmap_batch to map the shared pages from
one domU to another and use XENMEM_remove_from_physmap to cancel the sharing.
A wrapper for XENMEM_add_to_physmap_batch was added in the following commit:

  commit 20e725e9364cff4a29945f66986ecd88cca8743d

Now add the wrapper to XENMEM_remove_from_physmap.

Signed-off-by: Zhongze Liu <blackskygg@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Razvan Cojocaru [Mon, 29 Jan 2018 21:48:24 +0000 (23:48 +0200)]
tests/xen-access: disable CR4 write events on application exit

On exit, xen-access did not unsubscribe from CR4 write vm_events,
potentially leaving the guest stuck.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Tue, 6 Feb 2018 16:29:59 +0000 (17:29 +0100)]
x86/NMI: invert condition in nmi_show_execution_state()

We want to decode the symbol when _not_ in guest mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 6 Feb 2018 16:29:33 +0000 (17:29 +0100)]
libxc: don't fail domain creation when unpacking initrd fails

At least Linux kernels have been able to work with gzip-ed initrds for
quite some time, and no attempt is even made here to unpack initrds
compressed with other methods. Furthermore, the unzipping routine used here
isn't capable of dealing with various forms of concatenated files, each of
which was gzip-ed separately (it is this particular case which has been
the source of observed VM creation failures).

Hence, if unpacking fails, simply hand the compressed blob to the guest
as is.
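
The resulting policy, sketched in C with hypothetical types (struct blob,
looks_gzipped() and initrd_for_guest() are not libxc code). The unpack
attempt itself is left out and represented only by its return code. The
gzip magic bytes 0x1f 0x8b come from RFC 1952.

```c
#include <stdbool.h>
#include <stddef.h>

struct blob {
    const unsigned char *data;
    size_t len;
};

/* gzip streams start with the magic bytes 0x1f 0x8b (RFC 1952). */
static bool looks_gzipped(struct blob b)
{
    return b.len >= 2 && b.data[0] == 0x1f && b.data[1] == 0x8b;
}

/*
 * Decide what to hand to the guest. @unpack_rc is the result of a
 * (hypothetical) unpack attempt on a gzip-ed blob; 0 means success.
 * On any failure the compressed blob is passed through unchanged,
 * instead of failing domain creation.
 */
static struct blob initrd_for_guest(struct blob compressed, int unpack_rc,
                                    struct blob unpacked)
{
    if ( !looks_gzipped(compressed) )
        return compressed;      /* other formats are never unpacked here */
    return unpack_rc == 0 ? unpacked : compressed;
}
```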

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 5 Feb 2018 11:03:47 +0000 (11:03 +0000)]
xen/livepatch: Drop stray tabs and fix indentation

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Fri, 2 Feb 2018 16:10:17 +0000 (16:10 +0000)]
x86/emul: Fix the emulation of invlpga

The instruction requires EFER.SVME to be set to be usable in the first place.

Furthermore, the emulation doesn't handle ASIDs, so avoid giving the
impression that they work.  Permit ASID 0 which is reserved for non-root
mode (in which case the instruction is identical to invlpg), but raise #UD for
any other ASID.
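
A sketch of the resulting checks. invlpga_check() and its result codes
are hypothetical; only the EFER.SVME bit position (bit 12, per the AMD
APM) is architectural.

```c
#include <stdint.h>

#define EFER_SVME (1ULL << 12)   /* AMD APM: EFER.SVME is bit 12 */

/* Hypothetical outcomes for this sketch. */
enum invlpga_action { INVLPGA_UD, INVLPGA_AS_INVLPG };

/*
 * Raise #UD unless EFER.SVME is set; treat ASID 0 like INVLPG; refuse
 * (#UD) any other ASID, because ASIDs are not emulated.
 */
static enum invlpga_action invlpga_check(uint64_t efer, uint32_t asid)
{
    if ( !(efer & EFER_SVME) )
        return INVLPGA_UD;
    if ( asid != 0 )
        return INVLPGA_UD;
    return INVLPGA_AS_INVLPG;
}
```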

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 2 Feb 2018 11:42:05 +0000 (11:42 +0000)]
x86/emul: Misc non-functional improvements

 * Drop trailing whitespace
 * Use ARRAY_SIZE() rather than opencoding it

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Brian Woods [Mon, 5 Feb 2018 09:15:25 +0000 (10:15 +0100)]
x86/svm: correct EFER.SVME intercept checks

Corrects some EFER.SVME checks in intercepts.  See AMD APM vol2 section
15.4 for more details.  VMMCALL isn't checked due to guests needing it
to boot.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Brian Woods [Mon, 5 Feb 2018 09:14:48 +0000 (10:14 +0100)]
x86/svm: update VGIF support

There are places where the GIF value is checked.  A guest with VGIF
enabled can change the GIF value without the host being involved,
therefore Xen needs to check the GIF value in the VMCB rather than the one in
the nestedsvm struct.
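
A minimal sketch of that selection, using hypothetical stand-in structs
rather than Xen's real VMCB and nestedsvm structures:

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-ins for the VMCB and nestedsvm state. */
struct vmcb_model {
    bool vgif_enable;   /* VGIF turned on for this guest */
    bool vgif;          /* GIF as tracked by hardware */
};

struct nestedsvm_model {
    bool gif;           /* GIF as tracked by the host in software */
};

/*
 * With VGIF enabled the guest can flip GIF without a VM exit, so the
 * software copy may be stale: read the VMCB copy instead.
 */
static bool guest_gif(const struct vmcb_model *vmcb,
                      const struct nestedsvm_model *nsvm)
{
    return vmcb->vgif_enable ? vmcb->vgif : nsvm->gif;
}
```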

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Mon, 5 Feb 2018 09:14:15 +0000 (10:14 +0100)]
x86emul: add missing suffixes in test harness

I'm in the process of putting together a gas change issuing at least
warnings when the intended size of a memory operation can't be deduced
from another (register) operand. Add missing suffixes to silence such
future diagnostics.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 5 Feb 2018 09:12:50 +0000 (10:12 +0100)]
x86emul: add tables for XOP 08 and 09 extension spaces

Convert the few existing opcodes so far supported.

Also adjust two vex_* case labels to better be ext_* (the values are
identical).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Fri, 2 Feb 2018 10:14:44 +0000 (10:14 +0000)]
xen/arm: Don't crash the domain on invalid HVC immediate

domain_crash_synchronous() should only be used when something went wrong
in Xen. It is better to inject an exception into the guest, as it will be in
a better position to provide helpful information (stack trace...).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 10:14:43 +0000 (10:14 +0000)]
xen/arm: Don't crash domain on bad MMIO emulation

Now that the MMIO emulation is able to distinguish unhandled IO from
aborted IO, there is no need to crash the domain when the region is
accessed with a bad width.

Instead, let Xen inject a data abort into the guest and let the guest decide what to do.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Julien Grall [Fri, 2 Feb 2018 10:14:42 +0000 (10:14 +0000)]
xen/arm: io: Distinguish unhandled IO from aborted one

Currently, Xen considers that an IO can either be handled or
unhandled. When unhandled, the stage-2 abort function will try another
way to resolve the abort.

However, the MMIO emulation may return unhandled when the address
belongs to an emulated range but the access was not correct. In that case,
Xen should avoid trying another way and directly inject a guest data abort.

Introduce a tri-state return to distinguish the following states:
    * IO_ABORT: The IO was handled but resulted in an abort
    * IO_HANDLED: The IO was handled
    * IO_UNHANDLED: The IO was unhandled
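
A sketch of how the three states might drive the stage-2 abort path. The
enum values match the list above; mmio_access(), after_mmio() and the
action codes are hypothetical.

```c
#include <stdbool.h>

/* The three outcomes listed above. */
enum io_state {
    IO_ABORT,       /* handled, but the access must fault in the guest */
    IO_HANDLED,
    IO_UNHANDLED,
};

/* Hypothetical follow-up actions for the stage-2 abort path. */
enum abort_action { RESUME_GUEST, INJECT_DABT, TRY_OTHER_WAY };

/* Hypothetical emulation result: only emulated ranges can abort. */
static enum io_state mmio_access(bool in_emulated_range, bool access_ok)
{
    if ( !in_emulated_range )
        return IO_UNHANDLED;
    return access_ok ? IO_HANDLED : IO_ABORT;
}

static enum abort_action after_mmio(enum io_state state)
{
    switch ( state )
    {
    case IO_HANDLED:
        return RESUME_GUEST;
    case IO_ABORT:
        /* Emulated region, bad access (e.g. bad width): fault the guest. */
        return INJECT_DABT;
    case IO_UNHANDLED:
    default:
        /* Not an emulated range: try another way to resolve the abort. */
        return TRY_OTHER_WAY;
    }
}
```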

For now, it is considered that an IO belonging to an emulated range
can either be handled or inject an abort. This could be revisited in the
future if overlapping regions exist (or we want to try another way to
resolve the abort).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Julien Grall [Fri, 2 Feb 2018 10:14:41 +0000 (10:14 +0000)]
xen/arm: traps: Merge try_handle_mmio() and handle_mmio()

At the moment, try_handle_mmio() performs checks on the HSR and bails out
if one check fails. This means that another method will be tried to
handle the fault even for a bad access to an emulated region. While this
should not be an issue, it is not future-proof.

Move the checks of try_handle_mmio() into handle_mmio(), after we have
identified that the fault targets an emulated MMIO region. While this does
not fix the potential fall-through, a follow-up patch will do so by
distinguishing the potential errors.

Note that handle_mmio() was renamed to try_handle_mmio() and its
prototype adapted.

While merging the 2 functions, remove the check whether the fault is a
stage-2 abort on a stage-1 translation table walk, because the instruction
syndrome will always be invalid (see B3-1433 in DDI 0406C.c and
D10-2460 in DDI 0487C.a).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Julien Grall [Fri, 2 Feb 2018 14:19:25 +0000 (14:19 +0000)]
xen/arm32: entry: Document the purpose of r11 in the traps handler

It took me a bit of time to understand why __DEFINE_TRAP_ENTRY
stores the original stack pointer in r11. It works in pair with
return_traps_entry, where sp will be restored from r11.

This is fine because per the AAPCS r11 must be preserved by the
subroutine. So in return_from_trap, r11 will still contain the original
stack pointer.

Add some documentation in the code to point the 2 sides to each other.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 14:19:24 +0000 (14:19 +0000)]
xen/arm32: Invalidate icache on guest exit for Cortex-A15

In order to avoid aliasing attacks against the branch predictor on
Cortex-A15, let's invalidate the BTB on guest exit, which can only be
done by invalidating the icache (with ACTLR[0] being set).

We use the same hack as for A12/A17 to perform the vector decoding.

This is based on a Linux patch from the kpti branch in [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 14:19:23 +0000 (14:19 +0000)]
xen/arm32: Invalidate BTB on guest exit for Cortex A17 and 12

In order to avoid aliasing attacks against the branch predictor, let's
invalidate the BTB on guest exit. This is made complicated by the fact
that we cannot take a branch before invalidating the BTB.

This is based on the fourth version posted by Marc Zyngier on the
Linux-arm mailing list (see [1]).

This is part of XSA-254.

[1] https://www.spinics.net/lists/arm-kernel/msg632062.html

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 14:19:22 +0000 (14:19 +0000)]
xen/arm32: Add skeleton to harden branch predictor aliasing attacks

Aliasing attacks against CPU branch predictors can allow an attacker to
redirect speculative control flow on some CPUs and potentially divulge
information from one context to another.

This patch adds initial skeleton code behind a new Kconfig option
to enable implementation-specific mitigations against these attacks
for CPUs that are affected.

Most of the mitigations will have to be applied when entering the
hypervisor from the guest context.

Because the attack is against the branch predictor, it is not possible to
safely use branch instructions before the mitigation is applied.
Therefore this has to be done in the vector entry, before jumping to the
helper handling a given exception.

However, on arm32, each vector contains a single instruction. This means
that the hardened vector tables may rely on register state that does
not hold when in the hypervisor (e.g. SP being 8-byte aligned).
Therefore hypervisor code running with the guest vector tables should be
minimized and always have IRQs and SErrors masked to reduce the risk of
using them.

This patch provides an infrastructure to switch vector tables before
entering the guest and when leaving it.

Note that alternatives could have been used, but older Xen (4.8 or
earlier) doesn't support them. So avoid using alternatives to ease
backporting.

This is part of XSA-254.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 14:19:21 +0000 (14:19 +0000)]
xen/arm32: entry: Add missing trap_reset entry

At the moment, the reset vector is defined as .word 0 (i.e. andeq r0, r0, r0).

This is rather unintuitive and will result in executing the undefined
instruction trap. Instead, introduce a trap helper for reset that will
generate an error message in the unlikely case that reset is called.

This is part of XSA-254.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 14:19:20 +0000 (14:19 +0000)]
xen/arm32: Add missing MIDR values for Cortex-A17 and A12

Cortex-A17 and A12 MIDR will be used in a follow-up patch for hardening
the branch predictor.

This is part of XSA-254.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Fri, 2 Feb 2018 14:19:19 +0000 (14:19 +0000)]
xen/arm32: entry: Consolidate DEFINE_TRAP_ENTRY_* macros

The only difference between all the DEFINE_TRAP_ENTRY_* macros is the set
of interrupts (Asynchronous Abort, IRQ, FIQ) they unmask.

Rather than duplicating the code, introduce a __DEFINE_TRAP_ENTRY macro
that will take the list of interrupts to unmask.

This is part of XSA-254.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 1 Feb 2018 19:51:23 +0000 (19:51 +0000)]
x86/emul: Add structure names to opcode tables

No functional change, but it makes the diff context line more helpful when
reviewing patches which alter the opcode tables.  e.g. Consider:

  --- a/xen/arch/x86/x86_emulate/x86_emulate.c
  +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
  @@ -370,7 +370,7 @@ static const struct {
       [0x0c ... 0x0f] = { .simd_size = simd_packed_fp },
       [0x10] = { .simd_size = simd_packed_int },
       [0x13] = { .simd_size = simd_other, .two_op = 1 },
  -    [0x14 ... 0x15] = { .simd_size = simd_packed_fp },
  +    [0x14 ... 0x16] = { .simd_size = simd_packed_fp },
       [0x17] = { .simd_size = simd_packed_int, .two_op = 1 },
       [0x18 ... 0x19] = { .simd_size = simd_scalar_fp, .two_op = 1 },
       [0x1a] = { .simd_size = simd_128, .two_op = 1 },

which is entirely ambiguous between 0f38 and 0f3a, and the same diff with this
change in place:

  --- a/xen/arch/x86/x86_emulate/x86_emulate.c
  +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
  @@ -370,7 +370,7 @@ static const struct ext0f38_table {
       [0x0c ... 0x0f] = { .simd_size = simd_packed_fp },
       [0x10] = { .simd_size = simd_packed_int },
       [0x13] = { .simd_size = simd_other, .two_op = 1 },
  -    [0x14 ... 0x15] = { .simd_size = simd_packed_fp },
  +    [0x14 ... 0x16] = { .simd_size = simd_packed_fp },
       [0x17] = { .simd_size = simd_packed_int, .two_op = 1 },
       [0x18 ... 0x19] = { .simd_size = simd_scalar_fp, .two_op = 1 },
       [0x1a] = { .simd_size = simd_128, .two_op = 1 },

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 2 Feb 2018 10:57:34 +0000 (11:57 +0100)]
x86emul: support FMA insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper@citrix.com>
Jan Beulich [Fri, 2 Feb 2018 10:56:08 +0000 (11:56 +0100)]
x86: allow easier disabling of BTI mitigations

Support both a "disable everything" and a "disable all RSB overwriting"
sub-option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 24 Jan 2018 17:41:13 +0000 (17:41 +0000)]
x86/emul: Split exception handling out of invoke_stub()

For a release build, bloat-o-meter reports:

  add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5111 (-5111)
  function                                     old     new   delta
  x86_emulate                               126458  121347   -5111

or in other words, a 4% reduction in code size from this change alone.

The use of __LINE__ is a concern with livepatching, but any livepatch touching
this file is overwhelmingly likely to alter x86_emulate() anyway.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2018 11:08:45 +0000 (11:08 +0000)]
arm/alternatives: Drop the !HAS_ALTERNATIVE infrastructure

ARM now unconditionally selects HAS_ALTERNATIVE, which has caused the
!HAS_ALTERNATIVE code in include/asm-arm/alternative.h to bitrot to the point
of failing to compile.

Expand all the CONFIG_HAS_ALTERNATIVE references in ARM code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Thu, 11 Jan 2018 12:42:59 +0000 (12:42 +0000)]
x86/ioemul: Misc improvements to ioport_emulate.c

Put the opcode into an array and use memcpy.  This allows the compiled code to
be written with two movs, rather than 10 mov $imm8's.  Also, drop trailing
whitespace in the file.
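
A sketch of the pattern, using ten placeholder NOP bytes rather than the
stub actually written by ioport_emulate.c; stub_template and write_stub()
are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder bytes only: the real stub contents are not reproduced here. */
static const uint8_t stub_template[10] = {
    0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,
};

/* One memcpy() of a constant array, instead of one store per byte. */
static void write_stub(uint8_t *buf)
{
    memcpy(buf, stub_template, sizeof(stub_template));
}
```

Because the size is a compile-time constant, compilers typically lower
such a memcpy() to a couple of wide stores (e.g. 8 + 2 bytes).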

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 1 Feb 2018 10:32:45 +0000 (11:32 +0100)]
x86/shim: don't use 32-bit compare on boolean variable

Current upstream gas silently assumes 32-bit operand size for most
operations where the size can't be inferred from an involved register
(my own one doesn't anymore, which is how I've noticed this). It is pure
luck that the 3 bytes following pvh_boot are currently padding ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 1 Feb 2018 10:31:55 +0000 (11:31 +0100)]
x86emul: support FMA4 insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 1 Feb 2018 10:29:39 +0000 (11:29 +0100)]
x86emul: support F16C insns

Note that this avoids emulating the behavior of VCVTPS2PH found on at
least some Intel CPUs, which update MXCSR even when the memory write
faults.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 25 Jan 2018 12:16:12 +0000 (12:16 +0000)]
x86/emul: Improvements to internal users of decode_register()

Most users of decode_register() can be replaced with decode_gpr() right away.

For the few sites which do care about possibly using the legacy byteop
encoding, rename decode_register() to _decode_gpr() (to match its non-legacy
counterpart), and adjust its 'int highbyte_regs' parameter to the more correct
'bool legacy'.
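
A sketch of the legacy byte-op case, using a hypothetical array-backed
register file; decode_gpr_legacy() stands in for _decode_gpr() and is not
Xen's code. It assumes a little-endian host, as x86 is: without a REX
prefix, register numbers 4-7 in byte ops name AH, CH, DH and BH, i.e.
byte 1 of RAX, RCX, RDX and RBX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file indexed by the architectural GPR number. */
struct regs_model {
    unsigned long gpr[16];
};

static unsigned long *decode_gpr(struct regs_model *regs, unsigned int modrm)
{
    return &regs->gpr[modrm & 15];
}

/* Byte-op decoding: with @legacy, 4-7 select the high byte of regs 0-3. */
static void *decode_gpr_legacy(struct regs_model *regs, unsigned int modrm,
                               bool legacy)
{
    if ( legacy && (modrm & ~3u) == 4 )
        return (uint8_t *)decode_gpr(regs, modrm & 3) + 1;
    return decode_gpr(regs, modrm);
}
```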

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 25 Jan 2018 12:16:12 +0000 (12:16 +0000)]
x86/hvm: Improvements to external users of decode_register()

 * Rename to decode_gpr() to be more specific as to its purpose
 * Drop the highbyte encoding handling, as no users currently care, and it
   is unlikely that future users would care.
 * Change to a static inline, returning an unsigned long pointer.

Doing so highlights that the "invalid gpr" paths in hvm_mov_{to,from}_cr()
were actually unreachable.  All callers already passed in-range GPRs, and
out-of-range GPRs would have hit the BUG() previously.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 25 Jan 2018 12:16:12 +0000 (12:16 +0000)]
x86/emul: Optimise decode_register() somewhat

The positions of GPRs inside struct cpu_user_regs don't follow any
particular order, so as compiled, decode_register() becomes a jump table to 16
blocks which calculate the appropriate offset, at a total of 207 bytes.

Instead, pre-compute the offsets at build time and use pointer arithmetic to
calculate the result.  By observation, most callers in x86_emulate() inline
and constant-propagate the highbyte_regs value of 0.

The splitting of the general and legacy byte-op cases means that we will now
hit an ASSERT if any code path tries to use the legacy byte-op encoding with a
REX prefix.
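
A sketch of the offset-table technique, using a hypothetical register
struct whose field order is deliberately scrambled (it is not the real
struct cpu_user_regs). The offsets are constant expressions computed at
build time, so the lookup is one table read plus pointer arithmetic.

```c
#include <stddef.h>

/* Hypothetical register block, not in architectural order. */
struct cpu_regs_model {
    unsigned long r15, r14, r13, r12, rbp, rbx, r11, r10;
    unsigned long r9, r8, rax, rcx, rdx, rsi, rdi, rsp;
};

#define GPR_OFF(r) offsetof(struct cpu_regs_model, r)

/* Byte offset of each GPR, indexed by its architectural number 0-15. */
static const unsigned char gpr_offsets[16] = {
    GPR_OFF(rax), GPR_OFF(rcx), GPR_OFF(rdx), GPR_OFF(rbx),
    GPR_OFF(rsp), GPR_OFF(rbp), GPR_OFF(rsi), GPR_OFF(rdi),
    GPR_OFF(r8),  GPR_OFF(r9),  GPR_OFF(r10), GPR_OFF(r11),
    GPR_OFF(r12), GPR_OFF(r13), GPR_OFF(r14), GPR_OFF(r15),
};

static unsigned long *decode_gpr(struct cpu_regs_model *regs,
                                 unsigned int modrm)
{
    /* Mask to 0-15 so an out-of-range value cannot index past the table. */
    return (unsigned long *)((char *)regs + gpr_offsets[modrm & 15]);
}
```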

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Jan 2018 15:39:55 +0000 (15:39 +0000)]
x86/emul: Introduce a test covering legacy byte ops

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: move declaration of the exception_table to C
Roger Pau Monné [Wed, 31 Jan 2018 11:36:38 +0000 (12:36 +0100)]
x86: move declaration of the exception_table to C

This makes the code cleaner because there's no need to declare the
exception_table in assembly, and also fixes the following error when
using clang's integrated assembler:

entry.S:834:15: error: unexpected token in '.rept' directive
        .rept 32 - ((. - exception_table) / 8)
              ^
entry.S:836:14: error: unmatched '.endr' directive
        .endr
             ^

This should be a non-functional change.
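A sketch of how a 32-entry handler table, padded with a default handler, can be expressed in C with designated initializers instead of `.rept` location-counter arithmetic. The handler names, vector assignments and `dispatch()` helper are hypothetical and illustrative only; the `[a ... b]` range designator is a GCC/clang extension.

```c
#include <assert.h>

#define TRAP_nr 32

/* Hypothetical, cut-down register frame and per-handler hit counters. */
struct cpu_user_regs { unsigned int entry_vector; };

static unsigned int trap_hits, page_fault_hits, reserved_hits;

static void do_trap(struct cpu_user_regs *regs)          { (void)regs; trap_hits++; }
static void do_page_fault(struct cpu_user_regs *regs)    { (void)regs; page_fault_hits++; }
static void do_reserved_trap(struct cpu_user_regs *regs) { (void)regs; reserved_hits++; }

/*
 * Every slot after the last explicit vector, up to TRAP_nr - 1, gets
 * do_reserved_trap: the job the .rept padding did in assembly.
 * Illustrative assignments only, not Xen's real table.
 */
void (*const exception_table[TRAP_nr])(struct cpu_user_regs *regs) = {
    [0 ... 13]           = do_trap,
    [14]                 = do_page_fault,   /* #PF */
    [15 ... 19]          = do_trap,
    [20 ... TRAP_nr - 1] = do_reserved_trap,
};

/* Index the table by vector and call the handler. */
static void dispatch(struct cpu_user_regs *regs)
{
    exception_table[regs->entry_vector](regs);
}
```

Because the compiler sizes and fills the array, there is nothing left for an assembler to compute, which sidesteps the clang integrated-assembler errors quoted above.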

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: re-organize toggle_guest_*()
Jan Beulich [Wed, 31 Jan 2018 11:35:52 +0000 (12:35 +0100)]
x86: re-organize toggle_guest_*()

toggle_guest_mode() is only ever called for 64-bit PV vCPU-s -
replace the 32-bit PV conditional by an ASSERT().

Introduce a local helper without 32-bit PV conditional, to be used by
both pre-existing functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxenmem_add_to_physmap_one() has no need to know of XENMAPSPACE_gmfn_range
Jan Beulich [Wed, 31 Jan 2018 11:34:08 +0000 (12:34 +0100)]
xenmem_add_to_physmap_one() has no need to know of XENMAPSPACE_gmfn_range

As its name says, it handles a single GMFN only anyway. Note that ARM
needs no adjustment, as it doesn't handle the two types at all.

Also take the opportunity and clean up the handling of XENMAPSPACE_gmfn
a little: There's no point in going through "idx" when capturing the MFN.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/cmdline: Fix parse_boolean() for unadorned values
Andrew Cooper [Wed, 31 Jan 2018 10:35:52 +0000 (10:35 +0000)]
xen/cmdline: Fix parse_boolean() for unadorned values

A command line such as "cpuid=no-ibrsb,no-stibp" tickles a bug in
parse_boolean() because the separating comma fails the NUL case.

Instead, check for slen == nlen which accounts for the boundary (if any)
passed via the 'e' parameter.
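A simplified sketch of the fixed length check. The argument order (name, start, optional end) follows the description above, but `parse_bool_value()` is a hypothetical stand-in that only accepts a few spellings; this is not Xen's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified value parser: accepts "1"/"yes"/"0"/"no" only. */
static int parse_bool_value(const char *s, const char *e)
{
    size_t len = e ? (size_t)(e - s) : strlen(s);

    if ( (len == 1 && *s == '1') || (len == 3 && !strncmp(s, "yes", 3)) )
        return 1;
    if ( (len == 1 && *s == '0') || (len == 2 && !strncmp(s, "no", 2)) )
        return 0;
    return -1;
}

/*
 * Returns 1/0 if [s, e) is the boolean option 'name', else -1.  'e' marks
 * the end of this option (e.g. the next ','), or NULL for "up to NUL".
 */
static int parse_boolean(const char *name, const char *s, const char *e)
{
    bool neg = !strncmp(s, "no-", 3);
    size_t slen, nlen = strlen(name);

    if ( neg )
        s += 3;

    slen = e ? (size_t)(e - s) : strlen(s);

    if ( slen < nlen || strncmp(s, name, nlen) )
        return -1;

    /* The fix: compare lengths rather than testing s[nlen] == '\0', which
     * fails when the option is terminated by a ',' instead of a NUL. */
    if ( slen == nlen )
        return !neg;

    /* "no-foo=..." is invalid; otherwise expect "foo=value". */
    if ( neg || s[nlen] != '=' )
        return -1;

    return parse_bool_value(s + nlen + 1, e);
}
```

For "cpuid=no-ibrsb,no-stibp", the first call sees `s` pointing at "ibrsb,no-stibp" with `e` at the comma: `slen` and `nlen` are both 5, so the option is recognised, whereas the old NUL test would have found ','.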

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoARM: GICv3: copy Dom0 GICv3 reg property from host DT
Andre Przywara [Tue, 30 Jan 2018 09:35:05 +0000 (09:35 +0000)]
ARM: GICv3: copy Dom0 GICv3 reg property from host DT

At the moment we re-generate the Dom0 GICv3 DT node, by creating the
"reg" property from scratch using our previously parsed and
translated(!) host addresses. However we then write the *absolute*
addresses into the new node, not considering possible "range" mappings
in any of the GIC's parent nodes. So whenever one of the parents has a
non-empty ranges property, Dom0 will wrongly translate the addresses.
Properly incorporating the ranges properties sounds tedious, so let's
just copy the first part of the reg property instead (as we do for GICv2),
since the addresses for Dom0 are identical to those from the hardware.

The mainline kernel DT for the Espressobin board with a Marvell 3720 SoC
has the GIC in such a translated bus, so this patch allows this board
to boot properly (after adding support for the SoC's UART).

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: GICv3: Only initialize ITS when the distributor supports LPIs.
Julien Grall [Wed, 24 Jan 2018 18:26:16 +0000 (18:26 +0000)]
xen/arm: GICv3: Only initialize ITS when the distributor supports LPIs.

Some firmware tables describe the ITS even though the hardware does not
support LPIs. This results in a data abort when trying to initialize the
ITS.

While this can be considered a bug in the Device Tree, the same
configuration boots on Linux. So gate the ITS initialization on LPI
support in the distributor.
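A minimal sketch of the gating check. GICD_TYPER.LPIS is bit 17 of GICD_TYPER in the GICv3 architecture; the function names and the `its_init_calls` counter are hypothetical and stand in for Xen's real initialization path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICD_TYPER.LPIS: bit 17 of GICD_TYPER in the GICv3 architecture. */
#define GICD_TYPER_LPIS (UINT32_C(1) << 17)

/* Hypothetical counter standing in for the real ITS bring-up. */
static unsigned int its_init_calls;

static int hypothetical_its_init(void)
{
    its_init_calls++;
    return 0;
}

/*
 * Firmware may describe an ITS that the distributor cannot use, so only
 * initialize it when GICD_TYPER advertises LPI support.
 */
static int maybe_init_its(uint32_t gicd_typer, bool fw_describes_its)
{
    if ( !fw_describes_its || !(gicd_typer & GICD_TYPER_LPIS) )
        return 0;   /* carry on without the ITS, as Linux does */

    return hypothetical_its_init();
}
```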

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: GICv3: Parse ITS information from the firmware tables later on
Julien Grall [Wed, 24 Jan 2018 18:26:15 +0000 (18:26 +0000)]
xen/arm: GICv3: Parse ITS information from the firmware tables later on

There are Device Trees (e.g. for the Foundation Model) that describe the
ITS even though LPIs are not supported by the platform. Booting with such
a DT results in an early Data Abort. The same DT boots fine with
baremetal Linux, because the ITS is only initialized when LPIs are
supported.

While this is a bug in the DT, I think Xen should boot with the same
level of hardware support (i.e. the ITS will not be used) as baremetal
Linux.

The slight problem is that Xen relies on gicv3_its_host_has_its() to know
whether the ITS can be used. The list is populated by
gicv3_its_{dt,acpi}_init(). In theory those could be gated with a check
of GICD_TYPER.LPIS, but at that point we don't yet know whether the HW is
an actual GICv3/GICv4.

Looking at the callers of gicv3_its_host_has_its(), they are only called
after gicv3_its_init(). Therefore move the parsing of ITS information
from the firmware tables later on.

Note that gicv3_its_init() has been moved to the end of the file to
avoid forward declaration.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/hvm: Drop hvm_set_mode() and associated vmx hooks
Andrew Cooper [Sat, 27 Jan 2018 21:09:10 +0000 (21:09 +0000)]
x86/hvm: Drop hvm_set_mode() and associated vmx hooks

These are more vestigial remnants of PVHv1.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years agoxen/evtchn: Cleanup for virq_is_global() infrastructure
Andrew Cooper [Sun, 21 Jan 2018 17:21:05 +0000 (17:21 +0000)]
xen/evtchn: Cleanup for virq_is_global() infrastructure

Switch it, and the arch infrastructure, to return bool.  Drop the unnecessary
rc variable, and remove a redundant assertion from send_global_virq().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/asm: Drop __GET_CURRENT()
Andrew Cooper [Mon, 29 Jan 2018 18:01:35 +0000 (18:01 +0000)]
x86/asm: Drop __GET_CURRENT()

__GET_CURRENT() is dangerous to use, as it is easy to confuse with GET_CURRENT(),
but strictly depends on the register parameter already having the STACK_END
value in it.  Also, there is no reason to special case accesses of
current_vcpu differently to other cpuinfo fields.

Expand __GET_CURRENT() in its current users, and remove the macro.

Take the opportunity to replace the GET_CURRENT() in the cstar path which
doesn't need to recalculate STACK_END.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agotools/libxl: Fix assertion failure when trying to build a nested-virt PVH domain
Andrew Cooper [Fri, 26 Jan 2018 19:03:12 +0000 (19:03 +0000)]
tools/libxl: Fix assertion failure when trying to build a nested-virt PVH domain

xl: libxl.c:339: libxl_defbool_val: Assertion `!libxl_defbool_is_default(db)' failed.

This happens because initiate_domain_create() checks for type != HVM, then
pokes at the hvm union.  Check for == HVM instead so the union access is
correctly guarded.
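A cut-down model of this bug class: reading a union member is only meaningful when the discriminant says that member is active, so the guard has to test for exactly that type. The defbool model, enum and struct here are hypothetical simplifications of libxl's types, not libxl's real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state flag: 0 = still default, >0 = true, <0 = false. */
typedef struct { int val; } defbool;

static bool defbool_val(defbool db)
{
    assert(db.val != 0);   /* mirrors the assertion quoted above */
    return db.val > 0;
}

enum domain_type { DOMAIN_TYPE_HVM, DOMAIN_TYPE_PV, DOMAIN_TYPE_PVH };

struct build_info {
    enum domain_type type;
    union {
        struct { defbool nested_hvm; } hvm;   /* only filled in for HVM */
        struct { unsigned long slack; } pv;
    } u;
};

/* Guarding on == HVM ensures u.hvm is the active, initialised member. */
static bool nested_hvm_requested(const struct build_info *b)
{
    return b->type == DOMAIN_TYPE_HVM && defbool_val(b->u.hvm.nested_hvm);
}
```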

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agoxen: Fix XSM build after dropping XEN_DOMCTL_getmemlist
Andrew Cooper [Fri, 26 Jan 2018 19:33:40 +0000 (19:33 +0000)]
xen: Fix XSM build after dropping XEN_DOMCTL_getmemlist

c/s 94450e36bfbb removed XEN_DOMCTL_getmemlist entirely, but missed adjusting
the XSM side of things.  As far as I can tell, 'pagelist' wasn't even offered
to dom0 in the default policy.

Also, drop the stale struct xen_domctl_getmemlist which was missed from the
same changeset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years agox86/boot: turn the selftests ASSERT into a warning
Stefano Stabellini [Fri, 26 Jan 2018 17:19:31 +0000 (09:19 -0800)]
x86/boot: turn the selftests ASSERT into a warning

On selftests failure, print a very visible warning instead of crashing
over an ASSERT.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Adjust to print extra information in the case of a failure

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/boot: Make the "Building Dom0" messages consistent
Andrew Cooper [Fri, 26 Jan 2018 15:59:51 +0000 (15:59 +0000)]
x86/boot: Make the "Building Dom0" messages consistent

Switch the PV message to match the wording of the PVH side, use the same
number of ***'s, explicitly identify PV vs PVH, set the log level at INFO, and
print the real domid (which won't be 0 in pv-shim mode).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen: Drop DOMCTL_getmemlist and xc_get_pfn_list()
Andrew Cooper [Mon, 15 Jan 2018 10:00:51 +0000 (10:00 +0000)]
xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

c/s 4ddf474e2 "tools/xen-mceinj: Pass in GPA when injecting through
MSR_MCI_ADDR" removed the remaining user of the hypercall.

It has been listed as broken, deprecated and won't-fix since XSA-74, so take
this opportunity to remove it completely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/ocaml: Drop coredump infrastructure
Andrew Cooper [Fri, 19 Jan 2018 18:04:27 +0000 (18:04 +0000)]
tools/ocaml: Drop coredump infrastructure

It is unused, and uses an obsolete hypercall which has never ever functioned
for HVM guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
7 years agoxen/pvshim: fix GNTTABOP_query_size hypercall forwarding with SMAP
Roger Pau Monne [Fri, 26 Jan 2018 15:29:10 +0000 (15:29 +0000)]
xen/pvshim: fix GNTTABOP_query_size hypercall forwarding with SMAP

Disable SMAP in the shim before bouncing the hypercall, or else L0
will fail to get the hypercall buffer.

Reported-by: Fatih Acar <fatih.acar@gandi.net>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/idle: Clear SPEC_CTRL while idle
Andrew Cooper [Fri, 3 Nov 2017 16:43:02 +0000 (16:43 +0000)]
x86/idle: Clear SPEC_CTRL while idle

On contemporary hardware, setting IBRS/STIBP has a performance impact on
adjacent hyperthreads.  It is therefore recommended to clear the setting
before becoming idle, to avoid an idle core preventing adjacent userspace
execution from running at full performance.

Care must be taken to ensure there are no ret or indirect branch instructions
between spec_ctrl_{enter,exit}_idle() invocations, which are forced always
inline.  Care must also be taken to avoid using spec_ctrl_enter_idle() between
flushing caches and becoming idle, in cases where that matters.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/cpuid: Offer Indirect Branch Controls to guests
Andrew Cooper [Mon, 13 Nov 2017 15:41:38 +0000 (15:41 +0000)]
x86/cpuid: Offer Indirect Branch Controls to guests

With all infrastructure in place, it is now safe to let guests see and use
these features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/ctxt: Issue a speculation barrier between vcpu contexts
Andrew Cooper [Thu, 16 Nov 2017 18:35:11 +0000 (18:35 +0000)]
x86/ctxt: Issue a speculation barrier between vcpu contexts

Issuing an IBPB command flushes the Branch Target Buffer, so that any poison
left by one vcpu won't remain when beginning to execute the next.

The cost of IBPB is substantial, and it is skipped on transition to idle, as
Xen's idle code is robust already.  All transitions into vcpu context are fully
serialising in practice (and under consideration for being retroactively
declared architecturally serialising), so a cunning attacker cannot use SP1 to
try and skip the flush.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/boot: Calculate the most appropriate BTI mitigation to use
Andrew Cooper [Tue, 19 Dec 2017 13:59:21 +0000 (13:59 +0000)]
x86/boot: Calculate the most appropriate BTI mitigation to use

See the logic and comments in init_speculation_mitigations() for further
details.

There are two controls for RSB overwriting, because in principle there are
cases where it might be safe to forego rsb_native (Off the top of my head,
SMEP active, no 32bit PV guests at all, no use of vmevent/paging subsystems
for HVM guests, but I make no guarantees that this list of restrictions is
exhaustive).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/entry: Avoid using alternatives in NMI/#MC paths
Andrew Cooper [Mon, 22 Jan 2018 14:41:33 +0000 (14:41 +0000)]
x86/entry: Avoid using alternatives in NMI/#MC paths

This patch is deliberately arranged to be easy to revert if/when alternatives
patching becomes NMI/#MC safe.

For safety, there must be a dispatch serialising instruction in (what is
logically) DO_SPEC_CTRL_ENTRY so that, in the case that Xen needs IBRS set in
context, an attacker can't speculate around the WRMSR and reach an indirect
branch within the speculation window.

Using conditionals opens this attack vector up, so the else clause gets an
LFENCE to force the pipeline to catch up before continuing.  This also covers
the safety of the RSB conditional, as execution is guaranteed to hit either
the WRMSR or the LFENCE.

One downside of not using alternatives is that there is unconditionally an
LFENCE in the IST path in cases where we are not using the features from
IBRS-capable microcode.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/entry: Organise the clobbering of the RSB/RAS on entry to Xen
Andrew Cooper [Fri, 3 Nov 2017 16:39:42 +0000 (16:39 +0000)]
x86/entry: Organise the clobbering of the RSB/RAS on entry to Xen

ret instructions are speculated directly to values recorded in the Return
Stack Buffer/Return Address Stack, as there is no uncertainty in well-formed
code.  Guests can take advantage of this in two ways:

  1) If they can find a path in Xen which executes more ret instructions than
     call instructions.  (At least one in the waitqueue infrastructure,
     probably others.)

  2) Use the fact that the RSB/RAS in hardware is actually a circular stack
     without a concept of empty.  (When it logically empties, stale values
     will start being used.)

To mitigate, overwrite the RSB on entry to Xen with gadgets which will capture
and contain rogue speculation.
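A toy C model of the two points above: the RSB behaves as a fixed-size circular stack with no notion of empty, so an unbalanced ret falls through to whatever a guest left behind, and overwriting every entry on entry to Xen replaces those values with a benign target. The 16-entry depth, the addresses and all names are illustrative assumptions; real RSB depth is model-specific, and the real mitigation is a run of call instructions in assembly.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_ENTRIES 16   /* illustrative depth only */

/* Circular buffer of predicted return addresses; never "empty". */
struct rsb {
    uintptr_t slot[RSB_ENTRIES];
    unsigned int top;   /* index of the next slot to write */
};

/* A call records its return address. */
static void rsb_call(struct rsb *r, uintptr_t ret_addr)
{
    r->slot[r->top] = ret_addr;
    r->top = (r->top + 1) % RSB_ENTRIES;
}

/* A ret is speculated to whatever the previous slot holds. */
static uintptr_t rsb_ret(struct rsb *r)
{
    r->top = (r->top + RSB_ENTRIES - 1) % RSB_ENTRIES;
    return r->slot[r->top];
}

/* Mitigation model: overwrite every entry with a benign capture target. */
static void rsb_overwrite(struct rsb *r, uintptr_t gadget)
{
    for ( unsigned int i = 0; i < RSB_ENTRIES; i++ )
        rsb_call(r, gadget);
}
```

Without the overwrite, a second ret after a single call is speculated to a guest-chosen value; with it, the same ret lands on the capture target.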

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
Andrew Cooper [Fri, 3 Nov 2017 16:17:00 +0000 (16:17 +0000)]
x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

We need to be able to either set or clear IBRS in Xen context, as well as
restore appropriate guest values in guest context.  See the documentation in
asm-x86/spec_ctrl_asm.h for details.

With the contemporary microcode, writes to %cr3 are slower when SPEC_CTRL.IBRS
is set.  Therefore, the positioning of SPEC_CTRL_{ENTRY/EXIT}* is important.

Ideally, the IBRS_SET/IBRS_CLEAR hunks might be positioned either side of the
%cr3 change, but that is rather more complicated to arrange, and could still
result in a guest controlled value in SPEC_CTRL during the %cr3 change,
negating the saving if the guest chose to have IBRS set.

Therefore, we optimise for the pre-Skylake case (being far more common in the
field than Skylake and later, at the moment), where we have a Xen-preferred
value of IBRS clear when switching %cr3.

There is a semi-unrelated bugfix, where various asm_defn.h macros have a
hidden dependency on PAGE_SIZE, which results in an assembler error if used in
a .macro definition.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}
Andrew Cooper [Tue, 14 Nov 2017 19:22:28 +0000 (19:22 +0000)]
x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}

For performance reasons, HVM guests should have direct access to these MSRs
when possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/migrate: Move MSR_SPEC_CTRL on migrate
Andrew Cooper [Thu, 16 Nov 2017 18:40:27 +0000 (18:40 +0000)]
x86/migrate: Move MSR_SPEC_CTRL on migrate

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests
Andrew Cooper [Thu, 9 Nov 2017 19:28:04 +0000 (19:28 +0000)]
x86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests

As per the spec currently available here:

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

MSR_ARCH_CAPABILITIES will only come into existence on new hardware, but is
implemented as a straight #GP for now to avoid being leaky when new hardware
arrives.
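A sketch of what guest-facing emulation of these MSRs could look like, assuming the bit layout from the Intel document referenced above (SPEC_CTRL bit 0 = IBRS, bit 1 = STIBP; PRED_CMD bit 0 = IBPB, write-only). The per-vCPU struct, result codes and function names are hypothetical; Xen's real handlers differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_SPEC_CTRL          0x00000048
#define SPEC_CTRL_IBRS         (1ULL << 0)
#define SPEC_CTRL_STIBP        (1ULL << 1)
#define MSR_PRED_CMD           0x00000049
#define PRED_CMD_IBPB          (1ULL << 0)
#define MSR_ARCH_CAPABILITIES  0x0000010a

/* Hypothetical result codes and per-vCPU state for this sketch. */
enum { MSR_OK, MSR_GP_FAULT };

struct vcpu_msrs {
    bool has_ibrsb;           /* guest sees Intel's combined IBRS/IBPB bit */
    bool has_ibpb;            /* guest sees AMD's IBPB-only bit */
    uint64_t spec_ctrl;       /* guest's view of MSR_SPEC_CTRL */
    unsigned int ibpb_count;  /* barriers requested, for illustration */
};

static int emulate_rdmsr(const struct vcpu_msrs *v, uint32_t msr, uint64_t *val)
{
    switch ( msr )
    {
    case MSR_SPEC_CTRL:
        if ( !v->has_ibrsb )
            return MSR_GP_FAULT;
        *val = v->spec_ctrl;
        return MSR_OK;

    case MSR_PRED_CMD:          /* write-only command MSR */
    case MSR_ARCH_CAPABILITIES: /* not offered yet: straight #GP */
    default:
        return MSR_GP_FAULT;
    }
}

static int emulate_wrmsr(struct vcpu_msrs *v, uint32_t msr, uint64_t val)
{
    switch ( msr )
    {
    case MSR_SPEC_CTRL:
        /* Only IBRS and STIBP are defined; any other bit faults. */
        if ( !v->has_ibrsb || (val & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) )
            return MSR_GP_FAULT;
        v->spec_ctrl = val;
        return MSR_OK;

    case MSR_PRED_CMD:
        if ( !(v->has_ibrsb || v->has_ibpb) || (val & ~PRED_CMD_IBPB) )
            return MSR_GP_FAULT;
        if ( val & PRED_CMD_IBPB )
            v->ibpb_count++;    /* real code would issue the barrier here */
        return MSR_OK;

    default:                    /* includes MSR_ARCH_CAPABILITIES */
        return MSR_GP_FAULT;
    }
}
```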

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
Andrew Cooper [Tue, 16 Jan 2018 15:45:51 +0000 (15:45 +0000)]
x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests

Intel specifies IBRS/IBPB (combined, in a single bit) and STIBP as a separate
bit.  AMD specifies IBPB alone in a 3rd bit.

AMD's IBPB is a subset of Intel's combined IBRS/IBPB.  For performance
reasons, administrators might wish to express "IBPB only" even on Intel
hardware, so we allow the AMD bit to be used for this purpose.

The behaviour of STIBP is more complicated.

It is our current understanding that STIBP will be advertised on HT-capable
hardware irrespective of whether HT is enabled, but not advertised on
HT-incapable hardware.  However, for ease of virtualisation, STIBP's
functionality is ignored rather than reserved by microcode/hardware on
HT-incapable hardware.

For guest safety, we treat STIBP as special, always override the toolstack
choice, and always advertise STIBP if IBRS is available.  This removes the
corner case where STIBP is not advertised, but the guest is running on
HT-capable hardware where it does matter.
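A sketch of the policy rules described above, using a hypothetical three-flag feature set rather than Xen's real featureset bitmaps: the AMD IBPB bit may be offered whenever IBRS/IBPB is present in hardware, and STIBP ignores the toolstack and simply follows IBRS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down feature set; names follow the text above. */
struct featureset {
    bool ibrsb;   /* Intel: IBRS and IBPB combined in one bit */
    bool stibp;   /* Intel: separate STIBP bit */
    bool ibpb;    /* AMD: IBPB alone, in a third bit */
};

/* Start from the toolstack's request clipped to hardware, then override. */
static struct featureset guest_policy(struct featureset hw,
                                      struct featureset toolstack)
{
    struct featureset g = {
        .ibrsb = hw.ibrsb && toolstack.ibrsb,
        /* AMD's IBPB is a subset of IBRS/IBPB, so it may be offered on
         * Intel hardware to express "IBPB only". */
        .ibpb  = (hw.ibpb || hw.ibrsb) && toolstack.ibpb,
    };

    /* STIBP is special: ignore the toolstack and follow IBRS. */
    g.stibp = g.ibrsb;
    return g;
}
```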

Finally as a bugfix, update the libxc CPUID logic to understand the e8b
feature leaf, which has the side effect of also offering CLZERO to guests on
applicable hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/build: Untangle CONFIG_DEBUG and CONFIG_FRAME_POINTER
Andrew Cooper [Thu, 25 Jan 2018 18:38:17 +0000 (18:38 +0000)]
xen/build: Untangle CONFIG_DEBUG and CONFIG_FRAME_POINTER

Both options are independently choosable in Kconfig, but currently a DEBUG
build without FRAME_POINTER is left to the compiler's default choice, not the
user's choice.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/p2m: force return value checking of p2m_set_entry()
Jan Beulich [Fri, 26 Jan 2018 12:26:57 +0000 (13:26 +0100)]
x86/p2m: force return value checking of p2m_set_entry()

As XSAs 246 and 247 have shown, not doing so is rather dangerous.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen: Fix xsm build after [g]cov renaming
Wei Liu [Thu, 25 Jan 2018 13:14:24 +0000 (13:14 +0000)]
xen: Fix xsm build after [g]cov renaming

Commit e8d461497d9 renamed gcov_op to coverage_op but forgot to change
XSM handles.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agoxl: Don't warn on using 'deprecated' mode selection
George Dunlap [Mon, 8 Jan 2018 15:50:53 +0000 (15:50 +0000)]
xl: Don't warn on using 'deprecated' mode selection

We generally support old config formats indefinitely (see the disk
format) without emitting warnings.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl: move libxl_devid_to_device_... to LIBXL_DEFINE_DEVID_TO_DEVICE
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:59 +0000 (19:19 +0200)]
libxl: move libxl_devid_to_device_... to LIBXL_DEFINE_DEVID_TO_DEVICE

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: move libxl__device_from_ to LIBXL_DEFINE_DEVICE_FROM_TYPE
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:58 +0000 (19:19 +0200)]
libxl: move libxl__device_from_ to LIBXL_DEFINE_DEVICE_FROM_TYPE

LIBXL_DEFINE_DEVICE_FROM_TYPE uses libxl__..._devtype.type as the
device and backend type.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: use libxl__device_kind in LIBXL_DEFINE_UPDATE_DEVID
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:57 +0000 (19:19 +0200)]
libxl: use libxl__device_kind in LIBXL_DEFINE_UPDATE_DEVID

Use libxl__..._devtype.type to update device id.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: use libxl__device_kind to get device XS entry
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:56 +0000 (19:19 +0200)]
libxl: use libxl__device_kind to get device XS entry

When a device is added to XS, its name is taken from the
libxl__device_kind enum, but when a device is read back from XS
the name is hardcoded. This can lead to mismatch errors. Use
libxl__device_kind everywhere so that there is a single source
for the device name.
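A sketch of the single-source idea: one table maps each device kind to its XenStore directory name, and the same helper builds the path for both writing and reading. The enum values, table and `device_xs_path()` helper are hypothetical; only the frontend path layout `/local/domain/<domid>/device/<kind>/<devid>` follows XenStore convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of device kinds for illustration. */
enum device_kind { DEVICE_KIND_VIF, DEVICE_KIND_VBD, DEVICE_KIND_VKBD,
                   DEVICE_KIND_MAX };

/* Single source of truth for each kind's XenStore directory name. */
static const char *const device_kind_names[DEVICE_KIND_MAX] = {
    [DEVICE_KIND_VIF]  = "vif",
    [DEVICE_KIND_VBD]  = "vbd",
    [DEVICE_KIND_VKBD] = "vkbd",
};

/*
 * Build the frontend path; used both when adding and when looking up a
 * device, so the two sides cannot disagree.  Returns -1 if it won't fit.
 */
static int device_xs_path(char *buf, size_t len, unsigned int domid,
                          enum device_kind kind, int devid)
{
    int n = snprintf(buf, len, "/local/domain/%u/device/%s/%d",
                     domid, device_kind_names[kind], devid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```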

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: fix GET_STACK_END
Wei Liu [Wed, 24 Jan 2018 20:26:26 +0000 (20:26 +0000)]
x86: fix GET_STACK_END

AIUI the purpose of having the .if directive is to make GET_STACK_END
work with any general purpose register. The code as-is would produce
the wrong result for r8. Fix it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agocoverage: introduce generic file
Roger Pau Monné [Thu, 25 Jan 2018 11:30:01 +0000 (12:30 +0100)]
coverage: introduce generic file

It will contain the generic implementation of sysctl_cov_op, which
will be shared between all the coverage implementations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agogcov: introduce hooks for the sysctl
Roger Pau Monné [Thu, 25 Jan 2018 11:28:47 +0000 (12:28 +0100)]
gcov: introduce hooks for the sysctl

So that other implementations of the sysctl can be added.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agogcov: rename sysctl and functions
Roger Pau Monné [Thu, 25 Jan 2018 11:27:44 +0000 (12:27 +0100)]
gcov: rename sysctl and functions

Change gcov to cov (for internal interfaces) or coverage (for the
public ones).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/domctl: remove XEN_DOMCTL_pin_mem_cacheattr
Ross Lagerwall [Thu, 25 Jan 2018 11:26:55 +0000 (12:26 +0100)]
x86/domctl: remove XEN_DOMCTL_pin_mem_cacheattr

Remove the implementation of XEN_DOMCTL_pin_mem_cacheattr since it has
been replaced by a dmop. Change xc_domain_pin_memory_cacheattr() so
that it is only defined when XC_WANT_COMPAT_DEVICEMODEL_API is set and
have it call the new dmop.  Leave the definitions of
XEN_DOMCTL_MEM_CACHEATTR_* since they are still used by QEMU.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agolibxendevicemodel: provide xendevicemodel_pin_memory_cacheattr
Ross Lagerwall [Thu, 25 Jan 2018 11:26:36 +0000 (12:26 +0100)]
libxendevicemodel: provide xendevicemodel_pin_memory_cacheattr

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agolibxendevicemodel: provide xendevicemodel_relocate_memory
Ross Lagerwall [Thu, 25 Jan 2018 11:26:23 +0000 (12:26 +0100)]
libxendevicemodel: provide xendevicemodel_relocate_memory

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agox86/hvm: provide XEN_DMOP_pin_memory_cacheattr
Ross Lagerwall [Thu, 25 Jan 2018 11:25:22 +0000 (12:25 +0100)]
x86/hvm: provide XEN_DMOP_pin_memory_cacheattr

Provide XEN_DMOP_pin_memory_cacheattr to allow a deprivileged QEMU to
pin the caching type of RAM after moving the VRAM. It is equivalent to
XEN_DOMCTL_pin_mem_cacheattr.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/hvm: provide XEN_DMOP_relocate_memory
Ross Lagerwall [Thu, 25 Jan 2018 11:24:14 +0000 (12:24 +0100)]
x86/hvm: provide XEN_DMOP_relocate_memory

Provide XEN_DMOP_relocate_memory, a limited version of
XENMEM_add_to_physmap to allow a deprivileged QEMU to move VRAM when a
guest programs its BAR. It is equivalent to XENMEM_add_to_physmap with
space == XENMAPSPACE_gmfn_range.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agomm: make xenmem_add_to_physmap global
Ross Lagerwall [Thu, 25 Jan 2018 11:23:35 +0000 (12:23 +0100)]
mm: make xenmem_add_to_physmap global

Make it global in preparation to be called by a new dmop.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen/VT-d: Remove the use of __LINE__ from IOMMU_WAIT_OP()
Andrew Cooper [Wed, 24 Jan 2018 14:11:16 +0000 (14:11 +0000)]
xen/VT-d: Remove the use of __LINE__ from IOMMU_WAIT_OP()

The use of __LINE__ in printk()'s is problematic for livepatching, as it tends
to cause unnecessary binary differences.

Take this opportunity to provide some rather more useful information than just
file/line/func in the form of the full register/stack trace leading to the
problem (which I've needed in the past for debugging).

Also, drop the unnecessary else clause while editing here.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>