xenbits.xensource.com Git - people/dariof/xen.git/log
7 years ago  x86emul: support FMA4 insns
Jan Beulich [Thu, 1 Feb 2018 10:31:55 +0000 (11:31 +0100)]
x86emul: support FMA4 insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86emul: support F16C insns
Jan Beulich [Thu, 1 Feb 2018 10:29:39 +0000 (11:29 +0100)]
x86emul: support F16C insns

Note that this avoids emulating the behavior of VCVTPS2PH found on at
least some Intel CPUs, which update MXCSR even when the memory write
faults.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/emul: Improvements to internal users of decode_register()
Andrew Cooper [Thu, 25 Jan 2018 12:16:12 +0000 (12:16 +0000)]
x86/emul: Improvements to internal users of decode_register()

Most users of decode_register() can be replaced with decode_gpr() right away.

For the few sites which do care about possibly using the legacy byteop
encoding, rename decode_register() to _decode_gpr() (to match its non-legacy
counterpart), and adjust its 'int highbyte_regs' parameter to the more correct
'bool legacy'.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/hvm: Improvements to external users of decode_register()
Andrew Cooper [Thu, 25 Jan 2018 12:16:12 +0000 (12:16 +0000)]
x86/hvm: Improvements to external users of decode_register()

 * Rename to decode_gpr() to be more specific as to its purpose
 * Drop the highbyte encoding handling, as no users currently care, and it
   is unlikely that future users would care.
 * Change to a static inline, returning an unsigned long pointer.

Doing so highlights that the "invalid gpr" paths in hvm_mov_{to,from}_cr()
were actually unreachable.  All callers already passed in-range GPRs, and
out-of-range GPRs would have hit the BUG() previously.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/emul: Optimise decode_register() somewhat
Andrew Cooper [Thu, 25 Jan 2018 12:16:12 +0000 (12:16 +0000)]
x86/emul: Optimise decode_register() somewhat

The positions of GPRs inside struct cpu_user_regs don't follow any
particular order, so as compiled, decode_register() becomes a jump table to 16
blocks which calculate the appropriate offset, at a total of 207 bytes.

Instead, pre-compute the offsets at build time and use pointer arithmetic to
calculate the result.  By observation, most callers in x86_emulate() inline
and constant-propagate the highbyte_regs value of 0.

The splitting of the general and legacy byte-op cases means that we will now
hit an ASSERT if any code path tries to use the legacy byte-op encoding with a
REX prefix.
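
As a rough illustration of the table-based lookup described above (this is
not the code from the patch): a minimal sketch, assuming a hypothetical
struct demo_regs whose field order is deliberately unrelated to the x86
register encoding.  All demo_* names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; field order does not follow the encoding. */
struct demo_regs {
    unsigned long r15, r14, r13, r12, rbp, rbx, r11, r10,
                  r9, r8, rax, rcx, rdx, rsi, rdi, rsp;
};

/* Byte offsets computed at build time, indexed by x86 register number. */
static const uint8_t demo_gpr_offsets[16] = {
    [0]  = offsetof(struct demo_regs, rax),
    [1]  = offsetof(struct demo_regs, rcx),
    [2]  = offsetof(struct demo_regs, rdx),
    [3]  = offsetof(struct demo_regs, rbx),
    [4]  = offsetof(struct demo_regs, rsp),
    [5]  = offsetof(struct demo_regs, rbp),
    [6]  = offsetof(struct demo_regs, rsi),
    [7]  = offsetof(struct demo_regs, rdi),
    [8]  = offsetof(struct demo_regs, r8),
    [9]  = offsetof(struct demo_regs, r9),
    [10] = offsetof(struct demo_regs, r10),
    [11] = offsetof(struct demo_regs, r11),
    [12] = offsetof(struct demo_regs, r12),
    [13] = offsetof(struct demo_regs, r13),
    [14] = offsetof(struct demo_regs, r14),
    [15] = offsetof(struct demo_regs, r15),
};

/* One table load plus pointer arithmetic replaces a 16-way switch. */
static inline unsigned long *demo_decode_gpr(struct demo_regs *regs,
                                             unsigned int reg)
{
    assert(reg < 16); /* out-of-range register numbers are a caller bug */
    return (unsigned long *)((char *)regs + demo_gpr_offsets[reg]);
}
```

With this layout, demo_decode_gpr(&regs, 3) yields a pointer to regs.rbx
without any branching on the register number.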

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/emul: Introduce a test covering legacy byte ops
Andrew Cooper [Tue, 30 Jan 2018 15:39:55 +0000 (15:39 +0000)]
x86/emul: Introduce a test covering legacy byte ops

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: move declaration of the exception_table to C
Roger Pau Monné [Wed, 31 Jan 2018 11:36:38 +0000 (12:36 +0100)]
x86: move declaration of the exception_table to C

This makes the code cleaner because there's no need to declare the
exception_table in assembly, and also fixes the following error when
using clang's integrated assembler:

entry.S:834:15: error: unexpected token in '.rept' directive
        .rept 32 - ((. - exception_table) / 8)
              ^
entry.S:836:14: error: unmatched '.endr' directive
        .endr
             ^

This should be a non-functional change.
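
For readers unfamiliar with the idiom, a table of this kind can be declared
in C with designated initialisers, so that no assembler .rept padding is
needed.  The following is only a sketch under stated assumptions: the
demo_* handlers, the table size and the vector assignments are made up, and
the range designator is a GNU C extension accepted by both gcc and clang.

```c
#include <assert.h>

#define DEMO_TRAP_NR 32

typedef int (*demo_handler_t)(void);

/* Hypothetical handlers; the return values only identify them. */
static int demo_divide_error(void) { return 0; }
static int demo_debug(void)        { return 1; }
static int demo_reserved(void)     { return -1; }

/* Every slot is filled at build time; unused vectors share one handler. */
static const demo_handler_t demo_exception_table[DEMO_TRAP_NR] = {
    [0] = demo_divide_error,
    [1] = demo_debug,
    [2 ... DEMO_TRAP_NR - 1] = demo_reserved,
};

/* Call the handler registered for a vector. */
static int demo_dispatch(unsigned int vector)
{
    assert(vector < DEMO_TRAP_NR);
    return demo_exception_table[vector]();
}
```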

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: re-organize toggle_guest_*()
Jan Beulich [Wed, 31 Jan 2018 11:35:52 +0000 (12:35 +0100)]
x86: re-organize toggle_guest_*()

toggle_guest_mode() is only ever being called for 64-bit PV vCPU-s -
replace the 32-bit PV conditional by an ASSERT().

Introduce a local helper without 32-bit PV conditional, to be used by
both pre-existing functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  xenmem_add_to_physmap_one() has no need to know of XENMAPSPACE_gmfn_range
Jan Beulich [Wed, 31 Jan 2018 11:34:08 +0000 (12:34 +0100)]
xenmem_add_to_physmap_one() has no need to know of XENMAPSPACE_gmfn_range

As its name says, it handles a single GMFN only anyway. Note that ARM
needs no adjustment, as it doesn't handle the two types at all.

Also take the opportunity and clean up the handling of XENMAPSPACE_gmfn
a little: There's no point in going through "idx" when capturing the MFN.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  xen/cmdline: Fix parse_boolean() for unadorned values
Andrew Cooper [Wed, 31 Jan 2018 10:35:52 +0000 (10:35 +0000)]
xen/cmdline: Fix parse_boolean() for unadorned values

A command line such as "cpuid=no-ibrsb,no-stibp" tickles a bug in
parse_boolean() because the separating comma fails the NUL case.

Instead, check for slen == nlen which accounts for the boundary (if any)
passed via the 'e' parameter.
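
The length comparison can be sketched as follows.  This is a simplified
stand-in, not the function from the patch: demo_parse_boolean() is a
hypothetical name, it handles only the bare "name" and "no-name" forms, and
it returns 1, 0 or -1 (no match) by assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Match [s, e) against "name" or "no-name".  'e' may be NULL, in which
 * case the value runs up to the terminating NUL.
 */
static int demo_parse_boolean(const char *name, const char *s, const char *e)
{
    int val = strncmp(s, "no-", 3) != 0; /* 0 when a "no-" prefix is present */
    size_t slen, nlen = strlen(name);

    if (!val)
        s += 3;

    if (e) {
        assert(e >= s);
        slen = (size_t)(e - s);
    } else {
        slen = strlen(s);
    }

    /* Does the value start with the name at all? */
    if (slen < nlen || strncmp(s, name, nlen) != 0)
        return -1;

    /*
     * Comparing lengths rather than testing s[nlen] == '\0' lets a value
     * delimited by ',' (via 'e') count as an exact match.
     */
    if (slen == nlen)
        return val;

    return -1;
}
```

Called with e pointing at the comma in "no-ibrsb,no-stibp", the sketch
reports ibrsb as disabled even though the character after the name is ','
rather than NUL.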

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  ARM: GICv3: copy Dom0 GICv3 reg property from host DT
Andre Przywara [Tue, 30 Jan 2018 09:35:05 +0000 (09:35 +0000)]
ARM: GICv3: copy Dom0 GICv3 reg property from host DT

At the moment we re-generate the Dom0 GICv3 DT node, by creating the
"reg" property from scratch using our previously parsed and
translated(!) host addresses. However we then write the *absolute*
addresses into the new node, not considering possible "range" mappings
in any of the GIC's parent nodes. So whenever one of the parents has a
non-empty ranges property, Dom0 will wrongly translate the addresses.
Properly incorporating the ranges properties sounds tedious, so let's
just copy the first part of the reg property instead (as we do for GICv2),
since the addresses for Dom0 are identical to those from the hardware.

The mainline kernel DT for the Espressobin board with a Marvell 3720 SoC
has the GIC in such a translated bus, so this patch allows this board
to boot properly (after adding support for the SoC's UART).

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  xen/arm: GICv3: Only initialize ITS when the distributor supports LPIs.
Julien Grall [Wed, 24 Jan 2018 18:26:16 +0000 (18:26 +0000)]
xen/arm: GICv3: Only initialize ITS when the distributor supports LPIs.

There are firmware tables out there which describe the ITS even though the
platform does not support LPIs. This will result in a data abort when trying
to initialize the ITS.

While this can be considered a bug in the Device-Tree, the same configuration
boots on Linux. So gate the ITS initialization on support for LPIs
in the distributor.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  xen/arm: GICv3: Parse ITS information from the firmware tables later on
Julien Grall [Wed, 24 Jan 2018 18:26:15 +0000 (18:26 +0000)]
xen/arm: GICv3: Parse ITS information from the firmware tables later on

There are Device Trees out there (e.g. for the Foundation Model) that describe
the ITS even though LPIs are not supported by the platform. Booting with such
a DT will result in an early Data Abort. The same DT boots fine with
baremetal Linux because the ITS will be initialized only when LPIs are
supported.

While this is a bug in the DT, I think Xen should boot with the same
hardware level support (e.g. the ITS will not be used) as baremetal
Linux.

The slight problem is Xen is relying on gicv3_its_host_has_its() to know
if ITS can be used. The list is populated by gicv3_its_{dt,acpi}_init().
It would be theoretically possible to gate those with a check of
GICD_TYPER.LPIS because we don't know yet whether the HW is an actual
GICv3/GICv4.

Looking at the callers of gicv3_its_host_has_its(), they will only be
done after gicv3_its_init() is called. Therefore move the parsing of ITS
information from firmware tables later on.

Note that gicv3_its_init() has been moved to the end of the file to
avoid a forward declaration.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  x86/hvm: Drop hvm_set_mode() and associated vmx hooks
Andrew Cooper [Sat, 27 Jan 2018 21:09:10 +0000 (21:09 +0000)]
x86/hvm: Drop hvm_set_mode() and associated vmx hooks

These are more vestigial remnants of PVHv1.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years ago  xen/evtchn: Cleanup for virq_is_global() infrastructure
Andrew Cooper [Sun, 21 Jan 2018 17:21:05 +0000 (17:21 +0000)]
xen/evtchn: Cleanup for virq_is_global() infrastructure

Switch it, and the arch infrastructure, to return bool.  Drop the unnecessary
rc variable, and remove a redundant assertion from send_global_virq().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/asm: Drop __GET_CURRENT()
Andrew Cooper [Mon, 29 Jan 2018 18:01:35 +0000 (18:01 +0000)]
x86/asm: Drop __GET_CURRENT()

__GET_CURRENT() is dangerous to use, as it is easy to confuse with GET_CURRENT(),
but strictly depends on the register parameter already having the STACK_END
value in it.  Also, there is no reason to special case accesses of
current_vcpu differently to other cpuinfo fields.

Expand __GET_CURRENT() in its current users, and remove the macro.

Take the opportunity to replace the GET_CURRENT() in the cstar path which
doesn't need to recalculate STACK_END.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  tools/libxl: Fix assertion failure when trying to build a nested-virt PVH domain
Andrew Cooper [Fri, 26 Jan 2018 19:03:12 +0000 (19:03 +0000)]
tools/libxl: Fix assertion failure when trying to build a nested-virt PVH domain

xl: libxl.c:339: libxl_defbool_val: Assertion `!libxl_defbool_is_default(db)' failed.

This happens because initiate_domain_create() checks for type != HVM, then
pokes at the hvm union.  Check for == HVM instead so the union access is
correctly guarded.
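
The general pattern, reading a union member only after confirming the type
tag selects it, can be sketched as below.  The types and the helper are
hypothetical stand-ins and do not reproduce the libxl structures.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_domain_type { DEMO_TYPE_HVM, DEMO_TYPE_PV, DEMO_TYPE_PVH };

/* Hypothetical tagged union: 'type' says which member of 'u' is valid. */
struct demo_build_info {
    enum demo_domain_type type;
    union {
        struct { bool nested_hvm; } hvm;
        struct { unsigned long slack_kb; } pv;
    } u;
};

/*
 * Guard with "== HVM" so the hvm member is only read when it is the
 * active one.  A "!= HVM" guard would read it for exactly the wrong types.
 */
static bool demo_nested_hvm_enabled(const struct demo_build_info *info)
{
    assert(info);
    return info->type == DEMO_TYPE_HVM && info->u.hvm.nested_hvm;
}
```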

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years ago  xen: Fix XSM build after dropping XEN_DOMCTL_getmemlist
Andrew Cooper [Fri, 26 Jan 2018 19:33:40 +0000 (19:33 +0000)]
xen: Fix XSM build after dropping XEN_DOMCTL_getmemlist

c/s 94450e36bfbb removed XEN_DOMCTL_getmemlist entirely, but missed adjusting
the XSM side of things.  As far as I can tell, 'pagelist' wasn't even offered
to dom0 in default policy.

Also, drop the stale struct xen_domctl_getmemlist which was missed from the
same changeset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years ago  x86/boot: turn the selftests ASSERT into a warning
Stefano Stabellini [Fri, 26 Jan 2018 17:19:31 +0000 (09:19 -0800)]
x86/boot: turn the selftests ASSERT into a warning

On selftests failure, print a very visible warning instead of crashing
over an ASSERT.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Adjust to print extra information in the case of a failure

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/boot: Make the "Building Dom0" messages consistent
Andrew Cooper [Fri, 26 Jan 2018 15:59:51 +0000 (15:59 +0000)]
x86/boot: Make the "Building Dom0" messages consistent

Switch the PV message to match the wording of the PVH side, use the same
number of ***'s, explicitly identify PV vs PVH, set the log level at INFO, and
print the real domid (which won't be 0 in pv-shim mode).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()
Andrew Cooper [Mon, 15 Jan 2018 10:00:51 +0000 (10:00 +0000)]
xen: Drop DOMCTL_getmemlist and xc_get_pfn_list()

c/s 4ddf474e2 "tools/xen-mceinj: Pass in GPA when injecting through
MSR_MCI_ADDR" removed the remaining user of the hypercall.

It has been listed as broken, deprecated and wont-fix since XSA-74, so take
this opportunity to remove it completely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  tools/ocaml: Drop coredump infrastructure
Andrew Cooper [Fri, 19 Jan 2018 18:04:27 +0000 (18:04 +0000)]
tools/ocaml: Drop coredump infrastructure

It is unused, and uses an obsolete hypercall which has never ever functioned
for HVM guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
7 years ago  xen/pvshim: fix GNTTABOP_query_size hypercall forwarding with SMAP
Roger Pau Monne [Fri, 26 Jan 2018 15:29:10 +0000 (15:29 +0000)]
xen/pvshim: fix GNTTABOP_query_size hypercall forwarding with SMAP

Disable SMAP in the shim before bouncing the hypercall, or else L0
will fail to get the hypercall buffer.

Reported-by: Fatih Acar <fatih.acar@gandi.net>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86/idle: Clear SPEC_CTRL while idle
Andrew Cooper [Fri, 3 Nov 2017 16:43:02 +0000 (16:43 +0000)]
x86/idle: Clear SPEC_CTRL while idle

On contemporary hardware, setting IBRS/STIBP has a performance impact on
adjacent hyperthreads.  It is therefore recommended to clear the setting
before becoming idle, to avoid an idle core preventing adjacent userspace
execution from running at full performance.

Care must be taken to ensure there are no ret or indirect branch instructions
between spec_ctrl_{enter,exit}_idle() invocations, which are forced always
inline.  Care must also be taken to avoid using spec_ctrl_enter_idle() between
flushing caches and becoming idle, in cases where that matters.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/cpuid: Offer Indirect Branch Controls to guests
Andrew Cooper [Mon, 13 Nov 2017 15:41:38 +0000 (15:41 +0000)]
x86/cpuid: Offer Indirect Branch Controls to guests

With all infrastructure in place, it is now safe to let guests see and use
these features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86/ctxt: Issue a speculation barrier between vcpu contexts
Andrew Cooper [Thu, 16 Nov 2017 18:35:11 +0000 (18:35 +0000)]
x86/ctxt: Issue a speculation barrier between vcpu contexts

Issuing an IBPB command flushes the Branch Target Buffer, so that any poison
left by one vcpu won't remain when beginning to execute the next.

The cost of IBPB is substantial, and skipped on transition to idle, as Xen's
idle code is robust already.  All transitions into vcpu context are fully
serialising in practice (and under consideration for being retroactively
declared architecturally serialising), so a cunning attacker cannot use SP1 to
try and skip the flush.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/boot: Calculate the most appropriate BTI mitigation to use
Andrew Cooper [Tue, 19 Dec 2017 13:59:21 +0000 (13:59 +0000)]
x86/boot: Calculate the most appropriate BTI mitigation to use

See the logic and comments in init_speculation_mitigations() for further
details.

There are two controls for RSB overwriting, because in principle there are
cases where it might be safe to forego rsb_native (Off the top of my head,
SMEP active, no 32bit PV guests at all, no use of vmevent/paging subsystems
for HVM guests, but I make no guarantees that this list of restrictions is
exhaustive).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/entry: Avoid using alternatives in NMI/#MC paths
Andrew Cooper [Mon, 22 Jan 2018 14:41:33 +0000 (14:41 +0000)]
x86/entry: Avoid using alternatives in NMI/#MC paths

This patch is deliberately arranged to be easy to revert if/when alternatives
patching becomes NMI/#MC safe.

For safety, there must be a dispatch serialising instruction in (what is
logically) DO_SPEC_CTRL_ENTRY so that, in the case that Xen needs IBRS set in
context, an attacker can't speculate around the WRMSR and reach an indirect
branch within the speculation window.

Using conditionals opens this attack vector up, so the else clause gets an
LFENCE to force the pipeline to catch up before continuing.  This also covers
the safety of the RSB conditional, as execution is guaranteed to either hit the
WRMSR or LFENCE.

One downside of not using alternatives is that there is unconditionally an LFENCE
in the IST path in cases where we are not using the features from IBRS-capable
microcode.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/entry: Organise the clobbering of the RSB/RAS on entry to Xen
Andrew Cooper [Fri, 3 Nov 2017 16:39:42 +0000 (16:39 +0000)]
x86/entry: Organise the clobbering of the RSB/RAS on entry to Xen

ret instructions are speculated directly to values recorded in the Return
Stack Buffer/Return Address Stack, as there is no uncertainty in well-formed
code.  Guests can take advantage of this in two ways:

  1) If they can find a path in Xen which executes more ret instructions than
     call instructions.  (At least one in the waitqueue infrastructure,
     probably others.)

  2) Use the fact that the RSB/RAS in hardware is actually a circular stack
     without a concept of empty.  (When it logically empties, stale values
     will start being used.)

To mitigate, overwrite the RSB on entry to Xen with gadgets which will capture
and contain rogue speculation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point
Andrew Cooper [Fri, 3 Nov 2017 16:17:00 +0000 (16:17 +0000)]
x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point

We need to be able to either set or clear IBRS in Xen context, as well as
restore appropriate guest values in guest context.  See the documentation in
asm-x86/spec_ctrl_asm.h for details.

With the contemporary microcode, writes to %cr3 are slower when SPEC_CTRL.IBRS
is set.  Therefore, the positioning of SPEC_CTRL_{ENTRY/EXIT}* is important.

Ideally, the IBRS_SET/IBRS_CLEAR hunks might be positioned either side of the
%cr3 change, but that is rather more complicated to arrange, and could still
result in a guest controlled value in SPEC_CTRL during the %cr3 change,
negating the saving if the guest chose to have IBRS set.

Therefore, we optimise for the pre-Skylake case (being far more common in the
field than Skylake and later, at the moment), where we have a Xen-preferred
value of IBRS clear when switching %cr3.

There is a semi-unrelated bugfix, where various asm_defn.h macros have a
hidden dependency on PAGE_SIZE, which results in an assembler error if used in
a .macro definition.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}
Andrew Cooper [Tue, 14 Nov 2017 19:22:28 +0000 (19:22 +0000)]
x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL,PRED_CMD}

For performance reasons, HVM guests should have direct access to these MSRs
when possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years ago  x86/migrate: Move MSR_SPEC_CTRL on migrate
Andrew Cooper [Thu, 16 Nov 2017 18:40:27 +0000 (18:40 +0000)]
x86/migrate: Move MSR_SPEC_CTRL on migrate

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests
Andrew Cooper [Thu, 9 Nov 2017 19:28:04 +0000 (19:28 +0000)]
x86/msr: Emulation of MSR_{SPEC_CTRL,PRED_CMD} for guests

As per the spec currently available here:

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

MSR_ARCH_CAPABILITIES will only come into existence on new hardware, but is
implemented as a straight #GP for now to avoid being leaky when new hardware
arrives.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests
Andrew Cooper [Tue, 16 Jan 2018 15:45:51 +0000 (15:45 +0000)]
x86/cpuid: Handling of IBRS/IBPB, STIBP and IBRS for guests

Intel specifies IBRS/IBPB (combined, in a single bit) and STIBP as a separate
bit.  AMD specifies IBPB alone in a 3rd bit.

AMD's IBPB is a subset of Intel's combined IBRS/IBPB.  For performance
reasons, administrators might wish to express "IBPB only" even on Intel
hardware, so we allow the AMD bit to be used for this purpose.

The behaviour of STIBP is more complicated.

It is our current understanding that STIBP will be advertised on HT-capable
hardware irrespective of whether HT is enabled, but not advertised on
HT-incapable hardware.  However, for ease of virtualisation, STIBP's
functionality is ignored rather than reserved by microcode/hardware on
HT-incapable hardware.

For guest safety, we treat STIBP as special, always override the toolstack
choice, and always advertise STIBP if IBRS is available.  This removes the
corner case where STIBP is not advertised, but the guest is running on
HT-capable hardware where it does matter.

Finally as a bugfix, update the libxc CPUID logic to understand the e8b
feature leaf, which has the side effect of also offering CLZERO to guests on
applicable hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen/build: Untangle CONFIG_DEBUG and CONFIG_FRAME_POINTER
Andrew Cooper [Thu, 25 Jan 2018 18:38:17 +0000 (18:38 +0000)]
xen/build: Untangle CONFIG_DEBUG and CONFIG_FRAME_POINTER

Both options are independently choosable in KConfig, but currently a DEBUG
build without FRAME_POINTER is left to the compiler's default choice, not the
user's choice.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/p2m: force return value checking of p2m_set_entry()
Jan Beulich [Fri, 26 Jan 2018 12:26:57 +0000 (13:26 +0100)]
x86/p2m: force return value checking of p2m_set_entry()

As XSAs 246 and 247 have shown, not doing so is rather dangerous.
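
Forcing callers to consume a return value is typically done with the
compiler's warn_unused_result attribute.  The sketch below assumes a
hypothetical DEMO_MUST_CHECK macro and demo_set_entry() function; it shows
the mechanism only, not the p2m code.

```c
#include <assert.h>
#include <errno.h>

/* gcc and clang warn when the result of a function marked this way is dropped. */
#define DEMO_MUST_CHECK __attribute__((__warn_unused_result__))

/* Hypothetical setter which can fail and must not fail silently. */
static DEMO_MUST_CHECK int demo_set_entry(unsigned long *table,
                                          unsigned int nr,
                                          unsigned int idx,
                                          unsigned long val)
{
    assert(table);
    if (idx >= nr)
        return -EINVAL;
    table[idx] = val;
    return 0;
}

/*
 * A caller has to do something with the result.  A bare
 * "demo_set_entry(table, nr, idx, 1);" statement here would be diagnosed
 * with -Wunused-result.
 */
static int demo_update(unsigned long *table, unsigned int nr, unsigned int idx)
{
    int rc = demo_set_entry(table, nr, idx, 1);

    return rc;
}
```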

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years ago  xen: Fix xsm build after [g]cov renaming
Wei Liu [Thu, 25 Jan 2018 13:14:24 +0000 (13:14 +0000)]
xen: Fix xsm build after [g]cov renaming

Commit e8d461497d9 renamed gcov_op to coverage_op but forgot to change
XSM handles.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years ago  xl: Don't warn on using 'deprecated' mode selection
George Dunlap [Mon, 8 Jan 2018 15:50:53 +0000 (15:50 +0000)]
xl: Don't warn on using 'deprecated' mode selection

We generally support old config formats indefinitely (see the disk
format) without emitting warnings.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years ago  libxl: move ibxl_devid_to_device_... to LIBXL_DEFINE_DEVID_TO_DEVICE
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:59 +0000 (19:19 +0200)]
libxl: move ibxl_devid_to_device_... to LIBXL_DEFINE_DEVID_TO_DEVICE

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  libxl: move libxl__device_from_ to LIBXL_DEFINE_DEVICE_FROM_TYPE
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:58 +0000 (19:19 +0200)]
libxl: move libxl__device_from_ to LIBXL_DEFINE_DEVICE_FROM_TYPE

LIBXL_DEFINE_DEVICE_FROM_TYPE uses libxl__..._devtype.type to
be assigned as device and backend type.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  libxl: use libxl__device_kind in LIBXL_DEFINE_UPDATE_DEVID
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:57 +0000 (19:19 +0200)]
libxl: use libxl__device_kind in LIBXL_DEFINE_UPDATE_DEVID

Use libxl__..._devtype.type to update device id.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  libxl: use libxl__device_kind to get device XS entry
Oleksandr Grytsov [Wed, 24 Jan 2018 17:19:56 +0000 (19:19 +0200)]
libxl: use libxl__device_kind to get device XS entry

When adding a device to XS, its name is taken from the
libxl__device_kind enum. When getting a device from XS,
the name is hardcoded. This leads to potential
mismatch errors. This patch uses libxl__device_kind
everywhere to have a single source for the device name.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86: fix GET_STACK_END
Wei Liu [Wed, 24 Jan 2018 20:26:26 +0000 (20:26 +0000)]
x86: fix GET_STACK_END

AIUI the purpose of having the .if directive is to make GET_STACK_END
work with any general purpose register. The code as-is would produce
the wrong result for r8. Fix it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  coverage: introduce generic file
Roger Pau Monné [Thu, 25 Jan 2018 11:30:01 +0000 (12:30 +0100)]
coverage: introduce generic file

It will contain the generic implementation of sysctl_cov_op, which
will be shared between all the coverage implementations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  gcov: introduce hooks for the sysctl
Roger Pau Monné [Thu, 25 Jan 2018 11:28:47 +0000 (12:28 +0100)]
gcov: introduce hooks for the sysctl

So that other implementations of the sysctl can be added.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  gcov: rename sysctl and functions
Roger Pau Monné [Thu, 25 Jan 2018 11:27:44 +0000 (12:27 +0100)]
gcov: rename sysctl and functions

Change gcov to cov (for internal interfaces) or coverage (for the
public ones).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86/domctl: remove XEN_DOMCTL_pin_mem_cacheattr
Ross Lagerwall [Thu, 25 Jan 2018 11:26:55 +0000 (12:26 +0100)]
x86/domctl: remove XEN_DOMCTL_pin_mem_cacheattr

Remove the implementation of XEN_DOMCTL_pin_mem_cacheattr since it has
been replaced by a dmop. Change xc_domain_pin_memory_cacheattr() so
that it is only defined when XC_WANT_COMPAT_DEVICEMODEL_API is set and
have it call the new dmop.  Leave the definitions of
XEN_DOMCTL_MEM_CACHEATTR_* since they are still used by QEMU.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  libxendevicemodel: provide xendevicemodel_pin_memory_cacheattr
Ross Lagerwall [Thu, 25 Jan 2018 11:26:36 +0000 (12:26 +0100)]
libxendevicemodel: provide xendevicemodel_pin_memory_cacheattr

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years ago  libxendevicemodel: provide xendevicemodel_relocate_memory
Ross Lagerwall [Thu, 25 Jan 2018 11:26:23 +0000 (12:26 +0100)]
libxendevicemodel: provide xendevicemodel_relocate_memory

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years ago  x86/hvm: provide XEN_DMOP_pin_memory_cacheattr
Ross Lagerwall [Thu, 25 Jan 2018 11:25:22 +0000 (12:25 +0100)]
x86/hvm: provide XEN_DMOP_pin_memory_cacheattr

Provide XEN_DMOP_pin_memory_cacheattr to allow a deprivileged QEMU to
pin the caching type of RAM after moving the VRAM. It is equivalent to
XEN_DOMCTL_pin_memory_cacheattr.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/hvm: provide XEN_DMOP_relocate_memory
Ross Lagerwall [Thu, 25 Jan 2018 11:24:14 +0000 (12:24 +0100)]
x86/hvm: provide XEN_DMOP_relocate_memory

Provide XEN_DMOP_relocate_memory, a limited version of
XENMEM_add_to_physmap to allow a deprivileged QEMU to move VRAM when a
guest programs its BAR. It is equivalent to XENMEM_add_to_physmap with
space == XENMAPSPACE_gmfn_range.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  mm: make xenmem_add_to_physmap global
Ross Lagerwall [Thu, 25 Jan 2018 11:23:35 +0000 (12:23 +0100)]
mm: make xenmem_add_to_physmap global

Make it global in preparation to be called by a new dmop.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  xen/VT-d: Remove the use of __LINE__ from IOMMU_WAIT_OP()
Andrew Cooper [Wed, 24 Jan 2018 14:11:16 +0000 (14:11 +0000)]
xen/VT-d: Remove the use of __LINE__ from IOMMU_WAIT_OP()

The use of __LINE__ in printk()'s is problematic for livepatching, as it tends
to cause unnecessary binary differences.

Take this opportunity to provide some rather more useful information than just
file/line/func in the form of the full register/stack trace leading to the
problem (which I've needed in the past for debugging).

Also, drop the unnecessary else clause while editing here.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years ago  x86/pv: Export pv_hypercall_table[] rather than working around it in several ways
Andrew Cooper [Wed, 24 Jan 2018 12:01:55 +0000 (12:01 +0000)]
x86/pv: Export pv_hypercall_table[] rather than working around it in several ways

The functions in compat.c are thin wrappers around the main hypercalls,
massaging certain parameters.  However, they second-guess the content of
pv_hypercall_table[], which is problematic for the shim case.  Instead,
arrange for them to call via function pointer, which removes the need for
pv_get_hypercall_handler().
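
As a rough illustration of calling via function pointer (not the actual Xen
code), the sketch below dispatches a compat wrapper through an exported
table. Every name ending in _ex, the handler signature, and the argument
masking are assumptions made up for this example.

```c
#include <assert.h>

/* Hypothetical handler signature; real hypercalls take varying arguments. */
typedef long (*hypercall_fn_ex)(unsigned long cmd, unsigned long arg);

enum { HC_SCHED_OP_EX, HC_NR_EX };

long sched_op_native_ex(unsigned long cmd, unsigned long arg)
{
    return (long)(cmd * 100 + arg);
}

long sched_op_shim_ex(unsigned long cmd, unsigned long arg)
{
    return -(long)(cmd * 100 + arg);
}

/* Exported table: anyone may read it, and a shim may patch its entries. */
hypercall_fn_ex hypercall_table_ex[HC_NR_EX] = {
    [HC_SCHED_OP_EX] = sched_op_native_ex,
};

/*
 * Compat wrapper: massages its argument, then dispatches through the table
 * instead of naming sched_op_native_ex() directly, so it never has to guess
 * what the table currently holds.
 */
long sched_op_compat_ex(unsigned long cmd, unsigned long arg)
{
    return hypercall_table_ex[HC_SCHED_OP_EX](cmd, arg & 0xffu);
}
```

Because the wrapper reads the table at call time, replacing an entry changes
the wrapper's behaviour without any extra lookup helper.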

With pv_hypercall_table[] exported, there is no need for
pv_hypercall_table_replace(), so its single callsite gets modified to cope.
The backing code behind __va(__pa()) is substantial, and there is no need to
calculate it repeatedly (Xen's .rodata is also contiguous in the directmap).

While adjusting the declarations, guard content in arch/x86/pv with CONFIG_PV.

The net difference is:
  add/remove: 0/2 grow/shrink: 4/1 up/down: 176/-321 (-145)
  function                                     old     new   delta
  pv_shim_setup_dom                           1130    1266    +136
  do_sched_op_compat                           176     192     +16
  compat_physdev_op_compat                      90     106     +16
  do_physdev_op_compat                          98     106      +8
  do_event_channel_op_compat                   145     123     -22
  pv_get_hypercall_handler                      28       -     -28
  pv_hypercall_table_replace                   271       -    -271

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/acpi: process softirqs while printing CPU ACPI data
Roger Pau Monné [Wed, 24 Jan 2018 17:02:14 +0000 (18:02 +0100)]
x86/acpi: process softirqs while printing CPU ACPI data

Otherwise the watchdog triggers on boxes with a huge number of CPUs.

Reported-by: Simon Crowe <simon.crowe@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/compat: fix compilation errors with clang 6
Roger Pau Monné [Wed, 24 Jan 2018 17:01:33 +0000 (18:01 +0100)]
x86/compat: fix compilation errors with clang 6

The following errors are generated when compiling Xen with clang 6:

In file included from x86_64/asm-offsets.c:9:
In file included from /root/src/xen/xen/include/xen/sched.h:8:
In file included from /root/src/xen/xen/include/xen/shared.h:6:
In file included from /root/src/xen/xen/include/compat/arch-x86/../xen.h:9:
/root/src/xen/xen/include/compat/arch-x86/xen.h:10:10: error: the current #pragma pack aligment
      value is modified in the included file [-Werror,-Wpragma-pack]
#include "xen-x86_32.h"
         ^
/root/src/xen/xen/include/compat/arch-x86/xen-x86_32.h:40:9: note: previous '#pragma pack'
      directive that modifies alignment is here
#pragma pack()
        ^
In file included from x86_64/asm-offsets.c:9:
In file included from /root/src/xen/xen/include/xen/sched.h:8:
In file included from /root/src/xen/xen/include/xen/shared.h:6:
/root/src/xen/xen/include/compat/arch-x86/../xen.h:9:10: error: the current #pragma pack aligment
      value is modified in the included file [-Werror,-Wpragma-pack]
#include "arch-x86/xen.h"
         ^
/root/src/xen/xen/include/compat/arch-x86/xen.h:71:9: note: previous '#pragma pack' directive that
      modifies alignment is here
#pragma pack()
        ^
2 errors generated.

Fix this by using pragma push/pop in order to store the current pragma
value in the compiler stack and later restoring it when using clang.
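
A minimal sketch of the push/pop pattern, assuming a typical 64-bit ABI; it
is not the actual header change. The struct names are hypothetical and exist
only to show that the packing set inside the push/pop pair does not leak
into later declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Save the current packing on the compiler's stack, then pack to 4 bytes. */
#pragma pack(push, 4)
struct compat_example {      /* hypothetical struct, for illustration only */
    uint32_t a;
    uint64_t b;              /* placed at offset 4 under 4-byte packing */
};
#pragma pack(pop)            /* restore whatever packing was in effect before */

struct native_example {      /* hypothetical: declared after the pop */
    uint32_t a;
    uint64_t b;              /* back to the ABI's natural alignment */
};

size_t compat_offset(void) { return offsetof(struct compat_example, b); }
size_t native_offset(void) { return offsetof(struct native_example, b); }
```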

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/shutdown: use ACPI reboot method for Dell PowerEdge R740
Ross Lagerwall [Wed, 24 Jan 2018 17:01:00 +0000 (18:01 +0100)]
x86/shutdown: use ACPI reboot method for Dell PowerEdge R740

When EFI booting the Dell PowerEdge R740, it consistently wanders into the
weeds and gets an invalid opcode in the EFI ResetSystem call.
Quirk this hardware to use the ACPI reboot method instead.

Example stack trace:

----[ Xen-4.11-unstable  x86_64  debug=n   Not tainted ]----
CPU:    0
RIP:    e008:[<0000000000000017>] 0000000000000017
RFLAGS: 0000000000010202   CONTEXT: hypervisor
rax: 0000000066eb2ff0   rbx: ffff83005f627c20   rcx: 000000006c54e100
rdx: 0000000000000000   rsi: 0000000000000065   rdi: 000000107355f000
rbp: ffff83005f627c70   rsp: ffff83005f627b48   r8:  ffff83005f627b90
r9:  0000000000000000   r10: ffff83005f627c88   r11: 0000000000000000
r12: 0000000000000000   r13: 0000000000000cf9   r14: 0000000000000065
r15: ffff830000000000   cr0: 0000000080050033   cr4: 00000000003526e0
cr3: 000000107355f000   cr2: ffffc90000cff000
fsb: 0000000000000000   gsb: ffff88019f600000   gss: 0000000000000000
ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
Xen code around <0000000000000017> (0000000000000017):
 f0 d8 dd 00 f0 54 ff 00 <f0> 50 dd 00 f0 d8 dd 00 f0 a5 fe 00 f0 87 e9 00
Xen stack trace from rsp=ffff83005f627b48:
   ffff83005f627b50 ffffffffffffffda 000000006c547aaa ffff82d000000001
   ffff83005f627bec 000000107355f000 000000006c546fb8 ffff83107ffe3240
   0000000000000000 0000000000000000 8000000000000002 0000000000000000
   000000006c546b95 000000006c54c700 ffff83005f627bdc ffff83005f627be8
   000000005f616000 ffff83005f627c20 0000000000000000 0000000000000cf9
   ffff820080350001 000000000000000b ffff82d080351eda 0000000000000000
   0000000000000000 0000000000000000 0000000000000000 000000005f616000
   0000000000000000 ffff82d08095ff60 ffff82d08095ff60 000000f100000000
   ffff82d080296097 000000000000e008 0000000000000000 ffff83005f627c88
   0000000000000000 00000000fffffffe ffff82d0802959d2 ffff82d0802959d2
   000000008095f300 000000005f627c9c 00000000000000f8 0000000000000000
   00000000000000f8 ffff82d080932c00 0000000000000000 ffff82d08095f7c8
   ffff82d080932c00 0000000000000000 0000000000000000 ffff82d080295a9b
   ffff83005f627d98 ffff82d0802361f3 ffff82d080932c00 0000000080000000
   ffff83005f627d98 ffff82d080279a19 ffff82d08095f02c ffff82d080000000
   0000000000000000 00000000000000fb 0000000000000000 00000071484e54f6
   ffff831073542098 ffff82d08093ac78 ffff831072befd30 0000000000000000
   0000000000000000 0000000000000000 0000000000000000 0000000000000000
   0000000000000000 ffff82d08034f185 ffff82d080949460 0000000000000000
   ffff82d08095f270 0000000000000008 ffff83107357ae20 0000007146ce4bd3
Xen call trace:
   [<0000000000000017>] 0000000000000017
   [<ffff82d080351eda>] efi_reset_system+0x5a/0x90
   [<ffff82d080296097>] smp_send_stop+0x97/0xa0
   [<ffff82d0802959d2>] machine_restart+0x212/0x2d0
   [<ffff82d0802959d2>] machine_restart+0x212/0x2d0
   [<ffff82d080295a9b>] shutdown.c#__machine_restart+0xb/0x10
   [<ffff82d0802361f3>] smp_call_function_interrupt+0x53/0x80
   [<ffff82d080279a19>] do_IRQ+0x259/0x660
   [<ffff82d08034f185>] common_interrupt+0x85/0x90
   [<ffff82d0802c6152>] mwait-idle.c#mwait_idle+0x242/0x390
   [<ffff82d08026b446>] domain.c#idle_loop+0x86/0xc0

****************************************
Panic on CPU 0:
FATAL TRAP: vector = 6 (invalid opcode)
****************************************

dmidecode info:

BIOS Information:
    Vendor: Dell Inc.
    Version: 1.2.11
    Release Date: 10/19/2017
    BIOS Revision: 1.2
System Information:
    Manufacturer: Dell Inc.
    Product Name: PowerEdge R740

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agodrop "domain_" prefix from struct domain's dirty CPU mask
Jan Beulich [Wed, 24 Jan 2018 17:00:01 +0000 (18:00 +0100)]
drop "domain_" prefix from struct domain's dirty CPU mask

It being a field of struct domain is sufficient to recognize its
purpose.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86: avoid explicit TLB flush when saving exec state
Jan Beulich [Wed, 24 Jan 2018 16:59:22 +0000 (17:59 +0100)]
x86: avoid explicit TLB flush when saving exec state

Now that it's obvious that only a single dirty CPU can exist for a vCPU,
it becomes clear that flush_mask() doesn't need to be invoked when
sync_local_execstate() was already run. And with the IPI handler
clearing FLUSH_TLB from the passed flags anyway if
__sync_local_execstate() returns true, it also becomes clear that
FLUSH_TLB doesn't need to be passed here in the first place; neither of
the two places actually have a need to flush the TLB in any event (quite
possibly FLUSH_TLB was being passed there solely for flush_area_mask()
to make it past its no-op check).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoreplace vCPU's dirty CPU mask by numeric ID
Jan Beulich [Wed, 24 Jan 2018 16:58:45 +0000 (17:58 +0100)]
replace vCPU's dirty CPU mask by numeric ID

At most one bit can be set in the masks, so especially on larger systems
it's quite a bit of unnecessary memory and processing overhead to track
the information as a mask. Store the numeric ID of the respective CPU
instead, or VCPU_CPU_CLEAN if no dirty state exists.
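
A minimal sketch of the idea, not the actual Xen structures: struct
vcpu_example, the helper names, and the sentinel's value are assumptions.
Only the name VCPU_CPU_CLEAN comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define VCPU_CPU_CLEAN (~0u)   /* sentinel; the value used here is an assumption */

struct vcpu_example {
    /* One integer replaces a whole per-vCPU CPU mask. */
    unsigned int dirty_cpu;    /* CPU holding dirty state, or VCPU_CPU_CLEAN */
};

bool vcpu_is_dirty_ex(const struct vcpu_example *v)
{
    return v->dirty_cpu != VCPU_CPU_CLEAN;
}

bool vcpu_dirty_on_ex(const struct vcpu_example *v, unsigned int cpu)
{
    /* A plain comparison replaces a bit test within the mask. */
    return v->dirty_cpu == cpu;
}
```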

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
7 years agotools: bump library version numbers to 4.11
Wei Liu [Wed, 24 Jan 2018 12:37:23 +0000 (12:37 +0000)]
tools: bump library version numbers to 4.11

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agoRevert "x86/boot: Map more than the first 16MB"
Wei Liu [Wed, 17 Jan 2018 19:47:05 +0000 (19:47 +0000)]
Revert "x86/boot: Map more than the first 16MB"

This reverts commit 7d6f958d9d18c54017f5ef6e299a08037f035747.

Now we have PVH info relocation support, this change is no longer
needed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: relocate pvh_info
Wei Liu [Wed, 17 Jan 2018 18:38:02 +0000 (18:38 +0000)]
x86: relocate pvh_info

Modify early boot code to relocate pvh info as well, so that we can be
sure __va in __start_xen works.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: cleanup processor.h
Juergen Gross [Tue, 23 Jan 2018 09:45:22 +0000 (10:45 +0100)]
x86: cleanup processor.h

Remove NSC/Cyrix CPU macros and current_text_addr() which are used
nowhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoadd check to cpumask_of()
Jan Beulich [Tue, 23 Jan 2018 09:44:43 +0000 (10:44 +0100)]
add check to cpumask_of()

Just like any other function's CPU inputs, the one here shouldn't go
unchecked.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: make CPU state flush requests explicit
Jan Beulich [Tue, 23 Jan 2018 09:44:11 +0000 (10:44 +0100)]
x86: make CPU state flush requests explicit

Having this be an implied side effect of a TLB flush is not very nice:
It could (at least in theory) lead to unintended state flushes (see e.g.
https://lists.xenproject.org/archives/html/xen-devel/2017-11/msg00187.html
for context). Introduce a flag to be used in the two places actually
wanting the state flushed, and conditionalize the
__sync_local_execstate() invocation in the IPI handler accordingly.

At the same time also conditionalize the flush_area_local() invocations,
to short-circuit the function ending up as a no-op anyway.
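
The following sketch shows the shape of such a handler under stated
assumptions: the flag names, their bit values, and the counters standing in
for __sync_local_execstate() and flush_area_local() are all hypothetical.

```c
#include <assert.h>

#define FLUSH_TLB_EX        0x1u   /* hypothetical bit values */
#define FLUSH_VCPU_STATE_EX 0x2u   /* hypothetical name for the new flag */

unsigned int state_syncs_ex;       /* counts stand-in state syncs */
unsigned int area_flushes_ex;      /* counts stand-in local flushes */

unsigned int ipi_handler_ex(unsigned int flags)
{
    /* State is synced only when the sender asked for it explicitly. */
    if ( flags & FLUSH_VCPU_STATE_EX )
        state_syncs_ex++;

    flags &= ~FLUSH_VCPU_STATE_EX;

    /* Skip the local flush entirely when nothing else was requested. */
    if ( flags )
        area_flushes_ex++;

    return flags;
}
```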

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: move invocations of hvm_flush_guest_tlbs()
Jan Beulich [Tue, 23 Jan 2018 09:43:39 +0000 (10:43 +0100)]
x86: move invocations of hvm_flush_guest_tlbs()

Their need is not tied to the actual flushing of TLBs, but the ticking
of the TLB clock. Make this more obvious by folding the two invocations
into a single one in pre_flush().

Also defer the latching of CR4 in write_cr3() until after pre_flush()
(and hence implicitly until after IRQs are off), making operation
sequence the same in both cases (eliminating the theoretical risk of
pre_flush() altering CR4). This then also improves register allocation,
as the compiler doesn't need to use a callee-saved register for "cr4"
anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/setup: do not relocate Xen over current Xen image placement
Daniel Kiper [Tue, 23 Jan 2018 09:42:10 +0000 (10:42 +0100)]
x86/setup: do not relocate Xen over current Xen image placement

Otherwise, due to Xen code/data changes under CPU feet, Xen may crash
silently at boot.

We were hit by the issue in OVS Xen 4.4 with my earlier version of
EFI/Multiboot2 patches. Initially its implementation allowed relocation
of Xen even if it was relocated by the bootloader. This led to the
crashes on some new Oracle machines because copy destination partially
overlapped with the end of current/initial Xen image placement.

After some discussion on Xen-devel we decided to disable Xen relocation in
my EFI/Multiboot2 upstream patches if the bootloader did the work for us.
Though one case is still not covered. If Xen is not relocated by the
bootloader then it tries to do that by itself. If all RAM regions above
currently occupied one are unsuitable for relocation then Xen tries to move
itself higher in it. And if (end - reloc_size + XEN_IMG_OFFSET) goes below
__pa(_end) then copy/relocation destination overlaps, at least partially,
with its source.
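
The overlap condition can be sketched as a standalone predicate. This is an
illustration only: the function and parameter names are hypothetical
stand-ins for (end - reloc_size + XEN_IMG_OFFSET) and __pa(_end), and the
sketch assumes end >= reloc_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if relocating to the top of a RAM region ending at 'end'
 * would put the destination below the end of the current image, i.e. the
 * copy would overlap, at least partially, with its source.
 */
bool reloc_overlaps_source_ex(uint64_t end, uint64_t reloc_size,
                              uint64_t img_offset, uint64_t pa_image_end)
{
    assert(end >= reloc_size);   /* assumption: the region can hold the image */

    return end - reloc_size + img_offset < pa_image_end;
}
```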

I can agree that this should not happen on today's machines very often.
If at all. It is rather unusual to not have usable RAM regions above
~5 MiB nowadays. Though I think that we should at least consider putting
such safety measure here. Otherwise Xen may crash mysteriously without
any stack trace. It is very confusing and impairs further debugging.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/arm: cpuerrata: Remove percpu.h include
Julien Grall [Mon, 22 Jan 2018 14:35:42 +0000 (14:35 +0000)]
xen/arm: cpuerrata: Remove percpu.h include

The include percpu.h was added by mistake in cpuerrata.h (see commit
4c4fddc166 "xen/arm64: Add skeleton to harden the branch aliasing
attacks"). So remove it.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/shim: stash RSDP address for ACPI driver
Wei Liu [Mon, 22 Jan 2018 16:28:30 +0000 (16:28 +0000)]
xen/shim: stash RSDP address for ACPI driver

It used to be the case that we placed RSDP under 1MB and let Xen search
for it. We moved the placement to under 4GB in 4a5733771, so the
search wouldn't work.

Introduce rsdp_hint to ACPI code and set that variable in
convert_pvh_info.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agolibxl: lower shim related message to level DEBUG
Wei Liu [Thu, 18 Jan 2018 16:48:05 +0000 (16:48 +0000)]
libxl: lower shim related message to level DEBUG

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/shim: use credit scheduler
Wei Liu [Thu, 18 Jan 2018 12:32:35 +0000 (12:32 +0000)]
x86/shim: use credit scheduler

Remove sched=null from shim cmdline and doc

We use the default scheduler (credit1 as of writing). The NULL
scheduler still has bugs to fix.

Update shim.config.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agox86/guest: clean up guest/xen.h
Wei Liu [Thu, 18 Jan 2018 11:47:52 +0000 (11:47 +0000)]
x86/guest: clean up guest/xen.h

Remove extraneous semicolon. Add blank lines. Remove unused static
inline functions.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agolibxl: remove whitespaces introduced in 62982da926
Wei Liu [Thu, 18 Jan 2018 11:54:29 +0000 (11:54 +0000)]
libxl: remove whitespaces introduced in 62982da926

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agoUpdate shim.config
Wei Liu [Thu, 18 Jan 2018 12:19:45 +0000 (12:19 +0000)]
Update shim.config

Kconfig has

  bool "VGA support" if !PV_SHIM_EXCLUSIVE

so for the shim build VGA option doesn't exist.

This avoids having shim.config changed every time the shim is built.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/pv: Break handle_ldt_mapping_fault() out of handle_gdt_ldt_mapping_fault()
Andrew Cooper [Tue, 17 Oct 2017 15:21:46 +0000 (16:21 +0100)]
x86/pv: Break handle_ldt_mapping_fault() out of handle_gdt_ldt_mapping_fault()

Adjust handle_ldt_mapping_fault() to exclude the use of this fixup path for
non-PV guests.  Well-formed code shouldn't reference the LDT while in HVM vcpu
context, but currently on a context switch from PV to HVM context, there may
be a stale LDT selector loaded, over an unmapped region.

By explicitly excluding HVM context at this point, we avoid erroneous
hypervisor execution resulting in a cascade failure, by falling into
pv_map_ldt_shadow_page().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Rename invalidate_shadow_ldt() to pv_destroy_ldt()
Andrew Cooper [Tue, 3 Oct 2017 10:18:37 +0000 (11:18 +0100)]
x86/pv: Rename invalidate_shadow_ldt() to pv_destroy_ldt()

and move it into pv/descriptor-tables.c beside its GDT counterpart.  Reduce
the !in_irq() check from a BUG_ON() to ASSERT().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/idt: Factor out enabling and disabling of ISTs
Andrew Cooper [Thu, 19 Oct 2017 15:11:28 +0000 (15:11 +0000)]
x86/idt: Factor out enabling and disabling of ISTs

All alterations of IST settings (other than the crash path) happen in an
identical triple.  Introduce helpers to keep the triple in sync, and reduce
the risk of opencoded mistakes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/thunk: Fix GEN_INDIRECT_THUNK comment
Andrew Cooper [Tue, 16 Jan 2018 19:10:12 +0000 (19:10 +0000)]
x86/thunk: Fix GEN_INDIRECT_THUNK comment

This is a rebasing error in c/s 858cba0d4c6b "x86: Introduce alternative
indirect thunks" hidden by other changes in the same sentence.

The name with dots rather than underscores was the prerelease GCC ABI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agotools/misc/xen-hvmctx: fix the build
Paul Durrant [Fri, 19 Jan 2018 14:08:14 +0000 (09:08 -0500)]
tools/misc/xen-hvmctx: fix the build

The recent commit 66bf4ef0 "x86/hvm: re-work viridian APIC assist code"
modified one of the field names in struct hvm_viridian_vcpu_context but
did not accordingly modify xen-hvmctx, leading to a failure to build tools.

This patch makes the necessary change to fix the build.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/hvm: re-work viridian APIC assist code
Paul Durrant [Fri, 19 Jan 2018 10:17:30 +0000 (11:17 +0100)]
x86/hvm: re-work viridian APIC assist code

It appears there is a case where Windows enables the APIC assist
enlightenment[1] but does not use it. This scenario is perfectly valid
according to the documentation, but causes the state machine in Xen to
become confused leading to a domain_crash() such as the following:

(XEN) d4: VIRIDIAN GUEST_OS_ID: vendor: 1 os: 4 major: 6 minor: 1 sp: 0
      build: 1db0
(XEN) d4: VIRIDIAN HYPERCALL: enabled: 1 pfn: 3ffff
(XEN) d4v0: VIRIDIAN VP_ASSIST_PAGE: enabled: 1 pfn: 3fffe
(XEN) domain_crash called from viridian.c:452
(XEN) Domain 4 (vcpu#0) crashed on cpu#1:

The following sequence of events is an example of how this can happen:

 - On return to guest vlapic_has_pending_irq() finds a bit set in the IRR.
 - vlapic_ack_pending_irq() calls viridian_start_apic_assist() which latches
   the vector, sets the bit in the ISR and clears it from the IRR.
 - The guest then processes the interrupt but EOIs it normally, therefore
   clearing the bit in the ISR.
 - On next return to guest vlapic_has_pending_irq() calls
   viridian_complete_apic_assist(), which discovers the assist bit still set
   in the shared page and therefore leaves the latched vector in place, but
   also finds another bit set in the IRR.
 - vlapic_ack_pending_irq() is then called but, because the ISR was
   cleared by the EOI, another call is made to viridian_start_apic_assist()
   and this then calls domain_crash() because it finds the latched vector
   has not been cleared.

Having re-visited the code I also conclude that Xen's implementation of the
enlightenment is currently wrong and we are not properly following the
specification.

The specification says:

"The hypervisor sets the “No EOI required” bit when it injects a virtual
 interrupt if the following conditions are satisfied:

 - The virtual interrupt is edge-triggered, and
 - There are no lower priority interrupts pending.

 If, at a later time, a lower priority interrupt is requested, the
 hypervisor clears the “No EOI required” such that a subsequent EOI causes
 an intercept.
 In case of nested interrupts, the EOI intercept is avoided only for the
 highest priority interrupt. This is necessary since no count is maintained
 for the number of EOIs performed by the OS. Therefore only the first EOI
 can be avoided and since the first EOI clears the “No EOI Required” bit,
 the next EOI generates an intercept."

Thus it is quite legitimate to set the "No EOI required" bit and then
subsequently take a higher priority interrupt without clearing the bit.
Thus the avoided EOI will then relate to that subsequent interrupt rather
than the highest priority interrupt when the bit was set. Hence latching
the vector when setting the bit is not entirely useful and somewhat
misleading.

This patch re-works the APIC assist code to simply track when the "No EOI
required" bit is set and test if it has been cleared by the guest (i.e.
'completing' the APIC assist), thus indicating a 'missed EOI'. Missed EOIs
need to be dealt with in two places:

 - In vlapic_has_pending_irq(), to avoid comparing the IRR against a stale
   ISR, and
 - In vlapic_EOI_set() because a missed EOI for a higher priority vector
   should be dealt with before the actual EOI for the lower priority
   vector.

Furthermore, because the guest is at liberty to ignore the "No EOI required"
bit (which led to the crash detailed above) vlapic_EOI_set() must also make
sure the bit is cleared to avoid confusing the state machine.

Lastly the previous code did not properly emulate an EOI if a missed EOI
was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
the ISR. The new code instead calls vlapic_EOI_set().
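
The tracking scheme described above can be modelled with a small state
sketch. This is a simplified illustration, not the viridian code: the
struct, its fields, and the three helper names are assumptions, and the
shared page is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

struct assist_example {
    bool no_eoi_required;   /* stand-in for the bit in the guest-shared page */
    bool pending;           /* hypervisor side: bit was set, no EOI seen yet */
};

/* Offer the guest the chance to skip its EOI. */
void assist_set_ex(struct assist_example *a)
{
    a->no_eoi_required = true;
    a->pending = true;
}

/*
 * True if the guest consumed the offer (it cleared the bit), meaning an EOI
 * was skipped and the caller has to emulate one. No vector is latched.
 */
bool assist_completed_ex(struct assist_example *a)
{
    if ( a->pending && !a->no_eoi_required )
    {
        a->pending = false;
        return true;
    }
    return false;
}

/* The guest issued a normal EOI instead: withdraw the offer. */
void assist_clear_ex(struct assist_example *a)
{
    a->no_eoi_required = false;
    a->pending = false;
}
```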

[1] See section 10.3.5 of Microsoft's "Hypervisor Top Level Functional
    Specification v5.0b".

NOTE: The changes to the save/restore code are safe because the layout
      of struct hvm_viridian_vcpu_context is unchanged and the new
      interpretation of the (previously so named) vp_assist_vector field
      as the boolean pending flag maintains the correct semantics.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/efi: fix build with linkers that support both coff-x86-64 and pe-x86-64
Roger Pau Monné [Fri, 19 Jan 2018 10:16:58 +0000 (11:16 +0100)]
x86/efi: fix build with linkers that support both coff-x86-64 and pe-x86-64

When using a linker that supports both formats the following error
will be triggered:

efi/buildid.o: file not recognized: File format is ambiguous
efi/buildid.o: matching formats: coff-x86-64 pe-x86-64

Solve this by specifying the efi/buildid.o format to pe-x86-64.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
7 years agox86/shadow: widen reference count
Jan Beulich [Fri, 19 Jan 2018 10:16:10 +0000 (11:16 +0100)]
x86/shadow: widen reference count

Utilize as many of the bits available in the union as possible, without
(just to be on the safe side) colliding with any of the bits outside of
PGT_type_mask.

Note that the first and last hunks of the xen/include/asm-x86/mm.h
change are merely code motion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/PoD: correctly handle non-order-0 decrease-reservation requests
Jan Beulich [Fri, 19 Jan 2018 10:14:42 +0000 (11:14 +0100)]
x86/PoD: correctly handle non-order-0 decrease-reservation requests

p2m_pod_decrease_reservation() at the moment only returns a boolean
value: true for "nothing more to do", false for "something more to do".
If it returns false, decrease_reservation() will loop over the entire
range, calling guest_remove_page() for each page.

Unfortunately, in the case p2m_pod_decrease_reservation() succeeds
partially, some of the memory in the range will be not-present; at which
point guest_remove_page() will return an error, and the entire operation
will fail.

Fix this by:
1. Having p2m_pod_decrease_reservation() return exactly the number of
   gpfn pages it has handled (i.e., replaced with 'not present').
2. Making guest_remove_page() return -ENOENT in the case that the gpfn
   in question was already empty (and in no other cases).
3. When looping over guest_remove_page(), expect the number of -ENOENT
   failures to be no larger than the number of pages
   p2m_pod_decrease_reservation() removed.
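
The three steps can be sketched as follows. This is a toy model under
stated assumptions: pages are represented by an int array, and both
function names are hypothetical stand-ins for guest_remove_page() and the
loop in decrease_reservation().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: pages[i] != 0 means page i is still populated. */
int remove_page_ex(int *pages, unsigned long i)
{
    if ( !pages[i] )
        return -ENOENT;      /* already not-present, and only then */
    pages[i] = 0;
    return 0;
}

/*
 * 'pod_done' is the number of pages the PoD code reported as handled.
 * Tolerate at most that many -ENOENT results; anything else is an error.
 */
int decrease_range_ex(int *pages, unsigned long nr, unsigned long pod_done)
{
    unsigned long i;

    for ( i = 0; i < nr; i++ )
    {
        int rc = remove_page_ex(pages, i);

        if ( rc == -ENOENT && pod_done )
            pod_done--;
        else if ( rc )
            return rc;
    }

    return 0;
}
```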

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/HVM: make explicit that hvm_print_line() does output only
Jan Beulich [Fri, 19 Jan 2018 10:09:55 +0000 (11:09 +0100)]
x86/HVM: make explicit that hvm_print_line() does output only

On input "c" being 0xff should already have the effect of bailing early
(due to the isprint()), but let's rather make this explicit. Also
convert the BUG_ON() to an ASSERT() (nothing fatal happens in the
function if this is violated), at the same time extending what is being
checked.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agodon't pass r12 as reference
Stefano Stabellini [Thu, 18 Jan 2018 21:48:49 +0000 (13:48 -0800)]
don't pass r12 as reference

r12 and x16 are of different sizes; when passing r12 as a reference to
do_trap_hypercall on arm64, we end up dereferencing it as a pointer to a
64bit value, but actually it isn't.

Instead, use a temporary variable to pass r12, and write back the result
after the call to do_trap_hypercall.
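
A minimal sketch of the temporary-variable pattern, not the actual arm
code: the register struct, the handler, and the value it writes are
hypothetical. The point is that the callee writes 64 bits, so it must be
handed storage that is really 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

struct regs32_example {          /* hypothetical 32-bit register file */
    uint32_t r12;
    uint32_t sp;                 /* neighbour that must not be clobbered */
};

/* Stand-in for a handler taking the hypercall number by 64-bit reference. */
void handler_example(uint64_t *nr)
{
    *nr = 0x1234;                /* writes a full 64-bit value */
}

void call_with_temporary(struct regs32_example *regs)
{
    uint64_t tmp = regs->r12;    /* widen into a correctly sized temporary */

    handler_example(&tmp);
    regs->r12 = (uint32_t)tmp;   /* write back only the low 32 bits */
}
```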

CID: 1457708
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
7 years agodocs: add MBA description in docs
Yi Sun [Tue, 19 Dec 2017 00:42:22 +0000 (08:42 +0800)]
docs: add MBA description in docs

This patch adds MBA description in related documents.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agotools: implement new generic set value interface and MBA set value command
Yi Sun [Tue, 19 Dec 2017 00:42:21 +0000 (08:42 +0800)]
tools: implement new generic set value interface and MBA set value command

This patch implements new generic set value interfaces in libxc and libxl.
These interfaces are suitable for all allocation features. It also adds a
new MBA set value command in xl.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: implement new generic get value interface and MBA get value command
Yi Sun [Tue, 19 Dec 2017 00:42:20 +0000 (08:42 +0800)]
tools: implement new generic get value interface and MBA get value command

This patch implements generic get value interfaces in libxc and libxl.
It also refactors the get value flow in xl to make it suitable for all
allocation features. Based on that, a new MBA get value command is added in xl.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agotools: rename 'xc_psr_cat_type' to 'xc_psr_type'
Yi Sun [Tue, 19 Dec 2017 00:42:19 +0000 (08:42 +0800)]
tools: rename 'xc_psr_cat_type' to 'xc_psr_type'

This patch renames 'xc_psr_cat_type' to 'xc_psr_type' so that
the structure name is common for all allocation features.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agotools: implement the new xl get hw info interface
Yi Sun [Tue, 19 Dec 2017 00:42:18 +0000 (08:42 +0800)]
tools: implement the new xl get hw info interface

This patch implements a new xl get HW info interface. A new argument
is added for psr-hwinfo command to get and show MBA HW info.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: implement the new libxl get hw info interface
Yi Sun [Tue, 19 Dec 2017 00:42:17 +0000 (08:42 +0800)]
tools: implement the new libxl get hw info interface

This patch implements the new libxl get hw info interface,
'libxl_psr_get_hw_info', which is suitable for all psr allocation
features. It also implements corresponding list free function,
'libxl_psr_hw_info_list_free' and makes 'libxl_psr_cat_get_info' call
'libxl_psr_get_hw_info' to avoid redundant code in libxl_psr.c.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: implement the new libxc get hw info interface
Yi Sun [Tue, 19 Dec 2017 00:42:16 +0000 (08:42 +0800)]
tools: implement the new libxc get hw info interface

This patch implements a new libxc get hw info interface and corresponding
data structures. It also changes libxl_psr.c to call this new interface.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: create general interfaces to support psr allocation features
Yi Sun [Tue, 19 Dec 2017 00:42:15 +0000 (08:42 +0800)]
tools: create general interfaces to support psr allocation features

This patch creates general interfaces in libxl to support all psr
allocation features.

Add 'LIBXL_HAVE_PSR_GENERIC' to indicate interface change.

Please note, the functionality cannot work until later patches
are applied.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen/pvshim: switch shim.c to use typesafe mfn_to_page and virt_to_mfn
Roger Pau Monne [Thu, 18 Jan 2018 10:34:04 +0000 (10:34 +0000)]
xen/pvshim: switch shim.c to use typesafe mfn_to_page and virt_to_mfn

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
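
To illustrate what "typesafe" means in the commit above, here is a minimal sketch of the wrapped-integer technique. The names _mfn, mfn_x, mfn_to_page and page_to_mfn mirror Xen's naming, but the definitions below are simplified assumptions written for this example (a 16-entry frame_table and a one-field struct page_info), not the actual Xen code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model for illustration; not the actual Xen definitions. */
typedef struct { unsigned long m; } mfn_t;   /* wrapped machine frame number */

/* Box and unbox helpers: the only way in and out of the wrapper. */
static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t mfn) { return mfn.m; }

struct page_info { unsigned long count; };

#define NR_FRAMES 16UL
static struct page_info frame_table[NR_FRAMES];

/*
 * Accepts only mfn_t. Passing a raw unsigned long (for example a guest
 * frame number by mistake) fails to compile instead of silently working.
 */
static inline struct page_info *mfn_to_page(mfn_t mfn)
{
    assert(mfn_x(mfn) < NR_FRAMES);
    return &frame_table[mfn_x(mfn)];
}

/* Inverse conversion: index of the page within the frame table. */
static inline mfn_t page_to_mfn(const struct page_info *pg)
{
    return _mfn((unsigned long)(pg - frame_table));
}
```

Because the wrapper is a one-member struct, it costs nothing at run time, which is consistent with the "no functional change" note in the commit message.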
7 years ago firmware/shim: fix build process to use POSIX find options
Roger Pau Monne [Wed, 17 Jan 2018 08:37:54 +0000 (08:37 +0000)]
firmware/shim: fix build process to use POSIX find options

The -printf find option is not POSIX compatible, so replace it with
another rune.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago xen/pvshim: fix coding style issues
Roger Pau Monne [Wed, 17 Jan 2018 09:29:35 +0000 (09:29 +0000)]
xen/pvshim: fix coding style issues

Fix a couple of coding style issues.

No code or functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago xen/pvshim: re-order replace_va_mapping code
Roger Pau Monne [Wed, 17 Jan 2018 09:24:03 +0000 (09:24 +0000)]
xen/pvshim: re-order replace_va_mapping code

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago xen/pvshim: identity pin shim vCPUs to pCPUs
Roger Pau Monne [Wed, 17 Jan 2018 09:20:05 +0000 (09:20 +0000)]
xen/pvshim: identity pin shim vCPUs to pCPUs

Since VCPUOP_{up/down} already identity maps vCPU hotplug to pCPU
hotplug, also identity pin the vCPUs to the pCPUs in the scheduler.
This prevents vCPU migration and should improve performance.

While there, also use __cpumask_set_cpu instead of cpumask_set_cpu, since
there's no need to use the locked variant.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
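
The difference between a locked and a non-locked bit set, as mentioned in the commit above, can be sketched with C11 atomics. This is an illustration under stated assumptions: cpu_mask is a hypothetical single-word mask (Xen's cpumask_t is a larger bitmap), and the three helper names are invented for this example rather than taken from Xen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-word CPU mask, limited to 64 CPUs. */
typedef struct { _Atomic uint64_t bits; } cpu_mask;

/*
 * Atomic variant: one indivisible read-modify-write, so concurrent
 * updaters of the same mask cannot lose each other's bits.
 */
static inline void mask_set_cpu(unsigned int cpu, cpu_mask *m)
{
    assert(cpu < 64);
    atomic_fetch_or(&m->bits, UINT64_C(1) << cpu);
}

/*
 * Non-atomic variant: a separate load and store. Cheaper, but only
 * correct when no other CPU can modify the mask in between, for
 * example while the mask is still private to the code building it.
 */
static inline void mask_set_cpu_nonatomic(unsigned int cpu, cpu_mask *m)
{
    assert(cpu < 64);
    uint64_t v = atomic_load_explicit(&m->bits, memory_order_relaxed);
    atomic_store_explicit(&m->bits, v | (UINT64_C(1) << cpu),
                          memory_order_relaxed);
}

/* Returns 1 if the bit for `cpu` is set, 0 otherwise. */
static inline int mask_test_cpu(unsigned int cpu, cpu_mask *m)
{
    assert(cpu < 64);
    return (int)((atomic_load(&m->bits) >> cpu) & 1);
}
```

Both variants produce the same result when there is a single writer; the atomic one only matters when several writers can race on the same word.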
7 years ago xen/pvh: place the trampoline starting at MFN 1
Roger Pau Monne [Wed, 17 Jan 2018 08:34:26 +0000 (08:34 +0000)]
xen/pvh: place the trampoline starting at MFN 1

Since PVH guests jump straight into trampoline_setup, trampoline_phys is
not initialized, and thus the trampoline is relocated to address 0.

This works, but has the undesirable effect of having VA 0 mapped to
MFN 0, which means NULL pointer dereferences no longer trigger a page
fault.

In order to solve this, place the trampoline starting at MFN 1 and
reserve the memory used by it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
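
The address arithmetic behind the commit above can be sketched as follows, assuming 4 KiB pages. The helper names (mfn_to_paddr, frames_needed, leaves_null_page_unmapped) are hypothetical and exist only in this example; the sketch shows why starting at MFN 1 keeps frame 0 free, not how Xen reserves the memory.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                         /* assumption: 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

static_assert(PAGE_SIZE == 4096, "sketch assumes 4 KiB pages");

/* Physical address of the first byte of a machine frame. */
static inline uint64_t mfn_to_paddr(uint64_t mfn)
{
    return mfn << PAGE_SHIFT;
}

/* Number of whole frames needed to hold `size` bytes (rounds up). */
static inline uint64_t frames_needed(uint64_t size)
{
    return (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/*
 * True if a region starting at physical address `start` does not cover
 * frame 0. With an identity mapping, that keeps VA 0 unmapped, so a
 * NULL pointer dereference still faults.
 */
static inline int leaves_null_page_unmapped(uint64_t start, uint64_t size)
{
    return size == 0 || start >= PAGE_SIZE;
}
```

With these definitions, a trampoline placed at MFN 1 starts at physical address 0x1000, one page above address 0.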