Jan Beulich [Thu, 14 Jan 2016 09:42:53 +0000 (10:42 +0100)]
x86/xsave: simplify xcomp_bv initialization
This simplifies a number of pointless conditionals: Bits 0 and 1 of
xcomp_bv don't matter anyway, and as long as none of bits 2..62 are
set, setting bit 63 is pointless too unless XSAVES is in use.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
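The rule stated in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper name `initial_xcomp_bv` and the macro `XCOMP_BV_COMPACTED` are hypothetical and are not taken from the Xen source; the only fact relied on is that bit 63 of xcomp_bv selects the compacted format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of xcomp_bv marks the compacted XSAVE format (hypothetical macro name). */
#define XCOMP_BV_COMPACTED (UINT64_C(1) << 63)

/*
 * Hypothetical helper: pick the initial xcomp_bv value.
 * With no component bits (2..62) set, bit 63 only matters when XSAVES is used,
 * so everything else can simply start out as zero.
 */
uint64_t initial_xcomp_bv(bool xsaves_in_use)
{
    return xsaves_in_use ? XCOMP_BV_COMPACTED : 0;
}
```

A single conditional like this replaces the per-bit checks that the message calls pointless.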
Roger Pau Monné [Thu, 14 Jan 2016 09:37:53 +0000 (10:37 +0100)]
x86/hvm: introduce a flags field in the CPU save record
Introduce a new flags field and use bit 0 to signal if the FPU has been
initialised or not. Previously Xen always wrongly assumed the FPU was
initialised on restore.
While modifying the FPU restore part of hvm_load_cpu_ctxt remove the
memcpy branching, since v->arch.fpu_ctxt will always point to the right
area for hosts with XSAVE or without it.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
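A minimal sketch of how a restore path might honour such a flag is shown below. The structures, the 512-byte register area size and the name `REC_FLAG_FPU_INITIALISED` are assumptions made for illustration and do not reproduce the actual save record layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for bit 0 of the new flags field. */
#define REC_FLAG_FPU_INITIALISED (UINT32_C(1) << 0)

/* Simplified stand-ins for the save record and the vCPU state. */
struct cpu_record { uint8_t fpu_regs[512]; uint32_t flags; };
struct vcpu_state { uint8_t fpu_ctxt[512]; bool fpu_initialised; };

/* Restore the FPU area only if the record says it was initialised. */
void restore_fpu(struct vcpu_state *v, const struct cpu_record *rec)
{
    v->fpu_initialised = (rec->flags & REC_FLAG_FPU_INITIALISED) != 0;
    if (v->fpu_initialised)
        memcpy(v->fpu_ctxt, rec->fpu_regs, sizeof(v->fpu_ctxt));
    else
        memset(v->fpu_ctxt, 0, sizeof(v->fpu_ctxt));
}
```

Because the destination is a single buffer, no branching on XSAVE availability is needed around the copy, which mirrors the simplification described in the message.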
Jan Beulich [Thu, 14 Jan 2016 09:33:39 +0000 (10:33 +0100)]
x86/HVM: prune error labels in do_hvm_op()
I've got repeatedly annoyed by the bad naming: Make them slightly
better recognizable (and less likely to get mixed up), except in cases
where they can be eliminated altogether.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 14 Jan 2016 09:32:35 +0000 (10:32 +0100)]
x86emul: support clzero
... in anticipation of this possibly going to get used by guests for
basic things like memset() or clearing of pages.
Since the emulation doesn't use clzero itself, checking the guest's
CPUID for the feature to be exposed is (intentionally) being avoided
here. All that's required is sensible guest side data for the clflush
line size.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Tue, 12 Jan 2016 10:36:33 +0000 (11:36 +0100)]
convert FLASK_ENABLE to Kconfig
Converts the Config.mk option of FLASK_ENABLE into a Kconfig option for
the hypervisor called CONFIG_FLASK. This commit knowingly breaks the
dependent relationship on XSM_ENABLE which is addressed when XSM_ENABLE
is converted to Kconfig.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Doug Goldstein [Tue, 12 Jan 2016 10:33:55 +0000 (11:33 +0100)]
build: save generated xen .config
Since we now support changing Xen options with Kconfig, we should save
the configuration that was used to build up Xen. This will save it in
/usr/lib/debug alongside xen-syms and call it xen-$(FULLVERSION).config
Suggested-by: Ian Campbell <ian.campbell@citrix.com> Requested-by: Jan Beulich <jbeulich@suse.com> # the directory Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Brendan Gregg [Tue, 12 Jan 2016 10:33:16 +0000 (11:33 +0100)]
x86/VPMU: implement ipc and arch filter flags
This introduces a way to have a restricted VPMU, by specifying one of two
predefined groups of PMCs to make available. For secure environments, this
allows the VPMU to be used without needing to enable all PMCs.
Signed-off-by: Brendan Gregg <bgregg@netflix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Juergen Gross [Tue, 12 Jan 2016 10:29:55 +0000 (11:29 +0100)]
add xenstore domain flag to hypervisor
In order to be able to have full support of a xenstore domain in Xen
add a "Xenstore-domain" flag to the hypervisor. This flag must be
specified at domain creation time and is returned by
XEN_DOMCTL_getdomaininfo.
It will allow the domain to retrieve domain information by issuing the
XEN_DOMCTL_getdomaininfo itself in order to be able to check for
domains having been destroyed. At the same time this flag will prevent
the domain from being migrated, as this wouldn't be a very wise thing to do.
In case of later support for a rebootable Dom0 this flag will make it possible to
recognize a xenstore domain already being present to connect to.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Andrew Cooper <andrew.cooper3@citirx.com>
Haozhong Zhang [Tue, 12 Jan 2016 10:29:25 +0000 (11:29 +0100)]
x86/hvm: add support for pcommit instruction
Pass PCOMMIT CPU feature into HVM domain. Currently, we do not intercept
pcommit instruction for L1 guest, and allow L1 to intercept pcommit
instruction for L2 guest.
The specification of pcommit instruction can be found in
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> for tools bits
Haozhong Zhang [Tue, 12 Jan 2016 10:28:58 +0000 (11:28 +0100)]
x86/hvm: allow guest to use clflushopt and clwb
Pass CPU features CLFLUSHOPT and CLWB into HVM domain so that those two
instructions can be used by guest.
The specification of above two instructions can be found in
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> for tools bits
Andrew Cooper [Fri, 8 Jan 2016 14:38:03 +0000 (14:38 +0000)]
tools/libxc: Adjust error handling in map_p2m_list() to fix CentOS 7 build
The "goto err;" for malloc() error handling would cause the cleanup code
to use 'ptes' before it had been initialised, causing a build
failure because of -Werror=maybe-uninitialized.
Use "goto err;" consistently for all error handling.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
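The single-label cleanup pattern referred to above can be sketched as follows. The function `build_tables` and its parameters are hypothetical and are not the real map_p2m_list() code; the point is only that every pointer freed under the label is initialised before the first possible jump.

```c
#include <stdlib.h>

/* Hypothetical function showing one cleanup label that is safe on every path. */
int build_tables(size_t n_frames, size_t n_ptes)
{
    int rc = -1;
    unsigned long *frames = NULL;  /* initialised before any "goto err" */
    unsigned long *ptes = NULL;    /* likewise, so the cleanup never reads garbage */

    if (n_frames == 0 || n_ptes == 0)
        goto err;

    frames = malloc(n_frames * sizeof(*frames));
    if (!frames)
        goto err;

    ptes = malloc(n_ptes * sizeof(*ptes));
    if (!ptes)
        goto err;

    rc = 0;  /* success; this sketch releases everything either way */

 err:
    free(ptes);    /* free(NULL) is a no-op */
    free(frames);
    return rc;
}
```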
Haozhong Zhang [Fri, 8 Jan 2016 09:48:29 +0000 (10:48 +0100)]
x86/time: use correct guest TSC frequency in tsc_get_info()
When the TSC mode of a HVM container is TSC_MODE_DEFAULT or
TSC_MODE_PVRDTSCP and no TSC emulation is used, the existing
tsc_get_info() uses the host TSC frequency (cpu_khz) as the guest TSC
frequency. However, tsc_set_info() may set the guest TSC frequency to a
value different than the host. In order to stay consistent with
tsc_set_info(), this patch makes tsc_get_info() use the value set by
tsc_set_info() as the guest TSC frequency.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Fri, 8 Jan 2016 09:48:10 +0000 (10:48 +0100)]
x86/time: use correct guest TSC frequency in tsc_set_info()
When TSC_MODE_PVRDTSCP is used for a HVM container and TSC scaling is
available, use the non-zero value of argument gtsc_khz of tsc_set_info()
as the guest TSC frequency rather than using the host TSC
frequency. Otherwise, TSC scaling will not be able to get the correct ratio
between the host and guest TSC frequencies.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
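The frequency selection described in this message reduces to a small decision, sketched here. The helper `pick_guest_tsc_khz` and its parameter names are hypothetical; the real tsc_set_info() takes more inputs and handles more modes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: prefer the requested guest frequency when it is
 * non-zero and hardware TSC scaling can honour it, else use the host value.
 */
uint32_t pick_guest_tsc_khz(uint32_t gtsc_khz, uint32_t host_khz,
                            bool tsc_scaling_available)
{
    if (tsc_scaling_available && gtsc_khz != 0)
        return gtsc_khz;
    return host_khz;
}
```

With the guest value recorded this way, a scaling ratio can be derived from the two frequencies rather than always coming out as one.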
Bob Moore [Thu, 7 Jan 2016 16:33:09 +0000 (17:33 +0100)]
ACPI 5.0: Add new/changed tables to headers
Adds new file, actbl3.h
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit c5bd6537329e66a8b36234f19a36d94b72d07394]
[only port changes of Generic Interrupt and Generic Distributor, other
changes already exist] Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Paul Durrant [Thu, 7 Jan 2016 14:28:33 +0000 (15:28 +0100)]
public/io/netif.h: document transmit and receive wire formats separately
Currently there is no documented wire format for guest receive-side
packets but the location of the 'wire format' comment block suggests
it is the same as transmit-side. This is almost true but there is a
subtle difference in the use of the 'size' field for the first fragment.
For clarity this patch creates separate comment blocks for receive
and transmit side packet wire formats, tries to be more clear about the
distinction between 'fragments' and 'extras', and documents the subtlety
concerning the size field of the first fragment.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Doug Goldstein [Thu, 7 Jan 2016 14:27:43 +0000 (15:27 +0100)]
remove dups in x86 and x86_64 variables
Currently the Xen build uses x86 and x86_64 variables as well as
CONFIG_X86 and CONFIG_X86_64. This just removes the duplication. The
CONFIG_ variables are now managed by Kconfig but existed previously so
this duplication existed prior to the Kconfig migration.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Feng Wu <feng.wu@intel.com>
$(CONFIG_X86_64) -> y in x86 makefiles.
$(CONFIG_X86_64) -> $(CONFIG_X86) in non-x86 makefiles.
Boris Ostrovsky [Thu, 7 Jan 2016 14:27:16 +0000 (15:27 +0100)]
x86/VPMU: don't allow any non-zero writes to MSR_IA32_PEBS_ENABLE
Calculating the reserved bits for MSR_IA32_PEBS_ENABLE is model-dependent
and since we don't support PEBS anyway we shouldn't allow any writes to
it (but let's still permit guests wishing to disable PEBS).
We should also report PEBS as unsupported to HVM, just like we do on PV.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Boris Ostrovsky [Thu, 7 Jan 2016 14:26:37 +0000 (15:26 +0100)]
x86/VPMU: check more carefully which bits are allowed to be written to MSRs
Current Intel VPMU emulation needs to perform more checks when writing
PMU MSRs on guest's behalf:
* MSR_CORE_PERF_GLOBAL_CTRL is not checked at all
* MSR_CORE_PERF_FIXED_CTR_CTRL has more reserved bits in PMU version 2
* MSR_CORE_PERF_GLOBAL_OVF_CTRL's bit 61 is allowed on versions greater
  than 2.
We can also use precomputed mask in core2_vpmu_do_interrupt().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
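A version-dependent reserved-bit check of the kind listed above might look like the following sketch. The helper names and the `base_reserved` parameter are assumptions for illustration; only the bit 61 rule comes from the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVF_CTRL_BIT61 (UINT64_C(1) << 61)

/*
 * Hypothetical helper: derive the reserved mask for
 * MSR_CORE_PERF_GLOBAL_OVF_CTRL. Bit 61 is writable only on PMU
 * versions greater than 2; otherwise it is treated as reserved.
 */
uint64_t global_ovf_ctrl_reserved(uint64_t base_reserved,
                                  unsigned int pmu_version)
{
    if (pmu_version > 2)
        return base_reserved & ~OVF_CTRL_BIT61;
    return base_reserved | OVF_CTRL_BIT61;
}

/* A guest write is acceptable only if it touches no reserved bit. */
bool msr_write_allowed(uint64_t value, uint64_t reserved_mask)
{
    return (value & reserved_mask) == 0;
}
```

Computing the mask once and reusing it is the idea behind the "precomputed mask" remark in the message.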
Doug Goldstein [Thu, 7 Jan 2016 14:18:45 +0000 (15:18 +0100)]
convert FLASK_ENABLE to Kconfig
Converts the Config.mk option of FLASK_ENABLE into a Kconfig option for
the hypervisor called CONFIG_FLASK. This commit knowingly breaks the
dependent relationship on XSM_ENABLE which is addressed when XSM_ENABLE
is converted to Kconfig.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
xen/arm: ignore writes to GICD_ICACTIVER ... GICD_ICACTIVERN
Injecting a fault to the guest just because it is writing to one of the
GICD_ICACTIVER registers, which are part of the GICv2 and GICv3 specs,
is harsh. Additionally it causes recent linux kernels to fail to boot on
Xen.
Ignore writes to GICD_ICACTIVER ... GICD_ICACTIVERN instead, to solve
the boot issue and for backportability. However implementing the
registers properly might be a better long term solution.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Wed, 6 Jan 2016 20:03:21 +0000 (15:03 -0500)]
libxc: Don't write terminating NULL character to command string
When copying boot command string for HVMlite guests we explicitly write
'\0' at MAX_GUEST_CMDLINE offset. Unless the string is close to
MAX_GUEST_CMDLINE in length this write will end up in the wrong place,
beyond the end of the mapped range.
We don't need to limit the size of command string to some arbitrary
number. Any size that can be successfully allocated and mapped is valid
and so the string is guaranteed to be NULL-terminated (since we use
strlen, which needs terminating '\0', to calculate allocation size).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
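The reasoning about termination can be illustrated with a short sketch. The function `copy_cmdline` is hypothetical and uses plain malloc() in place of the guest memory mapping that libxc performs; what it shows is that sizing the buffer from strlen() plus one already carries the terminator across.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a command line into a buffer sized from
 * the string itself. The "+ 1" covers the terminating '\0', so no separate
 * write at a fixed offset is required.
 */
char *copy_cmdline(const char *cmdline, size_t *out_size)
{
    size_t size = strlen(cmdline) + 1;
    char *dst = malloc(size);

    if (!dst)
        return NULL;

    memcpy(dst, cmdline, size);  /* copies the terminator as well */
    *out_size = size;
    return dst;
}
```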
Juergen Gross [Thu, 7 Jan 2016 12:36:54 +0000 (13:36 +0100)]
libxc: set flag for support of linear p2m list in domain builder
Set the SIF_VIRT_P2M_4TOOLS flag for pv-domUs in the domain builder
to indicate the Xen tools have full support for the virtual mapped
linear p2m list.
This will enable pv-domUs to drop support of the 3 level p2m tree
and use the linear list only. Without setting this flag some kernels
might limit themselves to 512 GB memory size in order not to break
migration.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Thu, 7 Jan 2016 12:36:53 +0000 (13:36 +0100)]
libxc: stop migration in case of p2m list structural changes
With support of the virtual mapped linear p2m list for migration it is
now possible to detect structural changes of the p2m list which before
would either lead to a crashing or otherwise misbehaving domU.
A guest supporting the linear p2m list will increment the
p2m_generation counter located in the shared info page before and after
each modification of a mapping related to the p2m list. A change of
that counter can be detected by the tools and reacted upon.
As such a change should occur only very rarely once the domU is up the
most simple reaction is to cancel migration in such an event.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
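The detection described above amounts to comparing a saved counter value with the current one. The sketch below assumes a simplified structure and a hypothetical helper name; the field width and the real shared info layout are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the part of the shared info page that matters here. */
struct shared_info_sketch {
    volatile uint64_t p2m_generation;  /* bumped by the guest around p2m changes */
};

/*
 * Hypothetical helper: report whether the guest has touched its p2m list
 * structure since the generation value was last sampled.
 */
bool p2m_structure_changed(const struct shared_info_sketch *shinfo,
                           uint64_t saved_generation)
{
    return shinfo->p2m_generation != saved_generation;
}
```

A caller would sample the counter when it maps the list and cancel the migration as soon as this check returns true.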
Juergen Gross [Thu, 7 Jan 2016 12:36:52 +0000 (13:36 +0100)]
libxc: support of linear p2m list for migration of pv-domains
In order to be able to migrate pv-domains with more than 512 GB of RAM
the p2m information can be specified by the guest kernel via a virtual
mapped linear p2m list instead of a 3 level tree.
Add support for this new p2m format in libxc.
As the sanity checking of the virtual p2m address needs defines for the
xen regions use those defines when doing page table checks as well.
There were two harmless off by one errors in normalise_pagetable()
being fixed by using those defines (xen_last set to 512 instead of
511), the other one is fixed directly.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Thu, 17 Dec 2015 17:29:10 +0000 (17:29 +0000)]
xen/arm: vgic: Clarify some comments after 5d495f4
Ian pointed out that the definition of "offset" and "appropriate
boundary" in the comments added by "xen/arm: vgic: Optimize the way to
store the target vCPU in the rank" were not clear.
Clarify them by explicitly mentioning that the offset is in bytes and the
appropriate boundary is ITARGET<n>/IROUTER<n>
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 16 Dec 2015 12:31:09 +0000 (12:31 +0000)]
tools: Refactor "xentoollog" into its own library
In attempting to disaggregate libxenctrl I found that many of the
pieces were going to want access to this library, so split it out (as
it probably should always have been).
Various build adjustments are needed. In particular things which use
xtl_* themselves now need to explicitly link against the library.
This has a nice side effect which is that users of libxl no longer
need to link against libxenctrl just to create a logger, which was
counter to the principle that applications using libxl shouldn't be
required to look behind the curtain. This means that xl no longer
links against libxenctrl.
The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- dropped QEMU_TRADITIONAL_REVISION update, this had since
progressed to 569eac99e8dd which is after 9fad9ed28583, the
commit needed here. ]
Ian Campbell [Wed, 16 Dec 2015 12:31:08 +0000 (12:31 +0000)]
stubdom: recurse into tools/include in mk-headers-$(XEN_TARGET_ARCH) rule
... rather than in the libxc rule.
This puts all the header dependencies in one place and will allow us
to avoid races when more libraries which need these headers are
introduced. I observed issues with the xen-foreign/tmp.size file
getting deleted in parallel with another process trying to use it.
The mini-os links are already created in the
mk-headers-$(XEN_TARGET_ARCH) target so the other places which do so
are redundant, in the case of polarssl and vtpmmgr indirectly through
their eventual dependency on newlib which in turn depends on
mk-headers-$(XEN_TARGET_ARCH).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: samuel.thibault@ens-lyon.org
Ian Campbell [Wed, 16 Dec 2015 15:06:35 +0000 (15:06 +0000)]
tools: allow configure time choice of libexec subdirectory.
Currently we hardcode various paths such as $libexec/xen/{bin,boot},
however some downstreams (notably Debian) would like instead to
install things into $libexec/xen-X.Y/{bin,boot} as part of allowing
multiple versions of the tools packages to be installed.
Since this currently involves patching configure it's a bit fiddly,
provide a configure option for the leaf dir instead, name it
--with-libexec-leaf-dir similar to the existing
--with-sysconfig-leaf-dir.
Rather than have the determination of the full path in both configure
and config/Paths.mk.in move it into configure only. Also for
consistency move the other LIBEXEC_* to configure, even though they
are only substituted into Paths.mk.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: 805508@bugs.debian.org
[ ijc -- removed stray ` ]
Andrew Cooper [Mon, 4 Jan 2016 09:59:38 +0000 (09:59 +0000)]
x86/vmx: Fix injection of #DB traps following XSA-156
Most #DB exceptions are traps rather than faults, meaning that the instruction
pointer in the exception frame points after the instruction rather than at it.
However, VMX intercepts all have fault semantics, even when intercepting a
trap. Re-injecting an intercepted trap as a fault causes an infinite loop in
the guest, by re-executing the same trapping instruction repeatedly. This
breaks debugging inside the guest.
Introduce a helper which copies VM_EXIT_INTR_INFO to VM_ENTRY_INTR_INFO, and
use it to mirror the intercepted interrupt back to the guest.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 22 Dec 2015 09:12:14 +0000 (10:12 +0100)]
IOMMU: unhide messages useful for diagnostics
Undue use of dprintk() led to many messages useful in diagnosing
issues in the field now being hidden in non-debug (i.e. production)
builds. Re-surface them.
Jan Beulich [Tue, 22 Dec 2015 09:11:44 +0000 (10:11 +0100)]
VT-d: unhide messages needed for diagnosing firmware issues
Undue use of dprintk() led to many messages useful in diagnosing
issues in the field now being hidden in non-debug (i.e. production)
builds. Re-surface them, namely when init-time only and/or already
guarded by iommu_{verbose,debug} conditionals. Switch from using
iommu_verbose to iommu_debug in a couple of runtime cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Feng Wu <feng.wu@intel.com>
Andrew Cooper [Tue, 22 Dec 2015 09:10:44 +0000 (10:10 +0100)]
x86/mmuext: unify okay/rc error handling in do_mmuext_op()
c/s 506db90 "x86/HVM: merge HVM and PVH hypercall tables" introduced a path
whereby 'okay' was used uninitialised, which broke compilation on CentOS 7.
Splitting the error handling like this is fragile and unnecessary. Drop the
okay variable entirely and just use rc directly, substituting rc = -EINVAL/0
for okay = 0/1.
In addition, two error messages are updated to print rc, and some stray
whitespace is dropped.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Make setting of rc happen consistently after MEM_LOG(), if that is being
used.
Alex Xu [Mon, 21 Dec 2015 16:11:17 +0000 (17:11 +0100)]
get-fields.sh: use printf for POSIX compat
xen/tools/get-fields.sh used echo -n which is not POSIX compatible and
breaks building with dash (shell). Change it to use printf %s which is
usable everywhere.
Yu Zhang [Mon, 21 Dec 2015 16:07:55 +0000 (17:07 +0100)]
x86/HVM: remove identical relationship between ioreq type and rangeset type
This patch uses HVMOP_IO_RANGE_XXX values rather than the raw ioreq
type to select the ioreq server, therefore the identical relationship
between ioreq type and rangeset type is no longer necessary.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Malcolm Crossley [Mon, 21 Dec 2015 12:40:48 +0000 (13:40 +0100)]
x86: make debug output consistent in hvm_set_callback_via
The unconditional printks in the switch statement of the
hvm_set_callback_via function result in Xen log spam in non-debug
versions of Xen. The printks are for debug output only so conditionally
compile the entire switch statement on debug versions of Xen only.
This is XSA-169.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Mon, 21 Dec 2015 12:40:13 +0000 (13:40 +0100)]
x86/HVM: merge HVM and PVH hypercall tables
The tables are almost identical and therefore there is little reason to
keep both sets.
PVH needs 3 extra hypercalls:
* mmuext_op. MMUEXT_PIN_L<x>_TABLE are required by control domain (dom0)
when building guests. We add MMUEXT_UNPIN_TABLE for completeness.
* platform_op. These are only available to privileged domains. We will
(eventually) have privileged HVMlite guests and therefore shouldn't
limit this to PVH only.
* xenpmu_op. any guest with !has_vlapic() (i.e. PV, PVH and HVMlite)
should be able to use it.
Note that until recently PVH guests used mmuext_op's MMUEXT_INVLPG_MULTI and
MMUEXT_TLB_FLUSH_MULTI commands but it has been determined that using the
former was incorrect and using the latter is correct for now but is not
guaranteed to work in the future.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 21 Dec 2015 12:38:22 +0000 (13:38 +0100)]
x86/vPMU: constrain MSR_IA32_DS_AREA loads
For one, loading the MSR with a possibly non-canonical address was
possible since the verification is conditional, while the MSR load
wasn't. And then for PV guests we need to further limit the range of
valid addresses to exclude the hypervisor range.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
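The two checks named in this message can be sketched as below. This assumes 48-bit virtual addresses, and the hypervisor range bounds are placeholder values chosen for illustration; the helper names are hypothetical as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bounds for the range reserved for the hypervisor (assumed values). */
#define SKETCH_HV_START UINT64_C(0xffff800000000000)
#define SKETCH_HV_END   UINT64_C(0xffff880000000000)

/* With 48-bit addressing, bits 63..47 must be all zeros or all ones. */
bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == UINT64_C(0x1ffff);
}

/* For a PV guest the address must also stay out of the hypervisor range. */
bool ds_area_ok_for_pv(uint64_t addr)
{
    if (!is_canonical(addr))
        return false;
    return addr < SKETCH_HV_START || addr >= SKETCH_HV_END;
}
```

Both checks would have to run unconditionally before the MSR is loaded, which is the gap the message describes.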
Huaitong Han [Mon, 21 Dec 2015 12:37:17 +0000 (13:37 +0100)]
x86/xsaves: get_xsave_addr, check xsave header and support uncompressed format
The check needs to be against the xsave header in the area, rather than Xen's
maximum xfeature_mask. A guest might easily have a smaller xcr0 than the
maximum Xen is willing to allow, causing the pointer below to be bogus.
The get_xsave_addr() is modified to support uncompressed xstate areas.
Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
David Vrabel [Mon, 21 Dec 2015 12:36:41 +0000 (13:36 +0100)]
x86/ept: invalidate guest physical mappings on VMENTER
If a guest allocates a page and the tlbflush_timestamp on the page
indicates that a TLB flush of the previous owner is required, only the
linear and combined mappings are invalidated. The guest-physical
mappings are not invalidated.
This is currently safe because the EPT code ensures that the
guest-physical and combined mappings are invalidated /before/ the page
is freed. However, this prevents us from deferring the EPT invalidate
until after the page is freed (e.g., to defer the invalidate until the
p2m locks are released).
The TLB flush that may be done after allocating a page already causes
the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
one is pending.
This means __ept_sync_domain() need not do anything and thus the
on_selected_cpu() call does not need to wait for as long.
ept_sync_domain() now marks all PCPUs as needing to be invalidated,
including PCPUs that the domain has not run on. We still only IPI
those PCPUs that are active so this does not result in any more INVEPT
calls.
We do not attempt to track when PCPUs may have cached translations
because the only safe way to clear this per-CPU state is if
immediately after an invalidate the PCPU is not active (i.e., the PCPU
is not in d->domain_dirty_cpumask). Since we only invalidate on
VMENTER or by IPIing active PCPUs this can never happen.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Our 'struct domain', when lock profiling is enabled, is bigger than
one page.
We can't use vmap nor vzalloc as both of those stash the
physical address in struct page which makes the assumptions
in 'arch_init_memory' trip over ASSERTs.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Mon, 21 Dec 2015 12:35:13 +0000 (13:35 +0100)]
VMX: allocate APIC access page from domain heap
... since we don't need its virtual address anywhere (it's a
placeholder page only after all). For this to work (and possibly be
done elsewhere too) share_xen_page_with_guest() needs to mark pages
handed to it as Xen heap ones.
To be on the safe side, also explicitly clear the page (not having done
so was okay due to the XSA-100 fix, but is still a latent bug since we
don't formally guarantee allocations to come out zeroed, and in fact
this property may disappear again as soon as the asynchronous runtime
scrubbing patches arrive).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
We must ensure that the prod/cons are only read once and that
the compiler won't try to optimize the reads, that is, split
the read of these into multiple instructions influencing later
branch code. As such insert barriers when fetching the cons
and prod index.
This is part of XSA155.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Instead of RING_GET_REQUEST. Using a local copy of the
ring (and also with proper memory barriers) will mean
we do not have to worry about the compiler optimizing
the code and doing a double-fetch in the shared memory space.
This is part of XSA155.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 20 Nov 2015 16:59:05 +0000 (11:59 -0500)]
xen: Add RING_COPY_REQUEST()
Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a request
generally requires taking a local copy.
Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy(). This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.
Use a volatile source to prevent the compiler from reordering or
omitting the copy.
This is part of XSA155.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
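The copy-then-validate idea can be sketched without the ring macros. The structure and both functions below are hypothetical and do not reproduce RING_COPY_REQUEST() itself; the sketch only shows reading through a volatile-qualified source into a private copy and validating that copy.

```c
#include <stdint.h>

/* Simplified stand-in for a request slot in a shared ring. */
struct request {
    uint32_t id;
    uint32_t length;
};

/*
 * Read each field exactly once through a volatile-qualified pointer so the
 * compiler can neither drop the copy nor re-read the shared memory later.
 */
void copy_request(struct request *dst, const struct request *shared)
{
    const volatile struct request *src = shared;

    dst->id = src->id;
    dst->length = src->length;
}

/* Validate and use only the private copy, never the shared slot. */
int handle_request(const struct request *shared, uint32_t max_length)
{
    struct request req;

    copy_request(&req, shared);
    if (req.length > max_length)
        return -1;
    return (int)req.length;
}
```

Because the bounds check and the later use both operate on `req`, a peer that rewrites the shared slot in between cannot invalidate the check.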
Jan Beulich [Thu, 17 Dec 2015 13:22:46 +0000 (14:22 +0100)]
x86/HVM: avoid reading ioreq state more than once
Otherwise, especially when the compiler chooses to translate the
switch() to a jump table, unpredictable behavior (and in the jump table
case arbitrary code execution) can result.
This is XSA-166.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
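The single-read pattern this fix relies on can be illustrated as follows. The state names and the function `wait_needed` are hypothetical and are not the actual ioreq handling code; the essential part is that the switch operates on a local snapshot.

```c
#include <stdint.h>

/* Hypothetical state values for a slot shared with another party. */
enum sketch_state {
    SKETCH_NONE = 0,
    SKETCH_READY = 1,
    SKETCH_INPROCESS = 2,
    SKETCH_RESP_READY = 3
};

/*
 * Snapshot the shared state once, then dispatch on the snapshot.
 * Returns 1 if the caller should wait, 0 if not, -1 for an invalid state.
 */
int wait_needed(const volatile uint8_t *shared_state)
{
    uint8_t state = *shared_state;  /* the only read of shared memory */

    switch (state) {
    case SKETCH_NONE:
    case SKETCH_RESP_READY:
        return 0;
    case SKETCH_READY:
    case SKETCH_INPROCESS:
        return 1;
    default:
        return -1;  /* out-of-range values are rejected explicitly */
    }
}
```

If the compiler lowers the switch to a jump table, the range check and the table index both come from the same local value, so a concurrent writer cannot slip an out-of-range index past the check.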
Doug Goldstein [Tue, 15 Dec 2015 13:14:00 +0000 (14:14 +0100)]
build: convert HAS_KEXEC / KEXEC use to Kconfig
Use the Kconfig generated CONFIG_HAS_KEXEC defines in the build system
and replace kexec :=y in Rules.mk with a kconfig option called
CONFIG_KEXEC. Purposefully did not merge the two variables together in
this patch to keep this as mechanical as possible.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com>