Andrew Cooper [Tue, 22 Dec 2015 09:10:44 +0000 (10:10 +0100)]
x86/mmuext: unify okay/rc error handling in do_mmuext_op()
c/s 506db90 "x86/HVM: merge HVM and PVH hypercall tables" introduced a path
whereby 'okay' was used uninitialised, which broke compilation on CentOS 7.
Splitting the error handling like this is fragile and unnecessary. Drop the
okay variable entirely and just use rc directly, substituting rc = -EINVAL/0
for okay = 0/1.
In addition, two error messages are updated to print rc, and some stray
whitespace is dropped.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Make setting of rc happen consistently after MEM_LOG(), if that is being
used.
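The substitution of rc for okay described above can be sketched as follows. This is a minimal illustration only, not the do_mmuext_op() code: handle_op() and its op codes are hypothetical names invented for the example.

```c
#include <errno.h>

/* Hypothetical op codes, invented for this sketch. */
enum { OP_NOP = 0, OP_PIN = 1 };

/*
 * A single rc carries the result: 0 on success (formerly okay = 1) and
 * -EINVAL on failure (formerly okay = 0). There is no second variable
 * that a newly added code path could leave uninitialised.
 */
static int handle_op(int op, int arg)
{
    int rc = 0;

    switch (op) {
    case OP_NOP:
        break;

    case OP_PIN:
        if (arg < 0)
            rc = -EINVAL;
        break;

    default:
        rc = -ENOSYS;
        break;
    }

    return rc;
}
```

Because every path leaves the function through the same rc, an error message printed on failure can report rc directly.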
Alex Xu [Mon, 21 Dec 2015 16:11:17 +0000 (17:11 +0100)]
get-fields.sh: use printf for POSIX compat
xen/tools/get-fields.sh used echo -n which is not POSIX compatible and
breaks building with dash (shell). Change it to use printf %s which is
usable everywhere.
Yu Zhang [Mon, 21 Dec 2015 16:07:55 +0000 (17:07 +0100)]
x86/HVM: remove identical relationship between ioreq type and rangeset type
This patch uses HVMOP_IO_RANGE_XXX values rather than the raw ioreq
type to select the ioreq server, therefore the identical relationship
between ioreq type and rangeset type is no longer necessary.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Malcolm Crossley [Mon, 21 Dec 2015 12:40:48 +0000 (13:40 +0100)]
x86: make debug output consistent in hvm_set_callback_via
The unconditional printks in the switch statement of the
hvm_set_callback_via function result in Xen log spam in non-debug
versions of Xen. The printks are for debug output only so conditionally
compile the entire switch statement on debug versions of Xen only.
This is XSA-169.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Mon, 21 Dec 2015 12:40:13 +0000 (13:40 +0100)]
x86/HVM: merge HVM and PVH hypercall tables
The tables are almost identical and therefore there is little reason to
keep both sets.
PVH needs 3 extra hypercalls:
* mmuext_op. MMUEXT_PIN_L<x>_TABLE are required by control domain (dom0)
when building guests. We add MMUEXT_UNPIN_TABLE for completeness.
* platform_op. These are only available to privileged domains. We will
(eventually) have privileged HVMlite guests and therefore shouldn't
limit this to PVH only.
* xenpmu_op. Any guest with !has_vlapic() (i.e. PV, PVH and HVMlite)
should be able to use it.
Note that until recently PVH guests used mmuext_op's MMUEXT_INVLPG_MULTI and
MMUEXT_TLB_FLUSH_MULTI commands but it has been determined that using the
former was incorrect and using the latter is correct for now but is not
guaranteed to work in the future.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 21 Dec 2015 12:38:22 +0000 (13:38 +0100)]
x86/vPMU: constrain MSR_IA32_DS_AREA loads
For one, loading the MSR with a possibly non-canonical address was
possible since the verification is conditional, while the MSR load
wasn't. And then for PV guests we need to further limit the range of
valid addresses to exclude the hypervisor range.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
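The two constraints described above (canonical address, and for PV guests an address outside the hypervisor range) can be sketched like this. It is a hedged illustration, not the vPMU code: it assumes 48-bit virtual addresses, and ds_area_valid() with its caller-supplied reserved range is a hypothetical helper invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses. An address is then canonical
 * when bits 63..47 are all copies of bit 47.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* bits 63..47, 17 bits in total */

    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical check: the address must be canonical and must not fall
 * inside [reserved_start, reserved_end), a range the caller passes in
 * to stand for the hypervisor's part of the address space.
 */
static bool ds_area_valid(uint64_t addr, uint64_t reserved_start,
                          uint64_t reserved_end)
{
    if (!is_canonical(addr))
        return false;

    return addr < reserved_start || addr >= reserved_end;
}
```

The point of the fix is that a check of this kind runs unconditionally before the MSR is loaded, rather than only on some paths.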
Huaitong Han [Mon, 21 Dec 2015 12:37:17 +0000 (13:37 +0100)]
x86/xsaves: get_xsave_addr, check xsave header and support uncompressed format
The check needs to be against the xsave header in the area, rather than Xen's
maximum xfeature_mask. A guest might easily have a smaller xcr0 than the
maximum Xen is willing to allow, causing the pointer below to be bogus.
The get_xsave_addr() is modified to support uncompressed xstate areas.
Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
David Vrabel [Mon, 21 Dec 2015 12:36:41 +0000 (13:36 +0100)]
x86/ept: invalidate guest physical mappings on VMENTER
If a guest allocates a page and the tlbflush_timestamp on the page
indicates that a TLB flush of the previous owner is required, only the
linear and combined mappings are invalidated. The guest-physical
mappings are not invalidated.
This is currently safe because the EPT code ensures that the
guest-physical and combined mappings are invalidated /before/ the page
is freed. However, this prevents us from deferring the EPT invalidate
until after the page is freed (e.g., to defer the invalidate until the
p2m locks are released).
The TLB flush that may be done after allocating a page already causes
the original guest to VMEXIT, thus on VMENTER we can do an INVEPT if
one is pending.
This means __ept_sync_domain() need not do anything and thus the
on_selected_cpu() call does not need to wait for as long.
ept_sync_domain() now marks all PCPUs as needing to be invalidated,
including PCPUs that the domain has not run on. We still only IPI
those PCPUs that are active so this does not result in any more INVEPT
calls.
We do not attempt to track when PCPUs may have cached translations
because the only safe way to clear this per-CPU state is if
immediately after an invalidate the PCPU is not active (i.e., the PCPU
is not in d->domain_dirty_cpumask). Since we only invalidate on
VMENTER or by IPIing active PCPUs this can never happen.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Our 'struct domain', when lock profiling is enabled, is bigger than
one page.
We can't use vmap nor vzalloc as both of those stash the
physical address in struct page which makes the assumptions
in 'arch_init_memory' trip over ASSERTs.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Mon, 21 Dec 2015 12:35:13 +0000 (13:35 +0100)]
VMX: allocate APIC access page from domain heap
... since we don't need its virtual address anywhere (it's a
placeholder page only after all). For this to work (and possibly be
done elsewhere too) share_xen_page_with_guest() needs to mark pages
handed to it as Xen heap ones.
To be on the safe side, also explicitly clear the page (not having done
so was okay due to the XSA-100 fix, but is still a latent bug since we
don't formally guarantee allocations to come out zeroed, and in fact
this property may disappear again as soon as the asynchronous runtime
scrubbing patches arrive).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
We must ensure that the prod/cons are only read once and that
the compiler won't try to optimize the reads, that is, split
the read of these into multiple instructions influencing later
branch code. As such insert barriers when fetching the cons
and prod index.
This is part of XSA155.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Instead of RING_GET_REQUEST. Using a local copy of the
ring (and also with proper memory barriers) will mean
we do not have to worry about the compiler optimizing
the code and doing a double-fetch in the shared memory space.
This is part of XSA155.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 20 Nov 2015 16:59:05 +0000 (11:59 -0500)]
xen: Add RING_COPY_REQUEST()
Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a request
generally requires taking a local copy.
Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy(). This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.
Use a volatile source to prevent the compiler from reordering or
omitting the copy.
This is part of XSA155.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
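The idea of taking a local copy through a volatile source can be sketched as below. This is an illustration under assumptions, not the RING_COPY_REQUEST() macro itself: struct request, copy_request(), checked_len() and MAX_LEN are hypothetical names invented for the example.

```c
#include <stdint.h>

/* Hypothetical request layout, invented for this sketch. */
struct request {
    uint32_t id;
    uint32_t len;
};

#define MAX_LEN 16u  /* hypothetical upper bound on len */

/*
 * Copy one request out of shared memory. Reading through a
 * volatile-qualified pointer obliges the compiler to perform the loads
 * here, instead of re-reading the shared slot at each later use.
 */
static void copy_request(struct request *dst,
                         const volatile struct request *src)
{
    *dst = *src;
}

/*
 * Validate and use only the private copy. Returns the length, or -1 if
 * it exceeds MAX_LEN. The value returned is the value that was checked,
 * whatever the other end writes to the shared slot afterwards.
 */
static int checked_len(const volatile struct request *shared)
{
    struct request req;

    copy_request(&req, shared);

    if (req.len > MAX_LEN)
        return -1;

    return (int)req.len;
}
```

Had checked_len() read shared->len once for the bounds check and again for the return value, the other end could change it between the two reads.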
Jan Beulich [Thu, 17 Dec 2015 13:22:46 +0000 (14:22 +0100)]
x86/HVM: avoid reading ioreq state more than once
Otherwise, especially when the compiler chooses to translate the
switch() to a jump table, unpredictable behavior (and in the jump table
case arbitrary code execution) can result.
This is XSA-166.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
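A minimal sketch of the single-read pattern follows. It is not the ioreq handling code: the state names and classify_state() are hypothetical, chosen only to show a switch() operating on a snapshot rather than on the shared field.

```c
#include <stdint.h>

/* Hypothetical state values, invented for this sketch. */
enum { ST_NONE = 0, ST_READY = 1, ST_INPROCESS = 2, ST_DONE = 3 };

/*
 * Read the shared state exactly once, then dispatch on the snapshot.
 * If the switch were applied to the shared field itself, the compiler
 * could range-check one load and index a jump table with another.
 */
static int classify_state(const volatile uint8_t *shared_state)
{
    uint8_t state = *shared_state;  /* the only read of shared memory */

    switch (state) {
    case ST_NONE:
    case ST_READY:
    case ST_INPROCESS:
    case ST_DONE:
        return state;

    default:
        return -1;  /* unknown values are rejected, not dispatched */
    }
}
```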
Doug Goldstein [Tue, 15 Dec 2015 13:14:00 +0000 (14:14 +0100)]
build: convert HAS_KEXEC / KEXEC use to Kconfig
Use the Kconfig generated CONFIG_HAS_KEXEC defines in the build system
and replace kexec :=y in Rules.mk with a kconfig option called
CONFIG_KEXEC. Purposefully did not merge the two variables together in
this patch to keep this as mechanical as possible.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Tue, 15 Dec 2015 13:14:00 +0000 (14:14 +0100)]
build: convert HAS_DEVICE_TREE use to Kconfig
Use the Kconfig generated CONFIG_HAS_DEVICE_TREE defines in the code
base.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Tue, 15 Dec 2015 13:14:00 +0000 (14:14 +0100)]
build: convert HAS_PASSTHROUGH use to Kconfig
Use the Kconfig generated HAS_PASSTHROUGH defines for the code base.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Tue, 15 Dec 2015 13:14:00 +0000 (14:14 +0100)]
build: use generated Kconfig options for Xen
Switches the build system to rely on the options and flags generated by
Kconfig to control what gets built and how. Follow on patches will
convert items to be prefixed with CONFIG_. Additionally remove a #define
that resulted in a redefined variable when building for arm.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Tue, 15 Dec 2015 13:14:00 +0000 (14:14 +0100)]
build: build Kconfig and config rules
Wire in the Kconfig build and makefile rules to be able to generate
valid configuration files to be used by the build process but don't
actually use the output for affecting the Xen build. To avoid dragging
in most of Kbuild from the Linux kernel this adds Makefile.kconfig which
is our real entry point into building kconfig. This attempts to reuse as
much of the Xen build bits as possible and wire them to the bits that
kconfig expects to be provided by Kbuild.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Fri, 11 Dec 2015 16:00:11 +0000 (10:00 -0600)]
tools: always enable HAS_MEM_ACCESS
For all supported targets HAS_MEM_ACCESS is enabled so this drops the
conditional and always makes it enabled. The goal here is to remove the
setting in the top level config directory when kconfig changes land.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 16 Dec 2015 11:00:25 +0000 (12:00 +0100)]
tools/symbols: document binutils commits for issues needing workarounds so far
Also the 3rd issue mentioned in commit d37d63d4b5 ("symbols:
prefix static symbols with their source file names") has been fixed by
binutils commit 270f824531 (also expected to appear in 2.27).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Which otherwise leads to the following on resume after migrate (comparing
non-XSM to XSM):
ata2.00: configured for MWDMA2
usb 1-2: reset full-speed USB device number 2 using uhci_hcd
+PM: restore of devices complete after 3779.268 msecs
usb 1-2: USB disconnect, device number 2
-PM: restore of devices complete after 2342.528 msecs
usb 1-2: new full-speed USB device number 3 using uhci_hcd
usb 1-2: New USB device found, idVendor=0627, idProduct=0001
usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb 1-2: Product: QEMU USB Tablet
usb 1-2: Manufacturer: QEMU 0.10.2
usb 1-2: SerialNumber: 1
input: QEMU 0.10.2 QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2:1.0/input/input8
generic-usb 0003:0627:0001.0002: input,hidraw0: USB HID v0.01 Pointer [QEMU 0.10.2 QEMU USB Tablet] on usb-0000:00:01.2-2/input0
Restarting tasks ... done.
Setting capacity to 20480000
Setting capacity to 20480000
+uhci_hcd 0000:00:01.2: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
And a glitch in the domU which is sufficient to disrupt the post migration
checks done by osstest.
This has been through a test run on merlot1 and resolved the migration
issues with the test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm
osstest test case.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Roger Pau Monne [Mon, 7 Dec 2015 16:48:37 +0000 (17:48 +0100)]
libxl: add support for migrating HVM guests without a device model
Only some minor libxl changes are needed in order to be able to migrate HVM
guests without a device model, no hypervisor changes are needed.
This change prevents sending the emulator context if the device model
version is set to none.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 7 Dec 2015 16:48:36 +0000 (17:48 +0100)]
libxl: allow the creation of HVM domains without a device model.
Replace the firmware loaded into HVM guests with an OS kernel. Since the HVM
builder now uses the PV xc_dom_* set of functions this kernel will be parsed
and loaded inside the guest like on PV, but the container is a pure HVM
guest.
Also, if device_model_version is set to none or a device model for the
specified domain is not present, unconditionally set the nic type to
LIBXL_NIC_TYPE_VIF.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 7 Dec 2015 16:48:35 +0000 (17:48 +0100)]
libxc: switch xc_dom_elfloader to be used with HVMlite domains
Allow xc_dom_elfloader to report a guest type as hvm-3.0-x86_32 if it's
running inside of a HVM container and has the PHYS32_ENTRY elfnote set.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Tue, 15 Dec 2015 13:16:45 +0000 (14:16 +0100)]
building with perfc=y was broken
because of b38d426ad09 ("x86/viridian: flush remote tlbs
by hypercall") which was defining mshv_call_flush, but using
mshv_flush.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Anthony PERARD [Tue, 15 Dec 2015 13:16:29 +0000 (14:16 +0100)]
hvmloader: load proper ACPI tables with OVMF
This patch loads the ACPI tables associated with QEMU instead of the one
for qemu-traditional, since we only support OVMF with qemu-xen.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 15 Dec 2015 13:15:43 +0000 (14:15 +0100)]
x86: generate labels at the beginning of unlikely sub-sections
This is to limit symbol table growth, which would be quite a bit worse
if we went with the "label every unlikely sub-section contribution"
approach proposed previously.
Older gas doesn't support quoted symbols, yet the result looks quite
a bit better that way. Hence two variants get introduced, one using
proper path names (including slashes and dashes) and one using path
names after converting them to valid symbol names (slashes and dashes
replaced).
As a secondary adjustment also change the section name used with Clang.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Roger Pau Monné [Tue, 15 Dec 2015 13:14:17 +0000 (14:14 +0100)]
libxc/xen: introduce a start info structure for HVMlite guests
This structure contains the physical address of the command line, as well as
the physical address of the list of loaded modules. The physical address of
this structure is passed to the guest at boot time in the %ebx register.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 15 Dec 2015 13:12:32 +0000 (14:12 +0100)]
x86: allow HVM guests to use hypercalls to bring up vCPUs
Allow the usage of the VCPUOP_initialise, VCPUOP_up, VCPUOP_down,
VCPUOP_is_up, VCPUOP_get_physid and VCPUOP_send_nmi hypercalls from HVM
guests.
This patch introduces a new structure (vcpu_hvm_context) that should be used
in conjunction with the VCPUOP_initialise hypercall in order to initialize
vCPUs for HVM guests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Roger Pau Monné [Tue, 15 Dec 2015 13:12:18 +0000 (14:12 +0100)]
libxc: allow creating domains without emulated devices
Introduce a new flag in xc_dom_image that turns on and off the emulated
devices. This prevents creating the VGA hole, the hvm_info page and the
ioreq server pages. libxl unconditionally sets it to true for all HVM
domains at the moment.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 15 Dec 2015 13:11:11 +0000 (14:11 +0100)]
x86: set the vPMU interface based on the presence of a lapic
Instead of choosing the interface to expose to guests based on the guest
type, do it based on whether the guest has an emulated local apic or not.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Paul Durrant [Tue, 1 Dec 2015 13:55:25 +0000 (13:55 +0000)]
libxl: re-implement libxl__xs_printf()
This patch adds a new libxl__xs_vprintf() which actually checks the
success of the underlying call to xs_write() (logging if it fails) and
then re-implements libxl__xs_printf() using this (and replacing the
call to vasprintf() with a call to libxl__vsprintf()).
libxl__xs_vprintf() is added to the 'checked' section of libxl_internal.h
and, since it now underpins libxl__xs_printf(), that declaration is
moved into the same section.
Looking at call sites of libxl__xs_printf() it seems as though several
of them expected a failure if the underlying xs_write() failed, so this
patch should actually fulfil the semantic that was intended all along.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
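The shape of a printf-style wrapper that checks the underlying write can be sketched as follows. This is not the libxl implementation and does not use the xenstore API: store_write_fn, store_vprintf(), store_printf() and the two stand-in backends are all hypothetical names invented for the example.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the store write; returns false on failure. */
typedef bool (*store_write_fn)(const char *path, const char *data, size_t len);

/* Format the value, write it, and report a failed write to the caller. */
static int store_vprintf(store_write_fn write_fn, const char *path,
                         const char *fmt, va_list ap)
{
    va_list ap2;
    char *buf;
    int len;
    bool ok;

    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap2);  /* measure the formatted length */
    va_end(ap2);
    if (len < 0)
        return -1;

    buf = malloc((size_t)len + 1);
    if (!buf)
        return -1;
    vsnprintf(buf, (size_t)len + 1, fmt, ap);

    ok = write_fn(path, buf, (size_t)len);
    if (!ok)
        fprintf(stderr, "write to %s failed\n", path);  /* log the failure */

    free(buf);
    return ok ? 0 : -1;
}

/* The variadic form is a thin wrapper around the va_list form. */
static int store_printf(store_write_fn write_fn, const char *path,
                        const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = store_vprintf(write_fn, path, fmt, ap);
    va_end(ap);
    return rc;
}

/* Two stand-in backends for trying the wrapper out. */
static char last_value[64];

static bool write_to_buffer(const char *path, const char *data, size_t len)
{
    (void)path;
    if (len >= sizeof(last_value))
        return false;
    memcpy(last_value, data, len);
    last_value[len] = '\0';
    return true;
}

static bool write_fails(const char *path, const char *data, size_t len)
{
    (void)path; (void)data; (void)len;
    return false;
}
```

With write_to_buffer the call returns 0 and the formatted string lands in last_value; with write_fails the caller sees -1 instead of a silently dropped write.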
Paul Durrant [Tue, 1 Dec 2015 13:55:24 +0000 (13:55 +0000)]
libxl: re-name libxl__xs_write() to libxl__xs_printf()...
...to denote what it actually does.
The name libxl__xs_write() suggests something taking a buffer and length,
akin to write(2), whereas the semantics of the function are actually more
akin to printf(3).
This patch is a textual substitution of libxl__xs_write with
libxl__xs_printf with some associated formatting fixes.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Tue, 1 Dec 2015 17:52:12 +0000 (17:52 +0000)]
xen/arm: p2m: Remove translation table when it's empty
Currently, the translation table is left in place even if no entries
are in use. Because of how the p2m code has been implemented,
replacing a translation table by a block (i.e. superpage) is not
supported. Therefore, any remapping of a superpage size will be split
in smaller chunks making the translation less efficient.
Replacing a table by a block when a new mapping is added would be too
complicated because it requires us to check if all the upper levels
are not in use and free them if necessary.
Instead, we will remove the empty translation table when mappings are
removed. To avoid going through all the table checking if no entry is
in use, a counter representing the number of entries currently in use is
kept per translation table and updated when an entry changes state
(i.e. valid <-> invalid).
As Xen allocates a page for each translation table, it's possible to
store the counter in the struct page_info. A new field p2m_refcount
has been introduced in the in use union for this purpose. This is fine
as the page is only used by the P2M code and nobody touches the other
field of the union type_info.
For the record, type_info has not been used because it would require
more work to use it properly as Xen on ARM doesn't yet have the
concept of type.
Once Xen has finished removing a mapping and all the references to
each translation table have been updated, then the higher levels will
be processed and freed as needed. This will allow us to propagate the
number of references and free multiple translation tables at different
levels in one go.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated commit message as discussed ]
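The per-table counter can be sketched as follows. This is a simplified model, not the ARM p2m code: struct table, table_set_entry() and ENTRIES_PER_TABLE are hypothetical, and the counter lives in the table itself here rather than in struct page_info.

```c
#include <stdbool.h>
#include <stddef.h>

#define ENTRIES_PER_TABLE 512  /* assumption made for this sketch */

/* Hypothetical model of one translation table. */
struct table {
    bool valid[ENTRIES_PER_TABLE];
    unsigned int refcount;  /* number of entries currently valid */
};

/*
 * Set the state of one entry. The counter changes only on a
 * valid <-> invalid transition, so overwriting a valid entry with
 * another valid one leaves it alone. Returns true when the table holds
 * no valid entries afterwards, i.e. when the caller may free it.
 */
static bool table_set_entry(struct table *t, size_t idx, bool valid)
{
    if (t->valid[idx] != valid) {
        t->valid[idx] = valid;
        if (valid)
            t->refcount++;
        else
            t->refcount--;
    }

    return t->refcount == 0;
}
```

Checking refcount against zero replaces a scan of all entries each time a mapping is removed.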
Julien Grall [Tue, 1 Dec 2015 17:52:09 +0000 (17:52 +0000)]
xen/arm: p2m: Flush for every exit paths in apply_p2m_changes
Currently, the TLB is not flushed if an error occurred while updating the
stage-2 p2m. However, the TLB will contain stale mappings for any entry
updated so far.
To avoid such a situation, flush on every exit path when the variable
"flush" is set.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
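The exit-path pattern can be sketched as follows. This is not apply_p2m_changes(): apply_changes(), tlb_flush() and the fail_at parameter are hypothetical, and the flush is simulated with a counter so that the behaviour can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

static unsigned int flush_count;  /* counts simulated TLB flushes */

static void tlb_flush(void)
{
    flush_count++;
}

/*
 * Update n entries, failing at index fail_at (negative: never fail).
 * Success and error paths share one exit label, so entries changed
 * before an error are still flushed.
 */
static int apply_changes(int *entries, size_t n, long fail_at)
{
    bool flush = false;
    int rc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if ((long)i == fail_at) {
            rc = -1;
            goto out;  /* error: leave through the common exit */
        }
        entries[i] = 1;
        flush = true;  /* at least one entry has been modified */
    }

out:
    if (flush)
        tlb_flush();

    return rc;
}
```

A failure on the very first entry leaves flush unset, so no flush is issued when nothing was changed.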
Dario Faggioli [Thu, 10 Dec 2015 16:24:51 +0000 (17:24 +0100)]
sched: fix (ACPI S3) resume with cpupools with different schedulers
In fact, with 2 cpupools, one (the default) Credit and
one Credit2 (with at least 1 pCPU in the latter), trying
a (e.g., ACPI S3) suspend/resume crashes like this:
During suspend, the pCPUs are not removed from their
pools with the standard procedure (which would involve
schedule_cpu_switch()). During resume, they:
1) are assigned to the default cpupool (CPU_UP_PREPARE
phase);
2) are moved to the pool they were in before suspend,
via schedule_cpu_switch() (CPU_ONLINE phase)
During resume, scheduling (even if just the idle loop)
can happen right after the CPU_STARTING phase (before
CPU_ONLINE), i.e., before the pCPU is put back in its
pool. In this case, it is the default pool's scheduler
that is invoked (Credit1, in the example above). But,
during suspend, the Credit2 specific vCPU data is not
being freed, and Credit1 specific vCPU data is not
allocated, during resume.
Therefore, Credit1 schedules on pCPUs whose idle vCPU's
sched_priv points to Credit2 vCPU data, and we crash.
Fix things by properly deallocating scheduler specific
data of the pCPU's pool scheduler during pCPU teardown,
and re-allocating them --always for &ops-- during pCPU
bringup.
This also fixes another (latent) bug. In fact, it avoids,
still in schedule_cpu_switch(), that Credit1's free_vdata()
is used to deallocate data allocated with Credit2's
alloc_vdata(). This is not easy to trigger, but only
because the other bug shown above manifests first and
crashes the host.
The downside of this patch is that it adds one more
allocation on the resume path, which is not ideal. Still,
there is no better way of fixing the described bugs at
the moment. Removing (ideally all) allocations happening
during resume should continue being chased, in the long
run.
Wei Liu [Wed, 9 Dec 2015 10:43:36 +0000 (10:43 +0000)]
libxl: update check-xl-disk-parse
The block-attach command now returns 1 when it fails. Update first test
case to expect return value 1 instead of 255.
The parser now doesn't generate output for default values. Remove them
from expected output.
According to 417e6b70 ("libxl: add option for discard support to xl disk
configuration"), the "discard=" variant is never supported, delete two
test cases with that variant.
Reported-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 10 Dec 2015 12:17:49 +0000 (13:17 +0100)]
VT-d: make flush-all actually flush all
Passing gfn=0 and page_count=0 actually avoids the
iommu_flush_iotlb_dsi() and results in page-specific invalidation
instead.
Reported-by: "张智" <zhangzhi2014@caep.cn> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Feng Wu <feng.wu@intel.com>
Jan Beulich [Thu, 10 Dec 2015 12:17:21 +0000 (13:17 +0100)]
x86: re-enable NX if disabled
I noticed Linux 4.4 doing this universally now, and I think it's a good
idea to override such anti-security BIOS settings (we certainly have no
compatibility problem due to NX being enabled).
Secondary changes:
- no need to check supported extended CPUID level for leaves 80000000
and 80000001 (required on x86-64)
- no need to update c->cpuid_level in early_init_intel() (done anyway
in generic_identify())
- alignment of trampoline data items
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Boris Ostrovsky [Thu, 10 Dec 2015 12:15:35 +0000 (13:15 +0100)]
x86/VPMU: support only versions 2 through 4 of architectural performance monitoring
We need to have at least version 2 since it's the first version to
support various control and status registers (such as
MSR_CORE_PERF_GLOBAL_CTRL) that VPMU relies on always having.
We don't fully emulate version 4 but since it's backward compatible with
earlier versions we can fall back to v3. At this point there is no
compatibility statement for v5 so anything above 4 is not supported.
For guests querying PMU version via CPUID leaf 0xa clip it at v3.
With explicit testing for PMU version we can now remove CPUID model
check.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
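The version policy above can be sketched as follows. The helper names are hypothetical and this is not the VPMU code; the only architectural fact relied on is that CPUID leaf 0xa reports the version ID in EAX bits 7:0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host side: only versions 2 through 4 are accepted. */
static bool pmu_version_supported(unsigned int version)
{
    return version >= 2 && version <= 4;
}

/*
 * Guest side: CPUID leaf 0xa reports the version ID in EAX bits 7:0.
 * Clip it to 3 and leave the remaining bits of EAX untouched.
 */
static uint32_t clip_guest_pmu_version(uint32_t eax)
{
    uint32_t version = eax & 0xff;

    if (version > 3)
        eax = (eax & ~(uint32_t)0xff) | 3;

    return eax;
}
```

A host reporting version 4 is therefore accepted, while its guests see version 3.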
Ross Lagerwall [Thu, 10 Dec 2015 12:14:53 +0000 (13:14 +0100)]
x86: fixup IRQs when CPUs go down during shutdown
Commit fc0c3fa2ad5c ("x86/IO-APIC: fix setup of Xen internally used IRQs
(take 2)") introduced a regression on some hardware where Xen would hang
during shutdown, repeating the following message:
APIC error on CPU0: 08(08), Receive accept error
This appears to be because an interrupt (in this case from the serial
console) destined for a CPU other than the boot CPU is left unhandled so
an APIC error on CPU 0 is generated instead.
To fix this, before taking down the non-boot CPUs, call fixup_irqs()
with a CPU mask of only the boot CPU to reset the IRQ affinities
correctly.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Feng Wu [Thu, 10 Dec 2015 12:14:04 +0000 (13:14 +0100)]
vmx: properly handle notification event when vCPU is running
When a vCPU is running in Root mode and a notification event
has been injected to it, we need to set VCPU_KICK_SOFTIRQ for
the current cpu, so the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.
Signed-off-by: Feng Wu <feng.wu@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Feng Wu [Thu, 10 Dec 2015 12:13:33 +0000 (13:13 +0100)]
pass-through: update IRTE according to guest interrupt config changes
When guest changes its interrupt configuration (such as, vector, etc.)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.
For lowest-priority interrupts, we use vector-hashing mechanism to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle the lowest-priority interrupt.
For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
so we still use interrupt remapping.
Signed-off-by: Feng Wu <feng.wu@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
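One plausible form of vector hashing is sketched below. The commit message does not spell out the hash, so the choice made here (guest vector modulo the number of candidate vCPUs, then picking that candidate in bitmap order) is an assumption, and pick_dest_vcpu() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * candidates: bitmap with one bit per vCPU that matches the
 *             interrupt's destination (at most 64 in this sketch).
 * vector:     guest interrupt vector.
 *
 * Returns the chosen vCPU index, or -1 if there is no candidate.
 */
static int pick_dest_vcpu(uint64_t candidates, uint8_t vector)
{
    unsigned int count = 0, idx, bit;

    for (bit = 0; bit < 64; bit++)
        if ((candidates >> bit) & 1)
            count++;

    if (count == 0)
        return -1;

    idx = vector % count;  /* the "hash": vector modulo candidate count */

    for (bit = 0; bit < 64; bit++) {
        if ((candidates >> bit) & 1) {
            if (idx == 0)
                return (int)bit;
            idx--;
        }
    }

    return -1;  /* not reached when count > 0 */
}
```

The result depends only on the vector and the candidate set, so a given interrupt keeps landing on the same vCPU until the guest reprograms it.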
Feng Wu [Thu, 10 Dec 2015 12:12:06 +0000 (13:12 +0100)]
vmx: suppress posting interrupts when 'SN' is set
Currently, we don't support urgent interrupts: all interrupts
are treated as non-urgent, so we cannot send a
posted interrupt when 'SN' is set.
Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: <kevin.tian@intel.com>
Ian Campbell [Thu, 10 Dec 2015 10:21:34 +0000 (10:21 +0000)]
Revert "tools: Refactor "xentoollog" into its own library"
This reverts commit c7d3afbb44b47af9103be0b914afd588a84d9e62 which
broke the libvirt build, since libvirt uses xtl_* and hence needs
updating to link against the new library when necessary.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 9 Dec 2015 12:53:13 +0000 (13:53 +0100)]
memory: fix XSA-158 fix
For one the uses of domu_max_order and ptdom_max_order were swapped.
And then gcc warns about an unused result of a __must_check function
in the control part of a conditional expression when both other
expressions can be determined by the compiler to produce the same value
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68039), which happens
when HAS_PASSTHROUGH is undefined (i.e. for ARM on 4.4 and older).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 3 Dec 2015 11:22:02 +0000 (11:22 +0000)]
tools: Refactor "xentoollog" into its own library
In attempting to disaggregate libxenctrl I found that many of the
pieces were going to want access to this library, so split it out (as
it probably should always have been).
Various build adjustments are needed. In particular things which use
xtl_* themselves now need to explicitly link against the library.
This has a nice side effect which is that users of libxl no longer
need to link against libxenctrl just to create a logger, which was
counter to the principle that applications using libxl shouldn't be
required to look behind the curtain. This means that xl no longer
links against libxenctrl.
The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- Update QEMU_TRADITIONAL_REVISION and MINIOS_UPSTREAM_REVISION ]
Ian Campbell [Thu, 3 Dec 2015 11:22:01 +0000 (11:22 +0000)]
tools/Rules.mk: Properly handle libraries with recursive dependencies.
In-tree libraries which link against other in-tree libraries in a way
which is opaque to their callers need special handling, specifically
correct use of -Wl,-rpath-link for the recursively used libraries.
Currently this is rather simple, but upcoming changes are going to
introduce transitive dependencies more than 1 step deep.
Introduce a SHDEPS idiom to contain all the recursive deps for a
library and include those in both LDLIBS (for linking) and SHLIB (for
recursive uses).
Try and document the whole thing.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Thu, 3 Dec 2015 11:22:00 +0000 (11:22 +0000)]
tools/ocaml: simplify compile/link of test apps
xtl doesn't require the full LDLIBS_libxenctrl, just the -L and
xenlight.cmxa, the latter of which contains LDLIBS_libxenctrl as needed.
Fixing this avoids the need to be concerned about LDLIBS_libxenctrl
becoming more than one word in the future.
Since the tests are pure ocaml (no C components) CFLAGS and
LIBS_xenlight are not required.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: David Scott <dave@recoil.org> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: David Scott <dave@recoil.org>
Juergen Gross [Wed, 2 Dec 2015 07:42:17 +0000 (08:42 +0100)]
libxc: try to find last used pfn when migrating
For migration the last used pfn of a guest is needed to size the
logdirty bitmap and as an upper bound of the page loop. Unfortunately
there are pv-kernels advertising a much higher maximum pfn than they
are really using in order to support memory hotplug. This will lead
to allocation of much more memory in Xen tools during migration than
really needed.
Try to find the last used guest pfn of a pv-domu by scanning the p2m
tree from the last entry towards its start, searching for an entry
that is not invalid.
Normally the mid pages of the p2m tree containing all invalid entries
are being reused, so we can just scan the top page for identical
entries and skip all of them but the first one.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[ ijc -- added errno = E2BIG to one error path ] Acked-by: Ian Campbell <ian.campbell@citrix.com>
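The backward scan described in this commit message can be sketched roughly as follows. This is only an illustration, not the libxc code: the flat array layout, the INVALID_ENTRY marker value and the function names last_used_pfn and first_of_identical_tail are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for an unpopulated p2m slot (assumption). */
#define INVALID_ENTRY UINT64_MAX

/*
 * Scan a flat p2m array from its last entry towards its start and
 * return the index of the last entry that is not invalid.
 * Returns -1 if every entry is invalid or the array is empty.
 */
long last_used_pfn(const uint64_t *p2m, size_t n)
{
    while (n > 0) {
        if (p2m[n - 1] != INVALID_ENTRY)
            return (long)(n - 1);
        n--;
    }
    return -1;
}

/*
 * Top-page shortcut: trailing slots that all hold the same value (for
 * instance a reference to one shared "all invalid" mid page) form a
 * run, and only the first slot of that run needs a closer look.
 * Returns the index of the first slot of the trailing run.
 */
size_t first_of_identical_tail(const uint64_t *top, size_t n)
{
    size_t i = n;

    assert(n > 0);  /* an empty top page has no trailing run */
    while (i > 1 && top[i - 2] == top[i - 1])
        i--;
    return i - 1;
}
```

A caller would first narrow the search on the top level with first_of_identical_tail() and then run last_used_pfn() only on the lower-level pages that can still contain valid entries.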
Fix regression in xendomains initscript: test for privcmd char device
Since commit:
"xendomains initscript: test for privcmd char device"
(1367e9e5ba4d1612e303123ec0bbf961100fcfa1)
due to incorrect negation the xendomains initscript bails out
early when both "/dev/xen/privcmd" and "/proc/xen/privcmd"
are present in dom0.
Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Tue, 1 Dec 2015 12:09:58 +0000 (12:09 +0000)]
libxl: Introduce a template for devices with a controller
We have several outstanding patch series which add devices that have
two levels: a controller and individual devices attached to that
controller.
In the interest of consistency, this patch introduces a section that
sketches out a template for interfaces for such devices.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Olaf Hering <olaf@aepfle.de> Acked-by: Chun Yan Liu <cyliu@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 18 Nov 2015 15:34:54 +0000 (15:34 +0000)]
libxl: Fix bootloader-related virtual memory leak on pv build failure
The bootloader may call libxl__file_reference_map(), which mmap's the
pv_kernel and pv_ramdisk into process memory. This was only unmapped,
however, on the success path of libxl__build_pv(). If there were a
failure anywhere between libxl_bootloader.c:parse_bootloader_result()
and the end of libxl__build_pv(), the calls to
libxl__file_reference_unmap() would be skipped, leaking the mapped
virtual memory.
Ideally this would be fixed by adding the unmap calls to the
destruction path for libxl__domain_build_state. Unfortunately the
lifetime of the libxl__domain_build_state is opaque, and it doesn't
have a proper destruction path. But, the only thing in it that isn't
from the gc are these bootloader references, and they are only ever
set for one libxl__domain_build_state, the one which is
libxl__domain_create_state.build_state.
So we can clean up in the exit path from libxl__domain_create_*, which
always comes through domcreate_complete.
Remove the now-redundant unmaps in libxl__build_pv's success path.
This is XSA-160.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
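The shape of the fix (unmap on the one path every exit goes through, rather than only on the success path of the build step) can be sketched as below. This is a simplified illustration, not libxl code: struct file_ref, struct build_state, ref_map, ref_unmap, bootload_and_build and create are hypothetical names, and malloc/free stand in for the real mmap/munmap to keep the sketch portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mapped file reference (assumption). */
struct file_ref {
    void *data;   /* non-NULL while "mapped" */
    size_t size;
};

/* Hypothetical build state holding the two bootloader references. */
struct build_state {
    struct file_ref kernel;
    struct file_ref ramdisk;
};

int live_mappings;  /* outstanding maps, tracked for the demo only */

/* Stand-in for mapping a file: malloc replaces mmap in this sketch. */
int ref_map(struct file_ref *r, size_t size)
{
    r->data = malloc(size);
    if (!r->data)
        return -1;
    r->size = size;
    live_mappings++;
    return 0;
}

/* Idempotent unmap: safe on a reference that was never mapped. */
void ref_unmap(struct file_ref *r)
{
    if (!r->data)
        return;
    assert(live_mappings > 0);
    free(r->data);
    r->data = NULL;
    r->size = 0;
    live_mappings--;
}

/* Maps the references and builds; it never unmaps anything itself. */
int bootload_and_build(struct build_state *st, bool fail_midway)
{
    if (ref_map(&st->kernel, 4096))
        return -1;
    if (fail_midway)
        return -1;  /* simulated failure after the first mapping */
    if (ref_map(&st->ramdisk, 4096))
        return -1;
    return 0;
}

/* Single completion path: every exit unmaps both references. */
int create(bool fail_midway)
{
    struct build_state st = { { NULL, 0 }, { NULL, 0 } };
    int rc = bootload_and_build(&st, fail_midway);

    ref_unmap(&st.kernel);
    ref_unmap(&st.ramdisk);
    return rc;
}
```

Because ref_unmap() tolerates a reference that was never mapped, the completion path does not need to know how far the build got before it failed.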
Jan Beulich [Tue, 8 Dec 2015 13:01:43 +0000 (14:01 +0100)]
memory: fix XENMEM_exchange error handling
assign_pages() can fail due to the domain getting killed in parallel,
which should not result in a hypervisor crash.
Reported-by: Julien Grall <julien.grall@citrix.com>
Also delete a redundant put_gfn() - all relevant paths leading to the
"fail" label already do this (and there are also paths where it was
plain wrong). All of the put_gfn()-s got introduced by 51032ca058
("Modify naming of queries into the p2m"), including the otherwise
unneeded initializer for k (with even a kind of misleading comment -
the compiler warning could actually have served as a hint that the use
is wrong).
This is CVE-2015-8339 + CVE-2015-8340 / XSA-159.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>