Jan Beulich [Wed, 1 Mar 2017 16:49:57 +0000 (17:49 +0100)]
x86/mm: switch away from temporary 32-bit register names
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Tue, 28 Feb 2017 14:07:09 +0000 (14:07 +0000)]
efi/boot: Don't free ebmalloc area at all
Freeing part of the BSS back for general use proves to be problematic. It is
not accounted for in xen_in_range(), causing errors when constructing the
IOMMU tables, resulting in a failure to boot.
Other smaller issues are that tboot treats the entire BSS as hypervisor data,
creating and checking a MAC of it on S3, and that, by being 1MB in size,
freeing it is guaranteed to shatter the hypervisor superpage mappings.
This is a stopgap fix to unblock master, while alternatives are discussed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 1 Mar 2017 09:40:48 +0000 (10:40 +0100)]
x86/Viridian: switch away from temporary 32-bit register names
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Wed, 1 Mar 2017 09:40:22 +0000 (10:40 +0100)]
x86/SVM: switch away from temporary 32-bit register names
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Wed, 1 Mar 2017 09:39:44 +0000 (10:39 +0100)]
x86/HVMemul: switch away from temporary 32-bit register names
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Haozhong Zhang [Wed, 1 Mar 2017 09:29:57 +0000 (10:29 +0100)]
x86: ensure copying runstate/time to L1 rather than L2
For a HVM domain, if a vcpu is in the nested guest mode,
__raw_copy_to_guest(), __copy_to_guest() and __copy_field_to_guest()
used by update_runstate_area() and update_secondary_system_time() will
copy data to L2 guest rather than the L1 guest.
This commit temporarily clears the nested guest flag before all guest
copies in update_runstate_area() and update_secondary_system_time(),
and restores the flag after those guest copy operations.
The flag clear/restore is combined with the existing
smap_policy_change() which is renamed to update_guest_memory_policy().
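The clear/restore pattern described above can be sketched as follows. This is an illustration only, written in Python rather than Xen's C; the `Vcpu` class, the `in_nested_guest_mode` attribute and the helper names are hypothetical and do not come from the Xen source.

```python
from contextlib import contextmanager


class Vcpu:
    """Hypothetical stand-in for a vCPU; only the nested flag matters here."""

    def __init__(self, in_nested_guest_mode):
        self.in_nested_guest_mode = in_nested_guest_mode


@contextmanager
def l1_guest_memory_policy(vcpu):
    """Temporarily clear the nested-guest flag and restore it on exit."""
    saved = vcpu.in_nested_guest_mode
    vcpu.in_nested_guest_mode = False
    try:
        yield vcpu
    finally:
        vcpu.in_nested_guest_mode = saved


def copy_target(vcpu):
    # Models which guest level a guest copy would land in.
    return "L2" if vcpu.in_nested_guest_mode else "L1"


v = Vcpu(in_nested_guest_mode=True)
with l1_guest_memory_policy(v):
    inside = copy_target(v)   # copies made here go to the L1 guest
outside = copy_target(v)      # flag restored once the copies are done
```

Restoring the flag in a `finally` clause mirrors the requirement that it is put back after the guest copy operations regardless of how they end.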
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
The default dom0_mem is 128M, which is not sufficient to boot an Ubuntu
based Dom0. It is not clear what a better default value could be.
Instead, loudly warn the user when dom0_mem is unspecified and wait 3
secs. Then use 512M.
Update the docs to specify that dom0_mem is required on ARM. (The
current xen-command-line document does not actually reflect the current
behavior of dom0_mem on ARM correctly.)
Andrew Cooper [Tue, 28 Feb 2017 15:17:17 +0000 (15:17 +0000)]
x86/layout: Correct Xen's idea of its own memory layout
c/s b4cd59fe "x86: reorder .data and .init when linking" had an unintended
side effect, where xen_in_range() and the tboot S3 MAC were no longer correct.
In practice, it means that Xen's .data section is excluded from consideration,
which means:
1) Default IOMMU construction for the hardware domain could create mappings.
2) .data isn't included in the tboot MAC checked on resume from S3.
Adjust the comments and virtual address anchors used to define the regions.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:44 +0000 (07:21 +0100)]
xenstore: make memory report available via XS_CONTROL
Add a XS_CONTROL command to xenstored for doing a talloc report to a
file. Right now this is supported by specifying a command line option
when starting xenstored and sending a signal to the daemon to trigger
the report.
To dump the report to the standard log file call:
xenstore-control memreport
To dump the report to a new file call:
xenstore-control memreport <file>
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:43 +0000 (07:21 +0100)]
xenstore: add support for changing log functionality dynamically
Today Xenstore supports logging only if specified at start of the
Xenstore daemon. As it can't be disabled during runtime it is not
recommended to start xenstored with logging enabled.
Add support for switching logging on and off at runtime and to
specify a (new) logfile. This is done via the XS_CONTROL wire command
which can be sent with xenstore-control.
To switch logging on just use:
xenstore-control log on
To switch it off again:
xenstore-control log off
To specify a (new) logfile:
xenstore-control logfile <file>
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:42 +0000 (07:21 +0100)]
xenstore: enhance control command support
The Xenstore protocol supports the XS_CONTROL command for triggering
various actions in the Xenstore daemon. Enhance that support by using
a command table and adding a help function.
Support multiple control commands in the associated xenstore-control
program used to issue XS_CONTROL commands.
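A command table with a help function might look roughly like the sketch below. It is written in Python for brevity; xenstored itself is C, and the command names, usage strings and handler names here are hypothetical examples rather than the daemon's actual table.

```python
def do_control_check(args):
    # Hypothetical handler that takes no arguments.
    return "check done"


def do_control_log(args):
    # Hypothetical handler taking a single "on" or "off" argument.
    if args not in (["on"], ["off"]):
        raise ValueError("usage: log on|off")
    return "logging " + args[0]


def do_control_help(args):
    # Build the help text from the table itself, one command per line.
    lines = []
    for name, (_handler, usage) in sorted(COMMANDS.items()):
        lines.append(("%s %s" % (name, usage)).rstrip())
    return "\n".join(lines)


# The command table: name -> (handler, usage string).
COMMANDS = {
    "check": (do_control_check, ""),
    "log": (do_control_log, "on|off"),
    "help": (do_control_help, ""),
}


def do_control(command, args):
    """Dispatch a control command; unknown commands fall back to help."""
    handler, _usage = COMMANDS.get(command, (do_control_help, ""))
    return handler(args)
```

With a table like this, adding a command means adding one entry, and the help output stays in sync automatically.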
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:41 +0000 (07:21 +0100)]
xenstore: Split out XS_CONTROL action to dedicated source file
Move the XS_CONTROL handling of xenstored to a new source file
xenstored_control.c.
In order to avoid making get_string() in xenstored_core.c globally
visible, use strlen() instead, which is safe in this context because
xs_count_strings() has already returned a value > 1.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:40 +0000 (07:21 +0100)]
xenstore: rename XS_DEBUG wire command
In preparation for supporting more than pure debug functionality via the
Xenstore XS_DEBUG wire command, rename it to XS_CONTROL and make
XS_DEBUG an alias of it.
Add an alias xs_control_command for the associated xs_debug_command,
too.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 27 Feb 2017 11:47:10 +0000 (11:47 +0000)]
x86/shadow: Fix build with CONFIG_SHADOW_PAGING=n following c/s 45ac805
c/s 45ac805 "x86/paging: Package up the log dirty function pointers" neglected
the case when CONFIG_SHADOW_PAGING is disabled. Make a similar adjustment to
the none stubs.
Spotted by a Travis RANDCONFIG run.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
- When capturing branches, LBR H/W will always clear bits 61:62
regardless of the sign extension
- For WRMSR, bits 61:62 are considered the part of the sign extension
This bug affects only certain pCPUs (e.g. Haswell) with vCPUs that
use LBR. Fix it by sign-extending TSX bits in all LBR entries during
VM entry in affected cases.
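As a rough illustration of the fix-up, the sketch below sign-extends into bits 61:62 of a single LBR entry. It is Python rather than Xen's C, and it ASSUMES that "sign-extending" means copying bits 59:60 into bits 61:62; that choice and the names used are assumptions of this sketch, not taken from the text above.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# ASSUMPTION: bits 59:60 carry the sign extension that bits 61:62 should match.
SOURCE_BITS = (1 << 59) | (1 << 60)


def sign_extend_tsx_bits(lbr_entry):
    """Copy bits 59:60 into bits 61:62, leaving all other bits untouched."""
    return (lbr_entry | ((lbr_entry & SOURCE_BITS) << 2)) & MASK64


# A captured entry for a high (kernel-half) address with bits 61:62 cleared.
captured = 0x9FFF800000001000
fixed = sign_extend_tsx_bits(captured)

# A low (user-half) address has no sign bits set, so nothing changes.
unchanged = sign_extend_tsx_bits(0x00007F0000001000)
```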
LBR MSRs are currently not Live Migrated. In order to implement such
functionality, the MSR levelling work has to be done first because
hosts can have different LBR formats.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Sergey Dyasli [Thu, 23 Feb 2017 09:33:25 +0000 (09:33 +0000)]
x86/vmx: introduce vmx_find_msr()
Modify vmx_add_msr() to use a variation of insertion sort algorithm:
find a place for the new entry and shift all subsequent elements before
insertion.
The new vmx_find_msr() exploits the fact that MSR list is now sorted
and reuses the existing code for binary search.
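The two operations can be sketched as below: an insertion that shifts later elements to keep the list sorted, and a binary search that relies on that ordering. This is a Python illustration with hypothetical names (`add_msr`, `find_msr`, a list of `[index, value]` pairs), not the Xen implementation, and the MSR indices used are arbitrary example values.

```python
def add_msr(msr_list, index, value=0):
    """Insert [index, value] so that msr_list stays sorted by index."""
    # Find the first position whose index is not smaller than the new one.
    pos = 0
    while pos < len(msr_list) and msr_list[pos][0] < index:
        pos += 1
    if pos < len(msr_list) and msr_list[pos][0] == index:
        return  # already present, nothing to do
    # Grow by one slot, then shift all subsequent elements up by one.
    msr_list.append(None)
    for i in range(len(msr_list) - 1, pos, -1):
        msr_list[i] = msr_list[i - 1]
    msr_list[pos] = [index, value]


def find_msr(msr_list, index):
    """Binary search over the sorted list; returns the entry or None."""
    lo, hi = 0, len(msr_list)
    while lo < hi:
        mid = (lo + hi) // 2
        key = msr_list[mid][0]
        if key == index:
            return msr_list[mid]
        if key < index:
            lo = mid + 1
        else:
            hi = mid
    return None


msrs = []
for idx in (0xC0000080, 0x1D9, 0x277):
    add_msr(msrs, idx)
```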
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Fri, 24 Feb 2017 18:12:19 +0000 (18:12 +0000)]
x86/emul: Fix sarx emulation test
The emulation tests run `sarx %edx,(%ecx),%ebx` with 0xfedcba98 pointed at by
%ecx, and 0xff13 in %rdx.
As the instruction uses a 32bit operand size, the expected result is
0x00000000ffffffdb in %rbx (rather than 0xffffffffffffffdb), due to usual
behaviour of 32bit operations on 64bit registers.
The test harness was incorrectly sign extending from 32 bits to 64 bits rather
than zero extending when checking the result of emulation, causing a false
negative failure.
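The arithmetic can be checked with a short Python sketch. The helper names are made up for this illustration; the operand values are the ones quoted above.

```python
MASK32 = 0xFFFFFFFF


def sarx32(value, count):
    """32-bit arithmetic right shift; the count is masked to 5 bits."""
    count &= 31
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32


def zero_extend_32_to_64(x):
    # What a 32-bit operation leaves in a 64-bit register.
    return x & MASK32


def sign_extend_32_to_64(x):
    # What the test harness wrongly expected.
    x &= MASK32
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x


result = sarx32(0xFEDCBA98, 0xFF13)      # shift count 0xff13 & 31 == 19
expected = zero_extend_32_to_64(result)  # 0x00000000ffffffdb
mistaken = sign_extend_32_to_64(result)  # 0xffffffffffffffdb
```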
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 16 Feb 2017 16:42:16 +0000 (16:42 +0000)]
x86/paging: Package up the log dirty function pointers
They depend solely on paging mode, so don't need to be repeated per domain, and
can live in .rodata. While making this change, drop the redundant log_dirty
from the function pointer names.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Fri, 17 Feb 2017 17:10:50 +0000 (17:10 +0000)]
x86/cpuid: Handle leaf 0x1 in guest_cpuid()
The feature words, ecx and edx, are already audited as part of the featureset
logic. The existing leaf 0x80000001 dynamic logic has its SYSCALL adjustment
split out, as the rest of the adjustments are common with leaf 0x1. The
existing leaf 0x1 feature adjustments from {pv,hvm}_cpuid() are moved
wholesale into guest_cpuid(), although deduped against the common adjustments.
The eax word is family/model/stepping information, and is fine to use as
provided by the toolstack, although with reserved bits cleared.
The ebx word is more problematic. The low 8 bits are the brand ID and safe to
pass straight through. The next 8 bits are the CLFLUSH line size. This value
is forwarded straight from hardware, as nothing good can possibly come of
providing an alternative value to the guest.
The next 8 bits are slightly different between Intel and AMD, but are both
some property of the number of logical cores in the current physical package.
For now, the toolstack value is used unchanged until better topology support
is available.
The final 8 bits are the initial legacy APIC ID. For HVM guests, this was
overridden to vcpu_id * 2. The same logic is now applied to PV guests, so
guests don't observe a constant number on all vcpus via their emulated or
faulted view.
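The ebx layout described above can be sketched as a simple bit-packing helper. This is a Python illustration; the function and parameter names are hypothetical, and where each input comes from (toolstack or hardware) is only noted in comments.

```python
def leaf1_ebx(brand_id, clflush_line_size, logical_count, vcpu_id):
    """Assemble the leaf 0x1 ebx word from its four 8-bit fields."""
    apic_id = (vcpu_id * 2) & 0xFF            # initial legacy APIC ID
    return ((brand_id & 0xFF)                 # bits 7:0, from the toolstack
            | ((clflush_line_size & 0xFF) << 8)   # bits 15:8, from hardware
            | ((logical_count & 0xFF) << 16)      # bits 23:16, from the toolstack
            | (apic_id << 24))                # bits 31:24, derived from vcpu_id


# Example: CLFLUSH field 8 (64-byte lines, in units of 8 bytes), vcpu 3.
ebx = leaf1_ebx(brand_id=0, clflush_line_size=8, logical_count=4, vcpu_id=3)
```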
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 24 Feb 2017 16:22:13 +0000 (17:22 +0100)]
x86emul/test: split generic and testcase specific parts
Both the build logic and the invocation have their blowfish specific
aspects abstracted out here. Additionally
- run native execution (if suitable) first (as that one failing
suggests a problem with the to be tested code itself, in which case
having the emulator have a go over it is kind of pointless)
- move the 64-bit tests up in blobs[] so 64-bit native execution will
also precede 32-bit emulation (on 64-bit systems only of course)
- instead of -msoft-float (we'd rather not have the compiler generate
such code), pass -fno-asynchronous-unwind-tables and -g0 (reducing
binary size of the helper images as well as [slightly] compilation
time)
- skip tests with zero length blobs (these can result from failed
compilation, but not failing the build in this case seems desirable:
it may allow partial testing - e.g. with older compilers - and
permits manually removing certain tests from the generated headers
without having to touch actual source code)
- constrain rIP to the actual blob range rather than looking for the
specific (fake) return address put on the stack
- also print the opcode when x86_emulate() fails
- print at least three progress dots (for relatively short tests)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:49:19 +0000 (15:49 +0100)]
x86: setup PVHv2 Dom0 ACPI tables
Create a new MADT table that contains the topology exposed to the guest. A
new XSDT table is also created, in order to filter the tables that we want
to expose to the guest, plus the Xen crafted MADT. This in turn requires Xen
to also create a new RSDP in order to make it point to the custom XSDT.
Also, regions marked as E820_ACPI or E820_NVS are identity mapped into Dom0
p2m, plus any top-level ACPI tables that should be accessible to Dom0 and
reside in reserved regions. This is needed because some memory maps don't
properly account for all the memory used by ACPI, so it's common to find ACPI
tables in reserved regions.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:48:59 +0000 (15:48 +0100)]
x86: setup PVHv2 Dom0 CPUs
Initialize Dom0 BSP/APs and setup the memory and IO permissions. This also sets
the initial BSP state in order to match the protocol specified in
docs/misc/hvmlite.markdown.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:48:43 +0000 (15:48 +0100)]
x86: parse Dom0 kernel for PVHv2
Introduce a helper to parse the Dom0 kernel.
A new helper is also introduced to libelf, that's used to store the destination
vcpu of the domain. This parameter is needed when loading the kernel on a HVM
domain (PVHv2), since hvm_copy_to_guest_phys requires passing the destination
vcpu.
While there also fix image_base and image_start to be of type "void *", and do
the necessary fixup of related functions.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:47:55 +0000 (15:47 +0100)]
x86/libelf: pass the destination vCPU to libelf for Dom0 build
Allow setting the destination vCPU for libelf, so that elf_load_image can take
it into account when loading the kernel for Dom0. This is needed for PVHv2 Dom0
build, so that hvm_copy_to_guest_phys can be called with a Dom0 vCPU instead of
current (that contains the idle vCPU at this point).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:47:03 +0000 (15:47 +0100)]
x86: populate PVHv2 Dom0 physical memory map
Craft the Dom0 e820 memory map and populate it. Introduce a helper to remove
memory pages that are shared between Xen and a domain, and use it in order to
remove low 1MB RAM regions from dom_io in order to assign them to a PVHv2 Dom0.
On hardware lacking support for unrestricted mode also craft the identity page
tables and the TSS used for virtual 8086 mode.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:46:10 +0000 (15:46 +0100)]
x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
PVHv2 guests, unlike HVM guests, won't have the option to route interrupts
from physical or emulated devices over event channels using PIRQs. This
applies to both DomU and Dom0 PVHv2 guests.
Introduce a new XEN_X86_EMU_USE_PIRQ to notify Xen whether a HVM guest can
route physical interrupts (even from emulated devices) over event channels,
and is thus allowed to use some of the PHYSDEV ops.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 24 Feb 2017 09:22:09 +0000 (09:22 +0000)]
x86/hvm: Don't let hvm_set_efer() raise #GP itself
c/s 49de10f3c "x86/hvm: Don't raise #GP behind the emulators back for MSR
accesses" missed an edge case.
hvm_set_efer() raises #GP itself, so deliberately avoided the goto gp_fault
path in hvm_msr_write_intercept().
With the above change, a guest update to MSR_EFER which ends up faulting raises
#GP in hvm_set_efer(), and then again as a result of
hvm_msr_write_intercept() returning X86EMUL_EXCEPTION. The second #GP gets
combined to #DF and handed back to the guest.
Update hvm_set_efer() to avoid raising #GP, requiring its callers to do so.
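The resulting division of labour, where the callee only reports failure and the caller raises the fault exactly once, can be modelled as below. This is a simplified Python sketch: the `Guest` class, the `EFER_DISALLOWED` mask, the function names and the numeric return codes are all hypothetical placeholders, not Xen's definitions.

```python
# Placeholder return codes; only their distinctness matters in this sketch.
X86EMUL_OKAY = 0
X86EMUL_EXCEPTION = 2

# Hypothetical mask of EFER bits the guest may not set.
EFER_DISALLOWED = 1 << 63


class Guest:
    """Hypothetical guest state: EFER value plus pending exceptions."""

    def __init__(self):
        self.efer = 0
        self.pending = []

    def inject(self, exception):
        # Simplified model: a #GP raised while a #GP is pending becomes #DF.
        if exception == "#GP" and self.pending == ["#GP"]:
            self.pending = ["#DF"]
        else:
            self.pending.append(exception)


def set_efer(guest, value):
    """Validate and apply the write; report failure, never inject."""
    if value & EFER_DISALLOWED:
        return X86EMUL_EXCEPTION
    guest.efer = value
    return X86EMUL_OKAY


def msr_write(guest, value):
    """Caller: the single place where #GP is raised for a bad write."""
    rc = set_efer(guest, value)
    if rc == X86EMUL_EXCEPTION:
        guest.inject("#GP")
    return rc


guest = Guest()
rc = msr_write(guest, EFER_DISALLOWED)
```

Had `set_efer()` also injected #GP, the model's `inject()` would have turned the pair into #DF, which is the failure mode described above.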
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Fri, 24 Feb 2017 08:58:50 +0000 (09:58 +0100)]
arm/p2m: remove the page from p2m->pages list before freeing it
The p2m code is using the page list field to link all the pages used
for the stage-2 page tables. The page is added into the p2m->pages
list just after the allocation but never removed from the list.
The page list field is also used by the allocator, so not removing the page
may result in a later Xen crash due to inconsistency (see [1]).
This bug was introduced by the reworking of p2m code in commit 2ef3e36ec7
"xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry".
Wei Liu [Tue, 21 Feb 2017 14:52:46 +0000 (14:52 +0000)]
tools: move xl to a dedicated directory
This makes a clear distinction between the client (xl) and library (libxl),
which should help design better APIs. This will also help reduce the
code size in libxl directory.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Tue, 21 Feb 2017 14:40:48 +0000 (14:40 +0000)]
tools: provide libxlutil compiling and linking options
We are about to split out xl (which depends on libxlutil) to a different
directory. Provide the proper options for compiling and linking in
Rules.mk, and replace the hardcoded string in libxl/Makefile.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 23 Feb 2017 16:46:45 +0000 (16:46 +0000)]
xen-access: request compat devicemodel API
xc_hvm_inject_trap is moved to the new libdevicemodel. Request the
compat layer from libxenctrl for now to make xen-access build again.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
In Python3 the 'long' type has been merged into 'int', and the '1L' syntax is
no longer valid. Assign the 'int' type to a 'long' variable in Python3, so that
'long(1)' gives the correct result in both Python2 and Python3.
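A minimal sketch of the aliasing approach described above, assuming it is done once at module level (where exactly the patch places it is not stated here):

```python
import sys

# On Python 3 there is no separate 'long' type, so alias it to 'int'.
# On Python 2 the built-in 'long' is left untouched.
if sys.version_info[0] >= 3:
    long = int

one = long(1)  # replaces the Python 2 only literal 1L
```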
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
In Python3, PyTypeObject looks slightly different, and also module
initialization is different. Instead of Py_InitModule, PyModule_Create
should be called on already defined PyModuleDef structure. And then
initialization function should return that module.
Additionally initialization function should be named PyInit_<name>,
instead of init<name>.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
In Python2 PyBytes is the same as PyString, but in Python3 PyString is
gone and 'str' is really PyUnicode in C-API.
When handling arbitrary data, use PyBytes - which is the right thing to
do in Python3, and poses no API change in Python2. When handling
xenstore paths and transaction ids, which have well defined format, use
PyUnicode - to ease API usage - no need to prefix all xenstore paths
with 'b' when migrating scripts to Python3.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Fri, 10 Feb 2017 14:34:15 +0000 (14:34 +0000)]
tools/libxendevicemodel: add a call to restrict the handle
My recent patch [1] to the Linux privcmd module introduced a mechanism
to restrict an open file handle to subsequently only accept operations for
a specified domain.
This patch extends the libxendevicemodel API and makes use of the
mechanism in the Linux-specific code to restrict operations on the
interface handle.
Paul Durrant [Wed, 15 Feb 2017 14:44:16 +0000 (14:44 +0000)]
tools/libxendevicemodel: extract functions and add a compat layer
This patch extracts all functions resulting in a dm_op hypercall from
libxenctrl and moves them into libxendevicemodel. It also adds a compat
layer into libxenctrl, which can be selected by defining
XC_WANT_COMPAT_DEVICEMODEL_API to 1 before including xenctrl.h.
With this patch the core of libxendevicemodel still uses libxencall to
issue the dm_op hypercalls, but this is done by calling through code that
can be modified on a per-OS basis. A subsequent patch will add a Linux-
specific variant.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Wed, 15 Feb 2017 13:54:25 +0000 (13:54 +0000)]
tools/libxendevicemodel: introduce the new library
The new xendevicemodel library is intended to be used by all Xen device
models such that the only hypercall they use will be the dm_op hypercall
added by commit 524a98c2.
This patch adds the boilerplate for the new library, with only open() and
close() entry points, and calls to those from libxenctrl in preparation
for the compat layer added by a subsequent patch.
[ Also: update MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION
to the commits with the corresponding changes to those other trees
- Ian Jackson ]
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>