Andrew Cooper [Tue, 28 Feb 2017 15:17:17 +0000 (15:17 +0000)]
x86/layout: Correct Xen's idea of its own memory layout
c/s b4cd59fe "x86: reorder .data and .init when linking" had an unintended
side effect, where xen_in_range() and the tboot S3 MAC were no longer correct.
In practice, it means that Xen's .data section is excluded from consideration,
which means:
1) Default IOMMU construction for the hardware domain could create mappings of Xen's .data section.
2) .data isn't included in the tboot MAC checked on resume from S3.
Adjust the comments and virtual address anchors used to define the regions.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:44 +0000 (07:21 +0100)]
xenstore: make memory report available via XS_CONTROL
Add a XS_CONTROL command to xenstored for doing a talloc report to a
file. Right now this is supported by specifying a command line option
when starting xenstored and sending a signal to the daemon to trigger
the report.
To dump the report to the standard log file call:
xenstore-control memreport
To dump the report to a new file call:
xenstore-control memreport <file>
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:43 +0000 (07:21 +0100)]
xenstore: add support for changing log functionality dynamically
Today Xenstore supports logging only if specified at start of the
Xenstore daemon. As it can't be disabled during runtime it is not
recommended to start xenstored with logging enabled.
Add support for switching logging on and off at runtime and for
specifying a (new) logfile. This is done via the XS_CONTROL wire command,
which can be sent with xenstore-control.
To switch logging on just use:
xenstore-control log on
To switch it off again:
xenstore-control log off
To specify a (new) logfile:
xenstore-control logfile <file>
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:42 +0000 (07:21 +0100)]
xenstore: enhance control command support
The Xenstore protocol supports the XS_CONTROL command for triggering
various actions in the Xenstore daemon. Enhance that support by using
a command table and adding a help function.
Support multiple control commands in the associated xenstore-control
program used to issue XS_CONTROL commands.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
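The command table and help function mentioned in the entry above could look roughly like the following. This is a minimal sketch, not the code from xenstored_control.c: the names ctl_cmd, ctl_dispatch, do_log and do_help, and the handler signature, are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature: returns 0 on success, -1 on error. */
typedef int (*ctl_fn)(int argc, const char **argv);

struct ctl_cmd {
    const char *name;   /* command word, e.g. "log" */
    ctl_fn      fn;     /* handler invoked for this command */
    const char *usage;  /* argument summary a help reply would show */
};

/* Accept exactly one argument, which must be "on" or "off". */
static int do_log(int argc, const char **argv)
{
    if (argc != 1)
        return -1;
    return (!strcmp(argv[0], "on") || !strcmp(argv[0], "off")) ? 0 : -1;
}

static int do_help(int argc, const char **argv);

/* Adding a command means adding one row here; dispatch code is untouched. */
static const struct ctl_cmd cmds[] = {
    { "log",  do_log,  "on|off" },
    { "help", do_help, "" },
};

#define NR_CMDS (sizeof(cmds) / sizeof(cmds[0]))

/* A real daemon would format every name/usage pair into the reply here. */
static int do_help(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    return 0;
}

/* Look the command up in the table and call its handler; -1 if unknown. */
static int ctl_dispatch(const char *name, int argc, const char **argv)
{
    for (size_t i = 0; i < NR_CMDS; i++)
        if (!strcmp(cmds[i].name, name))
            return cmds[i].fn(argc, argv);
    return -1;
}
```

The point of the table is that the help text and the dispatcher are both driven by the same array, so they cannot drift apart.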
Juergen Gross [Fri, 24 Feb 2017 06:21:41 +0000 (07:21 +0100)]
xenstore: Split out XS_CONTROL action to dedicated source file
Move the XS_CONTROL handling of xenstored to a new source file
xenstored_control.c.
In order to avoid making get_string() in xenstored_core.c globally
visible, use strlen() instead, which is safe in this context because
xs_count_strings() has already returned a value > 1.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 24 Feb 2017 06:21:40 +0000 (07:21 +0100)]
xenstore: rename XS_DEBUG wire command
In preparation for supporting more than pure debug functionality via the
Xenstore XS_DEBUG wire command, rename it to XS_CONTROL and make
XS_DEBUG an alias of it.
Add an alias xs_control_command for the associated xs_debug_command,
too.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 27 Feb 2017 11:47:10 +0000 (11:47 +0000)]
x86/shadow: Fix build with CONFIG_SHADOW_PAGING=n following c/s 45ac805
c/s 45ac805 "x86/paging: Package up the log dirty function pointers" neglected
the case when CONFIG_SHADOW_PAGING is disabled. Make a similar adjustment to
the none stubs.
Spotted by a Travis RANDCONFIG run.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
- When capturing branches, LBR H/W will always clear bits 61:62
regardless of the sign extension
- For WRMSR, bits 61:62 are considered part of the sign extension
This bug affects only certain pCPUs (e.g. Haswell) with vCPUs that
use LBR. Fix it by sign-extending TSX bits in all LBR entries during
VM entry in affected cases.
LBR MSRs are currently not Live Migrated. In order to implement such
functionality, the MSR levelling work has to be done first because
hosts can have different LBR formats.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
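The fixup described in the entry above can be sketched as a single bit operation. This is only an illustration under a stated assumption: that bits 59:60 of a captured LBR entry still hold valid copies of the sign bit, so they can be replicated into the cleared bits 61:62. The names LBR_SIGNEXT_2MSB and lbr_fixup_entry are hypothetical.

```c
#include <stdint.h>

/* Assumption: bits 59 and 60 still carry valid sign-extension copies. */
#define LBR_SIGNEXT_2MSB ((UINT64_C(1) << 59) | (UINT64_C(1) << 60))

/*
 * Copy bits 59:60 into bits 61:62. Bit 63 and all lower bits are left
 * untouched, so an entry that was already consistent is not changed.
 */
static uint64_t lbr_fixup_entry(uint64_t data)
{
    return data | ((data & LBR_SIGNEXT_2MSB) << 2);
}
```

A user-space style address (upper bits all zero) passes through unchanged, while a kernel-style address whose bits 61:62 were cleared by the hardware regains them.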
Sergey Dyasli [Thu, 23 Feb 2017 09:33:25 +0000 (09:33 +0000)]
x86/vmx: introduce vmx_find_msr()
Modify vmx_add_msr() to use a variation of the insertion sort algorithm:
find a place for the new entry and shift all subsequent elements before
insertion.
The new vmx_find_msr() exploits the fact that the MSR list is now sorted
and reuses the existing code for binary search.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
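The two ideas in the entry above (insert in sorted position, then look up by binary search) can be sketched with the standard library alone. This is not the Xen implementation: struct msr_entry, msr_add and msr_find are hypothetical stand-ins, and the standard bsearch() is used in place of whatever search helper the hypervisor has.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct msr_entry {
    uint32_t index;  /* MSR number, the sort key */
    uint64_t data;   /* MSR value */
};

/* bsearch() comparator: first argument is the key, second an array element. */
static int msr_cmp(const void *key, const void *elt)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t e = ((const struct msr_entry *)elt)->index;

    return (k > e) - (k < e);
}

/* Binary search; only valid because msr_add() keeps the array sorted. */
static struct msr_entry *msr_find(struct msr_entry *arr, size_t n,
                                  uint32_t index)
{
    return bsearch(&index, arr, n, sizeof(*arr), msr_cmp);
}

/* Returns 0 on success (or if already present), -1 if the array is full. */
static int msr_add(struct msr_entry *arr, size_t *n, size_t cap,
                   uint32_t index)
{
    size_t pos = 0;

    /* Find the first slot whose key is not smaller than the new one. */
    while (pos < *n && arr[pos].index < index)
        pos++;
    if (pos < *n && arr[pos].index == index)
        return 0;
    if (*n == cap)
        return -1;

    /* Shift the tail up by one element, then fill the gap. */
    memmove(&arr[pos + 1], &arr[pos], (*n - pos) * sizeof(*arr));
    arr[pos].index = index;
    arr[pos].data = 0;
    (*n)++;
    return 0;
}
```

Insertion is linear in the list length, which is acceptable when entries are added rarely and looked up often.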
Andrew Cooper [Fri, 24 Feb 2017 18:12:19 +0000 (18:12 +0000)]
x86/emul: Fix sarx emulation test
The emulation tests run `sarx %edx,(%ecx),%ebx` with 0xfedcba98 pointed at by
%ecx, and 0xff13 in %rdx.
As the instruction uses a 32bit operand size, the expected result is
0x00000000ffffffdb in %rbx (rather than 0xffffffffffffffdb), due to the usual
behaviour of 32bit operations on 64bit registers.
The test harness was incorrectly sign extending from 32 bits to 64 bits rather
than zero extending when checking the result of emulation, causing a false
negative failure.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
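The arithmetic in the entry above can be checked with a few lines of C. This sketch models only the value computation, not the emulator or the test harness; sar32, zero_extend32 and sign_extend32 are hypothetical helper names, and the shift is written without relying on the implementation-defined behaviour of >> on negative signed values.

```c
#include <stdint.h>

/* 32-bit arithmetic shift right; the count is masked to 5 bits as on x86. */
static uint32_t sar32(uint32_t val, unsigned int count)
{
    uint32_t res;

    count &= 31;
    res = val >> count;
    /* Fill the vacated upper bits with copies of the original sign bit. */
    if ((val & UINT32_C(0x80000000)) && count)
        res |= ~(UINT32_C(0xffffffff) >> count);
    return res;
}

/* What a 64-bit register holds after a 32-bit write: upper half zeroed. */
static uint64_t zero_extend32(uint32_t v)
{
    return (uint64_t)v;
}

/* What the faulty check compared against: upper half copies of bit 31. */
static uint64_t sign_extend32(uint32_t v)
{
    uint64_t r = v;

    if (v & UINT32_C(0x80000000))
        r |= UINT64_C(0xffffffff00000000);
    return r;
}
```

With the operands from the entry, 0xff13 masks down to a shift count of 19, and the 32-bit result is 0xffffffdb; the two extensions then differ only in the upper 32 bits.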
Andrew Cooper [Thu, 16 Feb 2017 16:42:16 +0000 (16:42 +0000)]
x86/paging: Package up the log dirty function pointers
They depend solely on paging mode, so don't need to be repeated per domain, and
can live in .rodata. While making this change, drop the redundant log_dirty
from the function pointer names.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@citrix.com>
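The packaging described in the entry above follows a common C pattern: one const table of function pointers per mode, shared by pointer from every domain. The sketch below is illustrative only; the struct layout, the member names and the hap_* stubs are assumptions, not the actual Xen definitions.

```c
#include <stdbool.h>

struct domain;

/* One table per paging mode; const, so the linker can place it in .rodata. */
struct log_dirty_ops {
    int  (*enable)(struct domain *d, bool log_global);
    int  (*disable)(struct domain *d);
    void (*clean)(struct domain *d);
};

/* Each domain stores a single pointer instead of three function pointers. */
struct domain {
    const struct log_dirty_ops *ops;
};

static int enable_calls;  /* call counter, for demonstration only */

static int hap_enable(struct domain *d, bool log_global)
{
    (void)d;
    (void)log_global;
    enable_calls++;
    return 0;
}

static int hap_disable(struct domain *d)
{
    (void)d;
    return 0;
}

static void hap_clean(struct domain *d)
{
    (void)d;
}

static const struct log_dirty_ops hap_ops = {
    .enable  = hap_enable,
    .disable = hap_disable,
    .clean   = hap_clean,
};

/* Selecting the paging mode amounts to pointing at the right table. */
static void paging_init(struct domain *d, const struct log_dirty_ops *ops)
{
    d->ops = ops;
}
```

Besides saving space per domain, keeping the pointers in read-only data means they cannot be overwritten at runtime.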
Andrew Cooper [Fri, 17 Feb 2017 17:10:50 +0000 (17:10 +0000)]
x86/cpuid: Handle leaf 0x1 in guest_cpuid()
The feature words, ecx and edx, are already audited as part of the featureset
logic. The existing leaf 0x80000001 dynamic logic has its SYSCALL adjustment
split out, as the rest of the adjustments are common with leaf 0x1. The
existing leaf 0x1 feature adjustments from {pv,hvm}_cpuid() are moved
wholesale into guest_cpuid(), although deduped against the common adjustments.
The eax word is family/model/stepping information, and is fine to use as
provided by the toolstack, although with reserved bits cleared.
The ebx word is more problematic. The low 8 bits are the brand ID and safe to
pass straight through. The next 8 bits are the CLFLUSH line size. This value
is forwarded straight from hardware, as nothing good can possibly come of
providing an alternative value to the guest.
The next 8 bits are slightly different between Intel and AMD, but are both
some property of the number of logical cores in the current physical package.
For now, the toolstack value is used unchanged until better topology support
is available.
The final 8 bits are the initial legacy APIC ID. For HVM guests, this was
overridden to vcpu_id * 2. The same logic is now applied to PV guests, so
guests don't observe a constant number on all vcpus via their emulated or
faulted view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
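The four byte-wide fields of the leaf 0x1 ebx word, and the source chosen for each in the entry above, can be put together as follows. This is a sketch of the field arithmetic only; leaf1_ebx and its parameters are hypothetical and do not correspond to the guest_cpuid() code.

```c
#include <stdint.h>

/*
 * Compose the leaf 0x1 ebx value:
 *   bits  7:0   brand ID             -> toolstack value, passed through
 *   bits 15:8   CLFLUSH line size    -> always the hardware value
 *   bits 23:16  logical CPU count    -> toolstack value, unchanged for now
 *   bits 31:24  initial APIC ID      -> vcpu_id * 2
 */
static uint32_t leaf1_ebx(uint32_t toolstack_ebx, uint32_t hw_ebx,
                          unsigned int vcpu_id)
{
    uint32_t ebx = 0;

    ebx |= toolstack_ebx & UINT32_C(0x000000ff);
    ebx |= hw_ebx        & UINT32_C(0x0000ff00);
    ebx |= toolstack_ebx & UINT32_C(0x00ff0000);
    ebx |= (uint32_t)((vcpu_id * 2u) & 0xffu) << 24;
    return ebx;
}
```

For example, vcpu 3 sees an initial APIC ID of 6 in the top byte, whatever the toolstack supplied there.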
Jan Beulich [Fri, 24 Feb 2017 16:22:13 +0000 (17:22 +0100)]
x86emul/test: split generic and testcase specific parts
Both the build logic and the invocation have their blowfish specific
aspects abstracted out here. Additionally
- run native execution (if suitable) first (as that one failing
suggests a problem with the to be tested code itself, in which case
having the emulator have a go over it is kind of pointless)
- move the 64-bit tests up in blobs[] so 64-bit native execution will
also precede 32-bit emulation (on 64-bit systems only of course)
- instead of -msoft-float (we'd rather not have the compiler generate
such code), pass -fno-asynchronous-unwind-tables and -g0 (reducing
binary size of the helper images as well as [slightly] compilation
time)
- skip tests with zero length blobs (these can result from failed
compilation, but not failing the build in this case seems desirable:
it may allow partial testing - e.g. with older compilers - and
permits manually removing certain tests from the generated headers
without having to touch actual source code)
- constrain rIP to the actual blob range rather than looking for the
specific (fake) return address put on the stack
- also print the opcode when x86_emulate() fails
- print at least three progress dots (for relatively short tests)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:49:19 +0000 (15:49 +0100)]
x86: setup PVHv2 Dom0 ACPI tables
Create a new MADT table that contains the topology exposed to the guest. A
new XSDT table is also created, in order to filter the tables that we want
to expose to the guest, plus the Xen crafted MADT. This in turn requires Xen
to also create a new RSDP in order to make it point to the custom XSDT.
Also, regions marked as E820_ACPI or E820_NVS are identity mapped into Dom0
p2m, plus any top-level ACPI tables that should be accessible to Dom0 and
reside in reserved regions. This is needed because some memory maps don't
properly account for all the memory used by ACPI, so it's common to find ACPI
tables in reserved regions.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:48:59 +0000 (15:48 +0100)]
x86: setup PVHv2 Dom0 CPUs
Initialize Dom0 BSP/APs and set up the memory and IO permissions. This also sets
the initial BSP state in order to match the protocol specified in
docs/misc/hvmlite.markdown.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:48:43 +0000 (15:48 +0100)]
x86: parse Dom0 kernel for PVHv2
Introduce a helper to parse the Dom0 kernel.
A new helper is also introduced to libelf, that's used to store the destination
vcpu of the domain. This parameter is needed when loading the kernel on a HVM
domain (PVHv2), since hvm_copy_to_guest_phys requires passing the destination
vcpu.
While there also fix image_base and image_start to be of type "void *", and do
the necessary fixup of related functions.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:47:55 +0000 (15:47 +0100)]
x86/libelf: pass the destination vCPU to libelf for Dom0 build
Allow setting the destination vCPU for libelf, so that elf_load_image can take
it into account when loading the kernel for Dom0. This is needed for PVHv2 Dom0
build, so that hvm_copy_to_guest_phys can be called with a Dom0 vCPU instead of
current (that contains the idle vCPU at this point).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:47:03 +0000 (15:47 +0100)]
x86: populate PVHv2 Dom0 physical memory map
Craft the Dom0 e820 memory map and populate it. Introduce a helper to remove
memory pages that are shared between Xen and a domain, and use it in order to
remove low 1MB RAM regions from dom_io in order to assign them to a PVHv2 Dom0.
On hardware lacking support for unrestricted mode also craft the identity page
tables and the TSS used for virtual 8086 mode.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 24 Feb 2017 14:46:10 +0000 (15:46 +0100)]
x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
PVHv2 guests, unlike HVM guests, won't have the option to route interrupts
from physical or emulated devices over event channels using PIRQs. This
applies to both DomU and Dom0 PVHv2 guests.
Introduce a new XEN_X86_EMU_USE_PIRQ to notify Xen whether a HVM guest can
route physical interrupts (even from emulated devices) over event channels,
and is thus allowed to use some of the PHYSDEV ops.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 24 Feb 2017 09:22:09 +0000 (09:22 +0000)]
x86/hvm: Don't let hvm_set_efer() raise #GP itself
c/s 49de10f3c "x86/hvm: Don't raise #GP behind the emulators back for MSR
accesses" missed an edge case.
hvm_set_efer() raises #GP itself, so deliberately avoided the goto gp_fault
path in hvm_msr_write_intercept().
With the above change, guest updates to MSR_EFER which end up faulting raise
#GP twice: once from hvm_set_efer() itself, and once as a result of
hvm_msr_write_intercept() returning X86EMUL_EXCEPTION. The second #GP gets
combined to #DF and handed back to the guest.
Update hvm_set_efer() to avoid raising #GP, requiring its callers to do so.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
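The division of responsibility described in the entry above (helpers only report the failure, the outermost caller injects the fault exactly once) can be modelled in a few lines. This is a toy model, not the hypervisor code: the X86EMUL_* values are local stand-ins, EFER_KNOWN_BITS is an illustrative mask, and set_efer, msr_write_intercept and guest_wrmsr_efer are hypothetical names.

```c
#include <stdint.h>

/* Local stand-ins for the emulator return codes named in the text. */
enum { X86EMUL_OKAY, X86EMUL_EXCEPTION };

/* Illustrative mask of accepted EFER bits: SCE, LME, LMA, NX. */
#define EFER_KNOWN_BITS UINT64_C(0xd01)

static unsigned int gp_injected;  /* counts #GP injections in this model */

static void inject_gp(void)
{
    gp_injected++;
}

/* Reports failure through its return value; never injects a fault itself. */
static int set_efer(uint64_t *efer, uint64_t value)
{
    if (value & ~EFER_KNOWN_BITS)
        return X86EMUL_EXCEPTION;
    *efer = value;
    return X86EMUL_OKAY;
}

/* Intermediate layer: just propagates the status to its caller. */
static int msr_write_intercept(uint64_t *efer, uint64_t value)
{
    return set_efer(efer, value);
}

/* Outermost caller: the only place a #GP is raised, so it is raised once. */
static void guest_wrmsr_efer(uint64_t *efer, uint64_t value)
{
    if (msr_write_intercept(efer, value) == X86EMUL_EXCEPTION)
        inject_gp();
}
```

If set_efer() also called inject_gp(), a single bad write would count two faults, which is the double injection the entry describes.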
Julien Grall [Fri, 24 Feb 2017 08:58:50 +0000 (09:58 +0100)]
arm/p2m: remove the page from p2m->pages list before freeing it
The p2m code is using the page list field to link all the pages used
for the stage-2 page tables. The page is added into the p2m->pages
list just after the allocation but never removed from the list.
The page list field is also used by the allocator, so not removing the page
may result in a later Xen crash due to inconsistency (see [1]).
This bug was introduced by the reworking of p2m code in commit 2ef3e36ec7
"xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry".
Wei Liu [Tue, 21 Feb 2017 14:52:46 +0000 (14:52 +0000)]
tools: move xl to a dedicated directory
This makes a clear distinction between the client (xl) and the library (libxl),
which should help in designing better APIs. It will also help reduce the
code size in the libxl directory.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Tue, 21 Feb 2017 14:40:48 +0000 (14:40 +0000)]
tools: provide libxlutil compiling and linking options
We are about to split out xl (which depends on libxlutil) to a different
directory. Provide the proper options for compiling and linking in
Rules.mk, and replace the hardcoded string in libxl/Makefile.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 23 Feb 2017 16:46:45 +0000 (16:46 +0000)]
xen-access: request compat devicemodel API
xc_hvm_inject_trap is moved to the new libdevicemodel. Request the
compat layer from libxenctrl for now to make xen-access build again.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
In Python3 the 'long' type has been merged into 'int', and the '1L' syntax is
no longer valid. Assign the 'int' type to a 'long' variable in python3, so
'long(1)' will give the correct result in both python2 and python3.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
In Python3, PyTypeObject looks slightly different, and also module
initialization is different. Instead of Py_InitModule, PyModule_Create
should be called on already defined PyModuleDef structure. And then
initialization function should return that module.
Additionally initialization function should be named PyInit_<name>,
instead of init<name>.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
In Python2 PyBytes is the same as PyString, but in Python3 PyString is
gone and 'str' is really PyUnicode in C-API.
When handling arbitrary data, use PyBytes - which is the right thing to
do in Python3, and poses no API change in Python2. When handling
xenstore paths and transaction ids, which have well defined format, use
PyUnicode - to ease API usage - no need to prefix all xenstore paths
with 'b' when migrating scripts to Python3.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Fri, 10 Feb 2017 14:34:15 +0000 (14:34 +0000)]
tools/libxendevicemodel: add a call to restrict the handle
My recent patch [1] to the Linux privcmd module introduced a mechanism
to restrict an open file handle to subsequently only accept operations for
a specified domain.
This patch extends the libxendevicemodel API and makes use of the
mechanism in the Linux-specific code to restrict operations on the
interface handle.
Paul Durrant [Wed, 15 Feb 2017 14:44:16 +0000 (14:44 +0000)]
tools/libxendevicemodel: extract functions and add a compat layer
This patch extracts all functions resulting in a dm_op hypercall from
libxenctrl and moves them into libxendevicemodel. It also adds a compat
layer into libxenctrl, which can be selected by defining
XC_WANT_COMPAT_DEVICEMODEL_API to 1 before including xenctrl.h.
With this patch the core of libxendevicemodel still uses libxencall to
issue the dm_op hypercalls, but this is done by calling through code that
can be modified on a per-OS basis. A subsequent patch will add a Linux-
specific variant.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Wed, 15 Feb 2017 13:54:25 +0000 (13:54 +0000)]
tools/libxendevicemodel: introduce the new library
The new xendevicemodel library is intended to be used by all Xen device
models such that the only hypercall they use will be the dm_op hypercall
added by commit 524a98c2.
This patch adds the boilerplate for the new library, with only open() and
close() entry points, and calls to those from libxenctrl in preparation
for the compat layer added by a subsequent patch.
[ Also: update MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION
to the commits with the corresponding changes to those other trees
- Ian Jackson ]
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 18 Jan 2017 15:05:24 +0000 (15:05 +0000)]
x86/mm: Swap mfn_valid() to use mfn_t
Replace one opencoded mfn_eq() and fix some coding style issues on altered lines.
Swap __mfn_valid() to being bool, although it can't be updated to take mfn_t
because of include dependencies.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Julien Grall <julien.grall@arm.com>
Daniel Kiper [Wed, 22 Feb 2017 13:38:06 +0000 (14:38 +0100)]
efi: create new early memory allocator
There is a problem with place_string(), which is used as an early memory
allocator. It hands out memory chunks starting from the start symbol and
going down. Sadly this does not work when Xen is loaded using the multiboot2
protocol, because then start lives at the 1 MiB address and we should
not allocate memory below it. So, I tried to use the mem_lower
address calculated by GRUB2. However, this solution works only on some
machines. There are machines in the wild (e.g. Dell PowerEdge R820)
which use the first ~640 KiB for boot services code or data... :-(((
Hence, we need a new memory allocator for the Xen EFI boot code which is
quite simple and generic and could be used by place_string() and
efi_arch_allocate_mmap_buffer(). I considered the following solutions:
1) We could use native EFI allocation functions (e.g. AllocatePool()
or AllocatePages()) to get memory chunk. However, later (somewhere
in __start_xen()) we must copy its contents to safe place or reserve
it in e820 memory map and map it in Xen virtual address space. This
means that the code referring to Xen command line, loaded modules and
EFI memory map, mostly in __start_xen(), will be further complicated
and diverge from legacy BIOS cases. Additionally, both former things
have to be placed below 4 GiB because their addresses are stored in
multiboot_info_t structure which has 32-bit relevant members.
2) We may allocate memory area statically somewhere in Xen code which
could be used as memory pool for early dynamic allocations. Looks
quite simple. Additionally, it would not depend on EFI at all and
could be used on legacy BIOS platforms if we need it. However, we
must carefully choose the size of this pool. We do not want to increase the
Xen binary size too much and waste too much memory, but we must also fit
at least the memory map on x86 EFI platforms. As I saw on a small machine,
e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
So, it means that we need more than 8 KiB for EFI memory map only.
Additionally, if we use this memory pool for Xen and modules command
line storage (it would be used when xen.efi is executed as EFI application)
then we should add, I think, about 1 KiB. In this case, to be on safe
side, we should assume at least 64 KiB pool for early memory allocations.
Which is about 4 times of our earlier calculations. However, during
discussion on Xen-devel Jan Beulich suggested that just in case we should
use 1 MiB memory pool like it is in original place_string() implementation.
So, let's use 1 MiB as it was proposed. If we think that we should not
waste unallocated memory in the pool on running system then we can mark
this region as __initdata and move all required data to dynamically
allocated places somewhere in __start_xen().
2a) We could put memory pool into .bss.page_aligned section. Then allocate
memory chunks starting from the lowest address. After init phase we can
free unused portion of the memory pool as in case of .init.text or .init.data
sections. This way we do not need to allocate any space in image file and
freeing of unused area in the memory pool is very simple.
Solution #2a is implemented now because it is quite simple and requires
a limited number of changes, especially in __start_xen().
The new allocator is quite generic and can be used on ARM platforms too.
However, it is not enabled on ARM yet because some prerequisites are missing.
A list of them is placed before the ebmalloc code.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Tested-by: Doug Goldstein <cardoe@cardoe.com>
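Solution 2a from the entry above is essentially a bump allocator over a static, page-aligned buffer. The following is a minimal sketch of that idea and not the ebmalloc code itself: pool, early_alloc and early_unused_start are hypothetical names, the 4096-byte page size is an assumption, and section placement (.bss.page_aligned) is left to the build system and not shown.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u            /* assumed page size */
#define POOL_SIZE  (1024u * 1024u)  /* 1 MiB, the size settled on in the text */

/* Zero-initialised static storage takes no space in the image file. */
static _Alignas(PAGE_BYTES) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Hand out chunks from the lowest address upwards; NULL when exhausted. */
static void *early_alloc(size_t size)
{
    /* Round up so every chunk stays pointer-aligned. */
    size_t rounded = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    void *ptr;

    if (rounded < size || rounded > POOL_SIZE - pool_used)
        return NULL;
    ptr = pool + pool_used;
    pool_used += rounded;
    return ptr;
}

/*
 * Offset of the first page that holds no allocation. Everything from
 * here to the end of the pool could be released after the init phase.
 */
static size_t early_unused_start(void)
{
    return (pool_used + PAGE_BYTES - 1) & ~(size_t)(PAGE_BYTES - 1);
}
```

As a size check against the figures in the entry: a memory map of 200 entries at 40 bytes each needs 200 * 40 = 8000 bytes, so it fits in the first two pages of the pool with room for a short command line.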
Daniel Kiper [Wed, 22 Feb 2017 13:36:56 +0000 (14:36 +0100)]
efi: build xen.gz with EFI code
Build xen.gz with EFI code. We need this to support multiboot2
protocol on EFI platforms.
If we wish to load a non-ELF file using multiboot (v1) or multiboot2 then
it must contain a "linear" (or "flat") representation of code and data.
This is a requirement of both boot protocols. Currently, the PE file contains
many sections which are not "linear" (one after another without any holes)
or do not even have a representation in the file (e.g. BSS). From the EFI point
of view everything is OK and works. However, this file layout cannot be
properly interpreted by the multiboot protocol family. In theory there is
a chance that we could build a proper PE file (from the multiboot protocols' POV)
using the current build system. However, it means that xen.efi would further diverge
from the Xen ELF file (in terms of contents and build method). On the other
hand ELF has all needed properties. So, it means that this is good starting
point for further development. Additionally, I think that this is also good
starting point for further xen.efi code and build optimizations. It looks
that there is a chance that finally we can generate xen.efi directly from
Xen ELF using just simple objcopy or other tool. This way we will have one
Xen binary which can be loaded by three boot protocols: EFI native loader,
multiboot (v1) and multiboot2.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Daniel Kiper [Wed, 22 Feb 2017 13:35:05 +0000 (14:35 +0100)]
x86: add multiboot2 protocol support
Add multiboot2 protocol support. Alter min memory limit handling as we
now may not find it from either multiboot (v1) or multiboot2.
This way we are laying the foundation for EFI + GRUB2 + Xen development.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Doug Goldstein <cardoe@cardoe.com>
Jan Beulich [Wed, 22 Feb 2017 11:36:36 +0000 (12:36 +0100)]
x86/VMX: sanitize VM86 TSS handling
The present way of setting this up is flawed: Leaving the I/O bitmap
pointer at zero means that the interrupt redirection bitmap lives
outside (ahead of) the allocated space of the TSS. Similarly setting a
TSS limit of 255 when only 128 bytes get allocated means that 128 extra
bytes may be accessed by the CPU during I/O port access processing.
Introduce a new HVM param to set the allocated size of the TSS, and
have the hypervisor actually take care of setting, in particular, the I/O
bitmap pointer. Both this and the segment limit now take the allocated size
into account.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 2 Nov 2016 15:50:23 +0000 (15:50 +0000)]
x86/emul: Support CPUID faulting via a speculative MSR read
This removes the need for every cpuid() emulation hook to individually support
CPUID faulting.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 2 Nov 2016 14:36:49 +0000 (14:36 +0000)]
x86/hvm: Don't raise #GP behind the emulators back for MSR accesses
The current hvm_msr_{read,write}_intercept() infrastructure calls
hvm_inject_hw_exception() directly to latch a fault, and returns
X86EMUL_EXCEPTION to its caller.
This behaviour is problematic for the hvmemul_{read,write}_msr() paths, as the
fault is raised behind the back of the x86 emulator.
Alter the behaviour so hvm_msr_{read,write}_intercept() simply returns
X86EMUL_EXCEPTION, leaving the callers to actually inject the #GP fault.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Sun, 18 Dec 2016 14:56:28 +0000 (14:56 +0000)]
x86/vmx: Drop vmx_msr_state infrastructure
To avoid leaking host MSR state into guests, guest LSTAR, STAR and
SYSCALL_MASK state is unconditionally loaded when switching into guest
context.
Attempting to dirty-track the state is pointless; host state is always
restored upon exit from guest context, meaning that guest state is always
considered dirty.
Drop struct vmx_msr_state, enum VMX_INDEX_MSR_* and msr_index[]. The guest's
MSR values are stored plainly in arch_vmx_struct, in the same way as shadow_gs
and cstar are. vmx_restore_guest_msrs() and long_mode_do_msr_write() ensure
that the hardware MSR values are always up-to-date.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Sun, 18 Dec 2016 14:56:28 +0000 (14:56 +0000)]
x86/vmx: Remove vmx_save_host_msrs() and host_msr_state
A pcpu's LSTAR, STAR and SYSCALL_MASK MSRs are unconditionally switched when
moving in and out of HVM vcpu context. Two of these values are compile time
constants, and the third is directly available in an existing per-cpu
variable.
There is no need to save host state in vmx_cpu_up() into a different per-cpu
structure, so drop all the infrastructure. vmx_restore_host_msrs() is
simplified to 3 plain WRMSR instructions.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Sun, 18 Dec 2016 14:56:28 +0000 (14:56 +0000)]
x86/setup: Intoduce XEN_MSR_STAR
Xen's choice of the MSR_STAR value is constant across all pcpus. Introduce a
new define and use it to avoid the opencoding in subarch_percpu_traps_init()
and restore_rest_processor_state().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
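A constant of the kind introduced in the entry above can be built from two segment selectors, since the STAR MSR keeps the SYSRET selector base in bits 63:48 and the SYSCALL selector base in bits 47:32. The sketch below shows only that composition; the selector values and the names STAR_VALUE, EXAMPLE_KERNEL_CS and EXAMPLE_USER_CS32 are illustrative assumptions, not Xen's definitions.

```c
#include <stdint.h>

/* Illustrative selector values; the real ones depend on the GDT layout. */
#define EXAMPLE_KERNEL_CS  0xe008u  /* selector base loaded by SYSCALL */
#define EXAMPLE_USER_CS32  0xe023u  /* selector base used by SYSRET */

/*
 * STAR layout: bits 63:48 SYSRET selector base, bits 47:32 SYSCALL
 * selector base, bits 31:0 the legacy 32-bit SYSCALL target (left 0 here).
 */
#define STAR_VALUE(syscall_cs, sysret_cs) \
    (((uint64_t)(sysret_cs) << 48) | ((uint64_t)(syscall_cs) << 32))

/* Same on every CPU, so it can be a compile-time constant. */
static const uint64_t example_star =
    STAR_VALUE(EXAMPLE_KERNEL_CS, EXAMPLE_USER_CS32);
```

Because both inputs are compile-time constants, the same expression can be used wherever the value was previously open-coded.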