Ian Jackson [Fri, 22 Dec 2017 16:12:23 +0000 (16:12 +0000)]
xl: pvshim: Provide and document xl config
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: pvshim, not pvhshim
works with type "pvh", not type "pv"
pvshim_etc. options in config are not erroneously ignored
Ian Jackson [Fri, 5 Jan 2018 15:50:38 +0000 (15:50 +0000)]
libxl: pvshim: Provide first-class config settings to enable shim mode
This is API-compatible because old callers are supposed to call
libxl_*_init to initialise the struct; and the updated function clears
these members.
It is ABI-compatible because the new fields make this member of the
guest type union larger but only within the existing size of that
union.
Unfortunately it is not easy to backport because it depends on the PVH
domain type. Attempts to avoid use of the PVH domain type involved
working with two views of the configuration: the "underlying" domain
type and the "visible" type (and corresponding config info). Also
there are different sets of config settings for PV and PVH, which
callers would have to know to set.
And, unfortunately, it will not be possible, with this approach, to
enable the shim by default for all libxl callers. (Although it could
perhaps be done in xl.)
For now, our config defaults are:
* if enabled, path is "xen-shim" in the xen firmware directory
* if enabled, cmdline is the one we are currently debugging with
The debugging arguments will be rationalised in a moment.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: pvshim, not pvhshim
works with type "pvh", not type "pv"
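The defaulting rule listed above can be sketched roughly as follows. This is an illustration only, not the libxl code: the struct, the function names, FIRMWARE_DIR and DEBUG_SHIM_ARGS are all hypothetical stand-ins, and the real debugging command line is not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the PVH member of the guest type union. */
struct pvh_config {
    int   pvshim;           /* non-zero: boot through the shim */
    char *pvshim_path;      /* NULL means "use the default" */
    char *pvshim_cmdline;   /* NULL means "use the default" */
};

/* Assumed values, for illustration only. */
#define FIRMWARE_DIR      "/usr/lib/xen/boot"
#define DEBUG_SHIM_ARGS   "placeholder-debug-args"

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Fill in unset shim settings; 0 on success, -1 on allocation failure. */
static int pvshim_setdefault(struct pvh_config *c)
{
    if (!c->pvshim)
        return 0;           /* shim disabled: leave path and cmdline alone */

    if (!c->pvshim_path)
        c->pvshim_path = dup_string(FIRMWARE_DIR "/xen-shim");
    if (!c->pvshim_cmdline)
        c->pvshim_cmdline = dup_string(DEBUG_SHIM_ARGS);

    return (c->pvshim_path && c->pvshim_cmdline) ? 0 : -1;
}
```

Values supplied by the caller are left untouched; only unset members are filled in, and only when the shim is enabled.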
George Dunlap [Thu, 14 Dec 2017 16:16:20 +0000 (16:16 +0000)]
libxl: Introduce hack to allow PVH mode to add a shim
libxl will look for LIBXL_PVSHIM_PATH and LIBXL_PVSHIM_CMDLINE
environment variables. If the first is present, it will boot with the
shim and the existing kernel / ramdisk (that is, with the shim as the
"kernel" and the kernel and ramdisk both as extra modules).
If not, it will just boot the kernel / ramdisk directly (that is, with
the kernel as "kernel" and the ramdisk as a module).
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
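A minimal sketch of the selection described in this message, under stated assumptions: struct boot_plan and both functions are hypothetical names invented for illustration, and only the two environment variable names come from the commit text.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical description of what gets loaded at boot. */
struct boot_plan {
    const char *kernel;       /* image booted as the "kernel" */
    const char *cmdline;      /* shim command line, or NULL */
    const char *modules[2];   /* extra modules, in load order */
    int nr_modules;
};

/* Pure helper: the shim settings are passed in, which keeps it testable. */
static void plan_boot(struct boot_plan *p,
                      const char *shim_path, const char *shim_cmdline,
                      const char *guest_kernel, const char *guest_ramdisk)
{
    p->nr_modules = 0;
    p->modules[0] = p->modules[1] = NULL;

    if (shim_path) {
        /* Shim mode: the shim is the "kernel", the guest kernel a module. */
        p->kernel = shim_path;
        p->cmdline = shim_cmdline;
        p->modules[p->nr_modules++] = guest_kernel;
    } else {
        /* Direct mode: the guest kernel is the "kernel". */
        p->kernel = guest_kernel;
        p->cmdline = NULL;
    }
    if (guest_ramdisk)
        p->modules[p->nr_modules++] = guest_ramdisk;
}

/* Wrapper reading the two environment variables named in the message. */
static void plan_boot_from_env(struct boot_plan *p,
                               const char *guest_kernel,
                               const char *guest_ramdisk)
{
    plan_boot(p, getenv("LIBXL_PVSHIM_PATH"), getenv("LIBXL_PVSHIM_CMDLINE"),
              guest_kernel, guest_ramdisk);
}
```

Splitting the decision from the getenv() calls is a choice made for this sketch so that both paths can be exercised without touching the environment.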
Roger Pau Monne [Tue, 5 Dec 2017 16:22:03 +0000 (16:22 +0000)]
xen/pvshim: add grant table operations
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Fri, 8 Dec 2017 14:39:45 +0000 (14:39 +0000)]
xen/pvshim: modify Dom0 builder in order to build a DomU
According to the PV ABI the initial virtual memory regions should
contain the xenstore and console pages after the start_info. Fix this
and add the pages to the p2m/m2p after the start_info page also.
Also set the correct values in the start_info for DomU operation.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Jonathan Ludlam [Mon, 27 Nov 2017 16:18:58 +0000 (16:18 +0000)]
tools/libxc: Multi modules support
Signed-off-by: Jonathan Ludlam <jonathan.ludlam@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 16 Nov 2017 17:56:18 +0000 (17:56 +0000)]
x86: xen pv clock time source
It is a variant of the TSC clock source.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 28 Nov 2017 18:30:15 +0000 (18:30 +0000)]
x86/fixmap: Modify fix_to_virt() to return a void pointer
Almost all users of fix_to_virt() actually want a pointer. Include the cast
within the definition, so the callers don't need to.
Two users which need the integer value are switched to using __fix_to_virt()
directly. A few users stay fully unchanged, due to GCC's void pointer
arithmetic extension causing the same behaviour. Most users however have
their explicit casting dropped.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
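The shape of the change can be sketched as below. PAGE_SHIFT and FIXADDR_TOP carry made-up values and the two example callers are hypothetical; only the fix_to_virt() / __fix_to_virt() split is taken from the message.

```c
#include <stdint.h>

/* Hypothetical layout constants, chosen only for illustration. */
#define PAGE_SHIFT   12
#define FIXADDR_TOP  ((uintptr_t)0x80000000u)

/* Integer form: slots are laid out downwards from FIXADDR_TOP. */
#define __fix_to_virt(idx) (FIXADDR_TOP - ((uintptr_t)(idx) << PAGE_SHIFT))

/* Pointer form: the cast lives here, so callers no longer need one. */
#define fix_to_virt(idx)   ((void *)__fix_to_virt(idx))

/* A caller that wants a pointer uses the result directly. */
static volatile uint32_t *example_mmio_reg(unsigned int idx)
{
    return fix_to_virt(idx);    /* no explicit cast needed from void * */
}

/* A caller that needs the number uses the integer form. */
static uintptr_t example_slot_address(unsigned int idx)
{
    return __fix_to_virt(idx);
}
```

The pointers produced here are never dereferenced; the sketch only shows where the cast moved.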
Jan Beulich [Wed, 6 Dec 2017 11:50:23 +0000 (12:50 +0100)]
x86/HVM: don't retain emulated insn cache when exiting back to guest
vio->mmio_retry is being set when a repeated string insn is being split
up. In that case we'll exit to the guest, expecting immediate re-entry.
Interruptions, however, may be serviced by the guest before re-entry
from the repeated string insn. Any emulation needed in the course of
handling the interruption must not fetch from the internally maintained
cache.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:23:53 +0000 (17:23 +0100)]
x86: don't ignore foreigndom on L2/L3/L4 page table updates
Silently assuming DOMID_SELF is unlikely to be a good idea for page
table updates. For PGT_writable pages, though, it seems better to allow
the writes, so the same check isn't being applied there.
Also add blank lines between the individual case blocks.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:22:31 +0000 (17:22 +0100)]
x86/mm: drop yet another relic of translated PV domains from new_guest_cr3()
The function can be called for PV domains only, which commit 5a0b9fba92
("x86/mm: drop further relics of translated PV domains") sort of
realized, but not fully.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 4 Dec 2017 10:04:18 +0000 (11:04 +0100)]
gnttab: improve GNTTABOP_cache_flush locking
Dropping the lock before returning from grant_map_exists() means handing
possibly stale information back to the caller. Return the pointer to the
active entry instead, so that the caller can release the lock once done.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
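The locking pattern described in this message, returning the entry with its lock still held, might look like the toy below. It is not the grant table code: the flag-based "lock", the struct and all function names are hypothetical and exist only to show who releases what.

```c
#include <stddef.h>

/* Toy lock: a flag is enough to show who is responsible for releasing it. */
struct active_entry {
    int locked;
    unsigned long frame;
};

static void entry_lock(struct active_entry *e)   { e->locked = 1; }
static void entry_unlock(struct active_entry *e) { e->locked = 0; }

/*
 * Return the matching entry with its lock still held, or NULL.
 * Non-matching entries are unlocked before moving on.
 */
static struct active_entry *find_mapping_locked(struct active_entry *tbl,
                                                size_t n, unsigned long frame)
{
    for (size_t i = 0; i < n; i++) {
        entry_lock(&tbl[i]);
        if (tbl[i].frame == frame)
            return &tbl[i];
        entry_unlock(&tbl[i]);
    }
    return NULL;
}

/* Caller: use the entry while it cannot change, then drop the lock. */
static int flush_if_mapped(struct active_entry *tbl, size_t n,
                           unsigned long frame)
{
    struct active_entry *e = find_mapping_locked(tbl, n, frame);

    if (!e)
        return -1;
    /* ... the cache maintenance would happen here, under the lock ... */
    entry_unlock(e);
    return 0;
}
```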
Jann validly points out that with a caller bogusly requesting a zero-
element batch with non-zero high command bits (the ones used for
continuation encoding), the assertion right before the call to
hypercall_create_continuation() would trigger. A similar situation would
arise afaict for non-empty batches with op and/or length zero in every
element.
While we want the former to succeed (as we do elsewhere for similar
no-op requests), the latter can clearly be converted to an error, as
this is a state that can't be the result of a prior operation.
Take the opportunity and also correct the order of argument checks:
We shouldn't accept zero-length elements with unknown bits set in "op".
Also constify cache_flush()'s first parameter.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
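The corrected order of argument checks can be illustrated as follows. The flag bits, the struct and the function are hypothetical; the sketch covers only the point that unknown bits in "op" are rejected before a zero length is accepted as a no-op.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; the real interface defines its own. */
#define OP_CLEAN  (1u << 0)
#define OP_INVAL  (1u << 1)
#define OP_KNOWN  (OP_CLEAN | OP_INVAL)

struct flush_req {
    uint32_t op;
    uint32_t length;
};

/* Returns 1 if there is work to do, 0 for a valid no-op, -EINVAL if malformed. */
static int check_flush_req(const struct flush_req *r)
{
    if (r->op & ~OP_KNOWN)
        return -EINVAL;     /* reject unknown bits first ... */
    if (r->length == 0)
        return 0;           /* ... and only then accept a zero-length no-op */
    return 1;
}
```

With the checks in the opposite order, a zero-length element carrying unknown bits would have been accepted silently.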
Roger Pau Monné [Mon, 4 Dec 2017 10:02:46 +0000 (11:02 +0100)]
pci: introduce a type to store a SBDF
That provides direct access to all the members that constitute a SBDF.
The only function switched to use it is hvm_pci_decode_addr, because
it makes following patches simpler.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
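One way such a type could be laid out is sketched below. The type name, the devfn member name and the helper are hypothetical, and the layout assumes a little-endian target with the GCC/Clang bit-field ordering; it is an illustration, not the definition added by the patch.

```c
#include <stdint.h>

/*
 * Assumes a little-endian target and the GCC/Clang bit-field layout
 * (first-declared field in the least significant bits).
 */
typedef union {
    uint32_t sbdf;                      /* whole value */
    struct {
        union {
            uint16_t bdf;               /* bus + device + function */
            struct {
                union {
                    struct {
                        uint8_t func : 3;
                        uint8_t dev  : 5;
                    };
                    uint8_t devfn;      /* device + function */
                };
                uint8_t bus;
            };
        };
        uint16_t seg;                   /* segment */
    };
} sbdf_example_t;

/* Build a value from its parts; every field stays individually addressable. */
static sbdf_example_t make_sbdf(uint16_t seg, uint8_t bus,
                                uint8_t dev, uint8_t func)
{
    sbdf_example_t s = { .sbdf = 0 };

    s.seg = seg;
    s.bus = bus;
    s.dev = dev & 0x1f;
    s.func = func & 0x7;
    return s;
}
```

The nested anonymous unions let a caller read the same 32 bits as a whole, as a 16-bit BDF, or field by field, without shifts and masks at each use.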
Gregory Herrero [Mon, 4 Dec 2017 10:01:48 +0000 (11:01 +0100)]
libelf: allow having HYPERCALL_PAGE entry before VIRT_BASE in __xen_guest section
When filling the __xen_guest section of a guest, a user may define
HYPERCALL_PAGE earlier than VIRT_BASE in the section. This leads to an
incorrect hypercall page address, since a not yet defined virt_base
could be used to compute it.
If there is no VIRT_BASE entry in the __xen_guest section, the default
value of 0 is used for virt_base. Thus, setting the hypercall page
address to the HYPERCALL_PAGE value is correct in this case too.
Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
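An order-independent way of handling the two entries can be sketched as below: record both values first, then combine them once the whole section has been seen. This is a simplified illustration and not the libelf code; the structs, the function and the plain addition (ignoring any unit scaling of the HYPERCALL_PAGE value) are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define UNSET_ADDR UINT64_MAX   /* sentinel: entry not seen */

/* Hypothetical pre-split "NAME=value" entry from the section. */
struct guest_entry {
    const char *name;
    uint64_t value;
};

struct guest_parms {
    uint64_t virt_base;
    uint64_t virt_hypercall;
};

static void parse_entries(struct guest_parms *p,
                          const struct guest_entry *e, size_t count)
{
    uint64_t hypercall = UNSET_ADDR;

    p->virt_base = UNSET_ADDR;
    p->virt_hypercall = UNSET_ADDR;

    /* First pass: only record values, in whatever order they appear. */
    for (size_t i = 0; i < count; i++) {
        if (!strcmp(e[i].name, "VIRT_BASE"))
            p->virt_base = e[i].value;
        else if (!strcmp(e[i].name, "HYPERCALL_PAGE"))
            hypercall = e[i].value;
    }

    /* Afterwards: apply the default, then combine. */
    if (p->virt_base == UNSET_ADDR)
        p->virt_base = 0;
    if (hypercall != UNSET_ADDR)
        p->virt_hypercall = p->virt_base + hypercall;
}
```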
Zhenzhong Duan [Mon, 4 Dec 2017 10:01:24 +0000 (11:01 +0100)]
x86/physdev: remove redundant code in branch MAP_PIRQ_TYPE_MSI
The same code is already in allocate_and_map_msi_pirq().
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
David Esler [Mon, 4 Dec 2017 10:00:24 +0000 (11:00 +0100)]
x86/boot: rename send_chr to print_err
The send_chr function sends an entire C-string rather than one character,
and no longer necessarily sends it over the serial UART, so rename it to
print_err so that it's closer in name to what it does.
Signed-off-by: David Esler <drumandstrum@gmail.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Brian Woods [Thu, 16 Nov 2017 22:11:15 +0000 (16:11 -0600)]
x86/svm: Add virtual GIF support
This patch detects and enables Virtual GIF if available. This allows
a nested hypervisor to perform STGIs and CLGIs without having to be
intercepted by the host hypervisor.
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Brian Woods [Thu, 16 Nov 2017 22:11:14 +0000 (16:11 -0600)]
x86/svm: Add virtual GIF feature definition
Add support for enabling the virtual GIF feature.
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 28 Nov 2017 18:48:07 +0000 (18:48 +0000)]
x86/traps: Drop redundant printk() in fatal_trap()
show_page_walk() already prints the linear address of the walk, and
show_execution_state() has printed a raw %cr2 value. This avoids having
two adjacent log lines with identical information.
Boris Ostrovsky [Thu, 9 Nov 2017 15:37:53 +0000 (10:37 -0500)]
x86/pvh: Do not add DSDT and FACS to PVH dom0 XSDT
These tables are pointed to from FADT. Adding them will
result in duplicate entries in the guest's tables.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Sergey Dyasli [Mon, 23 Oct 2017 09:33:02 +0000 (10:33 +0100)]
x86/vvmx: don't enable vmcs shadowing for nested guests
Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
following result on H/W with VMCS shadowing:
Test: vmxon
Failure in test_vmxon_in_root_cpl0()
Expected 0x8200000f: VMfailValid(15) VMXON_IN_ROOT
Got 0x82004400: VMfailValid(17408) <unknown>
Test result: FAILURE
This happens because the SDM allows VM entries with the VMCS shadowing
VM-execution control enabled and a VMCS link pointer value of ~0ull, but
the results of a nested VMREAD are undefined in such cases.
Fix this by not copying the value of VMCS shadowing control from vmcs01
to vmcs02.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Brian Woods [Tue, 31 Oct 2017 22:03:08 +0000 (17:03 -0500)]
x86/svm: add virtual VMLOAD/VMSAVE support
On AMD family 17h server processors, there is a feature called virtual
VMLOAD/VMSAVE. This allows a nested hypervisor to perform a VMLOAD or
VMSAVE without needing to be intercepted by the host hypervisor.
Virtual VMLOAD/VMSAVE requires the host hypervisor to be in long mode
and nested page tables to be enabled. For more information about it
please see:
AMD64 Architecture Programmer’s Manual Volume 2: System Programming
http://support.amd.com/TechDocs/24593.pdf
Section: VMSAVE and VMLOAD Virtualization (Section 15.33.1)
This patch series adds support to check for and enable the virtual
VMLOAD/VMSAVE features if available.
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Adding support for enabling the virtual VMLOAD/VMSAVE feature.
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Brian Woods [Tue, 31 Oct 2017 22:03:06 +0000 (17:03 -0500)]
x86/svm: rename lbr control field in vmcb
Rename the lbr_control field in the vmcb for future/upcoming changes.
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Tue, 17 Oct 2017 17:06:23 +0000 (18:06 +0100)]
x86/vmx: Don't rewrite HOST_TR_SELECTOR on every context switch
TSS_ENTRY is a compile time constant, so HOST_TR_SELECTOR can be set up during
VMCS construction and left alone thereafter, rather than rewriting it on every
context switch.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Mon, 16 Oct 2017 13:20:07 +0000 (13:20 +0000)]
xen/pv: Construct d0v0's GDT properly
c/s cf6d39f8199 "x86/PV: properly populate descriptor tables" changed the GDT
to reference zero_page for intermediate frames between the guest and Xen
frames.
Because dom0_construct_pv() doesn't call arch_set_info_guest(), some bits of
initialisation are missed, including the pv_destroy_gdt() which initially
fills the references to zero_page.
In practice, this means there is a window between starting and the first call
to HYPERCALL_set_gdt() where lar/lsl/verr/verw suffer non-architectural
behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
This probably wants backporting to Xen 4.7 and later.
Andrew Cooper [Mon, 2 Oct 2017 14:13:38 +0000 (14:13 +0000)]
x86/ldt: Alter how invalidate_shadow_ldt() deals with TLB flushes
Modify invalidate_shadow_ldt() to return a boolean indicating whether mappings
have been dropped, rather than taking a flush parameter. Tweak the internal
logic to be able to ASSERT() that v->arch.pv_vcpu.shadow_ldt_mapcnt matches
the number of PTEs removed.
This allows MMUEXTOP_SET_LDT to avoid a local TLB flush if no LDT entries had
been faulted in to begin with.
Finally, correct a comment in __get_page_type(). Under no circumstance is it
safe to forgo the TLB shootdown for GDT/LDT pages, as that would allow one
vcpu to gain a writeable mapping to a frame still mapped as a GDT/LDT by
another vcpu.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
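The interface change described in this message, returning whether mappings were dropped instead of taking a flush parameter, can be modelled with the toy below. struct toy_vcpu, NR_LDT_SLOTS, set_ldt() and the flush counter are hypothetical; only the function name and the mapcnt assertion idea come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_LDT_SLOTS 16   /* hypothetical size */

struct toy_vcpu {
    unsigned long ldt_ptes[NR_LDT_SLOTS];  /* 0 means "not mapped" */
    unsigned int shadow_ldt_mapcnt;        /* number of non-zero slots */
    unsigned int tlb_flushes;              /* counts local flushes */
};

/* Drop all shadow LDT mappings; report whether anything was dropped. */
static bool invalidate_shadow_ldt(struct toy_vcpu *v)
{
    unsigned int removed = 0;

    if (v->shadow_ldt_mapcnt == 0)
        return false;

    for (size_t i = 0; i < NR_LDT_SLOTS; i++) {
        if (v->ldt_ptes[i]) {
            v->ldt_ptes[i] = 0;
            removed++;
        }
    }
    /* The counter must match the number of entries actually removed. */
    assert(removed == v->shadow_ldt_mapcnt);
    v->shadow_ldt_mapcnt = 0;
    return true;
}

/* Caller in the style of a SET_LDT operation: flush only when needed. */
static void set_ldt(struct toy_vcpu *v)
{
    if (invalidate_shadow_ldt(v))
        v->tlb_flushes++;
}
```

Leaving the flush decision to the caller is what lets the no-mappings case skip the flush entirely.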
Andrew Cooper [Mon, 2 Oct 2017 13:58:17 +0000 (13:58 +0000)]
xen/x86: Introduce static inline wrappers for l{idt,gdt,ldt,tr}()
This avoids indirection and parameter constraint issues. Doing so relaxes the
load_LDT() constraints from %ax to any general purpose register. The helpers
are upgraded to full compiler barriers, because nothing good will come of
having these reordered with respect to other segment accesses.
The triple-fault reboot method stays as is, to avoid the int3 possibly getting
moved relative to the lidt.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>