Jan Beulich [Fri, 6 Jan 2017 14:07:31 +0000 (15:07 +0100)]
x86: use unambiguous register names
Eliminate the mis-naming of 64-bit fields with 32-bit register names
(eflags instead of rflags etc). To ensure no piece of code was missed,
transiently use the underscore prefixed names only for 32-bit register
accesses. This will be cleaned up subsequently.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 6 Jan 2017 14:06:09 +0000 (15:06 +0100)]
x86: drop cpu_has_sse{,2}
Commit dc88221c97 ("x86: rename XMM* features to SSE*") pointlessly
added them - these features are always available on 64-bit CPUs. (Let's
not assume this for MMX though in at least the insn emulator.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Thu, 5 Jan 2017 16:26:09 +0000 (10:26 -0600)]
x86/mtrr: use stdbool instead of int + define
Instead of using an int and providing a define for TRUE and FALSE,
change the code to use stdbool that Xen provides.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Minor style tweaks] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Boris Ostrovsky [Tue, 3 Jan 2017 14:04:12 +0000 (09:04 -0500)]
libxl: Update xenstore on VCPU hotplug for all guest types
Currently HVM guests that use upstream qemu do not update xenstore's
availability entry for VCPUs. While it is not strictly necessary for
hotplug to work, xenstore ends up not reflecting actual status of
VCPUs. We should fix this.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 23 Dec 2016 12:12:36 +0000 (12:12 +0000)]
build: move setting LTO options to xen/Rules.mk
Having them in StdGNU.mk would affect both hypervisor and tools build.
However judging from the commit message of e4cdd74f LTO was only meant
to affect the hypervisor build.
Move the relevant bits to xen/Rules.mk.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 5 Jan 2017 11:41:50 +0000 (11:41 +0000)]
x86/pv: Defer I/O bitmap checks even in 64bit mode for emulate_privilege_op()
The I/O bitmap doesn't change function depending on mode. 64bit userspace
such as an X server still needs to enter guest_io_okay() to find that the PV
kernel did set up an appropriate virtual I/O bitmap to permit access.
While moving the check, alter its representation to be easier to read.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Jan 2017 11:55:54 +0000 (11:55 +0000)]
x86/vvmx: Drop sreg_to_index[]
Since c/s 0888d36b "x86/emul: Correct the decoding of SReg3 operands",
x86_seg_* have followed hardware encodings, meaning that this translation
table is now an identity transform.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
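As a rough illustration of why such a table can go away, here is a minimal sketch. The enum and table names are hypothetical stand-ins, not Xen's definitions; the numeric values are the x86 SReg3 hardware encodings.

```c
#include <stddef.h>

/* Hypothetical stand-in for x86_seg_*; values are the SReg3 encodings. */
enum seg_sketch {
    SEG_ES = 0, SEG_CS = 1, SEG_SS = 2,
    SEG_DS = 3, SEG_FS = 4, SEG_GS = 5,
    SEG_COUNT
};

/* A translation table of the kind being dropped: each entry maps to itself. */
const unsigned int sreg_to_index_sketch[SEG_COUNT] = {
    [SEG_ES] = 0, [SEG_CS] = 1, [SEG_SS] = 2,
    [SEG_DS] = 3, [SEG_FS] = 4, [SEG_GS] = 5,
};

/* Returns 1 if the table is an identity transform, 0 otherwise. */
int table_is_identity_sketch(void)
{
    for (size_t i = 0; i < SEG_COUNT; i++)
        if (sreg_to_index_sketch[i] != i)
            return 0;
    return 1;
}
```

Once the check holds, any `table[seg]` lookup can be replaced by `seg` itself.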
Jan Beulich [Thu, 5 Jan 2017 10:11:19 +0000 (11:11 +0100)]
x86/VMX: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Quan Xu [Thu, 5 Jan 2017 10:10:01 +0000 (11:10 +0100)]
x86/apicv: fix RTC periodic timer and apicv issue
When Xen apicv is enabled, wall clock time is faster on Windows7-32
guest with high payload (with 2vCPU, captured from xentrace, in
high payload, the count of IPI interrupt increases rapidly between
these vCPUs).
If IPI interrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in vIRR), unfortunately, the IPI
interrupt has higher priority than periodic timer interrupt. Xen updates
IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
within VMX non-root operation without a VM-Exit. Within VMX non-root
operation, if periodic timer interrupt index of bit is set in vIRR and
highest, the apicv delivers periodic timer interrupt within VMX non-root
operation as well.
But in current code, if Xen doesn't update periodic timer interrupt bit
set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
of this case to decrease the count (pending_intr_nr) of pending periodic
timer interrupt, then Xen will deliver a periodic timer interrupt again.
And that we update periodic timer interrupt in every VM-entry, there is
a chance that already-injected instance (before EOI-induced exit happens)
will incur another pending IRR setting if there is a VM-exit happens
between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
guest receives more periodic timer interrupt.
So we set eoi_exit_bitmap for intack.vector - give a chance to post
periodic time interrupts when periodic time interrupts become the
highest one.
Signed-off-by: Quan Xu <xuquan8@huawei.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Chao Gao <chao.gao@intel.com>
Andrew Cooper [Thu, 8 Dec 2016 08:46:42 +0000 (08:46 +0000)]
x86/cpuid: Untangle the <asm/cpufeature.h> include hierarchy
The use of X86_FEATURES_ONLY was shortlived in Linux for the same problem
encountered here. The following series needs to add extra includes to
asm/cpuid.h, which breaks the build elsewhere given the current hierarchy.
Move the feature definitions into a separate header file, which also matches
the solution Linux used.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Piotr Luc [Wed, 4 Jan 2017 13:29:30 +0000 (14:29 +0100)]
x86/mwait-idle: add Knights Mill CPUID
Add Knights Mill (KNM) to the list of CPUIDs supported by mwait-idle.
Signed-off-by: Piotr Luc <piotr.luc@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: a2c1bc645e87346150516b3abf1933ed29d0f48b] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andy Shevchenko [Wed, 4 Jan 2017 13:29:08 +0000 (14:29 +0100)]
x86/mwait-idle: add CPU model 0x4a (Atom Z34xx series)
Add CPU ID for Atom Z34xx processors. Datasheets indicate support for this,
detailed information about potential quirks or limitations is missing, though.
So we just reuse the definition from official BSP code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: 5e7ec268fd48d63cfd0e3a9be6c6443f01673bd4] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 4 Jan 2017 13:28:32 +0000 (14:28 +0100)]
x86emul: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc).
Note that the result is not fully consistent until after at least one
more patch is in place, primarily to limit patch size (by trying to not
touch the same line twice).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 4 Jan 2017 13:28:02 +0000 (14:28 +0100)]
x86emul: make _PRE_EFLAGS() tolerate first argument being 32-bit
While this may appear to introduce a truncation issue, the high 32 bits
get zapped already anyway (early in _PRE_EFLAGS() as well as in
_POST_EFLAGS()). Once a subsequent patch switches to use proper 32-bit
EFLAGS operands, we'll in fact end up with more correct code, as that
zeroing of the upper halves will then go away.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 4 Jan 2017 13:27:17 +0000 (14:27 +0100)]
x86emul: support LAR/LSL/VERR/VERW
This involves protmode_load_seg() accepting x86_seg_none as input, with
the meaning to
- suppress any exceptions other than #PF,
- not commit any state.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 16 Dec 2016 17:36:22 +0000 (17:36 +0000)]
x86/cpu: Improvements to get_cpu_vendor()
Comparing 3 integers is more efficient than using strcmp(), and is more useful
to the gcv_guest case than having to fabricate a suitable string to pass. The
gcv_host cases have both options easily to hand, and experimentally, the
resulting code is more efficient.
Update the cpu_dev structure to be more efficient. c_vendor[] only needs to
be 8 bytes long to cover all the CPU drivers Xen has, which avoids storing an
8-byte pointer to 8 bytes of data. Drop c_ident[1] as we have no CPU drivers
with a second ident string, and turn it into an anonymous union to allow
access to the integer values directly.
This avoids all need for the vendor_id union in update_domain_cpuid_info().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
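The integer comparison can be sketched as follows. This is only an illustration under assumptions: the enum, `pack4()` and `lookup_vendor_sketch()` are hypothetical names, not the functions touched by the patch. What is relied on is that CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX, in that order.

```c
#include <stdint.h>

enum vendor_sketch { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD };

/* Pack four ASCII characters the way CPUID returns them (little-endian). */
static uint32_t pack4(const char *s)
{
    return (uint32_t)(unsigned char)s[0] |
           ((uint32_t)(unsigned char)s[1] << 8) |
           ((uint32_t)(unsigned char)s[2] << 16) |
           ((uint32_t)(unsigned char)s[3] << 24);
}

/* Match a vendor with three integer compares instead of a strcmp(). */
enum vendor_sketch lookup_vendor_sketch(uint32_t ebx, uint32_t ecx,
                                        uint32_t edx)
{
    if (ebx == pack4("Genu") && edx == pack4("ineI") && ecx == pack4("ntel"))
        return VENDOR_INTEL;
    if (ebx == pack4("Auth") && edx == pack4("enti") && ecx == pack4("cAMD"))
        return VENDOR_AMD;
    return VENDOR_UNKNOWN;
}
```

A guest-side caller already holds the three register values, so no string needs to be fabricated just to perform the lookup.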
Andrew Cooper [Fri, 16 Dec 2016 17:53:09 +0000 (17:53 +0000)]
x86/cpu: Drop unused X86_VENDOR_* values
Xen only has CPU drivers for Intel, Centaur and AMD. All other contributions
to X86_VENDOR_NUM simply make the cpu_devs[] array longer, reducing the
efficiency of get_cpu_vendor().
There is one remaining hidden reference to X86_VENDOR_CYRIX in the MTRR code.
However, as far as I can tell, Cyrix never released a 64bit processor. It is
therefore dead code.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 29 Dec 2016 16:36:31 +0000 (16:36 +0000)]
libxl: fix libxl_set_memory_target
Commit 26dbc93a ("libxl: Remove pointless hypercall from
libxl_set_memory_target") removed the call to xc_domain_getinfolist, but
it failed to notice that "info" was actually needed later.
Put that back. While at it, make the code conform to coding style
requirement.
Reported-by: Juergen Gross <jgross@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 3 Jan 2017 08:44:10 +0000 (09:44 +0100)]
x86/SVM: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Jan Beulich [Tue, 3 Jan 2017 08:43:29 +0000 (09:43 +0100)]
x86/HVMemul: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Jan 2017 08:42:52 +0000 (09:42 +0100)]
x86/guest-walk: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 3 Jan 2017 08:42:10 +0000 (09:42 +0100)]
x86/MSR: introduce MSR access split/fold helpers
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
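What "split" and "fold" mean for an MSR access can be sketched as below. The struct and function names are hypothetical and chosen for illustration; the only fact relied on is that RDMSR/WRMSR transfer a 64-bit value as the EDX:EAX pair.

```c
#include <stdint.h>

/* Hypothetical register pair holding the two 32-bit halves. */
struct regs_sketch {
    uint32_t eax;
    uint32_t edx;
};

/* Split a 64-bit MSR value into its low (EAX) and high (EDX) halves. */
void msr_split_sketch(struct regs_sketch *regs, uint64_t val)
{
    regs->eax = (uint32_t)val;
    regs->edx = (uint32_t)(val >> 32);
}

/* Fold EDX:EAX back into a single 64-bit MSR value. */
uint64_t msr_fold_sketch(const struct regs_sketch *regs)
{
    return ((uint64_t)regs->edx << 32) | regs->eax;
}

/* Demonstration helper: split a value and fold it back together. */
uint64_t msr_round_trip_sketch(uint64_t val)
{
    struct regs_sketch regs;

    msr_split_sketch(&regs, val);
    return msr_fold_sketch(&regs);
}
```

Keeping the two halves in explicitly 32-bit fields means no upper register bits can leak into the folded value.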
Andrew Cooper [Fri, 9 Dec 2016 18:40:11 +0000 (18:40 +0000)]
x86/emul: Correct the return value handling of VMFUNC
The bracketing of x86_emulate() calling the ops->vmfunc() hook is wrong with
respect to the assignment to rc, which can trip the new assertions in
x86_emulate_wrapper().
The hvmemul_vmfunc() hook should only raise #UD if X86EMUL_EXCEPTION is
returned. This is only a latent bug at the moment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
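For readers unfamiliar with this class of bug, the sketch below shows one plausible shape of a bracketing slip around an assignment to rc. It is a generic illustration, not the actual hunk; the return codes and hook are hypothetical.

```c
/* Hypothetical return codes and hook; not Xen's actual definitions. */
enum { RC_OKAY = 0, RC_UNHANDLEABLE = 1, RC_EXCEPTION = 2 };

static int hook_sketch(void)
{
    return RC_EXCEPTION;
}

/*
 * Mis-bracketed form: `rc = hook_sketch() != RC_OKAY` is parsed by C as
 * `rc = (hook_sketch() != RC_OKAY)`, so rc holds the comparison result
 * (0 or 1) rather than the hook's return code.
 */
int call_misbracketed_sketch(void)
{
    int rc;

    if ((rc = (hook_sketch() != RC_OKAY)))
        return rc;
    return RC_OKAY;
}

/* Intended form: assign first, then compare. */
int call_bracketed_sketch(void)
{
    int rc;

    if ((rc = hook_sketch()) != RC_OKAY)
        return rc;
    return RC_OKAY;
}
```

In the first form the caller sees 1 where the hook returned 2, which is the kind of mismatch a wrapper asserting on return values can catch.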
Eric DeVolder [Wed, 21 Dec 2016 21:37:31 +0000 (13:37 -0800)]
Corrected comment typo "count not" to "could not"
Fix cut-n-paste typo; changed the words "count not" to "could not".
No functional changes.
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Thu, 22 Dec 2016 09:56:34 +0000 (10:56 +0100)]
libacpi: don't build x86-only AML for ARM64 mk_dsdt
Commit d6ac8e22c7c5 ("acpi/x86: define ACPI IO registers for
PVH guests") broke ARM64 build of mk_dsdt.c due to introduction
of XEN_ACPI_CPU_MAP[_LEN] macros that are needed only for x86
guests.
We could fix the build by dealing specifically with those macros
but since post-MADT code is not executed on ARM64 anyway we can
compile it for x86 only.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Mon, 19 Dec 2016 15:02:03 +0000 (15:02 +0000)]
init/FreeBSD: fix xencommons so it can only be launched by Dom0
At the moment the execution of xencommons is gated on the presence of the
privcmd device, but that's not correct, since privcmd is available to all Xen
domains (privileged or unprivileged). Instead of using privcmd use the
xenstored device, which will only be available to the domain that's in charge
of running xenstored, and thus xencommons.
Roger Pau Monne [Mon, 19 Dec 2016 15:02:01 +0000 (15:02 +0000)]
init/FreeBSD: set correct PATH for xl devd
FreeBSD init scripts don't have /usr/local/{bin,sbin} in their PATH, which
prevents `xl devd` from working properly since hotplug scripts require the set
of xenstore cli tools to be in PATH.
While there also fix the usage of --pidfile, which according to the xl help
doesn't use "=", and add braces around XLDEVD_PIDFILE.
Jan Beulich [Wed, 21 Dec 2016 16:01:58 +0000 (17:01 +0100)]
x86/misc: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 16:01:34 +0000 (17:01 +0100)]
x86/traps: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 16:00:40 +0000 (17:00 +0100)]
x86/HVM: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 15:58:20 +0000 (16:58 +0100)]
x86emul: don't unconditionally clear segment bases upon null selector loads
AMD explicitly documents that namely FS and GS don't have their bases
cleared in that case, and I see no reason why guests may not rely on
that behavior. To facilitate this a new input field (the CPU vendor) is
being added.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 15:57:34 +0000 (16:57 +0100)]
x86emul: some REX related polishing
While there are a few cases where it seems better to open-code REX_*
values, there's one where this clearly is a bad idea. And the SYSEXIT
emulation has no need to look at REX at all, it can simply use op_bytes
instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Praveen Kumar [Wed, 21 Dec 2016 15:53:35 +0000 (16:53 +0100)]
sched: removal of redundant check in Credit
The patch gets rid of a redundant check in csched_vcpu_acct. In fact,
the function is only called from csched_tick, which already checks
that current is not the idle vcpu. The patch also adds an ASSERT to
the same effect, in order to make the assumption (i.e., no calling this
on idle vcpus) even more clear and as a guard for future mis-use.
Jan Beulich [Wed, 21 Dec 2016 15:46:13 +0000 (16:46 +0100)]
x86: force EFLAGS.IF on when exiting to PV guests
Guest kernels modifying instructions in the process of being emulated
for another of their vCPU-s may effect EFLAGS.IF to be cleared upon
next exiting to guest context, by converting the being emulated
instruction to CLI (at the right point in time). Prevent any such bad
effects by always forcing EFLAGS.IF on. And to cover hypothetical other
similar issues, also force EFLAGS.{IOPL,NT,VM} to zero.
This is CVE-2016-10024 / XSA-202.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
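The flag adjustment described above amounts to a mask-and-set, sketched here in C. The real fix lives on the exit-to-guest path and may look quite different; the function name is hypothetical, while the bit positions are the architectural EFLAGS layout.

```c
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define EFLAGS_IF    (UINT32_C(1) << 9)    /* interrupt enable */
#define EFLAGS_IOPL  (UINT32_C(3) << 12)   /* I/O privilege level (2 bits) */
#define EFLAGS_NT    (UINT32_C(1) << 14)   /* nested task */
#define EFLAGS_VM    (UINT32_C(1) << 17)   /* virtual-8086 mode */

/* Force IF on and IOPL/NT/VM off, leaving all other flags untouched. */
uint32_t sanitize_guest_eflags_sketch(uint32_t eflags)
{
    eflags &= ~(EFLAGS_IOPL | EFLAGS_NT | EFLAGS_VM);
    eflags |= EFLAGS_IF;
    return eflags;
}
```

Applying the mask unconditionally means the result no longer depends on what the emulated instruction did to the flags.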
Boris Ostrovsky [Tue, 20 Dec 2016 08:54:38 +0000 (09:54 +0100)]
acpi/x86: define ACPI IO registers for PVH guests
Define VCPU available map address (used by AML's PRSC method)
and GPE0 CPU hotplug event number. Use these definitions in mk_dsdt
instead of hardcoded values.
These definitions will later be used by both the hypervisor and
the toolstack (initially for PVH guests only), thus they are
placed in public headers.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 20 Dec 2016 08:54:12 +0000 (09:54 +0100)]
x86/pmtimer: move ACPI registers from PMTState to hvm_domain
These registers (pm1a specifically) are not all specific to pm timer
and are accessed by non-pmtimer code (for example, sleep/power button
emulation).
The public name for save state structure is kept as 'pmtimer' to avoid
code churn with the expected changes in migration code. hvm_hw_acpi
name is introduced for internal use but when migration code is updated
hvm_hw_pmtimer will be renamed to hvm_hw_acpi.
No functional changes are introduced.
(While this file is being modified, also add emacs mode style rune)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Haozhong Zhang [Tue, 20 Dec 2016 08:53:39 +0000 (09:53 +0100)]
vvmx: replace vmreturn() by vmsucceed() and vmfail*()
Replace vmreturn() by vmsucceed(), vmfail(), vmfail_valid() and
vmfail_invalid(), which are consistent with the pseudo code in the Intel
SDM, and allow returning VM instruction error numbers to the L1
hypervisor.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
c/s 08fac63 misused v->domain->arch.paging.gfn_bits as the width of
guest physical address and missed adding PAGE_SHIFT to it when
checking vmxon operand.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
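The relationship between the two widths can be sketched as follows. This is an illustration under assumptions (4 KiB pages, a hypothetical helper name), not the code changed by the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12   /* 4 KiB pages, as on x86 */

/*
 * gfn_bits counts the bits of a guest frame number; a guest physical
 * address is PAGE_SHIFT bits wider. Returns 1 if addr fits, 0 otherwise.
 */
int gpa_in_range_sketch(uint64_t addr, unsigned int gfn_bits)
{
    unsigned int paddr_bits = gfn_bits + PAGE_SHIFT_SKETCH;

    if (paddr_bits >= 64)      /* shifting by >= 64 would be undefined */
        return 1;
    return (addr >> paddr_bits) == 0;
}
```

With gfn_bits = 28 the address limit is 40 bits; shifting by gfn_bits alone would wrongly reject any address at or above 2^28.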
Jan Beulich [Mon, 19 Dec 2016 16:52:42 +0000 (17:52 +0100)]
x86: fix asm() constraint in clear_user()
Commit 2fdf5b2554 ("x86: streamline copying to/from user memory")
wrongly used "g" here, when it obviously needs to be a register.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 19 Dec 2016 10:49:20 +0000 (11:49 +0100)]
x86/SMP: CPU0's scratch mask is needed earlier
When putting together commit 3b61726458 ("x86: introduce and use
scratch CPU mask") I failed to remember that AMD IOMMU setups need the
scratch mask prior to smp_prepare_cpus() having run. Use a static mask
for the boot CPU instead.
Note that the definition of scratch_cpu0mask could also be put inside a
"NR_CPUS > 2 * BITS_PER_LONG" conditional, but it seems preferable to
me to carry the extra variable in all cases and avoid the #ifdef-ary.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Since VMIDs are related to 2nd stage address translation, it makes more sense
to move the call to p2m_vmid_allocator_init(), which initializes the vmid
allocation bitmap, inside setup_virt_paging(), where 2nd stage address translation
is set up.
Wei Liu [Fri, 16 Dec 2016 15:51:33 +0000 (15:51 +0000)]
libxl: set rc to 0 in init_acpi_config in success path
xc_domain_getinfo returns >=0 in success path, and if there is no vnode
configured, that rc will be returned to caller, which indicates error.
Fix that by setting rc to 0 in success path.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Wed, 14 Dec 2016 11:05:18 +0000 (11:05 +0000)]
x86/emul: Simplify L{ES,DS,SS,FS,GS} handling
%ss, %fs and %gs can be calculated by directly masking the opcode. %es and
%ds can't, but the calculation isn't hard.
Use seg rather than dst.val for storing the calculated segment, which is
appropriately typed. Drop the sel local variable entirely and use dst.val
instead. The mode_64bit() check can be repositioned and simplified to drop
the ext check. Replace opencoding of X86EMUL_OKAY.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
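The opcode-to-segment calculation can be sketched as below. The function name and enum are hypothetical; the opcode bytes (0xc4 LES, 0xc5 LDS, and 0x0f followed by 0xb2 LSS, 0xb4 LFS, 0xb5 LGS) and the segment encodings are architectural.

```c
/* Stand-ins for x86_seg_*; values are the hardware segment encodings. */
enum { SEG_ES = 0, SEG_CS = 1, SEG_SS = 2, SEG_DS = 3, SEG_FS = 4, SEG_GS = 5 };

/*
 * Map the final opcode byte of a far-pointer load to its segment.
 * Returns -1 for any byte that is not one of the five opcodes.
 */
int far_load_seg_sketch(unsigned char b)
{
    switch (b) {
    case 0xc4: case 0xc5:
        return (b & 1) * 3;   /* es = 0, ds = 3 */
    case 0xb2: case 0xb4: case 0xb5:
        return b & 7;         /* ss = 2, fs = 4, gs = 5 */
    default:
        return -1;
    }
}
```

The low three bits of 0xb2, 0xb4 and 0xb5 already equal the segment encodings, which is why a plain mask suffices for those three.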
Jan Beulich [Fri, 16 Dec 2016 13:38:29 +0000 (14:38 +0100)]
x86/HVM: handle_{mmio*,pio}() return value adjustments
Don't ignore their return values. Don't indicate success to callers of
handle_pio() when in fact the domain has been crashed.
Make all three functions return bool. Adjust formatting of switch()
statements being touched anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:37:35 +0000 (14:37 +0100)]
x86/boot: fix build with certain older gcc versions
Despite all attempts so far (ending in commit fecf584294 ["Config.mk:
fix comment for debug option"] adjusting the respective comment),
Config.mk's debug= setting still affects the hypervisor build: CFLAGS
gets -g added there.
xen/arch/x86/boot/build32.mk includes that file, and hence inherits the
setting too. Some gcc versions take -g to create an .eh_frame section
despite -fno-asynchronous-unwind-tables (which instead one would expect
to produce .debug_frame).
In turn, commit 93c0c0287a ("x86/boot: create *.lnk files with linker
script") was - in my understanding - supposed to make sure .text is
first, but apparently it did also not really achieve that effect: Both
reloc.lnk and reloc.bin in the case here ended up with .eh_frame first,
which obviously rendered the whole final binary unusable.
Explicitly suppress generation of any kind of debug info when building
reloc.o.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:36:36 +0000 (14:36 +0100)]
x86emul: reduce CMPXCHG{8,16}B footprint and casting
Re-use an existing stack variable (reducing stack footprint, which also
results in smaller code due to some stack accesses no longer needing a
32-bit displacement), at once using a union instead of casts. Also
switch to rex_prefix based conditionals instead of op_bytes ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:34:34 +0000 (14:34 +0100)]
x86: introduce and use scratch CPU mask
__get_page_type(), so far using an on-stack CPU mask variable, is
involved in recursion when e.g. pinning page tables. This means there
may be up to five instances of the function active at a time, implying
five instances of the (up to 512 bytes large) CPU mask variable. An IRQ
happening at the deepest point of the stack has been observed to cause
a stack overflow with a 4095-pCPU build, when the IRQ handling results
in send_guest_pirq() being called (leading to vcpu_kick() -> ... ->
csched_vcpu_wake() -> __runq_tickle() -> cpumask_raise_softirq(), the
last two of which also have CPU mask variables on their stacks).
Introduce a per-CPU variable instead, which can then be used by any
code never running in IRQ context.
The mask can then also be used by other MMU code as well as by
msi_compose_msg() (and quite likely we'll find further uses down the
road).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
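The stack arithmetic behind the "up to 512 bytes" figure can be checked with a small sketch. It assumes the mask is stored as an array of 64-bit words; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for an nr_cpus-bit mask stored as 64-bit words. */
size_t cpumask_bytes_sketch(unsigned int nr_cpus)
{
    size_t words = ((size_t)nr_cpus + 63) / 64;   /* round up to whole words */

    return words * sizeof(uint64_t);
}

/* Stack consumed by `depth` nested frames each holding one such mask. */
size_t nested_mask_stack_sketch(unsigned int nr_cpus, unsigned int depth)
{
    return cpumask_bytes_sketch(nr_cpus) * depth;
}
```

For a 4095-pCPU build this gives 512 bytes per mask and 2560 bytes for five nested instances, before counting the masks in the IRQ path.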
Jan Beulich [Fri, 16 Dec 2016 13:33:43 +0000 (14:33 +0100)]
VT-d: correct dma_msi_set_affinity()
Commit 83cd2038fe ("VT-d: use msi_compose_msg()") together with 15aa6c6748 ("amd iommu: use base platform MSI implementation"),
introducing the use of a per-CPU scratch CPU mask, went too far:
dma_msi_set_affinity() may, at least in theory, be called in
interrupt context, and hence the use of that scratch variable is not
correct.
Since the function overwrites the destination information anyway,
allow msi_compose_msg() to be called with a NULL CPU mask, avoiding
the use of that scratch variable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:32:51 +0000 (14:32 +0100)]
x86: streamline copying to/from user memory
Their size parameters being "unsigned", there's neither a point for
them returning "unsigned long", nor for any of their (assembly)
arithmetic to involve 64-bit operations on other than addresses.
Take the opportunity and fold __do_clear_user() into its single user
(using qword stores instead of dword ones), name all asm() operands,
and reduce the amount of (redundant) operands.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anshul Makkar [Mon, 12 Dec 2016 14:00:05 +0000 (14:00 +0000)]
xsm: allow relevant permission during migrate and gpu-passthrough.
During guest migrate allow permission to prevent
spurious page faults.
Prevents these errors:
d73: Non-privileged (73) attempt to map I/O space 00000000
Haozhong Zhang [Thu, 15 Dec 2016 10:12:34 +0000 (11:12 +0100)]
nestedhvm: replace VMCX_EADDR by INVALID_PADDR
... because INVALID_PADDR is a more general one.
Suggested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Thu, 15 Dec 2016 10:12:06 +0000 (11:12 +0100)]
vvmx: check the operand of L1 vmxon
Check whether the operand of L1 vmxon is a valid VMXON region address
and whether the VMXON region at that address contains a valid revision
ID.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Haozhong Zhang [Thu, 15 Dec 2016 10:11:45 +0000 (11:11 +0100)]
vvmx: return VMfail to L1 if L1 vmxon is executed in VMX operation
According to Intel SDM, section "VMXON - Enter VMX Operation", a
VMfail should be returned to L1 hypervisor if L1 vmxon is executed in
VMX operation, rather than just print a warning message.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Haozhong Zhang [Thu, 15 Dec 2016 10:11:20 +0000 (11:11 +0100)]
vvmx: set vmxon_region_pa of vcpu out of VMX operation to an invalid address
nvmx_handle_vmxon() previously checks whether a vcpu is in VMX
operation by comparing its vmxon_region_pa with GPA 0. However, 0 is
also a valid VMXON region address. If L1 hypervisor had set the VMXON
region address to 0, the check in nvmx_handle_vmxon() will be skipped.
Fix this problem by using an invalid VMXON region address for vcpu
out of VMX operation.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
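The difference between the two checks can be sketched as follows. The names are hypothetical, and the all-ones sentinel is an assumption made for this illustration.

```c
#include <stdint.h>

/* Sentinel assumed never to be a real guest physical address. */
#define INVALID_PADDR_SKETCH (~UINT64_C(0))

/* Old-style check: treats address 0 as "not in VMX operation". */
int in_vmx_operation_zero_check(uint64_t vmxon_region_pa)
{
    return vmxon_region_pa != 0;
}

/* New-style check: only the sentinel means "not in VMX operation". */
int in_vmx_operation_sentinel_check(uint64_t vmxon_region_pa)
{
    return vmxon_region_pa != INVALID_PADDR_SKETCH;
}
```

With a VMXON region at address 0, the zero-based check reports the vCPU as outside VMX operation while the sentinel-based check correctly reports it as inside.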
Razvan Cojocaru [Thu, 15 Dec 2016 10:09:03 +0000 (11:09 +0100)]
x86/vm_event: add support for VM_EVENT_REASON_INTERRUPT
Added support for a new event type, VM_EVENT_REASON_INTERRUPT,
which is now fired in a one-shot manner when enabled via the new
VM_EVENT_FLAG_GET_NEXT_INTERRUPT vm_event response flag.
The patch also fixes the behaviour of the xc_hvm_inject_trap()
hypercall, which would lead to non-architectural interrupts
overwriting pending (specifically reinjected) architectural ones.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Thu, 15 Dec 2016 10:07:55 +0000 (11:07 +0100)]
x86/HVM: introduce hvm_get_cpl() and respective hook
... instead of repeating the same code in various places (and getting
it wrong in some of them).
In vmx_inst_check_privilege() also stop open coding
vmx_guest_x86_mode().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Tim Deegan <tim@xen.org>
Ross Lagerwall [Wed, 14 Dec 2016 07:52:00 +0000 (07:52 +0000)]
tools/livepatch: Exit with 2 if a timeout occurs
Exit with 0 for success.
Exit with 1 for an error.
Exit with 2 if the operation should be retried for any reason (e.g. a
timeout or because another operation was in progress).
This allows a program or script driving xen-livepatch to determine if
the operation should be retried without parsing the output.
Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:59 +0000 (07:51 +0000)]
tools/livepatch: Save errno where needed
Fix a number of incorrect uses of errno after an operation that could
set it (e.g. fprintf, close).
Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
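The general pattern is to copy errno before calling anything that might change it. A minimal sketch, with hypothetical function names that are not taken from xen-livepatch:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Report a failure and return the errno value that described it.
 * errno is copied before fprintf() runs, because fprintf() may itself
 * modify errno.
 */
int report_failure_sketch(const char *what)
{
    int saved_errno = errno;

    fprintf(stderr, "%s failed: %s\n", what, strerror(saved_errno));
    return saved_errno;
}

/* Demonstration helper: simulate a failing call, then report it. */
int simulate_failure_sketch(int err)
{
    errno = err;
    return report_failure_sketch("operation");
}
```

Returning the saved copy rather than re-reading errno keeps the reported code tied to the original failure.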
Ross Lagerwall [Wed, 14 Dec 2016 07:51:58 +0000 (07:51 +0000)]
tools/livepatch: Remove unused struct member
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:57 +0000 (07:51 +0000)]
tools/livepatch: Remove pointless retry loop
The default timeout in the hypervisor for a livepatch operation is 30 ms,
but xen-livepatch currently waits for up to 30 seconds for the operation
to complete. Instead, remove the retry loop and simply wait for 2 * 30 ms
for the operation to complete. The extra period is to account for the
time to actually start the operation.
Furthermore, have xen-livepatch set the hypervisor timeout rather than
relying on the hypervisor default since the tool doesn't know how long
it will be. Use nanosleep rather than usleep since usleep has been
removed from POSIX.1-2008.
Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
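A wait of 2 * 30 ms with nanosleep() might look like the sketch below. The helper names are hypothetical, and resuming after EINTR is an addition made for this illustration rather than something the commit message states.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Convert a millisecond count to a struct timespec. */
struct timespec ms_to_timespec_sketch(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/*
 * Sleep for ms milliseconds, resuming with the remaining time if a
 * signal interrupts the sleep. Returns 0 on success, -1 on error.
 */
int sleep_ms_sketch(long ms)
{
    struct timespec req = ms_to_timespec_sketch(ms), rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Calling `sleep_ms_sketch(2 * 30)` then covers the hypervisor timeout plus the extra period allowed for starting the operation.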
Ross Lagerwall [Wed, 14 Dec 2016 07:51:56 +0000 (07:51 +0000)]
livepatch: Fix documentation of timeout
The hypervisor expects the timeout from the hypercall to be in
nanoseconds, so document this correctly. Also correctly document
what happens when timeout is set to zero.
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:55 +0000 (07:51 +0000)]
tools/livepatch: Improve output
Improve the output of xen-livepatch, which is currently hard to read,
especially when an error occurs.
Some examples of the changes:
Before:
$ xen-livepatch apply test
Performing apply:. completed
After:
$ xen-livepatch apply test
Applying test:. completed
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:54 +0000 (07:51 +0000)]
tools/livepatch: Set stdout and stderr unbuffered
Using both stdout and stderr interleaved without newlines can result in
strange output when using line buffered mode (e.g. a terminal) or when
fully buffered (e.g. redirected to a file). Set stdout to unbuffered mode
to fix this (stderr is always unbuffered by default).
Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:53 +0000 (07:51 +0000)]
tools/livepatch: Show the correct expected state before action
Somewhat confusingly, before the action has been executed the patch is
expected to be in the "allow" state, not the "expected" state. The
check for this was correct but the subsequent error message was not.
Fix the error message to show this state correctly.
Before:
$ xen-livepatch unload test
test: in wrong state (APPLIED), expected (unknown)
After:
$ xen-livepatch unload test
test: in wrong state (APPLIED), expected (CHECKED)
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Andrew Cooper [Wed, 14 Dec 2016 11:33:17 +0000 (11:33 +0000)]
x86/traps: Correct pagefault handling issues introduced in c/s d5c251c
There are two bugs.
Firstly, the ASSERT(paging_mode_only_log_dirty(d)) can trip when servicing a
hypervisor #PF in the context of an HVM guest, e.g. a copy_to_user() failure
in the shadow pagetable code.
Secondly, the entry conditions for paging_fault() were previously guarded on
!paging_mode_external(d) which limited entry to PV contexts, but for both
guest and hypervisor faults. Switching this to paging_mode_log_dirty() opened
it up to HVM contexts as well.
Reinstate the old !paging_mode_external(d) check, as it is actually the
relevant fact, and extend the comment to explicitly state that hypervisor
faults should follow this path.
Inside, we are now guaranteed to be in the context of a PV guest, so can
safely use the assertion about log dirty.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>