Roger Pau Monne [Mon, 19 Dec 2016 15:02:02 +0000 (15:02 +0000)]
init/FreeBSD: remove xendriverdomain_precmd
...because it's empty. While there also rename xendriverdomain_startcmd to
xendriverdomain_start in order to match the nomenclature of the file.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: fix up minor error ] Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 19 Dec 2016 15:02:01 +0000 (15:02 +0000)]
init/FreeBSD: set correct PATH for xl devd
FreeBSD init scripts don't have /usr/local/{bin,sbin} in their PATH, which
prevents `xl devd` from working properly since hotplug scripts require the set
of xenstore cli tools to be in PATH.
While there also fix the usage of --pidfile, which according to the xl help
doesn't use "=", and add braces around XLDEVD_PIDFILE.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 16:01:58 +0000 (17:01 +0100)]
x86/misc: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 16:01:34 +0000 (17:01 +0100)]
x86/traps: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 16:00:40 +0000 (17:00 +0100)]
x86/HVM: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 15:58:20 +0000 (16:58 +0100)]
x86emul: don't unconditionally clear segment bases upon null selector loads
AMD explicitly documents that namely FS and GS don't have their bases
cleared in that case, and I see no reason why guests may not rely on
that behavior. To facilitate this a new input field (the CPU vendor) is
being added.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 21 Dec 2016 15:57:34 +0000 (16:57 +0100)]
x86emul: some REX related polishing
While there are a few cases where it seems better to open-code REX_*
values, there's one where this clearly is a bad idea. And the SYSEXIT
emulation has no need to look at REX at all, it can simply use op_bytes
instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Praveen Kumar [Wed, 21 Dec 2016 15:53:35 +0000 (16:53 +0100)]
sched: removal of redundant check in Credit
The patch gets rid of a redundant check in csched_vcpu_acct. In fact,
the function is only called from csched_tick, which already checks
that current is not the idle vcpu. The patch also adds an ASSERT to
the same effect, in order to make the assumption (i.e., no calling this
on idle vcpus) even more clear and as a guard against future misuse.
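The pattern described above can be sketched in plain C as follows. This is only an illustration: struct vcpu, tick() and vcpu_acct() are hypothetical stand-ins, not the Credit scheduler's actual code, and assert() stands in for Xen's ASSERT().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler's per-vcpu state. */
struct vcpu {
    bool is_idle;
    unsigned int credit_ticks;
};

/* Callee: relies on the caller's check and documents that with an assertion. */
static void vcpu_acct(struct vcpu *v)
{
    assert(!v->is_idle);   /* guard against future misuse */
    v->credit_ticks++;
}

/* Caller: the only place where the idle check is performed. */
static void tick(struct vcpu *curr)
{
    if (!curr->is_idle)
        vcpu_acct(curr);
}
```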
Jan Beulich [Wed, 21 Dec 2016 15:46:13 +0000 (16:46 +0100)]
x86: force EFLAGS.IF on when exiting to PV guests
Guest kernels modifying instructions in the process of being emulated
for another of their vCPU-s may effect EFLAGS.IF to be cleared upon
next exiting to guest context, by converting the being emulated
instruction to CLI (at the right point in time). Prevent any such bad
effects by always forcing EFLAGS.IF on. And to cover hypothetical other
similar issues, also force EFLAGS.{IOPL,NT,VM} to zero.
This is CVE-2016-10024 / XSA-202.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
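A minimal sketch of the flag sanitising described above, assuming a helper that is applied to the flags value restored on exit to the guest. The bit positions are the architectural EFLAGS ones; the helper name is hypothetical and the real change lives in the exit path rather than in a C function like this.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bits. */
#define X86_EFLAGS_IF   0x00000200u  /* interrupt enable */
#define X86_EFLAGS_IOPL 0x00003000u  /* I/O privilege level (2 bits) */
#define X86_EFLAGS_NT   0x00004000u  /* nested task */
#define X86_EFLAGS_VM   0x00020000u  /* virtual-8086 mode */

/* Hypothetical helper: force IF on and IOPL/NT/VM off. */
static uint64_t sanitise_guest_eflags(uint64_t eflags)
{
    eflags &= ~(uint64_t)(X86_EFLAGS_IOPL | X86_EFLAGS_NT | X86_EFLAGS_VM);
    eflags |= X86_EFLAGS_IF;
    return eflags;
}
```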
Boris Ostrovsky [Tue, 20 Dec 2016 08:54:38 +0000 (09:54 +0100)]
acpi/x86: define ACPI IO registers for PVH guests
Define VCPU available map address (used by AML's PRSC method)
and GPE0 CPU hotplug event number. Use these definitions in mk_dsdt
instead of hardcoded values.
These definitions will later be used by both the hypervisor and
the toolstack (initially for PVH guests only), thus they are
placed in public headers.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 20 Dec 2016 08:54:12 +0000 (09:54 +0100)]
x86/pmtimer: move ACPI registers from PMTState to hvm_domain
These registers (pm1a specifically) are not all specific to pm timer
and are accessed by non-pmtimer code (for example, sleep/power button
emulation).
The public name for save state structure is kept as 'pmtimer' to avoid
code churn with the expected changes in migration code. hvm_hw_acpi
name is introduced for internal use but when migration code is updated
hvm_hw_pmtimer will be renamed to hvm_hw_acpi.
No functional changes are introduced.
(While this file is being modified, also add emacs mode style rune)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Haozhong Zhang [Tue, 20 Dec 2016 08:53:39 +0000 (09:53 +0100)]
vvmx: replace vmreturn() by vmsucceed() and vmfail*()
Replace vmreturn() by vmsucceed(), vmfail(), vmfail_valid() and
vmfail_invalid(), which are consistent with the pseudo code in the Intel
SDM, and allow returning VM instruction error numbers to the L1
hypervisor.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
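For reference, the SDM conventions these helpers follow can be modelled as below. This is a sketch under assumptions: struct l1_state and its fields are hypothetical, and the functions only mirror the SDM's VMsucceed / VMfailInvalid / VMfailValid / VMfail pseudo code, not the signatures used in vvmx.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_CF 0x001u
#define FLAG_PF 0x004u
#define FLAG_AF 0x010u
#define FLAG_ZF 0x040u
#define FLAG_SF 0x080u
#define FLAG_OF 0x800u
#define FLAG_ARITH (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF)

/* Hypothetical model of the L1 state a VMX instruction reports into. */
struct l1_state {
    uint32_t eflags;
    bool has_current_vmcs;
    uint32_t vm_instruction_error;
};

/* VMsucceed: all six arithmetic flags cleared. */
static void vmsucceed(struct l1_state *s)
{
    s->eflags &= ~FLAG_ARITH;
}

/* VMfailInvalid: CF set, the other arithmetic flags cleared. */
static void vmfail_invalid(struct l1_state *s)
{
    s->eflags = (s->eflags & ~FLAG_ARITH) | FLAG_CF;
}

/* VMfailValid: ZF set, others cleared, error number recorded. */
static void vmfail_valid(struct l1_state *s, uint32_t err)
{
    s->eflags = (s->eflags & ~FLAG_ARITH) | FLAG_ZF;
    s->vm_instruction_error = err;
}

/* VMfail: picks the variant depending on whether a current VMCS exists. */
static void vmfail(struct l1_state *s, uint32_t err)
{
    if (s->has_current_vmcs)
        vmfail_valid(s, err);
    else
        vmfail_invalid(s);
}
```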
c/s 08fac63 misused v->domain->arch.paging.gfn_bits as the width of
guest physical address and missed adding PAGE_SHIFT to it when
checking vmxon operand.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
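The distinction matters because gfn_bits counts frame-number bits, so the address width is PAGE_SHIFT bits wider. A rough illustration, with a hypothetical helper name and 4 KiB pages assumed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages assumed */

/* Hypothetical check: does gpa fit in the guest physical address width? */
static bool gpa_within_width(uint64_t gpa, unsigned int gfn_bits)
{
    /* Width of an address = width of a frame number + page offset bits. */
    unsigned int paddr_bits = gfn_bits + PAGE_SHIFT;

    return paddr_bits >= 64 || (gpa >> paddr_bits) == 0;
}
```

With gfn_bits = 28 the address width is 40 bits, so an address such as 1 << 30 is valid even though it would fail a check against gfn_bits alone.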
Jan Beulich [Mon, 19 Dec 2016 16:52:42 +0000 (17:52 +0100)]
x86: fix asm() constraint in clear_user()
Commit 2fdf5b2554 ("x86: streamline copying to/from user memory")
wrongly used "g" here, when it obviously needs to be a register.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 19 Dec 2016 10:49:20 +0000 (11:49 +0100)]
x86/SMP: CPU0's scratch mask is needed earlier
When putting together commit 3b61726458 ("x86: introduce and use
scratch CPU mask") I failed to remember that AMD IOMMU setup needs the
scratch mask prior to smp_prepare_cpus() having run. Use a static mask
for the boot CPU instead.
Note that the definition of scratch_cpu0mask could also be put inside a
"NR_CPUS > 2 * BITS_PER_LONG" conditional, but it seems preferable to
me to carry the extra variable in all cases and avoid the #ifdef-ary.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Since VMIDs are related to 2nd stage address translation, it makes more sense
to move the call to p2m_vmid_allocator_init(), which initializes the vmid
allocation bitmap, inside setup_virt_paging(), where 2nd stage address translation
is set up.
Wei Liu [Fri, 16 Dec 2016 15:51:33 +0000 (15:51 +0000)]
libxl: set rc to 0 in init_acpi_config in success path
xc_domain_getinfo returns >=0 in success path, and if there is no vnode
configured, that rc will be returned to caller, which indicates error.
Fix that by setting rc to 0 in success path.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
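The shape of the bug and the fix, reduced to a sketch. query_info() and init_config() are hypothetical stand-ins and do not reproduce the libxl or libxc signatures.

```c
#include <assert.h>

/* Hypothetical query: returns a non-negative count on success, <0 on error. */
static int query_info(int available)
{
    return available ? 1 : -1;
}

static int init_config(int available, int vnodes)
{
    int rc = query_info(available);

    if (rc < 0)
        return rc;          /* genuine failure */

    if (vnodes > 0) {
        /* ... per-vnode setup would go here and set rc itself ... */
    }

    rc = 0;                 /* success path: don't leak the positive count */
    return rc;
}
```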
Andrew Cooper [Wed, 14 Dec 2016 11:05:18 +0000 (11:05 +0000)]
x86/emul: Simplify L{ES,DS,SS,FS,GS} handling
%ss, %fs and %gs can be calculated by directly masking the opcode. %es and
%ds can't, but the calculation isn't hard.
Use seg rather than dst.val for storing the calculated segment, which is
appropriately typed. Drop the sel local variable entirely and use dst.val
instead. The mode_64bit() check can be repositioned and simplified to drop
the ext check. Replace opencoding of X86EMUL_OKAY.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
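A sketch of the calculation, assuming the hardware segment numbering (ES=0, CS=1, SS=2, DS=3, FS=4, GS=5) and the architectural opcodes LES=C4, LDS=C5, LSS=0F B2, LFS=0F B4, LGS=0F B5. The enum and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/* b is the final opcode byte; two_byte is true for the 0F-prefixed forms. */
static enum seg seg_from_opcode(bool two_byte, uint8_t b)
{
    if (two_byte)
        return (enum seg)(b & 7);    /* B2 -> SS, B4 -> FS, B5 -> GS */

    return (enum seg)((b & 1) * 3);  /* C4 -> ES, C5 -> DS */
}
```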
Jan Beulich [Fri, 16 Dec 2016 13:38:29 +0000 (14:38 +0100)]
x86/HVM: handle_{mmio*,pio}() return value adjustments
Don't ignore their return values. Don't indicate success to callers of
handle_pio() when in fact the domain has been crashed.
Make all three functions return bool. Adjust formatting of switch()
statements being touched anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:37:35 +0000 (14:37 +0100)]
x86/boot: fix build with certain older gcc versions
Despite all attempts so far (ending in commit fecf584294 ["Config.mk:
fix comment for debug option"] adjusting the respective comment),
Config.mk's debug= setting still affects the hypervisor build: CFLAGS
gets -g added there.
xen/arch/x86/boot/build32.mk includes that file, and hence inherits the
setting too. Some gcc versions take -g to create an .eh_frame section
despite -fno-asynchronous-unwind-tables (which instead one would expect
to produce .debug_frame).
In turn, commit 93c0c0287a ("x86/boot: create *.lnk files with linker
script") was - in my understanding - supposed to make sure .text is
first, but apparently it did also not really achieve that effect: Both
reloc.lnk and reloc.bin in the case here ended up with .eh_frame first,
which obviously rendered the whole final binary unusable.
Explicitly suppress generation of any kind of debug info when building
reloc.o.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:36:36 +0000 (14:36 +0100)]
x86emul: reduce CMPXCHG{8,16}B footprint and casting
Re-use an existing stack variable (reducing stack footprint, which also
results in smaller code due to some stack accesses no longer needing a
32-bit displacement), at once using a union instead of casts. Also
switch to rex_prefix based conditionals instead of op_bytes ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:34:34 +0000 (14:34 +0100)]
x86: introduce and use scratch CPU mask
__get_page_type(), so far using an on-stack CPU mask variable, is
involved in recursion when e.g. pinning page tables. This means there
may be up to five instances of the function active at a time, implying
five instances of the (up to 512 bytes large) CPU mask variable. An IRQ
happening at the deepest point of the stack has been observed to cause
a stack overflow with a 4095-pCPU build, when the IRQ handling results
in send_guest_pirq() being called (leading to vcpu_kick() -> ... ->
csched_vcpu_wake() -> __runq_tickle() -> cpumask_raise_softirq(), the
last two of which also have CPU mask variables on their stacks).
Introduce a per-CPU variable instead, which can then be used by any
code never running in IRQ context.
The mask can then also be used by other MMU code as well as by
msi_compose_msg() (and quite likely we'll find further uses down the
road).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
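The stack cost quoted above can be reproduced with a small calculation. The type below is a simplified stand-in for a CPU mask using 64-bit words, not Xen's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS       4095
#define BITS_PER_WORD 64
#define MASK_WORDS    ((NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Simplified stand-in for a CPU mask: one bit per possible CPU. */
typedef struct { uint64_t bits[MASK_WORDS]; } cpumask_sketch_t;

/* Stack bytes consumed by on-stack masks at a given recursion depth. */
static size_t mask_stack_bytes(unsigned int depth)
{
    return depth * sizeof(cpumask_sketch_t);
}
```

For a 4095-CPU build each mask is 512 bytes, so five nested instances take 2560 bytes before any other locals are counted.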
Jan Beulich [Fri, 16 Dec 2016 13:33:43 +0000 (14:33 +0100)]
VT-d: correct dma_msi_set_affinity()
Commit 83cd2038fe ("VT-d: use msi_compose_msg()") together with 15aa6c6748 ("amd iommu: use base platform MSI implementation"),
introducing the use of a per-CPU scratch CPU mask, went too far:
dma_msi_set_affinity() may, at least in theory, be called in
interrupt context, and hence the use of that scratch variable is not
correct.
Since the function overwrites the destination information anyway,
allow msi_compose_msg() to be called with a NULL CPU mask, avoiding
the use of that scratch variable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 16 Dec 2016 13:32:51 +0000 (14:32 +0100)]
x86: streamline copying to/from user memory
Their size parameters being "unsigned", there's neither a point for
them returning "unsigned long", nor for any of their (assembly)
arithmetic to involve 64-bit operations on anything other than addresses.
Take the opportunity and fold __do_clear_user() into its single user
(using qword stores instead of dword ones), name all asm() operands,
and reduce the amount of (redundant) operands.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anshul Makkar [Mon, 12 Dec 2016 14:00:05 +0000 (14:00 +0000)]
xsm: allow relevant permission during migrate and gpu-passthrough.
During guest migrate allow permission to prevent
spurious page faults.
Prevents these errors:
d73: Non-privileged (73) attempt to map I/O space 00000000
Haozhong Zhang [Thu, 15 Dec 2016 10:12:34 +0000 (11:12 +0100)]
nestedhvm: replace VMCX_EADDR by INVALID_PADDR
... because INVALID_PADDR is a more general one.
Suggested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Thu, 15 Dec 2016 10:12:06 +0000 (11:12 +0100)]
vvmx: check the operand of L1 vmxon
Check whether the operand of L1 vmxon is a valid VMXON region address
and whether the VMXON region at that address contains a valid revision
ID.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Haozhong Zhang [Thu, 15 Dec 2016 10:11:45 +0000 (11:11 +0100)]
vvmx: return VMfail to L1 if L1 vmxon is executed in VMX operation
According to Intel SDM, section "VMXON - Enter VMX Operation", a
VMfail should be returned to L1 hypervisor if L1 vmxon is executed in
VMX operation, rather than just print a warning message.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Haozhong Zhang [Thu, 15 Dec 2016 10:11:20 +0000 (11:11 +0100)]
vvmx: set vmxon_region_pa of vcpu out of VMX operation to an invalid address
nvmx_handle_vmxon() previously checks whether a vcpu is in VMX
operation by comparing its vmxon_region_pa with GPA 0. However, 0 is
also a valid VMXON region address. If L1 hypervisor had set the VMXON
region address to 0, the check in nvmx_handle_vmxon() will be skipped.
Fix this problem by using an invalid VMXON region address for vcpu
out of VMX operation.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
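A sketch of the sentinel approach, assuming an all-ones value plays the role of the invalid address. The struct and function names are hypothetical and only loosely modelled on the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_PADDR (~(uint64_t)0)  /* assumed sentinel, never a valid GPA */

struct nvmx_state { uint64_t vmxon_region_pa; };

static void nvmx_reset(struct nvmx_state *n)
{
    n->vmxon_region_pa = INVALID_PADDR;   /* out of VMX operation */
}

static bool in_vmx_operation(const struct nvmx_state *n)
{
    return n->vmxon_region_pa != INVALID_PADDR;
}

/* Returns 0 on success, -1 if already in VMX operation. */
static int handle_vmxon(struct nvmx_state *n, uint64_t gpa)
{
    if (in_vmx_operation(n))
        return -1;
    n->vmxon_region_pa = gpa;   /* GPA 0 is accepted like any other */
    return 0;
}
```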
Razvan Cojocaru [Thu, 15 Dec 2016 10:09:03 +0000 (11:09 +0100)]
x86/vm_event: add support for VM_EVENT_REASON_INTERRUPT
Added support for a new event type, VM_EVENT_REASON_INTERRUPT,
which is now fired in a one-shot manner when enabled via the new
VM_EVENT_FLAG_GET_NEXT_INTERRUPT vm_event response flag.
The patch also fixes the behaviour of the xc_hvm_inject_trap()
hypercall, which would lead to non-architectural interrupts
overwriting pending (specifically reinjected) architectural ones.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Thu, 15 Dec 2016 10:07:55 +0000 (11:07 +0100)]
x86/HVM: introduce hvm_get_cpl() and respective hook
... instead of repeating the same code in various places (and getting
it wrong in some of them).
In vmx_inst_check_privilege() also stop open coding
vmx_guest_x86_mode().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Tim Deegan <tim@xen.org>
Ross Lagerwall [Wed, 14 Dec 2016 07:52:00 +0000 (07:52 +0000)]
tools/livepatch: Exit with 2 if a timeout occurs
Exit with 0 for success.
Exit with 1 for an error.
Exit with 2 if the operation should be retried for any reason (e.g. a
timeout or because another operation was in progress).
This allows a program or script driving xen-livepatch to determine if
the operation should be retried without parsing the output.
Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
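One way a tool could map an error to these exit codes is sketched below. Which errno values count as retryable is an assumption made for illustration; the commit message only names a timeout and another operation being in progress.

```c
#include <assert.h>
#include <errno.h>

enum { EXIT_OK = 0, EXIT_ERROR = 1, EXIT_RETRY = 2 };

/* Assumption: EAGAIN, EBUSY and ETIMEDOUT are treated as "try again". */
static int exit_code_for(int err)
{
    switch (err) {
    case 0:
        return EXIT_OK;
    case EAGAIN:
    case EBUSY:
    case ETIMEDOUT:
        return EXIT_RETRY;
    default:
        return EXIT_ERROR;
    }
}
```

A driving script can then branch on the exit status alone instead of parsing the tool's output.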
Ross Lagerwall [Wed, 14 Dec 2016 07:51:59 +0000 (07:51 +0000)]
tools/livepatch: Save errno where needed
Fix a number of incorrect uses of errno after an operation that could
set it (e.g. fprintf, close).
Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
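The general pattern being fixed looks like this; failing_op() and run_and_report() are hypothetical names, not functions from xen-livepatch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical operation that fails and sets errno. */
static int failing_op(void)
{
    errno = ENOENT;
    return -1;
}

/* Returns 0 on success, or the errno value of the failed operation. */
static int run_and_report(FILE *log)
{
    if (failing_op() < 0) {
        int saved = errno;   /* capture before any call that may clobber it */

        fprintf(log, "operation failed\n");
        return saved;
    }
    return 0;
}
```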
Ross Lagerwall [Wed, 14 Dec 2016 07:51:58 +0000 (07:51 +0000)]
tools/livepatch: Remove unused struct member
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:57 +0000 (07:51 +0000)]
tools/livepatch: Remove pointless retry loop
The default timeout in the hypervisor for a livepatch operation is 30 ms,
but xen-livepatch currently waits for up to 30 seconds for the operation
to complete. Instead, remove the retry loop and simply wait for 2 * 30 ms
for the operation to complete. The extra period is to account for the
time to actually start the operation.
Furthermore, have xen-livepatch set the hypervisor timeout rather than
relying on the hypervisor default since the tool doesn't know how long
it will be. Use nanosleep rather than usleep since usleep has been
removed from POSIX.1-2008.
Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
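A minimal sketch of a millisecond sleep built on nanosleep(), as a replacement for usleep(). The helper name is hypothetical, and restarting after EINTR is a design choice of this sketch rather than something the commit message states.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for ms milliseconds; returns 0 on success, -1 on error. */
static int sleep_ms(long ms)
{
    struct timespec req = {
        .tv_sec  = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };
    struct timespec rem;

    while (nanosleep(&req, &rem) < 0) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* interrupted by a signal: sleep for the remainder */
    }
    return 0;
}
```

Waiting for twice a 30 ms timeout would then be sleep_ms(2 * 30).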
Ross Lagerwall [Wed, 14 Dec 2016 07:51:56 +0000 (07:51 +0000)]
livepatch: Fix documentation of timeout
The hypervisor expects the timeout from the hypercall to be in
nanoseconds, so document this correctly. Also correctly document
what happens when timeout is set to zero.
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:55 +0000 (07:51 +0000)]
tools/livepatch: Improve output
Improve the output of xen-livepatch, which is currently hard to read,
especially when an error occurs.
Some examples of the changes:
Before:
$ xen-livepatch apply test
Performing apply:. completed
After:
$ xen-livepatch apply test
Applying test:. completed
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Ross Lagerwall [Wed, 14 Dec 2016 07:51:54 +0000 (07:51 +0000)]
tools/livepatch: Set stdout and stderr unbuffered
Using both stdout and stderr interleaved without newlines can result in
strange output when using line buffered mode (e.g. a terminal) or when
fully buffered (e.g. redirected to a file). Set stdout to unbuffered mode
to fix this (stderr is always unbuffered by default).
Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
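In standard C this is done with setvbuf(), which must be called before any other operation on the stream. The wrapper name below is only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Switch a stream to unbuffered mode; returns 0 on success. */
static int make_unbuffered(FILE *f)
{
    return setvbuf(f, NULL, _IONBF, 0);
}
```

Calling make_unbuffered(stdout) at the start of main() makes each write to stdout reach the terminal or file immediately, so it interleaves with stderr in program order.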
Ross Lagerwall [Wed, 14 Dec 2016 07:51:53 +0000 (07:51 +0000)]
tools/livepatch: Show the correct expected state before action
Somewhat confusingly, before the action has been executed the patch is
expected to be in the "allow" state, not the "expected" state. The
check for this was correct but the subsequent error message was not.
Fix the error message to show this state correctly.
Before:
$ xen-livepatch unload test
test: in wrong state (APPLIED), expected (unknown)
After:
$ xen-livepatch unload test
test: in wrong state (APPLIED), expected (CHECKED)
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Andrew Cooper [Wed, 14 Dec 2016 11:33:17 +0000 (11:33 +0000)]
x86/traps: Correct pagefault handling issues introduced in c/s d5c251c
There are two bugs.
Firstly, the ASSERT(paging_mode_only_log_dirty(d)) can trip when servicing a
hypervisor #PF in the context of an HVM guest, e.g. a copy_to_user() failure
in the shadow pagetable code.
Secondly, the entry conditions for paging_fault() were previously guarded on
!paging_mode_external(d) which limited entry to PV contexts, but for both
guest and hypervisor faults. Switching this to paging_mode_log_dirty() opened
it up to HVM contexts as well.
Reinstate the old !paging_mode_external(d) check, as it is actually the
relevant fact, and extend the comment to explicitly state that hypervisor
faults should follow this path.
Inside, we are now guaranteed to be in the context of a PV guest, so can
safely use the assertion about log dirty.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Ross Lagerwall [Wed, 14 Dec 2016 11:12:01 +0000 (11:12 +0000)]
x86: Use ACPI reboot method for Dell OptiPlex 9020
When EFI booting the Dell OptiPlex 9020, it sometimes GP faults in the
EFI runtime instead of rebooting. Quirk this hardware to use the ACPI
reboot method instead.
dmidecode info:
BIOS Information
Vendor: Dell Inc.
Version: A15
Release Date: 11/08/2015
System Information
Manufacturer: Dell Inc.
Product Name: OptiPlex 9020
Version: 00
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Tue, 13 Dec 2016 15:38:06 +0000 (16:38 +0100)]
stubdom: modify ioemu linkfarm only if necessary
Several stubdom libraries are being rebuilt each time a top level make
is called as they depend on stubdom/ioemu/linkfarm.stamp which is
depending on tools/qemu-xen-traditional-dir. Unfortunately this
directory is modified by each "make tools" call.
This can be avoided by writing stubdom/ioemu/linkfarm.stamp only if
a source file beneath tools/qemu-xen-traditional-dir has been added
or removed.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Jan Beulich [Wed, 14 Dec 2016 09:11:08 +0000 (10:11 +0100)]
x86emul: MOVNTI does not allow REP prefixes
Just like 66, prefixes F3 and F2 cause #UD.
Also adjust a related comment, which in its previous wording was
misleading (as in 16-bit mode nothing would be undone when
adjusting operand size from 2 to 4).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 14 Dec 2016 09:08:22 +0000 (10:08 +0100)]
x86emul: CMPXCHG{8,16}B ignore prefixes
This removes 0F C7 from the list of two-byte opcodes treating prefixes
66, F3, and F2 as opcode extensions. We better manually handle this in
the opcode specific code:
- CMPXCHG8B ignores all these prefixes (its handling is being adjusted
accordingly, with a respective test case added as well, to avoid
re-introducing the subject of XSA-200),
- RDRAND/RDSEED (support to be added subsequently) honor 66, but treat
F3 and F2 as opcode extensions (resolving to RDPID in the RDSEED
case, which in turn ignores 66).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 14 Dec 2016 08:54:03 +0000 (09:54 +0100)]
x86/PV: use generic emulator for privileged instruction handling
There's a new emulator return code being added to allow bypassing
certain operations (see the code comment).
Another small tweak to the emulator is to single iteration handling
of INS and OUTS: Since we don't want to handle any other memory access
instructions, we want these to be handled by the rep_ins() / rep_outs()
hooks here too.
And then long-mode related bits now get hidden from the guest. This
should have been that way from the beginning, but becomes a requirement
now as the emulator's in_longmode() needs this to reflect guest view.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 14 Dec 2016 08:52:35 +0000 (09:52 +0100)]
x86emul: generalize exception handling for rep_* hooks
If any of those hooks returns X86EMUL_EXCEPTION, some register state
should still get updated if some iterations have been performed (but
the rIP update will get suppressed if not all of them did get handled).
This updating is done by register_address_increment() and
__put_rep_prefix() (which hence must no longer be bypassed). As a
result put_rep_prefix() can then skip most of the writeback, but needs
to ensure proper completion of the executed instruction.
While on the HVM side the VA -> LA -> PA translation process ensures
that an exception would be raised on the first iteration only, doing so
would unduly complicate the PV side code about to be added.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Wed, 14 Dec 2016 08:51:40 +0000 (09:51 +0100)]
x86/32on64: use generic instruction decoding for call gate emulation
... instead of custom handling. Note that we can't use generic
emulation, as the emulator's far branch support is rather rudimentary
at this point in time.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Tue, 13 Dec 2016 17:15:40 +0000 (17:15 +0000)]
firmware/rombios: fix after update to libacpi
Fix a build breakage after the libacpi changes; this is due to rombios using the
libacpi headers in order to parse the ACPI tables.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 5 Dec 2016 11:29:12 +0000 (11:29 +0000)]
x86/traps: Adjust paged-guest handling in the PV pagefault path
PV guests necessarily can't be external, as Xen must steal address space from
them. Pagefaults for HVM guests are handled by {vmx,svm}_vmexit_handler() and
don't enter the PV fixup_page_fault() path. Therefore, the first call to
paging_fault() is dead, and dropped.
Logdirty mode is now the only paging mode we should ever find a PV guest with,
so add a new predicate and assertion to this fact.
Drop the final reference to paging_mode_external(). It is more accurately now
only for logdirty guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 5 Dec 2016 11:35:32 +0000 (11:35 +0000)]
x86/shadow: Drop all emulation for PV vcpus
Emulation is only performed for paging_mode_refcount() domains, which in
practice means HVM domains only.
Drop the PV emulation code. As it always set addr_size and sp_size to
BITS_PER_LONG, it can't have worked correctly for PV guests running in a
different mode to Xen.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Luwei Kang [Tue, 13 Dec 2016 13:21:26 +0000 (14:21 +0100)]
x86/VPMU: clear the overflow status of which counter happened to overflow
Just set the corresponding bits of counters which happened to overflow,
rather than setting all the available bits of IA32_PERF_GLOBAL_OVF_CTRL
when a PMU interrupt happens.
Signed-off-by: Luwei Kang <luwei.kang@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 13 Dec 2016 13:20:34 +0000 (14:20 +0100)]
libacpi: update FADT layout to support version 5
Update the structure of the FADT table to version 5, and use that version for
PVHv2 guests. Note that HVM guests will continue to use FADT 4. In order to do
this, add a new field to acpi_config that contains the ACPI revision to use by
libacpi. Note that currently this only applies to the FADT.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 8 Dec 2016 12:09:54 +0000 (12:09 +0000)]
tools/fuzz: introduce x86 instruction emulator target
Instruction emulator fuzzing code is from code previously written by
Andrew and George. Adapt it to llvm fuzzer and hook up the build system.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 7 Dec 2016 11:28:56 +0000 (11:28 +0000)]
tools/fuzz: introduce libelf target
Source code and Makefile to fuzz libelf in Google's oss-fuzz
infrastructure.
Introduce FUZZ_NO_LIBXC in libelf-private.h. That macro will be set when
compiling libelf fuzzer target because libxc is not required in libelf
fuzzing.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 5 Dec 2016 11:23:00 +0000 (11:23 +0000)]
x86/shadow: Misc minor cleanup
* Move the #ifdefary inside sh_audit_gw() to avoid needing the else clause.
* The walk_t parameter is only ever read, so make it const.
* Use mfn_eq() rather than opencoding it.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Mon, 12 Dec 2016 18:28:40 +0000 (18:28 +0000)]
xen: Fix determining when domain creation is complete
d->creation_finished is used in several places to alter behaviour depending on
whether the domain is being created, or is already running.
However, there is a latent bug if a toolstack component makes a pair of
pause/unpause calls, where creation will be considered finished prematurely.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Mon, 12 Dec 2016 18:12:54 +0000 (18:12 +0000)]
x86/hvm: Fix HVMOP_get_param when skipping creating the default ioreq server
c/s e7dabe5 "x86/hvm: don't unconditionally create a default ioreq server"
added a break statement, but the logic previously depended on falling through
into the default case to fill in the value the caller asked for.
This causes the sending migration code to put a junk PARAM into the stream,
and the receiving side to fail to zero the IOREQ pages, causing QEMU to object
when it finds stale requests while starting up.
Reorder the code so it more clearly falls through into the default case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
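The control flow at issue, reduced to a sketch. The parameter names, struct dom and get_param() are hypothetical; only the switch structure mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PARAM_IOREQ_PFN, PARAM_OTHER, NR_PARAMS };

struct dom {
    uint64_t params[NR_PARAMS];
    bool ioreq_server_created;
};

/* Returns 0 and fills *value on success, -1 for an unknown index. */
static int get_param(struct dom *d, unsigned int idx, bool create,
                     uint64_t *value)
{
    if (idx >= NR_PARAMS)
        return -1;

    switch (idx) {
    case PARAM_IOREQ_PFN:
        if (create)
            d->ioreq_server_created = true;
        /* Skipping creation must not skip filling in the value. */
        /* fall through */
    default:
        *value = d->params[idx];
        break;
    }
    return 0;
}
```

A break placed in the "skip creation" branch would leave *value untouched, which is the junk value the sending side ended up putting into the stream.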
pa_range_info has only 8 elements and is accessed using pa_range as
index. pa_range is initialized to 16, potentially causing out of bound
access errors. Fix the issue by checking that pa_range is not greater
than the size of the array. Remove the now superfluous pa_range&0x8
check.
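A sketch of the bounds check, with the index rejected unless it is strictly below the array size. The table contents and the lookup function are illustrative assumptions; only the eight-entry size comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative 8-entry table indexed by pa_range; values are assumptions. */
static const unsigned int pa_bits_table[8] = { 32, 36, 40, 42, 44, 48, 52, 0 };

/* Returns 0 and fills *bits if pa_range is a valid index, -1 otherwise. */
static int lookup_pa_bits(unsigned int pa_range, unsigned int *bits)
{
    if (pa_range >= ARRAY_SIZE(pa_bits_table))
        return -1;   /* covers the initial value 16 as well as 8..15 */

    *bits = pa_bits_table[pa_range];
    return 0;
}
```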
HorizontalResolution and VerticalResolution are 32bit, while size is
64bit. As it stands multiplications are evaluated with 32bit arithmetic,
which could overflow. Cast HorizontalResolution to 64bit to avoid that.
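The difference between the two evaluations can be shown with a pair of hypothetical helpers. Unsigned 32-bit multiplication wraps modulo 2^32, so the result is truncated before it is widened.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits, then widened: high bits are already lost. */
static uint64_t pixels_32bit(uint32_t hres, uint32_t vres)
{
    return (uint32_t)(hres * vres);
}

/* One operand cast first, so the multiplication itself is 64-bit. */
static uint64_t pixels_64bit(uint32_t hres, uint32_t vres)
{
    return (uint64_t)hres * vres;
}
```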
Jan Beulich [Mon, 12 Dec 2016 16:48:49 +0000 (17:48 +0100)]
console: allow log level threshold adjustments
... from serial console so that one doesn't always need to reboot to
see more / less messages.
Note that upper thresholds are sticky, i.e. while they get adjusted
upwards when the lower threshold would otherwise end up above the upper
one, they don't get adjusted when reducing the lower one. Full
flexibility is available only via a future sysctl interface.
Note further that (meaningless) large threshold values aren't being
rejected, for the sake of not adding more checks to the code than are
really necessary for safe operation.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
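The "sticky upper threshold" behaviour can be modelled as below. The struct, the function and the clamp at zero are assumptions made for the sketch, not the console code itself.

```c
#include <assert.h>

struct thresholds { int lower, upper; };

/* Adjust the lower threshold by delta (positive or negative). */
static void adjust_lower(struct thresholds *t, int delta)
{
    t->lower += delta;
    if (t->lower < 0)
        t->lower = 0;           /* assumption: never go below zero */

    /* Upper is dragged upwards if needed, but never lowered again. */
    if (t->upper < t->lower)
        t->upper = t->lower;
}
```

Starting from lower=2, upper=3, raising by 2 gives lower=4, upper=4; lowering by 3 afterwards gives lower=1 while upper stays at 4.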