libxl.c: switch to LOG*D use (refactored messages)
Use LOG*D functions to output the domain ID in logs as much as
possible. This will help consumer code sort the logs by
domain.
This commit only changes LOG*() into LOG*D() and adds a domid
parameter. The message of these LOG* calls has been altered to
remove the domain id from it since it is already contained in
the output log string.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
libxl: add LIBXL_LOGD_* and LOG*D function families.
These functions should be used to log messages when the domain
id is known. libxl__log will now prefix the log message with
"Domain %PRIu32:" if the domain id is a valid one.
This aims at helping consumers filter logs on domain IDs.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
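To illustrate the prefixing behaviour described above, here is a minimal sketch. It is not the libxl implementation: sketch_format_log() and SKETCH_INVALID_DOMID are hypothetical names, and the only detail taken from the commit message is the "Domain %PRIu32:" prefix being added when the domain id is valid.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical marker for "no domain known"; not the libxl definition. */
#define SKETCH_INVALID_DOMID UINT32_MAX

/*
 * Format one log line into buf. The "Domain <id>:" prefix is emitted only
 * when domid is valid. Returns the untruncated length, like snprintf().
 */
static int sketch_format_log(char *buf, size_t len, uint32_t domid,
                             const char *fmt, ...)
{
    va_list ap;
    int used = 0, rc;
    size_t off;

    assert(buf != NULL && fmt != NULL);

    if (domid != SKETCH_INVALID_DOMID) {
        used = snprintf(buf, len, "Domain %" PRIu32 ":", domid);
        if (used < 0)
            return used;
    }

    /* Never step past the end of buf if the prefix was truncated. */
    off = (size_t)used < len ? (size_t)used : len;

    va_start(ap, fmt);
    rc = vsnprintf(buf + off, len - off, fmt, ap);
    va_end(ap);

    return rc < 0 ? rc : used + rc;
}
```

A consumer can then filter on the fixed prefix instead of parsing free-form message text.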
Juergen Gross [Tue, 8 Nov 2016 09:09:41 +0000 (10:09 +0100)]
stubdom: remove EXTRA_CFLAGS meant for building tools
When building stubdoms EXTRA_CFLAGS_XEN_TOOLS and
EXTRA_CFLAGS_QEMU_TRADITIONAL should be cleared as they might contain
flags not suitable for all stubdom builds (e.g. "-m64", often
found in $RPM_OPT_FLAGS, will break building 32 bit stubdoms).
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Fri, 4 Nov 2016 09:53:29 +0000 (10:53 +0100)]
stubdom: simplify and fix Makefile
The stubdom Makefile is setting up links for various libraries. This
is done only once when qemu links are created and each library's links
are updated/created only if the link for the Makefile of the library
doesn't already exist. If a source file is added to a library after
the first make of stubdom, the new source won't be linked by a
subsequent call of make.
Instead of testing the existence of the Makefile link use a make
dependency which will catch changes of the linked Makefile, too.
At the same time don't repeat the same link pattern 7 times but use a
make macro to do the linking.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[ wei: move "touch $@" to correct location in do_links ] Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Thu, 13 Oct 2016 14:33:15 +0000 (15:33 +0100)]
flask: add gcov_op check
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Wei Liu [Thu, 29 Sep 2016 20:10:53 +0000 (21:10 +0100)]
gcov: add new interface and new formats support
Add a new sysctl interface for passing gcov data back to userspace. The new
interface uses a customised record file format. The new sysctl reuses
the original sysctl number but renames the op to gcov_op.
Formats starting from gcc version 3.4 are supported. The code is
rewritten so that a new format can be easily added in the future.
Version specific code is grouped into different files. The format one
needs to use can be picked via Kconfig. The default format is the newest
one.
Userspace programs to handle extracted data will come in a later patch.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 29 Sep 2016 17:38:30 +0000 (18:38 +0100)]
xen, tools: rip out old gcov implementation
The internal data structure and code are tied to an old gcov format.
It's easier to just redo everything from scratch.
Salvage the reusable parts: leave xen/common/gcov and an empty Makefile
there, leave gcov support in Kconfig but mark that as broken. Also
reserve the sysctl number for later use (but delete relevant sysctl
structures).
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 1 Jul 2016 17:29:46 +0000 (18:29 +0100)]
x86/emul: Use system-segment relative memory accesses
With hvm_virtual_to_linear_addr() capable of doing proper system-segment
relative memory accesses, avoid open-coding the address and limit calculations
locally.
When a table spans the 4GB boundary (32bit) or non-canonical boundary (64bit),
segmentation errors are now raised. Previously, the use of x86_seg_none
resulted in segmentation being skipped, and the linear address being truncated
through the pagewalk, and possibly coming out valid on the far side.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Thu, 30 Jun 2016 22:55:33 +0000 (23:55 +0100)]
x86/emul: Prepare to allow use of system segments for memory references
All system segments (GDT/IDT/LDT and TR) describe a linear address and limit,
and act similarly to user segments. However all current uses of these tables
in the emulator opencode the address calculations and limit checks. In
particular, no care is taken for accesses which wrap around the 4GB or
non-canonical boundaries.
Alter hvm_virtual_to_linear_addr() to cope with performing segmentation checks
on system segments. This involves restricting access checks in the 32bit case
to user segments only, and adding presence/limit checks in the 64bit case.
When suffering a segmentation fault for a system segment, return
X86EMUL_EXCEPTION but leave the fault injection to the caller. The fault type
depends on the higher level action being performed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Tue, 1 Nov 2016 20:02:35 +0000 (20:02 +0000)]
x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
to inject the pagefault themselves.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Wed, 23 Nov 2016 11:11:23 +0000 (11:11 +0000)]
x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear()
The functions use linear addresses, not virtual addresses, as no segmentation
is used. (Lots of other code in Xen makes this mistake.)
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Wed, 2 Nov 2016 11:49:25 +0000 (11:49 +0000)]
x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Tue, 1 Nov 2016 20:49:25 +0000 (20:49 +0000)]
x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer
which is filled with pagefault information should one occur.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
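A rough sketch of the calling convention this series moves towards: the copy routine reports a fault through an optional out-parameter instead of injecting #PF itself, and passing NULL gives the "nofault" behaviour. All names here are hypothetical stand-ins (the field names of struct sketch_pagefault_info are assumed, and "guest memory" is just a small array).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_copy_rc { SKETCH_COPY_OKAY, SKETCH_COPY_BAD_LINEAR };

/* Hypothetical stand-in for the pagefault information handed back. */
struct sketch_pagefault_info {
    uint64_t linear;  /* faulting linear address */
    uint32_t ec;      /* page fault error code */
};

/* Toy guest address space: only [0, SKETCH_MEM_SIZE) is mapped. */
#define SKETCH_MEM_SIZE 64u
static uint8_t sketch_guest_mem[SKETCH_MEM_SIZE];

static enum sketch_copy_rc sketch_copy_from_guest_linear(
    void *dst, uint64_t linear, size_t len,
    struct sketch_pagefault_info *pfinfo)
{
    assert(dst != NULL);

    if (linear >= SKETCH_MEM_SIZE || len > SKETCH_MEM_SIZE - linear) {
        /* Describe the fault, but leave injection to the caller. */
        if (pfinfo != NULL) {
            pfinfo->linear =
                linear >= SKETCH_MEM_SIZE ? linear : SKETCH_MEM_SIZE;
            pfinfo->ec = 0;
        }
        return SKETCH_COPY_BAD_LINEAR;
    }

    memcpy(dst, &sketch_guest_mem[linear], len);
    return SKETCH_COPY_OKAY;
}
```

A caller that wants the guest to see the fault inspects pfinfo and injects the event itself; a caller that passes NULL simply gets the error code back.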
Andrew Cooper [Fri, 25 Nov 2016 15:20:44 +0000 (15:20 +0000)]
x86/shadow: Avoid raising faults behind the emulators back
Use x86_emul_{hw_exception,pagefault}() rather than
{pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
faults to be known to the emulator. This requires altering the callers of
x86_emulate() to properly re-inject the event.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 24 Nov 2016 18:18:36 +0000 (18:18 +0000)]
x86/pv: Avoid raising faults behind the emulators back
Use x86_emul_pagefault() rather than pv_inject_page_fault() to cause raised
pagefaults to be known to the emulator. This requires altering the callers of
x86_emulate() to properly re-inject the event.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 1 Nov 2016 19:50:47 +0000 (19:50 +0000)]
x86/emul: Avoid raising faults behind the emulators back
Introduce a new x86_emul_pagefault() similar to x86_emul_hw_exception(), and
use this instead of hvm_inject_page_fault() from emulation codepaths.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 26 Sep 2016 16:13:14 +0000 (16:13 +0000)]
x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS
Intel VT-x and AMD SVM provide access to the full segment descriptor cache via
fields in the VMCB/VMCS. However, the bits which are actually checked by
hardware and preserved across vmentry/exit are inconsistent, and the vendor
accessor functions perform inconsistent modification to the raw values.
Convert {svm,vmx}_{get,set}_segment_register() into raw accessors, and alter
hvm_{get,set}_segment_register() to cook the values consistently. This allows
the common emulation code to better rely on finding architecturally-expected
values.
While moving the code performing the cooking, fix the %ss.db quirk. A NULL
selector is indicated by .p being clear, not the value of the .type field.
This does cause some functional changes because of the modifications being
applied uniformly. As a side effect, this fixes latent bugs where
vmx_set_segment_register() didn't correctly fix up .G for segments, and
where the GDTR/IDTR limits were fixed up inconsistently.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Tue, 27 Sep 2016 17:21:20 +0000 (18:21 +0100)]
x86/vmx: Use hvm_{get,set}_segment_register() rather than vmx_{get,set}_segment_register()
No functional change at this point, but this is a prerequisite for forthcoming
functional changes.
Make vmx_get_segment_register() private to vmx.c like all the other vendor
get/set functions.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Mon, 7 Nov 2016 13:14:03 +0000 (13:14 +0000)]
x86/emul: Rework emulator event injection
The emulator needs to gain an understanding of interrupts and exceptions
generated by its actions.
Move hvm_emulate_ctxt.{exn_pending,trap} into struct x86_emulate_ctxt so they
are visible to the emulator. This removes the need for the
inject_{hw_exception,sw_interrupt}() hooks, which are dropped and replaced
with x86_emul_{hw_exception,software_event,reset_event}() instead.
For exceptions raised by x86_emulate() itself (rather than its callbacks), the
shadow pagetable and PV uses of x86_emulate() previously failed with
X86EMUL_UNHANDLEABLE due to the lack of inject_*() hooks.
This behaviour has changed, and such cases will now return X86EMUL_EXCEPTION
with event_pending set. Until the callers of x86_emulate() have been updated
to inject events back into the guest, divert the event_pending case back into
the X86EMUL_UNHANDLEABLE path to maintain the same guest-visible behaviour.
No overall functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
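The shape of the reworked event tracking can be sketched as below. This is a simplified model rather than the Xen code: the sketch_* types and functions are hypothetical, and only the ideas of a pending event stored in the emulation context, X86EMUL_EXCEPTION being returned with event_pending set, and the transitional diversion to X86EMUL_UNHANDLEABLE come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EVENT_NO_EC (-1)   /* hypothetical "no error code" marker */

enum sketch_rc { SKETCH_OKAY, SKETCH_UNHANDLEABLE, SKETCH_EXCEPTION };

struct sketch_event {
    uint8_t vector;
    int32_t error_code;
};

/* The pending event lives in the context, so the emulator can see it. */
struct sketch_emulate_ctxt {
    bool event_pending;
    struct sketch_event event;
};

static void sketch_emul_reset_event(struct sketch_emulate_ctxt *ctxt)
{
    ctxt->event_pending = false;
    ctxt->event = (struct sketch_event){ 0, 0 };
}

/* Record a hardware exception instead of calling an inject_*() hook. */
static enum sketch_rc sketch_emul_hw_exception(
    struct sketch_emulate_ctxt *ctxt, uint8_t vector, int32_t error_code)
{
    assert(!ctxt->event_pending);

    ctxt->event.vector = vector;
    ctxt->event.error_code = error_code;
    ctxt->event_pending = true;

    return SKETCH_EXCEPTION;
}

/*
 * Transitional policy for callers that cannot re-inject events yet:
 * treat a pending event as unhandleable, preserving the old behaviour.
 */
static enum sketch_rc sketch_divert_unhandled(
    enum sketch_rc rc, const struct sketch_emulate_ctxt *ctxt)
{
    if (rc == SKETCH_EXCEPTION && ctxt->event_pending)
        return SKETCH_UNHANDLEABLE;
    return rc;
}
```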
Andrew Cooper [Wed, 2 Nov 2016 15:59:49 +0000 (15:59 +0000)]
x86/emul: Remove opencoded exception generation
Introduce generate_exception() for unconditional exception generation, and
replace existing uses. Both generate_exception() and generate_exception_if()
are updated to make their error code parameters optional, which removes the
use of the -1 sentinel.
The ioport_access_check() check loses the presence check for %tr, as the x86
architecture has no concept of a non-usable task register.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <JBeulich@suse.com>
Andrew Cooper [Tue, 29 Nov 2016 17:56:17 +0000 (17:56 +0000)]
x86/emul: Implement singlestep as a retire flag
The behaviour of singlestep is to raise #DB after the instruction has been
completed, but implementing it with inject_hw_exception() causes x86_emulate()
to return X86EMUL_EXCEPTION, despite successfully completing execution of the
instruction, including register writeback.
Instead, use a retire flag to indicate singlestep, which causes x86_emulate()
to return X86EMUL_OKAY.
Update all callers of x86_emulate() to use the new retire flag. This fixes
the behaviour of singlestep for shadow pagetable updates and mmcfg/mmio_ro
intercepts, which previously discarded the exception.
With this change, all uses of X86EMUL_EXCEPTION from x86_emulate() are
believed to have strictly fault semantics.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 29 Nov 2016 11:45:41 +0000 (11:45 +0000)]
x86/emul: Always use fault semantics for software events
The common case is already using fault semantics out of x86_emulate(), as that
is how VT-x/SVM expects to inject the event (given suitable hardware support).
However, x86_emulate() returning X86EMUL_EXCEPTION and also completing a
register writeback is problematic for callers.
Switch the logic to always using fault semantics, and leave svm_inject_trap()
to fix up %eip if necessary.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 29 Nov 2016 18:46:56 +0000 (18:46 +0000)]
x86/emul: Provide a wrapper to x86_emulate() to ASSERT() certain behaviour
In debug builds, confirm that some properties of x86_emulate()'s behaviour
actually hold. The first property, fixed in a previous change, is that retire
flags are only ever set in the X86EMUL_OKAY case.
While adjusting the userspace test harness to cope with ASSERT() in
x86_emulate.h, fix a build problem introduced in c/s 122dd9575c7 "x86emul:
in_longmode() should not ignore ->read_msr() errors" by providing an
implementation of likely()/unlikely().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 29 Nov 2016 18:35:46 +0000 (18:35 +0000)]
x86/emul: Correct the behaviour of pop %ss and interrupt shadowing
The mov_ss retire flag should only be set once load_seg() has returned
success. In particular, it should not be set if an exception occurred when
trying to load %ss.
_hvm_emulate_one(), currently the sole user of mov_ss, only considers it in
the case that x86_emulate() returns X86EMUL_OKAY, so this bug isn't actually
exposed to guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 29 Nov 2016 17:55:21 +0000 (17:55 +0000)]
x86/emul: Clean up the naming of the retire union
Rename byte to raw, as the field being a single byte long is an implementation
detail. Make the bitfields part of an anonymous struct to remove the .flags
qualifier. Change the types of the flags to being booleans, to match their
use.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
To help with event injection improvements for the PV uses of x86_emulate(),
implement an event injection API which matches its hvm counterpart.
This is started with taking do_guest_trap() and modifying its calling API to
pv_inject_event(), subsequently implementing the former in terms of the
latter.
The existing propagate_page_fault() is fairly similar to
pv_inject_page_fault(), although it has a return value. Only a single caller
makes use of the return value, and non-NULL is only returned if the passed cr2
is non-canonical. Opencode this single case in
handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
void.
The call to reserved_bit_page_fault() in propagate_page_fault() was
conceptually wrong to start with. Complaining about reserved bits should be
part of handling the pagefault itself, not part of injecting a pagefault into
the guest. It is therefore moved ahead of the injection call in
do_page_fault() to compensate.
The remaining #PF specific bits are moved into pv_inject_event(), and
pv_inject_page_fault() is implemented as a static inline wrapper.
No practical change from a guest's point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 7 Nov 2016 13:14:03 +0000 (13:14 +0000)]
x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC
and move it to live with the other x86_event infrastructure in x86_emulate.h.
Switch it and x86_event.error_code to being signed, matching the rest of the
code.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 7 Nov 2016 13:14:03 +0000 (13:14 +0000)]
x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure
The x86 emulator needs to gain an understanding of interrupts and exceptions
generated by its actions. The naming choice is to match both the Intel and
AMD terms, and to avoid 'trap' specifically as it has an architectural meaning
different to its current usage.
While making this change, make other changes for consistency
* Rename *_trap() infrastructure to *_event()
* Rename trapnr/trap parameters to vector
* Convert hvm_inject_hw_exception() and hvm_inject_page_fault() to being
static inlines, as they are only thin wrappers around hvm_inject_event()
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 23 Nov 2016 13:34:52 +0000 (13:34 +0000)]
x86/emul: Simplfy emulation state setup
The current code to set up emulation state is ad-hoc and error prone.
* Consistently zero all emulation state structures.
* Avoid explicitly initialising some state to 0.
* Explicitly identify all input and output state in x86_emulate_ctxt. This
involves rearranging some fields.
* Have x86_decode() explicitly initialise all output state at its start.
While making the above changes, two minor tweaks:
* Move the calculation of hvmemul_ctxt->ctxt.swint_emulate from
_hvm_emulate_one() to hvm_emulate_init_once(). It doesn't need
recalculating for each instruction.
* Change force_writeback to being a boolean, to match its use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Thu, 24 Nov 2016 18:31:34 +0000 (18:31 +0000)]
x86/emul: Drop X86EMUL_CMPXCHG_FAILED
X86EMUL_CMPXCHG_FAILED was introduced in c/s d430aae25 in 2005. Even at the
time it aliased what is now X86EMUL_RETRY (as well as what is now
X86EMUL_EXCEPTION). I am not sure why the distinction was considered useful
at the time.
It is only used twice; there is no need to call it out differently from other
uses of X86EMUL_RETRY.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 2 Dec 2016 17:09:11 +0000 (18:09 +0100)]
vtd: refuse to enable IOMMU if the PCI scan fails
This provides uniform behavior between Intel and AMD IOMMU initialization, and
is a requirement for PVHv2 Dom0, that depends on a working IOMMU plus the PCI
bus being scanned for devices.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Fri, 2 Dec 2016 17:08:26 +0000 (18:08 +0100)]
x86/paging: introduce paging_set_allocation
... and remove hap_set_alloc_for_pvh_dom0. While there also change the last
parameter of the {hap/shadow}_set_allocation functions to be a boolean.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 2 Dec 2016 17:07:58 +0000 (18:07 +0100)]
x86: allow calling {shadow/hap}_set_allocation with the idle domain
... and using the "preempted" parameter. Introduce a new helper that can
be used from either hypercall or idle vcpu context (i.e. during Dom0
creation) in order to check if preemption is needed. If such preemption
happens, the caller should then call process_pending_softirqs in order to
drain the pending softirqs, and then call *_set_allocation again to continue
with its execution.
This allows us to call *_set_allocation() when building domain 0.
While there also document hypercall_preempt_check and add an assert to
local_events_need_delivery in order to be sure it's not called by the idle
domain, which doesn't receive any events (and that in turn
hypercall_preempt_check is also not called by the idle domain).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Boris Ostrovsky [Fri, 2 Dec 2016 17:06:25 +0000 (18:06 +0100)]
acpi: power and sleep ACPI buttons are not emulated for PVH guests
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Fri, 2 Dec 2016 17:06:06 +0000 (18:06 +0100)]
acpi: make pmtimer optional in FADT
PM timer is not supported by PVH guests.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
He Chen [Mon, 21 Nov 2016 06:01:14 +0000 (14:01 +0800)]
x86/cpuid: Add AVX512_4VNNIW and AVX512_4FMAPS support
Add support for two new AVX512 subfeatures for guests.
AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.
AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.
Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: He Chen <he.chen@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 23 Sep 2016 14:03:08 +0000 (15:03 +0100)]
x86/vmx: Shorten vmx_{get,set}_segment_register() for user segments
The x86_segment enumeration matches hardware SReg encoding, which can be used
to calculate the appropriate VMCS fields, rather than open coding every
instance.
This reduces the size of the switch statement, and the number of embedded BUG
frames from the __vm{read,write}() calls. In the unlikely case that a call
does fault, the field can unambiguously be retrieved from the GPR state
printed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
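The field calculation can be illustrated with the sketch below. The %es encodings are those given in the Intel SDM, and the remaining user segments follow at a stride of 2 in SReg order; the sketch_* names and the helper itself are hypothetical and not the code from vmx.c.

```c
#include <assert.h>
#include <stdint.h>

/* User segments in hardware SReg encoding order. */
enum sketch_segment {
    SKETCH_SEG_ES, SKETCH_SEG_CS, SKETCH_SEG_SS,
    SKETCH_SEG_DS, SKETCH_SEG_FS, SKETCH_SEG_GS,
};

/* VMCS field encodings for %es; the other user segments follow at +2. */
#define SKETCH_GUEST_ES_SELECTOR 0x0800u
#define SKETCH_GUEST_ES_LIMIT    0x4800u
#define SKETCH_GUEST_ES_AR_BYTES 0x4814u
#define SKETCH_GUEST_ES_BASE     0x6806u

struct sketch_vmcs_fields {
    uint32_t sel, limit, attr, base;
};

/* Derive the four VMCS fields for a user segment from its SReg number. */
static struct sketch_vmcs_fields sketch_segment_fields(enum sketch_segment seg)
{
    const uint32_t off = (uint32_t)seg * 2;
    struct sketch_vmcs_fields f;

    assert((unsigned int)seg <= SKETCH_SEG_GS);

    f.sel   = SKETCH_GUEST_ES_SELECTOR + off;
    f.limit = SKETCH_GUEST_ES_LIMIT + off;
    f.attr  = SKETCH_GUEST_ES_AR_BYTES + off;
    f.base  = SKETCH_GUEST_ES_BASE + off;

    return f;
}
```

With one computed field per access, a single vmread or vmwrite call site per property replaces six near-identical switch cases.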
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostorvsky@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Ian Jackson [Fri, 2 Dec 2016 12:16:35 +0000 (12:16 +0000)]
Re-enable hypervisor debug as part of opening 4.9
AFAICT following bacbf0cb7349 "build: convert debug to Kconfig"
hypervisor debug enablement is controlled here, rather than in
Config.mk.
The release checklist says that when branching, the new staging should
have debug enabled. It seems to me that I should be changing this
here, therefore.
As additional evidence, I offer e1d1c68ea8a3 "xen: disable debug
build" which went in between 4.8.0 RC5 and RC6. It does not explain
why this was done but it does STM that reverting that change is right.
CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: George Dunlap <George.Dunlap@eu.citrix.com> CC: Jan Beulich <jbeulich@suse.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Stefano Stabellini <sstabellini@kernel.org> CC: Tim Deegan <tim@xen.org> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Tue, 29 Nov 2016 15:01:03 +0000 (16:01 +0100)]
credit2: make runqueues be per-socket by default
Benchmarks have shown that per-socket runqueues arrangement
behaves better (e.g., we achieve better load balancing)
than the current per-core default.
Here's an example (coming from
https://lists.xen.org/archives/html/xen-devel/2016-06/msg02287.html ):
|=======================================|
| XEN BUILD TIME, LOW LOAD, NO NOISE |
|---------------------------------------|
| runq=core runq=socket |
| 35.200 33.433 |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, NO NOISE | IPERF, HIGH LOAD, NO NOISE |
|---------------------------------------|------------------------------|
| runq=core runq=socket | runq=core runq=socket |
| 18.013 18.530 | 23.200 23.466 |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, LOW LOAD, WITH NOISE |
|---------------------------------------|
| runq=core runq=socket |
| 45.866 39.493 |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, WITH NOISE | IPERF, HIGH LOAD, WITH NOISE |
|---------------------------------------|------------------------------|
| runq=core runq=socket | runq=core runq=socket |
| 36.840 29.080 | 19.967 21.000 |
|=======================================|==============================|
The only reason why we went for per-core, initially, was to
introduce some form of hyperthreading support. Now we have
hyperthreading support, independently from how runqueues
are organized (9bb9c7388 "xen: credit2: implement true SMT
support"), and thus we can switch to per-socket.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Tue, 29 Nov 2016 15:00:48 +0000 (16:00 +0100)]
libacpi: fix compilation when cross building the tools
The tools (such as mk_dsdt) can be cross-built when it may not be
desirable to build them on the target.
The commit c4ac1077 "libxl/arm: Generate static ACPI DSDT table"
introduced support for ARM64 in mk_dsdt but also broke cross-building of the
tools because the ACPI tables are not correct.
While mk_dsdt should generate ACPI table for the target architecture, it
currently generates the one for the host. This is because the source
code contains references to the host architecture (__aarch64__,
__x86_64__, __i386__) when it should be the target architecture.
Replace all __aarch64__, __x86_64__, __i386__ by the corresponding
CONFIG_*.
Also expose the CONFIG_* to the source code, as they are currently only
exposed to the Makefile.
Reported-by: Andrii Anisov <andrii.anisov@gmail.com> Suggested-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
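The difference between host and target selection can be sketched as follows. The macro names CONFIG_X86 and CONFIG_ARM_64 are assumed here for illustration (the commit message only says "the corresponding CONFIG_*"), and the fallback define exists purely so that the fragment builds standalone; in a real build the value would come from the build system via -D flags.

```c
/*
 * Normally supplied by the build system for the *target* architecture.
 * Fallback for standalone compilation of this sketch only.
 */
#if !defined(CONFIG_X86) && !defined(CONFIG_ARM_64)
#define CONFIG_X86 1
#endif

/*
 * Select behaviour from the target configuration. Compiler-provided
 * macros such as __aarch64__ or __x86_64__ describe the machine the
 * tool is compiled for, which differs from the target in a cross-build.
 */
static const char *sketch_target_arch(void)
{
#if defined(CONFIG_ARM_64)
    return "arm64";
#elif defined(CONFIG_X86)
    return "x86";
#endif
}
```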
Wei Chen [Tue, 29 Nov 2016 14:59:55 +0000 (15:59 +0100)]
arm32: handle async aborts delivered while at HYP
If a guest generates an asynchronous abort and then traps into HYP
(by HVC or IRQ) before the abort has been delivered, the hypervisor
cannot catch it, because the PSTATE.A bit is masked all the time
in the hypervisor. So this asynchronous abort may slip through to the
next running guest that has the PSTATE.A bit unmasked.
In order to avoid this, it is necessary to take the abort at HYP, by
clearing the PSTATE.A bit. In this patch, we unmask the PSTATE.A bit
to open a window to catch guest-generated asynchronous aborts in all
Guest -> HYP switch paths. If such an asynchronous abort is caught in
the checking window, the HYP data abort exception will be triggered and
the guest that caused the abort will be crashed.
Wei Chen [Tue, 29 Nov 2016 14:58:57 +0000 (15:58 +0100)]
arm64: handle async aborts delivered while at EL2
If EL1 generates an asynchronous abort and then traps into EL2
(by HVC or IRQ) before the abort has been delivered, the hypervisor
cannot catch it, because the PSTATE.A bit is masked all the time
in the hypervisor. So this asynchronous abort may slip through to the
next running guest that has the PSTATE.A bit unmasked.
In order to avoid this, it is necessary to take the abort at EL2, by
clearing the PSTATE.A bit. In this patch, we unmask the PSTATE.A bit
to open a window to catch guest-generated asynchronous aborts in all
EL1 -> EL2 switch paths. If such an asynchronous abort is caught in
the checking window, the hyp_error exception will be triggered and the
guest that caused the abort will be crashed.
In the current code, when the hypervisor receives an asynchronous abort
from a guest, the hypervisor panics and the host goes down.
We have to prevent such a security issue, so in this patch we crash
the guest when the hypervisor receives an asynchronous abort from
the guest.
Juergen Gross [Fri, 25 Nov 2016 13:32:44 +0000 (14:32 +0100)]
remove reference to xensource.com
xen/include/public/hvm/pvdrivers.h contains a reference to
xen-devel@lists.xensource.com. Replace it by the correct address
xen-devel@lists.xenproject.org
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Fri, 25 Nov 2016 13:32:19 +0000 (14:32 +0100)]
blkif: kill some repetitions in protocol description
The whole block describing multiqueue support was repeated
two times.
There also was some repetition in the description of the
'discard-enable' property.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Will <Konrad.wilk@oracle.com>
Jan Beulich [Fri, 25 Nov 2016 13:30:58 +0000 (14:30 +0100)]
x86: re-add stack alignment check
Commit 279840d5ea ("x86/boot: install trap handlers much earlier on
boot"), perhaps not really intentionally, removed this check. Add it
back,
- preventing it from triggering before any output is set up,
- accompanying it with a (weaker, due to its open coding of what
get_stack_bottom() does) build time check.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 24 Nov 2016 15:36:13 +0000 (15:36 +0000)]
x86/vmx: Don't deliver #MC with an error code
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 23 Nov 2016 11:32:55 +0000 (11:32 +0000)]
x86/hvm: Rename hvm_emulate_init() and hvm_emulate_prepare() for clarity
* Move hvm_emulate_init() to immediately after hvm_emulate_prepare(), as they
are very closely related.
* Rename hvm_emulate_prepare() to hvm_emulate_init_once() and
hvm_emulate_init() to hvm_emulate_init_per_insn() to make it clearer how
and when to use them.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Wed, 23 Nov 2016 16:56:39 +0000 (16:56 +0000)]
libxl: fix creation of pkgconf install dir
When PKG_INSTALLDIR was introduced, the creation of the previous pkgconf install
directory was not changed. Fix this by correctly using PKG_INSTALLDIR for the
directory creation in the libxl Makefile.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 23 Nov 2016 14:27:47 +0000 (15:27 +0100)]
x86emul: in_longmode() should not ignore ->read_msr() errors
All present hook implementations succeed for EFER, but we shouldn't
really build on this being the case.
Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 23 Nov 2016 14:27:17 +0000 (15:27 +0100)]
x86emul: simplify DstBitBase handling code
..., at once making it more obvious that even in the negative bit
offset case the resulting bit offset to be used by the inlined
instructions will always be constrained to the operand size of the
original instruction.
Also add a test case which would have failed without the XSA-195 fix.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
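As a rough illustration of the property the DstBitBase commit above describes (this is not the emulator's code; the function name and parameters are hypothetical), the architectural split of a signed bit offset into a memory displacement and an in-operand bit index can be sketched in Python:

```python
def split_bit_offset(bit_offset, op_bytes):
    """Split a signed bit offset into (byte displacement, bit index).

    Hypothetical helper, for illustration only.
    """
    op_bits = op_bytes * 8
    # Floor division rounds toward minus infinity, so a negative offset
    # moves the effective address backwards in whole operand-sized steps.
    byte_disp = (bit_offset // op_bits) * op_bytes
    # The remainder always lies within [0, op_bits), even for negative input.
    bit_index = bit_offset % op_bits
    return byte_disp, bit_index
```

Under these assumptions, an offset of -1 with a 4-byte operand selects bit 31 of the dword just below the base address, so the bit index passed on never exceeds the operand size.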
Jan Beulich [Wed, 23 Nov 2016 14:26:51 +0000 (15:26 +0100)]
x86/HVM: correct error code writing during task switch
Whether to write 32 or just 16 bits depends on the D bit of the target
CS. The width of the stack pointer to use depends on the B bit of the
target SS.
Also avoid using the no-fault copying routine.
Finally avoid using yet another struct segment_register variable here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
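The width rules stated in the task switch commit above can be modelled with a small Python sketch. It is an assumption-laden illustration, not the hypervisor code: the function and parameter names are hypothetical, with cs_d_bit standing for the D bit of the target CS and ss_b_bit for the B bit of the target SS.

```python
def push_error_code(esp, error_code, cs_d_bit, ss_b_bit):
    """Return (new_esp, bytes_written) for pushing an error code.

    Hypothetical model for illustration only.
    """
    # The D bit of the target CS selects a 32-bit or 16-bit write.
    op_bytes = 4 if cs_d_bit else 2
    # The B bit of the target SS selects ESP (32 bits) or SP (16 bits).
    sp_mask = 0xFFFFFFFF if ss_b_bit else 0xFFFF
    # Only the bits covered by sp_mask change; the others are preserved.
    new_esp = (esp & ~sp_mask & 0xFFFFFFFF) | ((esp - op_bytes) & sp_mask)
    value = error_code & ((1 << (8 * op_bytes)) - 1)
    return new_esp, value.to_bytes(op_bytes, "little")
```

The two choices are independent: a 32-bit write can land on a 16-bit stack, in which case only the low half of the stack pointer is decremented and wraps.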
Jan Beulich [Wed, 23 Nov 2016 14:25:35 +0000 (15:25 +0100)]
x86/HVM: limit writes to incoming TSS during task switch
The only field modified (and even that conditionally) is the back link.
Write only that field, and only when it actually has been written to.
Take the opportunity and also ditch the pointless initializer from the
"tss" local variable, which gets completely filled anyway by reading
from guest memory.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Wed, 23 Nov 2016 12:27:38 +0000 (12:27 +0000)]
libelf: fix symtab/strtab loading for 32bit domains
Commit ed04ca introduced a bug in the symtab/strtab loading for 32bit
guests, that corrupted the section headers array due to the padding
introduced by the elf_shdr union.
The Elf section header array on 32bit should be accessible as an array of
Elf32_Shdr elements, and the union with Elf64_Shdr done in elf_shdr was
breaking this due to size differences between Elf32_Shdr and Elf64_Shdr.
Fix this by copying each section header one by one, and using the proper
size depending on the bitness of the guest kernel. While there, also fix
a couple of consistency issues, by making sure we always use the sizes of
our local versions of the ELF header and the ELF section headers.
Reported-by: Brian Marcotte <marcotte@panix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
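To illustrate why the size difference in the libelf commit above matters, here is a minimal Python sketch of walking a section header array with a stride that depends on the guest bitness. It is not the libelf implementation: the names are hypothetical, and little-endian layout without padding is assumed.

```python
import struct

# Standard ELF section header layouts (little-endian assumed here).
ELF32_SHDR = struct.Struct("<10I")       # 40 bytes
ELF64_SHDR = struct.Struct("<2I4Q2I2Q")  # 64 bytes

def read_section_headers(blob, count, is_64bit):
    """Decode `count` section headers one by one.

    The stride is the size of the header type matching the guest
    bitness, not the size of a union of both types.
    """
    shdr = ELF64_SHDR if is_64bit else ELF32_SHDR
    return [shdr.unpack_from(blob, i * shdr.size) for i in range(count)]
```

Stepping through a 32-bit array with the 64-byte stride of a union would land in the middle of the second header, which matches the kind of corruption the commit message reports.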
Jan Beulich [Tue, 22 Nov 2016 16:28:52 +0000 (17:28 +0100)]
x86/memshr: properly check grant references
They need to be range checked against the current table limit in any
event.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Move the code to where it belongs, eliminating a number of duplicate
definitions. Add locking. Produce proper error codes, and consume them
instead of making one up. Check grant type. Convert parameter types at
once.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Tue, 22 Nov 2016 16:12:50 +0000 (17:12 +0100)]
credit2: fix wrong assert in runq_tickle()
Since b047f888d489 ("xen: sched: leave CPUs doing tasklet
work alone") a cpu executing a tasklet is not marked as
idle.
Therefore:
- avoid asserting that we can't find the idle vcpu running
on one of them, which is not true,
- avoid triggering a preemption on them (and add an assert
checking that).
This fixes a bug identified by OSSTest, in flight 102372
(on ARM, but it's not at all ARM specific), where the
ASSERT() was triggering like this:
Jan Beulich [Tue, 22 Nov 2016 12:52:53 +0000 (13:52 +0100)]
x86/EFI: meet further spec requirements for runtime calls
So far we didn't guarantee 16-byte alignment of the stack: While (so
far) we don't tell the compiler to use smaller alignment, we also don't
guarantee 16-byte alignment when establishing stack pointers for new
vCPU-s. Runtime service functions using SSE instructions may end with
#GP(0) without that.
Note that making use of -mpreferred-stack-boundary=3, as mentioned in
the comment, wouldn't help to reduce the needed alignment: The compiler
would then be free to align the stack of the function with the aligned
object, but would be permitted to place an odd number of 8-byte objects
there, resulting in the callee still running on an unaligned stack.
(The only working alternative to the approach chosen here would be to
use -mincoming-stack-boundary=3, but that would affect all functions in
runtime.c, not just the ones actually making runtime services calls.
And it would still require the manual alignment logic here to be used
with gcc 5.2 and earlier - not permitting that command line option -,
just that then the alignment amount would become conditional.)
Hence enforce the needed alignment by making efi_rs_enter() return a
suitably aligned structure, which the caller then necessarily has to
store in a suitably aligned local variable, the address of which then
gets passed to efi_rs_leave(). Also (to limit exposure) move the
function declarations to where they belong: They're local to runtime.c,
and shared only with compat.c (by the latter including the former).
Furthermore we should avoid #MF being raised on the FLDCW we do.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 3 Nov 2016 16:37:40 +0000 (16:37 +0000)]
pygrub: Properly quote results, when returning them to the caller:
* When the caller wants sexpr output, use `repr()'
This is what Xend expects.
The returned S-expressions are now escaped and quoted by Python,
generally using '...'. Previously kernel and ramdisk were unquoted
and args was quoted with "..." but without proper escaping. This
change may break toolstacks which do not properly dequote the
returned S-expressions.
* When the caller wants "simple" output, crash if the delimiter is
contained in the returned value.
With --output-format=simple it does not seem like this could ever
happen, because the bootloader config parsers all take line-based
input from the various bootloader config files.
With --output-format=simple0, this can happen if the bootloader
config file contains nul bytes.
This is CVE-2016-9379 and CVE-2016-9380 / XSA-198.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
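The two output rules in the pygrub commit above can be sketched as follows. This is a hedged illustration rather than pygrub's actual code: the function names, the output layout and the exception type are assumptions.

```python
def format_sexpr(kernel, ramdisk, args):
    """Build an S-expression, letting repr() quote and escape each value."""
    out = "linux (kernel %s)" % repr(kernel)
    if ramdisk:
        out += "(ramdisk %s)" % repr(ramdisk)
    if args:
        out += "(args %s)" % repr(args)
    return out

def format_simple(kernel, ramdisk, args, sep):
    """Join 'key value' items with sep, refusing values that contain sep."""
    for value in (kernel, ramdisk, args):
        if value is not None and sep in value:
            # The delimiter cannot be represented unambiguously, so fail
            # instead of emitting output that the caller would misparse.
            raise RuntimeError("value contains the output delimiter")
    out = "kernel %s%s" % (kernel, sep)
    if ramdisk:
        out += "ramdisk %s%s" % (ramdisk, sep)
    if args:
        out += "args %s%s" % (args, sep)
    return out
```

Failing hard in the "simple" case is the conservative choice: a value containing the delimiter would otherwise be split by the caller into extra fields.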