xenbits.xensource.com Git - xen.git/log
8 years agoxsm: Permit dom0 to use dmops
Andrew Cooper [Fri, 27 Jan 2017 14:16:58 +0000 (14:16 +0000)]
xsm: Permit dom0 to use dmops

c/s 524a98c2ac5 "public / x86: introduce __HYPERCALL_dm_op" gave flask
permissions for a stubdomain to use dmops, but omitted the case of a device
model running in dom0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Paul Durrant <paul.durrant@citrix.com>
8 years agoVT-d/RMRR: Avoid memory corruption in add_user_rmrr()
Andrew Cooper [Mon, 30 Jan 2017 10:09:06 +0000 (10:09 +0000)]
VT-d/RMRR: Avoid memory corruption in add_user_rmrr()

register_one_rmrr() already frees its parameter if errors are encountered.
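
A rough sketch of the ownership rule the fix respects (caller-side code
illustrative, not the actual add_user_rmrr() hunk):

    ret = register_one_rmrr(rmrr);  /* frees rmrr itself on failure */
    if ( ret )
        continue;                   /* caller must neither free nor reuse it */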

Introduced by c/s 431685e8de and spotted by Coverity.

Coverity-ID: 1399607
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agotools/libxc: Fix missing va_end() in do_dm_op() error path
Andrew Cooper [Mon, 30 Jan 2017 10:08:50 +0000 (10:08 +0000)]
tools/libxc: Fix missing va_end() in do_dm_op() error path

The fail3 error path skips the va_end() call, which typically leaks memory for
64bit x86 code.
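
A minimal sketch of the pattern the fix restores (helper names hypothetical):
every exit path taken after va_start() must pass through va_end().

    va_start(args, nr_bufs);

    if ( collect_bufs(xch, &args, nr_bufs) )  /* hypothetical helper */
        goto out;                             /* previously skipped va_end() */

    rc = issue_hypercall(xch);                /* hypothetical helper */

 out:
    va_end(args);
    return rc;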

Introduced by c/s 524a98c2ac5, spotted by Coverity.

Coverity-ID: 1399608
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: clarify xl mem-max semantics
Juergen Gross [Fri, 27 Jan 2017 11:45:18 +0000 (12:45 +0100)]
docs: clarify xl mem-max semantics

The information given in the xl man page for the mem-max command is
rather brief. Expand it in order to let the reader understand what it
is really doing.

As the related libxl function libxl_domain_setmaxmem() isn't much
clearer, add a comment to it explaining the desired semantics.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoarm/p2m: Fix regression during domain shutdown with active mem_access
Tamas K Lengyel [Wed, 25 Jan 2017 16:12:01 +0000 (09:12 -0700)]
arm/p2m: Fix regression during domain shutdown with active mem_access

The change in commit 438c5fe4f0c introduced a regression for domains where
mem_access is or was active. When relinquish_p2m_mapping attempts to clear
a page where the order is not 0, the following ASSERT is triggered:

    ASSERT(!p2m->mem_access_enabled || page_order == 0);

This regression was unfortunately not caught during testing in preparation
for the 4.8 release.

In this patch we adjust the ASSERT to not trip when the domain
is being shut down.
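
Roughly, the adjusted check tolerates higher-order clears while the domain is
dying (exact condition illustrative):

    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
           p2m->domain->is_dying);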

Ideally this fix would be part of Xen 4.8.1.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/dmar: place the initdata annotation after the variable type
Roger Pau Monne [Thu, 26 Jan 2017 16:18:10 +0000 (16:18 +0000)]
x86/dmar: place the initdata annotation after the variable type

clang cannot cope with the annotation being in the middle of the variable
declaration.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoflask: fix build after the introduction of DMOP
Wei Liu [Wed, 25 Jan 2017 10:43:11 +0000 (10:43 +0000)]
flask: fix build after the introduction of DMOP

In 58cbc034 the send_irq permission was removed, but there was still a
reference to it in the policy file. Remove the stale reference.

And now we also need the dm permission. Add that.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
xsm/build: Further build fixes following the DMop series

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agofuzz/libelf: exit with fuzzer function return value
Wei Liu [Wed, 25 Jan 2017 11:14:43 +0000 (11:14 +0000)]
fuzz/libelf: exit with fuzzer function return value

Now the function can return a nonzero value. Use that value as the exit
code for the stub program. AFL might be able to use such information to
optimise the fuzzing process.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agofuzz/libelf: return early if elf_init fails
Wei Liu [Wed, 25 Jan 2017 11:14:42 +0000 (11:14 +0000)]
fuzz/libelf: return early if elf_init fails

Coverity-ID: 1399557

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs/misc: update the meaning of the 'disk unplug' flag
Paul Durrant [Wed, 25 Jan 2017 10:42:55 +0000 (10:42 +0000)]
docs/misc: update the meaning of the 'disk unplug' flag

The documentation states that a value of '1' will cause unplug of
emulated IDE disks. This is not quite correct, as QEMU will also unplug
emulated SCSI disks at the same time.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoMoving ept code to ept specific files.
Paul Lai [Thu, 10 Nov 2016 23:45:52 +0000 (15:45 -0800)]
Moving ept code to ept specific files.

Renamed p2m_init_altp2m_helper() to p2m_init_altp2m_ept().

Signed-off-by: Paul Lai <paul.c.lai@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoinclude: speed up compat header generation
Jan Beulich [Wed, 25 Jan 2017 14:10:21 +0000 (15:10 +0100)]
include: speed up compat header generation

Recent additions to xlat.lst have apparently resulted in Python's
garbage collection getting in the way: I would guess that so far it
managed to re-use previously compiled regular expressions, but with the
higher number of them it now can't anymore (at least with default
settings). Do the compilation explicitly. While at it, combine the two
lists, and avoid using re.subn() when re.sub() suffices.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/emulate: don't assume that addr_size == 32 implies protected mode
George Dunlap [Wed, 25 Jan 2017 14:09:55 +0000 (15:09 +0100)]
x86/emulate: don't assume that addr_size == 32 implies protected mode

Callers of x86_emulate() generally define addr_size based on the code
segment.  In vm86 mode, the code segment is set by the hardware to be
16-bits; but it is entirely possible to enable protected mode, set the
CS to 32-bits, and then disable protected mode.  (This is commonly
called "unreal mode".)

But the instruction decoder only checks for protected mode when
addr_size == 16.  So in unreal mode, hardware will throw a #UD for VEX
prefixes, but our instruction decoder will decode them, triggering an
ASSERT() further on in _get_fpu().  (With debug=n the emulator will
incorrectly emulate the instruction rather than throwing a #UD, but
this is only a bug, not a crash, so it's not a security issue.)

Teach the instruction decoder to check that we're in protected mode,
even if addr_size is 32.
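
In pseudocode, the decode-time disambiguation becomes (illustrative, not the
literal hunk):

    if ( !in_protmode(ctxt, ops) )
        /* real/unreal/VM86: C4/C5 keep their legacy LES/LDS meaning, so a
         * VEX-style encoding (modrm.mod == 3) is rejected, as on hardware */
        decode_as_legacy();
    else
        decode_as_vex();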

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Split real mode and VM86 mode handling, as VM86 mode is strictly 16-bit
at all times. Re-base.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: correct VEX/XOP/EVEX operand size handling for 16-bit code
Jan Beulich [Wed, 25 Jan 2017 14:08:59 +0000 (15:08 +0100)]
x86emul: correct VEX/XOP/EVEX operand size handling for 16-bit code

Operand size defaults to 32 bits in that case, but would not have been
set that way in the absence of an operand size override.

Reported-by: Wei Liu <wei.liu2@citrix.com> (by AFL fuzzing)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/cpuid: Remove the legacy path handling extd leaves
Andrew Cooper [Fri, 20 Jan 2017 13:58:44 +0000 (13:58 +0000)]
x86/cpuid: Remove the legacy path handling extd leaves

All leaves in the extd union are handled in guest_cpuid() now, so remove
legacy handling.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaf 0x8000001c in guest_cpuid()
Andrew Cooper [Fri, 20 Jan 2017 13:56:10 +0000 (13:56 +0000)]
x86/cpuid: Handle leaf 0x8000001c in guest_cpuid()

Leaf 0x8000001c contains LWP information.  edx contains hardware supported
features (and is clamped against the maximum), while ecx and ebx contain
various properties of the implementation.  eax is entirely dynamic, depending
on xcr0 and MSR_LWP_CFG.

The call to guest_cpuid() in svm_update_lwp_cfg() can now be replaced by
reading the data straight out of the cpuid_policy block.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpufeatures: Hide Instruction Based Sampling from guests
Andrew Cooper [Fri, 20 Jan 2017 14:48:57 +0000 (14:48 +0000)]
x86/cpufeatures: Hide Instruction Based Sampling from guests

Xen advertises the IBS feature flag to guests on capable AMD hardware.
However, the PV path in Xen, and both the PV and HVM paths in libxc
deliberately clobber the IBS CPUID leaf.

Furthermore, Xen has nothing providing an implementation of the IBS MSRs, so
guests can't actually use the feature at all.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaves 0x8000000b-1a in guest_cpuid()
Andrew Cooper [Fri, 20 Jan 2017 13:36:36 +0000 (13:36 +0000)]
x86/cpuid: Handle leaves 0x8000000b-1a in guest_cpuid()

Leaves 8000000b-18 are reserved.  Leaf 80000019 is 1G TLB information and leaf
0x8000001a is performance hints.  These leaves have previously been hidden
from guests, but are perfectly safe to expose when applicable.

Update libxc to also expose these leaves.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaf 0x8000000a in guest_cpuid()
Andrew Cooper [Sun, 22 Jan 2017 17:50:12 +0000 (17:50 +0000)]
x86/cpuid: Handle leaf 0x8000000a in guest_cpuid()

Leaf 0x8000000a contains SVM information.  The feature choices are borrowed
straight from the libxc policy code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/cpuid: Handle leaf 0x80000009 in guest_cpuid()
Andrew Cooper [Fri, 20 Jan 2017 13:41:47 +0000 (13:41 +0000)]
x86/cpuid: Handle leaf 0x80000009 in guest_cpuid()

Leaf 0x80000009 is reserved.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaf 0x80000008 in guest_cpuid()
Andrew Cooper [Fri, 20 Jan 2017 13:00:32 +0000 (13:00 +0000)]
x86/cpuid: Handle leaf 0x80000008 in guest_cpuid()

The entirety of edx is reserved.

Intel only defines the lower 16 bits of eax, although ebx is covered by the
featureset ABI, so left unclobbered.

AMD uses 24 bits in eax, although nothing thus far has ever exposed a non-zero
guest maxphysaddr to HVM guests.  Its semantics are not clearly expressed, so
it is explicitly clobbered.  ecx contains some reserved bits, and several
pieces of static topology information, which are left as the toolstack
chooses.

A side effect of the common recalculation of maxlinaddr is that 32bit PV
guests see a maximum linear address of 32, which is consistent with the hiding
of other long mode information from them.

Finally, the call to guest_cpuid() in mtrr_var_range_msr_set() (introduced in
c/s fff8160a) can be dropped, now that maxphysaddr can be read straight out of
the cpuid_policy block.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaves 0x80000005-7 in guest_cpuid()
Andrew Cooper [Fri, 20 Jan 2017 15:35:08 +0000 (15:35 +0000)]
x86/cpuid: Handle leaves 0x80000005-7 in guest_cpuid()

Leaf 0x80000005 contains L1 cache/TLB information, 0x80000006 L2 & L3
cache/TLB information, and 0x80000007 Power management information.

Intel reserves all of this information other than the L2 cache information,
and the ITSC bit from the power management leaf.

AMD passes all of the cache/TLB information through to the guest, while most
of the power management information is explicitly clobbered by the
toolstack.

0x80000007 edx (containing ITSC) is covered by the featureset logic.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle the long vendor string in guest_cpuid()
Andrew Cooper [Wed, 18 Jan 2017 18:13:17 +0000 (18:13 +0000)]
x86/cpuid: Handle the long vendor string in guest_cpuid()

Leaves 0x80000002 through 0x80000004 are plain ASCII text, and are left
exactly as the toolstack chooses.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaf 0x80000001 in guest_cpuid()
Andrew Cooper [Fri, 20 Jan 2017 14:47:34 +0000 (14:47 +0000)]
x86/cpuid: Handle leaf 0x80000001 in guest_cpuid()

Intel reserve eax and ebx, while AMD duplicates eax from the low
family/model/stepping leaf.  For AMD, ebx contains further brand/package
information which is left as the toolstack chooses (other than bits 27:16
which are reserved).

While moving the dynamic adjustments from the legacy path, simplify the shadow
PSE36 adjustment.  PAE paging is a prerequisite for enabling long mode, making
the long mode check redundant; the case where it doesn't get short circuited
is the case where it is architecturally 0.  Make the same adjustment to the
leaf 1 legacy path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle more simple Intel leaves in guest_cpuid()
Andrew Cooper [Tue, 17 Jan 2017 17:32:50 +0000 (17:32 +0000)]
x86/cpuid: Handle more simple Intel leaves in guest_cpuid()

Intel now document leaf 2 as a plain leaf, with %al always containing the
value 0x01.  Collect this leaf normally in calculate_raw_policy() and expose
it to guests.  The leaf is reserved by AMD.

Intel leaves 3 and 9 (PSN and DCA respectively) are not exposed to guests at
all.  They are reserved by AMD.

Leaves 8 and 0xc are reserved by both vendors.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Only recalculate the shared feature bits once
Andrew Cooper [Tue, 17 Jan 2017 17:08:04 +0000 (17:08 +0000)]
x86/cpuid: Only recalculate the shared feature bits once

With accurate vendor information available, the shared bits can be sorted out
during recalculation, rather than at query time in the legacy cpuid path.

This means that:
 * Duplication can be dropped from the automatically generated cpuid data.
 * The toolstack need not worry about setting them appropriately.
 * They can be dropped from the system maximum featuresets.

While editing gen-cpuid.py, reflow some comments which exceeded the expected
line length.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Handle leaf 0x80000000 in guest_cpuid()
Andrew Cooper [Tue, 17 Jan 2017 16:52:14 +0000 (16:52 +0000)]
x86/cpuid: Handle leaf 0x80000000 in guest_cpuid()

The calculations for p->extd.max_leaf are reworked to force a value of at
least 0x80000000, and to take the domain's chosen vendor into account when
clamping the maximum value.

The high short vendor information is clobbered or duplicated according to the
chosen vendor.

As a side effect of handing out an audited max_leaf value, the 0x8000001e case
can be dropped from pv_cpuid(), as it is outside of the visible range.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
8 years agox86/cpufeatures: Expose self-snoop to all guests
Andrew Cooper [Thu, 19 Jan 2017 10:26:14 +0000 (10:26 +0000)]
x86/cpufeatures: Expose self-snoop to all guests

Self-snoop describes a property of the CPU cache behaviour, which FreeBSD uses
to optimise its cache flushing algorithm.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
8 years agotools/fuzz: add README.afl
Wei Liu [Fri, 20 Jan 2017 11:21:40 +0000 (11:21 +0000)]
tools/fuzz: add README.afl

And rename README to README.oss-fuzz.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools/fuzz: add AFL stub program for libelf fuzzer
Wei Liu [Fri, 20 Jan 2017 11:57:58 +0000 (11:57 +0000)]
tools/fuzz: add AFL stub program for libelf fuzzer

And hook it up to the build system.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools/fuzz: add AFL stub program for x86 insn emulator fuzzer
Wei Liu [Fri, 20 Jan 2017 11:17:29 +0000 (11:17 +0000)]
tools/fuzz: add AFL stub program for x86 insn emulator fuzzer

This is a basic program to call into the unified fuzzing function.

Hook it up to the build system so that we can always build-test it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools/fuzz: add missing dependencies in x86 insn fuzzer build rule
Wei Liu [Fri, 20 Jan 2017 11:39:41 +0000 (11:39 +0000)]
tools/fuzz: add missing dependencies in x86 insn fuzzer build rule

The said file needs the two header files.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agocredit2: performance counter for load balancing call
Praveen Kumar [Wed, 25 Jan 2017 09:51:47 +0000 (10:51 +0100)]
credit2: performance counter for load balancing call

The patch introduces a new performance counter that counts how many times we go
through the load balancing logic in Credit2.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
8 years agox86/hvm: serialize trap injecting producer and consumer
Jan Beulich [Wed, 25 Jan 2017 09:51:10 +0000 (10:51 +0100)]
x86/hvm: serialize trap injecting producer and consumer

Since injection works on a remote vCPU, and since there's no
enforcement of the subject vCPU being paused, there's a potential race
between the producing and consuming sides. Fix this by leveraging the
vector field as synchronization variable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
[re-based]
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodm_op: convert HVMOP_inject_trap and HVMOP_inject_msi
Paul Durrant [Wed, 25 Jan 2017 09:49:52 +0000 (10:49 +0100)]
dm_op: convert HVMOP_inject_trap and HVMOP_inject_msi

NOTE: This patch also modifies the types of the 'vector', 'type' and
      'insn_len' arguments of xc_hvm_inject_trap() from uint32_t to
      uint8_t. In practice the values passed were always truncated to
      8 bits.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodm_op: convert HVMOP_set_mem_type
Paul Durrant [Wed, 25 Jan 2017 09:48:25 +0000 (10:48 +0100)]
dm_op: convert HVMOP_set_mem_type

This patch removes the need for handling HVMOP restarts, so that
infrastructure is removed.

NOTE: This patch also modifies the type of the 'nr' argument of
      xc_hvm_set_mem_type() from uint64_t to uint32_t. In practice the
      value passed was always truncated to 32 bits.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodm_op: convert HVMOP_modified_memory
Paul Durrant [Wed, 25 Jan 2017 09:47:13 +0000 (10:47 +0100)]
dm_op: convert HVMOP_modified_memory

This patch introduces code to handle DMOP continuations.

NOTE: This patch also modifies the type of the 'nr' argument of
      xc_hvm_modified_memory() from uint64_t to uint32_t. In practice the
      value passed was always truncated to 32 bits.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodm_op: convert HVMOP_set_pci_intx_level, HVMOP_set_isa_irq_level, and...
Paul Durrant [Wed, 25 Jan 2017 09:44:50 +0000 (10:44 +0100)]
dm_op: convert HVMOP_set_pci_intx_level, HVMOP_set_isa_irq_level, and...

... HVMOP_set_pci_link_route

These HVMOPs were exposed to guests so their definitions need to be
preserved for compatibility. This patch therefore updates
__XEN_LATEST_INTERFACE_VERSION__ to 0x00040900 and makes the HVMOP
definitions conditional on __XEN_INTERFACE_VERSION__ being less than that value.

NOTE: This patch also widens the 'domain' parameter of
      xc_hvm_set_pci_intx_level() from a uint8_t to a uint16_t.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodm_op: convert HVMOP_track_dirty_vram
Paul Durrant [Wed, 25 Jan 2017 09:43:14 +0000 (10:43 +0100)]
dm_op: convert HVMOP_track_dirty_vram

The handle type passed to the underlying shadow and hap functions is
changed for compatibility with the new hypercall buffer.

NOTE: This patch also modifies the type of the 'nr' parameter of
      xc_hvm_track_dirty_vram() from uint64_t to uint32_t. In practice
      the value passed was always truncated to 32 bits.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodm_op: convert HVMOP_*ioreq_server*
Paul Durrant [Wed, 25 Jan 2017 09:41:35 +0000 (10:41 +0100)]
dm_op: convert HVMOP_*ioreq_server*

The definitions of HVM_IOREQSRV_BUFIOREQ_* have to persist as they are
already in use by callers of the libxc interface.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agopublic / x86: introduce __HYPERCALL_dm_op...
Paul Durrant [Wed, 25 Jan 2017 09:40:51 +0000 (10:40 +0100)]
public / x86: introduce __HYPERCALL_dm_op...

...as a set of hypercalls to be used by a device model.

As stated in the new docs/designs/dm_op.markdown:

"The aim of DMOP is to prevent a compromised device model from
compromising domains other than the one it is associated with. (And is
therefore likely already compromised)."

See that file for further information.

This patch simply adds the boilerplate for the hypercall.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Suggested-by: Ian Jackson <ian.jackson@citrix.com>
Suggested-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoVT-d: add command line option for extra rmrrs
Elena Ufimtseva [Wed, 25 Jan 2017 09:38:05 +0000 (10:38 +0100)]
VT-d: add command line option for extra rmrrs

On some platforms firmware fails to specify RMRR regions in ACPI tables and
thus those regions will not be mapped in dom0 or guests and may cause IO
Page Faults and prevent dom0 from booting if "iommu=dom0-strict" option is
specified on the Xen command line.

The new Xen command line option rmrr allows specifying such devices and
memory regions. These regions are added to the list of RMRRs defined in ACPI
if the device is present in the system. As a result, additional RMRRs will be
mapped 1:1 in dom0 with correct permissions.

The above-mentioned problems were discovered during the PVH work with
ThinkCentre M and Dell 5600T. No official documentation has been found so far
regarding which devices cause this and why. Experiments show that
ThinkCentre M USB devices with the debug port enabled generate DMA read
transactions to regions of memory marked reserved in the host e820 map.

For Dell 5600T the device and faulting addresses are not found yet.
For detailed history of the discussion please check following threads:
    http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
    http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
    rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
    For example, for Lenovo ThinkCentre M, use:
        rmrr=0xd5d45=0:0:1d.0;0xd5d46=0:0:1a.0
    If grub2 is used and multiple ranges are specified, ';' should be
    quoted/escaped; refer to the grub2 manual for more information.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agopci: add wrapper for parse_pci
Elena Ufimtseva [Wed, 25 Jan 2017 09:37:43 +0000 (10:37 +0100)]
pci: add wrapper for parse_pci

For parsing sbdfs in the RMRR command line option, add parse_pci_seg with an
additional parameter, def_seg. parse_pci_seg helps to identify whether a
segment was found in the string being parsed or the default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoVT-d: separate rmrr addition function
Elena Ufimtseva [Wed, 25 Jan 2017 09:37:14 +0000 (10:37 +0100)]
VT-d: separate rmrr addition function

In preparation for auxiliary RMRR data provided on the Xen command line,
make RMRR adding a separate function.
Also free memory for the rmrr device scope in the error path.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoConfig.mk: update OVMF changeset
Wei Liu [Tue, 24 Jan 2017 17:24:56 +0000 (17:24 +0000)]
Config.mk: update OVMF changeset

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen: sched: simplify ACPI S3 resume path.
Dario Faggioli [Tue, 17 Jan 2017 17:27:10 +0000 (18:27 +0100)]
xen: sched: simplify ACPI S3 resume path.

In fact, when domains are being unpaused:
 - it's not necessary to put the vcpus to sleep, as
   they are all already paused;
 - it is not necessary to call vcpu_migrate(), as
   the vcpus are still paused, and therefore won't
   wakeup anyway.

Basically, the only important thing is to call
pick_cpu, to let the scheduler run and figure out
what would be the best initial placement (i.e., the
value stored in v->processor), for the vcpus, as
they come back up, one after another.

Note that this is consistent with what was happening
before this change, as vcpu_migrate() calls pick_cpu.
But much simpler and quicker.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: sched: improve use of cpumask scratch space in Credit1.
Dario Faggioli [Tue, 17 Jan 2017 17:27:03 +0000 (18:27 +0100)]
xen: sched: improve use of cpumask scratch space in Credit1.

It is ok to use just cpumask_scratch in csched_runq_steal().
In fact, the cpu parameter comes from the cpu local variable
in csched_load_balance(), which in turn comes from cpu in
csched_schedule(), which is smp_processor_id().

While there, also:
 - move the comment about cpumask_scratch in the header
   where the scratch space is declared;
 - spell more clearly (in that same comment) what the
   serialization rules are.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: fix shutdown/suspend when playing with cpupools.
Dario Faggioli [Tue, 17 Jan 2017 17:26:55 +0000 (18:26 +0100)]
xen: credit2: fix shutdown/suspend when playing with cpupools.

In fact, during shutdown/suspend, we temporarily move all
the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
domains, we call csched2_vcpu_migrate(), which expects to find
the target pCPU in the domain's pool.

Therefore, if Credit2 is the default scheduler and we have
removed pCPU 0 from cpupool0, shutdown/suspend fails like
this:

 RIP:    e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
 Xen call trace:
    [<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
    [<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
    [<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
    [<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
    [<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
    [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
    [<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
    [<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
    [<ffff82d0801333ed>] do_tasklet+0x66/0x8b
    [<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e

 ****************************************
 Panic on CPU 8:
 Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
 ****************************************

On the other hand, if Credit2 is the scheduler of another
pool, when trying (still during shutdown/suspend) to move
the vCPUs of the Credit2 domains to pCPU 0, it figures
out that pCPU 0 is not a Credit2 pCPU, and fails like this:

 RIP:    e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
 Xen call trace:
    [<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
    [<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
    [<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
    [<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
    [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
    [<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
    [<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
    [<ffff82d0801335d5>] do_tasklet+0x66/0x8b
    [<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e

The solution is to recognise the specific situation, inside
csched2_vcpu_migrate() and, considering it is something temporary,
which only happens during shutdown/suspend, quickly deal with it.

Then, in the resume path, in restore_vcpu_affinity(), things
are set back to normal, and a new v->processor is chosen, for
each vCPU, from the proper set of pCPUs (i.e., the ones of
the proper cpupool).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: never consider CPUs outside of our cpupool.
Dario Faggioli [Tue, 17 Jan 2017 17:26:46 +0000 (18:26 +0100)]
xen: credit2: never consider CPUs outside of our cpupool.

In fact, relying on the mask of what pCPUs belong to
which Credit2 runqueue is not enough. If we only do that,
when Credit2 is the boot scheduler, we may ASSERT() or
panic when moving a pCPU from Pool-0 to another cpupool.

This is because pCPUs outside of any pool are considered
part of cpupool0. This puts us at risk of crash when those
same pCPUs are added to another pool and something
different than the idle domain is found to be running
on them.

Note that, even if we prevent the above to happen (which
is the purpose of this patch), this is still pretty bad,
in fact, when we remove a pCPU from Pool-0:
- in Credit1, as we do *not* update prv->ncpus and
  prv->credit, which means we're considering the wrong
  total credits when doing accounting;
- in Credit2, the pCPU remains part of one runqueue,
  and is hence at least considered during load balancing,
  even if no vCPU should really run there.

In Credit1, this "only" causes skewed accounting and
no crashes because there is a lot of `cpumask_and`ing
going on with the cpumask of the domains' cpupool
(which, BTW, comes at a price).

A quick and not too involved (and easily backportable)
solution for Credit2 is to do exactly the same.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: use the correct scratch cpumask.
Dario Faggioli [Tue, 17 Jan 2017 17:26:38 +0000 (18:26 +0100)]
xen: credit2: use the correct scratch cpumask.

In fact, there is one scratch mask per each CPU. When
you use the one of a CPU, it must be true that:
 - the CPU belongs to your cpupool and scheduler,
 - you own the runqueue lock (the one you take via
   {v,p}cpu_schedule_lock()) for that CPU.

This was not the case within the following functions:

get_fallback_cpu(), csched2_cpu_pick(): as we can't be
sure we either are on, or hold the lock for, the CPU
that is in the vCPU's 'v->processor'.

migrate(): it's ok, when called from balance_load(),
because that comes from csched2_schedule(), which takes
the runqueue lock of the CPU where it executes. But it is
not ok when we come from csched2_vcpu_migrate(), which
can be called from other places.

The fix is to explicitly use the scratch space of the
CPUs for which we know we hold the runqueue lock.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agotools/fuzz: remove redundant rule in x86 insn fuzzer
Wei Liu [Fri, 20 Jan 2017 10:24:36 +0000 (10:24 +0000)]
tools/fuzz: remove redundant rule in x86 insn fuzzer

The predefined pattern rule works.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/fuzz: make sure targets are always built
Wei Liu [Thu, 19 Jan 2017 19:00:14 +0000 (19:00 +0000)]
tools/fuzz: make sure targets are always built

Invocation of `make' in the top-level directory would end up invoking the
install target.

Adjust the fuzzing target makefiles a bit so that they are always built in
that situation.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/fuzz: fix compilation after 897129d
Wei Liu [Thu, 19 Jan 2017 18:56:02 +0000 (18:56 +0000)]
tools/fuzz: fix compilation after 897129d

We need to add -D__XEN_TOOLS__ so that the correct register names are
generated.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86emul/test: don't use *_len symbols
Jan Beulich [Tue, 24 Jan 2017 16:22:03 +0000 (16:22 +0000)]
x86emul/test: don't use *_len symbols

... as they don't work as intended with -fPIC.

I did prefer them over *_end ones at the time because older gcc would
cause .L* symbols to be public, due to issuing .globl for all
referenced externals. And labels at the end of instructions collide
with the ones at the start of the next instruction, making disassembly
harder to grok. Luckily recent gcc no longer issues those .globl
directives, and hence .L* labels, staying local by default, no longer
get in the way.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/hvm: do not set msr_tsc_adjust on hvm_set_guest_tsc_fixed
Joao Martins [Tue, 24 Jan 2017 11:37:36 +0000 (12:37 +0100)]
x86/hvm: do not set msr_tsc_adjust on hvm_set_guest_tsc_fixed

Commit 6e03363 ("x86: Implement TSC adjust feature for HVM guest")
implemented the TSC_ADJUST MSR for HVM guests. However, while booting
an HVM guest, the boot CPU would have a value set to delta_tsc -
guest tsc, while secondary CPUs would have 0. For example one can
observe:
 $ xen-hvmctx 17 | grep tsc_adjust
 TSC_ADJUST: tsc_adjust ff9377dfef47fe66
 TSC_ADJUST: tsc_adjust 0
 TSC_ADJUST: tsc_adjust 0
 TSC_ADJUST: tsc_adjust 0

Upcoming Linux 4.10 validates whether this MSR is correct and adjusts it
accordingly under the following conditions: values < 0 (our case for
CPU 0), values != 0, or values > 0x7FFFFFFF. Under these conditions it
will force the value to 0, and likewise for CPUs whose value doesn't match
the others. If this MSR is not correct we will see messages such as:

[Firmware Bug]: TSC ADJUST: CPU0: -30517044286984129 force to 0

And an HVM guest supporting TSC_ADJUST (requiring at least Intel
Haswell) won't boot.

Our current vCPU 0 value is incorrect: the Intel SDM, in the section
"Time-Stamp Counter Adjustment", states that "On RESET, the value
of the IA32_TSC_ADJUST MSR is 0.", hence we should set it to 0 and be
consistent across multiple vCPUs. Perhaps this MSR should only be
changed by the guest, which already happens through the
hvm_set_guest_tsc_adjust(..) routines (see below). After this patch
guests running Linux 4.10 will see a valid IA32_TSC_ADJUST MSR of value
0 for all CPUs and are able to boot.

On the same section of the spec ("Time-Stamp Counter Adjustment") it is
also stated:
"If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR
 adds (or subtracts) value X from the TSC, the logical processor also
 adds (or subtracts) value X from the IA32_TSC_ADJUST MSR.

 Unlike the TSC, the value of the IA32_TSC_ADJUST MSR changes only in
 response to WRMSR (either to the MSR itself, or to the
 IA32_TIME_STAMP_COUNTER MSR). Its value does not otherwise change as
 time elapses. Software seeking to adjust the TSC can do so by using
 WRMSR to write the same value to the IA32_TSC_ADJUST MSR on each logical
 processor."

This suggests these MSR values should only be changed by the guest, i.e.
through write-intercepted MSRs. We keep the IA32_TSC MSR logic such that writes
accommodate adjustments to TSC_ADJUST, hence no functional change in
msr_tsc_adjust for the IA32_TSC MSR. Though, we do that in a separate routine,
namely hvm_set_guest_tsc_msr, instead of through hvm_set_guest_tsc(...).
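
Roughly, the WRMSR-side behaviour described above looks like this (field and
function names illustrative):

    case MSR_IA32_TSC:
        /* a guest write to the TSC moves TSC_ADJUST by the same delta */
        v->arch.hvm_vcpu.msr_tsc_adjust += msr_content - hvm_get_guest_tsc(v);
        hvm_set_guest_tsc(v, msr_content);
        break;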

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/HVM: make hvm_set_guest_tsc*() static
Jan Beulich [Tue, 24 Jan 2017 11:36:55 +0000 (12:36 +0100)]
x86/HVM: make hvm_set_guest_tsc*() static

Other than hvm_set_guest_tsc(), neither needs to be exposed. And
hvm_get_guest_tsc_adjust() is pretty pointless as a separate function
altogether, let alone a non-static one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/PVH: only set accessed/busy bits for present segments
Jan Beulich [Tue, 24 Jan 2017 11:36:30 +0000 (12:36 +0100)]
x86/PVH: only set accessed/busy bits for present segments

Commit 366ff5f1b3 ("x86: segment attribute handling adjustments") went a
little too far: We must not do such adjustments for non-present segments.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
8 years agox86emul: correct FPU stub asm() constraints
Jan Beulich [Tue, 24 Jan 2017 11:35:59 +0000 (12:35 +0100)]
x86emul: correct FPU stub asm() constraints

Properly inform the compiler about fic's role as both an input (its
insn_bytes field) and output (its exn_raised field).

Take the opportunity and bring emulate_fpu_insn_stub() more in line
with emulate_fpu_insn_stub_eflags().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/HVM: introduce struct hvm_pi_ops
Suravee Suthikulpanit [Tue, 24 Jan 2017 11:34:41 +0000 (12:34 +0100)]
x86/HVM: introduce struct hvm_pi_ops

The current function pointers in struct vmx_domain for managing hvm
posted interrupt can be used also by SVM AVIC. Therefore, this patch
introduces the struct hvm_pi_ops in the struct hvm_domain to hold them.
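
A rough sketch of the new container (hook names illustrative):

    struct hvm_pi_ops {
        void (*vcpu_block)(struct vcpu *);
        void (*switch_from)(struct vcpu *);
        void (*switch_to)(struct vcpu *);
        void (*do_resume)(struct vcpu *);
    };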

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agox86/hvm: Conditionally leave CPUID Faulting active in HVM context
Andrew Cooper [Mon, 9 Jan 2017 13:42:02 +0000 (13:42 +0000)]
x86/hvm: Conditionally leave CPUID Faulting active in HVM context

If the hardware supports faulting, and the guest has chosen to use it, leave
faulting active in HVM context.

It is more efficient to have hardware convert CPUID to a #GP fault (which we
don't intercept), than to take a VMExit and have Xen re-inject a #GP fault.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
8 years agox86/cpuid: Handle leaf 0 in guest_cpuid()
Andrew Cooper [Tue, 17 Jan 2017 16:48:58 +0000 (16:48 +0000)]
x86/cpuid: Handle leaf 0 in guest_cpuid()

Calculate a domain's x86_vendor early in recalculate_cpuid_policy(); subsequent
patches need to make other recalculation decisions based on it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Remove BUG_ON() condition from guest_cpuid()
Andrew Cooper [Tue, 17 Jan 2017 11:44:29 +0000 (11:44 +0000)]
x86/cpuid: Remove BUG_ON() condition from guest_cpuid()

Include a min() against the appropriate ARRAY_SIZE(), and ASSERT() that
max_subleaf is within ARRAY_SIZE().

This is more robust to unexpected problems in a release build of Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Hide VT-x/SVM from HVM-based control domains
Andrew Cooper [Wed, 18 Jan 2017 18:10:41 +0000 (18:10 +0000)]
x86/cpuid: Hide VT-x/SVM from HVM-based control domains

The VT-x/SVM features are hidden from PV dom0 by the pv_featureset[] upper
mask, but nothing thus far has prevented the features being visible in
HVM-based control domains (where there is no toolstack decision to hide the
features).

As a side effect of calling nestedhvm_enabled() earlier during domain
creation, it needs to cope with the params[] array not having been allocated.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Fix clang build following BMI1/BMI2/TBM instruction support
Andrew Cooper [Fri, 20 Jan 2017 15:34:54 +0000 (15:34 +0000)]
x86/emul: Fix clang build following BMI1/BMI2/TBM instruction support

Travis reports that Clang objects to integer truncation during assignments to
a bitfield:

  ./x86_emulate/x86_emulate.c:6150:19: error: implicit truncation from 'int'
  to bitfield changes value from -1 to 15 [-Werror,-Wbitfield-constant-conversion]
          pxop->reg = ~0; /* rAX */
                    ^ ~~

Use 0xf instead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agokexec: ensure kexec_status() return bit value of 0 or 1
Eric DeVolder [Thu, 19 Jan 2017 17:10:53 +0000 (11:10 -0600)]
kexec: ensure kexec_status() return bit value of 0 or 1

When checking kexec_flags bit corresponding to the
requested image, ensure that 0 or 1 is returned.
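
I.e. the result is normalised rather than returning the raw masked flag,
along the lines of (bit name illustrative):

    return !!(kexec_flags & (1UL << bit));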

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: segment attribute handling adjustments
Jan Beulich [Fri, 20 Jan 2017 13:39:12 +0000 (14:39 +0100)]
x86: segment attribute handling adjustments

Null selector loads into SS (possible in 64-bit mode only, and only in
rings other than ring 3) must not alter SS.DPL. (This was found to be
an issue on KVM, and fixed in Linux commit 33ab91103b.)

Further arch_set_info_hvm_guest() didn't make sure that the ASSERT()s
in hvm_set_segment_register() wouldn't trigger: Add further checks, but
tolerate (adjust) clear accessed (CS, SS, DS, ES) and busy (TR) bits.

Finally the setting of the accessed bits for user segments was lost by
commit dd5c85e312 ("x86/hvm: Reposition the modification of raw segment
data from the VMCB/VMCS"), yet VMX requires them to be set for usable
segments. Add respective ASSERT()s (the only path not properly setting
them was arch_set_info_hvm_guest()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: LOCK check adjustments
Jan Beulich [Fri, 20 Jan 2017 13:37:33 +0000 (14:37 +0100)]
x86emul: LOCK check adjustments

BT, being encoded as DstBitBase just like BT{C,R,S}, nevertheless does
not write its (register or memory) operand and hence also doesn't allow
a LOCK prefix to be used.

At the same time CLAC/STAC have no need to explicitly check lock_prefix
- this is being taken care of by generic code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: CMPXCHG{8,16}B are memory writes
Jan Beulich [Fri, 20 Jan 2017 13:36:58 +0000 (14:36 +0100)]
x86emul: CMPXCHG{8,16}B are memory writes

This fixes a regression introduced by commit ff913f68c9 ("x86/PV:
restrict permitted instructions during memory write emulation")
breaking namely 32-bit PV guests (which commonly use CMPXCHG8B for
certain page table updates).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/boot: implement early command line parser in C
Daniel Kiper [Fri, 20 Jan 2017 13:34:33 +0000 (14:34 +0100)]
x86/boot: implement early command line parser in C

The current early command line parser implementation in assembler
is very difficult to change into relocatable code using segment
registers. This would require a lot of changes in very weird and
fragile code. So, reimplement this functionality in C. This
way the code will be relocatable out of the box (without playing
with segment registers) and much easier to maintain.

Additionally, put all common cmdline.c and reloc.c definitions
into a defs.h header. This way we do not needlessly duplicate
some stuff.

And finally remove unused xen/include/asm-x86/config.h
header from reloc.c dependencies.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
8 years agotools/tests: add xenstore testing framework
Juergen Gross [Thu, 19 Jan 2017 07:18:53 +0000 (08:18 +0100)]
tools/tests: add xenstore testing framework

Add tools/tests/xenstore for a framework to do tests of xenstore.
The aim is to test for correctness and performance.

Add a test program containing some tests meant to be run against any
xenstore implementation (xenstored, oxenstored, xenstore-stubdom).

It is using libxenstore for access to xenstore and supports selecting
tests either all at once or individually. All tests are using
/local/domain/<own-domid>/xenstore-test/<pid> as base for doing the
tests. This allows multiple instances of the program to run in
parallel.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/PV: restrict permitted instructions during memory write emulation
Jan Beulich [Thu, 19 Jan 2017 09:38:08 +0000 (10:38 +0100)]
x86/PV: restrict permitted instructions during memory write emulation

All three code paths mean to only emulate memory writes. Refuse
emulation of any other instructions there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/mtrr: convert use_intel_if u32 to bool
Doug Goldstein [Thu, 19 Jan 2017 09:36:14 +0000 (10:36 +0100)]
x86/mtrr: convert use_intel_if u32 to bool

This field is currently always 1, but may allow 0 in the future, so
convert it to a bool to provide proper range checking by the compiler.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mtrr: drop unused func prototypes and struct
Doug Goldstein [Thu, 19 Jan 2017 09:35:45 +0000 (10:35 +0100)]
x86/mtrr: drop unused func prototypes and struct

These weren't used so drop them.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mtrr: drop positive_have_wrcomb()
Doug Goldstein [Thu, 19 Jan 2017 09:35:14 +0000 (10:35 +0100)]
x86/mtrr: drop positive_have_wrcomb()

The only call to have_wrcomb() was always to the generic implementation.
positive_have_wrcomb() was unused.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agopublic/kexec: put back blank line for readability purposes
Eric DeVolder [Thu, 19 Jan 2017 09:34:57 +0000 (10:34 +0100)]
public/kexec: put back blank line for readability purposes

This blank line was accidentally removed during
the insertion of the kexec_status() declarations.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86emul: simplify prefix handling for VMFUNC
Jan Beulich [Thu, 19 Jan 2017 09:34:21 +0000 (10:34 +0100)]
x86emul: simplify prefix handling for VMFUNC

LOCK prefixes get dealt with elsewhere, and 66, F2, and F3 can all be
checked for in one go by looking at vex.pfx.
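
Since the decoder records 66/F2/F3 as a non-zero vex.pfx, the check collapses
to something like (illustrative):

    generate_exception_if(vex.pfx, EXC_UD);  /* 66/F2/F3 are all invalid here */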

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: rename the no_writeback label
Jan Beulich [Thu, 19 Jan 2017 09:33:55 +0000 (10:33 +0100)]
x86emul: rename the no_writeback label

This is to bring its name in line with what actually happens there.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: support RDPID
Jan Beulich [Thu, 19 Jan 2017 09:33:29 +0000 (10:33 +0100)]
x86emul: support RDPID

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: support RDRAND/RDSEED
Jan Beulich [Thu, 19 Jan 2017 09:32:13 +0000 (10:32 +0100)]
x86emul: support RDRAND/RDSEED

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: support TBM insns
Jan Beulich [Thu, 19 Jan 2017 09:30:42 +0000 (10:30 +0100)]
x86emul: support TBM insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: support BMI2 insns
Jan Beulich [Thu, 19 Jan 2017 09:28:28 +0000 (10:28 +0100)]
x86emul: support BMI2 insns

Note that the adjustment to the mode_64bit() definition is so that we
can avoid "#ifdef __x86_64__" around the 64-bit asm() portions. An
alternative would be single asm()s with a conditional branch over the
(manually encoded) REX64 prefix.

Note that RORX raising #UD when VEX.VVVV is not all ones is matching
observed behavior rather than what the SDM says.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: support BMI1 insns
Jan Beulich [Thu, 19 Jan 2017 09:22:53 +0000 (10:22 +0100)]
x86emul: support BMI1 insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoxen/arm: gic-v3: Make sure read from ICC_IAR1_EL1 is visible on the redistributor
Julien Grall [Wed, 18 Jan 2017 18:54:08 +0000 (18:54 +0000)]
xen/arm: gic-v3: Make sure read from ICC_IAR1_EL1 is visible on the redistributor

"The effects of reading ICC_IAR0_EL1 and ICC_IAR1_EL1 on the state of a
returned INTID are not guaranteed to be visible until after the execution
of a DSB".

Because the GIC is an external component, a dsb sy is required.
Without it the sysreg read may not have been made visible on the
redistributor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86/cpuid: Offer ITSC to domains which are automatically non-migrateable
Andrew Cooper [Mon, 9 Jan 2017 12:54:55 +0000 (12:54 +0000)]
x86/cpuid: Offer ITSC to domains which are automatically non-migrateable

Dom0 doesn't have a toolstack to explicitly decide that ITSC is safe to offer.
For domains which are automatically built with disable_migrate set, offer ITSC
automatically.

This is important for HVM-based dom0, and for when cpuid faulting is imposed
on the control domain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agotools/libxc: Remove xsave calculations from libxc
Andrew Cooper [Wed, 4 Jan 2017 15:07:02 +0000 (15:07 +0000)]
tools/libxc: Remove xsave calculations from libxc

libxc performs a lot of calculations for the xstate leaf when generating a
guest's cpuid policy.  To correctly construct a policy for an HVM guest, this
logic depends on native cpuid leaking through from real hardware.

In particular, the logic is potentially wrong for an HVM-based toolstack
domain (e.g. PVH dom0), and definitely wrong if cpuid faulting is applied to a
PV domain.

Xen now performs all the necessary calculations, using native values.  The
only piece of information the toolstack need worry about is the single xstate
feature leaf.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/cpuid: Move all xstate leaf handling into guest_cpuid()
Andrew Cooper [Fri, 16 Dec 2016 16:21:20 +0000 (16:21 +0000)]
x86/cpuid: Move all xstate leaf handling into guest_cpuid()

The xstate union now contains sanitised values, so it can be handled fully in
the non-legacy path.

c/s 1c0bc709d "x86/cpuid: Perform max_leaf calculations in guest_cpuid()"
accidentally introduced a boundary error for the subleaf check, although it
was masked by the correct logic in the legacy path.

Two dynamic adjustments need making, but a TODO and BUILD_BUG_ON() are left to
cover a latent bug which will present itself when Xen starts supporting XSS
states for guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Introduce recalculate_xstate()
Andrew Cooper [Wed, 4 Jan 2017 15:00:23 +0000 (15:00 +0000)]
x86/cpuid: Introduce recalculate_xstate()

All data in the xstate union, other than the Da1 feature word, is derived from
other state; either feature bits from other words, or layout information which
has already been collected by Xen's xstate driver.

Recalculate the xstate information for each policy object when the feature
bits may have changed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Move x86_vendor from arch_domain to cpuid_policy
Andrew Cooper [Thu, 12 Jan 2017 11:45:10 +0000 (11:45 +0000)]
x86/cpuid: Move x86_vendor from arch_domain to cpuid_policy

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agox86/cpuid: Drop a guests cached x86 family and model information
Andrew Cooper [Thu, 12 Jan 2017 11:45:10 +0000 (11:45 +0000)]
x86/cpuid: Drop a guests cached x86 family and model information

The model information isn't used at all, and the family information is only
used once.

Make get_cpu_family() a static inline (as it is just a basic calculation, and
the function call is probably more expensive than the function itself) and
rearrange the logic to avoid calculating the model entirely if the caller doesn't
want it.

Calculate a guests family only when necessary in hvm_select_ioreq_server().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agokexec: implement STATUS hypercall to check if image is loaded
Eric DeVolder [Tue, 17 Jan 2017 17:29:16 +0000 (11:29 -0600)]
kexec: implement STATUS hypercall to check if image is loaded

The tools that use kexec are asynchronous in nature and do not keep
state changes. As such, provide a hypercall to find out whether an
image has been loaded for either type.

Note: No need to modify XSM as it has a one-size-fits-all check and
does not check for subcommands.

Note: No need to check KEXEC_FLAG_IN_PROGRESS (and error out of
kexec_status()) as this flag is set only once by the first/only
cpu on the crash path.

Note: This is just the Xen side of the hypercall, kexec-tools patch
to come separately.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/arm: Don't mix GFN and MFN when using iomem_deny_access
Julien Grall [Tue, 17 Jan 2017 15:52:53 +0000 (15:52 +0000)]
xen/arm: Don't mix GFN and MFN when using iomem_deny_access

iomem_deny_access is working on MFN and not GFN. Make it clear by
renaming the local variables.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: bootfdt.c is only used during initialization
Julien Grall [Tue, 17 Jan 2017 15:53:24 +0000 (15:53 +0000)]
xen/arm: bootfdt.c is only used during initialization

This file contains data and code only used at initialization. Mark the
file as such in the build system and correct kind_guess.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86emul: support ADCX/ADOX
Jan Beulich [Tue, 17 Jan 2017 09:33:25 +0000 (10:33 +0100)]
x86emul: support ADCX/ADOX

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: support POPCNT
Jan Beulich [Tue, 17 Jan 2017 09:32:54 +0000 (10:32 +0100)]
x86emul: support POPCNT

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: VEX.B is ignored in compatibility mode
Jan Beulich [Tue, 17 Jan 2017 09:32:25 +0000 (10:32 +0100)]
x86emul: VEX.B is ignored in compatibility mode

While VEX.R and VEX.X are guaranteed to be 1 in compatibility mode
(and hence a respective mode_64bit() check can be dropped), VEX.B can
be encoded as zero, but would be ignored by the processor. Since we
emulate instructions in 64-bit mode (except possibly in the test
harness), we need to force the bit to 1 in order to not act on the
wrong {X,Y,Z}MM register (which has no bad effect on 32-bit test
harness builds, as there the bit would again be ignored by the
hardware, and would by default be expected to be 1 anyway).

We must not, however, fiddle with the high bit of VEX.VVVV in the
decode phase, as that would undermine the checking of instructions
requiring the field to be all ones independent of mode. This is
being enforced in copy_REX_VEX() instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: suppress memory writes after faulting FPU insns
Jan Beulich [Tue, 17 Jan 2017 09:31:39 +0000 (10:31 +0100)]
x86emul: suppress memory writes after faulting FPU insns

FPU insns writing to memory must not touch memory if they latch #MF (to
be delivered on the next waiting FPU insn). Note that inspecting FSW.ES
needs to be avoided for all FNST* insns, as they don't raise exceptions
themselves, but may instead be invoked with the bit already set.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoAdd XENV to docs/misc
Stefano Stabellini [Mon, 16 Jan 2017 18:46:16 +0000 (10:46 -0800)]
Add XENV to docs/misc

Add the latest version of the XEN Environment table specification for
ACPI to docs/misc.

The original authors are:
  Parth Dixit <parth.dixit@linaro.org>
  Julien Grall <julien.grall@citrix.com>

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoAdd STAO spec to docs/misc
Stefano Stabellini [Mon, 16 Jan 2017 18:42:18 +0000 (10:42 -0800)]
Add STAO spec to docs/misc

Add the latest version of the STAtus Override table specification for
ACPI to docs/misc.

The original authors are:

  Al Stone <al.stone@linaro.org>
  Graeme Gregory <graeme.gregory@linaro.org>
  Parth Dixit <parth.dixit@linaro.org>

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86/xstate: Fix array overrun on hardware with LWP
Andrew Cooper [Fri, 13 Jan 2017 18:51:04 +0000 (18:51 +0000)]
x86/xstate: Fix array overrun on hardware with LWP

c/s da62246e4c "x86/xsaves: enable xsaves/xrstors/xsavec in xen" introduced
setup_xstate_features() to allocate and fill xstate_offsets[] and
xstate_sizes[].

However, fls() casts xfeature_mask to 32 bits, which truncates LWP out of the
calculation.  As a result, the arrays are allocated too short, and the cpuid
infrastructure reads off the end of them when calculating xstate_size for the
guest.
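
An illustrative example of the truncation (LWP is xstate bit 62):

    uint64_t xfeature_mask = (1ULL << 62) | 0x7; /* LWP + x87/SSE/YMM */
    fls(xfeature_mask);   /* 3: 32-bit view, LWP lost, arrays sized too short */
    flsl(xfeature_mask);  /* 63: covers every enabled state component */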

On one test system, this results in 0x3fec83c0 being returned as the maximum
size of an xsave area, which surprisingly appears not to bother Windows or
Linux too much.  I suspect they both use current size based on xcr0, which Xen
forwards from real hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/pv: Check that emulate_privileged_op() doesn't change any unexpected flags
Andrew Cooper [Fri, 6 Jan 2017 20:05:36 +0000 (20:05 +0000)]
x86/pv: Check that emulate_privileged_op() doesn't change any unexpected flags

No bits, other than arithmetic ones and the resume flag (which will most
likely change from 1 to 0), can be changed by the instructions we permit.
Extend the check to cover other flags.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>