Andrew Cooper [Fri, 16 Dec 2016 16:21:20 +0000 (16:21 +0000)]
x86/cpuid: Move all xstate leaf handling into guest_cpuid()
The xstate union now contains sanitised values, so it can be handled fully in
the non-legacy path.
c/s 1c0bc709d "x86/cpuid: Perform max_leaf calculations in guest_cpuid()"
accidentally introduced a boundary error for the subleaf check, although it
was masked by the correct logic in the legacy path.
Two dynamic adjustments need making, but a TODO and BUILD_BUG_ON() are left to
cover a latent bug which will present itself when Xen starts supporting XSS
states for guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 4 Jan 2017 15:00:23 +0000 (15:00 +0000)]
x86/cpuid: Introduce recalculate_xstate()
All data in the xstate union, other than the Da1 feature word, is derived from
other state; either feature bits from other words, or layout information which
has already been collected by Xen's xstate driver.
Recalculate the xstate information for each policy object when the feature
bits may have changed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 12 Jan 2017 11:45:10 +0000 (11:45 +0000)]
x86/cpuid: Move x86_vendor from arch_domain to cpuid_policy
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Thu, 12 Jan 2017 11:45:10 +0000 (11:45 +0000)]
x86/cpuid: Drop a guests cached x86 family and model information
The model information isn't used at all, and the family information is only
used once.
Make get_cpu_family() a static inline (as it is just basic calculation, and
the function call is probably more expensive than the function itself) and
rearrange the logic to avoid calculating model entirely if the caller doesn't
want it.
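For illustration only, a helper of this shape could look like the sketch below. The name decode_family() and its signature are assumptions made for this example and are not taken from the patch; the decoding itself follows the usual CPUID leaf 1 EAX layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are assumptions): decode the
 * family, and optionally the model, from the raw EAX value of CPUID leaf 1.
 * Pass model == NULL to skip the model calculation entirely.
 */
static inline unsigned int decode_family(uint32_t raw, uint8_t *model)
{
    unsigned int fam = (raw >> 8) & 0xf;

    /* The extended family field is only added when the base family is 0xf. */
    if ( fam == 0xf )
        fam += (raw >> 20) & 0xff;

    if ( model )
    {
        uint8_t mod = (raw >> 4) & 0xf;

        /* Common convention: fold in the extended model for family >= 6. */
        if ( fam >= 6 )
            mod |= (raw >> 12) & 0xf0;

        *model = mod;
    }

    return fam;
}
```

A caller which only needs the family would pass NULL for the model, so none of the model arithmetic is executed.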
Calculate a guest's family only when necessary in hvm_select_ioreq_server().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Eric DeVolder [Tue, 17 Jan 2017 17:29:16 +0000 (11:29 -0600)]
kexec: implement STATUS hypercall to check if image is loaded
The tools that use kexec are asynchronous in nature and do not keep
state changes. As such provide a hypercall to find out whether an
image has been loaded for either type.
Note: No need to modify XSM as it has one size fits all check and
does not check for subcommands.
Note: No need to check KEXEC_FLAG_IN_PROGRESS (and error out of
kexec_status()) as this flag is set only once by the first/only
cpu on the crash path.
Note: This is just the Xen side of the hypercall, kexec-tools patch
to come separately.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 17 Jan 2017 09:32:25 +0000 (10:32 +0100)]
x86emul: VEX.B is ignored in compatibility mode
While VEX.R and VEX.X are guaranteed to be 1 in compatibility mode
(and hence a respective mode_64bit() check can be dropped), VEX.B can
be encoded as zero, but would be ignored by the processor. Since we
emulate instructions in 64-bit mode (except possibly in the test
harness), we need to force the bit to 1 in order to not act on the
wrong {X,Y,Z}MM register (which has no bad effect on 32-bit test
harness builds, as there the bit would again be ignored by the
hardware, and would by default be expected to be 1 anyway).
We must not, however, fiddle with the high bit of VEX.VVVV in the
decode phase, as that would undermine the checking of instructions
requiring the field to be all ones independent of mode. This is
being enforced in copy_REX_VEX() instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 17 Jan 2017 09:31:39 +0000 (10:31 +0100)]
x86emul: suppress memory writes after faulting FPU insns
FPU insns writing to memory must not touch memory if they latch #MF (to
be delivered on the next waiting FPU insn). Note that inspecting FSW.ES
needs to be avoided for all FNST* insns, as they don't raise exceptions
themselves, but may instead be invoked with the bit already set.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 13 Jan 2017 18:51:04 +0000 (18:51 +0000)]
x86/xstate: Fix array overrun on hardware with LWP
c/s da62246e4c "x86/xsaves: enable xsaves/xrstors/xsavec in xen" introduced
setup_xstate_features() to allocate and fill xstate_offsets[] and
xstate_sizes[].
However, fls() casts xfeature_mask to 32bits which truncates LWP out of the
calculation. As a result, the arrays are allocated too short, and the cpuid
infrastructure reads off the end of them when calculating xstate_size for the
guest.
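The truncation can be reproduced in isolation. The sketch below is not Xen's code: fls32() and fls64() are stand-in implementations written for this example, and the mask is an illustrative value with x87/SSE/AVX in bits 0-2 plus LWP at bit 62.

```c
#include <stdint.h>

/* Stand-in find-last-set: 1-based index of the highest set bit, 0 if none. */
static unsigned int fls32(uint32_t x)
{
    unsigned int n = 0;

    while ( x )
    {
        n++;
        x >>= 1;
    }

    return n;
}

/* Same operation, but without losing the upper 32 bits of the input. */
static unsigned int fls64(uint64_t x)
{
    unsigned int n = 0;

    while ( x )
    {
        n++;
        x >>= 1;
    }

    return n;
}

/* Illustrative feature mask: bits 0-2 (x87, SSE, AVX) and bit 62 (LWP). */
#define EXAMPLE_XFEATURE_MASK ((UINT64_C(1) << 62) | 0x7)
```

Passing the mask through a 32-bit parameter yields 3, so arrays sized from it cover only components 0-2, whereas the 64-bit variant yields 63 and accounts for LWP.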
On one test system, this results in 0x3fec83c0 being returned as the maximum
size of an xsave area, which surprisingly appears not to bother Windows or
Linux too much. I suspect they both use current size based on xcr0, which Xen
forwards from real hardware.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 6 Jan 2017 20:05:36 +0000 (20:05 +0000)]
x86/pv: Check that emulate_privileged_op() don't change any unexpected flags
No bits, other than arithmetic ones and the resume flag (which will most
likely change from 1 to 0), can be changed by the instructions we permit.
Extend the check to cover other flags.
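A check of this kind can be sketched as a mask comparison. The macro and function names below are hypothetical and chosen for this example; only the EFLAGS bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define EX_FLAG_CF (1u << 0)
#define EX_FLAG_PF (1u << 2)
#define EX_FLAG_AF (1u << 4)
#define EX_FLAG_ZF (1u << 6)
#define EX_FLAG_SF (1u << 7)
#define EX_FLAG_OF (1u << 11)
#define EX_FLAG_RF (1u << 16)

/* Hypothetical name: the only bits an emulated instruction may alter. */
#define EX_FLAGS_MAY_CHANGE                                         \
    (EX_FLAG_CF | EX_FLAG_PF | EX_FLAG_AF | EX_FLAG_ZF |            \
     EX_FLAG_SF | EX_FLAG_OF | EX_FLAG_RF)

/* True if every bit that differs between the two values is permitted. */
static bool flags_change_ok(uint64_t before, uint64_t after)
{
    return ((before ^ after) & ~(uint64_t)EX_FLAGS_MAY_CHANGE) == 0;
}
```

With this shape, a change of CF together with RF dropping from 1 to 0 passes, while any change to e.g. IF (bit 9) is flagged.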
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 13 Jan 2017 13:23:42 +0000 (13:23 +0000)]
x86/emul: Calculate not_64bit during instruction decode
... rather than repeating "generate_exception_if(mode_64bit(), EXC_UD);" in
the emulation switch statement.
Bloat-o-meter shows:
add/remove: 0/0 grow/shrink: 1/2 up/down: 8/-495 (-487)
function                 old     new   delta
per_cpu__state            98     106      +8
x86_decode              6782    6726     -56
x86_emulate            57160   56721    -439
The reason for x86_decode() getting smaller is that this change alters the
x86_decode_onebyte() switch statement from a chain of if()/else's to a jump
table. The jump table adds 250 bytes of data which bloat-o-meter clearly
can't see.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 13 Jan 2017 14:28:31 +0000 (15:28 +0100)]
x86emul: improve CR/DR access handling
- don't accept LOCK for DR accesses (it's undefined in the manuals)
- only accept LOCK for CR accesses when the respective feature flag is
set (which would not normally be the case for Intel)
- add (rather than or) 8 when LOCK is present; real hardware #UDs
when both REX.W and LOCK are present, implying that these would
rather access hypothetical CR16...23
- eliminate explicit decode_register() calls
- streamline remaining read/write code
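The difference between or-ing and adding 8 only shows when the register number is already 8 or above. The sketch below uses hypothetical function names and assumes the register number passed in already includes any prefix-based extension, i.e. ranges over 0..15.

```c
/* Previous behaviour (sketch): LOCK sets bit 3 of the register number. */
static unsigned int cr_index_or(unsigned int reg, int lock_prefix)
{
    return lock_prefix ? (reg | 8) : reg;
}

/* New behaviour (sketch): LOCK adds 8, so 8..15 become 16..23. */
static unsigned int cr_index_add(unsigned int reg, int lock_prefix)
{
    return lock_prefix ? (reg + 8) : reg;
}
```

For register 0 with LOCK both variants select CR8. For register 8 with LOCK the or-variant silently stays at CR8, while the add-variant produces 16, which the emulator can then reject as a non-existent register.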
No further functional change, i.e. not addressing the missing exception
generation (#UD for invalid CR/DR encodings, #GP(0) for invalid write
values, #DB for DR accesses with DR7.GD set).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 13 Jan 2017 14:24:45 +0000 (15:24 +0100)]
x86emul: conditionally clear BNDn for branches
Considering that we surface MPX to HVM guests, instructions we emulate
should also correctly deal with MPX state. While for now BND*
instructions don't get emulated, the effect of branches (which we do
emulate) without BND prefix should be taken care of.
No need to alter XABORT behavior: While not mentioned in the SDM so
far, this restores BNDn as they were at the XBEGIN, and since we make
XBEGIN abort right away, XABORT in the emulator is only a no-op.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 4 Jan 2017 12:46:09 +0000 (12:46 +0000)]
x86/cpuid: Effectively remove domain_cpuid()
The only callers of domain_cpuid() are the legacy cpuid path via
{pv,hvm}_cpuid(). Move domain_cpuid() to being private in cpuid.c, with an
adjusted API to use struct cpuid_leaf rather than individual pointers.
The ITSC clobbering logic is dropped. It is no longer necessary now that the
logic has moved into recalculate_cpuid_policy().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 4 Jan 2017 12:20:51 +0000 (12:20 +0000)]
x86/cpuid: Store the toolstacks choice of hypervisor max leaf
This removes all dependencies on the legacy cpuids[] array from
cpuid_hypervisor_leaves(). Swap a BUG() to an ASSERT_UNREACHABLE(), because
in the unlikely case that we hit it, returning all zeros to the guest is fine.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 4 Jan 2017 12:43:57 +0000 (12:43 +0000)]
x86/domctl: Move all CPUID update logic into update_domain_cpuid_info()
This simplifies the XEN_DOMCTL_set_cpuid handling, splitting the safety logic
away from the internals of how an update is completed.
The legacy cpuids[] logic is left alone in a function, as it won't survive
very long. update_domain_cpuid_info() gains a small performance optimisation
to skip all update activities for leaves which won't have any impact on the
guest. This is temporary until the new hypercall API is completed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 12 Jan 2017 16:14:56 +0000 (16:14 +0000)]
x86/cpuid: Fix feature flags reported to dom0
c/s a11e8c9 "x86/pv: Use per-domain policy information in pv_cpuid()" switched
PV domains from using a (hardware for dom0, toolstack-chosen for domU) value
masked against pv_featureset[], to actually using the value calculated by
recalculate_cpuid_policy().
For domU, this is no practical change as the content is still chosen by the
toolstack. For dom0 however, we no longer have two sources of information
potentially clearing bits. Modern Linux seems to care about having CMP_LEGACY
set in its view of CPUID on an Intel box.
The deliberate setting of HTT, X2APIC and CMP_LEGACY in {pv,hvm}_featureset[]
is necessary for domUs, as the toolstack may have (tried to) set up topology
information in a different representation than the hardware uses. The bits
therefore needed to be set in the masks used in the older logic, to avoid
clobbering the toolstack's information.
Move the HTT/X2APIC/CMP_LEGACY logic from calculate_{pv,hvm}_max_policy()
(where the meaning of {pv,hvm}_featureset[] has changed subtly) to
recalculate_cpuid_policy() where the masking logic now lives.
This will cause {pv,hvm}_max_policy to actually contain real hardware values
(so dom0 sees real hardware values), but still allows the toolstack to set
bits not present in real hardware for domUs.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Tue, 10 Jan 2017 16:13:38 +0000 (17:13 +0100)]
tools/xenstore: start with empty data base
Today xenstored tries to open a tdb data base file on disk when it is
started. As this is problematic in most cases the scripts used to start
xenstored ensure xenstored won't find such a file in order to start
with an empty xenstore.
A tdb data base file can't be used to restore all Xenstore state as
e.g. Xenstore watches are not kept in the tdb data base. The file is
meant to be used for debugging purposes after a xenstored crash only.
Instead of opening a Xenstore data base file found on disk always start
with an empty data base. This will avoid problems in case someone is
testing multiple xenstored versions without rebooting (which is not
supported but helps debugging in some cases).
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 9 Jan 2017 13:17:01 +0000 (13:17 +0000)]
tools/libxc: Fix the reported max_leaf values for PV guests
When iterating through CPUID leaves to generate a policy, libxc will clip
itself at the hardcoded maxima, meaning that no data outside of the hardcoded
maxima are provided to Xen (in turn, causing Xen to return zeros if these
leaves are requested.)
The HVM code also clips the max_leaf data reported to the guest, but the PV
side didn't.
This results in a PV guest, whether using the emulated CPUID or via Xen using
CPUID faulting, observing a max_leaf higher than the toolstack wants, although
with zeros being returned in the intervening leaves.
Fix the PV side to behave like the HVM side, and clip the max_leaf values in
leaf 0 and 0x80000000.
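The clipping amounts to a min() on EAX for the two leaves that report a maximum. The sketch below is not libxc's code: the function name is hypothetical and the two cap values are illustrative placeholders for whatever maxima the toolstack hardcodes.

```c
#include <stdint.h>

/* Illustrative caps only; the real maxima are whatever the toolstack uses. */
#define EXAMPLE_MAX_BASIC_LEAF 0x0000000du
#define EXAMPLE_MAX_EXTD_LEAF  0x8000001cu

/* Return the EAX value to report for 'leaf', clipping any max_leaf field. */
static uint32_t clip_max_leaf(uint32_t leaf, uint32_t eax)
{
    switch ( leaf )
    {
    case 0x00000000u:
        return eax > EXAMPLE_MAX_BASIC_LEAF ? EXAMPLE_MAX_BASIC_LEAF : eax;

    case 0x80000000u:
        return eax > EXAMPLE_MAX_EXTD_LEAF ? EXAMPLE_MAX_EXTD_LEAF : eax;

    default:
        /* Other leaves carry no max_leaf field in EAX. */
        return eax;
    }
}
```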
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 11 Jan 2017 12:43:04 +0000 (13:43 +0100)]
x86emul: correct EFLAGS.TF handling
For repeated string instructions we should not emulate multiple
iterations in one go when a single step trap needs injecting (which
needs to happen after every iteration).
For all non-branch instructions as well as not taken conditional
branches we additionally need to take DebugCtl.BTF into consideration.
For mov-to/pop-into %ss there should be no #DB at all (EFLAGS.TF
remaining set means there'll be #DB after the next instruction).
Additionally retire.sti should remain clear when retire.singlestep gets
set to true.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citirx.com>
Jan Beulich [Wed, 11 Jan 2017 12:40:49 +0000 (13:40 +0100)]
x86/HVM: restrict permitted instructions during special purpose emulation
Most invocations of the instruction emulator are for VM exits where the
set of legitimate instructions (i.e. ones capable of causing the
respective exit) is rather small. Restrict the permitted sets via a new
callback, at once eliminating the abuse of handle_mmio() for non-MMIO
operations.
A seemingly unrelated comment adjustment is being done here to keep
x86_emulate() in sync with x86_insn_is_mem_write() (in the context of
which this was found to be wrong).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Alter the legacy-path prototypes to match guest_cpuid()
This allows the compiler to have a far easier time inlining the legacy paths
into guest_cpuid(), and avoids the need to have a full struct cpu_user_regs in
the guest_cpuid() stack frame.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <JBeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Effectively remove pv_cpuid() and hvm_cpuid()
All callers of pv_cpuid() and hvm_cpuid() (other than guest_cpuid() legacy
path) have been removed from the codebase. Move them into cpuid.c to avoid
any further use, leaving guest_cpuid() as the sole API to use.
This is purely code motion, with no functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/svm: Use guest_cpuid() rather than hvm_cpuid()
More work is required before LWP details can be read straight out of the
cpuid_policy block, but in the meantime hvm_cpuid() wants to disappear so
update the code to use the newer interface.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/hvm: Use guest_cpuid() rather than hvm_cpuid()
More work is required before maxphysaddr can be read straight out of the
cpuid_policy block, but in the meantime hvm_cpuid() wants to disappear so
update the code to use the newer interface.
Use the behaviour of max_leaf handling (returning all zeros) to avoid a double
call into guest_cpuid().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Move all leaf 7 handling into guest_cpuid()
All per-domain policy data concerning leaf 7 is accurate. Handle it all in
guest_cpuid() by reading out of the raw array block, and introducing a dynamic
adjustment for OSPKE.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Drop the temporary linear feature bitmap from struct cpuid_policy
With most uses of the *_featureset API removed, the remaining uses are only
during XEN_SYSCTL_get_cpu_featureset, init_guest_cpuid(), and
recalculate_cpuid_policy(), none of which are hot paths.
Drop the temporary infrastructure, and have the current users recreate the
linear bitmap using cpuid_policy_to_featureset(). This avoids storing
duplicated information in struct cpuid_policy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/pv: Use per-domain policy information in pv_cpuid()
... rather than performing runtime adjustments. This is safe now that
recalculate_cpuid_policy() performs suitable sanitisation when the policy data
is loaded.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/svm: Improvements using named features
This avoids calling into hvm_cpuid() to obtain information which is directly
available. In particular, this avoids the need to overload flag_dr_dirty
because of hvm_cpuid() being unavailable in svm_save_dr().
flag_dr_dirty is returned to a boolean (as it was before c/s c097f549 which
introduced the need to overload it). While returning it to type bool, remove
the use of bool_t for the adjacent fields.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/vvmx: Use hvm_cr4_guest_valid_bits() to calculate MSR_IA32_VMX_CR4_FIXED1
Reuse the logic in hvm_cr4_guest_valid_bits() instead of duplicating it.
This fixes a bug to do with the handling of X86_CR4_PCE. The RDPMC
instruction predates the architectural performance feature, and has been around
since the P6. X86_CR4_PCE is like X86_CR4_TSD and only controls whether RDPMC
is available at cpl!=0, not whether RDPMC is generally unavailable.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/hvm: Improve hvm_efer_valid() using named features
Pick the appropriate cpuid_policy object rather than using hvm_cpuid() or
boot_cpu_data. This breaks the dependency on current.
As data is read straight out of cpuid_policy, there is no need to work around
the fact that X86_FEATURE_SYSCALL might be clear because of the dynamic
adjustment in hvm_cpuid(). This simplifies the SCE handling, as EFER.SCE can
be set in isolation in 32bit mode on Intel hardware.
Alter nestedhvm_enabled() to be const-correct, allowing hvm_efer_valid() to be
properly const-correct.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Introduce named feature bitfields
It greatly aids the readability of code to express feature checks with their
direct name (e.g. p->basic.mtrr or p->extd.lm), rather than by a field and a
bitmask. gen-cpuid.py is augmented to calculate a suitable declaration to
live in a union with the underlying feature word.
gen-cpuid.py doesn't know Xen's choice of naming for the feature word indices
(and arguably shouldn't care), so provides the declarations in terms of their
numeric feature word index. The DECL_BITFIELD() macro (local to cpuid_policy)
takes a feature word index name and chooses the right declaration, to aid
clarity.
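As a rough sketch of the idea, a feature word can share storage with a bitfield whose members are named after the features. The union below is a hand-written, heavily trimmed stand-in for the generated declaration (type and function names are assumptions), covering only bits 0-12 of leaf 1 EDX. Bitfield layout is implementation-defined; the sketch assumes the usual little-endian GCC/Clang behaviour of allocating from the least significant bit.

```c
#include <stdint.h>

/* Trimmed, hypothetical stand-in for a generated feature word declaration. */
union example_feat_1d {
    uint32_t raw;                       /* The underlying feature word. */
    struct {
        unsigned int fpu:1, vme:1, de:1, pse:1, tsc:1, msr:1, pae:1, mce:1,
                     cx8:1,
                     :1,                /* Bit 9 (APIC) deliberately unnamed. */
                     :1,                /* Bit 10 is reserved. */
                     sep:1, mtrr:1,
                     :19;               /* Remaining bits omitted here. */
    };
};

/* Field-and-bitmask style check. */
static int has_mtrr_by_mask(const union example_feat_1d *f)
{
    return !!(f->raw & (1u << 12));
}

/* Named-feature style check. */
static int has_mtrr_by_name(const union example_feat_1d *f)
{
    return f->mtrr;
}
```

Both checks read the same bit; the named form simply removes the need to look up which word and which mask a feature lives in.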
All X86_FEATURE_*'s are included in the naming, other than the features
fast-forwarded from other state (APIC, OSXSAVE, OSPKE), whose value cannot be
read out of the feature word.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Dispatch cpuid_hypervisor_leaves() from guest_cpuid()
... rather than from the legacy path. Update the API to match guest_cpuid(),
and remove its dependence on current.
Make use of guest_cpuid() unconditionally zeroing res to avoid repeated
re-zeroing. To use a const struct domain, domain_cpuid() needs to be
const-corrected.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/hvm: Dispatch cpuid_viridian_leaves() from guest_cpuid()
... rather than from the legacy path. Update the API to match guest_cpuid(),
and remove its dependence on current.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Recalculate a domains CPUID policy when appropriate
Introduce recalculate_cpuid_policy() which clamps a CPUID policy based on the
domain's current restrictions.
Each adjustment introduced here mirrors what currently happens in
{pv,hvm}_cpuid(), although some logic is expressed differently.
* When clearing X86_FEATURE_LM for 32bit PV guests, sanitise_featureset()
takes out all 64bit-dependent features in one go.
* The toolstack's choice of X86_FEATURE_ITSC is (by default) clobbered in
domain_cpuid(), but {pv,hvm}_cpuid() needed to account for the host ITSC
value when masking the toolstack value.
This now requires that sanitise_featureset(), lookup_deep_deps() and
associated data be available at runtime, so they move out of __init.
Recalculate the cpuid policy when:
* The domain is first created
* Switching a PV guest to being compat
* Setting disable_migrate or vTSC modes
* The toolstack sets new policy data
The disable_migrate code was previously common. To compensate, move the code
to each arch's arch_do_domctl(), as the implementations now differ.
From this point on, domains have full and correct feature-leaf information in
their CPUID policies, allowing for substantial cleanup and improvements.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Allocate a CPUID policy for every domain
Introduce init_domain_cpuid_policy() to allocate an appropriate cpuid policy
for the domain (currently the domain's maximum applicable policy), and call it
during domain construction.
init_guest_cpuid() now needs calling before dom0 is constructed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Move featuresets into struct cpuid_policy
Featuresets will eventually live only once in a struct cpuid_policy, but lots
of code currently uses the global featuresets as a linear bitmap. Remove the
existing global *_featureset bitmaps, replacing them with *_policy objects
containing named featureset words and a fs[] linear bitmap.
Two new helpers are introduced to scatter/gather a linear featureset bitmap
to/from the fixed word locations in struct cpuid_policy.
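A minimal sketch of such a pair of helpers follows. The structure, the helper names and the word order are all assumptions made for this example; only the general shape (named words on one side, a linear array on the other) is taken from the description above.

```c
#include <stdint.h>

#define EXAMPLE_FS_WORDS 4

/* Hypothetical policy: each feature word lives at a fixed, named location. */
struct example_policy {
    struct { uint32_t _1d, _1c; } basic;
    struct { uint32_t e1d, e1c; } extd;
};

/* Gather: collect the named words into a linear featureset bitmap. */
static void example_policy_to_featureset(const struct example_policy *p,
                                         uint32_t fs[EXAMPLE_FS_WORDS])
{
    fs[0] = p->basic._1d;
    fs[1] = p->basic._1c;
    fs[2] = p->extd.e1d;
    fs[3] = p->extd.e1c;
}

/* Scatter: spread a linear featureset bitmap back to the named words. */
static void example_featureset_to_policy(const uint32_t fs[EXAMPLE_FS_WORDS],
                                         struct example_policy *p)
{
    p->basic._1d = fs[0];
    p->basic._1c = fs[1];
    p->extd.e1d  = fs[2];
    p->extd.e1c  = fs[3];
}
```

Code that still works on linear bitmaps can gather, operate on the array, and scatter the result back, which is how the existing bitmap-based logic can keep working during the transition.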
The existing calculate_raw_policy() already obtains the scattered raw
featureset. Gather the raw featureset into raw_policy.fs in
calculate_raw_policy() and drop calculate_raw_featureset() entirely.
Now that host_featureset can't be a straight define of
boot_cpu_data.x86_capability, introduce calculate_host_policy() to suitably
fill in host_policy from boot_cpu_data.x86_capability. (Future changes will
have additional sanitization logic in this function.)
The PV and HVM policy objects and calculation functions have max introduced to
their names, as there will eventually be a distinction between max and default
policies for each domain type. The existing logic works in terms of linear
bitmaps, so scatter the result back into the policy objects.
Leave some compatibility defines providing the old *_featureset API. This
results in no observed change in the *_featureset values, which are still used
at the hypercall and guest_cpuid() interfaces.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Introduce struct cpuid_policy
struct cpuid_policy will eventually be a complete replacement for the cpuids[]
array, with a fixed layout and named fields to allow O(1) access to specific
information.
For now, the CPUID content is capped at the 0xd and 0x8000001c leaves, which
matches the maximum policy that the toolstack will generate for a domain. The
xstate leaves extend up to LWP, and the structured features leaf is
implemented with subleaf properties (in anticipation of subleaf 1 appearing
soon), although only subleaf 0 is currently implemented.
Introduce calculate_raw_policy() which fills raw_policy with information,
making use of the new helpers, cpuid_{,count_}leaf().
Finally, rename calculate_featuresets() to init_guest_cpuid(), as it is going
to perform rather more work.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Jan 2017 11:59:02 +0000 (11:59 +0000)]
x86/cpuid: Introduce guest_cpuid() and struct cpuid_leaf
Longterm, pv_cpuid() and hvm_cpuid() will be merged into a single
guest_cpuid(), which is also capable of working outside of current context.
To aid this transition, introduce guest_cpuid() with the intended API, which
simply defers back to pv_cpuid() or hvm_cpuid() as appropriate.
Introduce struct cpuid_leaf which is used to represent the results of a CPUID
query in a more efficient manner than passing four pointers through the
calltree.
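The contrast between the two calling conventions can be sketched as follows. The field names a/b/c/d, both function names and the placeholder register values are assumptions for this example; no real CPUID data is produced.

```c
#include <stdint.h>

/* One object holding all four result registers (field names assumed). */
struct cpuid_leaf {
    uint32_t a, b, c, d;
};

/* Old style (sketch): four out-pointers threaded through every caller. */
static void example_cpuid_ptrs(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                               uint32_t *ecx, uint32_t *edx)
{
    /* Placeholder values standing in for a real lookup. */
    *eax = leaf;
    *ebx = leaf + 1;
    *ecx = leaf + 2;
    *edx = leaf + 3;
}

/* New style (sketch): a single pointer to the result structure. */
static void example_cpuid_leaf(uint32_t leaf, struct cpuid_leaf *res)
{
    /* Same placeholder values, written through one pointer. */
    *res = (struct cpuid_leaf){ leaf, leaf + 1, leaf + 2, leaf + 3 };
}
```

Besides passing fewer arguments, the structure form lets a callee zero or copy the whole result in one assignment.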
Update all codepaths which should use the new guest_cpuid() API. These are
the codepaths which have variable inputs, and (other than some specific
x86_emulate() cases) all pertain to servicing a CPUID instruction from a
guest.
The other codepaths using {pv,hvm}_cpuid() with fixed inputs will later be
adjusted to read their data straight from the policy block.
No intended functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevint.tian@intel.com>
x86/HVM: Fix teardown ordering in hvm_vcpu_destroy()
The order of destroy function calls in hvm_vcpu_destroy() should be
the reverse of init calls in hvm_vcpu_initialise().
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[ Fix up tasklet_kill() position ] Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 10 Jan 2017 10:46:59 +0000 (10:46 +0000)]
xenstore: bump TDB_VERSION
Commit 9e49dcf67f ("xenstore: add per-node generation counter") changed
the TDB layout, which - in order to not break older xenstored running
on the same system - needs to be accompanied by a version bump.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Mon, 9 Jan 2017 15:22:32 +0000 (15:22 +0000)]
get_maintainer.pl: Teach brace expansion
Simpler non-nested brace expansion.
Some entries in MAINTAINERS are not understood by the script, namely the
ones that contain {,}. This patch fixes it.
This will convert brace expansion style use in MAINTAINERS into a regex
that get_maintainer.pl can use to match a path against a maintainer
section.
It is done by using two different regexes. The first one takes care of
converting ',' inside '{}' to a '|', one by one, as long as there are at
least two commas. The second regex does the final conversion of '{,}'
to '(|)'.
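The effect of the conversion can be shown with a small stand-alone function. This is not the script's approach: it is a single-pass character scan written in C for this example rather than the two Perl regexes, the function name is hypothetical, and it performs only the brace-to-alternation step (no escaping of other regex metacharacters).

```c
#include <stddef.h>

/*
 * Rewrite a non-nested brace pattern such as "kconfig{,-language}.txt" into
 * the alternation "kconfig(|-language).txt". Characters outside braces are
 * copied unchanged. Returns 0 on success, -1 if dst is too small.
 */
static int braces_to_alternation(const char *src, char *dst, size_t dst_len)
{
    int in_braces = 0;
    size_t i;

    for ( i = 0; src[i] != '\0'; i++ )
    {
        char c = src[i];

        /* Keep room for this character plus the terminating NUL. */
        if ( i + 1 >= dst_len )
            return -1;

        if ( !in_braces && c == '{' )
        {
            in_braces = 1;
            c = '(';
        }
        else if ( in_braces && c == ',' )
            c = '|';
        else if ( in_braces && c == '}' )
        {
            in_braces = 0;
            c = ')';
        }

        dst[i] = c;
    }

    if ( i >= dst_len )
        return -1;

    dst[i] = '\0';

    return 0;
}
```

Commas outside of braces are left alone, which matches the intent of only treating ',' as a separator within a '{}' group.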
With the patch, the right maintainers are displayed, instead of "THE
REST" maintainers, when using e.g. the following command:
$ ./scripts/get_maintainer.pl -f docs/misc/kconfig.txt
The patch also gets rid of the warnings, with recent perl:
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^docs/misc/kconfig{ <-- HERE ,-language}\.txt/ at ./scripts/get_maintainer.pl line 731.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Tested-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
docs: move pci-device-reservations from misc to man
pci-device-reservations is referenced in xl.cfg(5), convert it to a man
page in pod format. The name is now prefixed with 'xen-' to avoid
possible name conflicts.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
vtpmmgr.txt is referenced in a man page, convert it to a man page.
The man page is named xen-vtpmmgr to avoid any conflict with other
potential vtpm docs.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
vtpm.txt is referenced in xl.cfg man page. Convert it to pod,
move it to the man folder and update the reference. The man page
is named xen-vtpm to avoid any potential conflict with other
VTPM documentation.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
docs: convert xl-disk-configuration into a man page
Convert xl-disk-configuration.txt from plain text file to a POD file
to get it as a man page. The references to it in the other man pages
are also updated.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Some of the docs/misc documents are written in markdown language.
As an effort to cleanup man pages these documents will be converted into
man pages. To avoid some more conversion, add rules to the docs/Makefile
to generate man pages out of markdown files as well as pod ones.
However, pandoc doesn't know how to convert man page links. Thus the
man links in markdown pages won't work.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 6 Jan 2017 14:33:54 +0000 (14:33 +0000)]
xen/x86: Fix CONFIG_CRASH_DEBUG build following c/s 897129dea
Found by a Travis RANDCONFIG run.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Andrew Cooper [Wed, 7 Dec 2016 17:48:27 +0000 (17:48 +0000)]
x86/domctl: Make XEN_DOMCTL_set_address_size singleshot
Toolstacks (including some out-of-tree ones) use XEN_DOMCTL_set_address_size
at most once per domain, and it ends up having a destructive effect on the
available CPUID policy for a domain.
To avoid ordering issues between altering the policy via domctl, and the
constructive effects which would have to happen from switching back to native,
explicitly reject this case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
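The single-shot behaviour described for XEN_DOMCTL_set_address_size can be pictured with the sketch below. It is one plausible reading of the commit message, not the actual Xen code: struct domain_sketch and set_address_size_sketch() are hypothetical names, and the assumption is that no-op requests succeed, the switch to 32-bit is allowed, and switching back to native is rejected.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-domain state; not Xen's struct domain. */
struct domain_sketch {
    bool is_32bit;   /* false: native 64-bit, true: switched to 32-bit */
};

/* Returns 0 on success or -EINVAL for a rejected request. */
static int set_address_size_sketch(struct domain_sketch *d, unsigned int size)
{
    /* A request matching the current mode is a harmless no-op. */
    if ((size == 64 && !d->is_32bit) || (size == 32 && d->is_32bit))
        return 0;

    /* The one destructive transition that is permitted: native -> 32-bit. */
    if (size == 32) {
        d->is_32bit = true;
        return 0;
    }

    /* Switching back to native (or any other size) is explicitly rejected. */
    return -EINVAL;
}
```

Under these assumptions a toolstack that sets the address size once per domain is unaffected, while an attempt to go back to 64-bit fails.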
Andrew Cooper [Fri, 6 Jan 2017 14:08:09 +0000 (15:08 +0100)]
x86: fix build with older versions of GCC following e34bc403c3
GCCs of at least 4.4 and earlier do not tolerate the initialisation of the
$VENDOR_cpu_dev structures, because of c_ident becoming an anonymous union.
Instead of using an anonymous union, reinterpret c_ident[] in its CPUID form
just in get_cpu_vendor().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
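As a rough illustration of reinterpreting c_ident[] in its CPUID form at the point of use, the sketch below compares a vendor string against the three registers returned by CPUID leaf 0. Only the field name c_ident comes from the commit message; struct cpu_dev_sketch and vendor_matches() are hypothetical, and the comparison assumes a little-endian host such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down vendor entry; the real structure has more fields. */
struct cpu_dev_sketch {
    char c_ident[13];   /* 12-character vendor string plus terminating NUL */
};

/*
 * CPUID leaf 0 returns the vendor string in EBX, EDX, ECX order.
 * Copy the string into three 32-bit words at the point of use, so the
 * structure itself stays a plain char array that old compilers can
 * initialise without trouble.
 */
static int vendor_matches(const struct cpu_dev_sketch *dev,
                          uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    uint32_t words[3];

    memcpy(words, dev->c_ident, sizeof(words));

    return words[0] == ebx && words[1] == edx && words[2] == ecx;
}
```

memcpy() is used here instead of a pointer cast to stay clear of alignment and aliasing concerns; whether the real code does the same is not stated in the commit message.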
Jan Beulich [Fri, 6 Jan 2017 14:07:31 +0000 (15:07 +0100)]
x86: use unambiguous register names
Eliminate the mis-naming of 64-bit fields with 32-bit register names
(eflags instead of rflags etc). To ensure no piece of code was missed,
transiently use the underscore prefixed names only for 32-bit register
accesses. This will be cleaned up subsequently.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
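The naming scheme can be sketched as a union that exposes the same storage under a 64-bit name and an underscore-prefixed 32-bit name. struct regs_sketch and clear_carry() are hypothetical and far smaller than the real register block; the overlay of the low half relies on a little-endian layout, as on x86.

```c
#include <stdint.h>

/* Hypothetical cut-down register block, for illustration only. */
struct regs_sketch {
    union {
        uint64_t rflags;    /* full 64-bit register */
        uint32_t _eflags;   /* low 32 bits only (little-endian layout) */
    };
};

/* Clear the carry flag (bit 0) using only a 32-bit access. */
static void clear_carry(struct regs_sketch *regs)
{
    regs->_eflags &= ~UINT32_C(1);
}
```

With the plain eflags name gone, any code still using it fails to compile, which is how missed call sites would show up.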
Jan Beulich [Fri, 6 Jan 2017 14:06:09 +0000 (15:06 +0100)]
x86: drop cpu_has_sse{,2}
Commit dc88221c97 ("x86: rename XMM* features to SSE*") pointlessly
added them - these features are always available on 64-bit CPUs. (Let's
not assume this for MMX though in at least the insn emulator.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Thu, 5 Jan 2017 16:26:09 +0000 (10:26 -0600)]
x86/mtrr: use stdbool instead of int + define
Instead of using an int and providing a define for TRUE and FALSE,
change the code to use stdbool that Xen provides.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Minor style tweaks] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
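The kind of change described in the mtrr commit looks roughly like the sketch below, where a predicate returns bool rather than an int compared against TRUE/FALSE defines. range_is_aligned() is a hypothetical helper written for this illustration and is not taken from Xen's MTRR code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: a range is acceptable when its size is a
 * non-zero power of two and its base is a multiple of that size.
 * Previously such a function would have returned int TRUE/FALSE.
 */
static bool range_is_aligned(uint64_t base, uint64_t size)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return false;

    return (base & (size - 1)) == 0;
}
```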
Boris Ostrovsky [Tue, 3 Jan 2017 14:04:12 +0000 (09:04 -0500)]
libxl: Update xenstore on VCPU hotplug for all guest types
Currently HVM guests that use upstream qemu do not update xenstore's
availability entry for VCPUs. While it is not strictly necessary for
hotplug to work, xenstore ends up not reflecting actual status of
VCPUs. We should fix this.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 23 Dec 2016 12:12:36 +0000 (12:12 +0000)]
build: move setting LTO options to xen/Rules.mk
Having them in StdGNU.mk would affect both hypervisor and tools build.
However, judging from the commit message of e4cdd74f, LTO was only meant
to affect hypervisor build.
Move the relevant bits to xen/Rules.mk.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 5 Jan 2017 11:41:50 +0000 (11:41 +0000)]
x86/pv: Defer I/O bitmap checks even in 64bit mode for emulate_privilege_op()
The I/O bitmap doesn't change function depending on mode. 64bit userspace
such as an X server still needs to enter guest_io_okay() to find that the PV
kernel did set up an appropriate virtual I/O bitmap to permit access.
While moving the check, alter its representation to be easier to read.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
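The I/O bitmap check itself is mode independent, which the sketch below tries to show. It follows the usual x86 convention that a set bit denies access to the corresponding port; io_bitmap_permits() and its parameters are hypothetical and do not reproduce guest_io_okay().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true only if every port touched by an access of 'bytes' bytes
 * starting at 'port' has its bit clear in the bitmap. Ports at or beyond
 * 'nr_ports' are treated as denied.
 */
static bool io_bitmap_permits(const uint8_t *bitmap, unsigned int nr_ports,
                              unsigned int port, unsigned int bytes)
{
    for (unsigned int i = 0; i < bytes; i++) {
        unsigned int p = port + i;

        if (p >= nr_ports)
            return false;
        if (bitmap[p / 8] & (1u << (p % 8)))
            return false;
    }

    return true;
}
```

Nothing in the check depends on whether the caller is 32bit or 64bit code, matching the reasoning in the commit message.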
Andrew Cooper [Tue, 3 Jan 2017 11:55:54 +0000 (11:55 +0000)]
x86/vvmx: Drop sreg_to_index[]
Since c/s 0888d36b "x86/emul: Correct the decoding of SReg3 operands",
x86_seg_* have followed hardware encodings, meaning that this translation
table is now an identity transform.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Jan 2017 10:11:19 +0000 (11:11 +0100)]
x86/VMX: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc). Use the
guaranteed 32-bit underscore prefixed names for now where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Quan Xu [Thu, 5 Jan 2017 10:10:01 +0000 (11:10 +0100)]
x86/apicv: fix RTC periodic timer and apicv issue
When Xen apicv is enabled, wall clock time runs fast on a Windows7-32
guest under heavy load (with 2 vCPUs; xentrace shows that, under heavy
load, the count of IPI interrupts between these vCPUs increases
rapidly).
If an IPI interrupt (vector 0xe1) and a periodic timer interrupt (vector 0xd1)
are both pending (both bits set in vIRR), the IPI interrupt unfortunately
has higher priority than the periodic timer interrupt. Xen updates the
guest interrupt status (RVI) with the IPI interrupt's vIRR bit, as the higher
priority one, and apicv (Virtual-Interrupt Delivery) delivers the IPI interrupt
within VMX non-root operation without a VM-Exit. Within VMX non-root
operation, if the periodic timer interrupt's bit is set in vIRR and is the
highest, apicv delivers the periodic timer interrupt within VMX non-root
operation as well.
But in the current code, if Xen doesn't update the guest interrupt status
(RVI) with the periodic timer interrupt's vIRR bit directly, Xen is not aware
of this case and does not decrease the count (pending_intr_nr) of pending
periodic timer interrupts, so Xen will deliver a periodic timer interrupt again.
In addition, because we update the periodic timer interrupt on every VM-entry,
there is a chance that an already-injected instance (before the EOI-induced
exit happens) will cause another pending IRR bit to be set if a VM-exit happens
between virtual interrupt injection (vIRR->0, vISR->1) and the EOI-induced
exit (vISR->0), since pt_intr_post hasn't been invoked yet. The guest then
receives extra periodic timer interrupts.
So we set eoi_exit_bitmap for intack.vector - this gives a chance to post
periodic timer interrupts when a periodic timer interrupt becomes the
highest one.
Signed-off-by: Quan Xu <xuquan8@huawei.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Chao Gao <chao.gao@intel.com>
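The priority relationship in the apicv commit follows from how a pending vector is selected: the highest numbered bit set in the 256-bit vIRR wins. The sketch below models only that selection; set_pending() and highest_pending_vector() are hypothetical helpers, and the array of eight 32-bit words is a simplification of the real register layout.

```c
#include <stdint.h>

/* Mark a vector (0..255) as pending in a 256-bit IRR model. */
static void set_pending(uint32_t irr[8], unsigned int vector)
{
    irr[vector / 32] |= UINT32_C(1) << (vector % 32);
}

/* Return the highest pending vector, or -1 if none is pending. */
static int highest_pending_vector(const uint32_t irr[8])
{
    for (int word = 7; word >= 0; word--) {
        if (!irr[word])
            continue;
        for (int bit = 31; bit >= 0; bit--)
            if (irr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
    }

    return -1;
}
```

With both 0xd1 and 0xe1 pending, this selection yields 0xe1, so the periodic timer vector is not the one written to RVI, which is the situation the commit message describes.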
Andrew Cooper [Thu, 8 Dec 2016 08:46:42 +0000 (08:46 +0000)]
x86/cpuid: Untangle the <asm/cpufeature.h> include hierarchy
The use of X86_FEATURES_ONLY was short-lived in Linux because of the same
problem encountered here. The following series needs to add extra includes to
asm/cpuid.h, which breaks the build elsewhere given the current hierarchy.
Move the feature definitions into a separate header file, which also matches
the solution Linux used.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Piotr Luc [Wed, 4 Jan 2017 13:29:30 +0000 (14:29 +0100)]
x86/mwait-idle: add Knights Mill CPUID
Add Knights Mill (KNM) to the list of CPUIDs supported by mwait-idle.
Signed-off-by: Piotr Luc <piotr.luc@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: a2c1bc645e87346150516b3abf1933ed29d0f48b] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andy Shevchenko [Wed, 4 Jan 2017 13:29:08 +0000 (14:29 +0100)]
x86/mwait-idle: add CPU model 0x4a (Atom Z34xx series)
Add CPU ID for Atom Z34xx processors. Datasheets indicate support for this,
though detailed information about potential quirks or limitations is missing.
So we just reuse the definition from official BSP code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: 5e7ec268fd48d63cfd0e3a9be6c6443f01673bd4] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 4 Jan 2017 13:28:32 +0000 (14:28 +0100)]
x86emul: use unambiguous register names
This is in preparation of eliminating the mis-naming of 64-bit fields
with 32-bit register names (eflags instead of rflags etc).
Note that the result is not fully consistent until after at least one
more patch is in place, primarily to limit patch size (by trying to not
touch the same line twice).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 4 Jan 2017 13:28:02 +0000 (14:28 +0100)]
x86emul: make _PRE_EFLAGS() tolerate first argument being 32-bit
While this may appear to introduce a truncation issue, the high 32 bits
get zapped already anyway (early in _PRE_EFLAGS() as well as in
_POST_EFLAGS()). Once a subsequent patch switches to use proper 32-bit
EFLAGS operands, we'll in fact end up with more correct code, as that
zeroing of the upper halves will then go away.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>