xenbits.xensource.com Git - xen.git/log
Haozhong Zhang [Wed, 8 Mar 2017 14:10:29 +0000 (15:10 +0100)]
x86/vmce: fill MSR_IA32_MCG_STATUS on all vcpus in broadcast case

The current implementation only fills MC MSRs on vcpu0 and leaves MC
MSRs on other vcpus empty in the broadcast case. When a guest reads 0
from MSR_IA32_MCG_STATUS on vcpuN (N > 0), it may conclude that
execution cannot be recovered on that vcpu and panic, even though
the MSR_IA32_MCG_STATUS value filled on vcpu0 may imply that the injected
vMCE is actually recoverable. To avoid such an unnecessary guest panic,
set MSR_IA32_MCG_STATUS on vcpuN (N > 0) to MCG_STATUS_MCIP|MCG_STATUS_RIPV.
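
A minimal sketch of the broadcast fill described above, not the actual Xen
code. The bit positions follow the Intel SDM definition of IA32_MCG_STATUS;
the helper name fill_broadcast_mcg_status() and the flat status array are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits as defined by the Intel SDM. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/*
 * Hypothetical helper: vcpu0 receives the real status, every other vCPU is
 * marked "in progress, restartable" so the guest never reads 0 there.
 */
static void fill_broadcast_mcg_status(uint64_t *status, size_t nr_vcpus,
                                      uint64_t vcpu0_status)
{
    assert(status != NULL && nr_vcpus > 0);

    status[0] = vcpu0_status;
    for (size_t i = 1; i < nr_vcpus; i++)
        status[i] = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
}
```

With three vCPUs, only the first entry carries the injected status; the other
two read back as MCIP|RIPV.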

In addition, fill_vmsr_data(mc_bank, ...) is changed to return -EINVAL
rather than 0, if an invalid domain ID is contained in mc_bank.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Haozhong Zhang [Wed, 8 Mar 2017 14:10:06 +0000 (15:10 +0100)]
x86/mce: set mcinfo_comm.type and .size in x86_mcinfo_reserve()

All existing calls to x86_mcinfo_reserve() are followed by statements
that set the size and the type of the reserved space, so move them into
x86_mcinfo_reserve() to simplify the code.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Haozhong Zhang [Wed, 8 Mar 2017 14:09:46 +0000 (15:09 +0100)]
x86/mce: remove unused x86_mcinfo_add()

c/s 9d13fd9fd320a7740c6446c048ff6a2990095966 switched to updating the
mcinfo buffer in place instead of using x86_mcinfo_add(). The last
uses of x86_mcinfo_add() were removed by that commit as well, so
x86_mcinfo_add() is now unused.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Haozhong Zhang [Wed, 8 Mar 2017 14:09:16 +0000 (15:09 +0100)]
x86/mce: adjust comment of callback register functions

c/s e966818264908e842e2847f579ca4d94e586eaac added
mce_need_clearbank_register below the comment of
x86_mce_callback_register(). This commit (1) adjusts the first
paragraph of the comment to be a general statement about all callback
register functions, and (2) moves the second paragraph to the
front of x86_mce_callback_register().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 8 Mar 2017 14:07:41 +0000 (15:07 +0100)]
x86/MCE: sanitize domain/vcpu ID handling

Storing -1 into both fields was misleading consumers: We really should
have a manifest constant for "invalid vCPU" here, and the already
existing DOMID_INVALID should be used.

Also correct a bogus (dead code) check in mca_init_global(), at once
introducing a manifest constant for the early boot "invalid vCPU"
pointer (avoiding proliferation of the open coding). Make that pointer
a non-canonical address at once.

Finally, don't leave mc_domid uninitialized in mca_init_bank().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 8 Mar 2017 14:07:14 +0000 (15:07 +0100)]
MAINTAINERS: drop Christoph Egger

Other Amazon folks indicate he's not available as a maintainer anymore
at this point in time. Maintenance of the MCE sub-component will fall
back to the x86 maintainers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
Andrew Cooper [Tue, 7 Mar 2017 23:32:24 +0000 (23:32 +0000)]
x86/emul: Avoid #UD in SIMD stubs

v{,u}comis{s,d} and vcvt{,t}s{s,d}2si are two-operand instructions, while
vzero{all,upper} take no operands.  Each requires vex.reg set to ~0 to avoid
suffering #UD.
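
As an illustration, here is a sketch of how the second byte of a two-byte
(0xC5) VEX prefix is composed, assuming that "vex.reg" above corresponds to
the VEX.vvvv field, which is stored inverted. The helper vex2_byte() is
hypothetical and is not the emulator's own stub code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Second byte of a two-byte (0xC5) VEX prefix:
 *   bit 7     inverted REX.R (1 means "no extension")
 *   bits 6:3  vvvv, stored inverted; 0xF means "no register operand"
 *   bit 2     L (vector length)
 *   bits 1:0  pp (implied legacy prefix)
 * 'vvvv_raw' is the field value as it appears in the instruction bytes.
 */
static uint8_t vex2_byte(unsigned int vvvv_raw, unsigned int l,
                         unsigned int pp)
{
    assert(vvvv_raw <= 0xF && l <= 1 && pp <= 3);
    return (uint8_t)(0x80 | (vvvv_raw << 3) | (l << 2) | pp);
}
```

With vvvv left at all ones, the result is 0xF8 for L=0 and 0xFC for L=1,
matching the usual encodings C5 F8 77 (vzeroupper) and C5 FC 77 (vzeroall).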

Spotted while fuzzing with AFL
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 7 Mar 2017 14:58:04 +0000 (14:58 +0000)]
vlapic/viridian: abort existing APIC assist if any vector is pending in ISR

The vlapic code already aborts an APIC assist if an interrupt is deferred
because a higher priority interrupt has already been delivered (and hence
its vector is pending in the ISR).

However, it is also necessary to abort an APIC assist in the case where a
higher priority vector is about to be delivered because, in either case, at
least two vectors will be pending in the ISR and hence an EOI is necessary.

Also, following on from the above reasoning, the decision to start a new
APIC assist should clearly be based upon whether any other vector is
pending in the ISR, regardless of whether it is lower or higher in
priority. (In fact the code in question cannot be reached if the
vector is lower in priority). Thus the single use of
vlapic_find_lowest_vector() can be replaced with a call to
vlapic_find_highest_isr() and the former function removed.
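
The decision rule can be sketched as follows. This is a simplified model, not
the vlapic code: the ISR is represented as a packed 256-bit bitmap, and the
names find_highest_isr() and may_start_apic_assist() are assumptions chosen
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Return the highest vector set in a 256-bit ISR bitmap, or -1 if none. */
static int find_highest_isr(const uint32_t isr[NR_VECTORS / 32])
{
    for (int word = NR_VECTORS / 32 - 1; word >= 0; word--) {
        if (isr[word] == 0)
            continue;
        for (int bit = 31; bit >= 0; bit--)
            if (isr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
    }
    return -1;
}

/*
 * A new assist may only start when no other vector is pending in the ISR,
 * whatever its priority relative to the vector being delivered.
 */
static bool may_start_apic_assist(const uint32_t isr[NR_VECTORS / 32])
{
    return find_highest_isr(isr) < 0;
}
```

The check is meant to run before the newly delivered vector is itself marked
in the ISR.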

Without this patch, because the logic is flawed, a domain_crash() results
when an attempt is made to erroneously start a new APIC assist.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Mar 2017 10:29:17 +0000 (10:29 +0000)]
x86/emul: Correct the decoding of mov to/from cr/dr

The mov to/from cr/dr behave as if they were encoded with Mod = 3.  When
encoded with Mod != 3, no displacement or SIB bytes are fetched.
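
A sketch of the decoding rule, under the assumption that the opcodes concerned
are 0f 20 to 0f 23. The struct and helper names are hypothetical, and the byte
count ignores the SIB "base = 5" special case for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct modrm {
    unsigned int mod, reg, rm;
};

/* 0f 20..0f 23 are the mov to/from control and debug register opcodes. */
static bool is_mov_cr_dr(uint8_t two_byte_opcode)
{
    return two_byte_opcode >= 0x20 && two_byte_opcode <= 0x23;
}

/* Split a ModRM byte; mov to/from cr/dr behave as if Mod were 3. */
static struct modrm decode_modrm(uint8_t two_byte_opcode, uint8_t byte)
{
    struct modrm m = {
        .mod = byte >> 6,
        .reg = (byte >> 3) & 7,
        .rm  = byte & 7,
    };

    if (is_mov_cr_dr(two_byte_opcode))
        m.mod = 3;
    return m;
}

/* SIB and displacement bytes that follow ModRM with 32/64-bit addressing. */
static unsigned int modrm_extra_bytes(struct modrm m)
{
    unsigned int n = 0;

    if (m.mod == 3)
        return 0;               /* register form: nothing to fetch */
    if (m.rm == 4)
        n += 1;                 /* SIB byte */
    if (m.mod == 1)
        n += 1;                 /* disp8 */
    else if (m.mod == 2 || (m.mod == 0 && m.rm == 5))
        n += 4;                 /* disp32 */
    return n;
}
```

For the deliberately malformed byte 0x44 (Mod = 1, r/m = 4), an ordinary
instruction would fetch a SIB byte and a disp8, while a mov to/from cr/dr
fetches nothing further.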

Add a test with a deliberately malformed ModRM byte.  (Also add the
automatically-generated simd.h to .gitignore.)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 7 Mar 2017 16:11:06 +0000 (17:11 +0100)]
x86: drop unneeded __packed attributes

There were a couple of unneeded packed attributes in several x86-specific
structures that are obviously aligned. The only non-trivial one is
vmcb_struct, which has been checked to have the same layout with and without
the packed attribute using pahole. In that case add a build-time size check to
be on the safe side.
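
A build-time check of this kind could look like the sketch below. The two
demo structures are hypothetical stand-ins (they are not vmcb_struct); they
only show that a naturally aligned layout is unchanged by the packed
attribute and how a compile-time assertion guards against regressions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure whose members are all naturally aligned. */
struct demo_plain {
    uint64_t base;
    uint32_t limit;
    uint16_t sel;
    uint16_t attr;
};

/* Same members with the (GCC/Clang) packed attribute applied. */
struct demo_packed {
    uint64_t base;
    uint32_t limit;
    uint16_t sel;
    uint16_t attr;
} __attribute__((packed));

/* Fails the build if dropping "packed" ever changes the size. */
_Static_assert(sizeof(struct demo_plain) == sizeof(struct demo_packed),
               "layout must not depend on the packed attribute");
_Static_assert(sizeof(struct demo_plain) == 16,
               "unexpected structure size");
```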

No functional change is expected as a result of this commit.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Tue, 7 Mar 2017 16:10:33 +0000 (17:10 +0100)]
x86emul: support SHA insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:10:07 +0000 (17:10 +0100)]
x86emul: support AESNI insns

... and their AVX equivalents.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:09:30 +0000 (17:09 +0100)]
x86emul: support PCLMULQDQ

... and its AVX equivalent.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:09:09 +0000 (17:09 +0100)]
x86emul: test coverage for SSE3/SSSE3/SSE4* insns

... and their AVX equivalents. Note that a few instructions aren't
covered (yet), but those all fall into common pattern groups, so I
would hope that for now we can do with what is there.

Just like for SSE/SSE2, MMX insns aren't being covered at all, as
they're not easy to deal with: The compiler refuses to emit such for
other than uses of built-in functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:08:47 +0000 (17:08 +0100)]
x86emul: support SSE4.2 insns

... and their AVX equivalents.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:08:19 +0000 (17:08 +0100)]
x86emul: support SSE4.1 insns

... and their AVX equivalents.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:07:52 +0000 (17:07 +0100)]
x86emul: support SSSE3 insns

... and their AVX equivalents.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:07:25 +0000 (17:07 +0100)]
x86emul: add tables for 0f38 and 0f3a extension space

Convert the few existing opcodes so far supported.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:07:00 +0000 (17:07 +0100)]
x86emul: honor MMXEXT feature flag

This being a strict (MMX register only) subset of SSE, we can simply
adjust the respective checks while making the new predicate look at
both flags.
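
A sketch of such a predicate, assuming a hypothetical feature structure with
one boolean per flag; the emulator's real feature checks are not shown in this
message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the two relevant CPUID feature flags. */
struct cpu_features {
    bool mmxext;
    bool sse;
};

/*
 * MMXEXT is a strict subset of SSE, so the MMXEXT insns are available when
 * either flag is set.
 */
static bool has_mmxext_insns(const struct cpu_features *f)
{
    assert(f != NULL);
    return f->mmxext || f->sse;
}
```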

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:06:38 +0000 (17:06 +0100)]
x86emul: test coverage for SSE/SSE2 insns

... and their AVX equivalents. Note that a few instructions aren't
covered (yet), but those all fall into common pattern groups, so I
would hope that for now we can do with what is there.

MMX insns aren't being covered at all, as they're not easy to deal
with: The compiler refuses to emit such for other than uses of built-in
functions.

The current way of testing AVX insns is meant to be temporary only:
Once we fully support that feature, the present tests should rather be
replaced than full ones simply added.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:05:47 +0000 (17:05 +0100)]
x86emul: support {,V}MOVNTDQA

... as the only post-SSE2 move insn.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:05:24 +0000 (17:05 +0100)]
x86emul: support {,V}{LD,ST}MXCSR

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:04:57 +0000 (17:04 +0100)]
x86emul: support MMX/SSE{,2,4a} insns with only register operands

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:04:31 +0000 (17:04 +0100)]
x86emul: support {,V}{,U}COMIS{S,D}

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:04:08 +0000 (17:04 +0100)]
x86emul: support MMX/SSE/SSE2 converts

Note that, unlike most scalar instructions, vcvt{,t}s{s,d}2si raise #UD
when VEX.l is set on at least some Intel models. To be on the safe
side, implement the most restrictive mode here for now when emulating
an Intel CPU, and simply clear the bit when emulating an AMD one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:03:45 +0000 (17:03 +0100)]
x86emul: support MMX/SSE{,2,3} moves

Previously supported insns are being converted to the new model, and
several new ones are being added.

To keep the stub handling reasonably simple, integrate SET_SSE_PREFIX()
into copy_REX_VEX(), at once switching the stubs to use an empty REX
prefix instead of a double DS: one (no byte registers are being
accessed, so an empty REX prefix has no effect), except (of course) for
the 32-bit test harness build.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 7 Mar 2017 16:02:53 +0000 (17:02 +0100)]
x86emul: support most memory accessing MMX/SSE{,2,3} insns

This aims at covering most MMX/SSEn/AVX instructions in the 0x0f-escape
space with memory operands. Not covered here are irregular moves,
converts, and {,U}COMIS{S,D} (modifying EFLAGS).

Note that the distinction between simd_*_fp isn't strictly needed, but
I've kept them as separate entries since in an earlier version I needed
them to be separate, and we may well find it useful down the road to
have that distinction.

Also take the opportunity to adjust the vmovdqu test case from which the
new LDDQU one has been cloned: to zero a ymm register we don't need to
jump through hoops, as 128-bit AVX insns zero the upper portion of the
destination register. In addition, the disabled AVX2 code used the wrong
YMM register.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Stefano Stabellini [Fri, 3 Mar 2017 01:15:26 +0000 (17:15 -0800)]
xen/arm: fix affected memory range by dcache clean functions

clean_dcache_va_range and clean_and_invalidate_dcache_va_range don't
calculate the range correctly when "end" is not cacheline aligned. As a
result, the last cacheline may be skipped. Fix the issue by aligning the
start address to the cacheline size.

In addition, make the code simpler and faster in
invalidate_dcache_va_range, by removing the modulo operation and using
bitmasks instead. Also remove the size adjustments in
invalidate_dcache_va_range, because the size variable is not used later
on.
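
The effect of aligning the start address with a bitmask can be sketched as
below. This is an illustrative model rather than the Xen implementation: the
helper lines_to_clean() is hypothetical and merely counts the cache lines a
clean over [start, start + size) would touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the cache lines visited when cleaning [start, start + size). */
static size_t lines_to_clean(uintptr_t start, size_t size, size_t line_bytes)
{
    /* Bitmask alignment only works for power-of-two line sizes. */
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);

    uintptr_t end = start + size;
    /* Align the start down to a cache line boundary using a bitmask. */
    uintptr_t p = start & ~(uintptr_t)(line_bytes - 1);
    size_t n = 0;

    for (; p < end; p += line_bytes)
        n++;                    /* a real implementation cleans the line here */
    return n;
}
```

For start = 0x10, size = 0x80 and 64-byte lines, the aligned loop visits the
lines at 0x00, 0x40 and 0x80. Stepping from the unaligned start instead would
visit 0x10 and 0x50 and then stop at 0x90, never touching the line at 0x80.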

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Razvan Cojocaru [Mon, 6 Mar 2017 16:51:15 +0000 (17:51 +0100)]
x86/mem_access: fix vm_event emulation check with altp2m enabled

Currently, p2m_mem_access_emulate_check() uses p2m_get_mem_access()
to check if the page restrictions have been lifted between the time
of sending the vm_event out and the reception of the reply - in
which case emulation is no longer required. Unfortunately,
p2m_get_mem_access() uses p2m_get_hostp2m(d) which only checks the
default EPT (view 0 in altp2m parlance). This patch fixes this by
checking the active altp2m view instead, whenever applicable.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Mon, 6 Mar 2017 16:49:45 +0000 (17:49 +0100)]
ditch redundant integer types

The very few uses can easily be replaced by more standard ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 2 Mar 2017 19:58:20 +0000 (19:58 +0000)]
x86/cpuid: Fix booting on AMD Phenom 6-core platform

c/s 5cecf60f4 "x86/cpuid: Handle leaf 0x1 in guest_cpuid()" causes Linux 4.10
to crash during boot.

It turns out to be because of the reported apic_id, which was altered to be
more consistent across guests.  Revert back to the previous behaviour, by
limiting the apic_id adjustment to HVM guests only.  Whoever gets to fix
topology representation is going to have a lot of fun with non-power-of-2 AMD
boxes.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Olaf Hering [Fri, 3 Mar 2017 08:52:09 +0000 (08:52 +0000)]
tools/xenstore: define off_t

talloc.h uses off_t, but did not include <sys/types.h>.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Stefano Stabellini [Wed, 1 Mar 2017 19:43:15 +0000 (11:43 -0800)]
xen/arm: introduce vwfi parameter

Introduce new Xen command line parameter called "vwfi", which stands for
virtual wfi. The default is "trap": Xen traps guest wfi and wfe
instructions. In the case of wfi, Xen calls vcpu_block on the guest
vcpu; in the case of guest wfe, Xen calls vcpu_yield on the guest vcpu.
The behavior can be changed by setting vwfi to "native", in which case
Xen traps neither wfi nor wfe, running them in guest context.

The result is a strong reduction in irq latency (from 5000ns to 2000ns,
measured using https://github.com/edgarigl/tbm, the physical timer, and
1 pcpu dedicated to 1 vcpu). The downside is that the scheduler thinks
that the guest is busy when it is actually sleeping, leading to suboptimal
scheduling decisions.
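
Parsing the two documented values could be sketched as follows. The enum and
the function parse_vwfi() are hypothetical names for this example; Xen's own
command line handling is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum vwfi_mode {
    VWFI_TRAP,      /* default: Xen traps guest wfi and wfe */
    VWFI_NATIVE,    /* wfi and wfe run in guest context */
};

/* Returns 0 on success, -1 (leaving *mode untouched) for unknown values. */
static int parse_vwfi(const char *s, enum vwfi_mode *mode)
{
    assert(s != NULL && mode != NULL);

    if (strcmp(s, "trap") == 0)
        *mode = VWFI_TRAP;
    else if (strcmp(s, "native") == 0)
        *mode = VWFI_NATIVE;
    else
        return -1;
    return 0;
}
```

On the Xen command line the option would be given as vwfi=native.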

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Fri, 3 Mar 2017 16:08:36 +0000 (17:08 +0100)]
x86/SVM: correct boot time cpu_data[] handling

start_svm() already runs after cpu_data[] was set up, so it shouldn't
modify it anymore (at least not directly). Constify the involved
pointers.

Furthermore LMSLE feature detection was broken by 566ddbe833 ("x86:
Fail CPU bringup cleanly if it cannot initialise HVM"), as Andrew
Cooper has pointed out: c couldn't possibly equal &boot_cpu_data
anymore. (But since it's unsafe migration-wise for some more time,
suppress the feature actually being enabled for us.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Fri, 3 Mar 2017 16:08:11 +0000 (17:08 +0100)]
x86/tboot: remove dead declarations

These aren't needed anymore as of c9a4a1c419 ("x86/layout: Correct
Xen's idea of its own memory layout").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Feng Wu [Fri, 3 Mar 2017 16:07:08 +0000 (17:07 +0100)]
VMX: properly handle pi when all the assigned devices are removed

This patch handles some corner cases when the last assigned device
is removed from the domain. In this case we should carefully handle
the PI descriptor and the per-cpu blocking list, to make sure that:
- all the PI descriptors are in the right state the next time a
device is assigned to the domain.
- no vcpus of the domain remain in the per-cpu blocking list.

Here we call vmx_pi_unblock_vcpu() to remove the vCPU from the blocking list
if it is on the list. However, this could happen while vmx_vcpu_block() is
being called, hence we might incorrectly add the vCPU to the blocking list
while the last device is detached from the domain. Considering that the
situation can only occur when detaching the last device from the domain,
which is not a frequent operation, we use domain_pause before that, which is
considered a clean and maintainable solution for the situation.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Wei Liu [Wed, 1 Mar 2017 15:35:24 +0000 (15:35 +0000)]
xen: include xen/types.h in domain.h

The public header expects a few types to be present.

This works in the code base only because types.h is included by some
other headers which happen to be placed before the inclusion of
domain.h.

Include types.h before xen.h in domain.h to fix it properly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 1 Mar 2017 15:23:31 +0000 (15:23 +0000)]
xen: move round_pg{up,down} to pfn.h

They are going to be needed in multiple places. Instead of replicating
more, move them to pfn.h.
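
For reference, page rounding helpers of this kind are typically written as in
the sketch below. It assumes 4 KiB pages and uses DEMO_-prefixed constants to
make clear that these are not the definitions from pfn.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((uintptr_t)1 << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Round an address up to the next page boundary (no-op if aligned). */
static uintptr_t round_pgup(uintptr_t p)
{
    return (p + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
}

/* Round an address down to the start of its page. */
static uintptr_t round_pgdown(uintptr_t p)
{
    return p & DEMO_PAGE_MASK;
}
```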

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 2 Mar 2017 11:41:17 +0000 (11:41 +0000)]
x86/emul: Hold x86_emulate() to strict X86EMUL_EXCEPTION requirements

All known paths raising faults behind the back of the emulator have been
fixed.  Reinstate the original intended assertion concerning the behaviour of
X86EMUL_EXCEPTION and ctxt->event_pending.
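
The exact assertion is not quoted in this message. One plausible reading,
shown here as an assumption, is that an exception return code and a pending
event must always occur together. The DEMO_ names and the context structure
are hypothetical stand-ins for the emulator's own types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the emulator's return codes. */
enum {
    DEMO_EMUL_OKAY,
    DEMO_EMUL_UNHANDLEABLE,
    DEMO_EMUL_EXCEPTION,
};

/* Hypothetical, reduced emulation context. */
struct demo_ctxt {
    bool event_pending;
};

/* True when the return code and the pending-event flag agree. */
static bool exception_state_consistent(int rc, const struct demo_ctxt *ctxt)
{
    return ctxt->event_pending == (rc == DEMO_EMUL_EXCEPTION);
}
```

A wrapper around the emulator could assert this predicate after every call,
catching any path that raises a fault without reporting it through the return
code, or the reverse.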

As x86_emulate_wrapper() now covers both PV and HVM guests properly, there is
no need for the PV assertions following calls to x86_emulate().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 2 Mar 2017 12:41:38 +0000 (12:41 +0000)]
x86/hvm: Don't raise #GP behind the emulators back for CR writes

hvm_set_cr{0,4}() are reachable from the emulator, but use
hvm_inject_hw_exception() directly.

Alter the API to make the callers of hvm_set_cr{0,3,4}() responsible for
raising #GP, and apply this change to all existing callers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Issues identified which I am purposefully not fixing in this patch:

(I will try to get around to them, but probably not in the 4.9 timeframe, at
this point.)

 * hvm_set_cr3() doesn't handle bad 32bit PAE PDPTRs properly, as it doesn't
   actually have a path which raises #GP.
 * There is a lot of redundancy in our HVM CR setting routines, but not enough
   to trivially dedup at this point.
 * Both nested VT-x and SVM are liable to raise #GP with L1, rather than failing
   the virtual vmentry/vmexit.  This is not a change in behaviour, but is far
   more obvious now.
 * The hvm_do_resume() path for vm_event processing has the same bug as the
   MSR side, where exceptions are raised after %rip has moved forwards.  This
   is also not a change in behaviour.

Andrew Cooper [Fri, 3 Feb 2017 13:55:26 +0000 (13:55 +0000)]
x86/kconfig: Introduce CONFIG_PV and CONFIG_HVM

Making PV and HVM guests individually compilable is useful as a reduction in
hypervisor size, and as an aid to enforcing clean API boundaries.

Introduce CONFIG_PV and CONFIG_HVM, although there is a lot of work to do
until either can actually be disabled.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Quan Xu [Fri, 3 Mar 2017 11:00:35 +0000 (12:00 +0100)]
x86/apicv: fix wrong IPI suppression during posted interrupt delivery

__vmx_deliver_posted_interrupt() wrongly used a softirq bit to decide whether
to suppress an IPI. Its logic was: the first time an IPI was sent, we set
the softirq bit. Next time, we would check that softirq bit before sending
another IPI. If the first IPI arrived at a pCPU which was in
non-root mode, the hardware would consume the IPI and sync PIR to vIRR.
During that process, nothing (neither hardware nor software) would clear the
softirq bit. As a result, the following IPI would be wrongly suppressed.

This patch discards the suppression check and always sends an IPI.
The softirq still needs to be raised, but with a small change:
this patch moves the place where we raise a softirq for the
'cpu != smp_processor_id()' case to the IPI interrupt handler.
Namely, don't raise a softirq for this case, and set the interrupt handler
to pi_notification_interrupt() (in which a softirq is raised) regardless of
whether VT-d PI is enabled. The only difference is that when an IPI arrives
at a pCPU which happens to be in non-root mode, the code will not raise a
useless softirq, since the IPI is consumed by hardware, rather than raising a
softirq unconditionally.

Signed-off-by: Quan Xu <xuquan8@huawei.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 3 Mar 2017 11:00:05 +0000 (12:00 +0100)]
x86emul: assert no duplicate mappings of stub space

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Sergey Dyasli [Fri, 3 Mar 2017 10:59:22 +0000 (11:59 +0100)]
x86/vvmx: add vmcs id check into vmptrld emulation

If a guest does vmptrld with an incorrect vmcs id, Xen hits a BUG:

(XEN) Xen BUG at .../git/upstream/xen/xen/include/asm/hvm/vmx/vmx.h:333
(XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Tainted:    H ]----
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f925e>] vmcs.c#arch/x86/hvm/vmx/vmcs.o.unlikely+0x28/0x19a
(XEN)    [<ffff82d0801f602c>] virtual_vmcs_vmread+0x11/0x2c
(XEN)    [<ffff82d0802002cc>] vvmx.c#_map_io_bitmap+0x86/0x88
(XEN)    [<ffff82d080202399>] nvmx_handle_vmptrld+0xf0/0x1fb
(XEN)    [<ffff82d0801fe93c>] vmx_vmexit_handler+0x132b/0x1c49
(XEN)    [<ffff82d080203e6c>] vmx_asm_vmexit_handler+0x3c/0x120

Fix this by adding appropriate checks for vmcs id during vmptrld
emulation.
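
A sketch of such a check, assuming that "vmcs id" refers to the VMCS revision
identifier, which the Intel SDM places in bits 30:0 of the first 32-bit word
of a VMCS region (bit 31 is the shadow-VMCS indicator). The helper name is
hypothetical and the actual checks added by the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bits 30:0 of the first word hold the VMCS revision identifier. */
#define VMCS_REVISION_MASK UINT32_C(0x7fffffff)

/* Compare the revision identifier of a mapped VMCS region with ours. */
static bool vmcs_revision_matches(const void *vmcs_region,
                                  uint32_t expected_id)
{
    uint32_t first_word;

    /* memcpy avoids alignment and aliasing assumptions on the mapping. */
    memcpy(&first_word, vmcs_region, sizeof(first_word));
    return (first_word & VMCS_REVISION_MASK) ==
           (expected_id & VMCS_REVISION_MASK);
}
```

When the check fails, the emulation would report a VMfail condition to the
guest instead of continuing with the region.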

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Sergey Dyasli [Fri, 3 Mar 2017 10:58:47 +0000 (11:58 +0100)]
x86/vvmx: check vmcs address in vmread/vmwrite

If the nested vmcs's address is invalid, virtual_vmcs_enter() will fail
during vmread/vmwrite:

(XEN) Xen BUG at .../git/upstream/xen/xen/include/asm/hvm/vmx/vmx.h:333
(XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Tainted:    H ]----
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f925e>] vmcs.c#arch/x86/hvm/vmx/vmcs.o.unlikely+0x28/0x19a
(XEN)    [<ffff82d0801f60e3>] virtual_vmcs_vmwrite_safe+0x16/0x52
(XEN)    [<ffff82d080202cb2>] nvmx_handle_vmwrite+0x70/0xfe
(XEN)    [<ffff82d0801fe98a>] vmx_vmexit_handler+0x1379/0x1c49
(XEN)    [<ffff82d08020427c>] vmx_asm_vmexit_handler+0x3c/0x120

Fix this by emulating VMfailInvalid if the address is invalid.
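
For illustration, the flag state that the Intel SDM defines for VMfailInvalid
(CF set; PF, AF, ZF, SF and OF cleared) can be modelled as below. The function
vmfail_invalid() is a hypothetical helper operating on a plain flags value,
not Xen's emulation routine.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic flag bits in (R)FLAGS. */
#define FLAG_CF (UINT64_C(1) << 0)
#define FLAG_PF (UINT64_C(1) << 2)
#define FLAG_AF (UINT64_C(1) << 4)
#define FLAG_ZF (UINT64_C(1) << 6)
#define FLAG_SF (UINT64_C(1) << 7)
#define FLAG_OF (UINT64_C(1) << 11)

#define FLAG_ARITH (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF)

/* VMfailInvalid: clear the arithmetic flags, then set CF. */
static uint64_t vmfail_invalid(uint64_t rflags)
{
    return (rflags & ~FLAG_ARITH) | FLAG_CF;
}
```

Flags outside the arithmetic set, such as IF (bit 9), are left untouched.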

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Feng Wu [Fri, 3 Mar 2017 10:58:13 +0000 (11:58 +0100)]
VMX: make sure PI is in proper state before install the hooks

We may hit the last ASSERT() in vmx_vcpu_block() in the current code,
since vmx_vcpu_block() may get called before vmx_pi_switch_to()
has been installed or executed. Here we use cmpxchg to update
the NDST field, which makes sure we only update NDST when
vmx_pi_switch_to() has not been called. So NDST is in a
proper state in vmx_vcpu_block().
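
The update-only-if-untouched pattern can be sketched with C11 atomics as
follows. The sentinel NDST_UNSET and the helper ndst_init_once() are
assumptions for this example; Xen uses its own cmpxchg() primitive and its own
marker value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker meaning "NDST has not been initialised yet". */
#define NDST_UNSET UINT32_C(0xffffffff)

/*
 * Store 'dest' only if the field still holds the marker. Returns true when
 * this call performed the update, false when another path already set it.
 */
static bool ndst_init_once(_Atomic uint32_t *ndst, uint32_t dest)
{
    uint32_t expected = NDST_UNSET;

    return atomic_compare_exchange_strong(ndst, &expected, dest);
}
```

Because the compare and the store happen as one atomic step, a value written
by the context switch path in the meantime is never overwritten.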

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Feng Wu [Fri, 3 Mar 2017 10:57:30 +0000 (11:57 +0100)]
VMX: permanently assign PI hook vmx_pi_switch_to()

PI hook vmx_pi_switch_to() is needed even after any previously
assigned device is detached from the domain, since the 'SN' bit is
also used to control the CPU side PI: we change the state of the
SN bit in vmx_pi_switch_to() and vmx_pi_switch_from(), then
evaluate this bit in vmx_deliver_posted_intr() when trying to
deliver the interrupt in posted way via software. The problem
is that if we deassign the hooks while the vCPU is runnable in the
runqueue with 'SN' set, all future notification events will
be suppressed. This patch makes the hook permanently assigned.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 5 Jul 2016 09:40:21 +0000 (10:40 +0100)]
x86/hvm: Adjust hvm_nx_enabled() to match how Xen behaves

On Intel hardware, EFER is not fully switched between host and guest contexts.
In practice, this means that Xen's EFER.NX setting leaks into guest context,
and influences the behaviour of the hardware pagewalker.

When servicing a pagefault, Xen's model of the guest's behaviour should match
hardware's behaviour, to allow correct interpretation of the pagefault error
code, and to avoid creating an observable difference in behaviour from the
guest's point of view.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andy Lutomirski [Fri, 9 Dec 2016 18:24:07 +0000 (10:24 -0800)]
x86/microcode: Replace sync_core() with cpuid_eax()

The Intel microcode driver is using sync_core() to mean "do CPUID
with EAX=1".

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
[Linux commit 484d0e5c7943644cc46e7308a8f9d83be598f2b9]
[Ported to Xen]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
keios [Tue, 3 Oct 2006 08:13:49 +0000 (01:13 -0700)]
xen/common: low performance of lib/sort.c

It is a non-standard heap-sort algorithm implementation because the index
of the child node is wrong. The sort function still outputs the right result,
but the performance is O(n * (log(n) + 1)), about 10% ~ 20% worse than the
standard algorithm.
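
For comparison, a textbook heap sort over an int array is sketched below. It
is not the lib/sort.c code, which works on byte offsets with caller-supplied
compare and swap callbacks; it only shows the standard child indexing, where
the children of the 0-based node i are 2*i + 1 and 2*i + 2.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Sift the element at 'root' down a max-heap occupying a[0..n). */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;        /* left child, 0-based */

        if (child >= n)
            break;
        if (child + 1 < n && a[child] < a[child + 1])
            child++;                        /* pick the larger child */
        if (a[root] >= a[child])
            break;
        swap_int(&a[root], &a[child]);
        root = child;
    }
}

/* Sort a[0..n) in ascending order. */
static void heap_sort(int *a, size_t n)
{
    if (a == NULL || n < 2)
        return;

    /* Build the heap, starting from the last node that has a child. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, i, n);

    /* Repeatedly move the maximum to the end and restore the heap. */
    for (size_t end = n - 1; end > 0; end--) {
        swap_int(&a[0], &a[end]);
        sift_down(a, 0, end);
    }
}
```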

Signed-off-by: keios <keios.cn@gmail.com>
[Linux commit: d3717bdf8f08a0e1039158c8bab2c24d20f492b6]
[Ported to Xen]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 2 Mar 2017 15:08:27 +0000 (16:08 +0100)]
x86emul: correct decoding of vzero{all,upper}

These VEX encoded insns aren't followed by a ModR/M byte.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 2 Mar 2017 15:07:42 +0000 (16:07 +0100)]
x86/VMX: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Wei Liu [Wed, 1 Mar 2017 10:24:55 +0000 (10:24 +0000)]
xl: lift logfile declaration to xl.h

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxl: lift common_domname declaration to xl.h
Wei Liu [Wed, 1 Mar 2017 10:24:54 +0000 (10:24 +0000)]
xl: lift common_domname declaration to xl.h

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxl: remove declaration of ctx in c files
Wei Liu [Wed, 1 Mar 2017 10:24:53 +0000 (10:24 +0000)]
xl: remove declaration of ctx in c files

There is already one in xl.h.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxl: add CODING_STYLE
Wei Liu [Wed, 1 Mar 2017 10:24:52 +0000 (10:24 +0000)]
xl: add CODING_STYLE

Copy the one in libxl, remove the irrelevant bits about libxl. Replace
libxl with xl where appropriate.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoCONTRIBUTING: list xl in inbound license section
Wei Liu [Wed, 1 Mar 2017 10:24:51 +0000 (10:24 +0000)]
CONTRIBUTING: list xl in inbound license section

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: set pkg-config path when configuring qemu
Juergen Gross [Thu, 2 Mar 2017 05:13:18 +0000 (06:13 +0100)]
tools: set pkg-config path when configuring qemu

When calling configure for qemu provide the local pkg-config directory
in order to let the configure process find the libxenctrl version.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: use a dedicated build directory for qemu
Juergen Gross [Thu, 2 Mar 2017 05:13:17 +0000 (06:13 +0100)]
tools: use a dedicated build directory for qemu

Instead of using the downloaded git tree as target directory for the
qemu build create a dedicated directory for that purpose.

This way it is possible to use the same source directory of qemu to
configure and build qemu upstream in a stubdom environment in future.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: add pkg-config file for libxc
Juergen Gross [Thu, 2 Mar 2017 05:13:16 +0000 (06:13 +0100)]
tools: add pkg-config file for libxc

When configuring the build of qemu the configure script is building
various test programs to determine the exact version of libxencontrol.

Instead of a trial-and-error approach that needs updating for nearly
every new version of Xen, just provide xencontrol.pc to be used via
pkg-config.

In the end we need two different variants of that file: one for the
target system where eventually someone wants to build qemu, and one
for the local system to be used for building qemu as part of the Xen
build process.

The local variant is created in a dedicated directory in order to be
able to collect more pkg-config files used for building tools there.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agostubdom: set xen interface version for stubdom apps using xenctrl.h
Juergen Gross [Thu, 2 Mar 2017 05:13:15 +0000 (06:13 +0100)]
stubdom: set xen interface version for stubdom apps using xenctrl.h

A stubdom app using xenctrl.h must use the latest interface version of
Xen in order to avoid compatibility issues. Add the related config
item to the stubdom config files where needed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: Fix build of QEMU with lib xendevicemodel support
Anthony PERARD [Thu, 2 Mar 2017 11:22:35 +0000 (11:22 +0000)]
tools: Fix build of QEMU with lib xendevicemodel support

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoMAINTAINERS: add Marek as maintainer of python bindings
Wei Liu [Wed, 1 Mar 2017 12:32:26 +0000 (12:32 +0000)]
MAINTAINERS: add Marek as maintainer of python bindings

Marek has kindly agreed to step up and co-maintain the python bindings.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
8 years agotools/libxendevicemodel: define O_CLOEXEC
Olaf Hering [Wed, 1 Mar 2017 12:27:08 +0000 (12:27 +0000)]
tools/libxendevicemodel: define O_CLOEXEC

Some libc headers don't have O_CLOEXEC, so we need to take care of it by
defining it to 0 (on the grounds that such a glibc might barf on O_CLOEXEC).
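
A minimal sketch of that fallback pattern. The helper name
open_cloexec() is hypothetical and is not taken from
libxendevicemodel.

```c
#include <fcntl.h>

/*
 * Fallback for libc headers that lack O_CLOEXEC: defining it to 0 turns
 * the flag into a no-op instead of breaking the build.
 */
#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif

/* Hypothetical helper: open a device node read/write with close-on-exec. */
int open_cloexec(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}
```

Note that when the fallback is in effect the descriptor is not marked
close-on-exec, so it is inherited across exec().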

Fixes e7745d8ef5 ("tools/libxendevicemodel: introduce a Linux-specific
implementation")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoacpi: check if mapping is valid before reading / writing
Wei Liu [Wed, 1 Mar 2017 11:07:24 +0000 (11:07 +0000)]
acpi: check if mapping is valid before reading / writing

If acpi_map_os_memory has failed, return early with AE_ERROR.

Coverity-ID: 1401601
Coverity-ID: 1401602

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agodocs/misc: add PV Calls Protocol
Stefano Stabellini [Tue, 14 Feb 2017 21:34:52 +0000 (13:34 -0800)]
docs/misc: add PV Calls Protocol

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which acts on each call and
returns a value to the frontend.

This version of the document covers networking function calls, such as
connect, accept, bind, release, listen, poll, recvmsg and sendmsg; but
the protocol is meant to be easily extended to cover different sets of
calls. Unimplemented commands return ENOTSUP.
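
As a rough illustration of the "unimplemented commands return ENOTSUP"
rule, a backend could dispatch through a table and treat empty slots as
unsupported. The command identifiers, handler names and the negative
errno convention below are all assumptions; the real values are defined
by the protocol document.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command identifiers, not the protocol's real numbering. */
enum demo_cmd {
    DEMO_CMD_CONNECT,
    DEMO_CMD_BIND,
    DEMO_CMD_LISTEN,
    DEMO_CMD_MAX
};

typedef int (*demo_handler_t)(void);

static int demo_connect(void) { return 0; }
static int demo_bind(void)    { return 0; }

/* Entries left NULL stand for commands the backend does not implement. */
static const demo_handler_t demo_handlers[DEMO_CMD_MAX] = {
    [DEMO_CMD_CONNECT] = demo_connect,
    [DEMO_CMD_BIND]    = demo_bind,
};

/* Run the handler for cmd, or report that the command is unsupported. */
int demo_dispatch(unsigned int cmd)
{
    if (cmd >= DEMO_CMD_MAX || demo_handlers[cmd] == NULL)
        return -ENOTSUP;
    return demo_handlers[cmd]();
}
```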

Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agodocs/misc: Xen transport for 9pfs
Stefano Stabellini [Tue, 14 Feb 2017 21:33:15 +0000 (13:33 -0800)]
docs/misc: Xen transport for 9pfs

9pfs is a network filesystem protocol developed for Plan 9. 9pfs is very
simple and describes a series of commands and responses. It is
completely independent from the communication channels; in fact, many
clients and servers support multiple channels, usually called
"transports". For example, the Linux client supports tcp and unix
sockets, fds, virtio and rdma.

This design document outlines the transport protocol for
9pfs payload.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
8 years agoxen/tools: tracing: Report next slice time when continuing as well as switching
Dario Faggioli [Wed, 1 Mar 2017 16:56:35 +0000 (16:56 +0000)]
xen/tools: tracing: Report next slice time when continuing as well as switching

We record trace information about the next timeslice when
switching to a different vcpu, but not when continuing to
run the same vcpu:

 csched2:schedule cpu 9, rq# 1, idle, SMT idle, tickled
 csched2:runq_candidate d0v3, 0 vcpus skipped, cpu 9 was tickled
 sched_switch prev d32767v9, run for 991.186us
 sched_switch next d0v3, was runnable for 2.515us, next slice 10000.0us
 sched_switch prev d32767v9 next d0v3              ^^^^^^^^^^^^^^^^^^^^
 runstate_change d32767v9 running->runnable
 ...
 csched2:schedule cpu 2, rq# 0, busy, not tickled
 csched2:burn_credits d1v5, credit = 9996950, delta = 502913
 csched2:runq_candidate d1v5, 0 vcpus skipped, no cpu was tickled
 runstate_continue d1v5 running->running
                                         ?????????????

This information is quite useful; so add a trace including
that information on the 'continue_running' path as well,
like this:

 csched2:schedule cpu 1, rq# 0, busy, not tickled
 csched2:burn_credits d0v8, credit = 9998645, delta = 12104
 csched2:runq_candidate d0v8, credit = 9998645, 0 vcpus skipped, no cpu was tickled
 sched_switch continue d0v8, run for 1125.820us, next slice 9998.645us
 runstate_continue d0v8 running->running         ^^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen/tools: tracing: trace (Credit2) runq traversal.
Dario Faggioli [Wed, 1 Mar 2017 16:56:35 +0000 (16:56 +0000)]
xen/tools: tracing: trace (Credit2) runq traversal.

When traversing a Credit2 runqueue to select the
best candidate vCPU to be run next, show in the
trace which vCPUs we consider.

A bit verbose, but quite useful, considering that
we may end up looking at, but then discarding, one
or more vCPUs. This will help understand which ones
are skipped and why.

Also, add how many credits the chosen vCPU has
(in the TRC_CSCHED2_RUNQ_CANDIDATE record). And,
while there, fix a bug in tools/xentrace/formats
(still in the output of TRC_CSCHED2_RUNQ_CANDIDATE).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: group the runq manipulating functions.
Dario Faggioli [Wed, 1 Mar 2017 16:56:35 +0000 (16:56 +0000)]
xen: credit2: group the runq manipulating functions.

So that they're all close among each other, and
also near to the comment describing the runqueue
organization (which is also moved).

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: tidy up functions names by removing leading '__'.
Dario Faggioli [Wed, 1 Mar 2017 16:56:35 +0000 (16:56 +0000)]
xen: credit2: tidy up functions names by removing leading '__'.

There is no reason for having pretty much all of the
functions whose names begin with double underscores
('__') to actually look like that.

In fact, that is misleading and makes the code hard
to read and understand. So, remove the '__'-s.

The only two that we keep are __runq_assign() and
__runq_deassign() (although they're converted to
single underscore). In fact, in those cases, it is
indeed useful to have that sort of "raw" variant.

In case of __runq_insert(), which is only called
once, by runq_insert(), merge the two functions.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: make accessor helpers inline functions instead of macros
Dario Faggioli [Wed, 1 Mar 2017 16:56:34 +0000 (16:56 +0000)]
xen: credit2: make accessor helpers inline functions instead of macros

There isn't any particular reason for the accessor helpers
to be macros, so turn them into 'static inline'-s, which are
better.

Note that it is necessary to move the function definitions
below the structure declarations.
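
A small sketch of the conversion, using hypothetical structure and
function names rather than the real Credit2 ones. A macro is expanded
at its point of use, so it can be defined before the structures exist;
an inline function dereferences them in its own body, hence the move.

```c
/* Hypothetical structures standing in for the scheduler's private data. */
struct demo_sched_priv {
    int credit;
};

struct demo_vcpu {
    void *sched_priv;
};

/*
 * The macro form would have been:
 *   #define DEMO_VCPU_PRIV(v) ((struct demo_sched_priv *)(v)->sched_priv)
 * The inline function below type-checks its argument, but has to be
 * defined after the structure declarations above.
 */
static inline struct demo_sched_priv *demo_vcpu_priv(const struct demo_vcpu *v)
{
    return v->sched_priv;
}

/* Example user of the accessor. */
int demo_credit(const struct demo_vcpu *v)
{
    return demo_vcpu_priv(v)->credit;
}
```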

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: don't miss accounting while doing a credit reset.
Dario Faggioli [Wed, 1 Mar 2017 16:56:34 +0000 (16:56 +0000)]
xen: credit2: don't miss accounting while doing a credit reset.

A credit reset basically means going through all the
vCPUs of a runqueue and altering their credits, as a
consequence of a 'scheduling epoch' having come to an
end.

Blocked or runnable vCPUs are fine, all the credits
they've spent running so far have been accounted to
them when they were scheduled out.

But if a vCPU is running on a pCPU, when a reset event
occurs (on another pCPU), that does not get properly
accounted. Let's therefore begin to do so, for better
accuracy and fairness.

In fact, after this patch, we see this in a trace:

 csched2:schedule cpu 10, rq# 1, busy, not tickled
 csched2:burn_credits d1v5, credit = 9998353, delta = 202996
 runstate_continue d1v5 running->running
 ...
 csched2:schedule cpu 12, rq# 1, busy, not tickled
 csched2:burn_credits d1v6, credit = -1327, delta = 9999544
 csched2:reset_credits d0v13, credit_start = 10500000, credit_end = 10500000, mult = 1
 csched2:reset_credits d0v14, credit_start = 10500000, credit_end = 10500000, mult = 1
 csched2:reset_credits d0v7, credit_start = 10500000, credit_end = 10500000, mult = 1
 csched2:burn_credits d1v5, credit = 201805, delta = 9796548
 csched2:reset_credits d1v5, credit_start = 201805, credit_end = 10201805, mult = 1
 csched2:burn_credits d1v6, credit = -1327, delta = 0
 csched2:reset_credits d1v6, credit_start = -1327, credit_end = 9998673, mult = 1

This shows that d1v5 had actually been executing for ~9.796 ms
on pCPU 10 when reset_credit() was executed on pCPU 12,
because d1v6's credits went below 0.

Without this patch, these 9.796 ms are not accounted
to anyone. With this patch, d1v5 is charged for them,
and its credits drop from 9998353 to 201805 (a delta
of 9796548).

And this is important, as it means that it will
begin the new epoch with 10201805 credits, instead
of 10500000 (which it would have had before this patch).

Basically, we were forgetting one round of accounting
in epoch x, for the vCPUs that are running at the time
the epoch ends. And this meant favouring these same
vCPUs a little bit in epoch x+1, giving them the
chance to execute for longer than their fair share.
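
The arithmetic in the trace above can be reproduced with a short
sketch. The two constants are inferred from the trace (each reset adds
10000000 credits and credit_end never exceeds 10500000); they and the
function names are assumptions, not the scheduler's actual code.

```c
#include <stdint.h>

/* Values inferred from the trace, not taken from the source. */
#define DEMO_CREDIT_INIT 10000000   /* credit added at each reset */
#define DEMO_CREDIT_MAX  10500000   /* cap seen as credit_end for d0v13 */

/* Charge a vCPU for the time it has run since it was last accounted. */
int64_t demo_burn(int64_t credit, int64_t delta)
{
    return credit - delta;
}

/* Start a new epoch: top up the credit and clamp it to the cap. */
int64_t demo_reset(int64_t credit)
{
    credit += DEMO_CREDIT_INIT;
    return credit > DEMO_CREDIT_MAX ? DEMO_CREDIT_MAX : credit;
}
```

With the numbers for d1v5, demo_reset(demo_burn(9998353, 9796548))
gives 10201805, while skipping the burn step gives the capped value
10500000.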

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: always mark a tickled pCPU as... tickled!
Dario Faggioli [Wed, 1 Mar 2017 16:56:34 +0000 (16:56 +0000)]
xen: credit2: always mark a tickled pCPU as... tickled!

In fact, whether or not a pCPU has been tickled, and is
therefore about to re-schedule, is something we look at
and base decisions on in various places.

So, let's make sure that we do that based on accurate
information.

While there, also tweak smt_idle_mask_clear() a little bit
(used for implementing SMT support), so that it only alters
the relevant cpumask when there is an actual need for this.
(This is only for reduced overhead, behavior remains the
same).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
8 years agox86/vpmu: disable VPMU if guest's CPUID indicates no PMU support
Boris Ostrovsky [Wed, 1 Mar 2017 16:51:16 +0000 (17:51 +0100)]
x86/vpmu: disable VPMU if guest's CPUID indicates no PMU support

When toolstack overrides Intel CPUID leaf 0xa's PMU version with an
invalid value VPMU should not be available to the guest.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
8 years agox86/vpmu: add get/put_vpmu() and VPMU_AVAILABLE
Boris Ostrovsky [Wed, 1 Mar 2017 16:50:48 +0000 (17:50 +0100)]
x86/vpmu: add get/put_vpmu() and VPMU_AVAILABLE

vpmu_enabled() (used by hvm/pv_cpuid() to properly report 0xa leaf
for Intel processors) is based on the value of VPMU_CONTEXT_ALLOCATED
bit. This is problematic:
* For HVM guests VPMU context is allocated lazily, during the first
  access to VPMU MSRs. Since the leaf is typically queried before guest
  attempts to read or write the MSRs it is likely that CPUID will report
  no PMU support.
* For PV guests the context is allocated eagerly but only in response to
  guest's XENPMU_init hypercall. There is a chance that the guest will
  try to read CPUID before making this hypercall.

This patch introduces VPMU_AVAILABLE flag which is set (subject to vpmu_mode
constraints) during VCPU initialization for both PV and HVM guests. Since
this flag is expected to be managed together with vpmu_count, get/put_vpmu()
are added to simplify code.

vpmu_enabled() (renamed to vpmu_available()) can now use this new flag.

(As a side effect this patch also fixes a race in pvpmu_init() where we
increment vpmu_count in vpmu_initialise() after checking vpmu_mode.)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mm: switch away from temporary 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 16:49:57 +0000 (17:49 +0100)]
x86/mm: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoefi/boot: Don't free ebmalloc area at all
Andrew Cooper [Tue, 28 Feb 2017 14:07:09 +0000 (14:07 +0000)]
efi/boot: Don't free ebmalloc area at all

Freeing part of the BSS back for general use proves to be problematic.  It is
not accounted for in xen_in_range(), causing errors when constructing the
IOMMU tables, resulting in a failure to boot.

Other smaller issues are that tboot treats the entire BSS as hypervisor data,
creating and checking a MAC of it on S3, and that, by being 1MB in size,
freeing it is guaranteed to shatter the hypervisor superpage mappings.

This is a stopgap fix to unblock master, while alternatives are discussed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/Viridian: switch away from temporary 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 09:40:48 +0000 (10:40 +0100)]
x86/Viridian: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
8 years agox86/SVM: switch away from temporary 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 09:40:22 +0000 (10:40 +0100)]
x86/SVM: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
8 years agox86/HVMemul: switch away from temporary 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 09:39:44 +0000 (10:39 +0100)]
x86/HVMemul: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
8 years agox86/HVM: switch away from temporary 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 09:39:06 +0000 (10:39 +0100)]
x86/HVM: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: switch away from temporary 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 09:38:30 +0000 (10:38 +0100)]
x86: switch away from temporary 32-bit register names

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: re-introduce non-underscore prefixed 32-bit register names
Jan Beulich [Wed, 1 Mar 2017 09:37:28 +0000 (10:37 +0100)]
x86: re-introduce non-underscore prefixed 32-bit register names

For a transitional period (until we've managed to replace all
underscore prefixed instances), allow both names to co-exist.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/hvm: check HAP before enabling nested VMX
Haozhong Zhang [Wed, 1 Mar 2017 09:30:32 +0000 (10:30 +0100)]
x86/hvm: check HAP before enabling nested VMX

The current implementation of nested VMX cannot work without HAP.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: ensure copying runstate/time to L1 rather than L2
Haozhong Zhang [Wed, 1 Mar 2017 09:29:57 +0000 (10:29 +0100)]
x86: ensure copying runstate/time to L1 rather than L2

For a HVM domain, if a vcpu is in the nested guest mode,
__raw_copy_to_guest(), __copy_to_guest() and __copy_field_to_guest()
used by update_runstate_area() and update_secondary_system_time() will
copy data to L2 guest rather than the L1 guest.

This commit temporarily clears the nested guest flag before all guest
copies in update_runstate_area() and update_secondary_system_time(),
and restores the flag after those guest copy operations.

The flag clear/restore is combined with the existing
smap_policy_change() which is renamed to update_guest_memory_policy().
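
The save/clear/restore pattern described above, reduced to a sketch.
The structure, field and function names are hypothetical stand-ins and
do not reflect Xen's real vcpu layout or copy helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-vcpu state. */
struct demo_hvm_vcpu {
    bool in_nested_guest_mode;
    int  copies_to_l1;
};

/* Stand-in for a guest copy: counts the copies that land in the L1 guest. */
static void demo_copy_to_guest(struct demo_hvm_vcpu *v)
{
    if (!v->in_nested_guest_mode)
        v->copies_to_l1++;
}

/* Clear the flag around the copy so that the data goes to L1, not L2. */
void demo_update_runstate_area(struct demo_hvm_vcpu *v)
{
    bool saved = v->in_nested_guest_mode;

    v->in_nested_guest_mode = false;   /* force the copy to target L1 */
    demo_copy_to_guest(v);
    v->in_nested_guest_mode = saved;   /* restore the original mode */
}
```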

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoiommu: elaborate the usage of RMRR specification on the command line
Venu Busireddy [Wed, 1 Mar 2017 09:29:23 +0000 (10:29 +0100)]
iommu: elaborate the usage of RMRR specification on the command line

As some users have suggested, elaborate the usage of RMRR specification
on the command line, and provide a usage example.

Also, always treat the specified page numbers as hexadecimal values.
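
To show what "always hexadecimal" means in practice, here is a sketch
using the standard C strtoul() with an explicit base of 16. The helper
name is hypothetical, and the hypervisor's own parser does not use
libc, so treat this purely as an illustration.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a page number that is always hexadecimal, with or without a
 * leading "0x". Returns 0 on success, -EINVAL on malformed input.
 */
int demo_parse_pfn(const char *s, unsigned long *pfn)
{
    char *end;

    errno = 0;
    *pfn = strtoul(s, &end, 16);
    if (errno || end == s || *end != '\0')
        return -EINVAL;
    return 0;
}
```

With this rule the string "10" means page 0x10 (16 decimal), not 10.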

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agopassthrough: reject self-(de)assignment of devices
Chao Gao [Wed, 1 Mar 2017 09:28:35 +0000 (10:28 +0100)]
passthrough: reject self-(de)assignment of devices

That is to say, do not allow a domain to assign a device to itself or to
detach a device from itself.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/arm: warn if dom0_mem is not specified
Stefano Stabellini [Tue, 28 Feb 2017 18:56:14 +0000 (10:56 -0800)]
xen/arm: warn if dom0_mem is not specified

The default dom0_mem is 128M, which is not sufficient to boot an
Ubuntu-based Dom0. It is not clear what a better default value could be.

Instead, loudly warn the user when dom0_mem is unspecified and wait 3
secs. Then use 512M.

Update the docs to specify that dom0_mem is required on ARM. (The
current xen-command-line document does not actually reflect the current
behavior of dom0_mem on ARM correctly.)

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxl: fix compilation of xl_migrate.c
Roger Pau Monne [Tue, 28 Feb 2017 17:31:04 +0000 (17:31 +0000)]
xl: fix compilation of xl_migrate.c

The usage of signal(3) requires the inclusion of the signal.h header:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/signal.html

This fixes the build on FreeBSD.
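
A minimal example of signal(3) usage with the required header. The
handler and function names are made up for illustration and are not
from xl_migrate.c.

```c
#include <signal.h>   /* declares signal(), raise(), SIGINT, sig_atomic_t */

static volatile sig_atomic_t demo_got_signal;

/* Record that the signal arrived; only async-signal-safe work here. */
static void demo_handler(int signo)
{
    (void)signo;
    demo_got_signal = 1;
}

/* Install the handler, send SIGINT to ourselves, report whether it ran. */
int demo_signal_roundtrip(void)
{
    demo_got_signal = 0;
    if (signal(SIGINT, demo_handler) == SIG_ERR)
        return -1;
    raise(SIGINT);
    return demo_got_signal;
}
```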

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/layout: Correct Xen's idea of its own memory layout
Andrew Cooper [Tue, 28 Feb 2017 15:17:17 +0000 (15:17 +0000)]
x86/layout: Correct Xen's idea of its own memory layout

c/s b4cd59fe "x86: reorder .data and .init when linking" had an unintended
side effect, where xen_in_range() and the tboot S3 MAC were no longer correct.

In practice, it means that Xen's .data section is excluded from consideration,
which means:
 1) Default IOMMU construction for the hardware domain could create mappings.
 2) .data isn't included in the tboot MAC checked on resume from S3.

Adjust the comments and virtual address anchors used to define the regions.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoxenstore: remove memory report command line support
Juergen Gross [Fri, 24 Feb 2017 06:21:45 +0000 (07:21 +0100)]
xenstore: remove memory report command line support

As a memory report can now be triggered via XS_CONTROL, support via the
command line option and signal handler is no longer needed. Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: make memory report available via XS_CONTROL
Juergen Gross [Fri, 24 Feb 2017 06:21:44 +0000 (07:21 +0100)]
xenstore: make memory report available via XS_CONTROL

Add a XS_CONTROL command to xenstored for doing a talloc report to a
file. Right now this is supported by specifying a command line option
when starting xenstored and sending a signal to the daemon to trigger
the report.

To dump the report to the standard log file call:

xenstore-control memreport

To dump the report to a new file call:

xenstore-control memreport <file>

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: add support for changing log functionality dynamically
Juergen Gross [Fri, 24 Feb 2017 06:21:43 +0000 (07:21 +0100)]
xenstore: add support for changing log functionality dynamically

Today Xenstore supports logging only if specified at start of the
Xenstore daemon. As it can't be disabled during runtime it is not
recommended to start xenstored with logging enabled.

Add support for switching logging on and off at runtime and to
specify a (new) logfile. This is done via the XS_CONTROL wire command
which can be sent with xenstore-control.

To switch logging on just use:

xenstore-control log on

To switch it off again:

xenstore-control log off

To specify a (new) logfile:

xenstore-control logfile <file>

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: enhance control command support
Juergen Gross [Fri, 24 Feb 2017 06:21:42 +0000 (07:21 +0100)]
xenstore: enhance control command support

The Xenstore protocol supports the XS_CONTROL command for triggering
various actions in the Xenstore daemon. Enhance that support by using
a command table and adding a help function.

Support multiple control commands in the associated xenstore-control
program used to issue XS_CONTROL commands.
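
A command table with a generated help text could look roughly like the
sketch below. The structure layout, function names and return values
are assumptions and may differ from what xenstored implements; the
"log on|off" entry is borrowed from the xenstore-control usage shown in
this log.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry. */
struct demo_cmd {
    const char *name;
    int (*func)(const char *arg);
    const char *params;
};

/* Accept only "on" or "off" as argument. */
static int demo_log(const char *arg)
{
    return (!strcmp(arg, "on") || !strcmp(arg, "off")) ? 0 : -1;
}

static int demo_help(const char *arg);

static const struct demo_cmd demo_cmds[] = {
    { "log",  demo_log,  "on|off" },
    { "help", demo_help, "" },
};
#define DEMO_NR_CMDS (sizeof(demo_cmds) / sizeof(demo_cmds[0]))

/* The help text is produced from the table, so it cannot go stale. */
static int demo_help(const char *arg)
{
    size_t i;

    (void)arg;
    for (i = 0; i < DEMO_NR_CMDS; i++)
        printf("%s %s\n", demo_cmds[i].name, demo_cmds[i].params);
    return 0;
}

/* Look the command up by name and run it; -1 means unknown command. */
int demo_control(const char *name, const char *arg)
{
    size_t i;

    for (i = 0; i < DEMO_NR_CMDS; i++)
        if (!strcmp(name, demo_cmds[i].name))
            return demo_cmds[i].func(arg);
    return -1;
}
```

Adding a command then only needs one new table row.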

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: Split out XS_CONTROL action to dedicated source file
Juergen Gross [Fri, 24 Feb 2017 06:21:41 +0000 (07:21 +0100)]
xenstore: Split out XS_CONTROL action to dedicated source file

Move the XS_CONTROL handling of xenstored to a new source file
xenstored_control.c.

In order to avoid making get_string() in xenstored_core.c globally
visible, use strlen() instead, which is safe in this context because
xs_count_strings() has already returned a value > 1.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: rename XS_DEBUG wire command
Juergen Gross [Fri, 24 Feb 2017 06:21:40 +0000 (07:21 +0100)]
xenstore: rename XS_DEBUG wire command

In preparation to support other than pure debug functionality via the
Xenstore XS_DEBUG wire command rename it to XS_CONTROL and make
XS_DEBUG an alias of it.

Add an alias xs_control_command for the associated xs_debug_command,
too.
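
One way to keep an old name as an alias is sketched below. The DEMO_
identifiers and their numbering are hypothetical; only the aliasing
pattern is the point.

```c
/* Hypothetical message types; the old name maps to the same value. */
enum demo_xs_msg_type {
    DEMO_XS_CONTROL,
    DEMO_XS_DEBUG = DEMO_XS_CONTROL,   /* old name kept as an alias */
    DEMO_XS_DIRECTORY
};

/* Returns 1 if the message type selects the control handler. */
int demo_is_control(enum demo_xs_msg_type t)
{
    return t == DEMO_XS_CONTROL;
}
```

Existing callers that still send the old name keep working unchanged.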

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxl: merge xl_cmdimpl.c into xl.c
Wei Liu [Fri, 24 Feb 2017 16:01:45 +0000 (16:01 +0000)]
xl: merge xl_cmdimpl.c into xl.c

After splitting out all the meaty bits, xl_cmdimpl.c doesn't contain
much. Merge the rest into xl.c and delete the file.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxl: split out migration related code
Wei Liu [Fri, 24 Feb 2017 15:59:32 +0000 (15:59 +0000)]
xl: split out migration related code

Include COLO / Remus code because they are built on top of the existing
migration protocol.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxl: split out save/restore related code
Wei Liu [Fri, 24 Feb 2017 15:54:52 +0000 (15:54 +0000)]
xl: split out save/restore related code

Add some function declarations to xl.h because they are now needed in
multiple files.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>