xenbits.xensource.com Git - xen.git/log
Jan Beulich [Wed, 23 Nov 2016 14:27:47 +0000 (15:27 +0100)]
x86emul: in_longmode() should not ignore ->read_msr() errors

All present hook implementations succeed for EFER, but we shouldn't
really build on this being the case.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
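The rule this commit applies (report a failed EFER read instead of assuming a value) can be sketched in standalone C. This is a simplified illustration, not the x86_emulate() code: the return codes, the hook type and all function names are hypothetical stand-ins. Only MSR_EFER (0xc0000080) and EFER.LMA (bit 10) are architectural.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the emulator's return codes and MSR hook. */
enum { EMUL_OKAY = 0, EMUL_UNHANDLEABLE = 1 };
typedef int (*read_msr_fn)(uint32_t msr, uint64_t *val, void *ctxt);

#define MSR_EFER 0xc0000080u
#define EFER_LMA (1ull << 10)          /* Long Mode Active */

/*
 * 1 = long mode active, 0 = not active, -1 = EFER could not be read.
 * A failed read is reported instead of being treated as "EFER == 0".
 */
static int long_mode_active(read_msr_fn read_msr, void *ctxt)
{
    uint64_t efer;

    if ( !read_msr || read_msr(MSR_EFER, &efer, ctxt) != EMUL_OKAY )
        return -1;

    return !!(efer & EFER_LMA);
}

/* Example hooks: one succeeds with LMA set, one always fails. */
static int read_msr_lma(uint32_t msr, uint64_t *val, void *ctxt)
{
    (void)ctxt;
    if ( msr != MSR_EFER )
        return EMUL_UNHANDLEABLE;
    *val = EFER_LMA;
    return EMUL_OKAY;
}

static int read_msr_fail(uint32_t msr, uint64_t *val, void *ctxt)
{
    (void)msr; (void)val; (void)ctxt;
    return EMUL_UNHANDLEABLE;
}
```

A caller would treat a negative result as an emulation failure rather than silently falling back to legacy-mode behaviour.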
Jan Beulich [Wed, 23 Nov 2016 14:27:17 +0000 (15:27 +0100)]
x86emul: simplify DstBitBase handling code

..., at once making it more obvious that even in the negative bit
offset case the resulting bit offset to be used by the inlined
instructions will always be constrained to the operand size of the
original instruction.

Also add a test case which would have failed without the XSA-195 fix.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
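The decomposition this commit relies on can be sketched as follows: a signed bit offset (BT/BTS/BTR/BTC with a memory destination) is split into a byte displacement that is a multiple of the operand size, plus a bit index that always stays within one operand, even for negative offsets. This is a simplified standalone sketch, not Xen's code, and the function name is hypothetical. It keeps the offset as a full 64-bit value, which is what the XSA-195 fix is about.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split bit_off into a byte displacement (multiple of op_bytes) and a bit
 * index in 0 .. op_bytes*8 - 1, using floor division so that negative
 * offsets address the preceding operand(s).
 */
static void split_bit_offset(int64_t bit_off, unsigned int op_bytes,
                             int64_t *disp, unsigned int *bit)
{
    int64_t op_bits, q, r;

    assert(op_bytes == 2 || op_bytes == 4 || op_bytes == 8);
    op_bits = (int64_t)op_bytes * 8;
    q = bit_off / op_bits;            /* C division truncates toward zero */
    r = bit_off % op_bits;

    if ( r < 0 )                      /* convert to floor division */
    {
        r += op_bits;
        q -= 1;
    }
    *disp = q * op_bytes;
    *bit = (unsigned int)r;           /* constrained to the operand size */
}
```

For example, a bit offset of -1 with a 32-bit operand selects bit 31 of the dword 4 bytes below the base.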
Jan Beulich [Wed, 23 Nov 2016 14:26:51 +0000 (15:26 +0100)]
x86/HVM: correct error code writing during task switch

Whether to write 32 or just 16 bits depends on the D bit of the target
CS. The width of the stack pointer to use depends on the B bit of the
target SS.

Also avoid using the no-fault copying routine.

Finally avoid using yet another struct segment_register variable here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
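The two rules in this commit (write width from the target CS.D bit, stack pointer width from the target SS.B bit) can be illustrated with a small helper that computes the new ESP and the number of bytes to write. The helper is hypothetical and ignores limit checks and the actual memory write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns the ESP after pushing the error code; *width receives the write
 * size.  With SS.B clear only SP (the low 16 bits) is decremented and the
 * upper half of ESP is left untouched.
 */
static uint32_t push_errcode_esp(uint32_t esp, bool cs_d, bool ss_b,
                                 unsigned int *width)
{
    uint32_t mask = ss_b ? 0xffffffffu : 0xffffu;

    *width = cs_d ? 4 : 2;
    return (esp & ~mask) | ((esp - *width) & mask);
}
```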
Jan Beulich [Wed, 23 Nov 2016 14:26:11 +0000 (15:26 +0100)]
x86/HVM: limit writes to outgoing TSS during task switch

The only fields modified are EIP, EFLAGS, GPRs, and segment selectors.
CR3 in particular is not supposed to be updated.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
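A minimal sketch of restricting the outgoing-TSS write to the fields named above, assuming the architectural 32-bit TSS layout (EIP at offset 32, GS ending at offset 96 where the LDT selector begins). The struct and function names are illustrative only, and guest memory is modelled as a plain byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS layout; the _rN fields are reserved halves of selector slots. */
struct tss32 {
    uint16_t back_link, _r0;
    uint32_t esp0;
    uint16_t ss0, _r1;
    uint32_t esp1;
    uint16_t ss1, _r2;
    uint32_t esp2;
    uint16_t ss2, _r3;
    uint32_t cr3, eip, eflags, eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, _r4, cs, _r5, ss, _r6, ds, _r7, fs, _r8, gs, _r9;
    uint16_t ldt, _r10, trace, iomap;
};
_Static_assert(sizeof(struct tss32) == 104, "unexpected TSS layout");

/* Write only EIP, EFLAGS, the GPRs and ES..GS; CR3 and LDT stay untouched. */
static void save_to_outgoing_tss(uint8_t *guest_tss, const struct tss32 *tss)
{
    size_t start = offsetof(struct tss32, eip);
    size_t end = offsetof(struct tss32, ldt);

    memcpy(guest_tss + start, (const uint8_t *)tss + start, end - start);
}
```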
Jan Beulich [Wed, 23 Nov 2016 14:25:35 +0000 (15:25 +0100)]
x86/HVM: limit writes to incoming TSS during task switch

The only field modified (and even that conditionally) is the back link.
Write only that field, and only when it actually has been written to.

Take the opportunity and also ditch the pointless initializer from the
"tss" local variable, which gets completely filled anyway by reading
from guest memory.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Wed, 23 Nov 2016 12:27:38 +0000 (12:27 +0000)]
libelf: fix symtab/strtab loading for 32bit domains

Commit ed04ca introduced a bug in the symtab/strtab loading for 32bit
guests that corrupted the section headers array due to the padding
introduced by the elf_shdr union.

The Elf section header array on 32bit should be accessible as an array of
Elf32_Shdr elements, and the union with Elf64_Shdr done in elf_shdr was
breaking this due to size differences between Elf32_Shdr and Elf64_Shdr.

Fix this by copying each section header one by one, and using the proper
size depending on the bitness of the guest kernel. While there, also fix
a couple of consistency issues, by making sure we always use the sizes of
our local versions of the ELF header and the ELF sections headers.

Reported-by: Brian Marcotte <marcotte@panix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
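The per-header copy described in this commit can be sketched with the standard <elf.h> types (a glibc/Linux build environment is assumed). The union and the function are hypothetical illustrations of the idea, not libelf's actual elf_shdr handling.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of a union holding either header flavour. */
union shdr_any {
    Elf32_Shdr e32;
    Elf64_Shdr e64;
};

/*
 * Copy n headers into dst as a packed array of the guest's native type.
 * Each src element is sizeof(Elf64_Shdr) bytes, so copying the union array
 * wholesale would leave 24 bytes of padding after every 32-bit header.
 */
static size_t pack_shdrs(void *dst, const union shdr_any *src, size_t n,
                         int guest_is_32bit)
{
    size_t sz = guest_is_32bit ? sizeof(Elf32_Shdr) : sizeof(Elf64_Shdr);
    size_t i;

    for ( i = 0; i < n; i++ )
        memcpy((unsigned char *)dst + i * sz, &src[i], sz);

    return n * sz;
}
```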
Jan Beulich [Tue, 22 Nov 2016 16:28:52 +0000 (17:28 +0100)]
x86/memshr: properly check grant references

They need to be range checked against the current table limit in any
event.

Reported-by: Huawei PSIRT <psirt@huawei.com>
Move the code to where it belongs, eliminating a number of duplicate
definitions. Add locking. Produce proper error codes, and consume them
instead of making one up. Check grant type. Convert parameter types at
once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Tue, 22 Nov 2016 16:12:50 +0000 (17:12 +0100)]
credit2: fix wrong assert in runq_tickle()

Since b047f888d489 ("xen: sched: leave CPUs doing tasklet
work alone") a CPU executing a tasklet is not marked as
idle.

Therefore:
 - avoid asserting that we can't find the idle vcpu running
   on one of them, which is not true,
 - avoid triggering a preemption on them (and add an assert
   checking that).

This fixes a bug identified by OSSTest, in flight 102372
(on ARM, but it's not at all ARM specific), where the
ASSERT() was triggering like this:

(XEN) Xen call trace:
(XEN)    [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC)
(XEN)    [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR)
(XEN)    [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4
(XEN)    [<0023303c>] context_saved+0x7c/0xa4
(XEN)    [<0024f660>] domain.c#schedule_tail+0x2b4/0x308
(XEN)    [<0024faac>] context_switch+0x80/0x94
(XEN)    [<0022ff48>] schedule.c#schedule+0x76c/0x7ec
(XEN)    [<002338d4>] softirq.c#__do_softirq+0xcc/0xec
(XEN)    [<00233968>] do_softirq+0x18/0x28
(XEN)    [<00261084>] leave_hypervisor_tail+0x58/0x88
(XEN)    [<002649d0>] entry.o#return_to_guest+0xc/0xb8
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009
(XEN) ****************************************

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 22 Nov 2016 12:52:53 +0000 (13:52 +0100)]
x86/EFI: meet further spec requirements for runtime calls

So far we didn't guarantee 16-byte alignment of the stack: While (so
far) we don't tell the compiler to use smaller alignment, we also don't
guarantee 16-byte alignment when establishing stack pointers for new
vCPU-s. Runtime service functions using SSE instructions may end with
#GP(0) without that.

Note that making use of -mpreferred-stack-boundary=3, as mentioned in
the comment, wouldn't help to reduce the needed alignment: The compiler
would then be free to align the stack of the function with the aligned
object, but would be permitted to place an odd number of 8-byte objects
there, resulting in the callee still running on an unaligned stack.

(The only working alternative to the approach chosen here would be to
use -mincoming-stack-boundary=3, but that would affect all functions in
runtime.c, not just the ones actually making runtime services calls.
And it would still require the manual alignment logic here to be used
with gcc 5.2 and earlier (which do not support that command line option),
except that the alignment amount would then become conditional.)

Hence enforce the needed alignment by making efi_rs_enter() return a
suitably aligned structure, which the caller then necessarily has to
store in a suitably aligned local variable, the address of which then
gets passed to efi_rs_leave(). Also (to limit exposure) move the
function declarations to where they belong: They're local to runtime.c,
and shared only with compat.c (by the latter including the former).

Furthermore, we should avoid #MF being raised by the FLDCW we execute.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
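One way to express "the caller must hold a suitably aligned local" in C11 is to give the returned type 16-byte alignment: the compiler then has to place any local of that type on a 16-byte boundary, realigning the frame if the incoming stack pointer is not aligned. The sketch below is hypothetical (type, field and function names are not Xen's) and only shows the alignment mechanics; it performs no EFI runtime call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call state; the member alignment propagates to the type. */
struct rs_state {
    _Alignas(16) unsigned long saved_cr3;
};

static struct rs_state rs_enter(void)
{
    struct rs_state s = { .saved_cr3 = 0 };   /* placeholder for real state */
    return s;
}

static void rs_leave(const struct rs_state *s)
{
    /* The caller's local must sit on a 16-byte boundary. */
    assert(((uintptr_t)s & 15) == 0);
}
```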
Ian Jackson [Thu, 3 Nov 2016 16:37:40 +0000 (16:37 +0000)]
pygrub: Properly quote results, when returning them to the caller:

* When the caller wants sexpr output, use `repr()'
  This is what Xend expects.

  The returned S-expressions are now escaped and quoted by Python,
  generally using '...'.  Previously kernel and ramdisk were unquoted
  and args was quoted with "..." but without proper escaping.  This
  change may break toolstacks which do not properly dequote the
  returned S-expressions.

* When the caller wants "simple" output, crash if the delimiter is
  contained in the returned value.

  With --output-format=simple it does not seem like this could ever
  happen, because the bootloader config parsers all take line-based
  input from the various bootloader config files.

  With --output-format=simple0, this can happen if the bootloader
  config file contains nul bytes.

This is CVE-2016-9379 and CVE-2016-9380 / XSA-198.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 22 Nov 2016 12:51:16 +0000 (13:51 +0100)]
x86/svm: fix injection of software interrupts

The non-NextRip logic in c/s 36ebf14eb "x86/emulate: support for emulating
software event injection" was based on an older version of the AMD software
manual.  The manual was later corrected, following findings from that series.

I took the original wording of "not supported without NextRIP" to mean that
X86_EVENTTYPE_SW_INTERRUPT was not eligible for use.  It turns out that this
is not the case, and the new wording is clearer on the matter.

Despite testing the original patch series on non-NRip hardware, the
swint-emulation XTF test case focuses on the debug vectors; it never ended up
executing an `int $n` instruction for a vector which wasn't also an exception.

During a vmentry, the use of X86_EVENTTYPE_HW_EXCEPTION comes with a vector
check to ensure that it is only used with exception vectors.  Xen's use of
X86_EVENTTYPE_HW_EXCEPTION for `int $n` injection has always been buggy on AMD
hardware.

Fix this by always using X86_EVENTTYPE_SW_INTERRUPT.

Print and decode the eventinj information in svm_vmcb_dump(), as it has
several invalid combinations which cause vmentry failures.

This is CVE-2016-9378 / part of XSA-196.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
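For reference, a sketch of how an SVM EVENTINJ value for `int $n` could be composed with the software-interrupt type. The bit layout (vector in bits 7:0, type in 10:8, error-code-valid in bit 11, valid in bit 31) and the type values follow the AMD APM; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* EVENTINJ TYPE field values (bits 10:8). */
enum {
    EVTYPE_INTR = 0,    /* external interrupt */
    EVTYPE_NMI  = 2,
    EVTYPE_EXCP = 3,    /* exception: vector is checked on VMRUN */
    EVTYPE_SOFT = 4,    /* software interrupt, e.g. int $n */
};

/* Low 32 bits of EVENTINJ: vector, type, EV (bit 11) and V (bit 31). */
static uint32_t eventinj(uint8_t vector, unsigned int type, int has_errcode)
{
    return vector | ((type & 7u) << 8) |
           (has_errcode ? 1u << 11 : 0u) | (1u << 31);
}
```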
Andrew Cooper [Tue, 22 Nov 2016 12:50:49 +0000 (13:50 +0100)]
x86/emul: correct the IDT entry calculation in inject_swint()

The logic, as introduced in c/s 36ebf14ebe "x86/emulate: support for emulating
software event injection" is buggy.  The size of an IDT entry depends on long
mode being active, not the width of the code segment currently in use.

In particular, this means that a compatibility code segment which hits
emulation for software event injection will end up using an incorrect offset
in the IDT for DPL/Presence checking.  In practice, this only occurs on old
AMD hardware lacking NRip support; all newer AMD hardware, and all Intel
hardware bypass this path in the emulator.

While here, fix a minor issue with reading the IDT entry.  The return value
from ops->read() wasn't checked, but in reality the only failure case is if a
pagefault occurs.  This is not a realistic problem as the kernel will almost
certainly crash with a double fault if this setup actually occurred.

This is CVE-2016-9377 / part of XSA-196.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
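The corrected calculation amounts to choosing the gate size from long-mode activity alone. The helper below is a hypothetical sketch of that rule, not the inject_swint() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Byte offset of an IDT gate.  Gates are 16 bytes whenever long mode is
 * active (including compatibility mode) and 8 bytes otherwise; the width
 * of the current code segment does not matter.
 */
static uint64_t idt_gate_offset(uint8_t vector, bool long_mode_active)
{
    return (uint64_t)vector * (long_mode_active ? 16 : 8);
}
```

For vector 0x80 in a compatibility-mode code segment, choosing the size from the code segment width would give the 8-byte legacy offset 0x400 instead of the correct 0x800.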
Jan Beulich [Tue, 22 Nov 2016 12:49:06 +0000 (13:49 +0100)]
x86emul: fix huge bit offset handling

We must never chop off the high 32 bits.

This is CVE-2016-9383 / XSA-195.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 22 Nov 2016 12:48:30 +0000 (13:48 +0100)]
libelf: fix stack memory leak when loading 32 bit symbol tables

The 32 bit Elf structs are smaller than the 64 bit ones, which means that
when loading them there's some padding left uninitialized at the end of each
struct (because the size indicated in e_ehsize and e_shentsize is
smaller than the size of elf_ehdr and elf_shdr).

Fix this by introducing a new helper that is used to set
[caller_]xdest_{base/size} and that takes care of performing the appropriate
memset of the region. This newly introduced helper is then used to set and
unset xdest_{base/size} in elf_load_bsdsyms. Now that the full struct
is zeroed, there's no need to specifically zero the undefined section.

This is CVE-2016-9384 / XSA-194.

Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Also remove the open coded (and redundant with the earlier
elf_memset_unchecked()) use of caller_xdest_* from elf_init().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
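The zero-then-copy pattern behind the new helper can be sketched as follows. The function name is hypothetical; libelf's real helper also maintains the [caller_]xdest_{base/size} bookkeeping described above, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero the whole destination object, then copy the (possibly smaller)
 * source record into it, so no stale bytes survive in the tail when a
 * 32-bit ELF struct is stored in space sized for the 64-bit one.
 */
static void copy_zero_padded(void *dst, size_t dst_size,
                             const void *src, size_t src_size)
{
    assert(src_size <= dst_size);
    memset(dst, 0, dst_size);
    memcpy(dst, src, src_size);
}
```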
Jan Beulich [Tue, 22 Nov 2016 12:46:28 +0000 (13:46 +0100)]
x86/PV: writes of %fs and %gs base MSRs require canonical addresses

Commit c42494acb2 ("x86: fix FS/GS base handling when using the
fsgsbase feature") replaced the use of wrmsr_safe() on these paths
without recognizing that wr{f,g}sbase() use just wrmsrl() and that the
WR{F,G}SBASE instructions also raise #GP for non-canonical input.

Similarly arch_set_info_guest() needs to prevent non-canonical
addresses from getting stored into state later to be loaded by context
switch code. For consistency also check stack pointers and LDT base.
DR0..3, otoh, already get properly checked in set_debugreg() (albeit
we discard the error there).

The SHADOW_GS_BASE check isn't strictly necessary, but I think we
better avoid trying the WRMSR if we know it's going to fail.

This is CVE-2016-9385 / XSA-193.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
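A sketch of the canonical-address check that should guard these writes, assuming 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal. The guard function is hypothetical and only models rejecting the value before a WRMSR or WRFSBASE would raise #GP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses: bits 63..47 all zero or all one. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}

/* Hypothetical guard: refuse the value instead of faulting on the write. */
static int set_fs_base_checked(uint64_t *fs_base, uint64_t val)
{
    if ( !is_canonical(val) )
        return -1;
    *fs_base = val;
    return 0;
}
```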
Jan Beulich [Tue, 22 Nov 2016 12:45:44 +0000 (13:45 +0100)]
x86/HVM: don't load LDTR with VM86 mode attrs during task switch

Just like TR, LDTR is purely a protected mode facility and hence needs
to be loaded accordingly. Also move its loading to where it
architecturally belongs.

This is CVE-2016-9382 / XSA-192.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 22 Nov 2016 12:44:50 +0000 (13:44 +0100)]
x86/hvm: Fix the handling of non-present segments

In 32bit, the data segments may be NULL to indicate that the segment is
ineligible for use.  In both 32bit and 64bit, the LDT selector may be NULL to
indicate that the entire LDT is ineligible for use.  However, nothing in Xen
actually checks for this condition when performing other segmentation
checks.  (Note however that limit and writeability checks are correctly
performed).

Neither Intel nor AMD specify the exact behaviour of loading a NULL segment.
Experimentally, AMD zeroes all attributes but leaves the base and limit
unmodified.  Intel zeroes the base, sets the limit to 0xfffffff and resets the
attributes to just .G and .D/B.

The use of the segment information in the VMCB/VMCS is equivalent to a native
pipeline interacting with the segment cache.  The present bit can therefore
have a subtly different meaning, and it is now cooked to uniformly indicate
whether the segment is usable or not.

GDTR and IDTR don't have access rights like the other segments, but for
consistency, they are treated as being present so no special casing is needed
elsewhere in the segmentation logic.

AMD hardware does not consider the present bit for %cs and %tr, and will
function as if they were present.  They are therefore unconditionally set to
present when reading information from the VMCB, to maintain the new meaning of
usability.

Intel hardware has a separate unusable bit in the VMCS segment attributes.
This bit is inverted and stored in the present field, so the hvm code can work
with architecturally-common state.

This is CVE-2016-9386 / XSA-191.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
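A sketch of the "present means usable" cooking described above. The VMX access-rights layout (P in bit 7, unusable in bit 16) and the position of P in the SVM attribute field (bit 7) are architectural; the function and enum names are hypothetical and do not reproduce Xen's segment_register handling.

```c
#include <assert.h>
#include <stdint.h>

#define AR_P             (1u << 7)   /* present bit in both encodings */
#define VMX_AR_UNUSABLE  (1u << 16)  /* VMCS access-rights "unusable" bit */

/* VMX: fold the inverted "unusable" bit into P, so P means "usable". */
static uint32_t vmx_ar_cook(uint32_t ar)
{
    if ( ar & VMX_AR_UNUSABLE )
        return ar & ~(AR_P | VMX_AR_UNUSABLE);
    return ar | AR_P;
}

enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_TR };

/* SVM: hardware ignores P for CS and TR, so always report them usable. */
static uint32_t svm_attr_cook(enum seg_reg seg, uint32_t attr)
{
    if ( seg == SEG_CS || seg == SEG_TR )
        attr |= AR_P;
    return attr;
}
```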
Andrew Cooper [Mon, 21 Nov 2016 15:30:25 +0000 (15:30 +0000)]
x86/hvm: Fix non-debug build following c/s 0745f665a5

The variable is named inst_len, not insn_len.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 31 Oct 2016 14:07:54 +0000 (14:07 +0000)]
x86/svm: Fix svm_nextrip_insn_length() when crossing the virtual boundary to 0

vmcb->nextrip can legitimately be less than vmcb->rip when execution wraps
back around to 0.  Instead, complain if the reported length is greater than 15
and use x86_decode_insn() as a fallback.

While making changes here, fix two whitespace issues with the case labels.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
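The length calculation can rely on unsigned wrap-around and only needs the upper-bound check. This is a hypothetical sketch, not svm_nextrip_insn_length() itself; a negative return stands in for "fall back to x86_decode_insn()".

```c
#include <assert.h>
#include <stdint.h>

/* Returns the instruction length, or -1 if the caller should decode instead. */
static int nextrip_insn_length(uint64_t rip, uint64_t nextrip)
{
    uint64_t len = nextrip - rip;   /* unsigned subtraction wraps mod 2^64 */

    return len > 15 ? -1 : (int)len;   /* 15 bytes is the x86 maximum */
}
```

In this sketch a nextrip below rip without a genuine wrap yields a huge difference and simply takes the fallback path.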
David Vrabel [Thu, 17 Nov 2016 12:17:12 +0000 (12:17 +0000)]
MAINTAINERS: update EVENT CHANNEL and KEXEC maintainer

I am no longer in a position to be a Xen maintainer.  Andrew has
kindly volunteered to continue maintenance of the KEXEC subsystem.
EVENT CHANNELS (FIFO-BASED ABI) will be maintained by the "Other"
hypervisor maintainers.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Boris Ostrovsky [Tue, 15 Nov 2016 16:04:15 +0000 (11:04 -0500)]
tools/libacpi: Be specific about which DSDT files to build

There is no reason to build, for example, dsdt_pvh.asl for hvmloader. We
pass the DSDTs to build via the DSDT_FILES parameter.

If DSDT_FILES is empty, all DSDTs for a particular architecture will be built.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 14 Nov 2016 10:18:00 +0000 (10:18 +0000)]
x86/traps: Don't call hvm_hypervisor_cpuid_leaf() for PV guests

Luckily, hvm_hypervisor_cpuid_leaf() and vmx_hypervisor_cpuid_leaf() are safe
to execute in the context of a PV guest, but HVM-specific feature flags
shouldn't be visible to PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 14 Nov 2016 10:15:00 +0000 (10:15 +0000)]
x86/vmx: Correct the long mode check in vmx_cpuid_intercept()

%cs.L may be set in a legacy mode segment, or clear in a compatibility mode
segment; it is not the correct way to check for long mode being active.

Both of these situations result in incorrect visibility of the SYSCALL feature
in CPUID, and by extension, incorrect behaviour in hvm_efer_valid().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
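The distinction this commit relies on can be sketched as a mode classifier: EFER.LMA decides whether long mode is active, and CS.L only selects between 64-bit and compatibility mode once it is. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_LMA (1ull << 10)   /* Long Mode Active */

enum cpu_mode { MODE_LEGACY, MODE_COMPAT, MODE_64BIT };

/* CS.L is meaningless unless EFER.LMA is set. */
static enum cpu_mode current_mode(uint64_t efer, bool cs_l)
{
    if ( !(efer & EFER_LMA) )
        return MODE_LEGACY;
    return cs_l ? MODE_64BIT : MODE_COMPAT;
}
```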
Ian Jackson [Tue, 15 Nov 2016 15:09:50 +0000 (15:09 +0000)]
tools/configure: Drop -lcrypto search

This seems to be looking for a function MD5.  But nothing uses it.
The build works fine if this is disabled and libcrypto is not
installed.

This check was first introduced in 68a3e1e87325 "[TOOLS] Add more
checks for devel packages." in 2006.  At that time -lcrypto was used
by tools/blktap/ and tools/vtpm_manager/, which are both gone now.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Tue, 15 Nov 2016 04:52:26 +0000 (23:52 -0500)]
tools/libacpi: Re-licence remaining GPL code to LGPLv2.1

We now have permission from Lenovo to relicense commit 801d469ad8b2
("[HVM] ACPI support patch 3 of 4: ACPI _PRT table") to LGPLv2.1.

This essentially means reverting commits c3397311a658 ("acpi: Prevent
GPL-only code from seeping into non-GPL binaries") and 26c4f0b8a4cf
("tools/libacpi: fix sed usage").

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ken Lancaster <klancaster@lenovo.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 14 Nov 2016 07:53:16 +0000 (08:53 +0100)] (tag: 4.8.0-rc6)
Revert "x86/EFI: meet further spec requirements for runtime calls"

This reverts commit 67b5b302f5319f70288587dc98ab505c4deada1e as
being both actively wrong and latently broken.

Jan Beulich [Thu, 10 Nov 2016 16:06:30 +0000 (09:06 -0700)]
x86/EFI: meet further spec requirements for runtime calls

So far we didn't guarantee 16-byte alignment of the stack: While (so
far) we don't tell the compiler to use smaller alignment, we also don't
guarantee 16-byte alignment when establishing stack pointers for new
vCPU-s. Runtime service functions using SSE instructions may end with
#GP(0) without that.

Note that -mpreferred-stack-boundary=3 can be used only from gcc 4.8
onwards, and -mincoming-stack-boundary=3 only from 5.3 onwards. It is
for that reason that an alternative approach (using higher than
necessary alignment) is being used when building with such older
compilers.

Furthermore, we should avoid #MF being raised by the FLDCW we execute.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Thu, 10 Nov 2016 14:50:24 +0000 (09:50 -0500)]
libxc/x86: Report consistent initial APIC value for PV guests

Currently the hypervisor provides a PV guest's CPUID(1).EBX[31:24] (initial
APIC ID) with contents of that field on the processor that launched
the guest. This results in the guest reporting different initial
APIC IDs across runs.

We should be consistent in how this value is reported; let's set
it to 0 (which is also what Linux guests expect).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: remove trailing whitespace in comment ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
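The change boils down to clearing bits 31:24 (the initial APIC ID) of CPUID leaf 1 EBX for PV guests, while preserving the other fields. A hypothetical helper showing just that masking:

```c
#include <assert.h>
#include <stdint.h>

/* Report initial APIC ID 0 regardless of which host CPU built the guest. */
static uint32_t pv_leaf1_ebx(uint32_t host_ebx)
{
    return host_ebx & 0x00ffffffu;
}
```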
Jan Beulich [Thu, 10 Nov 2016 12:29:32 +0000 (05:29 -0700)]
x86emul: suppress alignment check for {,v}mov{d,q}

When introducing support for these instructions, adjustment for the
alignment check logic (generating #GP(0)) was overlooked.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 11 Nov 2016 16:19:12 +0000 (17:19 +0100)]
x86: always supply .cpuid() handler to x86_emulate()

With us incrementally adding proper CPUID checks to x86_emulate()
(see commit de05bd965a ["x86emul: correct {,F}CMOV and F{,U}COMI{,P}
emulation"]) it is no longer appropriate to invoke the function with
that hook being NULL, as long as respective instructions may get used
in that case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Cédric Bosdonnat [Thu, 10 Nov 2016 09:23:31 +0000 (10:23 +0100)]
Fix misleading indentation warnings

GCC 6 builds report misleading indentation as warnings. Fix a few of
these warnings in stubdom.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Quan Xu <xuquan8@huawei.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Tue, 8 Nov 2016 16:22:15 +0000 (17:22 +0100)]
libxc: fix unmap of ACPI guest memory region

Commit fac7f7 changed the value of ptr so that it points to the right memory
area, taking the page offset into account, but failed to remove this when
doing the unmap, which caused the region to not be unmapped. Fix this by not
modifying ptr and instead adding the page offset directly in the memcpy
call.

Coverity-ID: 1394285

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 10 Nov 2016 17:12:56 +0000 (18:12 +0100)]
x86emul: correct direction of FPU insn emulations

There are two cases where this was wrong, albeit in a benign way (the
compiler - according to my checking - didn't leverage the wrongness
for any optimizations affecting overall outcome).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 2 Nov 2016 14:36:49 +0000 (14:36 +0000)]
x86/svm: Don't clobber eax and edx if an RDMSR intercept fails

The original code has a bug; eax and edx get unconditionally updated even when
hvm_msr_read_intercept() doesn't return X86EMUL_OKAY.

It is only by blind luck (vmce_rdmsr() eagerly initialising its msr_content
pointer) that this isn't an information leak into guests.

While fixing this bug, reduce the scope of msr_content and initialise it to 0.
This makes it obvious that a stack leak won't occur, even if there were to be
a buggy codepath in hvm_msr_read_intercept().

Also make some non-functional improvements.  Make the insn_len calculation
common, and reduce the quantity of explicit casting by making better use of
the existing register names.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
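The corrected control flow can be sketched as follows: the output registers are only updated when the read succeeds, and the temporary is zero-initialised. The register struct, return codes and hook type are hypothetical stand-ins for Xen's; read_fail deliberately scribbles on its output to model a buggy path.

```c
#include <assert.h>
#include <stdint.h>

enum { RC_OKAY = 0, RC_EXCEPTION = 2 };          /* hypothetical */
struct regs { uint64_t rax, rcx, rdx; };
typedef int (*msr_read_fn)(uint32_t msr, uint64_t *val);

static int handle_rdmsr(struct regs *r, msr_read_fn read)
{
    uint64_t msr_content = 0;      /* never exposes stale stack contents */
    int rc = read((uint32_t)r->rcx, &msr_content);

    if ( rc == RC_OKAY )           /* only then touch EAX/EDX */
    {
        r->rax = (uint32_t)msr_content;
        r->rdx = (uint32_t)(msr_content >> 32);
    }
    return rc;
}

static int read_ok(uint32_t msr, uint64_t *val)
{
    *val = ((uint64_t)msr << 32) | 0x1234;
    return RC_OKAY;
}

static int read_fail(uint32_t msr, uint64_t *val)
{
    (void)msr;
    *val = 0xdeadbeef;             /* buggy path writing its output anyway */
    return RC_EXCEPTION;
}
```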
Stefano Stabellini [Tue, 8 Nov 2016 19:42:43 +0000 (11:42 -0800)]
Partially revert 21550029f709072aacf3b90edd574e7d3021b400

Commit 21550029f709072aacf3b90edd574e7d3021b400 removed the
PLATFORM_QUIRK_GIC_64K_STRIDE quirk and introduced a way to
automatically detect that the two GICC pages have a 64K stride.

However the heuristic requires that the device tree for the platform
reports a GICC size == 128K, which is not the case for some versions of
XGene.

Fix the issue by partially reverting
21550029f709072aacf3b90edd574e7d3021b400:

- reintroduce PLATFORM_QUIRK_GIC_64K_STRIDE for XGene
- force csize and vsize to SZ_128K if csize is initially 4K and
  PLATFORM_QUIRK_GIC_64K_STRIDE is set

Also add a warning in case GICC is SZ_128K but not aliased.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Stefano Stabellini [Tue, 8 Nov 2016 19:42:42 +0000 (11:42 -0800)]
Revert "xen/arm: platform: Drop the quirks callback"

This reverts commit 14fa16961b03a23e9b883e5f0ed06b6837a489d8.
Do not reintroduce platform_dom0_evtchn_ppi.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 7 Nov 2016 15:32:01 +0000 (16:32 +0100)]
libxc: set rsdp pointer for PVHv2 guests

Set the address of the RSDP in the HVM start info structure for PVHv2 DomUs
that have ACPI tables.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Mon, 7 Nov 2016 15:32:00 +0000 (16:32 +0100)]
libxc: properly account for the page offset when copying ACPI data

Or else ACPI data is always copied at the start of the page pointed to by
guest_addr_out, ignoring the page offset.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 7 Nov 2016 15:29:15 +0000 (08:29 -0700)]
IOMMU: release lock on new exit path

This was overlooked in 7b2842a414 ("IOMMU: replace ASSERT()s checking
for NULL").

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Mon, 7 Nov 2016 11:12:19 +0000 (11:12 +0000)]
Config.mk: update seabios to 1.10.0

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 7 Nov 2016 13:08:30 +0000 (14:08 +0100)]
x86/shutdown: add fall-through comment

Coverity ID: 1362037

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 7 Nov 2016 13:08:05 +0000 (14:08 +0100)]
IOMMU: replace ASSERT()s checking for NULL

Avoid NULL derefs on non-debug builds.

Coverity ID: 1055650

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 7 Nov 2016 13:07:11 +0000 (14:07 +0100)]
x86/traps: replace ASSERT() checking array bounds

Avoid out of bounds accesses on non-debug builds.

Coverity ID: 1055744

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxsm: add missing permissions discovered in testing
Daniel De Graaf [Fri, 4 Nov 2016 15:35:20 +0000 (11:35 -0400)]
xsm: add missing permissions discovered in testing

Add two missing allow rules:

1. Device model domain construction uses getvcpucontext, discovered by
Andrew Cooper while chasing an unrelated issue.

2. When a domain is destroyed with a device passthrough active, the
calls to remove_{irq,ioport,iomem} can be made by the hypervisor itself
(which results in an XSM check with the source xen_t).  It does not make
sense to deny these permissions; no domain should be using xen_t, and
forbidding the hypervisor from performing cleanup is not useful.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: disallow enabling PoD and ALTP2M at the same time
Wei Liu [Thu, 3 Nov 2016 16:41:57 +0000 (16:41 +0000)]
libxl: disallow enabling PoD and ALTP2M at the same time

That combination would cause Xen to crash.

Note that although this is a security issue, it is not XSA-worthy because
ALTP2M is experimental.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: set ret in the check for nestedhvm and altp2m
Wei Liu [Thu, 3 Nov 2016 16:41:56 +0000 (16:41 +0000)]
libxl: set ret in the check for nestedhvm and altp2m

The error path expects ret to be set, otherwise an assertion is
triggered.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agogit: Add metadata to the result of `git archive`
Andrew Cooper [Thu, 3 Nov 2016 17:57:57 +0000 (17:57 +0000)]
git: Add metadata to the result of `git archive`

When building Xen from a source tarball, commit information is usually lost,
especially if the tarball was generated from a tag.

Have `git archive` automatically fill in metadata at the point of creating the
archive, which is especially useful when using web snapshot links such as:

  http://xenbits.xen.org/gitweb/?p=xen.git;a=snapshot;h=HEAD;sf=tgz

to obtain the tarball.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoflask: build policy in different locations
Wei Liu [Fri, 28 Oct 2016 15:17:17 +0000 (16:17 +0100)]
flask: build policy in different locations

The flask policy can be built twice: once for the hypervisor and once for
the tools.

Before this patch, everything was built inside the tools/flask/policy
directory, so parallel builds could race to write to the same output
file.

Prepend output file names with FLASK_BUILD_DIR. The hypervisor and tools
builds set that variable to different directories, so that we are
safe from races.

Adjust other bits of the build system as needed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/libxc: Add xstate cpuid leaf of avx512
Luwei Kang [Fri, 4 Nov 2016 08:29:18 +0000 (16:29 +0800)]
tools/libxc: Add xstate cpuid leaf of avx512

Allow guests to get the xstate CPUID leaf information for AVX-512.
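AVX-512 state lives in three XSAVE components: bits 5 to 7 of XCR0 (opmask, ZMM_Hi256 and Hi16_ZMM). CPUID leaf 0xD, sub-leaf i, reports each component's size in EAX and its offset in EBX. The C sketch below is illustrative only and is not libxc code. The table and helper names are hypothetical, and the sizes and offsets are the values commonly reported for the standard (non-compacted) layout. Real code should read them from CPUID rather than hard-code them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one XSAVE state component. */
struct xstate_component {
    unsigned int bit;      /* bit number in XCR0 / CPUID.(0xD,0).EAX */
    const char  *name;
    uint32_t     size;     /* CPUID.(0xD,bit).EAX */
    uint32_t     offset;   /* CPUID.(0xD,bit).EBX, standard format */
};

/* Commonly reported values for the AVX-512 components. */
static const struct xstate_component avx512_components[] = {
    { 5, "opmask",    64,   1088 },
    { 6, "ZMM_Hi256", 512,  1152 },
    { 7, "Hi16_ZMM",  1024, 1664 },
};

#define NR_AVX512_COMPONENTS \
    (sizeof(avx512_components) / sizeof(avx512_components[0]))

/* XCR0 mask covering every AVX-512 component. */
static uint64_t avx512_xcr0_mask(void)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < NR_AVX512_COMPONENTS; i++)
        mask |= UINT64_C(1) << avx512_components[i].bit;
    return mask;
}

/* End of the highest-placed AVX-512 component in the standard XSAVE area. */
static uint32_t avx512_xsave_area_end(void)
{
    uint32_t end = 0;

    for (size_t i = 0; i < NR_AVX512_COMPONENTS; i++) {
        uint32_t e = avx512_components[i].offset + avx512_components[i].size;
        if (e > end)
            end = e;
    }
    return end;
}
```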

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: replace hint with pointer in PVHv2 ACPI documentation
Roger Pau Monne [Thu, 3 Nov 2016 16:48:56 +0000 (17:48 +0100)]
docs: replace hint with pointer in PVHv2 ACPI documentation

Use "pointer" instead of "hint", since this is the only way to get the address
of the RSDP.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86emul: {L,S}{G,I}DT ignore operand size overrides in 64-bit mode
Jan Beulich [Thu, 3 Nov 2016 16:23:22 +0000 (17:23 +0100)]
x86emul: {L,S}{G,I}DT ignore operand size overrides in 64-bit mode

This affects not only the layout of the data (always 2+8 bytes), but
also the contents (no truncation to 24 bits occurs).
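The C sketch below models the effect of operand size on the base that LGDT/LIDT loads. In 64-bit mode the operand is always a 2-byte limit followed by an 8-byte base. Outside 64-bit mode, a 16-bit operand size keeps only 24 bits of the base. The struct and `dt_load_base()` are hypothetical, not the emulator's code, and the sketch assumes a little-endian host, as on x86.

```c
#include <stdint.h>
#include <string.h>

/* Memory image of a descriptor-table operand in 64-bit mode: 2+8 bytes. */
struct desc_ptr64 {
    uint16_t limit;
    uint64_t base;
} __attribute__((__packed__));

/*
 * Hypothetical helper: the base LGDT/LIDT would load from 'mem'.
 * 'long_mode' is non-zero in 64-bit mode; 'op_bytes' (2 or 4) is the
 * effective operand size and only matters outside 64-bit mode.
 */
static uint64_t dt_load_base(const uint8_t *mem, int long_mode,
                             unsigned int op_bytes)
{
    if (long_mode) {
        /* Operand size overrides are ignored: full 8-byte base. */
        struct desc_ptr64 dp;
        memcpy(&dp, mem, sizeof(dp));
        return dp.base;
    }

    uint32_t base32;
    memcpy(&base32, mem + 2, sizeof(base32));

    /* A 16-bit operand size loads only a 24-bit base. */
    return op_bytes == 2 ? (base32 & 0xffffff) : base32;
}
```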

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoRevert "libxl: disallow enabling PoD and ALTP2M at the same time"
Wei Liu [Thu, 3 Nov 2016 16:13:05 +0000 (16:13 +0000)]
Revert "libxl: disallow enabling PoD and ALTP2M at the same time"

This reverts commit ff53c65311a32e54dba51f2b8112632e9dd2af3b.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/arm: early-printk: It should depends on CONFIG_DEBUG and not debug
Julien Grall [Thu, 3 Nov 2016 14:36:35 +0000 (14:36 +0000)]
xen/arm: early-printk: It should depends on CONFIG_DEBUG and not debug

The variable debug is used to enable debugging for the tools. As
early-printk is part of the hypervisor, we should use CONFIG_DEBUG.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: document ACPI usage in PVHv2 guests
Roger Pau Monne [Thu, 3 Nov 2016 12:19:03 +0000 (13:19 +0100)]
docs: document ACPI usage in PVHv2 guests

PVHv2 guests can also get the hardware description from ACPI
tables; add this to the documentation.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: fixup PVHv2 documentation regarding AP startup
Roger Pau Monne [Thu, 3 Nov 2016 11:29:21 +0000 (12:29 +0100)]
docs: fixup PVHv2 documentation regarding AP startup

On PVHv2 guests the local APIC can also be used to start APs if present.
Amend the documentation in order to reflect this.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Reject LGDT/LIDT attempts with non-canonical base addresses
Andrew Cooper [Wed, 2 Nov 2016 14:43:48 +0000 (14:43 +0000)]
x86/emul: Reject LGDT/LIDT attempts with non-canonical base addresses

No sane OS would deliberately try this, but make Xen's emulation match real
hardware by delivering #GP(0), rather than suffering a VMEntry failure.
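The check described above can be sketched in C as follows. It assumes 48-bit linear addresses, where bits 63 to 47 must all be equal; 5-level paging is not considered. `lgdt_set_base()` is a hypothetical helper, not the emulator's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit linear addresses: bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical emulator helper for the LGDT/LIDT base: returns false when
 * the caller should inject #GP(0) instead of loading the register.
 */
static bool lgdt_set_base(uint64_t base, uint64_t *dtr_base)
{
    if (!is_canonical(base))
        return false;                   /* #GP(0), as on real hardware */
    *dtr_base = base;
    return true;
}
```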

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: disallow enabling PoD and ALTP2M at the same time
Wei Liu [Wed, 2 Nov 2016 14:10:37 +0000 (14:10 +0000)]
libxl: disallow enabling PoD and ALTP2M at the same time

That combination would cause Xen to crash.

Note that although this is a security issue, it is not XSA-worthy because
ALTP2M is experimental.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agofeatures: declare the Credit2 scheduler as Supported
Dario Faggioli [Wed, 2 Nov 2016 15:05:03 +0000 (16:05 +0100)]
features: declare the Credit2 scheduler as Supported

Credit2 has been available in tree as an "Experimental" scheduler for
a few years. Recently, an effort started to make it production ready
and, eventually, Xen's new default scheduler. As a consequence of
that, it has undergone a great deal of development, testing and
benchmarking.

In fact, Credit2's much more modern design (compared to Credit1) and
cleaner code make it a lot easier to understand what the scheduler is
doing, fix scheduling issues that may come up, and implement new and
more advanced features in the future.

In more detail:

 - key features that were missing (pinning and context switching
   rate-limiting) have now been implemented, and more (soft affinity,
   caps and reservations) are about to come. The gap wrt Credit1 is
   therefore closing. In particular, with pinning and rate-limiting
   available, the scheduler can be considered usable.

 - Credit2 has been tested by OSSTest for a long time. Furthermore, as
   part of recent efforts, stress tests and benchmarks have been run
   and have shown no bugs or stability issues.

 - A number of different benchmarks have been run, most of them
   comparing Credit2 with Credit1. Some of the results were posted on
   xen-devel; others were presented during a talk at the 2016
   edition of the Xen-Project Developer Summit. In general, performance
   looks promising, and in some cases it is already better than with
   Credit1.

It therefore appears that we are ready to mark the Credit2 scheduler
as a 'Supported' feature, and ask users to look at it and try it, if
they think it suits their needs.

Of course, declaring something 'Supported' has security implications.
So here is how the situation looks from a security standpoint:

1) Is guest->host privilege escalation possible?

The only interfaces exposed to unprivileged guests are the SCHEDOP
hypercalls and timers. None of those hypercalls contain any pointers,
and they do not appear to contain any privilege escalation path. Also,
they're not specific to Credit2, as they're "used" by all schedulers
(including the current default, Credit1), so any issue with these
interfaces would already be a security concern.

2) Is guest user->guest kernel escalation possible?

The guest kernel is not really relying on anything from the scheduler
to protect itself or any data in any way.

3) Is there any information leakage?

The only information which the scheduler exposes to unprivileged
guests is the timing information.  This could be used for
side-channel attacks to probabilistically infer things about other
vcpus running on the same system; but this has not traditionally
been considered within the security boundary. And, again, this is
possible with all schedulers.

The control domain can issue DOMCTL_SCHEDOP and SYSCTL_SCHEDOP
hypercalls, but the involved data structures are handled in a
way that does not leak information (which would be leaked "only"
to Dom0 anyway).

4) Can a Denial-of-Service be triggered?

This is a risk with schedulers, and one that's hard to foresee.
For instance, it _did_ happen on Credit1 in the past (a vcpu
could "game the system" by sleeping at particular times to gain
BOOST priority and monopolize 95% of the cpu). In that case, it
was possible because of the probabilistic nature of accounting
in Credit1 (which was then fixed). Credit2, on the other hand:
 - already does accurate, rather than probabilistic, accounting;
 - does not have any BOOST or, in general, any way for a vcpu to
   become 'more important' than the others: they're all subjected
   to the same crediting algorithm.

Also note that the accounting and crediting algorithms are a lot
simpler than in Credit1, and hence a lot easier to understand, debug
and audit.
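The difference between probabilistic and accurate accounting can be shown with a toy model. This is a simplified illustration in C, not Credit1's or Credit2's code, and all names are hypothetical. The vCPU wakes one time unit after each tick and sleeps again before the next tick, so it is never running at the instant the tick fires. Sampled accounting, which charges the whole tick to whoever is running at that instant, never charges it. Accurate accounting charges exactly the time it ran. The model assumes `run < tick`.

```c
#include <stdint.h>

/* Toy model: running during [k*tick + 1, k*tick + 1 + run) for every k. */
static int running_at(uint64_t t, uint64_t tick, uint64_t run)
{
    uint64_t off = t % tick;

    return off >= 1 && off < 1 + run;
}

/* Sampled accounting: at each tick, charge a full tick to the running vCPU. */
static uint64_t sampled_charge(unsigned int nr_ticks, uint64_t tick,
                               uint64_t run)
{
    uint64_t charged = 0;

    for (unsigned int i = 1; i <= nr_ticks; i++)
        if (running_at((uint64_t)i * tick, tick, run))
            charged += tick;
    return charged;
}

/* Accurate accounting: charge the time between being scheduled in and out. */
static uint64_t accurate_charge(unsigned int nr_ticks, uint64_t tick,
                                uint64_t run)
{
    uint64_t charged = 0;

    for (unsigned int i = 0; i < nr_ticks; i++) {
        uint64_t start = (uint64_t)i * tick + 1;
        uint64_t stop = start + run;
        charged += stop - start;
    }
    return charged;
}
```

With 100 ticks of 10 units and 8 units of work per tick, the sampled scheme charges nothing, while the accurate scheme charges all 800 units.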

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoConfig.mk: fix comment for debug option
Wei Liu [Mon, 31 Oct 2016 17:03:04 +0000 (17:03 +0000)]
Config.mk: fix comment for debug option

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agobuild: make debug option affect tools only
Wei Liu [Mon, 31 Oct 2016 17:42:25 +0000 (17:42 +0000)]
build: make debug option affect tools only

The debug option in Config.mk affects the hypervisor, tools and stubdom by
appending different flags to CFLAGS.  Mini-os under extra is not
affected because it already has its own build system, having been
separated from xen.git.

This is undesirable because the hypervisor build is now affected by both
Kconfig and debug.

Disentangle the semantics of debug by pushing relevant options to
individual sub-systems.

For the hypervisor, the flags previously added by the debug option are now
controlled by CONFIG_DEBUG.

For tools, flags are moved from config/*.mk into tools/Rules.mk.

For stubdom, because it unilaterally sets debug=y before including
top-level Config.mk, we only need to move the debug build set of flags
into stubdom Makefile.

Specifically, there are some considerations on which flags are picked:

1. we don't need -fno-optimize-sibling-calls anymore because the gcc
   documentation indicates that sibling-call optimization is not enabled
   at -O1, which we already set in the debug build.
2. for tools we use -O0 -g3 in Rules.mk because they already take
   precedence over the flags set in config/*.mk.
3. for hypervisor we don't add -fno-omit-frame-pointer to debug build
   because that's controlled by CONFIG_FRAME_POINTER.

This patch doesn't intend to tune those flags, but to provide an identical
set of effective flags as before.  The debug option in Config.mk will
only affect tools components after this patch is applied.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen: disable debug build
Wei Liu [Mon, 31 Oct 2016 17:01:12 +0000 (17:01 +0000)]
xen: disable debug build

Xen debug build is controlled by Kconfig.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/oxenstored: Fix transaction handling in 32bit builds 4.8.0-rc5
Andrew Cooper [Mon, 31 Oct 2016 13:21:56 +0000 (13:21 +0000)]
tools/oxenstored: Fix transaction handling in 32bit builds

In a 32bit build, the ocaml code 'proposed_id >= 0x7fffffff' compiles to:

  8055eac:       83 fb ff                cmp    $0xffffffff,%ebx
  8055eaf:       7d 0f                   jge    8055ec0 <...+0x20>

which in C is 'proposed_id >= INT_MIN', or in other words, tautologically
true.  As a result, 32bit builds of oxenstored always try to allocate the
transaction id 1, and fall into an infinite loop of trying the next id if
transaction 1 is already in use.

Restrict the range down to 1 billion, to sit in the positive half of a 31 bit
ocaml integer.  The compiled code is now:

  8055eac:       b9 ff ff ff 7f          mov    $0x7fffffff,%ecx
  8055eb1:       39 cb                   cmp    %ecx,%ebx
  8055eb3:       7d 0b                   jge    8055ec0 <...+0x20>

which, apart from non-optimal code generation due to the unnecessary use
of %ecx, is no longer unconditionally true.

In principle, the check could be changed to 'proposed_id == 0x7fffffff' which
would still allow for 2 billion transactions in 32bit builds.  However, in
64bit builds, this reintroduces a risk that if proposed_id is initially
greater than 0x7fffffff, it will not be clipped suitably into range.
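The immediates in the two listings follow from OCaml's value representation. On a 32-bit build, an int n is stored as the machine word 2n+1, which leaves 31 bits for the value. The C model below makes that concrete. The helper names are hypothetical and only the standard tagging scheme is assumed. 0x7fffffff does not fit in 31 bits and wraps to -1, whose tagged word is 0xffffffff, the immediate in the first listing. 0x3fffffff (about 1 billion) tags to 0x7fffffff, the immediate in the second listing.

```c
#include <stdint.h>

/* Reduce n to OCaml's 31-bit two's-complement int range on 32-bit builds. */
static int32_t ocaml31_wrap(int64_t n)
{
    uint32_t low = (uint32_t)n & 0x7fffffffu;

    /* Bit 30 is the sign bit of a 31-bit integer. */
    return (int32_t)((int64_t)low - ((low & 0x40000000u) ? INT64_C(0x80000000) : 0));
}

/* Tagged machine word holding the OCaml int v: 2v+1, modulo 2^32. */
static uint32_t ocaml31_tag(int32_t v)
{
    return (uint32_t)v * 2u + 1u;
}
```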

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/libacpi: fix sed usage
Roger Pau Monne [Mon, 31 Oct 2016 10:05:20 +0000 (11:05 +0100)]
tools/libacpi: fix sed usage

The current usage of sed in the libacpi Makefile makes use of non-POSIX options
that are not available on all the OSes supported by the Xen tools.

The '-i' option has slightly different semantics between GNU and BSD sed
implementations: on the GNU version the suffix is optional, while on the BSD
one it is not. BSD sed also seems to have problems parsing the script
itself, reporting "extra characters at the end of d command".

Fix those issues by using a temporary intermediate file, and replace the
script with a simpler version that achieves the same purpose (removing the
initial license header comment).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agostubdom: fix stubdom-vtpm build
Juergen Gross [Mon, 31 Oct 2016 09:04:18 +0000 (10:04 +0100)]
stubdom: fix stubdom-vtpm build

stubdom-vtpm needs gmp and expects it under
stubdom/cross-root-x86_64/x86_64-xen-elf/lib while gmp seems to install
it under stubdom/cross-root-x86_64/x86_64-xen-elf/lib64 at least in an
openSUSE environment.

Modify gmp configure parameters to explicitly specify --libdir.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agostubdom: make GMP aware that it's being cross-compiled
Wei Liu [Sat, 29 Oct 2016 17:22:38 +0000 (18:22 +0100)]
stubdom: make GMP aware that it's being cross-compiled

Append --build and --host flags to GMP's configure script so that it
knows it is being cross-compiled.

This should fix the issue that GMP doesn't compile with gcc 6, because
the configure script will no longer try to test the host environment.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agostubdom: fix "make distclean" regarding gmp
Juergen Gross [Fri, 28 Oct 2016 14:53:20 +0000 (16:53 +0200)]
stubdom: fix "make distclean" regarding gmp

make distclean tries to remove stubdom/gmp-4.3.2.tar.gz, while the
downloaded file is stubdom/gmp-4.3.2.tar.bz2.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: fix add_change_node()
Juergen Gross [Thu, 27 Oct 2016 09:55:52 +0000 (11:55 +0200)]
xenstore: fix add_change_node()

add_change_node() in xenstored is used to add a modified node to the
list of changed nodes of one transaction. It is called with the
recurse parameter set to true when removing a node, so that watches
for children of the node are fired at transaction end, too.

If, however, the node to be deleted had been modified in the same
transaction, the recurse parameter of add_change_node() is lost, as
an entry already in the list of changed nodes won't be entered
again.
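One way to avoid losing the flag is to merge it into an existing entry instead of returning early. The C sketch below shows this approach under that assumption. It is not the actual xenstored patch, and the structs are hypothetical stand-ins for xenstored's own types.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transaction list of changed nodes. */
struct changed_node {
    struct changed_node *next;
    char *name;
    bool recurse;            /* also fire watches for the node's children */
};

struct transaction {
    struct changed_node *changes;
};

static int add_change_node(struct transaction *trans, const char *name,
                           bool recurse)
{
    for (struct changed_node *c = trans->changes; c; c = c->next) {
        if (strcmp(c->name, name) == 0) {
            /* Already listed: keep a later request to recurse. */
            c->recurse = c->recurse || recurse;
            return 0;
        }
    }

    struct changed_node *c = malloc(sizeof(*c));
    if (!c)
        return -1;

    size_t len = strlen(name) + 1;
    c->name = malloc(len);
    if (!c->name) {
        free(c);
        return -1;
    }
    memcpy(c->name, name, len);
    c->recurse = recurse;
    c->next = trans->changes;
    trans->changes = c;
    return 0;
}
```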

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoVMX: fix realmode emulation SReg handling
Jan Beulich [Mon, 31 Oct 2016 07:57:47 +0000 (08:57 +0100)]
VMX: fix realmode emulation SReg handling

Commit 0888d36bb2 ("x86/emul: Correct the decoding of SReg3 operands")
overlooked three places where x86_seg_cs was assumed to be zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/hvm: Don't truncate the hvm hypercall index before range checking it
Andrew Cooper [Thu, 4 Aug 2016 18:01:15 +0000 (18:01 +0000)]
x86/hvm: Don't truncate the hvm hypercall index before range checking it

c/s 5eeca68f introduced the 64bit ABI for HVM guests, and chose to explicitly
truncate the index, despite the fact that the `mov $imm32, %eax` in the
hypercall page already provides the expected truncation.

The truncation isn't very obvious, and is counterintuitive, seeing as all
other 64bit parameters are passed without truncation.  It is also different from
the PV ABI, which is otherwise identical.

As the hypercall page has always been present for HVM guests (and indeed, is
basically mandatory to abstract away vendor differences), it is exceedingly
unlikely that any code exists which enters hvm_do_hypercall() with upper bits
set in %rax.

Therefore, take the opportunity to fix the ABI before it becomes impossible to
fix.
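The C sketch below shows why truncating before the range check is counterintuitive. With truncation, a %rax value that has upper bits set can alias a valid index. `NR_HYPERCALLS` and its value of 64 are assumptions for illustration, not Xen's real table size.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_HYPERCALLS 64   /* hypothetical table size */

/* Old behaviour: truncate %rax to 32 bits, then range check. */
static bool index_ok_truncating(uint64_t rax)
{
    uint32_t eax = (uint32_t)rax;

    return eax < NR_HYPERCALLS;
}

/* New behaviour: range check the full 64-bit register, as the PV ABI does. */
static bool index_ok_full(uint64_t rax)
{
    return rax < NR_HYPERCALLS;
}
```

For example, 0x100000001 passes the truncating check as hypercall 1, but the full check rejects it.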

While tweaking this area, fix one piece of trailing whitespace.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen: rtds: Update last_start whenever cur_budget is updated
Meng Xu [Wed, 26 Oct 2016 19:06:29 +0000 (15:06 -0400)]
xen: rtds: Update last_start whenever cur_budget is updated

Make budget accounting code more consistent by making sure the values
used to compute how much budget has been consumed are updated together.

This makes code resilient to calling burn_budget() from more than just
one place -- in case we will need to do that -- without risking subtle
bugs.

No functional changes are intended.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen:rtds: Fix bug in budget accounting
Meng Xu [Wed, 26 Oct 2016 19:06:06 +0000 (15:06 -0400)]
xen:rtds: Fix bug in budget accounting

Bug scenario:
repl_timer_handler() may be called before rt_schedule() for a VCPU.
This situation may happen in two scenarios:
(1) The VCPU misses its deadline because the system is oversubscribed. For
    example, the sum of the VCPUs' utilizations on a core is larger than one.
(2) The VCPU has budget = period, which causes the timers for
    rt_schedule() and repl_timer_handler() to fire at the same time.
When this happens, it causes the following incorrect behavior:
repl_timer_handler() will update the VCPU period and deadline.
If the VCPU is still the highest priority one, even with the new deadline,
it will continue to run, but with the new period and deadline.
Since the budget enforcement timer for the previous period is still armed,
rt_schedule() will still be called in the new period and enforce the budget
for the previous period.
The current burn_budget() will deduct the time spent in the previous period
from the budget in the current period, which is incorrect.

Fix:
Keep last_start always within the current period for a VCPU, so that
we only deduct the time spent in the current period from the VCPU budget.
We always update last_start whenever we update cur_deadline for a VCPU.
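The fix can be sketched in C as follows. The struct and both functions are hypothetical, cut-down stand-ins for the RTDS code, and `s_time_t` is defined locally as a signed 64-bit time. Because replenishment resets `last_start` together with `cur_deadline`, the enforcement call that follows deducts only time from the new period.

```c
#include <stdint.h>

typedef int64_t s_time_t;   /* signed time, defined locally for this sketch */

/* Hypothetical cut-down RTDS vCPU with only the fields used here. */
struct rt_vcpu_sketch {
    s_time_t period;
    s_time_t budget;
    s_time_t cur_budget;
    s_time_t cur_deadline;
    s_time_t last_start;
};

/* Replenish: move to the next period, refill, and restart accounting. */
static void replenish(struct rt_vcpu_sketch *v, s_time_t now)
{
    while (v->cur_deadline <= now)
        v->cur_deadline += v->period;
    v->cur_budget = v->budget;
    v->last_start = now;        /* keep last_start inside the current period */
}

/* Charge the time run since last_start against the current budget. */
static void burn_budget(struct rt_vcpu_sketch *v, s_time_t now)
{
    s_time_t delta = now - v->last_start;

    if (delta < 0)
        return;
    v->cur_budget -= delta;
    if (v->cur_budget < 0)
        v->cur_budget = 0;
    v->last_start = now;
}
```

In this model, with budget = period = 10, a replenishment at time 10 followed by enforcement at time 12 leaves a budget of 8. If `last_start` were left at 0, the whole 12 units would be charged.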

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reported-by: Dagaen Golomb <dgolomb@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoRevert "keyhandler: rework process of nonirq keyhandler"
Jan Beulich [Wed, 26 Oct 2016 14:13:21 +0000 (16:13 +0200)]
Revert "keyhandler: rework process of nonirq keyhandler"

This reverts commit 610b4eda2ce2b87cccbc8f61bdec01052e54fc66.
It's not useful without ed7e33747d, which got reverted already.

8 years agox86/emul: Move CPUID Faulting fault generation into the emulator
Andrew Cooper [Wed, 26 Oct 2016 11:06:44 +0000 (12:06 +0100)]
x86/emul: Move CPUID Faulting fault generation into the emulator

In hindsight, this is a better position for it, as it avoids opencoding
hvmemul_inject_hw_exception() in hvmemul_cpuid(), and reduces the requirements
on other ops->cpuid() hooks wanting to implement cpuid faulting in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Correct the decoding of SReg3 operands
Andrew Cooper [Fri, 23 Sep 2016 13:48:27 +0000 (14:48 +0100)]
x86/emul: Correct the decoding of SReg3 operands

REX.R is ignored when considering segment register operands, and needs masking
out first.

While fixing this, reorder the user segments in x86_segment to match SReg3
encoding.  This avoids needing a translation table between hardware ordering
and Xen's ordering.
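The C sketch below illustrates the decoding. SReg3 is the 3-bit ModRM reg field, with ES=0, CS=1, SS=2, DS=3, FS=4 and GS=5, and encodings 6 and 7 are invalid. An enum in that order lets the field index segments directly. The enum and `decode_sreg3()` are hypothetical, not Xen's `x86_segment`.

```c
#include <stdint.h>

/* User segments in SReg3 encoding order. */
enum sreg { SREG_ES, SREG_CS, SREG_SS, SREG_DS, SREG_FS, SREG_GS };

/* Segment register selected by a ModRM byte, or -1 for an invalid encoding. */
static int decode_sreg3(uint8_t modrm, int rex_r)
{
    unsigned int reg = (modrm >> 3) & 7;

    if (rex_r)
        reg |= 8;       /* what a general-purpose register operand would use */
    reg &= 7;           /* REX.R is ignored for segment registers */

    return reg <= (unsigned int)SREG_GS ? (int)reg : -1;
}
```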

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Use explicit __attribute__((__packed__)) rather than __packed
Andrew Cooper [Tue, 25 Oct 2016 17:46:39 +0000 (18:46 +0100)]
x86/emul: Use explicit __attribute__((__packed__)) rather than __packed

x86_emulate.h is included by the userspace test harness.  Avoid using
constructs which don't come from standard header files.

Reposition the test harness's inclusion of x86_emulate.h to avoid relying on
any definitions intended for use by x86_emulate.c alone.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen: rtds: always clear the flag when replenishing a depleted vcpu
Meng Xu [Sat, 22 Oct 2016 02:12:02 +0000 (22:12 -0400)]
xen: rtds: always clear the flag when replenishing a depleted vcpu

We should clear the __RTDS_depleted bit once a VCPU budget is replenished.
Because repl_timer_handler() may be called after rt_schedule() but before
rt_context_saved(), the VCPU may be neither on a CPU nor on a queue while
it is in the middle of a context switch.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: remove wrong statement about bug in xenstore
Juergen Gross [Mon, 24 Oct 2016 11:27:17 +0000 (13:27 +0200)]
docs: remove wrong statement about bug in xenstore

docs/misc/xenstore.txt states that xenstored will use "0" as a valid
transaction id after 2^32 transactions. This is not true. Remove that
statement.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/oxenstored: Avoid allocating invalid transaction ids
Andrew Cooper [Wed, 26 Oct 2016 09:34:21 +0000 (10:34 +0100)]
tools/oxenstored: Avoid allocating invalid transaction ids

The transaction id of 0 is reserved, meaning "not in a transaction".  It is up
to the xenstored server to allocate transaction ids.  oxenstored starts
its ids at 1, but insufficient care is taken with truncation cases.

A 32bit oxenstored has an int with 31 bits of width, meaning that the
transaction id will wrap around to 0 after 2 billion transactions.

A 64bit oxenstored has an int with 63 bits of width, meaning that once 4
billion transactions are used, the allocated id will be truncated when written
into the uint32_t field in the ring.  This causes the client to reply with the
truncated id, breaking any further attempt to use any transactions.

Limit all transaction ids to the range between 1 and 0x7ffffffe.  This is the
best which can be done without making oxenstored depend on Stdint or Cstruct,
yet still work for 32bit builds.

Also check that the proposed new transaction id isn't currently in use.  For
the first 2 billion transactions there is no chance of a collision, and after
that, the chance is at most 20 (the default open transaction quota) in 2
billion.
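The allocation policy described above can be sketched in C: wrap within [1, 0x7ffffffe] so that 0 is never handed out, and skip ids that belong to still-open transactions. The functions are hypothetical and are not oxenstored's OCaml code. The loop terminates as long as far fewer ids are open than the range holds, which an open-transaction quota guarantees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXN_ID_MIN 1u
#define TXN_ID_MAX 0x7ffffffeu

/* Is 'id' used by one of the currently open transactions? */
static bool txn_id_in_use(uint32_t id, const uint32_t *open_ids, size_t nr_open)
{
    for (size_t i = 0; i < nr_open; i++)
        if (open_ids[i] == id)
            return true;
    return false;
}

/* Next id after 'prev', wrapping within [TXN_ID_MIN, TXN_ID_MAX]. */
static uint32_t next_txn_id(uint32_t prev, const uint32_t *open_ids,
                            size_t nr_open)
{
    uint32_t id = prev;

    do {
        id = (id < TXN_ID_MIN || id >= TXN_ID_MAX) ? TXN_ID_MIN : id + 1;
    } while (txn_id_in_use(id, open_ids, nr_open));

    return id;
}
```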

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/configure: fix pkg-config install path for FreeBSD
Roger Pau Monne [Tue, 25 Oct 2016 09:53:28 +0000 (11:53 +0200)]
tools/configure: fix pkg-config install path for FreeBSD

pkg-config from FreeBSD ports doesn't have ${prefix}/share/pkgconfig in the
default search path; fix this by having a PKG_INSTALLDIR variable that can
be changed on a per-OS basis.

It would be best to use PKG_INSTALLDIR as defined by the pkg.m4 macro, but
sadly this also reports a wrong value on FreeBSD (${libdir}/pkgconfig, which
expands to /usr/local/lib/pkgconfig by default, and is also _not_ part of
the default pkg-config search path).

This patch should not change the behavior for Linux installs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Alexander Nusov <alexander.nusov@nfvexpress.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoUpdate QEMU_UPSTREAM_REVISION 4.8.0-rc4
Ian Jackson [Wed, 26 Oct 2016 11:06:17 +0000 (12:06 +0100)]
Update QEMU_UPSTREAM_REVISION

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
8 years agolibacpi: require ACPI_BUILD_DIR to be set
Wei Liu [Fri, 14 Oct 2016 17:02:31 +0000 (18:02 +0100)]
libacpi: require ACPI_BUILD_DIR to be set

It's better to have an explicit error than a build failure returned by
gcc.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86: MISALIGNSSE feature depends on SSE
Jan Beulich [Mon, 24 Oct 2016 15:34:17 +0000 (17:34 +0200)]
x86: MISALIGNSSE feature depends on SSE

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86emul: fix XOP decode
Jan Beulich [Mon, 24 Oct 2016 15:33:30 +0000 (17:33 +0200)]
x86emul: fix XOP decode

Commit f09902c456 ("x86emul: add XOP decoding") ended up overwriting b
prior to the last use of its previously stored value. Slightly defer
fetching the main opcode byte.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: initialise nr_dom_vcpus to fix 4a6070ea9
Wei Liu [Mon, 24 Oct 2016 10:11:15 +0000 (11:11 +0100)]
libxl: initialise nr_dom_vcpus to fix 4a6070ea9

Clang complains that nr_dom_vcpus may be used uninitialised after
4a6070ea9.

The real issue is that vinfo can be NULL, and nr_dom_vcpus remains
uninitialised if the previous call fails.

Initialise nr_dom_vcpus to 0 at the beginning of the loop to fix the
issue.
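The C sketch below illustrates the pattern. The count is initialised at the top of every iteration, so a failed query, which returns NULL and leaves the count unwritten, results in zero iterations over vinfo. `list_vcpus()` and the struct are hypothetical stand-ins, not libxl's API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-vCPU info record. */
struct vcpu_info_sketch { int online; };

/* Hypothetical query: fails (NULL, *nr_out unwritten) for all but domain 1. */
static struct vcpu_info_sketch *list_vcpus(int domid, int *nr_out)
{
    static struct vcpu_info_sketch dom1[2] = { { 1 }, { 1 } };

    if (domid != 1)
        return NULL;
    *nr_out = 2;
    return dom1;
}

static int count_online_vcpus(const int *domids, size_t nr_doms)
{
    int total = 0;

    for (size_t i = 0; i < nr_doms; i++) {
        int nr_dom_vcpus = 0;   /* initialised at the start of every iteration */
        struct vcpu_info_sketch *vinfo = list_vcpus(domids[i], &nr_dom_vcpus);

        /* On failure nr_dom_vcpus is still 0, so vinfo is never dereferenced. */
        for (int j = 0; j < nr_dom_vcpus; j++)
            total += vinfo[j].online;
    }
    return total;
}
```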

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/x86: Fixup misc stale issues
Andrew Cooper [Sat, 1 Oct 2016 18:36:12 +0000 (18:36 +0000)]
xen/x86: Fixup misc stale issues

 * Dom0 does now have an arch_config passed.
 * hypercall() and smp_alloc_memory() no longer exist.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Correctly annotate all push/pop %sreg instructions
Andrew Cooper [Wed, 19 Oct 2016 16:30:36 +0000 (17:30 +0100)]
x86/emul: Correctly annotate all push/pop %sreg instructions

c/s 373923ed9c2 "x86emul: fix pushing of selector registers" redirected
all push %sreg instructions into the general push path.  However, this
ends up hitting the assertion at the head of the push path.

Annotate all push and pop %sreg instructions as Mov, indicating that
they do not read the destination operand.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools: Handle existing link to acpi directory
Boris Ostrovsky [Sun, 23 Oct 2016 23:09:19 +0000 (19:09 -0400)]
tools: Handle existing link to acpi directory

The link to acpi include directory is not removed by Makefile's 'clean'
target. This can lead to a make failure when making the xen/.dir target if
we try to create the link again.

We can prevent this failure by (1) removing acpi link when cleaning up
and (2) adding '-f' option to 'ln' (just like we do for other targets).

We should also add tools/include/acpi link to .gitignore.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 21 Oct 2016 16:51:59 +0000 (17:51 +0100)]
Revert "timer: process softirq during dumping timer info"

This reverts commit ed7e33747da83ce805c00cd457e71075e34f0854.

The following assertion is triggered:
(XEN) Assertion '!in_irq() && local_irq_is_enabled()' failed at softirq.c:57

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Fri, 21 Oct 2016 13:49:30 +0000 (15:49 +0200)]
libxl: avoid considering pCPUs outside of the cpupool during NUMA placement

During automatic NUMA placement, information about
how many vCPUs can run on which NUMA nodes is used
to spread the load as evenly as possible.

Such information is derived from vCPU hard and soft
affinity, but that is not enough. In fact, affinity
can be set to a superset of the pCPUs that belong
to the domain's cpupool but, of course, the domain
will never run on pCPUs outside of its cpupool.

Take this into account in the placement algorithm.
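
A minimal sketch of the counting step, assuming pCPU sets are modelled as
64-bit masks (so at most 64 pCPUs) and using a single per-vCPU affinity
mask as a stand-in for the hard/soft affinity combination. The function
and parameter names are hypothetical, not libxl's. What it illustrates is
that each vCPU's affinity is first intersected with the pCPUs of the
domain's cpupool, and only then checked against each node's pCPUs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit i set means pCPU i is in the set (max 64 pCPUs). */
typedef uint64_t cpumask64;

/*
 * Count, for each NUMA node, how many vCPUs could run on it.
 *   affinity[v]  : affinity of vCPU v (stand-in for hard/soft affinity)
 *   pool_cpus    : pCPUs belonging to the domain's cpupool
 *   node_cpus[n] : pCPUs belonging to NUMA node n
 *   counts[n]    : output, number of vCPUs that can run on node n
 */
void nr_vcpus_on_nodes(const cpumask64 *affinity, size_t nr_vcpus,
                       cpumask64 pool_cpus,
                       const cpumask64 *node_cpus, size_t nr_nodes,
                       unsigned int *counts)
{
    for (size_t n = 0; n < nr_nodes; n++)
        counts[n] = 0;

    for (size_t v = 0; v < nr_vcpus; v++) {
        /* A vCPU never runs outside its cpupool, whatever its affinity says. */
        cpumask64 effective = affinity[v] & pool_cpus;

        for (size_t n = 0; n < nr_nodes; n++)
            if (effective & node_cpus[n])
                counts[n]++;
    }
}
```

For example, with node 0 = pCPUs 0-3, node 1 = pCPUs 4-7, a cpupool
holding only pCPUs 0-3, and two vCPUs whose affinity covers all eight
pCPUs, the counts are 2 for node 0 and 0 for node 1. Without the cpupool
mask, node 1 would wrongly be counted as able to run both vCPUs.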

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Meng Xu [Wed, 19 Oct 2016 14:48:39 +0000 (10:48 -0400)]
docs:RTDS: Correct mistakes in feature doc

Correct the mistakes in the example command.
Correct a simple typo.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Stefano Stabellini [Wed, 19 Oct 2016 19:22:35 +0000 (12:22 -0700)]
vscsiif.h: replace PAGE_SIZE with VSCSIIF_PAGE_SIZE

Do not reference PAGE_SIZE directly: it could be undefined, or it could
have different values in the frontend or in the backend.

Define VSCSIIF_PAGE_SIZE as 4096, assuming all users of vscsiif.h have
4K page granularity. Replace PAGE_SIZE with VSCSIIF_PAGE_SIZE.
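
To illustrate why a fixed constant matters, here is a sketch in which a
quantity derived from the page size is computed from VSCSIIF_PAGE_SIZE.
The segment structure and EXAMPLE_SEGS_PER_PAGE are hypothetical
stand-ins, not the real vscsiif.h definitions. Because the constant is
4096 everywhere, a frontend and a backend built with different PAGE_SIZE
values (say 4K and 64K) still agree on the result:

```c
#include <stdint.h>

/* Fixed 4K granularity assumed by all users of the header. */
#define VSCSIIF_PAGE_SIZE 4096

/* Hypothetical 8-byte segment descriptor, for illustration only. */
struct example_segment {
    uint32_t gref;     /* grant reference */
    uint16_t offset;   /* offset within the granted page */
    uint16_t length;   /* length of the segment */
};

/*
 * Descriptors per shared page. Using PAGE_SIZE here would give 512 on a
 * 4K-page build but 8192 on a 64K-page build, so the two ends would
 * disagree about the layout; VSCSIIF_PAGE_SIZE gives 512 on both.
 */
#define EXAMPLE_SEGS_PER_PAGE \
    (VSCSIIF_PAGE_SIZE / sizeof(struct example_segment))

_Static_assert(EXAMPLE_SEGS_PER_PAGE == 512,
               "8-byte descriptors in a 4K page");
```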

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Stefano Stabellini [Wed, 19 Oct 2016 19:22:34 +0000 (12:22 -0700)]
usbif.h: replace PAGE_SIZE with USBIF_RING_SIZE

Do not reference PAGE_SIZE directly: it could be undefined, or it could
have different values in the frontend or in the backend.

Define USBIF_RING_SIZE as 4096, assuming all users of usbif.h have 4K
page granularity. Replace PAGE_SIZE with USBIF_RING_SIZE.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Tamas K Lengyel [Fri, 14 Oct 2016 00:00:47 +0000 (18:00 -0600)]
altp2m: don't attempt to unshare pages during change_altp2m_gfn op

Attempting to change gfn mappings with altp2m on a shared memory page results
in a lock-order violation (mm locking order violation: 282 > 254), which
crashes the hypervisor. Rather than attempting to automatically unshare such
pages, simply fail the op if the page type is not correct.
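
The reported failure comes from a lock-ordering check. The following is a
simplified model, assuming a single global "current level" instead of
per-CPU tracking, and returning false where Xen would crash; all names are
hypothetical. Locks must be taken in non-decreasing level order, so taking
a level-254 lock while a level-282 lock is held is rejected with the same
style of message:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU record of the mm lock level held. */
static int current_lock_level;

/* Locks must be taken in non-decreasing level order. */
static bool check_lock_level(int level)
{
    if (current_lock_level > level) {
        /* Xen crashes here; this model just reports and refuses. */
        fprintf(stderr, "mm locking order violation: %d > %d\n",
                current_lock_level, level);
        return false;
    }
    return true;
}

/* Take a lock at `level`; *saved receives the level to restore on unlock. */
static bool lock_at_level(int level, int *saved)
{
    if (!check_lock_level(level))
        return false;
    *saved = current_lock_level;
    current_lock_level = level;
    return true;
}

/* Release a lock, restoring the level that was held before it. */
static void unlock_to_level(int saved)
{
    current_lock_level = saved;
}
```

Taking 254 then 282 succeeds; holding 282 and then asking for 254 prints
"mm locking order violation: 282 > 254" and fails.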

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Kyle Huey [Thu, 20 Oct 2016 13:44:28 +0000 (06:44 -0700)]
x86/Intel: virtualize support for cpuid faulting

On HVM guests, cpuid triggers a VM exit, so we can check the emulated
faulting state in vmx_do_cpuid and hvmemul_cpuid. A new function,
hvm_check_cpuid_fault, checks whether cpuid faulting is enabled and CPL > 0.
When it returns true, the cpuid handling functions inject a GP(0). Notably,
explicit hardware support for faulting on cpuid is not necessary to emulate
support for an HVM guest.

On PV guests, hardware support is required so that userspace cpuid will trap
to Xen. Xen already enables cpuid faulting on supported CPUs for PV guests
that aren't the control domain (see the comment in intel_ctxt_switch_levelling).
Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
do_general_protection). Once there, we simply decline to emulate cpuid if
CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
handle.
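
The decision logic described above can be sketched as follows. The
structure, enum and function names are hypothetical simplifications
(Xen's real code is in vmx_do_cpuid, hvmemul_cpuid and
emulate_privileged_op); cpuid_should_fault() is only modelled on the rule
stated for hvm_check_cpuid_fault. The sketch captures that CPUID faults
when faulting is enabled and CPL > 0, and how the HVM and PV paths then
differ:

```c
#include <stdbool.h>

/* Hypothetical, simplified per-vCPU state. */
struct vcpu_state {
    bool cpuid_faulting;   /* guest has enabled CPUID faulting */
    unsigned int cpl;      /* guest-visible privilege level, 0..3 */
};

enum cpuid_action {
    CPUID_EMULATE,   /* Xen emulates CPUID and returns the results */
    CPUID_INJECT_GP, /* HVM: inject GP(0) into the guest */
    CPUID_LEAVE_GP,  /* PV: decline to emulate, guest kernel gets GP(0) */
};

/* Fault only if faulting is enabled and the guest is not at CPL 0. */
static bool cpuid_should_fault(const struct vcpu_state *v)
{
    return v->cpuid_faulting && v->cpl > 0;
}

/* What happens once a guest CPUID has reached Xen. */
static enum cpuid_action handle_cpuid(const struct vcpu_state *v, bool is_hvm)
{
    if (!cpuid_should_fault(v))
        return CPUID_EMULATE;
    return is_hvm ? CPUID_INJECT_GP : CPUID_LEAVE_GP;
}
```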

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Kyle Huey [Thu, 20 Oct 2016 13:44:27 +0000 (06:44 -0700)]
x86/Intel: Expose cpuid_faulting_enabled so it can be used elsewhere

While we're here, use bool instead of bool_t.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Thu, 20 Oct 2016 13:00:47 +0000 (14:00 +0100)]
Config.mk: use non-debug build for 4.8

Set debug ?= n in preparation for late RCs and eventual release.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 1 Sep 2016 09:38:27 +0000 (10:38 +0100)]
x86/svm: Drop adjustment of X86_FEATURE_APIC

The common hvm_cpuid() code already does this.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
He Chen [Wed, 19 Oct 2016 08:03:24 +0000 (16:03 +0800)]
xen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself

SMEP/SMAP are security features that prevent the kernel from inadvertently
executing/accessing user addresses; any such access leads to a page fault.

Previously, SMEP/SMAP were enabled (in CR4) for both Xen and HVM guests.
With the SMEP/SMAP bits set in Xen's CR4, the checks are also enforced on
32-bit PV guests, which then take unexpected SMEP/SMAP page faults when the
guest kernel accesses user addresses, even though SMEP/SMAP are disabled
for PV guests.

This patch introduces a new boot option value "hvm" for "sm{e,a}p", which
disables SMEP/SMAP for the Xen hypervisor while keeping them enabled for
HVM guests. This way, 32-bit PV guests no longer take these SMEP/SMAP
faults. Users can choose whether to enable SMEP/SMAP for Xen itself,
especially when they intend to run 32-bit PV guests.
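
A sketch of how the three settings could be represented, assuming a
simplified parser: the enum, the helper names and the accepted boolean
spellings are hypothetical, and hardware support checks are ignored. It
separates "set the bit in Xen's CR4" from "offer the feature to HVM
guests", which is what the new "hvm" value decouples:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of the "smep="/"smap=" settings. */
enum smx_mode {
    SMX_OFF,      /* assumed: not used by Xen and not offered to HVM guests */
    SMX_ON,       /* set in Xen's CR4 and offered to HVM guests */
    SMX_HVM_ONLY, /* "hvm": not set in Xen's CR4, still offered to HVM guests */
};

/* True if s is one of the (assumed) spellings of a false boolean. */
static bool is_false_str(const char *s)
{
    static const char *const no[] = { "0", "no", "off", "false", "disable" };

    for (size_t i = 0; i < sizeof(no) / sizeof(no[0]); i++)
        if (strcmp(s, no[i]) == 0)
            return true;
    return false;
}

static enum smx_mode parse_smx(const char *s)
{
    if (strcmp(s, "hvm") == 0)
        return SMX_HVM_ONLY;
    if (is_false_str(s))
        return SMX_OFF;
    return SMX_ON; /* simplification: anything else means "enabled" */
}

/* Should the CR4 bit be set while Xen itself runs? */
static bool smx_in_xen_cr4(enum smx_mode m) { return m == SMX_ON; }

/* Should the feature be made available to HVM guests? */
static bool smx_for_hvm(enum smx_mode m) { return m != SMX_OFF; }
```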

Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
[Fixed up command line docs]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 13 Oct 2016 11:12:20 +0000 (12:12 +0100)]
x86/vmx: Reduce the verbosity of the vmentry failure error reporting

Identify the affected vcpu at the start of the message.  While tweaking this
area, add extra newlines between cases.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 13 Oct 2016 10:46:58 +0000 (11:46 +0100)]
x86/vmx: Print the problematic MSR if a vmentry fails

Sample error looks like:

  (XEN) Failed vm entry (exit reason 0x80000022) caused by MSR loading (entry 13).
  (XEN)   msr 0000068a val 1fff800000102af0 (mbz 0)
  (XEN) ************* VMCS Area **************
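
A sketch of the lookup behind the "msr ... val ... (mbz ...)" line,
assuming the MSR-load area layout from the Intel SDM (32-bit MSR index,
32 reserved bits that must be zero, 64-bit value) and that the exit
qualification holds the 1-based index of the failing entry. Exit reason
0x80000022 is the failed-VM-entry bit (bit 31) combined with basic exit
reason 34, "VM-entry failure due to MSR loading". The constant and
function names are chosen for this sketch:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry of the VMX MSR-load area. */
struct vmx_msr_entry {
    uint32_t index;   /* MSR number */
    uint32_t mbz;     /* reserved, must be zero */
    uint64_t data;    /* value to load */
};

#define VMX_EXIT_REASONS_FAILED_VMENTRY (1u << 31)
#define EXIT_REASON_MSR_LOADING         34

/*
 * Format the failing entry, given the exit qualification (1-based entry
 * number). Returns the snprintf() result, or -1 if qual is out of range.
 */
static int format_msr_failure(char *buf, size_t len,
                              const struct vmx_msr_entry *area,
                              unsigned int nr, unsigned long qual)
{
    if (qual == 0 || qual > nr)
        return -1;

    const struct vmx_msr_entry *e = &area[qual - 1];

    return snprintf(buf, len, "msr %08" PRIx32 " val %016" PRIx64
                    " (mbz %" PRIx32 ")", e->index, e->data, e->mbz);
}
```

With entry 13 holding MSR 0x68a and value 0x1fff800000102af0, this
produces the "msr 0000068a val 1fff800000102af0 (mbz 0)" text from the
sample.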

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>