xenbits.xensource.com Git - people/iwj/xen.git/log

Jan Beulich [Wed, 20 Dec 2017 14:43:53 +0000 (15:43 +0100)]
gnttab: correct GNTTABOP_cache_flush empty batch handling

Jann validly points out that with a caller bogusly requesting a zero-
element batch with non-zero high command bits (the ones used for
continuation encoding), the assertion right before the call to
hypercall_create_continuation() would trigger. A similar situation would
arise afaict for non-empty batches with op and/or length zero in every
element.

While we want the former to succeed (as we do elsewhere for similar
no-op requests), the latter can clearly be converted to an error, as
this is a state that can't be the result of a prior operation.

Take the opportunity and also correct the order of argument checks:
We shouldn't accept zero-length elements with unknown bits set in "op".
Also constify cache_flush()'s first parameter.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 9c22e4d67f5552c7c896ed83bd95d5d4c5837a9d
master date: 2017-12-04 11:03:32 +0100

Tom Lendacky [Wed, 20 Dec 2017 14:43:14 +0000 (15:43 +0100)]
x86/microcode: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
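A minimal sketch of the size selection described above. The macro and function names are hypothetical, and only the two sizes stated in the message (3200 bytes for family 17h, 2048 bytes as the fallback) are modelled; any other family-specific sizes are deliberately left out.

```c
#include <stddef.h>

/* Sizes come from the commit message; the macro names are hypothetical. */
#define MPB_MAX_SIZE_DEFAULT 2048u  /* fallback when no family entry exists */
#define MPB_MAX_SIZE_FAM17H  3200u  /* AMD family 17h */

/* Return the largest patch block accepted for the given CPU family. */
static size_t mpb_max_size(unsigned int family)
{
    switch (family) {
    case 0x17:
        return MPB_MAX_SIZE_FAM17H;
    default:
        /* Other family-specific sizes are omitted from this sketch. */
        return MPB_MAX_SIZE_DEFAULT;
    }
}

/* A patch is loadable only if it does not exceed the family's limit. */
static int patch_fits(unsigned int family, size_t patch_size)
{
    return patch_size <= mpb_max_size(family);
}
```

Without the family 17h case, a 3200-byte patch would be compared against the 2048-byte fallback and rejected.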

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[Linux commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf]

Ported to Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 61d458ba8c171809e8dd9abd19339c87f3f934ca
master date: 2017-12-13 14:30:10 +0000

Jan Beulich [Wed, 20 Dec 2017 14:42:42 +0000 (15:42 +0100)]
x86/mm: drop bogus paging mode assertion

Olaf has observed this assertion to trigger after an aborted migration
of a PV guest:

(XEN) Xen call trace:
(XEN)    [<ffff82d0802a85dc>] do_page_fault+0x39f/0x55c
(XEN)    [<ffff82d08036b7d8>] x86_64/entry.S#handle_exception_saved+0x66/0xa4
(XEN)    [<ffff82d0802a9274>] __copy_to_user_ll+0x22/0x30
(XEN)    [<ffff82d0802772d4>] update_runstate_area+0x19c/0x228
(XEN)    [<ffff82d080277371>] domain.c#_update_runstate_area+0x11/0x39
(XEN)    [<ffff82d080277596>] context_switch+0x1fd/0xf25
(XEN)    [<ffff82d0802395c5>] schedule.c#schedule+0x303/0x6a8
(XEN)    [<ffff82d08023d067>] softirq.c#__do_softirq+0x6c/0x95
(XEN)    [<ffff82d08023d0da>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08036b2f1>] x86_64/entry.S#process_softirqs+0x21/0x30

Release builds work fine, which is a first indication that the assertion
isn't really needed.

What's worse though - there appears to be a timing window where the
guest runs in shadow mode, but not in log-dirty mode, and that is what
triggers the assertion (the same could, afaict, be achieved by
test-enabling shadow mode on a PV guest). This is because turning off
log-dirty mode is being performed in two steps: First the log-dirty bit gets
cleared (paging_log_dirty_disable() [having paused the domain] ->
sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by
unpausing the domain and only then clearing shadow mode (via
shadow_test_disable(), which pauses the domain a second time).

Hence besides removing the ASSERT() here (or optionally replacing it by
explicit translate and refcounts mode checks, but this seems rather
pointless now that the three are tied together) I wonder whether either
shadow_one_bit_disable() should turn off shadow mode if no other bit
besides PG_SH_enable remains set (just like shadow_one_bit_enable()
enables it if not already set), or the domain pausing scope should be
extended so that both steps occur without the domain getting a chance to
run in between.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b95f7be32d668fa4b09300892ebe19636ecebe36
master date: 2017-12-12 16:56:15 +0100

Daniel Kiper [Wed, 20 Dec 2017 14:42:13 +0000 (15:42 +0100)]
x86/mb2: avoid Xen image when looking for module/crashkernel position

Commit e22e1c4 (x86/EFI: avoid Xen image when looking for module/kexec
position) added the relevant check for the EFI case. However, since commit
f75a304 (x86: add multiboot2 protocol support for relocatable images)
Multiboot2 compatible bootloaders are able to relocate the Xen image too.
So we also have to avoid the Xen image region in such cases.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 9589927e5bf9e123ec42b6e0b0809f153bd92732
master date: 2017-12-12 14:30:53 +0100

Sergey Dyasli [Wed, 20 Dec 2017 14:41:33 +0000 (15:41 +0100)]
x86/vvmx: don't enable vmcs shadowing for nested guests

Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
following result on H/W with VMCS shadowing:

    Test: vmxon
    Failure in test_vmxon_in_root_cpl0()
      Expected 0x8200000f: VMfailValid(15) VMXON_IN_ROOT
           Got 0x82004400: VMfailValid(17408) <unknown>
    Test result: FAILURE

This happens because SDM allows vmentries with enabled VMCS shadowing
VM-execution control and VMCS link pointer value of ~0ull. But results
of a nested VMREAD are undefined in such cases.

Fix this by not copying the value of VMCS shadowing control from vmcs01
to vmcs02.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 19fdb8e258619aea265af9c183e035e545cbc2d2
master date: 2017-12-01 19:03:27 +0000

Andrew Cooper [Wed, 20 Dec 2017 14:40:58 +0000 (15:40 +0100)]
xen/pv: Construct d0v0's GDT properly

c/s cf6d39f8199 "x86/PV: properly populate descriptor tables" changed the GDT
to reference zero_page for intermediate frames between the guest and Xen
frames.

Because dom0_construct_pv() doesn't call arch_set_info_guest(), some bits of
initialisation are missed, including the pv_destroy_gdt() which initially
fills the references to zero_page.

In practice, this means there is a window between starting and the first call
to HYPERCALL_set_gdt() where lar/lsl/verr/verw suffer non-architectural
behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 08f27f4468eedbeccaac9fdda4ef732247efd74e
master date: 2017-12-01 19:03:26 +0000

Jan Beulich [Wed, 20 Dec 2017 14:39:44 +0000 (15:39 +0100)]
update Xen version to 4.10.1-pre

Ian Jackson [Wed, 13 Dec 2017 11:37:59 +0000 (11:37 +0000)]
Xen 4.10 release: update README and xen/Makefile versions

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Ian Jackson [Wed, 13 Dec 2017 11:36:12 +0000 (11:36 +0000)]
Xen 4.10 release: update Config.mk revisions to refer to tags

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Ian Jackson [Tue, 12 Dec 2017 12:23:17 +0000 (12:23 +0000)]
Merge branch 'xsa248-251' into staging-4.10

Jan Beulich [Fri, 8 Dec 2017 15:32:05 +0000 (15:32 +0000)]
x86: don't wrongly trigger linear page table assertion (2)

_put_final_page_type(), when free_page_type() has exited early to allow
for preemption, should not update the time stamp, as the page continues
to retain the type which is in the process of being unvalidated. I can't
see why the time stamp update was put on that path in the first place
(albeit it may well have been me who had put it there years ago).

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/paging: don't unconditionally BUG() on finding SHARED_M2P_ENTRY

PV guests can fully control the values written into the P2M.

This is XSA-251.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/shadow: fix ref-counting error handling

The old-Linux handling in shadow_set_l4e() mistakenly ORed together the
results of sh_get_ref() and sh_pin(). As the latter failing is not a
correctness problem, simply ignore its return value.

In sh_set_toplevel_shadow() a failing sh_get_ref() must not be
accompanied by installing the entry, despite the domain being crashed.

This is XSA-250.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/shadow: fix refcount overflow check

Commit c385d27079 ("x86 shadow: for multi-page shadows, explicitly track
the first page") reduced the refcount width to 25, without adjusting the
overflow check. Eliminate the disconnect by using a manifest constant.
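A sketch of tying the overflow check to the field width through one constant, as the paragraph above describes. The structure, the constant name and the helper are hypothetical stand-ins, not the shadow code's actual definitions; only the 25-bit width comes from the message.

```c
#include <stdbool.h>

/* Hypothetical name; the 25-bit width is the one given in the message. */
#define SH_REFCOUNT_WIDTH 25

struct shadow_page {
    unsigned int count : SH_REFCOUNT_WIDTH;  /* reference count bit-field */
};

/* Take a reference; refuse if the new count would not fit in the field. */
static bool shadow_get_ref(struct shadow_page *sp)
{
    unsigned int next = sp->count + 1u;

    /* Same constant as the field width, so the two cannot drift apart. */
    if (next >= (1u << SH_REFCOUNT_WIDTH))
        return false;            /* would overflow: leave count untouched */

    sp->count = next;
    return true;
}
```

Because the check and the bit-field share the constant, narrowing the field later adjusts the check automatically.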

Interestingly, up to commit 047782fa01 ("Out-of-sync L1 shadows: OOS
snapshot") the refcount was 27 bits wide, yet the check was already
using 26.

This is XSA-249.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>

Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/mm: don't wrongly set page ownership

PV domains can obtain mappings of any pages owned by the correct domain,
including ones that aren't actually assigned as "normal" RAM, but used
by Xen internally.  At the moment such "internal" pages marked as owned
by a guest include pages used to track logdirty bits, as well as p2m
pages and the "unpaged pagetable" for HVM guests. Since the PV memory
management and shadow code conflict in their use of struct page_info
fields, and since shadow code is being used for log-dirty handling for
PV domains, pages coming from the shadow pool must, for PV domains, not
have the domain set as their owner.

While the change could be done conditionally for just the PV case in
shadow code, do it unconditionally (and for consistency also for HAP),
just to be on the safe side.

There's one special case though for shadow code: The page table used for
running a HVM guest in unpaged mode is subject to get_page() (in
set_shadow_status()) and hence must have its owner set.

This is XSA-248.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Jan Beulich [Thu, 7 Dec 2017 09:59:22 +0000 (10:59 +0100)]
x86/HVM: don't retain emulated insn cache when exiting back to guest

vio->mmio_retry is being set when a repeated string insn is being split
up. In that case we'll exit to the guest, expecting immediate re-entry.
Interruptions, however, may be serviced by the guest before re-entry
from the repeated string insn. Any emulation needed in the course of
handling the interruption must not fetch from the internally maintained
cache.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
master commit: 5fcb26e69e8089e20c9168774bee681b8f5a3187
master date: 2017-12-06 12:50:23 +0100

Paul Durrant [Tue, 28 Nov 2017 14:05:19 +0000 (14:05 +0000)]
x86/hvm: fix interaction between internal and external emulation

A call to handle_hvm_io_completion() is needed for completing I/O
that requires external emulation. Such completion should be requested when
hvm_vcpu_io_need_completion() returns true after hvm_emulate_once() has
completed. This is indicative of the underlying I/O emulation having
returned X86EMUL_RETRY and hence a re-emulation of the instruction is
needed to pick up the result of the I/O.

A call to handle_hvm_io_completion() is NOT needed when the underlying
I/O has not returned X86EMUL_RETRY since there will be no result to pick
up. Hence it is bogus to request such completion when mmio_retry is set,
since this can only happen if the underlying I/O emulation has returned
X86EMUL_OKAY (meaning the I/O has completed successfully).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 9c9384d6d8184ca6d21975ccf4e4f72b560540cc)

Andrew Cooper [Sat, 25 Nov 2017 15:17:14 +0000 (15:17 +0000)]
x86: Avoid corruption on migrate for vcpus using CPUID Faulting

Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

Instead of following the MSR status quo, take the opportunity to make the
logic more generic, and in particular, trivial to extend for future MSRs.

This is done by discarding the notion of optional MSRs, and requiring the
toolstack to be prepared to move all of the MSRs, although only a subset will
typically need to move.

This allows for the use of guest_{rd,wr}msr() alone to evaluate whether an MSR
needs moving.  This is a benefit because it means there is a single piece of
logic responsible for evaluating whether a guest can use an MSR, and which
values are acceptable.

One small adjustment to guest_wrmsr() is required to cope with being called in
toolstack context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit b90f86be161c74df8cb69c98d9f22885d9d87114)

Ian Jackson [Fri, 1 Dec 2017 15:15:39 +0000 (15:15 +0000)]
Disable debug for 4.10 stable branch, in preparation for release

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Andrew Cooper [Wed, 29 Nov 2017 11:45:02 +0000 (11:45 +0000)]
Revert "xen/arm: domain_builder: irq sanity check logic fix"

This reverts commit 11e7dd958de73a45645bd40d82280660bd2c9ee8.

It breaks boot on ARM.

Reported-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Stewart Hildebrand [Tue, 28 Nov 2017 14:42:03 +0000 (14:42 +0000)]
xen/arm: domain_builder: irq sanity check logic fix

It's not possible for an irq to be both below 16 and greater than or equal to 32.
Also fix the reference to Linux documentation while we're at it.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Andre Przywara [Thu, 16 Nov 2017 12:02:35 +0000 (12:02 +0000)]
arm64: ITS: fix cacheability adjustment

If the host GICv3 redistributor reports that the pending table cannot
use shareable memory, we try to drop the cacheability attributes as
well. However we fail horribly in doing computer science 101 bit
masking, effectively clearing the whole register instead of just a few
bits.
Fix this by removing the one redundant masking operation and adding the
magic negation for the actually needed other operation.
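The masking mistake can be illustrated with a small sketch. The field position and names are hypothetical and do not reflect the real register layout; the point is only the difference between ANDing with a mask and ANDing with its negation.

```c
#include <stdint.h>

/* Hypothetical 3-bit field at bit 7, for illustration only. */
#define ATTR_SHIFT 7
#define ATTR_MASK  (UINT64_C(0x7) << ATTR_SHIFT)

/* Buggy form: keeps only the field and wipes every other bit. */
static uint64_t clear_field_buggy(uint64_t reg)
{
    return reg & ATTR_MASK;
}

/* Intended form: negate the mask so that only the field is cleared. */
static uint64_t clear_field(uint64_t reg)
{
    return reg & ~ATTR_MASK;
}
```

With the buggy form, a register whose field is already zero ends up entirely zero, which matches the "clearing the whole register" symptom described above.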

Reported-by: Manish Jaggi <manish.jaggi@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-Acked-by: Julien Grall <julien.grall@linaro.org>

Ian Jackson [Tue, 14 Nov 2017 12:15:42 +0000 (12:15 +0000)]
tools: xentoolcore_restrict_all: Do deregistration before close

Closing the fd before unhooking it from the list runs the risk that a
concurrent thread calling xentoolcore_restrict_all will operate on the
old fd value, which might refer to a new fd by then.  So we need to do
it in the other order.

Sadly this weakens the guarantee provided by xentoolcore_restrict_all
slightly, but not (I think) in a problematic way.  It would be
possible to implement the previous guarantee, but it would involve
replacing all of the close() calls in all of the individual osdep
parts of all of the individual libraries with calls to a new function
which does
   dup2("/dev/null", thing->fd);
   pthread_mutex_lock(&handles_lock);
   thing->fd = -1;
   pthread_mutex_unlock(&handles_lock);
   close(fd);
which would be terribly tedious.
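The ordering chosen by this change (deregister first, close second) can be sketched as follows. The handle structure, list and function names are hypothetical stand-ins for the real registry; only handles_lock is a name taken from the message.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical registry entry standing in for a library handle. */
struct handle {
    int fd;
    struct handle *next;
};

static pthread_mutex_t handles_lock = PTHREAD_MUTEX_INITIALIZER;
static struct handle *handles;

/* Add a handle to the list that a restrict-all walk would visit. */
static void handle_register(struct handle *h)
{
    pthread_mutex_lock(&handles_lock);
    h->next = handles;
    handles = h;
    pthread_mutex_unlock(&handles_lock);
}

/* Unlink a handle; afterwards no list walker can reach its fd. */
static void handle_deregister(struct handle *h)
{
    pthread_mutex_lock(&handles_lock);
    for (struct handle **pp = &handles; *pp; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            break;
        }
    }
    pthread_mutex_unlock(&handles_lock);
}

/* Report whether a handle is still on the list. */
static int handle_is_registered(const struct handle *h)
{
    int found = 0;

    pthread_mutex_lock(&handles_lock);
    for (const struct handle *p = handles; p; p = p->next)
        if (p == h)
            found = 1;
    pthread_mutex_unlock(&handles_lock);
    return found;
}

/* Unhook first, then close, so a stale fd number is never visible. */
static int handle_close(struct handle *h)
{
    int rc;

    handle_deregister(h);
    rc = close(h->fd);
    h->fd = -1;
    return rc;
}
```

The trade-off named in the message remains: between the deregistration and the close(), the fd is open but no longer covered by a restrict-all walk.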

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Jan Beulich [Tue, 28 Nov 2017 12:15:12 +0000 (13:15 +0100)]
improve XENMEM_add_to_physmap_batch address checking

As a follow-up to XSA-212 we should have addressed a similar issue here:
The handles being advanced at the top of xenmem_add_to_physmap_batch()
means we allow hypervisor space accesses (in particular, for "errs",
writes) with suitably crafted input arguments. This isn't a security
issue in this case because of the limited width of struct
xen_add_to_physmap_batch's size field: It being 16-bits wide, only the
r/o M2P area can be accessed. Still we can and should do better.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Jan Beulich [Tue, 28 Nov 2017 12:14:43 +0000 (13:14 +0100)]
x86: check paging mode earlier in xenmem_add_to_physmap_one()

There's no point in deferring this until after some initial processing,
and it's actively wrong for the XENMAPSPACE_gmfn_foreign handling to not
have such a check at all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Jan Beulich [Tue, 28 Nov 2017 12:14:10 +0000 (13:14 +0100)]
x86: replace bad ASSERT() in xenmem_add_to_physmap_one()

There are no locks being held, i.e. it is possible to be triggered by
racy hypercall invocations. Subsequent code doesn't really depend on the
checked values, so this is not a security issue.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

George Dunlap [Tue, 28 Nov 2017 12:13:26 +0000 (13:13 +0100)]
p2m: Check return value of p2m_set_entry() when decreasing reservation

If the entire range specified to p2m_pod_decrease_reservation() is marked
populate-on-demand, then it will make a single p2m_set_entry() call,
reducing its PoD entry count.

Unfortunately, in the right circumstances, this p2m_set_entry() call
may fail.  In that case, repeated calls to decrease_reservation() may
cause p2m->pod.entry_count to fall below zero, potentially tripping
over BUG_ON()s to the contrary.

Instead, check to see if the entry succeeded, and return false if not.
The caller will then call guest_remove_page() on the gfns, which will
return -EINVAL upon finding no valid memory there to return.
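A simplified model of the fix: the entry count is only adjusted once the update is known to have succeeded. The structure and the set_entry() stand-in are hypothetical and much smaller than the real p2m code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PoD bookkeeping named in the text. */
struct pod_state {
    long entry_count;      /* outstanding populate-on-demand entries */
    bool set_entry_fails;  /* knob to simulate a failing update */
};

/* Stand-in for p2m_set_entry(): 0 on success, negative on failure. */
static int set_entry(const struct pod_state *p)
{
    return p->set_entry_fails ? -1 : 0;
}

/* Drop nr PoD entries, accounting for them only if the update worked. */
static bool decrease_reservation(struct pod_state *p, long nr)
{
    if (set_entry(p) != 0)
        return false;      /* leave entry_count untouched */

    p->entry_count -= nr;
    return true;
}
```

In this model, repeated failing calls can no longer drive entry_count below zero, because the decrement is skipped whenever the update fails.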

Unfortunately if the order > 0, the entry may have partially changed.
A domain_crash() is probably the safest thing in that case.

Other p2m_set_entry() calls in the same function should be fine,
because they are writing the entry at its current order.  Nonetheless,
check the return value and crash if our assumption turns out to be
wrong.

This is part of XSA-247.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Tue, 28 Nov 2017 12:13:03 +0000 (13:13 +0100)]
p2m: Always check to see if removing a p2m entry actually worked

The PoD zero-check functions speculatively remove memory from the p2m,
then check to see if it's completely zeroed, before putting it in the
cache.

Unfortunately, the p2m_set_entry() calls may fail if the underlying
pagetable structure needs to change and the domain has exhausted its
p2m memory pool: for instance, if we're removing a 2MiB region out of
a 1GiB entry (in the p2m_pod_zero_check_superpage() case), or a 4k
region out of a 2MiB or larger entry (in the p2m_pod_zero_check()
case); and the return value is not checked.

The underlying mfn will then be added into the PoD cache, and at some
point mapped into another location in the p2m.  If the guest
afterwards balloons out this memory, it will be freed to the hypervisor
and potentially reused by another domain, in spite of the fact that
the original domain still has writable mappings to it.

There are several places where p2m_set_entry() shouldn't be able to
fail, as it is guaranteed to write an entry of the same order that
succeeded before.  Add a backstop of crashing the domain just in case,
and an ASSERT_UNREACHABLE() to flag up the broken assumption on debug
builds.

While we're here, use PAGE_ORDER_2M rather than a magic constant.

This is part of XSA-247.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Julien Grall [Tue, 28 Nov 2017 12:11:55 +0000 (13:11 +0100)]
x86/pod: prevent infinite loop when shattering large pages

When populating pages, the PoD may need to split large ones using
p2m_set_entry and request the caller to retry (see ept_get_entry for
instance).

p2m_set_entry may fail to shatter if it is not possible to allocate
memory for the new page table. However, the error is not propagated,
resulting in the callers retrying the PoD infinitely.

Prevent the infinite loop by returning false when it is not possible to
shatter the large mapping.
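A toy model of why propagating the failure bounds the retry loop. All names are hypothetical and the memory pool is reduced to a counter; this is a sketch of the control flow, not of the PoD code itself.

```c
#include <stdbool.h>

/* Hypothetical model: each shatter consumes one free page-table page. */
struct pool {
    int free_pages;
};

/* Stand-in for the splitting step: fails once the pool is empty. */
static int shatter(struct pool *pool)
{
    if (pool->free_pages == 0)
        return -1;
    pool->free_pages--;
    return 0;
}

/* Returns true to ask the caller to retry, false to report failure. */
static bool demand_populate(struct pool *pool)
{
    if (shatter(pool) != 0)
        return false;  /* propagate: retrying cannot succeed */
    return true;
}

/* Caller loop: it terminates because a failed shatter now ends it. */
static int populate_all(struct pool *pool, int wanted)
{
    int done = 0;

    while (done < wanted && demand_populate(pool))
        done++;
    return done;
}
```

If demand_populate() returned true regardless of the shatter result, the caller would have no way to tell that further retries are pointless.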

This is XSA-246.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

George Dunlap [Wed, 22 Nov 2017 19:19:04 +0000 (19:19 +0000)]
SUPPORT.md: Add statement on PCI passthrough

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Wed, 22 Nov 2017 19:19:04 +0000 (19:19 +0000)]
SUPPORT.md: Add secondary memory management features

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add Security-related features

With the exception of driver domains, which depend on PCI passthrough,
and will be introduced later.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add 'easy' HA / FT features

Migration being one of the key 'non-easy' ones to be added later.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add Debugging, analysis, crash post-mortem

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add ARM-specific virtual hardware

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add x86-specific virtual hardware

x86-specific virtual hardware provided by the hypervisor, toolstack,
or QEMU.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add virtual devices common to ARM and x86

Mostly PV protocols.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Wed, 22 Nov 2017 19:19:01 +0000 (19:19 +0000)]
SUPPORT.md: Toolstack core

For now only include xl-specific features, or interaction with the
system.  Feature support matrix will be added when features are
mentioned.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

George Dunlap [Wed, 22 Nov 2017 19:19:01 +0000 (19:19 +0000)]
SUPPORT.md: Add scalability features

Superpage support and PVHVM.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.gralL@linaro.org>

George Dunlap [Thu, 23 Nov 2017 17:32:16 +0000 (17:32 +0000)]
SUPPORT.md: Add core ARM features

Hardware support and guest type.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>

George Dunlap [Thu, 23 Nov 2017 17:32:16 +0000 (17:32 +0000)]
SUPPORT.md: Add some x86 features

Including host architecture support and guest types.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Thu, 23 Nov 2017 17:32:15 +0000 (17:32 +0000)]
SUPPORT.md: Add core functionality

Core memory management and scheduling.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

George Dunlap [Thu, 23 Nov 2017 17:32:14 +0000 (17:32 +0000)]
Introduce skeleton SUPPORT.md

Add a machine-readable file to describe what features are in what
state of being 'supported', as well as information about how long this
release will be supported, and so on.

The document should be formatted using "semantic newlines" [1], to make
changes easier.

Begin with the basic framework.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
[1] http://rhodesmill.org/brandon/2012/one-sentence-per-line/

Jan Beulich [Thu, 23 Nov 2017 10:40:31 +0000 (11:40 +0100)]
x86emul/test: keep compiler from using {x,y,z}mm registers itself

Since the emulator acts on the live hardware registers, we need to
prevent the compiler from using them e.g. for inlined memcpy() /
memset() (as gcc7 does). We can't, however, set this from the command
line, as otherwise the 64-bit build would face issues with functions
returning floating point values and being declared in standard headers.

As the pragma isn't available prior to gcc6, we need to invoke it
conditionally. Luckily up to gcc6 we haven't seen generated code access
SIMD registers beyond what our asm()s do.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Jan Beulich [Thu, 23 Nov 2017 10:38:22 +0000 (11:38 +0100)]
sync CPU state upon final domain destruction

See the code comment being added for why we need this.

This is being placed here to balance between the desire to prevent
future similar issues (the risk of which would grow if it was put
further down the call stack, e.g. in vmx_vcpu_destroy()) and the
intention to limit the performance impact (otherwise it could also go
into rcu_do_batch(), paralleling the use in do_tasklet_work()).

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Andrew Cooper [Thu, 16 Nov 2017 21:34:02 +0000 (21:34 +0000)]
x86/hvm: Don't corrupt the HVM context stream when writing the MSR record

Ever since it was introduced in c/s bd1f0b45ff, hvm_save_cpu_msrs() has had a
bug whereby it corrupts the HVM context stream if some, but fewer than the
maximum number of MSRs are written.

_hvm_init_entry() creates an hvm_save_descriptor with length for
msr_count_max, but in the case that we write fewer than max, h->cur only moves
forward by the amount of space used, causing the subsequent
hvm_save_descriptor to be written within the bounds of the previous one.

To resolve this, reduce the length reported by the descriptor to match the
actual number of bytes used.
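The descriptor fix-up can be sketched with a simplified stream. The structures, the record size and the typecode value are assumptions made for illustration, not the real HVM save format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record header and stream cursor. */
struct desc {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;       /* payload bytes following the header */
};

struct stream {
    uint8_t buf[256];
    size_t cur;            /* offset of the next byte to write */
};

#define MSR_REC_SIZE 16u   /* assumed size of one MSR record */
#define MSR_TYPECODE 20u   /* assumed typecode, for illustration */

/*
 * Reserve a header sized for max_count records, copy `used` records, then
 * shrink the header's length to the bytes actually written, so that the
 * next header starts exactly where this payload ends.
 * Returns the header offset, or (size_t)-1 if the record does not fit.
 */
static size_t save_msrs(struct stream *s, const uint8_t *recs,
                        unsigned int used, unsigned int max_count)
{
    struct desc d = { .typecode = MSR_TYPECODE, .instance = 0,
                      .length = max_count * MSR_REC_SIZE };
    size_t hdr_off = s->cur;
    size_t bytes = (size_t)used * MSR_REC_SIZE;

    if (used > max_count ||
        sizeof(d) + (size_t)d.length > sizeof(s->buf) - s->cur)
        return (size_t)-1;

    s->cur += sizeof(d);                      /* reserve the header */
    memcpy(&s->buf[s->cur], recs, bytes);
    s->cur += bytes;                          /* cursor moves by bytes used */

    d.length = (uint32_t)bytes;               /* the fix: report bytes used */
    memcpy(&s->buf[hdr_off], &d, sizeof(d));
    return hdr_off;
}
```

Without the final length adjustment, a reader would skip max_count records' worth of bytes from the header and land in the middle of a later record.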

A typical failure on the destination side looks like:

    (XEN) HVM4 restore: CPU_MSR 0
    (XEN) HVM4.0 restore: not enough data left to read 56 MSR bytes
    (XEN) HVM4 restore: failed to load entry 20/0

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>

Andrew Cooper [Thu, 16 Nov 2017 21:10:00 +0000 (21:10 +0000)]
tools/libxc: Fix restoration of PV MSRs after migrate

There are two bugs in process_vcpu_msrs() which clearly demonstrate that I
didn't test this bit of Migration v2 very well when writing it...

vcpu->msrsz is always expected to be a multiple of xen_domctl_vcpu_msr_t
records in a spec-compliant stream, so the modulo yields 0 for the msr_count,
rather than the actual number sent in the stream.
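The first bug comes down to using a remainder where a quotient was intended. A minimal sketch, with a hypothetical record layout in which only the record size matters:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; only its size matters here. */
struct msr_record {
    uint32_t index;
    uint32_t reserved;
    uint64_t value;
};

/* Buggy: the remainder, which is 0 for any well-formed blob. */
static size_t msr_count_buggy(size_t blob_size)
{
    return blob_size % sizeof(struct msr_record);
}

/* Intended: the number of whole records in the blob. */
static size_t msr_count(size_t blob_size)
{
    return blob_size / sizeof(struct msr_record);
}
```

For a blob holding three records, the buggy form yields 0 and the intended form yields 3.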

Passing 0 for the msr_count causes the hypercall to exit early, and hides the
fact that the guest handle is inserted into the wrong field in the domctl
union.
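
The first bug can be shown in isolation. The following is a minimal sketch
under assumptions: `struct msr_rec` is a stand-in for xen_domctl_vcpu_msr_t
(the real layout may differ), and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for xen_domctl_vcpu_msr_t; the real layout may differ. */
struct msr_rec {
    uint32_t index;
    uint32_t reserved;
    uint64_t value;
};

/* Buggy form: the remainder, which is 0 for any well-formed blob. */
static size_t msr_count_buggy(size_t msrsz)
{
    return msrsz % sizeof(struct msr_rec);
}

/* Intended form: the number of whole records in the blob. */
static size_t msr_count_fixed(size_t msrsz)
{
    assert(msrsz % sizeof(struct msr_rec) == 0);
    return msrsz / sizeof(struct msr_rec);
}
```

For a blob holding four records, the buggy form yields 0 and the fixed form
yields 4.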

The reason that these bugs have gone unnoticed for so long is that the only
MSRs passed like this for PV guests are the AMD DBGEXT MSRs, which only exist
in fairly modern hardware, and whose use doesn't appear to be implemented in
any contemporary PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: Fix altp2m_vcpu_enable_notify error handling
Adrian Pop [Wed, 15 Nov 2017 13:47:59 +0000 (15:47 +0200)]
x86/hvm: Fix altp2m_vcpu_enable_notify error handling

The altp2m_vcpu_enable_notify subop handler might skip calling
rcu_unlock_domain() after rcu_lock_current_domain().  However, since both
rcu functions are no-ops when run on the current domain, this doesn't
really have repercussions.

The second change is adding a missing break that would have potentially
enabled #VE for the current domain even if it had intended to enable it
for another one (not a supported functionality).

Signed-off-by: Adrian Pop <apop@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/shadow: correct SH_LINEAR mapping detection in sh_guess_wrmap()
Andrew Cooper [Thu, 16 Nov 2017 09:38:14 +0000 (10:38 +0100)]
x86/shadow: correct SH_LINEAR mapping detection in sh_guess_wrmap()

The fix for XSA-243 / CVE-2017-15592 (c/s bf2b4eadcf379) introduced a change
in behaviour for sh_guess_wrmap(), where it had to cope with no shadow linear
mapping being present.

As the name suggests, guest_vtable is a mapping of the guest's pagetable, not
Xen's pagetable, meaning that it isn't the pagetable we need to check for the
shadow linear slot in.

The practical upshot is that a shadow HVM vcpu which switches into 4-level
paging mode, with an L4 pagetable that contains a mapping which aliases Xen's
SH_LINEAR_PT_VIRT_START will fool the safety check for whether a SHADOW_LINEAR
mapping is present.  As the check passes (when it should have failed), Xen
subsequently falls over the missing mapping with a pagefault such as:

    (XEN) Pagetable walk from ffff8140a0503880:
    (XEN)  L4[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L3[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L2[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L1[0x103] = 0000000000000000 ffffffffffffffff

This is part of XSA-243.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years agox86: don't wrongly trigger linear page table assertion
Jan Beulich [Thu, 16 Nov 2017 09:37:29 +0000 (10:37 +0100)]
x86: don't wrongly trigger linear page table assertion

_put_page_type() may do multiple iterations until its cmpxchg()
succeeds. It invokes set_tlbflush_timestamp() on the first
iteration, however. Code inside the function takes care of this, but
- the assertion in _put_final_page_type() would trigger on the second
  iteration if time stamps in a debug build are permitted to be
  sufficiently much wider than the default 6 bits (see WRAP_MASK in
  flushtlb.c),
- it returning -EINTR (for a continuation to be scheduled) would leave
  the page in an inconsistent state (until the re-invocation completes).
Make the set_tlbflush_timestamp() invocation conditional, bypassing it
(for now) only in the case we really can't tolerate the stamp to be
stored.

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen/arm: p2m: Add more debug in get_page_from_gva
Julien Grall [Wed, 15 Nov 2017 19:34:14 +0000 (19:34 +0000)]
xen/arm: p2m: Add more debug in get_page_from_gva

The function get_page_from_gva is used by copy_*_guest helpers to
translate a guest virtual address to a machine physical address and take
a reference on the page.

There are a couple of error paths that will return the same value, making
it difficult to know the exact error. Add more debug output in each error
path, for debug builds only.

This should help narrow down the intermittent failure with the
hypercall GNTTABOP_copy (see [1]).

[1] https://lists.xen.org/archives/html/xen-devel/2017-11/msg00942.html

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Change the return value of gvirt_to_maddr
Julien Grall [Wed, 15 Nov 2017 19:34:13 +0000 (19:34 +0000)]
xen/arm: mm: Change the return value of gvirt_to_maddr

Currently, gvirt_to_maddr returns -EFAULT when the translation fails.
It might be useful to return the PAR_EL1 (Physical Address Register)
in such a case to get a better idea of the reason.

So modify the return value to use 0 on success or return the PAR on
failure.

The callers are modified to reflect the change of the return value.

Note that with the change in gvirt_to_maddr, ma needs to be initialized
to avoid GCC being confused (i.e. value may be uninitialized) with the new
construction.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/mm: fix race condition in modify_xen_mappings()
Yu Zhang [Tue, 14 Nov 2017 16:11:26 +0000 (17:11 +0100)]
x86/mm: fix race condition in modify_xen_mappings()

In modify_xen_mappings(), an L1/L2 page table shall be freed
if all entries of this page table are empty. The corresponding
L2/L3 PTE will need to be cleared in such a scenario.

However, concurrent paging structure modifications on different
CPUs may cause the L2/L3 PTEs to already be cleared or set
to reference a superpage.

Therefore the logic to enumerate the L1/L2 page table and to
reset the corresponding L2/L3 PTE needs to be protected with a
spinlock. And the _PAGE_PRESENT and _PAGE_PSE flags need to be
checked after the lock is obtained.
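
The "re-check under the lock" pattern can be sketched as below. This is a
toy model under stated assumptions, not Xen's implementation: the flag
values, the C11 atomic_flag spinlock and try_free_lower() are all
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define F_PRESENT 0x1u   /* stand-in for _PAGE_PRESENT */
#define F_PSE     0x2u   /* stand-in for _PAGE_PSE */

static atomic_flag pgt_lock = ATOMIC_FLAG_INIT;

static void pgt_lock_acquire(void)
{
    while ( atomic_flag_test_and_set(&pgt_lock) )
        ;   /* spin */
}

static void pgt_lock_release(void)
{
    atomic_flag_clear(&pgt_lock);
}

/*
 * Clear *upper (an L2/L3-style entry) if every entry of the lower
 * table is empty.  Returns 1 if the upper entry was cleared, else 0.
 */
static int try_free_lower(uint64_t *upper, const uint64_t *lower,
                          unsigned int n)
{
    unsigned int i;
    int freed = 0;

    pgt_lock_acquire();

    /*
     * Re-check under the lock: another CPU may have cleared the entry
     * or replaced it with a superpage mapping in the meantime.
     */
    if ( (*upper & F_PRESENT) && !(*upper & F_PSE) )
    {
        for ( i = 0; i < n; i++ )
            if ( lower[i] != 0 )
                break;

        if ( i == n )
        {
            *upper = 0;
            freed = 1;
        }
    }

    pgt_lock_release();

    return freed;
}
```

Both the enumeration and the reset happen inside the locked region, so the
decision to clear the upper entry cannot be based on a stale view of it.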

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/mm: fix race conditions in map_pages_to_xen()
Min He [Tue, 14 Nov 2017 16:10:56 +0000 (17:10 +0100)]
x86/mm: fix race conditions in map_pages_to_xen()

In map_pages_to_xen(), an L2 page table entry may be reset to point to
a superpage, and its corresponding L1 page table needs to be freed in such
a scenario, when these L1 page table entries map consecutive page frames
and have the same mapping flags.

However, variable `pl1e` is not protected by the lock before L1 page table
is enumerated. A race condition may happen if this code path is invoked
simultaneously on different CPUs.

For example, `pl1e` on CPU0 may hold an obsolete value, pointing
to a page which has just been freed on CPU1. Besides, before this page
is reused, it will still be holding the old PTEs, referencing consecutive
page frames. Consequently the `free_xen_pagetable(l2e_to_l1e(ol2e))` will
be triggered on CPU0, resulting in the unexpected freeing of a normal page.

This patch fixes the above problem by protecting the `pl1e` with the lock.

Also, there are other potential race conditions. For instance, the L2/L3
entry may be modified concurrently on different CPUs, by routines such as
map_pages_to_xen(), modify_xen_mappings() etc. To fix this, this patch will
check the _PAGE_PRESENT and _PAGE_PSE flags, after the spinlock is obtained,
for the corresponding L2/L3 entry.

Signed-off-by: Min He <min.he@intel.com>
Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: do not register hpet mmio during s3 cycle
Eric Chanudet [Tue, 14 Nov 2017 16:09:50 +0000 (17:09 +0100)]
x86/hvm: do not register hpet mmio during s3 cycle

Do it once at domain creation (hpet_init).

Sleep -> Resume cycles will end up crashing an HVM guest with hpet as
the sequence during resume takes the path:
-> hvm_s3_suspend
  -> hpet_reset
    -> hpet_deinit
    -> hpet_init
      -> register_mmio_handler
        -> hvm_next_io_handler

register_mmio_handler will use a new io handler each time, until
eventually it reaches NR_IO_HANDLERS, then hvm_next_io_handler calls
domain_crash.
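
The exhaustion described above can be modelled in a few lines. This is a
hypothetical sketch, not Xen code: NR_HANDLERS is an illustrative limit
(not the value of NR_IO_HANDLERS), and `struct dom` and the helpers are
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_HANDLERS 8   /* illustrative limit only */

struct dom {
    unsigned int handler_count;
    bool crashed;
};

/* Take the next free slot; mark the domain crashed when none is left. */
static void register_handler(struct dom *d)
{
    if ( d->handler_count == NR_HANDLERS )
    {
        d->crashed = true;
        return;
    }
    d->handler_count++;
}

/* Old behaviour: every reset re-runs init, which registers again. */
static void reset_old(struct dom *d)
{
    register_handler(d);
}

/*
 * New behaviour: registration happened once at creation; a reset only
 * reinitialises device state and leaves the handler table alone.
 */
static void reset_new(struct dom *d)
{
    (void)d;
}

/* Run 'cycles' suspend/resume cycles and report whether the domain lives. */
static bool survives(unsigned int cycles, bool fixed)
{
    struct dom d = { 0, false };
    unsigned int i;

    register_handler(&d);              /* domain creation */

    for ( i = 0; i < cycles && !d.crashed; i++ )
    {
        if ( fixed )
            reset_new(&d);
        else
            reset_old(&d);
    }

    return !d.crashed;
}
```

With the illustrative limit of 8, the old scheme survives 7 cycles and
fails on the 8th, while the new scheme is independent of the cycle count.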

Signed-off-by: Eric Chanudet <chanudete@ainfosec.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/xenstored: Check number of strings passed to do_control()
Pawel Wieczorkiewicz [Fri, 27 Oct 2017 16:32:15 +0000 (16:32 +0000)]
tools/xenstored: Check number of strings passed to do_control()

It is possible to send a zero-string message body to xenstore's
XS_CONTROL handling function. Then the number of strings is used
for an array allocation. This leads to a crash in strcmp() in a
CONTROL sub-command invocation loop.
The output of xs_count_string() should be verified and all 0 or
negative values should be rejected with an EINVAL. At least the
sub-command name must be specified.
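
A minimal sketch of the check, under assumptions: count_strings() is a
hypothetical re-implementation written for this example (it is not
xenstored's helper), and do_control_sketch() only models the validation
step, not the sub-command dispatch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the NUL-terminated strings in buf[0..len).  Returns -1 if the
 * final string is not terminated, and 0 for an empty body.
 */
static int count_strings(const char *buf, size_t len)
{
    size_t off = 0;
    int n = 0;

    while ( off < len )
    {
        const char *end = memchr(buf + off, '\0', len - off);

        if ( !end )
            return -1;               /* unterminated tail */

        n++;
        off = (size_t)(end - buf) + 1;
    }

    return n;
}

/* Returns 0 if the body is acceptable, EINVAL otherwise. */
static int do_control_sketch(const char *body, size_t len)
{
    int num = count_strings(body, len);

    if ( num < 1 )
        return EINVAL;   /* at least the sub-command name is required */

    /* ... the first string would now be looked up in a command table ... */
    return 0;
}
```

Rejecting the request before any allocation means the sub-command loop
never sees an empty argument vector.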

The xenstore crash can only be triggered from within dom0 (there
is a check in do_control() rejecting all non-dom0 requests with
an EACCES).

Testing: reproduced with the following command:
python -c 'print 16*"\x00"' | nc -U $XENSTORED_RUNDIR/socket

Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Martin Pohlack <mpohlack@amazon.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agolibxl: Fix the bug introduced in commit "libxl: use correct type modifier for vuart_gfn"
Bhupinder Thakur [Tue, 31 Oct 2017 06:55:05 +0000 (12:25 +0530)]
libxl: Fix the bug introduced in commit "libxl: use correct type modifier for vuart_gfn"

In libxl__device_vuart_add vuart_gfn is getting stored as a hex value:

> flexarray_append(ro_front, GCSPRINTF("%"PRI_xen_pfn, state->vuart_gfn));

However, xenstore reads this value as a decimal value and tries to map the
wrong address and fails.

This patch introduces a new format specifier "PRIu_xen_pfn" which formats the value as a
decimal value.
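
The mismatch is easy to reproduce outside of libxl. The sketch below is an
illustration only: roundtrip() is a hypothetical helper, and PRIx64/PRIu64
stand in for the Xen-specific format macros named above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format gfn as hex or decimal text, then parse it back the way a
 * base-10 reader would.
 */
static uint64_t roundtrip(uint64_t gfn, int as_hex)
{
    char buf[32];

    if ( as_hex )
        snprintf(buf, sizeof(buf), "%" PRIx64, gfn);
    else
        snprintf(buf, sizeof(buf), "%" PRIu64, gfn);

    /* Reader side: the string is interpreted as a decimal number. */
    return strtoull(buf, NULL, 10);
}
```

A value such as 0x39000 written in hex is read back as decimal 39000, and a
value containing hex letters in the leading position parses as 0; only the
decimal form survives the round trip.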

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agolibs/evtchn: Remove active handler on clean-up or failure
Julien Grall [Fri, 10 Nov 2017 17:10:50 +0000 (17:10 +0000)]
libs/evtchn: Remove active handler on clean-up or failure

Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
Implement for libxenevtchn" added a call to register allowing to
restrict the event channel.

However, the call to deregister the handler was not performed if open
failed or when closing the event channel. This will corrupt the list of
handlers and potentially crash the application later on.

Fix it by calling xentoolcore_deregister_active_handle on failure and
closure.
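
The register/deregister pairing can be sketched with a plain linked list.
Everything here is hypothetical and only mirrors the idea: `struct handle`,
handle_open(), handle_close() and the list helpers are not the
xentoolcore or libxenevtchn API.

```c
#include <assert.h>
#include <stdlib.h>

struct handle {
    struct handle *next;
    int fd;
};

static struct handle *active;   /* head of the registered-handle list */

static void register_handle(struct handle *h)
{
    h->next = active;
    active = h;
}

static void deregister_handle(struct handle *h)
{
    struct handle **pp;

    for ( pp = &active; *pp; pp = &(*pp)->next )
        if ( *pp == h )
        {
            *pp = h->next;
            return;
        }
}

static unsigned int active_count(void)
{
    const struct handle *h;
    unsigned int n = 0;

    for ( h = active; h; h = h->next )
        n++;

    return n;
}

/* backend_fd < 0 simulates the OS-specific open step failing. */
static struct handle *handle_open(int backend_fd)
{
    struct handle *h = malloc(sizeof(*h));

    if ( !h )
        return NULL;

    h->fd = backend_fd;
    register_handle(h);

    if ( backend_fd < 0 )
    {
        /* Unlink before freeing, so the list never holds a dangling pointer. */
        deregister_handle(h);
        free(h);
        return NULL;
    }

    return h;
}

static void handle_close(struct handle *h)
{
    if ( !h )
        return;

    deregister_handle(h);
    free(h);
}
```

Omitting either deregister_handle() call would leave a freed node on the
list, which a later walk of the list would then dereference.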

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoConfig.mk: Update QEMU changeset
Anthony PERARD [Mon, 13 Nov 2017 12:27:32 +0000 (12:27 +0000)]
Config.mk: Update QEMU changeset

New commits:
- xen/pt: allow QEMU to request MSI unmasking at bind time
To fix a passthrough bug.
- ui/gtk: Fix deprecation of vte_terminal_copy_clipboard
A build fix.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agodocs: rename hvmlite.markdown to pvh.markdown
Wei Liu [Sun, 12 Nov 2017 11:03:06 +0000 (11:03 +0000)]
docs: rename hvmlite.markdown to pvh.markdown

And remove stale paragraph and escape underscores.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agolibevtchn: fix build on non-Linux hosts
Roger Pau Monne [Wed, 8 Nov 2017 12:52:57 +0000 (12:52 +0000)]
libevtchn: fix build on non-Linux hosts

Non-Linux hosts (where osdep_evtchn_restrict is not yet supported)
made use of errno without including errno.h, fix this by including the
header.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agogcov: return EOPNOTSUPP for unimplemented gcov sysctl
Roger Pau Monné [Wed, 8 Nov 2017 12:41:51 +0000 (13:41 +0100)]
gcov: return EOPNOTSUPP for unimplemented gcov sysctl

ENOSYS should only be used by unimplemented top-level syscalls. Use
EOPNOTSUPP instead.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/cpuid: minor fixups missed from previous work
Andrew Cooper [Wed, 8 Nov 2017 12:40:40 +0000 (13:40 +0100)]
x86/cpuid: minor fixups missed from previous work

 * Add more feature names to ./xen-cpuid
 * Vertically align the magic comments in cpufeatureset.h

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agodocs/features: update the status of Credit2 implemented features
Dario Faggioli [Mon, 6 Nov 2017 10:35:23 +0000 (11:35 +0100)]
docs/features: update the status of Credit2 implemented features

As soft-affinity and caps will be available in Xen 4.10.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoRevert "tools/dombuilder: Switch to using gfn terminology for console and xenstore...
Wei Liu [Mon, 6 Nov 2017 14:52:39 +0000 (14:52 +0000)]
Revert "tools/dombuilder: Switch to using gfn terminology for console and xenstore rings"

This reverts commit f48b5449dabc770acdde6d25cfbd265cfb71034d, which
breaks pvgrub.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoRevert "tools/dombuilder: Fix asymmetry when setting up console and xenstore rings"
Wei Liu [Mon, 6 Nov 2017 14:52:20 +0000 (14:52 +0000)]
Revert "tools/dombuilder: Fix asymmetry when setting up console and xenstore rings"

This reverts commit 87b0ae7e8277d2fa13486ce2e11a941e55f8df40.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoRevert "tools/dombuilder: Prevent failures of xc_dom_gnttab_init()"
Wei Liu [Fri, 3 Nov 2017 14:14:46 +0000 (14:14 +0000)]
Revert "tools/dombuilder: Prevent failures of xc_dom_gnttab_init()"

This reverts commit 9ff6dbfa7576cc1c5d6f9a3c59c69a8503e36f11, which
breaks hvm save/restore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoMerge remote-tracking branch 'origin/staging' into staging
Wei Liu [Thu, 2 Nov 2017 17:07:58 +0000 (17:07 +0000)]
Merge remote-tracking branch 'origin/staging' into staging

7 years agotools/dombuilder: Prevent failures of xc_dom_gnttab_init()
Andrew Cooper [Thu, 12 Oct 2017 19:19:09 +0000 (20:19 +0100)]
tools/dombuilder: Prevent failures of xc_dom_gnttab_init()

Recent changes in grant table configuration have caused calls to
xc_dom_gnttab_init() to fail if not preceded by a call to
xc_domain_set_gnttab_limits().  This is backwards from the point of view of
3rd party dombuilder users.

Add max_{grant,maptrack}_frames parameters to struct xc_dom_image, and require
them to be set by callers using xc_dom_gnttab_init().  Libxl, which uses
xc_dom_gnttab_init() itself is updated appropriately.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/dombuilder: Fix asymmetry when setting up console and xenstore rings
Andrew Cooper [Thu, 12 Oct 2017 19:19:08 +0000 (20:19 +0100)]
tools/dombuilder: Fix asymmetry when setting up console and xenstore rings

libxl always uses xc_dom_gnttab_init(), which internally calls
xc_dom_gnttab{_hvm,}_seed() to set up the grants pointing at the console and
xenstore rings.  For HVM guests, libxl then asks Xen for the information set
up previously, and calls xc_dom_gnttab_hvm_seed() a second time, which is
wasteful.  ARM construction expects libxl to have set up
dom->{console,xenstore}_evtchn earlier, so only actually functions because of
this second call.

Rationalise everything and make it consistent for all guests.

 1) Users of the domain builder are expected to provide
    dom->{console,xenstore}_{evtchn,domid} unconditionally.  This is checked
    by setting invalid values in xc_dom_allocate(), and checking in
    xc_dom_boot_image().

 2) For x86 HVM and ARM guests, the event channels are given to Xen at the
    same time as the ring gfns.  ARM already did this, but x86 is updated to
    match.  x86 PV already provides this information in the start_info page.

 3) Libxl is updated to drop all relevant functionality from
    hvm_build_set_params(), and behave consistently with PV guests when it
    comes to the handling of dom->{console,xenstore}_{evtchn,domid,gfn}.

This removes several redundant hypercalls (including a foreign mapping) from
the x86 HVM and ARM construction paths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/dombuilder: Switch to using gfn terminology for console and xenstore rings
Wei Liu [Thu, 12 Oct 2017 19:19:07 +0000 (20:19 +0100)]
tools/dombuilder: Switch to using gfn terminology for console and xenstore rings

The sole use of xc_dom_translated() and xc_dom_p2m() outside of the domain
builder is for libxl_dom() to translate the console and xenstore pfns back
into useful values.  PV guest pfns are only interesting to the domain builder,
and gfns are the address space used by all other hypercalls.

Renaming the fields in xc_dom_image is deliberate, as it will cause
out-of-tree users of the dombuilder to notice the different semantics.

Correct the terminology throughout xc_dom_gnttab{_hvm,}_seed(), which are all
using gfns despite the existing variable names.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
[ wei: fix stubdom build ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agocommon/multicall: Increase debugability for bad hypercalls
Andrew Cooper [Tue, 31 Oct 2017 17:07:41 +0000 (17:07 +0000)]
common/multicall: Increase debugability for bad hypercalls

While investigating an issue (in a new codepath I'd introduced, as it turns
out), leaving interrupts disabled manifested as a subsequent op in the
multicall failing a check_lock() test.

The codepath would have hit the ASSERT_NOT_IN_ATOMIC on the return-to-guest
path, had it not hit the check_lock() first.

Call ASSERT_NOT_IN_ATOMIC() after each operation in the multicall, to make
failures more obvious.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agocommon/spinlock: Improve the output from check_lock() if it trips
Andrew Cooper [Mon, 30 Oct 2017 17:42:52 +0000 (17:42 +0000)]
common/spinlock: Improve the output from check_lock() if it trips

If check_lock() triggers, a crash will occur.  Instead of simply identifying
"the irq context was different", indicate the expected and current irq
context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/dombuilder: Remove clear_page() from xc_dom_boot.c
Andrew Cooper [Thu, 12 Oct 2017 19:19:06 +0000 (20:19 +0100)]
tools/dombuilder: Remove clear_page() from xc_dom_boot.c

pfn 0 is a legitimate (albeit unlikely) frame to use, so skipping it is wrong.
This behaviour appears to exist simply to cover the fact that zero is the
default value of an uninitialised field in dom.

ARM already clears the frames at the point that the pfns are allocated,
meaning that the added clear_page() is wasteful.  Alter x86 to match ARM and
clear the page when it is allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/dombuilder: Drop more PVH v1 leftovers
Andrew Cooper [Thu, 12 Oct 2017 19:19:05 +0000 (20:19 +0100)]
tools/dombuilder: Drop more PVH v1 leftovers

alloc_magic_pages() is renamed to alloc_magic_pages_pv() to mirror its
alloc_magic_pages_hvm() counterpart.  Delete a redundant comment, introduce
some newlines for clarity, and remove a logically dead allocation of shared info.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agogdbsx: prefer privcmd character device
Doug Goldstein [Tue, 31 Oct 2017 15:20:11 +0000 (10:20 -0500)]
gdbsx: prefer privcmd character device

Prefer using the character device over the proc file if the character
device exists.

CC: Elena Ufimtseva <elena.ufimtseva@oracle.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/hotplug: create XEN_LOG_DIR at runtime
Andrii Anisov [Fri, 27 Oct 2017 16:52:37 +0000 (19:52 +0300)]
tools/hotplug: create XEN_LOG_DIR at runtime

/var/log could be a tmpfs mount point, so create xen subfolder at runtime.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoarm/xen: vpl011: Fix SBSA UART interrupt assertion
Bhupinder Thakur [Tue, 24 Oct 2017 17:09:22 +0000 (18:09 +0100)]
arm/xen: vpl011: Fix SBSA UART interrupt assertion

With the current SBSA UART emulation, streaming larger amounts of data
(as caused by "find /", for instance) can lead to character losses.
This is due to the OUT ring buffer getting full, because we change the
TX interrupt bit only when the FIFO is actually full, and not already
when it's half-way filled, as the Linux driver expects.
The SBSA spec does not explicitly state this, but we assume that an
SBSA compliant UART uses the PL011 default "interrupt FIFO level select
register" value of "1/2 way". The Linux driver certainly makes this
assumption, so it expects to be able to write a number of characters
after the TX interrupt line has been asserted.
Similarly, we have the same wrong behaviour on the receive side.
However changing the RX interrupt to trigger on reaching half of the FIFO
level will lead to lag, because the guest would not be notified of incoming
characters until the FIFO is half way filled. This leads to unacceptable
lags when typing on a terminal.
Real hardware solves this issue by using the "receive timeout
interrupt" (RTI), which is triggered when character reception stops for
32 baud cycles. As we cannot and do not want to emulate any timing here,
we slightly abuse the timeout interrupt to notify the guest of new
characters: when a new character comes in, the RTI is asserted, when
the FIFO is cleared, the interrupt gets cleared.

So this patch changes the emulated interrupt trigger behaviour to come
as close to real hardware as possible: the RX and TX interrupt trigger
when the FIFO gets half full / half empty, and the RTI interrupt signals
new incoming characters.
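
The resulting trigger rules can be written down as a small pure function.
This is a level-based sketch under assumptions, not the emulation code: the
FIFO depth, the bit positions and irq_status() are illustrative and do not
reflect the PL011 register layout.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 32u        /* illustrative depth */
#define INT_RX  (1u << 0)    /* illustrative bit positions only */
#define INT_TX  (1u << 1)
#define INT_RT  (1u << 2)    /* receive timeout interrupt (RTI) */

/*
 * Compute the asserted interrupts from the current fill levels:
 *  - RX when the receive FIFO is at least half full,
 *  - RT whenever the receive FIFO holds any character,
 *  - TX when the transmit FIFO is at most half full.
 */
static uint32_t irq_status(unsigned int rx_fill, unsigned int tx_fill)
{
    uint32_t s = 0;

    assert(rx_fill <= FIFO_SIZE && tx_fill <= FIFO_SIZE);

    if ( rx_fill >= FIFO_SIZE / 2 )
        s |= INT_RX;
    if ( rx_fill > 0 )
        s |= INT_RT;
    if ( tx_fill <= FIFO_SIZE / 2 )
        s |= INT_TX;

    return s;
}
```

In this model a single incoming character raises only the timeout
interrupt, which is what keeps interactive typing responsive.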

[Andre: reword commit message, introduce receive timeout interrupt, add
        comments]

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoarm/xen: vpl011: Fix the slow early console SBSA UART output
Bhupinder Thakur [Tue, 24 Oct 2017 17:09:21 +0000 (18:09 +0100)]
arm/xen: vpl011: Fix the slow early console SBSA UART output

The early console output uses pl011_early_write() to write data. This
function waits for BUSY bit to get cleared before writing the next byte.

In the SBSA UART emulation logic, the BUSY bit was set as soon as one
byte was written in the FIFO and it remained set until the FIFO was
emptied. This meant that the output was delayed as each character needed
the BUSY to get cleared.

Since the SBSA UART is getting emulated in Xen using ring buffers, it
ensures that once the data is enqueued in the FIFO, it will be received
by xenconsole so it is safe to set the BUSY bit only when FIFO becomes
full. This will ensure that pl011_early_write() is not delayed unduly
to write the data.

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agogcov: return ENOSYS for unimplemented gcov domctl
Roger Pau Monne [Thu, 26 Oct 2017 09:19:30 +0000 (10:19 +0100)]
gcov: return ENOSYS for unimplemented gcov domctl

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: Implement for libxenevtchn
Ross Lagerwall [Wed, 18 Oct 2017 13:42:33 +0000 (14:42 +0100)]
xentoolcore_restrict_all: Implement for libxenevtchn

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/libs/evtchn: Add support for restricting a handle
Ross Lagerwall [Wed, 18 Oct 2017 13:42:32 +0000 (14:42 +0100)]
tools/libs/evtchn: Add support for restricting a handle

Implement support for restricting evtchn handles to a particular domain
on Linux by calling the IOCTL_EVTCHN_RESTRICT_DOMID ioctl (support added
in Linux v4.8).

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agofuzz/x86_emulate: Fix afl-harness batch mode file pointer leak
George Dunlap [Fri, 13 Oct 2017 08:36:00 +0000 (09:36 +0100)]
fuzz/x86_emulate: Fix afl-harness batch mode file pointer leak

Changeset 2b1cde7783 introduced "batch mode" to afl-harness, which allowed
the handling of several inputs in sequence.

Unfortunately, it introduced a file pointer leak when the file was
larger than the maximum size.  Restructure the code to always close fp
if we opened it.
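
The "single exit path" structure can be sketched as follows. This is an
illustration under assumptions, not the afl-harness code: read_input() and
MAX_INPUT are hypothetical names and values.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_INPUT 4096   /* illustrative maximum input size */

/*
 * Read at most MAX_INPUT bytes from path into buf.  Returns the number
 * of bytes read, or -1 if the file cannot be opened, cannot be read,
 * or is larger than MAX_INPUT.
 */
static long read_input(const char *path, unsigned char *buf)
{
    FILE *fp = fopen(path, "rb");
    long ret = -1;
    size_t n;
    int extra;

    if ( !fp )
        return -1;

    n = fread(buf, 1, MAX_INPUT, fp);

    /* Accept the input only if the file ends within the buffer. */
    extra = fgetc(fp);
    if ( extra == EOF && !ferror(fp) )
        ret = (long)n;

    /* Single exit path: fp is closed whenever it was opened. */
    fclose(fp);

    return ret;
}
```

Because the oversize case only changes the return value, it can no longer
skip the fclose() call.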

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/mm: Make PV linear pagetables optional
George Dunlap [Fri, 27 Oct 2017 13:26:27 +0000 (14:26 +0100)]
x86/mm: Make PV linear pagetables optional

Allowing pagetables to point to other pagetables of the same level
(often called 'linear pagetables') has been included in Xen since its
inception; but recently it has been the source of a number of subtle
reference-counting bugs.

It is not used by Linux or MiniOS; but it is used by NetBSD and Novell
Netware.  There are significant numbers of people who are never going
to use the feature, along with significant numbers who need the
feature.

Add a Kconfig option for the feature (default to 'y').  Also add a
command-line option to control whether PV linear pagetables are
allowed (default to 'true').

NB that we leave linear_pt_count in the page struct.  It's in a union,
so its presence doesn't increase the size of the data struct.
Changing the layout of the other elements based on configuration
options is asking for trouble however; so we'll just leave it there
and ASSERT that it's zero.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/vpmu: Remove unnecessary call to do_interrupt()
Boris Ostrovsky [Tue, 24 Oct 2017 23:30:20 +0000 (19:30 -0400)]
x86/vpmu: Remove unnecessary call to do_interrupt()

This call was left during PVHv1 removal (commit 33e5c32559e1 ("x86:
remove PVHv1 code")):

-        if ( is_pvh_vcpu(sampling) &&
-             !(vpmu_mode & XENPMU_MODE_ALL) &&
+        if ( !(vpmu_mode & XENPMU_MODE_ALL) &&
              !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return;

As a result of this extra call, VPMU no longer works for PV guests on Intel
because we effectively lose the value of MSR_CORE_PERF_GLOBAL_STATUS.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: fix asm() constraint for GS selector update
Jan Beulich [Thu, 26 Oct 2017 07:57:31 +0000 (01:57 -0600)]
x86: fix asm() constraint for GS selector update

Exception fixup code may alter the operand, which ought to be reflected
in the constraint.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: don't latch wrong (stale) GS base addresses
Jan Beulich [Thu, 26 Oct 2017 07:57:04 +0000 (01:57 -0600)]
x86: don't latch wrong (stale) GS base addresses

load_segments() writes selector registers before doing any of the base
address updates. Any of these selector loads can cause a page fault in
case it references the LDT, and the LDT page accessed was only recently
installed. Therefore the call tree map_ldt_shadow_page() ->
guest_get_eff_kern_l1e() -> toggle_guest_mode() would in such a case
wrongly latch the outgoing vCPU's GS.base into the incoming vCPU's
recorded state.

Split page table toggling from GS handling - neither
guest_get_eff_kern_l1e() nor guest_io_okay() need more than the page
tables being the kernel ones for the memory access they want to do.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agolibxc: remove stale error check for domain size in xc_sr_save_x86_hvm.c
Juergen Gross [Tue, 26 Sep 2017 12:02:56 +0000 (14:02 +0200)]
libxc: remove stale error check for domain size in xc_sr_save_x86_hvm.c

Long ago domains to be saved were limited to 1TB size due to the
migration stream v1 limitations which used a 32 bit value for the
PFN and the frame type (4 bits) leaving only 28 bits for the PFN.
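
The 1TB figure follows directly from the field widths. The arithmetic is
sketched below, assuming 4 KiB pages; max_domain_bytes() is a hypothetical
helper written for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* 4 KiB pages (assumption) */

/* Largest guest size, in bytes, addressable with a PFN field this wide. */
static uint64_t max_domain_bytes(unsigned int pfn_bits)
{
    assert(pfn_bits + PAGE_SHIFT < 64);
    return (uint64_t)1 << (pfn_bits + PAGE_SHIFT);
}
```

With 32 - 4 = 28 PFN bits this gives 2^28 pages of 2^12 bytes each, i.e.
2^40 bytes.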

Migration stream V2 uses a 64 bit value for this purpose, so there
is no need to refuse saving (or migrating) domains larger than 1 TB.

For 32 bit toolstacks there is still a size limit, as domains larger
than about 1TB will lead to an exhausted virtual address space of the
saving process. So keep the test for 32 bit, but don't base it on the
page type macros. As a migration could lead to the situation where a
32 bit toolstack would have to handle such a large domain (in case the
sending side is 64 bit) the same test should be added for restoring a
domain.
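
The 1TB figure follows directly from the field widths. A small worked calculation, assuming 4 KiB pages (the page size is not stated in the message, and the helper names are made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12 /* assumption: 4 KiB pages */

/* Largest domain size in bytes addressable with 'pfn_bits' bits of PFN. */
uint64_t max_domain_bytes(unsigned int pfn_bits)
{
    assert(pfn_bits + PAGE_SHIFT_SKETCH < 64);
    return (uint64_t)1 << (pfn_bits + PAGE_SHIFT_SKETCH);
}

/* v1 stream: 32 bit field minus 4 bits of frame type = 28 bits of PFN. */
uint64_t v1_limit_bytes(void)
{
    return max_domain_bytes(32 - 4);
}
```

Here `v1_limit_bytes()` evaluates to 2^28 pages of 2^12 bytes each, i.e. 2^40 bytes, which is the 1TB limit mentioned above.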

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agogcov: fix typos in documentation
Roger Pau Monne [Thu, 26 Oct 2017 08:47:31 +0000 (09:47 +0100)]
gcov: fix typos in documentation

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoConfig.mk: update mini-os changeset
Wei Liu [Fri, 20 Oct 2017 11:10:02 +0000 (12:10 +0100)]
Config.mk: update mini-os changeset

The new changeset contains the new console.h fix in xen.git.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoxenalyze: fix compilation
Roger Pau Monne [Mon, 23 Oct 2017 16:28:32 +0000 (17:28 +0100)]
xenalyze: fix compilation

Recent changes in xenalyze introduced INT_MIN without also adding the
required header. Fix this by adding the header.
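
For reference, INT_MIN is defined by the standard header <limits.h>. A minimal sketch of the pattern, where `max_of` is a hypothetical helper and not code from xenalyze:

```c
#include <limits.h> /* provides INT_MIN and INT_MAX */

/* Hypothetical helper: running maximum, seeded with INT_MIN. */
int max_of(const int *vals, unsigned int n)
{
    int best = INT_MIN;

    for ( unsigned int i = 0; i < n; i++ )
        if ( vals[i] > best )
            best = vals[i];

    return best;
}
```

Without the include, a build only works if some other header happens to pull in <limits.h> indirectly.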

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: also show FS/GS base addresses when dumping registers
Jan Beulich [Tue, 24 Oct 2017 16:13:13 +0000 (18:13 +0200)]
x86: also show FS/GS base addresses when dumping registers

Their state may be important to figure out the reason for a crash. To not
further grow duplicate code, break out a helper function.

I realize that (ab)using the control register array here may not be
considered the nicest solution, but it seems easier (and less overall
overhead) to do so compared to the alternative of introducing another
helper structure.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: fix GS-base-dirty determination
Jan Beulich [Tue, 24 Oct 2017 16:12:31 +0000 (18:12 +0200)]
x86: fix GS-base-dirty determination

load_segments() writes the two MSRs in their "canonical" positions
(GS_BASE for the user base, SHADOW_GS_BASE for the kernel one) and uses
SWAPGS to switch them around if the incoming vCPU is in kernel mode. In
order to not leave a stale kernel address in GS_BASE when the incoming
guest is in user mode, the check on the outgoing vCPU needs to be
dependent upon the mode it is currently in, rather than blindly looking
at the user base.
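
A rough sketch of a mode-dependent check follows. The structure, field and function names are hypothetical, and it assumes that "dirty" means a non-zero base is left behind in GS_BASE by the outgoing vCPU:

```c
#include <stdbool.h>
#include <stdint.h>

struct vcpu_gs_sketch {
    bool in_kernel;          /* mode the outgoing vCPU is currently in */
    uint64_t gs_base_user;
    uint64_t gs_base_kernel;
};

/* The base that is live in GS_BASE depends on the current mode. */
uint64_t live_gs_base(const struct vcpu_gs_sketch *v)
{
    return v->in_kernel ? v->gs_base_kernel : v->gs_base_user;
}

/* GS_BASE is considered dirty if the live value is non-zero. */
bool gs_base_dirty(const struct vcpu_gs_sketch *v)
{
    return live_gs_base(v) != 0;
}
```

A vCPU in kernel mode with a zero user base but a non-zero kernel base is reported dirty here, which is the case a check that only looks at the user base would miss.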

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/vpt: guarantee the return value of pt_update_irq() set in vIRR or PIR
Chao Gao [Tue, 24 Oct 2017 14:02:40 +0000 (16:02 +0200)]
x86/vpt: guarantee the return value of pt_update_irq() set in vIRR or PIR

pt_update_irq() is expected to return the vector number of the periodic
timer interrupt, which should be set in the vIRR of the vlapic or in PIR.
Otherwise the assertion in vmx_intr_assist() would trigger; see
https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg00915.html.

But it fails to achieve that in the following two cases:
1. hvm_isa_irq_assert() may not set the corresponding bit in vIRR when the
mask field of the IOAPIC RTE is set. Please refer to the call tree
vmx_intr_assist() -> pt_update_irq() -> hvm_isa_irq_assert() ->
assert_irq() -> assert_gsi() -> vioapic_irq_positive_edge(). The patch
checks whether the vector is set or not in vIRR of vlapic or PIR before
returning.

2. Someone changes the vector field of the IOAPIC RTE between asserting
the irq and getting the vector of the irq, leading to setting the
old vector number but returning a different vector number. This patch
allows hvm_isa_irq_assert() to accept a callback which can get the
interrupt vector with irq_lock held. Thus, no one can change the vector
between the two operations.
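
The two fixes can be sketched together in a simplified model. All names below are hypothetical, the lock is reduced to a flag, and only a single RTE is modelled; this is not the actual hvm_isa_irq_assert() interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct irq_sketch {
    bool locked;        /* stand-in for irq_lock */
    bool rte_masked;    /* mask bit of the (single) RTE */
    uint8_t rte_vector; /* vector field of the RTE */
    uint8_t virr[32];   /* 256-bit pending-vector bitmap */
};

typedef int (*vector_cb)(const struct irq_sketch *s);

/* Callback: read the vector; only valid while the lock is held. */
int read_vector_locked(const struct irq_sketch *s)
{
    assert(s->locked);
    return s->rte_vector;
}

bool virr_test(const struct irq_sketch *s, uint8_t vec)
{
    return (s->virr[vec / 8] & (1u << (vec % 8))) != 0;
}

/* Assert the IRQ and, under the same lock, report the vector in use. */
int irq_assert_sketch(struct irq_sketch *s, vector_cb cb)
{
    int vec = -1;

    s->locked = true;
    if ( !s->rte_masked )
        s->virr[s->rte_vector / 8] |= (uint8_t)(1u << (s->rte_vector % 8));
    if ( cb )
        vec = cb(s);    /* vector cannot change between assert and read */
    s->locked = false;

    return vec;
}

/* Return the vector only if it is actually pending, else -1. */
int pt_update_irq_sketch(struct irq_sketch *s)
{
    int vec = irq_assert_sketch(s, read_vector_locked);

    return (vec >= 0 && virr_test(s, (uint8_t)vec)) ? vec : -1;
}
```

With the RTE masked, the model returns -1 rather than a vector that was never made pending; with it unmasked, the returned vector is the one that was set, because both steps happen inside one locked region.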

Also, the first argument of pi_test_and_set_pir() should be uint8_t;
take this chance to fix it.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agognttab: fix pin count / page reference race
Jan Beulich [Tue, 24 Oct 2017 14:01:33 +0000 (16:01 +0200)]
gnttab: fix pin count / page reference race

Dropping page references before decrementing pin counts is a bad idea
if assumptions are being made that a non-zero pin count implies a valid
page. Fix the order of operations in gnttab_copy_release_buf(), but at
the same time also remove the assertion that was found to trigger:
map_grant_ref() also has the potential of causing a race here, and
changing the order of operations there would likely be quite a bit more
involved.
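
The ordering concern can be shown with a single-threaded toy model. The structures and names are assumptions for illustration and do not reflect the real grant table data structures; the sketch only shows the release order in which "pin count non-zero implies a referenced page" stays true at every intermediate step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page_sketch  { int refs; };
struct grant_sketch { int pin; struct page_sketch *page; };

/* Assumption other code may make: pinned implies the page is referenced. */
bool invariant_holds(const struct grant_sketch *g)
{
    return g->pin == 0 || (g->page != NULL && g->page->refs > 0);
}

/* Release in the safe order: unpin first, drop the page reference last. */
void release_buf_sketch(struct grant_sketch *g)
{
    assert(g->pin > 0 && g->page != NULL && g->page->refs > 0);

    g->pin--;           /* 1. the pin count goes down first */
    g->page->refs--;    /* 2. only then is the reference dropped */
    if ( g->pin == 0 )
        g->page = NULL;
}
```

Swapping steps 1 and 2 would open a window in which the pin count is still non-zero while the page reference is already gone, which is the window the commit closes in gnttab_copy_release_buf().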

This is CVE-2017-15597 / XSA-236.

Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agolibxl: Replace open-coded __attribute__ with NN() macro
Ian Jackson [Fri, 20 Oct 2017 10:42:42 +0000 (11:42 +0100)]
libxl: Replace open-coded __attribute__ with NN() macro

Inspired by
  #define __nonnull(...) __attribute__((__nonnull__(__VA_ARGS__)))
which is used in the hypervisor.

These annotations may well become very common in libxl, so we choose a
short name.
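
A sketch of what such a macro and its use could look like. The macro body is modelled on the __nonnull definition quoted above and is an assumption about the libxl version; `enum_from_string_sketch` is a hypothetical function, not the libxl one.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of the short macro, modelled on __nonnull above. */
#define NN(...) __attribute__((__nonnull__(__VA_ARGS__)))

/* Hypothetical lookup: arguments 1 ('s') and 2 ('table') must not be NULL. */
NN(1, 2)
int enum_from_string_sketch(const char *s, const char *const *table, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
        if ( !strcmp(s, table[i]) )
            return (int)i;

    return -1;
}
```

With the annotation in place, GCC and Clang can warn when a literal NULL is passed for an annotated argument, and static analysers can use the same information.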

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agodocs: update coverage.markdown
Wei Liu [Fri, 20 Oct 2017 16:30:41 +0000 (17:30 +0100)]
docs: update coverage.markdown

Coverage support in the hypervisor has been redone. Update the document
accordingly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agolibxl: annotate s to be nonnull in libxl__enum_from_string
Wei Liu [Mon, 16 Oct 2017 14:04:10 +0000 (15:04 +0100)]
libxl: annotate s to be nonnull in libxl__enum_from_string

Hopefully this placates Coverity.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/Makefile: unset MAKELEVEL before building QEMU
Anthony PERARD [Thu, 19 Oct 2017 14:29:56 +0000 (15:29 +0100)]
tools/Makefile: unset MAKELEVEL before building QEMU

Since QEMU commit aef45d51d1204f3335fb99de6658e0c5612c2b67
"build: automatically handle GIT submodule checkout for dtc",
the QEMU makefiles rely on the variable MAKELEVEL to decide whether to
update some git submodules. Since we call the QEMU build from within the
Xen one, MAKELEVEL would already be greater than 0, so the git submodules
would not be updated and QEMU would fail to build.

Fix this by removing MAKELEVEL from the environment before trying to
build QEMU.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agogcov: support gcc 7.x
Jan Beulich [Fri, 20 Oct 2017 07:31:54 +0000 (09:31 +0200)]
gcov: support gcc 7.x

Taking Linux commit 0538421343 ("gcov: support GCC 7.1") as reference,
enable gcc 7 support requiring __gcov_exit() and having 9 counters.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>