xenbits.xensource.com Git - xen.git/log
7 years ago memory: don't suppress P2M update in populate_physmap()
Jan Beulich [Tue, 20 Jun 2017 15:07:12 +0000 (17:07 +0200)]
memory: don't suppress P2M update in populate_physmap()

Commit d18627583d ("memory: don't hand MFN info to translated guests")
wrongly added a null-handle check there - just like stated in its
description for memory_exchange(), the array is also an input for
populate_physmap() (and hence can't reasonably be null). I have no idea
how I've managed to overlook this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: b964e3106d2cdaa11cc4524181ff14607d110ae4
master date: 2017-06-20 14:51:53 +0200

7 years ago xen/arm: vgic: Sanitize target mask used to send SGI
Julien Grall [Tue, 20 Jun 2017 13:48:42 +0000 (15:48 +0200)]
xen/arm: vgic: Sanitize target mask used to send SGI

The current function vgic_to_sgi does not sanitize the target mask and
may therefore get an invalid vCPU ID. This will result in an out-of-bounds
access of d->vcpu[...] as there is no check whether the vCPU ID is
within the maximum supported by the guest.

This was introduced by commit ea37fd2111 "xen/arm: split vgic driver
into generic and vgic-v2 driver".
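
A minimal sketch of the kind of sanitizing described above, not the actual
Xen change. `struct toy_domain`, `send_sgi_to_mask()`, the 8-bit mask width
and the choice to reject (rather than ignore) invalid bits are all
assumptions made for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_VCPUS 8          /* hypothetical array capacity */

struct toy_domain {
    unsigned int max_vcpus;      /* vCPUs the guest really has (<= 8) */
    bool sgi_pending[TOY_MAX_VCPUS];
};

/*
 * Deliver an SGI to every vCPU named in target_mask.
 * Returns false, without touching any vCPU, if the mask names a vCPU ID
 * at or beyond d->max_vcpus.
 */
static bool send_sgi_to_mask(struct toy_domain *d, uint8_t target_mask)
{
    /* Sanitize first: reject any bit outside the valid vCPU range. */
    for ( unsigned int id = 0; id < 8; id++ )
        if ( (target_mask & (1u << id)) && id >= d->max_vcpus )
            return false;

    /* Only now index the per-vCPU array. */
    for ( unsigned int id = 0; id < d->max_vcpus; id++ )
        if ( target_mask & (1u << id) )
            d->sgi_pending[id] = true;

    return true;
}
```

With `max_vcpus == 2`, a mask of 0x3 is accepted while a mask of 0x4 is
refused before any array access takes place.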

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 6fb94196730f30929f1e617fd1d05daf55376664
master date: 2017-06-20 14:47:07 +0200

7 years ago gnttab: __gnttab_unmap_common_complete() is all-or-nothing
Jan Beulich [Tue, 20 Jun 2017 13:48:11 +0000 (15:48 +0200)]
gnttab: __gnttab_unmap_common_complete() is all-or-nothing

All failures have to be detected in __gnttab_unmap_common(), the
completion function must not skip part of its processing. In particular
the GNTMAP_device_map related putting of page references and adjustment
of pin count must not occur if __gnttab_unmap_common() signaled an
error. Furthermore the function must not make adjustments to global
state (here: clearing GNTTAB_device_map) before all possibly failing
operations have been performed.

There's one exception for IOMMU related failures: As IOMMU manipulation
occurs after GNTMAP_*_map have been cleared already, the related page
reference and pin count adjustments need to be done nevertheless. A
fundamental requirement for the correctness of this is that
iommu_{,un}map_page() crash any affected DomU in case of failure.

The version check appears to be pointless (or could perhaps be a
BUG_ON() or ASSERT()), but for the moment also move it.

This is part of XSA-224.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 11fc7ccb7217ccfb79edb727d1349618bcb0602d
master date: 2017-06-20 14:46:47 +0200

7 years ago gnttab: correct logic to get page references during map requests
George Dunlap [Tue, 20 Jun 2017 13:47:46 +0000 (15:47 +0200)]
gnttab: correct logic to get page references during map requests

The rules for reference counting are somewhat complicated:

* Each of GNTTAB_host_map and GNTTAB_device_map needs its own
  reference count

* If the mapping is writeable:
 - GNTTAB_host_map needs a type count under only some conditions
 - GNTTAB_device_map always needs a type count

If the mapping succeeds, we need to keep all of these; if the mapping
fails, we need to release whatever references we have acquired so far.

Additionally, the code that does a lot of this calculation "inherits"
a reference as part of the process of finding out who the owner is.

Finally, if the grant is mapped as writeable (without the
GNTMAP_readonly flag), but the hypervisor cannot grab a
PGT_writeable_page type, the entire operation should fail.

Unfortunately, the current code has several logic holes:

* If a grant is mapped only GNTTAB_device_map, and with a writeable
  mapping, but in conditions where a *host* type count is not
  necessary, the code will fail to grab the necessary type count.

* If a grant is mapped both GNTTAB_device_map and GNTTAB_host_map,
  with a writeable mapping, in conditions where the host type count is
  not necessary, *and* where the page cannot be changed to type
  PGT_writeable, the condition will not be detected.

In both cases, this means that on success, the type count will be
erroneously reduced when the grant is unmapped.  In the second case,
the type count will be erroneously reduced on the failure path as
well.  (In the first case the failure path logic has the same hole
as the reference grabbing logic.)

Additionally, the return value of get_page() is not checked; but this
may fail even if the first get_page() succeeded due to a reference
counting overflow.

First of all, simplify the restoration logic by explicitly counting
the reference and type references acquired.

Consider each mapping type separately, explicitly marking the
'incoming' reference as used so we know when we need to grab a second
one.

Finally, always check the return value of get_page[_type]() and go to
the failure path if appropriate.
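
The approach of counting acquired references explicitly can be sketched as
below. This is a simplified model under stated assumptions, not the code
from the patch: `struct toy_page`, `toy_get_page()`, `toy_get_page_type()`
and `toy_map()` are hypothetical, and the caller is assumed to own (and, on
failure, to drop) the "incoming" reference.

```c
#include <stdbool.h>

/* Toy page with bounded counters, standing in for real page refcounts. */
struct toy_page {
    unsigned int refs, type_refs;
    unsigned int max_refs;       /* toy_get_page() fails at this count */
    bool writable_ok;            /* can the page take a writable type? */
};

static bool toy_get_page(struct toy_page *pg)
{
    if ( pg->refs >= pg->max_refs )
        return false;            /* models a reference count overflow */
    pg->refs++;
    return true;
}

static bool toy_get_page_type(struct toy_page *pg)
{
    if ( !pg->writable_ok )
        return false;
    pg->type_refs++;
    return true;
}

/* Handle each mapping kind separately, counting what was acquired. */
static bool toy_map(struct toy_page *pg, bool host, bool device,
                    bool writable, bool host_needs_type)
{
    unsigned int refs = 0, typecnt = 0;
    bool incoming_used = false;

    if ( device )
    {
        incoming_used = true;    /* device mapping uses the incoming ref */
        if ( writable )          /* device mappings always need a type ref */
        {
            if ( !toy_get_page_type(pg) )
                goto fail;
            typecnt++;
        }
    }

    if ( host )
    {
        if ( !incoming_used )
            incoming_used = true;
        else if ( !toy_get_page(pg) )   /* second ref: check the result */
            goto fail;
        else
            refs++;

        if ( writable && host_needs_type )
        {
            if ( !toy_get_page_type(pg) )
                goto fail;
            typecnt++;
        }
    }

    return true;

 fail:
    /* Release exactly what this function acquired, nothing more. */
    pg->refs -= refs;
    pg->type_refs -= typecnt;
    return false;
}
```

Because the failure path only consults the two local counters, it cannot
release a reference that was never taken, whichever step failed.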

This is part of XSA-224.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 75b384ece635adc55c2bafbdc2d8959c10542c31
master date: 2017-06-20 14:46:21 +0200

7 years ago gnttab: never create host mapping unless asked to
Jan Beulich [Tue, 20 Jun 2017 13:47:19 +0000 (15:47 +0200)]
gnttab: never create host mapping unless asked to

We shouldn't create a host mapping unless asked to even in the case of
mapping a granted MMIO page. In particular the mapping wouldn't be torn
down when processing the matching unmap request.

This is part of XSA-224.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 56f2ab5b970f1b18cf2019df4bf27db544cda6ea
master date: 2017-06-20 14:46:01 +0200

7 years ago gnttab: fix handling of dev_bus_addr during unmap
George Dunlap [Tue, 20 Jun 2017 13:46:50 +0000 (15:46 +0200)]
gnttab: fix handling of dev_bus_addr during unmap

If a grant has been mapped with the GNTTAB_device_map flag, calling
grant_unmap_ref() with dev_bus_addr set to zero should cause the
GNTTAB_device_map part of the mapping to be left alone.

Unfortunately, at the moment, op->dev_bus_addr is implicitly checked
before clearing the map and adjusting the pin count, but only the bits
above 12; and it is not checked at all before dropping page
references.  This means a guest can repeatedly make such a call to
cause the reference count to drop to zero, causing the page to be
freed and re-used, even though it's still mapped in its pagetables.

To fix this, always check op->dev_bus_addr explicitly for being
non-zero, as well as op->flag & GNTMAP_device_map, before doing
operations on the device_map.

While we're here, make the logic a bit cleaner:

* Always initialize op->frame to zero and set it from act->frame, to
  reduce the chance of untrusted input being used

* Explicitly check the full dev_bus_addr against act->frame <<
  PAGE_SHIFT, rather than ignoring the lower 12 bits
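
The explicit checks listed above might look roughly like this. The sketch
is illustrative only: `should_unmap_device()`, `TOY_MAP_DEVICE` and
`TOY_PAGE_SHIFT` are hypothetical names, and a real implementation would
presumably report an address mismatch as an error rather than just
returning false.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_MAP_DEVICE  (1u << 0)   /* stand-in for GNTMAP_device_map */

/*
 * Decide whether an unmap request should touch the device mapping.
 *   map_flags:    flags recorded for the existing mapping
 *   dev_bus_addr: value supplied by the guest in the unmap request
 *   frame:        frame number recorded by the hypervisor for the grant
 */
static bool should_unmap_device(unsigned int map_flags,
                                uint64_t dev_bus_addr, uint64_t frame)
{
    if ( !dev_bus_addr )                    /* zero: leave device map alone */
        return false;
    if ( !(map_flags & TOY_MAP_DEVICE) )    /* no device mapping to remove */
        return false;
    /* Compare the full address, including the low 12 bits. */
    return dev_bus_addr == (frame << TOY_PAGE_SHIFT);
}
```

Note that the frame used in the comparison comes from hypervisor state, not
from the request, in line with the first bullet above.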

This is part of XSA-224.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 8fdfcb2b6bcd074776560e76843815f124d587f1
master date: 2017-06-20 14:45:33 +0200

7 years ago arm: vgic: Don't update the LR when the IRQ is not enabled
Julien Grall [Tue, 20 Jun 2017 13:46:38 +0000 (15:46 +0200)]
arm: vgic: Don't update the LR when the IRQ is not enabled

gic_raise_inflight_irq will be called if the IRQ is already inflight
(i.e. the IRQ is injected to the guest). If the IRQ is already in
the LRs, then the associated LR will be updated.

To know if the interrupt is already in the LR, the function checks if the
interrupt is queued. However, if the interrupt is not enabled then the
interrupt may not be queued nor in the LR. So gic_update_one_lr may be
called (if we inject on the current vCPU) and read the LR.

Because the interrupt is not in the LR, Xen will either read:
    * LR 0 if the interrupt was never injected before
    * LR 255 (GIC_INVALID_LR) if the interrupt was injected once. This
    is because gic_update_one_lr will reset p->lr.

Reading LR 0 may result in updating the wrong interrupt and in the LRs
no longer being kept in sync with Xen.

Reading LR 255 will result in:
    * Crashing Xen on GICv3 as the LR index is bigger than supported (see
    gicv3_ich_read_lr).
    * Always reading/writing GICH_LR + 255 * 4, which is not part of the
    mapped memory.

The problem can be prevented by checking whether the interrupt is
enabled in gic_raise_inflight_irq before calling gic_update_one_lr.

A follow-up of this patch is expected to mitigate the issue in the
future.

This is XSA-223.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: c84e4b2dd4050ef3eecc13fcfa6842373ba4519c
master date: 2017-06-20 14:41:55 +0200

7 years ago guest_physmap_remove_page() needs its return value checked
Jan Beulich [Tue, 20 Jun 2017 13:45:55 +0000 (15:45 +0200)]
guest_physmap_remove_page() needs its return value checked

Callers, in particular those subsequently freeing the page, must not
blindly assume success - the function may fail when it needs to shatter a
super page but no memory is available for the then needed intermediate
page table.

As it happens, guest_remove_page() callers now also all check the
return value.

Furthermore a missed put_gfn() on an error path in gnttab_transfer() is
also being taken care of.

This is part of XSA-222.

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a0cce6048d010a30ac82f8db7787bbf9aada64f4
master date: 2017-06-20 14:41:16 +0200

7 years ago memory: fix return value handling of guest_remove_page()
Andrew Cooper [Tue, 20 Jun 2017 13:44:53 +0000 (15:44 +0200)]
memory: fix return value handling of guest_remove_page()

Despite the description in mm.h, guest_remove_page() previously returned 0 for
paging errors.

Switch guest_remove_page() to having regular 0/-error semantics, and propagate
the return values from clear_mmio_p2m_entry() and mem_sharing_unshare_page()
to the callers (although decrease_reservation() is the only caller which
currently cares).

This is part of XSA-222.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b614f642c35da5184416787352f51a6379a92628
master date: 2017-06-20 14:39:56 +0200

7 years ago evtchn: avoid NULL derefs
Jan Beulich [Tue, 20 Jun 2017 13:44:11 +0000 (15:44 +0200)]
evtchn: avoid NULL derefs

Commit fbbd5009e6 ("evtchn: refactor low-level event channel port ops")
added a de-reference of the struct evtchn pointer for a port without
first making sure the bucket pointer is non-NULL. This de-reference is
actually entirely unnecessary, as all relevant callers (beyond the
problematic do_poll()) already hold the port number in their hands, and
the actual leaf functions need nothing else.

For FIFO event channels there's a second problem in that the ordering
of reads and updates to ->num_evtchns and ->event_array[] was so far
undefined (the read side isn't always holding the domain's event lock).
Add respective barriers.

This is XSA-221.

Reported-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: e7719a0dfac7a20cb7da5529e09773d8271bb78b
master date: 2017-06-20 14:37:47 +0200

7 years ago x86: avoid leaking PKRU and BND* between vCPU-s
Jan Beulich [Tue, 20 Jun 2017 13:43:13 +0000 (15:43 +0200)]
x86: avoid leaking PKRU and BND* between vCPU-s

PKRU is explicitly "XSAVE-managed but not XSAVE-enabled", so guests
might access the register (via {RD,WR}PKRU) without setting XCR0.PKRU.
Force context switching as well as migrating the register as soon as
CR4.PKE is being set the first time.

For MPX (BND<n>, BNDCFGU, and BNDSTATUS) the situation is less clear,
and the SDM has not entirely consistent information for that case.
While experimentally the instructions don't change register state as
long as the two XCR0 bits aren't both 1, be on the safe side and enable
both if BNDCFGS.EN is being set the first time.

This is XSA-220.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: de20bb6c4f65c4161e0931402613f9ffac86302d
master date: 2017-06-20 14:36:51 +0200

7 years ago x86/shadow: hold references for the duration of emulated writes
Andrew Cooper [Tue, 20 Jun 2017 13:42:01 +0000 (15:42 +0200)]
x86/shadow: hold references for the duration of emulated writes

The (misnamed) emulate_gva_to_mfn() function translates a linear address to an
mfn, but releases its page reference before returning the mfn to its caller.

sh_emulate_map_dest() uses the results of one or two translations to construct
a virtual mapping to the underlying frames, completes an emulated
write/cmpxchg, then unmaps the virtual mappings.

The page references need holding until the mappings are unmapped, or the
frames can change ownership before the writes occur.

This is XSA-219.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 26217aff67ae1538d4e1b2226afab6993cdbe772
master date: 2017-06-20 14:36:11 +0200

7 years ago gnttab: correct maptrack table accesses
Jan Beulich [Tue, 20 Jun 2017 13:41:36 +0000 (15:41 +0200)]
gnttab: correct maptrack table accesses

In order to observe a consistent (limit,pointer-table) pair, the reader
needs to either hold the maptrack lock (in line with documentation) or
both sides need to order their accesses suitably (the writer side
barrier was removed by commit dff515dfea ["gnttab: use per-VCPU
maptrack free lists"], and a read side barrier has never been there).

Make the writer publish a new table page before limit (for bounds
checks to work), and new list head last (for racing maptrack_entry()
invocations to work). At the same time add read barriers to lockless
readers.
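
The publish ordering for the (limit, pointer-table) pair can be illustrated
with C11 atomics. This is only a sketch: Xen uses its own barrier
primitives rather than <stdatomic.h>, `struct toy_table`, `toy_grow()` and
`toy_entry()` are hypothetical, the list-head update is left out, and
writers are assumed to be serialized by a lock that is not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES     4
#define TOY_PER_PAGE  16

struct toy_table {
    int *pages[TOY_PAGES];       /* pointer table */
    atomic_uint limit;           /* number of valid entries */
};

/* Writer: make the new page visible before raising the limit. */
static bool toy_grow(struct toy_table *t, int *new_page)
{
    unsigned int old = atomic_load_explicit(&t->limit, memory_order_relaxed);

    if ( old / TOY_PER_PAGE >= TOY_PAGES )
        return false;            /* table is full */

    t->pages[old / TOY_PER_PAGE] = new_page;
    /* Release: the pointer store above is ordered before the new limit. */
    atomic_store_explicit(&t->limit, old + TOY_PER_PAGE,
                          memory_order_release);
    return true;
}

/* Lockless reader: bounds-check against limit, then follow the pointer. */
static int *toy_entry(struct toy_table *t, unsigned int handle)
{
    /* Acquire pairs with the writer's release store. */
    unsigned int limit = atomic_load_explicit(&t->limit,
                                              memory_order_acquire);

    if ( handle >= limit )
        return NULL;
    return &t->pages[handle / TOY_PER_PAGE][handle % TOY_PER_PAGE];
}
```

A reader that sees the raised limit is thereby guaranteed to also see the
page pointer that the limit covers, which is what makes the bounds check
meaningful.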

Additionally get_maptrack_handle() must not assume ->maptrack_head to
not change behind its back: Another handle may be put (updating only
->maptrack_tail) and then got or stolen (updating ->maptrack_head).

This is part of XSA-218.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 4b78efa91c8ae3c42e14b8eaeaad773c5eb3b71a
master date: 2017-06-20 14:34:34 +0200

7 years ago gnttab: Avoid potential double-put of maptrack entry
George Dunlap [Tue, 20 Jun 2017 13:41:07 +0000 (15:41 +0200)]
gnttab: Avoid potential double-put of maptrack entry

Each grant mapping for a particular domain is tracked by an in-Xen
"maptrack" entry.  This entry is referenced by a "handle", which is
given to the guest when it calls gnttab_map_grant_ref().

There are two types of mapping a particular handle can refer to:
GNTMAP_host_map and GNTMAP_device_map.  A given
gnttab_unmap_grant_ref() call can remove either only one or both of
these entries.  When a particular handle has no entries left, it must
be freed.

gnttab_unmap_grant_ref() loops through its grant unmap request list
twice.  It first removes entries from any host pagetables and (if
appropriate) iommus; then it does a single domain TLB flush; then it
does the clean-up, including telling the granter that entries are no
longer being used (if appropriate).

At the moment, it's during the first pass that the maptrack flags are
cleared, but the second pass that the maptrack entry is freed.

Unfortunately this allows the following race, which results in a
double-free:

 A: (pass 1) clear host_map
 B: (pass 1) clear device_map
 A: (pass 2) See that maptrack entry has no mappings, free it
 B: (pass 2) See that maptrack entry has no mappings, free it #

Unfortunately, unlike the active entry pinning update, we can't simply
move the maptrack flag changes to the second half, because the
maptrack flags are used to determine if iommu entries need to be
added: a domain's iommu must never have fewer permissions than the
maptrack flags indicate, or a subsequent map_grant_ref() might fail to
add the necessary iommu entries.

Instead, free the maptrack entry in the first pass if there are no
further mappings.
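
The idea of deciding about the free at the moment the flag is cleared can
be modelled as follows. This is a toy model, not the patch itself: it uses
a C11 atomic where the real code may well rely on locking, and
`struct toy_maptrack`, `toy_unmap_pass1()` and the `TOY_*` flags are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_HOST_MAP    (1u << 0)   /* stand-in for GNTMAP_host_map */
#define TOY_DEVICE_MAP  (1u << 1)   /* stand-in for GNTMAP_device_map */

struct toy_maptrack {
    atomic_uint flags;              /* mappings the handle still has */
    unsigned int frees;             /* how often the entry was freed */
};

/*
 * First pass of an unmap: clear the requested mapping flag and, if this
 * call removed the last remaining mapping, free the entry right away.
 * Returns true if this caller freed the entry.
 */
static bool toy_unmap_pass1(struct toy_maptrack *e, unsigned int flag)
{
    /*
     * fetch_and returns the previous value, so only one caller can
     * observe the transition from "some mapping" to "no mapping".
     */
    unsigned int old = atomic_fetch_and(&e->flags, ~flag);

    if ( (old & flag) && !(old & ~flag) )
    {
        e->frees++;
        return true;
    }
    return false;
}
```

In the A/B interleaving shown earlier, whichever caller clears the second
flag frees the entry; the other sees a remaining mapping and does nothing.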

This is part of XSA-218.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: b7f6cbb9d43f7384e1f38f8764b9a48216c8a525
master date: 2017-06-20 14:33:13 +0200

7 years ago gnttab: fix unmap pin accounting race
Jan Beulich [Tue, 20 Jun 2017 13:40:19 +0000 (15:40 +0200)]
gnttab: fix unmap pin accounting race

Once all {writable} mappings of a grant entry have been unmapped, the
hypervisor informs the guest that the grant entry has been released by
clearing the _GTF_{reading,writing} usage flags in the guest's grant
table as appropriate.

Unfortunately, at the moment, the code that updates the accounting
happens in a different critical section than the one which updates the
usage flags; this means that under the right circumstances, there may be
a window in time after the hypervisor reported the grant as being free
during which the grant referee still had access to the page.

Move the grant accounting code into the same critical section as the
reporting code to make sure this kind of race can't happen.

This is part of XSA-218.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 9a0bd460cfc28564d39fa23541bb872b13e7f7ea
master date: 2017-06-20 14:32:03 +0200

7 years ago x86/mm: disallow page stealing from HVM domains
Jan Beulich [Tue, 20 Jun 2017 13:39:16 +0000 (15:39 +0200)]
x86/mm: disallow page stealing from HVM domains

The operation's success can't be controlled by the guest, as the device
model may have an active mapping of the page. If we nevertheless
permitted this operation, we'd have to add further TLB flushing to
prevent scenarios like

"Domains A (HVM), B (PV), C (PV); B->target==A
 Steps:
 1. B maps page X from A as writable
 2. B unmaps page X without a TLB flush
 3. A sends page X to C via GNTTABOP_transfer
 4. C maps page X as pagetable (potentially causing a TLB flush in C,
 but not in B)

 At this point, X would be mapped as a pagetable in C while being
 writable through a stale TLB entry in B."

A similar scenario could be constructed for A using XENMEM_exchange and
some arbitrary PV domain C then having this page allocated.

This is XSA-217.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
master commit: fae7d5be8bb8b7a5b7005c4f3b812a47661a721e
master date: 2017-06-20 14:29:51 +0200

7 years ago tools: fix several "format-truncation" warnings with GCC 7
Zhongze Liu [Wed, 14 Jun 2017 01:11:48 +0000 (09:11 +0800)]
tools: fix several "format-truncation" warnings with GCC 7

GCC 7.1.1 complains that several buffers passed to snprintf() in xenpmd
and tools/ocaml/xc are too small to hold the largest possible resulting string,
which is calculated by adding up the maximum length of all the substrings.

The warnings are treated as errors by -Werror, and go like this (abbreviated):

xenpmd.c:94:36: error: ‘%s’ directive output may be truncated writing up to
255 bytes into a region of size 13 [-Werror=format-truncation=]
     #define BATTERY_INFO_FILE_PATH "/proc/acpi/battery/%s/info"
                                    ^
xenpmd.c:113:13: note: ‘snprintf’ output between 25 and 280 bytes into a
destination of size 32

xenpmd.c:95:37: error: ‘%s’ directive output may be truncated writing up to
255 bytes into a region of size 13 [-Werror=format-truncation=]
     #define BATTERY_STATE_FILE_PATH "/proc/acpi/battery/%s/state"
                                     ^
xenpmd.c:116:13: note: ‘snprintf’ output between 26 and 281 bytes into a
destination of size 32

xenctrl_stubs.c:65:15: error: ‘%s’ directive output may be truncated writing
up to 1023 bytes into a region of size 252 [-Werror=format-truncation=]
      "%d: %s: %s", error->code,
               ^~
xenctrl_stubs.c:64:4: note: ‘snprintf’ output 5 or more bytes (assuming 1028)
into a destination of size 256

Enlarge the size of these buffers as suggested by the compiler
(and slightly rounded) to fix the warnings.
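
For illustration, a small sketch of how truncation by snprintf() can be
detected at run time. This complements, and is not, the change made here
(which only enlarges the buffers). `toy_battery_path()` and `TOY_INFO_PATH`
are hypothetical names; the template is modelled on the one quoted in the
warning above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical path template, modelled on the one in the warning. */
#define TOY_INFO_PATH "/proc/acpi/battery/%s/info"

/*
 * Format the path for battery `name` into buf.
 * Returns false if the result did not fit (or formatting failed), so
 * truncation is detected rather than silently accepted.
 */
static bool toy_battery_path(char *buf, size_t len, const char *name)
{
    /* snprintf() returns the length the full string would have had. */
    int n = snprintf(buf, len, TOY_INFO_PATH, name);

    return n >= 0 && (size_t)n < len;
}
```

A 32-byte buffer is enough for a short name such as "BAT0" (28 characters
plus the terminator) but not for a 24-character name, which needs 49 bytes.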

No functional changes.

Signed-off-by: Zhongze Liu <blackskygg@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 2d78f78a14528752266982473c07118f1bc336e3)

7 years ago x86/boot: Fix the boot time relocation calculations
Andrew Cooper [Fri, 2 Jun 2017 10:22:17 +0000 (11:22 +0100)]
x86/boot: Fix the boot time relocation calculations

c/s b28044226e1 "x86: make Xen early boot code relocatable" introduces

    mov $sym_offs(__image_base__),%esi

to the legacy boot path.  However, this is by definition 0, which means the
boot code only functions correctly when Xen is loaded at its preferred
physical address (2M at the time of writing).

Xen does cope if loaded at an alternative physical address, if the
MULTIBOOT2_TAG_TYPE_LOAD_BASE_ADDR tag is filled in properly.  While recent
versions of Grub do fill this in appropriately, tboot does not.  (In fact,
tboot loads Xen at the preferred address, but claims a load address of 8M.)

Both Multiboot 1 and 2 specify the execution environment as being flat.  As a
result, Xen needs no help calculating the proper load address.

However, Multiboot specifies %esp as undefined.  Experimentally, using the
entry %esp is fine, but this is certainly no guarantee.  Use a temporary stack
in the first page of RAM, which is one of the safest areas to clobber.

Calculate the load address from %eip alone, and ignore
MULTIBOOT2_TAG_TYPE_LOAD_BASE_ADDR entirely.  This fixes legacy boot under
various versions of tboot.

Finally, set up the stack as soon as possible, which means the BIOS path has a
usable stack for the entirety of its duration.  Use the full available stack
size, rather than limiting to an arbitrary 1k.  One side effect is that the
MB2/EFI path continues to use the EFI stack until the trampoline is entered.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 1695e53851e523b62dbfa1990556ef68393199a8)

7 years ago Makefile: Provide way to ship livepatch test files
Ian Jackson [Wed, 7 Jun 2017 14:05:44 +0000 (15:05 +0100)]
Makefile: Provide way to ship livepatch test files

In the toplevel Makefile, provide build-tests and install-tests
targets which descend into xen/test.  (dist-tests is provided
automatically by the pattern rule, as is the convention here.)

We have to set BASEDIR ourselves, and use these curious runes, because
the convention in Makefiles under xen/ is to "make -f Rules.mk" with
BASEDIR set and to expect Rules.mk to reinvoke the per-directory
Makefile.  (This is really very strange.)  Normally this invocation
pattern is organised by the machinery in xen/Makefile (which sets
BASEDIR) and Rules.mk, but we need to invoke it from outside that
context.

In theory it would be nice to have a pattern rule %-tests.  But this
is not the style in the rest of the toplevel Makefile; and doing that
might interfere with the dist-% pattern rule.

None of this is invoked by default.  If install-tests or dist-tests is
requested, the livepatches (the only current output from xen/tests)
are shipped in DESTDIR/usr/lib/debug/xen-livepatch/.

This allows CI systems such as osstest which are trying to consume
this to arrange for the files to be built, and output, without them
having to have special knowledge of the details of Xen's build system.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit c55667bd0ad8f04688abfd5c6317709dc00f88ab)

7 years ago xen/test/livepatch: Add xen_nop.livepatch to .gitignore
Ian Jackson [Wed, 7 Jun 2017 14:09:57 +0000 (15:09 +0100)]
xen/test/livepatch: Add xen_nop.livepatch to .gitignore

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 5225e70ccbb02f786647eddfeb65d6db1e230782)

7 years ago xen/test/livepatch: Regularise Makefiles
Ian Jackson [Wed, 7 Jun 2017 13:44:51 +0000 (14:44 +0100)]
xen/test/livepatch: Regularise Makefiles

In xen/test/livepatch/Makefile:

  Provide a `build' target, as most of the
  subdir-invoking Makefiles elsewhere expect.

In xen/test/Makefile:

  Replace the two open-coded targets with a generalised pattern rule
  which descends into each of SUBDIRS.  This allows `install' to work
  too (it is already supported by xen/test/livepatch/Makefile).

  Provide an explicit default target of `tests', and an `all' target
  (which is conventional).

  Suppress entry into the xen/test/livepatch subdir when we are
  building for i386, since the 32-bit hypervisor is not supported any
  more and we can't build livepatches for it either.

After this, the xen/test subdirectory is somewhere where make can be
invoked in the way which is conventional for xen.git/xen/ subdirs.

None of this is yet invoked from the top-level Makefile.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit e541982dc21dcc5be61279d22d477ed5c0bc41c5)

7 years ago xen/test/livepatch/Makefile: Install in DESTDIR/usr/lib/debug/xen-livepatch
Ian Jackson [Wed, 7 Jun 2017 14:00:17 +0000 (15:00 +0100)]
xen/test/livepatch/Makefile: Install in DESTDIR/usr/lib/debug/xen-livepatch

Dumping these patch files in /usr/lib/debug/xen-*.livepatch is a bit
ugly.

Also, refactor the Makefile to have a LIVEPATCHES variable, to reduce
repetition.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit a38d1af5fb02bee68c9a30e38b97c6129815f943)

7 years ago tools/xenstat: fix missing linkage of libxenstat against libyajl
Peter Große [Mon, 12 Jun 2017 23:05:21 +0000 (01:05 +0200)]
tools/xenstat: fix missing linkage of libxenstat against libyajl

This fixes the python bindings, since symbols were missing in libxenstat.
xentop doesn't use any yajl functions, so drop linking libyajl.

Signed-off-by: Peter Große <pegro@friiks.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit a7307762f90d337585d17d45551a226028b89836)

7 years ago libxenstat: use python detected by configure for python bindings
Peter Große [Mon, 12 Jun 2017 23:05:20 +0000 (01:05 +0200)]
libxenstat: use python detected by configure for python bindings

Signed-off-by: Peter Große <pegro@friiks.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit b2107b79b006ded5cf2ef41ac65399c3e629f693)

7 years ago public: there's no MMUEXT_SET_FOREIGNDOM
Jan Beulich [Wed, 14 Jun 2017 09:44:44 +0000 (11:44 +0200)]
public: there's no MMUEXT_SET_FOREIGNDOM

Correct respective comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3db971fa33fa2ee3989859b455213bb33bac7e05
master date: 2017-06-14 11:40:02 +0200

7 years ago Revert "x86/mm: add temporary debugging code to get_page_from_gfn_p2m()"
Jan Beulich [Wed, 14 Jun 2017 09:43:12 +0000 (11:43 +0200)]
Revert "x86/mm: add temporary debugging code to get_page_from_gfn_p2m()"

This reverts commit 933f966bcdf4f4255b432071fc12c9ee2efb05ef.

Acked-by: George Dunlap <george.dunlap@citrix.com>

7 years ago xl.cfg man page cleanup and fixes
Armando Vega [Thu, 8 Jun 2017 18:39:14 +0000 (20:39 +0200)]
xl.cfg man page cleanup and fixes

- fixed some minor numbering and syntax issues in the CPU allocation
  examples for the 'cpus' option
- semantic fixes to make explanations more clear throughout
- fixed all the typos I could see
- general styling and makeup fixes to make everything look more consistent

Signed-off-by: Armando Vega <armando@greenhost.nl>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit af178156620bbfa0d7f3c95287cdd7e6f14807db)

7 years ago x86/HAP: avoid using bogus/misleading locking
Jan Beulich [Tue, 13 Jun 2017 08:46:13 +0000 (10:46 +0200)]
x86/HAP: avoid using bogus/misleading locking

hap_teardown() unconditionally releases the paging lock and is always
being called without the lock held: Lock acquire should then be
unconditional too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: c9ec0d34e462151d39e0e901b50501db4f6ae78d
master date: 2017-06-13 10:38:02 +0200

7 years ago livepatch: Wrong usage of spinlock on debug console.
Konrad Rzeszutek Wilk [Fri, 9 Jun 2017 13:31:28 +0000 (09:31 -0400)]
livepatch: Wrong usage of spinlock on debug console.

If we have a large amount of livepatches and want to print them
on the console using 'xl debug-keys x' we eventually hit
the preemption check:

  if ( i && !(i % 64) )
  {
      spin_unlock(&payload_lock);
      process_pending_softirqs();
      if ( spin_trylock(&payload_lock) )
          return;

<facepalm> The effect is that we have just effectively
taken the lock and returned without unlocking!
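
A self-contained sketch of the corrected pattern, where the early return
happens only when the trylock fails. It is not the actual Xen code:
`toy_lock_t`, `toy_trylock()`, `toy_unlock()` and `toy_print_all()` are
hypothetical stand-ins built on a C11 atomic_flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock; toy_trylock() returns true when the lock was taken. */
typedef struct { atomic_flag held; } toy_lock_t;

static bool toy_trylock(toy_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Walk `count` items, dropping the lock every 64 items so other work can
 * run. Returns the number of items visited; the lock is never left held.
 */
static unsigned int toy_print_all(toy_lock_t *l, unsigned int count)
{
    unsigned int i;

    if ( !toy_trylock(l) )
        return 0;

    for ( i = 0; i < count; i++ )
    {
        if ( i && !(i % 64) )
        {
            toy_unlock(l);
            /* ... pending work would be processed here ... */
            if ( !toy_trylock(l) )   /* bail out only when NOT holding it */
                return i;
        }
        /* ... print item i ... */
    }

    toy_unlock(l);
    return count;
}
```

The only difference from the quoted snippet is the negation in front of the
trylock: returning on success would leave the lock held forever.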

Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 75dfe7c566c36e0af4714557a666827f49b69191)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

7 years ago x86/HVM: correct notion of new CPL in task switch emulation
Jan Beulich [Wed, 7 Jun 2017 09:37:42 +0000 (11:37 +0200)]
x86/HVM: correct notion of new CPL in task switch emulation

Commit aac1df3d03 ("x86/HVM: introduce hvm_get_cpl() and respective
hook") went too far in one aspect: When emulating a task switch we
really shouldn't be looking at what hvm_get_cpl() returns, as we're
switching all segment registers.

The issue manifests as a vmentry failure for 32bit VMs which use task
gates to service interrupts/exceptions, in situations where delivering
the event interrupts user code, and a privilege increase is required.

However, instead of reverting the relevant parts of that commit, have
the caller tell the segment loading function what the new CPL is. This
at once fixes ES being loaded before CS so far having had its checks
done against the old CPL.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: 9c4f1b72571b215e80abf0490073438831dc785b
master date: 2017-06-06 14:36:41 +0200

7 years agox86/NPT: deal with fallout from 2Mb/1Gb unmapping change
Jan Beulich [Wed, 7 Jun 2017 09:36:30 +0000 (11:36 +0200)]
x86/NPT: deal with fallout from 2Mb/1Gb unmapping change

Commit efa9596e9d ("x86/mm: fix incorrect unmapping of 2MB and 1GB
pages") left the NPT code untouched, as there is no explicit alignment
check matching the one in EPT code. However, the now more widespread
storing of INVALID_MFN into PTEs requires adjustments:
- calculations when shattering large pages may spill into the p2m type
  field (converting p2m_populate_on_demand to p2m_grant_map_rw) - use
  OR instead of PLUS,
- the use of plain l{2,3}e_from_pfn() in p2m_pt_set_entry() results in
  all upper (flag) bits being clobbered - introduce and use
  p2m_l{2,3}e_from_pfn(), paralleling the existing L1 variant.
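
The first point can be illustrated with a toy entry layout. This is only a
sketch: the 40-bit frame field at bit 12, the type field at bit 52, and the
type values 6 and 7 are assumptions made for the example, not Xen's actual
PTE encoding.

```c
#include <stdint.h>

/* Hypothetical layout: frame number in bits 12..51, type above it. */
#define PFN_SHIFT     12
#define PFN_BITS      40
#define PFN_FIELD     (((UINT64_C(1) << PFN_BITS) - 1) << PFN_SHIFT)
#define TYPE_SHIFT    52
#define ALL_ONES_PFN  ((UINT64_C(1) << PFN_BITS) - 1)  /* "invalid" frame */

static uint64_t toy_pte(uint64_t pfn, uint64_t type)
{
    return ((pfn << PFN_SHIFT) & PFN_FIELD) | (type << TYPE_SHIFT);
}

static uint64_t toy_pte_type(uint64_t pte)
{
    return pte >> TYPE_SHIFT;
}

/* Derive the i-th small entry by addition: carries can leave the field. */
static uint64_t split_plus(uint64_t pte, unsigned int i)
{
    return pte + ((uint64_t)i << PFN_SHIFT);
}

/* Derive the i-th small entry by OR: bits outside the field are untouched. */
static uint64_t split_or(uint64_t pte, unsigned int i)
{
    return pte | ((uint64_t)i << PFN_SHIFT);
}
```

For a suitably aligned valid frame both variants agree; only the all-ones
frame makes the addition carry into the neighbouring field and bump the type
by one.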

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: 83520cb4aa39ebeb4eb1a7cac2e85b413e75a336
master date: 2017-06-06 14:32:54 +0200

7 years agovif-common.sh: Have iptables wait for the xtables lock
George Dunlap [Mon, 5 Jun 2017 10:02:30 +0000 (11:02 +0100)]
vif-common.sh: Have iptables wait for the xtables lock

iptables has a system-wide lock on the xtables.  Strangely though, in
the case of two concurrent invocations, the default is for the
instance not grabbing the lock to exit out rather than waiting for it.
This means that when starting a large number of guests in parallel,
many will fail out with messages like this:

  2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
  2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4

In order to instruct iptables to wait for the lock, you have to
specify '-w'.  Unfortunately, not all versions of iptables have the
'-w' option, so on first invocation check to see if it accepts the -w
option.

Reported-by: Antony Saba <awsaba@gmail.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: 3d2010f9ffeacc8836811420460e15f2c1233695

7 years agoxen/public: Correct the HYPERVISOR_dm_op() documentation to match reality 4.9.0-rc8
Andrew Cooper [Thu, 1 Jun 2017 13:09:30 +0000 (14:09 +0100)]
xen/public: Correct the HYPERVISOR_dm_op() documentation to match reality

The number of buffers is ahead of the buffer list in the argument list.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit d8eed4021d50eb48ca75c8559aed95a2ad74afaa)

7 years agox86/pagewalk: Fix pagewalk's handling of instruction fetches
Andrew Cooper [Tue, 23 May 2017 16:32:30 +0000 (16:32 +0000)]
x86/pagewalk: Fix pagewalk's handling of instruction fetches

Despite the claim in the comment (which was based partly on the code already
being like that, and mistaken reasoning because of Xen leaking NX into guest
context), reality differs.

Use of the SMAP feature without NX, or in a 2-level guest, demonstrates an
observable difference between reads and instruction fetches, despite
PFEC_insn_fetch not being reported in the #PF error code.

Alter the pagewalk logic to keep the pagewalk insn_fetch input intact, but
only conditionally report insn_fetch in the error code.  This logic is more
in line with the Intel SDM text:

 * I/D flag (bit 4).
   This flag is 1 if (1) the access causing the page-fault exception was an
   instruction fetch; and (2) either (a) CR4.SMEP = 1; or (b) both (i) CR4.PAE
   = 1 (either PAE paging or 4-level paging is in use); and (ii) IA32_EFER.NXE
   = 1. Otherwise, the flag is 0. This flag describes the access causing the
   page-fault exception, not the access rights specified by paging.

and the AMD SDM text:

 * I/D - Bit 4. If this bit is set to 1, it indicates that the access that
   caused the page fault was an instruction fetch. Otherwise, this bit is
   cleared to 0. This bit is only defined if no-execute feature is enabled
   (EFER.NXE=1 && CR4.PAE=1).

Curiously, the AMD manual doesn't mention SMEP despite some Fam16h processors
and all Fam17h processors supporting it.  Experimentally, it behaves as
described by Intel.
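
The reporting rule quoted from the Intel SDM can be written as a small
predicate. This is a sketch only; struct paging_state and
report_insn_fetch() are hypothetical names, not the pagewalk code.

```c
#include <stdbool.h>

/* Control-register state relevant to the I/D flag (hypothetical struct). */
struct paging_state {
    bool cr4_smep;
    bool cr4_pae;
    bool efer_nxe;
};

/*
 * Should the I/D bit be set in the page-fault error code?
 * The access stays an instruction fetch for the walk itself; only the
 * reporting in the error code is conditional.
 */
static bool report_insn_fetch(bool was_insn_fetch,
                              const struct paging_state *s)
{
    return was_insn_fetch &&
           (s->cr4_smep || (s->cr4_pae && s->efer_nxe));
}
```

A 2-level guest without SMEP therefore never sees the bit, even though the
walk still treats the access as a fetch.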

In addition, add some extra clarification and sanity checking around the use
of NX for the access checks, where it might be reserved.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a0b40c3e08bb81192063f97089cb8c3849b8cfa0)

7 years agoRevert "x86/hvm: disable pkeys for guests in non-paging mode"
Andrew Cooper [Thu, 25 May 2017 17:17:01 +0000 (18:17 +0100)]
Revert "x86/hvm: disable pkeys for guests in non-paging mode"

This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.

When determining Access Rights, Protection Keys only take effect when CR4.PKE
is set, and 4-level paging is active.  All other circumstances (notably, 32bit
PAE paging) skip the Protection Key control mechanism.

Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
not using paging, as such a guest is necessarily running with EFER.LMA
disabled.

The {RD,WR}PKRU instructions are specified as being legal for use in any
operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
back of an unpaged guest, these instructions yield #UD despite the guest
correctly seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Huaitong Han <huaitong.han@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
(cherry picked from commit 224acdd04a9f6ffe44d2f716287cac74787899ec)

7 years agoxl man page cleanup and fixes
Armando Vega [Wed, 31 May 2017 06:30:09 +0000 (08:30 +0200)]
xl man page cleanup and fixes

Signed-off-by: Armando Vega <armando@greenhost.nl>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ wei: remove trailing spaces ]
Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: 91708086566ccdf287fe3b7a660a940a5e3582a1

7 years agox86/vpmu: add cpu hot unplug notifier for vpmu
Luwei Kang [Wed, 31 May 2017 10:39:07 +0000 (12:39 +0200)]
x86/vpmu: add cpu hot unplug notifier for vpmu

Currently, hot-unplugging a physical CPU with vpmu enabled may hang the
system, because a remote call is sent to an offlined pCPU. This patch
adds a CPU hot-unplug notifier to save the vpmu context before the CPU
goes offline.

Consider one scenario: pCPU N is hot-unplugged with vpmu enabled.
The vCPU which was running on this pCPU is switched to another
online CPU. A remote call is then sent to pCPU N to save the
vpmu context before loading it on the new pCPU.
The system hangs in on_selected_cpus() because pCPU N is
offline and cannot respond.

The purpose of adding a VPMU_CONTEXT_LOADED check in vpmu_arch_destroy()
before sending a remote call to save the vpmu context is:
a. When a vpmu context has been loaded on a remote pCPU, a remote
   call to save the vpmu context and stop the counters is necessary.
b. The VPMU_CONTEXT_LOADED flag is reset when a pCPU is offlined, so
   this check prevents sending a remote call to an offlined pCPU.
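
A toy model of the two pieces described above, assuming a single "context
loaded" flag and a per-vpmu record of the last pCPU. All names (toy_vpmu,
toy_cpu_dying, toy_vpmu_destroy, TOY_CONTEXT_LOADED) are hypothetical and the
remote call is only counted, not performed.

```c
#include <stdbool.h>

#define TOY_NR_CPUS        4
#define TOY_CONTEXT_LOADED 0x1u

static bool cpu_online[TOY_NR_CPUS] = { true, true, true, true };
static unsigned int remote_calls;   /* number of simulated remote calls */

struct toy_vpmu {
    unsigned int flags;
    unsigned int last_pcpu;   /* pCPU holding the loaded context */
};

/* Stand-in for a cross-CPU call; a real one would hang on an offline CPU. */
static void toy_remote_save(struct toy_vpmu *v)
{
    remote_calls++;
    v->flags &= ~TOY_CONTEXT_LOADED;
}

/* Hot-unplug notifier: save locally and clear the flag before going offline. */
static void toy_cpu_dying(unsigned int cpu, struct toy_vpmu *v)
{
    if ((v->flags & TOY_CONTEXT_LOADED) && v->last_pcpu == cpu)
        v->flags &= ~TOY_CONTEXT_LOADED;   /* context saved on this CPU */
    cpu_online[cpu] = false;
}

/* Destroy path: only go remote when a context is still loaded somewhere. */
static void toy_vpmu_destroy(struct toy_vpmu *v)
{
    if (v->flags & TOY_CONTEXT_LOADED)
        toy_remote_save(v);
}
```

Because the notifier clears the flag, the destroy path never targets a pCPU
that has already gone away.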

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: 2d08fb32bc3d252046748e908bafc1bf6376313e
master date: 2017-05-31 08:41:43 +0200

7 years agodocs: remove PVHv1 document
Roger Pau Monné [Wed, 31 May 2017 06:53:18 +0000 (08:53 +0200)]
docs: remove PVHv1 document

The current misc/pvh.markdown document refers to PVHv1, remove it to
avoid confusion with PVHv2 since the PVHv1 code has already been
removed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: 8ac9a25b26841b539fd7f345fc87a4142a86adb3
master date: 2017-05-31 08:47:57 +0200

7 years agoMakefile: Mention usual targets of subdir Makefiles
Ian Jackson [Thu, 25 May 2017 15:42:12 +0000 (16:42 +0100)]
Makefile: Mention usual targets of subdir Makefiles

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: M A Young <m.a.young@durham.ac.uk>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 624dc330171a27f21b9664da7fc8b06bcc29be48)

7 years agoMakefile: Regularise subdir targets and their dependencies 4.9.0-rc7
Ian Jackson [Wed, 24 May 2017 15:54:11 +0000 (16:54 +0100)]
Makefile: Regularise subdir targets and their dependencies

Recent changes to this Makefile have broken some build targets, and
some parallel builds.

Looking at it, I think I have identified the undocumented design
intent in the top-level Makefile.  So in this patch I document it, and
also make it true.

In detail:

 * Add a comment with the new design intent
 * Get rid of the ad-hoc rules for recursing into tools/include,
   and replace them with a pattern rule
 * Add an appropriate dependency on TARGET-tools-public-headers from
   TARGET-tools and TARGET-stubdom (but not dist-*).
 * Get rid of all the separate invocations of $(MAKE) -C tools/include
   which are now obsolete
 * Un-deprecate the simple `tools' etc. targets (aliases for `dist-tools')
   which we seem not to be making any effort to get rid of

I have verified with the following shell script that after my change,
the tree produces the same results for various build targets as
3fafdc28eb98 (before the Makefile-hacking started).

My tests failed as expected for make -C tools, both before and after.

Separately, there is a bug in the Makefiles that `make distclean-tools'
fails.  I have not investigated that bug in detail.

    #!/bin/bash

    set -e
    set -o pipefail

    listings=../listings

    rm -rf $listings
    mkdir $listings

    chks () {
         reskey="C$subdir $*"
         reskey="${reskey// /_}"
         reskey="${reskey//\//:}"
         lk=$listings/$reskey
         for suffix in '' -xen -tools -stubdom -docs; do
             case "$subdir:$suffix" in
             .:*) ;;
             *:) ;;
             *) continue;;
             esac
             git clean -qxdff
             rm -rf $output
             printf '%s' "running -C$subdir suffix=$suffix "
             case "$subdir $suffix" in
             *xen*) ;;
             *) printf 'configure '; ./configure >$lk.cfg 2>&1 ;;
             esac
             fail=''
             for targ in $*; do
                 realtarg=$targ$suffix
                 printf '%s ' "$realtarg"
                 if ! make -C $subdir -j10 $realtarg >${lk}_${realtarg}.log 2>&1
                 then
                    fail=$realtarg
                    break
                 fi
             done
             if [ "$fail" ]; then
               echo fail!
               echo "$fail failed" >$lk.list
             else
               echo ok.
               (test ! -e "$output" || find $output) |sort >$lk.list
             fi
        done
    }

    subdirs='. xen docs tools'

    output=$PWD/dist
    for subdir in $subdirs; do
        chks build clean distclean
    done

    output=$PWD/dist
    subdir=.
    chks dist

    export DESTDIR=$PWD/destdir
    output=$PWD/destdir
    for subdir in $subdirs; do
        chks install
    done

And the output:

    (64)iwj@mariner:~/work/xen.git$ ~/junk/chks
    running -C. suffix= configure build clean distclean ok.
    running -C. suffix=-xen build-xen clean-xen distclean-xen ok.
    running -C. suffix=-tools configure build-tools clean-tools distclean-tools fail!
    running -C. suffix=-stubdom configure build-stubdom clean-stubdom distclean-stubdom ok.
    running -C. suffix=-docs configure build-docs clean-docs distclean-docs ok.
    running -Cxen suffix= build clean distclean ok.
    running -Cdocs suffix= configure build clean distclean ok.
    running -Ctools suffix= configure build fail!
    running -C. suffix= configure dist ok.
    running -C. suffix=-xen dist-xen ok.
    running -C. suffix=-tools configure dist-tools ok.
    running -C. suffix=-stubdom configure dist-stubdom ok.
    running -C. suffix=-docs configure dist-docs ok.
    running -C. suffix= configure install ok.
    running -C. suffix=-xen install-xen ok.
    running -C. suffix=-tools configure install-tools ok.
    running -C. suffix=-stubdom configure install-stubdom ok.
    running -C. suffix=-docs configure install-docs ok.
    running -Cxen suffix= install ok.
    running -Cdocs suffix= configure install ok.
    running -Ctools suffix= configure install fail!
    (64)iwj@mariner:~/work/xen.git$

CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: M A Young <m.a.young@durham.ac.uk>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agotools/include/Makefile: Support `build' target
Ian Jackson [Wed, 24 May 2017 15:53:28 +0000 (16:53 +0100)]
tools/include/Makefile: Support `build' target

This is the only one of the Makefiles invoked with -C from the
toplevel which lacks this target.

CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: M A Young <m.a.young@durham.ac.uk>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/hvmloader: Don't wait for the producer to fill the ring if
Anshul Makkar [Tue, 23 May 2017 14:12:58 +0000 (15:12 +0100)]
x86/hvmloader: Don't wait for the producer to fill the ring if

The condition "if there is space in the ring, then wait for the producer
to fill the ring" also evaluates to true when the ring is full. This
leads to a deadlock where the producer is waiting for the consumer
to consume the items and the consumer is waiting for the producer to fill
the ring.

Fix for the issue: check whether the ring is full and, if so, break out of
the loop to consume the items from the ring.
E.g. prod = 1272, cons = 248.
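
The example values can be checked with a small sketch. It assumes a 1024-byte
ring with free-running 32-bit indices that are masked only on use; the
constant and the function names are hypothetical, chosen so that the numbers
above work out.

```c
#include <stdint.h>

#define TOY_RING_SIZE 1024u                 /* assumed power-of-two size */
#define TOY_MASK(idx) ((idx) & (TOY_RING_SIZE - 1))

/* Masking the difference cannot tell a full ring from an empty one. */
static uint32_t avail_masked(uint32_t prod, uint32_t cons)
{
    return TOY_MASK(prod - cons);
}

/* Check for the full ring first, then fall back to the masked value. */
static uint32_t avail_checked(uint32_t prod, uint32_t cons)
{
    if (prod - cons == TOY_RING_SIZE)
        return TOY_RING_SIZE;
    return TOY_MASK(prod - cons);
}
```

With prod = 1272 and cons = 248 the difference is exactly the ring size, so
the masked value is 0 and a consumer that waits on "nothing available" never
wakes up.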

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoRestore HVM_OP hypercall continuation (partial revert of ae20ccf)
George Dunlap [Mon, 22 May 2017 10:38:31 +0000 (11:38 +0100)]
Restore HVM_OP hypercall continuation (partial revert of ae20ccf)

Commit ae20ccf removed the hypercall continuation logic from the end
of do_hvm_op(), claiming:

"This patch removes the need for handling HVMOP restarts, so that
infrastructure is removed."

That turns out to be false.  The removal of HVMOP_set_mem_type removed
the need to store a start iteration value in the hypercall
continuation, but a grep through hvm.c for ERESTART turns up at least
two places where do_hvm_op() may still need a hypercall continuation:

 * HVMOP_set_hvm_param can return -ERESTART when setting
   HVM_PARAM_IDENT_PT in the event that it fails to acquire the domctl
   lock

 * HVMOP_flush_tlbs can return -ERESTART if several vcpus call it at
   the same time

In both cases, a simple restart (with no stored iteration information)
is necessary.

Add a check for -ERESTART again, along with a comment at the top of
the function regarding the lack of decoding any information from the
op value.

Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
7 years agoxen/arm: p2m: Fix incorrect mapping of superpages 4.9.0-rc6
Julien Grall [Fri, 19 May 2017 16:08:39 +0000 (17:08 +0100)]
xen/arm: p2m: Fix incorrect mapping of superpages

The same set of functions is used to set as well as to clean P2M
entries, except that for clean operations INVALID_MFN (~0UL) is passed
as a parameter. Unfortunately, when calculating an appropriate target
order for a particular mapping, INVALID_MFN is not taken into account,
which leads to 4K page target order being set each time even for 2MB
and 1GB mappings.

This results in the superpage being broken down into 4K mappings,
leaving empty tables allocated.

This was introduced by commit 2ef3e36ec7 "xen/arm: p2m: Introduce
p2m_set_entry and __p2m_set_entry".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/pagewalk: Fix determination of Protection Key access rights
Andrew Cooper [Tue, 16 May 2017 14:47:33 +0000 (15:47 +0100)]
x86/pagewalk: Fix determination of Protection Key access rights

 * When fabricating gl1e's from superpages, propagate the protection key as
   well, so the protection key logic sees the real key as opposed to 0.

 * Experimentally, the protection key checks are performed ahead of the other
   access rights.  In particular, accesses which fail both protection key and
   regular permission checks yield PFEC_prot_key in the resulting pagefault.

 * Protection keys apply to all data accesses to user-mode addresses,
   including accesses from supervisor code.  PKRU WD applies to any data
   write, not just to mapping which are writable.  However, a supervisor
   access without CR0.WP bypasses any protection from protection keys.
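
The third point can be sketched as a predicate. This follows the Intel SDM
reading in which the CR0.WP exemption applies to the write-disable (WD) bit,
while the access-disable (AD) bit blocks every data access; that reading, the
struct, and the function name are assumptions of the example. The PKRU layout
(AD at bit 2*key, WD at bit 2*key+1) is the architectural one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Properties of one memory access (hypothetical struct). */
struct toy_access {
    bool user_addr;    /* translation is a user-mode address */
    bool insn_fetch;   /* instruction fetch rather than data access */
    bool write;        /* data write */
    bool supervisor;   /* access made by supervisor code */
};

/* Returns true when the protection key for `key` denies the access. */
static bool pkey_denies(uint32_t pkru, unsigned int key,
                        const struct toy_access *a, bool cr0_wp)
{
    bool ad = pkru & (UINT32_C(1) << (2 * key));
    bool wd = pkru & (UINT32_C(1) << (2 * key + 1));

    /* Keys only govern data accesses to user-mode addresses. */
    if (!a->user_addr || a->insn_fetch)
        return false;

    if (ad)
        return true;

    /* WD covers any data write, but supervisor writes need CR0.WP set. */
    return wd && a->write && (!a->supervisor || cr0_wp);
}
```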

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agohvmloader: avoid tests when they would clobber used memory
Jan Beulich [Fri, 19 May 2017 14:04:38 +0000 (16:04 +0200)]
hvmloader: avoid tests when they would clobber used memory

First of all limit the memory range used for testing to 4Mb: There's no
point placing page tables right above 8Mb when they can equally well
live at the bottom of the chunk at 4Mb - rep_io_test() cares about the
5Mb...7Mb range only anyway. In a subsequent patch this will then also
allow simply looking for an unused 4Mb range (instead of using a build
time determined one).

Extend the "skip tests" condition beyond the "is there enough memory"
question.

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Gary Lin <glin@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoUse non-debug build for Xen 4.9
Julien Grall [Thu, 18 May 2017 15:38:29 +0000 (16:38 +0100)]
Use non-debug build for Xen 4.9

Modify Config.mk and Kconfig.debug to disable debug by default in
preparation for late RCs and eventual release.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl/devd: move the device allocation/removal code
Roger Pau Monne [Tue, 16 May 2017 07:59:25 +0000 (08:59 +0100)]
libxl/devd: move the device allocation/removal code

Move the device addition/removal code to the {add/remove}_device functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxl/devd: correctly manipulate the dguest list
Roger Pau Monne [Tue, 16 May 2017 07:59:24 +0000 (08:59 +0100)]
libxl/devd: correctly manipulate the dguest list

Current code in backend_watch_callback has two issues when manipulating the
dguest list:

1. backend_watch_callback forgets to remove a libxl__ddomain_guest from the
list of tracked domains when the related data is freed, causing dereferences
later on when the list is traversed. Make sure that a domain is always removed
from the list when freed.

2. A spurious device state change can cause a dguest to be freed, with active
devices and without being removed from the list. Fix this by always checking if
a dguest has active devices before freeing and removing it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Reinis Martinsons <admin@frp.lv>
Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxl/devd: fix a race with concurrent device addition/removal
Roger Pau Monne [Tue, 16 May 2017 07:59:23 +0000 (08:59 +0100)]
libxl/devd: fix a race with concurrent device addition/removal

Current code can free the libxl__device inside of the libxl__ddomain_device
before the addition has finished if a removal happens while an addition is
still in process:

  backend_watch_callback
            |
            v
       add_device
            |                 backend_watch_callback
    (async operation)                   |
            |                           v
            |                     remove_device
            |                           |
            |                           V
            |                    device_complete
            |                 (free libxl__device)
            v
     device_complete
  (deref libxl__device)

Fix this by creating a temporary copy of the libxl__device, that's tracked by
the GC of the nested async operation. This ensures that the libxl__device used
by the async operations cannot be freed while being used.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agobuild: more adjustments to top-level Makefile dependencies
Wei Liu [Fri, 19 May 2017 11:55:26 +0000 (12:55 +0100)]
build: more adjustments to top-level Makefile dependencies

In the original code, top-level dist target unconditionally invokes
dist target for tools/include, which is wrong when tools component is
not enabled.

Make dist-tools depend on dist-tools-public-headers, which depends on
build-tools-public-headers.

Discovered by Travis-CI.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>

7 years agoarm: fix build with gcc 7
Jan Beulich [Fri, 19 May 2017 08:12:08 +0000 (10:12 +0200)]
arm: fix build with gcc 7

The compiler dislikes duplicate "const", and the ones it complains
about look like they were in fact meant to be placed differently.

Also fix array_access_okay() (just like on x86), despite the construct
being unused on ARM: -Wint-in-bool-context, enabled by default in
gcc 7, doesn't like multiplication in conditional operators. "Hide" it,
at the risk of the next compiler version becoming smarter and
recognizing even that. (The hope is that added smartness then would
also better deal with legitimate cases like the one here.) The change
could have been done in access_ok(), but I think we better keep it at
the place the compiler is actually unhappy about.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86: fix build with gcc 7
Jan Beulich [Fri, 19 May 2017 08:11:36 +0000 (10:11 +0200)]
x86: fix build with gcc 7

-Wint-in-bool-context, enabled by default in gcc 7, doesn't like
multiplication in conditional operators. "Hide" them, at the risk of
the next compiler version becoming smarter and recognizing even those.
(The hope is that added smartness then would also better deal with
legitimate cases like the ones here.)

The change could have been done in access_ok(), but I think we better
keep it at the places the compiler is actually unhappy about.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxmalloc: correct _xmalloc_array() indentation
Jan Beulich [Fri, 19 May 2017 08:10:49 +0000 (10:10 +0200)]
xmalloc: correct _xmalloc_array() indentation

It's been wrongly indented using tabs till now, and the stray blank
ahead of the final return statement gets in the way of using .i files
for detailed analysis of other compiler issues
(-Wmisleading-indentation kicks in due to the tab->space
transformation done in the course of pre-processing).

Also add missing spaces inside the if() at once, including the similar
case in _xzalloc_array().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agobuild: add missing dependency
Wei Liu [Thu, 18 May 2017 10:57:32 +0000 (11:57 +0100)]
build: add missing dependency

Commit f745b55 missed install-tools' dependency on
build-tools-public-headers.

Discovered by Travis-CI.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agobuild: stubdom and tools should depend on public header target
Wei Liu [Wed, 17 May 2017 14:26:08 +0000 (15:26 +0100)]
build: stubdom and tools should depend on public header target

Build can fail if stubdom build is run before tools build because:

1. tools/include build uses relative path and depends on XEN_OS
2. stubdom needs tools/include to be built, at which time XEN_OS is
   mini-os and corresponding symlinks are created
3. libraries inside tools need tools/include to be built, at which
   time XEN_OS is the host os name, but symlinks won't be created
   because they are already there
4. libraries get the wrong headers and fail to build

Since both tools and stubdom build need the public headers, we build
tools/include before stubdom and tools. Remove runes in stubdom and
tools to avoid building tools/include more than once.

Provide a new dist target for tools/include.  Hook up the install,
clean, dist and distclean targets for tools/include.

The new arrangement ensures tools build gets the correct headers
because XEN_OS is set to host os when building tools/include. As for
stubdom, it explicitly links to the mini-os directory without relying
on XEN_OS so it should be fine.

Reported-by: Steven Haigh <netwiz@crc.id.au>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
7 years agotools/xenconsoled: Preserve errno while rotating logfile handles
Andrew Cooper [Tue, 16 May 2017 13:57:27 +0000 (14:57 +0100)]
tools/xenconsoled: Preserve errno while rotating logfile handles

The logic to optionally exit after a poll() error relies on errno, but
handle_log_reload() does not preserve it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxl/arm: Fix ARM build.
Andrii Anisov [Tue, 16 May 2017 15:57:53 +0000 (18:57 +0300)]
libxl/arm: Fix ARM build.

Initialise *size in the default branch to prevent certain compilers (e.g.
Linaro GCC 5.2-2015.11-2) from reporting "variable may be used uninitialized"
errors in the caller function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/ioreq_server: make p2m_finish_type_change actually work
Xiong Zhang [Wed, 17 May 2017 15:24:45 +0000 (17:24 +0200)]
x86/ioreq_server: make p2m_finish_type_change actually work

Commit 6d774a951696 ("x86/ioreq server: synchronously reset outstanding
p2m_ioreq_server entries when an ioreq server unmaps") introduced
p2m_finish_type_change(), which was meant to synchronously finish a
previously initiated type change over a gpfn range.  It did this by
calling get_entry(), checking if it was the appropriate type, and then
calling set_entry().

Unfortunately, a previous commit (1679e0df3df6 "x86/ioreq server:
asynchronously reset outstanding p2m_ioreq_server entries") modified
get_entry() to always return the new type after the type change, meaning
that p2m_finish_type_change() never changed any entries.  Which means
when an ioreq server was detached and then re-attached (as happens in
XenGT on reboot) the re-attach failed.

Fix this by using the existing p2m-specific recalculation logic instead
of doing a read-check-write loop.

Fix: 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
      outstanding p2m_ioreq_server entries when an ioreq server unmaps")'

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/mm: fix incorrect unmapping of 2MB and 1GB pages
Igor Druzhinin [Wed, 17 May 2017 15:23:15 +0000 (17:23 +0200)]
x86/mm: fix incorrect unmapping of 2MB and 1GB pages

The same set of functions is used to set as well as to clean
P2M entries, except that for clean operations INVALID_MFN (~0UL)
is passed as a parameter. Unfortunately, when calculating an
appropriate target order for a particular mapping INVALID_MFN
is not taken into account which leads to 4K page target order
being set each time even for 2MB and 1GB mappings. This eventually
breaks down an EPT structure irreversibly into 4K mappings which
prevents subsequent high order mappings to this area.
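
A sketch of the order calculation, assuming orders of 9 and 18 for 2MB and
1GB pages and an alignment mask built from the guest frame, the machine frame
and the remaining page count. The function names are hypothetical; the point
is only that the all-ones INVALID_MFN has to be left out of the mask.

```c
#include <stdint.h>

#define TOY_INVALID_MFN (~UINT64_C(0))
#define ORDER_4K 0u
#define ORDER_2M 9u
#define ORDER_1G 18u

/* Largest supported order for which every bit below it is clear. */
static unsigned int order_from_mask(uint64_t mask)
{
    if (!(mask & ((UINT64_C(1) << ORDER_1G) - 1)))
        return ORDER_1G;
    if (!(mask & ((UINT64_C(1) << ORDER_2M) - 1)))
        return ORDER_2M;
    return ORDER_4K;
}

/* Always folds the MFN in: an all-ones MFN forces 4K. */
static unsigned int order_naive(uint64_t gfn, uint64_t mfn, uint64_t todo)
{
    return order_from_mask(gfn | mfn | todo);
}

/* Ignores the MFN when it is the "invalid" marker used for removals. */
static unsigned int order_fixed(uint64_t gfn, uint64_t mfn, uint64_t todo)
{
    uint64_t mask = gfn | todo;

    if (mfn != TOY_INVALID_MFN)
        mask |= mfn;
    return order_from_mask(mask);
}
```

Removing a 1GB-aligned range of 2^18 pages thus keeps order 18 in the fixed
variant, while the naive variant degrades to 4K entries.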

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/pv: Fix the handling of `int $x` for vectors which alias exceptions
Andrew Cooper [Mon, 15 May 2017 12:05:45 +0000 (13:05 +0100)]
x86/pv: Fix the handling of `int $x` for vectors which alias exceptions

The claim at the top of c/s 2e426d6eecf "x86/traps: Drop use_error_code
parameter from do_{,guest_}trap()" is only actually true for hardware
exceptions.  It is not true for `int $x` instructions (which never push an
error code), irrespective of whether the vector aliases an exception or not.

Furthermore, c/s 6480cc6280e "x86/traps: Fix failed ASSERT() in
do_guest_trap()" really should have helped highlight that a regression had
been introduced.

Modify pv_inject_event() to understand event types other than
X86_EVENTTYPE_HW_EXCEPTION, and introduce pv_inject_sw_interrupt() for the
`int $x` handling code.

Add further assertions to pv_inject_event() concerning the type of events
passed in, which in turn requires that do_guest_trap() set its type
appropriately (which is now used exclusively for hardware exceptions).
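
The distinction can be sketched as follows. The enum and function names are
hypothetical, and the vector list covers only the classic exceptions that
push an error code (#DF, #TS, #NP, #SS, #GP, #PF, #AC); it is an
illustration, not the list Xen uses.

```c
#include <stdbool.h>

/* Hypothetical event types, mirroring the two cases discussed above. */
enum toy_event_type { TOY_HW_EXCEPTION, TOY_SW_INTERRUPT };

/* Classic exception vectors that push an error code when raised by hardware. */
static bool vector_has_error_code(unsigned int vector)
{
    switch (vector) {
    case 8: case 10: case 11: case 12: case 13: case 14: case 17:
        return true;
    default:
        return false;
    }
}

/* A software interrupt never pushes an error code, whatever its vector. */
static bool event_pushes_error_code(enum toy_event_type type,
                                    unsigned int vector)
{
    return type == TOY_HW_EXCEPTION && vector_has_error_code(vector);
}
```

So `int $14` and a real page fault share a vector but produce different
stack frames, which is why the event type has to reach the injection code.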

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoinclude: fix build without C++ compiler installed
Jan Beulich [Fri, 12 May 2017 06:52:54 +0000 (00:52 -0600)]
include: fix build without C++ compiler installed

The rule for headers++.chk wants to move headers++.chk.new to the
designated target, which means we have to create that file in the first
place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoioemu-stubdom: don't link *-softmmu* and *-linux-user*
Wei Liu [Fri, 12 May 2017 15:21:06 +0000 (16:21 +0100)]
ioemu-stubdom: don't link *-softmmu* and *-linux-user*

They are generated by ./configure. Having them linked can cause a race
between the tools build and the stubdom build.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agotools: don't require unavailable optional libraries in pkg-config files
Juergen Gross [Fri, 12 May 2017 13:10:51 +0000 (15:10 +0200)]
tools: don't require unavailable optional libraries in pkg-config files

blktap2 is optional, so there should be no pkg-config file requiring
xenblktapctl if it isn't enabled for the build.

Add a filter mechanism to tools/Rules.mk to filter out optional
libraries.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agopublic/elfnote: document non-alignment of relocated init-P2M 4.9.0-rc5
Jan Beulich [Fri, 12 May 2017 15:24:17 +0000 (17:24 +0200)]
public/elfnote: document non-alignment of relocated init-P2M

Since PV kernels can't use large pages anyway, when the init-P2M
support was added it was decided to keep the implementation simple and
not align large pages in PFN space. Document this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoFix broken package config file xenlight.pc.in
Charles Arnold [Thu, 11 May 2017 16:29:42 +0000 (10:29 -0600)]
Fix broken package config file xenlight.pc.in

The Requires line in this config file uses the wrong names for two dependencies.

The package config file for xenctrl is called 'xencontrol' and for blktapctl is
called 'xenblktapctl'. Running a command like 'pkg-config --exists xenlight' will
fail without this fix.

Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxl: don't ignore return value from libxl_device_events_handler
Wei Liu [Fri, 12 May 2017 10:02:58 +0000 (11:02 +0100)]
xl: don't ignore return value from libxl_device_events_handler

That function can return a whole slew of error codes. Translate them
to EXIT_FAILURE.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agolibxenforeignmemory: bump minor version number
Wei Liu [Wed, 10 May 2017 11:51:09 +0000 (12:51 +0100)]
libxenforeignmemory: bump minor version number

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/pv: Align %rsp before pushing the failsafe stack frame
Andrew Cooper [Fri, 5 May 2017 16:38:19 +0000 (17:38 +0100)]
x86/pv: Align %rsp before pushing the failsafe stack frame

Architecturally, all 64bit stacks are aligned on a 16 byte boundary before an
exception frame is pushed.  The failsafe frame should not be special in this
regard.
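
As a rough illustration of the alignment step (the real change is in
assembly; align_stack_16() is a hypothetical helper used only for this
sketch), rounding a stack pointer down to a 16 byte boundary looks like:

```c
#include <stdint.h>

/* Round a stack pointer down to the nearest 16 byte boundary. */
static uint64_t align_stack_16(uint64_t rsp)
{
    return rsp & ~(uint64_t)0xf;
}
```

Rounding down rather than up matters here, since the stack grows towards
lower addresses and the frame is pushed below the aligned value.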

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/pv: Fix bugs with the handling of int80_bounce
Andrew Cooper [Fri, 5 May 2017 16:38:19 +0000 (17:38 +0100)]
x86/pv: Fix bugs with the handling of int80_bounce

Testing has revealed two issues:

 1) Passing a NULL handle to set_trap_table() is intended to flush the entire
    table.  The 64bit guest case (and 32bit guest on 32bit Xen, when it
    existed) called init_int80_direct_trap() to reset int80_bounce, but c/s
    cda335c279 which introduced the 32bit guest on 64bit Xen support omitted
    this step.  Previously therefore, it was impossible for a 32bit guest to
    reset its registered int80_bounce details.

 2) init_int80_direct_trap() doesn't honour the guest's request to have
    interrupts disabled on entry.  PVops Linux requests that interrupts are
    disabled, but Xen currently leaves them enabled when following the int80
    fastpath.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm: Survive unknown traps from guests
Julien Grall [Fri, 5 May 2017 14:30:36 +0000 (15:30 +0100)]
xen/arm: Survive unknown traps from guests

Currently we crash Xen if we see an ESR_EL2.EC value we don't recognise.
As configurable disables/enables are added to the architecture
(controlled by RES1/RES0 bits respectively), with associated synchronous
exceptions, it may be possible for a guest to trigger exceptions with
classes that we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0487A.k_iss10775, page
D7-1937, EC values within the range 0x00 - 0x2c are reserved for future
use with synchronous exceptions, and EC within the range 0x2d - 0x3f may
be used for either synchronous or asynchronous exceptions.

The patch makes Xen handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the log.
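
A minimal sketch of that dispatch shape, assuming hypothetical names
(classify_guest_trap(), enum trap_action) and showing only two of the known
exception classes. The EC field position (bits [31:26]) and the two EC
values are architectural; everything else is illustrative and not taken
from the patch.

```c
#include <stdint.h>

/* Hypothetical outcome of classifying a trap taken from the guest. */
enum trap_action { HANDLE_HVC, HANDLE_DATA_ABORT, INJECT_UNDEF };

/* The exception class lives in bits [31:26] of the syndrome register. */
static unsigned int esr_ec(uint32_t esr)
{
    return (esr >> 26) & 0x3f;
}

static enum trap_action classify_guest_trap(uint32_t esr)
{
    switch (esr_ec(esr)) {
    case 0x16:                      /* HVC executed in AArch64 state */
        return HANDLE_HVC;
    case 0x24:                      /* data abort from a lower EL */
        return HANDLE_DATA_ABORT;
    default:                        /* unknown class: do not crash the host */
        return INJECT_UNDEF;
    }
}
```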

This patch is based on Linux commit f050fe7a9164 "arm: KVM: Survive unknown
traps from the guest".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: do_trap_hypervisor: Separate hypervisor and guest traps
Julien Grall [Fri, 5 May 2017 14:30:35 +0000 (15:30 +0100)]
xen/arm: do_trap_hypervisor: Separate hypervisor and guest traps

The function do_trap_hypervisor currently handles traps coming from both
the hypervisor and the guest. This makes it difficult to get specific
behavior when a trap is coming from either the guest or the hypervisor.

Split the function into two parts:
    - do_trap_guest_sync to handle guest traps
    - do_trap_hyp_sync to handle hypervisor traps

On AArch32, the Hyp Trap Exception provides the standard mechanism for
trapping Guest OS functions to the hypervisor (see B1.14.1 in ARM DDI
0406C.c). It cannot be generated when the processor is in
Hyp Mode; another exception is used instead. So it is fine to replace
the call to do_trap_hypervisor by do_trap_guest_sync.

For AArch64, there are two distinct exceptions depending on whether the
exception was taken from the current level (hypervisor) or lower level
(guest).

Note that unknown traps from guests will still cause Xen to panic. This is
the existing behavior and is left unchanged for simplicity. A follow-up
patch will address that.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: arm32: Rename the trap to the correct name
Julien Grall [Fri, 5 May 2017 14:30:34 +0000 (15:30 +0100)]
xen/arm: arm32: Rename the trap to the correct name

Per Table B1-3 in ARM DDI 0406C.c, the vector 0x8 for hyp is called
"Hypervisor Call".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86/mm: add temporary debugging code to get_page_from_gfn_p2m()
Jan Beulich [Mon, 8 May 2017 15:48:32 +0000 (17:48 +0200)]
x86/mm: add temporary debugging code to get_page_from_gfn_p2m()

See the code comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agolibxl: u.hvm.usbdevice_list is checked for emptiness
Robin Lee [Fri, 5 May 2017 19:02:32 +0000 (03:02 +0800)]
libxl: u.hvm.usbdevice_list is checked for emptiness

Currently usbdevice_list is only checked for nullity. But the OCaml
binding will convert an empty list to a pointer to NULL, instead of a
NULL pointer. That means the OCaml binding will fail to disable USB.

This patch will check emptiness of usbdevice_list. And NULL is still a
valid empty list.
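
The distinction between the two representations can be sketched like this.
string_list_is_empty() is a hypothetical helper name used for illustration,
not necessarily what the patch adds:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * A NULL-terminated string list is empty either when the list pointer
 * itself is NULL, or when its first element is the NULL terminator.
 */
static bool string_list_is_empty(char *const *list)
{
    return list == NULL || list[0] == NULL;
}
```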

Signed-off-by: Robin Lee <robinlee.sysu@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: correct boot time page table setup
Jan Beulich [Mon, 8 May 2017 12:56:14 +0000 (14:56 +0200)]
x86: correct boot time page table setup

While using alloc_domheap_pages() and assuming the allocated memory is
directly accessible is okay at boot time (as we run on the idle page
tables there), memory hotplug code too assumes it can access the
resulting page tables without using map_domain_page() or alike, and
hence we need to obtain memory suitable for ordinary page table use
here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: correct create_bounce_frame
Jan Beulich [Mon, 8 May 2017 12:55:20 +0000 (14:55 +0200)]
x86: correct create_bounce_frame

Commit d9b7ef209a7 ("x86: drop failsafe callback invocation from
assembly") didn't go quite far enough with the cleanup it did: The
changed maximum frame size should also have been reflected in the early
address range check (which has now been pointed out to have been wrong
anyway, using 60 instead of 0x60), and it should have updated the
comment ahead of the function.

Also adjust the lower bound - all is fine (for our purposes) if the
initial guest kernel stack pointer points right at the hypervisor base
address, as only memory _below_ that address is going to be written.

Additionally limit the number of times %rsi is being adjusted to what
is really needed.

Finally move exception fixup code into the designated .fixup section
and macroize the stores to guest stack.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vm_event: fix race between __context_switch() and vm_event_resume()
Razvan Cojocaru [Mon, 8 May 2017 12:54:00 +0000 (14:54 +0200)]
x86/vm_event: fix race between __context_switch() and vm_event_resume()

The introspection agent can reply to a vm_event faster than
vmx_vmexit_handler() can complete in some cases, where it is then
not safe for vm_event_set_registers() to modify v->arch.user_regs.
In the test scenario, we were stepping over an INT3 breakpoint by
setting RIP += 1. The quick reply tended to complete before the VCPU
triggering the introspection event had properly paused and been
descheduled. If the reply occurs before __context_switch() happens,
__context_switch() clobbers the reply by overwriting
v->arch.user_regs from the stack. If we don't pass through
__context_switch() (due to switching to the idle vCPU), reply data
wouldn't be picked up when switching back straight to the original
vCPU.

This patch ensures that vm_event_resume() code only sets per-VCPU
data to be used for the actual setting of registers later in
hvm_do_resume() (similar to the model used to control setting of CRs
and MSRs).

The patch additionally removes the sync_vcpu_execstate(v) call from
vm_event_resume(), which is no longer necessary, which removes the
associated broadcast TLB flush (read: performance improvement).

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vm_event: add hvm/vm_event.{h,c}
Razvan Cojocaru [Mon, 8 May 2017 12:52:31 +0000 (14:52 +0200)]
x86/vm_event: add hvm/vm_event.{h,c}

Created arch/x86/hvm/vm_event.c and include/asm-x86/hvm/vm_event.h,
where HVM-specific vm_event-related code will live. This cleans up
hvm_do_resume() and ensures that the vm_event maintainers are
responsible for changes to that code.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vpmu_intel: fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL
Mohit Gambhir [Mon, 8 May 2017 11:37:17 +0000 (13:37 +0200)]
x86/vpmu_intel: fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL

Setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a General
Protection Fault and thus results in a hypervisor crash. This behavior has
been observed on two generations of Intel processors namely, Haswell and
Broadwell. Other Intel processor generations were not tested. However, it
does seem to be a possible erratum that hasn't yet been confirmed by Intel.

To fix the problem this patch masks PC bit and returns an error in
case any guest tries to write to it on any Intel processor. In addition
to the fact that setting this bit crashes the hypervisor on Haswell and
Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
every time the programmed event occurs and the hardware behavior in
response to the toggle is undefined in the SDM, which makes this bit
unsafe to be used by guests and hence should be masked on all machines.
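
A sketch of the kind of check described, assuming a hypothetical helper
check_evntsel_write() and -EINVAL as the error value (the message only says
that an error is returned). Only the bit position, 19, comes from the text
above.

```c
#include <errno.h>
#include <stdint.h>

/* Pin Control (PC) is bit 19 of the event select register. */
#define EVNTSEL_PC_BIT (UINT64_C(1) << 19)

/* Reject any guest write that tries to set the PC bit. */
static int check_evntsel_write(uint64_t val)
{
    if (val & EVNTSEL_PC_BIT)
        return -EINVAL;     /* error code chosen for illustration */
    return 0;
}
```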

Signed-off-by: Mohit Gambhir <mohit.gambhir@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoVMX: constrain vmx_intr_assist() debugging code to debug builds
Jan Beulich [Mon, 8 May 2017 11:36:28 +0000 (13:36 +0200)]
VMX: constrain vmx_intr_assist() debugging code to debug builds

This is because that code, added by commit 997382b771 ("x86/vmx: dump
PIR and vIRR before ASSERT()"), was meant to be removed by the time we
finalize 4.9, but the root cause of the ASSERT() wrongly(?) triggering
still wasn't found.

Take the opportunity and also correct the format specifiers, which I
had got wrong when editing said change while committing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/public: correct register naming 4.9.0-rc4
Jan Beulich [Fri, 5 May 2017 15:09:49 +0000 (17:09 +0200)]
x86/public: correct register naming

Commit 897129deab ("x86: use unambiguous register names") went a little
too far: With it we also get register names like _e15 and e15 for
non-Xen consumers using a gcc compatible compiler. Correct this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: polish __{get,put}_user_{,no}check()
Jan Beulich [Fri, 5 May 2017 15:08:14 +0000 (17:08 +0200)]
x86: polish __{get,put}_user_{,no}check()

The primary purpose is correcting a latent bug in __get_user_check()
(the macro has no active user at present): The access_ok() check should
be before the actual access, or else any PV guest could initiate MMIO
reads with side effects.

Clean up all four macros at once:
- all arguments evaluated exactly once
- build the "check" flavor using the "nocheck" ones, instead of open
  coding them
- "int" is wide enough for error codes
- name local variables without using underscores as prefixes
- avoid pointless parentheses
- add blanks after commas separating parameters or arguments
- consistently use tabs for indentation
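
The ordering issue, and the idea of building the "check" flavor on top of
the "nocheck" one, can be sketched with plain functions instead of macros.
Everything here (the demo_* names, the simulated guest memory, the limit)
is a stand-in for illustration and not the real implementation:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GUEST_LIMIT 64u        /* bytes the "guest" may access */

static uint8_t demo_mem[128];       /* simulated address space */
static unsigned int demo_reads;     /* counts accesses actually made */

/* Range check: [off, off + len) must lie within the guest limit. */
static int demo_access_ok(size_t off, size_t len)
{
    return len <= DEMO_GUEST_LIMIT && off <= DEMO_GUEST_LIMIT - len;
}

/* Unchecked read: performs the access unconditionally. */
static int demo_get_user_nocheck(uint32_t *val, size_t off)
{
    ++demo_reads;
    memcpy(val, &demo_mem[off], sizeof(*val));
    return 0;
}

/* Checked read: validate first, and only then touch memory. */
static int demo_get_user_check(uint32_t *val, size_t off)
{
    if (!demo_access_ok(off, sizeof(*val)))
        return -EFAULT;
    return demo_get_user_nocheck(val, off);
}
```

With this ordering a rejected request never reaches the access itself,
which is the property that matters when the access could have side effects.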

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/asm: Clobber %r{8..15} on exit to 32bit PV guests
Andrew Cooper [Thu, 13 Apr 2017 09:51:44 +0000 (10:51 +0100)]
x86/asm: Clobber %r{8..15} on exit to 32bit PV guests

In the presence of bugs such as XSA-214 where a 32bit PV guest can get its
hands on a long mode segment, this change prevents register content leaking
between domains.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/asm: Fold LOAD_C_CLOBBERED into RESTORE_ALL
Andrew Cooper [Wed, 12 Apr 2017 13:50:35 +0000 (13:50 +0000)]
x86/asm: Fold LOAD_C_CLOBBERED into RESTORE_ALL

With its sole other user removed, fold LOAD_C_CLOBBERED into RESTORE_ALL to
reduce the cognitive load of trying to work out which registers get modified.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Lift all non-entrypoint logic in entry_int82() up into C
Andrew Cooper [Wed, 12 Apr 2017 16:40:32 +0000 (17:40 +0100)]
x86/traps: Lift all non-entrypoint logic in entry_int82() up into C

This is more readable, maintainable, and livepatchable.

This involves declaring check_for_unexpected_msi(), untrusted_msi and
pv_hypercall() suitably for use by C.  While making these changes,
untrusted_msi is switched over to being a C99 bool.

No behavioural change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Rename compat_hypercall() to entry_int82()
Andrew Cooper [Wed, 12 Apr 2017 16:37:56 +0000 (17:37 +0100)]
x86/traps: Rename compat_hypercall() to entry_int82()

This follows the Linux example of naming the entry point by how it is arrived
at, rather than its purpose.

Doing so highlights that the SAVE_VOLATILE instantiation sets up the wrong
entry_vector on the stack (although this is currently benign as we never
sysret back to a 32bit PV, and the iret path doesn't care).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/mm: Further restrict permissions on some virtual mappings
Andrew Cooper [Tue, 11 Apr 2017 15:34:58 +0000 (16:34 +0100)]
x86/mm: Further restrict permissions on some virtual mappings

As originally reported, the Linear Pagetable slot maps 512GB of ram as RWX,
where the guest has full read access and a lot of direct or indirect control
over the written content.  It isn't hard for a PV guest to hide shellcode
here.

Therefore, increase defence in depth by auditing our current pagetable
mappings.

 * The regular linear, shadow linear, and per-domain slots have no business
   being executable (but need to be written), so are updated to be NX.
 * The Read Only mappings of the M2P (compat and regular) don't need to be
   writeable or executable.
 * The PV GDT mappings and bits of the directmap don't need to be executable.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Poison unused stack pointers in the TSS
Andrew Cooper [Tue, 11 Apr 2017 14:39:08 +0000 (15:39 +0100)]
x86/traps: Poison unused stack pointers in the TSS

This is for additional defence-in-depth following LDT/GDT/IDT corruption.

It causes attempted control transfers to ring 1 or 2 (via a call gate), or
attempts to use IST 3 through 7 to yield #SS, rather than executing with a
stack starting at the top of virtual address space.

Express the TSS setup in terms of structure assignment, which should be less
fragile if the IST indexes need to change, and has the useful side effect of
zeroing the reserved fields.
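
To illustrate the structure assignment point, here is a sketch with a
simplified, hypothetical struct demo_tss (not the hardware layout) and a
hypothetical poison value. Which slots count as in use is also an
assumption of this sketch. Any member not named in the compound literal is
zeroed by the assignment.

```c
#include <stdint.h>

/* Hypothetical poison: a non-canonical address, chosen for illustration. */
#define DEMO_POISON_SP UINT64_C(0x8600111111111111)

/* Simplified stand-in for a 64bit TSS; field layout is not authoritative. */
struct demo_tss {
    uint32_t reserved0;
    uint64_t rsp0, rsp1, rsp2;
    uint64_t reserved1;
    uint64_t ist[7];
    uint64_t reserved2;
};

static void demo_tss_init(struct demo_tss *tss, uint64_t stack_top)
{
    /* Whole-structure assignment: unnamed members become zero. */
    *tss = (struct demo_tss){
        .rsp0 = stack_top,
        .rsp1 = DEMO_POISON_SP,
        .rsp2 = DEMO_POISON_SP,
        .ist = {
            [0] = stack_top,        /* slots assumed to be in use */
            [1] = stack_top,
            [2] = stack_top,
            [3] = DEMO_POISON_SP,   /* slots assumed to be unused */
            [4] = DEMO_POISON_SP,
            [5] = DEMO_POISON_SP,
            [6] = DEMO_POISON_SP,
        },
    };
}
```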

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Drop 32bit fields out of tss_struct
Andrew Cooper [Tue, 11 Apr 2017 14:42:41 +0000 (15:42 +0100)]
x86/traps: Drop 32bit fields out of tss_struct

The backlink field doesn't exist in a 64bit TSS, and union for esp{0..2} is of
no practical use.  Specify everything with stdint types, and empty bitfields
for reserved values.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm32: Distinguish guest SError from Xen data aborts
Wei Chen [Thu, 4 May 2017 03:27:49 +0000 (11:27 +0800)]
xen/arm32: Distinguish guest SError from Xen data aborts

ARM32 doesn't have an exception similar to hyp_sync of ARM64 to catch
synchronous data aborts (for example, a NULL pointer dereference).
Hence the SError and sync data abort will be caught by the same data abort
exception.

Since commit "3f16c8cb" we treat all data aborts caught by this exception
as SError. This means we will forward Xen synchronous data aborts to the
guest if serror_op=FORWARD. This is obviously incorrect. But we don't have
any method to distinguish SError from Xen data aborts.

But we can distinguish guest generated SError from Xen data aborts. So we
want to change the policy to handle data aborts for ARM32:
1. If this data abort is a guest generated SError, we will handle it
   according to the SError handling option setting.
2. If this data abort is a synchronous data abort or a Xen generated
   SError, we will PANIC the whole system.

Signed-off-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: efi: Avoid out-of-bounds write in meminfo_add_bank
Julien Grall [Thu, 4 May 2017 19:36:41 +0000 (20:36 +0100)]
xen/arm: efi: Avoid out-of-bounds write in meminfo_add_bank

Commit 2c77db77 "xen/arm: efi: Avoid duplicating the addition of a new
bank", introduced a new function meminfo_add_bank that adds a new bank.
This new code fails to correctly check the size of the array, which may
result in an out-of-bounds write.
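
The shape of such a bounds check can be sketched as follows. The demo_*
types, the capacity of 4 and the boolean return are assumptions made for
this example and do not reproduce the real meminfo_add_bank:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_BANKS 4u   /* hypothetical capacity of the bank array */

struct demo_bank { uint64_t start, size; };

struct demo_meminfo {
    unsigned int nr_banks;
    struct demo_bank bank[DEMO_MAX_BANKS];
};

/* Append a bank, refusing to write past the end of the array. */
static bool demo_add_bank(struct demo_meminfo *mem,
                          uint64_t start, uint64_t size)
{
    if (mem->nr_banks >= DEMO_MAX_BANKS)
        return false;

    mem->bank[mem->nr_banks].start = start;
    mem->bank[mem->nr_banks].size = size;
    mem->nr_banks++;
    return true;
}
```

The comparison uses ">=" because index nr_banks is about to be written, so
the last valid value before the write is DEMO_MAX_BANKS - 1.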

Coverity-ID: 1433183
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agolibs/devicemodel: Fix dependency with libxencall
Anthony PERARD [Thu, 4 May 2017 10:50:52 +0000 (11:50 +0100)]
libs/devicemodel: Fix dependency with libxencall

libxendevicemodel.so does depend on libxencall.so but the dependency was
missing at link time.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agohvm: fix hypervisor crash in hvm_save_one()
Jan Beulich [Thu, 4 May 2017 13:05:26 +0000 (15:05 +0200)]
hvm: fix hypervisor crash in hvm_save_one()

hvm_save_cpu_ctxt() returns success without writing any data into
hvm_domain_context_t when all VCPUs are offline. This can then crash
the hypervisor (with FATAL PAGE FAULT) in hvm_save_one() via the
"off < (ctxt.cur - sizeof(*desc))" for() test, where ctxt.cur remains 0,
causing an underflow which leads the hypervisor to go off the end of the
ctxt buffer.
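
The underflow can be reproduced in isolation. In this sketch the function
names are hypothetical and size_t is used for all operands; the guarded
variant is one possible way to avoid the wrap and is not claimed to be the
fix applied by this commit.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Loop condition in the shape quoted above.  When cur is smaller than
 * desc_size the unsigned subtraction wraps to a huge value, so the test
 * passes even though the buffer holds no data.
 */
static bool more_records_unsafe(size_t off, size_t cur, size_t desc_size)
{
    return off < (cur - desc_size);
}

/* Guarded variant: only subtract once cur is known to be large enough. */
static bool more_records_safe(size_t off, size_t cur, size_t desc_size)
{
    return cur >= desc_size && off < (cur - desc_size);
}
```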

This has been broken since Xen 4.4 (c/s e019c606f59).
It has happened in practice with an HVM Linux VM (Debian 8) queried around
shutdown:

(XEN) hvm.c:1595:d3v0 All CPUs offline -- powering off.
(XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    5
(XEN) RIP:    e008:[<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v2)
(XEN) rax: ffff830492cbb445   rbx: 0000000000000000   rcx: ffff83039343b400
(XEN) rdx: 00000000ff88004d   rsi: fffffffffffffff8   rdi: 0000000000000000
(XEN) rbp: ffff8304103e7c88   rsp: ffff8304103e7c48   r8:  0000000000000001
(XEN) r9:  deadbeefdeadf00d   r10: 0000000000000000   r11: 0000000000000282
(XEN) r12: 00007f43a3b14004   r13: 00000000fffffffe   r14: 0000000000000000
(XEN) r15: ffff830400c41000   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 0000000402e13000   cr2: ffff830492cbb447
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0802496d2> (hvm_save_one+0x145/0x1fd):
(XEN)  00 00 48 01 c8 83 c2 08 <66> 39 58 02 75 64 eb 08 48 89 c8 ba 08 00 00 00
(XEN) Xen stack trace from rsp=ffff8304103e7c48:
(XEN)    0000041000000000 ffff83039343b400 ffff8304103e7c70 ffff8304103e7da8
(XEN)    ffff830400c41000 00007f43a3b13004 ffff8304103b7000 ffffffffffffffea
(XEN)    ffff8304103e7d48 ffff82d0802683d4 ffff8300d19fd000 ffff82d0802320d8
(XEN)    ffff830400c41000 0000000000000000 ffff8304103e7cd8 ffff82d08026ff3d
(XEN)    0000000000000000 ffff8300d19fd000 ffff8304103e7cf8 ffff82d080232142
(XEN)    0000000000000000 ffff8300d19fd000 ffff8304103e7d28 ffff82d080207051
(XEN)    ffff8304103e7d18 ffff830400c41000 0000000000000202 ffff830400c41000
(XEN)    0000000000000000 00007f43a3b13004 0000000000000000 deadbeefdeadf00d
(XEN)    ffff8304103e7e68 ffff82d080206c47 0700000000000000 ffff830410375bd0
(XEN)    0000000000000296 ffff830410375c78 ffff830410375c80 0000000000000003
(XEN)    ffff8304103e7e68 ffff8304103b67c0 ffff8304103b7000 ffff8304103b67c0
(XEN)    0000000d00000037 0000000000000003 0000000000000002 00007f43a3b14004
(XEN)    00007ffd5d925590 0000000000000000 0000000100000000 0000000000000000
(XEN)    00000000ea8f8000 0000000000000000 00007ffd00000000 0000000000000000
(XEN)    00007f43a276f557 0000000000000000 00000000ea8f8000 0000000000000000
(XEN)    00007ffd5d9255e0 00007f43a23280b2 00007ffd5d926058 ffff8304103e7f18
(XEN)    ffff8300d19fe000 0000000000000024 ffff82d0802053e5 deadbeefdeadf00d
(XEN)    ffff8304103e7f08 ffff82d080351565 010000003fffffff 00007f43a3b13004
(XEN)    deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d
(XEN)    ffff8800781425c0 ffff88007ce94300 ffff8304103e7ed8 ffff82d0802719ec
(XEN) Xen call trace:
(XEN)    [<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
(XEN)    [<ffff82d0802683d4>] arch_do_domctl+0xa7a/0x259f
(XEN)    [<ffff82d080206c47>] do_domctl+0x1862/0x1b7b
(XEN)    [<ffff82d080351565>] pv_hypercall+0x1ef/0x42c
(XEN)    [<ffff82d080355106>] entry.o#test_all_events+0/0x30
(XEN)
(XEN) Pagetable walk from ffff830492cbb447:
(XEN)  L4[0x106] = 00000000dbc36063 ffffffffffffffff
(XEN)  L3[0x012] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff830492cbb447
(XEN) ****************************************

At the same time pave the way for having zero-length records.

Inspired by an earlier patch from Andrew and Razvan.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Diagnosed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Release-Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/mm: silence a pointless warning
Jan Beulich [Thu, 4 May 2017 13:04:29 +0000 (15:04 +0200)]
x86/mm: silence a pointless warning

get_page() logs a message when it fails (dom_cow is never dying or
paging_mode_external()), so better avoid the call when it's pointless
to do anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agovTPM: update email address and file path in MAINTAINERS file
Quan Xu [Thu, 27 Apr 2017 18:14:29 +0000 (02:14 +0800)]
vTPM: update email address and file path in MAINTAINERS file

Signed-off-by: Quan Xu <xuquan8@huawei.com>
8 years agox86: discard type information when stealing pages
Jan Beulich [Tue, 2 May 2017 12:46:58 +0000 (14:46 +0200)]
x86: discard type information when stealing pages

While a page having just a single general reference left necessarily
has a zero type reference count too, its type may still be valid (and
in validated state; at present this is only possible and relevant for
PGT_seg_desc_page, as page tables have their type forcibly zapped when
their type reference count drops to zero, and
PGT_{writable,shared}_page pages don't require any validation). In
such a case when the page is being re-used with the same type again,
validation is being skipped. As validation criteria differ between
32- and 64-bit guests, pages to be transferred between guests need to
have their validation indicator zapped (and with it we zap all other
type information at once).

This is XSA-214.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agomulticall: deal with early exit conditions
Jan Beulich [Tue, 2 May 2017 12:45:02 +0000 (14:45 +0200)]
multicall: deal with early exit conditions

In particular changes to guest privilege level require the multicall
sequence to be aborted, as hypercalls are permitted from kernel mode
only. While likely not very useful in a multicall, also properly handle
the return value in the HYPERVISOR_iret case (which should be the guest
specified value).

This is XSA-213.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86emul: correct stub invocation constraints again
Jan Beulich [Fri, 28 Apr 2017 14:03:40 +0000 (16:03 +0200)]
x86emul: correct stub invocation constraints again

While the hypervisor side of commit cd91ab08ea ("x86emul: correct stub
invocation constraints") was fine, the tools side triggered a bogus
error with old gcc (4.3 and 4.4 at least). Use a slightly less
appropriate variant instead, proven to be good enough to not
re-introduce the original problem: Which of the addresses is actually
used doesn't matter much as long as the compiler can't prove that the
two pointers don't alias one another.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoseabios: run olddefconfig 4.9.0-rc3
Wei Liu [Wed, 26 Apr 2017 11:13:34 +0000 (12:13 +0100)]
seabios: run olddefconfig

We provided a base config file in 970f8de3e. To generate a full config
file, running olddefconfig is required.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Ian Jackson <ian.jackson@eu.citrix.com>