]> xenbits.xensource.com Git - xen.git/log
xen.git
8 years agolibxl: Do not trust backend for vtpm in getinfo (uuid)
Ian Jackson [Fri, 29 Apr 2016 15:57:14 +0000 (16:57 +0100)]
libxl: Do not trust backend for vtpm in getinfo (uuid)

Use uuid from /libxl, rather than from backend.  I think the backend
is not supposed to change the uuid, since it seems to be set by libxl
during setup.

If in fact the backend is supposed to be able to change the uuid, this
patch needs to be dropped and replaced by a patch which makes the vtpm
uuid lookup tolerate bad or missing data.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for vtpm in getinfo (except uuid)
Ian Jackson [Fri, 29 Apr 2016 16:18:44 +0000 (17:18 +0100)]
libxl: Do not trust backend for vtpm in getinfo (except uuid)

* Do not check the backend for existence.  We have already read the
  /libxl path so know that the vtpm exists (or is supposed to); if the
  backend doesn't exist then that must be the backend's doing.
* Get the frontend path from the /libxl directory.
* The frontend domid is the guest domid, and does not need to be read
  from xenstore (!)

We still attempt to read the uuid from the backend.  This will be
fixed in the next patch.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Make copy of every xs backend in /libxl in _generic_add
Ian Jackson [Fri, 29 Apr 2016 15:19:28 +0000 (16:19 +0100)]
libxl: Make copy of every xs backend in /libxl in _generic_add

We want to stop libxl trustingly reading information from the backend
directory (since this is, of course, writeable by the backend, which
might be a semi-trusted driver domain).

In principle it is wrong in current libxl for anything to try to
divine virtual device configuration from xenstore: the JSON domain
config ought to supply that, and xenstore should only tell us which
devices actually exist.

However:

Firstly, there are several existing places where configuration
information is retrieved from xenstore rather than JSON.  We do not
want to reen gineer this in a security patch.

Secondly, we want to make a security patch which can be backported to
versions of libxl without the JSON configuration machinery.

So we take the expedient approach of keeping a copy of the
configuration somewhere we trust, namely /libxl.  This is obviously
fairly low-risk, although it does write significantly more keys in
xenstore.

In this patch we make this change in libxl__device_generic_add.  This
is responsible for actually writing the vast majority of device
information to xenstore.  There are a few loose ends which will be
dealt with in a moment.

Likewise, changes to readers to use the new location will appear in
further patches.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for nic in getinfo
Ian Jackson [Tue, 3 May 2016 15:31:07 +0000 (16:31 +0100)]
libxl: Do not trust frontend for nic in getinfo

libxl_device_nic_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for nic in libxl_devid_to_device_nic
Ian Jackson [Tue, 3 May 2016 14:52:53 +0000 (15:52 +0100)]
libxl: Do not trust frontend for nic in libxl_devid_to_device_nic

Find the backend by reading the pointer in /libxl rather than in the
guest's frontend area.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for vtpm in getinfo
Ian Jackson [Tue, 3 May 2016 15:00:20 +0000 (16:00 +0100)]
libxl: Do not trust frontend for vtpm in getinfo

libxl_device_vtpm_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for vtpm list
Ian Jackson [Tue, 3 May 2016 14:58:32 +0000 (15:58 +0100)]
libxl: Do not trust frontend for vtpm list

libxl_device_vtpm_list needs to enumerate and identify devices without
trusting frontend-controlled data.  So

* Use the /libxl path to enumerate vtpms.
* Use the /libxl path to find the corresponding backends.
* Parse the backend path to find the backend domid.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for disk in getinfo
Ian Jackson [Fri, 29 Apr 2016 18:21:51 +0000 (19:21 +0100)]
libxl: Do not trust frontend for disk in getinfo

* Rename the frontend variable to `fe_path' to check we caught them all
* Read the backend path from /libxl, rather than from the frontend
* Parse the backend domid from the backend path, rather than reading it
  from the frontend (and add the appropriate error path and initialisation)

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for disk eject event
Ian Jackson [Wed, 27 Apr 2016 15:08:49 +0000 (16:08 +0100)]
libxl: Do not trust frontend for disk eject event

Use the /libxl path for interpreting disk eject watch events: do not
read the backend path out of the frontend.  Instead, use the version
in /libxl.  That avoids us relying on the guest-modifiable
$frontend/backend pointer.

To implement this we store the path
  /libxl/$guest/device/vbd/$devid/backend
in the evgen structure.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend in libxl__device_nextid
Ian Jackson [Wed, 4 May 2016 14:30:32 +0000 (15:30 +0100)]
libxl: Do not trust frontend in libxl__device_nextid

When selecting the devid for a new device, we should look in
/libxl/device for existing devices, not in the frontend area.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend in libxl__devices_destroy
Ian Jackson [Tue, 3 May 2016 17:39:36 +0000 (18:39 +0100)]
libxl: Do not trust frontend in libxl__devices_destroy

We need to enumerate the devices we have provided to a domain, without
trusting the guest-writeable (or, at least, guest-deletable) frontend
paths.

Instead, enumerate via, and read the backend path from, /libxl.

The console /libxl path is regular, so the special case for console 0
is not relevant any more: /libxl/GUEST/device/console/0 will be found,
and then libxl__device_destroy will DTRT to the right frontend path.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Provide libxl__backendpath_parse_domid
Ian Jackson [Wed, 27 Apr 2016 15:34:19 +0000 (16:34 +0100)]
libxl: Provide libxl__backendpath_parse_domid

Multiple places in libxl need to figure out the backend domid of a
device.  This can be discovered easily by looking at the backend path,
which always starts /local/domain/$backend_domid/.

There are no call sites yet.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Record backend/frontend paths in /libxl/$DOMID
Ian Jackson [Mon, 16 May 2016 15:44:31 +0000 (16:44 +0100)]
libxl: Record backend/frontend paths in /libxl/$DOMID

This gives us a record of all the backends we have set up for a
domain, which is separate from the frontends in
  /local/domain/$DOMID/device.

In particular:

1. A guest has write permission for the frontend path:
  /local/domain/$DOMID/device/$KIND/$DEVID
which means that the guest can completely delete the frontend.
(They can't recreate it because they don't have write permission
on the containing directory.)

2. A guest has write permission for the backend path recorded in the
frontend, ie, it can write to
  /local/domain/$DOMID/device/$KIND/$DEVID/backend
which means that the guest can break the association between
frontend and backend.

So we can't rely on iterating over the frontends to find all the
backends, or examining a frontend to discover how a device is
configured.

So, have libxl__device_generic_add record the frontend and backend
paths in /libxl/$DOMID/device, and have libxl__device_destroy remove
them again.

Create the containing directory /libxl/GUEST/device in
libxl__domain_make.  The already existing xs_rm in devices_destroy_cb
will take care of removing it.

This is part of XSA-175.

Backport note: Backported over 7472ced, which fixes a bug in driver
domain teardown.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/arm: Don't free p2m->first_level in p2m_teardown() before it has been allocated
Andrew Cooper [Thu, 2 Jun 2016 13:19:00 +0000 (14:19 +0100)]
xen/arm: Don't free p2m->first_level in p2m_teardown() before it has been allocated

If p2m_init() didn't complete successfully, (e.g. due to VMID
exhaustion), p2m_teardown() is called and unconditionally tries to free
p2m->first_level before it has been allocated.  free_domheap_pages() doesn't
tolerate NULL pointers.

This is XSA-181

Reported-by: Aaron Cornelius <Aaron.Cornelius@dornerworks.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
8 years agox86/mm: fully honor PS bits in guest page table walks
Jan Beulich [Tue, 17 May 2016 12:56:46 +0000 (14:56 +0200)]
x86/mm: fully honor PS bits in guest page table walks

In L4 entries it is currently unconditionally reserved (and hence
should, when set, always result in a reserved bit page fault), and is
reserved on hardware not supporting 1Gb pages (and hence should, when
set, similarly cause a reserved bit page fault on such hardware).

This is CVE-2016-4480 / XSA-176.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 46699c7393bd991234b5642763c5c24b6b39a6c4
master date: 2016-05-17 14:41:14 +0200

8 years agoUpdate QEMU_UPSTREAM_REVISION
Ian Jackson [Tue, 10 May 2016 18:20:49 +0000 (19:20 +0100)]
Update QEMU_UPSTREAM_REVISION

8 years agoQEMU_TAG update
Ian Jackson [Tue, 10 May 2016 18:16:37 +0000 (19:16 +0100)]
QEMU_TAG update

8 years agoQEMU_TAG update
Ian Jackson [Tue, 10 May 2016 18:10:57 +0000 (19:10 +0100)]
QEMU_TAG update

9 years agox86: limit GFNs to 32 bits for shadowed superpages.
Tim Deegan [Wed, 16 Mar 2016 17:05:25 +0000 (17:05 +0000)]
x86: limit GFNs to 32 bits for shadowed superpages.

Superpage shadows store the shadowed GFN in the backpointer field,
which for non-BIGMEM builds is 32 bits wide.  Shadowing a superpage
mapping of a guest-physical address above 2^44 would lead to the GFN
being truncated there, and a crash when we come to remove the shadow
from the hash table.

Track the valid width of a GFN for each guest, including reporting it
through CPUID, and enforce it in the shadow pagetables.  Set the
maximum witth to 32 for guests where this truncation could occur.

This is XSA-173.

Reported-by: Ling Liu <liuling-it@360.cn>
Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: fix information leak on AMD CPUs
Jan Beulich [Tue, 29 Mar 2016 13:20:58 +0000 (15:20 +0200)]
x86: fix information leak on AMD CPUs

The fix for XSA-52 was wrong, and so was the change synchronizing that
new behavior to the FXRSTOR logic: AMD's manuals explictly state that
writes to the ES bit are ignored, and it instead gets calculated from
the exception and mask bits (it gets set whenever there is an unmasked
exception, and cleared otherwise). Hence we need to follow that model
in our workaround.

This is CVE-2016-3158 / CVE-2016-3159 / XSA-172.
[xen/arch/x86/xstate.c:xrstor: CVE-2016-3158]
[xen/arch/x86/i387.c:fpu_fxrstor: CVE-2016-3159]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 7bd9dc3adfbb014c55f0928ebb3b20950ca9c019
master date: 2016-03-29 14:24:26 +0200

9 years agolibvchan: Read prod/cons only once.
Konrad Rzeszutek Wilk [Thu, 10 Mar 2016 07:26:42 +0000 (08:26 +0100)]
libvchan: Read prod/cons only once.

We must ensure that the prod/cons are only read once and that
the compiler won't try to optimize the reads. That is split
the read of these in multiple instructions influencing later
branch code. As such insert barriers when fetching the cons
and prod index.

This is part of XSA155.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
master commit: 7d66a4ba695ab8d13b214fb816dd59e443ae1ec9
master date: 2015-12-18 09:50:02 -0500

9 years agotools: pygrub: if partition table is empty, try treating as a whole disk
Ian Campbell [Thu, 5 Nov 2015 14:46:12 +0000 (14:46 +0000)]
tools: pygrub: if partition table is empty, try treating as a whole disk

pygrub (in identify_disk_image()) detects a DOS style partition table
via the presence of the 0xaa55 signature at the end of the first
sector of the disk.

However this signature is also present in whole-disk configurations
when there is an MBR on the disk. Many filesystems (e.g. ext[234])
include leading padding in their on disk format specifically to enable
this.

So if we think we have a DOS partition table but do not find any
actual partition table entries we may as well try looking at it as a
whole disk image. Worst case is we probe and find there isn't anything
there.

This was reported by Sjors Gielen in Debian bug #745419. The fix was
inspired by a patch by Adi Kriegisch in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745419#27

Tested by genext2fs'ing my /boot into a new raw image (works) and
then:
   dd if=/usr/lib/grub/i386-pc/g2ldr.mbr of=img conv=notrunc bs=512 count=1

to add an MBR (with 0xaa55 signature) to it, which after this patch
also works.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: 745419-forwarded@bugs.debian.org
(cherry picked from commit fb31b1475f1bf179f033b8de3f0e173006fd77e9)
(cherry picked from commit 6c9b1bcce4fcc872edddd44f88390a67d5954069)
(cherry picked from commit 812406cf2b6731d07f0f840d799fcfa5917dbaf4)

9 years agox86: fix unintended fallthrough case from XSA-154
Andrew Cooper [Thu, 18 Feb 2016 14:28:25 +0000 (15:28 +0100)]
x86: fix unintended fallthrough case from XSA-154

... and annotate the other deliberate one: Coverity objects otherwise.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
One of the two instances was actually a bug.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 8dd6d1c099865ee5f5916616a0ca79cd943c46f9
master date: 2016-02-18 15:10:07 +0100

9 years agox86/VMX: sanitize rIP before re-entering guest
Jan Beulich [Wed, 17 Feb 2016 15:52:25 +0000 (16:52 +0100)]
x86/VMX: sanitize rIP before re-entering guest

... to prevent guest user mode arranging for a guest crash (due to
failed VM entry). (On the AMD system I checked, hardware is doing
exactly the canonicalization being added here.)

Note that fixing this in an architecturally correct way would be quite
a bit more involved: Making the x86 instruction emulator check all
branch targets for validity, plus dealing with invalid rIP resulting
from update_guest_eip() or incoming directly during a VM exit. The only
way to get the latter right would be by not having hardware do the
injection.

Note further that there are a two early returns from
vmx_vmexit_handler(): One (through vmx_failed_vmentry()) leads to
domain_crash() anyway, and the other covers real mode only and can
neither occur with a non-canonical rIP nor result in an altered rIP,
so we don't need to force those paths through the checking logic.

This is CVE-2016-2271 / XSA-170.

Reported-by: 刘令 <liuling-it@360.cn>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ffbbfda37782a2408953af1a3e00ada80bb141bc
master date: 2016-02-17 16:18:08 +0100

9 years agox86: enforce consistent cachability of MMIO mappings
Jan Beulich [Wed, 17 Feb 2016 15:51:44 +0000 (16:51 +0100)]
x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c61a6f74f80eb36ed83a82f713db3143159b9009
master date: 2016-02-17 16:16:53 +0100

9 years agoupdate Xen version to 4.4.4 RELEASE-4.4.4
Jan Beulich [Thu, 21 Jan 2016 12:38:50 +0000 (13:38 +0100)]
update Xen version to 4.4.4

9 years agox86/vmx: Fix injection of #DB traps following XSA-156
Andrew Cooper [Wed, 20 Jan 2016 13:10:12 +0000 (14:10 +0100)]
x86/vmx: Fix injection of #DB traps following XSA-156

Most #DB exceptions are traps rather than faults, meaning that the instruction
pointer in the exception frame points after the instruction rather than at it.

However, VMX intercepts all have fault semantics, even when intercepting a
trap.  Re-injecting an intercepted trap as a fault causes an infinite loop in
the guest, by re-executing the same trapping instruction repeatedly.  This
breaks debugging inside the guest.

Introduce a helper which copies VM_EXIT_INTR_INTO to VM_ENTRY_INTR_INFO, and
use it to mirror the intercepted interrupt back to the guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 0747bc8b4d85f3fc0ee1e58418418fa0229e8ff8
master date: 2016-01-05 11:28:56 +0000

9 years agox86/VMX: prevent INVVPID failure due to non-canonical guest address
Jan Beulich [Wed, 20 Jan 2016 13:09:21 +0000 (14:09 +0100)]
x86/VMX: prevent INVVPID failure due to non-canonical guest address

While INVLPG (and on SVM INVLPGA) don't fault on non-canonical
addresses, INVVPID fails (in the "individual address" case) when passed
such an address.

Since such intercepted INVLPG are effectively no-ops anyway, don't fix
this in vmx_invlpg_intercept(), but instead have paging_invlpg() never
return true in such a case.

This is CVE-2016-1571 / XSA-168.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: bf05e88ed7342a91cceba050b6c622accb809842
master date: 2016-01-20 13:50:10 +0100

9 years agox86/mm: PV superpage handling lacks sanity checks
Jan Beulich [Wed, 20 Jan 2016 13:08:27 +0000 (14:08 +0100)]
x86/mm: PV superpage handling lacks sanity checks

MMUEXT_{,UN}MARK_SUPER fail to check the input MFN for validity before
dereferencing pointers into the superpage frame table.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
get_superpage() has a similar issue.

This is CVE-2016-1570 / XSA-167.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 47abf29a9255b2e7b94e56d66b455d0a584b68b8
master date: 2016-01-20 13:49:23 +0100

9 years agoConfig.mk: update OVMF changeset
Wei Liu [Wed, 14 Oct 2015 11:41:13 +0000 (12:41 +0100)]
Config.mk: update OVMF changeset

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commits 04c5efb0a141fa53e805e396970419436e74ce67
 and f046e501bbca1c8a46853b2e1f1b587e228c73de)

Apropos of discussion in
 "OVMF related osstest failures on multiple branches"
 http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg00442.html

We believe the older ovmf.git does not work when built with the gcc in
Debian jessie.  We do not know where this bug lies but we are fixing
it by updating ovmf.

We have decided that we are not in a position to review the changes to
OVMF upstream, and ourselves decide what to cherry pick.  Instead we
will update the revision wholesale and use the xen.git stable
branches' push gate.

Conflicts:
Config.mk

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commits 6c3c6ff9ecaa5ee0be8b535d36fdcd12380564a1
 and 1d3cc6e62c4d2fc3dd9251d4921881425c9d27bd)

Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit ee576d71103e6795dc0add91db1b0d281eab1caf)

Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agoblktap: Fix two 'maybe uninitialized' variables
Dario Faggioli [Fri, 20 Jun 2014 14:09:00 +0000 (16:09 +0200)]
blktap: Fix two 'maybe uninitialized' variables

[ Cross-ported to blktap1 from 345e44a85d71a
  "blktap2: Fix two 'maybe uninitialized' variables" -iwj;
  Remainder of commit message is from blktap2's version. ]

for which gcc 4.9.0 complains about, like this:

block-qcow.c: In function `get_cluster_offset':
block-qcow.c:431:3: error: `tmp_ptr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   memcpy(tmp_ptr, l1_ptr, 4096);
   ^
block-qcow.c:606:7: error: `tmp_ptr2' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
   if (write(s->fd, tmp_ptr2, 4096) != 4096) {
       ^
cc1: all warnings being treated as errors
/home/dario/Sources/xen/xen/xen.git/tools/blktap2/drivers/../../../tools/Rules.mk:89:
 recipe for target 'block-qcow.o' failed
make[5]: *** [block-qcow.o] Error 1

The proper behavior is to return upon allocation failure.
About what to return, 0 seems the best option, looking
at both the function and the call sites.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Backport-requested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 845e8c1653242bbd9b9de5a081182db0f3f39054)

9 years agoQEMU_TAG update
Ian Jackson [Mon, 4 Jan 2016 15:36:32 +0000 (15:36 +0000)]
QEMU_TAG update

9 years agolibxl: Fix building libxlu_cfg_y.y with bison 3.0
Ed Swierk [Tue, 6 Jan 2015 15:21:07 +0000 (15:21 +0000)]
libxl: Fix building libxlu_cfg_y.y with bison 3.0

- Use %lex-param instead of obsolete YYLEX_PARAM to override lex scanner
  parameter
- Change deprecated %name-prefix= to %name-prefix

Tested against bison 2.4.1 and 3.0.2.

This is expected to sometimes (depending on timestamps and whether the
bison input files are edited) break building on systems with ancient
versions of bison.  Bison 2.4.1 is known to work and was released in
December 2008.

Also, consquentially, regenerate bison output files with bison
1:2.5.dfsg-2.1 from Debian wheezy.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 7ba4cdfadd4f3c45d65ffe50e621759f458fedc0)

[ I have checked that rebuilding the bison and flex input produces no
  further changes. -iwj ]

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: Rerun bison and flex
Ian Jackson [Mon, 4 Jan 2016 14:45:12 +0000 (14:45 +0000)]
libxl: Rerun bison and flex

We are going to want to cherry pick a change to the bison input, which
will involve rerunning bison.

So firstly, update the bison and flex output to that from current
Debian wheezy (i386; 1:2.5.dfsg-2.1 and 2.5.35-10.1 respectively).

There should be no functional change since there is no change to the
source file, but we will inherit bugfixes and behavioural changes from
the new version of bison.  So this is more a matter of hope than
knowledge.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
9 years agoQEMU_TAG update
Ian Jackson [Fri, 18 Dec 2015 14:58:40 +0000 (14:58 +0000)]
QEMU_TAG update

9 years agox86/HVM: avoid reading ioreq state more than once
Jan Beulich [Thu, 17 Dec 2015 13:31:28 +0000 (14:31 +0100)]
x86/HVM: avoid reading ioreq state more than once

Otherwise, especially when the compiler chooses to translate the
switch() to a jump table, unpredictable behavior (and in the jump table
case arbitrary code execution) can result.

This is XSA-166.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: b452430a4cdfc801fa4bc391aed7522365e1deb6
master date: 2015-12-17 14:22:46 +0100

9 years agox86: don't leak ST(n)/XMMn values to domains first using them
Jan Beulich [Thu, 17 Dec 2015 13:30:57 +0000 (14:30 +0100)]
x86: don't leak ST(n)/XMMn values to domains first using them

FNINIT doesn't alter these registers, and hence using it is
insufficient to initialize a guest's initial state.

This is CVE-2015-8555 / XSA-165.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 81818b3f277544535974204f8d840da86fa8a44f
master date: 2015-12-17 14:22:13 +0100

9 years agox86/time: fix domain type check in tsc_set_info()
Haozhong Zhang [Tue, 15 Dec 2015 14:56:02 +0000 (15:56 +0100)]
x86/time: fix domain type check in tsc_set_info()

Replace is_hvm_domain() in tsc_set_info() by has_hvm_container_domain()
to keep consistent with other domain type checks in tsc_set_info().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 3c80d6f3c61eb0f8072f70b0a9a8c8c7adf17572
master date: 2015-12-08 09:46:30 +0100

9 years agoevtchn: don't reuse ports that are still "busy"
David Vrabel [Tue, 15 Dec 2015 14:55:23 +0000 (15:55 +0100)]
evtchn: don't reuse ports that are still "busy"

When using the FIFO ABI a guest may close an event channel that is
still LINKED.  If this port is reused, subsequent events may be lost
because they may become pending on the wrong queue.

This could be fixed by requiring guests to only close event channels
that are not linked.  This is difficult since: a) irq cleanup in the
guest may be done in a context that cannot wait for the event to be
unlinked; b) the guest may attempt to rebind a PIRQ whose previous
close is still pending; and c) existing guests already have the
problematic behaviour.

Instead, simply check a port is not "busy" (i.e., it's not linked)
before reusing it.

Guests should still drain any queues for VCPUs that are being
offlined, or the port will become unusable until the VCPU is onlined
and starts processing events again.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 78e24c269b0a4a8b864ece725e6d4209ed95dfa7
master date: 2015-12-02 15:21:46 +0100

9 years agox86/boot: check for not allowed sections before linking
Daniel Kiper [Tue, 15 Dec 2015 14:55:00 +0000 (15:55 +0100)]
x86/boot: check for not allowed sections before linking

Currently check for not allowed sections is performed just after
compilation. However, if compilation succeeds and check fails then
second build will create xen.gz/xen.efi without any visible error.
This happens because %.o: %.c recipe created object file during first
run and make do not execute this recipe during second run. So, look
for not allowed sections before linking. This way check will be
executed every time.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d380b3559734739ae009cd3c0e9aabb5602863e2
master date: 2015-11-25 17:24:36 +0100

9 years agox86/vPMU: document as unsupported
Jan Beulich [Tue, 15 Dec 2015 14:53:49 +0000 (15:53 +0100)]
x86/vPMU: document as unsupported

This is XSA-163.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: c03480cf5c4e96fb4afb2237ad0a3cac7162564a
master date: 2015-11-24 18:32:20 +0100

9 years agoVMX: fix/adjust trap injection
Jan Beulich [Tue, 15 Dec 2015 14:53:12 +0000 (15:53 +0100)]
VMX: fix/adjust trap injection

In the course of investigating the 4.1.6 backport issue of the XSA-156
patch I realized that #DB injection has always been broken, but with it
now getting always intercepted the problem has got worse: Documentation
clearly states that neither DR7.GD nor DebugCtl.LBR get cleared before
the intercept, so this is something we need to do before reflecting the
intercepted exception.

While adjusting this (and also with 4.1.6's strange use of
X86_EVENTTYPE_SW_EXCEPTION for #DB in mind) I further realized that
the special casing of individual vectors shouldn't be done for
software interrupts (resulting from INT $nn).

And then some code movement: Setting of CR2 for #PF can be done in the
same switch() statement (no need for a separate if()), and reading of
intr_info is better done close the the consumption of the variable
(allowing the compiler to generate better code / use fewer registers
for variables).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 81a28f14009f4d8577a81b28dd06f6828112054b
master date: 2015-11-24 12:30:31 +0100

9 years agosched: fix locking for insert_vcpu() in credit1 and RTDS
Dario Faggioli [Tue, 15 Dec 2015 14:52:51 +0000 (15:52 +0100)]
sched: fix locking for insert_vcpu() in credit1 and RTDS

The insert_vcpu() hook is handled with inconsistent locking.
In fact, schedule_cpu_switch() calls the hook with runqueue
lock held, while sched_move_domain() relies on the hook
implementations to take the lock themselves (and, since that
is not done in Credit1 and RTDS, such operation is not safe
in those cases).

This is fixed as follows:
 - take the lock in the hook implementations, in specific
   schedulers' code;
 - avoid calling insert_vcpu(), for the idle vCPU, in
   schedule_cpu_switch(). In fact, idle vCPUs are set to run
   immediately, and the various schedulers won't insert them
   in their runqueues anyway, even when explicitly asked to.

While there, still in schedule_cpu_switch(), locking with
_irq() is enough (there's no need to do *_irqsave()).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: ae2f41e3d7e7798537b7ea6dbb9a0c6aeb1179e3
master date: 2015-11-24 14:48:34 +0100

9 years agox86/HVM: don't inject #DB with error code
Jan Beulich [Tue, 15 Dec 2015 14:51:44 +0000 (15:51 +0100)]
x86/HVM: don't inject #DB with error code

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
master commit: 057e0e72d2a5d598087c5f167ec6a13203a3cf65
master date: 2015-11-12 16:59:18 +0100

9 years agox86/vmx: improvements to vmentry failure handling
Andrew Cooper [Tue, 15 Dec 2015 14:50:47 +0000 (15:50 +0100)]
x86/vmx: improvements to vmentry failure handling

Combine the almost identical vm_launch_fail() and vm_resume_fail() into a
single vmx_vmentry_failure().

Re-save all GPRs so that domain_crash() prints the real register values,
rather than the stack frame of the vmx_vmentry_failure() call.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: bbcf0b218f64b1e3e2b66b0fbb623f51d9014e81
master date: 2015-11-03 18:14:02 +0100

9 years agox86/PoD: Make p2m_pod_empty_cache() restartable
Andrew Cooper [Tue, 15 Dec 2015 14:50:21 +0000 (15:50 +0100)]
x86/PoD: Make p2m_pod_empty_cache() restartable

This avoids a long running operation when destroying a domain with a
large PoD cache.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 59a5061723ba47c0028cf48487e5de551c42a378
master date: 2015-11-02 15:33:38 +0100

9 years agox86/NUMA: fix SRAT table processor entry parsing and consumption
Jan Beulich [Tue, 15 Dec 2015 14:48:44 +0000 (15:48 +0100)]
x86/NUMA: fix SRAT table processor entry parsing and consumption

- don't overrun apicid_to_node[] (possible in the x2APIC case)
- don't limit number of processor related SRAT entries we can consume
- make acpi_numa_{processor,x2apic}_affinity_init() as similar to one
  another as possible
- print APIC IDs in hex (to ease matching with other log messages), at
  once making legacy and x2APIC ones distinguishable (by width)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 83281fc9b31396e94c0bfb6550b75c165037a0ad
master date: 2015-10-14 12:46:27 +0200

9 years agox86: hide MWAITX from PV domains
Jan Beulich [Tue, 15 Dec 2015 14:47:59 +0000 (15:47 +0100)]
x86: hide MWAITX from PV domains

Since MWAIT is hidden too. (Linux starting with 4.3 is making use of
that feature, and is checking for it without looking at the MWAIT one.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 941cd44324db7eddc46cba4596fa13d505066ccf
master date: 2015-10-13 17:17:52 +0200

9 years agoVT-d: don't suppress invalidation address write when it is zero
Jan Beulich [Tue, 15 Dec 2015 14:47:26 +0000 (15:47 +0100)]
VT-d: don't suppress invalidation address write when it is zero

GFN zero is a valid address, and hence may need invalidation done for
it just like for any other GFN.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 710942e57fb42ff8f344ca82f6b678f67e38ae63
master date: 2015-10-12 15:58:35 +0200

9 years agomemory: fix XSA-158 fix
Jan Beulich [Wed, 9 Dec 2015 12:56:40 +0000 (13:56 +0100)]
memory: fix XSA-158 fix

For one the uses of domu_max_order and ptdom_max_order were swapped.

And then gcc warns about an unused result of a __must_check function
in the control part of a conditional expression when both other
expressions can be determined by the compiler to produce the same value
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68039), which happens
when HAS_PASSTHROUGH is undefined (i.e. for ARM on 4.4 and older).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: ff841cead287d7913901ba5c4e7628a6958b5bea
master date: 2015-12-09 13:53:13 +0100

9 years agoQEMU_TAG update
Ian Jackson [Wed, 9 Dec 2015 11:50:24 +0000 (11:50 +0000)]
QEMU_TAG update

9 years agolibxl: Fix bootloader-related virtual memory leak on pv build failure
Ian Jackson [Wed, 18 Nov 2015 15:34:54 +0000 (15:34 +0000)]
libxl: Fix bootloader-related virtual memory leak on pv build failure

The bootloader may call libxl__file_reference_map(), which mmap's the
pv_kernel and pv_ramdisk into process memory.  This was only unmapped,
however, on the success path of libxl__build_pv().  If there were a
failure anywhere between libxl_bootloader.c:parse_bootloader_result()
and the end of libxl__build_pv(), the calls to
libxl__file_reference_unmap() would be skipped, leaking the mapped
virtual memory.

Ideally this would be fixed by adding the unmap calls to the
destruction path for libxl__domain_build_state.  Unfortunately the
lifetime of the libxl__domain_build_state is opaque, and it doesn't
have a proper destruction path.  But, the only thing in it that isn't
from the gc are these bootloader references, and they are only ever
set for one libxl__domain_build_state, the one which is
libxl__domain_create_state.build_state.

So we can clean up in the exit path from libxl__domain_create_*, which
always comes through domcreate_complete.

Remove the now-redundant unmaps in libxl__build_pv's success path.

This is XSA-160.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agomemory: fix XENMEM_exchange error handling
Jan Beulich [Tue, 8 Dec 2015 13:11:08 +0000 (14:11 +0100)]
memory: fix XENMEM_exchange error handling

assign_pages() can fail due to the domain getting killed in parallel,
which should not result in a hypervisor crash.

Reported-by: Julien Grall <julien.grall@citrix.com>
Also delete a redundant put_gfn() - all relevant paths leading to the
"fail" label already do this (and there are also paths where it was
plain wrong). All of the put_gfn()-s got introduced by 51032ca058
("Modify naming of queries into the p2m"), including the otherwise
unneeded initializer for k (with even a kind of misleading comment -
the compiler warning could actually have served as a hint that the use
is wrong).

This is CVE-2015-8339 + CVE-2015-8340 / XSA-159.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: eedecb3cf0b2ce1ffc2eb08f3c73f88d42c382c9
master date: 2015-12-08 14:01:43 +0100

9 years agomemory: split and tighten maximum order permitted in memops
Jan Beulich [Tue, 8 Dec 2015 13:10:34 +0000 (14:10 +0100)]
memory: split and tighten maximum order permitted in memops

Introduce and enforce separate limits for ordinary DomU, DomU with
pass-through device(s), control domain, and hardware domain.

The DomU defaults were determined based on what so far was allowed by
multipage_allocation_permitted().

The x86 hwdom default was chosen based on linux-2.6.18-xen.hg c/s
1102:82782f1361a9 indicating 2Mb is not enough, plus some slack.

The ARM hwdom default was chosen to allow 2Mb (order-9) mappings, plus
a little bit of slack.

This is CVE-2015-8338 / XSA-158.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 4a578b316eb98975374d88f28904acf13dbcfac2
master date: 2015-12-08 14:00:33 +0100

9 years agoConfig: Switch to unified qemu trees.
Ian Campbell [Thu, 10 Sep 2015 13:31:34 +0000 (14:31 +0100)]
Config: Switch to unified qemu trees.

Upstream qemu is now in qemu-xen.git and the trad fork is in
qemu-xen-traditional.git.

QEMU_UPSTREAM_REVISION is currently a tag and
QEMU_TRADITIONAL_REVISION is a specific revision, so no changes are
required to those.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 78833c04250416f1870c458309d3ac0e5cf915fd)

Conflicts:
Config.mk
(cherry picked from commit 0cabed0a0b322fbef30f3519465538df4ab1d9f2)

Conflicts:
Config.mk

9 years agox86/HVM: always intercept #AC and #DB
Jan Beulich [Tue, 10 Nov 2015 11:22:25 +0000 (12:22 +0100)]
x86/HVM: always intercept #AC and #DB

Both being benign exceptions, and both being possible to get triggered
by exception delivery, this is required to prevent a guest from locking
up a CPU (resulting from no other VM exits occurring once getting into
such a loop).

The specific scenarios:

1) #AC may be raised during exception delivery if the handler is set to
be a ring-3 one by a 32-bit guest, and the stack is misaligned.

This is CVE-2015-5307 / XSA-156.

Reported-by: Benjamin Serebrin <serebrin@google.com>
2) #DB may be raised during exception delivery when a breakpoint got
placed on a data structure involved in delivering the exception. This
can result in an endless loop when a 64-bit guest uses a non-zero IST
for the vector 1 IDT entry, but even without use of IST the time it
takes until a contributory fault would get raised (results depending
on the handler) may be quite long.

This is CVE-2015-8104 / XSA-156.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bd2239d9fa975a1ee5bcd27c218ae042cd0a57bc
master date: 2015-11-10 12:03:08 +0100

9 years agolibxl: adjust PoD target by memory fudge, too
Ian Jackson [Wed, 21 Oct 2015 15:18:30 +0000 (16:18 +0100)]
libxl: adjust PoD target by memory fudge, too

PoD guests need to balloon at least as far as required by PoD, or risk
crashing.  Currently they don't necessarily know what the right value
is, because our memory accounting is (at the very least) confusing.

Apply the memory limit fudge factor to the in-hypervisor PoD memory
target, too.  This will increase the size of the guest's PoD cache by
the fudge factor LIBXL_MAXMEM_CONSTANT (currently 1Mby).  This ensures
that even with a slightly-off balloon driver, the guest will be
stable even under memory pressure.

There are two call sites of xc_domain_set_pod_target that need fixing:

The one in libxl_set_memory_target is straightforward.

The one in xc_hvm_build_x86.c:setup_guest is more awkward.  Simply
setting the PoD target differently does not work because the various
amounts of memory during domain construction no longer match up.
Instead, we adjust the guest memory target in xenstore (but only for
PoD guests).

This introduces a 1Mby discrepancy between the balloon target of a PoD
guest at boot, and the target set by an apparently-equivalent `xl
mem-set' (or similar) later.  This approach is low-risk for a security
fix but we need to fix this up properly in xen.git#staging and
probably also in stable trees.

This is XSA-153.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 56fb5fd62320eb40a7517206f9706aa9188d6f7b)
(cherry picked from commit 423d2cd814e8460d5ea8bd191a770f3c48b3947c)

Conflicts:
tools/libxl/libxl_dom.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agox86: rate-limit logging in do_xen{oprof,pmu}_op()
Jan Beulich [Thu, 29 Oct 2015 13:15:22 +0000 (14:15 +0100)]
x86: rate-limit logging in do_xen{oprof,pmu}_op()

Some of the sub-ops are acessible to all guests, and hence should be
rate-limited. In the xenoprof case, just like for XSA-146, include them
only in debug builds. Since the vPMU code is rather new, allow them to
be always present, but downgrade them to (rate limited) guest messages.

This is CVE-2015-7971 / XSA-152.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 95e7415843b94c346e5ba8682665f508f220e04b
master date: 2015-10-29 13:37:19 +0100

9 years agoxenoprof: free domain's vcpu array
Jan Beulich [Thu, 29 Oct 2015 13:14:55 +0000 (14:14 +0100)]
xenoprof: free domain's vcpu array

This was overlooked in fb442e2171 ("x86_64: allow more vCPU-s per
guest").

This is CVE-2015-7969 / XSA-151.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 6e97c4b37386c2d09e09e9b5d5d232e37728b960
master date: 2015-10-29 13:36:52 +0100

9 years agox86/PoD: Eager sweep for zeroed pages
Andrew Cooper [Thu, 29 Oct 2015 13:14:30 +0000 (14:14 +0100)]
x86/PoD: Eager sweep for zeroed pages

Based on the contents of a guests physical address space,
p2m_pod_emergency_sweep() could degrade into a linear memcmp() from 0 to
max_gfn, which runs non-preemptibly.

As p2m_pod_emergency_sweep() runs behind the scenes in a number of contexts,
making it preemptible is not feasible.

Instead, a different approach is taken.  Recently-populated pages are eagerly
checked for reclaimation, which amortises the p2m_pod_emergency_sweep()
operation across each p2m_pod_demand_populate() operation.

Note that in the case that a 2M superpage can't be reclaimed as a superpage,
it is shattered if 4K pages of zeros can be reclaimed.  This is unfortunate
but matches the previous behaviour, and is required to avoid regressions
(domain crash from PoD exhaustion) with VMs configured close to the limit.

This is CVE-2015-7970 / XSA-150.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 101ce53266866144e724ed593173bc4098b300b9
master date: 2015-10-29 13:36:25 +0100

9 years agofree domain's vcpu array
Jan Beulich [Thu, 29 Oct 2015 13:13:30 +0000 (14:13 +0100)]
free domain's vcpu array

This was overlooked in fb442e2171 ("x86_64: allow more vCPU-s per
guest").

This is CVE-2015-7969 / XSA-149.

Reported-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen: common: Use unbounded array for symbols_offset.
Ian Campbell [Thu, 29 Oct 2015 13:09:45 +0000 (14:09 +0100)]
xen: common: Use unbounded array for symbols_offset.

Using a singleton array causes gcc5 to report:
symbols.c: In function 'symbols_lookup':
symbols.c:128:359: error: array subscript is above array bounds [-Werror=array-bounds]
symbols.c:136:176: error: array subscript is above array bounds [-Werror=array-bounds]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 3f82ea62826d4eb06002d8dba475bafcc454b845
master date: 2015-03-20 12:02:03 +0000

9 years agox86: guard against undue super page PTE creation
Jan Beulich [Thu, 29 Oct 2015 13:07:01 +0000 (14:07 +0100)]
x86: guard against undue super page PTE creation

When optional super page support got added (commit bd1cd81d64 "x86: PV
support for hugepages"), two adjustments were missed: mod_l2_entry()
needs to consider the PSE and RW bits when deciding whether to use the
fast path, and the PSE bit must not be removed from L2_DISALLOW_MASK
unconditionally.

This is CVE-2015-7835 / XSA-148.

Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: fe360c90ea13f309ef78810f1a2b92f2ae3b30b8
master date: 2015-10-29 13:35:07 +0100

9 years agoarm: handle races between relinquish_memory and free_domheap_pages
Ian Campbell [Thu, 29 Oct 2015 13:06:35 +0000 (14:06 +0100)]
arm: handle races between relinquish_memory and free_domheap_pages

Primarily this means XENMEM_decrease_reservation from a toolstack
domain.

Unlike x86 we have no requirement right now to queue such pages onto
a separate list, if we hit this race then the other code has already
fully accepted responsibility for freeing this page and therefore
there is no more for relinquish_memory to do.

This is CVE-2015-7814 / XSA-147.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1ef01396fdff88b1c3331a09ca5c69619b90f4ea
master date: 2015-10-29 13:34:17 +0100

9 years agoarm: rate-limit logging from unimplemented PHYSDEVOP and HVMOP.
Ian Campbell [Thu, 29 Oct 2015 13:05:25 +0000 (14:05 +0100)]
arm: rate-limit logging from unimplemented PHYSDEVOP and HVMOP.

These are guest accessible and should therefore be rate-limited.
Moreover, include them only in debug builds.

This is CVE-2015-7813 / XSA-146.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1c0e59ff15764e7b0c59282365974f5b8924ce83
master date: 2015-10-29 13:33:38 +0100

9 years agoarm: Support hypercall_create_continuation for multicall
Julien Grall [Thu, 29 Oct 2015 13:05:07 +0000 (14:05 +0100)]
arm: Support hypercall_create_continuation for multicall

Multicall for ARM has been supported since commit f0dbdc6 "xen: arm: fully
implement multicall interface.". Although, if an hypercall in multicall
requires preemption, it will crash the host:

(XEN) Xen BUG at domain.c:347
(XEN) ----[ Xen-4.7-unstable  arm64  debug=y  Tainted:    C ]----
[...]
(XEN) Xen call trace:
(XEN)    [<00000000002420cc>] hypercall_create_continuation+0x64/0x380 (PC)
(XEN)    [<0000000000217274>] do_memory_op+0x1b00/0x2334 (LR)
(XEN)    [<0000000000250d2c>] do_multicall_call+0x114/0x124
(XEN)    [<0000000000217ff0>] do_multicall+0x17c/0x23c
(XEN)    [<000000000024f97c>] do_trap_hypercall+0x90/0x12c
(XEN)    [<0000000000251ca8>] do_trap_hypervisor+0xd2c/0x1ba4
(XEN)    [<00000000002582cc>] guest_sync+0x88/0xb8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) Xen BUG at domain.c:347
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

Looking to the code, the support of multicall looks valid to me, as we only
need to fill call.args[...]. So drop the BUG();

This is CVE-2015-7812 / XSA-145.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 29bcf64ce8bc0b1b7aacd00c8668f255c4f0686c
master date: 2015-10-29 13:31:10 +0100

9 years agodocs: xl.cfg: permissive option is not PV only.
Ian Campbell [Tue, 6 Oct 2015 08:42:35 +0000 (09:42 +0100)]
docs: xl.cfg: permissive option is not PV only.

Since XSA-131 qemu-xen has defaulted to non-permissive mode and the
option was extended to cover that case in 015a373351e5 "tools: libxl:
allow permissive qemu-upstream pci passthrough".

Since I was rewrapping to adjust the text anyway I've split the safety
warning into a separate paragraph to make it more obvious.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric <epretorious@yahoo.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 7f25baba1c0942e50757f4ecb233202dbbc057b9)

9 years agotools: libxl: allow permissive qemu-upstream pci passthrough.
Ian Campbell [Mon, 1 Jun 2015 10:32:23 +0000 (11:32 +0100)]
tools: libxl: allow permissive qemu-upstream pci passthrough.

Since XSA-131 qemu-xen now restricts access to PCI cfg by default. In
order to allow local configuration of the existing libxl_device_pci
"permissive" flag needs to be plumbed through via the new QMP property
added by the XSA-131 patches.

Versions of QEMU prior to XSA-131 did not support this permissive
property, so we only pass it if it is true. Older versions only
supported permissive mode.

qemu-xen-traditional already supports the permissive mode setting via
xenstore.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 015a373351e5c3553f848324ac0f07a9d92883fa)

9 years agotools/console: xenconsole tolerate tty errors
Ian Jackson [Thu, 27 Feb 2014 17:46:49 +0000 (17:46 +0000)]
tools/console: xenconsole tolerate tty errors

Since 28d386fc4341 (XSA-57), libxl writes an empty value for the
console tty node, with read-only permission for the guest, when
setting up pv console "frontends".  (The actual tty value is later set
by xenconsoled.)   Writing an empty node is not strictly necessary to
stop the frontend from writing dangerous values here, but it is a good
belt-and-braces approach.

Unfortunately this confuses xenconsole.  It reads the empty value, and
tries to open it as the tty.  xenconsole then exits.

Fix this by having xenconsole treat an empty value the same way as no
value at all.

Also, make the error opening the tty be nonfatal: we just print a
warning, but do not exit.  I think this is helpful in theoretical
situations where xenconsole is racing with libxl and/or xenconsoled.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
v2: Combine two conditions and move the free
(cherry picked from commit 39ba2989b10b6a1852e253b204eb010f8e7026f1)

9 years agox86/p2m-pt: correct condition of IOMMU mapping updates
Jan Beulich [Thu, 8 Oct 2015 10:46:39 +0000 (12:46 +0200)]
x86/p2m-pt: correct condition of IOMMU mapping updates

Whether the MFN changes does not depend on the new entry being valid
(but solely on the old one).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 660fd65d5578a95ec5eac522128bba23325179eb
master date: 2015-10-02 13:40:36 +0200

9 years agocredit1: fix tickling when it happens from a remote pCPU
Dario Faggioli [Thu, 8 Oct 2015 10:46:06 +0000 (12:46 +0200)]
credit1: fix tickling when it happens from a remote pCPU

especially if that is also from a different cpupool than the
processor of the vCPU that triggered the tickling.

In fact, it is possible that we get as far as calling vcpu_unblock()-->
vcpu_wake()-->csched_vcpu_wake()-->__runq_tickle() for the vCPU 'vc',
but all while running on a pCPU that is different from 'vc->processor'.

For instance, this can happen when an HVM domain runs in a cpupool,
with a different scheduler than the default one, and issues IOREQs
to Dom0, running in Pool-0 with the default scheduler.
In fact, right in this case, the following crash can be observed:

(XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
(XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
(XEN) ... ... ...
(XEN) Xen stack trace from rsp=ffff83031fa57a08:
(XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
(XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
(XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
(XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
(XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
(XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
(XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
(XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
(XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
(XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
(XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
(XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
(XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
(XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
(XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
(XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0

In this case, pCPU 7 is not in Pool-0, while the (Dom0's) vCPU being
woken is. pCPU's 7 pool has a different scheduler than credit, but it
is, however, right from pCPU 7 that we are waking the Dom0's vCPUs.
Therefore, the current code tries to access csched_balance_mask for
pCPU 7, but that is not defined, and hence the Oops.

(Note that, in case the two pools run the same scheduler we see no
Oops, but things are still conceptually wrong.)

Cure things by making the csched_balance_mask macro accept a
parameter for fetching a specific pCPU's mask (instead than always
using smp_processor_id()).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: ea5637968a09a81a64fa5fd73ce49b4ea9789e12
master date: 2015-09-30 14:44:22 +0200

9 years agox86/p2m-pt: ignore pt-share flag for shadow mode guests
Jan Beulich [Thu, 8 Oct 2015 10:45:34 +0000 (12:45 +0200)]
x86/p2m-pt: ignore pt-share flag for shadow mode guests

There is no page table sharing in shadow mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: c0a85795d864dd64c116af661bf676d66ddfd5fc
master date: 2015-09-29 13:56:03 +0200

9 years agox86/p2m-pt: delay freeing of intermediate page tables
Jan Beulich [Thu, 8 Oct 2015 10:45:08 +0000 (12:45 +0200)]
x86/p2m-pt: delay freeing of intermediate page tables

Old intermediate page tables must be freed only after IOMMU side
updates/flushes have got carried out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 960265fbd878cdc9841473b755e4ccc9eb1942d2
master date: 2015-09-29 13:55:34 +0200

9 years agovt-d: fix IM bit mask and unmask of Fault Event Control Register
Quan Xu [Thu, 8 Oct 2015 10:44:27 +0000 (12:44 +0200)]
vt-d: fix IM bit mask and unmask of Fault Event Control Register

Bit 0:29 in Fault Event Control Register are 'Reserved and Preserved',
software cannot write 0 to it unconditionally. Software must preserve
the value read for writes.

Signed-off-by: Quan Xu <quan.xu@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
vt-d: fix IM bit unmask of Fault Event Control Register in init_vtd_hw()

Bit 0:29 in Fault Event Control Register are 'Reserved and Preserved',
software cannot write 0 to it unconditionally. Software must preserve
the value read for writes.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Quan Xu <quan.xu@intel.com>
master commit: 86f3ff9fc4cc3cb69b96c1de74bcc51f738fe2b9
master date: 2015-09-25 09:08:22 +0200
master commit: 26b300bd727ef00a8f60329212a83c3b027a48f7
master date: 2015-09-25 18:03:04 +0200

9 years agoxen/xsm: Make p->policyvers be a local variable (ver) to shut up GCC 5.1.1 warnings.
Konrad Rzeszutek Wilk [Thu, 8 Oct 2015 10:43:53 +0000 (12:43 +0200)]
xen/xsm: Make p->policyvers be a local variable (ver) to shut up GCC 5.1.1 warnings.

policydb.c: In function ‘user_read’:
policydb.c:1443:26: error: ‘buf[2]’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         usrdatum->bounds = le32_to_cpu(buf[2]);
                          ^
cc1: all warnings being treated as errors

Which (as Andrew mentioned) is because GCC cannot assume
that 'p->policyvers' has the same value between checks.

We make it local, optimize the name to 'ver' and the warnings go away.
We also update another call site with this modification to
make it more inline with the rest of the functions.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 6a2f81459e1455d65a9a6f78dd2a0d0278619680
master date: 2015-09-22 12:09:03 -0400

9 years agox86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)
Andrew Cooper [Thu, 8 Oct 2015 10:43:18 +0000 (12:43 +0200)]
x86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)

There is no current problem, as both NCAPINTS and pi->hw_cap are 8 entries,
but the limit should be calculated appropriately so as to avoid hypervisor
stack corruption if the two do get out of sync.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c373b912e74659f0e0898ae93e89513694cfd94e
master date: 2015-09-16 11:22:00 +0200

9 years agox86/MSI: fail if no hardware support
Jan Beulich [Thu, 8 Oct 2015 10:42:50 +0000 (12:42 +0200)]
x86/MSI: fail if no hardware support

This is to guard against buggy callers (luckily Dom0 only) invoking
the respective hypercall for a device not being MSI-capable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c7d5d5d8ea1ecbd6ef8b47dace4dec825f0f6e48
master date: 2015-09-16 11:20:27 +0200

9 years agox86/p2m: fix mismatched unlock
Jan Beulich [Thu, 8 Oct 2015 10:42:16 +0000 (12:42 +0200)]
x86/p2m: fix mismatched unlock

Luckily, due to gfn_unlock() currently mapping to p2m_unlock(), this is
only a cosmetic issue right now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 1f180822ad3fe83fe293393ec175f14ded98f082
master date: 2015-09-14 13:39:19 +0200

9 years agox86/hvm: fix saved pmtimer and hpet values
Kouya Shimura [Thu, 8 Oct 2015 10:41:22 +0000 (12:41 +0200)]
x86/hvm: fix saved pmtimer and hpet values

The ACPI PM timer is sometimes broken on live migration.
Since vcpu->arch.hvm_vcpu.guest_time is always zero in other than
"delay for missed ticks mode". Even in "delay for missed ticks mode",
vcpu's guest_time field is not valid (i.e. zero) when
the state of vcpu is "blocked". (see pt_save_timer function)

The original author (Tim Deegan) of pmtimer_save() must have intended
that it saves the last scheduled time of the vcpu. Unfortunately it was
already implied this bug. FYI, there is no other timer mode than
"delay for missed ticks mode" then.

For consistency with HPET, pmtimer_save() should refer hvm_get_guest_time()
to update the counter as well as hpet_save() does.

Without this patch, the clock of windows server 2012R2 without HPET
might leap forward several minutes on live migration.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Retain use of ->arch.hvm_vcpu.guest_time when non-zero. Do the inverse
adjustment for vHPET.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kouya Shimura <kouya@jp.fujitsu.com>
master commit: 244582a01dcb49fa30083725964a066937cc94f2
master date: 2015-09-11 16:24:56 +0200

9 years agolibxl: handle read-only drives with qemu-xen
Stefano Stabellini [Tue, 22 Sep 2015 15:56:29 +0000 (16:56 +0100)]
libxl: handle read-only drives with qemu-xen

The current libxl code doesn't deal with read-only drives at all.

Upstream QEMU and qemu-xen only support read-only cdrom drives: make
sure to specify "readonly=on" for cdrom drives and return error in case
the user requested a non-cdrom read-only drive.

This is XSA-142, discovered by Lin Liu
(https://bugzilla.redhat.com/show_bug.cgi?id=1257893).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Backport to Xen 4.5 and earlier, apropos of report and review from
Michael Young.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: Increase device model startup timeout to 1min.
Anthony PERARD [Tue, 7 Jul 2015 15:09:13 +0000 (16:09 +0100)]
libxl: Increase device model startup timeout to 1min.

On a busy host, QEMU may take more than 10s to load and start.

This is likely due to a bug in Linux where the I/O subsystem sometime
produce high latency under load and result in QEMU taking a long time to
load every single dynamic libraries.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 9acfbe14d7261b03e3b3f4dc3c850ba2a7093e1f)

Conflicts:
tools/libxl/libxl_internal.h
(cherry picked from commit bbbd29a25d090f1180d14210358c6d7ccdebef85)

9 years agoxl: correct handling of extra_config in main_cpupoolcreate
Wei Liu [Tue, 14 Jul 2015 16:41:10 +0000 (17:41 +0100)]
xl: correct handling of extra_config in main_cpupoolcreate

Don't dereference extra_config if it's NULL. Don't leak extra_config in
the end.

Also fixed a typo in error string while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 705c9e12426cba82804cb578fc70785281655d94)
(cherry picked from commit ffb4e6387f489b6b5ce287f51db43cb37ebae064)

9 years agoQEMU_TAG update
Ian Jackson [Fri, 11 Sep 2015 10:51:06 +0000 (11:51 +0100)]
QEMU_TAG update

9 years agox86/NUMA: make init_node_heap() respect Xen heap limit
Jan Beulich [Thu, 10 Sep 2015 13:58:41 +0000 (15:58 +0200)]
x86/NUMA: make init_node_heap() respect Xen heap limit

On NUMA systems, where we try to use node local memory for the basic
control structures of the buddy allocator, this special case needs to
take into consideration a possible address width limit placed on the
Xen heap. In turn this (but also other, more abstract considerations)
requires that xenheap_max_mfn() not be called more than once (at most
we might permit it to be called a second time with a larger value than
was passed the first time), and be called only before calling
end_boot_allocator().

While inspecting all the involved code, a couple of off-by-one issues
were found (and are being corrected here at once):
- arch_init_memory() cleared one too many page table slots
- the highmem_start based invocation of xenheap_max_mfn() passed too
  big a value
- xenheap_max_mfn() calculated the wrong bit count in edge cases

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm64: do not (incorrectly) limit size of xenheap

The commit 88e3ed61642bb393458acc7a9bd2f96edc337190 "x86/NUMA: make
init_node_heap() respect Xen heap limit" breaks boot on the arm64 board
X-Gene.

The xenheap bits variable is used to know the last RAM MFN always mapped
in Xen virtual memory. If the value is 0, it means that all the memory is
always mapped in Xen virtual memory.

On X-gene the RAM bank resides above 128GB and last xenheap MFN is
0x4400000. With the new way to calculate the number of bits, xenheap_bits
will be equal to 38 bits. This will result to hide all the RAM and the
impossibility to allocate xenheap memory.

Given that aarch64 have always all the memory mapped in Xen virtual
memory, it's not necessary to call xenheap_max_mfn which set the number
of bits.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 88e3ed61642bb393458acc7a9bd2f96edc337190
master date: 2015-09-01 14:02:57 +0200
master commit: 0a7167d9b20cdc48e6ea320fbbb920b3267c9757
master date: 2015-09-04 14:58:07 +0100

9 years agomm: populate_physmap: validate correctly the gfn for direct mapped domain
Julien Grall [Thu, 10 Sep 2015 13:56:03 +0000 (15:56 +0200)]
mm: populate_physmap: validate correctly the gfn for direct mapped domain

Direct mapped domain has already the memory allocated 1:1, so we are
directly using the gfn as mfn to map the RAM in the guest.

While we are validating that the page associated to the first mfn belongs to
the domain, the subsequent MFN are not validated when the extent_order
is > 0.

This may result to map memory region (MMIO, RAM) which doesn't belong to the
domain.

Although, only DOM0 on ARM is using a direct memory mapped. So it
doesn't affect any guest (at least on the upstream version) or even x86.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 9503ab0e9c6a41a1ee7a70c8ea9313d08ebaa8c5
master date: 2015-08-13 14:41:09 +0200

9 years agox86/mm: Make {hap, shadow}_teardown() preemptible
Anshul Makkar [Thu, 10 Sep 2015 13:55:23 +0000 (15:55 +0200)]
x86/mm: Make {hap, shadow}_teardown() preemptible

A domain with sufficient shadow allocation can cause a watchdog timeout
during domain destruction.  Expand the existing -EAGAIN logic in
paging_teardown() to allow {hap/sh}_set_allocation() to become
restartable during the DOMCTL_destroydomain hypercall.

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: 0174da5b79752e2d5d6ca0faed89536e8f3d91c7
master date: 2015-08-06 10:04:43 +0100

9 years agox86/NUMA: don't account hotplug regions
Jan Beulich [Thu, 10 Sep 2015 13:54:13 +0000 (15:54 +0200)]
x86/NUMA: don't account hotplug regions

... except in cases where they really matter: node_memblk_range[] now
is the only place all regions get stored. nodes[] and NODE_DATA() track
present memory only. This improves the reporting when nodes have
disjoint "normal" and hotplug regions, with the hotplug region sitting
above the highest populated page. In such cases a node's spanned-pages
value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
covered all the way up to top of populated memory, giving quite
different a picture from what an otherwise identically configured
system without and hotplug regions would report. Note, however, that
the actual hotplug case (as well as cases of nodes with multiple
disjoint present regions) is still not being handled such that the
reported values would represent how much memory a node really has (but
that can be considered intentional).

Reported-by: Jim Fehlig <jfehlig@suse.com>
This at once makes nodes_cover_memory() no longer consider E820_RAM
regions covered by SRAT hotplug regions.

Also reject self-overlaps with mismatching hotplug flags.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
master commit: c011f470e6e79208f5baa071b4d072b78c88e2ba
master date: 2015-08-31 13:52:24 +0200

9 years agox86/NUMA: fix setup_node()
Jan Beulich [Thu, 10 Sep 2015 13:53:37 +0000 (15:53 +0200)]
x86/NUMA: fix setup_node()

The function referenced an __initdata object (nodes_found). Since this
being a node mask was more complicated than needed, the variable gets
replaced by a simple counter. Check at once that the count of nodes
doesn't go beyond MAX_NUMNODES.

Also consolidate three printk()s related to the function's use into just
one.

Finally (quite the opposite of the above issue) __init-annotate
nodes_cover_memory().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 8f945d36d9bddd5b589ba23c7322b30d623dd084
master date: 2015-08-31 13:51:52 +0200

9 years agoIOMMU: skip domains without page tables when dumping
Jan Beulich [Thu, 10 Sep 2015 13:52:28 +0000 (15:52 +0200)]
IOMMU: skip domains without page tables when dumping

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 5f335544cf5b716b0af51223e33373c4a7d65e8c
master date: 2015-08-27 17:40:38 +0200

9 years agox86/IO-APIC: don't create pIRQ mapping from masked RTE
Jan Beulich [Thu, 10 Sep 2015 13:51:56 +0000 (15:51 +0200)]
x86/IO-APIC: don't create pIRQ mapping from masked RTE

While moving our XenoLinux patches to 4.2-rc I noticed bogus "already
mapped" messages resulting from Linux (legitimately) writing RTEs with
only the mask bit set. Clearly we shouldn't even attempt to create a
pIRQ <-> IRQ mapping from such RTEs.

In the course of this I also found that the respective message isn't
really useful without also printing the pre-existing mapping. And I
noticed that map_domain_pirq() allowed IRQ0 to get through, despite us
never allowing a domain to control that interrupt.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 669d4b85c433674ab3b52ef707af0d3a551c941f
master date: 2015-08-25 16:18:31 +0200

9 years agox86, amd_ucode: skip microcode updates for final levels
Aravind Gopalakrishnan [Thu, 10 Sep 2015 13:51:02 +0000 (15:51 +0200)]
x86, amd_ucode: skip microcode updates for final levels

Some of older[Fam10h] systems require that certain number of
applied microcode patch levels should not be overwritten by
the microcode loader. Otherwise, system hangs are known to occur.

The 'final_levels' of patch ids have been obtained empirically.
Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
for details of the issue.

The short version is that people have predominantly noticed
system hang issues when trying to update microcode levels
beyond the patch IDs below.
[0x01000098, 0x0100009f, 0x010000af]

From internal discussions, we gathered that OS/hypervisor
cannot reliably perform microcode updates beyond these levels
due to hardware issues. Therefore, we need to abort microcode
update process if we hit any of these levels.

In this patch, we check for those microcode versions and abort
if the current core has one of those final patch levels applied
by the BIOS

A linux version of the patch has already made it into tip-
http://marc.info/?l=linux-kernel&m=143703405627170

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 22c5675877c8209adcfdb6bceddb561320374529
master date: 2015-08-25 16:17:13 +0200

9 years agox86/gdt: Drop write-only, xalloc()'d array from set_gdt()
Andrew Cooper [Thu, 10 Sep 2015 13:49:59 +0000 (15:49 +0200)]
x86/gdt: Drop write-only, xalloc()'d array from set_gdt()

It is not used, and can cause a spurious failure of the set_gdt() hypercall in
low memory situations.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: a7bd9b1661304500cd18b7d216d616ecf053ebdb
master date: 2015-08-05 10:32:45 +0100

9 years agoConfig.mk: update in-tree OVMF changeset
Wei Liu [Wed, 9 Sep 2015 14:14:16 +0000 (16:14 +0200)]
Config.mk: update in-tree OVMF changeset

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 28e5d9a9ad8e7e1a50503ec97ce9d20cd451a5d1
master date: 2015-06-30 16:16:47 +0100

9 years agoxen/arm: mm: Do not dump the p2m when mapping a foreign gfn
Julien Grall [Thu, 13 Aug 2015 11:03:43 +0000 (12:03 +0100)]
xen/arm: mm: Do not dump the p2m when mapping a foreign gfn

The physmap operation XENMAPSPACE_gfmn_foreign is dumping the p2m when
an error occured by calling dump_p2m_lookup. But this function is not
using ratelimited printk.

Any domain able to map foreign gfmn would be able to flood the Xen
console.

The information wasn't not useful so drop it.

This is XSA-141.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit afc13fe5e21d18c09e44f8ae6f7f4484e9f1de7f)

9 years agoupdate Xen version to 4.4.4-pre
Jan Beulich [Wed, 2 Sep 2015 12:47:25 +0000 (14:47 +0200)]
update Xen version to 4.4.4-pre

9 years agoupdate Xen version to 4.4.3 RELEASE-4.4.3
Jan Beulich [Thu, 20 Aug 2015 14:19:38 +0000 (16:19 +0200)]
update Xen version to 4.4.3

9 years agolibxl: poll: Avoid fd deregistration race POLLNVAL crash
Ian Jackson [Thu, 9 Jul 2015 16:28:48 +0000 (17:28 +0100)]
libxl: poll: Avoid fd deregistration race POLLNVAL crash

It can happen that an fd is deregistered, and closed, and then a new
fd opened, and reregistered, all while another thread is in poll().

If this happens poll might report POLLNVAL, but the event loop would
think that the fd was supposed to have been valid, and then fail an
assertion:
  libxl_event.c:1183: afterpoll_check_fd: Assertion `poller->fds_changed || !(fds[slot].revents & 0x020)' failed.

We can't simply ignore POLLNVAL because if we have bugs which cause
messed-up fds, it is a serious problem which we really need to detect.

Instead, add extra tracking to spot when this possibility arises, and
abort on POLLNVAL if we are sure that it is unexpected.

Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 681ce1681622a46d111cfdc4fc07e4cb565ae131)

(cherry picked from commit 7f7642f778b78e8e204fc082ce03072bb26887c7)

And, adjusted for semantic conflict over CTX vs ctx.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: poll: Use poller_get and poller_put for poller_app
Ian Jackson [Thu, 9 Jul 2015 16:05:07 +0000 (17:05 +0100)]
libxl: poll: Use poller_get and poller_put for poller_app

This makes the code more regular.  We are going to want to do some
more work in poller_get and poller_put, which work also wants to be
done for poller_app.

Two very minor functional changes:

 * We call malloc an extra time since poller_app is now a pointer

 * ERROR_FAIL on poller_get failing for poller_app is generated in
   libxl_ctx_init rather than passed through by libxl_poller_init
   from libxl__pipe_nonblock.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit aae37652067eafd0f2b85050306772df0cb71f08)

(cherry picked from commit 9f6f513eecbdc76ce30b5f2e6c52e02076bac30b)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: poll: Make libxl__poller_get have only one success return path
Ian Jackson [Thu, 9 Jul 2015 15:52:02 +0000 (16:52 +0100)]
libxl: poll: Make libxl__poller_get have only one success return path

In preparation for doing some more work on successful exit.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 6fc946bc5520ebdbba5cbae4d49e53895df8b393)

(cherry picked from commit 8c409135e69c7321cb6d82b8cae0868a81d05ddc)
Conflicts:
tools/libxl/libxl_event.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: libxl: Handle failure to create qemu dm logfile
Ian Campbell [Mon, 13 Jul 2015 12:31:23 +0000 (13:31 +0100)]
tools: libxl: Handle failure to create qemu dm logfile

If libxl_create_logfile fails for some reason then
libxl__create_qemu_logfile previously just carried on and dereferenced
the uninitialised logfile.

Check for the error from libxl_create_logfile, which has already
logged for us.

This was reported as Debian bug #784880.

Reported-by: Russell Coker <russell@coker.com.au>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: 784880@bugs.debian.org
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit e9d3a859977913704605e0fd87887451b12d4722)
(cherry picked from commit 9a4c62515c2cac2db23f88957579792b3bdb81b3)