]> xenbits.xensource.com Git - xen.git/log
xen.git
8 years agox86/pv: Remove unsafe bits from the mod_l?_entry() fastpath stable-4.3 staging-4.3
Andrew Cooper [Mon, 11 Jul 2016 13:32:03 +0000 (14:32 +0100)]
x86/pv: Remove unsafe bits from the mod_l?_entry() fastpath

All changes in writeability and cacheability must go through full
re-validation.

Rework the logic as a whitelist, to make it clearer to follow.

This is XSA-182

Reported-by: Jérémie Boutoille <jboutoille@ext.quarkslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
8 years agoQEMU_UPSTREAM_REVISION update (XSA-180)
Ian Jackson [Tue, 14 Jun 2016 17:40:04 +0000 (18:40 +0100)]
QEMU_UPSTREAM_REVISION update (XSA-180)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
8 years agoQEMU_TAG update
Ian Jackson [Fri, 10 Jun 2016 12:11:20 +0000 (13:11 +0100)]
QEMU_TAG update

8 years agolibxl: set XEN_QEMU_CONSOLE_LIMIT for QEMU
Wei Liu [Thu, 26 May 2016 15:11:42 +0000 (16:11 +0100)]
libxl: set XEN_QEMU_CONSOLE_LIMIT for QEMU

XSA-180 provides a patch to QEMU to bodge QEMU logging issue. We
explicitly set the limit in libxl for 4.7.

Introduce a function for setting the environment variable and call it in
the right places.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit b0d409d9c4944ed29d29457fea4ad6b41d698eca)

(cherry picked from commit fe82a96a657f36b08ade60ec4f3a53e67a4ee314)
Conflicts:
tools/libxl/libxl_dm.c

This version of libxl does not pass a dm_envs to the
*build_device_model_args* functions.  Instead, call
libxl__set_qemu_env_for_xsa_180 in libxl__spawn_local_dm.

The other call ultimate site of *build_device_model_args* (ie of
libxl__build_device_model_args) is in libxl__spawn_stub_dm, where we
don't need to set the env var.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 44a703dfbe3cb6e6ec6d2699b631807137dfdc1e)

(cherry picked from commit 27f0143ef56e23a108704a1e7f0aff365ede88e4)
Conflicts:
tools/libxl/libxl_dm.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agolibxl: Fix NULL pointer due to XSA-178 fix wrong XS nodename
Ian Jackson [Wed, 8 Jun 2016 14:42:19 +0000 (15:42 +0100)]
libxl: Fix NULL pointer due to XSA-178 fix wrong XS nodename

In "libxl: Do not trust backend for disk eject vdev" (c69871a2fb26 on
xen.git#staging) we changed libxl_evenable_disk_eject to read the
device vdev out of xenstore from the /libxl path, rather than the
backend path, and to read it during setup rather than on each event.

However, the patch has a mistake:
    -        GCSPRINTF("%s/dev", backend), NULL);
    +        GCSPRINTF("%s/vdev", libxl_path), &configured_vdev);
                           ^
Spot the extra "v".  This causes configured_vdev always to be NULL.
configured_vdev is passed to [libxl__]strdup.

In Xen 4.6 and later libxl__strdup is used and tolerates NULL.
evg->vdev is set to NULL.  This propagates to the `vdev' field in the
generated event.  This may or may not cause further trouble, depending
on the calling application.  In our osstest test cases it does not
cause any trouble, so the bug goes undetected.

In Xen 4.5 and earlier, the strdup does not tolerate NULL, and libxl
crashes immediately.  This has been detected by osstest as a
regression in Xen 4.5.

IMO this patch should be applied immediately to
  xen.git#staging-4.5 (to check that it fixes the osstest regression)
  xen.git#staging     (to check that it does not break master

Subject to passes, it should then be propagated to all supported
stable trees and also be mentioned in an update to XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
CC: security@xenproject.org
CC: Jan Beulich <jbeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 62b4d4769ca39fd5263da20d786a7b9a80a22d9a)
(cherry picked from commit 8b7a356409023f60f80e9f4b00bba16ad56cd77b)
(cherry picked from commit 6d272988fb3bdcc2578c5ee6d8a408628d1f0c0e)
(cherry picked from commit ec5925c72cfd7544b86fc6273937c9cc7e21e580)

8 years agolibxl: Document ~/serial/ correctly
Ian Jackson [Wed, 4 May 2016 14:17:45 +0000 (15:17 +0100)]
libxl: Document ~/serial/ correctly

xenstore-paths.markdown talked about ~/device/serial/, but that's not
used.

(It is very wrong for this value, which contains a driver domain
filesystem path, to be in the guest's area of xenstore.  However, it
is only ever created by libxl and ready by xenconsoled.  When it is
created, it inherits the read-only permissions of /local/domain/DOMID.
So there is no security bug.)

This is a followup to XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Cleanup: Have libxl__alloc_vdev use /libxl
Ian Jackson [Tue, 3 May 2016 14:25:19 +0000 (15:25 +0100)]
libxl: Cleanup: Have libxl__alloc_vdev use /libxl

When allocating a vdev for a new disk, look in /libxl/device, rather
than the frontends directory in xenstore.

This is more in line with the other parts of libxl, which ought not to
trust frontends.  In this case, though, there is no security bug prior
to this patch because the frontend is the toolstack domain itself.

If libxl__alloc_vdev were ever changed to take a frontend domain
argument, this patch will fix a latent security bug.

This is a followup to XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for nic in list
Ian Jackson [Wed, 4 May 2016 15:23:57 +0000 (16:23 +0100)]
libxl: Do not trust backend for nic in list

libxl_device_nic_list should use the /libxl path to search for
devices, and for obtaining the device information.

The "type" parameter was always "vif".  Abolish it.  (In any case,
paths in /libxl/device are named after the frontend type which is
constant, not the backend type which might in future vary.)

Abolish a redundant store to pnic->backend_domid.  Before this commit,
that store was not needed because libxl_device_nic_init (called by
libxl__device_nic_from_xenstore) would zero it.  Now it overwrites the
correct backend domid with zero; so remove it.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for nic in devid_to_device
Ian Jackson [Wed, 4 May 2016 15:20:05 +0000 (16:20 +0100)]
libxl: Do not trust backend for nic in devid_to_device

libxl_devid_to_device_nic should read the information it needs from
the /libxl/device path, not the backend.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend in nic getinfo
Ian Jackson [Tue, 3 May 2016 15:35:21 +0000 (16:35 +0100)]
libxl: Do not trust backend in nic getinfo

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Have READ_LIBXLDEV use libxl_path rather than be_path
Ian Jackson [Tue, 3 May 2016 14:40:18 +0000 (15:40 +0100)]
libxl: Have READ_LIBXLDEV use libxl_path rather than be_path

Fix the just-introduced bug in this macro: now it reads the
trustworthy libxl_path.  Change the variable name in the two functions
(nic and channel) which use it.

Shuffling the bump in the carpet along, we now introduce three new
bugs: the three call sites pass a backend path where a frontend path
is expected.

No functional change.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Rename libxl__device_nic_from_xs_be to _from_xenstore
Ian Jackson [Wed, 4 May 2016 15:18:36 +0000 (16:18 +0100)]
libxl: Rename libxl__device_nic_from_xs_be to _from_xenstore

We are going to change these functions to expect, and be passed, a
/libxl path.  So it is wrong that they are called _from_xs_be.

Neither function reads anything which isn't found in both places, so
we can and will change the call sites later.

The only remaining function in libxl called *_from_xs_be relates to
PCI devices, for which the backend domain is hardcoded to 0 throughout
the libxl_pci.c.

No functional change.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for cdrom insert
Ian Jackson [Fri, 29 Apr 2016 18:13:17 +0000 (19:13 +0100)]
libxl: Do not trust backend for cdrom insert

Use the /libxl path where appropriate.  Rename `path' variable to
`be_path' to make sure we caught all the occurrences.

Specifically, when checking that the device still exists, check the
`frontend' value in /libxl, rather than anything in the backend
directory.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for disk in getinfo
Ian Jackson [Fri, 29 Apr 2016 18:10:45 +0000 (19:10 +0100)]
libxl: Do not trust backend for disk in getinfo

Do not read the frontend path out of the backend.  We have it in our
hand.  Likewise the guest (frontend) domid was one of our parameters (!)

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for disk; fix driver domain disks list
Ian Jackson [Fri, 29 Apr 2016 17:29:45 +0000 (18:29 +0100)]
libxl: Do not trust backend for disk; fix driver domain disks list

Rework libxl__device_disk_from_xs_be (which takes a backend path) into
to libxl__device_disk_from_xenstore (which takes a libxl path).

libxl__device_disk_from_xenstore now finds the backend path itself,
although it doesn't use it any more for most of its functions.  We
rename the variable from be_path to backend_path to make sure we
didn't miss any cases.

All the data collection is now done by reading from the copy in
/libxl.

libxl_device_disk_list and its helper libxl__append_disk_list (which
used to be libxl__append_disk_list_of_type) need extensive rework,
because they now need to specify the /libxl path rather than the
backend path.

To do that they enumerate disks by looking in the appropriate area in
/libxl.  Previously they scanned various of the backend directories in
dom0 (which was broken for driver domains).  It is no longer necessary
to enumerate the various disk backends, because they all use the same
paths in /devices.  libxl__device_disk_from_xenstore will parse the
type out of the backend path, for itself.  (Indeed, it did so before -
the now-gone type parameter to libxl__append_disk_list_of_type wasn't
used other than to construct the directory to list.)

Finally, remove a redundant store to pdisk->backend_domid in
libxl__append_disk_list[_of_type].  Even before this commit, that
store was not needed because libxl_device_disk_init (called by
libxl__device_disk_from_xenstore) would zero it.  Now it overwrites
the correct backend domid with zero; so remove it.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Also fix up COLO reads, following rebase

8 years agolibxl: Do not trust backend for disk eject vdev
Ian Jackson [Fri, 29 Apr 2016 15:23:35 +0000 (16:23 +0100)]
libxl: Do not trust backend for disk eject vdev

For disk eject, use configured vdev from /libxl, not backend.

The backend directory is writeable by driver domains.  This means that
a malicious driver domain could cause libxl to see a wrong vdev,
confusing the user or the toolstack.

Use the vdev from the /libxl space, rather than the backend.

For convenience, we read the vdev from the /libxl space into the evg
during setup and copy it on each event, rather than reading it afresh
each time (which would in any case involve generating or saving a copy
of the relevant /libxl path).

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: cdrom eject and insert: write to /libxl
Ian Jackson [Fri, 29 Apr 2016 18:15:13 +0000 (19:15 +0100)]
libxl: cdrom eject and insert: write to /libxl

Copy the new type and params values to /libxl, so that the information
in /libxl is kept up to date.

This is needed so that we can return this trustworthy information,
rather than trusting the backend-writeable parts of xenstore.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for vtpm in getinfo (uuid)
Ian Jackson [Fri, 29 Apr 2016 15:57:14 +0000 (16:57 +0100)]
libxl: Do not trust backend for vtpm in getinfo (uuid)

Use uuid from /libxl, rather than from backend.  I think the backend
is not supposed to change the uuid, since it seems to be set by libxl
during setup.

If in fact the backend is supposed to be able to change the uuid, this
patch needs to be dropped and replaced by a patch which makes the vtpm
uuid lookup tolerate bad or missing data.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust backend for vtpm in getinfo (except uuid)
Ian Jackson [Fri, 29 Apr 2016 16:18:44 +0000 (17:18 +0100)]
libxl: Do not trust backend for vtpm in getinfo (except uuid)

* Do not check the backend for existence.  We have already read the
  /libxl path so know that the vtpm exists (or is supposed to); if the
  backend doesn't exist then that must be the backend's doing.
* Get the frontend path from the /libxl directory.
* The frontend domid is the guest domid, and does not need to be read
  from xenstore (!)

We still attempt to read the uuid from the backend.  This will be
fixed in the next patch.

This is part of XSA-178.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Make copy of every xs backend in /libxl in _generic_add
Ian Jackson [Fri, 29 Apr 2016 15:19:28 +0000 (16:19 +0100)]
libxl: Make copy of every xs backend in /libxl in _generic_add

We want to stop libxl trustingly reading information from the backend
directory (since this is, of course, writeable by the backend, which
might be a semi-trusted driver domain).

In principle it is wrong in current libxl for anything to try to
divine virtual device configuration from xenstore: the JSON domain
config ought to supply that, and xenstore should only tell us which
devices actually exist.

However:

Firstly, there are several existing places where configuration
information is retrieved from xenstore rather than JSON.  We do not
want to reen gineer this in a security patch.

Secondly, we want to make a security patch which can be backported to
versions of libxl without the JSON configuration machinery.

So we take the expedient approach of keeping a copy of the
configuration somewhere we trust, namely /libxl.  This is obviously
fairly low-risk, although it does write significantly more keys in
xenstore.

In this patch we make this change in libxl__device_generic_add.  This
is responsible for actually writing the vast majority of device
information to xenstore.  There are a few loose ends which will be
dealt with in a moment.

Likewise, changes to readers to use the new location will appear in
further patches.

This is part of XSA-178.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for nic in getinfo
Ian Jackson [Tue, 3 May 2016 15:31:07 +0000 (16:31 +0100)]
libxl: Do not trust frontend for nic in getinfo

libxl_device_nic_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for nic in libxl_devid_to_device_nic
Ian Jackson [Tue, 3 May 2016 14:52:53 +0000 (15:52 +0100)]
libxl: Do not trust frontend for nic in libxl_devid_to_device_nic

Find the backend by reading the pointer in /libxl rather than in the
guest's frontend area.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for vtpm in getinfo
Ian Jackson [Tue, 3 May 2016 15:00:20 +0000 (16:00 +0100)]
libxl: Do not trust frontend for vtpm in getinfo

libxl_device_vtpm_getinfo needs to examine devices without trusting
frontend-controlled data.  So:

* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
  reading it from the frontend.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for vtpm list
Ian Jackson [Tue, 3 May 2016 14:58:32 +0000 (15:58 +0100)]
libxl: Do not trust frontend for vtpm list

libxl_device_vtpm_list needs to enumerate and identify devices without
trusting frontend-controlled data.  So

* Use the /libxl path to enumerate vtpms.
* Use the /libxl path to find the corresponding backends.
* Parse the backend path to find the backend domid.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for disk in getinfo
Ian Jackson [Fri, 29 Apr 2016 18:21:51 +0000 (19:21 +0100)]
libxl: Do not trust frontend for disk in getinfo

* Rename the frontend variable to `fe_path' to check we caught them all
* Read the backend path from /libxl, rather than from the frontend
* Parse the backend domid from the backend path, rather than reading it
  from the frontend (and add the appropriate error path and initialisation)

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend for disk eject event
Ian Jackson [Wed, 27 Apr 2016 15:08:49 +0000 (16:08 +0100)]
libxl: Do not trust frontend for disk eject event

Use the /libxl path for interpreting disk eject watch events: do not
read the backend path out of the frontend.  Instead, use the version
in /libxl.  That avoids us relying on the guest-modifiable
$frontend/backend pointer.

To implement this we store the path
  /libxl/$guest/device/vbd/$devid/backend
in the evgen structure.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend in libxl__device_nextid
Ian Jackson [Wed, 4 May 2016 14:30:32 +0000 (15:30 +0100)]
libxl: Do not trust frontend in libxl__device_nextid

When selecting the devid for a new device, we should look in
/libxl/device for existing devices, not in the frontend area.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Do not trust frontend in libxl__devices_destroy
Ian Jackson [Tue, 3 May 2016 17:39:36 +0000 (18:39 +0100)]
libxl: Do not trust frontend in libxl__devices_destroy

We need to enumerate the devices we have provided to a domain, without
trusting the guest-writeable (or, at least, guest-deletable) frontend
paths.

Instead, enumerate via, and read the backend path from, /libxl.

The console /libxl path is regular, so the special case for console 0
is not relevant any more: /libxl/GUEST/device/console/0 will be found,
and then libxl__device_destroy will DTRT to the right frontend path.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Provide libxl__backendpath_parse_domid
Ian Jackson [Wed, 27 Apr 2016 15:34:19 +0000 (16:34 +0100)]
libxl: Provide libxl__backendpath_parse_domid

Multiple places in libxl need to figure out the backend domid of a
device.  This can be discovered easily by looking at the backend path,
which always starts /local/domain/$backend_domid/.

There are no call sites yet.

This is part of XSA-175.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Record backend/frontend paths in /libxl/$DOMID
Ian Jackson [Mon, 16 May 2016 15:44:31 +0000 (16:44 +0100)]
libxl: Record backend/frontend paths in /libxl/$DOMID

This gives us a record of all the backends we have set up for a
domain, which is separate from the frontends in
  /local/domain/$DOMID/device.

In particular:

1. A guest has write permission for the frontend path:
  /local/domain/$DOMID/device/$KIND/$DEVID
which means that the guest can completely delete the frontend.
(They can't recreate it because they don't have write permission
on the containing directory.)

2. A guest has write permission for the backend path recorded in the
frontend, ie, it can write to
  /local/domain/$DOMID/device/$KIND/$DEVID/backend
which means that the guest can break the association between
frontend and backend.

So we can't rely on iterating over the frontends to find all the
backends, or examining a frontend to discover how a device is
configured.

So, have libxl__device_generic_add record the frontend and backend
paths in /libxl/$DOMID/device, and have libxl__device_destroy remove
them again.

Create the containing directory /libxl/GUEST/device in
libxl__domain_make.  The already existing xs_rm in devices_destroy_cb
will take care of removing it.

This is part of XSA-175.

Backport note: Backported over 7472ced, which fixes a bug in driver
domain teardown.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/mm: fully honor PS bits in guest page table walks
Jan Beulich [Tue, 17 May 2016 12:58:10 +0000 (14:58 +0200)]
x86/mm: fully honor PS bits in guest page table walks

In L4 entries it is currently unconditionally reserved (and hence
should, when set, always result in a reserved bit page fault), and is
reserved on hardware not supporting 1Gb pages (and hence should, when
set, similarly cause a reserved bit page fault on such hardware).

This is CVE-2016-4480 / XSA-176.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 46699c7393bd991234b5642763c5c24b6b39a6c4
master date: 2016-05-17 14:41:14 +0200

8 years agoUpdate QEMU_UPSTREAM_REVISION
Ian Jackson [Tue, 10 May 2016 18:21:15 +0000 (19:21 +0100)]
Update QEMU_UPSTREAM_REVISION

8 years agoQEMU_TAG update
Ian Jackson [Tue, 10 May 2016 18:17:33 +0000 (19:17 +0100)]
QEMU_TAG update

8 years agoQEMU_TAG update
Ian Jackson [Tue, 10 May 2016 18:12:15 +0000 (19:12 +0100)]
QEMU_TAG update

9 years agox86: limit GFNs to 32 bits for shadowed superpages.
Tim Deegan [Wed, 16 Mar 2016 17:07:18 +0000 (17:07 +0000)]
x86: limit GFNs to 32 bits for shadowed superpages.

Superpage shadows store the shadowed GFN in the backpointer field,
which for non-BIGMEM builds is 32 bits wide.  Shadowing a superpage
mapping of a guest-physical address above 2^44 would lead to the GFN
being truncated there, and a crash when we come to remove the shadow
from the hash table.

Track the valid width of a GFN for each guest, including reporting it
through CPUID, and enforce it in the shadow pagetables.  Set the
maximum witth to 32 for guests where this truncation could occur.

This is XSA-173.

Reported-by: Ling Liu <liuling-it@360.cn>
Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: fix information leak on AMD CPUs
Jan Beulich [Tue, 29 Mar 2016 13:24:29 +0000 (15:24 +0200)]
x86: fix information leak on AMD CPUs

The fix for XSA-52 was wrong, and so was the change synchronizing that
new behavior to the FXRSTOR logic: AMD's manuals explictly state that
writes to the ES bit are ignored, and it instead gets calculated from
the exception and mask bits (it gets set whenever there is an unmasked
exception, and cleared otherwise). Hence we need to follow that model
in our workaround.

This is CVE-2016-3158 / XSA-172.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 7bd9dc3adfbb014c55f0928ebb3b20950ca9c019
master date: 2016-03-29 14:24:26 +0200

9 years agolibvchan: Read prod/cons only once.
Konrad Rzeszutek Wilk [Thu, 10 Mar 2016 07:27:21 +0000 (08:27 +0100)]
libvchan: Read prod/cons only once.

We must ensure that the prod/cons are only read once and that
the compiler won't try to optimize the reads. That is split
the read of these in multiple instructions influencing later
branch code. As such insert barriers when fetching the cons
and prod index.

This is part of XSA155.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
master commit: 7d66a4ba695ab8d13b214fb816dd59e443ae1ec9
master date: 2015-12-18 09:50:02 -0500

9 years agotools: pygrub: if partition table is empty, try treating as a whole disk
Ian Campbell [Thu, 5 Nov 2015 14:46:12 +0000 (14:46 +0000)]
tools: pygrub: if partition table is empty, try treating as a whole disk

pygrub (in identify_disk_image()) detects a DOS style partition table
via the presence of the 0xaa55 signature at the end of the first
sector of the disk.

However this signature is also present in whole-disk configurations
when there is an MBR on the disk. Many filesystems (e.g. ext[234])
include leading padding in their on disk format specifically to enable
this.

So if we think we have a DOS partition table but do not find any
actual partition table entries we may as well try looking at it as a
whole disk image. Worst case is we probe and find there isn't anything
there.

This was reported by Sjors Gielen in Debian bug #745419. The fix was
inspired by a patch by Adi Kriegisch in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745419#27

Tested by genext2fs'ing my /boot into a new raw image (works) and
then:
   dd if=/usr/lib/grub/i386-pc/g2ldr.mbr of=img conv=notrunc bs=512 count=1

to add an MBR (with 0xaa55 signature) to it, which after this patch
also works.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: 745419-forwarded@bugs.debian.org
(cherry picked from commit fb31b1475f1bf179f033b8de3f0e173006fd77e9)
(cherry picked from commit 6c9b1bcce4fcc872edddd44f88390a67d5954069)
(cherry picked from commit 812406cf2b6731d07f0f840d799fcfa5917dbaf4)
(cherry picked from commit 83c5e46c5d6af7adc4bc46cf41e415e34846c719)

9 years agox86: fix unintended fallthrough case from XSA-154
Andrew Cooper [Thu, 18 Feb 2016 14:30:16 +0000 (15:30 +0100)]
x86: fix unintended fallthrough case from XSA-154

... and annotate the other deliberate one: Coverity objects otherwise.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
One of the two instances was actually a bug.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 8dd6d1c099865ee5f5916616a0ca79cd943c46f9
master date: 2016-02-18 15:10:07 +0100

9 years agox86/VMX: sanitize rIP before re-entering guest
Jan Beulich [Wed, 17 Feb 2016 15:55:01 +0000 (16:55 +0100)]
x86/VMX: sanitize rIP before re-entering guest

... to prevent guest user mode arranging for a guest crash (due to
failed VM entry). (On the AMD system I checked, hardware is doing
exactly the canonicalization being added here.)

Note that fixing this in an architecturally correct way would be quite
a bit more involved: Making the x86 instruction emulator check all
branch targets for validity, plus dealing with invalid rIP resulting
from update_guest_eip() or incoming directly during a VM exit. The only
way to get the latter right would be by not having hardware do the
injection.

Note further that there are a two early returns from
vmx_vmexit_handler(): One (through vmx_failed_vmentry()) leads to
domain_crash() anyway, and the other covers real mode only and can
neither occur with a non-canonical rIP nor result in an altered rIP,
so we don't need to force those paths through the checking logic.

This is CVE-2016-2271 / XSA-170.

Reported-by: 刘令 <liuling-it@360.cn>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ffbbfda37782a2408953af1a3e00ada80bb141bc
master date: 2016-02-17 16:18:08 +0100

9 years agox86: enforce consistent cachability of MMIO mappings
Jan Beulich [Wed, 17 Feb 2016 15:54:30 +0000 (16:54 +0100)]
x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c61a6f74f80eb36ed83a82f713db3143159b9009
master date: 2016-02-17 16:16:53 +0100

9 years agox86/vmx: Fix injection of #DB traps following XSA-156
Andrew Cooper [Wed, 20 Jan 2016 13:20:57 +0000 (14:20 +0100)]
x86/vmx: Fix injection of #DB traps following XSA-156

Most #DB exceptions are traps rather than faults, meaning that the instruction
pointer in the exception frame points after the instruction rather than at it.

However, VMX intercepts all have fault semantics, even when intercepting a
trap.  Re-injecting an intercepted trap as a fault causes an infinite loop in
the guest, by re-executing the same trapping instruction repeatedly.  This
breaks debugging inside the guest.

Introduce a helper which copies VM_EXIT_INTR_INTO to VM_ENTRY_INTR_INFO, and
use it to mirror the intercepted interrupt back to the guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 0747bc8b4d85f3fc0ee1e58418418fa0229e8ff8
master date: 2016-01-05 11:28:56 +0000

9 years agox86/VMX: prevent INVVPID failure due to non-canonical guest address
Jan Beulich [Wed, 20 Jan 2016 13:12:48 +0000 (14:12 +0100)]
x86/VMX: prevent INVVPID failure due to non-canonical guest address

While INVLPG (and on SVM INVLPGA) don't fault on non-canonical
addresses, INVVPID fails (in the "individual address" case) when passed
such an address.

Since such intercepted INVLPG are effectively no-ops anyway, don't fix
this in vmx_invlpg_intercept(), but instead have paging_invlpg() never
return true in such a case.

This is CVE-2016-1571 / XSA-168.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: bf05e88ed7342a91cceba050b6c622accb809842
master date: 2016-01-20 13:50:10 +0100

9 years agox86/mm: PV superpage handling lacks sanity checks
Jan Beulich [Wed, 20 Jan 2016 13:11:52 +0000 (14:11 +0100)]
x86/mm: PV superpage handling lacks sanity checks

MMUEXT_{,UN}MARK_SUPER fail to check the input MFN for validity before
dereferencing pointers into the superpage frame table.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
get_superpage() has a similar issue.

This is CVE-2016-1570 / XSA-167.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 47abf29a9255b2e7b94e56d66b455d0a584b68b8
master date: 2016-01-20 13:49:23 +0100

9 years agoConfig.mk: update OVMF changeset
Wei Liu [Wed, 14 Oct 2015 11:41:13 +0000 (12:41 +0100)]
Config.mk: update OVMF changeset

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commits 04c5efb0a141fa53e805e396970419436e74ce67
 and f046e501bbca1c8a46853b2e1f1b587e228c73de)

Apropos of discussion in
 "OVMF related osstest failures on multiple branches"
 http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg00442.html

We believe the older ovmf.git does not work when built with the gcc in
Debian jessie.  We do not know where this bug lies but we are fixing
it by updating ovmf.

We have decided that we are not in a position to review the changes to
OVMF upstream, and ourselves decide what to cherry pick.  Instead we
will update the revision wholesale and use the xen.git stable
branches' push gate.

Conflicts:
Config.mk

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commits 6c3c6ff9ecaa5ee0be8b535d36fdcd12380564a1
 and 1d3cc6e62c4d2fc3dd9251d4921881425c9d27bd)

Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit ee576d71103e6795dc0add91db1b0d281eab1caf)

Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 6d2c41d676339ecc6255b92876b6153d4f85f579)

In 4.3, old OVMF_UPSTREAM_REVISION was b0855f92 rather than cb9a7eba.
I have pulled in the additional changes, willy-nilly.

Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agoblktap: Fix two 'maybe uninitialized' variables
Dario Faggioli [Fri, 20 Jun 2014 14:09:00 +0000 (16:09 +0200)]
blktap: Fix two 'maybe uninitialized' variables

[ Cross-ported to blktap1 from 345e44a85d71a
  "blktap2: Fix two 'maybe uninitialized' variables" -iwj;
  Remainder of commit message is from blktap2's version. ]

for which gcc 4.9.0 complains about, like this:

block-qcow.c: In function `get_cluster_offset':
block-qcow.c:431:3: error: `tmp_ptr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   memcpy(tmp_ptr, l1_ptr, 4096);
   ^
block-qcow.c:606:7: error: `tmp_ptr2' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
   if (write(s->fd, tmp_ptr2, 4096) != 4096) {
       ^
cc1: all warnings being treated as errors
/home/dario/Sources/xen/xen/xen.git/tools/blktap2/drivers/../../../tools/Rules.mk:89:
 recipe for target 'block-qcow.o' failed
make[5]: *** [block-qcow.o] Error 1

The proper behavior is to return upon allocation failure.
About what to return, 0 seems the best option, looking
at both the function and the call sites.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Backport-requested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 845e8c1653242bbd9b9de5a081182db0f3f39054)
(cherry picked from commit e003d429d8c63523df40663b4586c572ce796659)

9 years agoQEMU_TAG update
Ian Jackson [Mon, 4 Jan 2016 15:36:53 +0000 (15:36 +0000)]
QEMU_TAG update

9 years agoQEMU_TAG update
Ian Jackson [Fri, 18 Dec 2015 14:59:01 +0000 (14:59 +0000)]
QEMU_TAG update

9 years agox86/HVM: avoid reading ioreq state more than once
Jan Beulich [Thu, 17 Dec 2015 13:33:22 +0000 (14:33 +0100)]
x86/HVM: avoid reading ioreq state more than once

Otherwise, especially when the compiler chooses to translate the
switch() to a jump table, unpredictable behavior (and in the jump table
case arbitrary code execution) can result.

This is XSA-166.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: b452430a4cdfc801fa4bc391aed7522365e1deb6
master date: 2015-12-17 14:22:46 +0100

9 years agox86: don't leak ST(n)/XMMn values to domains first using them
Jan Beulich [Thu, 17 Dec 2015 13:32:56 +0000 (14:32 +0100)]
x86: don't leak ST(n)/XMMn values to domains first using them

FNINIT doesn't alter these registers, and hence using it is
insufficient to initialize a guest's initial state.

This is CVE-2015-8555 / XSA-165.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 81818b3f277544535974204f8d840da86fa8a44f
master date: 2015-12-17 14:22:13 +0100

9 years agomemory: fix XSA-158 fix
Jan Beulich [Wed, 9 Dec 2015 12:58:21 +0000 (13:58 +0100)]
memory: fix XSA-158 fix

For one the uses of domu_max_order and ptdom_max_order were swapped.

And then gcc warns about an unused result of a __must_check function
in the control part of a conditional expression when both other
expressions can be determined by the compiler to produce the same value
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68039), which happens
when HAS_PASSTHROUGH is undefined (i.e. for ARM on 4.4 and older).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: ff841cead287d7913901ba5c4e7628a6958b5bea
master date: 2015-12-09 13:53:13 +0100

9 years agoQEMU_TAG update
Ian Jackson [Wed, 9 Dec 2015 11:50:35 +0000 (11:50 +0000)]
QEMU_TAG update

9 years agolibxl: Fix bootloader-related virtual memory leak on pv build failure
Ian Jackson [Wed, 18 Nov 2015 15:34:54 +0000 (15:34 +0000)]
libxl: Fix bootloader-related virtual memory leak on pv build failure

The bootloader may call libxl__file_reference_map(), which mmap's the
pv_kernel and pv_ramdisk into process memory.  This was only unmapped,
however, on the success path of libxl__build_pv().  If there were a
failure anywhere between libxl_bootloader.c:parse_bootloader_result()
and the end of libxl__build_pv(), the calls to
libxl__file_reference_unmap() would be skipped, leaking the mapped
virtual memory.

Ideally this would be fixed by adding the unmap calls to the
destruction path for libxl__domain_build_state.  Unfortunately the
lifetime of the libxl__domain_build_state is opaque, and it doesn't
have a proper destruction path.  But, the only thing in it that isn't
from the gc are these bootloader references, and they are only ever
set for one libxl__domain_build_state, the one which is
libxl__domain_create_state.build_state.

So we can clean up in the exit path from libxl__domain_create_*, which
always comes through domcreate_complete.

Remove the now-redundant unmaps in libxl__build_pv's success path.

This is XSA-160.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 62dc4c1a96eb9b40ec23bdd1656ece913f540871)

9 years agomemory: fix XENMEM_exchange error handling
Jan Beulich [Tue, 8 Dec 2015 13:13:48 +0000 (14:13 +0100)]
memory: fix XENMEM_exchange error handling

assign_pages() can fail due to the domain getting killed in parallel,
which should not result in a hypervisor crash.

Reported-by: Julien Grall <julien.grall@citrix.com>
Also delete a redundant put_gfn() - all relevant paths leading to the
"fail" label already do this (and there are also paths where it was
plain wrong). All of the put_gfn()-s got introduced by 51032ca058
("Modify naming of queries into the p2m"), including the otherwise
unneeded initializer for k (with even a kind of misleading comment -
the compiler warning could actually have served as a hint that the use
is wrong).

This is CVE-2015-8339 + CVE-2015-8340 / XSA-159.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: eedecb3cf0b2ce1ffc2eb08f3c73f88d42c382c9
master date: 2015-12-08 14:01:43 +0100

9 years agomemory: split and tighten maximum order permitted in memops
Jan Beulich [Tue, 8 Dec 2015 13:12:50 +0000 (14:12 +0100)]
memory: split and tighten maximum order permitted in memops

Introduce and enforce separate limits for ordinary DomU, DomU with
pass-through device(s), control domain, and hardware domain.

The DomU defaults were determined based on what so far was allowed by
multipage_allocation_permitted().

The x86 hwdom default was chosen based on linux-2.6.18-xen.hg c/s
1102:82782f1361a9 indicating 2Mb is not enough, plus some slack.

The ARM hwdom default was chosen to allow 2Mb (order-9) mappings, plus
a little bit of slack.

This is CVE-2015-8338 / XSA-158.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 4a578b316eb98975374d88f28904acf13dbcfac2
master date: 2015-12-08 14:00:33 +0100

9 years agoConfig: Switch to unified qemu trees.
Ian Campbell [Thu, 10 Sep 2015 13:31:34 +0000 (14:31 +0100)]
Config: Switch to unified qemu trees.

Upstream qemu is now in qemu-xen.git and the trad fork is in
qemu-xen-traditional.git.

QEMU_UPSTREAM_REVISION is currently a tag and
QEMU_TRADITIONAL_REVISION is a specific revision, so no changes are
required to those.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Conflicts:
Config.mk
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 78833c04250416f1870c458309d3ac0e5cf915fd)

Conflicts:
Config.mk
(cherry picked from commit 0cabed0a0b322fbef30f3519465538df4ab1d9f2)

Conflicts:
Config.mk
(cherry picked from commit 602506b3efa8ddeae2976ec9f15d91366f823919)

Conflicts:
Config.mk

9 years agox86/HVM: always intercept #AC and #DB
Jan Beulich [Tue, 10 Nov 2015 11:23:43 +0000 (12:23 +0100)]
x86/HVM: always intercept #AC and #DB

Both being benign exceptions, and both being possible to get triggered
by exception delivery, this is required to prevent a guest from locking
up a CPU (resulting from no other VM exits occurring once getting into
such a loop).

The specific scenarios:

1) #AC may be raised during exception delivery if the handler is set to
be a ring-3 one by a 32-bit guest, and the stack is misaligned.

This is CVE-2015-5307 / XSA-156.

Reported-by: Benjamin Serebrin <serebrin@google.com>
2) #DB may be raised during exception delivery when a breakpoint got
placed on a data structure involved in delivering the exception. This
can result in an endless loop when a 64-bit guest uses a non-zero IST
for the vector 1 IDT entry, but even without use of IST the time it
takes until a contributory fault would get raised (results depending
on the handler) may be quite long.

This is CVE-2015-8104 / XSA-156.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bd2239d9fa975a1ee5bcd27c218ae042cd0a57bc
master date: 2015-11-10 12:03:08 +0100

9 years agolibxl: adjust PoD target by memory fudge, too
Ian Jackson [Wed, 21 Oct 2015 15:18:30 +0000 (16:18 +0100)]
libxl: adjust PoD target by memory fudge, too

PoD guests need to balloon at least as far as required by PoD, or risk
crashing.  Currently they don't necessarily know what the right value
is, because our memory accounting is (at the very least) confusing.

Apply the memory limit fudge factor to the in-hypervisor PoD memory
target, too.  This will increase the size of the guest's PoD cache by
the fudge factor LIBXL_MAXMEM_CONSTANT (currently 1Mby).  This ensures
that even with a slightly-off balloon driver, the guest will be
stable even under memory pressure.

There are two call sites of xc_domain_set_pod_target that need fixing:

The one in libxl_set_memory_target is straightforward.

The one in xc_hvm_build_x86.c:setup_guest is more awkward.  Simply
setting the PoD target differently does not work because the various
amounts of memory during domain construction no longer match up.
Instead, we adjust the guest memory target in xenstore (but only for
PoD guests).

This introduces a 1Mby discrepancy between the balloon target of a PoD
guest at boot, and the target set by an apparently-equivalent `xl
mem-set' (or similar) later.  This approach is low-risk for a security
fix but we need to fix this up properly in xen.git#staging and
probably also in stable trees.

This is XSA-153.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 56fb5fd62320eb40a7517206f9706aa9188d6f7b)
(cherry picked from commit 423d2cd814e8460d5ea8bd191a770f3c48b3947c)

Conflicts:
tools/libxl/libxl_dom.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 73b70e3c5d59e63126c890068ee0cbf8a2a3b640)

9 years agox86: rate-limit logging in do_xen{oprof,pmu}_op()
Jan Beulich [Thu, 29 Oct 2015 13:28:33 +0000 (14:28 +0100)]
x86: rate-limit logging in do_xen{oprof,pmu}_op()

Some of the sub-ops are acessible to all guests, and hence should be
rate-limited. In the xenoprof case, just like for XSA-146, include them
only in debug builds. Since the vPMU code is rather new, allow them to
be always present, but downgrade them to (rate limited) guest messages.

This is CVE-2015-7971 / XSA-152.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 95e7415843b94c346e5ba8682665f508f220e04b
master date: 2015-10-29 13:37:19 +0100

9 years agoxenoprof: free domain's vcpu array
Jan Beulich [Thu, 29 Oct 2015 13:28:06 +0000 (14:28 +0100)]
xenoprof: free domain's vcpu array

This was overlooked in fb442e2171 ("x86_64: allow more vCPU-s per
guest").

This is CVE-2015-7969 / XSA-151.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 6e97c4b37386c2d09e09e9b5d5d232e37728b960
master date: 2015-10-29 13:36:52 +0100

9 years agox86/PoD: Eager sweep for zeroed pages
Andrew Cooper [Thu, 29 Oct 2015 13:27:44 +0000 (14:27 +0100)]
x86/PoD: Eager sweep for zeroed pages

Based on the contents of a guests physical address space,
p2m_pod_emergency_sweep() could degrade into a linear memcmp() from 0 to
max_gfn, which runs non-preemptibly.

As p2m_pod_emergency_sweep() runs behind the scenes in a number of contexts,
making it preemptible is not feasible.

Instead, a different approach is taken.  Recently-populated pages are eagerly
checked for reclaimation, which amortises the p2m_pod_emergency_sweep()
operation across each p2m_pod_demand_populate() operation.

Note that in the case that a 2M superpage can't be reclaimed as a superpage,
it is shattered if 4K pages of zeros can be reclaimed.  This is unfortunate
but matches the previous behaviour, and is required to avoid regressions
(domain crash from PoD exhaustion) with VMs configured close to the limit.

This is CVE-2015-7970 / XSA-150.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 101ce53266866144e724ed593173bc4098b300b9
master date: 2015-10-29 13:36:25 +0100

9 years agofree domain's vcpu array
Jan Beulich [Thu, 29 Oct 2015 13:27:02 +0000 (14:27 +0100)]
free domain's vcpu array

This was overlooked in fb442e2171 ("x86_64: allow more vCPU-s per
guest").

This is CVE-2015-7969 / XSA-149.

Reported-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
master commit: d46896ebbb23f3a9fef2eb6066ae614fd1acfd96
master date: 2015-10-29 13:35:40 +0100

9 years agox86: guard against undue super page PTE creation
Jan Beulich [Thu, 29 Oct 2015 13:24:40 +0000 (14:24 +0100)]
x86: guard against undue super page PTE creation

When optional super page support got added (commit bd1cd81d64 "x86: PV
support for hugepages"), two adjustments were missed: mod_l2_entry()
needs to consider the PSE and RW bits when deciding whether to use the
fast path, and the PSE bit must not be removed from L2_DISALLOW_MASK
unconditionally.

This is CVE-2015-7835 / XSA-148.

Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: fe360c90ea13f309ef78810f1a2b92f2ae3b30b8
master date: 2015-10-29 13:35:07 +0100

9 years agoarm: handle races between relinquish_memory and free_domheap_pages
Ian Campbell [Thu, 29 Oct 2015 13:24:17 +0000 (14:24 +0100)]
arm: handle races between relinquish_memory and free_domheap_pages

Primarily this means XENMEM_decrease_reservation from a toolstack
domain.

Unlike x86 we have no requirement right now to queue such pages onto
a separate list, if we hit this race then the other code has already
fully accepted responsibility for freeing this page and therefore
there is no more for relinquish_memory to do.

This is CVE-2015-7814 / XSA-147.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1ef01396fdff88b1c3331a09ca5c69619b90f4ea
master date: 2015-10-29 13:34:17 +0100

9 years agoAMD Vi: fix HPET ID check
Jan Beulich [Thu, 29 Oct 2015 13:23:35 +0000 (14:23 +0100)]
AMD Vi: fix HPET ID check

Cherry picked from commit 2ca9fbd739 ("AMD IOMMU: allocate IRTE entries
instead of using a static mapping") mainly to fix build with gcc 5.x.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agoxen: common: Use unbounded array for symbols_offset.
Ian Campbell [Thu, 29 Oct 2015 13:10:31 +0000 (14:10 +0100)]
xen: common: Use unbounded array for symbols_offset.

Using a singleton array causes gcc5 to report:
symbols.c: In function 'symbols_lookup':
symbols.c:128:359: error: array subscript is above array bounds [-Werror=array-bounds]
symbols.c:136:176: error: array subscript is above array bounds [-Werror=array-bounds]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 3f82ea62826d4eb06002d8dba475bafcc454b845
master date: 2015-03-20 12:02:03 +0000

9 years agotools/console: Fix build after "xenconsole tolerate tty errors"
Ian Jackson [Wed, 21 Oct 2015 14:10:22 +0000 (15:10 +0100)]
tools/console: Fix build after "xenconsole tolerate tty errors"

2ddcdd96 "tools/console: xenconsole tolerate tty errors" contains a
compile error introduced during conflict resolution of the backport
from 4.4 to 4.3:

 client/main.c: In function 'get_pty_fd':
 client/main.c:124:10: error: passing argument 1 of 'warn' makes pointer from integer without a cast [-Werror]
 In file included from client/main.c:35:0:
 /usr/include/err.h:35:13: note: expected 'const char *' but argument is of type 'int'

Fix this.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
9 years agodocs: xl.cfg: permissive option is not PV only.
Ian Campbell [Tue, 6 Oct 2015 08:42:35 +0000 (09:42 +0100)]
docs: xl.cfg: permissive option is not PV only.

Since XSA-131 qemu-xen has defaulted to non-permissive mode and the
option was extended to cover that case in 015a373351e5 "tools: libxl:
allow permissive qemu-upstream pci passthrough".

Since I was rewrapping to adjust the text anyway I've split the safety
warning into a separate paragraph to make it more obvious.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric <epretorious@yahoo.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 7f25baba1c0942e50757f4ecb233202dbbc057b9)

9 years agotools: libxl: allow permissive qemu-upstream pci passthrough.
Ian Campbell [Mon, 1 Jun 2015 10:32:23 +0000 (11:32 +0100)]
tools: libxl: allow permissive qemu-upstream pci passthrough.

Since XSA-131 qemu-xen now restricts access to PCI cfg by default. In
order to allow local configuration of the existing libxl_device_pci
"permissive" flag needs to be plumbed through via the new QMP property
added by the XSA-131 patches.

Versions of QEMU prior to XSA-131 did not support this permissive
property, so we only pass it if it is true. Older versions only
supported permissive mode.

qemu-xen-traditional already supports the permissive mode setting via
xenstore.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 015a373351e5c3553f848324ac0f07a9d92883fa)

9 years agotools/console: xenconsole tolerate tty errors
Ian Jackson [Thu, 27 Feb 2014 17:46:49 +0000 (17:46 +0000)]
tools/console: xenconsole tolerate tty errors

Since 28d386fc4341 (XSA-57), libxl writes an empty value for the
console tty node, with read-only permission for the guest, when
setting up pv console "frontends".  (The actual tty value is later set
by xenconsoled.)   Writing an empty node is not strictly necessary to
stop the frontend from writing dangerous values here, but it is a good
belt-and-braces approach.

Unfortunately this confuses xenconsole.  It reads the empty value, and
tries to open it as the tty.  xenconsole then exits.

Fix this by having xenconsole treat an empty value the same way as no
value at all.

Also, make the error opening the tty be nonfatal: we just print a
warning, but do not exit.  I think this is helpful in theoretical
situations where xenconsole is racing with libxl and/or xenconsoled.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
v2: Combine two conditions and move the free
(cherry picked from commit 39ba2989b10b6a1852e253b204eb010f8e7026f1)
(cherry picked from commit 7b161be2e51c519754ac4435d63c8fc03db606ec)

Conflicts:
tools/console/client/main.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: handle read-only drives with qemu-xen
Stefano Stabellini [Tue, 22 Sep 2015 15:56:35 +0000 (16:56 +0100)]
libxl: handle read-only drives with qemu-xen

The current libxl code doesn't deal with read-only drives at all.

Upstream QEMU and qemu-xen only support read-only cdrom drives: make
sure to specify "readonly=on" for cdrom drives and return error in case
the user requested a non-cdrom read-only drive.

This is XSA-142, discovered by Lin Liu
(https://bugzilla.redhat.com/show_bug.cgi?id=1257893).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Backport to Xen 4.5 and earlier, apropos of report and review from
Michael Young.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: Increase device model startup timeout to 1min.
Anthony PERARD [Tue, 7 Jul 2015 15:09:13 +0000 (16:09 +0100)]
libxl: Increase device model startup timeout to 1min.

On a busy host, QEMU may take more than 10s to load and start.

This is likely due to a bug in Linux where the I/O subsystem sometime
produce high latency under load and result in QEMU taking a long time to
load every single dynamic libraries.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 9acfbe14d7261b03e3b3f4dc3c850ba2a7093e1f)

Conflicts:
tools/libxl/libxl_internal.h
(cherry picked from commit bbbd29a25d090f1180d14210358c6d7ccdebef85)
(cherry picked from commit fe66a76a2d6e30f2cd3aea6b2c8cb41f20ca6ea2)

9 years agoxl: correct handling of extra_config in main_cpupoolcreate
Wei Liu [Tue, 14 Jul 2015 16:41:10 +0000 (17:41 +0100)]
xl: correct handling of extra_config in main_cpupoolcreate

Don't dereference extra_config if it's NULL. Don't leak extra_config in
the end.

Also fixed a typo in error string while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 705c9e12426cba82804cb578fc70785281655d94)
(cherry picked from commit ffb4e6387f489b6b5ce287f51db43cb37ebae064)
(cherry picked from commit 213e243819ba5f72e13afad41ce2f5df17715530)

9 years agolibxl: fix leak of config_data in main_cpupoolcreate
Matthew Daley [Wed, 18 Sep 2013 03:37:43 +0000 (15:37 +1200)]
libxl: fix leak of config_data in main_cpupoolcreate

Coverity-ID: 1087193
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 7737ecb20c5babff45445d27fff91894b43a2d28)

9 years agoQEMU_TAG update
Ian Jackson [Fri, 11 Sep 2015 10:50:48 +0000 (11:50 +0100)]
QEMU_TAG update

9 years agolibxl: poll: Avoid fd deregistration race POLLNVAL crash
Ian Jackson [Wed, 12 Aug 2015 11:50:12 +0000 (12:50 +0100)]
libxl: poll: Avoid fd deregistration race POLLNVAL crash

It can happen that an fd is deregistered, and closed, and then a new
fd opened, and reregistered, all while another thread is in poll().

If this happens poll might report POLLNVAL, but the event loop would
think that the fd was supposed to have been valid, and then fail an
assertion:
  libxl_event.c:1183: afterpoll_check_fd: Assertion `poller->fds_changed || !(fds[slot].revents & 0x020)' failed.

We can't simply ignore POLLNVAL because if we have bugs which cause
messed-up fds, it is a serious problem which we really need to detect.

Instead, add extra tracking to spot when this possibility arises, and
abort on POLLNVAL if we are sure that it is unexpected.

Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 681ce1681622a46d111cfdc4fc07e4cb565ae131)

(cherry picked from commit 7f7642f778b78e8e204fc082ce03072bb26887c7)

And, adjusted for semantic conflict over CTX vs ctx.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 3646b134c1673f09c0a239de10b0da4c9265c8e8)
Conflicts:
tools/libxl/libxl_event.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: make libxl__poller_put tolerate p==NULL
Ian Jackson [Fri, 11 Oct 2013 11:10:45 +0000 (12:10 +0100)]
libxl: make libxl__poller_put tolerate p==NULL

This is less fragile, and more in keeping with the usual style of
initialising everything to 0 and freeing things unconditionally.

Correspondingly, remove the tests at the call sites.

Apropos of c1f3f174.  No overall functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 6ed09e37722f601661fff42f80279a41773c574e)

9 years agolibxl: poll: Use poller_get and poller_put for poller_app
Ian Jackson [Thu, 9 Jul 2015 16:05:07 +0000 (17:05 +0100)]
libxl: poll: Use poller_get and poller_put for poller_app

This makes the code more regular.  We are going to want to do some
more work in poller_get and poller_put, which work also wants to be
done for poller_app.

Two very minor functional changes:

 * We call malloc an extra time since poller_app is now a pointer

 * ERROR_FAIL on poller_get failing for poller_app is generated in
   libxl_ctx_init rather than passed through by libxl_poller_init
   from libxl__pipe_nonblock.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit aae37652067eafd0f2b85050306772df0cb71f08)

(cherry picked from commit 9f6f513eecbdc76ce30b5f2e6c52e02076bac30b)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 0348c450fe253b3db6edc77568820327ff991478)

9 years agolibxl: poll: Make libxl__poller_get have only one success return path
Ian Jackson [Thu, 9 Jul 2015 15:52:02 +0000 (16:52 +0100)]
libxl: poll: Make libxl__poller_get have only one success return path

In preparation for doing some more work on successful exit.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 6fc946bc5520ebdbba5cbae4d49e53895df8b393)

(cherry picked from commit 8c409135e69c7321cb6d82b8cae0868a81d05ddc)
Conflicts:
tools/libxl/libxl_event.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 900c7970de4cb809ba208465cb0abd094b4dec58)

9 years agolibxl: don't leak memory in libxl__poller_get failure case
Matthew Daley [Wed, 30 Oct 2013 07:51:53 +0000 (20:51 +1300)]
libxl: don't leak memory in libxl__poller_get failure case

Coverity-ID: 1055894
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1edd6d8da354442b860ae28b8944dbd8a102d5f7)

9 years agolibxl: event handling: ao_inprogress does waits while reports outstanding
Ian Jackson [Mon, 9 Feb 2015 15:20:32 +0000 (15:20 +0000)]
libxl: event handling: ao_inprogress does waits while reports outstanding

libxl__ao_inprogress needs to check (like
libxl__ao_complete_check_progress_reports) that there are no
oustanding progress callbacks.

Otherwise it might happen that we would destroy the ao while another
thread has an outstanding callback its egc report queue.  The other
thread would then, in its egc_run_callbacks, touch the destroyed ao.

Instead, when this happens in libxl__ao_inprogress, simply run round
the event loop again.  The thread which eventually makes the callback
will spot our poller in the ao, and notify the poller, waking us up.

This fixes an assertion failure race seen with libvirt:
  libvirtd: libxl_event.c:1792: libxl__ao_complete_check_progress_reports: Assertion `ao->in_initiator' failed.
or (after "Add an assert to egc_run_callbacks")
  libvirtd: libxl_event.c:1338: egc_run_callbacks: Assertion `aop->ao->magic == 0xA0FACE00ul' failed.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit f1335f0d7b2402e94e0c6e8a905dc309edaafcfb)
(cherry picked from commit dfed6d96fd6af00c9970e2a1c600d6bb991d137e)

9 years agolibxl: event handling: Break out ao_work_outstanding
Ian Jackson [Mon, 9 Feb 2015 15:18:30 +0000 (15:18 +0000)]
libxl: event handling: Break out ao_work_outstanding

Break out the test in libxl__ao_complete_check_progress_reports, into
ao_work_outstanding, which reports false if either (i) the ao is still
ongoing or (ii) there is a progress report (perhaps on a different
thread's callback queue) which has yet to be reported to the
application.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 93699882d98cce9736d6e871db303275df1138a2)
(cherry picked from commit ba683105b52f60e312151654d72b0b0e8508cee5)

9 years agolibxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:29 +0000 (16:02 -0400)]
libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.

There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.

As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.

We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 0d8cbcad03764e42ff2f0d224aff883c3734d782)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit ca0f468192d12d8d30c2a48a37c5d3460a464a29)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
Andrew Cooper [Mon, 13 Apr 2015 16:11:12 +0000 (16:11 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125

gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 6c1cb3dba4ff97dd40909670755f24fcdf903012)
Conflicts:
tools/libxc/xc_domain.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/xenconsoled: Increase file descriptor limit
Andrew Cooper [Mon, 2 Mar 2015 15:04:37 +0000 (15:04 +0000)]
tools/xenconsoled: Increase file descriptor limit

XenServer's VM density testing uncovered a regression when moving from
sysvinit to systemd where the file descriptor limit dropped from 4096 to
1024. (XenServer had previously inserted a ulimit statement into its
initscripts.)

One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the
lost ulimit, but that is only a stopgap solution.

As Xenconsoled genuinely needs a large number of file descriptors if a large
number of domains are running, attempt to increase the limit.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 588df84c0d702e835e526ecef3af6c5444857558)
(cherry picked from commit 9d5b2b01024d18b2135c1b0deebb8bfbf5dced5f)
(cherry picked from commit a490f8df8c989483624ebd6befb63377be55722b)

9 years agoocaml/xenctrl: Fix stub_xc_readconsolering()
Andrew Cooper [Fri, 30 Jan 2015 14:11:14 +0000 (14:11 +0000)]
ocaml/xenctrl: Fix stub_xc_readconsolering()

The Ocaml stub to retrieve the hypervisor console ring had a few problems.

 * A single 32k buffer would truncate a large console ring.
 * The buffer was static and not under the protection of the Ocaml GC lock so
   could be clobbered by concurrent accesses.
 * Embedded NUL characters would cause caml_copy_string() (which is strlen()
   based) to truncate the buffer.

The function is rewritten from scratch, using the same algorithm as the python
stubs, but uses the protection of the Ocaml GC lock to maintain a static
running total of the ring size, to avoid redundant realloc()ing in future
calls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Dave Scott <dave.scott@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1a010ca99e9b04c1cfbd0ee718aa22d5ebd530ab)
(cherry picked from commit cfc4c43be14e60608ed0b8173365c737950afe41)
(cherry picked from commit c669c244246a7e45cedb03a30d59656c95d09719)

9 years agoocaml/xenctrl: Make failwith_xc() thread safe
Andrew Cooper [Wed, 28 Jan 2015 17:55:32 +0000 (17:55 +0000)]
ocaml/xenctrl: Make failwith_xc() thread safe

The static error_str[] buffer is not thread-safe, and 1024 bytes is
unreasonably large.  Reduce to 256 bytes (which is still much larger than any
current use), and move it to being a stack variable.

Also, propagate the Noreturn attribute from caml_raise_with_string().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Dave Scott <Dave.Scott@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit c8945d51613450c19e0898b1b3056c90f4929179)
(cherry picked from commit 032673c8836e28d9e291e0d02235001c41aedaab)
(cherry picked from commit 9702e084d09550495c2e71f2639c1c2c43aeaf63)

9 years agoocaml/xenctrl: Check return values from hypercalls
Andrew Cooper [Tue, 27 Jan 2015 20:38:11 +0000 (20:38 +0000)]
ocaml/xenctrl: Check return values from hypercalls

rather than blindly continuing and possibly using negative values.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Dave Scott <dave.scott@eu.citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit 3380f5b6270e6fa4b24313f8808e7625e4c5a6ba)
(cherry picked from commit c91ed889ea3c3781a94a30909f30c3aad56c97d5)
(cherry picked from commit 10a95535b0dcde7daa3402b92f3c8d4494781c28)

9 years agolibxl: In domain death search, start search at first domid we want
Ian Jackson [Tue, 17 Mar 2015 15:30:57 +0000 (09:30 -0600)]
libxl: In domain death search, start search at first domid we want

From: Ian Jackson <Ian.Jackson@eu.citrix.com>

When domain_death_xswatch_callback needed a further call to
xc_domain_getinfolist it would restart it with the last domain it
found rather than the first one it wants.

If it only wants one it will also only ask for one domain.  The result
would then be that it gets the previous domain again (ie, the previous
one to the one it wants), which still doesn't reveal the answer to the
question, and it would therefore loop again.

It's completely unclear to me why I thought it was a good idea to
start the xc_domain_getinfolist with the last domain previously found
rather than the first one left un-confirmed.  The code has been that
way since it was introduced.

Instead, start each xc_domain_getinfolist at the next domain whose
status we need to check.

We also need to move the test for !evg into the loop, we now need evg
to compute the arguments to getinfolist.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec)
(cherry picked from commit 0b19348f3cd176e4badb173dd0054c49346a6ce1)
(cherry picked from commit 13623d5d8e854f20b2da885f4d452dc870912205)

9 years agodocs: workaround markdown parser error in xen-command-line.markdown
Ian Campbell [Wed, 19 Nov 2014 10:42:18 +0000 (10:42 +0000)]
docs: workaround markdown parser error in xen-command-line.markdown

Some versions of markdown (specifically the one in Debian Wheezy, currently
used to generate
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
confused by nested lists in the middle of multi-paragraph parent list entries
as seen in the com1,com2 entry.

The effect is that the "Default" section of all following entries are replace
by some sort of hash or checksum (at least, a string of 32 random seeming hex
digits).

Workaround this issue by making the decriptions of the DPS options a nested
list, moving the existing nested list describing the options for S into a third
level list. This seems to avoid the issue, and is arguably better formatting in
its own right (at least its not a regression IMHO)

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit ae325e94d2076f3953824e069c062908221f7325)
(cherry picked from commit 3e9054c523ed089d04cf10b7f5b417240be85760)

9 years agoxl: Sane handling of extra config file arguments
Ian Jackson [Mon, 15 Jun 2015 13:50:42 +0000 (14:50 +0100)]
xl: Sane handling of extra config file arguments

Various xl sub-commands take additional parameters containing = as
additional config fragments.

The handling of these config fragments has a number of bugs:

 1. Use of a static 1024-byte buffer.  (If truncation would occur,
    with semi-trusted input, a security risk arises due to quotes
    being lost.)

 2. Mishandling of the return value from snprintf, so that if
    truncation occurs, the to-write pointer is updated with the
    wanted-to-write length, resulting in stack corruption.  (This is
    XSA-137.)

 3. Clone-and-hack of the code for constructing the appended
    config file.

These are fixed here, by introducing a new function
`string_realloc_append' and using it everywhere.  The `extra_info'
buffers are replaced by pointers, which start off NULL and are
explicitly freed on all return paths.

The separate variable which will become dom_info.extra_config is
abolished (which involves moving the clearing of dom_info).

Additional bugs I observe, not fixed here:

 4. The functions which now call string_realloc_append use ad-hoc
    error returns, with multiple calls to `return'.  This currently
    necessitates multiple new calls to `free'.

 5. Many of the paths in xl call exit(-rc) where rc is a libxl status
    code.  This is a ridiculous exit status `convention'.

 6. The loops for handling extra config data are clone-and-hacks.

 7. Once the extra config buffer is accumulated, it must be combined
    with the appropriate main config file.  The code to do this
    combining is clone-and-hacked too.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian,campbell@citrix.com>
(cherry picked from commit dd84604f35bd3855c57146eb8fe53924c10d3963)
(cherry picked from commit 6040b3aeb32b4bce2d9958ecbcbd020c46c35d61)
(cherry picked from commit 214fd40a20fa5988b4ea021c2d06e8aca8dda184)

9 years agoQEMU_TAG update
Ian Jackson [Wed, 29 Jul 2015 15:34:52 +0000 (16:34 +0100)]
QEMU_TAG update

9 years agoQEMU_TAG update
Ian Jackson [Tue, 23 Jun 2015 10:43:50 +0000 (11:43 +0100)]
QEMU_TAG update

9 years agox86/traps: loop in the correct direction in compat_iret()
Andrew Cooper [Thu, 11 Jun 2015 13:02:35 +0000 (15:02 +0200)]
x86/traps: loop in the correct direction in compat_iret()

This is CVE-2015-4164 / XSA-136.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1f0721377952fc038b48f175d7061ec701359aac
master date: 2015-06-11 14:44:47 +0200

9 years agognttab: add missing version check to GNTTABOP_swap_grant_ref handling
Jan Beulich [Thu, 11 Jun 2015 13:02:01 +0000 (15:02 +0200)]
gnttab: add missing version check to GNTTABOP_swap_grant_ref handling

... avoiding NULL derefs when the version to use wasn't set yet (via
GNTTABOP_setup_table or GNTTABOP_set_version).

This is CVE-2015-4163 / XSA-134.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 5d5c09d853d3f212861f70c577c65d1703f752ae
master date: 2015-06-11 14:44:12 +0200

9 years agoQEMU_TAG update
Ian Jackson [Wed, 10 Jun 2015 13:14:50 +0000 (14:14 +0100)]
QEMU_TAG update

9 years agoQEMU_TAG update
Ian Jackson [Wed, 27 May 2015 15:56:47 +0000 (16:56 +0100)]
QEMU_TAG update

10 years agodomctl/sysctl: don't leak hypervisor stack to toolstacks
Andrew Cooper [Tue, 21 Apr 2015 07:25:03 +0000 (09:25 +0200)]
domctl/sysctl: don't leak hypervisor stack to toolstacks

This is CVE-2015-3340 / XSA-132.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 4ff3449f0e9d175ceb9551d3f2aecb59273f639d
master date: 2015-04-21 09:03:15 +0200

10 years agodomctl: don't allow a toolstack domain to call domain_pause() on itself
Andrew Cooper [Wed, 1 Apr 2015 09:11:30 +0000 (10:11 +0100)]
domctl: don't allow a toolstack domain to call domain_pause() on itself

These DOMCTL subops were accidentally declared safe for disaggregation
in the wake of XSA-77.

This is XSA-127 / CVE-2015-2751.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoLimit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)
Konrad Rzeszutek Wilk [Wed, 1 Apr 2015 09:11:28 +0000 (10:11 +0100)]
Limit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)

Said hypercall for large BARs can take quite a while. As such
we can require that the hypercall MUST break up the request
in smaller values.

Another approach is to add preemption to it - whether we do the
preemption using hypercall_create_continuation or returning
EAGAIN to userspace (and have it re-invocate the call) - either
way the issue we cannot easily solve is that in 'map_mmio_regions'
if we encounter an error we MUST call 'unmap_mmio_regions' for the
whole BAR region.

Since the preemption would re-use input fields such as nr_mfns,
first_gfn, first_mfn - we would lose the original values -
and only undo what was done in the current round (i.e. ignoring
anything that was done prior to earlier preemptions).

Unless we re-used the return value as 'EAGAIN|nr_mfns_done<<10' but
that puts a limit (since the return value is a long) on the amount
of nr_mfns that can provided.

This patch sidesteps this problem by:
 - Setting an hard limit of nr_mfns having to be 64 or less.
 - Toolstack adjusts correspondingly to the nr_mfn limit.
 - If the there is an error when adding the toolstack will call the
   remove operation to remove the whole region.

The need to break this hypercall down is for large BARs can take
more than the guest (initial domain usually) time-slice. This has
the negative result in that the guest is locked out for a long
duration and is unable to act on any pending events.

We also augment the code to return zero if nr_mfns instead
of trying to the hypercall.

This is XSA-125 / CVE-2015-2752.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>