xenbits.xensource.com Git - xen.git/log
xen.git
11 years ago xen: arm: define guest virtual platform in API headers
Ian Campbell [Tue, 19 Nov 2013 13:00:18 +0000 (13:00 +0000)]
xen: arm: define guest virtual platform in API headers

The tools and the hypervisor need to agree on various aspects of the guest
environment, such as interrupt numbers, memory layout, initial register values
for registers which are implementation defined etc. Therefore move the
associated defines into the public interface headers, or create them as
necessary.

This just exposes the current de-facto standard guest layout, which may be
subject to change in the future. This deliberately does not make the guest
layout dynamic since there is currently no need.

These values should not be exposed to guests; guests should find these things
out via device tree, or should not rely on implementation defined
defaults.

Various bits of the hypervisor needed to change to configure dom0 with the real
platform values while using the virtual platform configuration for guests.
Arrange for this where appropriate and plumb through as needed.

We also need to expose some 64-bit values (e.g. PSR_GUEST64_INIT) for the
benefit of 32 bit toolstacks building 64 bit guests.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago tools: check for libfdt when building for ARM
Ian Campbell [Tue, 19 Nov 2013 13:00:17 +0000 (13:00 +0000)]
tools: check for libfdt when building for ARM

libxl is going to want this to aid in the creation of guest device tree blobs.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago xen: arm: implement arch_set_info_guest for 64-bit vcpus
Ian Campbell [Tue, 19 Nov 2013 13:00:16 +0000 (13:00 +0000)]
xen: arm: implement arch_set_info_guest for 64-bit vcpus

This all seems too easy...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen: arm: implement XEN_DOMCTL_set_address_size
Ian Campbell [Tue, 19 Nov 2013 13:00:15 +0000 (13:00 +0000)]
xen: arm: implement XEN_DOMCTL_set_address_size

This is subarch specific, so plumb it through to the arm32 and arm64 versions.

The toolstack uses this to select 32- vs 64-bit guests (or rather it does on
x86 and soon will for arm too).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen: arm: include header for arch_do_{sys, dom}ctl prototype
Ian Campbell [Tue, 19 Nov 2013 13:00:14 +0000 (13:00 +0000)]
xen: arm: include header for arch_do_{sys, dom}ctl prototype

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years ago xen: arm: add enable-method to cpu nodes for arm64 guests.
Ian Campbell [Tue, 19 Nov 2013 13:00:13 +0000 (13:00 +0000)]
xen: arm: add enable-method to cpu nodes for arm64 guests.

This is required by the Linux arm64 boot protocol.

We use PSCI.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen: arm: allocate dom0 memory separately from preparing the dtb
Ian Campbell [Tue, 19 Nov 2013 13:00:12 +0000 (13:00 +0000)]
xen: arm: allocate dom0 memory separately from preparing the dtb

Mixing these two together is a pain: it forces us to prepare the dtb before
processing the kernel which means we don't know whether the guest is 32- or
64-bit while we construct its DTB.

Instead split out the memory allocation (including 1:1 workaround handling)
and p2m setup into a separate phase and then create a memory node in the DTB
based on the result.

This allows us to move kernel parsing before DTB setup.

As part of this it was also necessary to rework where the decision regarding
the placement of the DTB and initrd in RAM was made. It is now made when
loading the kernel, which allows it to make use of the zImage/ELF specific
information and therefore to make decisions based on complete knowledge and do
it right rather than guessing in prepare_dtb and relying on a later check to
see if things worked.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen: arm: move dom0 gic and timer device tree nodes under /xen-core-devices/
Ian Campbell [Tue, 19 Nov 2013 13:00:11 +0000 (13:00 +0000)]
xen: arm: move dom0 gic and timer device tree nodes under /xen-core-devices/

Julien observed that we were relying on the provided host DTB supplying
suitable #address-cells and #size-cells values to allow us to represent these
addresses, which may not reliably be the case. Moving these under our own
known node (somewhat analogous to the use of /soc/ or /motherboard/ on some
platforms) allows us to control these sizes.

Since the new node is created out of thin air it does not have a corresponding
struct dt_device_node and therefore we cannot use dt_n_addr_cells or
dt_n_size_cells, we can use hardcoded constants instead. For the same reason
we define and use set_xen_range instead of dt_set_range.
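
A rough illustration of the hardcoded-cells approach described above: the sketch below encodes one (address, size) pair as big-endian 32-bit cells using fixed cell counts. The constant names, the value 2 for both counts, and the `sketch_set_xen_range` helper are assumptions made for illustration; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed cell counts for the Xen-created container node. */
#define SKETCH_ADDRESS_CELLS 2
#define SKETCH_SIZE_CELLS    2

/* Store one 32-bit cell in big-endian byte order, as device tree requires. */
static void sketch_put_cell(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write 'val' as 1 or 2 cells, most significant cell first.
 * Returns a pointer just past the cells written. */
static uint8_t *sketch_put_cells(uint8_t *p, int cells, uint64_t val)
{
    assert(cells == 1 || cells == 2);
    if (cells == 2) {
        sketch_put_cell(p, (uint32_t)(val >> 32));
        p += 4;
    }
    sketch_put_cell(p, (uint32_t)val);
    return p + 4;
}

/* Hypothetical stand-in for set_xen_range: encode one reg entry without
 * consulting a parent node for its cell counts. */
static uint8_t *sketch_set_xen_range(uint8_t *p, uint64_t addr, uint64_t size)
{
    p = sketch_put_cells(p, SKETCH_ADDRESS_CELLS, addr);
    return sketch_put_cells(p, SKETCH_SIZE_CELLS, size);
}
```

Because the counts are compile-time constants, every entry under the container node occupies the same number of bytes regardless of what the host DTB declares.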

The hypervisor, cpus and psci nodes all either define #foo-cells for their
children or do not contain reg properties and therefore can remain at the top
level.

The logging in make_gic_node was inconsistent. Fix it.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen: arm: Add comment regarding arm64 zImage v0 vs v1
Ian Campbell [Tue, 19 Nov 2013 13:00:10 +0000 (13:00 +0000)]
xen: arm: Add comment regarding arm64 zImage v0 vs v1

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen: arm: Report aarch64 capability.
Ian Campbell [Tue, 19 Nov 2013 13:00:09 +0000 (13:00 +0000)]
xen: arm: Report aarch64 capability.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen/arm: Panic if we are unable to initialize platform timer
Julien Grall [Fri, 15 Nov 2013 15:27:37 +0000 (15:27 +0000)]
xen/arm: Panic if we are unable to initialize platform timer

The caller of xen_init_time, start_xen, doesn't check the return value
of the function. Xen will silently ignore the error and continue.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago xen/arm: Panic if platform initialization failed
Julien Grall [Fri, 15 Nov 2013 15:27:36 +0000 (15:27 +0000)]
xen/arm: Panic if platform initialization failed

Currently, if an error occurs, Xen will silently ignore it and continue.
Convert platform_init to a void function and panic if we fail to
correctly initialize the platform.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago xen/arm: ioremap_attr: return NULL if __vmap failed
Julien Grall [Mon, 18 Nov 2013 13:08:23 +0000 (13:08 +0000)]
xen/arm: ioremap_attr: return NULL if __vmap failed

Most ioremap_* callers check whether ioremap returns NULL. Currently, if the
physical address is non-aligned, Xen will return the pointer given by
__vmap plus the offset in the page. So if ioremap_* fails, the caller
will retrieve a non-NULL address and continue as if there was no error.
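
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the Xen implementation: the page size, the `sketch_vmap_fn` callback standing in for __vmap, and both `sketch_ioremap_*` functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uintptr_t)(SKETCH_PAGE_SIZE - 1))

/* Stand-in for __vmap: returns a page-aligned mapping, or NULL on failure. */
typedef void *(*sketch_vmap_fn)(uintptr_t page_base);

static unsigned char sketch_page[SKETCH_PAGE_SIZE];

/* Mapper that always succeeds, backed by a static buffer. */
static void *sketch_vmap_ok(uintptr_t page_base)
{
    (void)page_base;
    return sketch_page;
}

/* Mapper that always fails. */
static void *sketch_vmap_fail(uintptr_t page_base)
{
    (void)page_base;
    return NULL;
}

/* Buggy shape: the in-page offset is added unconditionally, so a failed
 * mapping (NULL) plus a non-zero offset yields a non-NULL pointer. */
static void *sketch_ioremap_buggy(sketch_vmap_fn map, uintptr_t pa)
{
    uintptr_t offs = pa & (SKETCH_PAGE_SIZE - 1);

    assert(map != NULL);
    return (void *)((uintptr_t)map(pa & SKETCH_PAGE_MASK) + offs);
}

/* Fixed shape: check the mapping result before applying the offset. */
static void *sketch_ioremap_fixed(sketch_vmap_fn map, uintptr_t pa)
{
    uintptr_t offs = pa & (SKETCH_PAGE_SIZE - 1);
    unsigned char *base;

    assert(map != NULL);
    base = map(pa & SKETCH_PAGE_MASK);
    if (base == NULL)
        return NULL;
    return base + offs;
}
```

With the failing mapper and an unaligned address such as 0x1004, the buggy variant returns a small non-NULL pointer that passes a caller's NULL check, while the fixed variant returns NULL.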

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago xen/arm: correct duplicate MPIDR check to actually skip the node
Matthew Daley [Fri, 8 Nov 2013 00:32:03 +0000 (13:32 +1300)]
xen/arm: correct duplicate MPIDR check to actually skip the node

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years ago xen/arm: p2m: flush TLB by VMID when a new domain is created
Julien Grall [Thu, 14 Nov 2013 17:00:34 +0000 (17:00 +0000)]
xen/arm: p2m: flush TLB by VMID when a new domain is created

Once the VMID is marked unused, a new domain can reuse the VMID for its
own. If the TLB is not flushed, entries can contain wrong translations.
When a new p2m is allocated, switch to the new VMID and flush the TLB on
every physical CPU.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago opw: libxl: use CTX macro in libxl_utils.c
Kelley Nielsen [Mon, 11 Nov 2013 23:24:00 +0000 (15:24 -0800)]
opw: libxl: use CTX macro in libxl_utils.c

The new coding style uses the convenience macro CTX as declared in
libxl_internal.h. Substitute an invocation of this macro for its
body at the two places it occurs in libxl_utils.c.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: use LOG instead of LIBXL__LOG in libxl_utils.c
Kelley Nielsen [Mon, 11 Nov 2013 23:23:56 +0000 (15:23 -0800)]
libxl: use LOG instead of LIBXL__LOG in libxl_utils.c

To conform to the new coding style, replace the invocation of
LIBXL__LOG in the function libxl_pipe() in the file libxl_utils.c
with an invocation of LOG. Create a local libxl__gc *gc for LOG
to use by invoking GC_INIT(ctx) at the top of the function, and
clean it up by invoking GC_FREE at the exit. Create a variable,
ret, to consolidate exits in one place and avoid invoking GC_FREE
twice.
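
The single-exit shape described above might look roughly like the following. This is only a sketch: the `sketch_ctx` and `sketch_gc` types and the `SKETCH_GC_INIT`, `SKETCH_GC_FREE` and `SKETCH_LOG` macros are simplified stand-ins invented here, not libxl's real definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for libxl's context and gc types. */
typedef struct { int live_gcs; } sketch_ctx;
typedef struct { sketch_ctx *owner; } sketch_gc;

/* Declare a local 'gc' bound to the context and count it as live. */
#define SKETCH_GC_INIT(ctx)                 \
    sketch_gc gc_storage = { (ctx) };       \
    sketch_gc *gc = &gc_storage;            \
    gc->owner->live_gcs++

/* Release the local 'gc'; must run exactly once per SKETCH_GC_INIT. */
#define SKETCH_GC_FREE (gc->owner->live_gcs--)

/* Log through the local 'gc', which is why the macro needs one in scope. */
#define SKETCH_LOG(msg) \
    fprintf(stderr, "ctx %p: %s\n", (void *)gc->owner, (msg))

static int sketch_pipe(sketch_ctx *ctx, int fds[2])
{
    SKETCH_GC_INIT(ctx);
    int ret = 0;

    assert(fds != NULL);
    if (pipe(fds) < 0) {
        SKETCH_LOG("failed to create a pipe");
        ret = -1;   /* record the failure, fall through to the one exit */
    }

    SKETCH_GC_FREE; /* single cleanup point for both outcomes */
    return ret;
}
```

Routing both outcomes through one `ret` variable keeps the cleanup in a single place, so adding another failure path later cannot skip or repeat it.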

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: use LOG and LOGE instead of LIBXL__LOG* in libxl_utils.c
Kelley Nielsen [Mon, 11 Nov 2013 23:23:55 +0000 (15:23 -0800)]
libxl: use LOG and LOGE instead of LIBXL__LOG* in libxl_utils.c

Code cleanup - no functional changes

The convenience macros LOG and LOGE have been written to take the
place of the old macros in the LIBXL__LOG* family. Replace the
invocations of the old macros in the function libxl_read_file_contents()
with invocations of the corresponding new ones. Create a local
libxl__gc *gc for the new macros to use by invoking GC_INIT(ctx) at the
top of the function, and clean it up by invoking GC_FREE at the two
exit points.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: use LOGE instead of LIBXL__LOG_ERRNO in libxl_utils.c
Kelley Nielsen [Fri, 15 Nov 2013 01:50:43 +0000 (17:50 -0800)]
libxl: use LOGE instead of LIBXL__LOG_ERRNO in libxl_utils.c

Code cleanup - no functional changes

The convenience macro LOGE has been written to take the place of
LIBXL__LOG_ERRNO. LOGE depends on the existence of a local libxl__gc
*gc. Replace two invocations of LIBXL__LOG_ERRNO, which are in
functions that already have a libxl__gc *gc present, with invocations
of the new macro.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
11 years ago libxl: use GCSPRINTF instead of libxl__sprintf
Kelley Nielsen [Mon, 11 Nov 2013 23:23:53 +0000 (15:23 -0800)]
libxl: use GCSPRINTF instead of libxl__sprintf

Code cleanup - no functional changes

The convenience macro GCSPRINTF has been written to be used in place
of libxl__sprintf(). Replace all calls to libxl__sprintf() in
libxl_utils.c with invocations of the new macro.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: use GCSPRINTF in place of libxl__sprintf() in libxl_qmp.c
Kelley Nielsen [Mon, 11 Nov 2013 23:23:52 +0000 (15:23 -0800)]
libxl: use GCSPRINTF in place of libxl__sprintf() in libxl_qmp.c

Code cleanup -- no functional changes

The convenience macro GCSPRINTF has been written to be used in place of
libxl__sprintf(). Change all calls to libxl__sprintf() in libxl_qmp.c to
invocations of the new macro.

Suggested-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: Use new macro LOGE() in libxl_qmp.c
Kelley Nielsen [Mon, 11 Nov 2013 23:23:51 +0000 (15:23 -0800)]
libxl: Use new macro LOGE() in libxl_qmp.c

Code cleanup -- no functional changes

Coding style has recently been changed for libxl. The convenience
macro LOGE() has been introduced, and invocations of the old macro
LIBXL__LOG_ERROR() are to be replaced with it. Change all occurrences
of the old macro (in functions that have a local libxl__gc *gc) except
the one in register_serials_chardev_callback() to the new one. (This
function lacks a local libxl__gc *gc, which LOGE() requires.)

Suggested-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: change most remaining LIBXL__LOG to LOG in libxl_qmp.c
Kelley Nielsen [Fri, 15 Nov 2013 01:41:07 +0000 (17:41 -0800)]
libxl: change most remaining LIBXL__LOG to LOG in libxl_qmp.c

Coding style has recently been changed for libxl. The convenience
macro LOG() has been introduced, and invocations of the old macro
LIBXL__LOG() are to be replaced with it. Change occurrences of the
old macro to the new one in the functions qmp_handle_response()
and qmp_handle_error_response(). The new macros need access to a
local libxl__gc *gc, so add it as a parameter to both these functions,
and pass the instance in qmp_next() down the call chain to
qmp_handle_response() and in turn to qmp_handle_error_response().

Suggested-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
[ijc -- reverted one unintentional w/s change]

11 years ago get_maintainer.pl: Adjust to Xen workflow
Don Slutz [Tue, 5 Nov 2013 14:11:51 +0000 (09:11 -0500)]
get_maintainer.pl: Adjust to Xen workflow

Based on feedback from reviewers:
* Disable git fallback by default: it has a tendency to mail
  anyone who did a single one-line change and should not be
  necessary for a project of Xen's size.
* Disable rolestats: Makes cut-and-paste from the output into the
  commit message easy.
* Drop "THE REST" fallback: Don't spam Keir *too* much.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ijc -- expanded the changelog]

11 years ago get_maintainer.pl: Convert to Xen tree
Don Slutz [Tue, 5 Nov 2013 14:11:50 +0000 (09:11 -0500)]
get_maintainer.pl: Convert to Xen tree

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago Add linux version of get_maintainer.pl
Don Slutz [Tue, 5 Nov 2013 14:11:49 +0000 (09:11 -0500)]
Add linux version of get_maintainer.pl

This is get_maintainer.pl from linux commit bbbe96ed899e8ebde1a12d28f10461eb8bef1074

Tag at time of commit: v3.9-2313-gbbbe96e
Was released as: v3.10-0-g8bb495e

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years ago libxl: add device backend listener in order to launch backends
Roger Pau Monne [Fri, 13 Sep 2013 08:53:58 +0000 (10:53 +0200)]
libxl: add device backend listener in order to launch backends

Add the necessary logic in libxl to allow it to act as a listener for
launching backends in a driver domain, replacing udev (like we already
do on Dom0). This new functionality is accomplished by watching the
domain backend path (/local/domain/<domid>/backend) and reacting to
device creation/destruction.

The way to launch this listener daemon is from xl, using the newly
introduced "devd" command. The command will daemonize by default,
using "xldevd.log" as its logfile. Optionally the user can force the
execution of the listener in the foreground by passing the "-F"
option to the devd command.

Current backends handled by this daemon include Qdisk, vbd and vif
device types.
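
To make the watching scheme above more concrete, here is a small sketch of how a listener could build the watched root path and classify an event path beneath it. The sub-path layout (device kind, then frontend domid, then device id), the lowercase kind strings, and every `sketch_*` name are assumptions for illustration only; the real listener lives in libxl and is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of an event path below the watched backend root. */
struct sketch_dev {
    char kind[16];      /* e.g. "vif" */
    unsigned fe_domid;  /* frontend domain id (assumed path component) */
    unsigned devid;     /* device id (assumed path component) */
};

/* Build the root path that the listener would watch for a given domain. */
static int sketch_backend_root(char *buf, size_t len, unsigned domid)
{
    int n = snprintf(buf, len, "/local/domain/%u/backend", domid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Device kinds the commit message lists as handled by the daemon. */
static int sketch_kind_handled(const char *kind)
{
    return strcmp(kind, "qdisk") == 0 ||
           strcmp(kind, "vbd") == 0 ||
           strcmp(kind, "vif") == 0;
}

/* Return 0 and fill 'out' if 'path' names a handled device under 'root',
 * otherwise return -1. */
static int sketch_parse_event(const char *root, const char *path,
                              struct sketch_dev *out)
{
    size_t rl = strlen(root);

    assert(out != NULL);
    if (strncmp(path, root, rl) != 0 || path[rl] != '/')
        return -1;
    if (sscanf(path + rl + 1, "%15[^/]/%u/%u",
               out->kind, &out->fe_domid, &out->devid) != 3)
        return -1;
    return sketch_kind_handled(out->kind) ? 0 : -1;
}
```

Events for paths that are too short, belong to another domain, or name an unhandled device kind are simply rejected, which keeps the creation and destruction handlers free of path validation.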

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: revert 326a7b74
Roger Pau Monne [Fri, 20 Sep 2013 15:55:32 +0000 (17:55 +0200)]
libxl: revert 326a7b74

When running libxl from a driver domain there's no xenstore pid file
(because xenstore is not running on the driver domain). Also, at that
point in libxl initialization there's no way to know whether libxl is
running on a domain different than Dom0, so just revert the change in
order to allow libxl to work on driver domains.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago xl: put daemonize code in its own function
Roger Pau Monne [Fri, 20 Sep 2013 15:14:09 +0000 (17:14 +0200)]
xl: put daemonize code in its own function

Move the daemonizer code from create_domain into its own function
that can be called from other places different than create_domain.
This will be used to daemonize the driver domain backend handler.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: add Qdisk backend launch helper
Roger Pau Monne [Thu, 19 Sep 2013 13:33:59 +0000 (15:33 +0200)]
libxl: add Qdisk backend launch helper

Current Qemu launch functions in libxl require the usage of data
structures only available on domain creation. All this information is
not needed in order to launch a Qemu instance to serve Qdisk backends,
so introduce a new simplified helper that can be used to launch
Qemu/Qdisk, that will be used to launch Qdisk in driver domains.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: don't launch Qemu on Dom0 for Qdisk devices on driver domains
Roger Pau Monne [Thu, 19 Sep 2013 09:17:45 +0000 (11:17 +0200)]
libxl: don't launch Qemu on Dom0 for Qdisk devices on driver domains

In libxl__need_xenpv_qemu check that the backend domain of the Qdisk
device is Dom0 before launching a Qemu instance in the toolstack
domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: remove the Qemu bodge for driver domain devices
Roger Pau Monne [Wed, 18 Sep 2013 15:35:00 +0000 (17:35 +0200)]
libxl: remove the Qemu bodge for driver domain devices

When Qemu is launched from a driver domain to act as a PV disk
backend we can make sure that Qemu is running before detaching
devices, so there's no need for the bodge there.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: synchronize device removal when using driver domains
Roger Pau Monne [Fri, 27 Sep 2013 09:37:04 +0000 (11:37 +0200)]
libxl: synchronize device removal when using driver domains

Synchronize the clean up of the backend from the toolstack domain when
the driver domain has actually finished closing the backend for the
device.

This is accomplished by waiting for the driver domain to remove the
directory containing the backend keys, then the toolstack domain will
finish the cleanup by removing the empty folders on the backend path.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: don't remove device frontend path from driver domains
Roger Pau Monne [Wed, 18 Sep 2013 11:15:14 +0000 (13:15 +0200)]
libxl: don't remove device frontend path from driver domains

A domain different than LIBXL_TOOLSTACK_DOMID should not try to remove
the frontend paths of a device.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago libxl: create a local xenstore libxl and device-model dir for guests
Roger Pau Monne [Wed, 18 Sep 2013 10:42:47 +0000 (12:42 +0200)]
libxl: create a local xenstore libxl and device-model dir for guests

If libxl is executed inside a guest domain it needs write access to
the local libxl xenstore dir (/local/<domid>/libxl) to store internal
data. This also applies to Qemu which needs a
/local/<domid>/device-model xenstore directory.

This patch creates the mentioned directories for each guest launched
from libxl.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years ago x86: consider modules when cutting off memory
Jan Beulich [Mon, 18 Nov 2013 12:57:20 +0000 (13:57 +0100)]
x86: consider modules when cutting off memory

The code in question runs after module ranges have already been removed from
the E820 table, so when determining the new maximum page/PDX we need to
explicitly take them into account.

Furthermore we need to round up the ending addresses here, in order to
fully cover any partial trailing pages.
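
As an illustration of both points above (counting module ranges explicitly and rounding end addresses up), the sketch below computes a maximum page number from RAM and module ranges. The 4 KiB page size, the `sketch_range` type and the helper names are assumptions; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Byte range; 'end' is exclusive. */
struct sketch_range {
    uint64_t start;
    uint64_t end;
};

/* Round an address up to the next page boundary and convert to a page
 * number, so a partially used trailing page is still counted. */
static uint64_t sketch_pfn_up(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Raise 'max_page' to cover every range in 'r'. */
static uint64_t sketch_cover(const struct sketch_range *r, size_t n,
                             uint64_t max_page)
{
    for (size_t i = 0; i < n; i++) {
        assert(r[i].start <= r[i].end);
        uint64_t top = sketch_pfn_up(r[i].end);
        if (top > max_page)
            max_page = top;
    }
    return max_page;
}

/* Modules no longer appear in the RAM map, so fold them in separately. */
static uint64_t sketch_max_page(const struct sketch_range *ram, size_t nr_ram,
                                const struct sketch_range *mods, size_t nr_mods)
{
    uint64_t max_page = sketch_cover(ram, nr_ram, 0);

    return sketch_cover(mods, nr_mods, max_page);
}
```

A module ending at 0x200800 needs page 0x200 in full, so the rounded-up result is 0x201; truncating instead would cut the tail of the module off.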

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years ago VT-d: fix TLB flushing in dma_pte_clear_one()
Jan Beulich [Mon, 18 Nov 2013 12:55:55 +0000 (13:55 +0100)]
VT-d: fix TLB flushing in dma_pte_clear_one()

The third parameter of __intel_iommu_iotlb_flush() is to indicate
whether the to be flushed entry was a present one. A few lines before,
we bailed if !dma_pte_present(*pte), so there's no need to check the
flag here again - we can simply always pass TRUE here.

This is XSA-78.

Suggested-by: Cheng Yueqiang <yqcheng.2008@phdis.smu.edu.sg>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years ago nested VMX: don't ignore mapping errors
Jan Beulich [Mon, 18 Nov 2013 08:39:01 +0000 (09:39 +0100)]
nested VMX: don't ignore mapping errors

Rather than ignoring failures to map the virtual VMCS as well as MSR or
I/O port bitmaps, convert those into failures of the respective
instructions (to avoid dereferencing NULL pointers). Ultimately such
failures should be handled transparently (by using transient mappings
when they actually need to be accessed, just like nested SVM does).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago fix leaking of v->cpu_affinity_saved on domain destruction
Dario Faggioli [Fri, 15 Nov 2013 16:43:28 +0000 (17:43 +0100)]
fix leaking of v->cpu_affinity_saved on domain destruction

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years ago credit: Update other parameters when setting tslice_ms
Nate Studer [Fri, 15 Nov 2013 16:38:10 +0000 (17:38 +0100)]
credit: Update other parameters when setting tslice_ms

Add a utility function to update the rest of the timeslice
accounting fields when updating the timeslice of the
credit scheduler, so that capped CPUs behave correctly.

Before this patch changing the timeslice to a value higher
than the default would result in a domain not utilizing
its full capacity and changing the timeslice to a value
lower than the default would result in a domain exceeding
its capacity.

Signed-off-by: Nate Studer <nate.studer@dornerworks.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
11 years ago x86/VT-x: Disable MSR intercept for SHADOW_GS_BASE
Paul Durrant [Fri, 15 Nov 2013 10:02:17 +0000 (11:02 +0100)]
x86/VT-x: Disable MSR intercept for SHADOW_GS_BASE

Intercepting this MSR is pointless - The swapgs instruction does not cause a
vmexit, so the cached result of this is potentially stale after the next guest
instruction.  It is correctly saved and restored on vcpu context switch.

Furthermore, 64bit Windows writes to this MSR on every thread context switch,
so interception causes a substantial performance hit.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
11 years ago x86/HVM: 32-bit IN result must be zero-extended to 64 bits (part 2)
Jan Beulich [Fri, 15 Nov 2013 10:01:49 +0000 (11:01 +0100)]
x86/HVM: 32-bit IN result must be zero-extended to 64 bits (part 2)

Just spotted a counterpart of what commit 9d89100b (same title) dealt
with.
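
For readers unfamiliar with the rule in the title: on x86-64, a 32-bit write to a general purpose register clears bits 63:32, whereas 8-bit and 16-bit writes leave the upper bits untouched. The helper below is a hypothetical model of storing an IN result into a 64-bit register value; it is not the emulation code the commit touches.

```c
#include <assert.h>
#include <stdint.h>

/* Merge the result of an IN of 'bytes' bytes into a 64-bit register value.
 * 'sketch_store_in_result' is a made-up name for this illustration. */
static uint64_t sketch_store_in_result(uint64_t reg, uint32_t val,
                                       unsigned bytes)
{
    assert(bytes == 1 || bytes == 2 || bytes == 4);

    if (bytes == 4)
        return (uint64_t)val;  /* 32-bit result: zero-extend to 64 bits */

    /* 8/16-bit result: replace only the low bits, keep the rest. */
    uint64_t mask = (UINT64_C(1) << (bytes * 8)) - 1;
    return (reg & ~mask) | (val & mask);
}
```

Treating the 4-byte case like the narrower ones would leave stale upper bits in the register, which is the class of bug the commit title refers to.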

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years ago kexec: fail image loads if the page tables cannot be built
David Vrabel [Fri, 15 Nov 2013 10:00:46 +0000 (11:00 +0100)]
kexec: fail image loads if the page tables cannot be built

CID 1128566

If an image source page is allocated in kimage_alloc_page() but the
machine_kexec_add_page() fails, the image may appear to load
successfully but it will not execute.  The relocation will fault
(rebooting the host) when trying to copy the source page, as it is not
mapped.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years ago kexec: fix kexec_lock use in kexec_swap_images()
David Vrabel [Fri, 15 Nov 2013 09:59:41 +0000 (10:59 +0100)]
kexec: fix kexec_lock use in kexec_swap_images()

CID 1128573

If a bad image type is supplied in a KEXECOP_unload hypercall, the
kexec_lock in kexec_swap_images() was left locked, causing a deadlock
on a subsequent image load or unload.

The kexec_lock is only required to serialize the swap operation
itself.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years ago PVH dom0: allow all physdev ops
Mukesh Rathor [Wed, 13 Nov 2013 08:53:30 +0000 (09:53 +0100)]
PVH dom0: allow all physdev ops

Allow a PVH dom0 access to all PHYSDEVOP_* ops.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Convert flow and adjust indentation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
11 years ago PVH dom0: set eflags resvd bit #1
Mukesh Rathor [Wed, 13 Nov 2013 08:52:18 +0000 (09:52 +0100)]
PVH dom0: set eflags resvd bit #1

In this patch the eflags resv bit #1 is set in vmx_vmenter_helper. If
the bit is not set, the vmlaunch/resume will fail with guest state
invalid.
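
Bit 1 of EFLAGS is architecturally reserved and must read as 1, and the guest-state checks on VM entry reject a value where it is clear. A minimal model of the fix-up is shown below; the macro and function names are hypothetical, and the check models only this one bit rather than the full set of VM-entry checks.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved EFLAGS bit 1, which must always be set. */
#define SKETCH_EFLAGS_RESERVED_BIT1 (UINT64_C(1) << 1)

/* Models only the "bit 1 must be set" part of the guest-state checks. */
static int sketch_rflags_ok_for_entry(uint64_t rflags)
{
    return (rflags & SKETCH_EFLAGS_RESERVED_BIT1) != 0;
}

/* Force the reserved bit on before entering the guest. */
static uint64_t sketch_fixup_rflags(uint64_t rflags)
{
    rflags |= SKETCH_EFLAGS_RESERVED_BIT1;
    assert(sketch_rflags_ok_for_entry(rflags));
    return rflags;
}
```

Setting the bit unconditionally is harmless for values that already have it, so the fix-up can run on every entry.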

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

For consistency (i.e. even if perhaps not strictly needed) also do the
same on SVM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
11 years ago pvh tools: libxl changes to create a PVH guest
George Dunlap [Wed, 13 Nov 2013 08:42:51 +0000 (09:42 +0100)]
pvh tools: libxl changes to create a PVH guest

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh tools: libxc changes to build a PVH guest
Mukesh Rathor [Wed, 13 Nov 2013 08:42:14 +0000 (09:42 +0100)]
pvh tools: libxc changes to build a PVH guest

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: documentation
Mukesh Rathor [Wed, 13 Nov 2013 08:41:59 +0000 (09:41 +0100)]
pvh: documentation

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: restrict tsc_mode to NEVER_EMULATE for now
Mukesh Rathor [Wed, 13 Nov 2013 08:41:12 +0000 (09:41 +0100)]
pvh: restrict tsc_mode to NEVER_EMULATE for now

The reason given for this restriction in the first place, given in one
of the comments checking for PVH requirements, had to do with
additional infrastructure required to allow PV RDTSC emulation for PVH
guests.

Since we don't use the PV emulation path at all anymore, we may be
able to remove this restriction.

Experiments show that pvh will boot without apparent issues in
"default", "native", and "native_paravirt" mode, but not in
"always_emulate" mode.  We'll leave this restriction in until
we can sort out what's going on.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: disable 32-bit guest support for now
Mukesh Rathor [Wed, 13 Nov 2013 08:40:41 +0000 (09:40 +0100)]
pvh: disable 32-bit guest support for now

Removing the assert allows the PVH code to call this during vmcs
construction in a later patch, making the code more robust by removing
duplicate code.

To be implemented.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: use PV handlers for PIO
George Dunlap [Wed, 13 Nov 2013 08:40:03 +0000 (09:40 +0100)]
pvh: use PV handlers for PIO

Register an IO handler for the entire PIO range, and have it call the
PV PIO handlers.

NB at this point this won't do the full "copy and execute on the stack
with full GPRs" work-around; this may need to be sorted out for dom0 to allow
these instructions to happen in guest context.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: PV cpuid
Mukesh Rathor [Wed, 13 Nov 2013 08:39:13 +0000 (09:39 +0100)]
pvh: PV cpuid

NB at the moment we do not handle forced emulated ops.  This means, for example,
that xen-detect will report an HVM Xen guest instead of a PV one.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: set up more PV stuff in set_info_guest
Mukesh Rathor [Wed, 13 Nov 2013 08:37:51 +0000 (09:37 +0100)]
pvh: set up more PV stuff in set_info_guest

Allow the guest to set up a few more things when bringing up a vcpu.

This includes cr3 and gs_base.

Also set up wallclock, and only initialize a vcpu once.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years ago pvh: use PV e820
Mukesh Rathor [Wed, 13 Nov 2013 08:37:01 +0000 (09:37 +0100)]
pvh: use PV e820

Allow PV e820 map to be set and read from a PVH domain.  This requires
moving the pv e820 struct out from the pv-specific domain struct and
into the arch domain struct.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh: access to hypercalls
Mukesh Rathor [Wed, 13 Nov 2013 08:36:32 +0000 (09:36 +0100)]
pvh: access to hypercalls

Hypercalls where we now have unrestricted access:
* memory_op
* console_io
* vcpu_op
* mmuext_op

We also restrict PVH domain access to HVMOP_*_param to reading and
writing HVM_PARAM_CALLBACK_IRQ.

Most hvm_op functions require "is_hvm_domain()" and will default to
-EINVAL; exceptions are HVMOP_get_time and HVMOP_xentrace.

Finally, we restrict setting IOPL permissions for a PVH domain.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh: do not allow PVH guests to change paging modes
Mukesh Rathor [Wed, 13 Nov 2013 08:35:58 +0000 (09:35 +0100)]
pvh: do not allow PVH guests to change paging modes

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh: vmx-specific changes
Mukesh Rathor [Wed, 13 Nov 2013 08:35:20 +0000 (09:35 +0100)]
pvh: vmx-specific changes

Changes:
* Enforce HAP mode for now
* Disable exits related to virtual interrupts or emulated APICs
* Disable changing paging mode
 - "unrestricted guest" (i.e., real mode for EPT) disabled
 - write guest EFER disabled
* Start in 64-bit mode
* Paging mode update to happen in arch_set_info_guest

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh: disable unneeded features of HVM containers
Mukesh Rathor [Wed, 13 Nov 2013 08:34:35 +0000 (09:34 +0100)]
pvh: disable unneeded features of HVM containers

Things kept:
* cacheattr_region lists
* irq-related structures
* paging
* tm_list
* hvm params
* hvm_domain.io_handler (for handling PV I/O)

Things disabled for now:
* compat xlation

Things disabled:
* Emulated timers and clock sources
* IO/MMIO ioreq pages, event channels
* msix tables
* hvm_funcs
* nested HVM
* Fast-path for emulated lapic accesses

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh: introduce PVH guest type
Mukesh Rathor [Wed, 13 Nov 2013 08:33:12 +0000 (09:33 +0100)]
pvh: introduce PVH guest type

Introduce new PVH guest type, flags to create it, and ways to identify it.

To begin with, it will inherit functionality marked hvm_container.

Code to actually check for hardware support, in the VMX case, will be added
in future patches.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh prep: introduce pv guest type and has_hvm_container macros
Mukesh Rathor [Wed, 13 Nov 2013 08:30:09 +0000 (09:30 +0100)]
pvh prep: introduce pv guest type and has_hvm_container macros

The goal of this patch is to classify conditionals more clearly, as to
whether they relate to pv guests, hvm-only guests, or guests with an
"hvm container" (which will eventually include PVH).

This patch introduces an enum for guest type, as well as two new macros
for switching behavior on and off: is_pv_* and has_hvm_container_*.  At the
moment is_pv_* <=> !has_hvm_container_*.  The purpose of having two is that
taking a path because something does *not* have PV structures seems to me
different from taking a path because it *does* have HVM structures, even if the
two happen to coincide 100% at the moment.  The exact usage is occasionally a bit
fuzzy though, and a judgement call just needs to be made on which is clearer.

In general, a switch should use is_pv_* (or !is_pv_*) if the code in question
relates directly to a PV guest.  Examples include use of pv_vcpu structs or
other behavior directly related to PV domains.

hvm_container is more of a fuzzy concept, but in general:

* Most core HVM behavior will be included in this.  Behavior not
appropriate for PVH mode will be disabled in later patches

* Hypercalls related to HVM guests will *not* be included by default;
functionality needed by PVH guests will be enabled in future patches

* The following functionality is not considered part of the HVM
container, and PVH will end up behaving like PV by default: Event
channel, vtsc offset, code related to emulated timers, nested HVM,
emuirq, PoD

* Some features are left to implement for PVH later: vpmu, shadow mode
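
As a rough illustration of the classification described above, the sketch
below models a guest-type enum and the two predicates.  The struct layout,
the enum members and the macro bodies are assumptions made for this example;
they are not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: only the names is_pv_* and
 * has_hvm_container_* come from the commit message. */
enum guest_type {
    guest_type_pv,
    guest_type_hvm
};

struct domain {
    enum guest_type guest_type;
};

/* "Does this domain have PV structures?" */
#define is_pv_domain(d)             ((d)->guest_type == guest_type_pv)
/* "Does this domain have HVM structures?"  Today this is simply the
 * negation of is_pv_domain(), but the two questions are kept separate. */
#define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)

/* Example switch: code touching PV-only state asks is_pv_domain(). */
static bool may_touch_pv_vcpu_state(const struct domain *d)
{
    assert(d != NULL);
    return is_pv_domain(d);
}
```

Keeping two predicates means a later guest type can answer "no" to the first
question and "yes" to the second without every call site being revisited.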

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh: tolerate HVM guests having no ioreq page
George Dunlap [Wed, 13 Nov 2013 08:29:02 +0000 (09:29 +0100)]
pvh: tolerate HVM guests having no ioreq page

PVH guests don't have a backing device model emulator (qemu); just
tolerate this situation explicitly, rather than special-casing PVH.

For unhandled IO, hvmemul_do_io() will now return X86EMUL_OKAY, which
I believe matches the effect of qemu not having a handler
for the IO.

This also fixes a potential DoS in the host from the reworked series:
If the guest makes a hypercall which sends an invalidate request, it
would have crashed the host.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agopvh prep: code motion
Mukesh Rathor [Wed, 13 Nov 2013 08:26:38 +0000 (09:26 +0100)]
pvh prep: code motion

There are many functions where PVH requires some code in common with
HVM.  Rearrange some of these functions so that the code is together.

In general, the HVM code that PVH also uses includes:
 - cacheattr functionality
 - paging
 - hvm_funcs
 - hvm_assert_evtchn_irq tasklet
 - tm_list
 - hvm_params

And code that PVH shares with PV but not with HVM:
 - updating the domain wallclock
 - setting v->is_initialized

There should be no end-to-end changes in behavior.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agolibxc: move temporary grant table mapping to end of memory
Roger Pau Monné [Wed, 13 Nov 2013 08:26:13 +0000 (09:26 +0100)]
libxc: move temporary grant table mapping to end of memory

In order to set up the grant table for HVM guests, libxc needs to map
the grant table temporarily.  At the moment, it does this by adding the
grant page to the HVM guest's p2m table in the MMIO hole (at gfn 0xFFFFE),
then mapping that gfn, setting up the table, then unmapping the gfn and
removing it from the p2m table.

This breaks with PVH guests with 4G or more of ram, because there is
no MMIO hole; so it ends up clobbering a valid RAM p2m entry, then
leaving a "hole" when it removes the grant map from the p2m table.
Since the guest thinks this is normal ram, when it maps it and tries
to access the page, it crashes.

This patch maps the page at max_gfn+1 instead.
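
The arithmetic behind the problem can be sketched as follows.  The helper
names are hypothetical and the memory model (RAM contiguous from gfn 0, no
MMIO hole) is an assumption chosen to mirror the PVH case described above; it
is not libxc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* gfn used for the temporary mapping before this change. */
#define OLD_SCRATCH_GFN UINT64_C(0xFFFFE)

/* True if gfn is backed by RAM in a guest whose nr_pages pages of RAM
 * are laid out contiguously from gfn 0 (i.e. no MMIO hole). */
static bool gfn_is_ram(uint64_t gfn, uint64_t nr_pages)
{
    return gfn < nr_pages;
}

/* Scratch location that can never collide with guest RAM. */
static uint64_t scratch_gfn(uint64_t max_gfn)
{
    assert(max_gfn < UINT64_MAX);
    return max_gfn + 1;
}
```

With 4 KiB pages a 4G guest has 0x100000 pages, so gfn 0xFFFFE is ordinary
RAM in this layout, whereas max_gfn+1 is by construction outside it.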

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agoVMX: allow vmx_update_debug_state to be called when v!=current
George Dunlap [Wed, 13 Nov 2013 08:25:36 +0000 (09:25 +0100)]
VMX: allow vmx_update_debug_state to be called when v!=current

Removing the assert allows the PVH code to call this during vmcs
construction in a later patch, making the code more robust by removing
duplicate code.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>
11 years agolibxl: Avoid realloc(,0) when libxl__xs_directory returns empty list
Ian Jackson [Thu, 18 Apr 2013 15:27:46 +0000 (16:27 +0100)]
libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list

If the named path is a leaf node, libxl__xs_directory can succeed,
returning non-null, but set *nb to 0.

In three places in libxl this may result in a zero size argument being
passed to malloc() or realloc(), which is not advisable.
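
A minimal sketch of the kind of guard this implies, assuming a hypothetical
helper grow_list() (not a libxl function): skip the allocator call entirely
when the count is zero, since the result of a zero-size realloc() is
implementation-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Resize an array of n string pointers.  When n == 0 the existing
 * pointer is returned untouched, so realloc(list, 0) is never issued.
 * On allocation failure NULL is returned and the old list is still
 * owned by the caller. */
static char **grow_list(char **list, size_t n)
{
    if (n == 0)
        return list;
    assert(n <= SIZE_MAX / sizeof(*list));   /* no size overflow */
    return realloc(list, n * sizeof(*list));
}
```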

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: Deprecate synchronous waiting for the device model
Ian Jackson [Mon, 14 Oct 2013 16:26:01 +0000 (17:26 +0100)]
libxl: Deprecate synchronous waiting for the device model

libxl__wait_for_device_model blocks, with the ctx lock held, waiting
for a response from the device model.  If the dm doesn't respond
quickly (for example, because it has crashed), this may block the
whole process.  Explain this in a comment, rename the function to
libxl__wait_for_device_model_deprecated, and explain what to use
instead.

libxl__wait_for_offspring is the core implementation for the above.
Its name leads people to think it might be generally useful for
waiting for children, which is far from the case.  It only waits for
xenstore.  Also it has the problems described above.  Explain this,
rename it to libxl__xenstore_child_wait_deprecated, and explain what
to use instead.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: Do not generate short block in libxl__datacopier_prefixdata
Ian Jackson [Tue, 3 Sep 2013 12:41:46 +0000 (13:41 +0100)]
libxl: Do not generate short block in libxl__datacopier_prefixdata

libxl__datacopier_prefixdata would prepend a deliberately short block
(not just a half-full one, but one with a short buffer) to the
dc->bufs queue.  However, this is wrong because datacopier_readable
will find it and try to continue to fill it up.

Instead, allocate a full-sized buffer.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Chunyan Liu <cyliu@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: Introduce nested async operations (nested ao)
Ian Jackson [Mon, 4 Nov 2013 17:56:15 +0000 (17:56 +0000)]
libxl: Introduce nested async operations (nested ao)

This allows a long-running ao to avoid accumulating memory.  Each
nested ao has its own gc.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agocommon/vsprintf: fix signed->unsigned error, causing glacial performance
Andrew Cooper [Tue, 12 Nov 2013 16:20:34 +0000 (17:20 +0100)]
common/vsprintf: fix signed->unsigned error, causing glacial performance

The original patch for

  c/s 67a3542c5bc356e6452d8305991617c875f87de4
  "common/vsprintf: Refactor string() out of vsnprintf()"

specifically used signed integers, identical to the code copied out of vsprintf.

When committed, these had changed to unsigned integers, which causes a
functional change.  This causes glacial boot performance and an excessive
quantity of spaces printed to the serial console, as we loop to the upper
bound of a 32bit integer.
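
The effect can be reproduced in isolation.  The sketch below is not the Xen
code; it assumes a padding loop of the form "while (len < field_width--)" and
a field width of -1 meaning "no width given", and adds a cap so the unsigned
variant terminates quickly.

```c
#include <assert.h>

/* Number of pad characters emitted when the counters are signed. */
static unsigned long pad_count_signed(int len, int field_width,
                                      unsigned long cap)
{
    unsigned long n = 0;

    assert(cap > 0);
    while (len < field_width-- && n < cap)
        n++;
    return n;
}

/* Same loop with unsigned counters: a width of -1 becomes UINT_MAX,
 * so the loop would run roughly 2^32 times without the cap. */
static unsigned long pad_count_unsigned(unsigned int len,
                                        unsigned int field_width,
                                        unsigned long cap)
{
    unsigned long n = 0;

    assert(cap > 0);
    while (len < field_width-- && n < cap)
        n++;
    return n;
}
```

For an ordinary width both variants agree; only the "no width" sentinel
exposes the difference.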

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agoQEMU_TAG update
Ian Jackson [Tue, 12 Nov 2013 15:41:37 +0000 (15:41 +0000)]
QEMU_TAG update

11 years agolibxl: save/restore errno in SIGCHLD handler
Ian Jackson [Mon, 11 Nov 2013 17:17:55 +0000 (17:17 +0000)]
libxl: save/restore errno in SIGCHLD handler

Without this, code interrupted by SIGCHLD may experience strange
values of errno.  (As far as I know this is not the cause of any
reported bugs.)

This fix should be backported in due course.
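
The general pattern looks like the following sketch.  It is not the libxl
handler: the self-pipe style wake_fd, the flag and the install helper are
assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t sigchld_seen;
static int wake_fd = -1;            /* hypothetical wakeup pipe */

static void sigchld_handler(int signo)
{
    int saved_errno = errno;        /* whatever the interrupted code had */
    char b = 0;
    ssize_t r;

    (void)signo;
    sigchld_seen = 1;
    r = write(wake_fd, &b, 1);      /* may fail and overwrite errno */
    (void)r;

    errno = saved_errno;            /* hide that from the interrupted code */
}

static int install_sigchld_handler(int fd)
{
    struct sigaction sa;

    assert(fd >= 0);
    wake_fd = fd;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Without the save and restore, a failing write() inside the handler would
leave its error code in errno for code that was, say, in the middle of
checking the result of its own system call.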

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agox86: eliminate has_arch_mmios()
Jan Beulich [Tue, 12 Nov 2013 15:28:47 +0000 (16:28 +0100)]
x86: eliminate has_arch_mmios()

... as being generally insufficient: Either has_arch_pdevs() or
cache_flush_permitted() should be used (in particular, it is
insufficient to consider MMIO ranges alone - I/O port ranges have the
same requirements if available to a guest).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoevtchn/fifo: don't spin indefinitely when setting LINK
David Vrabel [Tue, 12 Nov 2013 12:19:25 +0000 (13:19 +0100)]
evtchn/fifo: don't spin indefinitely when setting LINK

A malicious or buggy guest can cause another domain to spin
indefinitely by repeatedly writing to an event word when the other
guest is trying to link a new event.  The cmpxchg() in
evtchn_fifo_set_link() will repeatedly fail and the loop may never
terminate.

Fixing this requires a change to the ABI which is documented in draft
H of the design.

  http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf

Since a well-behaved guest only makes a limited set of state changes,
the loop can terminate early if the guest makes an invalid state
transition.

The guest may:

- clear LINKED and LINK.
- clear PENDING
- set MASKED
- clear MASKED

It is valid for the guest to mask and unmask an event at any time so
specify that it is not valid for a guest to clear MASKED if Xen is
trying to update LINK.  Indicate this to the guest with an additional
BUSY bit in the event word.  The guest must not clear MASKED if BUSY
is set and it should spin until BUSY is cleared.

The remaining valid writes (clear LINKED, clear PENDING, set MASKED,
clear MASKED by Xen) will limit the number of failures of the
cmpxchg() to at most 4.  A clear of LINKED will also terminate the
loop early. Therefore, the loop can then be limited to at most 4
iterations.

If the buggy or malicious guest does cause the loop to exit with
LINKED set and LINK unset then that buggy guest will lose events.

Reported-by: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agoVMX: don't crash processing 'd' debug key
Jan Beulich [Tue, 12 Nov 2013 10:52:19 +0000 (11:52 +0100)]
VMX: don't crash processing 'd' debug key

There's a window during scheduling where "current" and the active VMCS
may disagree: The former gets set much earlier than the latter. Since
both vmx_vmcs_enter() and vmx_vmcs_exit() immediately return when the
subject vCPU is "current", accessing VMCS fields would, depending on
whether there is any currently active VMCS, either read wrong data, or
cause a crash.

Going forward we might want to consider reducing the window during
which vmx_vmcs_enter() might fail (e.g. doing a plain __vmptrld() when
v->arch.hvm_vmx.vmcs != this_cpu(current_vmcs) but arch_vmx->active_cpu
== -1), but that would add complexities (acquiring and - more
importantly - properly dropping v->arch.hvm_vmx.vmcs_lock) that don't
look worthwhile adding right now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agonested SVM: adjust guest handling of structure mappings
Jan Beulich [Tue, 12 Nov 2013 10:51:15 +0000 (11:51 +0100)]
nested SVM: adjust guest handling of structure mappings

For one, nestedsvm_vmcb_map() error checking must not consist of using
assertions: Global (permanent) mappings can fail, and hence failure
needs to be dealt with properly. And non-global (transient) mappings
can't fail anyway.

And then the I/O port access bitmap handling was broken: It checked
only the first of the accessed ports rather than each of them.
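
The difference between the two checks can be shown with a generic
permission bitmap.  The sketch below is not the nested SVM code; the helper
names, the one-bit-per-port layout and the decision to treat an access that
runs past port 0xFFFF as intercepted are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PORTS 0x10000u

static bool port_bit(const uint8_t *bitmap, uint32_t port)
{
    return (bitmap[port / 8] >> (port % 8)) & 1;
}

/* Broken variant: a multi-byte access is judged by its first port only. */
static bool intercepted_first_only(const uint8_t *bitmap, uint16_t port,
                                   unsigned int bytes)
{
    (void)bytes;
    return port_bit(bitmap, port);
}

/* Fixed variant: every port covered by the access is checked. */
static bool intercepted_any(const uint8_t *bitmap, uint16_t port,
                            unsigned int bytes)
{
    unsigned int i;

    assert(bytes == 1 || bytes == 2 || bytes == 4);
    for (i = 0; i < bytes; i++) {
        uint32_t p = (uint32_t)port + i;

        if (p >= NR_PORTS || port_bit(bitmap, p))
            return true;
    }
    return false;
}
```

A two-byte access at 0x3f8 with only 0x3f9 marked slips past the first
variant but is caught by the second.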

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Egger <chegger@amazon.de>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
11 years agoMAINTAINERS: Add KEXEC maintainer
David Vrabel [Tue, 12 Nov 2013 10:47:36 +0000 (11:47 +0100)]
MAINTAINERS: Add KEXEC maintainer

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: check kexec relocation code fits in a page
David Vrabel [Tue, 12 Nov 2013 10:47:26 +0000 (11:47 +0100)]
x86: check kexec relocation code fits in a page

The kexec relocation (control) code must fit in a single page so add a
link time check for this.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agolibxc: add API for kexec hypercall
David Vrabel [Tue, 12 Nov 2013 10:47:07 +0000 (11:47 +0100)]
libxc: add API for kexec hypercall

Add xc_kexec_exec(), xc_kexec_get_ranges(), xc_kexec_load(), and
xc_kexec_unload().  The load and unload calls require the v2 load and
unload ops.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agolibxc: add hypercall buffer arrays
David Vrabel [Tue, 12 Nov 2013 10:46:39 +0000 (11:46 +0100)]
libxc: add hypercall buffer arrays

Hypercall buffer arrays are used when a hypercall takes a variable
length array of buffers.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agokexec crash image when dom0 crashes
David Vrabel [Tue, 12 Nov 2013 10:46:06 +0000 (11:46 +0100)]
kexec crash image when dom0 crashes

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agokexec: extend hypercall with improved load/unload ops
David Vrabel [Tue, 12 Nov 2013 10:44:41 +0000 (11:44 +0100)]
kexec: extend hypercall with improved load/unload ops

In the existing kexec hypercall, the load and unload ops depend on
internals of the Linux kernel (the page list and code page provided by
the kernel).  The code page is used to transition between Xen context
and the image so using kernel code doesn't make sense and will not
work for PVH guests.

Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
that no longer require a code page to be provided by the guest -- Xen
now provides the code for calling the image directly.

The new load op looks similar to the Linux kexec_load system call and
allows the guest to provide the image data to be loaded.  The guest
specifies the architecture of the image which may be a 32-bit subarch
of the hypervisor's architecture (i.e., an EM_386 image on an
EM_X86_64 hypervisor).

The toolstack can now load images without kernel involvement.  This is
required for supporting kexec when using a dom0 with an upstream
kernel.

Crash images are copied directly into the crash region on load.
Default images are copied into domheap pages and a list of source and
destination machine addresses is created.  This list is used in
kexec_reloc() to relocate the image to its destination.

The old load and unload sub-ops are still available (as
KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
of the new infrastructure.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agokexec: add infrastructure for handling kexec images
David Vrabel [Tue, 12 Nov 2013 10:41:02 +0000 (11:41 +0100)]
kexec: add infrastructure for handling kexec images

Add the code needed to handle and load kexec images into Xen memory or
into the crash region.  This is needed for the new KEXEC_CMD_load and
KEXEC_CMD_unload hypercall sub-ops.

Much of this code is derived from the Linux kernel.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agokexec: add public interface for improved load/unload sub-ops
David Vrabel [Tue, 12 Nov 2013 10:39:29 +0000 (11:39 +0100)]
kexec: add public interface for improved load/unload sub-ops

Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
kexec hypercall.  These new sub-ops allow a privileged guest to
provide the image data to be loaded into Xen memory or the crash
region instead of guests loading the image data themselves and
providing the relocation code and metadata.

The old interface is provided to guests requesting an interface
version prior to 4.4.

Bump __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: give FIX_EFI_MPF its own fixmap entry
David Vrabel [Tue, 12 Nov 2013 10:37:19 +0000 (11:37 +0100)]
x86: give FIX_EFI_MPF its own fixmap entry

FIX_EFI_MPF was the same as FIX_KEXEC_BASE_0 which is going away.  So
add its own entry.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agocommon/symbols: Remove print_symbol() and associated infrastructure
Andrew Cooper [Tue, 12 Nov 2013 10:11:30 +0000 (11:11 +0100)]
common/symbols: Remove print_symbol() and associated infrastructure

Also adjust the one common user of print_symbol() to use the new printk()
format.  While adjusting the format string, increase the width so a
long-to-expire plt_overflow() timer doesn't break the column alignment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoarm: Replace print_symbol() with new %ps/%pS format
Andrew Cooper [Tue, 12 Nov 2013 10:11:05 +0000 (11:11 +0100)]
arm: Replace print_symbol() with new %ps/%pS format

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: Replace print_symbol() with new %ps/%pS format
Andrew Cooper [Tue, 12 Nov 2013 10:10:35 +0000 (11:10 +0100)]
x86: Replace print_symbol() with new %ps/%pS format

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agocommon/vsprintf: Add %ps and %pS format specifier support
Andrew Cooper [Tue, 12 Nov 2013 10:09:12 +0000 (11:09 +0100)]
common/vsprintf: Add %ps and %pS format specifier support

Introduce the %ps and %pS format options for printing a symbol.

  %ps will print the symbol name and optional offset and size
  %pS will print the symbol name and unconditional offset and size
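
To make the difference concrete, here is a standalone mimic of the two
behaviours using snprintf().  It does not call into Xen; the function name,
the "name+0xoffset/0xsize" layout and the reading of "optional" as "only when
the offset is non-zero" are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* always == false models %ps, always == true models %pS. */
static int format_symbol(char *buf, size_t len, const char *name,
                         unsigned long offset, unsigned long size,
                         bool always)
{
    assert(buf != NULL && name != NULL);
    if (always || offset != 0)
        return snprintf(buf, len, "%s+0x%lx/0x%lx", name, offset, size);
    return snprintf(buf, len, "%s", name);
}
```

Under these assumptions an address at the very start of a symbol prints as
the bare name in the %ps style and with an explicit zero offset in the %pS
style.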

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agocommon/vsprintf: Refactor pointer() out of vsnprintf()
Andrew Cooper [Tue, 12 Nov 2013 10:06:45 +0000 (11:06 +0100)]
common/vsprintf: Refactor pointer() out of vsnprintf()

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agocommon/vsprintf: Refactor string() out of vsnprintf()
Andrew Cooper [Tue, 12 Nov 2013 10:06:09 +0000 (11:06 +0100)]
common/vsprintf: Refactor string() out of vsnprintf()

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agonuma-sched: leave node-affinity alone if not in "auto" mode
Dario Faggioli [Tue, 12 Nov 2013 09:54:28 +0000 (10:54 +0100)]
numa-sched: leave node-affinity alone if not in "auto" mode

If the domain's NUMA node-affinity is being specified by the
user/toolstack (instead of being automatically computed by Xen),
we really should stick to that. This means domain_update_node_affinity()
is wrong when it filters out some stuff from there even in "!auto"
mode.

This commit fixes that. Of course, this does not mean node-affinity
is always honoured (e.g., a vcpu won't run on a pcpu of a different
cpupool) but the necessary logic for taking into account all the
possible situations lives in the scheduler code, where it belongs.

What could happen without this change is that, under certain
circumstances, the node-affinity of a domain may change when the
user modifies the vcpu-affinity of the domain's vcpus. This, even
if probably not a real bug, is at least something the user does
not expect, so let's avoid it.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoxen/arm: more info on the ARM ABI
Stefano Stabellini [Thu, 7 Nov 2013 14:52:50 +0000 (14:52 +0000)]
xen/arm: more info on the ARM ABI

Add more information about the exported ARM ABI.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agoxen/midway: Add 1:1 workaround
Julien Grall [Wed, 6 Nov 2013 19:37:15 +0000 (19:37 +0000)]
xen/midway: Add 1:1 workaround

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxl: remove pointless null pointer check
Matthew Daley [Fri, 8 Nov 2013 00:45:11 +0000 (13:45 +1300)]
xl: remove pointless null pointer check

poolinfo is guaranteed non-null here.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxc: remove pointless null pointer check
Matthew Daley [Fri, 8 Nov 2013 00:45:10 +0000 (13:45 +1300)]
libxc: remove pointless null pointer check

ctxt_buf is guaranteed non-null here.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/video: remove pointless if subcondition
Matthew Daley [Fri, 8 Nov 2013 00:45:09 +0000 (13:45 +1300)]
xen/video: remove pointless if subcondition

It's already handled just above.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: remove pointless if subcondition
Matthew Daley [Fri, 8 Nov 2013 00:45:08 +0000 (13:45 +1300)]
xen/arm: remove pointless if subcondition

It's already handled just above.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: correct strtod error check
Matthew Daley [Fri, 8 Nov 2013 00:32:58 +0000 (13:32 +1300)]
libxl: correct strtod error check

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: Device Tree cpu clock-frequency
Jon Fraser [Thu, 7 Nov 2013 23:50:28 +0000 (18:50 -0500)]
xen/arm: Device Tree cpu clock-frequency

When creating CPU device tree properties, copy the
clock-frequency if present.

Quiets annoying messages from the Linux kernel:
"/cpus/cpu@0 missing clock-frequency property"

Signed-off-by: Jon Fraser <jfraser@broadcom.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>