xenbits.xensource.com Git - xen.git/log
Wei Liu [Tue, 17 Jun 2014 09:32:21 +0000 (10:32 +0100)]
xl / libxl: push parsing of SSID and CPU pool ID down to libxl

This patch pushes parsing of "init_seclabel", "seclabel",
"device_model_stubdomain_seclabel" and "pool" down to libxl level.

Originally the parsing was done at the xl level, which is not ideal
because libxl won't have the truly relevant information. With this patch
libxl holds the important information by itself.

The libxl IDL is extended to hold the string of labels and pool name.
If those strings are present, they take precedence over the
numeric representations.

As all relevant structures (libxl_dominfo etc) have a field called
X_name / X_label now, a string is also copied there so that callers
won't have to do ID to name / label translation.

In order to be compatible with users of older versions of libxl, this
patch also defines LIBXL_HAVE_SSID_LABEL and LIBXL_HAVE_CPUPOOL_NAME. If
they are defined, the respective strings are available. And if those
strings are not NULL, libxl will do the parsing and ignore the numeric
values.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
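
The precedence rule described in the commit above (a non-NULL string wins
over the numeric value) can be sketched roughly as follows. This is an
illustration only: struct pool_ref, lookup_pool_id() and resolve_pool_id()
are hypothetical names, not libxl API, and the small table stands in for
whatever name-to-ID translation libxl really performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a config struct carrying both forms. */
struct pool_ref {
    uint32_t poolid;        /* numeric ID, used only if pool_name is NULL */
    const char *pool_name;  /* optional name; takes precedence when set */
};

/* Hypothetical name-to-ID lookup; returns 0 on success, -1 if unknown. */
static int lookup_pool_id(const char *name, uint32_t *id)
{
    static const struct { const char *name; uint32_t id; } pools[] = {
        { "Pool-0", 0 },
        { "Pool-node1", 1 },
    };
    size_t i;

    for (i = 0; i < sizeof(pools) / sizeof(pools[0]); i++) {
        if (strcmp(pools[i].name, name) == 0) {
            *id = pools[i].id;
            return 0;
        }
    }
    return -1;
}

/* If a name is present, parse it and ignore the numeric value. */
static int resolve_pool_id(const struct pool_ref *ref, uint32_t *id)
{
    assert(ref != NULL && id != NULL);
    if (ref->pool_name != NULL)
        return lookup_pool_id(ref->pool_name, id);
    *id = ref->poolid;
    return 0;
}
```

A caller built against an older libxl would leave the name NULL and keep
using the numeric field, which is the compatibility path the commit
describes.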
Julien Grall [Tue, 17 Jun 2014 20:44:28 +0000 (21:44 +0100)]
xen/arm: Panic when we receive an unexpected trap

The current implementation of do_unexpected_trap makes Xen spin forever
on the current physical CPU. This may stall guest VCPUs and print
unhelpful messages (RCU stall...).

Usually when Xen receives an unexpected trap, it means that something has
gone wrong either in the hypervisor or in the CPU. In this case we should
panic directly, which also stops the other CPUs.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 17 Jun 2014 17:26:18 +0000 (18:26 +0100)]
tools/python: Remove some legacy scripts

Nothing in scripts/ is referenced by the current Xen build system.  It is a
legacy version of the XenAPI bindings, other parts of which have already been
removed from the tree.

Additionally, prevent the install target from creating an $(SBINDIR) directory
but putting nothing in it.  This appears to be something missed when removing
Xend.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Jun 2014 20:41:34 +0000 (21:41 +0100)]
xen/arm: Drop cpuinfo_x86 structure definition

I'm not sure why this structure was defined in an ARM-specific include...

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 18 Jun 2014 13:53:27 +0000 (15:53 +0200)]
x86/EFI: allow FPU/XMM use in runtime service functions

UEFI spec update 2.4B developed a requirement to enter runtime service
functions with CR0.TS (and CR0.EM) clear, thus making feasible the
already previously stated permission for these functions to use some of
the XMM registers. Enforce this requirement (along with the connected
ones on FPU control word and MXCSR) by going through a full FPU save
cycle (if the FPU was dirty) in efi_rs_enter() (along with loading the
specified values into the other two registers).

Note that the UEFI spec mandates that extension registers other than
XMM ones (for our purposes all that get restored eagerly) are preserved
across runtime function calls, hence there's nothing we need to restore
in efi_rs_leave() (they do get saved, but just for simplicity's sake).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 18 Jun 2014 13:52:25 +0000 (15:52 +0200)]
x86: prevent PVH Dom0 from having pages with more than one ref

On PV guests a reference is taken when a page gets added to the page
tables, which makes pages added to the page tables have two
references, but this is not suitable for PVH that doesn't use the
PVMMU. In the PVH case only one reference has to be taken or else the
page would not be freed when the memory of the domain is decreased.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 18 Jun 2014 13:51:28 +0000 (15:51 +0200)]
x86/mce: sanitise the #MC entry path

The 'error_code' function parameters are not used at all; drop them from the
call chain.  If it is needed at some point in the future, it is available via
cpu_user_regs.

Having do_machine_check() call the non-inlineable machine_check_vector() just
to get at the static function pointer '_machine_check_vector' is silly.  Move
do_machine_check() from traps.c to mce.c and do away with
machine_check_vector() entirely.

Both {intel,amd}_init_mce() register their own local function as the #MC
handler, each of which calls mcheck_cmn_handler() in an identical way.  Fix
this craziness by actually turning mcheck_cmn_handler() into a valid #MC
handler (as its comments already state), and have {intel,amd}_init_mce()
register it instead of their own private handlers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
Malcolm Crossley [Wed, 18 Jun 2014 13:50:02 +0000 (15:50 +0200)]
IOMMU: prevent VT-d device IOTLB operations on wrong IOMMU

PCIe ATS allows devices to contain IOTLBs. The VT-d code was iterating
around all ATS capable devices and issuing IOTLB operations for all IOMMUs,
even though each ATS device is only accessible via one particular IOMMU.

Issuing an IOMMU operation to a device not accessible via that IOMMU results
in an IOMMU timeout because the device does not reply. VT-d IOMMU timeouts
result in a Xen panic.

Therefore this bug prevents any Intel system with 2 or more ATS enabled IOMMUs,
each with an ATS device connected to them, from booting Xen.

The patch adds an IOMMU pointer to the ATS device struct so the VT-d code can
ensure it does not issue IOMMU ATS operations on the wrong IOMMU. A void
pointer has to be used because AMD and Intel IOMMU implementations do not have
a common IOMMU structure or indexing mechanism.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
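
As a rough illustration of the fix described in the commit above, each ATS
device can carry an opaque pointer to the one IOMMU it sits behind, and the
flush loop can skip devices that belong to a different IOMMU. The struct
layout and the function name below are hypothetical and are not taken from
the Xen source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ATS device record. */
struct ats_dev {
    int id;
    void *iommu;   /* opaque: the one IOMMU this device is reachable through */
    int flushes;   /* counts device IOTLB flushes issued (illustration only) */
};

/* Flush device IOTLBs, but only for devices reachable via 'iommu'. */
static int flush_dev_iotlbs(struct ats_dev *devs, size_t n, void *iommu)
{
    int issued = 0;
    size_t i;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (devs[i].iommu != iommu)
            continue;          /* wrong IOMMU: the device would never reply */
        devs[i].flushes++;
        issued++;
    }
    return issued;
}
```

The void pointer mirrors the commit's remark that AMD and Intel IOMMU code
share no common structure, so only pointer identity is compared.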
Stefano Stabellini [Tue, 10 Jun 2014 14:07:20 +0000 (15:07 +0100)]
xen/arm: gic_events_need_delivery and irq priorities

Introduce GIC_IRQ_GUEST_ACTIVE to track which irqs are currently
active in the guest.

gic_events_need_delivery should only return positive if an outstanding
pending irq has a higher group priority than the currently active group
priority and the priority mask.
Read GICH_APR to find the active group priority.
Read GICH_VMCR to find the priority mask.
Find the highest priority non-active enabled irq by going through the
inflight list.

In gic_restore_pending_irqs replace lower priority pending (and not
active) irqs in GICH_LRs with higher priority irqs if no more GICH_LRs
are available.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
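
The delivery check described in the commit above can be sketched as below.
This is a simplified model, not the Xen code: struct irq_state and
events_need_delivery() are hypothetical, the active priority and the mask
are assumed to have been read from GICH_APR and GICH_VMCR by the caller,
and raw priorities are compared without modelling group priority or the
binary point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an inflight irq. */
struct irq_state {
    uint8_t priority;  /* lower value = higher priority, as in the GIC */
    bool enabled;
    bool active;       /* currently being handled by the guest */
};

/*
 * 'active_prio' and 'mask_prio' are assumed to come from GICH_APR and
 * GICH_VMCR; 0xff is used here to mean "nothing active" / "no masking".
 */
static bool events_need_delivery(const struct irq_state *irqs, size_t n,
                                 uint8_t active_prio, uint8_t mask_prio)
{
    size_t i;

    assert(irqs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!irqs[i].enabled || irqs[i].active)
            continue;   /* only non-active, enabled irqs are candidates */
        if (irqs[i].priority < active_prio && irqs[i].priority < mask_prio)
            return true;
    }
    return false;
}
```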
Stefano Stabellini [Tue, 10 Jun 2014 14:07:19 +0000 (15:07 +0100)]
xen/arm: introduce GIC_PRI_TO_GUEST macro

GICH_LR registers and GICH_VMCR only support 5 bits for guest irq
priorities.
Introduce a macro to reduce the 8-bit priority fields to 5 bits; use it
in gic.c.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
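
A minimal sketch of such a macro, assuming the reduction keeps the five most
significant bits of the 8-bit priority (the commit message does not spell
out the mapping). The guest_priority() wrapper is hypothetical and exists
only to show the result range.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the 5 most significant bits of an 8-bit priority (assumed mapping). */
#define GIC_PRI_TO_GUEST(pri) ((uint8_t)(((pri) & 0xffu) >> 3))

/* Hypothetical helper: convert and check that the result fits in 5 bits. */
static uint8_t guest_priority(uint8_t pri)
{
    uint8_t guest = GIC_PRI_TO_GUEST(pri);

    assert(guest < 32);
    return guest;
}
```

With this mapping, priorities that differ only in their low three bits
become indistinguishable to the guest-facing registers.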
Stefano Stabellini [Tue, 10 Jun 2014 14:07:18 +0000 (15:07 +0100)]
xen/arm: don't protect GICH and lr_queue accesses with gic.lock

GICH is banked; protect accesses by disabling interrupts.
Protect lr_queue accesses with the vgic.lock only.
gic.lock only protects accesses to GICD now.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:17 +0000 (15:07 +0100)]
xen/arm: second irq injection while the first irq is still inflight

Set GICH_LR_PENDING in the corresponding GICH_LR to inject a second irq
while the first one is still active.
If the first irq is already pending (not active), clear
GIC_IRQ_GUEST_QUEUED because the guest doesn't need a second
notification. If the irq has already been EOI'ed then just clear the
GICH_LR right away and move the interrupt to lr_pending so that it is
going to be reinjected by gic_restore_pending_irqs on return to guest.

If the target cpu is not the current cpu, then set GIC_IRQ_GUEST_QUEUED
and send an SGI. The target cpu is going to be interrupted and call
gic_clear_lrs, that is going to take the same actions.

Do not call vgic_vcpu_inject_irq from gic_inject if
evtchn_upcall_pending is set. If we remove that call, we don't need to
special case evtchn_irq in vgic_vcpu_inject_irq anymore.
We need to force the first injection of evtchn_irq (call
gic_vcpu_inject_irq) from vgic_enable_irqs because evtchn_upcall_pending
is already set by common code on vcpu creation.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:16 +0000 (15:07 +0100)]
xen/arm: rename GIC_IRQ_GUEST_PENDING to GIC_IRQ_GUEST_QUEUED

Rename GIC_IRQ_GUEST_PENDING to GIC_IRQ_GUEST_QUEUED and clarify its
meaning in xen/include/asm-arm/domain.h.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:15 +0000 (15:07 +0100)]
xen/arm: s/gic_set_guest_irq/gic_raise_guest_irq

Rename gic_set_guest_irq to gic_raise_guest_irq and remove the state
parameter.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:14 +0000 (15:07 +0100)]
xen/arm: keep track of the GICH_LR used for the irq in struct pending_irq

Move the irq field in pending_irq to improve packing.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:13 +0000 (15:07 +0100)]
xen/arm: nr_lrs should be uint8_t

A later patch is going to use uint8_t to keep track of LRs.
Neither GICv2 nor GICv3 needs more than a uint8_t to keep track
of the number of LRs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:12 +0000 (15:07 +0100)]
xen/arm: support HW interrupts, do not request maintenance_interrupts

If the irq to be injected is a hardware irq (p->desc != NULL), set
GICH_LR_HW. Do not set GICH_LR_MAINTENANCE_IRQ.

Remove the code to EOI a physical interrupt on behalf of the guest
because it has become unnecessary.

Introduce a new function, gic_clear_lrs, that goes over the GICH_LR
registers, clears the invalid ones and frees the corresponding interrupts
from the inflight queue if appropriate. Add the interrupt to lr_pending
if the GIC_IRQ_GUEST_PENDING is still set.

Call gic_clear_lrs on entry to the hypervisor if we are coming from
guest mode to make sure that the calculation in Xen of the highest
priority interrupt currently inflight is correct and accurate and not
based on stale data.

In vgic_vcpu_inject_irq, if the target is a vcpu running on another
pcpu, we are already sending an SGI to the other pcpu so that it would
pick up the new IRQ to inject.  Now also send an SGI to the other pcpu
even if the IRQ is already inflight, so that it can clear the LR
corresponding to the previous injection as well as injecting the new
interrupt.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:11 +0000 (15:07 +0100)]
xen/arm: set GICH_HCR_UIE if all the LRs are in use

On return to guest, if there are no free LRs and we still have more
interrupts to inject, set GICH_HCR_UIE so that we are going to receive a
maintenance interrupt when no pending interrupts are present in the LR
registers.
The maintenance interrupt handler won't do anything anymore, but
receiving the interrupt is going to cause gic_inject to be called on
return to guest, which is going to clear the old LRs and inject new
interrupts.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:10 +0000 (15:07 +0100)]
xen/arm: remove unused virtual parameter from vgic_vcpu_inject_irq

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Stefano Stabellini [Tue, 10 Jun 2014 14:07:09 +0000 (15:07 +0100)]
xen/arm: no need to set HCR_VI when using the vgic to inject irqs

HCR_VI forces the guest to resume execution in IRQ mode and can actually
cause spurious interrupt injections.
The GIC is capable of injecting interrupts into the guest and causing it
to switch to IRQ mode automatically, without any need for the hypervisor
to set HCR_VI manually.

See ARM ARM B1.8.11 and chapter 5.4 of the Generic Interrupt Controller
Architecture Specification.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 17 Jun 2014 13:21:10 +0000 (15:21 +0200)]
page-alloc: scrub pages used by hypervisor upon freeing

... unless they're part of a fully separate pool (and hence can't ever
be used for guest allocations).

This is CVE-2014-4021 / XSA-100.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 17 Jun 2014 09:40:39 +0000 (10:40 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

Olaf Hering [Tue, 17 Jun 2014 08:44:40 +0000 (10:44 +0200)]
libxl: properly set default of discard_enable

Initialize discard_enable properly. This avoids a crash if a
libxl_device_disk with an uninitialized discard_enable is passed to
device_disk_add. Up to now only xl initialized discard_enable in its
config parser. External users of libxl, such as libvirt, do not need to
provide a default value.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Mon, 16 Jun 2014 10:13:25 +0000 (12:13 +0200)]
sched: DOMCTL_*vcpuaffinity works with hard and soft affinity

by adding a flag for the caller to specify which one he cares about.

At the same time, enable the caller to get back the "effective affinity"
of the vCPU. That is the intersection between cpupool's cpus, the (new)
hard affinity and, for soft affinity, the (new) soft affinity. In fact,
despite what has been successfully set with the DOMCTL_setvcpuaffinity
hypercall, the Xen scheduler will never run a vCPU outside of its hard
affinity or of its domain's cpupool.

This happens by adding another cpumap to the interface and making both
the cpumaps IN/OUT parameters (for DOMCTL_setvcpuaffinity, they're of
course out-only for DOMCTL_getvcpuaffinity).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
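
The "effective affinity" described in the commit above amounts to a pair of
mask intersections. The sketch below models cpumaps as 64-bit bitmasks; the
cpumask typedef, struct affinity and effective_affinity() are hypothetical
and do not reflect the real DOMCTL interface.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask;  /* hypothetical: bit n set = pCPU n */

struct affinity {
    cpumask hard;
    cpumask soft;
};

/* Report what the scheduler will actually honour for the requested masks. */
static struct affinity effective_affinity(cpumask pool_cpus, cpumask hard,
                                          cpumask soft)
{
    struct affinity eff;

    assert(pool_cpus != 0);  /* a cpupool is assumed to hold at least one pCPU */
    eff.hard = pool_cpus & hard;         /* never outside the cpupool */
    eff.soft = pool_cpus & hard & soft;  /* soft is further bounded by hard */
    return eff;
}
```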
Dario Faggioli [Mon, 16 Jun 2014 10:13:03 +0000 (12:13 +0200)]
derive NUMA node affinity from hard and soft CPU affinity

If a domain's NUMA node-affinity (which is what controls
memory allocations) is provided by the user/toolstack, it
is simply left untouched. However, if the user does not say
anything, leaving it all to Xen, let's compute it in the
following way:

 1. cpupool's cpus & hard-affinity & soft-affinity
 2. if (1) is empty: cpupool's cpus & hard-affinity

This guarantees memory to be allocated from the narrowest
possible set of NUMA nodes, and makes it relatively easy to
set up NUMA-aware scheduling on top of soft affinity.

Note that such 'narrowest set' is guaranteed to be non-empty.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
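
The two-step rule in the commit above can be sketched as follows. The
bitmask types, the fixed CPUS_PER_NODE topology and both function names are
assumptions made for the illustration; real topology information would
come from the hypervisor.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask;   /* hypothetical: bit n set = pCPU n */
typedef uint32_t nodemask;  /* hypothetical: bit n set = NUMA node n */

#define CPUS_PER_NODE 4     /* assumed uniform topology, for illustration */

/* Map a set of pCPUs to the set of NUMA nodes they live on. */
static nodemask cpus_to_nodes(cpumask cpus)
{
    nodemask nodes = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < 64; cpu++)
        if (cpus & ((cpumask)1 << cpu))
            nodes |= (nodemask)1 << (cpu / CPUS_PER_NODE);
    return nodes;
}

/* Derive node affinity when the user/toolstack did not provide one. */
static nodemask auto_node_affinity(cpumask pool_cpus, cpumask hard,
                                   cpumask soft)
{
    cpumask cpus = pool_cpus & hard & soft;  /* step 1 */

    if (cpus == 0)
        cpus = pool_cpus & hard;             /* step 2 */
    assert(cpus != 0);  /* the commit states this set is never empty */
    return cpus_to_nodes(cpus);
}
```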
Dario Faggioli [Mon, 16 Jun 2014 10:12:28 +0000 (12:12 +0200)]
sched: introduce soft-affinity and use it instead d->node-affinity

Before this change, each vcpu had its own vcpu-affinity
(in v->cpu_affinity), representing the set of pcpus where
the vcpu is allowed to run. Since NUMA-aware scheduling
was introduced, the (credit1 only, for now) scheduler also
tries as much as it can to run all the vcpus of a domain
on one of the nodes that constitute the domain's
node-affinity.

The idea here is making the mechanism more general by:
  * allowing for this 'preference' for some pcpus/nodes to be
    expressed on a per-vcpu basis, rather than for the domain
    as a whole. That is to say, each vcpu should have its own
    set of preferred pcpus/nodes, rather than it being the
    very same for all the vcpus of the domain;
  * generalizing the idea of 'preferred pcpus' to not only NUMA
    awareness and support. That is to say, regardless of whether
    it is (mostly) useful on NUMA systems, it should
    be possible to specify, for each vcpu, a set of pcpus where
    it prefers to run (in addition, and possibly unrelated to,
    the set of pcpus where it is allowed to run).

We will be calling this set of *preferred* pcpus the vcpu's
soft affinity, and this change introduces it and starts using it
for scheduling, replacing the indirect use of the domain's NUMA
node-affinity. This is more general, as soft affinity does not
have to be related to NUMA. Nevertheless, it makes it possible to
achieve the same results as NUMA-aware scheduling, just by making soft
affinity equal to the domain's node affinity, for all the vCPUs (e.g.,
from the toolstack).

This also means renaming most of the NUMA-aware scheduling related
functions, in credit1, to something more generic, hinting toward
the concept of soft affinity rather than directly to NUMA awareness.

As a side effect, this simplifies the code quite a bit. In fact,
prior to this change, we needed to cache the translation of
d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
is what scheduling decisions require (we used to keep it in
node_affinity_cpumask). This, and all the complicated logic
required to keep it updated, is not necessary any longer.

The high level description of NUMA placement and scheduling in
docs/misc/xl-numa-placement.markdown is being updated too, to match
the new architecture.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Mon, 16 Jun 2014 10:11:52 +0000 (12:11 +0200)]
sched: rename v->cpu_affinity into v->cpu_hard_affinity

in order to distinguish it from the cpu_soft_affinity which will
be introduced in a later commit ("xen: sched: introduce soft-affinity
and use it instead d->node-affinity").

This patch does not imply any functional change, it is basically
the result of something like the following:

 s/cpu_affinity/cpu_hard_affinity/g
 s/cpu_affinity_tmp/cpu_hard_affinity_tmp/g
 s/cpu_affinity_saved/cpu_hard_affinity_saved/g

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Malcolm Crossley [Mon, 16 Jun 2014 10:02:00 +0000 (12:02 +0200)]
spread boot time page scrubbing across all available CPU's

The page scrubbing is done in 128MB chunks in lockstep across all the
non-SMT CPUs. This allows the boot CPU to hold the heap_lock whilst each
chunk is being scrubbed and then release the heap_lock when the CPUs have
finished scrubbing their individual chunk. This means the heap_lock is
not held continuously and pending softirqs can be serviced
periodically across the CPUs.

The page scrub memory chunks are allocated to the CPUs in a NUMA aware
fashion to reduce socket interconnect overhead and improve performance.
Specifically in the first phase we scrub at the same time on all the
NUMA nodes that have CPUs - we also weed out the SMT threads so that
we only use cores (that gives a 50% boost). The second phase is for NUMA
nodes that have no CPUs - for that we use the closest NUMA node's CPUs
(non-SMT again) to do the job.

This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
6386 machine from 49 seconds to 3 seconds.
On an IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
to 63 seconds.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Konrad Rzeszutek Wilk [Mon, 16 Jun 2014 09:59:32 +0000 (11:59 +0200)]
x86/mce: don't spam the console with "CPUx: Temperature z"

If the machine has been quite busy it ends up with these messages
printed on the hypervisor console:

(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature above threshold
(XEN) CPU0: Running in modulated clock mode
(XEN) CPU1: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal

While the state changes are important, the non-altered state
information is not needed. As such add a latch mechanism to only print
the information if it has changed since the last update (and the
hardware doesn't properly suppress redundant notifications).

This was observed on Intel DQ67SW,
BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
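
A latch of the kind described in the commit above can be sketched as below:
remember the last state reported per CPU and log only when it changes. The
names last_state and report_thermal(), the fixed NR_CPUS and the use of
printf() are assumptions for the illustration, not the Xen implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4  /* assumed small fixed CPU count for the sketch */

/* Last reported state per CPU: -1 = nothing reported yet, 0 = normal, 1 = hot. */
static int last_state[NR_CPUS] = { -1, -1, -1, -1 };

/* Returns true (and logs) only when the state differs from the last report. */
static bool report_thermal(unsigned int cpu, bool hot)
{
    assert(cpu < NR_CPUS);
    if (last_state[cpu] == (int)hot)
        return false;               /* unchanged: stay quiet */
    last_state[cpu] = hot;
    printf("CPU%u: %s\n", cpu,
           hot ? "Temperature above threshold" : "Temperature/speed normal");
    return true;
}
```

In this sketch the very first notification for a CPU is always logged,
because no earlier state exists to compare against.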
Ross Lagerwall [Mon, 16 Jun 2014 09:59:05 +0000 (11:59 +0200)]
cpuidle: improve perf for certain workloads

The existing mechanism of using interrupt frequency as a heuristic does
not work well for certain workloads.  As an example, synchronous dd on a
small block size uses deep C-states because much of the time is spent
doing processing so the interrupt frequency is not too high, but when an
IOP is submitted, the interrupt occurs soon after going idle.  This
causes exit latency to be a significant factor.

To fix this, add a new factor which limits the exit latency to be no
more than 10% of the decaying measured idle time.  This improves
performance for workloads with a medium interrupt frequency but a short
idle duration.

In the workload given previously, throughput improves by 20% with this
patch.

This is not ported from the Linux menu governor since that uses load
average and number of IO wait processes to satisfy latency constraints.
If a process is in IO wait state, it compares the exit latency with the
predicted residency reduced by a factor of 10, which is somewhat similar
to what this patch does.

A side effect of this patch is to correctly limit the maximum idle time
used in the correction factor calculation. Previously data->measured_us
was used, and it was never set.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
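
The 10% rule from the commit above can be sketched as follows. Everything
here is an assumption made for illustration: struct idle_state, both
function names, and in particular the 1/8 decay weight, which the commit
message does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle state table entry; deeper states have larger latency. */
struct idle_state {
    uint32_t exit_latency_us;
};

/* Exponentially decaying average of measured idle time (assumed 1/8 weight). */
static uint32_t decay_idle_avg(uint32_t avg_us, uint32_t measured_us)
{
    return avg_us - avg_us / 8 + measured_us / 8;
}

/* Pick the deepest state whose exit latency is at most 10% of the average. */
static size_t pick_state(const struct idle_state *states, size_t n,
                         uint32_t avg_idle_us)
{
    size_t i, best = 0;  /* state 0 is always allowed */

    assert(states != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (states[i].exit_latency_us <= avg_idle_us / 10)
            best = i;
    return best;
}
```

A workload with short idle periods therefore stays in shallow states even
when its interrupt frequency alone would have permitted a deep one.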
Jan Beulich [Mon, 16 Jun 2014 09:52:34 +0000 (11:52 +0200)]
x86/EFI: improve boot time diagnostics (try 2)

To aid analysis of any errors, print EFI status codes with error
messages where available. Also remove a case where the status gets
stored into a local variable without being examined (which
misguided me to add an error check there in try 1 of this patch).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 16 Jun 2014 09:50:44 +0000 (11:50 +0200)]
pt-irq fixes and improvements

Tools side:
- don't silently ignore unrecognized PT_IRQ_TYPE_* values
- respect that the interface type contains a union (making the code at
  once no longer depend on the hypervisor ignoring the bus field of the
  PCI portion of the interface structure)

Hypervisor side:
- don't ignore the PCI bus number passed in
- don't store values (gsi, link) calculated from other stored values
- avoid calling xfree() with a spin lock held where easily possible
- have pt_irq_destroy_bind() respect the passed in type
- scope reduction and constification of various variables
- use switch instead of if/else-if chains
- formatting

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Thu, 24 Apr 2014 22:45:55 +0000 (23:45 +0100)]
xen/arm: Implement a dummy debug monitor for ARM32

XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitors registers") disabled access to the Debug Registers.

When CONFIG_PERF_EVENTS is enabled in the Linux Kernel, it will try to
initialize the debug monitors. If an error occurs, Linux won't use this
feature.

This implementation makes Xen expose a minimal set of registers, which leads
the guest to conclude that HW debug won't work.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/DBGCR/DBGBCR/ to use correct register name ]
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Thu, 24 Apr 2014 22:45:54 +0000 (23:45 +0100)]
xen/arm: Implement a dummy Performance Monitor for ARM32

XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitor registers") disabled the Performance Monitor.

When CONFIG_PERF_EVENTS is enabled in the Linux Kernel, the kernel will try
to access PMCR regardless of ID_DFR0 (which tells whether the Performance
Monitors Extension is implemented).

Therefore we tell the guest we have 0 counters. Unfortunately we must always
support PMCCNTR (the cycle counter): we just RAZ/WI for all PM registers,
which doesn't crash the kernel at least.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Thomas Leonard [Wed, 11 Jun 2014 10:30:17 +0000 (11:30 +0100)]
mini-os: don't include queue.h if there's no libc

Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Karim Raslan [Wed, 11 Jun 2014 10:30:15 +0000 (11:30 +0100)]
mini-os: moved events code under arch

This is all code motion, except that we now initialise
the ev_actions array before calling the arch-specific code
to make it more robust against future changes.

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
[talex5@gmail.com: separated from big ARM commit]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Karim Raslan [Wed, 11 Jun 2014 10:30:14 +0000 (11:30 +0100)]
mini-os: tidied up code

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
[talex5@gmail.com: separated from big ARM commit]
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[talex5@gmail.com: use __func__ in DEBUG macro]
[talex5@gmail.com: drop text about "xm create"]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
David Vrabel [Tue, 10 Jun 2014 18:07:30 +0000 (19:07 +0100)]
libxl: const-ify libxl_uuid_*() API

Add const to parameters of libxl_uuid_*() calls where it does not
change the API.

Add libxl_uuid_byte_array_const() to return a const array.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 10 Jun 2014 14:41:07 +0000 (15:41 +0100)]
tools/libxc: Add Valgrind client requests

Valgrind client requests can be used by code to provide extra debugging
information about memory ranges, or to request checks at specific points.

Reference:
  http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs

Client requests are safe to compile into code for running outside of
valgrind.  Therefore, enable client requests whenever autoconf can find
memcheck.h and debug builds are enabled.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- reran autogen.sh ]

Boris Ostrovsky [Wed, 11 Jun 2014 08:55:43 +0000 (10:55 +0200)]
x86/VPMU: mark context LOADED before registers are loaded

Because a PMU interrupt may be generated as soon as PMU registers are
loaded (or, more precisely, as soon as HW PMU is "armed") we don't want
to delay marking context as LOADED until after registers are loaded.
Otherwise during interrupt handling VPMU_CONTEXT_LOADED may not be set
and this could be confusing.

(Technically, only SVM needs this change right now since VMX will "arm"
PMU later, during VMRUN when global control register is loaded from
VMCS. However, both AMD and Intel code will require this patch when we
introduce PV VPMU.)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agolibxl: move some internal functions to libxl_internal.h
Wei Liu [Tue, 10 Jun 2014 21:21:40 +0000 (22:21 +0100)]
libxl: move some internal functions to libxl_internal.h

In 752f181f ("libxl_json: introduce parser functions for builtin types")
a bunch of parser functions are added to libxl_json.h, which breaks
GCC < 4.6.

These functions are internal and libxl_json.h is public header, so move
them to libxl_internal.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoRevert "x86/EFI: improve boot time diagnostics"
Jan Beulich [Tue, 10 Jun 2014 15:56:11 +0000 (17:56 +0200)]
Revert "x86/EFI: improve boot time diagnostics"

This reverts commit 9921387f0c14a3f0ed42f9112efb7260af13db35.
It added an error check where none should be.

10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Tue, 10 Jun 2014 15:04:12 +0000 (16:04 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agotools/libxc: Introduce ARRAY_SIZE() and replace handrolled examples
Andrew Cooper [Tue, 10 Jun 2014 14:07:59 +0000 (15:07 +0100)]
tools/libxc: Introduce ARRAY_SIZE() and replace handrolled examples

xen-hptool and xen-mfndump include xc_private.h.  This is bad, but not trivial
to fix, so they gain a protective #undef and a stern comment.

MiniOS leaks ARRAY_SIZE into the libxc namespace as part of a stubdom build.
Therefore, xc_private.h gains an #ifndef until MiniOS is fixed.
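
A minimal sketch of the guarded definition described above. The table and
helper names are hypothetical and are not taken from libxc; only the
ARRAY_SIZE name and the #ifndef guard come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Guarded so that a definition leaked by another header (as MiniOS does
 * in a stubdom build) is not redefined. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical table, used only for illustration. */
static const unsigned int example_sizes[] = { 4, 8, 16, 32 };

/* Sum every element without hand-rolling the element count. */
static unsigned int sum_example_sizes(void)
{
    unsigned int total = 0;
    size_t i;

    /* The count is derived from the array itself. */
    assert(ARRAY_SIZE(example_sizes) > 0);

    for (i = 0; i < ARRAY_SIZE(example_sizes); i++)
        total += example_sizes[i];

    return total;
}
```

Note that the macro is only valid on true arrays; applied to a pointer it
silently yields the wrong count.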

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/domctl: remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext
Andrew Cooper [Tue, 10 Jun 2014 14:59:11 +0000 (16:59 +0200)]
x86/domctl: remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext

The PV MSR functionality is now implemented as a separate set of domctls.

This is a revert of parts of c/s 65e3554908
  "x86/PV: support data breakpoint extension registers"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agolibxc: use an explicit check for PV MSRs in xc_domain_save()
Andrew Cooper [Tue, 10 Jun 2014 14:58:47 +0000 (16:58 +0200)]
libxc: use an explicit check for PV MSRs in xc_domain_save()

Migrating PV domains using MSRs is not supported.  This uses the new
XEN_DOMCTL_get_vcpu_msrs and will fail the migration with an explicit error.

This is an improvement upon the current failure of
  "No extended context for VCPUxx (ENOBUFS)"

Support for migrating PV domains which are using MSRs will be included in the
migration v2 work.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
10 years agox86/domctl: implement XEN_DOMCTL_{get,set}_vcpu_msrs
Andrew Cooper [Tue, 10 Jun 2014 14:57:16 +0000 (16:57 +0200)]
x86/domctl: implement XEN_DOMCTL_{get,set}_vcpu_msrs

Despite my 'Reviewed-by' tag on c/s 65e3554908 "x86/PV: support data
breakpoint extension registers", I have re-evaluated my position as far as the
hypercall interface is concerned.

Previously, for the sake of not modifying the migration code in libxc,
XEN_DOMCTL_get_ext_vcpucontext would jump through hoops to return -ENOBUFS if
and only if MSRs were in use and no buffer was present.

This is fragile, and awkward from a toolstack point-of-view when actually
sending MSR content in the migration stream.  It also complicates fixing a
further race condition, between querying the number of MSRs for a vcpu, and
the vcpu touching a new one.

As this code is still only in unstable, take this opportunity to redesign the
interface.  This patch introduces the brand new XEN_DOMCTL_{get,set}_vcpu_msrs
subops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoarch/arm: domain build: let dom0 access I/O memory of mapped devices
Arianna Avanzini [Sun, 25 May 2014 10:51:42 +0000 (12:51 +0200)]
arch/arm: domain build: let dom0 access I/O memory of mapped devices

Currently, dom0 is allowed access to the I/O memory ranges used
to access devices exposed to it, but it doesn't have those
ranges in its iomem_caps. This commit implements the correct
bookkeeping in the generic function which actually maps a
device's I/O memory to the domain, adding the ranges to the
domain's iomem_caps.

NOTE: This commit suffers from the following limitations:
      . with this patch, I/O memory ranges pertaining to disabled
        devices are not mapped;
      . the "iomem" option could be used to map memory ranges that
        are not described in the device tree.
      In both these cases, this patch does not allow the domain
      the privileges needed to map the needed I/O memory ranges
      afterwards.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Eric Trudeau <etrudeau@broadcom.com>
Cc: Viktor Kleinik <viktor.kleinik@globallogic.com>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Tue, 10 Jun 2014 14:11:04 +0000 (15:11 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agolibxl: introduce libxl_cpuid_policy_list_length
Wei Liu [Mon, 9 Jun 2014 12:43:26 +0000 (13:43 +0100)]
libxl: introduce libxl_cpuid_policy_list_length

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: introduce libxl_key_value_list_length
Wei Liu [Mon, 9 Jun 2014 12:43:25 +0000 (13:43 +0100)]
libxl: introduce libxl_key_value_list_length

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl/gentypes.py: special-case KeyedUnion map handle generation
Wei Liu [Mon, 9 Jun 2014 12:43:21 +0000 (13:43 +0100)]
libxl/gentypes.py: special-case KeyedUnion map handle generation

Generate JSON map handle according to KeyedUnion discriminator.

The original JSON output for a keyed union is like:
 {
   ...
   "u" : { FIELDS }
   ...
 }

The discriminator is not generated, so the parser cannot figure out
which fields to expect in the incoming stream.

So we need to change this to something more sensible. For example, for
keyed union libxl_domain_type, which has a discriminator called "type",
we generate the following for an HVM guest:
 {
   ...
   "type.hvm" : { HVM FIELDS }
   ...
 }

The parser can then determine the type of this union and how to interpret
the incoming stream.

Note that we change the existing API here. However, the original output is
quite broken anyway: we cannot make sensible use of it, and I doubt that
there is any existing user of the existing API. So we are actually fixing a
problem.
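
The key scheme above can be sketched as follows. Both helpers are
hypothetical and are not part of libxl or gentypes.py; they only illustrate
how a "<discriminator>.<variant>" key such as "type.hvm" can be built and
then decoded by a parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the JSON map key "<discriminator>.<variant>", e.g. "type.hvm".
 * Returns 0 on success, -1 if the key does not fit in buf.
 * Hypothetical helper, not part of libxl. */
static int keyed_union_map_key(char *buf, size_t len,
                               const char *discriminator,
                               const char *variant)
{
    int n;

    assert(buf && discriminator && variant);
    n = snprintf(buf, len, "%s.%s", discriminator, variant);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Recover the variant name from a key: returns a pointer just past
 * "<discriminator>.", or NULL if the key belongs to another field. */
static const char *keyed_union_variant(const char *key,
                                       const char *discriminator)
{
    size_t dlen = strlen(discriminator);

    if (strncmp(key, discriminator, dlen) != 0 || key[dlen] != '.')
        return NULL;

    return key + dlen + 1;
}
```

A bare key such as "u" carries no variant, so the second helper returns
NULL for it, which is the ambiguity the commit sets out to remove.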

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl_json: introduce parser functions for builtin types
Wei Liu [Mon, 9 Jun 2014 12:43:20 +0000 (13:43 +0100)]
libxl_json: introduce parser functions for builtin types

This changeset introduces the following functions:
 * libxl_defbool_parse_json
 * libxl__bool_parse_json
 * libxl_uuid_parse_json
 * libxl_mac_parse_json
 * libxl_bitmap_parse_json
 * libxl_cpuid_policy_list_parse_json
 * libxl_string_list_parse_json
 * libxl_key_value_list_parse_json
 * libxl_hwcap_parse_json
 * libxl__int_parse_json
 * libxl__uint{8,16,32,64}_parse_json
 * libxl__string_parse_json

They will be used in a later patch to convert the libxl__json_object
tree of a builtin type to a libxl_FOO struct.

Also remove declaration of libxl_domid_gen_json as libxl_domid uses
yajl_gen_integer to generate JSON object.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
10 years agolibxl_json: introduce libxl__object_from_json
Wei Liu [Mon, 9 Jun 2014 12:43:19 +0000 (13:43 +0100)]
libxl_json: introduce libxl__object_from_json

Given a JSON string, we need to convert it to libxl_FOO struct.

The approach is:
JSON string -> libxl__json_object -> libxl_FOO struct

With this approach we can make use of libxl's infrastructure to do the
first half (JSON string -> libxl__json_object).

The second half is done by code auto-generated by libxl's IDL
infrastructure. IDL patch(es) will come later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl IDL: rename json_fn to json_gen_fn
Wei Liu [Mon, 9 Jun 2014 12:43:18 +0000 (13:43 +0100)]
libxl IDL: rename json_fn to json_gen_fn

This json_fn is in fact used to generate string representation of a json
data structure. We will introduce another json function to parse json
data structure in later changeset, so rename json_fn to json_gen_fn to
clarify.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: fix JSON generator for uint64_t
Wei Liu [Mon, 9 Jun 2014 12:43:17 +0000 (13:43 +0100)]
libxl: fix JSON generator for uint64_t

yajl_gen_integer cannot cope with uint64_t, because it takes a signed
long long. If we pass it a uint64_t number which is larger than
LLONG_MAX, it generates a negative number. Later, when we feed this
generated number into the parser, the result gets sign-extended, which is
wrong.

A new function called libxl__uint64_gen_json is introduced to handle
uint64_t. It utilises yajl_gen_number to generate numbers.
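
A minimal sketch of the idea, under the assumption that the generator can
emit a pre-formatted decimal string verbatim. The helper names are
hypothetical, and no yajl call is made here; the sketch only shows the
unsigned formatting step and the range that a signed long long cannot hold.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format val as an unsigned decimal string, the kind of textual number a
 * string-based generator call can emit verbatim.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_uint64(char *buf, size_t len, uint64_t val)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%" PRIu64, val);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* True when val would be misrepresented by an interface taking a signed
 * long long, i.e. when it exceeds INT64_MAX. */
static int needs_unsigned_path(uint64_t val)
{
    return val > (uint64_t)INT64_MAX;
}
```

A buffer of 21 bytes is enough for any uint64_t: 20 digits plus the
terminating NUL.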

Also removed a duplicated definition of MemKB while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxl: remove parsing of "vncviewer" option in xl domain config file
Wei Liu [Mon, 9 Jun 2014 12:43:16 +0000 (13:43 +0100)]
xl: remove parsing of "vncviewer" option in xl domain config file

Print out a warning and suggest that the user pass the "-V" option when
invoking "xl create". Also remove that option from the manpage. This will
introduce a minor functional regression but it's very easy to work around.

The rationale behind this change is that this option is actually not
part of the domain configuration. It just affects whether a vncviewer
should be automatically spawned, but has nothing to do with how a domain
should be constructed. This option is also bogus: if you migrate a domain
to a remote host and the receiver spawns a vncviewer on the receiving
side, then it either dies silently or occupies resources.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: make cpupool_qualifier_to_cpupoolid a library function
Wei Liu [Mon, 9 Jun 2014 12:43:12 +0000 (13:43 +0100)]
libxl: make cpupool_qualifier_to_cpupoolid a library function

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: add DECLARE_HYPERCALL_BUFFER_SHADOW()
David Vrabel [Mon, 9 Jun 2014 15:41:10 +0000 (16:41 +0100)]
tools/libxc: add DECLARE_HYPERCALL_BUFFER_SHADOW()

DECLARE_HYPERCALL_BUFFER_SHADOW() is like DECLARE_HYPERCALL_BUFFER()
except it is backed by an already allocated hypercall buffer.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: Use _Static_assert if available
Andrew Cooper [Mon, 9 Jun 2014 15:41:08 +0000 (16:41 +0100)]
tools/libxc: Use _Static_assert if available
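
As a rough illustration of "use _Static_assert if available": the macro
name, the C11 version test, and the fallback below are assumptions made for
this sketch and are not copied from libxc.

```c
#include <assert.h>
#include <stdint.h>

/* Use the C11 keyword when the compiler advertises C11 support; otherwise
 * fall back to a negative-array-size trick that also fails at build time.
 * EXAMPLE_BUILD_BUG_ON is a hypothetical name for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define EXAMPLE_BUILD_BUG_ON(cond) \
    _Static_assert(!(cond), "!(" #cond ")")
#else
#define EXAMPLE_BUILD_BUG_ON(cond) \
    ((void)sizeof(char[1 - 2 * !!(cond)]))
#endif

/* Hypothetical record whose layout the code relies on. */
struct example_record {
    uint32_t id;
    uint32_t flags;
};

static unsigned int example_record_size(void)
{
    /* Refuse to build if the layout assumption is ever broken. */
    EXAMPLE_BUILD_BUG_ON(sizeof(struct example_record) != 8);

    return (unsigned int)sizeof(struct example_record);
}
```

The benefit of the keyword over the fallback is the diagnostic: the
compiler prints the failed condition instead of an error about an array of
negative size.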

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: Annotate xc_osdep_log with __attribute__((format))
Andrew Cooper [Mon, 9 Jun 2014 15:41:07 +0000 (16:41 +0100)]
tools/libxc: Annotate xc_osdep_log with __attribute__((format))

This helps the compiler spot printf formatting errors.

Fix up resulting errors in xenctrl_osdep_ENOSYS.c.  Substitute %p for the
slightly less bad %lx when trying to format an opaque structure.
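
A minimal sketch of such an annotation on a variadic logging function. The
function and macro names are hypothetical and are not the libxc ones; the
format attribute itself is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The attribute is a GCC/Clang extension; compile it away elsewhere. */
#if defined(__GNUC__)
#define EXAMPLE_PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define EXAMPLE_PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical logger: argument 3 is the format string and the variadic
 * arguments start at position 4, so the compiler can type-check them. */
static int example_log(char *buf, size_t len, const char *fmt, ...)
    EXAMPLE_PRINTF_LIKE(3, 4);

static int example_log(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);

    return n;
}
```

With the annotation in place, a mismatched call such as passing an integer
for "%s" is reported by -Wformat at build time rather than misbehaving at
run time.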

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: Annotate xc_report_error with __attribute__((format))
Andrew Cooper [Mon, 9 Jun 2014 15:41:06 +0000 (16:41 +0100)]
tools/libxc: Annotate xc_report_error with __attribute__((format))

This helps the compiler spot printf formatting errors.

Fix up all errors discovered.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/traps: const-correctness for IST handlers
Andrew Cooper [Tue, 10 Jun 2014 11:13:47 +0000 (13:13 +0200)]
x86/traps: const-correctness for IST handlers

NMI and MCE interrupt handlers have no right to modify their exception frame
or underlying vcpu registers.  Apply liberal quantities of 'const' to 'struct
cpu_user_regs *' throughout the codebase.

The Double Fault handler, while an IST handler, reloads some extra
architectural state back into its regs parameter.  As this is for printing
purposes and on a terminal error path, the const requirements for #DF are
relaxed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/EFI: improve boot time diagnostics
Jan Beulich [Tue, 10 Jun 2014 11:13:13 +0000 (13:13 +0200)]
x86/EFI: improve boot time diagnostics

To aid analysis of eventual errors, print EFI status codes with error
messages where available.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/HVM: refine SMEP/SMAP tests in HVM_CR4_GUEST_RESERVED_BITS()
Jan Beulich [Tue, 10 Jun 2014 11:12:05 +0000 (13:12 +0200)]
x86/HVM: refine SMEP/SMAP tests in HVM_CR4_GUEST_RESERVED_BITS()

Andrew validly points out that the use of the macro on the restore path
can't rely on the CPUID bits for the guest already being in place (as
their setting by the tool stack in turn requires the other restore
operations already having taken place). And even worse, using
hvm_cpuid() is invalid here because that function assumes it is being
used in the context of the vCPU in question.

Reverting to the behavior prior to the change from checking
cpu_has_sm?p to hvm_vcpu_has_sm?p() would break the other (non-restore)
use of the macro. So let's revert to the prior behavior only for the
restore path, by adding a respective second parameter to the macro.

Obviously the two cpu_has_* uses in the macro should really also be
converted to hvm_cpuid() based checks at least for the non-restore
path.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
10 years agoxen: arm: include .text.cold and .text.unlikely in text area
Ian Campbell [Mon, 9 Jun 2014 14:28:12 +0000 (15:28 +0100)]
xen: arm: include .text.cold and .text.unlikely in text area

Otherwise functions in these sections can end up between .text and .rodata
which is after _etext and therefore gets made non-executable.

This matches x86 (although it was done there for different reasons).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Cc: Jan Beulich <JBeulich@suse.com>
10 years agocpufreq: extend documentation for cpufreq parameter
Aravind Gopalakrishnan [Tue, 10 Jun 2014 10:05:37 +0000 (12:05 +0200)]
cpufreq: extend documentation for cpufreq parameter

cpufreq parameter can take more options than currently
documented. Include these with some comments regarding
their intention.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
10 years agocommon/grant: add a newline into error message
Andrew Cooper [Tue, 10 Jun 2014 10:04:59 +0000 (12:04 +0200)]
common/grant: add a newline into error message

Avoid corrupting the next line on the console.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86,amd: remove unused wrmsr_amd
Aravind Gopalakrishnan [Tue, 10 Jun 2014 10:04:35 +0000 (12:04 +0200)]
x86,amd: remove unused wrmsr_amd

After Andrew's commit 07884c9, all writes to password-protected
MSRs are performed using wrmsr_amd_safe.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
10 years agoavoid crash on HVM domain destroy with PCI passthrough
Juergen Gross [Tue, 10 Jun 2014 10:04:08 +0000 (12:04 +0200)]
avoid crash on HVM domain destroy with PCI passthrough

c/s bac6334b5 "move domain to cpupool0 before destroying it" introduced a
problem when destroying a HVM domain with PCI passthrough enabled. The
moving of the domain to cpupool0 includes moving the pirqs to the cpupool0
cpus, but the event channel infrastructure already is unusable for the
domain. So just avoid moving pirqs for dying domains.

Signed-off-by: Juergen Gross <jgross@suse.com>
10 years agox86/domctl: further fix to XEN_DOMCTL_[gs]etvcpuextstate
Andrew Cooper [Tue, 10 Jun 2014 10:03:16 +0000 (12:03 +0200)]
x86/domctl: further fix to XEN_DOMCTL_[gs]etvcpuextstate

Do not clobber errors from certain codepaths.  Clobbering of -EINVAL from
failing "evc->size <= PV_XSAVE_SIZE(_xcr0_accum)" was a pre-existing bug.

However, clobbering -EINVAL/-EFAULT from the get codepath was a bug
unintentionally introduced by 090ca8c1 "x86/domctl: two functional fixes to
XEN_DOMCTL_[gs]etvcpuextstate".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/amd: protect set_cpuidmask() against #GP faults
Andrew Cooper [Thu, 5 Jun 2014 15:57:07 +0000 (17:57 +0200)]
x86/amd: protect set_cpuidmask() against #GP faults

Virtual environments such as Xen HVM containers and VirtualBox do not
necessarily provide support for feature masking MSRs.

As their presence is detected by model numbers alone, and their use predicated
on command line parameters, use the safe() variants of {wr,rd}msr() to avoid
dying with an early #GP fault.

In fact, use the password variants in all cases because:
    a) they are safe to use even if not strictly required
    b) have a more useful function prototype for this purpose

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
10 years agox86: fix reboot/shutdown with running HVM guests
Roger Pau Monné [Thu, 5 Jun 2014 15:53:35 +0000 (17:53 +0200)]
x86: fix reboot/shutdown with running HVM guests

If there's a guest using VMX/SVM when the hypervisor shuts down, it
can lead to the following crash due to VMX/SVM functions being called
after hvm_cpu_down has been called. In order to prevent that, check in
{svm/vmx}_ctxt_switch_from that the cpu virtualization extensions are
still enabled.

(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN)    [<ffff82d08015d129>] __context_switch+0x127/0x462
(XEN)    [<ffff82d080160acf>] __sync_local_execstate+0x6a/0x8b
(XEN)    [<ffff82d080160af9>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82d080161728>] map_domain_page+0x88/0x4de
(XEN)    [<ffff82d08014e721>] map_vtd_domain_page+0xd/0xf
(XEN)    [<ffff82d08014cda2>] io_apic_read_remap_rte+0x158/0x29f
(XEN)    [<ffff82d0801448a8>] iommu_read_apic_from_ire+0x27/0x29
(XEN)    [<ffff82d080165625>] io_apic_read+0x17/0x65
(XEN)    [<ffff82d080166143>] __ioapic_read_entry+0x38/0x61
(XEN)    [<ffff82d080166aa8>] clear_IO_APIC_pin+0x1a/0xf3
(XEN)    [<ffff82d080166bae>] clear_IO_APIC+0x2d/0x60
(XEN)    [<ffff82d080166f63>] disable_IO_APIC+0xd/0x81
(XEN)    [<ffff82d08018228b>] smp_send_stop+0x58/0x68
(XEN)    [<ffff82d080181aa7>] machine_restart+0x80/0x20a
(XEN)    [<ffff82d080181c3c>] __machine_restart+0xb/0xf
(XEN)    [<ffff82d080128fb9>] smp_call_function_interrupt+0x99/0xc0
(XEN)    [<ffff82d080182330>] call_function_interrupt+0x33/0x43
(XEN)    [<ffff82d08016bd89>] do_IRQ+0x9e/0x63a
(XEN)    [<ffff82d08016406f>] common_interrupt+0x5f/0x70
(XEN)    [<ffff82d0801a8600>] mwait_idle+0x29c/0x2f7
(XEN)    [<ffff82d08015cf67>] idle_loop+0x58/0x76
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ****************************************

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
10 years agox86/domctl: two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate
Andrew Cooper [Thu, 5 Jun 2014 15:52:57 +0000 (17:52 +0200)]
x86/domctl: two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate

Interacting with the vcpu itself should be protected by vcpu_pause().
Buggy/naive toolstacks might encounter adverse interaction with a vcpu context
switch, or an increase of xcr0_accum.  There are not many problems with the
current in-tree code.

Explicitly permit a NULL guest handle as being a request for size.  It is the
prevailing Xen style, and without it, valgrind's ioctl handler is unable to
determine whether evc->buffer actually got written to.
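
The "NULL handle means size request" convention can be sketched in plain C
as below. Everything here is hypothetical (the state blob, the function
name, and the return values); it is not the domctl code, only the calling
pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state blob standing in for the real vcpu state. */
static const unsigned char example_state[] = { 0xde, 0xad, 0xbe, 0xef };

/* Two-call convention:
 *  - buf == NULL: report the required size in *size, write nothing;
 *  - buf != NULL: copy the state if *size is large enough.
 * Returns 0 on success, -1 if the supplied buffer is too small. */
static int example_get_state(unsigned char *buf, size_t *size)
{
    assert(size != NULL);

    if (buf == NULL) {
        *size = sizeof(example_state);
        return 0;
    }

    if (*size < sizeof(example_state))
        return -1;

    memcpy(buf, example_state, sizeof(example_state));
    *size = sizeof(example_state);
    return 0;
}
```

A caller first passes NULL to learn the size, allocates, then calls again
with the buffer. Because the first call provably writes nothing, a checker
can tell the two cases apart.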

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/xsave: add fastpath for common xstate_ctxt_size() requests
Andrew Cooper [Thu, 5 Jun 2014 15:52:11 +0000 (17:52 +0200)]
x86/xsave: add fastpath for common xstate_ctxt_size() requests

xstate_ctxt_size(xfeature_mask) is a runtime constant after boot, and is used
for bounds checking when handling xsave state.  Avoid reloading xcr0 twice to
obtain a number which has already been calculated.

Also annotate xfeature_mask as __read_mostly as it is only ever written once.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoVT-d: honor APEI firmware-first mode in XSA-59 workaround code
Jan Beulich [Thu, 5 Jun 2014 15:49:14 +0000 (17:49 +0200)]
VT-d: honor APEI firmware-first mode in XSA-59 workaround code

When firmware-first mode is being indicated by firmware, we shouldn't
be modifying AER registers - these are considered to be owned by
firmware in that case. Violating this is being reported to result in
SMI storms. While circumventing the workaround means re-exposing
affected hosts to the XSA-59 issues, this in any event seems better
than not booting at all. Respective messages are being issued to the
log, so the situation can be diagnosed.

The basic building blocks were taken from Linux 3.15-rc. Note that
this includes a block of code enclosed in #ifdef CONFIG_X86_MCE - we
don't define that symbol, and that code also wouldn't build without
suitable machine check side code added; that should happen eventually,
but isn't subject of this change.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years agox86/HVM: make vmsi_deliver() return proper error values
Jan Beulich [Thu, 5 Jun 2014 15:46:13 +0000 (17:46 +0200)]
x86/HVM: make vmsi_deliver() return proper error values

... and propagate this from hvm_inject_msi(). In the course of this I
spotted further room for cleanup:
- vmsi_inj_irq()'s struct domain * parameter was unused
- vmsi_deliver() pointlessly passed on dest_ExtINT to vmsi_inj_irq()
  (which that one validly refused to handle)
- vmsi_inj_irq()'s sole caller guarantees a proper delivery mode (i.e.
  rather than printing an obscure message we can just BUG())
- some formatting and log message quirks

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/HVM: properly propagate errors from HVMOP_inject_msi
Jan Beulich [Thu, 5 Jun 2014 15:45:27 +0000 (17:45 +0200)]
x86/HVM: properly propagate errors from HVMOP_inject_msi

There are a number of ways this operation can go wrong, all of which
got ignored so far.

In the context of this I wonder whether map_domain_emuirq_pirq()
returning 0 in the "already mapped" case is really intended to be that
way (this is why the subsequent NULL check here can't be an ASSERT()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/hvm: correct hvm_ioreq_server_alloc_rangesets() failure path
Andrew Cooper [Thu, 5 Jun 2014 15:43:26 +0000 (17:43 +0200)]
x86/hvm: correct hvm_ioreq_server_alloc_rangesets() failure path

Coverity-ID: 1220092 "Unsigned compare against 0"
Coverity-ID: 1220093 "Out-of-bounds read"

Both of these are caused by the while() loop in the fail path, which
results in an infinite loop and memory corruption from rangeset_destroy().

Move hvm_ioreq_server_free_rangesets() up and use it for cleanup on the
failure path.
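
A hedged sketch of the cleanup pattern, using hypothetical names and plain
malloc()/free() in place of rangesets. The comment describes one common
shape of an "unsigned compare against 0" bug; it is not a quotation of the
original Xen code.

```c
#include <assert.h>
#include <stdlib.h>

#define EXAMPLE_NR_RANGES 3

struct example_server {
    int *range[EXAMPLE_NR_RANGES];
};

/* Free every allocated slot. Safe on a partially initialised server
 * because free(NULL) is a no-op. */
static void example_free_ranges(struct example_server *s)
{
    unsigned int i;

    for (i = 0; i < EXAMPLE_NR_RANGES; i++) {
        free(s->range[i]);
        s->range[i] = NULL;
    }
}

/* Allocate all slots, failing artificially at index fail_at (pass
 * EXAMPLE_NR_RANGES for no failure). Returns 0 on success, -1 on error.
 * A cleanup loop such as "while (--i >= 0)" would never terminate here:
 * i is unsigned, so the condition is always true and the index wraps. */
static int example_alloc_ranges(struct example_server *s, unsigned int fail_at)
{
    unsigned int i;

    assert(s != NULL);
    for (i = 0; i < EXAMPLE_NR_RANGES; i++)
        s->range[i] = NULL;

    for (i = 0; i < EXAMPLE_NR_RANGES; i++) {
        s->range[i] = (i == fail_at) ? NULL : malloc(sizeof(int));
        if (s->range[i] == NULL)
            goto fail;
    }

    return 0;

 fail:
    example_free_ranges(s);
    return -1;
}
```

Reusing one free helper for both teardown and the error path means there is
only a single loop to get right.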

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
10 years agoiommu: set correct IOMMU entries when !iommu_hap_pt_share
Roger Pau Monné [Thu, 5 Jun 2014 15:42:49 +0000 (17:42 +0200)]
iommu: set correct IOMMU entries when !iommu_hap_pt_share

If the memory map is not shared between HAP and IOMMU we fail to set
correct IOMMU mappings for memory types other than p2m_ram_rw.

This patch adds IOMMU support for the following memory types:
p2m_grant_map_rw, p2m_map_foreign, p2m_ram_ro, p2m_grant_map_ro and
p2m_ram_logdirty.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Tested-by: David Zhuang <david.zhuang@oracle.com>
10 years agomake logdirty and iommu mutually exclusive
Roger Pau Monné [Thu, 5 Jun 2014 15:41:46 +0000 (17:41 +0200)]
make logdirty and iommu mutually exclusive

Prevent the usage of global logdirty if the domain is using the IOMMU,
and also prevent passthrough of devices if logdirty is enabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agodocs: Support building pdfs from markdown using pandoc
Andrew Cooper [Tue, 3 Jun 2014 13:13:48 +0000 (14:13 +0100)]
docs: Support building pdfs from markdown using pandoc

The Xen command line parameters document is far more useful as an indexed pdf
than it is as unindexed html webpage.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- reran autogen.sh ]

10 years agolibxc/trace: Fix style
Konrad Rzeszutek Wilk [Wed, 4 Jun 2014 13:44:29 +0000 (09:44 -0400)]
libxc/trace: Fix style

Most of the functions follow the proper style, but these
two are the odd ones out.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agodocs: xentrace manpage
Konrad Rzeszutek Wilk [Wed, 4 Jun 2014 13:44:27 +0000 (09:44 -0400)]
docs: xentrace manpage

Update the -c and -e parameters wording.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoadded xentop option -f , --full-name to xentop manpage
Christian Wolter [Thu, 5 Jun 2014 09:24:54 +0000 (11:24 +0200)]
added xentop option -f , --full-name to xentop manpage

Signed-off-by: Christian Wolter <wolter@b1-systems.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: arm: ensure we hold a reference to guest pages while we copy to/from them
Ian Campbell [Wed, 4 Jun 2014 13:58:38 +0000 (14:58 +0100)]
xen: arm: ensure we hold a reference to guest pages while we copy to/from them

This at once:
 - prevents the page from being reassigned under our feet
 - ensures that the domain owns the page, which stops a domain from giving a
   grant mapping, MMIO region, other non-RAM as a hypercall input/output.

We need to hold the p2m lock while doing the lookup until we have the
reference.

This also requires that during domain 0 building current is set to an actual
dom0 vcpu, so take care of this at the same time as the p2m is temporarily
loaded.

Lastly when dumping the guest stack we need to make sure that the guest hasn't
pointed its sp off into the weeds and/or misaligned it, which could lead to
hypervisor traps. Solve this by using the new function and checking alignment
first.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: check permissions when copying to/from guest virtual addresses
Ian Campbell [Wed, 4 Jun 2014 13:58:36 +0000 (14:58 +0100)]
xen: arm: check permissions when copying to/from guest virtual addresses

In particular we need to make sure the guest has write permissions to buffers
which it passes as output buffers for hypercalls, otherwise the guest can
overwrite memory which it shouldn't be able to write (like r/o grant table
mappings).

This is XSA-98.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agox86/PVH: avoid call to handle_mmio
Mukesh Rathor [Wed, 4 Jun 2014 09:27:50 +0000 (11:27 +0200)]
x86/PVH: avoid call to handle_mmio

handle_mmio() is currently unsafe for pvh guests. A call to it would
result in a call to vioapic_range that will crash xen since the vioapic
ptr in struct hvm_domain is not initialized for pvh guests.

However, one path exists for such a call. If a pvh guest, dom0 or domU,
unintentionally touches non-existent memory, an EPT violation would occur.
This would result in an unconditional call to hvm_hap_nested_page_fault. In
that function, because get_gfn_type_access returns p2m_mmio_dm for
non-existent mfns by default, handle_mmio() will get called. This would
result in a xen crash instead of a guest crash. This patch addresses that.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
10 years agoACPI: Prevent acpi_table_entries from falling into a infinite loop
Malcolm Crossley [Wed, 4 Jun 2014 09:26:15 +0000 (11:26 +0200)]
ACPI: Prevent acpi_table_entries from falling into a infinite loop

If a buggy BIOS programs an ACPI table with too small an entry length
then acpi_table_entries gets stuck in an infinite loop.

To aid debugging, report the error and exit the loop.

Based on Linux kernel commit 369d913b242cae2205471b11b6e33ac368ed33ec

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Use < instead of <= (which I wrongly suggested), return -ENODATA
instead of -EINVAL, and make description match code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
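
A minimal sketch of the failure mode and the guard, assuming a toy two-byte entry header (type, length). The names toy_entry_header and toy_count_entries are made up for this example and do not reproduce the actual ACPI parsing code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy subtable header: a type byte followed by the entry's total length. */
struct toy_entry_header {
    uint8_t type;
    uint8_t length;
};

/*
 * Count entries of the given type in buf[0..size).
 * Returns the count, or -ENODATA if an entry claims a length smaller than
 * its own header. Without that check, a zero length would never advance
 * the offset and the loop would spin forever.
 */
static int toy_count_entries(const uint8_t *buf, size_t size, uint8_t type)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);

    while (off + sizeof(struct toy_entry_header) <= size) {
        uint8_t entry_type = buf[off];
        uint8_t entry_len = buf[off + 1];

        /* Note '<' rather than '<=': a header-only entry is still valid. */
        if (entry_len < sizeof(struct toy_entry_header))
            return -ENODATA;

        if (entry_type == type)
            count++;

        off += entry_len;
    }

    return count;
}
```

A caller in this sketch would treat a negative return as a malformed table and log it, rather than continuing to parse.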
10 years agoVT-d: replace another fixmap use with ioremap()
Jan Beulich [Wed, 4 Jun 2014 09:24:33 +0000 (11:24 +0200)]
VT-d: replace another fixmap use with ioremap()

... making the code more generic and limiting address space consumption
(however small it might be) to just those machines that need this
mapping (this is an erratum workaround after all).

At the same time properly map the full needed range from the base
address instead of just the third page and fix some formatting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years agox86/HVM: eliminate vulnerabilities from hvm_inject_msi()
Jan Beulich [Tue, 3 Jun 2014 13:17:14 +0000 (15:17 +0200)]
x86/HVM: eliminate vulnerabilities from hvm_inject_msi()

- pirq_info() returns NULL for a non-allocated pIRQ, and hence we
  mustn't unconditionally de-reference it, and we need to invoke it
  another time after having called map_domain_emuirq_pirq()
- don't use printk(), namely without XENLOG_GUEST, for error reporting

This is XSA-96.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agoUpdate mail address
Juergen Gross [Tue, 3 Jun 2014 12:03:03 +0000 (14:03 +0200)]
Update mail address

Signed-off-by: Juergen Gross <jgross@suse.com>
10 years agox86, mce: remove amd_{k8,f10}_mcheck_init functions
Aravind Gopalakrishnan [Tue, 3 Jun 2014 10:02:11 +0000 (12:02 +0200)]
x86, mce: remove amd_{k8,f10}_mcheck_init functions

With all AMD mcheck initialization unified now after
commit 518576c, these two function definitions can be removed.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
10 years agosupport 'tera' suffixes for size parameters
Andrew Cooper [Tue, 3 Jun 2014 10:01:56 +0000 (12:01 +0200)]
support 'tera' suffixes for size parameters

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
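
As a rough illustration of what a 'tera' suffix means for a size parameter, here is a hypothetical parser. toy_parse_size is an invented name, it treats a missing suffix as bytes (an assumption of this sketch, not necessarily the hypervisor's default), and it ignores overflow.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a number with an optional K/M/G/T suffix (powers of 1024).
 * Each case falls through to the next smaller unit, so 'T' shifts by
 * 10 bits four times (2^40).
 */
static unsigned long long toy_parse_size(const char *s)
{
    char *end;
    unsigned long long val;

    assert(s != NULL);

    val = strtoull(s, &end, 0);

    switch (*end) {
    case 'T': case 't':
        val <<= 10;
        /* fall through */
    case 'G': case 'g':
        val <<= 10;
        /* fall through */
    case 'M': case 'm':
        val <<= 10;
        /* fall through */
    case 'K': case 'k':
        val <<= 10;
        break;
    default:
        /* No recognised suffix: this sketch takes the value as bytes. */
        break;
    }

    return val;
}
```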
10 years agox86/xsave: remove xfeat_mask checking from validate_xstate()
Andrew Cooper [Tue, 3 Jun 2014 10:00:53 +0000 (12:00 +0200)]
x86/xsave: remove xfeat_mask checking from validate_xstate()

validate_xstate() is called from codepaths which load new vcpu xsave state
from XEN_DOMCTL_{setvcpuextstate,sethvmcontext}, usually as part of
migration.  In both cases, the xfeat_mask passed in is the xfeature_mask of
the saving Xen rather than the restoring Xen.

Given that the xsave state itself is checked for consistency and validity on
the current cpu, checking whether it was valid for the cpu before migration is
not interesting (or indeed relevant, as the error can't be distinguished from
the other validity checking).

This change removes the need to pass the saving Xen's xfeature_mask,
simplifying the toolstack code and migration stream format in this area.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86: use alternative mechanism to define CLAC/STAC
Feng Wu [Tue, 3 Jun 2014 09:56:24 +0000 (11:56 +0200)]
x86: use alternative mechanism to define CLAC/STAC

This patch uses the alternative mechanism to define CLAC/STAC.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86: port the basic alternative mechanism from Linux to Xen
Feng Wu [Tue, 3 Jun 2014 09:31:21 +0000 (11:31 +0200)]
x86: port the basic alternative mechanism from Linux to Xen

This patch ports the basic alternative mechanism from Linux to Xen.
With this mechanism, we can patch code based on the CPU features.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86: make set_nmi_callback return the old nmi callback
Feng Wu [Tue, 3 Jun 2014 09:29:38 +0000 (11:29 +0200)]
x86: make set_nmi_callback return the old nmi callback

This patch makes set_nmi_callback return the old nmi callback, so
we can set it back later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
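
The save-and-restore pattern this enables might look like the following sketch. The toy_ names, the callback signature, and the default handler are assumptions for illustration and are not the Xen declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; the real signature may differ. */
typedef int (*toy_nmi_callback_t)(int cpu);

/* Default handler installed when nobody has registered a callback. */
static int toy_dummy_nmi_callback(int cpu)
{
    (void)cpu;
    return 0;
}

static toy_nmi_callback_t toy_nmi_callback = toy_dummy_nmi_callback;

/* Install a new callback and hand back the one it replaced. */
static toy_nmi_callback_t toy_set_nmi_callback(toy_nmi_callback_t callback)
{
    toy_nmi_callback_t old = toy_nmi_callback;

    assert(callback != NULL);

    toy_nmi_callback = callback;
    return old;
}

/* A temporary handler a caller might install around a critical section. */
static int toy_temporary_callback(int cpu)
{
    return cpu + 100;
}
```

A caller keeps the returned pointer, does its work with the temporary handler in place, and then passes the saved pointer back to toy_set_nmi_callback() to restore the previous state.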
10 years agox86: add definitions for NOP operation
Feng Wu [Tue, 3 Jun 2014 09:29:12 +0000 (11:29 +0200)]
x86: add definitions for NOP operation

This patch adds definitions for NOP operations of different lengths.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoxen/arm: grant: Add another entry to map MFN 1:1 in dom0 p2m
Julien Grall [Tue, 27 May 2014 11:11:41 +0000 (12:11 +0100)]
xen/arm: grant: Add another entry to map MFN 1:1 in dom0 p2m

Grant mappings can be used for DMA requests. Currently the dev_bus_addr
returned by the hypercall is the MFN (not the IPA). The guest expects to be
able to use the returned address for DMA. When the device is protected by an
IOMMU the request will fail. Therefore, we have to add a 1:1 mapping in the
domain p2m to allow DMA requests to work.

This is valid because DOM0 has its memory mapped 1:1 and therefore we know
that RAM and devices cannot clash.

If the guest only owns protected devices, the returned dev_bus_addr should be
an IPA. This would allow us to safely remove the 1:1 mapping and make grant
mapping work correctly in the guest. For now, this is not addressed by this
patch.

The grant mapping code does the reference counting on every MFN and will
call iommu_{map,unmap}_page when necessary. This was already handled for x86
PV guests, so we can reuse the same code path for ARM guests.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
[ ijc s/ld/d/ in both arch's gnttab_need_iommu_mapping() ]
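
The reference-counting idea described above can be sketched as follows. This is a simplified toy: the toy_ names and counters are hypothetical, the increments stand in for iommu_{map,unmap}_page, and the real code also distinguishes read-only from writable mappings, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_MFNS 8u

struct toy_iommu {
    unsigned int refcnt[TOY_NR_MFNS];  /* grant mappings per MFN */
    unsigned int map_calls;            /* stand-in for iommu_map_page() */
    unsigned int unmap_calls;          /* stand-in for iommu_unmap_page() */
};

/* Take a reference; only the first reference creates the IOMMU mapping. */
static int toy_grant_map(struct toy_iommu *io, size_t mfn)
{
    assert(io != NULL);

    if (mfn >= TOY_NR_MFNS)
        return -1;

    if (io->refcnt[mfn]++ == 0)
        io->map_calls++;

    return 0;
}

/* Drop a reference; only the last reference removes the IOMMU mapping. */
static int toy_grant_unmap(struct toy_iommu *io, size_t mfn)
{
    assert(io != NULL);

    if (mfn >= TOY_NR_MFNS || io->refcnt[mfn] == 0)
        return -1;

    if (--io->refcnt[mfn] == 0)
        io->unmap_calls++;

    return 0;
}
```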