xenbits.xensource.com Git - xen.git/log
10 years ago x86/LAPIC: drop support for non-integrated APIC
Jan Beulich [Thu, 25 Sep 2014 09:56:22 +0000 (11:56 +0200)]
x86/LAPIC: drop support for non-integrated APIC

We never really supported such, even in the 32-bit days.

As a minor extra thing move the APIC_SELF_IPI definition out of the
middle of Divider Configuration Register ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86: make dump_pageframe_info() slightly more verbose for dying domains
Jan Beulich [Thu, 25 Sep 2014 09:55:49 +0000 (11:55 +0200)]
x86: make dump_pageframe_info() slightly more verbose for dying domains

Allowing more than just 10 pages to be printed in this case gives a
better chance to fully understand eventual page reference leaks: Report
up to 16 "normal" (writable or untyped) pages, and an unlimited number
of special type (page or descriptor table) ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation
Jan Beulich [Thu, 25 Sep 2014 09:53:32 +0000 (11:53 +0200)]
x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation

SYSCALL:
- make sure SS selector has RPL 0
- only use 32 bits of RIP to fill RCX when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless initializers and casts
- drop redundant MSR_STAR read (as suggested by Andrew Cooper)

SYSENTER/SYSEXIT:
- #GP condition doesn't depend on guest mode
- only use 32 bits for setting RIP/RSP when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless (and inconsistently used) casts

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86/p2m: typo fix for spelling ambiguous
Tamas K Lengyel [Wed, 24 Sep 2014 09:19:57 +0000 (11:19 +0200)]
x86/p2m: typo fix for spelling ambiguous

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 24 Sep 2014 09:15:19 +0000 (10:15 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years ago x86/EFI: fix freeing of uninitialized pointer
Roy Franz [Wed, 24 Sep 2014 09:09:11 +0000 (11:09 +0200)]
x86/EFI: fix freeing of uninitialized pointer

The only valid response from the LocateHandle() call is EFI_BUFFER_TOO_SMALL,
so exit if we get anything else.  We pass a 0 size/NULL pointer buffer, so the
only other return we can get is an error.  Return right away as there is
nothing to do.  Also return if there is an error allocating the buffer, as the
previous code path also allowed for an undefined pointer to be freed.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Re-structure the change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years ago flask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)
Wei Liu [Mon, 15 Sep 2014 19:29:15 +0000 (20:29 +0100)]
flask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)

Daniel suggested we use xenpolicy-$(XEN_FULLVERSION) as flask policy
naming convention.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
10 years ago Fix QEMU cross-compile build
Stefano Stabellini [Tue, 23 Sep 2014 16:29:29 +0000 (17:29 +0100)]
Fix QEMU cross-compile build

Introduce the per-arch IOEMU_CPU_ARCH variable.
Always pass --cpu=IOEMU_CPU_ARCH to QEMU's configure script.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- dropped redundant comments ]

10 years ago MAINTAINERS: Add Wei Liu as toolstack co-maintainer.
Ian Campbell [Mon, 22 Sep 2014 16:10:39 +0000 (17:10 +0100)]
MAINTAINERS: Add Wei Liu as toolstack co-maintainer.

The three existing maintainers are not really able to keep up with
the flow and Wei is one of the top tools contributors (according to
"git shortlog -s -n -p RELEASE-4.4.0..origin/staging tools" and my
own impressions).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years ago docs: add PVH specification
Roger Pau Monne [Tue, 23 Sep 2014 16:17:18 +0000 (18:17 +0200)]
docs: add PVH specification

Introduce a document that describes the interfaces used on PVH. This
document has been designed from a guest OS point of view (i.e.: what a guest
needs to do in order to support PVH).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Mukesh Rathor <mukesh.rathor@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
10 years ago xen/arm: remove check for generic timer support for arm64
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:49 +0000 (17:43 +0530)]
xen/arm: remove check for generic timer support for arm64

Information about generic timer support is available in the
IDR_PFR1 register on ARMv7, whereas this information is not
available on ARMv8 implementations that support only AArch64 mode.
Since ARMv8 always supports the generic timer, this check is not
required.

On platforms that support only AArch64 mode, IDR_PFR1 is
not implemented.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:48 +0000 (17:43 +0530)]
xen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2

IFSR32_EL2 and FPEXC32_EL2 registers are accessible in
aarch64 mode only if aarch32 mode is supported in EL1.
So allow access to these registers only for 32-bit domains.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: Use REG_RANK_INDEX macro
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:46 +0000 (17:43 +0530)]
xen/arm: Use REG_RANK_INDEX macro

Use REG_RANK_INDEX macro to compute index to access
vgic ipriority[] and itargets[] for a given irq.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: Remove a duplicate calculation of be_path
Ian Jackson [Tue, 23 Sep 2014 16:46:21 +0000 (17:46 +0100)]
libxl: Remove a duplicate calculation of be_path

Coverity-ID: 1238177
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago make: Make "src-tarball" target actually make a source tarball
George Dunlap [Mon, 15 Sep 2014 16:25:04 +0000 (17:25 +0100)]
make: Make "src-tarball" target actually make a source tarball

At the moment, making a release tarball is an annoyingly manual
process that involves running "git archive" into a temporary directory.

Script this process up and make a target, so that the release manager
can simply type "make src-tarball-release" and have everything show up
nice and neat in dist/xen-$version.tar.gz.  "make src-tarball" will
make a version number based on git describe, which will typically have
the most recent tag, number of commits since that tag, and the git
commit id of the current HEAD.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years ago make: Add subtree-force-update target
George Dunlap [Mon, 15 Sep 2014 16:25:03 +0000 (17:25 +0100)]
make: Add subtree-force-update target

subtree-force-update will update all subtrees according to the current TAG specified
in Config.mk.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago xen: arm: Add support for the Exynos secure firmware
Suriyan Ramasami [Mon, 22 Sep 2014 18:33:54 +0000 (11:33 -0700)]
xen: arm: Add support for the Exynos secure firmware

The existence of secure firmware is dictated by the presence of
"samsung,secure-firmware" in the DT.

The Arndale board does not have that entry, and uses the address as defined
in "samsung,exynos4210-sysram", offset 0 as the smp init address. This is
possibly true for all SoCs without secure firmware.

For other boards which do have a "secure-firmware" node, use sysram-ns
at offset +0x1c as the smp init address.

The "secure-firmware" MMIO range contains ways to idle the CPU. As this gets
mapped to DOM0 because of its presence in the DT, we blacklist it.

Have tested this on the Odroid XU. I have also tested the other code path
on the Odroid XU by removing "secure-firmware" from its DT. I could see
that the other code path was exercised with correct smp init address
values.

Signed-off-by: Suriyan Ramasami <suriyan.r@gmail.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago x86/viridian: Add partition time reference counter MSR support
Paul Durrant [Tue, 23 Sep 2014 10:40:10 +0000 (11:40 +0100)]
x86/viridian: Add partition time reference counter MSR support

This patch optionally re-instates support for the partition time reference
counter that was previously introduced by commit
e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b and reverted by commit
1cd4fab14ce25859efa4a2af13475e6650a5506c. The previous implementation was
non-optional and flawed.

This implementation uses the tsc of vcpu0, which is preserved across
save/restore as part of the architectural state, and then converts that
to a 100ns tick using the domain's tsc_khz.
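
The conversion described above can be sketched as follows. This is a minimal
illustration, not the code from the patch: the function name
tsc_to_100ns_ticks is hypothetical, and it assumes tsc_khz is non-zero and
expresses TSC cycles per millisecond, so that one millisecond corresponds to
10,000 ticks of 100ns each.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a raw TSC value into 100ns ticks.
 *
 * tsc_khz is the number of TSC cycles per millisecond, and one
 * millisecond is 10,000 ticks of 100ns. Splitting the value into whole
 * milliseconds and a remainder avoids overflowing tsc * 10000 for
 * large TSC values. The caller must ensure tsc_khz != 0.
 */
static uint64_t tsc_to_100ns_ticks(uint64_t tsc, uint32_t tsc_khz)
{
    uint64_t whole_ms = tsc / tsc_khz;    /* complete milliseconds */
    uint64_t rem_cycles = tsc % tsc_khz;  /* cycles in the partial ms */

    return whole_ms * 10000u + (rem_cycles * 10000u) / tsc_khz;
}
```

With a 2GHz TSC (tsc_khz of 2000000), 200 cycles map to a single 100ns tick
under this scheme.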

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Christoph Egger <chegger@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago x86/viridian: Re-purpose the HVM parameter to be a feature mask
Paul Durrant [Tue, 23 Sep 2014 10:40:09 +0000 (11:40 +0100)]
x86/viridian: Re-purpose the HVM parameter to be a feature mask

The following commits introduced the time reference counter MSR and
TSC/APIC frequency MSRs into the viridian feature set respectively:

e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b
84657efd9116f40924aa13c9d5a349e007da716f

The time reference counter MSR feature was then reverted by commit

1cd4fab14ce25859efa4a2af13475e6650a5506c

because a flaw in the implementation meant the counter was reset on
migration.

All of these changes were made without any additional options being
added to the VM configuration, or any compatibility checks being made
in the domain save/restore code. Hence setting the single boolean
'viridian' option in the VM configuration yields a different set of
features depending on which version of Xen the VM is started on, and the
feature set can change across migration (so new MSRs can magically appear).

This patch grandfathers in the current viridian features set and calls them
the 'base' and 'freq' feature sets. HVM_PARAM_VIRIDIAN is re-purposed as
a feature mask. The hypervisor has only ever allowed it to be set to 0
or 1, so the presence of the base and freq sets is indicated by setting
bit 0. The freq set can then be turned off by setting bit 1, thus
restoring the pre-Xen-4.4 base set. Newly implemented viridian features
can be optionally enabled in future by setting further bits.

The viridian option in xl.cfg(5) has also been changed to a list so
that the sets can be individually enabled or disabled. For compatibility,
if the option is specified as a boolean, then a true (1) value will enable
the base and freq sets and a false (0) value will not enable any
enlightenments.

This patch also alters the allowed write accesses to HVM_PARAM_VIRIDIAN.
Currently there is nothing to stop the guest writing this value (which,
while harmless to anything else, should not happen) and nothing to
stop a toolstack from setting the value back to zero whilst the guest is
running, causing CPUID leaves to disappear and MSR accesses to start
causing GPFs in the guest. Both of these possibilities are now disallowed:
Once the parameter is set to a non-zero value it may not be modified (only
re-written with the same value), and guests no longer have any write
access.
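
The mask semantics and the write rule described above can be sketched as
follows. All names here (VMASK_BASE_FREQ, VMASK_NO_FREQ, base_enabled,
freq_enabled, write_allowed) are hypothetical and chosen for illustration;
the sketch only assumes what the message states: bit 0 indicates the base and
freq sets, bit 1 turns the freq set off, guests may not write, and a non-zero
value may only be re-written with itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two bits described in the message. */
#define VMASK_BASE_FREQ (UINT64_C(1) << 0) /* base and freq sets present */
#define VMASK_NO_FREQ   (UINT64_C(1) << 1) /* turn the freq set off again */

/* The base set is enabled whenever bit 0 is set. */
static bool base_enabled(uint64_t mask)
{
    return (mask & VMASK_BASE_FREQ) != 0;
}

/* The freq set is enabled by bit 0 unless bit 1 masks it out. */
static bool freq_enabled(uint64_t mask)
{
    return (mask & VMASK_BASE_FREQ) != 0 && (mask & VMASK_NO_FREQ) == 0;
}

/*
 * Write rule: guests never get write access, and once the parameter is
 * non-zero it may only be re-written with the same value.
 */
static bool write_allowed(uint64_t current, uint64_t new_value,
                          bool from_guest)
{
    if (from_guest)
        return false;
    if (current != 0 && new_value != current)
        return false;
    return true;
}
```

A legacy boolean value of 1 therefore decodes to base plus freq, while 3
decodes to the pre-Xen-4.4 base set only.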

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: David Scott <dave.scott@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xl: introduce rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:15:02 +0000 (18:15 -0400)]
xl: introduce rtds scheduler

Add xl command for rtds scheduler
Note: a VCPU's parameters (period, budget) are in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: add rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:14:43 +0000 (18:14 -0400)]
libxl: add rtds scheduler

Add libxl functions to set/get domain's parameters for rtds scheduler
Note: a VCPU's information (period, budget) is in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxc: add rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:14:18 +0000 (18:14 -0400)]
libxc: add rtds scheduler

Add xc_sched_rtds_* functions to interact with Xen to set/get domain's
parameters for rtds scheduler.
Note: a VCPU's information (period, budget) is in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- xenctrl.h has moved to tools/libxc/include, adjust patch to match ]

10 years ago xen: add real time scheduler rtds
Meng Xu [Sat, 20 Sep 2014 22:13:48 +0000 (18:13 -0400)]
xen: add real time scheduler rtds

This scheduler follows the preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with earlier deadline has higher
priority. The scheduler always picks the highest priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
idle or has a lower-priority VCPU running on it.)

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each period;
A VCPU has its budget replenished at the beginning of each period;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until next period.

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but has budget left, its budget is preserved.

Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget and sorted by deadline;
The depletedq holds all VCPUs without budget and unsorted.
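
The budget and deadline rules above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the scheduler's code:
the struct and function names (rt_vcpu, rt_replenish, rt_burn, rt_pick) are
hypothetical, and a linear scan over an array stands in for the sorted
runqueue and the separate depletedq.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VCPU state; all times are in microseconds. */
struct rt_vcpu {
    uint64_t period;       /* length of one period */
    uint64_t budget;       /* budget granted per period */
    uint64_t cur_budget;   /* budget left in the current period */
    uint64_t cur_deadline; /* end of the current period */
};

/*
 * At or after the deadline, move to the period containing 'now':
 * unused budget is discarded and a full budget is granted.
 */
static void rt_replenish(struct rt_vcpu *v, uint64_t now)
{
    if (now < v->cur_deadline)
        return;

    uint64_t periods = (now - v->cur_deadline) / v->period + 1;

    v->cur_deadline += periods * v->period;
    v->cur_budget = v->budget;
}

/* While scheduled, a VCPU burns its budget; it never goes below zero. */
static void rt_burn(struct rt_vcpu *v, uint64_t ran)
{
    v->cur_budget = (ran >= v->cur_budget) ? 0 : v->cur_budget - ran;
}

/*
 * EDF selection: among VCPUs that still have budget, the one with the
 * earliest deadline has the highest priority. Returns NULL if every
 * VCPU is depleted.
 */
static struct rt_vcpu *rt_pick(struct rt_vcpu *vcpus, size_t n)
{
    struct rt_vcpu *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].cur_budget == 0)
            continue; /* depleted: must wait for its next period */
        if (best == NULL || vcpus[i].cur_deadline < best->cur_deadline)
            best = &vcpus[i];
    }
    return best;
}
```

Keeping the runnable VCPUs sorted by deadline, as the queue scheme describes,
lets the highest-priority VCPU be read from the head of the queue rather than
found by a scan.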

Note: cpumask and cpupool are supported.

This is an experimental scheduler.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- use PRI_stime to print delta in burn_budget, to fix build on
         32-bit (i.e. arm32) ]

10 years ago tools: enable QEMU for ARM builds
Stefano Stabellini [Fri, 1 Aug 2014 15:32:19 +0000 (16:32 +0100)]
tools: enable QEMU for ARM builds

Build qemu-xen on ARM and ARM64: it is used to provide the PV backends,
disk and framebuffer in particular.

Ideally we would also modify the configure options to only build what is
necessary: a machine just for PV backends. However that is a work in
progress and not yet available in QEMU (see
http://marc.info/?l=qemu-devel&m=139082425718379&w=2). So we just build
the usual i386 target, even though no i386 emulation is going to be done
by qemu-xen on ARM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years ago Move xenstore and libxc public headers to include subdir
Stefano Stabellini [Thu, 10 Jul 2014 15:35:28 +0000 (15:35 +0000)]
Move xenstore and libxc public headers to include subdir

Also moves xc_dom.h to include as it is used often by other xen tools.
Use the new include subdirectories to build Xen tools, qemu-xen and
stubdoms.

Add the old libxc include path to the programs that need it to build,
on a case by case basis and commenting that they shouldn't require
internal libxc headers to build.

[ And: update QEMU_TRADITIONAL_REVISION to corresponding qemu patch
   - Ian jackson ]

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years ago Rerun autogen.sh after 7d7147762282 "Use configure --sysconfdir=DIR to se..."
Ian Campbell [Tue, 23 Sep 2014 13:08:51 +0000 (14:08 +0100)]
Rerun autogen.sh after 7d7147762282 "Use configure --sysconfdir=DIR to se..."

I tried to do this but failed to commit --amend correctly before pushing.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago x86emul: only emulate software interrupt injection for real mode
Jan Beulich [Tue, 23 Sep 2014 12:33:50 +0000 (14:33 +0200)]
x86emul: only emulate software interrupt injection for real mode

Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.

This is XSA-106.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
10 years ago x86/emulate: check cpl for all privileged instructions
Andrew Cooper [Tue, 23 Sep 2014 12:33:06 +0000 (14:33 +0200)]
x86/emulate: check cpl for all privileged instructions

Without this, it is possible for userspace to load its own IDT or GDT.

This is XSA-105.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrei LUTAS <vlutas@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago x86/shadow: fix race condition sampling the dirty vram state
Andrew Cooper [Tue, 23 Sep 2014 12:31:47 +0000 (14:31 +0200)]
x86/shadow: fix race condition sampling the dirty vram state

d->arch.hvm_domain.dirty_vram must be read with the domain's paging lock held.

If not, two concurrent hypercalls could both end up attempting to free
dirty_vram (the second of which will free a wild pointer), or both end up
allocating a new dirty_vram structure (the first of which will be leaked).

This is XSA-104.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago Use configure --sysconfdir=DIR to set CONFIG_DIR
Olaf Hering [Mon, 22 Sep 2014 13:00:07 +0000 (15:00 +0200)]
Use configure --sysconfdir=DIR to set CONFIG_DIR

Preserve existing behaviour: if the option was not given, set existing
defaults for FreeBSD, Solaris and everything else.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
10 years ago tools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE
Olaf Hering [Mon, 22 Sep 2014 13:00:06 +0000 (15:00 +0200)]
tools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE

Remove hardcoded /var/run/xen directory path, use XEN_RUN_DIR instead.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/libxc: provide variable paths to libxc
Olaf Hering [Mon, 22 Sep 2014 13:00:05 +0000 (15:00 +0200)]
tools/libxc: provide variable paths to libxc

In preparation to remove hardcoded /var/run/xen paths, provide
XEN_RUN_DIR and related directories to xc_private.h. Similar code exists
already for libxl, stubdom and other parts.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/libxl: use buildmakevars2header to create _paths.h
Olaf Hering [Mon, 22 Sep 2014 13:00:04 +0000 (15:00 +0200)]
tools/libxl: use buildmakevars2header to create _paths.h

Replace usage of buildmakevars2file with buildmakevars2header. The macro
generates a C header file, so remove code which converts shell variables
into C defines. Also update the dependency, the macro itself creates a
dependency for _paths.h. A temporary file is not needed anymore.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Config.mk: add new macro buildmakevars2header
Olaf Hering [Mon, 22 Sep 2014 13:00:03 +0000 (15:00 +0200)]
Config.mk: add new macro buildmakevars2header

This macro is similar to buildmakevars2file, it just creates a C header
file instead of shell style syntax. Upcoming changes will use this macro
in libxl and libxc.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Config.mk: replace dependency to genpath with actual target
Olaf Hering [Mon, 22 Sep 2014 13:00:02 +0000 (15:00 +0200)]
Config.mk: replace dependency to genpath with actual target

genpath is a detail of buildmakevars2file. Replace the dependency to
genpath with the actual buildmakevars2file target. This change by
itself does not fix any bug. Upcoming changes will add dependencies to
$(target), but no rule exists to create $(target).

To force a rebuild of the $(1) rule the target now depends on the
existing .phony target. This dummy target is already used elsewhere in
the code.

No change in behaviour is expected by this patch.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Config.mk: move directory list into BUILD_MAKE_VARS
Olaf Hering [Mon, 22 Sep 2014 13:00:01 +0000 (15:00 +0200)]
Config.mk: move directory list into BUILD_MAKE_VARS

To maintain the list of directories in a single place, move the existing
list into its own variable and use it in buildmakevars2file.
Required for upcoming changes.
Also trim whitespace.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago remove obsolete SUBSYS_DIR variable
Olaf Hering [Mon, 22 Sep 2014 13:00:00 +0000 (15:00 +0200)]
remove obsolete SUBSYS_DIR variable

/var/run is a runtime directory. It is not supposed to be packaged.
Remove unused SUBSYS_DIR variable from Config.mk and distro_mapping.txt.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/examples: remove obsolete install targets
Olaf Hering [Mon, 22 Sep 2014 12:59:59 +0000 (14:59 +0200)]
tools/examples: remove obsolete install targets

install-hotplug and install-udev are obsolete since commit 57bcfa11
("tools/hotplug: Separate OS-specific scripts.")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: use XEN_LOCK_DIR instead of hardcoded path
Olaf Hering [Mon, 22 Sep 2014 12:59:58 +0000 (14:59 +0200)]
tools/hotplug: use XEN_LOCK_DIR instead of hardcoded path

Use XEN_LOCK_DIR because it is a compiletime setting.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: create XEN_LOCK_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:57 +0000 (14:59 +0200)]
tools/hotplug: create XEN_LOCK_DIR at runtime

Create XEN_LOCK_DIR because it is a compiletime setting. Also /var/lock
might be empty on startup because it is a tmpfs mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: create XEN_RUN_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:56 +0000 (14:59 +0200)]
tools/hotplug: create XEN_RUN_DIR at runtime

Create XEN_RUN_DIR instead of hardcoded path because it is a compiletime
setting. Also /var/run might be empty on startup because it is a tmpfs
mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/pygrub: store kernels in /var/run/xen/pygrub
Olaf Hering [Mon, 22 Sep 2014 12:59:55 +0000 (14:59 +0200)]
tools/pygrub: store kernels in /var/run/xen/pygrub

Move location of temporary bootfiles from /var/run/xend/boot to
/var/run/xen/pygrub. Create the subdirectory if it does not exist.
The <dir> argument --output-directory must be an existing directory.

The reason for this change is that all entries below /var/run have to be
created at runtime in case /var/run is cleared on every boot.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools: remove obsolete path.py from tools/python
Olaf Hering [Mon, 22 Sep 2014 12:59:54 +0000 (14:59 +0200)]
tools: remove obsolete path.py from tools/python

The directory tools/python/xen/util does not exist.
Upcoming changes to genpath will fail if the rule persists.
Nothing uses path.py (anymore?), so get rid of it.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed from .gitignore too ]

10 years ago tools/mkrpm: allow custom rpm package name
Olaf Hering [Mon, 22 Sep 2014 12:59:53 +0000 (14:59 +0200)]
tools/mkrpm: allow custom rpm package name

Even if xen is configured and compiled with different --prefix= so that
it operates entirely below $prefix, the resulting package from 'make
rpmball' is always called "xen.rpm".

Set PKG_SUFFIX to give the package a different name.
This can be used like this:

suffix=-bugN
prefix=/opt/xen/staging${suffix}
./configure --prefix=${prefix}
make rpmball PKG_SUFFIX=${suffix}

The result will be "xen-bugN.rpm" instead of "xen.rpm". The benefit is that
many xen${suffix}.rpm packages can be installed at the same time.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago install.sh: Preserve permissions from make install
Olaf Hering [Mon, 22 Sep 2014 12:59:52 +0000 (14:59 +0200)]
install.sh: Preserve permissions from make install

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/xenpaging: create dumpdir with mode 0700
Olaf Hering [Mon, 22 Sep 2014 12:59:51 +0000 (14:59 +0200)]
tools/xenpaging: create dumpdir with mode 0700

The swapfile contains sensitive guest info.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago stubdom: fix lwip compile
Olaf Hering [Mon, 22 Sep 2014 12:59:50 +0000 (14:59 +0200)]
stubdom: fix lwip compile

stubdom/lwip-x86_64/src/core/dhcp.c: In function 'dhcp_create_request':
stubdom/lwip-x86_64/src/core/dhcp.c:1359:71: error: array subscript is above array bounds [-Werror=array-bounds]
     dhcp->msg_out->chaddr[i] = (i < netif->hwaddr_len) ? netif->hwaddr[i] : 0/* pad byte*/;

gcc cannot know whether hwaddr_len exceeds the hwaddr array size,
so force an upper limit to assist gcc.
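
One way to force such an upper limit is sketched below. This is an
illustration of the idea rather than the actual patch: fill_chaddr,
HWADDR_MAX and CHADDR_LEN are hypothetical stand-ins for the lwip loop and
its array sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes standing in for the lwip array dimensions. */
#define HWADDR_MAX 6   /* capacity of the hwaddr array */
#define CHADDR_LEN 16  /* capacity of the chaddr field */

/*
 * Copy the hardware address into chaddr and zero-pad the rest.
 * Clamping the length to the array capacity lets the compiler prove
 * that hwaddr[i] is never read past the end of the array.
 */
static void fill_chaddr(uint8_t chaddr[CHADDR_LEN],
                        const uint8_t hwaddr[HWADDR_MAX],
                        uint8_t hwaddr_len)
{
    size_t len = (hwaddr_len < HWADDR_MAX) ? hwaddr_len : HWADDR_MAX;

    for (size_t i = 0; i < CHADDR_LEN; i++)
        chaddr[i] = (i < len) ? hwaddr[i] : 0; /* pad byte */
}
```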

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago xen: arm: Enable physical address space compression (PDX) on arm
Ian Campbell [Wed, 17 Sep 2014 21:21:03 +0000 (22:21 +0100)]
xen: arm: Enable physical address space compression (PDX) on arm

This allows us to support sparse physical address maps which we previously
could not because the frametable would end up taking up an enormous fraction
of RAM.

On a fast model which has RAM at 0x80000000-0x100000000 and
0x880000000-0x900000000 this reduces the size of the frametable from
478M to 84M.
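
The idea behind the compression can be sketched as follows: page frame number
bits that are zero for every RAM page are squeezed out to form a denser
index. The struct and helper names are hypothetical, and choosing bits 20-22
as the hole is an assumption derived from the example layout above (frame
numbers 0x80000-0xfffff and 0x880000-0x8fffff with 4K pages); the real code
may pick a different hole because of alignment constraints.

```c
#include <stdint.h>

/* Hypothetical description of one run of always-zero PFN bits. */
struct pdx_layout {
    uint64_t bottom_mask; /* PFN bits below the hole */
    uint64_t top_mask;    /* PFN bits above the hole */
    unsigned hole_shift;  /* number of bits removed */
};

static struct pdx_layout pdx_layout_init(unsigned hole_lo_bit,
                                         unsigned hole_bits)
{
    struct pdx_layout l;

    l.bottom_mask = (UINT64_C(1) << hole_lo_bit) - 1;
    l.top_mask = ~((UINT64_C(1) << (hole_lo_bit + hole_bits)) - 1);
    l.hole_shift = hole_bits;
    return l;
}

/* Compress: keep the low bits, shift the high bits down over the hole. */
static uint64_t pfn_to_pdx(const struct pdx_layout *l, uint64_t pfn)
{
    return (pfn & l->bottom_mask) | ((pfn & l->top_mask) >> l->hole_shift);
}

/* Expand: the inverse of pfn_to_pdx for PFNs whose hole bits are zero. */
static uint64_t pdx_to_pfn(const struct pdx_layout *l, uint64_t pdx)
{
    return (pdx & l->bottom_mask) | ((pdx << l->hole_shift) & l->top_mask);
}
```

With a three-bit hole starting at bit 20, the highest frame number in the
example, 0x8fffff, compresses to index 0x1fffff, so a table indexed by the
compressed value needs far fewer entries.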

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: add helpers for PDX mask initialisation calculations
Ian Campbell [Tue, 16 Sep 2014 20:01:41 +0000 (21:01 +0100)]
xen: add helpers for PDX mask initialisation calculations

I wanted to make fill_mask a public function so I could use it on ARM, but it
was actually easier to think of a (semi) reasonable public name for the users
of it, so that is what I have done.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: refactor physical address space compression support into common code
Ian Campbell [Wed, 17 Sep 2014 21:21:01 +0000 (22:21 +0100)]
xen: refactor physical address space compression support into common code

The "pdx compression" functionality will be useful on ARM as well.

Move the code to common code+header and introduce HAS_PDX to control when it is
built. L2_PAGETABLE_SHIFT is x86 specific, so introduce PDX_GROUP_SHIFT to
abstract it out.

ARM has no need for superpage compression (yet?) and lacks SUPERPAGE_SHIFT so
those functions (spage_to_mfn et al) are not moved.

No effect on x86 and no change for ARM (yet).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago xen: arm: support for up to 48-bit IPA addressing on arm64
Ian Campbell [Thu, 18 Sep 2014 00:09:55 +0000 (01:09 +0100)]
xen: arm: support for up to 48-bit IPA addressing on arm64

Currently we support only 40 bits. This is insufficient on systems where
peripherals which need to be 1:1 mapped to dom0 are above the 40-bit limit.

Unfortunately the hardware requirements are such that this means that the
number of levels in the P2M is not static and must vary with the number of
implemented physical address bits. This is described in ARM DDI 0487A.b Table
D4-5. In short there is no single p2m configuration which supports everything
from 40- to 48- bits.

For example a system which supports up to 40-bit addressing will only support 3
level p2m (maximum SL0 is 1 == 3 levels), requiring a concatenated page table
root consisting of two pages to make the full 40-bits of addressing.

A maximum of 16 pages can be concatenated meaning that a 3 level p2m can only
support up to 43-bit addresses. Therefore support for 48-bit addressing
requires SL0==2 (4 levels of paging).
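
The arithmetic behind these limits can be sketched as follows, assuming a 4K
granule where each level resolves 9 address bits on top of a 12-bit page
offset. The function name p2m_root_pages and the macro names are
hypothetical.

```c
/* Assumed 4K granule: 12-bit page offset, 9 bits resolved per level. */
#define PAGE_SHIFT_4K   12
#define BITS_PER_LEVEL  9
#define MAX_CONCAT_BITS 4 /* at most 2^4 = 16 concatenated root pages */

/*
 * Number of concatenated root pages needed to cover ipa_bits of
 * address space with the given number of levels, or 0 if the
 * combination is impossible.
 */
static unsigned p2m_root_pages(unsigned ipa_bits, unsigned levels)
{
    unsigned covered = PAGE_SHIFT_4K + BITS_PER_LEVEL * levels;

    if (ipa_bits <= covered)
        return 1; /* a single root page is enough */

    unsigned extra = ipa_bits - covered;

    if (extra > MAX_CONCAT_BITS)
        return 0; /* would need more than 16 root pages */

    return 1u << extra;
}
```

Three levels cover 39 bits with one root page, so 40 bits need two pages and
43 bits need the maximum of sixteen, while anything larger needs a fourth
level.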

After the previous patches our various p2m lookup and manipulation functions
already support starting at arbitrary level and with arbitrary root
concatenation. All that remains is to determine the correct settings from
ID_AA64MMFR0_EL1.PARange for which we use a lookup table.

As well as supporting 44 and 48 bit addressing we can also reduce the order of
the first level for systems which support only 32 or 36 physical address bits,
saving a page.

Systems with 42-bits are an interesting case, since they only support 3 levels
of paging, implying that 8 pages are required at the root level. So far I am
not aware of any systems with peripherals located so high up (the only 42-bit
system I've seen has nothing above 40-bits), so such systems remain configured
for 40-bit IPA with a pair of pages at the root of the p2m.
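
The arithmetic behind these figures can be sketched as follows. This is only an illustration, assuming a 4K translation granule (12 offset bits, 9 bits resolved per level); root_pages is a hypothetical helper and not a function in Xen.

```python
PAGE_SHIFT = 12      # assumption: 4K granule, so 12 bits of page offset
BITS_PER_LEVEL = 9   # assumption: 512 eight-byte entries per 4K table


def root_pages(ipa_bits, levels):
    """Concatenated root pages needed to cover ipa_bits with `levels` levels.

    Hypothetical helper for illustration only.
    """
    # Address bits that a single (non-concatenated) root page can cover.
    covered = PAGE_SHIFT + BITS_PER_LEVEL * levels
    # Every extra address bit doubles the number of root pages.
    return 1 << max(0, ipa_bits - covered)
```

With three levels a single root page covers 39 bits, so 40 bits needs 2 pages, 42 bits needs 8 and 43 bits needs the maximum of 16. 48 bits fits in one root page only with four levels, and 32 or 36 bits fit in one page with three.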

Switching to symbolic names for the VTCR_EL2 bits as we go improves the clarity
of the result.

Parts of this are derived from "xen/arm: Add 4-level page table for
stage 2 translation" by Vijaya Kumar K.

arm32 remains with the static 3-level, 2 page root configuration.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: support for up to 48-bit physical addressing on arm64
Ian Campbell [Thu, 18 Sep 2014 00:09:54 +0000 (01:09 +0100)]
xen: arm: support for up to 48-bit physical addressing on arm64

This only affects Xen's own stage one paging.

- Use symbolic names for TCR bits for clarity.
- Update PADDR_BITS
- Base field of LPAE PT structs is now 36 bits (and therefore
  unsigned long long for arm32 compatibility)
- TCR_EL2.PS is set from ID_AA64MMFR0_EL1.PARange.
- Provide decode of ID_AA64MMFR0_EL1 in CPU info
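
As a rough sketch of the decode, the physical address range field of ID_AA64MMFR0_EL1 (bits [3:0], called PARange in the ARMv8 ARM) maps to a number of address bits, and TCR_EL2.PS (bits [18:16]) takes the same encoding. The function names here are hypothetical and the code is not taken from Xen.

```python
# PARange encodings from the ARMv8 ARM: field value -> physical address bits.
PARANGE_TO_BITS = {
    0b0000: 32,
    0b0001: 36,
    0b0010: 40,
    0b0011: 42,
    0b0100: 44,
    0b0101: 48,
}


def pa_bits(id_aa64mmfr0):
    """Return the implemented PA bits, or None for an encoding not listed above."""
    return PARANGE_TO_BITS.get(id_aa64mmfr0 & 0xF)


def tcr_el2_ps(id_aa64mmfr0):
    """Place the PARange encoding into the TCR_EL2.PS field (bits [18:16])."""
    return (id_aa64mmfr0 & 0x7) << 16
```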

Parts of this are derived from "xen/arm: Add 4-level page table for
stage 2 translation" by Vijaya Kumar K.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle variable p2m levels in apply_p2m_changes
Ian Campbell [Thu, 18 Sep 2014 00:09:53 +0000 (01:09 +0100)]
xen: arm: handle variable p2m levels in apply_p2m_changes

As with previous changes this involves conversion from a linear series of
lookups into a loop over the levels.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle variable p2m levels in p2m_lookup
Ian Campbell [Thu, 18 Sep 2014 00:09:52 +0000 (01:09 +0100)]
xen: arm: handle variable p2m levels in p2m_lookup

This paves the way for boot-time selection of the number of levels to
use in the p2m, which is required to support both 40-bit and 48-bit
systems. For now the starting level remains a compile time constant.

Implemented by turning the linear sequence of lookups into a loop.
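
A minimal model of such a loop, assuming a 4K granule and representing tables as Python dicts (a dict entry is a next-level table, an int entry is the base address of a block or page mapping). The names are hypothetical and this is not the p2m_lookup code itself.

```python
# Shift of the index field for levels 0..3, assuming a 4K granule.
LEVEL_SHIFTS = [39, 30, 21, 12]


def walk(root, addr, start_level):
    """Translate addr by looping over the levels, starting at start_level."""
    table = root
    for level in range(start_level, 4):
        shift = LEVEL_SHIFTS[level]
        entry = table.get((addr >> shift) & 0x1FF)
        if entry is None:
            return None                      # not mapped
        if not isinstance(entry, dict):
            # Leaf: block mapping (levels 1-2) or page mapping (level 3).
            return entry | (addr & ((1 << shift) - 1))
        table = entry                        # descend to the next level
    return None                              # malformed: table entry at level 3
```

The same function serves a 3-level table (start_level 1) and a 4-level table (start_level 0), which is the point of replacing the open-coded sequence.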

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Defer setting of VTCR_EL2 until after CPUs are up
Ian Campbell [Thu, 18 Sep 2014 00:09:51 +0000 (01:09 +0100)]
xen: arm: Defer setting of VTCR_EL2 until after CPUs are up

Currently we retain the hardcoded values but soon we will want to calculate the
correct values based upon the CPU properties common to all processors, which
are only available once they are all up.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: move setup_virt_paging to p2m.[ch] from mm.[ch]
Ian Campbell [Thu, 18 Sep 2014 00:09:50 +0000 (01:09 +0100)]
xen: arm: move setup_virt_paging to p2m.[ch] from mm.[ch]

This file is where most of the P2M logic lives and this function will
eventually need to poke at some internals, so move it.

This is pure code motion.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle concatenated root tables in dump_pt_walk
Ian Campbell [Thu, 18 Sep 2014 00:09:49 +0000 (01:09 +0100)]
xen: arm: handle concatenated root tables in dump_pt_walk

ARM allows for the concatenation of pages at the root of a p2m (but not a
regular page table) in order to support a larger IPA space than the number of
levels in the P2M would normally support. We use this to support 40-bit guest
addresses.

Previously we were unable to dump IPAs which were outside the first page of the
root. To fix this we adjust dump_pt_walk to take the machine address of the
page table root instead of expecting the caller to have mapped it. This allows
the walker code to select the correct page to map.
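
Selecting the page within a concatenated root can be sketched like this, assuming a 4K granule; root_slot is a hypothetical helper, not the dump_pt_walk implementation.

```python
# Shift of the index field for levels 0..3, assuming a 4K granule.
LEVEL_SHIFTS = [39, 30, 21, 12]


def root_slot(ipa, root_level):
    """Return (root page number, entry index within that page) for ipa."""
    shift = LEVEL_SHIFTS[root_level]
    page = ipa >> (shift + 9)          # bits above what one root page covers
    index = (ipa >> shift) & 0x1FF     # usual 9-bit index inside the page
    return page, index
```

For a level 1 root, addresses below 2^39 fall in the first root page and addresses from 2^39 upwards in the second, which is the case the walker previously could not dump.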

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Implement variable levels in dump_pt_walk
Ian Campbell [Thu, 18 Sep 2014 00:09:48 +0000 (01:09 +0100)]
xen: arm: Implement variable levels in dump_pt_walk

This allows us to correctly dump 64-bit hypervisor addresses, which use a 4
level table.

It also paves the way for boot-time selection of the number of levels to use in
the p2m, which is required to support both 40-bit and 48-bit systems.

To support multiple levels it is convenient to recast the page table walk as a
loop over the levels instead of the current open coding.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: rename p2m->first_level to p2m->root.
Ian Campbell [Thu, 18 Sep 2014 00:09:47 +0000 (01:09 +0100)]
xen: arm: rename p2m->first_level to p2m->root.

This was previously part of Vijaya's "xen/arm: Add 4-level page table
for stage 2 translation" but is split out here to make that patch
easier to read.

I went with ->root rather than ->root_level as the original did.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Cc: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
10 years agotools: libxl: read nictype from xenstore
Wen Congyang [Mon, 22 Sep 2014 05:59:16 +0000 (13:59 +0800)]
tools: libxl: read nictype from xenstore

We need to use nictype to get default vifname.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: libxl: pass correct file to qemu if we use blktap2
Wen Congyang [Mon, 22 Sep 2014 05:59:15 +0000 (13:59 +0800)]
tools: libxl: pass correct file to qemu if we use blktap2

If we use blktap2, the correct file should be blktap device
not the pdev_path.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: libxc: restore: csum the correct page
Wen Congyang [Mon, 22 Sep 2014 05:59:14 +0000 (13:59 +0800)]
tools: libxc: restore: csum the correct page

In verify mode, we map the guest memory, and the guest page is
region_base + i * PAGE_SIZE. So we should csum page (region_base
+ i * PAGE_SIZE), not (region_base + (i+curbatch) * PAGE_SIZE)

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: libxc: restore: copy the correct page to memory
Hong Tao [Mon, 22 Sep 2014 05:59:13 +0000 (13:59 +0800)]
tools: libxc: restore: copy the correct page to memory

apply_batch() only handles MAX_BATCH_SIZE pages at one time. If
there is some bogus/unmapped/allocate-only/broken page, we will
skip it. So when we call apply_batch() again, the first page's
index is curbatch - invalid_pages. invalid_pages stores the number
of bogus/unmapped/allocate-only/broken pages we have found.

In many cases, invalid_pages is 0, so we don't catch this error.

Signed-off-by: Hong Tao <bobby.hong@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoUpdate libfdt to v1.4.0
Roy Franz [Thu, 18 Sep 2014 22:50:05 +0000 (15:50 -0700)]
Update libfdt to v1.4.0

Update libfdt to v1.4.0, taken from git://git.jdl.com/software/dtc.git.
Xen changes to libfdt_env.h are carried over from the existing libfdt (v1.3.0).
This update provides the fdt_create_empty_tree() function used by the ARM
EFI boot code.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoadd arm64 cache flushing code from linux v3.16
Roy Franz [Thu, 18 Sep 2014 22:50:04 +0000 (15:50 -0700)]
add arm64 cache flushing code from linux v3.16

__flush_dcache_all added from arch/arm64/mm/cache.S, with helper macros from
arch/arm64/include/asm/assembler.h, from v3.16.  The cache flushing is required
when transitioning from EFI code that runs with the cache enabled to Xen startup
code which expects the cache to be disabled.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed indent on ENTRY() and dropped the entry point label which
         duplicates the one from the macro. ]

10 years agoVT-d: suppress UR signaling for further desktop chipsets
Jan Beulich [Thu, 18 Sep 2014 13:03:22 +0000 (15:03 +0200)]
VT-d: suppress UR signaling for further desktop chipsets

This extends commit d6cb14b34f ("VT-d: suppress UR signaling for
desktop chipsets") as per the finally obtained list of affected
chipsets from Intel.

Also pad the IDs we had listed there before to full 4 hex digits.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years agox86: handle resumed instruction based on previous mem_event reply
Razvan Cojocaru [Thu, 18 Sep 2014 12:57:45 +0000 (14:57 +0200)]
x86: handle resumed instruction based on previous mem_event reply

In a scenario where a page fault that triggered a mem_event occurred,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it, but not allow it to perform
any writes.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agox86, libxc: force-enable relevant MSR events
Razvan Cojocaru [Thu, 18 Sep 2014 12:56:43 +0000 (14:56 +0200)]
x86, libxc: force-enable relevant MSR events

vmx_disable_intercept_for_msr() will now refuse to disable interception of
MSRs needed for memory introspection. It is not possible to gate this on
mem_access being active for the domain, since by the time mem_access does
become active the interception for the interesting MSRs has already been
disabled (vmx_disable_intercept_for_msr() runs very early on).
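
The refusal can be modelled roughly as below. The MSR set is only an illustrative assumption (the log does not list which MSRs Xen treats as relevant), and disable_intercept_for_msr is a hypothetical stand-in for the real function.

```python
# Assumption: an illustrative subset, not the list used by Xen.
# 0x174 = IA32_SYSENTER_CS, 0x176 = IA32_SYSENTER_EIP, 0xC0000082 = IA32_LSTAR
INTROSPECTION_MSRS = {0x174, 0x176, 0xC0000082}


def disable_intercept_for_msr(intercepted, msr):
    """Stop intercepting msr unless introspection needs it.

    `intercepted` is the set of MSRs currently intercepted. Returns True if
    interception was disabled, False if the request was refused.
    """
    if msr in INTROSPECTION_MSRS:
        return False                 # always keep these intercepted
    intercepted.discard(msr)
    return True
```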

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agox86: optimize introspection access to guest state
Razvan Cojocaru [Thu, 18 Sep 2014 12:54:58 +0000 (14:54 +0200)]
x86: optimize introspection access to guest state

Speed optimization for introspection purposes: a handful of registers
are sent along with each mem_event. This requires enlargement of the
mem_event_request / mem_event_response structures, and additional code
to fill in relevant values. Since the EPT event processing code needs
more data than CR3 or MSR event processors, hvm_mem_event_fill_regs()
fills in less data than p2m_mem_event_fill_regs(), in order to avoid
overhead. Struct hvm_hw_cpu has been considered instead of the custom
struct mem_event_regs_st, but its size would cause quick filling up
of the mem_event ring buffer.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agox86/HVM: emulate with no writes
Razvan Cojocaru [Thu, 18 Sep 2014 12:53:52 +0000 (14:53 +0200)]
x86/HVM: emulate with no writes

Added support for emulating an instruction with no memory writes.
Additionally, introduced hvm_emulate_one_full(), which inspects
possible return values from the hvm_emulate_one() functions
(EXCEPTION, UNHANDLEABLE) and acts on them.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/HVM: batch vCPU wakeups
Jan Beulich [Thu, 18 Sep 2014 12:44:58 +0000 (14:44 +0200)]
x86/HVM: batch vCPU wakeups

Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
especially when many of the remote pCPU-s are in deep C-states. For
64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware,
accumulated times of over 2ms were observed (average 1.1ms).
Considering that Windows broadcasts IPIs from its timer interrupt,
which at least at certain times can run at 1kHz, it is clear that this
can't result in good guest behavior. In fact, on said hardware guests
with significantly beyond 40 vCPU-s simply hung when e.g. ServerManager
gets started.

This isn't just helping to reduce the number of ICR writes when the
host APICs run in clustered mode, it also reduces them by suppressing
the sends altogether when - by the time
cpu_raise_softirq_batch_finish() is reached - the remote CPU already
managed to handle the softirq. Plus - when using MONITOR/MWAIT - the
update of softirq_pending(cpu), being on the monitored cache line -
should make the remote CPU wake up ahead of the ICR being sent,
allowing the wait-for-ICR-idle latencies to be reduced (perhaps to a
large part due to overlapping the wakeups of multiple CPUs).

With this alone (i.e. without the IPI avoidance patch in place),
average broadcast times for a 64-vCPU guest went down to a measured
maximum of 310us. With that other patch in place, improvements aren't
as clear anymore (short term averages only went down from 255us to
250us, which clearly is within the error range of the measurements),
but longer term an improvement of the averages is still visible.
Depending on hardware, long term maxima were observed to go down quite
a bit (on aforementioned hardware), while they were seen to go up
again on a (single core) Nehalem (where instead the improvement on the
average values was more visible).

Of course this necessarily increases the latencies for the remote
CPU wakeup at least slightly. To weigh between the effects, the
condition to enable batching in vlapic_ipi() may need further tuning.
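
The batching idea can be modelled in a few lines. This is a simplified sketch, not Xen's implementation: the class and method names are hypothetical, and an "IPI" is just an entry appended to a list.

```python
class SoftirqBatch:
    """Toy model: defer IPIs while batching, skip CPUs that caught up."""

    def __init__(self, ncpus):
        self.pending = [0] * ncpus   # per-CPU softirq pending bitmask
        self.batching = False
        self.batch = set()           # CPUs whose IPI was deferred
        self.ipis_sent = []          # record of IPIs actually sent

    def batch_begin(self):
        self.batching = True

    def raise_softirq(self, cpu, nr):
        was_pending = self.pending[cpu] & (1 << nr)
        self.pending[cpu] |= 1 << nr
        if was_pending:
            return                   # someone already raised it
        if self.batching:
            self.batch.add(cpu)      # defer the IPI
        else:
            self.ipis_sent.append(cpu)

    def handle_softirqs(self, cpu):
        self.pending[cpu] = 0        # remote CPU processed its softirqs

    def batch_finish(self):
        for cpu in sorted(self.batch):
            if self.pending[cpu]:    # still unhandled: the IPI is needed
                self.ipis_sent.append(cpu)
        self.batch.clear()
        self.batching = False
```

A CPU that handles its softirq between the raise and the end of the batch never receives an IPI, which is the suppression described above.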

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86: suppress event check IPI to MWAITing CPUs
Jan Beulich [Thu, 18 Sep 2014 12:43:49 +0000 (14:43 +0200)]
x86: suppress event check IPI to MWAITing CPUs

Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
especially when many of the remote pCPU-s are in deep C-states. For
64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware,
accumulated times of over 2ms were observed (average 1.1ms).
Considering that Windows broadcasts IPIs from its timer interrupt,
which at least at certain times can run at 1kHz, it is clear that this
can't result in good guest behavior. In fact, on said hardware guests
with significantly beyond 40 vCPU-s simply hung when e.g. ServerManager
gets started.

Recognizing that writes to softirq_pending() already have the effect of
waking remote CPUs from MWAITing (due to being co-located on the same
cache line with mwait_wakeup()), we can avoid sending IPIs to CPUs we
know are in a (deep) C-state entered via MWAIT.

With this, average broadcast times for a 64-vCPU guest went down to a
measured maximum of 255us (which is still quite a lot).

One aspect worth noting is that cpumask_raise_softirq() gets brought in
sync here with cpu_raise_softirq() in that now both don't attempt to
raise a self-IPI on the processing CPU.
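
In outline, the target selection becomes the following. This is a sketch with hypothetical names; in the real code the MWAIT state is tracked per CPU rather than passed in as a set.

```python
def cpus_needing_ipi(targets, mwaiting, this_cpu):
    """Return the CPUs that still need an event check IPI.

    CPUs known to be in MWAIT are woken by the write to their softirq
    pending word, and the processing CPU never sends itself an IPI.
    """
    return [cpu for cpu in targets
            if cpu != this_cpu and cpu not in mwaiting]
```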

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86/hvm: always set pending event injection when loading VMC[BS] state
Wen Congyang [Thu, 18 Sep 2014 10:08:45 +0000 (12:08 +0200)]
x86/hvm: always set pending event injection when loading VMC[BS] state

In COLO mode the secondary VM is running, so VM_ENTRY_INTR_INFO may be
valid before the VMCS is restored. If there is no pending event after
restoring the VM, we should clear it.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Also clear pending software exceptions.
Copy the fix to SVM as well.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
10 years agox86/p2m: fix conversion macro of p2m_access to XENMEM_access
Tamas K Lengyel [Thu, 18 Sep 2014 09:41:03 +0000 (11:41 +0200)]
x86/p2m: fix conversion macro of p2m_access to XENMEM_access

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 17 Sep 2014 19:15:28 +0000 (20:15 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agoxl: long output of "list" command now contains Dom0 information
Wei Liu [Tue, 16 Sep 2014 10:01:18 +0000 (11:01 +0100)]
xl: long output of "list" command now contains Dom0 information

As we've already generated a JSON config for Dom0, print that out when
requested.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxl: use libxl_retrieve_domain_configuration and JSON format
Wei Liu [Tue, 16 Sep 2014 10:01:17 +0000 (11:01 +0100)]
xl: use libxl_retrieve_domain_configuration and JSON format

Before this change, xl stores domain configuration in "xl" format, which
is in fact a verbatim copy of user supplied domain config.

Now that libxl provides a new API to retrieve domain configuration, switch
to that new API and store the configuration in JSON format.

Tests done so far (xl.{new,old} denotes xl with{,out} "libxl-json"
support):

1. xl.new create then xl.new save, hexdump saved file: domain config
   saved in JSON format
2. xl.new create, xl.new save then xl.old restore: failed on
   mandatory flag check
3. xl.new create, xl.new save then xl.new restore: succeeded
4. xl.old create, xl.old save then xl.new restore: succeeded
5. xl.new create then local migrate, receiving end xl.new: succeeded
6. xl.old create then local migrate, receiving end xl.new: succeeded

Note that "xl" config is still supported and handled when restarting a
domain. "xl" config file takes precedence over "libxl-json" in that
case, so that user who uses "config-update" to store new config file
won't have regression. All other scenarios (migration, domain listing
etc.) now use the new API.

Lastly, print out warning when users invoke "config-update" to
discourage them from using this command.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: introduce libxl_userdata_unlink
Wei Liu [Tue, 16 Sep 2014 10:01:16 +0000 (11:01 +0100)]
libxl: introduce libxl_userdata_unlink

This will be used in later patch for xl to remove its "xl" userdata
file.

Both CTX lock and userdata lock are taken in this API. CTX lock is taken
to maintain locking hierarchy, but it also has a side effect to protect
against R-M-W by other threads. Userdata lock is used to protect against
domain destruction.

In general application should not rely on these internal locks to
protect its own userdata files. It should deploy its own lock if it
cares.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: introduce libxl_retrieve_domain_configuration
Wei Liu [Tue, 16 Sep 2014 10:01:15 +0000 (11:01 +0100)]
libxl: introduce libxl_retrieve_domain_configuration

Introduce a new public API to return domain configuration. This returned
configuration can be used to rebuild a domain.

Note that this configuration only describes the configuration necessary
to reproduce the guest visible state and does not necessarily include
specific decisions made by the toolstack regarding its current
incarnation (e.g. disk backend) unless they were specified by the
application when the domain was created.

With this approach we can preserve what user has provided in the
original configuration as well as valuable information from xenstore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: refactor libxl_get_memory_target
Wei Liu [Tue, 16 Sep 2014 10:01:14 +0000 (11:01 +0100)]
libxl: refactor libxl_get_memory_target

Introduce a helper function which can return both "target" node and
"static-max" node of a domain. Reimplement libxl_get_memory_target using
this helper. libxl__fill_dom0_memory_info is adjusted as well.

This helper will be used in later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: make libxl_cd_insert "eject" + "insert"
Wei Liu [Tue, 16 Sep 2014 10:01:13 +0000 (11:01 +0100)]
libxl: make libxl_cd_insert "eject" + "insert"

We introduce an intermediate empty state when inserting media into
CDROM. The scheme works like this:

  lock json config
  write empty state to xenstore
  for (;;) {
      write user supplied disk to JSON
      write disk information to xenstore
  }
  unlock json config

Bear in mind that all steps can fail. With the proposed scheme, we now
know, if xenstore is empty, then CDROM should be considered empty;
otherwise we should use JSON version of CDROM configuration.
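
A toy model of why the intermediate empty state helps, with the two stores represented as keys of a dict. All names are hypothetical and the failure injection exists only for illustration.

```python
def insert_cd(state, new_disk, fail_at=None):
    """Run the three writes in order; stop before step fail_at to model a failure."""
    steps = [
        ("xenstore", ""),        # write empty state to xenstore
        ("json", new_disk),      # write user supplied disk to JSON
        ("xenstore", new_disk),  # write disk information to xenstore
    ]
    for number, (store, value) in enumerate(steps):
        if fail_at == number:
            return False
        state[store] = value
    return True


def effective_cdrom(state):
    """Empty xenstore means an empty drive; otherwise trust the JSON copy."""
    return state["json"] if state["xenstore"] else ""
```

Whichever step fails, the effective state is the old disk, an empty drive or the new disk, never a mixture of the two stores.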

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: synchronise configuration when we hotplug a device
Wei Liu [Tue, 16 Sep 2014 10:01:12 +0000 (11:01 +0100)]
libxl: synchronise configuration when we hotplug a device

We update JSON version first, then write to xenstore, so that we
maintain the following invariant: any device which is present in
xenstore has a corresponding entry in JSON.

The workflow is as follows:
   lock json config
       read json config
       update in-memory json config with new entry, replacing
         any stale entry
       for loop
           open xs transaction
           check device existence, abort if it exists
           write in-memory json config to disk
           commit xs transaction
       end for loop
   unlock json config

Please see comment in libxl_internal.h for correctness proof.

As those routines are called both during domain creation and device
hotplug, we add a flag to indicate whether we need to update JSON
config. This flag is only set to true when we hotplug a device. We
cannot update JSON config during domain creation as JSON config is
committed to disk only when domain creation finishes.
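
The ordering argument can be illustrated with a small model in which the JSON config is a list and xenstore is a set. The names are hypothetical; the real correctness proof is the comment in libxl_internal.h.

```python
def hotplug(json_devs, xs_devs, dev, fail_before_commit=False):
    """Add dev, writing JSON before xenstore. Returns True on success."""
    if dev in xs_devs:
        return False              # device already exists: abort
    if dev not in json_devs:
        json_devs.append(dev)     # JSON is written first
    if fail_before_commit:
        return False              # the xenstore transaction never commits
    xs_devs.add(dev)              # commit the xenstore transaction
    return True


def invariant_holds(json_devs, xs_devs):
    """Every device present in xenstore has a corresponding JSON entry."""
    return xs_devs <= set(json_devs)
```

A failure after the JSON write leaves a stale JSON entry, which a later attempt simply reuses, but never a xenstore device without a JSON entry.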

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: rework domain userdata file lock
Wei Liu [Tue, 16 Sep 2014 10:01:11 +0000 (11:01 +0100)]
libxl: rework domain userdata file lock

The lock introduced in d2cd9d4f ("libxl: functions to lock / unlock
libxl userdata store") has a bug that can leak the lock file when domain
destruction races with other functions that try to get hold of the lock.

There are several issues:
1. The lock is released too early when libxl__userdata_destroyall
   deletes everything in userdata store, including the lock file.
2. The check of domain existence is only done at the beginning of lock
   function; by the time the lock is acquired, the domain might
   already be gone.

The effect of these two issues is that we can run into this situation:

     Process 1                        Process 2 domain destruction
   # LOCK FUNCTION                 # LOCK FUNCTION
    check domain existence          check domain existence
                                    acquire lock (file created)
                                   # LOCK FUNCTION
                                    destroy all files (lock file deleted,
                                                       lock released)
    acquire lock (file created)
   # LOCK FUNCTION                  destroy domain
                                   # UNLOCK (close fd only)
   [ lock file leaked ]

Fix this problem by deploying the following changes:

1. Unlink lock file in unlock function.
2. Modify libxl__userdata_destroyall to not delete domain-userdata-lock,
   so that the lock remains held until unlock function is called.
3. Check domain still exists when the lock is acquired, unlock if
   domain is already gone.
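
One common way to implement such a scheme with flock() is sketched below. It is an assumption-laden illustration rather than the libxl code: the function names are hypothetical, domain_exists is a caller-supplied callback, and the inode comparison is a generic technique for detecting that the lock file was unlinked while we waited.

```python
import fcntl
import os


def unlock_userdata(path, fd):
    """Unlink the lock file while still holding the lock, then release it."""
    os.unlink(path)
    os.close(fd)


def lock_userdata(path, domain_exists):
    """Take the lock; return the fd, or None if the domain is already gone."""
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            on_disk = os.stat(path)
            held = os.fstat(fd)
            same = (on_disk.st_ino, on_disk.st_dev) == (held.st_ino, held.st_dev)
        except FileNotFoundError:
            same = False
        if not same:
            os.close(fd)          # file was unlinked under us: retry
            continue
        if not domain_exists():   # re-check after the lock is held
            unlock_userdata(path, fd)
            return None
        return fd
```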

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoVMX: don't unintentionally leave x2APIC MSR intercepts disabled
Jan Beulich [Tue, 16 Sep 2014 11:58:20 +0000 (13:58 +0200)]
VMX: don't unintentionally leave x2APIC MSR intercepts disabled

These should be re-enabled in particular when the virtualized APIC
transitions to HW-disabled state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years agox86: show page walk when create_bounce_frame() encounters a fault
Jan Beulich [Tue, 16 Sep 2014 11:57:44 +0000 (13:57 +0200)]
x86: show page walk when create_bounce_frame() encounters a fault

... getting the native code in sync with the compat mode one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agopassthrough: streamline _hvm_dirq_assist()
Jan Beulich [Tue, 16 Sep 2014 11:56:45 +0000 (13:56 +0200)]
passthrough: streamline _hvm_dirq_assist()

The loop inside this function was calling two functions with loop-
invariable arguments which clearly don't need calling more than once:
send_guest_pirq() and __msi_pirq_eoi(). After moving these out of the
loop it further became apparent that folding the hvm_pci_msi_assert()
helper into the main function can further help readability.

In the course of this I noticed that __hvm_dpci_eoi() called
hvm_pci_intx_deassert() unconditionally, whereas hvm_pci_intx_assert()
(correctly) got called only when !hvm_domain_use_pirq(), so the former
is being made conditional now too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoxen/arm: check for GICv3 platform support
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:49 +0000 (16:39 +0530)]
xen/arm: check for GICv3 platform support

ID_AA64PFR0_EL1 register provides information about GIC support.
Check for this register in GICv3 driver.

Also print GICv3 support information in boot log
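
The check amounts to reading one field. In the ARMv8 ARM the GIC field of ID_AA64PFR0_EL1 occupies bits [27:24], and a non-zero value indicates that the system register interface to the GIC is implemented. The function name is hypothetical.

```python
def gic_sysreg_supported(id_aa64pfr0):
    """True if ID_AA64PFR0_EL1.GIC (bits [27:24]) is non-zero."""
    return ((id_aa64pfr0 >> 24) & 0xF) != 0
```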

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: add SGI handling for GICv3
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:48 +0000 (16:39 +0530)]
xen/arm: add SGI handling for GICv3

In ARMv8, write to ICC_SGI1R_EL1 register raises trap to EL2.
Handle the trap and inject SGI to vcpu.
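
Decoding the trapped value might look like this. The field positions follow the GICv3 architecture specification for ICC_SGI1R_EL1; the function name and the returned dict layout are hypothetical.

```python
def decode_icc_sgi1r(value):
    """Split an ICC_SGI1R_EL1 value into its fields."""
    return {
        # TargetList, bits [15:0]: one bit per CPU within the affinity group
        "target_list": [cpu for cpu in range(16) if value & (1 << cpu)],
        "aff1": (value >> 16) & 0xFF,    # bits [23:16]
        "intid": (value >> 24) & 0xF,    # bits [27:24], SGI number 0-15
        "aff2": (value >> 32) & 0xFF,    # bits [39:32]
        "irm": (value >> 40) & 0x1,      # bit 40: 1 = all CPUs except self
        "aff3": (value >> 48) & 0xFF,    # bits [55:48]
    }
```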

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Update Dom0 GIC dt node with GICv3 information
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:47 +0000 (16:39 +0530)]
xen/arm: Update Dom0 GIC dt node with GICv3 information

Update GIC device tree node for DOM0 with GICv3
information. GIC hw specific device tree information
is moved to respective GIC driver.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Add virtual GICv3 support
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:46 +0000 (16:39 +0530)]
xen/arm: Add virtual GICv3 support

Add virtual GICv3 driver support.
Also, with this patch vgic_irq_rank structure is modified to
hold GICv2 GICD_TARGET and GICv3 GICD_ROUTER registers under
union.

This patch adds only basic GICv3 support.
It does not support the Interrupt Translation Service (ITS).

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoxen/arm: Add support for GIC v3
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:45 +0000 (16:39 +0530)]
xen/arm: Add support for GIC v3

Add support for the GIC v3 specification. System register access (SRE)
is enabled to access the CPU and virtual interface registers. Based
on the kernel GICv3 driver.

This patch adds only basic v3 support.
It does not support the Interrupt Translation Service (ITS).

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoxen/arm: Add vgic callback to read irq priority
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:44 +0000 (16:39 +0530)]
xen/arm: Add vgic callback to read irq priority

Use callback in vgic driver to read priority for
a given irq

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
10 years agoxen/arm: Calculate irq rank from irq number
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:43 +0000 (16:39 +0530)]
xen/arm: Calculate irq rank from irq number

The irq rank calculation is not generic: it assumes a hardware
register size, which does not work for newer GIC versions
such as v3.
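
Deriving the rank directly from the irq number can be sketched as follows, assuming each rank groups 32 interrupts; the constant and function name are illustrative only.

```python
IRQS_PER_RANK = 32   # assumption: one rank covers 32 interrupts


def irq_rank(irq):
    """Return (rank number, index within the rank) for an irq number."""
    return irq // IRQS_PER_RANK, irq % IRQS_PER_RANK
```

Because the result depends only on the irq number, it does not change with the width or offset of whichever distributor register was being accessed.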

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agox86/APIC: reduce self-IPI related code
Jan Beulich [Fri, 12 Sep 2014 11:48:37 +0000 (13:48 +0200)]
x86/APIC: reduce self-IPI related code

send_IPI_self_{phys,flat}() were identical and send_IPI_self_x2apic()
was misplaced and pointlessly (implicitly) had a non-x2APIC code path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoxen: arm: fix boot on arndale.
Ian Campbell [Thu, 11 Sep 2014 12:55:08 +0000 (13:55 +0100)]
xen: arm: fix boot on arndale.

The differences between Arndale and the Odroid-XU are more interesting
than first thought, which results in 0bf8ddecb4df "xen/arm: Add
support for the Odroid-XU board." breaking boot on arndale.

Revert back to arndale compatible behaviour while we sort this out.
Specifically we must (counterintuitively) use the regular (!ns)
sysram and the correct offset is 0x0 and 0x1c.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agolibxc/bitops: Add or() to the available bitmap operations
Andrew Cooper [Wed, 10 Sep 2014 17:10:42 +0000 (18:10 +0100)]
libxc/bitops: Add or() to the available bitmap operations

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/[lib]xl: Correct use of init/dispose for libxl_domain_restore_params
Andrew Cooper [Wed, 10 Sep 2014 17:10:40 +0000 (18:10 +0100)]
tools/[lib]xl: Correct use of init/dispose for libxl_domain_restore_params

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxl: Fix stray blank line from debug logging
Andrew Cooper [Wed, 10 Sep 2014 17:10:39 +0000 (18:10 +0100)]
tools/libxl: Fix stray blank line from debug logging

LOG() automatically adds a newline.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: add missing dependency for xen-init-dom0 in Makefile
Wei Liu [Wed, 10 Sep 2014 15:43:16 +0000 (16:43 +0100)]
libxl: add missing dependency for xen-init-dom0 in Makefile

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxc_cpuid_x86.c: Simplify masking conditions and remove redundant work
Zhuo Song [Wed, 10 Sep 2014 10:29:00 +0000 (18:29 +0800)]
xc_cpuid_x86.c: Simplify masking conditions and remove redundant work

* Since there is no longer a 32-bit hypervisor, we no longer need
  hypervisor_is_64bit().

* Remove xen_64bit from xc_cpuid_pv_policy().

* Move conditionals for LM/NX masking into architectural logic.

* Since RDTSCP could be used for both 64-bit and 32-bit architectures,
  we do not need the tying to 64-bit in intel_xc_cpuid_policy().

* vmx_cpuid_intercept() already covers SYSCALL masking at vmexit time, and
  the original is_64bit or is_pae checks could not tell whether the guest
  OS is really in long mode. Drop the conditionals and leave the real
  work to the vmexit handler.

Signed-off-by: Zhuo Song <songzhuo.sz@alibaba-inc.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ ijc -- inserted missing ) to fix compile error ]