xenbits.xensource.com Git - xen.git/log
10 years ago x86emul: only emulate software interrupt injection for real mode
Jan Beulich [Tue, 23 Sep 2014 12:33:50 +0000 (14:33 +0200)]
x86emul: only emulate software interrupt injection for real mode

Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.

This is XSA-106.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
10 years ago x86/emulate: check cpl for all privileged instructions
Andrew Cooper [Tue, 23 Sep 2014 12:33:06 +0000 (14:33 +0200)]
x86/emulate: check cpl for all privileged instructions

Without this, it is possible for userspace to load its own IDT or GDT.
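
As a rough illustration of the kind of check involved (a Python model, not the emulator's C code; `PRIVILEGED`, `GeneralProtectionFault` and `check_privilege` are hypothetical names), a privileged instruction such as LGDT or LIDT is refused unless the current privilege level is 0:

```python
# Hypothetical model; the real check lives in Xen's C instruction emulator.
PRIVILEGED = {"lgdt", "lidt", "lmsw", "clts", "hlt", "wrmsr"}


class GeneralProtectionFault(Exception):
    """Stands in for injecting #GP into the guest."""


def check_privilege(mnemonic, cpl):
    """Refuse privileged instructions unless running at CPL 0."""
    if mnemonic in PRIVILEGED and cpl != 0:
        raise GeneralProtectionFault(mnemonic)
    return True
```

With such a check in place, guest userspace (CPL 3) can no longer reach the descriptor-table loads through the emulator.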

This is XSA-105.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrei LUTAS <vlutas@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago x86/shadow: fix race condition sampling the dirty vram state
Andrew Cooper [Tue, 23 Sep 2014 12:31:47 +0000 (14:31 +0200)]
x86/shadow: fix race condition sampling the dirty vram state

d->arch.hvm_domain.dirty_vram must be read with the domain's paging lock held.

If not, two concurrent hypercalls could both end up attempting to free
dirty_vram (the second of which will free a wild pointer), or both end up
allocating a new dirty_vram structure (the first of which will be leaked).
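
The pattern can be sketched as follows (a toy Python model, not the hypervisor code; the `Domain` class and its `allocations` counter are hypothetical and exist only to make the effect observable). The pointer is sampled and updated only while the lock is held, so two callers cannot both see it as unset:

```python
import threading


class Domain:
    """Toy stand-in for a domain with a paging lock and a dirty_vram slot."""

    def __init__(self):
        self.paging_lock = threading.Lock()
        self.dirty_vram = None
        self.allocations = 0

    def track_dirty_vram(self):
        # Sample and update the pointer only while holding the lock, so two
        # callers cannot both see None and both allocate.
        with self.paging_lock:
            if self.dirty_vram is None:
                self.dirty_vram = object()
                self.allocations += 1
            return self.dirty_vram
```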

This is XSA-104.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago Use configure --sysconfdir=DIR to set CONFIG_DIR
Olaf Hering [Mon, 22 Sep 2014 13:00:07 +0000 (15:00 +0200)]
Use configure --sysconfdir=DIR to set CONFIG_DIR

Preserve existing behaviour: if the option was not given, set existing
defaults for FreeBSD, Solaris and everything else.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
10 years ago tools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE
Olaf Hering [Mon, 22 Sep 2014 13:00:06 +0000 (15:00 +0200)]
tools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE

Remove hardcoded /var/run/xen directory path, use XEN_RUN_DIR instead.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/libxc: provide variable paths to libxc
Olaf Hering [Mon, 22 Sep 2014 13:00:05 +0000 (15:00 +0200)]
tools/libxc: provide variable paths to libxc

In preparation to remove hardcoded /var/run/xen paths, provide
XEN_RUN_DIR and related directories to xc_private.h. Similar code exists
already for libxl, stubdom and other parts.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/libxl: use buildmakevars2header to create _paths.h
Olaf Hering [Mon, 22 Sep 2014 13:00:04 +0000 (15:00 +0200)]
tools/libxl: use buildmakevars2header to create _paths.h

Replace usage of buildmakevars2file with buildmakevars2header. The macro
generates a C header file, so remove code which converts shell variables
into C defines. Also update the dependency, the macro itself creates a
dependency for _paths.h. A temporary file is not needed anymore.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Config.mk: add new macro buildmakevars2header
Olaf Hering [Mon, 22 Sep 2014 13:00:03 +0000 (15:00 +0200)]
Config.mk: add new macro buildmakevars2header

This macro is similar to buildmakevars2file, it just creates a C header
file instead of shell style syntax. Upcoming changes will use this macro
in libxl and libxc.
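
To illustrate the difference between the two output styles (a Python sketch only; the real macros are make rules in Config.mk, and the exact text they emit is an assumption here, as are the function names):

```python
def shell_style(variables):
    """Render NAME="value" lines, as a file sourced by a shell would contain."""
    return "".join('%s="%s"\n' % (k, v) for k, v in variables.items())


def c_header_style(variables):
    """Render #define NAME "value" lines for inclusion from C."""
    return "".join('#define %s "%s"\n' % (k, v) for k, v in variables.items())
```

A C file can include the second form directly, which is what removes the need for a separate conversion step.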

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Config.mk: replace dependency to genpath with actual target
Olaf Hering [Mon, 22 Sep 2014 13:00:02 +0000 (15:00 +0200)]
Config.mk: replace dependency to genpath with actual target

genpath is a detail of buildmakevars2file. Replace the dependency to
genpath with the actual buildmakevars2file target. This change by
itself does not fix any bug. Upcoming changes will add dependencies to
$(target), but no rule exists to create $(target).

To force a rebuild of the $(1) rule the target now depends on the
existing .phony target. This dummy target is already used elsewhere in
the code.

No change in behaviour is expected by this patch.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Config.mk: move directory list into BUILD_MAKE_VARS
Olaf Hering [Mon, 22 Sep 2014 13:00:01 +0000 (15:00 +0200)]
Config.mk: move directory list into BUILD_MAKE_VARS

To maintain the list of directories in a single place, move the existing
list into its own variable and use it in buildmakevars2file.
Required for upcoming changes.
Also trim whitespace.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago remove obsolete SUBSYS_DIR variable
Olaf Hering [Mon, 22 Sep 2014 13:00:00 +0000 (15:00 +0200)]
remove obsolete SUBSYS_DIR variable

/var/run is a runtime directory. It is not supposed to be packaged.
Remove unused SUBSYS_DIR variable from Config.mk and distro_mapping.txt.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/examples: remove obsolete install targets
Olaf Hering [Mon, 22 Sep 2014 12:59:59 +0000 (14:59 +0200)]
tools/examples: remove obsolete install targets

install-hotplug and install-udev are obsolete since commit 57bcfa11
("tools/hotplug: Separate OS-specific scripts.")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: use XEN_LOCK_DIR instead of hardcoded path
Olaf Hering [Mon, 22 Sep 2014 12:59:58 +0000 (14:59 +0200)]
tools/hotplug: use XEN_LOCK_DIR instead of hardcoded path

Use XEN_LOCK_DIR because it is a compile-time setting.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: create XEN_LOCK_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:57 +0000 (14:59 +0200)]
tools/hotplug: create XEN_LOCK_DIR at runtime

Create XEN_LOCK_DIR because it is a compile-time setting. Also /var/lock
might be empty on startup because it is a tmpfs mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: create XEN_RUN_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:56 +0000 (14:59 +0200)]
tools/hotplug: create XEN_RUN_DIR at runtime

Create XEN_RUN_DIR instead of hardcoded path because it is a compile-time
setting. Also /var/run might be empty on startup because it is a tmpfs
mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/pygrub: store kernels in /var/run/xen/pygrub
Olaf Hering [Mon, 22 Sep 2014 12:59:55 +0000 (14:59 +0200)]
tools/pygrub: store kernels in /var/run/xen/pygrub

Move location of temporary bootfiles from /var/run/xend/boot to
/var/run/xen/pygrub. Create the subdirectory if it does not exist.
The <dir> argument --output-directory must be an existing directory.

The reason for this change is that all entries below /var/run have to be
created at runtime in case /var/run is cleared on every boot.
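
A minimal sketch of creating such a directory on demand (written for current Python 3 and not taken from the pygrub source; `ensure_output_directory` is a hypothetical helper name):

```python
import os


def ensure_output_directory(path):
    """Create path (and any missing parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path
```

Calling the helper repeatedly is harmless, which matters when /var/run is a tmpfs that starts out empty on every boot.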

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools: remove obsolete path.py from tools/python
Olaf Hering [Mon, 22 Sep 2014 12:59:54 +0000 (14:59 +0200)]
tools: remove obsolete path.py from tools/python

The directory tools/python/xen/util does not exist.
Upcoming changes to genpath will fail if the rule persists.
Nothing uses path.py (anymore?), so get rid of it.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed from .gitignore too ]

10 years ago tools/mkrpm: allow custom rpm package name
Olaf Hering [Mon, 22 Sep 2014 12:59:53 +0000 (14:59 +0200)]
tools/mkrpm: allow custom rpm package name

Even if xen is configured and compiled with different --prefix= so that
it operates entirely below $prefix, the resulting package from 'make
rpmball' is always called "xen.rpm".

Use an environment variable to give the package a different name.
This can be used like this:

suffix=-bugN
prefix=/opx/xen/staging${suffix}
./configure --prefix=${prefix}
make rpmball PKG_SUFFIX=${suffix}

The result will be "xen-bugN.rpm" instead of "xen.rpm". The benefit is that
many xen${suffix}.rpm packages can be installed at the same time.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago install.sh: Preserve permissions from make install
Olaf Hering [Mon, 22 Sep 2014 12:59:52 +0000 (14:59 +0200)]
install.sh: Preserve permissions from make install

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/xenpaging: create dumpdir with mode 0700
Olaf Hering [Mon, 22 Sep 2014 12:59:51 +0000 (14:59 +0200)]
tools/xenpaging: create dumpdir with mode 0700

The swapfile contains sensitive guest information.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago stubdom: fix lwip compile
Olaf Hering [Mon, 22 Sep 2014 12:59:50 +0000 (14:59 +0200)]
stubdom: fix lwip compile

stubdom/lwip-x86_64/src/core/dhcp.c: In function 'dhcp_create_request':
stubdom/lwip-x86_64/src/core/dhcp.c:1359:71: error: array subscript is above array bounds [-Werror=array-bounds]
     dhcp->msg_out->chaddr[i] = (i < netif->hwaddr_len) ? netif->hwaddr[i] : 0/* pad byte*/;

gcc cannot know whether hwaddr_len exceeds the hwaddr array size,
so force an upper limit to assist gcc.
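
The idea of clamping the length can be sketched like this (Python rather than the lwip C source; `MAX_HWADDR_LEN = 6` is an assumed array size and `fill_chaddr` is a hypothetical name):

```python
CHADDR_LEN = 16      # size of the DHCP chaddr field
MAX_HWADDR_LEN = 6   # assumed size of the hwaddr array


def fill_chaddr(hwaddr, hwaddr_len):
    """Copy the hardware address into chaddr and zero-pad the rest."""
    # Clamp the length so the index can never run past hwaddr.
    limit = min(hwaddr_len, MAX_HWADDR_LEN, len(hwaddr))
    return [hwaddr[i] if i < limit else 0 for i in range(CHADDR_LEN)]
```

Even a bogus hwaddr_len then only ever reads the bytes that exist; everything else becomes a pad byte.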

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago xen: arm: Enable physical address space compression (PDX) on arm
Ian Campbell [Wed, 17 Sep 2014 21:21:03 +0000 (22:21 +0100)]
xen: arm: Enable physical address space compression (PDX) on arm

This allows us to support sparse physical address maps which we previously
could not because the frametable would end up taking up an enormous fraction
of RAM.

On a fast model which has RAM at 0x80000000-0x100000000 and
0x880000000-0x900000000 this reduces the size of the frametable from
478M to 84M.
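
A simplified sketch of the compression idea, using the memory layout quoted above (this is a Python model under stated assumptions, not Xen's implementation; apart from fill_mask, all helper names are chosen for illustration). Address bits that are zero in every RAM address form a "hole" that can be squeezed out of the page frame number:

```python
PAGE_SHIFT = 12


def fill_mask(value):
    """Set every bit below the highest set bit of value."""
    return (1 << value.bit_length()) - 1


def used_bits(regions):
    """OR together every address bit that is set somewhere in RAM."""
    mask = 0
    for base, size in regions:
        mask |= base | fill_mask(base ^ (base + size - 1))
    return mask


def find_hole(mask):
    """Return (shift, width) of the widest run of zero bits inside mask."""
    best_shift, best_width = 0, 0
    bit = 0
    while bit < mask.bit_length():
        if mask >> bit & 1:
            bit += 1
            continue
        start = bit
        while not (mask >> bit & 1):
            bit += 1
        if bit - start > best_width:
            best_shift, best_width = start, bit - start
    return best_shift, best_width


def pfn_to_pdx(pfn, shift, width):
    """Squeeze the hole out of a page frame number."""
    low = pfn & ((1 << shift) - 1)
    return ((pfn >> (shift + width)) << shift) | low


def pdx_to_pfn(pdx, shift, width):
    """Inverse of pfn_to_pdx for frames that lie in RAM."""
    low = pdx & ((1 << shift) - 1)
    return ((pdx >> shift) << (shift + width)) | low
```

For the two banks above, bits 32 to 34 of the address are never set, so three bits drop out of every frame number and the frametable has a correspondingly smaller index space to cover.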

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: add helpers for PDX mask initialisation calculations
Ian Campbell [Tue, 16 Sep 2014 20:01:41 +0000 (21:01 +0100)]
xen: add helpers for PDX mask initialisation calculations

I wanted to make fill_mask a public function so I could use it on ARM, but it
was actually easier to think of a (semi) reasonable public name for the users
of it, so that is what I have done.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: refactor physical address space compression support into common code
Ian Campbell [Wed, 17 Sep 2014 21:21:01 +0000 (22:21 +0100)]
xen: refactor physical address space compression support into common code

The "pdx compression" functionality will be useful on ARM as well.

Move the code to common code+header and introduce HAS_PDX to control when it is
built. L2_PAGETABLE_SHIFT is x86 specific, so introduce PDX_GROUP_SHIFT to
abstract it out.

ARM has no need for superpage compression (yet?) and lacks SUPERPAGE_SHIFT so
those functions (spage_to_mfn et al) are not moved.

No effect on x86 and no change for ARM (yet).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago xen: arm: support for up to 48-bit IPA addressing on arm64
Ian Campbell [Thu, 18 Sep 2014 00:09:55 +0000 (01:09 +0100)]
xen: arm: support for up to 48-bit IPA addressing on arm64

Currently we support only 40 bits. This is insufficient on systems where
peripherals which need to be 1:1 mapped to dom0 are above the 40-bit limit.

Unfortunately the hardware requirements are such that this means that the
number of levels in the P2M is not static and must vary with the number of
implemented physical address bits. This is described in ARM DDI 0487A.b Table
D4-5. In short there is no single p2m configuration which supports everything
from 40 to 48 bits.

For example a system which supports up to 40-bit addressing will only support 3
level p2m (maximum SL0 is 1 == 3 levels), requiring a concatenated page table
root consisting of two pages to make the full 40-bits of addressing.

A maximum of 16 pages can be concatenated meaning that a 3 level p2m can only
support up to 43-bit addresses. Therefore support for 48-bit addressing
requires SL0==2 (4 levels of paging).
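
The arithmetic behind these limits can be checked with a short sketch, assuming 4K pages, where each level resolves 9 bits and the page offset accounts for 12 (the function names are hypothetical):

```python
PAGE_SHIFT = 12        # 4K pages
BITS_PER_LEVEL = 9     # 512 entries per table
MAX_ROOT_PAGES = 16    # limit on concatenation stated above


def bits_covered(levels, root_pages=1):
    """IPA bits reachable with the given levels and concatenated root pages."""
    extra = root_pages.bit_length() - 1   # log2 for powers of two
    return PAGE_SHIFT + BITS_PER_LEVEL * levels + extra


def root_pages_needed(ipa_bits, levels):
    """Smallest power-of-two root size covering ipa_bits, or None if > 16."""
    shortfall = max(0, ipa_bits - bits_covered(levels))
    pages = 1 << shortfall
    return pages if pages <= MAX_ROOT_PAGES else None
```

Three levels with a single root page cover 39 bits, two concatenated pages give 40, sixteen give 43, and only a fourth level reaches 48.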

After the previous patches our various p2m lookup and manipulation functions
already support starting at arbitrary level and with arbitrary root
concatenation. All that remains is to determine the correct settings from
ID_AA64MMFR0_EL1.PARange for which we use a lookup table.

As well as supporting 44 and 48 bit addressing we can also reduce the order of
the first level for systems which support only 32 or 36 physical address bits,
saving a page.

Systems with 42 bits are an interesting case, since they only support 3 levels
of paging, implying that 8 pages are required at the root level. So far I am
not aware of any systems with peripherals located so high up (the only 42-bit
system I've seen has nothing above 40 bits), so such systems remain configured
for 40-bit IPA with a pair of pages at the root of the p2m.
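
The selection described in the preceding paragraphs could be modelled as below. This is a sketch only: the actual lookup table in the patch is not reproduced here, the level count used for 32 and 36 bits is an assumption, and `p2m_config` is a hypothetical name. The PARange encodings are the architectural ones.

```python
# ID_AA64MMFR0_EL1.PARange encoding -> implemented physical address bits.
PARANGE_BITS = {0: 32, 1: 36, 2: 40, 3: 42, 4: 44, 5: 48}


def p2m_config(parange):
    """Return (ipa_bits, levels, root_pages) following the description above."""
    pa_bits = PARANGE_BITS[parange]
    ipa_bits = 40 if pa_bits == 42 else pa_bits   # 42-bit systems stay at 40
    levels = 3 if ipa_bits <= 40 else 4
    covered = 12 + 9 * levels                     # 4K pages, 9 bits per level
    root_pages = 1 << max(0, ipa_bits - covered)
    return ipa_bits, levels, root_pages
```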

Switching to symbolic names for the VTCR_EL2 bits as we go improves the clarity
of the result.

Parts of this are derived from "xen/arm: Add 4-level page table for
stage 2 translation" by Vijaya Kumar K.

arm32 remains with the static 3-level, 2 page root configuration.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: support for up to 48-bit physical addressing on arm64
Ian Campbell [Thu, 18 Sep 2014 00:09:54 +0000 (01:09 +0100)]
xen: arm: support for up to 48-bit physical addressing on arm64

This only affects Xen's own stage one paging.

- Use symbolic names for TCR bits for clarity.
- Update PADDR_BITS
- Base field of LPAE PT structs is now 36 bits (and therefore
  unsigned long long for arm32 compatibility)
- TCR_EL2.PS is set from ID_AA64MMFR0_EL1.PASize.
- Provide decode of ID_AA64MMFR0_EL1 in CPU info

Parts of this are derived from "xen/arm: Add 4-level page table for
stage 2 translation" by Vijaya Kumar K.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: handle variable p2m levels in apply_p2m_changes
Ian Campbell [Thu, 18 Sep 2014 00:09:53 +0000 (01:09 +0100)]
xen: arm: handle variable p2m levels in apply_p2m_changes

As with previous changes this involves conversion from a linear series of
lookups into a loop over the levels.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: handle variable p2m levels in p2m_lookup
Ian Campbell [Thu, 18 Sep 2014 00:09:52 +0000 (01:09 +0100)]
xen: arm: handle variable p2m levels in p2m_lookup

This paves the way for boot-time selection of the number of levels to
use in the p2m, which is required to support both 40-bit and 48-bit
systems. For now the starting level remains a compile time constant.

Implemented by turning the linear sequence of lookups into a loop.
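
A loop-based walk of this kind might look as follows (a Python model in which tables are dictionaries; it assumes 4K pages with 9 bits per level, and `table_index` and `lookup` are hypothetical names rather than the functions in p2m.c):

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
LAST_LEVEL = 3


def table_index(addr, level):
    """Index into the table at the given level (0 is the topmost)."""
    shift = PAGE_SHIFT + BITS_PER_LEVEL * (LAST_LEVEL - level)
    return (addr >> shift) & ((1 << BITS_PER_LEVEL) - 1)


def lookup(root, addr, start_level):
    """Walk from start_level down to level 3; tables are dicts in this model."""
    table = root
    for level in range(start_level, LAST_LEVEL + 1):
        entry = table.get(table_index(addr, level))
        if entry is None:
            return None                 # invalid entry: no mapping
        if not isinstance(entry, dict):
            return entry                # leaf: a block or page mapping
        table = entry
    return None
```

Because the starting level is just a parameter of the loop, the same code serves a 3-level and a 4-level configuration.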

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: Defer setting of VTCR_EL2 until after CPUs are up
Ian Campbell [Thu, 18 Sep 2014 00:09:51 +0000 (01:09 +0100)]
xen: arm: Defer setting of VTCR_EL2 until after CPUs are up

Currently we retain the hardcoded values but soon we will want to calculate the
correct values based upon the CPU properties common to all processors, which
are only available once they are all up.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: move setup_virt_paging to p2m.[ch] from mm.[ch]
Ian Campbell [Thu, 18 Sep 2014 00:09:50 +0000 (01:09 +0100)]
xen: arm: move setup_virt_paging to p2m.[ch] from mm.[ch]

This file is where most of the P2M logic lives and this function will
eventually need to poke at some internals, so move it.

This is pure code motion.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: handle concatenated root tables in dump_pt_walk
Ian Campbell [Thu, 18 Sep 2014 00:09:49 +0000 (01:09 +0100)]
xen: arm: handle concatenated root tables in dump_pt_walk

ARM allows for the concatenation of pages at the root of a p2m (but not a
regular page table) in order to support a larger IPA space than the number of
levels in the P2M would normally support. We use this to support 40-bit guest
addresses.

Previously we were unable to dump IPAs which were outside the first page of the
root. To fix this we adjust dump_pt_walk to take the machine address of the
page table root instead of expecting the caller to have mapped it. This allows
the walker code to select the correct page to map.
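
Selecting the right root page comes down to a little index arithmetic, sketched here under the assumption of 4K pages and 9 bits per level (`root_page_and_index` is a hypothetical name):

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9


def root_page_and_index(ipa, levels):
    """Split an IPA into (root page number, index within that page)."""
    # Bits resolved below the root level, including the page offset.
    below_root = PAGE_SHIFT + BITS_PER_LEVEL * (levels - 1)
    slot = ipa >> below_root                  # index across all root pages
    return slot >> BITS_PER_LEVEL, slot & ((1 << BITS_PER_LEVEL) - 1)
```

With a 3-level p2m and a two-page root, any IPA at or above bit 39 lands in the second page, which is the case the old interface could not dump.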

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: Implement variable levels in dump_pt_walk
Ian Campbell [Thu, 18 Sep 2014 00:09:48 +0000 (01:09 +0100)]
xen: arm: Implement variable levels in dump_pt_walk

This allows us to correctly dump 64-bit hypervisor addresses, which use a 4
level table.

It also paves the way for boot-time selection of the number of levels to use in
the p2m, which is required to support both 40-bit and 48-bit systems.

To support multiple levels it is convenient to recast the page table walk as a
loop over the levels instead of the current open coding.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: rename p2m->first_level to p2m->root.
Ian Campbell [Thu, 18 Sep 2014 00:09:47 +0000 (01:09 +0100)]
xen: arm: rename p2m->first_level to p2m->root.

This was previously part of Vijaya's "xen/arm: Add 4-level page table
for stage 2 translation" but is split out here to make that patch
easier to read.

I went with ->root rather than ->root_level as the original did.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Cc: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
10 years ago tools: libxl: read nictype from xenstore
Wen Congyang [Mon, 22 Sep 2014 05:59:16 +0000 (13:59 +0800)]
tools: libxl: read nictype from xenstore

We need to use nictype to get the default vifname.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools: libxl: pass correct file to qemu if we use blktap2
Wen Congyang [Mon, 22 Sep 2014 05:59:15 +0000 (13:59 +0800)]
tools: libxl: pass correct file to qemu if we use blktap2

If we use blktap2, the correct file is the blktap device,
not the pdev_path.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools: libxc: restore: csum the correct page
Wen Congyang [Mon, 22 Sep 2014 05:59:14 +0000 (13:59 +0800)]
tools: libxc: restore: csum the correct page

In verify mode, we map the guest memory, and the guest page is
region_base + i * PAGE_SIZE. So we should csum page (region_base
+ i * PAGE_SIZE), not (region_base + (i+curbatch) * PAGE_SIZE)

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools: libxc: restore: copy the correct page to memory
Hong Tao [Mon, 22 Sep 2014 05:59:13 +0000 (13:59 +0800)]
tools: libxc: restore: copy the correct page to memory

apply_batch() only handles MAX_BATCH_SIZE pages at one time. If
there is some bogus/unmapped/allocate-only/broken page, we will
skip it. So when we call apply_batch() again, the first page's
index is curbatch - invalid_pages. invalid_pages stores the number
of bogus/unmapped/allocate-only/broken pages we have found.

In many cases, invalid_pages is 0, so we don't catch this error.

Signed-off-by: Hong Tao <bobby.hong@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Update libfdt to v1.4.0
Roy Franz [Thu, 18 Sep 2014 22:50:05 +0000 (15:50 -0700)]
Update libfdt to v1.4.0

Update libfdt to v1.4.0, taken from git://git.jdl.com/software/dtc.git.
Xen changes to libfdt_env.h are carried over from the existing libfdt (v1.3.0).
This update provides the fdt_create_empty_tree() function used by the ARM
EFI boot code.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago add arm64 cache flushing code from linux v3.16
Roy Franz [Thu, 18 Sep 2014 22:50:04 +0000 (15:50 -0700)]
add arm64 cache flushing code from linux v3.16

__flush_dcache_all added from arch/arm64/mm/cache.S, with helper macros from
arch/arm64/include/asm/assembler.h, from v3.16.  The cache flushing is required
when transitioning from EFI code that runs with cache enabled to Xen startup
code which expects the cache to be disabled.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed indent on ENTRY() and dropped the entry point label which
         duplicates the one from the macro. ]

10 years ago VT-d: suppress UR signaling for further desktop chipsets
Jan Beulich [Thu, 18 Sep 2014 13:03:22 +0000 (15:03 +0200)]
VT-d: suppress UR signaling for further desktop chipsets

This extends commit d6cb14b34f ("VT-d: suppress UR signaling for
desktop chipsets") as per the finally obtained list of affected
chipsets from Intel.

Also pad the IDs we had listed there before to full 4 hex digits.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago x86: handle resumed instruction based on previous mem_event reply
Razvan Cojocaru [Thu, 18 Sep 2014 12:57:45 +0000 (14:57 +0200)]
x86: handle resumed instruction based on previous mem_event reply

In a scenario where a page fault that triggered a mem_event occurred,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it without allowing it to perform
any writes.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago x86, libxc: force-enable relevant MSR events
Razvan Cojocaru [Thu, 18 Sep 2014 12:56:43 +0000 (14:56 +0200)]
x86, libxc: force-enable relevant MSR events

vmx_disable_intercept_for_msr() will now refuse to disable interception of
MSRs needed for memory introspection. It is not possible to gate this on
mem_access being active for the domain, since by the time mem_access does
become active the interception for the interesting MSRs has already been
disabled (vmx_disable_intercept_for_msr() runs very early on).

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago x86: optimize introspection access to guest state
Razvan Cojocaru [Thu, 18 Sep 2014 12:54:58 +0000 (14:54 +0200)]
x86: optimize introspection access to guest state

Speed optimization for introspection purposes: a handful of registers
are sent along with each mem_event. This requires enlargement of the
mem_event_request / mem_event_response structures, and additional code
to fill in relevant values. Since the EPT event processing code needs
more data than CR3 or MSR event processors, hvm_mem_event_fill_regs()
fills in less data than p2m_mem_event_fill_regs(), in order to avoid
overhead. Struct hvm_hw_cpu has been considered instead of the custom
struct mem_event_regs_st, but its size would cause quick filling up
of the mem_event ring buffer.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago x86/HVM: emulate with no writes
Razvan Cojocaru [Thu, 18 Sep 2014 12:53:52 +0000 (14:53 +0200)]
x86/HVM: emulate with no writes

Added support for emulating an instruction with no memory writes.
Additionally, introduced hvm_emulate_one_full(), which inspects
possible return values from the hvm_emulate_one() functions
(EXCEPTION, UNHANDLEABLE) and acts on them.
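
One way to picture emulation without writes, assuming the emulator reaches guest memory through replaceable hooks (a Python sketch; `emulate_store`, `write_normal` and `write_discard` are hypothetical names, not Xen functions):

```python
def emulate_store(memory, addr, value, write):
    """Emulate a store by delegating the memory access to a write hook."""
    write(memory, addr, value)
    return "OKAY"


def write_normal(memory, addr, value):
    """Ordinary hook: the store reaches guest memory."""
    memory[addr] = value


def write_discard(memory, addr, value):
    """No-write hook: report success but leave guest memory untouched."""
    pass
```

The instruction still completes from the guest's point of view; only its effect on memory is dropped.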

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago x86/HVM: batch vCPU wakeups
Jan Beulich [Thu, 18 Sep 2014 12:44:58 +0000 (14:44 +0200)]
x86/HVM: batch vCPU wakeups

Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
especially when many of the remote pCPU-s are in deep C-states. For
64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware,
accumulated times of over 2ms were observed (average 1.1ms).
Considering that Windows broadcasts IPIs from its timer interrupt,
which at least at certain times can run at 1kHz, it is clear that this
can't result in good guest behavior. In fact, on said hardware guests
with significantly beyond 40 vCPU-s simply hung when e.g. ServerManager
gets started.

This isn't just helping to reduce the number of ICR writes when the
host APICs run in clustered mode, it also reduces them by suppressing
the sends altogether when - by the time
cpu_raise_softirq_batch_finish() is reached - the remote CPU already
managed to handle the softirq. Plus - when using MONITOR/MWAIT - the
update of softirq_pending(cpu), being on the monitored cache line -
should make the remote CPU wake up ahead of the ICR being sent,
allowing the wait-for-ICR-idle latencies to be reduced (perhaps to a
large part due to overlapping the wakeups of multiple CPUs).
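
The batching idea can be modelled roughly as follows (a Python sketch, not the softirq code itself; the `SoftirqBatch` class and its method names are hypothetical). IPIs are deferred while the batch is open, and CPUs that have already handled the softirq by the time the batch finishes are skipped:

```python
class SoftirqBatch:
    """Collect IPI targets and send them in one go when the batch ends."""

    def __init__(self, pending, send_ipi_mask):
        self.pending = pending            # cpu id -> pending softirq bitmap
        self.send_ipi_mask = send_ipi_mask
        self.targets = set()

    def raise_softirq(self, cpu, nr):
        if not self.pending[cpu] & (1 << nr):
            self.pending[cpu] |= 1 << nr
            self.targets.add(cpu)         # defer the IPI

    def finish(self, nr):
        # Drop CPUs that already handled the softirq in the meantime.
        still = {c for c in self.targets if self.pending[c] & (1 << nr)}
        self.targets.clear()
        if still:
            self.send_ipi_mask(sorted(still))
```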

With this alone (i.e. without the IPI avoidance patch in place),
average broadcast times for a 64-vCPU guest went down to a measured
maximum of 310us. With that other patch in place, improvements aren't
as clear anymore (short term averages only went down from 255us to
250us, which clearly is within the error range of the measurements),
but longer term an improvement of the averages is still visible.
Depending on hardware, long term maxima were observed to go down quite
a bit (on aforementioned hardware), while they were seen to go up
again on a (single core) Nehalem (where instead the improvement on the
average values was more visible).

Of course this necessarily increases the latencies for the remote
CPU wakeup at least slightly. To weigh between the effects, the
condition to enable batching in vlapic_ipi() may need further tuning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago x86: suppress event check IPI to MWAITing CPUs
Jan Beulich [Thu, 18 Sep 2014 12:43:49 +0000 (14:43 +0200)]
x86: suppress event check IPI to MWAITing CPUs

Mass wakeups (via vlapic_ipi()) can take enormous amounts of time,
especially when many of the remote pCPU-s are in deep C-states. For
64-vCPU Windows Server 2012 R2 guests on Ivybridge hardware,
accumulated times of over 2ms were observed (average 1.1ms).
Considering that Windows broadcasts IPIs from its timer interrupt,
which at least at certain times can run at 1kHz, it is clear that this
can't result in good guest behavior. In fact, on said hardware guests
with significantly beyond 40 vCPU-s simply hung when e.g. ServerManager
gets started.

Recognizing that writes to softirq_pending() already have the effect of
waking remote CPUs from MWAITing (due to being co-located on the same
cache line with mwait_wakeup()), we can avoid sending IPIs to CPUs we
know are in a (deep) C-state entered via MWAIT.
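
In outline, the decision looks like this (a Python model only; the `Cpu` class, the `in_mwait` flag and the `send_ipi` callback are hypothetical stand-ins for the per-CPU state Xen actually consults):

```python
class Cpu:
    """Minimal per-CPU state for the model."""

    def __init__(self, in_mwait=False):
        self.softirq_pending = 0
        self.in_mwait = in_mwait


def raise_softirq(cpus, targets, nr, this_cpu, send_ipi):
    """Mark softirq nr pending on targets and IPI only those that need it."""
    for cpu_id in targets:
        cpu = cpus[cpu_id]
        already = cpu.softirq_pending & (1 << nr)
        cpu.softirq_pending |= 1 << nr       # this store wakes an MWAITing CPU
        if already or cpu_id == this_cpu or cpu.in_mwait:
            continue                         # no IPI required
        send_ipi(cpu_id)
```

The pending bit is always set; only the comparatively expensive IPI is skipped for the local CPU and for CPUs that the store itself will wake.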

With this, average broadcast times for a 64-vCPU guest went down to a
measured maximum of 255us (which is still quite a lot).

One aspect worth noting is that cpumask_raise_softirq() gets brought in
sync here with cpu_raise_softirq() in that now both don't attempt to
raise a self-IPI on the processing CPU.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago x86/hvm: always set pending event injection when loading VMC[BS] state
Wen Congyang [Thu, 18 Sep 2014 10:08:45 +0000 (12:08 +0200)]
x86/hvm: always set pending event injection when loading VMC[BS] state

In COLO mode, the secondary VM is running, so VM_ENTRY_INTR_INFO may be
valid before the VMCS is restored. If there is no pending event after
restoring the VM, we should clear it.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Also clear pending software exceptions.
Copy the fix to SVM as well.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
10 years ago x86/p2m: fix conversion macro of p2m_access to XENMEM_access
Tamas K Lengyel [Thu, 18 Sep 2014 09:41:03 +0000 (11:41 +0200)]
x86/p2m: fix conversion macro of p2m_access to XENMEM_access

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 17 Sep 2014 19:15:28 +0000 (20:15 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years ago xl: long output of "list" command now contains Dom0 information
Wei Liu [Tue, 16 Sep 2014 10:01:18 +0000 (11:01 +0100)]
xl: long output of "list" command now contains Dom0 information

As we've already generated a JSON config for Dom0, print that out when
requested.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xl: use libxl_retrieve_domain_configuration and JSON format
Wei Liu [Tue, 16 Sep 2014 10:01:17 +0000 (11:01 +0100)]
xl: use libxl_retrieve_domain_configuration and JSON format

Before this change, xl stores domain configuration in "xl" format, which
is in fact a verbatim copy of user supplied domain config.

Now that libxl provides a new API to retrieve the domain configuration, switch
to that new API and store the configuration in JSON format.

Tests done so far (xl.{new,old} denotes xl with{,out} "libxl-json"
support):

1. xl.new create then xl.new save, hexdump saved file: domain config
   saved in JSON format
2. xl.new create, xl.new save then xl.old restore: failed on
   mandatory flag check
3. xl.new create, xl.new save then xl.new restore: succeeded
4. xl.old create, xl.old save then xl.new restore: succeeded
5. xl.new create then local migrate, receiving end xl.new: succeeded
6. xl.old create then local migrate, receiving end xl.new: succeeded

Note that "xl" config is still supported and handled when restarting a
domain. "xl" config file takes precedence over "libxl-json" in that
case, so that users who use "config-update" to store a new config file
won't see a regression. All other scenarios (migration, domain listing
etc.) now use the new API.

Lastly, print a warning when users invoke "config-update" to
discourage them from using this command.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: introduce libxl_userdata_unlink
Wei Liu [Tue, 16 Sep 2014 10:01:16 +0000 (11:01 +0100)]
libxl: introduce libxl_userdata_unlink

This will be used in a later patch for xl to remove its "xl" userdata
file.

Both CTX lock and userdata lock are taken in this API. CTX lock is taken
to maintain locking hierarchy, but it also has a side effect to protect
against R-M-W by other threads. Userdata lock is used to protect against
domain destruction.

In general, an application should not rely on these internal locks to
protect its own userdata files. It should deploy its own lock if it
cares.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: introduce libxl_retrieve_domain_configuration
Wei Liu [Tue, 16 Sep 2014 10:01:15 +0000 (11:01 +0100)]
libxl: introduce libxl_retrieve_domain_configuration

Introduce a new public API to return domain configuration. This returned
configuration can be used to rebuild a domain.

Note that this configuration only describes the configuration necessary
to reproduce the guest visible state and does not necessarily include
specific decisions made by the toolstack regarding its current
incarnation (e.g. disk backend) unless they were specified by the
application when the domain was created.

With this approach we can preserve what user has provided in the
original configuration as well as valuable information from xenstore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: refactor libxl_get_memory_target
Wei Liu [Tue, 16 Sep 2014 10:01:14 +0000 (11:01 +0100)]
libxl: refactor libxl_get_memory_target

Introduce a helper function which can return both "target" node and
"static-max" node of a domain. Reimplement libxl_get_memory_target using
this helper. libxl__fill_dom0_memory_info is adjusted as well.

This helper will be used in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: make libxl_cd_insert "eject" + "insert"
Wei Liu [Tue, 16 Sep 2014 10:01:13 +0000 (11:01 +0100)]
libxl: make libxl_cd_insert "eject" + "insert"

We introduce an intermediate empty state when inserting media into
CDROM. The scheme works like this:

  lock json config
  write empty state to xenstore
  for (;;) {
      write user supplied disk to JSON
      write disk information to xenstore
  }
  unlock json config

Bear in mind that all steps can fail. With the proposed scheme, we now
know, if xenstore is empty, then CDROM should be considered empty;
otherwise we should use JSON version of CDROM configuration.
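
The read-side rule above can be sketched as follows. This is a minimal
illustration in Python, not libxl code: effective_cdrom, cd_insert, the
fail_after knob and the "params" key are all hypothetical, and plain dicts
stand in for xenstore and the JSON config.

```python
def effective_cdrom(xenstore_entry, json_entry):
    """Return the medium a reader should trust, or None for an empty drive."""
    # An empty xenstore state wins: the drive is considered empty, whatever
    # a half-finished insert may have left behind in JSON.
    if not xenstore_entry.get("params"):
        return None
    # Otherwise the JSON version of the CDROM configuration is used.
    return json_entry.get("params")


def cd_insert(xenstore_entry, json_entry, new_medium, fail_after=None):
    """Run the eject + insert steps in order; fail_after simulates a failure."""
    steps = [
        lambda: xenstore_entry.update(params=""),          # write empty state
        lambda: json_entry.update(params=new_medium),      # write disk to JSON
        lambda: xenstore_entry.update(params=new_medium),  # write disk to xenstore
    ]
    for index, step in enumerate(steps):
        if fail_after is not None and index >= fail_after:
            return False  # a step failed; later steps never run
        step()
    return True
```

Whichever step fails, a reader either sees the old medium, an empty drive,
or the new medium, never a xenstore entry that disagrees with JSON.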

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: synchronise configuration when we hotplug a device
Wei Liu [Tue, 16 Sep 2014 10:01:12 +0000 (11:01 +0100)]
libxl: synchronise configuration when we hotplug a device

We update JSON version first, then write to xenstore, so that we
maintain the following invariant: any device which is present in
xenstore has a corresponding entry in JSON.

The workflow is as follows:
   lock json config
       read json config
       update in-memory json config with new entry, replacing
         any stale entry
       for loop
           open xs transaction
           check device existence, abort if it exists
           write in-memory json config to disk
           commit xs transaction
       end for loop
   unlock json config

Please see comment in libxl_internal.h for correctness proof.
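
The ordering in the workflow above can be sketched as follows. This is an
illustration in Python under simplifying assumptions, not the libxl
implementation: hotplug_add, invariant_holds, DeviceExists and the "devices"
key are hypothetical, dicts stand in for the on-disk file and for xenstore,
and locking and the transaction retry loop are left out.

```python
import json


class DeviceExists(Exception):
    """The device is already present in the stand-in xenstore."""


def hotplug_add(disk, xenstore, devid, device):
    """Add a device in the order described above: JSON first, then xenstore.

    disk maps a file name to its text and xenstore maps devid to a device
    dict; both are hypothetical stand-ins.
    """
    config = json.loads(disk["libxl-json"])        # read json config
    # Update the in-memory config, replacing any stale entry.
    config.setdefault("devices", {})[devid] = device

    if devid in xenstore:                          # check device existence
        raise DeviceExists(devid)                  # abort, nothing is written
    disk["libxl-json"] = json.dumps(config)        # JSON reaches "disk" first
    xenstore[devid] = device                       # then xenstore is updated


def invariant_holds(disk, xenstore):
    """Every device present in xenstore has a corresponding JSON entry."""
    devices = json.loads(disk["libxl-json"]).get("devices", {})
    return all(devid in devices for devid in xenstore)
```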

As those routines are called both during domain creation and device
hotplug, we add a flag to indicate whether we need to update JSON
config. This flag is only set to true when we hotplug a device. We
cannot update JSON config during domain creation as JSON config is
committed to disk only when domain creation finishes.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: rework domain userdata file lock
Wei Liu [Tue, 16 Sep 2014 10:01:11 +0000 (11:01 +0100)]
libxl: rework domain userdata file lock

The lock introduced in d2cd9d4f ("libxl: functions to lock / unlock
libxl userdata store") has a bug that can leak the lock file when domain
destruction races with other functions that try to get hold of the lock.

There are several issues:
1. The lock is released too early when libxl__userdata_destroyall
   deletes everything in userdata store, including the lock file.
2. The check of domain existence is only done at the beginning of lock
   function; by the time the lock is acquired, the domain might
   already be gone.

The effect of these two issues is that we can run into the following situation:

     Process 1                        Process 2 domain destruction
   # LOCK FUNCTION                 # LOCK FUNCTION
    check domain existence          check domain existence
                                    acquire lock (file created)
                                   # LOCK FUNCTION
                                    destroy all files (lock file deleted,
                                                       lock released)
    acquire lock (file created)
   # LOCK FUNCTION                  destroy domain
                                   # UNLOCK (close fd only)
   [ lock file leaked ]

Fix this problem by deploying the following changes:

1. Unlink lock file in unlock function.
2. Modify libxl__userdata_destroyall to not delete domain-userdata-lock,
   so that the lock remains held until unlock function is called.
3. Check domain still exists when the lock is acquired, unlock if
   domain is already gone.
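
The three changes can be sketched as follows. This is a simplified
illustration in Python, not the libxl code: lock_userdata, unlock_userdata,
destroy_all and the domain_exists callback are hypothetical names, and the
device/inode re-check of the lock file is omitted for brevity.

```python
import fcntl
import os

LOCK_NAME = "domain-userdata-lock"


def unlock_userdata(fd, path):
    """Change 1: the unlock function itself unlinks the lock file."""
    os.unlink(path)
    os.close(fd)  # closing the fd drops the flock


def lock_userdata(path, domain_exists):
    """Change 3: re-check that the domain exists once the lock is held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)
    if not domain_exists():
        unlock_userdata(fd, path)  # domain already gone: leave nothing behind
        return None
    return fd


def destroy_all(directory):
    """Change 2: delete every userdata file except the lock file."""
    for name in os.listdir(directory):
        if name != LOCK_NAME:
            os.unlink(os.path.join(directory, name))
```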

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoVMX: don't unintentionally leave x2APIC MSR intercepts disabled
Jan Beulich [Tue, 16 Sep 2014 11:58:20 +0000 (13:58 +0200)]
VMX: don't unintentionally leave x2APIC MSR intercepts disabled

These should be re-enabled in particular when the virtualized APIC
transitions to HW-disabled state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years agox86: show page walk when create_bounce_frame() encounters a fault
Jan Beulich [Tue, 16 Sep 2014 11:57:44 +0000 (13:57 +0200)]
x86: show page walk when create_bounce_frame() encounters a fault

... getting the native code in sync with the compat mode one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agopassthrough: streamline _hvm_dirq_assist()
Jan Beulich [Tue, 16 Sep 2014 11:56:45 +0000 (13:56 +0200)]
passthrough: streamline _hvm_dirq_assist()

The loop inside this function was calling two functions with loop-
invariable arguments which clearly don't need calling more than once:
send_guest_pirq() and __msi_pirq_eoi(). After moving these out of the
loop it further became apparent that folding the hvm_pci_msi_assert()
helper into the main function can further help readability.

In the course of this I noticed that __hvm_dpci_eoi() called
hvm_pci_intx_deassert() unconditionally, whereas hvm_pci_intx_assert()
(correctly) got called only when !hvm_domain_use_pirq(), so the former
is being made conditional now too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoxen/arm: check for GICv3 platform support
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:49 +0000 (16:39 +0530)]
xen/arm: check for GICv3 platform support

ID_AA64PFR0_EL1 register provides information about GIC support.
Check for this register in GICv3 driver.

Also print GICv3 support information in the boot log.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: add SGI handling for GICv3
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:48 +0000 (16:39 +0530)]
xen/arm: add SGI handling for GICv3

In ARMv8, a write to the ICC_SGI1R_EL1 register raises a trap to EL2.
Handle the trap and inject the SGI to the vcpu.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Update Dom0 GIC dt node with GICv3 information
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:47 +0000 (16:39 +0530)]
xen/arm: Update Dom0 GIC dt node with GICv3 information

Update GIC device tree node for DOM0 with GICv3
information. GIC hw specific device tree information
is moved to respective GIC driver.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Add virtual GICv3 support
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:46 +0000 (16:39 +0530)]
xen/arm: Add virtual GICv3 support

Add virtual GICv3 driver support.
Also, with this patch vgic_irq_rank structure is modified to
hold GICv2 GICD_TARGET and GICv3 GICD_ROUTER registers under
union.

This patch adds only basic GICv3 support.
It does not support the Interrupt Translation Service (ITS).

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoxen/arm: Add support for GIC v3
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:45 +0000 (16:39 +0530)]
xen/arm: Add support for GIC v3

Add support for the GIC v3 specification. System register access (SRE)
is enabled to access cpu and virtual interface registers. Based
on the kernel GICv3 driver.

This patch adds only basic v3 support.
It does not support the Interrupt Translation Service (ITS).

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoxen/arm: Add vgic callback to read irq priority
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:44 +0000 (16:39 +0530)]
xen/arm: Add vgic callback to read irq priority

Use callback in vgic driver to read priority for
a given irq

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
10 years agoxen/arm: Calculate irq rank from irq number
Vijaya Kumar K [Fri, 12 Sep 2014 11:09:43 +0000 (16:39 +0530)]
xen/arm: Calculate irq rank from irq number

The current irq rank calculation is not generic: it assumes a
hardware register size value, which does not work
for future GIC versions like V3.
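
The idea can be sketched as follows: derive the interrupt number first, then
the rank from that number, so the rank no longer depends on the width of any
one register. This is an illustration in Python, not the Xen code. The
function names are hypothetical, and the sketch assumes one rank covers 32
interrupts.

```python
IRQS_PER_RANK = 32  # assumption: one rank covers 32 interrupts


def irq_from_offset(byte_offset, bits_per_irq):
    """Interrupt number addressed by a byte offset into a per-IRQ register bank."""
    return (byte_offset * 8) // bits_per_irq


def rank_of_irq(irq):
    """Return (rank index, position inside the rank) for an interrupt number."""
    return irq // IRQS_PER_RANK, irq % IRQS_PER_RANK
```

With 8 bits per interrupt, byte offset 40 addresses interrupt 40; with 64
bits per interrupt the same interrupt sits at byte offset 320. Both resolve
to the same rank once the interrupt number is known.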

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agox86/APIC: reduce self-IPI related code
Jan Beulich [Fri, 12 Sep 2014 11:48:37 +0000 (13:48 +0200)]
x86/APIC: reduce self-IPI related code

send_IPI_self_{phys,flat}() were identical and send_IPI_self_x2apic()
was misplaced and pointlessly (implicitly) had a non-x2APIC code path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoxen: arm: fix boot on arndale.
Ian Campbell [Thu, 11 Sep 2014 12:55:08 +0000 (13:55 +0100)]
xen: arm: fix boot on arndale.

The differences between Arndale and the Odroid-XU are more interesting
than first thought, which results in 0bf8ddecb4df "xen/arm: Add
support for the Odroid-XU board." breaking boot on arndale.

Revert back to arndale compatible behaviour while we sort this out.
Specifically we must (counterintuitively) use the regular (!ns)
sysram and the correct offset is 0x0 and 0x1c.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agolibxc/bitops: Add or() to the available bitmap operations
Andrew Cooper [Wed, 10 Sep 2014 17:10:42 +0000 (18:10 +0100)]
libxc/bitops: Add or() to the available bitmap operations

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/[lib]xl: Correct use of init/dispose for libxl_domain_restore_params
Andrew Cooper [Wed, 10 Sep 2014 17:10:40 +0000 (18:10 +0100)]
tools/[lib]xl: Correct use of init/dispose for libxl_domain_restore_params

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxl: Fix stray blank line from debug logging
Andrew Cooper [Wed, 10 Sep 2014 17:10:39 +0000 (18:10 +0100)]
tools/libxl: Fix stray blank line from debug logging

LOG() automatically adds a newline.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: add missing dependency for xen-init-dom0 in Makefile
Wei Liu [Wed, 10 Sep 2014 15:43:16 +0000 (16:43 +0100)]
libxl: add missing dependency for xen-init-dom0 in Makefile

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxc_cpuid_x86.c: Simplify masking conditions and remove redundant work
Zhuo Song [Wed, 10 Sep 2014 10:29:00 +0000 (18:29 +0800)]
xc_cpuid_x86.c: Simplify masking conditions and remove redundant work

* Since there is no longer a 32-bit hypervisor, we no longer need
  hypervisor_is_64bit().

* Remove xen_64bit from xc_cpuid_pv_policy().

* Move conditionals for LM/NX masking into architectural logic.

* Since RDTSCP could be used for both 64-bit and 32-bit architectures,
  we do not need the tying to 64-bit in intel_xc_cpuid_policy().

* vmx_cpuid_intercept() already covers SYSCALL masking on vmexit, and the
  original is_64bit or is_pae checks could not tell whether the guest OS is
  really in long mode. Drop the conditionals and leave it to the
  vmexit handler to do the real work.

Signed-off-by: Zhuo Song <songzhuo.sz@alibaba-inc.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ ijc -- inserted missing ) to fix compile error ]

10 years agolinux_gntshr_munmap: munmap takes a length, not a page count
David Scott [Wed, 3 Sep 2014 17:34:21 +0000 (18:34 +0100)]
linux_gntshr_munmap: munmap takes a length, not a page count

This fixes a bug where if a client shares more than 1 page, the
munmap call fails to clean up everything. A process which does
a lot of sharing and unsharing can run out of resources.
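
The unit mismatch can be illustrated as follows. This is a small Python
sketch, not the libxc code; pages_to_length is a hypothetical helper.

```python
import mmap


def pages_to_length(page_count, page_size=mmap.PAGESIZE):
    """Convert a page count into the byte length that munmap-style calls expect."""
    return page_count * page_size
```

Passing the bare page count where a length is expected would cover only the
first few bytes of the first page, leaving the rest of a multi-page mapping
in place.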

This patch also fixes in-tree callers of
  - xc_gntshr_munmap
  - xc_gnttab_munmap
to supply page counts rather than lengths.

Signed-off-by: David Scott <dave.scott@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Add support for the Odroid-XU board.
Suriyan Ramasami [Thu, 4 Sep 2014 22:57:23 +0000 (15:57 -0700)]
xen/arm: Add support for the Odroid-XU board.

The Odroid-XU from hardkernel is an Exynos 5410 based board.

This patch introduces a generic PLATFORM exynos5 which hopefully is
applicable to the majority of exynos5 based SoCs. It currently has
only been tested on an exynos5410 based (OdroidXU) board and hence
that is the only board listed.

Previously only the Arndale board, based on an exynos5250 was
supported. It was the only exynos based platform that was supported
and it was called exynos5. It has now been renamed to exynos5250. The
Arndale currently is a separate platform mostly because I do not have
one to test and for the most part the code path for that board is
preserved. To be specific it varies from the generic implementation
as follows:

1. exynos5250 based specific DT mapping for CHIPID and PWM region. I
   believe mainline kernel's DTS for the arndale has those mappings
   already in place.
2. exynos5250 based cpu up code. It appears that exynos5250 already
   has the secondary core powered up and in wfe and hence a
   cpu_up_send_sgi suffices. Here too, I believe that the generic
   code path might be acceptable.

Most of the code for the cpu bring up has been ported over from
mainline linux, and hence should be generic enough for future exynos
based SoCs. All references to hardcoded memory locations have been
avoided. They are now gleaned from the device tree.

The existing SMP bringup code has been broken since 4557c2292854
"xen: arm: rewrite start of day page table and cpu bring up" which
moved the arndale CPU kick from secure world to non-secure world
without updating it to match the new environment.  Specifically the
sysram address remained hardcoded to the S sysram address and not the
NS sysram address, this is now correctly taken from DT. Secondly the
offset within the sysram where the start address is written is 0x1c
for NS bringup, rather than 0x0 as it is in S bringup.

Signed-off-by: Suriyan Ramasami <suriyan.r@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated commit log as discussed on list, plus reformatted
         slightly.
         s/exynos5XXXX/exynos5XXX/ in one error message ]

10 years agox86/mwait-idle: Broadwell support
Len Brown [Tue, 9 Sep 2014 16:11:10 +0000 (18:11 +0200)]
x86/mwait-idle: Broadwell support

Broadwell (BDW) is similar to Haswell (HSW), the preceding processor generation.

Currently, the only difference in their C-state tables is that PC3 max exit latency
is 33usec on HSW and 40usec on BDW.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/mwait-idle: disable Baytrail Core and Module C6 auto-demotion
Len Brown [Tue, 9 Sep 2014 16:10:21 +0000 (18:10 +0200)]
x86/mwait-idle: disable Baytrail Core and Module C6 auto-demotion

Power efficiency improves on Baytrail (Intel Atom Processor E3000)
when Linux disables C6 auto-demotion.

Based on work by Srinidhi Kasagar <srinidhi.kasagar@intel.com>.

Signed-off-by: Len Brown <len.brown@intel.com>
Do the MSR writes on all CPUs rather than just the current one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agox86, idle: add barriers to CLFLUSH workaround
H. Peter Anvin [Tue, 9 Sep 2014 16:09:08 +0000 (18:09 +0200)]
x86, idle: add barriers to CLFLUSH workaround

... since the documentation is explicit that CLFLUSH is only ordered
with respect to MFENCE.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agoevtchn: check control block exists when using FIFO-based events
David Vrabel [Tue, 9 Sep 2014 13:25:58 +0000 (15:25 +0200)]
evtchn: check control block exists when using FIFO-based events

When using the FIFO-based event channels, there are no checks for the
existence of a control block when binding an event or moving it to a
different VCPU.  This is because events may be bound when the ABI is
in 2-level mode (e.g., by the toolstack before the domain is started).

The guest may trigger a Xen crash in evtchn_fifo_set_pending() if:

  a) the event is bound to a VCPU without a control block; or
  b) VCPU 0 does not have a control block.

In case (a), Xen will crash when looking up the current queue.  In
(b), Xen will crash when looking up the old queue (which defaults to a
queue on VCPU 0).

By allocating all the per-VCPU structures when enabling the FIFO ABI,
we can be sure that v->evtchn_fifo is always valid.

EVTCHNOP_init_control_block for all the other CPUs need only map the
shared control block.

A single check in evtchn_fifo_set_pending() before accessing the
control block fixes all cases where the guest has not initialized some
control blocks.
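
The shape of that single check can be sketched as follows. This is a toy
model in Python, not the hypervisor code: Vcpu, linked_ports and the return
value are hypothetical, and what Xen actually does with an event whose
control block is missing is not modelled here.

```python
class Vcpu:
    """Stand-in for a VCPU; control_block is None until the guest maps one."""

    def __init__(self, control_block=None):
        self.control_block = control_block
        self.linked_ports = []


def set_pending(vcpu, port):
    """Link port into the VCPU's queue, checking the control block first."""
    if vcpu.control_block is None:
        # The guest never initialised this control block: do not touch it.
        return False
    vcpu.linked_ports.append(port)
    return True
```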

This is XSA-107.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoxen/common: do not implicitly permit access to mapped I/O memory
Arianna Avanzini [Mon, 8 Sep 2014 15:05:34 +0000 (17:05 +0200)]
xen/common: do not implicitly permit access to mapped I/O memory

Currently, the XEN_DOMCTL_memory_mapping hypercall implicitly grants
to a domain access permission to the I/O memory areas mapped in its
guest address space. This conflicts with the presence of a specific
hypercall (XEN_DOMCTL_iomem_permission) used to grant such a permission
to a domain.
This commit separates the functions of the two hypercalls by having only
the latter be able to permit I/O memory access to a domain, and the
former just performing the mapping after a permissions check on both the
granting and the grantee domains.
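
The separation can be sketched with a toy model. This is an illustration in
Python, not the hypervisor code: Domain, PermissionDenied and the two
functions are hypothetical stand-ins for the DOMCTLs named above.

```python
class PermissionDenied(Exception):
    """Raised when a domain lacks access to the requested frame."""


class Domain:
    def __init__(self, name):
        self.name = name
        self.iomem_allowed = set()  # machine frames the domain may access
        self.mappings = {}          # guest frame -> machine frame


def iomem_permission(domain, mfn, allow):
    """The only call that changes what a domain is permitted to access."""
    if allow:
        domain.iomem_allowed.add(mfn)
    else:
        domain.iomem_allowed.discard(mfn)


def memory_mapping(caller, target, gfn, mfn):
    """Map mfn at gfn in target; grants no permission by itself."""
    # Both the granting and the grantee domain must already have access.
    for domain in (caller, target):
        if mfn not in domain.iomem_allowed:
            raise PermissionDenied(domain.name)
    target.mappings[gfn] = mfn
```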

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Julien Grall <julien.grall@citrix.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Eric Trudeau <etrudeau@broadcom.com>
Cc: Viktor Kleinik <viktor.kleinik@globallogic.com>
Cc: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
10 years agotools/libxl: cleanup the do_pci_add() function
Arianna Avanzini [Mon, 8 Sep 2014 15:05:33 +0000 (17:05 +0200)]
tools/libxl: cleanup the do_pci_add() function

This commit modifies the do_pci_add() function in libxl_pci.c
by unindenting a code block whose condition was removed in the
previous commit. The block was left as is to facilitate functional
review of the previous commit; this commit cleans it up.
This commit introduces no functional change.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Julien Grall <julien.grall@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Eric Trudeau <etrudeau@broadcom.com>
Cc: Viktor Kleinik <viktor.kleinik@globallogic.com>
Cc: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
10 years agotools/libxl: explicitly grant access to needed I/O-memory ranges
Arianna Avanzini [Mon, 8 Sep 2014 15:05:32 +0000 (17:05 +0200)]
tools/libxl: explicitly grant access to needed I/O-memory ranges

This commit changes the existing libxl code to be sure to grant access
permission to PCI-related I/O memory ranges, while setting up passthrough
of PCI devices specified in the domain's configuration, and to VGA-related
memory ranges, while setting up VGA passthrough (if gfx_passthru = 1 in
the domain's configuration).
As for the latter, the newly-added code does not replace any existing one,
but instead matches the calls to xc_domain_memory_mapping() performed by
QEMU on the path that is executed if gfx passthru is enabled and follows
the registration of a new VGA controller (in register_vga_regions(),
defined in hw/pt-graphics.c). In fact, VGA needs some extra memory
ranges to be mapped with respect to PCI; QEMU expects that access to those
memory ranges is implicitly granted when it calls the hypervisor with the
function xc_domain_memory_mapping(): this commit calls iomem_permission
for it when needed by checking the passthru PCI device's class.

NOTE: the code added by this commit still does not verify if the passthru
      of the framebuffer area is being performed for the primary GPU, but
      only replicates the behavior of QEMU which is limited to performing
      the passthru for all PCI devices of VGA class.

This commit is instrumental to the last one in the series, which will
separate the functions of the iomem_permission and memory_mapping DOMCTLs,
so that requesting an I/O-memory range will not imply that access to such
a range is implicitly granted.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Julien Grall <julien.grall@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Eric Trudeau <etrudeau@broadcom.com>
Cc: Viktor Kleinik <viktor.kleinik@globallogic.com>
Cc: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
10 years agoxsm/flask: handle XEN_DOMCTL_memory_mapping for all architectures
Arianna Avanzini [Thu, 4 Sep 2014 11:49:52 +0000 (13:49 +0200)]
xsm/flask: handle XEN_DOMCTL_memory_mapping for all architectures

Currently, FLASK only handles the memory_mapping hypercall for the
x86 architecture. As the DOMCTL's hook now is in common code and
no longer specific to x86, this commit lets the DOMCTL be handled also
for other architectures.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Eric Trudeau <etrudeau@broadcom.com>
Cc: Viktor Kleinik <viktor.kleinik@globallogic.com>
Cc: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
10 years agolibxl: introduce helper to initialise Dom0
Wei Liu [Thu, 4 Sep 2014 22:43:14 +0000 (23:43 +0100)]
libxl: introduce helper to initialise Dom0

This small helper is responsible for generating Dom0 JSON config
stub and writing Dom0 xenstore entries. This helper subsumes two calls
to xenstore-write in xencommons script.

Dom0 UUID is intentionally left untouched, so it is always all
zeros.  This makes sure that we don't leak Dom0 stubs across reboots.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- folded in incremental patch for *BSD ]

10 years agolibxl: disallow attaching the same device more than once
Wei Liu [Thu, 4 Sep 2014 22:43:13 +0000 (23:43 +0100)]
libxl: disallow attaching the same device more than once

Originally the code allowed users to attach the same device more than
once. It simply overwrote the xenstore entries, which is bogus as the
frontend will be very confused.

Introduce a helper function to check if the device to be written to
xenstore already exists. A new error code is also introduced.

The check and add are within one xs transaction.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: introduce libxl__device_from_pcidev
Wei Liu [Thu, 4 Sep 2014 22:43:12 +0000 (23:43 +0100)]
libxl: introduce libxl__device_from_pcidev

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: store a copy of configuration when creating domain
Wei Liu [Thu, 4 Sep 2014 22:43:11 +0000 (23:43 +0100)]
libxl: store a copy of configuration when creating domain

The configuration is stored in libxl-json format. It will be used as
template to reconstruct domain configuration during runtime.

There's only one write to disk when domain creation finishes. We
therefore have a window in which the domain exists but has no JSON config
on disk. We define this state as domain being created or destroyed. Any
other operations that need to access JSON config should bail.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: libxl-json format and internal functions to get / set it
Wei Liu [Thu, 4 Sep 2014 22:43:10 +0000 (23:43 +0100)]
libxl: libxl-json format and internal functions to get / set it

Introduce a new format in libxl userdata store called "libxl-json". This
file format contains JSON version of libxl_domain_config, generated by
libxl. Applications are not supposed to access this file directly.

Two internal functions to get and set libxl_domain_configuration
are also introduced. Also introduce a new error code to indicate
abnormal state that libxl-json config file is empty.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years agolibxl: properly lock userdata store
Wei Liu [Thu, 4 Sep 2014 22:43:09 +0000 (23:43 +0100)]
libxl: properly lock userdata store

Originally the libxl user data store had no lock at all. This allowed a
race condition such as the following, mentioned by Ian Jackson:

  Task 1                                 Task 2
  Creating the domain                    Trying to shut down

    actually create domain
                                           observe domid
                                           start domain destruction
                                           delete all userdata
                                           destroy domain
    store the userdata
      *** forbidden state created: userdata exists but domain doesn't
      *** userdata has been leaked
    [ would now bomb out ]

This patch adds in proper locking to libxl user data store. The lock is
associated with a specific domain (i.e. a per-domain lock).

As for locking hierarchy, we first take CTX lock (which is implemented
with pthread recursive mutex so even if the application has taken it
we're fine), then take the file lock. These locks are released in
reverse order.
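
The ordering can be sketched as follows. This is an illustration in Python
with in-process locks only, not libxl code: ctx_lock, file_lock, trace and
with_userdata_locked are hypothetical, and a threading.Lock stands in for
the per-domain file lock.

```python
import threading
from contextlib import ExitStack, contextmanager

ctx_lock = threading.RLock()  # recursive, like the CTX lock described above
file_lock = threading.Lock()  # stand-in for the per-domain file lock
trace = []                    # records the order of lock operations


@contextmanager
def traced(lock, name):
    """Acquire lock, record the acquisition, and record the release on exit."""
    lock.acquire()
    trace.append("take " + name)
    try:
        yield
    finally:
        trace.append("release " + name)
        lock.release()


def with_userdata_locked(action):
    """Run action() with the CTX lock taken first and the file lock second."""
    # ExitStack unwinds in reverse order: file lock first, then CTX lock.
    with ExitStack() as stack:
        stack.enter_context(traced(ctx_lock, "ctx"))
        stack.enter_context(traced(file_lock, "file"))
        return action()
```

Because the CTX stand-in is recursive, a caller that already holds it can
still call with_userdata_locked without deadlocking.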

A new libxl error code ERROR_LOCK_FAIL is introduced to describe failure
to acquire locks.

Also factor out libxl__userdata_{retrieve,store}, so that other
functions that already hold the lock can call them to manipulate
user data.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: functions to lock / unlock libxl userdata store
Wei Liu [Thu, 4 Sep 2014 22:43:08 +0000 (23:43 +0100)]
libxl: functions to lock / unlock libxl userdata store

This lock is used to protect all userdata files related to a particular
domain, which include but are not limited to domain configuration.  A
new "domain-userdata-lock" entry is introduced in libxl registry.

This lock works among different processes and different threads within
the same process.

Locking protocol inspired by Ian Jackson's chiark-utils with-lock-ex. A
file lock is taken with flock(2). If that succeeds, the thread calls fstat
on the fd and stat on the lock file path. If the device and inode match
then the lock has been successfully acquired. This lock remains acquired
until the lock file gets deleted or released by flock(2). If device and
inode don't match then another thread acquired the lock and deleted the
file in the meantime; lock procedure should restart.
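
The protocol above can be sketched as follows. This is a minimal
illustration in Python, not the libxl implementation: acquire_lock and
release_lock are hypothetical names, and the sketch assumes that the
platform's fcntl.flock maps to a real flock(2).

```python
import fcntl
import os


def acquire_lock(path):
    """Return an fd holding an exclusive flock on path, restarting on races."""
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            fd_stat = os.fstat(fd)
            try:
                path_stat = os.stat(path)
            except FileNotFoundError:
                path_stat = None  # file was deleted while we waited
            # Same device and inode: the file we locked is still the lock file.
            if (path_stat is not None
                    and fd_stat.st_dev == path_stat.st_dev
                    and fd_stat.st_ino == path_stat.st_ino):
                return fd
        except BaseException:
            os.close(fd)
            raise
        # Another holder deleted the file in the meantime: restart.
        os.close(fd)


def release_lock(fd, path):
    """Delete the lock file while still holding the lock, then drop the fd."""
    os.unlink(path)
    os.close(fd)
```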

Portability note: this lock utilises flock(2) so a proper implementation
of flock(2) is required -- that is, it should not be implemented with
fcntl(2).

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: make userdata_path libxl internal function
Wei Liu [Thu, 4 Sep 2014 22:43:07 +0000 (23:43 +0100)]
libxl: make userdata_path libxl internal function

A later patch will make use of it to generate file path and name.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: introduce XENFEAT_grant_map_identity
Stefano Stabellini [Fri, 1 Aug 2014 14:45:25 +0000 (15:45 +0100)]
xen/arm: introduce XENFEAT_grant_map_identity

The flag specifies that the hypervisor maps a grant page to guest
physical address == machine address of the page in addition to the
normal grant mapping address.

Frontends are allowed to map the same page multiple times using multiple
grant references. On the backend side it can be difficult to find out
the physical address corresponding to a particular machine address,
especially at the completion of a dma operation. To simplify address
translations, we introduce a second mapping of the grant at physical
address == machine address so that dom0 can issue cache maintenance
operations without having to find the pfn.

Call arch_grant_map_page_identity and arch_grant_unmap_page_identity
from __gnttab_map_grant_ref and __gnttab_unmap_common to introduce the
second mapping if the domain is directly mapped. To do so we also need
to change gnttab_need_iommu_mapping to just be defined as
is_domain_direct_mapped on arm.

Remove arm_smmu_map_page and arm_smmu_unmap_page as they have become
unused.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: introduce arch_grant_(un)map_page_identity
Stefano Stabellini [Fri, 1 Aug 2014 14:45:24 +0000 (15:45 +0100)]
xen: introduce arch_grant_(un)map_page_identity

Introduce two arch-specific functions to create a new p2m mapping of
granted pages at pfn == mfn.
We don't need an x86 implementation as these functions should never be
compiled on x86 (they are called from an if (0) statement).

Base the implementation of arm_smmu_(un)map_page on
arch_grant_(un)map_page_identity.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago xen/x86: introduce is_domain_direct_mapped(d) as ((void)(d), 0) on x86
Stefano Stabellini [Fri, 1 Aug 2014 14:45:23 +0000 (15:45 +0100)]
xen/x86: introduce is_domain_direct_mapped(d) as ((void)(d), 0) on x86

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago xen/arm: domain_vgic_init: Avoid double free on shared_irqs
Julien Grall [Fri, 25 Jul 2014 14:17:26 +0000 (15:17 +0100)]
xen/arm: domain_vgic_init: Avoid double free on shared_irqs

When domain_vgic_init fails to initialize pending_irqs, it frees
shared_irqs. A few calls later, domain_vgic_free is called and tries to
free the same variable a second time, resulting in a double free.

Remove the free in domain_vgic_init and rely on domain_vgic_free to correctly
release the memory.
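
The pattern can be sketched as below, using plain calloc/free in place
of Xen's allocators. This is an illustration only: vgic_sketch and its
functions are hypothetical, the fail_pending parameter exists purely to
simulate the allocation failure, and resetting the pointers to NULL is a
choice of this sketch rather than something stated in the message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vgic state; zero-initialise before use. */
struct vgic_sketch {
    int *shared_irqs;
    int *pending_irqs;
};

/* Single owner of all frees. free(NULL) is a no-op, so this is safe to
 * call after a partially failed init. */
static void vgic_sketch_free(struct vgic_sketch *v)
{
    free(v->shared_irqs);
    v->shared_irqs = NULL;
    free(v->pending_irqs);
    v->pending_irqs = NULL;
}

/* On failure, return an error without freeing anything: the caller is
 * expected to invoke vgic_sketch_free exactly once. */
static int vgic_sketch_init(struct vgic_sketch *v, size_t nr,
                            int fail_pending)
{
    v->shared_irqs = calloc(nr, sizeof(*v->shared_irqs));
    if (v->shared_irqs == NULL)
        return -1;

    v->pending_irqs = fail_pending ? NULL
                                   : calloc(nr, sizeof(*v->pending_irqs));
    if (v->pending_irqs == NULL)
        return -1;   /* shared_irqs is left for vgic_sketch_free */

    return 0;
}
```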

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: move itargets initialization to vgic-v2
Vijaya Kumar K [Thu, 4 Sep 2014 13:15:21 +0000 (18:45 +0530)]
xen/arm: move itargets initialization to vgic-v2

The itargets registers are GIC-version specific, so move their
initialization to the vgic-v2 driver.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years ago mini-os: arm: events
Karim Raslan [Fri, 8 Aug 2014 15:47:37 +0000 (16:47 +0100)]
mini-os: arm: events

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago mini-os: arm: scheduling
Thomas Leonard [Fri, 8 Aug 2014 15:47:36 +0000 (16:47 +0100)]
mini-os: arm: scheduling

Based on an initial patch by Karim Raslan.

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago mini-os: arm: memory management
Thomas Leonard [Fri, 8 Aug 2014 15:47:35 +0000 (16:47 +0100)]
mini-os: arm: memory management

Based on an initial patch by Karim Raslan.

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>