xenbits.xensource.com Git - xen.git/log
10 years ago - libxl/remus: setup and control network output buffering
Yang Hongyang [Fri, 18 Jul 2014 07:08:36 +0000 (15:08 +0800)]
libxl/remus: setup and control network output buffering

This patch adds the machinery required for protecting a guest's
network device state. It consists of two parts:

1. Hotplug scripts: The remus-netbuf-setup script is responsible for
  setting up and tearing down the infrastructure required for
  network output buffering.  This script should be invoked by libxl for
  each of the guest's network interfaces when starting or stopping Remus.

  Apart from returning a success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore the name of
  the REMUS_IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform the various
  setup/teardown steps. The script is Linux-only, since NetBSD does not
  appear to provide libnl3.

2. Remus network device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
      invocation. They establish and free netlink-related state respectively.

   b) setup() and teardown() are called for each vif attached to the
      guest.
      During setup():
      i) The hotplug script is called to set up a network buffer on a
         given vif. The script chooses an available IFB device from
         the system, redirects vif egress traffic to the IFB device
         and sets up the plug qdisc (output buffer) on the IFB device.
         The name of the IFB device is communicated via xenstore to
         libxl.

      ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
          and subsequently controls output buffering using this handle
          in the checkpoint callbacks.

      During teardown(), the hotplug script is called again to remove
      the vif->ifb traffic redirection and to release the IFB device and
      the plug qdisc associated with it.

   c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
      are implemented as synchronous ops as the netlink calls associated
      with the qdisc subsystem are very fast.
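The buffering cycle above can be pictured with a toy model. The sketch below does not call libnl3 or touch a real qdisc; it only mimics, with counters, how the Linux plug qdisc (sch_plug) holds packets behind a plug and releases one closed epoch at a time. Mapping postsuspend() to "insert a plug" and commit() to "release the previous epoch" is an assumption made for illustration, and all names are hypothetical.

```c
#include <stddef.h>

/* Toy counter model loosely following sch_plug's bookkeeping;
 * it is not kernel code and does not touch a real qdisc. */
struct plug_model {
    size_t current_epoch;  /* packets queued since the last plug */
    size_t last_epoch;     /* packets in the epoch closed by the last plug */
    size_t to_release;     /* packets allowed to leave the queue */
    size_t sent;           /* packets that have actually left */
};

/* Guest transmits a packet on the vif; it lands in the IFB's queue. */
static void plug_enqueue(struct plug_model *q)
{
    q->current_epoch++;
}

/* Assumed postsuspend() action: close the current epoch behind a plug. */
static void plug_buffer(struct plug_model *q)
{
    q->last_epoch = q->current_epoch;
    q->current_epoch = 0;
}

/* Assumed commit() action: the backup holds the checkpoint, so the
 * packets of the closed epoch may now be released. */
static void plug_release_one(struct plug_model *q)
{
    q->to_release += q->last_epoch;
    q->last_epoch = 0;
}

/* Let the qdisc dequeue everything it is currently allowed to send. */
static void plug_dequeue_all(struct plug_model *q)
{
    q->sent += q->to_release;
    q->to_release = 0;
}
```

With three packets sent before a checkpoint and two after it, nothing leaves until commit(); then exactly the three pre-checkpoint packets are released, and the two newer ones stay buffered until the next checkpoint.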

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - libxl/remus: introduce an abstract Remus device layer
Yang Hongyang [Fri, 18 Jul 2014 07:02:34 +0000 (15:02 +0800)]
libxl/remus: introduce an abstract Remus device layer

Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.

The following APIs are exposed to libxl:

One-time configuration operations:
  *libxl__remus_devices_setup
    > Enable output buffering for NICs, set up disk replication, etc.
  *libxl__remus_devices_teardown
    > Disable network output buffering and disk replication;
      tear down any associated external setup, like qdiscs for NICs.

Operations executed every checkpoint (in order of invocation):
  *libxl__remus_devices_postsuspend
  *libxl__remus_devices_preresume
  *libxl__remus_devices_commit

Each device type needs to implement the interfaces specified in
the libxl__remus_device_instance_ops if it wishes to support Remus.

The high-level control flow through the Remus device layer is shown below:

xl remus
  |->  libxl_domain_remus_start
    |-> libxl__remus_devices_setup
      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
        ...
        |-> On backup failure/network error/other errors
            libxl__remus_devices_teardown

Callback processing:
* Only call the per-device libxl__multidev_one_callback
  once the iteration has succeeded or failed.
* The final callback (called by multidev) is a trivial
  shim to shuffle the pointers and notify our own caller.
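A minimal, synchronous sketch of how such a layer might dispatch the per-checkpoint operations through a per-device-type ops table. The struct and function names are hypothetical (this is not libxl's actual libxl__remus_device_instance_ops), and the real layer is asynchronous, completing through multidev callbacks rather than a simple loop.

```c
#include <stddef.h>

struct remus_dev;
typedef int (*remus_op)(struct remus_dev *dev);

/* Hypothetical per-device-type operations; a NULL entry means
 * "nothing to do in this phase". */
struct remus_dev_ops {
    remus_op setup, teardown;
    remus_op postsuspend, preresume, commit;
};

struct remus_dev {
    const struct remus_dev_ops *ops;
    int last_phase;            /* illustrative per-device state */
};

enum remus_phase { PH_POSTSUSPEND, PH_PRERESUME, PH_COMMIT };

static remus_op phase_op(const struct remus_dev_ops *o, enum remus_phase p)
{
    switch (p) {
    case PH_POSTSUSPEND: return o->postsuspend;
    case PH_PRERESUME:   return o->preresume;
    case PH_COMMIT:      return o->commit;
    }
    return NULL;
}

/* Run one phase on every device; the first failure is returned so the
 * caller can tear all devices down. */
static int run_phase(struct remus_dev *devs, size_t n, enum remus_phase p)
{
    for (size_t i = 0; i < n; i++) {
        remus_op op = phase_op(devs[i].ops, p);
        int rc = op ? op(&devs[i]) : 0;
        if (rc)
            return rc;
    }
    return 0;
}

/* One checkpoint, phases in the documented order. */
static int checkpoint(struct remus_dev *devs, size_t n)
{
    int rc = run_phase(devs, n, PH_POSTSUSPEND);
    if (!rc)
        rc = run_phase(devs, n, PH_PRERESUME);
    if (!rc)
        rc = run_phase(devs, n, PH_COMMIT);
    return rc;
}

/* Example device type that only implements two of the phases. */
static int nic_postsuspend(struct remus_dev *d) { d->last_phase = PH_POSTSUSPEND; return 0; }
static int nic_commit(struct remus_dev *d)      { d->last_phase = PH_COMMIT; return 0; }

static const struct remus_dev_ops nic_ops = {
    .postsuspend = nic_postsuspend,
    .commit      = nic_commit,
};
```

Letting each device type leave unneeded phases NULL keeps the generic loop device-agnostic, which is the property the layer is meant to provide.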

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - autoconf: add libnl3 dependency for Remus network buffering support
Yang Hongyang [Fri, 27 Jun 2014 01:43:51 +0000 (09:43 +0800)]
autoconf: add libnl3 dependency for Remus network buffering support

Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also provides the ability to configure tools without libnl3 support
i.e., without network buffering support.

When there is no network buffering support, libxl__netbuffer_enabled()
returns 0; otherwise it returns 1. The callers of this API will be
introduced later in the series.

NOTE: This patch changes tools/configure.ac, please rerun
      autogen.sh while applying the patch.
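One way to obtain the 0/1 behaviour of libxl__netbuffer_enabled() is to select the implementation at build time. In this sketch, HAVE_LIBNL3_SKETCH is a hypothetical macro standing in for whatever the configure run defines when libnl3 (>= 3.2.8) is found; the function names are illustrative only.

```c
/* HAVE_LIBNL3_SKETCH is hypothetical: imagine configure defining it when
 * libnl3 >= 3.2.8 is detected. */
#ifdef HAVE_LIBNL3_SKETCH
static int netbuffer_enabled_sketch(void) { return 1; }
#else
static int netbuffer_enabled_sketch(void) { return 0; }
#endif

/* A caller can then refuse to set up buffering when support is absent. */
static int setup_nic_buffering_sketch(void)
{
    if (!netbuffer_enabled_sketch())
        return -1;   /* e.g. report "network buffering not supported" */
    /* ... set up the plug qdisc here ... */
    return 0;
}
```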

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - libxl: Extend libxl__ao_device with a libxl__ev_child member
Yang Hongyang [Fri, 18 Jul 2014 08:40:54 +0000 (16:40 +0800)]
libxl: Extend libxl__ao_device with a libxl__ev_child member

This can be used to fork children to allow the asynchronous execution
of system calls which only come in a synchronous variant. This will
be useful for Remus, in the following patches.
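The underlying idea, running a call that only exists in a blocking form inside a child process so the parent stays responsive, can be sketched with plain POSIX fork()/waitpid(). This is not libxl's libxl__ev_child machinery, which delivers child death through libxl's event loop; the helper names and the example operation are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a blocking operation in a child; the parent returns immediately
 * and can keep servicing its event loop. Returns -1 if fork() fails. */
static pid_t start_blocking_op(int (*op)(void))
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(op() & 0xff);   /* child: run the sync call, report via exit status */
    return pid;
}

/* Later (libxl would do this when notified of the child's death),
 * collect the result. Returns -1 if the child did not exit normally. */
static int finish_blocking_op(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Stands in for a slow synchronous call, e.g. running a hotplug script. */
static int example_op(void) { return 7; }
```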

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - libxl: introduce libxl__multidev_prepare_with_aodev
Yang Hongyang [Fri, 18 Jul 2014 08:26:50 +0000 (16:26 +0800)]
libxl: introduce libxl__multidev_prepare_with_aodev

libxl__multidev_prepare_with_aodev is similar to libxl__multidev_prepare,
but takes a libxl__ao_device as an extra argument.
libxl__multidev_prepare is now a wrapper around
libxl__multidev_prepare_with_aodev.

This new internal API will be used by the Remus abstract device layer
for handling various Remus devices.
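The wrapper relationship can be sketched as follows. The types are hypothetical stand-ins for libxl__ao_device and libxl__multidev, and malloc() replaces libxl's own allocator; the point is that a caller such as a Remus device can embed the aodev in a larger structure and hand it in, while the old entry point keeps allocating one itself.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for libxl__ao_device and libxl__multidev. */
struct aodev_sketch { int rc; };
struct multidev_sketch { int outstanding; };

/* New variant: the caller passes in an aodev it owns. */
static void multidev_prepare_with_aodev(struct multidev_sketch *m,
                                        struct aodev_sketch *a)
{
    a->rc = 0;
    m->outstanding++;    /* multidev now waits for this aodev as well */
}

/* Old entry point, now a thin wrapper that allocates the aodev itself. */
static struct aodev_sketch *multidev_prepare(struct multidev_sketch *m)
{
    struct aodev_sketch *a = malloc(sizeof(*a));
    if (a)
        multidev_prepare_with_aodev(m, a);
    return a;
}

/* Example embedding caller, in the style of a Remus device. */
struct remus_nic_sketch {
    struct aodev_sketch aodev;   /* handed to multidev_prepare_with_aodev */
    const char *vif;             /* device-specific state */
};
```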

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - libxl: multidev: Expose libxl__multidev_one_callback
Ian Jackson [Thu, 25 Sep 2014 15:04:01 +0000 (16:04 +0100)]
libxl: multidev: Expose libxl__multidev_one_callback

Now a caller who wants to be able to do other work when the aodev
completes can put their own callback into the aodev, and make the
multidev machinery aware that the particular aodev is complete (from
the point of view that multidev should have) whenever it likes.

No functional change in this patch.
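A simplified model of what exposing the per-aodev completion hook enables: the aodev's owner decides when to report completion, and multidev fires its final callback only after the last outstanding aodev is reported. The types and names are hypothetical, and libxl's real bookkeeping (for instance, how it avoids finishing while devices are still being prepared) is omitted.

```c
#include <stddef.h>

struct multidev_sketch;

/* Hypothetical stand-ins for libxl__ao_device / libxl__multidev. */
struct aodev_sketch {
    struct multidev_sketch *multidev;
    int rc;
};

struct multidev_sketch {
    int outstanding;                          /* aodevs not yet complete */
    int rc;                                   /* first error reported */
    void (*done)(struct multidev_sketch *m);  /* final callback */
};

/* Analogue of libxl__multidev_one_callback: the owner of the aodev calls
 * this whenever it decides the aodev is finished, possibly after doing
 * more work of its own in a private callback. */
static void multidev_one_callback(struct aodev_sketch *a)
{
    struct multidev_sketch *m = a->multidev;

    if (a->rc && !m->rc)
        m->rc = a->rc;
    if (--m->outstanding == 0)
        m->done(m);
}

/* Example final callback that just counts invocations. */
static int done_calls;
static void count_done(struct multidev_sketch *m)
{
    (void)m;
    done_calls++;
}
```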

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - libxl: multidev: Clarify comments about which callbacks are meant
Ian Jackson [Thu, 25 Sep 2014 14:59:06 +0000 (15:59 +0100)]
libxl: multidev: Clarify comments about which callbacks are meant

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago - EFI: add arch specific function to control use of config file
Roy Franz [Fri, 26 Sep 2014 10:00:55 +0000 (12:00 +0200)]
EFI: add arch specific function to control use of config file

The x86 EFI build of Xen always uses a configuration file to load modules, but
the ARM version can either use a config file to specify the modules, or be
loaded by GRUB, in which case GRUB loads the modules and adds them to the DTB
that is passed to Xen.  Add efi_arch_use_config_file() to indicate whether a
configuration file is required.  For x86, this will always be true.  ARM will
examine the DTB passed via the EFI configuration table (if any), and if it
contains module information, will use that and not use the configuration file
at all.  Also add an Emacs footer to efi-boot.h and boot.c.
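The decision described above can be sketched as two small predicates. The dtb_info summary struct and the function names are hypothetical; the real ARM code inspects the device tree itself rather than a pre-digested count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the DTB found via the EFI configuration table. */
struct dtb_info {
    bool present;
    unsigned int nr_modules;   /* module nodes found in the DTB */
};

/* x86: modules always come from the configuration file. */
static bool use_config_file_x86(const struct dtb_info *dtb)
{
    (void)dtb;
    return true;
}

/* ARM: if a loader such as GRUB already described the modules in the
 * DTB, use those and skip the configuration file. */
static bool use_config_file_arm(const struct dtb_info *dtb)
{
    return !(dtb && dtb->present && dtb->nr_modules > 0);
}
```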

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: add several misc. arch functions for boot code
Roy Franz [Fri, 26 Sep 2014 10:00:27 +0000 (12:00 +0200)]
EFI: add several misc. arch functions for boot code

Add efi_arch_blexit() for arch specific cleanup on error exit,
efi_arch_load_addr_check() to do the arch specific verifications
of where the UEFI firmware loaded Xen, and efi_arch_cpu() for
probing CPU features.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: add arch specific module handling to read_file()
Roy Franz [Fri, 26 Sep 2014 09:59:56 +0000 (11:59 +0200)]
EFI: add arch specific module handling to read_file()

Each architecture tracks modules differently internally, so add an
efi_arch_handle_module() routine to enable the common code to invoke the proper
handling of modules as they are loaded.  Module handling for ucode, ramdisk, and
xsm is changed to no longer treat the remainder of the string after the filename
as options, since these modules don't take options.
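The changed behaviour amounts to "split off the filename, and keep the rest as options only for entries that accept options". A sketch under stated assumptions: it uses plain char strings instead of UEFI CHAR16, assumes the filename ends at the first space, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the filename part of "file rest..." into name (truncating if
 * needed; name_len must be > 0). The rest is returned in *options only
 * for entries that take options; for ucode/ramdisk/xsm it is ignored. */
static void split_module_entry(const char *entry, bool takes_options,
                               char *name, size_t name_len,
                               const char **options)
{
    const char *space = strchr(entry, ' ');
    size_t n = space ? (size_t)(space - entry) : strlen(entry);

    *options = (space && takes_options) ? space + 1 : NULL;
    if (n >= name_len)
        n = name_len - 1;
    memcpy(name, entry, n);
    name[n] = '\0';
}
```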

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - use relative path for true(1)
Roger Pau Monné [Fri, 26 Sep 2014 09:59:18 +0000 (11:59 +0200)]
use relative path for true(1)

On FreeBSD true(1) is at /usr/bin/true.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
10 years ago - VT-d: don't needlessly suppress page table sharing
Jan Beulich [Fri, 26 Sep 2014 09:56:45 +0000 (11:56 +0200)]
VT-d: don't needlessly suppress page table sharing

Although the mid-term goal is to do away with the sharing, there's no
point in suppressing it in cases where it can be used now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years ago - Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Thu, 25 Sep 2014 12:43:41 +0000 (13:43 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years ago - EFI: arch specific memory setup
Roy Franz [Thu, 25 Sep 2014 12:30:16 +0000 (14:30 +0200)]
EFI: arch specific memory setup

This patch adds efi_arch_memory() to give each architecture a hook
for memory setup.  x86 uses this for trampoline memory setup
and some pagetable setup.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: create arch functions for console and video init
Roy Franz [Thu, 25 Sep 2014 12:29:51 +0000 (14:29 +0200)]
EFI: create arch functions for console and video init

Add arch functions for text console and graphics initialization, and move VGA
specific code to x86 architecture file.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: move x86 specific disk probing code
Roy Franz [Thu, 25 Sep 2014 12:29:29 +0000 (14:29 +0200)]
EFI: move x86 specific disk probing code

Move x86 specific disk (EDD) probing to arch specific file.  This code is x86
only and relates to legacy BIOS handling of disk drives.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: add efi_arch_handle_cmdline() for processing commandline
Roy Franz [Thu, 25 Sep 2014 12:28:48 +0000 (14:28 +0200)]
EFI: add efi_arch_handle_cmdline() for processing commandline

Add arch function for processing the Xen commandline and
updating internal structures.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: add efi_arch_cfg_file_early/late() to handle arch specific cfg file fields
Roy Franz [Thu, 25 Sep 2014 12:28:27 +0000 (14:28 +0200)]
EFI: add efi_arch_cfg_file_early/late() to handle arch specific cfg file fields

Different architectures have some different configuration file
fields that need to be handled.  In particular, x86 has ucode
and ARM has device tree files to be loaded.  These arch specific
functions are used to allow each architecture to implement these
features in arch specific code.  Early/late versions are provided,
as ARM needs to process the DTB entry first, and x86 wants to process
the ucode entry last as it is the smallest allocation.
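The ordering can be illustrated with a small hook-table sketch that logs which entries are processed when. The hook table, the single-letter markers and all names are hypothetical; in the commit each architecture provides efi_arch_cfg_file_early() and efi_arch_cfg_file_late() directly rather than through function pointers.

```c
#include <stddef.h>

/* Hypothetical hook table; a NULL hook means "nothing to do". */
struct cfg_hooks {
    void (*early)(char *log, size_t *pos);
    void (*late)(char *log, size_t *pos);
};

/* Append one marker character to a NUL-terminated log. */
static void record(char *log, size_t *pos, char c)
{
    log[(*pos)++] = c;
    log[*pos] = '\0';
}

static void arm_early(char *log, size_t *pos) { record(log, pos, 'D'); } /* DTB first */
static void x86_late(char *log, size_t *pos)  { record(log, pos, 'U'); } /* ucode last */

static const struct cfg_hooks arm_hooks = { arm_early, NULL };
static const struct cfg_hooks x86_hooks = { NULL, x86_late };

/* Common flow: arch-early entries, then the common ones (kernel,
 * ramdisk, xsm; marked 'K'), then arch-late entries. */
static void process_cfg(const struct cfg_hooks *h, char *log, size_t *pos)
{
    if (h->early)
        h->early(log, pos);
    record(log, pos, 'K');
    if (h->late)
        h->late(log, pos);
}
```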

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: add architecture functions for pre/post ExitBootServices
Roy Franz [Thu, 25 Sep 2014 12:27:55 +0000 (14:27 +0200)]
EFI: add architecture functions for pre/post ExitBootServices

The UEFI ExitBootServices function is invoked to transition the
system to the 'runtime' mode of operation, and is done right before
transitioning from the EFI loader code into Xen proper. x86 does some
arch specific memory management (trampoline) before exit boot services,
and the code that transitions from the EFI application state to Xen
is architecture specific.  This patch adds two functions, one pre
and one post ExitBootServices to allow each architecture to
handle these cases in a customized manner.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: create arch functions to allocate memory for and process memory map
Roy Franz [Thu, 25 Sep 2014 12:26:34 +0000 (14:26 +0200)]
EFI: create arch functions to allocate memory for and process memory map

The memory used to store the EFI memory map is allocated in an architecture
specific way, and the processing of the memory map itself uses x86 specific
data structures. This patch adds architecture specific functions so each
architecture can provide its own implementation.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: move x86 specific functions/variables to arch header
Roy Franz [Thu, 25 Sep 2014 12:24:52 +0000 (14:24 +0200)]
EFI: move x86 specific functions/variables to arch header

Move the global variables and functions that can be moved as-is
from the common boot.c file to the x86 implementation header file.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - EFI: move x86 boot/runtime code to common/efi
Roy Franz [Thu, 25 Sep 2014 12:22:12 +0000 (14:22 +0200)]
EFI: move x86 boot/runtime code to common/efi

This moves the EFI boot and runtime services code to the common/efi directory.
This code is symbolically linked back into the arch/x86/efi directory where it is
built if a build-time check for PE/COFF support in the toolchain passes.  In
the PE/COFF supporting case, both the EFI executable and the normal Xen image
(with stubbed EFI functions) are built.  We can't use the normal common build
infrastructure since we are building two versions at the same time, with
different EFI related code in each.  No code changes, just file movement and
make updates.  The files are symbolically linked at build time back to the
original arch/x86/efi directory.  This is in preparation for adding ARM EFI
support, where much of this code can be shared.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago - x86/vlapic: don't silently accept bad vectors
Jan Beulich [Thu, 25 Sep 2014 12:10:01 +0000 (14:10 +0200)]
x86/vlapic: don't silently accept bad vectors

Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.
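The check can be sketched as follows. The ESR bit positions (bit 5 Send Illegal Vector, bit 6 Receive Illegal Vector) follow the Intel SDM's Error Status Register layout; the helper names are hypothetical. A real LAPIC additionally raises an error interrupt through the LVT Error entry when it is unmasked, which this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_SEND_ILLEGAL_VECTOR  (1u << 5)
#define ESR_RECV_ILLEGAL_VECTOR  (1u << 6)

/* Vectors 0-15 are reserved. */
static bool vector_is_legal(uint8_t vec)
{
    return vec >= 16;
}

/* Returns true if the interrupt may be delivered; otherwise latches the
 * matching error bit in *esr and returns false. */
static bool check_vector(uint32_t *esr, uint8_t vec, bool sending)
{
    if (vector_is_legal(vec))
        return true;
    *esr |= sending ? ESR_SEND_ILLEGAL_VECTOR : ESR_RECV_ILLEGAL_VECTOR;
    return false;
}
```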

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago - x86/vlapic: a few type adjustments
Jan Beulich [Thu, 25 Sep 2014 12:09:10 +0000 (14:09 +0200)]
x86/vlapic: a few type adjustments

Constify a couple of pointer parameters, convert a boolean function
return type to bool_t, and clean up a printk() being touched anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago - x86/HVM: fix ID handling of x2APIC emulation
Jan Beulich [Thu, 25 Sep 2014 12:08:20 +0000 (14:08 +0200)]
x86/HVM: fix ID handling of x2APIC emulation

- properly change ID when switching into x2APIC mode (instead of
  mimicking necessary behavior in hvm_x2apic_msr_read())
- correctly (meaningfully) set LDR (so far it ended up being 1 on all
  vCPU-s)
- even if we don't support more than 128 vCPU-s in an HVM guest for now,
  we should properly handle IDs as 32-bit values (i.e. not ignore the
  top 24 bits)
- with that, properly do cluster ID and bit mask check in
  vlapic_match_logical_addr()
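For reference, the architectural x2APIC rules these fixes rely on: the logical ID is derived from the 32-bit x2APIC ID as (ID[19:4] << 16) | (1 << ID[3:0]), and a logical destination in cluster mode matches when the cluster fields are equal and the 16-bit member masks overlap. A sketch with hypothetical helper names (not Xen's vlapic code):

```c
#include <stdbool.h>
#include <stdint.h>

/* LDR[31:16] = cluster (ID[19:4]), LDR[15:0] = one bit chosen by ID[3:0].
 * Bits of the ID above 19 fall off the top of the 32-bit result. */
static uint32_t x2apic_ldr(uint32_t x2apic_id)
{
    return ((x2apic_id >> 4) << 16) | (1u << (x2apic_id & 0xf));
}

/* Cluster-mode logical destination match. */
static bool x2apic_logical_match(uint32_t ldr, uint32_t dest)
{
    return (ldr >> 16) == (dest >> 16) && (ldr & dest & 0xffff) != 0;
}
```

Note that with these rules only the vCPU with x2APIC ID 0 gets an LDR of 1, in contrast to the "1 on all vCPU-s" behaviour the commit fixes.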

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago - x86/HVM: fix miscellaneous aspects of x2APIC emulation
Jan Beulich [Thu, 25 Sep 2014 12:07:27 +0000 (14:07 +0200)]
x86/HVM: fix miscellaneous aspects of x2APIC emulation

- generate #GP on invalid APIC base MSR transitions
- fail reads from the EOI and self-IPI registers (which are write-only)
- handle self-IPI writes and the ICR2 half of ICR writes largely in
  hvm_x2apic_msr_write() and (for self-IPI only) vlapic_apicv_write()
- don't permit MMIO-based access in x2APIC mode
- filter writes to read-only registers in hvm_x2apic_msr_write(),
  allowing conditionals to be dropped from vlapic_reg_write()
- don't ignore a non-zero upper half in MSR-based writes to ESR
- don't ignore other writes to reserved bits
- VMX's EXIT_REASON_APIC_WRITE must not result in #GP (this exit being
  trap-like, this exception would get raised on the wrong RIP)
- make hvm_x2apic_msr_read() produce X86EMUL_* return codes just like
  hvm_x2apic_msr_write() does (benign to the only caller)
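The "invalid APIC base MSR transitions" in the first item are those of the Intel SDM's x2APIC state diagram: EN=0/EXTD=1 is a reserved state, x2APIC mode cannot go straight back to xAPIC mode, and disabled cannot go straight to x2APIC mode. A sketch with hypothetical names that looks only at the EN (bit 11) and EXTD (bit 10) bits and ignores the base address and BSP flag:

```c
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_EXTD   (1ull << 10)   /* x2APIC mode enable */
#define APICBASE_ENABLE (1ull << 11)   /* APIC global enable */

enum apic_mode { APIC_DISABLED, APIC_INVALID, APIC_XAPIC, APIC_X2APIC };

static enum apic_mode apic_mode_of(uint64_t base)
{
    bool en = base & APICBASE_ENABLE, extd = base & APICBASE_EXTD;
    if (!en)
        return extd ? APIC_INVALID : APIC_DISABLED;
    return extd ? APIC_X2APIC : APIC_XAPIC;
}

/* Returns false where the write should raise #GP. */
static bool apicbase_write_ok(uint64_t old, uint64_t val)
{
    enum apic_mode from = apic_mode_of(old), to = apic_mode_of(val);

    if (to == APIC_INVALID)
        return false;                    /* EN=0, EXTD=1 is reserved */
    if (from == APIC_X2APIC && to == APIC_XAPIC)
        return false;                    /* must go via disabled */
    if (from == APIC_DISABLED && to == APIC_X2APIC)
        return false;                    /* must go via xAPIC */
    return true;
}
```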

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago - libxl: Fix build dependency for libxl.h.
Anthony PERARD [Wed, 24 Sep 2014 15:30:34 +0000 (16:30 +0100)]
libxl: Fix build dependency for libxl.h.

libxl.h includes _libxl_list.h, but the Makefile does not reflect this
dependency. This can lead to a build error due to a missing _libxl_list.h
file.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - xen: arm: correct VTCR setting on arm32.
Ian Campbell [Wed, 24 Sep 2014 14:13:28 +0000 (15:13 +0100)]
xen: arm: correct VTCR setting on arm32.

1c92a2aaf8c6 "xen: arm: support for up to 48-bit IPA addressing on
arm64" inadvertently changes the VTCR setting for 32-bit from
0x80003558 to 0x80003518, changing the SL0 setting from 0x1 (p2m
starts at L1) to 0x0 (p2m starts at L2).
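Decoding the two values shows where the regression is: they differ only in bit 6, which is the low bit of the SL0 field (bits [7:6]) in the ARMv7 LPAE VTCR layout, where 0b00 means the stage-2 walk starts at level 2 and 0b01 means it starts at level 1. A tiny helper (hypothetical name) for extracting the field:

```c
#include <stdint.h>

/* VTCR.SL0, bits [7:6]: 0 = stage-2 walk starts at level 2,
 * 1 = starts at level 1. */
static unsigned int vtcr_sl0(uint32_t vtcr)
{
    return (vtcr >> 6) & 0x3;
}
```

Here vtcr_sl0(0x80003558) is 1 and vtcr_sl0(0x80003518) is 0, matching the description above.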

For some (inexplicable) reason this doesn't cause any issue on
Arndale but it does on the OdroidXU.

Reported-by: Suriyan Ramasami <suriyan.r@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Tested-by: Suriyan Ramasami <suriyan.r@gmail.com>
10 years ago - tools/libxc: Avoid cacheflush toolstack hypercalls on x86
Andrew Cooper [Wed, 24 Sep 2014 16:28:15 +0000 (17:28 +0100)]
tools/libxc: Avoid cacheflush toolstack hypercalls on x86

XEN_DOMCTL_cacheflush hypercalls are (and will always be) -ENOSYS on x86, but
xc_domain_cacheflush() is called often during domain build and migrate for
correct behaviour on ARM.

Stub xc_domain_cacheflush() out on x86 to remove its pressure on the global
domctl lock, and the hypercall overhead (which applies further pressure to the
already heavily-contended TLB flush lock).
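The stubbing pattern can be sketched with a compile-time architecture check. Both function names are hypothetical, and the "hypercall" is a counter-incrementing stand-in rather than a real XEN_DOMCTL_cacheflush invocation.

```c
#include <stdint.h>

static unsigned int hypercalls_issued;  /* instrumentation for the sketch */

/* Hypothetical stand-in for issuing XEN_DOMCTL_cacheflush. */
static int do_cacheflush_domctl(uint32_t domid, uint64_t start, uint64_t nr)
{
    (void)domid; (void)start; (void)nr;
    hypercalls_issued++;
    return 0;
}

static int domain_cacheflush_sketch(uint32_t domid, uint64_t start, uint64_t nr)
{
#if defined(__i386__) || defined(__x86_64__)
    /* The domctl is always -ENOSYS on x86, so skip it (and the global
     * domctl lock) entirely. */
    (void)domid; (void)start; (void)nr;
    return 0;
#else
    return do_cacheflush_domctl(domid, start, nr);
#endif
}
```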

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - x86: use constant as multiboot protocol identifier
Daniel Kiper [Thu, 25 Sep 2014 10:08:03 +0000 (12:08 +0200)]
x86: use constant as multiboot protocol identifier

... instead of plain number.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
10 years ago - x86: define e820 entries counter as unsigned int
Daniel Kiper [Thu, 25 Sep 2014 10:07:22 +0000 (12:07 +0200)]
x86: define e820 entries counter as unsigned int

The e820 entry counter is inherently an unsigned quantity,
so define it as unsigned int.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago - x86/hvm: Forced Emulation Prefix for debug builds of Xen
Andrew Cooper [Thu, 25 Sep 2014 10:06:24 +0000 (12:06 +0200)]
x86/hvm: Forced Emulation Prefix for debug builds of Xen

Analysis of XSAs 105 and 106 shows that it is possible to force a race condition
which causes an arbitrary instruction to be emulated.

To aid testing, explicitly introduce the Forced Emulation Prefix for debug
builds alone.
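Detecting such a prefix in the instruction stream reduces to a byte comparison. The sketch assumes the prefix is "ud2" (0f 0b) followed by the ASCII bytes "xen"; treat that pattern, and the helper name, as assumptions to verify against the emulator source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed byte pattern: "ud2" (0f 0b) followed by ASCII "xen". */
static const uint8_t fep_bytes[] = { 0x0f, 0x0b, 'x', 'e', 'n' };

/* If the instruction stream starts with the prefix, return its length so
 * the emulator can skip it and emulate the following instruction;
 * otherwise return 0 and handle the #UD normally. */
static size_t fep_length(const uint8_t *insn, size_t avail)
{
    if (avail >= sizeof(fep_bytes) &&
        memcmp(insn, fep_bytes, sizeof(fep_bytes)) == 0)
        return sizeof(fep_bytes);
    return 0;
}
```

A test guest would then place the prefix directly in front of the instruction it wants emulated, e.g. a CPUID (0f a2).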

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago - misc/coverity: Model __builtin_unreachable()
Andrew Cooper [Thu, 25 Sep 2014 10:00:07 +0000 (12:00 +0200)]
misc/coverity: Model __builtin_unreachable()

This resolves 23 issues Coverity had identified by following the false path of
an ASSERT().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago - x86/LAPIC: drop support for non-integrated APIC
Jan Beulich [Thu, 25 Sep 2014 09:56:22 +0000 (11:56 +0200)]
x86/LAPIC: drop support for non-integrated APIC

We never really supported such, even in the 32-bit days.

As a minor extra thing move the APIC_SELF_IPI definition out of the
middle of Divider Configuration Register ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago - x86: make dump_pageframe_info() slightly more verbose for dying domains
Jan Beulich [Thu, 25 Sep 2014 09:55:49 +0000 (11:55 +0200)]
x86: make dump_pageframe_info() slightly more verbose for dying domains

Allowing more than just 10 pages to be printed in this case gives a
better chance to fully understand eventual page reference leaks: Report
up to 16 "normal" (writable or untyped) pages, and an unlimited number
of special type (page or descriptor table) ones.
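The reporting policy can be expressed as a small filter. The page classification enum and function name are hypothetical; only the limits (16 "normal" pages, no limit for page or descriptor table pages) come from the description above.

```c
#include <stdbool.h>

enum page_kind { PAGE_UNTYPED, PAGE_WRITABLE, PAGE_PT, PAGE_DESC };

/* Decide whether to print a page of a dying domain: all page/descriptor
 * table pages, but at most 16 "normal" (writable or untyped) ones. */
static bool should_dump(enum page_kind k, unsigned int *normal_shown)
{
    if (k == PAGE_PT || k == PAGE_DESC)
        return true;
    if (*normal_shown < 16) {
        (*normal_shown)++;
        return true;
    }
    return false;
}
```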

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago - x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation
Jan Beulich [Thu, 25 Sep 2014 09:53:32 +0000 (11:53 +0200)]
x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation

SYSCALL:
- make sure SS selector has RPL 0
- only use 32 bits of RIP to fill RCX when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless initializers and casts
- drop redundant MSR_STAR read (as suggested by Andrew Cooper)

SYSENTER/SYSEXIT:
- #GP condition doesn't depend on guest mode
- only use 32 bits for setting RIP/RSP when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless (and inconsistently used) casts
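Two of the SYSCALL items can be sketched directly. Taking the target CS selector from MSR_STAR[47:32] with the RPL bits cleared is architectural; deriving SS as that RPL-0 CS plus 8 follows this commit's "SS selector has RPL 0" item. The helper names are hypothetical and this is not the emulator's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CS = MSR_STAR[47:32] with RPL forced to 0; SS = CS + 8, computed from
 * the RPL-cleared value so SS's RPL is 0 as well. */
static void syscall_selectors(uint64_t star, uint16_t *cs, uint16_t *ss)
{
    *cs = (uint16_t)((star >> 32) & 0xfffc);
    *ss = (uint16_t)(*cs + 8);
}

/* RCX receives the return address; when the target execution mode is
 * not 64-bit, only the low 32 bits of RIP are used. */
static uint64_t syscall_rcx(uint64_t next_rip, bool target_is_64bit)
{
    return target_is_64bit ? next_rip : (uint32_t)next_rip;
}
```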

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago - x86/p2m: typo fix for spelling ambiguous
Tamas K Lengyel [Wed, 24 Sep 2014 09:19:57 +0000 (11:19 +0200)]
x86/p2m: typo fix for spelling ambiguous

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago - Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 24 Sep 2014 09:15:19 +0000 (10:15 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years ago - x86/EFI: fix freeing of uninitialized pointer
Roy Franz [Wed, 24 Sep 2014 09:09:11 +0000 (11:09 +0200)]
x86/EFI: fix freeing of uninitialized pointer

The only valid response from the LocateHandle() call is EFI_BUFFER_TOO_SMALL,
so exit if we get anything else.  We pass a 0 size/NULL pointer buffer, so the
only other return we can get is an error.  Return right away as there is
nothing to do.  Also return if there is an error allocating the buffer, as the
previous code path also allowed for an undefined pointer to be freed.
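The fixed control flow follows the usual two-call UEFI pattern: query the size with an empty buffer, bail out unless the answer is "buffer too small", allocate, call again, and free only what was actually allocated. In this sketch the status enum, the mock firmware call and the helper names are hypothetical, and malloc()/free() stand in for the firmware's pool allocator.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { ST_SUCCESS, ST_BUFFER_TOO_SMALL, ST_NOT_FOUND } status_t;
typedef void *handle_t;                 /* stand-in for EFI_HANDLE */

static size_t mock_nr_handles = 2;      /* what the mock "firmware" has */

/* Mock of LocateHandle(): reports the required size if *size is too small. */
static status_t mock_locate_handle(size_t *size, handle_t *buf)
{
    size_t need = mock_nr_handles * sizeof(handle_t);

    if (mock_nr_handles == 0)
        return ST_NOT_FOUND;
    if (*size < need) {
        *size = need;
        return ST_BUFFER_TOO_SMALL;
    }
    for (size_t i = 0; i < mock_nr_handles; i++)
        buf[i] = NULL;
    *size = need;
    return ST_SUCCESS;
}

/* Returns the number of handles found; never frees an unset pointer. */
static size_t probe_handles(void)
{
    size_t size = 0, n = 0;
    handle_t *handles;

    /* Zero-size/NULL query: anything but BUFFER_TOO_SMALL means there is
     * nothing to do, and nothing has been allocated yet. */
    if (mock_locate_handle(&size, NULL) != ST_BUFFER_TOO_SMALL)
        return 0;

    handles = malloc(size);
    if (!handles)
        return 0;                       /* allocation failed: nothing to free */

    if (mock_locate_handle(&size, handles) == ST_SUCCESS)
        n = size / sizeof(handle_t);
    /* ... use handles[0..n-1] ... */
    free(handles);
    return n;
}
```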

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Re-structure the change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years ago - flask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)
Wei Liu [Mon, 15 Sep 2014 19:29:15 +0000 (20:29 +0100)]
flask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)

Daniel suggested we use xenpolicy-$(XEN_FULLVERSION) as flask policy
naming convention.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
10 years ago - Fix QEMU cross-compile build
Stefano Stabellini [Tue, 23 Sep 2014 16:29:29 +0000 (17:29 +0100)]
Fix QEMU cross-compile build

Introduce the per-arch IOEMU_CPU_ARCH variable.
Always pass --cpu=$(IOEMU_CPU_ARCH) to QEMU's configure script.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- dropped redundant comments ]

10 years ago - MAINTAINERS: Add Wei Liu as toolstack co-maintainer.
Ian Campbell [Mon, 22 Sep 2014 16:10:39 +0000 (17:10 +0100)]
MAINTAINERS: Add Wei Liu as toolstack co-maintainer.

The three existing maintainers are not really able to keep up with
the flow and Wei is one of the top tools contributors (according to
"git shortlog -s -n -p RELEASE-4.4.0..origin/staging tools" and my
own impressions).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years ago - docs: add PVH specification
Roger Pau Monne [Tue, 23 Sep 2014 16:17:18 +0000 (18:17 +0200)]
docs: add PVH specification

Introduce a document that describes the interfaces used on PVH. This
document has been designed from a guest OS point of view (i.e.: what a guest
needs to do in order to support PVH).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Mukesh Rathor <mukesh.rathor@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
10 years ago - xen/arm: remove check for generic timer support for arm64
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:49 +0000 (17:43 +0530)]
xen/arm: remove check for generic timer support for arm64

Information about generic timer support is available in the
ID_PFR1 register in ARMv7, whereas this information is not
available on ARMv8 implementations that support only aarch64 mode.
Since ARMv8 always supports the generic timer, this check is not
required.

For platforms that support only aarch64 mode, ID_PFR1 is
not implemented.
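The check being dropped on arm64 looks, in outline, like the sketch below. The GenTimer field of ID_PFR1 is bits [19:16] in the ARMv7 architecture; the helper name and its is_arm64 parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7: ID_PFR1.GenTimer (bits [19:16]) is non-zero when the generic
 * timer is implemented. On arm64 the answer is always yes, and ID_PFR1
 * may not exist at all, so it is not consulted. */
static bool has_generic_timer(uint32_t id_pfr1, bool is_arm64)
{
    if (is_arm64)
        return true;
    return ((id_pfr1 >> 16) & 0xf) != 0;
}
```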

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - xen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:48 +0000 (17:43 +0530)]
xen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2

The IFSR32_EL2 and FPEXC32_EL2 registers are accessible in
aarch64 mode only if aarch32 mode is supported at EL1.
So allow access to these registers only for 32-bit domains.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - xen/arm: Use REG_RANK_INDEX macro
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:46 +0000 (17:43 +0530)]
xen/arm: Use REG_RANK_INDEX macro

Use REG_RANK_INDEX macro to compute index to access
vgic ipriority[] and itargets[] for a given irq.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - libxl: Remove a duplicate calculation of be_path
Ian Jackson [Tue, 23 Sep 2014 16:46:21 +0000 (17:46 +0100)]
libxl: Remove a duplicate calculation of be_path

Coverity-ID: 1238177
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - make: Make "src-tarball" target actually make a source tarball
George Dunlap [Mon, 15 Sep 2014 16:25:04 +0000 (17:25 +0100)]
make: Make "src-tarball" target actually make a source tarball

At the moment, making a release tarball is an annoyingly manual
process that involves running "git archive" into a temporary directory.

Script this process up and make a target, so that the release manager
can simply type "make src-tarball-release" and have everything show up
nice and neat in dist/xen-$version.tar.gz.  "make src-tarball" will
make a version number based on git describe, which will typically have
the most recent tag, number of commits since that tag, and the git
commit id of the current HEAD.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years ago - make: Add subtree-force-update target
George Dunlap [Mon, 15 Sep 2014 16:25:03 +0000 (17:25 +0100)]
make: Add subtree-force-update target

subtree-force-update will update all subtrees according to the current TAG specified
in Config.mk.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago - xen: arm: Add support for the Exynos secure firmware
Suriyan Ramasami [Mon, 22 Sep 2014 18:33:54 +0000 (11:33 -0700)]
xen: arm: Add support for the Exynos secure firmware

The existence of secure firmware is dictated by the presence of
"samsung,secure-firmware" in the DT.

The Arndale board does not have that entry, and uses the address as defined
in "samsung,exynos4210-sysram", offset 0 as the smp init address. This is
possibly true for all SoCs without secure firmware.

For other boards which do have a "secure-firmware" node, use sysram-ns
at offset +0x1c as the smp init address.

The "secure-firmware" MMIO range contains ways to idle the CPU. As this gets
mapped to DOM0 because of its presence in the DT, we blacklist it.

Have tested this on the Odroid XU. I have also tested the other code path
on the Odroid XU by removing "secure-firmware" from its DT. I could see
that the other code path was exercised with correct smp init address
values.
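The address selection described above boils down to one conditional. The helper name is hypothetical; has_secure_fw stands for "the DT contains a samsung,secure-firmware node", and the two base addresses are those of the sysram and sysram-ns regions taken from the DT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Secure firmware present: boot address lives at sysram-ns + 0x1c.
 * Otherwise (e.g. Arndale): offset 0 of samsung,exynos4210-sysram. */
static uint64_t exynos_smp_init_addr(bool has_secure_fw,
                                     uint64_t sysram_base,
                                     uint64_t sysram_ns_base)
{
    return has_secure_fw ? sysram_ns_base + 0x1c : sysram_base;
}
```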

Signed-off-by: Suriyan Ramasami <suriyan.r@gmail.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago - x86/viridian: Add partition time reference counter MSR support
Paul Durrant [Tue, 23 Sep 2014 10:40:10 +0000 (11:40 +0100)]
x86/viridian: Add partition time reference counter MSR support

This patch optionally re-instates support for the partition time reference
counter that was previously introduced by commit
e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b and reverted by commit
1cd4fab14ce25859efa4a2af13475e6650a5506c. The previous implementation was
non-optional and flawed.

This implementation uses the TSC of vcpu0, which is preserved across
save/restore as part of the architectural state, and converts it
to a count of 100ns ticks using the domain's tsc_khz.
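The conversion from TSC cycles to 100ns ticks can be illustrated as below. This is a sketch that assumes tsc_khz is the TSC frequency in kHz (cycles per millisecond); the function name is hypothetical and this is not Xen's actual code. Since one millisecond is 10000 ticks of 100ns, ticks = tsc * 10000 / tsc_khz; splitting the value into whole milliseconds and a remainder keeps the intermediate product within 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper: TSC cycles -> 100ns ticks, rounded down. */
static uint64_t tsc_to_100ns_ticks(uint64_t tsc, uint32_t tsc_khz)
{
    uint64_t whole_ms = tsc / tsc_khz; /* complete milliseconds */
    uint64_t rem = tsc % tsc_khz;      /* leftover cycles, always < tsc_khz */

    /* rem * 10000 < 2^46, so this product cannot overflow. */
    return whole_ms * 10000 + (rem * 10000) / tsc_khz;
}
```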

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Christoph Egger <chegger@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/viridian: Re-purpose the HVM parameter to be a feature mask
Paul Durrant [Tue, 23 Sep 2014 10:40:09 +0000 (11:40 +0100)]
x86/viridian: Re-purpose the HVM parameter to be a feature mask

The following commits introduced the time reference counter MSR and
TSC/APIC frequency MSRs into the viridian feature set respectively:

e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b
84657efd9116f40924aa13c9d5a349e007da716f

The time reference counter MSR feature was then reverted by commit

1cd4fab14ce25859efa4a2af13475e6650a5506c

because a flaw in the implementation meant the counter was reset on
migration.

All of these changes were made without any additional options being
added to the VM configuration, or any compatibility checks being made
in the domain save/restore code. Hence setting the single boolean
'viridian' option in the VM configuration yields a different set of
features depending on which version of Xen the VM is started on, and the
feature set can change across migration (so new MSRs can magically appear).

This patch grandfathers in the current viridian feature set and calls them
the 'base' and 'freq' feature sets. HVM_PARAM_VIRIDIAN is re-purposed as
a feature mask. The hypervisor has only ever allowed it to be set to 0
or 1, so the presence of the base and freq sets is indicated by setting
bit 0. The freq set can then be turned off by setting bit 1, thus
restoring the pre-Xen-4.4 base set. Newly implemented viridian features
can be optionally enabled in future by setting further bits.

The viridian option in xl.cfg(5) has also been changed to a list so
that the sets can be individually enabled or disabled. For compatibility,
if the option is specified as a boolean, then a true (1) value will enable
the base and freq sets and a false (0) value will not enable any
enlightenments.
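As an illustration of the mask semantics described above, the following C sketch decodes the parameter and maps the legacy boolean option onto it. The macro and function names are hypothetical (Xen's own headers may name these bits differently); only the meaning of bits 0 and 1 comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the text only specifies "bit 0" and "bit 1". */
#define VIRIDIAN_BASE_FREQ (UINT64_C(1) << 0) /* enables base + freq sets */
#define VIRIDIAN_NO_FREQ   (UINT64_C(1) << 1) /* turns the freq set back off */

struct viridian_sets {
    bool base;
    bool freq;
};

static struct viridian_sets viridian_decode(uint64_t mask)
{
    struct viridian_sets s = { false, false };

    if (mask & VIRIDIAN_BASE_FREQ) {
        s.base = true;
        s.freq = !(mask & VIRIDIAN_NO_FREQ); /* bit 1 restores the base-only set */
    }
    return s;
}

/* Legacy boolean "viridian=1" / "viridian=0" mapped onto the mask. */
static uint64_t viridian_mask_from_bool(bool enabled)
{
    return enabled ? VIRIDIAN_BASE_FREQ : 0;
}
```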

This patch also alters the allowed write accesses to HVM_PARAM_VIRIDIAN.
Currently there is nothing to stop the guest writing this value (which,
while harmless to anything else, should not happen) and nothing to
stop a toolstack from setting the value back to zero whilst the guest is
running, causing CPUID leaves to disappear and MSR accesses to start
causing GPFs in the guest. Both of these possibilities are now disallowed:
Once the parameter is set to a non-zero value it may not be modified (only
re-written with the same value), and guests no longer have any write
access.
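The new write rules can be sketched like this. The function name and the error codes returned are illustrative assumptions, not the hypervisor's actual interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for a write to the viridian parameter.
 * Guests may not write it at all; once a non-zero value has been set,
 * later writes must repeat exactly that value.
 */
static int viridian_param_write(uint64_t *param, uint64_t value,
                                bool from_guest)
{
    if (from_guest)
        return -EPERM;            /* guests have no write access */
    if (*param != 0 && value != *param)
        return -EPERM;            /* already set: only identical re-writes */
    *param = value;
    return 0;
}
```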

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: David Scott <dave.scott@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxl: introduce rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:15:02 +0000 (18:15 -0400)]
xl: introduce rtds scheduler

Add an xl command for the rtds scheduler.
Note: VCPU parameters (period, budget) are in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: add rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:14:43 +0000 (18:14 -0400)]
libxl: add rtds scheduler

Add libxl functions to set/get a domain's parameters for the rtds scheduler.
Note: VCPU information (period, budget) is in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxc: add rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:14:18 +0000 (18:14 -0400)]
libxc: add rtds scheduler

Add xc_sched_rtds_* functions to interact with Xen to set/get a domain's
parameters for the rtds scheduler.
Note: VCPU information (period, budget) is in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- xenctrl.h has moved to tools/libxc/include, adjust patch to match ]

10 years agoxen: add real time scheduler rtds
Meng Xu [Sat, 20 Sep 2014 22:13:48 +0000 (18:13 -0400)]
xen: add real time scheduler rtds

This scheduler follows the Preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with the earlier deadline has the higher
priority. The scheduler always picks the highest-priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on this PCPU and the PCPU is
either idle or has a lower-priority VCPU running on it.

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is the end of each period;
a VCPU has its budget replenished at the beginning of each period;
while scheduled, a VCPU burns its budget.
The VCPU needs to consume its budget before its deadline in each period;
it discards any unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until the next period.

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
when a VCPU has no task but has budget left, its budget is preserved.
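A simplified sketch of the per-period budget accounting described above. The structure and function names are hypothetical and times are kept in microseconds for readability; the real scheduler's data structures and time units may differ. Because budget is charged only for time actually spent running, a VCPU that is idle keeps its remaining budget until its period ends, which is the deferrable-server behaviour.

```c
#include <stdint.h>

/* Hypothetical per-VCPU state; all times in microseconds. */
struct rt_vcpu {
    int64_t period;       /* length of each period */
    int64_t budget;       /* budget granted per period */
    int64_t cur_budget;   /* budget remaining in the current period */
    int64_t cur_deadline; /* end of the current period */
};

/*
 * Move the VCPU into the period containing `now`: the new period gets a
 * fresh budget, and any budget left over from the old period is discarded.
 */
static void rt_update_deadline(struct rt_vcpu *v, int64_t now)
{
    if (now < v->cur_deadline)
        return;                                   /* still in current period */
    int64_t missed = (now - v->cur_deadline) / v->period + 1;
    v->cur_deadline += missed * v->period;        /* period that covers `now` */
    v->cur_budget = v->budget;                    /* replenish, drop leftovers */
}

/* Charge `ran` microseconds of execution; called only while running. */
static void rt_burn_budget(struct rt_vcpu *v, int64_t ran)
{
    v->cur_budget -= ran;
    if (v->cur_budget < 0)
        v->cur_budget = 0;                        /* depleted: wait for next period */
}
```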

Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget, sorted by deadline;
the depletedq holds all VCPUs without budget, unsorted.
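The EDF choices can be sketched as two small functions: picking the earliest-deadline VCPU that still has budget, and finding a feasible PCPU for it. This is an illustration under simplifying assumptions (arrays instead of the real queues, a negative deadline meaning "idle", and a boolean array standing in for the cpumask); all names are hypothetical. Preferring the PCPU that runs the latest-deadline VCPU means a preemption always displaces the lowest-priority running VCPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VCPU view. */
struct edf_vcpu {
    int64_t deadline;   /* absolute deadline of the current period */
    int64_t budget;     /* remaining budget; <= 0 means depleted */
};

/* Index of the earliest-deadline VCPU with budget left, or -1 if none. */
static int edf_pick(const struct edf_vcpu *v, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (v[i].budget <= 0)
            continue;                       /* would sit on the depletedq */
        if (best < 0 || v[i].deadline < v[best].deadline)
            best = (int)i;
    }
    return best;
}

/*
 * Feasible PCPU for a VCPU with deadline d. running[c] is the deadline of
 * the VCPU on PCPU c, or negative if PCPU c is idle. Returns -1 if none.
 */
static int edf_pick_pcpu(int64_t d, const int64_t *running,
                         const bool *allowed, size_t ncpus)
{
    int victim = -1;

    for (size_t c = 0; c < ncpus; c++) {
        if (!allowed[c])
            continue;                       /* outside the VCPU's cpumask */
        if (running[c] < 0)
            return (int)c;                  /* idle PCPU: always feasible */
        if (running[c] > d && (victim < 0 || running[c] > running[victim]))
            victim = (int)c;                /* lowest-priority running VCPU */
    }
    return victim;
}
```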

Note: cpumask and cpupool are supported.

This is an experimental scheduler.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- use PRI_stime to print delta in burn_budget, to fix build on
         32-bit (i.e. arm32) ]

10 years agotools: enable QEMU for ARM builds
Stefano Stabellini [Fri, 1 Aug 2014 15:32:19 +0000 (16:32 +0100)]
tools: enable QEMU for ARM builds

Build qemu-xen on ARM and ARM64: it is used to provide the PV backends,
disk and framebuffer in particular.

Ideally we would also modify the configure options to only build what is
necessary: a machine just for PV backends. However that is a work in
progress and not yet available in QEMU (see
http://marc.info/?l=qemu-devel&m=139082425718379&w=2). So we just build
the usual i386 target, even though no i386 emulation is going to be done
by qemu-xen on ARM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoMove xenstore and libxc public headers to include subdir
Stefano Stabellini [Thu, 10 Jul 2014 15:35:28 +0000 (15:35 +0000)]
Move xenstore and libxc public headers to include subdir

Also move xc_dom.h to include, as it is often used by other Xen tools.
Use the new include subdirectories to build Xen tools, qemu-xen and
stubdoms.

Add the old libxc include path to the programs that need it to build,
on a case-by-case basis, with comments noting that they shouldn't require
internal libxc headers to build.

[ And: update QEMU_TRADITIONAL_REVISION to corresponding qemu patch
   - Ian Jackson ]

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoRerun autogen.sh after 7d7147762282 "Use configure --sysconfdir=DIR to se..."
Ian Campbell [Tue, 23 Sep 2014 13:08:51 +0000 (14:08 +0100)]
Rerun autogen.sh after 7d7147762282 "Use configure --sysconfdir=DIR to se..."

I tried to do this but failed to commit --amend correctly before pushing.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86emul: only emulate software interrupt injection for real mode
Jan Beulich [Tue, 23 Sep 2014 12:33:50 +0000 (14:33 +0200)]
x86emul: only emulate software interrupt injection for real mode

Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.

This is XSA-106.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
10 years agox86/emulate: check cpl for all privileged instructions
Andrew Cooper [Tue, 23 Sep 2014 12:33:06 +0000 (14:33 +0200)]
x86/emulate: check cpl for all privileged instructions

Without this, it is possible for userspace to load its own IDT or GDT.
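The rule being enforced is the architectural one: outside real mode, privileged instructions such as LGDT and LIDT raise #GP(0) unless CPL is 0. A minimal sketch of such a check follows; the enum and function names are hypothetical and do not reflect x86emul's internal interface.

```c
#include <stdbool.h>

/* Hypothetical emulator outcomes. */
enum emul_result { EMUL_OKAY, EMUL_RAISE_GP };

/*
 * Privilege check an emulator must apply before loading GDTR/IDTR (and for
 * other privileged instructions): outside real mode, CPL must be 0.
 */
static enum emul_result check_privileged(bool real_mode, unsigned int cpl)
{
    if (!real_mode && cpl != 0)
        return EMUL_RAISE_GP;   /* e.g. userspace attempting LGDT/LIDT */
    return EMUL_OKAY;
}
```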

This is XSA-105.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrei LUTAS <vlutas@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/shadow: fix race condition sampling the dirty vram state
Andrew Cooper [Tue, 23 Sep 2014 12:31:47 +0000 (14:31 +0200)]
x86/shadow: fix race condition sampling the dirty vram state

d->arch.hvm_domain.dirty_vram must be read with the domain's paging lock held.

If not, two concurrent hypercalls could both end up attempting to free
dirty_vram (the second of which will free a wild pointer), or both end up
allocating a new dirty_vram structure (the first of which will be leaked).
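The fix pattern (sample and update the shared pointer only while holding the lock) can be shown in userspace C, with a pthread mutex standing in for the domain's paging lock. All names here are hypothetical; this is an analogy, not the shadow code itself.

```c
#include <pthread.h>
#include <stdlib.h>

struct dirty_vram { unsigned long begin_pfn, end_pfn; }; /* hypothetical */

struct domain_state {
    pthread_mutex_t paging_lock;   /* stands in for the paging lock */
    struct dirty_vram *dirty_vram; /* shared; only accessed under the lock */
};

/*
 * Reading dirty_vram without the lock lets two callers both see NULL (one
 * allocation leaks) or both see the same pointer (double free). Doing the
 * read and the update in one critical section closes both races.
 */
static struct dirty_vram *get_or_alloc_dirty_vram(struct domain_state *d)
{
    struct dirty_vram *v;

    pthread_mutex_lock(&d->paging_lock);
    v = d->dirty_vram;             /* sampled with the lock held */
    if (v == NULL) {
        v = calloc(1, sizeof(*v));
        d->dirty_vram = v;         /* published before unlocking */
    }
    pthread_mutex_unlock(&d->paging_lock);
    return v;
}

static void free_dirty_vram(struct domain_state *d)
{
    pthread_mutex_lock(&d->paging_lock);
    free(d->dirty_vram);           /* free(NULL) is a no-op */
    d->dirty_vram = NULL;          /* later callers see NULL, not a stale pointer */
    pthread_mutex_unlock(&d->paging_lock);
}
```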

This is XSA-104.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agoUse configure --sysconfdir=DIR to set CONFIG_DIR
Olaf Hering [Mon, 22 Sep 2014 13:00:07 +0000 (15:00 +0200)]
Use configure --sysconfdir=DIR to set CONFIG_DIR

Preserve existing behaviour: if the option was not given, set existing
defaults for FreeBSD, Solaris and everything else.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
10 years agotools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE
Olaf Hering [Mon, 22 Sep 2014 13:00:06 +0000 (15:00 +0200)]
tools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE

Remove the hardcoded /var/run/xen directory path; use XEN_RUN_DIR instead.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: provide variable paths to libxc
Olaf Hering [Mon, 22 Sep 2014 13:00:05 +0000 (15:00 +0200)]
tools/libxc: provide variable paths to libxc

In preparation for removing hardcoded /var/run/xen paths, provide
XEN_RUN_DIR and related directories to xc_private.h. Similar code already
exists for libxl, stubdom and other parts.
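One common way to consume a build-time directory such as XEN_RUN_DIR from C is sketched below. The fallback definition, EXAMPLE_LOCK_FILE and run_dir_path() are assumptions added so the fragment compiles on its own; in the real tree the value would come from the generated header, and the actual lock file name used by libxc is not shown here.

```c
#include <stdio.h>

/* Normally provided by a generated header; fallback only for this sketch. */
#ifndef XEN_RUN_DIR
#define XEN_RUN_DIR "/var/run/xen"
#endif

/* Hypothetical name: adjacent string literals concatenate at compile time. */
#define EXAMPLE_LOCK_FILE XEN_RUN_DIR "/example-lock"

/* Hypothetical helper: build "<XEN_RUN_DIR>/<name>.<domid>" into buf. */
static int run_dir_path(char *buf, size_t len, const char *name, int domid)
{
    return snprintf(buf, len, "%s/%s.%d", XEN_RUN_DIR, name, domid);
}
```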

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxl: use buildmakevars2header to create _paths.h
Olaf Hering [Mon, 22 Sep 2014 13:00:04 +0000 (15:00 +0200)]
tools/libxl: use buildmakevars2header to create _paths.h

Replace usage of buildmakevars2file with buildmakevars2header. The macro
generates a C header file, so remove the code which converts shell variables
into C defines. Also update the dependency; the macro itself creates a
dependency for _paths.h. A temporary file is no longer needed.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoConfig.mk: add new macro buildmakevars2header
Olaf Hering [Mon, 22 Sep 2014 13:00:03 +0000 (15:00 +0200)]
Config.mk: add new macro buildmakevars2header

This macro is similar to buildmakevars2file, but creates a C header
file instead of shell-style syntax. Upcoming changes will use this macro
in libxl and libxc.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoConfig.mk: replace dependency to genpath with actual target
Olaf Hering [Mon, 22 Sep 2014 13:00:02 +0000 (15:00 +0200)]
Config.mk: replace dependency to genpath with actual target

genpath is a detail of buildmakevars2file. Replace the dependency on
genpath with the actual buildmakevars2file target. This change by
itself does not fix any bug. Upcoming changes will add dependencies on
$(target), but no rule exists to create $(target).

To force a rebuild of the $(1) rule, the target now depends on the
existing .phony target. This dummy target is already used elsewhere in
the code.

No change in behaviour is expected by this patch.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoConfig.mk: move directory list into BUILD_MAKE_VARS
Olaf Hering [Mon, 22 Sep 2014 13:00:01 +0000 (15:00 +0200)]
Config.mk: move directory list into BUILD_MAKE_VARS

To maintain the list of directories in a single place, move the existing
list into its own variable and use it in buildmakevars2file.
Required for upcoming changes.
Also trim whitespace.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoremove obsolete SUBSYS_DIR variable
Olaf Hering [Mon, 22 Sep 2014 13:00:00 +0000 (15:00 +0200)]
remove obsolete SUBSYS_DIR variable

/var/run is a runtime directory. It is not supposed to be packaged.
Remove unused SUBSYS_DIR variable from Config.mk and distro_mapping.txt.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/examples: remove obsolete install targets
Olaf Hering [Mon, 22 Sep 2014 12:59:59 +0000 (14:59 +0200)]
tools/examples: remove obsolete install targets

install-hotplug and install-udev are obsolete since commit 57bcfa11
("tools/hotplug: Separate OS-specific scripts.")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: use XEN_LOCK_DIR instead of hardcoded path
Olaf Hering [Mon, 22 Sep 2014 12:59:58 +0000 (14:59 +0200)]
tools/hotplug: use XEN_LOCK_DIR instead of hardcoded path

Use XEN_LOCK_DIR, because the lock directory is a compile-time setting.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: create XEN_LOCK_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:57 +0000 (14:59 +0200)]
tools/hotplug: create XEN_LOCK_DIR at runtime

Create XEN_LOCK_DIR, because the lock directory is a compile-time setting.
Also, /var/lock might be empty at startup because it is a tmpfs mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: create XEN_RUN_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:56 +0000 (14:59 +0200)]
tools/hotplug: create XEN_RUN_DIR at runtime

Create XEN_RUN_DIR instead of a hardcoded path, because it is a compile-time
setting. Also, /var/run might be empty at startup because it is a tmpfs
mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/pygrub: store kernels in /var/run/xen/pygrub
Olaf Hering [Mon, 22 Sep 2014 12:59:55 +0000 (14:59 +0200)]
tools/pygrub: store kernels in /var/run/xen/pygrub

Move the location of temporary boot files from /var/run/xend/boot to
/var/run/xen/pygrub. Create the subdirectory if it does not exist.
The <dir> argument to --output-directory must be an existing directory.

The reason for this change is that all entries below /var/run have to be
created at runtime in case /var/run is cleared on every boot.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: remove obsolete path.py from tools/python
Olaf Hering [Mon, 22 Sep 2014 12:59:54 +0000 (14:59 +0200)]
tools: remove obsolete path.py from tools/python

The directory tools/python/xen/util does not exist.
Upcoming changes to genpath will fail if the rule persists.
Nothing uses path.py (anymore?), so get rid of it.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed from .gitignore too ]

10 years agotools/mkrpm: allow custom rpm package name
Olaf Hering [Mon, 22 Sep 2014 12:59:53 +0000 (14:59 +0200)]
tools/mkrpm: allow custom rpm package name

Even if Xen is configured and compiled with a different --prefix= so that
it operates entirely below $prefix, the resulting package from 'make
rpmball' is always called "xen.rpm".

Use a variable, PKG_SUFFIX, to give the package a different name.
This can be used like this:

suffix=-bugN
prefix=/opx/xen/staging${suffix}
./configure --prefix=${prefix}
make rpmball PKG_SUFFIX=${suffix}

The result will be "xen-bugN.rpm" instead of "xen.rpm". The benefit is that
many xen${suffix}.rpm packages can be installed at the same time.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoinstall.sh: Preserve permissions from make install
Olaf Hering [Mon, 22 Sep 2014 12:59:52 +0000 (14:59 +0200)]
install.sh: Preserve permissions from make install

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/xenpaging: create dumpdir with mode 0700
Olaf Hering [Mon, 22 Sep 2014 12:59:51 +0000 (14:59 +0200)]
tools/xenpaging: create dumpdir with mode 0700

The swapfile contains sensitive guest information.
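Creating such a directory so that only its owner can access it might look like the following POSIX C sketch. The function name is hypothetical and this is not xenpaging's actual code; the explicit chmod() also tightens a directory that already existed with wider permissions, which is a design choice of this sketch.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: ensure `path` is a directory with mode 0700. */
static int make_private_dir(const char *path)
{
    struct stat st;

    /* The umask can only clear bits, so this never creates it wider. */
    if (mkdir(path, 0700) != 0 && errno != EEXIST)
        return -1;
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;              /* exists, but is not a directory */
    return chmod(path, 0700);   /* exact mode, also for a pre-existing dir */
}
```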

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agostubdom: fix lwip compile
Olaf Hering [Mon, 22 Sep 2014 12:59:50 +0000 (14:59 +0200)]
stubdom: fix lwip compile

stubdom/lwip-x86_64/src/core/dhcp.c: In function 'dhcp_create_request':
stubdom/lwip-x86_64/src/core/dhcp.c:1359:71: error: array subscript is above array bounds [-Werror=array-bounds]
     dhcp->msg_out->chaddr[i] = (i < netif->hwaddr_len) ? netif->hwaddr[i] : 0/* pad byte*/;

gcc cannot know whether hwaddr_len exceeds the hwaddr array size,
so force an upper limit to assist gcc.
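The general shape of such a fix is to clamp the loop bound to the real array size before indexing. The sketch below is illustrative only: HWADDR_MAX and fill_chaddr() are hypothetical, and this is not the patch applied to lwIP. CHADDR_LEN is 16 because the DHCP chaddr field is 16 octets.

```c
#include <stddef.h>
#include <stdint.h>

#define HWADDR_MAX 6   /* hypothetical size of the hwaddr array */
#define CHADDR_LEN 16  /* DHCP chaddr field is 16 octets */

/*
 * Copy a hardware address into chaddr, zero-padding the rest. Clamping
 * hwaddr_len to HWADDR_MAX makes the bound explicit, so hwaddr[] is never
 * over-read even if hwaddr_len is corrupt, and the compiler can see that.
 */
static void fill_chaddr(uint8_t chaddr[CHADDR_LEN],
                        const uint8_t hwaddr[HWADDR_MAX], uint8_t hwaddr_len)
{
    size_t len = hwaddr_len < HWADDR_MAX ? hwaddr_len : HWADDR_MAX;

    for (size_t i = 0; i < CHADDR_LEN; i++)
        chaddr[i] = (i < len) ? hwaddr[i] : 0; /* 0 is the pad byte */
}
```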

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years agoxen: arm: Enable physical address space compression (PDX) on arm
Ian Campbell [Wed, 17 Sep 2014 21:21:03 +0000 (22:21 +0100)]
xen: arm: Enable physical address space compression (PDX) on arm

This allows us to support sparse physical address maps which we previously
could not because the frametable would end up taking up an enormous fraction
of RAM.

On a fast model which has RAM at 0x80000000-0x100000000 and
0x880000000-0x900000000 this reduces the size of the frametable from
478M to 84M.
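The core idea of PDX compression, squeezing out a run of physical address bits that are zero in every RAM bank, can be sketched as follows. This is a simplified illustration, not Xen's implementation, and all names are hypothetical. Applied to the two banks above (as page frame numbers with 4KB pages), it finds a 3-bit hole at bits 20-22, so in this sketch the index space shrinks from 0x900000 to 0x200000 entries.

```c
#include <stdint.h>

/* Set every bit at or below the most significant set bit of x. */
static uint64_t fill_below_msb(uint64_t x)
{
    x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
    x |= x >> 8;  x |= x >> 16; x |= x >> 32;
    return x;
}

/* Bits that can be non-zero for some address in [base, base + size). */
static uint64_t region_bits(uint64_t base, uint64_t size)
{
    return base | fill_below_msb(base ^ (base + size - 1));
}

struct pdx_hole { unsigned int shift, len; };

/* Longest run of always-zero bits below the highest used bit. */
static struct pdx_hole find_hole(uint64_t used)
{
    struct pdx_hole best = { 0, 0 };
    unsigned int run = 0;

    for (unsigned int bit = 0; bit < 64 && (used >> bit); bit++) {
        if (used & (UINT64_C(1) << bit)) {
            run = 0;
        } else if (++run > best.len) {
            best.len = run;
            best.shift = bit - run + 1;
        }
    }
    return best;
}

/* Keep the bits below the hole, shift the bits above it down. */
static uint64_t pfn_to_pdx(uint64_t pfn, struct pdx_hole h)
{
    uint64_t low = pfn & ((UINT64_C(1) << h.shift) - 1);

    return low | ((pfn >> (h.shift + h.len)) << h.shift);
}
```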

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: add helpers for PDX mask initialisation calculations
Ian Campbell [Tue, 16 Sep 2014 20:01:41 +0000 (21:01 +0100)]
xen: add helpers for PDX mask initialisation calculations

I wanted to make fill_mask a public function so I could use it on ARM, but it
was actually easier to think of (semi-)reasonable public names for its users,
so that is what I have done.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: refactor physical address space compression support into common code
Ian Campbell [Wed, 17 Sep 2014 21:21:01 +0000 (22:21 +0100)]
xen: refactor physical address space compression support into common code

The "pdx compression" functionality will be useful on ARM as well.

Move the code to common code+header and introduce HAS_PDX to control when it is
built. L2_PAGETABLE_SHIFT is x86 specific, so introduce PDX_GROUP_SHIFT to
abstract it out.

ARM has no need for superpage compression (yet?) and lacks SUPERPAGE_SHIFT so
those functions (spage_to_mfn et al) are not moved.

No effect on x86 and no change for ARM (yet).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoxen: arm: support for up to 48-bit IPA addressing on arm64
Ian Campbell [Thu, 18 Sep 2014 00:09:55 +0000 (01:09 +0100)]
xen: arm: support for up to 48-bit IPA addressing on arm64

Currently we support only 40 bits. This is insufficient on systems where
peripherals which need to be 1:1 mapped to dom0 are above the 40-bit limit.

Unfortunately the hardware requirements are such that the
number of levels in the P2M is not static and must vary with the number of
implemented physical address bits. This is described in ARM DDI 0487A.b Table
D4-5. In short, there is no single p2m configuration which supports everything
from 40 to 48 bits.

For example a system which supports up to 40-bit addressing will only support 3
level p2m (maximum SL0 is 1 == 3 levels), requiring a concatenated page table
root consisting of two pages to make the full 40-bits of addressing.

A maximum of 16 pages can be concatenated meaning that a 3 level p2m can only
support up to 43-bit addresses. Therefore support for 48-bit addressing
requires SL0==2 (4 levels of paging).
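The arithmetic above can be written out as a small C function, assuming a 4KB granule (12 offset bits, 9 bits resolved per level). It only shows the raw level/concatenation trade-off; it is not Xen's lookup table, which, for instance, deliberately keeps 42-bit systems at the 40-bit configuration. Names are hypothetical.

```c
/* Hypothetical result: number of levels and concatenated root pages. */
struct p2m_layout { unsigned int levels, root_pages; };

/*
 * 3 levels cover 12 + 3 * 9 = 39 bits; concatenating up to 16 root pages
 * adds up to 4 more bits (43). Beyond that a 4th level is needed, which
 * covers 12 + 4 * 9 = 48 bits. IPA sizes above 48 bits are not handled.
 */
static struct p2m_layout p2m_layout_for(unsigned int ipa_bits)
{
    struct p2m_layout l = { 3, 1 };
    const unsigned int covered3 = 12 + 3 * 9;

    if (ipa_bits > covered3) {
        unsigned int extra = ipa_bits - covered3;

        if (extra <= 4)
            l.root_pages = 1u << extra;   /* 2, 4, 8 or 16 pages */
        else
            l.levels = 4;                 /* single root page, 48 bits */
    }
    return l;
}
```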

After the previous patches our various p2m lookup and manipulation functions
already support starting at arbitrary level and with arbitrary root
concatenation. All that remains is to determine the correct settings from
ID_AA64MMFR0_EL1.PARange for which we use a lookup table.

As well as supporting 44 and 48 bit addressing we can also reduce the order of
the first level for systems which support only 32 or 36 physical address bits,
saving a page.

Systems with 42-bits are an interesting case, since they only support 3 levels
of paging, implying that 8 pages are required at the root level. So far I am
not aware of any systems with peripherals located so high up (the only 42-bit
system I've seen has nothing above 40-bits), so such systems remain configured
for 40-bit IPA with a pair of pages at the root of the p2m.

Switching to symbolic names for the VTCR_EL2 bits as we go improves the clarity
of the result.

Parts of this are derived from "xen/arm: Add 4-level page table for
stage 2 translation" by Vijaya Kumar K.

arm32 remains with the static 3-level, 2 page root configuration.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: support for up to 48-bit physical addressing on arm64
Ian Campbell [Thu, 18 Sep 2014 00:09:54 +0000 (01:09 +0100)]
xen: arm: support for up to 48-bit physical addressing on arm64

This only affects Xen's own stage one paging.

- Use symbolic names for TCR bits for clarity.
- Update PADDR_BITS
- Base field of LPAE PT structs is now 36 bits (and therefore
  unsigned long long for arm32 compatibility)
- TCR_EL2.PS is set from ID_AA64MMFR0_EL1.PARange.
- Provide decode of ID_AA64MMFR0_EL1 in CPU info
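Decoding the PARange field might be sketched like this. The field is bits [3:0] of ID_AA64MMFR0_EL1; the table covers the encodings for 32 to 48 bits, and newer architecture revisions define further encodings that this sketch reports as unknown (0). The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: ID_AA64MMFR0_EL1 value -> physical address bits. */
static unsigned int parange_to_bits(uint64_t id_aa64mmfr0)
{
    static const unsigned int bits[] = { 32, 36, 40, 42, 44, 48 };
    unsigned int pa_range = id_aa64mmfr0 & 0xf;   /* PARange, bits [3:0] */

    return pa_range < sizeof(bits) / sizeof(bits[0]) ? bits[pa_range] : 0;
}
```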

Parts of this are derived from "xen/arm: Add 4-level page table for
stage 2 translation" by Vijaya Kumar K.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle variable p2m levels in apply_p2m_changes
Ian Campbell [Thu, 18 Sep 2014 00:09:53 +0000 (01:09 +0100)]
xen: arm: handle variable p2m levels in apply_p2m_changes

As with previous changes this involves conversion from a linear series of
lookups into a loop over the levels.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle variable p2m levels in p2m_lookup
Ian Campbell [Thu, 18 Sep 2014 00:09:52 +0000 (01:09 +0100)]
xen: arm: handle variable p2m levels in p2m_lookup

This paves the way for boot-time selection of the number of levels to
use in the p2m, which is required to support both 40-bit and 48-bit
systems. For now the starting level remains a compile time constant.

Implemented by turning the linear sequence of lookups into a loop.
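The "loop over the levels" structure can be sketched as below, using a toy in-memory table format rather than real LPAE descriptors. The types, names, and the convention that a zero frame means "unmapped" are assumptions for illustration; levels are numbered 0 to 3 with 4KB pages, so a walk can start at level 0 or level 1 by changing one argument.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_ENTRIES 512

/* Hypothetical toy table: next-level pointers, or frames at level 3. */
struct pt_table {
    struct pt_table *next[PT_ENTRIES];
    uint64_t frame[PT_ENTRIES];
};

/* Address bits that index a level-`lvl` table (levels 0..3, 4KB pages). */
static unsigned int level_shift(unsigned int lvl)
{
    return 12 + 9 * (3 - lvl);
}

/* Walk from root_level down to level 3; returns 0 if nothing is mapped. */
static uint64_t pt_walk(const struct pt_table *t, unsigned int root_level,
                        uint64_t addr)
{
    for (unsigned int lvl = root_level; lvl < 3; lvl++) {
        t = t->next[(addr >> level_shift(lvl)) & (PT_ENTRIES - 1)];
        if (t == NULL)
            return 0;                      /* no table at the next level */
    }
    return t->frame[(addr >> level_shift(3)) & (PT_ENTRIES - 1)];
}
```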

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Defer setting of VTCR_EL2 until after CPUs are up
Ian Campbell [Thu, 18 Sep 2014 00:09:51 +0000 (01:09 +0100)]
xen: arm: Defer setting of VTCR_EL2 until after CPUs are up

Currently we retain the hardcoded values but soon we will want to calculate the
correct values based upon the CPU properties common to all processors, which
are only available once they are all up.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: move setup_virt_paging to p2m.[ch] from mm.[ch]
Ian Campbell [Thu, 18 Sep 2014 00:09:50 +0000 (01:09 +0100)]
xen: arm: move setup_virt_paging to p2m.[ch] from mm.[ch]

This file is where most of the P2M logic lives and this function will
eventually need to poke at some internals, so move it.

This is pure code motion.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle concatenated root tables in dump_pt_walk
Ian Campbell [Thu, 18 Sep 2014 00:09:49 +0000 (01:09 +0100)]
xen: arm: handle concatenated root tables in dump_pt_walk

ARM allows for the concatenation of pages at the root of a p2m (but not a
regular page table) in order to support a larger IPA space than the number of
levels in the P2M would normally support. We use this to support 40-bit guest
addresses.

Previously we were unable to dump IPAs which were outside the first page of the
root. To fix this we adjust dump_pt_walk to take the machine address of the
page table root instead of expecting the caller to have mapped it. This allows
the walker code to select the correct page to map.
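With a 3-level p2m starting at level 1, the root is indexed by IPA bits 30 and above; for a 40-bit IPA that gives 1024 root entries spread over two concatenated 512-entry pages. Picking which root page to map and the entry within it can be sketched as follows (hypothetical names, assuming ipa < 2^40).

```c
#include <stdint.h>

/* Hypothetical result: which concatenated root page, and which entry. */
struct root_slot { unsigned int page, entry; };

/* Assumes a 40-bit IPA, 3 levels, root at level 1, 512 entries per page. */
static struct root_slot root_slot_for(uint64_t ipa)
{
    uint64_t index = ipa >> 30;            /* index across all root pages */
    struct root_slot s = { (unsigned int)(index / 512),
                           (unsigned int)(index % 512) };
    return s;
}
```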

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Implement variable levels in dump_pt_walk
Ian Campbell [Thu, 18 Sep 2014 00:09:48 +0000 (01:09 +0100)]
xen: arm: Implement variable levels in dump_pt_walk

This allows us to correctly dump 64-bit hypervisor addresses, which use a 4
level table.

It also paves the way for boot-time selection of the number of levels to use in
the p2m, which is required to support both 40-bit and 48-bit systems.

To support multiple levels it is convenient to recast the page table walk as a
loop over the levels instead of the current open coding.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: rename p2m->first_level to p2m->root.
Ian Campbell [Thu, 18 Sep 2014 00:09:47 +0000 (01:09 +0100)]
xen: arm: rename p2m->first_level to p2m->root.

This was previously part of Vijaya's "xen/arm: Add 4-level page table
for stage 2 translation" but is split out here to make that patch
easier to read.

I went with ->root rather than ->root_level, which the original used.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Cc: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
10 years agotools: libxl: read nictype from xenstore
Wen Congyang [Mon, 22 Sep 2014 05:59:16 +0000 (13:59 +0800)]
tools: libxl: read nictype from xenstore

We need to use nictype to get the default vifname.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: libxl: pass correct file to qemu if we use blktap2
Wen Congyang [Mon, 22 Sep 2014 05:59:15 +0000 (13:59 +0800)]
tools: libxl: pass correct file to qemu if we use blktap2

If we use blktap2, the correct file is the blktap device,
not the pdev_path.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: libxc: restore: csum the correct page
Wen Congyang [Mon, 22 Sep 2014 05:59:14 +0000 (13:59 +0800)]
tools: libxc: restore: csum the correct page

In verify mode, we map the guest memory, and the guest page is at
region_base + i * PAGE_SIZE. So we should csum page (region_base
+ i * PAGE_SIZE), not (region_base + (i+curbatch) * PAGE_SIZE).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: libxc: restore: copy the correct page to memory
Hong Tao [Mon, 22 Sep 2014 05:59:13 +0000 (13:59 +0800)]
tools: libxc: restore: copy the correct page to memory

apply_batch() only handles MAX_BATCH_SIZE pages at a time. If
there are bogus/unmapped/allocate-only/broken pages, we skip
them, so when we call apply_batch() again, the first page's
index is curbatch - invalid_pages, where invalid_pages stores the
number of bogus/unmapped/allocate-only/broken pages found so far.

In many cases invalid_pages is 0, so this error went unnoticed.

Signed-off-by: Hong Tao <bobby.hong@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoUpdate libfdt to v1.4.0
Roy Franz [Thu, 18 Sep 2014 22:50:05 +0000 (15:50 -0700)]
Update libfdt to v1.4.0

Update libfdt to v1.4.0, taken from git://git.jdl.com/software/dtc.git.
Xen changes to libfdt_env.h are carried over from the existing libfdt (v1.3.0).
This update provides the fdt_create_empty_tree() function used by the ARM
EFI boot code.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoadd arm64 cache flushing code from linux v3.16
Roy Franz [Thu, 18 Sep 2014 22:50:04 +0000 (15:50 -0700)]
add arm64 cache flushing code from linux v3.16

__flush_dcache_all added from arch/arm64/mm/cache.S, with helper macros from
arch/arm64/include/asm/assembler.h, from v3.16.  The cache flushing is required
when transitioning from EFI code that runs with the cache enabled to Xen startup
code which expects the cache to be disabled.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed indent on ENTRY() and dropped the entry point label which
         duplicates the one from the macro. ]

10 years agoVT-d: suppress UR signaling for further desktop chipsets
Jan Beulich [Thu, 18 Sep 2014 13:03:22 +0000 (15:03 +0200)]
VT-d: suppress UR signaling for further desktop chipsets

This extends commit d6cb14b34f ("VT-d: suppress UR signaling for
desktop chipsets") as per the finally obtained list of affected
chipsets from Intel.

Also pad the IDs we had listed there before to full 4 hex digits.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years agox86: handle resumed instruction based on previous mem_event reply
Razvan Cojocaru [Thu, 18 Sep 2014 12:57:45 +0000 (14:57 +0200)]
x86: handle resumed instruction based on previous mem_event reply

In a scenario where a page fault that triggered a mem_event occurred,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it, but not allow it to perform
any writes.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>