xenbits.xensource.com Git - xen.git/log
10 years agox86/emulate: support for emulating software event injection
Andrew Cooper [Mon, 29 Sep 2014 08:23:01 +0000 (10:23 +0200)]
x86/emulate: support for emulating software event injection

AMD SVM requires all software events to have their injection emulated if
hardware lacks NextRIP support.  In addition, `icebp` (opcode 0xf1) injection
requires emulation in all cases, even with hardware NextRIP support.

Emulating full control transfers is overkill for our needs.  All that matters
is that guest userspace can't bypass the descriptor DPL check.  Any guest OS
which would incur other faults as part of injection is going to end up with a
double fault instead, and won't be in a position to care that the faulting eip
is wrong.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
10 years agox86/hvm: don't discard the SW/HW event distinction from the emulator
Andrew Cooper [Mon, 29 Sep 2014 08:22:23 +0000 (10:22 +0200)]
x86/hvm: don't discard the SW/HW event distinction from the emulator

Injecting emulator software events as hardware exceptions results in a bypass
of DPL checks.  As the emulator doesn't perform DPL checks itself, guest
userspace is capable of bypassing DPL checks and injecting arbitrary events.

Propagating software event information from the emulator allows VMX to now
properly inject software events, including DPL and presence checks, as well
as correct fault/trap frames.
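
The distinction that must not be lost can be sketched as follows. This is a minimal illustration, not Xen's code: `enum event_kind_sketch` and `event_delivery_permitted()` are hypothetical names. It only encodes the architectural rule that a software interrupt (such as `int $n`) is delivered through a gate only if CPL <= gate DPL, while hardware exceptions skip that comparison.

```c
#include <stdbool.h>

/* Hypothetical classification of an event about to be injected. */
enum event_kind_sketch {
    EVENT_HW_EXCEPTION,  /* raised by the CPU itself, e.g. #GP or #PF */
    EVENT_SW_INTERRUPT   /* raised by an instruction, e.g. int $n */
};

/*
 * Return true if delivery through a gate with privilege level gate_dpl is
 * allowed while running at privilege level cpl (0 is most privileged).
 */
bool event_delivery_permitted(enum event_kind_sketch kind,
                              unsigned int cpl, unsigned int gate_dpl)
{
    if (kind == EVENT_HW_EXCEPTION)
        return true;         /* no DPL comparison for hardware events */

    return cpl <= gate_dpl;  /* software events: CPL must not exceed DPL */
}
```

If a software interrupt were reclassified as a hardware exception, the first branch would be taken and the comparison skipped, which is the bypass described above.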

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrei LUTAS <vlutas@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
10 years agox86/emulate: provide further information about software events
Andrew Cooper [Mon, 29 Sep 2014 08:20:47 +0000 (10:20 +0200)]
x86/emulate: provide further information about software events

This is needed by subsequent patches to support correctly injecting software
events for HVM guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/hvm: remove stray lock release from hvm_ioreq_server_init()
Vitaly Kuznetsov [Fri, 26 Sep 2014 15:20:01 +0000 (17:20 +0200)]
x86/hvm: remove stray lock release from hvm_ioreq_server_init()

If the HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or HVM_PARAM_BUFIOREQ_EVTCHN
parameters are read while the guest domain is dying, the following ASSERT
triggers:

(XEN) Assertion '_raw_spin_is_locked(lock)' failed at ...workspace/KERNEL/xen/xen/include/asm/spinlock.h:18
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d08012b07f>] _spin_unlock+0x27/0x30
(XEN)    [<ffff82d0801b6103>] hvm_create_ioreq_server+0x3df/0x49a
(XEN)    [<ffff82d0801bcceb>] do_hvm_op+0x12bf/0x27a0
(XEN)    [<ffff82d08022b9bb>] syscall_enter+0xeb/0x145

The root cause of this issue is that ioreq_server.lock is released twice:
first in hvm_ioreq_server_init() and then in hvm_create_ioreq_server().
Drop the lock release from hvm_ioreq_server_init(), as the lock is not taken
there, and do minor label cleanup.
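
The ownership rule behind the fix can be sketched with a toy lock. This is not the Xen code: `struct toy_lock`, `server_init_sketch()` and `create_server_sketch()` are hypothetical stand-ins, and the lock only tracks whether it is held. The point is that the function which acquires the lock is the only one that releases it, including on the callee's error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held (not a real spinlock). */
struct toy_lock {
    bool held;
};

void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);  /* a second release would trip this, like the ASSERT */
    l->held = false;
}

/* Callee: it never took the lock, so its error path must not release it. */
int server_init_sketch(bool fail)
{
    if (fail)
        return -1;    /* plain error return, no unlock here */
    return 0;
}

/* Caller: acquires the lock and releases it exactly once on every path. */
int create_server_sketch(struct toy_lock *l, bool fail)
{
    int rc;

    toy_lock_acquire(l);
    rc = server_init_sketch(fail);
    toy_lock_release(l);

    return rc;
}
```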

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agomem_access: abstract architecture specific sanity check
Tamas K Lengyel [Fri, 26 Sep 2014 14:31:15 +0000 (16:31 +0200)]
mem_access: abstract architecture specific sanity check

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agomem_event: abstract architecture specific sanity checks
Tamas K Lengyel [Fri, 26 Sep 2014 14:30:37 +0000 (16:30 +0200)]
mem_event: abstract architecture specific sanity checks

Move architecture specific sanity checks into its own function
which is called when enabling mem_event.

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agomem_event: relax error condition on debug builds
Tamas K Lengyel [Fri, 26 Sep 2014 14:29:34 +0000 (16:29 +0200)]
mem_event: relax error condition on debug builds

A faulty tool stack can brick a debug hypervisor. Unpleasant while dev/test.

Suggested-by: Andres Lagar Cavilla <andres@lagarcavilla.org>
Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agomem_event: clean out superfluous white-spaces
Tamas K Lengyel [Fri, 26 Sep 2014 14:28:57 +0000 (16:28 +0200)]
mem_event: clean out superfluous white-spaces

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agorelocate mem_event_op domctl and access_op memop into common
Tamas K Lengyel [Fri, 26 Sep 2014 14:27:57 +0000 (16:27 +0200)]
relocate mem_event_op domctl and access_op memop into common

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agorelocate set_access_required domctl into common
Tamas K Lengyel [Fri, 26 Sep 2014 14:26:58 +0000 (16:26 +0200)]
relocate set_access_required domctl into common

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agorelocate p2m_mem_access_resume to mem_access common
Tamas K Lengyel [Fri, 26 Sep 2014 14:25:31 +0000 (16:25 +0200)]
relocate p2m_mem_access_resume to mem_access common

Relocate p2m_mem_access_resume to common and abstract the new
p2m_mem_event_emulate_check into the p2m layer too.

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agorelocate p2m_access_t into common and swap the order
Tamas K Lengyel [Fri, 26 Sep 2014 14:24:02 +0000 (16:24 +0200)]
relocate p2m_access_t into common and swap the order

We swap the order of the enum of types n ... rwx, so as to have rwx at 0, which
is the default setting when mem_access is not in use. This has performance
benefits for non-memaccess paths, as the check for whether memaccess is in use
is now a comparison against 0, which is often faster.

We fix one location in nested_hap where the order of the enum made a difference.
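
The effect of the reordering can be sketched as below. The enum is illustrative only and is not a copy of Xen's p2m_access_t; all names ending in `_sketch` are hypothetical. With the unrestricted value at 0, a zero-initialised entry already carries the default, and the "is mem_access in use" test becomes a comparison against zero.

```c
#include <stdbool.h>

/* Illustrative ordering: the unrestricted default comes first. */
typedef enum {
    ACCESS_RWX_SKETCH = 0,  /* default when mem_access is not in use */
    ACCESS_RW_SKETCH,
    ACCESS_RX_SKETCH,
    ACCESS_R_SKETCH,
    ACCESS_N_SKETCH         /* no access permitted */
} access_sketch_t;

struct entry_sketch {
    unsigned long frame;
    access_sketch_t access;
};

/* True if the entry carries any restriction; compiles to a test against 0. */
bool access_is_restricted(const struct entry_sketch *e)
{
    return e->access != ACCESS_RWX_SKETCH;
}
```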

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Fri, 26 Sep 2014 14:23:04 +0000 (16:23 +0200)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

10 years agoMAINTAINERS: update maintained files of Remus
Yang Hongyang [Mon, 7 Jul 2014 02:10:20 +0000 (10:10 +0800)]
MAINTAINERS: update maintained files of Remus

Add Remus specific hotplug scripts and libxl files
to the list of maintained files.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
Yang Hongyang [Wed, 16 Jul 2014 09:07:43 +0000 (17:07 +0800)]
libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl

Add LIBXL_HAVE_REMUS to indicate Remus support in libxl

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agoxl/remus: add a cmdline switch to disable disk replication
Yang Hongyang [Wed, 16 Jul 2014 09:27:43 +0000 (17:27 +0800)]
xl/remus: add a cmdline switch to disable disk replication

Disk replication is enabled by default. This patch adds a cmdline
switch to 'xl remus' command to explicitly disable disk replication.
A new boolean field 'diskbuf' is added to the libxl_domain_remus_info
structure to represent this configuration option inside libxl.

Note: Disabling disk replication requires enabling unsafe mode.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agoxl/remus: cmdline switches and config vars to control network buffering
Yang Hongyang [Wed, 11 Jun 2014 03:29:44 +0000 (11:29 +0800)]
xl/remus: cmdline switches and config vars to control network buffering

Add two members in libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to setup
                  and tear down the guest's interface.

Add cmdline switches to 'xl remus' command to enable or disable
network buffering and a domain-specific hotplug script to setup
network buffering.

Add a new config var 'remus.default.netbufscript' to xl.conf, that
allows the user to override the default global script used to
setup network buffering.

Note: Network buffering is enabled by default. Disabling network
buffering requires enabling unsafe mode.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agoxl/remus: cmdline switch to explicitly enable unsafe configurations
Yang Hongyang [Thu, 24 Jul 2014 08:47:24 +0000 (16:47 +0800)]
xl/remus: cmdline switch to explicitly enable unsafe configurations

By default, network buffering and disk replication are enabled;
checkpoints are replicated to another standby VM.

This patch allows the user to disable any of these features by
explicitly specifying a 'run in unsafe mode' switch when invoking
the 'xl remus' command.  While running Remus in an unsafe mode
makes little sense under normal circumstances, it is useful to be
able to disable one or more features mentioned above for
testing/debugging/profiling purposes.

Unless this option is enabled, it will not be possible to
replicate memory checkpoints to /dev/null (blackhole replication),
disable network buffering or disk replication.

As a starter, the use of blackhole replication now requires that
the unsafe mode be enabled. Subsequent patches will add support
for disabling network buffering and disk replication in a similar
manner.
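
The gating described above can be sketched as a single validation step. This is an assumption-laden illustration, not xl's option parser: `struct remus_opts_sketch` and `remus_opts_check()` are hypothetical names, and the sketch covers all three features even though this patch only gates blackhole replication.

```c
#include <stdbool.h>

/* Hypothetical view of the switches given to 'xl remus'. */
struct remus_opts_sketch {
    bool unsafe;     /* 'run in unsafe mode' switch was given */
    bool blackhole;  /* replicate checkpoints to /dev/null */
    bool netbuf;     /* network buffering enabled (default: true) */
    bool diskbuf;    /* disk replication enabled (default: true) */
};

/* Return 0 if the combination is acceptable, -1 if unsafe mode is needed. */
int remus_opts_check(const struct remus_opts_sketch *o)
{
    bool reduced = o->blackhole || !o->netbuf || !o->diskbuf;

    return (reduced && !o->unsafe) ? -1 : 0;
}
```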

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agoxl/remus: change bool to defbool
Yang Hongyang [Fri, 29 Aug 2014 02:16:36 +0000 (10:16 +0800)]
xl/remus: change bool to defbool

Use defbool instead of bool for boolean flags in remus_info struct.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl/remus: setup and control disk replication for DRBD backends
Yang Hongyang [Fri, 18 Jul 2014 09:14:22 +0000 (17:14 +0800)]
libxl/remus: setup and control disk replication for DRBD backends

This patch adds the machinery required for protecting a guest's
disk state, when the guest disk uses a DRBD disk backend.
This patch consists of two parts:

1. Hotplug scripts: The block-drbd-probe script is responsible for
  performing sanity checks on the state of the DRBD disk before the
  checkpointing process begins. This script should be invoked by
  libxl for each of the guest's disk devices, when starting Remus.

2. Remus drbd disk device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) setup() is called for each disk attached to the guest.
      During setup():
      i) The hotplug script is called to perform the sanity check.

      ii) Libxl obtains a handle to the DRBD device (/dev/drbd*) and
          subsequently controls disk checkpoint replication using
          this handle in the checkpoint callbacks.

   b) The preresume() checkpoint callback is executed asynchronously
      using libxl__ev_child_fork(), as it may potentially block for more
      than a few seconds in case of backup failure.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl/remus: setup and control network output buffering
Yang Hongyang [Fri, 18 Jul 2014 07:08:36 +0000 (15:08 +0800)]
libxl/remus: setup and control network output buffering

This patch adds the machinery required for protecting a guest's
network device state. This patch consists of two parts:

1. Hotplug scripts: The remus-netbuf-setup script is responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering.  This script should be invoked by libxl for
  each of the guest's network interfaces, when starting or stopping Remus.

  Apart from returning success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore, the name of
  the REMUS_IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

2. Remus network device: Implements the interfaces required by the
   remus abstract device layer. A note about the implementation:

   a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
      invocation. They establish and free netlink related state respectively.

   b) setup() and teardown() are called for each vif attached to the
      guest.
      During setup():
      i) The hotplug script is called to set up a network buffer on a
         given vif. The script chooses an available IFB device from
         the system, redirects vif egress traffic to the IFB device
         and sets up the plug qdisc (output buffer) on the IFB device.
         The name of the IFB device is communicated via xenstore to
         libxl.

      ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
          and subsequently controls output buffering using this handle
          in the checkpoint callbacks.

      During teardown(), the hotplug scripts are called again to remove
      the vif->ifb traffic redirection, release the ifb and the plug
      qdisc associated with it.

   c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
      are implemented as synchronous ops as the netlink calls associated
      with the qdisc subsystem are very fast.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl/remus: introduce an abstract Remus device layer
Yang Hongyang [Fri, 18 Jul 2014 07:02:34 +0000 (15:02 +0800)]
libxl/remus: introduce an abstract Remus device layer

Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.

The following APIs are exposed to libxl:

One-time configuration operations:
  *libxl__remus_devices_setup
    > Enable output buffering for NICs, setup disk replication, etc.
  *libxl__remus_devices_teardown
    > Disable network output buffering and disk replication;
      teardown any associated external setups like qdiscs for NICs.

Operations executed every checkpoint (in order of invocation):
  *libxl__remus_devices_postsuspend
  *libxl__remus_devices_preresume
  *libxl__remus_devices_commit

Each device type needs to implement the interfaces specified in
the libxl__remus_device_instance_ops if it wishes to support Remus.

The high-level control flow through the Remus device layer is shown below:

xl remus
  |->  libxl_domain_remus_start
    |-> libxl__remus_devices_setup
      |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
        ...
        |-> On backup failure/network error/other errors
            libxl__remus_devices_teardown
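
The per-checkpoint part of this flow can be sketched with a table of function pointers. This is a simplified, synchronous illustration under stated assumptions: the real libxl__remus_device_instance_ops layout and its asynchronous callbacks are not reproduced here, and every name ending in `_sketch`, plus `counting_cb`, `counting_ops` and `checkpoint_all`, is hypothetical.

```c
#include <stddef.h>

struct dev_sketch;

/* Hypothetical per-device-type interface (not the real libxl struct). */
struct dev_ops_sketch {
    int (*postsuspend)(struct dev_sketch *dev);
    int (*preresume)(struct dev_sketch *dev);
    int (*commit)(struct dev_sketch *dev);
};

struct dev_sketch {
    const struct dev_ops_sketch *ops;
    int calls;  /* callbacks run so far, for illustration only */
};

/* Trivial device type whose callbacks only count invocations. */
int counting_cb(struct dev_sketch *dev)
{
    dev->calls++;
    return 0;
}

const struct dev_ops_sketch counting_ops = {
    .postsuspend = counting_cb,
    .preresume = counting_cb,
    .commit = counting_cb,
};

/* Run one checkpoint: each phase visits every device, in phase order. */
int checkpoint_all(struct dev_sketch *devs, size_t n)
{
    size_t i;
    int rc;

    for (i = 0; i < n; i++)
        if ((rc = devs[i].ops->postsuspend(&devs[i])) != 0)
            return rc;
    for (i = 0; i < n; i++)
        if ((rc = devs[i].ops->preresume(&devs[i])) != 0)
            return rc;
    for (i = 0; i < n; i++)
        if ((rc = devs[i].ops->commit(&devs[i])) != 0)
            return rc;

    return 0;
}
```

A non-zero return from any phase corresponds to the error branch in the diagram, after which the caller would run teardown.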

callback processing
* Only call the per-device libxl__multidev_one_callback
  when the iteration has succeeded or failed.
* The final callback (called by multidev) is a trivial
  shim to shuffle the pointers and notify our own caller.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agoautoconf: add libnl3 dependency for Remus network buffering support
Yang Hongyang [Fri, 27 Jun 2014 01:43:51 +0000 (09:43 +0800)]
autoconf: add libnl3 dependency for Remus network buffering support

Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
It also provides the ability to configure tools without libnl3 support
i.e., without network buffering support.

When there is no network buffering support, libxl__netbuffer_enabled()
returns 0; otherwise it returns 1. The callers of this API will be
introduced in the rest of the series.
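
A build-time switch of this kind is commonly written as below. This is a sketch only: the macro `SKETCH_HAVE_LIBNL3` and both function names are hypothetical, and the macro actually produced by the configure script may differ.

```c
/* Hypothetical macro that a configure script would define when libnl3 is found. */
#ifdef SKETCH_HAVE_LIBNL3
int netbuffer_enabled_sketch(void)
{
    return 1;  /* built with network buffering support */
}
#else
int netbuffer_enabled_sketch(void)
{
    return 0;  /* built without libnl3: no network buffering */
}
#endif

/* A caller can refuse a request for buffering that the build cannot honour. */
int can_request_netbuf(int want_netbuf)
{
    return !want_netbuf || netbuffer_enabled_sketch();
}
```

Keeping the function present in both builds means callers need no conditional compilation of their own.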

NOTE: This patch changes tools/configure.ac, please rerun
      autogen.sh while applying the patch.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl: Extend libxl__ao_device with a libxl__ev_child member
Yang Hongyang [Fri, 18 Jul 2014 08:40:54 +0000 (16:40 +0800)]
libxl: Extend libxl__ao_device with a libxl__ev_child member

This can be used to fork children to allow the asynchronous execution
of system calls which only come in a synchronous variant. This will
be useful for Remus, in the following patches.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl: introduce libxl__multidev_prepare_with_aodev
Yang Hongyang [Fri, 18 Jul 2014 08:26:50 +0000 (16:26 +0800)]
libxl: introduce libxl__multidev_prepare_with_aodev

libxl__multidev_prepare_with_aodev is similar to libxl__multidev_prepare,
but takes a libxl__ao_device as an extra argument.
libxl__multidev_prepare is now a wrapper around
libxl__multidev_prepare_with_aodev.

This new internal API will be used by the Remus device abstract layer
for handling various Remus devices.
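
The wrapper relationship can be sketched as follows. The types are heavily simplified and hypothetical (`struct aodev_sketch`, `struct multidev_sketch`), and the real libxl functions differ in signature and in how memory is managed. The sketch only shows one function accepting a caller-supplied aodev and the other allocating one before deferring to it.

```c
#include <stdlib.h>

#define SKETCH_MAX_DEVS 8

struct aodev_sketch {
    int id;
};

struct multidev_sketch {
    struct aodev_sketch *devs[SKETCH_MAX_DEVS];
    int count;
};

/* Register a caller-supplied aodev; returns -1 if the table is full. */
int multidev_prepare_with_aodev_sketch(struct multidev_sketch *md,
                                       struct aodev_sketch *aodev)
{
    if (md->count >= SKETCH_MAX_DEVS)
        return -1;
    md->devs[md->count++] = aodev;
    return 0;
}

/* Wrapper: allocate a fresh aodev, then defer to the function above. */
struct aodev_sketch *multidev_prepare_sketch(struct multidev_sketch *md)
{
    struct aodev_sketch *aodev = calloc(1, sizeof(*aodev));

    if (!aodev)
        return NULL;
    if (multidev_prepare_with_aodev_sketch(md, aodev) != 0) {
        free(aodev);
        return NULL;
    }
    return aodev;
}
```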

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl: multidev: Expose libxl__multidev_one_callback
Ian Jackson [Thu, 25 Sep 2014 15:04:01 +0000 (16:04 +0100)]
libxl: multidev: Expose libxl__multidev_one_callback

Now a caller who wants to be able to do other work when the aodev
completes can put their own callback into the aodev, and make the
multidev machinery aware that the particular aodev is complete (from
the point of view that multidev should have) whenever it likes.

No functional change in this patch.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agolibxl: multidev: Clarify comments about which callbacks are meant
Ian Jackson [Thu, 25 Sep 2014 14:59:06 +0000 (15:59 +0100)]
libxl: multidev: Clarify comments about which callbacks are meant

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years agorelocate struct npfec definition into common
Tamas K Lengyel [Fri, 26 Sep 2014 13:51:57 +0000 (15:51 +0200)]
relocate struct npfec definition into common

Nested page fault exception code definitions can be reused on ARM as well.

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agorelocate mem_access and mem_event into common
Tamas K Lengyel [Fri, 26 Sep 2014 13:49:26 +0000 (15:49 +0200)]
relocate mem_access and mem_event into common

In preparation to add support for ARM LPAE mem_event, relocate mem_access,
mem_event and auxiliary functions into common Xen code.
This patch makes no functional changes to the X86 side, for ARM mem_event
and mem_access functions are just defined as placeholder stubs, and are
actually enabled later in the series.

Edits that are only header path adjustments:
   xen/arch/x86/domctl.c
   xen/arch/x86/mm/hap/nested_ept.c
   xen/arch/x86/mm/hap/nested_hap.c
   xen/arch/x86/mm/mem_paging.c
   xen/arch/x86/mm/mem_sharing.c
   xen/arch/x86/mm/p2m-pod.c
   xen/arch/x86/mm/p2m-pt.c
   xen/arch/x86/mm/p2m.c
   xen/arch/x86/x86_64/compat/mm.c
   xen/arch/x86/x86_64/mm.c

Makefile adjustments for new/removed code:
   xen/common/Makefile
   xen/arch/x86/mm/Makefile

Relocated prepare_ring_for_helper and destroy_ring_for_helper functions:
   xen/include/xen/mm.h
   xen/common/memory.c
   xen/include/asm-x86/hvm/hvm.h
   xen/arch/x86/hvm/hvm.c

Code movement of mem_event and mem_access:
    xen/arch/x86/mm/mem_access.c -> xen/common/mem_access.c
    xen/arch/x86/mm/mem_event.c -> xen/common/mem_event.c
    xen/include/asm-x86/mem_access.h -> xen/include/xen/mem_access.h
    xen/include/asm-x86/mem_event.h -> xen/include/xen/mem_event.h

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agoEFI: add arch specific function to control use of config file
Roy Franz [Fri, 26 Sep 2014 10:00:55 +0000 (12:00 +0200)]
EFI: add arch specific function to control use of config file

The x86 EFI build of Xen always uses a configuration file to load modules, but
the ARM version can either use a config file to specify the modules, or be
loaded by GRUB in which case GRUB loads the modules and adds them to the DTB
that is passed to Xen.  Add efi_arch_use_config_file() to indicate if a
configuration file is required.  For x86, this will always be true.  ARM will
examine the DTB passed via EFI configuration table (if any), and if it contains
module information will use that and not use the configuration file at all.
Add Emacs footer to efi-boot.h and boot.c.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: add several misc. arch functions for boot code
Roy Franz [Fri, 26 Sep 2014 10:00:27 +0000 (12:00 +0200)]
EFI: add several misc. arch functions for boot code

Add efi_arch_blexit() for arch specific cleanup on error exit,
efi_arch_load_addr_check() to do the arch specific verifications
of where the UEFI firmware loaded Xen, and efi_arch_cpu() for
probing CPU features.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: add arch specific module handling to read_file()
Roy Franz [Fri, 26 Sep 2014 09:59:56 +0000 (11:59 +0200)]
EFI: add arch specific module handling to read_file()

Each architecture tracks modules differently internally, so add
efi_arch_handle_module() routine to enable the common code to invoke the proper
handling of modules as they are loaded.  Module handling for ucode, ramdisk, and
xsm is changed to not process remainder of string after filename as options,
since these modules don't take options.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agouse relative path for true(1)
Roger Pau Monné [Fri, 26 Sep 2014 09:59:18 +0000 (11:59 +0200)]
use relative path for true(1)

On FreeBSD true(1) is at /usr/bin/true.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
10 years agoVT-d: don't needlessly suppress page table sharing
Jan Beulich [Fri, 26 Sep 2014 09:56:45 +0000 (11:56 +0200)]
VT-d: don't needlessly suppress page table sharing

Despite the mid term goal being to do away with the sharing there's no
point in suppressing it in cases where it can be used now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Thu, 25 Sep 2014 12:43:41 +0000 (13:43 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agoEFI: arch specific memory setup
Roy Franz [Thu, 25 Sep 2014 12:30:16 +0000 (14:30 +0200)]
EFI: arch specific memory setup

This patch adds efi_arch_memory() to give each architecture a hook
for memory setup.  x86 uses this for trampoline memory setup
and some pagetable setup.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: create arch functions for console and video init
Roy Franz [Thu, 25 Sep 2014 12:29:51 +0000 (14:29 +0200)]
EFI: create arch functions for console and video init

Add arch functions for text console and graphics initialization, and move VGA
specific code to x86 architecture file.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: move x86 specific disk probing code
Roy Franz [Thu, 25 Sep 2014 12:29:29 +0000 (14:29 +0200)]
EFI: move x86 specific disk probing code

Move x86 specific disk (EDD) probing to arch specific file.  This code is x86
only and relates to legacy BIOS handling of disk drives.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: add efi_arch_handle_cmdline() for processing commandline
Roy Franz [Thu, 25 Sep 2014 12:28:48 +0000 (14:28 +0200)]
EFI: add efi_arch_handle_cmdline() for processing commandline

Add arch function for processing the Xen commandline and
updating internal structures.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: add efi_arch_cfg_file_early/late() to handle arch specific cfg file fields
Roy Franz [Thu, 25 Sep 2014 12:28:27 +0000 (14:28 +0200)]
EFI: add efi_arch_cfg_file_early/late() to handle arch specific cfg file fields

Different architectures have some different configuration file
fields that need to be handled.  In particular, x86 has ucode
and ARM has device tree files to be loaded.  These arch specific
functions are used to allow each architecture to implement these
features in arch specific code.  Early/late versions are provided,
as ARM needs to process the DTB entry first, and x86 wants to process
the ucode entry last as it is the smallest allocation.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: add architecture functions for pre/post ExitBootServices
Roy Franz [Thu, 25 Sep 2014 12:27:55 +0000 (14:27 +0200)]
EFI: add architecture functions for pre/post ExitBootServices

The UEFI ExitBootServices function is invoked to transition the
system to the 'runtime' mode of operation, and is done right before
transitioning from the EFI loader code into Xen proper. x86 does some
arch specific memory management (trampoline) before exit boot services,
and the code that transitions from the EFI application state to Xen
is architecture specific.  This patch adds two functions, one pre
and one post ExitBootServices, to allow each architecture to
handle these cases in a customized manner.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: create arch functions to allocate memory for and process memory map
Roy Franz [Thu, 25 Sep 2014 12:26:34 +0000 (14:26 +0200)]
EFI: create arch functions to allocate memory for and process memory map

The memory used to store the EFI memory map is allocated in an architecture
specific way, and the processing of the memory map itself uses x86 specific
data structures. This patch adds architecture specific functions so each
architecture can provide its own implementation.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: move x86 specific functions/variables to arch header
Roy Franz [Thu, 25 Sep 2014 12:24:52 +0000 (14:24 +0200)]
EFI: move x86 specific functions/variables to arch header

Move the global variables and functions that can be moved as-is
from the common boot.c file to the x86 implementation header file.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoEFI: move x86 boot/runtime code to common/efi
Roy Franz [Thu, 25 Sep 2014 12:22:12 +0000 (14:22 +0200)]
EFI: move x86 boot/runtime code to common/efi

This moves the EFI boot and runtime services code to the common/efi directory.
This code is symbolically linked back into the arch/x86/efi directory where it is
built if a build-time check for PE/COFF support in the toolchain passes.  In
the PE/COFF supporting case, both the EFI executable and the normal Xen image
(with stubbed EFI functions) are built.  We can't use the normal common build
infrastructure since we are building two versions at the same time, with
different EFI related code in each.  No code changes, just file movement and
make updates.  The files are symbolically linked at build time back to the
original arch/x86/efi directory.  This is in preparation for adding ARM EFI
support where many of these files can be shared.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/vlapic: don't silently accept bad vectors
Jan Beulich [Thu, 25 Sep 2014 12:10:01 +0000 (14:10 +0200)]
x86/vlapic: don't silently accept bad vectors

Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.
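
The behaviour being emulated can be sketched as below. This is not the vlapic code: the function names are hypothetical, and the sketch covers only the send side. The ESR bit positions (bit 5 for "send illegal vector", bit 6 for "receive illegal vector") follow the local APIC Error Status Register layout in the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_SEND_ILLEGAL_VECTOR_SKETCH (1u << 5)
#define ESR_RECV_ILLEGAL_VECTOR_SKETCH (1u << 6)

/* Vectors 0-15 are reserved and must not be sent or received. */
bool lapic_vector_valid_sketch(uint8_t vector)
{
    return vector >= 16;
}

/*
 * Return true if the IPI would be sent. On a reserved vector, record a
 * send error in *esr and refuse, instead of silently accepting it.
 */
bool send_ipi_sketch(uint8_t vector, uint32_t *esr)
{
    if (!lapic_vector_valid_sketch(vector)) {
        *esr |= ESR_SEND_ILLEGAL_VECTOR_SKETCH;
        return false;
    }
    return true;
}
```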

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agox86/vlapic: a few type adjustments
Jan Beulich [Thu, 25 Sep 2014 12:09:10 +0000 (14:09 +0200)]
x86/vlapic: a few type adjustments

Constify a couple of pointer parameters, convert a boolean function
return type to bool_t, and clean up a printk() being touched anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agox86/HVM: fix ID handling of x2APIC emulation
Jan Beulich [Thu, 25 Sep 2014 12:08:20 +0000 (14:08 +0200)]
x86/HVM: fix ID handling of x2APIC emulation

- properly change ID when switching into x2APIC mode (instead of
  mimicking necessary behavior in hvm_x2apic_msr_read())
- correctly (meaningfully) set LDR (so far it ended up being 1 on all
  vCPU-s)
- even if we don't support more than 128 vCPU-s in a HVM guest for now,
  we should properly handle IDs as 32-bit values (i.e. not ignore the
  top 24 bits)
- with that, properly do cluster ID and bit mask check in
  vlapic_match_logical_addr()
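
For reference, the architectural relationship between the 32-bit x2APIC ID
and the logical destination register, and the cluster/bit-mask match this
change refers to, can be sketched as below.  This is an illustrative
Python model with hypothetical helper names, not the code in
vlapic_match_logical_addr().

```python
def x2apic_ldr(apic_id):
    """Derive the x2APIC LDR from the 32-bit x2APIC ID.

    Bits 31:16 hold the cluster ID (ID bits 19:4); bits 15:0 hold a
    one-hot mask selected by ID bits 3:0.
    """
    cluster = apic_id >> 4
    return (cluster << 16) | (1 << (apic_id & 0xF))


def match_logical(ldr, mda):
    """True if message destination address 'mda' targets this LDR.

    The cluster IDs must be equal and the bit masks must intersect.
    """
    same_cluster = (mda >> 16) == (ldr >> 16)
    return same_cluster and (mda & ldr & 0xFFFF) != 0
```

With this derivation each ID yields a distinct LDR, rather than the value 1
on every vCPU.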

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agox86/HVM: fix miscellaneous aspects of x2APIC emulation
Jan Beulich [Thu, 25 Sep 2014 12:07:27 +0000 (14:07 +0200)]
x86/HVM: fix miscellaneous aspects of x2APIC emulation

- generate #GP on invalid APIC base MSR transitions
- fail reads from the EOI and self-IPI registers (which are write-only)
- handle self-IPI writes and the ICR2 half of ICR writes largely in
  hvm_x2apic_msr_write() and (for self-IPI only) vlapic_apicv_write()
- don't permit MMIO-based access in x2APIC mode
- filter writes to read-only registers in hvm_x2apic_msr_write(),
  allowing conditionals to be dropped from vlapic_reg_write()
- don't ignore upper half of MSR-based write to ESR being non-zero
- don't ignore other writes to reserved bits
- VMX's EXIT_REASON_APIC_WRITE must not result in #GP (this exit being
  trap-like, this exception would get raised on the wrong RIP)
- make hvm_x2apic_msr_read() produce X86EMUL_* return codes just like
  hvm_x2apic_msr_write() does (benign to the only caller)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agolibxl: Fix build dependency for libxl.h.
Anthony PERARD [Wed, 24 Sep 2014 15:30:34 +0000 (16:30 +0100)]
libxl: Fix build dependency for libxl.h.

libxl.h includes _libxl_list.h, but the Makefile does not reflect this
dependency. This can lead to build error due to a missing _libxl_list.h
file.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: arm: correct VTCR setting on arm32.
Ian Campbell [Wed, 24 Sep 2014 14:13:28 +0000 (15:13 +0100)]
xen: arm: correct VTCR setting on arm32.

1c92a2aaf8c6 "xen: arm: support for up to 48-bit IPA addressing on
arm64" inadvertently changes the VTCR setting for 32-bit from
0x80003558 to 0x80003518, changing the SL0 setting from 0x1 (p2m
starts at L1) to 0x0 (p2m starts at L2).
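
The difference between the two values can be checked by extracting the SL0
field, which occupies bits [7:6] of VTCR.  A small Python sketch (helper
name is hypothetical):

```python
VTCR_SL0_SHIFT = 6    # SL0 is VTCR bits [7:6]
VTCR_SL0_MASK = 0x3


def vtcr_sl0(vtcr):
    """Return the SL0 (starting level) field of a VTCR value."""
    return (vtcr >> VTCR_SL0_SHIFT) & VTCR_SL0_MASK


GOOD_VTCR = 0x80003558   # value before the regression
BAD_VTCR = 0x80003518    # value after the regression
```

The two constants differ only in bit 6, which is the low bit of SL0.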

For some (inexplicable) reason this doesn't cause any issue on
Arndale but it does on the OdroidXU.

Reported-by: Suriyan Ramasami <suriyan.r@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Tested-by: Suriyan Ramasami <suriyan.r@gmail.com>
10 years agotools/libxc: Avoid cacheflush toolstack hypercalls on x86
Andrew Cooper [Wed, 24 Sep 2014 16:28:15 +0000 (17:28 +0100)]
tools/libxc: Avoid cacheflush toolstack hypercalls on x86

XEN_DOMCTL_cacheflush hypercalls are (and will always be) -ENOSYS on x86, but
xc_domain_cacheflush() is called often during domain build and migrate for
correct behaviour on ARM.

Stub xc_domain_cacheflush() out on x86 to remove its pressure on the global
domctl lock, and the hypercall overhead (which applies further pressure to the
already heavily-contended TLB flush lock).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86: use constant as multiboot protocol identifier
Daniel Kiper [Thu, 25 Sep 2014 10:08:03 +0000 (12:08 +0200)]
x86: use constant as multiboot protocol identifier

... instead of plain number.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
10 years agox86: define e820 entries counter as unsigned int
Daniel Kiper [Thu, 25 Sep 2014 10:07:22 +0000 (12:07 +0200)]
x86: define e820 entries counter as unsigned int

e820 entries counter is inherently an unsigned quantity
so define it as unsigned int.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/hvm: Forced Emulation Prefix for debug builds of Xen
Andrew Cooper [Thu, 25 Sep 2014 10:06:24 +0000 (12:06 +0200)]
x86/hvm: Forced Emulation Prefix for debug builds of Xen

Analysis of XSAs 105 and 106 shows that it is possible to force a race condition
which causes any arbitrary instruction to be emulated.

To aid testing, explicitly introduce the Forced Emulation Prefix for debug
builds alone.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agomisc/coverity: Model __builtin_unreachable()
Andrew Cooper [Thu, 25 Sep 2014 10:00:07 +0000 (12:00 +0200)]
misc/coverity: Model __builtin_unreachable()

This resolves 23 issues Coverity had identified by following the false path of
an ASSERT().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/LAPIC: drop support for non-integrated APIC
Jan Beulich [Thu, 25 Sep 2014 09:56:22 +0000 (11:56 +0200)]
x86/LAPIC: drop support for non-integrated APIC

We never really supported such, even in the 32-bit days.

As a minor extra thing move the APIC_SELF_IPI definition out of the
middle of Divider Configuration Register ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86: make dump_pageframe_info() slightly more verbose for dying domains
Jan Beulich [Thu, 25 Sep 2014 09:55:49 +0000 (11:55 +0200)]
x86: make dump_pageframe_info() slightly more verbose for dying domains

Allowing more than just 10 pages to be printed in this case gives a
better chance to fully understand eventual page reference leaks: Report
up to 16 "normal" (writable or untyped) pages, and an unlimited number
of special type (page or descriptor table) ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation
Jan Beulich [Thu, 25 Sep 2014 09:53:32 +0000 (11:53 +0200)]
x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation

SYSCALL:
- make sure SS selector has RPL 0
- only use 32 bits of RIP to fill RCX when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless initializers and casts
- drop redundant MSR_STAR read (as suggested by Andrew Cooper)

SYSENTER/SYSEXIT:
- #GP condition doesn't depend on guest mode
- only use 32 bits for setting RIP/RSP when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless (and inconsistently used) casts
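
Two of the points above (RPL 0 for the SS selector, and 32-bit truncation
of the saved RIP) can be sketched as follows.  This is a simplified Python
model with hypothetical function names; it assumes the architectural
SYSCALL rule that CS comes from STAR[47:32] and SS is CS + 8, and it does
not reproduce the emulator's actual code.

```python
def syscall_selectors(star):
    """Derive (CS, SS) selectors from the STAR MSR for SYSCALL."""
    cs = (star >> 32) & 0xFFFC   # STAR[47:32] with the RPL bits cleared
    ss = cs + 8                  # computed from the masked CS, so RPL is 0
    return cs, ss


def saved_return_rip(next_rip, target_is_64bit):
    """Value stored in RCX: full RIP, or only its low 32 bits."""
    return next_rip if target_is_64bit else next_rip & 0xFFFFFFFF
```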

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/p2m: typo fix for spelling ambiguous
Tamas K Lengyel [Wed, 24 Sep 2014 09:19:57 +0000 (11:19 +0200)]
x86/p2m: typo fix for spelling ambiguous

Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de>
Acked-by: Tim Deegan <tim@xen.org>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 24 Sep 2014 09:15:19 +0000 (10:15 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agox86/EFI: fix freeing of uninitialized pointer
Roy Franz [Wed, 24 Sep 2014 09:09:11 +0000 (11:09 +0200)]
x86/EFI: fix freeing of uninitialized pointer

The only valid response from the LocateHandle() call is EFI_BUFFER_TOO_SMALL,
so exit if we get anything else.  We pass a 0 size/NULL pointer buffer, so the
only other return we can get is an error.  Return right away as there is
nothing to do.  Also return if there is an error allocating the buffer, as the
previous code path also allowed for an undefined pointer to be freed.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Re-structure the change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agoflask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)
Wei Liu [Mon, 15 Sep 2014 19:29:15 +0000 (20:29 +0100)]
flask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)

Daniel suggested we use xenpolicy-$(XEN_FULLVERSION) as flask policy
naming convention.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
10 years agoFix QEMU cross-compile build
Stefano Stabellini [Tue, 23 Sep 2014 16:29:29 +0000 (17:29 +0100)]
Fix QEMU cross-compile build

Introduce the per-arch IOEMU_CPU_ARCH variable.
Always pass --configure=IOEMU_CPU_ARCH to QEMU's configure script.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- dropped redundant comments ]

10 years agoMAINTAINERS: Add Wei Liu as toolstack co-maintainer.
Ian Campbell [Mon, 22 Sep 2014 16:10:39 +0000 (17:10 +0100)]
MAINTAINERS: Add Wei Liu as toolstack co-maintainer.

The three existing maintainers are not really able to keep up with
the flow and Wei is one of the top tools contributors (according to
"git shortlog -s -n -p RELEASE-4.4.0..origin/staging tools" and my
own impressions).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agodocs: add PVH specification
Roger Pau Monne [Tue, 23 Sep 2014 16:17:18 +0000 (18:17 +0200)]
docs: add PVH specification

Introduce a document that describes the interfaces used on PVH. This
document has been designed from a guest OS point of view (i.e.: what a guest
needs to do in order to support PVH).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Mukesh Rathor <mukesh.rathor@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
10 years agoxen/arm: remove check for generic timer support for arm64
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:49 +0000 (17:43 +0530)]
xen/arm: remove check for generic timer support for arm64

Information about generic timer support is available in the
IDR_PFR1 register on ARMv7, whereas this information is not
available on ARMv8 implementations that support only aarch64 mode.
As ARMv8 always supports the generic timer, this check is not
required.

For platforms that support only aarch64 mode, IDR_PFR1 is
not implemented.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:48 +0000 (17:43 +0530)]
xen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2

IFSR32_EL1 and FPEXC32_EL1 registers are accessible in
aarch64 mode only if aarch32 mode is supported in EL1.
So allow access to these registers only for 32-bit domains.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Use REG_RANK_INDEX macro
Vijaya Kumar K [Thu, 18 Sep 2014 12:13:46 +0000 (17:43 +0530)]
xen/arm: Use REG_RANK_INDEX macro

Use REG_RANK_INDEX macro to compute index to access
vgic ipriority[] and itargets[] for a given irq.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: Remove a duplicate calculation of be_path
Ian Jackson [Tue, 23 Sep 2014 16:46:21 +0000 (17:46 +0100)]
libxl: Remove a duplicate calculation of be_path

Coverity-ID: 1238177
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agomake: Make "src-tarball" target actually make a source tarball
George Dunlap [Mon, 15 Sep 2014 16:25:04 +0000 (17:25 +0100)]
make: Make "src-tarball" target actually make a source tarball

At the moment, making a release tarball is an annoyingly manual
process that involves running "git archive" into a temporary directory.

Script this process up and make a target, so that the release manager
can simply type "make src-tarball-release" and have everything show up
nice and neat in dist/xen-$version.tar.gz.  "make src-tarball" will
make a version number based on git describe, which will typically have
the most recent tag, number of commits since that tag, and the git
commit id of the current HEAD.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years agomake: Add subtree-force-update target
George Dunlap [Mon, 15 Sep 2014 16:25:03 +0000 (17:25 +0100)]
make: Add subtree-force-update target

subtree-force-update will update all subtrees according to the current TAG specified
in Config.mk.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years agoxen: arm: Add support for the Exynos secure firmware
Suriyan Ramasami [Mon, 22 Sep 2014 18:33:54 +0000 (11:33 -0700)]
xen: arm: Add support for the Exynos secure firmware

The existence of secure firmware is dictated by the presence of
"samsung,secure-firmware" in the DT.

The Arndale board does not have that entry, and uses the address as defined
in "samsung,exynos4210-sysram", offset 0 as the smp init address. This is
possibly true for all SoCs without secure firmware.

For other boards which do have a "secure-firmware" node, use sysram-ns
at offset +0x1c as the smp init address.
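
The selection described in the two paragraphs above amounts to the
following.  This is a Python sketch only; the function and parameter names
are hypothetical, and the base addresses would come from the DT nodes.

```python
SECURE_FW_SMP_OFFSET = 0x1C   # offset into sysram-ns, per the text above


def smp_init_address(has_secure_firmware, sysram_base, sysram_ns_base):
    """Pick the SMP init address depending on the DT contents.

    has_secure_firmware: True if a "samsung,secure-firmware" node exists.
    sysram_base:    base of the "samsung,exynos4210-sysram" region.
    sysram_ns_base: base of the sysram-ns region.
    """
    if has_secure_firmware:
        return sysram_ns_base + SECURE_FW_SMP_OFFSET
    return sysram_base        # offset 0 of sysram
```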

The "secure-firmware" MMIO range contains ways to idle the CPU. As this gets
mapped to DOM0 because of its presence in the DT, we blacklist it.

Have tested this on the Odroid XU. I have also tested the other code path
on the Odroid XU by removing "secure-firmware" from its DT. I could see
that the other code path was exercised with correct smp init address
values.

Signed-off-by: Suriyan Ramasami <suriyan.r@gmail.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/viridian: Add partition time reference counter MSR support
Paul Durrant [Tue, 23 Sep 2014 10:40:10 +0000 (11:40 +0100)]
x86/viridian: Add partition time reference counter MSR support

This patch optionally re-instates support for the partition time reference
counter that was previously introduced by commit
e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b and reverted by commit
1cd4fab14ce25859efa4a2af13475e6650a5506c. The previous implementation was
non-optional and flawed.

This implementation uses the tsc of vcpu0, which is preserved across
save/restore as part of the architectural state, and then converts that
to a 100ns tick using the domain's tsc_khz.
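
The conversion itself is simple arithmetic: tsc_khz is cycles per
millisecond, and one millisecond is 10,000 units of 100ns.  A Python sketch
(function name is hypothetical; Python integers do not overflow, whereas a
C implementation has to take care with the 64-bit multiplication):

```python
TICKS_PER_MS = 10_000   # number of 100ns intervals in one millisecond


def tsc_to_100ns_ticks(tsc, tsc_khz):
    """Convert a TSC cycle count to 100ns ticks, rounding down."""
    return tsc * TICKS_PER_MS // tsc_khz
```

For example, 2,000,000 cycles at 2 GHz (tsc_khz = 2,000,000) is one
millisecond, i.e. 10,000 ticks.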

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Christoph Egger <chegger@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/viridian: Re-purpose the HVM parameter to be a feature mask
Paul Durrant [Tue, 23 Sep 2014 10:40:09 +0000 (11:40 +0100)]
x86/viridian: Re-purpose the HVM parameter to be a feature mask

The following commits introduced the time reference counter MSR and
TSC/APIC frequency MSRs into the viridian feature set respectively:

e36cd2cdc9674a7a4855d21fb7b3e6e17c4bb33b
84657efd9116f40924aa13c9d5a349e007da716f

The time reference counter MSR feature was then reverted by commit

1cd4fab14ce25859efa4a2af13475e6650a5506c

because a flaw in the implementation meant the counter was reset on
migration.

All of these changes were made without any additional options being
added to the VM configuration, or any compatibility checks being made
in the domain save/restore code. Hence setting the single boolean
'viridian' option in the VM configuration yields a different set of
features depending on which version of Xen the VM is started on, and the
feature set can change across migration (so new MSRs can magically appear).

This patch grandfathers in the current viridian features set and calls them
the 'base' and 'freq' feature sets. HVM_PARAM_VIRIDIAN is re-purposed as
a feature mask. The hypervisor has only ever allowed it to be set to 0
or 1, so the presence of the base and freq sets are indicated by setting
bit 0. The freq set can then be turned off by setting bit 1, thus
restoring the pre-Xen-4.4 base set. Newly implemented viridian features
can be optionally enabled in future by setting further bits.
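
The encoding described above can be summarised with a small decoder.  This
is a Python sketch with a hypothetical function name; it models only bits
0 and 1 as described in this message and says nothing about bits that may
be defined later.

```python
def viridian_features(param):
    """Decode HVM_PARAM_VIRIDIAN into the 'base' and 'freq' sets.

    Bit 0 set:  base and freq sets present.
    Bit 1 set:  freq set turned off again (pre-Xen-4.4 base set only).
    """
    base = bool(param & 1)
    freq = base and not (param & 2)
    return {"base": base, "freq": freq}
```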

The viridian option in xl.cfg(5) has also been changed to a list so
that the sets can be individually enabled or disabled. For compatibility,
if the option is specified as a boolean, then a true (1) value will enable
the base and freq sets and a false (0) value will not enable any
enlightenments.

This patch also alters the allowed write accesses to HVM_PARAM_VIRIDIAN.
Currently there is nothing to stop the guest writing this value (which,
while harmless to anything else, should not happen) and nothing to
stop a toolstack from setting the value back to zero whilst the guest is
running, causing CPUID leaves to disappear and MSR accesses to start
causing GPFs in the guest. Both of these possibilities are now disallowed:
Once the parameter is set to a non-zero value it may not be modified (only
re-written with the same value), and guests no longer have any write
access.
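
Expressed as a predicate, the new write rule looks roughly like this
(Python sketch, hypothetical name; the real check is in the hypervisor's
HVM parameter handling):

```python
def may_set_viridian_param(current, new, caller_is_guest):
    """Return True if writing 'new' over 'current' is permitted."""
    if caller_is_guest:
        return False               # guests no longer have write access
    # Once non-zero, the value may only be re-written unchanged.
    return current == 0 or new == current
```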

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: David Scott <dave.scott@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxl: introduce rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:15:02 +0000 (18:15 -0400)]
xl: introduce rtds scheduler

Add xl command for rtds scheduler
Note: the VCPU parameters (period, budget) are in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: add rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:14:43 +0000 (18:14 -0400)]
libxl: add rtds scheduler

Add libxl functions to set/get domain's parameters for rtds scheduler
Note: the VCPU information (period, budget) is in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxc: add rtds scheduler
Meng Xu [Sat, 20 Sep 2014 22:14:18 +0000 (18:14 -0400)]
libxc: add rtds scheduler

Add xc_sched_rtds_* functions to interact with Xen to set/get domain's
parameters for rtds scheduler.
Note: the VCPU information (period, budget) is in microseconds (us).

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- xenctrl.h has moved to tools/libxc/include, adjust patch to match ]

10 years agoxen: add real time scheduler rtds
Meng Xu [Sat, 20 Sep 2014 22:13:48 +0000 (18:13 -0400)]
xen: add real time scheduler rtds

This scheduler follows the Preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with earlier deadline has higher
priority. The scheduler always picks the highest priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
idle or has a lower-priority VCPU running on it.)

Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each period;
A VCPU has its budget replenished at the beginning of each period;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until next period.

Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but with budget left, its budget is preserved.

Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget and sorted by deadline;
The depletedq holds all VCPUs without budget and unsorted.
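
The budget and queue rules above can be modelled in a few lines.  The
following Python sketch is a toy model only: all names are hypothetical,
there is a single PCPU, and none of the locking, timers or cpupool
handling of the real scheduler is represented.

```python
from dataclasses import dataclass


@dataclass
class VCPU:
    name: str
    period: int          # microseconds
    budget: int          # microseconds of run time per period
    deadline: int = 0    # absolute time at which the period ends
    remaining: int = 0   # budget left in the current period


def replenish(vcpu, now):
    """Start a new period: unused budget is discarded, then refilled."""
    vcpu.remaining = vcpu.budget
    vcpu.deadline = now + vcpu.period


def burn(vcpu, ran_for):
    """Account run time against the VCPU's budget."""
    vcpu.remaining = max(0, vcpu.remaining - ran_for)


def split_queues(vcpus):
    """Return (runq sorted by deadline, unsorted depletedq)."""
    runq = sorted((v for v in vcpus if v.remaining > 0),
                  key=lambda v: v.deadline)
    depletedq = [v for v in vcpus if v.remaining <= 0]
    return runq, depletedq


def pick_next(vcpus):
    """EDF choice: the runnable VCPU with the earliest deadline."""
    runq, _ = split_queues(vcpus)
    return runq[0] if runq else None
```

A VCPU that has exhausted its budget moves to the depletedq and is not
considered again until replenish() is called for its next period.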

Note: cpumask and cpupool are supported.

This is an experimental scheduler.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- use PRI_stime to print delta in burn_budget, to fix build on
         32-bit (i.e. arm32) ]

10 years agotools: enable QEMU for ARM builds
Stefano Stabellini [Fri, 1 Aug 2014 15:32:19 +0000 (16:32 +0100)]
tools: enable QEMU for ARM builds

Build qemu-xen on ARM and ARM64: it is used to provide the PV backends,
disk and framebuffer in particular.

Ideally we would also modify the configure options to only build what is
necessary: a machine just for PV backends. However that is a work in
progress and not yet available in QEMU (see
http://marc.info/?l=qemu-devel&m=139082425718379&w=2). So we just build
the usual i386 target, even though no i386 emulation is going to be done
by qemu-xen on ARM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoMove xenstore and libxc public headers to include subdir
Stefano Stabellini [Thu, 10 Jul 2014 15:35:28 +0000 (15:35 +0000)]
Move xenstore and libxc public headers to include subdir

Also moves xc_dom.h to include as it is used often by other xen tools.
Use the new include subdirectories to build Xen tools, qemu-xen and
stubdoms.

Add the old libxc include path to the programs that need it to build,
on a case by case basis and commenting that they shouldn't require
internal libxc headers to build.

[ And: update QEMU_TRADITIONAL_REVISION to corresponding qemu patch
   - Ian jackson ]

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoRerun autogen.sh after 7d7147762282 "Use configure --sysconfdir=DIR to se..."
Ian Campbell [Tue, 23 Sep 2014 13:08:51 +0000 (14:08 +0100)]
Rerun autogen.sh after 7d7147762282 "Use configure --sysconfdir=DIR to se..."

I tried to do this but failed to commit --amend correctly before pushing.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86emul: only emulate software interrupt injection for real mode
Jan Beulich [Tue, 23 Sep 2014 12:33:50 +0000 (14:33 +0200)]
x86emul: only emulate software interrupt injection for real mode

Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.

This is XSA-106.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
10 years agox86/emulate: check cpl for all privileged instructions
Andrew Cooper [Tue, 23 Sep 2014 12:33:06 +0000 (14:33 +0200)]
x86/emulate: check cpl for all privileged instructions

Without this, it is possible for userspace to load its own IDT or GDT.

This is XSA-105.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrei LUTAS <vlutas@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agox86/shadow: fix race condition sampling the dirty vram state
Andrew Cooper [Tue, 23 Sep 2014 12:31:47 +0000 (14:31 +0200)]
x86/shadow: fix race condition sampling the dirty vram state

d->arch.hvm_domain.dirty_vram must be read with the domain's paging lock held.

If not, two concurrent hypercalls could both end up attempting to free
dirty_vram (the second of which will free a wild pointer), or both end up
allocating a new dirty_vram structure (the first of which will be leaked).
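
The general shape of the fix is to sample and update the pointer only
while holding the lock, so that the check and the allocation cannot
interleave.  A Python sketch of that pattern (the class and function are
hypothetical stand-ins, not the shadow code):

```python
import threading


class Domain:
    """Stand-in for the domain state relevant here."""

    def __init__(self):
        self.paging_lock = threading.Lock()
        self.dirty_vram = None
        self.allocations = 0     # counts allocations, for illustration


def get_dirty_vram(domain):
    """Return the dirty_vram state, allocating it at most once."""
    with domain.paging_lock:
        # Read and (if needed) allocate under the same lock hold.
        if domain.dirty_vram is None:
            domain.dirty_vram = {}
            domain.allocations += 1
        return domain.dirty_vram
```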

This is XSA-104.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agoUse configure --sysconfdir=DIR to set CONFIG_DIR
Olaf Hering [Mon, 22 Sep 2014 13:00:07 +0000 (15:00 +0200)]
Use configure --sysconfdir=DIR to set CONFIG_DIR

Preserve existing behaviour: if the option was not given, set existing
defaults for FreeBSD, Solaris and everything else.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
10 years agotools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE
Olaf Hering [Mon, 22 Sep 2014 13:00:06 +0000 (15:00 +0200)]
tools/libxc: use XEN_RUN_DIR for SUSPEND_LOCK_FILE

Remove hardcoded /var/run/xen directory path, use XEN_RUN_DIR instead.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: provide variable paths to libxc
Olaf Hering [Mon, 22 Sep 2014 13:00:05 +0000 (15:00 +0200)]
tools/libxc: provide variable paths to libxc

In preparation to remove hardcoded /var/run/xen paths, provide
XEN_RUN_DIR and related directories to xc_private.h. Similar code exists
already for libxl, stubdom and other parts.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxl: use buildmakevars2header to create _paths.h
Olaf Hering [Mon, 22 Sep 2014 13:00:04 +0000 (15:00 +0200)]
tools/libxl: use buildmakevars2header to create _paths.h

Replace usage of buildmakevars2file with buildmakevars2header. The macro
generates a C header file, so remove code which converts shell variables
into C defines. Also update the dependency, the macro itself creates a
dependency for _paths.h. A temporary file is not needed anymore.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoConfig.mk: add new macro buildmakevars2header
Olaf Hering [Mon, 22 Sep 2014 13:00:03 +0000 (15:00 +0200)]
Config.mk: add new macro buildmakevars2header

This macro is similar to buildmakevars2file, it just creates a C header
file instead of shell style syntax. Upcoming changes will use this macro
in libxl and libxc.
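
To illustrate the difference between the two output styles, here is a
Python sketch.  The function names are hypothetical and the exact text
emitted by the real make macros is an assumption; only the "shell style
versus C header" distinction is taken from the description above.

```python
def make_vars_to_shell(variables):
    """Render variables as shell-style NAME="value" assignments."""
    return "".join(f'{name}="{value}"\n' for name, value in variables.items())


def make_vars_to_header(variables):
    """Render the same variables as C preprocessor string defines."""
    return "".join(f'#define {name} "{value}"\n'
                   for name, value in variables.items())
```

For instance, XEN_RUN_DIR set to /var/run/xen would become
`XEN_RUN_DIR="/var/run/xen"` in the first form and
`#define XEN_RUN_DIR "/var/run/xen"` in the second.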

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoConfig.mk: replace dependency to genpath with actual target
Olaf Hering [Mon, 22 Sep 2014 13:00:02 +0000 (15:00 +0200)]
Config.mk: replace dependency to genpath with actual target

genpath is a detail of buildmakevars2file. Replace the dependency to
genpath with the actual buildmakevars2file target. This change by
itself does not fix any bug. Upcoming changes will add dependencies to
$(target), but no rule exists to create $(target).

To force a rebuild of the $(1) rule the target now depends on the
existing .phony target. This dummy target is already used elsewhere in
the code.

No change in behaviour is expected by this patch.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoConfig.mk: move directory list into BUILD_MAKE_VARS
Olaf Hering [Mon, 22 Sep 2014 13:00:01 +0000 (15:00 +0200)]
Config.mk: move directory list into BUILD_MAKE_VARS

To maintain the list of directories in a single place, move the existing
list into its own variable and use it in buildmakevars2file.
Required for upcoming changes.
Also trim whitespace.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoremove obsolete SUBSYS_DIR variable
Olaf Hering [Mon, 22 Sep 2014 13:00:00 +0000 (15:00 +0200)]
remove obsolete SUBSYS_DIR variable

/var/run is a runtime directory. It is not supposed to be packaged.
Remove unused SUBSYS_DIR variable from Config.mk and distro_mapping.txt.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/examples: remove obsolete install targets
Olaf Hering [Mon, 22 Sep 2014 12:59:59 +0000 (14:59 +0200)]
tools/examples: remove obsolete install targets

install-hotplug and install-udev are obsolete since commit 57bcfa11
("tools/hotplug: Separate OS-specific scripts.")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: use XEN_LOCK_DIR instead of hardcoded path
Olaf Hering [Mon, 22 Sep 2014 12:59:58 +0000 (14:59 +0200)]
tools/hotplug: use XEN_LOCK_DIR instead of hardcoded path

Use XEN_LOCK_DIR because it is a compiletime setting.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: create XEN_LOCK_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:57 +0000 (14:59 +0200)]
tools/hotplug: create XEN_LOCK_DIR at runtime

Create XEN_LOCK_DIR because it is a compiletime setting. Also /var/lock
might be empty on startup because it is a tmpfs mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/hotplug: create XEN_RUN_DIR at runtime
Olaf Hering [Mon, 22 Sep 2014 12:59:56 +0000 (14:59 +0200)]
tools/hotplug: create XEN_RUN_DIR at runtime

Create XEN_RUN_DIR instead of hardcoded path because it is a compiletime
setting. Also /var/run might be empty on startup because it is a tmpfs
mount point.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/pygrub: store kernels in /var/run/xen/pygrub
Olaf Hering [Mon, 22 Sep 2014 12:59:55 +0000 (14:59 +0200)]
tools/pygrub: store kernels in /var/run/xen/pygrub

Move location of temporary bootfiles from /var/run/xend/boot to
/var/run/xen/pygrub. Create the subdirectory if it does not exist.
The <dir> argument --output-directory must be an existing directory.

The reason for this change is that all entries below /var/run have to be
created at runtime in case /var/run is cleared on every boot.
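
Creating the directory on demand can be done along these lines.  This is a
minimal Python sketch; the helper name is hypothetical and this is not
necessarily how pygrub implements it.

```python
import os


def ensure_output_directory(path):
    """Create 'path' (and missing parents) if it does not exist yet.

    Safe to call repeatedly, e.g. after /var/run was cleared at boot.
    """
    os.makedirs(path, exist_ok=True)
    return path
```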

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools: remove obsolete path.py from tools/python
Olaf Hering [Mon, 22 Sep 2014 12:59:54 +0000 (14:59 +0200)]
tools: remove obsolete path.py from tools/python

The directory tools/python/xen/util does not exist.
Upcoming changes to genpath will fail if the rule persists.
Nothing uses path.py (anymore?), so get rid of it.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed from .gitignore too ]

10 years agotools/mkrpm: allow custom rpm package name
Olaf Hering [Mon, 22 Sep 2014 12:59:53 +0000 (14:59 +0200)]
tools/mkrpm: allow custom rpm package name

Even if xen is configured and compiled with different --prefix= so that
it operates entirely below $prefix, the resulting package from 'make
rpmball' is always called "xen.rpm".

Use an environment variable to give the package a different name.
This can be used like this:

suffix=-bugN
prefix=/opx/xen/staging${suffix}
./configure --prefix=${prefix}
make rpmball PKG_SUFFIX=${suffix}

The result will be "xen-bugN.rpm" instead of "xen.rpm". The benefit is that
many xen${suffix}.rpm packages can be installed at the same time.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoinstall.sh: Preserve permissions from make install
Olaf Hering [Mon, 22 Sep 2014 12:59:52 +0000 (14:59 +0200)]
install.sh: Preserve permissions from make install

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>