Andrew Cooper [Mon, 29 Sep 2014 08:23:01 +0000 (10:23 +0200)]
x86/emulate: support for emulating software event injection
AMD SVM requires all software events to have their injection emulated if
hardware lacks NextRIP support. In addition, `icebp` (opcode 0xf1) injection
requires emulation in all cases, even with hardware NextRIP support.
Emulating full control transfers is overkill for our needs. All that matters
is that guest userspace can't bypass the descriptor DPL check. Any guest OS
which would incur other faults as part of injection is going to end up with a
double fault instead, and won't be in a position to care that the faulting eip
is wrong.
Reported-by: Andrei LUTAS <vlutas@bitdefender.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Andrew Cooper [Mon, 29 Sep 2014 08:22:23 +0000 (10:22 +0200)]
x86/hvm: don't discard the SW/HW event distinction from the emulator
Injecting emulator software events as hardware exceptions results in a bypass
of DPL checks. As the emulator doesn't perform DPL checks itself, guest
userspace is capable of bypassing DPL checks and injecting arbitrary events.
Propagating software event information from the emulator allows VMX to now
properly inject software events, including DPL and presence checks, as well
as correct fault/trap frames.
Reported-by: Andrei LUTAS <vlutas@bitdefender.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrei LUTAS <vlutas@bitdefender.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
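The DPL check that the two software-event commits above rely on can be sketched as follows. This is a minimal illustration and not Xen's code: the enum, the function name and the two-way split of event kinds are assumptions made for the example. It only encodes the architectural rule that a software interrupt raised at a given CPL is refused when the gate's DPL is numerically lower, while hardware exceptions are not subject to that check.

```c
#include <stdbool.h>

/* Illustrative event kinds; these names are not taken from Xen. */
enum event_kind {
    EVENT_HW_EXCEPTION,  /* raised by the CPU, e.g. a page fault */
    EVENT_SW_INTERRUPT   /* raised by guest code, e.g. `int $n` */
};

/*
 * Return true if an event of the given kind, raised at privilege level
 * `cpl`, may be delivered through a gate with privilege level `gate_dpl`.
 * Software interrupts fail the check when the gate DPL is below the CPL.
 */
bool event_passes_dpl_check(enum event_kind kind,
                            unsigned int cpl, unsigned int gate_dpl)
{
    if (kind == EVENT_HW_EXCEPTION)
        return true;
    return gate_dpl >= cpl;
}
```

If the software/hardware distinction is lost and every event is treated as EVENT_HW_EXCEPTION, the function always returns true, which is the bypass described above.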
x86/hvm: remove stray lock release from hvm_ioreq_server_init()
If HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or HVM_PARAM_BUFIOREQ_EVTCHN
parameters are read while the guest domain is dying, it leads to the following
ASSERT:
The root cause of this issue is the fact that ioreq_server.lock is being
released twice - first in hvm_ioreq_server_init() and then in hvm_create_ioreq_server().
Drop the lock release from hvm_ioreq_server_init(), as the lock is not taken there, and do minor
label cleanup.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
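The double release described above can be reproduced with a toy lock. Everything in this sketch is hypothetical (the lock type, the function names and the `stray_release` switch), and it assumes the stray release sits on the error path of the init helper. It only shows why a helper that never took a lock must not drop it.

```c
#include <stdbool.h>

/* Toy lock that records misuse instead of asserting; purely illustrative. */
struct toy_lock {
    bool held;
    unsigned int bad_unlocks;  /* releases attempted while not held */
};

void toy_lock_acquire(struct toy_lock *l)
{
    l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
    if (!l->held)
        l->bad_unlocks++;      /* a debug build would ASSERT here */
    l->held = false;
}

/* Hypothetical init helper: it never took the lock, so it must not release it. */
int server_init(struct toy_lock *l, bool fail, bool stray_release)
{
    if (fail) {
        if (stray_release)
            toy_lock_release(l);   /* the kind of stray release being removed */
        return -1;
    }
    return 0;
}

/* Hypothetical caller: owns the lock for the whole operation. */
int server_create(struct toy_lock *l, bool fail, bool stray_release)
{
    int rc;

    toy_lock_acquire(l);
    rc = server_init(l, fail, stray_release);
    toy_lock_release(l);           /* the one legitimate release */
    return rc;
}
```

With `stray_release` set, the caller's release runs on a lock that is no longer held and `bad_unlocks` becomes 1. Without it the counter stays at 0.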
Tamas K Lengyel [Fri, 26 Sep 2014 14:29:34 +0000 (16:29 +0200)]
mem_event: relax error condition on debug builds
A faulty tool stack can brick a debug hypervisor. Unpleasant while dev/test.
Suggested-by: Andres Lagar Cavilla <andres@lagarcavilla.org> Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Fri, 26 Sep 2014 14:24:02 +0000 (16:24 +0200)]
relocate p2m_access_t into common and swap the order
We swap the order of the enum of types n ... rwx, so as to have rwx at 0, which is
the default setting when mem_access is not in use. This has performance benefits for
non-memaccess paths, as now comparison is to 0 when checking if memaccess is in use,
which is often faster.
We fix one location in nested_hap where the order of the enum made a difference.
Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Tim Deegan <tim@xen.org>
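A small sketch of the idea behind the reordering above. Only "rwx is 0" comes from the commit text. The enum name, the remaining members and their values are assumptions for illustration and do not reproduce the real p2m_access_t.

```c
#include <stdbool.h>

/*
 * Illustrative access enum with the fully permissive value first.
 * Only "rwx is 0" comes from the commit text; the other names and
 * values are assumptions for this sketch.
 */
enum access_sketch {
    ACCESS_RWX = 0,   /* default when mem_access is not in use */
    ACCESS_RW,
    ACCESS_RX,
    ACCESS_R,
    ACCESS_N          /* no access */
};

/* With the default at 0, "is a restriction set?" is a plain test against zero. */
bool access_is_restricted(enum access_sketch a)
{
    return a != ACCESS_RWX;
}
```

A zero-initialised entry then already carries the default, so paths that never touch mem_access need no explicit setup.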
Yang Hongyang [Mon, 7 Jul 2014 02:10:20 +0000 (10:10 +0800)]
MAINTAINERS: update maintained files of Remus
Add Remus specific hotplug scripts and libxl files
to the list of maintained files.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Wed, 16 Jul 2014 09:07:43 +0000 (17:07 +0800)]
libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
Add LIBXL_HAVE_REMUS to indicate Remus support in libxl
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Wed, 16 Jul 2014 09:27:43 +0000 (17:27 +0800)]
xl/remus: add a cmdline switch to disable disk replication
Disk replication is enabled by default. This patch adds a cmdline
switch to 'xl remus' command to explicitly disable disk replication.
A new boolean field 'diskbuf' is added to the libxl_domain_remus_info
structure to represent this configuration option inside libxl.
Note: Disabling disk replication requires enabling unsafe mode.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Wed, 11 Jun 2014 03:29:44 +0000 (11:29 +0800)]
xl/remus: cmdline switches and config vars to control network buffering
Add two members in libxl_domain_remus_info:
netbuf: whether netbuf is enabled
netbufscript: the path of the script which will be run to setup
and tear down the guest's interface.
Add cmdline switches to 'xl remus' command to enable or disable
network buffering and a domain-specific hotplug script to setup
network buffering.
Add a new config var 'remus.default.netbufscript' to xl.conf, that
allows the user to override the default global script used to
setup network buffering.
Note: Network buffering is enabled by default. Disabling network
buffering requires enabling unsafe mode.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Thu, 24 Jul 2014 08:47:24 +0000 (16:47 +0800)]
xl/remus: cmdline switch to explicitly enable unsafe configurations
By default, network buffering and disk replication are enabled;
checkpoints are replicated to another standby VM.
This patch allows the user to disable any of these features by
explicitly specifying a 'run in unsafe mode' switch when invoking
the 'xl remus' command. While running Remus in an unsafe mode
makes little sense under normal circumstances, it is useful to be
able to disable one or more features mentioned above for
testing/debugging/profiling purposes.
Unless this option is enabled, it will not be possible to
replicate memory checkpoints to /dev/null (blackhole replication),
disable network buffering or disk replication.
As a starter, the use of blackhole replication now requires that
the unsafe mode be enabled. Subsequent patches will add support
for disabling network buffering and disk replication in a similar
manner.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
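The rule described above (any weakened configuration needs the unsafe switch) can be written as a single predicate. The struct and field names below are hypothetical and do not match libxl_domain_remus_info. This is a sketch of the policy, not the xl implementation.

```c
#include <stdbool.h>

/* Hypothetical option set; the field names are illustrative, not libxl's. */
struct remus_opts_sketch {
    bool netbuf;      /* network buffering enabled */
    bool diskbuf;     /* disk replication enabled */
    bool blackhole;   /* replicate checkpoints to /dev/null */
    bool unsafe;      /* user passed the "unsafe mode" switch */
};

/* A configuration is acceptable if it is fully protected or explicitly unsafe. */
bool remus_opts_acceptable(const struct remus_opts_sketch *o)
{
    bool weakened = !o->netbuf || !o->diskbuf || o->blackhole;

    return !weakened || o->unsafe;
}
```

Under this sketch, the defaults (both buffers on, no blackhole) pass without the switch, and turning off any one protection fails unless `unsafe` is set.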
Yang Hongyang [Fri, 29 Aug 2014 02:16:36 +0000 (10:16 +0800)]
xl/remus: change bool to defbool
Use defbool instead of bool for boolean flags in remus_info struct.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
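A plain bool cannot tell "the user said no" apart from "the user said nothing", which matters for options that are on by default. The sketch below shows the three-state idea with a stand-alone type. The names are illustrative and this is not libxl's defbool implementation.

```c
#include <stdbool.h>

/* Three-state boolean: unset, or explicitly false/true. Names are illustrative. */
enum tristate { TRI_DEFAULT = 0, TRI_FALSE, TRI_TRUE };

/* Fill in a default only if the user left the value unset. */
void tri_setdefault(enum tristate *t, bool dflt)
{
    if (*t == TRI_DEFAULT)
        *t = dflt ? TRI_TRUE : TRI_FALSE;
}

/* Read the value; callers are expected to have applied a default first. */
bool tri_val(enum tristate t)
{
    return t == TRI_TRUE;
}
```

An explicit TRI_FALSE survives `tri_setdefault(&v, true)`, while an unset value picks up the default.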
Yang Hongyang [Fri, 18 Jul 2014 09:14:22 +0000 (17:14 +0800)]
libxl/remus: setup and control disk replication for DRBD backends
This patch adds the machinery required for protecting a guest's
disk state, when the guest disk uses a DRBD disk backend.
This patch comprises two parts:
1. Hotplug scripts: The block-drbd-probe script is responsible for
performing sanity checks on the state of the DRBD disk before the
checkpointing process begins. This script should be invoked by
libxl for each of the guest's disk devices, when starting Remus.
2. Remus drbd disk device: Implements the interfaces required by the
remus abstract device layer. A note about the implementation:
a) setup() is called for each disk attached to the guest.
During setup():
i) The hotplug script is called to perform the sanity check.
ii) Libxl obtains a handle to the DRBD device (/dev/drbd*)
and subsequently controls disk checkpoint replication using
this handle in the checkpoint callbacks.
c) The preresume() checkpoint callback is executed asynchronously
using libxl__ev_child_fork(), as it may potentially block for more
than a few seconds in case of backup failure.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 18 Jul 2014 07:08:36 +0000 (15:08 +0800)]
libxl/remus: setup and control network output buffering
This patch adds the machinery required for protecting a guest's
network device state. This patch comprises two parts:
1. Hotplug scripts: The remus-netbuf-setup script is responsible for
setting up and tearing down the necessary infrastructure required for
network output buffering. This script should be invoked by libxl for
each of the guest's network interfaces, when starting or stopping Remus.
Apart from returning success/failure indication via the usual hotplug
entries in xenstore, this script also writes to xenstore, the name of
the REMUS_IFB device to be used to control the vif's network output.
The script relies on libnl3 command line utilities to perform various
setup/teardown functions. The script is confined to Linux platforms only
since NetBSD does not seem to have libnl3.
2. Remus network device: Implements the interfaces required by the
remus abstract device layer. A note about the implementation:
a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
invocation. They establish and free netlink related state respectively.
b) setup() and teardown() are called for each vif attached to the
guest.
During setup():
i) The hotplug script is called to setup a network buffer on a
given vif. The script chooses an available IFB device from
the system, redirects vif egress traffic to the IFB device
and sets up the plug qdisc (output buffer) on the IFB device.
The name of the IFB device is communicated via xenstore to
libxl.
ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
and subsequently controls output buffering using this handle
in the checkpoint callbacks.
During teardown(), the hotplug scripts are called again to remove
the vif->ifb traffic redirection, release the ifb and the plug
qdisc associated with it.
c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
are implemented as synchronous ops as the netlink calls associated
with the qdisc subsystem are very fast.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 18 Jul 2014 07:02:34 +0000 (15:02 +0800)]
libxl/remus: introduce an abstract Remus device layer
Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.
The following APIs are exposed to libxl:
One-time configuration operations:
*libxl__remus_devices_setup
> Enable output buffering for NICs, setup disk replication, etc.
*libxl__remus_devices_teardown
> Disable network output buffering and disk replication;
teardown any associated external setups like qdiscs for NICs.
Operations executed every checkpoint (in order of invocation):
*libxl__remus_devices_postsuspend
*libxl__remus_devices_preresume
*libxl__remus_devices_commit
Each device type needs to implement the interfaces specified in
the libxl__remus_device_instance_ops if it wishes to support Remus.
The high-level control flow through the Remus device layer is shown below:
callback processing
* Only call the per-device libxl__multidev_one_callback
when the iteration has succeeded or failed.
* The final callback (called by multidev) is a trivial
shim to shuffle the pointers and notify our own caller.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
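The per-checkpoint ordering listed above (postsuspend, then preresume, then commit) can be illustrated with a table of function pointers. The struct, the toy NIC callbacks and run_checkpoint() are hypothetical stand-ins. The real layer is asynchronous and exposes the three phases as separate entry points, whereas this sketch runs them back to back.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device-type operations; a stand-in for the real ops table. */
struct dev_ops_sketch {
    void (*postsuspend)(char *log);
    void (*preresume)(char *log);
    void (*commit)(char *log);
};

/* Toy NIC callbacks that only record which phase ran. */
void nic_postsuspend(char *log) { strcat(log, "S"); }
void nic_preresume(char *log)   { strcat(log, "R"); }
void nic_commit(char *log)      { strcat(log, "C"); }

const struct dev_ops_sketch nic_ops_sketch = {
    nic_postsuspend, nic_preresume, nic_commit
};

/* One checkpoint: run each phase across all devices before the next phase. */
void run_checkpoint(const struct dev_ops_sketch **devs, size_t n, char *log)
{
    size_t i;

    for (i = 0; i < n; i++)
        devs[i]->postsuspend(log);
    for (i = 0; i < n; i++)
        devs[i]->preresume(log);
    for (i = 0; i < n; i++)
        devs[i]->commit(log);
}
```

The caller never needs to know whether an entry is a NIC or a disk. Supporting a new device type only means providing another ops table.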
Yang Hongyang [Fri, 27 Jun 2014 01:43:51 +0000 (09:43 +0800)]
autoconf: add libnl3 dependency for Remus network buffering support
Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
It also provides the ability to configure tools without libnl3 support
i.e., without network buffering support.
When there is no network buffering support, libxl__netbuffer_enabled()
returns 0, otherwise returns 1. The callers of this api will be
introduced in the rest of the series.
NOTE: This patch changes tools/configure.ac, please rerun
autogen.sh while applying the patch.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
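One way to get the "returns 0 without libnl3, 1 with it" behaviour described above is a build-time switch. The macro and function names below are hypothetical, and the actual patch may well select between source files instead of using an #ifdef.

```c
/*
 * HAVE_LIBNL3_SKETCH is a hypothetical macro standing in for whatever the
 * configure script defines when libnl3 is found.
 */
#ifdef HAVE_LIBNL3_SKETCH
int netbuffer_enabled_sketch(void)
{
    return 1;   /* built with network buffering support */
}
#else
int netbuffer_enabled_sketch(void)
{
    return 0;   /* built without libnl3: no network buffering */
}
#endif
```

Callers can then test the return value at run time instead of carrying their own preprocessor conditionals.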
Yang Hongyang [Fri, 18 Jul 2014 08:40:54 +0000 (16:40 +0800)]
libxl: Extend libxl__ao_device with a libxl__ev_child member
This can be used to fork children to allow the asynchronous execution
of system calls which only come in a synchronous variant. This will
be useful for Remus, in the following patches.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
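The general technique referred to above, running a blocking call in a forked child so the parent can carry on, looks roughly like this in plain POSIX. The function names are illustrative. libxl's own child handling is driven by its event loop rather than by a direct waitpid() as here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example of a call that only exists in a blocking form (illustrative). */
int blocking_op_sketch(void)
{
    return 7;   /* pretend this took a long time and produced status 7 */
}

/* Start `op` in a child process; the parent gets the pid back immediately. */
pid_t start_in_child(int (*op)(void))
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(op() & 0xff);   /* child: run the call, report via exit status */
    return pid;               /* parent: child's pid, or -1 if fork failed */
}

/* Collect the child's result; returns -1 if it did not exit normally. */
int collect_child(pid_t pid)
{
    int status;

    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Between start_in_child() and collect_child() the parent is free to do other work, which is the point of the exercise.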
libxl__multidev_prepare_with_aodev is similar to libxl__multidev_prepare,
but takes a libxl__ao_device as an extra argument.
libxl__multidev_prepare is now a wrapper around
libxl__multidev_prepare_with_aodev.
This new internal API will be used by the Remus device abstract layer
for handling various Remus devices.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Now a caller who wants to be able to do other work when the aodev
completes can put their own callback into the aodev, and make the
multidev machinery aware that the particular aodev is complete (from
the point of view that multidev should have) whenever it likes.
No functional change in this patch.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Thu, 25 Sep 2014 14:59:06 +0000 (15:59 +0100)]
libxl: multidev: Clarify comments about which callbacks are meant
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tamas K Lengyel [Fri, 26 Sep 2014 13:49:26 +0000 (15:49 +0200)]
relocate mem_access and mem_event into common
In preparation to add support for ARM LPAE mem_event, relocate mem_access,
mem_event and auxiliary functions into common Xen code.
This patch makes no functional changes to the X86 side, for ARM mem_event
and mem_access functions are just defined as placeholder stubs, and are
actually enabled later in the series.
Edits that are only header path adjustments:
xen/arch/x86/domctl.c
xen/arch/x86/mm/hap/nested_ept.c
xen/arch/x86/mm/hap/nested_hap.c
xen/arch/x86/mm/mem_paging.c
xen/arch/x86/mm/mem_sharing.c
xen/arch/x86/mm/p2m-pod.c
xen/arch/x86/mm/p2m-pt.c
xen/arch/x86/mm/p2m.c
xen/arch/x86/x86_64/compat/mm.c
xen/arch/x86/x86_64/mm.c
Makefile adjustments for new/removed code:
xen/common/Makefile
xen/arch/x86/mm/Makefile
Relocated prepare_ring_for_helper and destroy_ring_for_helper functions:
xen/include/xen/mm.h
xen/common/memory.c
xen/include/asm-x86/hvm/hvm.h
xen/arch/x86/hvm/hvm.c
Code movement of mem_event and mem_access:
xen/arch/x86/mm/mem_access.c -> xen/common/mem_access.c
xen/arch/x86/mm/mem_event.c -> xen/common/mem_event.c
xen/include/asm-x86/mem_access.h -> xen/include/xen/mem_access.h
xen/include/asm-x86/mem_event.h -> xen/include/xen/mem_event.h
Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Tim Deegan <tim@xen.org>
Roy Franz [Fri, 26 Sep 2014 10:00:55 +0000 (12:00 +0200)]
EFI: add arch specific function to control use of config file
The x86 EFI build of Xen always uses a configuration file to load modules, but
the ARM version can either use a config file to specify the modules, or be
loaded by GRUB in which case GRUB loads the modules and adds them to the DTB
that is passed to Xen. Add efi_arch_use_config_file() to indicate if a
configuration file is required. For x86, this will always be true. ARM will
examine the DTB passed via EFI configuration table (if any), and if it contains
module information will use that and not use the configuration file at all.
Add Emacs footer to efi-boot.h and boot.c
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Fri, 26 Sep 2014 10:00:27 +0000 (12:00 +0200)]
EFI: add several misc. arch functions for boot code
Add efi_arch_blexit() for arch specific cleanup on error exit,
efi_arch_load_addr_check() to do the arch specific verifications
of where the UEFI firmware loaded Xen, and efi_arch_cpu() for
probing CPU features.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Fri, 26 Sep 2014 09:59:56 +0000 (11:59 +0200)]
EFI: add arch specific module handling to read_file()
Each architecture tracks modules differently internally, so add
efi_arch_handle_module() routine to enable the common code to invoke the proper
handling of modules as they are loaded. Module handling for ucode, ramdisk, and
xsm is changed to not process remainder of string after filename as options,
since these modules don't take options.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Thu, 25 Sep 2014 12:30:16 +0000 (14:30 +0200)]
EFI: arch specific memory setup
This patch adds efi_arch_memory() to allow each architecture a hook
to use for memory setup. x86 uses this for trampoline memory setup
and some pagetable setup.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Thu, 25 Sep 2014 12:28:27 +0000 (14:28 +0200)]
EFI: add efi_arch_cfg_file_early/late() to handle arch specific cfg file fields
Different architectures have some different configuration file
fields that need to be handled. In particular, x86 has ucode
and ARM has device tree files to be loaded. These arch specific
functions is used to allow each architecture to implement these
features in arch specific code. Early/late versions are provided,
as ARM needs to process the DTB entry first, and x86 wants to process
the ucode entry last as it is the smallest allocation.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Thu, 25 Sep 2014 12:27:55 +0000 (14:27 +0200)]
EFI: add architecture functions for pre/post ExitBootServices
The UEFI ExitBootServices function is invoked to transition the
system to the 'runtime' mode of operation, and is done right before
transitioning from the EFI loader code into Xen proper. x86 does some
arch specific memory management (trampoline) before exit boot services,
and the code that transitions from the EFI application state to Xen
is architecture specific. This patch adds two functions, one pre
and one post ExitBootServices to allow each architecture
to handle these cases in a customized manner.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Thu, 25 Sep 2014 12:26:34 +0000 (14:26 +0200)]
EFI: create arch functions to allocate memory for and process memory map
The memory used to store the EFI memory map is allocated in an architecture
specific way, and the processing of the memory map itself uses x86 specific
data structures. This patch adds architecture specific functions so each
architecture can provide its own implementation.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Roy Franz [Thu, 25 Sep 2014 12:22:12 +0000 (14:22 +0200)]
EFI: move x86 boot/runtime code to common/efi
This moves the EFI boot and runtime services code to the common/efi directory.
This code is symbolically linked back into the arch/x86/efi directory where it is
built if a build-time check for PE/COFF support in the toolchain passes. In
the PE/COFF supporting case, both the EFI executable and the normal Xen image
(with stubbed EFI functions) are built. We can't use the normal common build
infrastructure since we are building two versions at the same time, with
different EFI related code in each. No code changes, just file movement and
make updates. The files are symbolically linked at build time back to the
original arch/x86/efi directory. This is in preparation for adding ARM EFI
support where much of these files can be shared.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 25 Sep 2014 12:10:01 +0000 (14:10 +0200)]
x86/vlapic: don't silently accept bad vectors
Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
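The reserved-vector rule above is simple enough to state as code. This is a toy model only: the error counter stands in for raising an APIC error, and the function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local APIC vectors 0-15 are reserved; 16-255 are usable. */
bool lapic_vector_valid(uint8_t vector)
{
    return vector >= 16;
}

/*
 * Toy delivery routine: returns true if the interrupt was accepted and
 * otherwise bumps an error counter (a stand-in for raising an APIC error).
 */
bool deliver_vector_sketch(uint8_t vector, unsigned int *apic_errors)
{
    if (!lapic_vector_valid(vector)) {
        (*apic_errors)++;
        return false;
    }
    return true;
}
```

The important part is that a bad vector is reported rather than silently accepted.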
Jan Beulich [Thu, 25 Sep 2014 12:08:20 +0000 (14:08 +0200)]
x86/HVM: fix ID handling of x2APIC emulation
- properly change ID when switching into x2APIC mode (instead of
mimicking necessary behavior in hvm_x2apic_msr_read())
- correctly (meaningfully) set LDR (so far it ended up being 1 on all
vCPU-s)
- even if we don't support more than 128 vCPU-s in a HVM guest for now,
we should properly handle IDs as 32-bit values (i.e. not ignore the
top 24 bits)
- with that, properly do cluster ID and bit mask check in
vlapic_match_logical_addr()
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
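For reference, the architectural x2APIC relationship between the 32-bit ID and the logical destination register, and the logical-mode match built on it, can be sketched as below. The function names are illustrative and this is not the code in vlapic_match_logical_addr(); it assumes the usual layout of a 16-bit cluster ID in the upper half and a one-hot member bit in the lower half.

```c
#include <stdbool.h>
#include <stdint.h>

/* Derive the x2APIC logical ID: cluster in bits 31:16, one-hot member bit in 15:0. */
uint32_t x2apic_ldr_from_id(uint32_t id)
{
    uint32_t cluster = (id >> 4) & 0xffffu;
    uint32_t member  = UINT32_C(1) << (id & 0xfu);

    return (cluster << 16) | member;
}

/* Logical-mode match: same cluster and at least one common member bit. */
bool x2apic_logical_match(uint32_t ldr, uint32_t mda)
{
    return (ldr >> 16) == (mda >> 16) && (ldr & mda & 0xffffu) != 0;
}
```

With this derivation, IDs 0 and 5 give LDR values 0x1 and 0x20 in cluster 0, so the LDR differs per vCPU instead of being 1 everywhere.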
Jan Beulich [Thu, 25 Sep 2014 12:07:27 +0000 (14:07 +0200)]
x86/HVM: fix miscellaneous aspects of x2APIC emulation
- generate #GP on invalid APIC base MSR transitions
- fail reads from the EOI and self-IPI registers (which are write-only)
- handle self-IPI writes and the ICR2 half of ICR writes largely in
hvm_x2apic_msr_write() and (for self-IPI only) vlapic_apicv_write()
- don't permit MMIO-based access in x2APIC mode
- filter writes to read-only registers in hvm_x2apic_msr_write(),
allowing conditionals to be dropped from vlapic_reg_write()
- don't ignore upper half of MSR-based write to ESR being non-zero
- don't ignore other writes to reserved bits
- VMX's EXIT_REASON_APIC_WRITE must not result in #GP (this exit being
trap-like, this exception would get raised on the wrong RIP)
- make hvm_x2apic_msr_read() produce X86EMUL_* return codes just like
hvm_x2apic_msr_write() does (benign to the only caller)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Campbell [Wed, 24 Sep 2014 14:13:28 +0000 (15:13 +0100)]
xen: arm: correct VTCR setting on arm32.
1c92a2aaf8c6 "xen: arm: support for up to 48-bit IPA addressing on
arm64" inadvertently changes the VTCR setting for 32-bit from
0x80003558 to 0x80003518, changing the SL0 setting from 0x1 (p2m
starts at L1) to 0x0 (p2m starts at L2).
For some (inexplicable) reason this doesn't cause any issue on
Arndale but it does on the OdroidXU.
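The two VTCR values quoted above differ in a single bit, which is easy to check by extracting the SL0 field. The helper name is made up for this sketch, and the field position (bits 7:6) is the one implied by the values in the commit text.

```c
#include <stdint.h>

/* SL0 occupies bits 7:6 of VTCR in the 32-bit layout discussed above. */
unsigned int vtcr_sl0(uint32_t vtcr)
{
    return (vtcr >> 6) & 0x3u;
}
```

vtcr_sl0(0x80003558) yields 1 and vtcr_sl0(0x80003518) yields 0, matching the "starts at L1" and "starts at L2" settings described in the message.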
Andrew Cooper [Wed, 24 Sep 2014 16:28:15 +0000 (17:28 +0100)]
tools/libxc: Avoid cacheflush toolstack hypercalls on x86
XEN_DOMCTL_cacheflush hypercalls are (and will always be) -ENOSYS on x86, but
xc_domain_cacheflush() is called often during domain build and migrate for
correct behaviour on ARM.
Stub xc_domain_cacheflush() out on x86 to remove its pressure on the global
domctl lock, and the hypercall overhead (which applies further pressure to the
already heavily-contended TLB flush lock).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 25 Sep 2014 09:55:49 +0000 (11:55 +0200)]
x86: make dump_pageframe_info() slightly more verbose for dying domains
Allowing more than just 10 pages to be printed in this case gives a
better chance to fully understand eventual page reference leaks: Report
up to 16 "normal" (writable or untyped) pages, and an unlimited number
of special type (page or descriptor table) ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 25 Sep 2014 09:53:32 +0000 (11:53 +0200)]
x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation
SYSCALL:
- make sure SS selector has RPL 0
- only use 32 bits of RIP to fill RCX when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless initializers and casts
- drop redundant MSR_STAR read (as suggested by Andrew Cooper)
SYSENTER/SYSEXIT:
- #GP condition doesn't depend on guest mode
- only use 32 bits for setting RIP/RSP when target execution mode is 32-bit
- don't shadow function wide variable 'rc'
- consolidate CS attribute setting into single statements
- drop pointless (and inconsistently used) casts
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
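Two of the SYSCALL points above (SS selector with RPL 0, and truncating the saved RIP for a 32-bit target mode) can be illustrated in isolation. This is a hedged sketch with made-up names, assuming the usual derivation of both selectors from STAR[47:32]. It is not the emulator's code and ignores everything else SYSCALL does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selectors loaded by SYSCALL, derived from STAR[47:32] (sketch). */
struct syscall_sels {
    uint16_t cs;
    uint16_t ss;
};

struct syscall_sels syscall_selectors(uint64_t star)
{
    struct syscall_sels s;

    s.cs = (uint16_t)((star >> 32) & 0xfffcu);  /* clear the RPL bits: RPL 0 */
    s.ss = (uint16_t)(s.cs + 8);                /* derived from CS, so RPL is 0 too */
    return s;
}

/* Value saved to RCX: full RIP for a 64-bit target mode, low 32 bits otherwise. */
uint64_t syscall_saved_rip(uint64_t rip, bool target_is_64bit)
{
    return target_is_64bit ? rip : (uint32_t)rip;
}
```

Deriving SS from the already masked CS value is what guarantees its RPL is 0 even if the low bits of the STAR field are set.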
Roy Franz [Wed, 24 Sep 2014 09:09:11 +0000 (11:09 +0200)]
x86/EFI: fix freeing of uninitialized pointer
The only valid response from the LocateHandle() call is EFI_BUFFER_TOO_SMALL,
so exit if we get anything else. We pass a 0 size/NULL pointer buffer, so the
only other return value we can get is an error. Return right away as there is
nothing to do. Also return if there is an error allocating the buffer, as the
previous code path also allowed for an undefined pointer to be freed.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Re-structure the change.
Wei Liu [Mon, 15 Sep 2014 19:29:15 +0000 (20:29 +0100)]
flask/policy: use naming convention xenpolicy-$(XEN_FULLVERSION)
Daniel suggested we use xenpolicy-$(XEN_FULLVERSION) as flask policy
naming convention.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Ian Campbell [Mon, 22 Sep 2014 16:10:39 +0000 (17:10 +0100)]
MAINTAINERS: Add Wei Liu as toolstack co-maintainer.
The three existing maintainers are not really able to keep up with
the flow and Wei is one of the top tools contributors (according to
"git shortlog -s -n -p RELEASE-4.4.0..origin/staging tools" and my
own impressions).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Introduce a document that describes the interfaces used on PVH. This
document has been designed from a guest OS point of view (i.e.: what a guest
needs to do in order to support PVH).
xen/arm: remove check for generic timer support for arm64
Information about generic timer support is available in the
IDR_PFR1 register on ARMv7, whereas this information is not
available on ARMv8 implementations that support only aarch64 mode.
As ARMv8 always supports the generic timer, this check is not
required.
For platforms that support only aarch64 mode, IDR_PFR1 is
not implemented.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: Restricted access to IFSR32_EL2 and FPEXC32_EL2
IFSR32_EL2 and FPEXC32_EL2 registers are accessible in
aarch64 mode only if aarch32 mode is supported at EL1.
So allow access to these registers only for 32-bit domains.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 23 Sep 2014 16:46:21 +0000 (17:46 +0100)]
libxl: Remove a duplicate calculation of be_path
Coverity-ID: 1238177 CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
George Dunlap [Mon, 15 Sep 2014 16:25:04 +0000 (17:25 +0100)]
make: Make "src-tarball" target actually make a source tarball
At the moment, making a release tarball is an annoyingly manual
process that involves running "git archive" into a temporary directory.
Script this process up and make a target, so that the release manager
can simply type "make src-tarball-release" and have everything show up
nice and neat in dist/xen-$version.tar.gz. "make src-tarball" will
make a version number based on git describe, which will typically have
the most recent tag, number of commits since that tag, and the git
commit id of the current HEAD.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen: arm: Add support for the Exynos secure firmware
The existence of secure firmware is dictated by the presence of
"samsung,secure-firmware" in the DT.
The Arndale board does not have that entry, and uses the address as defined
in "samsung,exynos4210-sysram", offset 0 as the smp init address. This is
possibly true for all SoCs without secure firmware.
For other boards which do have a "secure-firmware" node, use sysram-ns
at offset +0x1c as the smp init address.
The "secure-firmware" MMIO range contains ways to idle the CPU. As this gets
mapped to DOM0 because of its presence in the DT, we blacklist it.
Have tested this on the Odroid XU. I have also tested the other code path
on the Odroid XU by removing "secure-firmware" from its DT. I could see
that the other code path was exercised with correct smp init address
values.
Signed-off-by: Suriyan Ramasami <suriyan.r@gmail.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
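The address selection described above boils down to one branch. In this sketch the function name and parameters are hypothetical, and both base addresses are assumed to have been read from the device tree by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset into the non-secure sysram region used when secure firmware is present. */
#define SYSRAM_NS_SMP_OFFSET 0x1cu

/* Pick the SMP init address; both base addresses would come from the DT. */
uint64_t smp_init_addr_sketch(bool has_secure_firmware,
                              uint64_t sysram_base, uint64_t sysram_ns_base)
{
    if (has_secure_firmware)
        return sysram_ns_base + SYSRAM_NS_SMP_OFFSET;
    return sysram_base;   /* offset 0 of the plain sysram region */
}
```

Removing the "secure-firmware" node from the DT, as done in the testing described above, corresponds to calling this with `has_secure_firmware` set to false.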
This implementation uses the tsc of vcpu0, which is preserved across
save/restore as part of the architectural state, and then converts that
to a 100ns tick using the domain's tsc_khz.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Keir Fraser <keir@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Christoph Egger <chegger@amazon.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
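The conversion mentioned above is plain arithmetic: tsc_khz cycles elapse per millisecond, and one millisecond is 10000 ticks of 100ns. A minimal sketch follows; the function name and the zero-frequency guard are assumptions for the example, and it is not the hypervisor's implementation.

```c
#include <stdint.h>

/* Convert a TSC value to 100ns ticks; 1 ms is 10000 such ticks. */
uint64_t tsc_to_100ns_ticks(uint64_t tsc, uint32_t tsc_khz)
{
    uint64_t whole_ms, rem_cycles;

    if (tsc_khz == 0)
        return 0;                      /* avoid dividing by zero in this sketch */

    whole_ms   = tsc / tsc_khz;        /* tsc_khz cycles elapse per millisecond */
    rem_cycles = tsc % tsc_khz;

    /* Divide first so the intermediate products stay small. */
    return whole_ms * 10000u + (rem_cycles * 10000u) / tsc_khz;
}
```

For a 2 GHz guest (tsc_khz of 2000000), 3000000 cycles correspond to 1.5 ms, that is 15000 ticks.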
because a flaw in the implementation meant the counter was reset on
migration.
All of these changes were made without any additional options being
added to the VM configuration, or any compatibility checks being made
in the domain save/restore code. Hence setting the single boolean
'viridian' option in the VM configuration yields a different set of
features depending on which version of Xen the VM is started on, and the
feature set can change across migration (so new MSRs can magically appear).
This patch grandfathers in the current viridian features set and calls them
the 'base' and 'freq' feature sets. HVM_PARAM_VIRIDIAN is re-purposed as
a feature mask. The hypervisor has only ever allowed it to be set to 0
or 1, so the presence of the base and freq sets is indicated by setting
bit 0. The freq set can then be turned off by setting bit 1, thus
restoring the pre-Xen-4.4 base set. Newly implemented viridian features
can be optionally enabled in future by setting further bits.
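A rough sketch of how such a mask could be decoded is shown below. Only the bit positions come from the description above; the macro and function names are hypothetical, and treating a mask with bit 1 set but bit 0 clear as "nothing enabled" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the description. */
#define VIRIDIAN_MASK_BASE_FREQ (1u << 0) /* bit 0: base and freq sets present */
#define VIRIDIAN_MASK_NO_FREQ   (1u << 1) /* bit 1: turn the freq set off */

/* The base set is present whenever bit 0 is set. */
static inline bool viridian_base_enabled(uint64_t mask)
{
    return (mask & VIRIDIAN_MASK_BASE_FREQ) != 0;
}

/* The freq set is present when bit 0 is set and bit 1 is clear. */
static inline bool viridian_freq_enabled(uint64_t mask)
{
    return (mask & VIRIDIAN_MASK_BASE_FREQ) != 0 &&
           (mask & VIRIDIAN_MASK_NO_FREQ) == 0;
}
```

With this encoding the legacy value 1 keeps meaning "base and freq", which is why existing configurations need no change.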
The viridian option in xl.cfg(5) has also been changed to a list so
that the sets can be individually enabled or disabled. For compatibility,
if the option is specified as a boolean, then a true (1) value will enable
the base and freq sets and a false (0) value will not enable any
enlightenments.
This patch also alters the allowed write accesses to HVM_PARAM_VIRIDIAN.
Currently there is nothing to stop the guest writing this value (which,
while harmless to anything else, should not happen) and nothing to
stop a toolstack from setting the value back to zero whilst the guest is
running, causing CPUID leaves to disappear and MSR accesses to start
causing GPFs in the guest. Both of these possibilities are now disallowed:
Once the parameter is set to a non-zero value it may not be modified (only
re-written with the same value), and guests no longer have any write
access.
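The write rule above might look roughly like the check below. This is a sketch under stated assumptions: the function name, the from_guest flag and the specific error codes are hypothetical and not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: returns 0 if the write is allowed, or a negative
 * errno value otherwise. The error codes are illustrative assumptions.
 */
static inline int viridian_param_write_check(uint64_t current_val,
                                             uint64_t requested,
                                             bool from_guest)
{
    /* Guests no longer have any write access. */
    if (from_guest)
        return -EPERM;

    /* Once non-zero, the value may only be re-written unchanged. */
    if (current_val != 0 && requested != current_val)
        return -EINVAL;

    return 0;
}
```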
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Keir Fraser <keir@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: David Scott <dave.scott@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Add xl command for rtds scheduler
Note: the VCPU parameters (period, budget) are in microseconds (us).
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> Signed-off-by: Sisu Xi <xisisu@gmail.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Add libxl functions to set/get domain's parameters for rtds scheduler
Note: the VCPU information (period, budget) is in microseconds (us).
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> Signed-off-by: Sisu Xi <xisisu@gmail.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Add xc_sched_rtds_* functions to interact with Xen to set/get domain's
parameters for rtds scheduler.
Note: the VCPU information (period, budget) is in microseconds (us).
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> Signed-off-by: Sisu Xi <xisisu@gmail.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- xenctrl.h has moved to tools/libxc/include, adjust patch to match ]
This scheduler follows the Preemptive Global Earliest Deadline First
(EDF) theory from the real-time field.
At any scheduling point, the VCPU with earlier deadline has higher
priority. The scheduler always picks the highest priority VCPU to run on a
feasible PCPU.
A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is
idle or has a lower-priority VCPU running on it).
Each VCPU has a dedicated period and budget.
The deadline of a VCPU is at the end of each period;
A VCPU has its budget replenished at the beginning of each period;
While scheduled, a VCPU burns its budget.
The VCPU needs to finish its budget before its deadline in each period;
The VCPU discards its unused budget at the end of each period.
If a VCPU runs out of budget in a period, it has to wait until next period.
Each VCPU is implemented as a deferrable server.
When a VCPU has a task running on it, its budget is continuously burned;
When a VCPU has no task but has budget left, its budget is preserved.
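The period and budget rules above can be illustrated with a small accounting sketch. The struct and function names are hypothetical, times are assumed to be in microseconds, and period is assumed to be non-zero; this is not the scheduler's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VCPU accounting state; all times in microseconds. */
struct vcpu_budget {
    uint64_t period;       /* length of one period (must be non-zero) */
    uint64_t budget;       /* budget granted at the start of each period */
    uint64_t cur_budget;   /* budget left in the current period */
    uint64_t cur_deadline; /* end of the current period */
};

/*
 * If the current period has ended, move the deadline to the end of the
 * period containing `now` and grant a full budget. Unused budget from
 * the old period is discarded, not carried over.
 */
static inline void vcpu_replenish(struct vcpu_budget *v, uint64_t now)
{
    if (now >= v->cur_deadline) {
        uint64_t periods = (now - v->cur_deadline) / v->period + 1;

        v->cur_deadline += periods * v->period;
        v->cur_budget = v->budget;
    }
}

/*
 * Charge `ran_for` microseconds of execution. Returns true when the
 * budget is depleted and the VCPU has to wait for its next period.
 * Not calling this while the VCPU is idle preserves its budget.
 */
static inline bool vcpu_burn(struct vcpu_budget *v, uint64_t ran_for)
{
    v->cur_budget = (ran_for >= v->cur_budget) ? 0 : v->cur_budget - ran_for;
    return v->cur_budget == 0;
}
```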
Queue scheme:
A global runqueue and a global depletedq for each CPU pool.
The runqueue holds all runnable VCPUs with budget, sorted by deadline;
The depletedq holds all VCPUs without budget, unsorted.
Note: cpumask and cpupool are supported.
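A deadline-sorted runqueue of this kind could be sketched as a singly linked list, as below. The types and function names are hypothetical, ties are assumed to be broken in FIFO order, and the PCPU feasibility and affinity checks are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runqueue entry; not the scheduler's real data structure. */
struct rq_vcpu {
    int id;
    uint64_t deadline;
    struct rq_vcpu *next;
};

/* Insert `v` keeping the list in ascending deadline order. Entries with
 * an equal deadline stay ahead of the newcomer (FIFO tie-break). */
static inline void runq_insert(struct rq_vcpu **head, struct rq_vcpu *v)
{
    struct rq_vcpu **pp = head;

    while (*pp != NULL && (*pp)->deadline <= v->deadline)
        pp = &(*pp)->next;

    v->next = *pp;
    *pp = v;
}

/* Remove and return the entry with the earliest deadline, i.e. the
 * highest-priority VCPU under EDF, or NULL if the queue is empty. */
static inline struct rq_vcpu *runq_pick(struct rq_vcpu **head)
{
    struct rq_vcpu *v = *head;

    if (v != NULL) {
        *head = v->next;
        v->next = NULL;
    }
    return v;
}
```

Keeping the queue sorted on insertion makes picking the next VCPU a constant-time operation at the cost of a linear scan on insert.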
This is an experimental scheduler.
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu> Signed-off-by: Sisu Xi <xisisu@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Tested-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- use PRI_stime to print delta in burn_budget, to fix build on
32-bit (i.e. arm32) ]
Build qemu-xen on ARM and ARM64: it is used to provide the PV backends,
disk and framebuffer in particular.
Ideally we would also modify the configure options to only build what is
necessary: a machine just for PV backends. However that is a work in
progress and not yet available in QEMU (see
http://marc.info/?l=qemu-devel&m=139082425718379&w=2). So we just build
the usual i386 target, even though no i386 emulation is going to be done
by qemu-xen on ARM.
Move xenstore and libxc public headers to include subdir
Also moves xc_dom.h to include as it is used often by other xen tools.
Use the new include subdirectories to build Xen tools, qemu-xen and
stubdoms.
Add the old libxc include path to the programs that need it to build,
on a case by case basis and commenting that they shouldn't require
internal libxc headers to build.
[ And: update QEMU_TRADITIONAL_REVISION to corresponding qemu patch
- Ian Jackson ]
Jan Beulich [Tue, 23 Sep 2014 12:33:50 +0000 (14:33 +0200)]
x86emul: only emulate software interrupt injection for real mode
Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.
This is XSA-106.
Reported-by: Andrei LUTAS <vlutas@bitdefender.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 23 Sep 2014 12:33:06 +0000 (14:33 +0200)]
x86/emulate: check cpl for all privileged instructions
Without this, it is possible for userspace to load its own IDT or GDT.
This is XSA-105.
Reported-by: Andrei LUTAS <vlutas@bitdefender.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrei LUTAS <vlutas@bitdefender.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 23 Sep 2014 12:31:47 +0000 (14:31 +0200)]
x86/shadow: fix race condition sampling the dirty vram state
d->arch.hvm_domain.dirty_vram must be read with the domain's paging lock held.
If not, two concurrent hypercalls could both end up attempting to free
dirty_vram (the second of which will free a wild pointer), or both end up
allocating a new dirty_vram structure (the first of which will be leaked).
This is XSA-104.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Olaf Hering [Mon, 22 Sep 2014 13:00:05 +0000 (15:00 +0200)]
tools/libxc: provide variable paths to libxc
In preparation to remove hardcoded /var/run/xen paths, provide
XEN_RUN_DIR and related directories to xc_private.h. Similar code exists
already for libxl, stubdom and other parts.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 13:00:04 +0000 (15:00 +0200)]
tools/libxl: use buildmakevars2header to create _paths.h
Replace usage of buildmakevars2file with buildmakevars2header. The macro
generates a C header file, so remove code which converts shell variables
into C defines. Also update the dependency, the macro itself creates a
dependency for _paths.h. A temporary file is not needed anymore.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 13:00:03 +0000 (15:00 +0200)]
Config.mk: add new macro buildmakevars2header
This macro is similar to buildmakevars2file, it just creates a C header
file instead of shell style syntax. Upcoming changes will use this macro
in libxl and libxc.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 13:00:02 +0000 (15:00 +0200)]
Config.mk: replace dependency to genpath with actual target
genpath is a detail of buildmakevars2file. Replace the dependency to
genpath with the actual buildmakevars2file target. This change by
itself does not fix any bug. Upcoming changes will add dependencies to
$(target), but no rule exists to create $(target).
To force a rebuild of the $(1) rule the target now depends on the
existing .phony target. This dummy target is already used elsewhere in
the code.
No change in behaviour is expected by this patch.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 13:00:01 +0000 (15:00 +0200)]
Config.mk: move directory list into BUILD_MAKE_VARS
To maintain the list of directories in a single place, move the existing
list into its own variable and use it in buildmakevars2file.
Required for upcoming changes.
Also trim whitespace.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 12:59:56 +0000 (14:59 +0200)]
tools/hotplug: create XEN_RUN_DIR at runtime
Create XEN_RUN_DIR instead of hardcoded path because it is a compile-time
setting. Also /var/run might be empty on startup because it is a tmpfs
mount point.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 12:59:55 +0000 (14:59 +0200)]
tools/pygrub: store kernels in /var/run/xen/pygrub
Move location of temporary bootfiles from /var/run/xend/boot to
/var/run/xen/pygrub. Create the subdirectory if it does not exist.
The <dir> argument --output-directory must be an existing directory.
The reason for this change is that all entries below /var/run have to be
created at runtime in case /var/run is cleared on every boot.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 22 Sep 2014 12:59:54 +0000 (14:59 +0200)]
tools: remove obsolete path.py from tools/python
The directory tools/python/xen/util does not exist.
Upcoming changes to genpath will fail if the rule persists.
Nothing uses path.py (anymore?), so get rid of it.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- removed from .gitignore too ]
Olaf Hering [Mon, 22 Sep 2014 12:59:53 +0000 (14:59 +0200)]
tools/mkrpm: allow custom rpm package name
Even if xen is configured and compiled with different --prefix= so that
it operates entirely below $prefix, the resulting package from 'make
rpmball' is always called "xen.rpm".
Use the PKG_SUFFIX variable to give the package a different name.
This can be used like this:
suffix=-bugN
prefix=/opx/xen/staging${suffix}
./configure --prefix=${prefix}
make rpmball PKG_SUFFIX=${suffix}
The result will be "xen-bugN.rpm" instead of "xen.rpm". The benefit is that
many xen${suffix}.rpm packages can be installed at the same time.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>