xenbits.xensource.com Git - xen.git/log
Shuai Ruan [Tue, 5 Apr 2016 11:23:41 +0000 (13:23 +0200)]
x86/xsaves: fix overwriting between non-lazy/lazy xsaves

The offsets at which components are saved by xsave[sc] are not fixed.
So when a save with v->fpu_dirtied set is followed by one with
v->fpu_dirtied clear, the non-lazy xsave[sc] may overwrite data
written by the lazy one.

The solution: when using_xsave_compact is enabled, take xcr0_accum into
consideration. If the guest has ever used XSTATE_LAZY & ~XSTATE_FP_SSE,
vcpu_xsave_mask will return XSTATE_ALL; otherwise it will return
XSTATE_NONLAZY. XSTATE_FP_SSE is excluded because xsave writes the
XSTATE_FP_SSE part into the legacy region of the xsave area, whose
location is fixed, so saving XSTATE_FP_SSE cannot cause the overwriting
problem.
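The mask rule above can be sketched in C as follows. This is a minimal illustration, not Xen's vcpu_xsave_mask(): the bit values, the split into lazy and non-lazy sets, and the helper name nonlazy_save_mask() are all assumptions, and only the case where v->fpu_dirtied is clear is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative bit assignments only: assumptions for this sketch,
 * not Xen's actual XSTATE_* definitions.
 */
#define XSTATE_FP      (1ULL << 0)
#define XSTATE_SSE     (1ULL << 1)
#define XSTATE_YMM     (1ULL << 2)
#define XSTATE_LWP     (1ULL << 62)
#define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
#define XSTATE_LAZY    (XSTATE_FP_SSE | XSTATE_YMM)  /* assumed lazy set     */
#define XSTATE_NONLAZY (XSTATE_LWP)                  /* assumed non-lazy set */
#define XSTATE_ALL     (XSTATE_LAZY | XSTATE_NONLAZY)

/*
 * Mask for a save with v->fpu_dirtied clear. 'compact' stands in for
 * using_xsave_compact; 'xcr0_accum' is every component the guest has used.
 */
static uint64_t nonlazy_save_mask(bool compact, uint64_t xcr0_accum)
{
    /*
     * FP/SSE live in the fixed legacy region and never move; only lazy
     * components beyond FP/SSE can shift in the compacted layout.
     */
    if (compact && (xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE))
        return XSTATE_ALL;     /* save everything so offsets stay consistent */
    return XSTATE_NONLAZY;
}
```

A guest that has only ever touched FP/SSE keeps the cheap non-lazy save; once it has used e.g. YMM state under the compact format, every save covers all components.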

This may add the overhead of saving lazy states, with some performance
impact. Performance tests on xsavec and xsaveopt (suggested by Jan) show
that xsaveopt performs better than xsavec, so the hypervisor will no
longer use xsavec.

xsaves will not be used until supervisor state is introduced in the
hypervisor. XSTATE_XSAVES_ONLY (indicating that supervisor state is
understood by Xen) is introduced, and the use of xsaves depends on
whether XSTATE_XSAVES_ONLY is set in xcr0_accum.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Joao Martins [Tue, 5 Apr 2016 11:21:39 +0000 (13:21 +0200)]
public/xen.h: add flags field to vcpu_time_info

This field has two possible flags (as of the latest pvclock ABI
shared with KVM).

flags: bits in this field indicate extended capabilities
coordinated between the guest and the hypervisor.  Specifically,
on KVM the availability of specific flags has to be checked in the
0x40000001 CPUID leaf. Xen has no equivalent, but we can still
check some of the flags after registering the time info page,
since a force_update_vcpu_system_time is performed.

Current flags are:

 flag bit   | cpuid bit    | meaning
-------------------------------------------------------------
            |              | time measures taken across
     0      |      24      | multiple cpus are guaranteed to
            |              | be monotonic
-------------------------------------------------------------
            |              | guest vcpu has been paused by
     1      |     N/A      | the host
            |              |
-------------------------------------------------------------
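A guest-side check of the two flags in the table might look like the sketch below. The macro and helper names are assumptions (the commit only says the new #defines carry a XEN_ prefix), and the flags field is assumed to be 8 bits wide. As noted above, on Xen the bits should only be trusted after the time info page has been registered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the table above; the macro names are assumptions. */
#define SKETCH_PVCLOCK_TSC_STABLE  (1u << 0)  /* cross-CPU readings monotonic */
#define SKETCH_PVCLOCK_STOPPED     (1u << 1)  /* vCPU was paused by the host  */

/* Hypothetical helpers over the flags byte of vcpu_time_info. */
static bool pvclock_is_monotonic(uint8_t flags)
{
    return (flags & SKETCH_PVCLOCK_TSC_STABLE) != 0;
}

static bool pvclock_guest_stopped(uint8_t flags)
{
    return (flags & SKETCH_PVCLOCK_STOPPED) != 0;
}
```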

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Add XEN_ prefixes to new #define-s. Make structure layout change
dependent upon __XEN_INTERFACE_VERSION__ (intentionally comparing to
4.6, as we may want to backport this at least there).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Thu, 24 Mar 2016 17:18:35 +0000 (17:18 +0000)]
docs: Document block-script protocol

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:34 +0000 (17:18 +0000)]
libxl: Allow local access for block devices with hotplug scripts

pygrub and qemuu need to be able to access a VM's disks locally in
order to be able to pull out the kernel and provide emulated disk
access, respectively.  This can be done either by accessing the local
disk directly, or by plugging the target disk into dom0 to allow
access.

Unfortunately, while the plugging machinery works for pygrub, it does
not yet work for qemuu; this means that at the moment, disks with
hotplug scripts or disks with non-dom0 backends cannot be provided as
emulated devices to HVM domains.

Fortunately, disks using hotplug scripts created in dom0 do create a
block device as part of set-up, which can be accessed locally; and if
they use block-common.sh:write_dev, this path will be written to
physical-device-path.

Modify libxl__device_disk_setdefault() to be able to fish this path
out of xenstore and pass it to the caller.

Unfortunately, at the time pygrub runs, the devices have not yet been
set up.  Rather than try to stash the domid somewhere to pass, we just
pass INVALID_DOMID.
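To make the xenstore lookup concrete, the helper below builds the backend node that write_dev fills in, assuming the usual /local/domain/<backend>/backend/vbd/<frontend>/<devid> layout. The function name and the invalid-domid sentinel are hypothetical; the sketch only shows why no lookup is possible on the pygrub path.

```c
#include <stdio.h>

/* Hypothetical sentinel for this sketch; not libxl's INVALID_DOMID value. */
#define SKETCH_INVALID_DOMID 0xFFFFFFFFu

/*
 * Build the xenstore node holding the device path written by the hotplug
 * script. Returns the length written, or -1 when the frontend domid is
 * invalid (devices not set up yet, as on the pygrub path).
 */
static int physical_device_path_node(char *buf, size_t len,
                                     unsigned backend_domid,
                                     unsigned frontend_domid,
                                     unsigned devid)
{
    if (frontend_domid == SKETCH_INVALID_DOMID)
        return -1;                 /* nothing to look up yet */
    return snprintf(buf, len,
                    "/local/domain/%u/backend/vbd/%u/%u/physical-device-path",
                    backend_domid, frontend_domid, devid);
}
```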

This allows qemuu to emulate block devices created with custom hotplug
scripts.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:34 +0000 (17:18 +0000)]
libxl: Share logic for finding path between qemuu and pygrub

qemu can also access disks which will be provided with a qdisk backend
directly; add a flag to libxl__device_disk_find_local_path to indicate
whether to check for qdisk direct access.

Call libxl__device_disk_find_local_path() for most paths.  If we can't
find a local path, print an error and skip the disk, rather than using
a bogus path.

Now, if there is no local access to the disk (e.g., because the disk
has a non-local backend or relies on a custom hotplug script), libxl
will print a warning and not provide the emulated disk, rather than
passing bogus parameters to qemu that cause it to error out.
(Such disks will still be available via the PV backend.)
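A simplified model of the decision described above: return a locally openable path, or NULL so the caller can warn and skip the emulated disk. The struct, enum and find_local_path() are hypothetical stand-ins for libxl__device_disk_find_local_path(), whose real signature is not shown here; allow_qdisk_direct mirrors the new qdisk flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified disk description for this sketch only. */
enum sketch_backend { BACKEND_PHY, BACKEND_TAP, BACKEND_QDISK };

struct sketch_disk {
    enum sketch_backend backend;
    bool has_script;          /* custom hotplug script configured  */
    bool backend_is_dom0;     /* backend runs in the local domain  */
    const char *pdev_path;    /* target from the disk specification */
};

/* Return a path that can be opened locally, or NULL if there is none. */
static const char *find_local_path(const struct sketch_disk *d,
                                   bool allow_qdisk_direct)
{
    if (!d->backend_is_dom0 || d->has_script)
        return NULL;                        /* no local access */

    switch (d->backend) {
    case BACKEND_PHY:
        return d->pdev_path;                /* device or file usable as-is */
    case BACKEND_QDISK:
        return allow_qdisk_direct ? d->pdev_path : NULL;
    default:
        return NULL;                        /* e.g. tap: handled elsewhere */
    }
}
```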

I left the libxl__blktap_devpath in the qemuu-specific code rather
than sharing it with the pygrub code because:

1) When the pygrub path runs, the guest disks have not yet been set up.

2) libxl__blktap_devpath() will give you the existing devpath if it
already exists, but will set one up for you if it doesn't.  So on the
pygrub path, this would end up setting up a new tap device.

3) There is no tap-specific teardown code on the pygrub path, and I
don't want to add any (particularly since I'm hoping to remove tapdisk
altogether soon).

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:33 +0000 (17:18 +0000)]
libxl: Rearrange qemu upstream disk argument code

Reorganize the qemuu disk argument code to make a clean separation
between finding a file to use, and constructing the parameters:

* Rename pdev_path to target_path

* Only use qemu_disk_format_string() in circumstances where qemu may
be interpreting the disk (i.e., backend==QDISK).  In all other cases,
it should use RAW.

* Share as much as possible between the is_cdrom path and the normal
path.
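The format rule in the second bullet can be sketched as follows; the enum names and the qcow2/vhd mapping are illustrative assumptions, not libxl's qemu_disk_format_string().

```c
/* Hypothetical enums for this sketch only. */
enum disk_backend_kind { DBK_PHY, DBK_TAP, DBK_QDISK };
enum disk_image_format { DIF_RAW, DIF_QCOW2, DIF_VHD };

/* Map an image format to a qemu "format=" string (small subset). */
static const char *format_string(enum disk_image_format f)
{
    switch (f) {
    case DIF_QCOW2: return "qcow2";
    case DIF_VHD:   return "vpc";   /* qemu's driver name for VHD */
    default:        return "raw";
    }
}

/*
 * The on-disk format only matters when qemu interprets the image itself
 * (QDISK); any other backend hands qemu an already-decoded, raw target.
 */
static const char *qemu_format_for(enum disk_backend_kind b,
                                   enum disk_image_format f)
{
    return b == DBK_QDISK ? format_string(f) : "raw";
}
```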

This is mainly prep for sharing the local path finder with the
bootloader; but it does allow cdroms to use any backend that a normal
disk can use. Previously this was limited to RAW files or things that
qemu could handle directly; as of this changeset, it now includes tap
disks; and in future changesets it will include backends with custom
block scripts.

NB that this retains an existing bug, that disks with custom block
scripts or non-dom0 backends will have the bogus pdev_path passed in
to qemu, most likely resulting in qemu exiting with an error.  This
will be fixed in follow-up patches.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:33 +0000 (17:18 +0000)]
libxl: Move check for local access to a function

Move pygrub checks for local access ability into a separate function.

Also reorganize libxl__device_disk_local_initiate_attach so that we
don't initialize dls->disk unless we actually end up doing a local
attach.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:33 +0000 (17:18 +0000)]
tools/hotplug: Write physical-device-path in addition to physical-device

Change block-common.sh on Linux to write physical-device-path with the
path of the device node, in addition to physical-device with its
major:minor numbers.
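For illustration, the two values can be derived from a device node in C. This assumes physical-device holds major:minor in hex, as `stat -L -c %t:%T` prints them; the helper names are hypothetical and the sketch targets Linux/glibc.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

/* Format "major:minor" in hex, the form `stat -c %t:%T` produces. */
static int format_physical_device(char *buf, size_t len,
                                  unsigned maj, unsigned min)
{
    return snprintf(buf, len, "%x:%x", maj, min);
}

/* Fill 'mm' for a block device node; returns -1 if it is not one. */
static int describe_block_device(const char *node, char *mm, size_t mmlen)
{
    struct stat st;

    if (stat(node, &st) != 0 || !S_ISBLK(st.st_mode))
        return -1;
    return format_physical_device(mm, mmlen,
                                  major(st.st_rdev), minor(st.st_rdev));
}
```

The node path itself (e.g. /dev/loop0) is what now goes into physical-device-path, so readers no longer have to map major:minor back to a name.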

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:32 +0000 (17:18 +0000)]
libxl: Remove redundant setting of physical-device

Regardless of whether we're running a custom hotplug script or using
normal phy: or file:, the "block" script will be run, which will set
all the necessary xenstore nodes.

In fact, writing this value here prevents the block script from
accomplishing its only purpose: to detect duplicate physical block
devices used in different virtual devices.  The first thing the block
script does is check to see if this node is written; and if it is, it
silently exits.

Remove this, and let the block script perform its duplicate checking
function.

NOTE: It's likely that the duplicate checking for physical devices has
never been run under libxl (at least since this bug was introduced);
this may shake out some issues.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Thu, 24 Mar 2016 17:18:32 +0000 (17:18 +0000)]
tools/hotplug: Add a "dummy" hotplug script for testing

Testing the hotplug external script path at the moment involves
actually setting up one of the alternate datapaths (blktap, iscsi,
&c).  Simplify testing by making a script which does a simple loopback,
but still has a target that can't be used directly.

To use:

script=block-dummy,vdev=xvda,target=dummy:<file>

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
Mike Meyer [Mon, 4 Apr 2016 13:02:59 +0000 (15:02 +0200)]
unmodified_drivers: enable use of register_oldmem_pfn_is_ram() API

During the investigation of very slow dump times of guest images in
Amazon EC2 instances, it was discovered that the
register_oldmem_pfn_is_ram() API implemented by the upstream kernel
commit 997c136f518c5debd63847e78e2a8694f56dcf90:

        fs/proc/vmcore.c: add hook to read_from_oldmem() to check
                           for non-ram pages

was not being called.  This was because the PV driver that calls
register_oldmem_pfn_is_ram() did not include the kernel header file
that is used to communicate support for the API in the kernel.  Fix
the issue by including the required header file.

Signed-off-by: Mike Meyer <mike.meyer@teradata.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Paul Durrant [Mon, 4 Apr 2016 13:02:07 +0000 (15:02 +0200)]
MAINTAINERS: add ioreq.c and ioreq.h to x86 I/O emulation file list

Commit 108788e8 "x86/hvm: separate ioreq server code from generic hvm
code" split the code supporting ioreq servers from the rest of the HVM
code. Now that this is separate, the new files can be added to the x86
I/O emulation file list in MAINTAINERS.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Mon, 4 Apr 2016 10:41:19 +0000 (11:41 +0100)]
libxl: ARM build: fix type of libxl__srm_callout_callback_restore_results

COLO introduced a few callbacks. The original implementation used
unsigned long for a type which in fact should be xen_pfn_t. That broke
libxl compilation on ARM, because xen_pfn_t is not a synonym for
unsigned long on ARM platforms.

Fixing this requires modifying the perl script: specifically now we
need to include xenctrl.h before _libxl_save_msgs_*.h, rather than
afterwards, so that we can use xen_pfn_t there.
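Why the type matters: on 32-bit ARM, unsigned long is 32 bits while xen_pfn_t is 64 bits, so a callback declared with unsigned long has a different signature and could truncate frame numbers. The sketch below uses a hypothetical callback shape and a local typedef standing in for xen_pfn_t.

```c
#include <stdint.h>

/* Stand-in for xen_pfn_t on ARM, where it is a 64-bit type. */
typedef uint64_t sketch_pfn_t;

/* Hypothetical callback shape, typed with the pfn type throughout. */
typedef void restore_results_cb(sketch_pfn_t store_gfn,
                                sketch_pfn_t console_gfn, void *user);

struct restore_results {
    sketch_pfn_t store;
    sketch_pfn_t console;
};

/* Matches restore_results_cb exactly, so it can be assigned without a cast. */
static void record_results(sketch_pfn_t store_gfn, sketch_pfn_t console_gfn,
                           void *user)
{
    struct restore_results *r = user;

    r->store = store_gfn;
    r->console = console_gfn;
}
```

Usage: `restore_results_cb *cb = record_results;` compiles cleanly on every architecture, whereas a variant declared with unsigned long parameters would not match this type on ARM.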

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 1 Apr 2016 16:53:56 +0000 (17:53 +0100)]
libxc: remove second unistd.h inclusion

There is already one a few lines above.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 1 Apr 2016 16:53:55 +0000 (17:53 +0100)]
libxc: xc_domain_resume_hvm is used by x86 only

The call site is enclosed by x86 define guards.

Without this patch:

[  334s] xc_resume.c:112:12: error: 'xc_domain_resume_hvm' defined but not used [-Werror=unused-function]
[  334s]  static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
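One way to keep such a helper and its only caller under the same guard, so non-x86 builds see neither and -Werror=unused-function stays quiet, is sketched below. The function names are hypothetical and the bodies are placeholders.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Defined only where it can be called. */
static int resume_hvm_sketch(uint32_t domid)
{
    (void)domid;   /* placeholder: the real work is out of scope here */
    return 0;
}
#endif

static int resume_sketch(uint32_t domid, int is_hvm)
{
#if defined(__i386__) || defined(__x86_64__)
    if (is_hvm)
        return resume_hvm_sketch(domid);
#else
    (void)domid;
    (void)is_hvm;
#endif
    return 0;      /* placeholder for the non-HVM path */
}
```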

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 1 Apr 2016 14:45:06 +0000 (15:45 +0100)]
libxc: use PRIx64 to print out pfn

A pfn is always 64 bits wide. Use PRIx64 to avoid truncation.
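A minimal example of printing a 64-bit pfn portably; with "%lx" the value would be truncated (or the call would be undefined) on builds where long is 32 bits. format_pfn() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdio.h>

/* PRIx64 expands to the right conversion for uint64_t on every ABI. */
static int format_pfn(char *buf, size_t len, uint64_t pfn)
{
    return snprintf(buf, len, "pfn 0x%" PRIx64, pfn);
}
```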

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 1 Apr 2016 14:45:05 +0000 (15:45 +0100)]
docs: fix xl manpage compilation broken by COLO

Rearrange the section to conform to POD syntax. Fix some typos along
the way.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Thu, 31 Mar 2016 11:46:52 +0000 (13:46 +0200)]
doc: document pvusb Xenstore paths

The patches adding Xen tools support for paravirtualized USB devices
(pvUSB) omitted documenting the introduced Xenstore paths.

Add the paths to docs/misc/xenstore-paths.markdown

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Juergen Gross [Thu, 31 Mar 2016 11:46:51 +0000 (13:46 +0200)]
doc: correct hyperlinks in docs/misc/xenstore-paths.markdown

The hyperlinks for the different I/O protocols are wrong.

Correct them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Paul Durrant [Fri, 1 Apr 2016 14:32:13 +0000 (16:32 +0200)]
x86/hvm: separate ioreq server code from generic hvm code

The code in hvm/hvm.c related to handling I/O emulation using the ioreq
server framework is large and mostly self-contained.

This patch separates the ioreq server code into a new hvm/ioreq.c source
module and accompanying asm-x86/hvm/ioreq.h header file. There is no
intended functional change, only code movement.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Chong Li [Fri, 1 Apr 2016 14:30:50 +0000 (16:30 +0200)]
enable per-VCPU parameter for RTDS

Add XEN_DOMCTL_SCHEDOP_getvcpuinfo and _putvcpuinfo hypercalls
to independently get and set the scheduling parameters of each
vCPU of a domain.

Also fix a bug in XEN_DOMCTL_SCHEDOP_getinfo, where PERIOD and
BUDGET are not divided by MICROSECS(1) before being returned
to the caller.
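The unit fix amounts to converting Xen's internal nanosecond values back to the microseconds the interface reports. The sketch assumes MICROSECS(1) equals 1000 ns and 32-bit microsecond fields in the interface; the helper name is hypothetical.

```c
#include <stdint.h>

typedef int64_t s_time_t;                          /* time in nanoseconds */
#define MICROSECS(us) ((s_time_t)(us) * 1000LL)    /* assumed definition  */

/* Internal period/budget are kept in ns; the caller expects microseconds. */
static uint32_t ns_to_interface_us(s_time_t ns)
{
    return (uint32_t)(ns / MICROSECS(1));
}
```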

Signed-off-by: Chong Li <chong.li@wustl.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Sisu Xi <xisisu@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Roger Pau Monne [Fri, 19 Feb 2016 18:01:55 +0000 (19:01 +0100)]
libxl: allow 'phy' backend to use empty files

This bug was introduced by 97ee1f (~5 years ago), but probably never
surfaced because most people used regular files as CDROM images, so the
PHY backend was never actually selected. A year ago this was changed, and
now regular RAW files are also handled by the PHY backend, which has made
this bug surface.

Fix it by allowing empty disks to use the PHY backend, skipping the stat
tests.
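A sketch of the fixed check, using a hypothetical helper: an empty drive (such as a CD-ROM without media) is accepted for the phy backend before any stat() test, and otherwise only block devices and regular files pass. The real libxl logic may check more conditions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical check of whether the phy backend can take this target. */
static bool phy_backend_ok(const char *target, bool is_empty)
{
    struct stat st;

    if (is_empty)
        return true;                 /* no media: nothing to stat */
    if (target == NULL || stat(target, &st) != 0)
        return false;
    return S_ISBLK(st.st_mode) || S_ISREG(st.st_mode);
}
```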

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Alex Braunegg <alex.braunegg@gmail.com>
Roger Pau Monne [Thu, 31 Mar 2016 16:00:22 +0000 (18:00 +0200)]
libelf: rewrite symtab/strtab loading

The current implementation of elf_load_bsdsyms is broken when loading inside
an HVM guest, because it assumes elf_memcpy_safe is able to write into guest
memory space, which it is not.

Take the opportunity to do some cleanup and properly document how
elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
when dealing with data that needs to be copied to the guest memory space.
Also reduce the number of section headers copied to the minimum necessary.

This patch also removes the duplication of code found in the libxc ELF
loader, since the libelf symtab/strtab loading code will also handle this
case without having to duplicate it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Roger Pau Monne [Wed, 2 Mar 2016 15:46:43 +0000 (16:46 +0100)]
libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN

And use it as the default value for the VGA kind. This allows libxl to set
it to the default value later on when the domain type is known. For HVM
guests the default value is LIBXL_VGA_INTERFACE_TYPE_CIRRUS while for
HVMlite the default value is LIBXL_VGA_INTERFACE_TYPE_NONE.
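The defaulting rule can be sketched as below. The enum names are hypothetical stand-ins for the LIBXL_VGA_INTERFACE_TYPE_* values, and the STDVGA member is only there to show that an explicit choice is kept.

```c
enum vga_kind   { VGA_UNKNOWN, VGA_NONE, VGA_STDVGA, VGA_CIRRUS };
enum guest_kind { GUEST_HVM, GUEST_HVMLITE };

/* Resolve the placeholder once the domain type is known. */
static enum vga_kind vga_setdefault(enum vga_kind requested, enum guest_kind g)
{
    if (requested != VGA_UNKNOWN)
        return requested;            /* user made an explicit choice */
    return g == GUEST_HVM ? VGA_CIRRUS : VGA_NONE;
}
```

Using a dedicated UNKNOWN value, rather than defaulting to CIRRUS up front, is what lets the decision wait until the guest type is known.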

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Sat, 6 Feb 2016 01:26:15 +0000 (02:26 +0100)]
xenalyze: handle RTDS scheduler events

so the trace will show properly decoded info,
rather than just a bunch of hex codes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Tue, 16 Feb 2016 12:00:51 +0000 (13:00 +0100)]
xenalyze: handle DOM0 operations events

(i.e., domain creation and destruction) so the
trace will show properly decoded info, rather
than just a bunch of hex codes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Olaf Hering [Thu, 31 Mar 2016 11:25:30 +0000 (13:25 +0200)]
tools/libxc: Fix build error when using xc_version_len

The tools fail to build with gcc 4.5, which does not provide ssize_t.

Fixes d275ec9 ("libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION
hypercall")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reported-by: Olaf Hering <olaf@aepfle.de>
Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Konrad Rzeszutek Wilk [Thu, 31 Mar 2016 15:59:28 +0000 (16:59 +0100)]
libxc: Document xc_domain_resume

Document the save and suspend mechanism.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Thu, 31 Mar 2016 15:59:27 +0000 (16:59 +0100)]
libxc: support to resume uncooperative HVM guests

Before this patch:
1. Suspend
  a. PVHVM and PV: we suspend the guest the same way (by sending it the
     suspend request). If the guest doesn't support evtchn, the xenstore
     variant is used, suspending the guest via the XenBus control node.
  b. Pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
     the guest.

2. Resume:
  a. Fast path (fast=1)
     Do not change the guest state. We call libxl__domain_resume(..., 1) which
     calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
     PV:       Modify the return code to 1, and then call the domctl:
               XEN_DOMCTL_resumedomain
     PVHVM:    Same as PV
     Pure HVM: Do nothing in modify_returncode, and then call the domctl:
               XEN_DOMCTL_resumedomain
  b. Slow path (fast=0)
     Used when the guest's state has been changed. Calls
     libxl__domain_resume(..., 0) to resume the guest.
     PV:       Update start info, and reset all secondary CPU states. Then call
               the domctl: XEN_DOMCTL_resumedomain
     PVHVM:    Cannot be resumed. You will get the following error message:
               "Cannot resume uncooperative HVM guests"
     Pure HVM: Same as PVHVM

After this patch:
1. Suspend
  Unchanged.

2. Resume
  a. Fast path
     Unchanged.
  b. Slow path
     PV:       Unchanged.
     PVHVM:    Call XEN_DOMCTL_resumedomain to resume the guest. Because we
               don't modify the return code, PV drivers will disconnect
               and reconnect.
               The guest ends up doing the XENMAPSPACE_shared_info
               XENMEM_add_to_physmap hypercall and resetting all of its CPU
               states to point to the shared_info (except the ones past 32).
               The Linux kernel does this regardless of whether
               SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
     Pure HVM: Call XEN_DOMCTL_resumedomain to resume the guest.

Under COLO, we will update the guest's state (modify memory, CPU registers,
device state etc). In this case, we cannot use the fast path to resume it.
Keep the return code 0, and use the slow path to resume the guest.
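The before/after behaviour can be summarised as a lookup. This is a model of the commit message, not libxc code; the enums and pick_resume_action() are hypothetical.

```c
#include <stdbool.h>

enum guest_type { GUEST_PV, GUEST_PVHVM, GUEST_PURE_HVM };

enum resume_action {
    RESUME_FAIL,                  /* "Cannot resume uncooperative HVM guests" */
    RESUME_MODIFY_RC_THEN_DOMCTL, /* return code set to 1, then resumedomain  */
    RESUME_DOMCTL_ONLY,           /* XEN_DOMCTL_resumedomain, rc untouched    */
    RESUME_RESET_PV_THEN_DOMCTL   /* update start info, reset secondary CPUs  */
};

static enum resume_action pick_resume_action(enum guest_type t, bool fast,
                                             bool after_patch)
{
    if (fast)                                /* fast path: unchanged */
        return t == GUEST_PURE_HVM ? RESUME_DOMCTL_ONLY
                                   : RESUME_MODIFY_RC_THEN_DOMCTL;
    if (t == GUEST_PV)                       /* slow path PV: unchanged */
        return RESUME_RESET_PV_THEN_DOMCTL;
    /* Slow path for PVHVM and pure HVM is what this patch changes. */
    return after_patch ? RESUME_DOMCTL_ONLY : RESUME_FAIL;
}
```

COLO relies on the last row: after modifying the secondary's state it takes the slow path with the return code left at 0.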

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ wei: reformat commit message a bit ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:56 +0000 (17:18 +0800)]
cmdline switches and config vars to control colo-proxy

Add cmdline switches to the 'xl migrate-receive' command to specify
a domain-specific hotplug script to set up the COLO proxy.

Add a new config var 'colo.default.agentscript' to xl.conf that
allows the user to override the default global script used to
set up the COLO proxy.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:55 +0000 (17:18 +0800)]
setup and control colo proxy on secondary side

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:54 +0000 (17:18 +0800)]
setup and control colo proxy on primary side

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:53 +0000 (17:18 +0800)]
COLO nic: implement COLO nic subkind

Implement the COLO nic subkind.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 23 Dec 2015 01:12:26 +0000 (09:12 +0800)]
COLO proxy: implement setup/teardown/preresume/postresume/checkpoint

Implement setup/teardown/preresume/postresume/checkpoint for the COLO
proxy module. We use netlink to communicate with the proxy module.
About colo-proxy module:
http://www.spinics.net/lists/netdev/msg333520.html
https://github.com/wencongyang/colo-proxy
How to use:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 17 Feb 2016 07:10:27 +0000 (15:10 +0800)]
COLO: use qemu block replication

Use qemu block replication as our block replication solution.
Note that the guest must be paused before starting COLO; otherwise
the disk won't be consistent between primary and secondary.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Mon, 21 Mar 2016 07:38:30 +0000 (15:38 +0800)]
Support colo mode for qemu disk

Usage: disk = ['...,colo,colo-host=xxx,colo-port=xxx,colo-export=xxx,active-disk=xxx,hidden-disk=xxx...']
For QEMU block replication details:
http://wiki.qemu.org/Features/BlockReplication

Note: this patch only introduces the COLO framework; it does not
implement the COLO operations.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Changlong Xie [Fri, 18 Mar 2016 07:44:27 +0000 (15:44 +0800)]
Introduce COLO mode and refactor relevant function

No functional changes.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 17 Feb 2016 06:28:32 +0000 (14:28 +0800)]
COLO: introduce new API to prepare/start/do/get_error/stop replication

We will use qemu block replication, and qemu provides some qmp commands
to prepare replication, start replication, get the replication error, and
stop replication. Introduce a new API to execute these qmp commands.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:48 +0000 (17:18 +0800)]
implement the cmdline for COLO

Add a new option -c to the command 'xl remus'. If you want
to use COLO HA instead of Remus HA, use the -c option.

Update man pages to reflect the addition of a new option to
'xl remus' command.

Also add a new option --colo to the internal command 'xl migrate-receive'.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:47 +0000 (17:18 +0800)]
libxc/save: support COLO save

After suspending the primary vm, get the dirty bitmap from the secondary
vm, and send the pages dirty on either the primary or the secondary to
the secondary.
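Assuming "dirty on primary/secondary" means dirty on either side, the set of pages to send is the union of the two dirty bitmaps: the secondary vm has also been running, so any page it touched must be overwritten with the primary's copy. A minimal sketch with a hypothetical helper and 64-bit bitmap words:

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits without compiler builtins. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;

    while (x) {
        x &= x - 1;      /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Merge the secondary's dirty bitmap into the primary's in place and
 * return how many pages will be sent.
 */
static size_t merge_dirty_bitmaps(uint64_t *primary,
                                  const uint64_t *secondary, size_t words)
{
    size_t count = 0;

    for (size_t i = 0; i < words; i++) {
        primary[i] |= secondary[i];
        count += popcount64(primary[i]);
    }
    return count;
}
```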

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Wed, 15 Jul 2015 09:18:44 +0000 (17:18 +0800)]
libxc/restore: support COLO restore

a. call the resume/checkpoint/suspend callbacks while the secondary vm
   status is consistent with the primary
b. send the dirty pfn list to the primary when checkpointing under colo
c. send the store gfn and console gfn to xl before resuming the secondary vm

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Tue, 15 Dec 2015 08:05:41 +0000 (16:05 +0800)]
primary vm suspend/resume/checkpoint code

The following steps are repeated in a loop:
1. Suspend primary vm
   a. Suspend primary vm
   b. do postsuspend
   c. Read CHECKPOINT_SVM_SUSPENDED sent by secondary
2. Checkpoint
   a. Write emulator xenstore data and emulator context
   b. Write checkpoint end record
3. Resume primary vm
   a. Read CHECKPOINT_SVM_READY from slave
   b. Do preresume
   c. Resume primary vm
   d. Read CHECKPOINT_SVM_RESUMED from slave
4. Wait for a new checkpoint
   a. Wait for a new checkpoint (not implemented)
   b. Send CHECKPOINT_NEW to slave
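The primary's side of this loop can be modelled as a small state machine driven by messages from the secondary. Only the handshake is modelled; the enum and function names are hypothetical, and the suspend, checkpoint and resume work is left as comments.

```c
enum colo_msg { MSG_SVM_SUSPENDED, MSG_SVM_READY, MSG_SVM_RESUMED };

enum primary_state {
    P_WAIT_SUSPENDED,  /* step 1: primary suspended, awaiting SVM_SUSPENDED */
    P_WAIT_READY,      /* step 3a: checkpoint sent, awaiting SVM_READY      */
    P_WAIT_RESUMED,    /* step 3d: primary resumed, awaiting SVM_RESUMED    */
    P_RUNNING          /* step 4: running until the next checkpoint         */
};

/* Advance on a message from the secondary; -1 means a protocol error. */
static int primary_on_msg(enum primary_state *s, enum colo_msg m)
{
    switch (*s) {
    case P_WAIT_SUSPENDED:
        if (m != MSG_SVM_SUSPENDED)
            return -1;
        /* step 2: write emulator data and the checkpoint end record */
        *s = P_WAIT_READY;
        return 0;
    case P_WAIT_READY:
        if (m != MSG_SVM_READY)
            return -1;
        /* steps 3b/3c: preresume, then resume the primary vm */
        *s = P_WAIT_RESUMED;
        return 0;
    case P_WAIT_RESUMED:
        if (m != MSG_SVM_RESUMED)
            return -1;
        *s = P_RUNNING;
        return 0;
    default:
        return -1;     /* nothing is expected while running */
    }
}

/* Step 4b, then step 1: send CHECKPOINT_NEW and suspend the primary. */
static int primary_new_checkpoint(enum primary_state *s)
{
    if (*s != P_RUNNING)
        return -1;
    *s = P_WAIT_SUSPENDED;
    return 0;
}
```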

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Changlong Xie [Fri, 18 Mar 2016 02:40:37 +0000 (10:40 +0800)]
libxl_internal: move stream read manipulations to right place

No functional changes; this cleanup keeps the later patch
called "primary vm suspend/resume/checkpoint code" from becoming
too complicated.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wen Congyang [Tue, 15 Dec 2015 08:45:17 +0000 (16:45 +0800)]
secondary vm suspend/resume/checkpoint code

The secondary vm runs in colo mode, so the following steps are
repeated in a loop:
1. Resume secondary vm
   a. Send CHECKPOINT_SVM_READY to master.
   b. If it is not the first resume, call libxl__checkpoint_devices_preresume().
   c. If it is the first resume (the resume right after live migration),
      - call libxl__xc_domain_restore_done() to build the secondary vm.
      - enable secondary vm's logdirty.
      - call libxl__domain_resume() to resume secondary vm.
      - call libxl__checkpoint_devices_setup() to setup checkpoint devices.
   d. Send CHECKPOINT_SVM_RESUMED to master.
2. Wait for a new checkpoint
   a. Call libxl__checkpoint_devices_commit().
   b. Read CHECKPOINT_NEW from master.
3. Suspend secondary vm
   a. Suspend secondary vm.
   b. Call libxl__checkpoint_devices_postsuspend().
   c. Send CHECKPOINT_SVM_SUSPENDED to master.
4. Checkpoint
   a. Read emulator xenstore data and emulator context
   b. REC_TYPE_CHECKPOINT_END

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libxl: add back channel support to read stream
Wen Congyang [Wed, 15 Jul 2015 09:18:38 +0000 (17:18 +0800)]
tools/libxl: add back channel support to read stream

This is used by the primary to read records sent by the secondary.

Note: The function libxl__stream_read_checkpoint_state() will be used
in later patches called "secondary vm suspend/resume/checkpoint code" and
"primary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libxl: add back channel support to write stream
Wen Congyang [Wed, 15 Jul 2015 09:18:36 +0000 (17:18 +0800)]
tools/libxl: add back channel support to write stream

Add back channel support to the write stream. A back channel write
stream is used by the secondary to send records back to the
primary.

Note: The function libxl__stream_write_checkpoint_state() will be used
in later patches called "secondary vm suspend/resume/checkpoint code" and
"primary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxc/migration: export read_record for common use
Wen Congyang [Wed, 15 Jul 2015 09:18:35 +0000 (17:18 +0800)]
libxc/migration: export read_record for common use

read_record() can be used by the primary to read the dirty bitmap
record sent by the secondary under COLO.
When used on the xc save side, it must be passed the back channel fd
instead of ctx->fd, so add an fd parameter to it.
No functional change.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxc/migration: Specification update for DIRTY_PFN_LIST records
Wen Congyang [Wed, 15 Jul 2015 09:18:34 +0000 (17:18 +0800)]
libxc/migration: Specification update for DIRTY_PFN_LIST records

Used by the secondary to send its dirty bitmap to the primary under COLO.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agodocs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams
Wen Congyang [Wed, 15 Jul 2015 09:18:33 +0000 (17:18 +0800)]
docs/libxl: Introduce CHECKPOINT_CONTEXT to support migration v2 colo streams

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agodocs: add colo readme
Wen Congyang [Wed, 15 Jul 2015 09:18:32 +0000 (17:18 +0800)]
docs: add colo readme

Add a COLO readme. For background, see
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libx{l,c}: add back channel to libxc
Wen Congyang [Mon, 14 Dec 2015 07:24:17 +0000 (15:24 +0800)]
tools/libx{l,c}: add back channel to libxc

In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Let's call the
primary dirty bitmap set A, and the secondary dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent from the
primary to the secondary.

However, the set difference B - A (the pages in B but not in A; let's
call this C) is out-of-date on the secondary (with respect to the
primary) and will not be sent by the primary (to the secondary), as
that memory was not dirtied by the primary. The secondary needs the
page data for C to reconstruct an exact copy of the primary at the
checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (let's call this D), which is all the
pages dirtied by both the primary and the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.
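The set arithmetic above maps directly onto word-wise bitmap operations. A minimal C sketch, assuming each dirty bitmap is an array of 64-bit words with one bit per guest pfn; the function names are hypothetical and are not libxc APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* D = A | B: every page the primary must send at a checkpoint. */
static void colo_pages_to_send(const uint64_t *a, const uint64_t *b,
                               uint64_t *d, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        d[i] = a[i] | b[i];
}

/* C = B & ~A: pages stale on the secondary that A alone would miss. */
static void colo_missed_pages(const uint64_t *a, const uint64_t *b,
                              uint64_t *c, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        c[i] = b[i] & ~a[i];
}
```

With A = {0,1} and B = {1,2}, the primary sends D = {0,1,2}, which covers C = {2}, the page only the secondary dirtied.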

The dirty bitmap is transferred on the libxc side, so a back channel
must be introduced to libxc.

Note: this differs from the design in the paper. We changed the original
design to the current one for the following reasons:
1. The original design needs extra memory on the secondary host. When there
   are multiple backups on one host, the memory cost is high.
2. The memory cache code would be another 1k+ lines, making the review
   more time-consuming.

Note: this patch merely adds new parameters to various prototypes and
functions. The new parameters are used in the later patch called
"libxc/restore: send dirty pfn list to primary when checkpoint under
COLO".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libxl: Introduce new helper function dup_fd_helper()
Wen Congyang [Tue, 16 Feb 2016 08:06:31 +0000 (16:06 +0800)]
tools/libxl: Introduce new helper function dup_fd_helper()

This is pure refactoring with no functional change.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxl: Add back channel to allow migration target send data back
Wen Congyang [Wed, 15 Jul 2015 07:45:45 +0000 (15:45 +0800)]
tools/libxl: Add back channel to allow migration target send data back

In COLO mode, secondary needs to send the following data to primary:
1. In libxl
   Secondary sends the following CHECKPOINT_CONTEXT to primary:
   CHECKPOINT_SVM_SUSPENDED, CHECKPOINT_SVM_READY and CHECKPOINT_SVM_RESUMED
2. In libxc
   Secondary sends the dirty pfn list to primary

However, io_fd can only be written by the primary and only read by the
secondary. Save recv_fd in domain_suspend_state, and send_fd in
domain_create_state. Extend the libxl_domain_create_restore API with a
send_fd parameter. Add LIBXL_HAVE_CREATE_RESTORE_SEND_FD to indicate
the API change.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()
Wen Congyang [Wed, 15 Jul 2015 07:45:43 +0000 (15:45 +0800)]
tools/libxl: introduce libxl__domain_common_switch_qemu_logdirty()

The secondary vm is running in COLO mode, and we need to send the
secondary vm's dirty page information to the primary host at each
checkpoint, so qemu logdirty has to be enabled on the secondary.

libxl__domain_suspend_common_switch_qemu_logdirty() enables qemu
logdirty. However, it uses libxl__domain_save_state and calls
libxl__xc_domain_saverestore_async_callback_done() before it exits,
so it cannot be used for the secondary vm.

Update libxl__domain_suspend_common_switch_qemu_logdirty() by
introducing a new API, libxl__domain_common_switch_qemu_logdirty().
This API uses only libxl__logdirty_switch and calls lds->callback
before it exits. The new API will be used by the patch
"secondary vm suspend/resume/checkpoint code".

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxl: introduction of libxl__qmp_restore to load qemu state
Wen Congyang [Mon, 14 Dec 2015 07:08:13 +0000 (15:08 +0800)]
tools/libxl: introduction of libxl__qmp_restore to load qemu state

In normal migration, the qemu state is passed to qemu as a parameter.
With COLO, the secondary vm is already running, so the following steps
are performed at every checkpoint:
1. suspend both primary vm and secondary vm
2. sync the state
3. resume both primary vm and secondary vm
The primary sends qemu's state in step 2, and the secondary's qemu must
read it and restore the state before it is resumed. The state cannot be
passed to qemu as a parameter because the secondary QEMU is already
running at this point, so we introduce libxl__qmp_restore() to do it.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Cc: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: add missing header for xenctrl.h
Changlong Xie [Fri, 1 Apr 2016 01:35:52 +0000 (09:35 +0800)]
tools: add missing header for xenctrl.h

Commit d275ec9c introduced ssize_t but did not include the relevant
header, which causes compile errors like the one below:

./include/xenctrl.h:1485: error: expected '=', ',', ';', 'asm' or
'__attribute__' before 'xc_version_len'

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
9 years agox86/HVM: fix forwarding of internally cached requests
Jan Beulich [Thu, 31 Mar 2016 12:52:04 +0000 (14:52 +0200)]
x86/HVM: fix forwarding of internally cached requests

Forwarding entire batches to the device model when an individual
iteration of them got rejected by internal device emulation handlers
with X86EMUL_UNHANDLEABLE is wrong: The device model would then handle
all iterations, without the internal handler getting to see any past
the one it returned failure for. This causes misbehavior in at least
the MSI-X and VGA code, which want to see all such requests for
internal tracking/caching purposes. But note that this does not apply
to buffered I/O requests.

This in turn means that the condition in hvm_process_io_intercept() of
when to crash the domain was wrong: Since X86EMUL_UNHANDLEABLE can
validly be returned by the individual device handlers, we mustn't
blindly crash the domain if such occurs on other than the initial
iteration. Instead we need to distinguish hvm_copy_*_guest_phys()
failures from device specific ones, and then the former need to always
be fatal to the domain (i.e. also on the first iteration), since
otherwise we again would end up forwarding a request to qemu which the
internal handler didn't get to see.

The adjustment should be okay even for stdvga's MMIO handling:
- if it is not caching then the accept function would have failed so we
  won't get into hvm_process_io_intercept(),
- if it issued the buffered ioreq then we only get to the p->count
  reduction if hvm_send_ioreq() actually encountered an error (in which
  case we don't care about the request getting split up).

Also commit 4faffc41d ("x86/hvm: limit reps to avoid the need to handle
retry") went too far in removing code from hvm_process_io_intercept():
When there were successfully handled iterations, the function should
continue to return success with a clipped repeat count.
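The restored rule for device-handler failures can be sketched in C: a failure on the first iteration is reported as unhandleable, while a failure on a later iteration clips the repeat count to the handled prefix and reports success. The return codes, handler type and function names below are hypothetical stand-ins, not Xen's hvm_process_io_intercept().

```c
/* Hypothetical stand-ins for X86EMUL_OKAY / X86EMUL_UNHANDLEABLE. */
enum { EMUL_OKAY = 0, EMUL_UNHANDLEABLE = 1 };

/* Hypothetical per-iteration device handler. */
typedef int (*iter_handler_t)(void *ctx, unsigned long idx);

/* Example handler: accepts iterations below *limit, rejects the rest. */
static int handle_below_limit(void *ctx, unsigned long idx)
{
    return idx < *(const unsigned long *)ctx ? EMUL_OKAY : EMUL_UNHANDLEABLE;
}

/*
 * Run up to *count iterations. If iteration 0 fails, report the failure
 * and leave *count untouched. If a later one fails, clip *count to the
 * iterations actually handled and report success.
 */
static int process_reps(iter_handler_t h, void *ctx, unsigned long *count)
{
    unsigned long i;
    int rc = EMUL_OKAY;

    for (i = 0; i < *count; i++) {
        rc = h(ctx, i);
        if (rc != EMUL_OKAY)
            break;
    }

    if (i != 0 && rc != EMUL_OKAY) {
        *count = i;          /* clipped repeat count */
        rc = EMUL_OKAY;
    }
    return rc;
}
```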

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
9 years agoMAINTAINERS: add myself as x86 I/O emulation and viridian maintainer
Paul Durrant [Thu, 31 Mar 2016 12:51:05 +0000 (14:51 +0200)]
MAINTAINERS: add myself as x86 I/O emulation and viridian maintainer

I have made many modifications to this code over the past few years
so I'm probably the one most familiar with it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/hvm/viridian: zero and check vcpu context _pad field
Paul Durrant [Thu, 31 Mar 2016 12:49:50 +0000 (14:49 +0200)]
x86/hvm/viridian: zero and check vcpu context _pad field

Commit 57844631 "save APIC assist vector" added an extra field to the
viridian vcpu context save record. This field was only a uint8_t and
so an extra _pad field was also added to pad up to the next 64-bit
boundary.

This patch makes sure that the _pad field is zeroed on save and checked
for zero on restore. This prevents a potential information leak from
the stack and provides a compatibility check against future use of the
space occupied by the _pad field.

The _pad field is zeroed as a side effect of making use of a C99 struct
initializer for the other fields. This patch also modifies the domain
context save code to use the same mechanism.
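A minimal C sketch of the save/restore pattern, assuming a record shaped as described (a uint8_t field padded to the next 64-bit boundary); the struct and field names are hypothetical, not Xen's actual viridian save record.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical save record: one uint8_t field padded to 64 bits. */
struct vcpu_ctxt_sketch {
    uint64_t apic_assist_msr;
    uint8_t  apic_assist_pending;
    uint8_t  _pad[7];
};

/*
 * Save: with a C99 designated initializer, every member not named
 * (here _pad) is zero-initialized, so no stale stack bytes leak out.
 */
static struct vcpu_ctxt_sketch save_ctxt(uint64_t msr, uint8_t pending)
{
    struct vcpu_ctxt_sketch ctxt = {
        .apic_assist_msr = msr,
        .apic_assist_pending = pending,
    };
    return ctxt;
}

/* Restore: reject records whose padding is non-zero. */
static int load_ctxt(const struct vcpu_ctxt_sketch *ctxt)
{
    for (unsigned int i = 0; i < sizeof(ctxt->_pad); i++)
        if (ctxt->_pad[i])
            return -EINVAL;
    return 0;
}
```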

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agolibxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall
Konrad Rzeszutek Wilk [Thu, 10 Mar 2016 21:11:59 +0000 (16:11 -0500)]
libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall

We change the xen_version libxc code to use the new hypercall,
which of course means every user in the code base has to
be converted.

It is important to note that xc_version_op has different
return semantics from the previous call. It returns negative
values on error (like the old one), but it also returns
a positive value on success (unlike the old one). The positive
value is the number of bytes copied in.
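Converted callers must treat only negative values as errors. A minimal C sketch; the function pointer type and the fake call are hypothetical stand-ins for xc_version_op(), not its real signature, and the version string is sample data.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical call type with xc_version_op()-style return semantics. */
typedef ssize_t (*version_call_t)(char *buf, size_t len);

/* Fake call for illustration: copies a fixed string. */
static ssize_t fake_version_call(char *buf, size_t len)
{
    static const char ver[] = "4.7-unstable";

    if (len < sizeof(ver))
        return -1;                   /* stand-in for a negative error value */
    memcpy(buf, ver, sizeof(ver));
    return (ssize_t)sizeof(ver);     /* success: a positive byte count */
}

/* Returns 0 on success, -1 on error. */
static int get_version(version_call_t call, char *buf, size_t len)
{
    ssize_t rc = call(buf, len);

    /*
     * Only a negative value is an error. An old-style "if ( rc )" check
     * would misreport every successful copy as a failure.
     */
    return rc < 0 ? -1 : 0;
}
```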

Note that both the OCaml and xenstat code use tabs instead of four
spaces, so the changes there look quite odd.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> [for the Ocaml stubs]
Acked-by: George Dunlap <george.dunlap@eu.citrix.com> [xenctx bits]
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoHYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
Konrad Rzeszutek Wilk [Tue, 22 Mar 2016 20:53:19 +0000 (16:53 -0400)]
HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.

This hypercall mirrors the XENVER_ in that it has similar functionality.
However it is designed differently:
 - No compat layer. The data structures are the same size on 32
   as on 64-bit.
 - The hypercall accepts three arguments: the command, a pointer to
   a buffer, and the length of the buffer.
 - Each sub-op can be "probed" for size: if the buffer is NULL, it
   returns the size of the buffer that will be needed.
 - Sub-ops can complete even if the buffer is too small: the buffer
   is filled with truncated data and the hypercall returns -ENOBUFS.
 - VERSION_commandline and VERSION_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM, and toolstacks/OSes are expected
   to deal with it. However, three sub-ops (XEN_VERSION_version,
   XEN_VERSION_platform_parameters and XEN_VERSION_get_features)
   will always return a value, as guests cannot survive without them.
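The probe-then-fetch pattern allowed by the NULL-buffer rule can be sketched in C. The sub-op is modelled by a hypothetical local function, not the real hypercall or its libxc wrapper, and the returned string is made-up sample data.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of one VERSION_* sub-op as described above:
 *  - buf == NULL: return the buffer size that will be needed;
 *  - buffer too small: copy truncated data and return -ENOBUFS;
 *  - otherwise: copy the data and return the number of bytes copied.
 */
static long fake_version_subop(char *buf, size_t len)
{
    static const char data[] = "console=com1 loglvl=all";
    size_t need = sizeof(data);

    if (buf == NULL)
        return (long)need;
    if (len < need) {
        memcpy(buf, data, len);
        return -ENOBUFS;
    }
    memcpy(buf, data, need);
    return (long)need;
}

/* Probe for the size, allocate, then fetch. The caller frees the result. */
static char *fetch_with_probe(long (*op)(char *buf, size_t len))
{
    long size = op(NULL, 0);
    char *buf;

    if (size <= 0)
        return NULL;
    buf = malloc((size_t)size);
    if (buf == NULL)
        return NULL;
    if (op(buf, (size_t)size) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```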

While combining some of the common code between XENVER_ and VERSION_,
take the liberty of moving pae_extended_cr3 into the x86 area.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> [XSM bits]
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agooxenstored: allow compilation prior to OCaml 3.12.0
Jonathan Davies [Wed, 30 Mar 2016 16:06:39 +0000 (16:06 +0000)]
oxenstored: allow compilation prior to OCaml 3.12.0

Commit 363ae55c8 used an OCaml feature called record field punning. This broke
the build on compilers prior to OCaml 3.12.0.

This patch makes no semantic change but now uses backwards-compatible syntax.

Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
9 years agoarm64: Add ACPI support
Shannon Zhao [Wed, 30 Mar 2016 15:06:49 +0000 (17:06 +0200)]
arm64: Add ACPI support

Add ACPI support to the arm64 Xen hypervisor. Enable EFI support on ARM.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm: Add a hypercall for device mmio mapping
Shannon Zhao [Wed, 30 Mar 2016 10:09:00 +0000 (12:09 +0200)]
arm: Add a hypercall for device mmio mapping

On ARM, platform and AMBA device MMIO regions need to be mapped to
Dom0. However, when booting with ACPI, Xen cannot obtain the MMIO
regions because it lacks an AML interpreter to parse the DSDT table.
Therefore, let Dom0 issue a hypercall to map an MMIO region when it
adds the devices.

Add a new map space, similar to XEN_DOMCTL_memory_mapping, to map
MMIO regions for Dom0. Also add a helper that combines the
xsm_add_to_physmap and XENMAPSPACE_dev_mmio space checks.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Fix event-channel interrupt when booting with ACPI
Shannon Zhao [Wed, 30 Mar 2016 10:14:00 +0000 (12:14 +0200)]
arm/acpi: Fix event-channel interrupt when booting with ACPI

Store the event-channel interrupt number and flags in the HVM parameter
HVM_PARAM_CALLBACK_IRQ. Dom0 can then retrieve them via the
HVMOP_get_param hypercall.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Permit MMIO access of Xen unused devices for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/acpi: Permit MMIO access of Xen unused devices for Dom0

First, grant Dom0 full MMIO capabilities. Then deny MMIO access to
devices used by Xen, such as the UART, GIC and SMMU. Currently only
the UART and GIC regions are denied; other devices used by Xen can be
added later once they are supported.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/gic: Add a new callback to deny Dom0 access to GIC regions
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/gic: Add a new callback to deny Dom0 access to GIC regions

Add a new member in gic_hw_operations which is used to deny Dom0 access
to GIC regions.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Configure SPI interrupt type and route to Dom0 dynamically
Shannon Zhao [Wed, 30 Mar 2016 10:14:00 +0000 (12:14 +0200)]
arm/acpi: Configure SPI interrupt type and route to Dom0 dynamically

Interrupt information is described in the DSDT and is not available at
boot time. Check whether Dom0 is permitted to access the interrupt, then
set the interrupt type and route it to the guest dynamically. This is
done only for SPIs and only for Dom0.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Permit access all Xen unused SPIs for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/acpi: Permit access all Xen unused SPIs for Dom0

Allow Dom0 to use all SPIs except the ones used by Xen. Then, when Dom0
configures an interrupt, Xen can set the interrupt type and route it
to Dom0.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Create min DT stub for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:12:00 +0000 (12:12 +0200)]
arm/acpi: Create min DT stub for Dom0

Create a DT for Dom0, in the ACPI case only. The DT contains the minimum
required information, such as Dom0 bootargs, initrd, the EFI
description table and the address of the UEFI memory table.

Also document the device tree bindings of the "hypervisor" and
"hypervisor/uefi" nodes.

Signed-off-by: Naresh Bhat <naresh.bhat@linaro.org>
Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Map the new created EFI and ACPI tables to Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:12:00 +0000 (12:12 +0200)]
arm/acpi: Map the new created EFI and ACPI tables to Dom0

Map the UEFI and ACPI tables which we created to non-RAM space in Dom0.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare EFI memory descriptor for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/acpi: Prepare EFI memory descriptor for Dom0

Create EFI memory descriptors to describe to Dom0 the RAM regions,
the ACPI table regions and the reserved EFI table regions.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare EFI system table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:11:00 +0000 (12:11 +0200)]
arm/acpi: Prepare EFI system table for Dom0

Prepare an EFI system table for Dom0 describing the UEFI information.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Map all other tables for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/acpi: Map all other tables for Dom0

Map all other ACPI tables into Dom0 using 1:1 mappings.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/p2m: Add helper functions to map memory regions
Parth Dixit [Wed, 30 Mar 2016 10:15:00 +0000 (12:15 +0200)]
arm/p2m: Add helper functions to map memory regions

Create a helper function for mapping a read-write range with cached
attributes.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare RSDP table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/acpi: Prepare RSDP table for Dom0

Copy the RSDP table and replace rsdp->xsdt_physical_address with the
address of the new XSDT table, so that it points to the right XSDT table.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare XSDT table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:11:00 +0000 (12:11 +0200)]
arm/acpi: Prepare XSDT table for Dom0

Copy and modify the XSDT table before passing it to Dom0: replace the
entries for the copied tables, add a new entry for the STAO table, and
keep the entries for the other reused tables unchanged.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare STAO table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:15:00 +0000 (12:15 +0200)]
arm/acpi: Prepare STAO table for Dom0

Create the STAO table for Dom0. This table tells Dom0 whether it should
ignore the UART defined in the SPCR table, or specific ACPI namespace
names.

See the URL below for details:
http://wiki.xenproject.org/mediawiki/images/0/02/Status-override-table.pdf

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare MADT table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:11:00 +0000 (12:11 +0200)]
arm/acpi: Prepare MADT table for Dom0

Copy the main MADT table contents and the distributor subtable from the
physical ACPI MADT table. Create the other subtables through a
gic_hw_ops callback.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/gic: Add a new callback for creating MADT table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:14:00 +0000 (12:14 +0200)]
arm/gic: Add a new callback for creating MADT table for Dom0

Add a new member in gic_hw_operations which is used to create MADT table
for Dom0.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Prepare FADT table for Dom0
Shannon Zhao [Wed, 30 Mar 2016 10:11:00 +0000 (12:11 +0200)]
arm/acpi: Prepare FADT table for Dom0

Copy and modify FADT table before passing it to Dom0. Set PSCI_COMPLIANT
and PSCI_USE_HVC.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Add a helper function to get the acpi table offset
Shannon Zhao [Wed, 30 Mar 2016 10:16:00 +0000 (12:16 +0200)]
arm/acpi: Add a helper function to get the acpi table offset

These tables are aligned to 64-bit boundaries.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoarm/acpi: Estimate memory required for acpi/efi tables
Shannon Zhao [Wed, 30 Mar 2016 10:10:00 +0000 (12:10 +0200)]
arm/acpi: Estimate memory required for acpi/efi tables

Estimate the memory required for loading acpi/efi tables in Dom0. Align
the length of each table to 64 bits. Allocate pages to store the newly
created EFI and ACPI tables, and free these pages when the domain is
destroyed.
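Padding each table to a 64-bit boundary determines both the table offsets and the total allocation. A minimal C sketch, assuming the tables are laid out back to back in one allocation; the helper names and the page-size parameter are hypothetical, not Xen's actual helpers.

```c
#include <stddef.h>

#define ACPI_TABLE_ALIGN 8u   /* 64 bits */

/* Round a table length up to the next 8-byte boundary. */
static size_t acpi_align(size_t len)
{
    return (len + ACPI_TABLE_ALIGN - 1) & ~(size_t)(ACPI_TABLE_ALIGN - 1);
}

/* Offset of table 'idx' when the tables are packed back to back, each aligned. */
static size_t acpi_table_offset(const size_t *lens, size_t idx)
{
    size_t off = 0;

    for (size_t i = 0; i < idx; i++)
        off += acpi_align(lens[i]);
    return off;
}

/* Pages needed to hold all n tables. */
static size_t acpi_total_pages(const size_t *lens, size_t n, size_t page_size)
{
    size_t total = acpi_table_offset(lens, n);

    return (total + page_size - 1) / page_size;
}
```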

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agoupdate MAINTAINERS info for Stefano
Stefano Stabellini [Wed, 30 Mar 2016 14:56:15 +0000 (16:56 +0200)]
update MAINTAINERS info for Stefano

Update my email address.
Remove myself from STUB DOMAINS, MINI-OS and TOOLSTACK, where I haven't
been active recently.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoMerge branch 'pin' of https://github.com/jgross1/xen into staging
Konrad Rzeszutek Wilk [Wed, 30 Mar 2016 14:20:15 +0000 (10:20 -0400)]
Merge branch 'pin' of https://github.com/jgross1/xen into staging

* 'pin' of https://github.com/jgross1/xen:
  libxl: add force option for xl vcpu-pin
  libxl: print message how to recover from xl cpupool-cpu-remove errors
  libxc: do some retries in xc_cpupool_removecpu() for EBUSY case

All patches have Acked-by and Reviewed-by tags.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agotools/misc/xen-hvmctx: fix the build
Paul Durrant [Tue, 29 Mar 2016 15:55:23 +0000 (16:55 +0100)]
tools/misc/xen-hvmctx: fix the build

Commit 78c5f59e "x86/hvm/viridian: save APIC assist vector" changed
the name of a field in the viridian vcpu save record. Unfortunately this
record has a decode function in xen-hvmctx and so it no longer builds.

This patch fixes the field name in xen-hvmctx and also adds a decode of
the additional field that was added to the save record.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/python/xc: fix tmem_control parameter parsing
Zhigang Wang [Wed, 23 Mar 2016 17:45:37 +0000 (13:45 -0400)]
tools/python/xc: fix tmem_control parameter parsing

tmem_control() should now take 6 arguments instead of 7. The
change was made in commit 54a51b1766fd433b95e63834eb15d4b1f70271de
"tmem: Remove xc_tmem_control mystical arg3", which missed
this caller.

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agopublic: drop assembly equates from errno.h again
Jan Beulich [Tue, 29 Mar 2016 15:17:10 +0000 (17:17 +0200)]
public: drop assembly equates from errno.h again

This wasn't a good idea after all - make them unavailable except for
legacy code using an older interface version.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agobuild: avoid putting local absolute symbols in symbol tables
Jan Beulich [Tue, 29 Mar 2016 15:16:47 +0000 (17:16 +0200)]
build: avoid putting local absolute symbols in symbol tables

They're not really useful past the building stage and only needlessly
increase binary file sizes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agospinlock: improve spin_is_locked() for recursive locks
Jan Beulich [Tue, 29 Mar 2016 15:16:23 +0000 (17:16 +0200)]
spinlock: improve spin_is_locked() for recursive locks

Recursive locks know their current owner, and since we use the function
solely to determine whether a particular lock is being held by the
current CPU (which so far has been an imprecise check), make it actually
check the owner for recursively acquired locks.
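A minimal, non-atomic C model of why knowing the owner makes the check precise: "is it locked by anyone" can be true while the current CPU holds nothing. The struct, the NO_OWNER value and the function names are hypothetical; this is not Xen's spinlock_t or its implementation.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical recursive lock state (no atomics: data model only). */
struct rlock_sketch {
    int owner_cpu;        /* NO_OWNER when free */
    unsigned int depth;   /* recursion count held by owner_cpu */
};

/* Imprecise check: held by some CPU. */
static bool rlock_is_locked_any(const struct rlock_sketch *l)
{
    return l->owner_cpu != NO_OWNER;
}

/* Precise check: held by this CPU. */
static bool rlock_is_locked_by(const struct rlock_sketch *l, int this_cpu)
{
    return l->owner_cpu == this_cpu;
}
```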

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Quan Xu <quan.xu@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agox86/xsaves: calculate comp_offsets[] based on xcomp_bv
Shuai Ruan [Tue, 29 Mar 2016 15:15:57 +0000 (17:15 +0200)]
x86/xsaves: calculate comp_offsets[] based on xcomp_bv

The previous patch calculated comp_offsets using all available features.
This is wrong. This patch fixes the bug by calculating comp_offsets
based on the xcomp_bv of the current guest.
The offsets must also take component alignment into account.
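The offset calculation for the compacted XSAVE format can be sketched in C following the layout in the Intel SDM: components 0 and 1 sit at fixed offsets in the 512-byte legacy area, the 64-byte header follows, and each further component present in xcomp_bv is placed after the previous one, rounded up to 64 bytes if its CPUID leaf 0xD alignment bit is set. Here the sizes and alignment mask are caller-supplied (real code reads them from CPUID), and the function name is hypothetical.

```c
#include <stdint.h>

#define FXSAVE_SIZE      512u
#define XSAVE_HDR_SIZE    64u
#define XSAVE_SSE_OFFSET 160u

static unsigned int roundup64(unsigned int x)
{
    return (x + 63u) & ~63u;
}

/*
 * Fill comp_offsets[0..nr-1] for the components present in xcomp_bv.
 * Components 2+ that are absent take no space and get offset 0.
 */
static void calc_comp_offsets(uint64_t xcomp_bv, const unsigned int *sizes,
                              uint64_t align_mask, unsigned int nr,
                              unsigned int *comp_offsets)
{
    unsigned int offset = FXSAVE_SIZE + XSAVE_HDR_SIZE;

    comp_offsets[0] = 0;                  /* x87: fixed legacy-area slot */
    comp_offsets[1] = XSAVE_SSE_OFFSET;   /* SSE: fixed legacy-area slot */

    for (unsigned int i = 2; i < nr; i++) {
        comp_offsets[i] = 0;
        if (!(xcomp_bv & (1ull << i)))
            continue;
        if (align_mask & (1ull << i))
            offset = roundup64(offset);   /* component wants 64-byte alignment */
        comp_offsets[i] = offset;
        offset += sizes[i];
    }
}
```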

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agons16550: enable Pericom controller support
Jan Beulich [Tue, 29 Mar 2016 15:15:15 +0000 (17:15 +0200)]
ns16550: enable Pericom controller support

Unlike the controllers supported so far, multi-port Pericom
boards map all of their ports via BAR0, which requires a number of
adjustments: Instead of tracking "max_bars" we now flag whether all
ports use BAR0, and whether to expect a port-I/O or MMIO resource. As
a result pci_uart_config() now gets handed a port index, which it then
maps into a BAR index or an offset into BAR0 depending on the bar0
flag.
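The port-index mapping can be sketched in C: with the bar0 flag set, every port lives in BAR0 at a per-port offset, otherwise each port has its own BAR. The struct, its fields and the per-port stride are hypothetical, not the ns16550 driver's actual config parameters.

```c
#include <stdbool.h>

/* Hypothetical subset of per-board config parameters. */
struct uart_param_sketch {
    bool bar0;                 /* all ports are mapped via BAR0 */
    unsigned int uart_offset;  /* stride between ports within BAR0 */
    unsigned int max_ports;
};

/* Map a port index to (BAR index, offset into that BAR). */
static bool port_to_bar(const struct uart_param_sketch *p, unsigned int idx,
                        unsigned int *bar, unsigned int *offset)
{
    if (idx >= p->max_ports)
        return false;
    if (p->bar0) {
        *bar = 0;
        *offset = idx * p->uart_offset;
    } else {
        *bar = idx;
        *offset = 0;
    }
    return true;
}
```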

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agons16550: store pointer to config parameters for PCI
Jan Beulich [Tue, 29 Mar 2016 15:14:43 +0000 (17:14 +0200)]
ns16550: store pointer to config parameters for PCI

Subsequent changes will want to use this pointer.

This makes the enable_ro structure member redundant, so it is dropped
at the same time.
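A sketch of the general pattern follows: the device state keeps a pointer to its configuration parameters instead of copying individual fields out of them. It assumes the parameter structure carries an equivalent read-only flag. struct uart_param, struct uart and uart_enable_ro() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device config parameters. */
struct uart_param {
    bool enable_ro;          /* assumed: keep the device read-only */
    unsigned int max_ports;
};

/* Hypothetical UART state: a pointer replaces a copied enable_ro member. */
struct uart {
    const struct uart_param *param;  /* NULL if no known PCI config matched */
};

/* The former member becomes a derived query through the stored pointer. */
static bool uart_enable_ro(const struct uart *u)
{
    assert(u != NULL);
    return u->param != NULL && u->param->enable_ro;
}
```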

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agohvm/params: add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ
Shannon Zhao [Tue, 29 Mar 2016 12:26:57 +0000 (14:26 +0200)]
hvm/params: add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ

This new delivery type, used on ARM, shares its value with
HVM_PARAM_CALLBACK_TYPE_VECTOR, which is used on x86.

val[15:8] holds flags and val[7:0] holds a PPI number. Within the
flags, bit 8 indicates whether the interrupt is edge (1) or level (0)
triggered, and bit 9 whether its polarity is active low (1) or
active high (0).
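The low 16 bits of the parameter value, as laid out above, can be packed and unpacked like this. The macro and function names are illustrative and are not the public header's. Where the delivery-type field itself lives in val is not described in this message, so the sketch leaves it to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for the low 16 bits described above. */
#define CB_PPI_MASK        0x00FFu   /* val[7:0]: PPI number */
#define CB_FLAG_EDGE       (1u << 8) /* 1 = edge, 0 = level triggered */
#define CB_FLAG_ACTIVE_LOW (1u << 9) /* 1 = active low, 0 = active high */

static uint64_t encode_ppi_callback(uint8_t ppi, bool edge, bool active_low)
{
    uint64_t val = ppi;
    if (edge)
        val |= CB_FLAG_EDGE;
    if (active_low)
        val |= CB_FLAG_ACTIVE_LOW;
    return val; /* delivery-type field not set here: left to the caller */
}

static void decode_ppi_callback(uint64_t val, uint8_t *ppi, bool *edge,
                                bool *active_low)
{
    assert(ppi && edge && active_low);
    *ppi = (uint8_t)(val & CB_PPI_MASK);
    *edge = (val & CB_FLAG_EDGE) != 0;
    *active_low = (val & CB_FLAG_ACTIVE_LOW) != 0;
}
```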

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/hvm/viridian: fix APIC assist page leak
Paul Durrant [Tue, 29 Mar 2016 12:26:33 +0000 (14:26 +0200)]
x86/hvm/viridian: fix APIC assist page leak

Commit a6f2cdb6 "keep APIC assist page mapped..." introduced a page
leak because it relied on viridian_vcpu_deinit() always being called
to release the page mapping. This does not happen in the case of a
normal domain shutdown.

This patch fixes the problem by introducing a new function,
viridian_domain_deinit(), which will iterate through the vCPUs and
release any page mappings still present.
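The fix can be modelled as a domain-level teardown that sweeps every vCPU and releases whatever mapping is still held. The sketch below uses hypothetical types and names. A heap allocation stands in for the mapped guest page, and release_page() stands in for the real unmap and page-reference drop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified per-vCPU viridian state. */
struct vcpu_viridian {
    void *apic_assist_page;  /* mapping of the APIC assist page, or NULL */
};

struct domain_model {
    struct vcpu_viridian *vcpus;
    unsigned int nr_vcpus;
};

/* Stand-in for unmapping and releasing the guest page. */
static void release_page(void **mapping)
{
    free(*mapping);
    *mapping = NULL;
}

/*
 * Domain-level teardown: release any mapping a vCPU still holds, so nothing
 * leaks even if per-vCPU deinit never ran. Safe to call more than once.
 */
static void viridian_domain_deinit_model(struct domain_model *d)
{
    assert(d != NULL);
    for (unsigned int i = 0; i < d->nr_vcpus; i++)
        if (d->vcpus[i].apic_assist_page)
            release_page(&d->vcpus[i].apic_assist_page);
}
```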

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/hvm/viridian: save APIC assist vector
Paul Durrant [Tue, 29 Mar 2016 12:26:03 +0000 (14:26 +0200)]
x86/hvm/viridian: save APIC assist vector

If any vCPU has a pending APIC assist when the domain is suspended,
the vector needs to be saved. Otherwise the vector may remain pending
in the vlapic ISR indefinitely after resume.

This patch adds code to save the APIC assist vector value in the
viridian vCPU save record. The record is therefore now zero-extended
on load, which means a loaded value of zero must be taken to mean that
nothing is pending (for backwards compatibility with hosts not
implementing APIC assist). The rest of the viridian APIC assist code is
adjusted to treat a zero value this way. A check has therefore been
added to viridian_start_apic_assist() to prevent the enlightenment from
being used for vectors < 0x10 (which are illegal for an APIC).
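To see why zero-extension makes 0 the natural "nothing pending" value, consider the hypothetical record layouts below. The struct and function names are illustrative, not Xen's actual save-record definitions. An older, shorter record loaded into the new layout leaves the vector field zeroed. Refusing vectors below 0x10 guarantees that a real pending vector is never confused with that default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical save-record layouts: the new one appends the vector. */
struct viridian_vcpu_rec_old { uint64_t apic_assist_msr; };
struct viridian_vcpu_rec_new {
    uint64_t apic_assist_msr;
    uint8_t  apic_assist_vector;  /* 0 = nothing pending */
};

/* Load a record of len bytes, zero-extending older, shorter records. */
static void load_rec(struct viridian_vcpu_rec_new *out, const void *buf,
                     size_t len)
{
    assert(len <= sizeof(*out));
    memset(out, 0, sizeof(*out));
    memcpy(out, buf, len);
}

static bool apic_assist_pending(const struct viridian_vcpu_rec_new *r)
{
    return r->apic_assist_vector != 0;
}

/* Vectors below 0x10 are illegal for an APIC, so 0 can never be in use. */
static bool start_apic_assist_ok(uint8_t vector)
{
    return vector >= 0x10;
}
```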

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agoAnthony Perard to co-maintain qemu
Stefano Stabellini [Tue, 29 Mar 2016 12:25:43 +0000 (14:25 +0200)]
Anthony Perard to co-maintain qemu

I nominate Anthony Perard as qemu-xen co-maintainer. He has been doing a
lot of QEMU work over the years and in fact he is the original author of
the Xen enablement code in upstream QEMU.

As qemu-xen co-maintainer, he could help me manage the qemu-xen trees
and promptly backport all the relevant commits from upstream QEMU.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
9 years agox86: fix information leak on AMD CPUs
Jan Beulich [Tue, 29 Mar 2016 12:24:26 +0000 (14:24 +0200)]
x86: fix information leak on AMD CPUs

The fix for XSA-52 was wrong, and so was the change synchronizing that
new behavior to the FXRSTOR logic: AMD's manuals explicitly state that
writes to the ES bit are ignored, and it instead gets calculated from
the exception and mask bits (it gets set whenever there is an unmasked
exception, and cleared otherwise). Hence we need to follow that model
in our workaround.
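The model described above can be written down directly. FSW bits 0-5 are the x87 exception flags, FCW bits 0-5 are the matching mask bits (1 = masked), and FSW bit 7 is ES. ES therefore follows from the two registers rather than from what was written to it. The function names are hypothetical, and this is a model of the documented behaviour, not the actual xrstor()/fpu_fxrstor() change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X87_EXC_MASK 0x003Fu  /* FSW exception flags / FCW mask bits 0-5 */
#define X87_FSW_ES   0x0080u  /* FSW bit 7: exception summary */

/* ES is set iff some exception flag in FSW has its FCW mask bit clear. */
static bool x87_es_effective(uint16_t fsw, uint16_t fcw)
{
    return (fsw & ~fcw & X87_EXC_MASK) != 0;
}

/* FSW as it would read back after a restore: the loaded ES bit is ignored
 * and replaced by the derived value. */
static uint16_t x87_fsw_after_restore(uint16_t fsw, uint16_t fcw)
{
    uint16_t out = (uint16_t)(fsw & ~X87_FSW_ES);
    if (x87_es_effective(fsw, fcw))
        out |= X87_FSW_ES;
    return out;
}
```

Deciding the workaround from the saved ES bit alone can therefore give the wrong answer: a saved image with ES set but all exceptions masked restores with ES clear.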

This is CVE-2016-3158 / CVE-2016-3159 / XSA-172.
[xen/arch/x86/xstate.c:xrstor: CVE-2016-3158]
[xen/arch/x86/i387.c:fpu_fxrstor: CVE-2016-3159]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoxl: Return an error on failed cd-insert
George Dunlap [Thu, 24 Mar 2016 17:17:24 +0000 (17:17 +0000)]
xl: Return an error on failed cd-insert

This makes xl more useful in scripts.

The strange thing about this is that the internal cd_insert function
*already* returned something appropriate, and cd-eject was using it,
but cd-insert wasn't.

Also:

* Rework cd_insert to return EXIT_FAILURE and EXIT_SUCCESS rather than
magic constants

* Use 'r' for non-libxl return code, as specified in CODING_STYLE
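The shape of the change can be sketched as follows: the helper translates the libxl result into EXIT_SUCCESS/EXIT_FAILURE, and the command entry point propagates it so that scripts see the failure. Every function here is hypothetical. fake_libxl_cdrom_insert() stands in for the real libxl call and simply fails on an empty domain or path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the libxl call: 0 on success, negative on error. */
static int fake_libxl_cdrom_insert(const char *dom, const char *path)
{
    return (dom && *dom && path && *path) ? 0 : -1;
}

/* Internal helper: translate the libxl rc into a shell exit status
 * instead of returning magic constants. */
static int cd_insert(const char *dom, const char *path)
{
    int rc = fake_libxl_cdrom_insert(dom, path);  /* libxl-style rc */

    if (rc) {
        fprintf(stderr, "cd-insert failed (rc=%d)\n", rc);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/* Command entry point: propagate the helper's status rather than
 * unconditionally reporting success. */
static int main_cd_insert_model(int argc, char **argv)
{
    if (argc < 3)
        return EXIT_FAILURE;
    return cd_insert(argv[1], argv[2]);
}
```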

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agoxl: Make set_memory_target return an error code on failure
George Dunlap [Thu, 24 Mar 2016 17:17:23 +0000 (17:17 +0000)]
xl: Make set_memory_target return an error code on failure

Also move the rc -> shell code translation into set_memory_max() to
make the two functions consistent with each other and with other
similar examples in xl_cmdimpl.c.

Change a 'long long' to 'int64_t' while we're at it.
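A sketch of the pattern: parse the requested size into an int64_t, call the setter, and end with the same rc -> shell exit status translation that set_memory_max() would use. The names are hypothetical, and the sketch assumes a plain KiB number. The real xl command accepts more input formats than this.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical libxl-like setter: 0 on success, negative on error. */
static int fake_set_memory_target(uint32_t domid, int64_t kb)
{
    (void)domid;
    return kb > 0 ? 0 : -1;
}

/* Parse a plain decimal KiB value into an int64_t; 0 on success. */
static int parse_kb(const char *s, int64_t *out)
{
    char *end;
    intmax_t v;

    errno = 0;
    v = strtoimax(s, &end, 10);
    if (errno || end == s || *end != '\0')
        return -1;
    *out = (int64_t)v;
    return 0;
}

static int set_memory_target_model(uint32_t domid, const char *mem)
{
    int64_t kb;
    int rc;

    if (parse_kb(mem, &kb))
        return EXIT_FAILURE;
    rc = fake_set_memory_target(domid, kb);
    return rc ? EXIT_FAILURE : EXIT_SUCCESS;  /* rc -> shell exit status */
}
```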

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>