xenbits.xensource.com Git - xen.git/log
10 years ago mini-os: whitespace
Thomas Leonard [Thu, 26 Jun 2014 11:28:24 +0000 (12:28 +0100)]
mini-os: whitespace

Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago mini-os: switched initial C entry point to arch_init
Karim Raslan [Thu, 26 Jun 2014 11:28:23 +0000 (12:28 +0100)]
mini-os: switched initial C entry point to arch_init

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
[talex5@gmail.com: separated from big ARM commit]
[talex5@gmail.com: restored comment, moved prototypes to headers]
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[talex5@gmail.com: restored stack address printk on x86]
[talex5@gmail.com: moved first printk's after start_info setup on x86]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
10 years ago mini-os: made off_t type signed
Thomas Leonard [Thu, 26 Jun 2014 11:28:22 +0000 (12:28 +0100)]
mini-os: made off_t type signed

POSIX requires this.
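
Why the signedness matters can be sketched as follows. POSIX specifies off_t as a signed integer type, and lseek() reports failure by returning (off_t)-1. The Python snippet below uses ctypes fixed-width integers as stand-ins for off_t (the 64-bit width and the helper name are assumptions for illustration, not mini-os code) to show that an unsigned type turns -1 into a huge positive offset, so the usual "result < 0" check can never fire.

```python
import ctypes


def lseek_failed(result):
    """Mimic the common C idiom `if (lseek(...) < 0)`."""
    return result < 0


# -1 stored in a signed 64-bit type stays negative.
signed_result = ctypes.c_int64(-1).value
# The same bit pattern read as an unsigned 64-bit value is 2**64 - 1.
unsigned_result = ctypes.c_uint64(-1).value
```

With the signed value the error check fires; with the unsigned value the failure would silently be treated as a valid offset.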

Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago mini-os: use unbind_evtchn in unbind_all_ports
Thomas Leonard [Thu, 26 Jun 2014 11:28:21 +0000 (12:28 +0100)]
mini-os: use unbind_evtchn in unbind_all_ports

This marks the channel as closed, in case someone tries to use it again.

Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago mini-os: fixed format string error in unbind_evtchn
Thomas Leonard [Thu, 26 Jun 2014 11:28:20 +0000 (12:28 +0100)]
mini-os: fixed format string error in unbind_evtchn

Would crash if HYPERVISOR_event_channel_op returned an error code.
The other changes in this commit are just fixing indentation.

Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago mini-os: fixed shutdown thread
Thomas Leonard [Thu, 26 Jun 2014 11:28:19 +0000 (12:28 +0100)]
mini-os: fixed shutdown thread

Before, it read "" and started a shutdown immediately. Now, it waits for
a non-empty value and then actually shuts down.
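
The corrected behaviour can be sketched roughly as below. This is a Python illustration of the loop shape only, not the mini-os code; read_value and wait_for_change are hypothetical callables standing in for reading the control node and waiting for it to change.

```python
def wait_for_shutdown_reason(read_value, wait_for_change):
    """Block until the control node holds a non-empty value, then return it.

    read_value and wait_for_change are hypothetical stand-ins for a
    store read and a watch wait.
    """
    while True:
        value = read_value()
        if value:
            # A non-empty value such as "poweroff" is a real request.
            return value
        # "" means nobody has asked for a shutdown yet: keep waiting
        # instead of shutting down immediately.
        wait_for_change()
```

The old behaviour corresponds to returning on the first read regardless of its content.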

Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[talex5@gmail.com: avoid declaration-after-statement in kernel.c]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
10 years ago mini-os: build fixes
Thomas Leonard [Thu, 26 Jun 2014 11:28:18 +0000 (12:28 +0100)]
mini-os: build fixes

Make .o rules depend on the includes. Before, only the final link step
depended on setting up the includes directory, making parallel builds
unreliable.

Make symlinks use explicit make rules instead of using a phony target.
Avoids unnecessary rebuilds.

[talex5@gmail.com: bring back "make links", for stubdom]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years ago libxl/xl: push VCPU affinity pinning down to libxl
Wei Liu [Fri, 20 Jun 2014 16:19:37 +0000 (18:19 +0200)]
libxl/xl: push VCPU affinity pinning down to libxl

This patch introduces an array of libxl_bitmap called "vcpu_hard_affinity"
in libxl IDL to preserve VCPU to PCPU mapping. This is necessary for libxl
to preserve all information to construct a domain.

The array accommodates at most max_vcpus elements, each containing the
affinity of the respective VCPU. If fewer than max_vcpus bitmaps are
present, the VCPUs associated with the missing elements will just stay with
their default affinity (they'll be free to execute on every PCPU).

In case both this new field and the already existing cpumap field are
used, the content of the array will override what's set in cpumap. (In
xl, we make sure that this never happens, by using only one of the
two at any given time.)
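
A minimal Python sketch of these semantics, assuming affinities are modelled as sets of pCPU numbers. The function name and its parameters are hypothetical, and how cpumap applies to vCPUs that are missing from the array is an assumption of this sketch rather than something the message states.

```python
def resolve_hard_affinity(max_vcpus, num_pcpus, cpumap=None,
                          vcpu_hard_affinity=None):
    """Return one frozenset of allowed pCPUs per vCPU.

    cpumap: optional set applied to every vCPU (the older field).
    vcpu_hard_affinity: optional list of per-vCPU sets, possibly
    shorter than max_vcpus.
    """
    all_pcpus = frozenset(range(num_pcpus))
    # Start from cpumap if given, otherwise "free to run anywhere".
    base = frozenset(cpumap) if cpumap else all_pcpus
    result = [base] * max_vcpus
    # Entries present in the array override the base for that vCPU.
    per_vcpu = (vcpu_hard_affinity or [])[:max_vcpus]
    for vcpu, pcpus in enumerate(per_vcpu):
        result[vcpu] = frozenset(pcpus)
    return result
```

For example, with four vCPUs, eight pCPUs and an array of two bitmaps, vCPUs 0 and 1 get their own maps while vCPUs 2 and 3 remain free to run on every pCPU.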

The proper macro to mark the API change (called
LIBXL_HAVE_BUILDINFO_VCPU_AFFINITY_ARRAYS) is added but it is commented out.
It will be uncommented by the patch in the series that completes the
process, by adding the "vcpu_soft_affinity" array. This is because, after
all, these two fields are being added sort-of together, and are very
similar, in both meaning and usage, so it makes sense for them to
share the same marker.

This patch was originally part of Wei's series about pushing as much
information as possible about domain configuration into libxl, rather
than xl. See here for more details:
  http://lists.xen.org/archives/html/xen-devel/2014-06/msg01026.html
  http://lists.xen.org/archives/html/xen-devel/2014-06/msg01031.html

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: Change default for b_info->{cpu, node}map to "not allocated"
Dario Faggioli [Fri, 20 Jun 2014 16:19:29 +0000 (18:19 +0200)]
libxl: Change default for b_info->{cpu, node}map to "not allocated"

by avoiding allocating them in libxl__domain_build_info_setdefault.
In fact, back in 7e449837 ("libxl: provide _init and _setdefault for
libxl_domain_build_info") and a5d30c23 ("libxl: allow for explicitly
specifying node-affinity"), it was decided that the default for these
fields was for them to be allocated and filled.

That is now causing problems whenever we have to figure out whether or
not the caller is using one of those fields. In fact, when we see
a full bitmap, is it just the default value, or is it the user who
wants it that way?

Since that kind of knowledge has become important, change the default
to be "bitmap not allocated". It then becomes easy to know whether a
libxl caller is using one of the fields, just by checking whether the
bitmap is actually there with a non-zero size.
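
The resulting check can be sketched as follows, with Python's None and bytearray standing in for an unallocated and an allocated bitmap. The helper name is hypothetical and this is an illustration, not the libxl code.

```python
def caller_set_bitmap(bitmap):
    """True only if the caller supplied a bitmap with a non-zero size."""
    return bitmap is not None and len(bitmap) > 0


# Default after this change: nothing allocated, so clearly "not set".
default_cpumap = None
# A caller asking for all of pCPUs 0-7 now differs from the default,
# whereas with an allocated-and-filled default the two looked identical.
explicit_full_cpumap = bytearray(b"\xff")
```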

This is very important for the following patches introducing new ways
of specifying hard and soft affinity. It also allows us to improve
the checks around NUMA automatic placement, during domain creation
(and that bit is done in this very patch).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: get and set soft affinity
Dario Faggioli [Fri, 20 Jun 2014 16:19:12 +0000 (18:19 +0200)]
libxl: get and set soft affinity

Make space for a new cpumap in vcpu_info, called cpumap_soft,
for retrieving soft affinity, and amend the relevant API
accordingly.

libxl_set_vcpuaffinity() now takes two cpumaps, one for hard
and one for soft affinity (LIBXL_API_VERSION is exploited to
retain source level backward compatibility). Either of the
two cpumaps can be NULL, in which case only the affinity
corresponding to the non-NULL cpumap will be affected.
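
The "either map may be NULL" rule can be illustrated with a small Python model, where None plays the role of NULL. This is a sketch of the described semantics only; the function and the dictionary layout are hypothetical and do not mirror the libxl signature.

```python
def set_vcpu_affinity(current, hard=None, soft=None):
    """Return an updated {"hard": ..., "soft": ...} mapping.

    A None argument leaves the corresponding affinity untouched.
    """
    updated = dict(current)
    if hard is not None:
        updated["hard"] = frozenset(hard)
    if soft is not None:
        updated["soft"] = frozenset(soft)
    return updated
```

Passing only soft, for instance, changes the soft affinity and leaves the hard affinity exactly as it was.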

Getting soft affinity happens indirectly (see, e.g.,
`xl vcpu-list'), as it already does for hard affinity.

This commit also introduces some logic to check whether the
affinity which will be used by Xen to schedule the vCPU(s)
does actually match with the cpumaps provided. In fact, we
want to allow every possible combination of hard and soft
affinity to be set, but we warn the user upon particularly
weird situations (e.g., hard and soft being disjoint sets
of pCPUs).
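
One way such a check could look is sketched below in Python. The sketch assumes that the soft affinity actually usable for scheduling is the intersection of the hard and soft sets; that model, the function name and the warning texts are illustrative assumptions, not taken from the patch.

```python
def check_soft_affinity(hard, soft):
    """Return (effective_soft, warnings) for a requested hard/soft pair."""
    hard, soft = frozenset(hard), frozenset(soft)
    # Assumed model: soft affinity only matters inside the hard affinity.
    effective_soft = hard & soft
    warnings = []
    if not effective_soft:
        warnings.append("hard and soft affinity are disjoint")
    elif effective_soft != soft:
        warnings.append("soft affinity is only partially usable")
    return effective_soft, warnings
```

Every combination is still accepted; the caller merely gets a warning when what will be used differs from what was requested.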

This very change also updates the error handling for calls
to libxl_set_vcpuaffinity() in xl, as that can now be any
libxl error code, not just -1.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxc: get and set soft and hard affinity
Dario Faggioli [Fri, 20 Jun 2014 16:19:01 +0000 (18:19 +0200)]
libxc: get and set soft and hard affinity

by using the flag and the new cpumap arguments introduced in
the parameters of the DOMCTL_{get,set}_vcpuaffinity hypercalls.

Now, both xc_vcpu_setaffinity() and xc_vcpu_getaffinity() have
a new flag parameter, to specify whether the user wants to
set/get hard affinity, soft affinity or both. They also have
two cpumap parameters instead of only one. This way, it is
possible to set/get both hard and soft affinity at the same
time (and, in case of set, each one to its own value).

In xc_vcpu_setaffinity(), the cpumaps are IN/OUT parameters,
as it is for the corresponding arguments of the
DOMCTL_set_vcpuaffinity hypercall. What Xen puts there is the
hard and soft effective affinity, that is what Xen will actually
use for scheduling.

In-tree callers are also fixed to cope with the new interface.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxc/libxl: bump library SONAMEs
Dario Faggioli [Fri, 20 Jun 2014 16:18:53 +0000 (18:18 +0200)]
libxc/libxl: bump library SONAMEs

The following two patches break both libxc and libxl ABI and
API, so we better bump the MAJORs.

Of course, for libxl, proper measures are taken (in the
relevant patch) in order to guarantee API stability.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen: arm: Implement OSDLR_EL1 trap as RAZ/WO.
Ian Campbell [Fri, 13 Jun 2014 12:15:04 +0000 (13:15 +0100)]
xen: arm: Implement OSDLR_EL1 trap as RAZ/WO.

I'm not sure why this wasn't added at the same time as the other
debug registers.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: take FIQ exceptions to Xen not guest by setting HCR_EL2.FMO
Ian Campbell [Thu, 26 Jun 2014 08:53:42 +0000 (09:53 +0100)]
xen: arm: take FIQ exceptions to Xen not guest by setting HCR_EL2.FMO

As with HCR_EL2.{IMO,AMO} we want to route FIQs to Xen not the guest. See ARM
ARM DDI 0406C.b B1.8.4.

So far none of the platforms which we support use FIQ for anything, but when we
end up supporting one it would be far better to surprise Xen with them than
whatever guest happens to be running...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years ago xen: arm: make sure gcc doesn't use floating-point registers on arm64
Ian Campbell [Thu, 26 Jun 2014 16:30:14 +0000 (17:30 +0100)]
xen: arm: make sure gcc doesn't use floating-point registers on arm64

By using -mgeneral-regs-only, which is the AArch64 equivalent of
-msoft-float.

Otherwise gcc will corrupt the d* registers, which we don't save/restore when
trapping to/from the hypervisor.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years ago QEMU_TAG update
Ian Jackson [Wed, 25 Jun 2014 14:58:02 +0000 (15:58 +0100)]
QEMU_TAG update

10 years ago xen: arm: initialise the grant_table_gpfn array on allocation
Ian Campbell [Wed, 25 Jun 2014 12:58:59 +0000 (13:58 +0100)]
xen: arm: initialise the grant_table_gpfn array on allocation

Avoids leaking uninitialised memory via the grant table setup hypercall.

This is XSA-101.

Reported-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago VT-d/qinval: make further functions static
Jan Beulich [Wed, 25 Jun 2014 12:43:46 +0000 (14:43 +0200)]
VT-d/qinval: make further functions static

... and with that change their return types to void as they can't
actually fail, simplifying error handling in their callers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years ago slightly consolidate code in free_domheap_pages()
Jan Beulich [Wed, 25 Jun 2014 12:43:04 +0000 (14:43 +0200)]
slightly consolidate code in free_domheap_pages()

... to combine the three scrubbing paths into a single one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago VT-d/qinval: eliminate redundant locking
Jan Beulich [Wed, 25 Jun 2014 12:42:15 +0000 (14:42 +0200)]
VT-d/qinval: eliminate redundant locking

The qinval-specific lock would only ever get used with the IOMMU's
register lock already held. Along with dropping the lock also drop
another unused field from struct qi_ctrl.

Furthermore the gen_*_dsc() helpers become pretty pointless with the
lock dropped - being each used only in a single place, simply fold
them into their callers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years ago x86/HVM: consolidate and sanitize CR4 guest reserved bit determination
Jan Beulich [Wed, 25 Jun 2014 12:40:34 +0000 (14:40 +0200)]
x86/HVM: consolidate and sanitize CR4 guest reserved bit determination

First of all, this is needed by just a single source file, so it gets
moved there instead of getting fed to the compiler for most other
source files too. With that it becomes sensible for this to no longer
be a macro, allowing elimination of the mostly redundant helpers
hvm_vcpu_has_{smep,smap}(). And finally, following the model SMEP and
SMAP already used, tie the determination of reserved bits to the
features the guest is shown rather than the host's.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago tools/libxl: Fix free() of wild pointer in libxl__initiate_device_remove()
Andrew Cooper [Wed, 18 Jun 2014 18:04:14 +0000 (19:04 +0100)]
tools/libxl: Fix free() of wild pointer in libxl__initiate_device_remove()

libxl__initiate_device_remove() had a preexisting error path issue where
libxl_dominfo_dispose() could be called on a libxl_dominfo object before it
had been initialised with libxl_dominfo_init().

This was safe until c/s ab44401 added the pointer ssid_label, which
libxl_dominfo_dispose() free()s.

Unconditionally initialise info in libxl__initiate_device_remove() before
taking an error path which will free it.

Coverity-ID: 1223212
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
10 years ago tools/libxc: Fix missing break in xc_domain_bind_pt_irq()
Andrew Cooper [Wed, 18 Jun 2014 17:44:44 +0000 (18:44 +0100)]
tools/libxc: Fix missing break in xc_domain_bind_pt_irq()

c/s 568da4f8 "pt-irq fixes and improvements" accidentally forgot a break when
refactoring xc_domain_bind_pt_irq() which results in bind->u.pci.bus being
clobbered by isa_irq for PCI and MSI_TRANSLATE interrupts.

Coverity-ID: 1223210
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
10 years ago blktap2: Fix two 'maybe uninitialized' variables
Dario Faggioli [Fri, 20 Jun 2014 14:09:00 +0000 (16:09 +0200)]
blktap2: Fix two 'maybe uninitialized' variables

which gcc 4.9.0 complains about, like this:

block-qcow.c: In function `get_cluster_offset':
block-qcow.c:431:3: error: `tmp_ptr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   memcpy(tmp_ptr, l1_ptr, 4096);
   ^
block-qcow.c:606:7: error: `tmp_ptr2' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
   if (write(s->fd, tmp_ptr2, 4096) != 4096) {
       ^
cc1: all warnings being treated as errors
/home/dario/Sources/xen/xen/xen.git/tools/blktap2/drivers/../../../tools/Rules.mk:89:
 recipe for target 'block-qcow.o' failed
make[5]: *** [block-qcow.o] Error 1

The proper behavior is to return upon allocation failure.
About what to return, 0 seems the best option, looking
at both the function and the call sites.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago libxl: Rewind toolstack_save_fd in libxl_save_helper when using remus
Yang Hongyang [Fri, 20 Jun 2014 06:59:34 +0000 (14:59 +0800)]
libxl: Rewind toolstack_save_fd in libxl_save_helper when using remus

Commit b327a3f421bb57d262b7d1fb3c43b710852b103b moved the rewinding of
toolstack_save_fd to libxl.  This breaks remus, because in remus mode,
toolstack_save_cb will be called in every checkpoint, and if we don't
rewind it in libxl_save_helper, it will surely fail.

This fix is just a hack: in fact the whole toolstack save thing should
be done in libxl.  But for now (until migration v2) this fix should
solve both remus and Jason Andryuk's use case.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Tested-by: Jason Andryuk <andryuk@aero.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago libxc: Fix xc_mem_event.c compilation for ARM
Julien Grall [Mon, 23 Jun 2014 13:27:58 +0000 (14:27 +0100)]
libxc: Fix xc_mem_event.c compilation for ARM

The commit 6ae2df9 "mem_access: Add helper API to setup ring and enable
mem_access" breaks libxc compilation for ARM.

This is because xc_map_foreign_map and xc_domain_decrease_reservation_exact
take xen_pfn_t parameters. On ARM, xen_pfn_t is always a uint64_t.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Aravindh Puthiyaparambil <aravindp@cisco.com>
Cc: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago mem_access: Add helper API to setup ring and enable mem_access
Aravindh Puthiyaparambil [Tue, 20 May 2014 23:35:44 +0000 (16:35 -0700)]
mem_access: Add helper API to setup ring and enable mem_access

tools/libxc: Add helper function to setup ring for mem events
This patch adds a helper function that maps the ring, enables mem_event
and removes the ring from the guest physmap while the domain is paused.
This can be used by all mem_events but is only enabled for mem_access at
the moment.

tests/xen-access: Use helper API to setup ring and enable mem_access
Prior to this patch, xen-access was setting up the ring page in a way
that would give a malicious guest a window to write into the shared ring
page. This patch fixes this by using the helper API that does it safely
on behalf of xen-access.

This is XSA-99.

Signed-off-by: Aravindh Puthiyaparambil <aravindp@cisco.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago VT-d/qinval: queue index is always unsigned
Jan Beulich [Fri, 20 Jun 2014 12:48:56 +0000 (14:48 +0200)]
VT-d/qinval: queue index is always unsigned

At once drop bogus initializers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago VT-d/qinval: clean up error handling
Jan Beulich [Fri, 20 Jun 2014 12:47:55 +0000 (14:47 +0200)]
VT-d/qinval: clean up error handling

- neither qinval_update_qtail() nor qinval_next_index() can fail: make
  the former return "void", and drop caller error checks for the latter
  (all of which would otherwise return with a spin lock still held)
- or-ing together error codes is a bad idea
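
Why or-ing error codes is a bad idea can be shown with plain integer arithmetic. The Python sketch below uses the Linux errno values for EINVAL, ENOMEM and ENOENT; the helper names are hypothetical and the snippet is an illustration, not the VT-d code.

```python
# Linux errno values, written out so the arithmetic is visible.
EINVAL, ENOMEM, ENOENT = 22, 12, 2


def combine_by_or(rc1, rc2):
    """The pattern being removed: merge two return codes bitwise."""
    return rc1 | rc2


def first_error(rc1, rc2):
    """A safer alternative: report the first non-zero return code."""
    return rc1 if rc1 else rc2


# -22 | -12 is -2 in two's complement, which reads as -ENOENT:
# an error that neither call actually returned.
merged = combine_by_or(-EINVAL, -ENOMEM)
```

Keeping the first failure (or returning as soon as one occurs) preserves a code that means something to the caller.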

At once drop bogus initializers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago x86/PVH: allow guest_remove_page to remove p2m_mmio_direct pages
Roger Pau Monné [Fri, 20 Jun 2014 08:38:07 +0000 (10:38 +0200)]
x86/PVH: allow guest_remove_page to remove p2m_mmio_direct pages

If a guest tries to do a foreign/grant mapping in a memory region
marked as p2m_mmio_direct Xen will complain with the following
message:

(XEN) memory.c:241:d0v0 Bad page free for domain 0

The mapping will nevertheless succeed. This is especially problematic
for PVH Dom0, in which we map all the e820 holes and memory up to 4GB as
p2m_mmio_direct.

In order to deal with it, add a special case for p2m_mmio_direct
regions in guest_remove_page if the domain is a hardware domain, which
calls clear_mmio_p2m_entry in order to remove the mappings.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago x86/mwait_idle: fix trace output
Ross Lagerwall [Fri, 20 Jun 2014 08:37:21 +0000 (10:37 +0200)]
x86/mwait_idle: fix trace output

Use the C-state's type when tracing, not its index since the index is
not set by the mwait_idle driver.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
10 years ago VT-d: drop redundant calls to invalidate_sync()
Jan Beulich [Fri, 20 Jun 2014 08:26:37 +0000 (10:26 +0200)]
VT-d: drop redundant calls to invalidate_sync()

The call tree iommu_flush_iec_index() -> __iommu_flush_iec() already
invokes invalidate_sync(). Removing the superfluous instances at once
allows the function to become static.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago VT-d/qinval: make local variable used for communication with IOMMU "volatile"
Jan Beulich [Fri, 20 Jun 2014 08:25:33 +0000 (10:25 +0200)]
VT-d/qinval: make local variable used for communication with IOMMU "volatile"

Without that there is - afaict - nothing preventing the compiler from
putting the variable into a register for the duration of the wait loop.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago build: export linker emulation parameter to SeaBIOS
Roger Pau Monne [Mon, 2 Jun 2014 15:08:23 +0000 (17:08 +0200)]
build: export linker emulation parameter to SeaBIOS

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago hvmloader: remove size_t typedef and include stddef.h
Roger Pau Monne [Mon, 2 Jun 2014 15:08:22 +0000 (17:08 +0200)]
hvmloader: remove size_t typedef and include stddef.h

The open coded typedef of size_t was clashing with the typedef in
FreeBSD headers. Remove the typedef and include the proper header
where size_t is defined (stddef.h).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: only include utmp.h if it's present
Roger Pau Monne [Mon, 2 Jun 2014 15:08:20 +0000 (17:08 +0200)]
libxl: only include utmp.h if it's present

Add a configure check for utmp.h presence, and gate the usage of
utmp.h in libxl to the result of the test.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- resolved minor conflict in configure.ac and reran autogen ]

10 years ago libxl: add FreeBSD OS support
Roger Pau Monne [Mon, 2 Jun 2014 15:08:18 +0000 (17:08 +0200)]
libxl: add FreeBSD OS support

Create a new libxl_freebsd.c file that contains OS-specific bits used
by libxl.

This currently defines which backend to use to handle disks, and
which hotplug scripts to execute.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago hotplug: add FreeBSD vif-bridge
Roger Pau Monne [Mon, 2 Jun 2014 15:08:17 +0000 (17:08 +0200)]
hotplug: add FreeBSD vif-bridge

Add a simple vif-bridge script, that takes care of adding network
backends (tap or xnb) to a pre-configured bridge.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago init: add FreeBSD xencommons init script
Roger Pau Monne [Mon, 2 Jun 2014 15:08:16 +0000 (17:08 +0200)]
init: add FreeBSD xencommons init script

This is a clone of the NetBSD xencommons init script with some minor
modifications.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago console: add FreeBSD includes
Roger Pau Monne [Mon, 2 Jun 2014 15:08:15 +0000 (17:08 +0200)]
console: add FreeBSD includes

Add FreeBSD specific includes to the console daemon.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xenstored: add FreeBSD xenstored device paths
Roger Pau Monne [Mon, 2 Jun 2014 15:08:14 +0000 (17:08 +0200)]
xenstored: add FreeBSD xenstored device paths

Add the path to the FreeBSD special xenstored device; this is all that's
needed to get xenstored working on FreeBSD after the unification of
the implementations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xenstored: unify xenstored OS-specific bits
Roger Pau Monne [Mon, 2 Jun 2014 15:08:13 +0000 (17:08 +0200)]
xenstored: unify xenstored OS-specific bits

The Solaris implementation seems too different, so this patch only
folds together the Linux and NetBSD implementations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxc: remove broken endianess gate on lz4 decompressor
Roger Pau Monne [Mon, 2 Jun 2014 15:08:12 +0000 (17:08 +0200)]
libxc: remove broken endianess gate on lz4 decompressor

The lz4 decompressor had wrongly implemented a gate between
little-endian and big-endian versions of get_unaligned_le{16/32},
which turns out to be broken on all architectures supported by Xen,
because __LITTLE_ENDIAN is not defined. Instead of trying to fix
this, just implement the little-endian version and remove the switch.
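
A little-endian read that needs no endianness switch can be written by assembling the value byte by byte, which gives the same result on any host. The Python sketch below illustrates that technique; the function names echo get_unaligned_le{16/32} from the message, but the bodies are an illustration and not the libxc code.

```python
def get_unaligned_le16(buf, offset=0):
    """Read a 16-bit little-endian value from any byte offset."""
    return buf[offset] | (buf[offset + 1] << 8)


def get_unaligned_le32(buf, offset=0):
    """Read a 32-bit little-endian value from any byte offset."""
    return (buf[offset]
            | (buf[offset + 1] << 8)
            | (buf[offset + 2] << 16)
            | (buf[offset + 3] << 24))
```

Because each byte is shifted into place explicitly, the host's byte order and the alignment of the offset never enter into the result.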

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxc: add support for FreeBSD
Roger Pau Monne [Mon, 2 Jun 2014 15:08:11 +0000 (17:08 +0200)]
libxc: add support for FreeBSD

Add the FreeBSD implementation of the privcmd and evtchn devices
interface.

The evtchn device interface is the same as the Linux one, while the
privcmd map interface is simplified because FreeBSD only supports
IOCTL_PRIVCMD_MMAPBATCH.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago include: import FreeBSD headers for evtchn and privcmd devices
Roger Pau Monne [Mon, 2 Jun 2014 15:08:10 +0000 (17:08 +0200)]
include: import FreeBSD headers for evtchn and privcmd devices

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
10 years ago configure: disable ROMBIOS if qemu-trad is disabled
Roger Pau Monne [Mon, 2 Jun 2014 15:08:09 +0000 (17:08 +0200)]
configure: disable ROMBIOS if qemu-trad is disabled

ROMBIOS only works with qemu-traditional, so if it is disabled,
disable ROMBIOS also.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- ran autogen.sh ]

10 years ago configure: disable qemu-trad on FreeBSD systems by default
Roger Pau Monne [Mon, 2 Jun 2014 15:08:08 +0000 (17:08 +0200)]
configure: disable qemu-trad on FreeBSD systems by default

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- ran autogen.sh ]

10 years ago libxl: disable usbredirection if spice is disabled
Fabio Fantoni [Tue, 27 May 2014 15:01:39 +0000 (17:01 +0200)]
libxl: disable usbredirection if spice is disabled

Currently, if usbredirection is enabled in a domU's xl cfg, it is added
even if spice is disabled, and then usbredirection remains unused.
With this patch, if usbredirection is enabled but spice is not,
usbredirection is disabled and a warning is shown.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- simplified log message ]

10 years ago tools/libxc: rename pfn_to_mfn to xc_pfn_to_mfn
Andrew Cooper [Wed, 18 Jun 2014 12:57:58 +0000 (13:57 +0100)]
tools/libxc: rename pfn_to_mfn to xc_pfn_to_mfn

Also refactor the contents of xc_pfn_to_mfn().  It is functionally identical,
but contains less lisp, fewer magic numbers, and more description of why 32bit
guests are treated differently.
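
The message does not spell out what differs for 32bit guests. The Python sketch below shows one plausible reading, under stated assumptions: the p2m table holds 4-byte entries for a 32bit guest and 8-byte entries for a 64bit guest, and the 32bit all-ones "invalid" marker has to be widened for a 64bit toolstack. The function name, the byte layout and INVALID_MFN as defined here are hypothetical.

```python
import struct

# Assumed all-ones "invalid" marker as seen by a 64-bit toolstack.
INVALID_MFN = (1 << 64) - 1


def pfn_to_mfn(pfn, p2m, guest_width):
    """Look up pfn in a little-endian p2m table of 4- or 8-byte entries."""
    if guest_width == 8:
        # 64-bit guest: entries are already full width.
        return struct.unpack_from("<Q", p2m, pfn * 8)[0]
    # 32-bit guest: read the narrow entry, then widen the invalid marker
    # so callers can compare against a single INVALID_MFN value.
    mfn = struct.unpack_from("<I", p2m, pfn * 4)[0]
    return INVALID_MFN if mfn == 0xFFFFFFFF else mfn
```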

Note that this does not affect pfn_to_mfn() in xc_domain_save.c.  That was
already a macro which aliased pfn_to_mfn() in xg_private.h but without
actually using it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: introduce vgic_rank_irq
Stefano Stabellini [Wed, 11 Jun 2014 16:27:08 +0000 (17:27 +0100)]
xen/arm: introduce vgic_rank_irq

Introduce vgic_rank_irq: a new helper function that gives you the struct
vgic_irq_rank corresponding to a given irq number.
Use it in vgic_vcpu_inject_irq.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: rename vgic_irq_rank to vgic_rank_offset
Stefano Stabellini [Wed, 11 Jun 2014 16:27:07 +0000 (17:27 +0100)]
xen/arm: rename vgic_irq_rank to vgic_rank_offset

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/pygrub: Fix extlinux when /boot is a separate partition from /
Andrew Cooper [Wed, 11 Jun 2014 18:31:55 +0000 (19:31 +0100)]
tools/pygrub: Fix extlinux when /boot is a separate partition from /

Grub and Grub2 already cope with this.

Reported-by: Joseph Hom <jhom@softlayer.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: libxl_uuid_copy now takes a ctx argument
Wei Liu [Tue, 17 Jun 2014 09:32:22 +0000 (10:32 +0100)]
libxl: libxl_uuid_copy now takes a ctx argument

Make it consistent with existing libxl functions like libxl_bitmap_copy
etc.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xl / libxl: push parsing of SSID and CPU pool ID down to libxl
Wei Liu [Tue, 17 Jun 2014 09:32:21 +0000 (10:32 +0100)]
xl / libxl: push parsing of SSID and CPU pool ID down to libxl

This patch pushes parsing of "init_seclabel", "seclabel",
"device_model_stubdomain_seclabel" and "pool" down to libxl level.

Originally the parsing was done at the xl level, which is not ideal because
libxl won't have the truly relevant information. With this patch libxl
holds the important information itself.

The libxl IDL is extended to hold the label strings and the pool name.
If those strings are present, they take precedence over the
numeric representations.

As all relevant structures (libxl_dominfo etc) have a field called
X_name / X_label now, a string is also copied there so that callers
won't have to do ID to name / label translation.

In order to be compatible with users of older versions of libxl, this
patch also defines LIBXL_HAVE_SSID_LABEL and LIBXL_HAVE_CPUPOOL_NAME. If
they are defined, the respective strings are available. And if those
strings are not NULL, libxl will do the parsing and ignore the numeric
values.
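
The precedence rule can be sketched as below, using the CPU pool case as the example. This is a Python illustration only; resolve_poolid and the name_to_id mapping are hypothetical stand-ins for whatever lookup libxl performs.

```python
def resolve_poolid(pool_name, poolid, name_to_id):
    """Pick the CPU pool ID: a non-None name wins over the numeric field.

    name_to_id is a hypothetical mapping from pool names to IDs.
    """
    if pool_name is not None:
        if pool_name not in name_to_id:
            raise ValueError(f"unknown CPU pool {pool_name!r}")
        # The string is parsed and the numeric value is ignored.
        return name_to_id[pool_name]
    # No string supplied: fall back to the numeric value, as older
    # callers that only know about the ID would expect.
    return poolid
```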

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Panic when we receive an unexpected trap
Julien Grall [Tue, 17 Jun 2014 20:44:28 +0000 (21:44 +0100)]
xen/arm: Panic when we receive an unexpected trap

The current implementation of do_unexpected_trap makes Xen spin forever
on the current physical CPU. This may stall guest VCPUs and print
unhelpful messages (RCU stall...).

Usually when Xen receives an unexpected trap, it means that something has
gone wrong either in the hypervisor or in the CPU. In this case we should
panic directly to also stop the other CPUs.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/python: Remove some legacy scripts
Andrew Cooper [Tue, 17 Jun 2014 17:26:18 +0000 (18:26 +0100)]
tools/python: Remove some legacy scripts

Nothing in scripts/ is referenced by the current Xen build system.  It is a
legacy version of the XenAPI bindings, other parts of which have already been
removed from the tree.

Additionally, prevent the install target from creating an $(SBINDIR) directory
but putting nothing in it.  This appears to be something missed when removing
Xend.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Drop cpuinfo_x86 structure definition
Julien Grall [Mon, 16 Jun 2014 20:41:34 +0000 (21:41 +0100)]
xen/arm: Drop cpuinfo_x86 structure definition

I'm not sure why this structure was defined in an ARM-specific include...

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86/EFI: allow FPU/XMM use in runtime service functions
Jan Beulich [Wed, 18 Jun 2014 13:53:27 +0000 (15:53 +0200)]
x86/EFI: allow FPU/XMM use in runtime service functions

UEFI spec update 2.4B developed a requirement to enter runtime service
functions with CR0.TS (and CR0.EM) clear, thus making feasible the
already previously stated permission for these functions to use some of
the XMM registers. Enforce this requirement (along with the connected
ones on FPU control word and MXCSR) by going through a full FPU save
cycle (if the FPU was dirty) in efi_rs_enter() (along with loading the
specified values into the other two registers).

Note that the UEFI spec mandates that extension registers other than
XMM ones (for our purposes all that get restored eagerly) are preserved
across runtime function calls, hence there's nothing we need to restore
in efi_rs_leave() (they do get saved, but just for simplicity's sake).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agox86: prevent PVH Dom0 from having pages with more than one ref
Roger Pau Monné [Wed, 18 Jun 2014 13:52:25 +0000 (15:52 +0200)]
x86: prevent PVH Dom0 from having pages with more than one ref

On PV guests a reference is taken when a page gets added to the page
tables, which makes pages added to the page tables have two
references, but this is not suitable for PVH that doesn't use the
PVMMU. In the PVH case only one reference has to be taken or else the
page would not be freed when the memory of the domain is decreased.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/mce: sanitise the #MC entry path
Andrew Cooper [Wed, 18 Jun 2014 13:51:28 +0000 (15:51 +0200)]
x86/mce: sanitise the #MC entry path

The 'error_code' function parameters are not used at all; drop them from the
call chain.  If the error code is needed at some point in the future, it is
available via cpu_user_regs.

Having do_machine_check() call the non-inlineable machine_check_vector() just
to get at the static function pointer '_machine_check_vector' is silly.  Move
do_machine_check() from traps.c to mce.c and do away with
machine_check_vector() entirely.

Both {intel,amd}_init_mce() register their own local function as the #MC
handler, each of which calls mcheck_cmn_handler() in an identical way.  Fix
this craziness by actually turning mcheck_cmn_handler() into a valid #MC
handler (as its comments already state), and have {intel,amd}_init_mce()
register it instead of their own private handlers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
10 years agoIOMMU: prevent VT-d device IOTLB operations on wrong IOMMU
Malcolm Crossley [Wed, 18 Jun 2014 13:50:02 +0000 (15:50 +0200)]
IOMMU: prevent VT-d device IOTLB operations on wrong IOMMU

PCIe ATS allows devices to contain IOTLBs. The VT-d code was iterating
over all ATS capable devices and issuing IOTLB operations for all IOMMUs,
even though each ATS device is only accessible via one particular IOMMU.

Issuing an IOMMU operation to a device not accessible via that IOMMU results
in an IOMMU timeout because the device does not reply. VT-d IOMMU timeouts
result in a Xen panic.

Therefore this bug prevents any Intel system with 2 or more ATS enabled IOMMUs,
each with an ATS device connected to them, from booting Xen.

The patch adds an IOMMU pointer to the ATS device struct so the VT-d code can
ensure it does not issue IOMMU ATS operations on the wrong IOMMU. A void
pointer has to be used because AMD and Intel IOMMU implementations do not have
a common IOMMU structure or indexing mechanism.
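
The check described above can be sketched roughly as follows. This is
only an illustration: struct ats_dev and flush_dev_iotlbs() are
hypothetical, reduced stand-ins, and the flush itself is replaced by a
counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced ATS device record. */
struct ats_dev {
    unsigned int bdf;   /* PCI bus/device/function */
    void *iommu;        /* opaque: the IOMMU this device sits behind */
};

/*
 * Count (in place of flushing) the device IOTLBs reachable through
 * `iommu`. Devices behind a different IOMMU are skipped, so no operation
 * is issued that the device could never answer.
 */
static size_t flush_dev_iotlbs(const void *iommu,
                               const struct ats_dev *devs, size_t n)
{
    size_t flushed = 0;

    assert(iommu != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].iommu != iommu)
            continue;   /* wrong IOMMU: do not issue the operation */
        flushed++;      /* a real implementation would flush here */
    }
    return flushed;
}
```

The pointer is only compared, never dereferenced, which is why an opaque
void pointer is enough for this purpose.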

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoxen/arm: gic_events_need_delivery and irq priorities
Stefano Stabellini [Tue, 10 Jun 2014 14:07:20 +0000 (15:07 +0100)]
xen/arm: gic_events_need_delivery and irq priorities

Introduce GIC_IRQ_GUEST_ACTIVE to track which irqs are currently
active in the guest.

gic_events_need_delivery should only return positive if an outstanding
pending irq has a higher group priority than the currently active group
priority and the priority mask.
Read GICH_APR to find the active group priority.
Read GICH_VMCR to find the priority mask.
Find the highest priority non-active enabled irq by going through the
inflight list.
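
A simplified sketch of that check, assuming the active priority and the
priority mask have already been read from GICH_APR and GICH_VMCR and
that a lower value means a higher priority. struct inflight_irq and
events_need_delivery() are hypothetical names, and group priority is
treated as plain priority here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one inflight irq. */
struct inflight_irq {
    uint8_t priority;   /* guest priority, lower value = higher priority */
    bool enabled;
    bool active;
};

/*
 * Return true if some enabled, non-active irq has a higher priority than
 * both the active priority and the priority mask.
 */
static bool events_need_delivery(const struct inflight_irq *irqs, size_t n,
                                 uint8_t active_priority,
                                 uint8_t mask_priority)
{
    uint8_t limit = active_priority < mask_priority ? active_priority
                                                    : mask_priority;

    assert(irqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (irqs[i].enabled && !irqs[i].active && irqs[i].priority < limit)
            return true;
    }
    return false;
}
```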

In gic_restore_pending_irqs replace lower priority pending (and not
active) irqs in GICH_LRs with higher priority irqs if no more GICH_LRs
are available.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: introduce GIC_PRI_TO_GUEST macro
Stefano Stabellini [Tue, 10 Jun 2014 14:07:19 +0000 (15:07 +0100)]
xen/arm: introduce GIC_PRI_TO_GUEST macro

GICH_LR registers and GICH_VMCR only support 5 bits for guest irq
priorities.
Introduce a macro to reduce the 8-bit priority fields to 5 bits; use it
in gic.c.
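
A minimal sketch of such a reduction, assuming the 5 bits that are kept
are the most significant ones. guest_priority() is a hypothetical helper
added for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the top 5 bits of an 8-bit GIC priority field (assumed layout). */
#define GIC_PRI_TO_GUEST(pri) (((pri) & 0xffu) >> 3)

/* Hypothetical helper: the priority as it would be written to a GICH_LR. */
static inline uint8_t guest_priority(uint8_t pri)
{
    uint8_t guest = (uint8_t)GIC_PRI_TO_GUEST(pri);

    assert(guest < 32); /* must fit in 5 bits */
    return guest;
}
```

With this layout, priorities that differ only in their low 3 bits map to
the same guest priority.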

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: don't protect GICH and lr_queue accesses with gic.lock
Stefano Stabellini [Tue, 10 Jun 2014 14:07:18 +0000 (15:07 +0100)]
xen/arm: don't protect GICH and lr_queue accesses with gic.lock

GICH is banked, so protect accesses by disabling interrupts.
Protect lr_queue accesses with the vgic.lock only.
gic.lock only protects accesses to GICD now.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen/arm: second irq injection while the first irq is still inflight
Stefano Stabellini [Tue, 10 Jun 2014 14:07:17 +0000 (15:07 +0100)]
xen/arm: second irq injection while the first irq is still inflight

Set GICH_LR_PENDING in the corresponding GICH_LR to inject a second irq
while the first one is still active.
If the first irq is already pending (not active), clear
GIC_IRQ_GUEST_QUEUED because the guest doesn't need a second
notification. If the irq has already been EOI'ed then just clear the
GICH_LR right away and move the interrupt to lr_pending so that it is
going to be reinjected by gic_restore_pending_irqs on return to guest.

If the target cpu is not the current cpu, then set GIC_IRQ_GUEST_QUEUED
and send an SGI. The target cpu is going to be interrupted and call
gic_clear_lrs, which is going to take the same actions.

Do not call vgic_vcpu_inject_irq from gic_inject if
evtchn_upcall_pending is set. If we remove that call, we don't need to
special case evtchn_irq in vgic_vcpu_inject_irq anymore.
We need to force the first injection of evtchn_irq (call
gic_vcpu_inject_irq) from vgic_enable_irqs because evtchn_upcall_pending
is already set by common code on vcpu creation.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: rename GIC_IRQ_GUEST_PENDING to GIC_IRQ_GUEST_QUEUED
Stefano Stabellini [Tue, 10 Jun 2014 14:07:16 +0000 (15:07 +0100)]
xen/arm: rename GIC_IRQ_GUEST_PENDING to GIC_IRQ_GUEST_QUEUED

Rename GIC_IRQ_GUEST_PENDING to GIC_IRQ_GUEST_QUEUED and clarify its
meaning in xen/include/asm-arm/domain.h.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: s/gic_set_guest_irq/gic_raise_guest_irq
Stefano Stabellini [Tue, 10 Jun 2014 14:07:15 +0000 (15:07 +0100)]
xen/arm: s/gic_set_guest_irq/gic_raise_guest_irq

Rename gic_set_guest_irq to gic_raise_guest_irq and remove the state
parameter.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: keep track of the GICH_LR used for the irq in struct pending_irq
Stefano Stabellini [Tue, 10 Jun 2014 14:07:14 +0000 (15:07 +0100)]
xen/arm: keep track of the GICH_LR used for the irq in struct pending_irq

Move the irq field in pending_irq to improve packing.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen/arm: nr_lrs should be uint8_t
Stefano Stabellini [Tue, 10 Jun 2014 14:07:13 +0000 (15:07 +0100)]
xen/arm: nr_lrs should be uint8_t

A later patch is going to use uint8_t to keep track of LRs.
Neither GICv3 nor GICv2 needs more than a uint8_t to keep track
of the number of LRs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: support HW interrupts, do not request maintenance_interrupts
Stefano Stabellini [Tue, 10 Jun 2014 14:07:12 +0000 (15:07 +0100)]
xen/arm: support HW interrupts, do not request maintenance_interrupts

If the irq to be injected is a hardware irq (p->desc != NULL), set
GICH_LR_HW. Do not set GICH_LR_MAINTENANCE_IRQ.

Remove the code to EOI a physical interrupt on behalf of the guest
because it has become unnecessary.

Introduce a new function, gic_clear_lrs, that goes over the GICH_LR
registers, clears the invalid ones and frees the corresponding interrupts
from the inflight queue if appropriate. Add the interrupt to lr_pending
if GIC_IRQ_GUEST_PENDING is still set.

Call gic_clear_lrs on entry to the hypervisor if we are coming from
guest mode to make sure that the calculation in Xen of the highest
priority interrupt currently inflight is correct and accurate and not
based on stale data.

In vgic_vcpu_inject_irq, if the target is a vcpu running on another
pcpu, we are already sending an SGI to the other pcpu so that it would
pick up the new IRQ to inject.  Now also send an SGI to the other pcpu
even if the IRQ is already inflight, so that it can clear the LR
corresponding to the previous injection as well as injecting the new
interrupt.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen/arm: set GICH_HCR_UIE if all the LRs are in use
Stefano Stabellini [Tue, 10 Jun 2014 14:07:11 +0000 (15:07 +0100)]
xen/arm: set GICH_HCR_UIE if all the LRs are in use

On return to guest, if there are no free LRs and we still have more
interrupts to inject, set GICH_HCR_UIE so that we are going to receive a
maintenance interrupt when no pending interrupts are present in the LR
registers.
The maintenance interrupt handler won't do anything anymore, but
receiving the interrupt is going to cause gic_inject to be called on
return to guest, which is going to clear the old LRs and inject new
interrupts.
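
The decision on return to guest can be sketched as below. The bit
position follows the GICv2 specification; update_hcr() and its
parameters are hypothetical, and clearing the bit again once LRs are
free is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdint.h>

#define GICH_HCR_UIE (1u << 1)   /* underflow maintenance interrupt enable */

/*
 * Compute the new GICH_HCR value: request an underflow interrupt only
 * when every LR is occupied and more interrupts are waiting.
 */
static uint32_t update_hcr(uint32_t hcr, unsigned int nr_lrs,
                           unsigned int used_lrs, unsigned int queued)
{
    assert(used_lrs <= nr_lrs);
    if (used_lrs == nr_lrs && queued > 0)
        hcr |= GICH_HCR_UIE;
    else
        hcr &= ~GICH_HCR_UIE;
    return hcr;
}
```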

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: remove unused virtual parameter from vgic_vcpu_inject_irq
Stefano Stabellini [Tue, 10 Jun 2014 14:07:10 +0000 (15:07 +0100)]
xen/arm: remove unused virtual parameter from vgic_vcpu_inject_irq

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: no need to set HCR_VI when using the vgic to inject irqs
Stefano Stabellini [Tue, 10 Jun 2014 14:07:09 +0000 (15:07 +0100)]
xen/arm: no need to set HCR_VI when using the vgic to inject irqs

HCR_VI forces the guest to resume execution in IRQ mode and can actually
cause spurious interrupt injections.
The GIC is capable of injecting interrupts into the guest and causing it
to switch to IRQ mode automatically, without any need for the hypervisor
to set HCR_VI manually.

See ARM ARM B1.8.11 and chapter 5.4 of the Generic Interrupt Controller
Architecture Specification.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agopage-alloc: scrub pages used by hypervisor upon freeing
Jan Beulich [Tue, 17 Jun 2014 13:21:10 +0000 (15:21 +0200)]
page-alloc: scrub pages used by hypervisor upon freeing

... unless they're part of a fully separate pool (and hence can't ever
be used for guest allocations).
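
As a rough illustration of the rule (not the actual allocator code),
assuming 4KiB pages; free_hypervisor_page() and its separate_pool flag
are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumes 4KiB pages */

/* Hypothetical free path: scrub unless the page lives in a separate pool. */
static void free_hypervisor_page(unsigned char *page, bool separate_pool)
{
    assert(page != NULL);
    if (!separate_pool)
        memset(page, 0, SKETCH_PAGE_SIZE);  /* leave no stale data behind */
    /* handing the page back to the allocator is omitted in this sketch */
}
```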

This is CVE-2014-4021 / XSA-100.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Tue, 17 Jun 2014 09:40:39 +0000 (10:40 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agolibxl: properly set default of discard_enable
Olaf Hering [Tue, 17 Jun 2014 08:44:40 +0000 (10:44 +0200)]
libxl: properly set default of discard_enable

Initialize discard_enable properly. This avoids a crash if a
libxl_device_disk with an uninitialized discard_enable is passed to
device_disk_add. Up to now only xl initialized discard_enable in its
config parser. External users of libxl, such as libvirt, do not need to
provide a default value.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agosched: DOMCTL_*vcpuaffinity works with hard and soft affinity
Dario Faggioli [Mon, 16 Jun 2014 10:13:25 +0000 (12:13 +0200)]
sched: DOMCTL_*vcpuaffinity works with hard and soft affinity

by adding a flag for the caller to specify which one it cares about.

At the same time, enable the caller to get back the "effective affinity"
of the vCPU. That is the intersection between cpupool's cpus, the (new)
hard affinity and, for soft affinity, the (new) soft affinity. In fact,
despite what has been successfully set with the DOMCTL_setvcpuaffinity
hypercall, the Xen scheduler will never run a vCPU outside of its hard
affinity or of its domain's cpupool.

This happens by adding another cpumap to the interface and making both
the cpumaps IN/OUT parameters (for DOMCTL_setvcpuaffinity, they're of
course out-only for DOMCTL_getvcpuaffinity).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoderive NUMA node affinity from hard and soft CPU affinity
Dario Faggioli [Mon, 16 Jun 2014 10:13:03 +0000 (12:13 +0200)]
derive NUMA node affinity from hard and soft CPU affinity

If a domain's NUMA node-affinity (which is what controls
memory allocations) is provided by the user/toolstack, it
just is not touched. However, if the user does not say
anything, leaving it all to Xen, let's compute it in the
following way:

 1. cpupool's cpus & hard-affinity & soft-affinity
 2. if (1) is empty: cpupool's cpus & hard-affinity

This guarantees memory to be allocated from the narrowest
possible set of NUMA nodes, and makes it relatively easy to
set up NUMA-aware scheduling on top of soft affinity.

Note that such 'narrowest set' is guaranteed to be non-empty.
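
The two steps above can be sketched with plain bitmasks, one bit per
pcpu. cpu_bits and node_affinity_cpus() are hypothetical names, and the
final mapping from pcpus to NUMA nodes is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpu_bits;   /* hypothetical: one bit per pcpu, up to 64 */

/* Return the set of pcpus from which the node-affinity would be derived. */
static cpu_bits node_affinity_cpus(cpu_bits pool, cpu_bits hard,
                                   cpu_bits soft)
{
    cpu_bits cpus = pool & hard & soft;   /* step 1 */

    if (cpus == 0)
        cpus = pool & hard;               /* step 2 */

    /* Non-empty as long as the hard affinity intersects the cpupool. */
    assert(cpus != 0);
    return cpus;
}
```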

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agosched: introduce soft-affinity and use it instead d->node-affinity
Dario Faggioli [Mon, 16 Jun 2014 10:12:28 +0000 (12:12 +0200)]
sched: introduce soft-affinity and use it instead d->node-affinity

Before this change, each vcpu had its own vcpu-affinity
(in v->cpu_affinity), representing the set of pcpus where
the vcpu is allowed to run. Since NUMA-aware scheduling was
introduced, the (credit1 only, for now) scheduler also
tries as much as it can to run all the vcpus of a domain
on one of the nodes that constitute the domain's
node-affinity.

The idea here is making the mechanism more general by:
  * allowing for this 'preference' for some pcpus/nodes to be
    expressed on a per-vcpu basis, rather than for the domain
    as a whole. That is to say, each vcpu should have its own
    set of preferred pcpus/nodes, rather than it being the
    very same for all the vcpus of the domain;
  * generalizing the idea of 'preferred pcpus' to not only NUMA
    awareness and support. That is to say, regardless of whether
    it is (mostly) useful on NUMA systems or not, it should
    be possible to specify, for each vcpu, a set of pcpus where
    it prefers to run (in addition, and possibly unrelated to,
    the set of pcpus where it is allowed to run).

We will be calling this set of *preferred* pcpus the vcpu's
soft affinity. This change introduces it and starts using it
for scheduling, replacing the indirect use of the domain's NUMA
node-affinity. This is more general, as soft affinity does not
have to be related to NUMA. Nevertheless, it allows achieving the
same results as NUMA-aware scheduling, just by making soft affinity
equal to the domain's node affinity, for all the vCPUs (e.g.,
from the toolstack).
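
One way to picture "preferred" versus "allowed" is a two-step choice,
sketched here with bitmasks. This is an assumption about how such a
preference could be applied, not a copy of the credit1 code; first_cpu()
and pick_cpu() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the lowest set bit in mask, or -1 if it is empty. */
static int first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/*
 * Prefer an idle pcpu in hard & soft; otherwise fall back to any idle
 * pcpu in the hard affinity. Returns -1 if no allowed pcpu is idle.
 */
static int pick_cpu(uint64_t idle, uint64_t hard, uint64_t soft)
{
    int cpu = first_cpu(idle & hard & soft);

    if (cpu < 0)
        cpu = first_cpu(idle & hard);
    /* Whatever is picked must be inside the hard affinity. */
    assert(cpu < 0 || (hard & (UINT64_C(1) << cpu)));
    return cpu;
}
```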

This also means renaming most of the NUMA-aware scheduling related
functions, in credit1, to something more generic, hinting toward
the concept of soft affinity rather than directly to NUMA awareness.

As a side effect, this simplifies the code quite a bit. In fact,
prior to this change, we needed to cache the translation of
d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
is what scheduling decisions require (we used to keep it in
node_affinity_cpumask). This, and all the complicated logic
required to keep it updated, is not necessary any longer.

The high level description of NUMA placement and scheduling in
docs/misc/xl-numa-placement.markdown is being updated too, to match
the new architecture.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agosched: rename v->cpu_affinity into v->cpu_hard_affinity
Dario Faggioli [Mon, 16 Jun 2014 10:11:52 +0000 (12:11 +0200)]
sched: rename v->cpu_affinity into v->cpu_hard_affinity

in order to distinguish it from the cpu_soft_affinity which will
be introduced in a later commit ("xen: sched: introduce soft-affinity
and use it instead d->node-affinity").

This patch does not imply any functional change, it is basically
the result of something like the following:

 s/cpu_affinity/cpu_hard_affinity/g
 s/cpu_affinity_tmp/cpu_hard_affinity_tmp/g
 s/cpu_affinity_saved/cpu_hard_affinity_saved/g

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agospread boot time page scrubbing across all available CPU's
Malcolm Crossley [Mon, 16 Jun 2014 10:02:00 +0000 (12:02 +0200)]
spread boot time page scrubbing across all available CPU's

The page scrubbing is done in 128MB chunks in lockstep across all the
non-SMT CPUs. This allows the boot CPU to hold the heap_lock whilst each
chunk is being scrubbed and then release the heap_lock when the CPUs have
finished scrubbing their individual chunks. This allows the heap_lock to
not be held continuously and pending softirqs to be serviced
periodically across the CPUs.

The page scrub memory chunks are allocated to the CPUs in a NUMA aware
fashion to reduce socket interconnect overhead and improve performance.
Specifically in the first phase we scrub at the same time on all the
NUMA nodes that have CPUs - we also weed out the SMT threads so that
we only use cores (that gives a 50% boost). The second phase is for NUMA
nodes that have no CPUs - for that we use the closest NUMA node's CPUs
(non-SMT again) to do the job.
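
The chunking can be sketched as follows for a single NUMA node, assuming
4KiB pages and an even split of the node's pages between its CPUs.
struct range and scrub_range() are hypothetical, and locking and
softirq processing between rounds are omitted.

```c
#include <assert.h>

#define PAGE_SHIFT   12                           /* assumes 4KiB pages */
#define CHUNK_PAGES  (128UL << (20 - PAGE_SHIFT)) /* 128MB worth of pages */

struct range { unsigned long start, end; };       /* page frames [start, end) */

/*
 * Pages that CPU `cpu` (0-based, out of `ncpus`) scrubs in lockstep
 * round `round`. An empty range means this CPU has finished its share.
 */
static struct range scrub_range(struct range node, unsigned int ncpus,
                                unsigned int cpu, unsigned long round)
{
    unsigned long total, per_cpu, base, limit;
    struct range r;

    assert(ncpus > 0 && cpu < ncpus && node.start <= node.end);

    total = node.end - node.start;
    per_cpu = (total + ncpus - 1) / ncpus;   /* share per CPU, rounded up */
    base = node.start + cpu * per_cpu;
    limit = base + per_cpu;
    if (limit > node.end)
        limit = node.end;

    r.start = base + round * CHUNK_PAGES;
    r.end = r.start + CHUNK_PAGES;
    if (r.start > limit)
        r.start = limit;
    if (r.end > limit)
        r.end = limit;
    return r;
}
```

Between rounds the coordinating CPU would drop the lock, which is what
keeps the lock hold time bounded by one chunk per CPU.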

This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
6386 machine from 49 seconds to 3 seconds.
On an IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
to 63 seconds.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/mce: don't spam the console with "CPUx: Temperature z"
Konrad Rzeszutek Wilk [Mon, 16 Jun 2014 09:59:32 +0000 (11:59 +0200)]
x86/mce: don't spam the console with "CPUx: Temperature z"

If the machine has been quite busy it ends up with these messages
printed on the hypervisor console:

(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature above threshold
(XEN) CPU0: Running in modulated clock mode
(XEN) CPU1: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal

While the state changes are important, the non-altered state
information is not needed. As such add a latch mechanism to only print
the information if it has changed since the last update (and the
hardware doesn't properly suppress redundant notifications).
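
A minimal sketch of such a latch, with one instance per CPU assumed.
struct thermal_latch and thermal_should_log() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU latch remembering the last state that was logged. */
struct thermal_latch {
    bool valid;   /* false until the first event has been logged */
    bool hot;     /* last logged state: above threshold or normal */
};

/* Return true if the event differs from the last one that was logged. */
static bool thermal_should_log(struct thermal_latch *latch, bool hot)
{
    assert(latch != NULL);
    if (latch->valid && latch->hot == hot)
        return false;   /* unchanged state: stay quiet */
    latch->valid = true;
    latch->hot = hot;
    return true;
}
```

With this, a run of identical "normal" notifications produces a single
line, while every transition is still reported.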

This was observed on Intel DQ67SW,
BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
10 years agocpuidle: improve perf for certain workloads
Ross Lagerwall [Mon, 16 Jun 2014 09:59:05 +0000 (11:59 +0200)]
cpuidle: improve perf for certain workloads

The existing mechanism of using interrupt frequency as a heuristic does
not work well for certain workloads.  As an example, synchronous dd on a
small block size uses deep C-states because much of the time is spent
doing processing so the interrupt frequency is not too high, but when an
IOP is submitted, the interrupt occurs soon after going idle.  This
causes exit latency to be a significant factor.

To fix this, add a new factor which limits the exit latency to be no
more than 10% of the decaying measured idle time.  This improves
performance for workloads with a medium interrupt frequency but a short
idle duration.
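
The extra factor can be sketched like this, assuming the C-states are
ordered by increasing exit latency and that state 0 is always usable.
struct cstate and deepest_allowed() are hypothetical, and the other
inputs to the governor are ignored.

```c
#include <assert.h>
#include <stddef.h>

struct cstate { unsigned int exit_latency_us; };   /* hypothetical, reduced */

/*
 * Return the index of the deepest state whose exit latency is no more
 * than 10% of the (decaying) average idle time.
 */
static size_t deepest_allowed(const struct cstate *states, size_t n,
                              unsigned int avg_idle_us)
{
    size_t best = 0;

    assert(states != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        /* latency <= idle / 10, written without a division */
        if ((unsigned long long)states[i].exit_latency_us * 10 > avg_idle_us)
            break;
        best = i;
    }
    return best;
}
```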

In the workload given previously, throughput improves by 20% with this
patch.

This is not ported from the Linux menu governor since that uses load
average and number of IO wait processes to satisfy latency constraints.
If a process is in IO wait state, it compares the exit latency with the
predicted residency reduced by a factor of 10, which is somewhat similar
to what this patch does.

A side effect of this patch is to correctly limit the maximum idle time
used in the correction factor calculation. Previously data->measured_us
was used, and it was never set.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
10 years agox86/EFI: improve boot time diagnostics (try 2)
Jan Beulich [Mon, 16 Jun 2014 09:52:34 +0000 (11:52 +0200)]
x86/EFI: improve boot time diagnostics (try 2)

To aid analysis of possible errors, print EFI status codes with error
messages where available. Also remove a case where the status gets
stored into a local variable without being examined (which misguided
me to add an error check there in try 1 of this patch).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agopt-irq fixes and improvements
Jan Beulich [Mon, 16 Jun 2014 09:50:44 +0000 (11:50 +0200)]
pt-irq fixes and improvements

Tools side:
- don't silently ignore unrecognized PT_IRQ_TYPE_* values
- respect that the interface type contains a union (making the code at
  once no longer depend on the hypervisor ignoring the bus field of the
  PCI portion of the interface structure)

Hypervisor side:
- don't ignore the PCI bus number passed in
- don't store values (gsi, link) calculated from other stored values
- avoid calling xfree() with a spin lock held where easily possible
- have pt_irq_destroy_bind() respect the passed in type
- scope reduction and constification of various variables
- use switch instead of if/else-if chains
- formatting

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Implement a dummy debug monitor for ARM32
Julien Grall [Thu, 24 Apr 2014 22:45:55 +0000 (23:45 +0100)]
xen/arm: Implement a dummy debug monitor for ARM32

XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitors registers") disables access to the Debug Registers.

When CONFIG_PERF_EVENTS is enabled in the Linux Kernel, it will try to
initialize the debug monitors. If an error occurs, Linux won't use this
feature.

This implementation makes Xen expose a minimal set of registers, which
leads the guest to conclude that HW debug won't work.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/DBGCR/DBGBCR/ to use correct register name ]
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Implement a dummy Performance Monitor for ARM32
Julien Grall [Thu, 24 Apr 2014 22:45:54 +0000 (23:45 +0100)]
xen/arm: Implement a dummy Performance Monitor for ARM32

XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitor registers") disables the Performance Monitor.

When CONFIG_PERF_EVENTS is enabled in the Linux Kernel, regardless of
ID_DFR0 (which tells whether the Performance Monitors Extension is
implemented), the kernel will try to access PMCR.

Therefore we tell the guest we have 0 counters. Unfortunately we must always
support PMCCNTR (the cycle counter): we just RAZ/WI for all PM registers,
which doesn't crash the kernel at least.
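
RAZ/WI (read-as-zero, write-ignored) handling of a trapped register
access can be sketched as below. handle_raz_wi() and its arguments are
hypothetical and stand in for whatever the trap handler actually passes
around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Emulate a trapped access to a RAZ/WI register: reads return zero and
 * writes are dropped. Returns true to signal that the access was handled.
 */
static bool handle_raz_wi(bool is_read, uint32_t *guest_reg)
{
    assert(guest_reg != NULL);
    if (is_read)
        *guest_reg = 0;   /* read-as-zero */
    /* write-ignored: the guest's value is not stored anywhere */
    return true;
}
```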

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agomini-os: don't include queue.h if there's no libc
Thomas Leonard [Wed, 11 Jun 2014 10:30:17 +0000 (11:30 +0100)]
mini-os: don't include queue.h if there's no libc

Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years agomini-os: moved events code under arch
Karim Raslan [Wed, 11 Jun 2014 10:30:15 +0000 (11:30 +0100)]
mini-os: moved events code under arch

This is all code motion, except that we now initialise
the ev_actions array before calling the arch-specific code
to make it more robust against future changes.

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
[talex5@gmail.com: separated from big ARM commit]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
10 years agomini-os: tidied up code
Karim Raslan [Wed, 11 Jun 2014 10:30:14 +0000 (11:30 +0100)]
mini-os: tidied up code

Signed-off-by: Karim Allah Ahmed <karim.allah.ahmed@gmail.com>
[talex5@gmail.com: separated from big ARM commit]
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[talex5@gmail.com: use __func__ in DEBUG macro]
[talex5@gmail.com: drop text about "xm create"]
Signed-off-by: Thomas Leonard <talex5@gmail.com>
10 years agolibxl: const-ify libxl_uuid_*() API
David Vrabel [Tue, 10 Jun 2014 18:07:30 +0000 (19:07 +0100)]
libxl: const-ify libxl_uuid_*() API

Add const to parameters of libxl_uuid_*() calls where it does not
change the API.

Add libxl_uuid_byte_array_const() to return a const array.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: Add Valgrind client requests
Andrew Cooper [Tue, 10 Jun 2014 14:41:07 +0000 (15:41 +0100)]
tools/libxc: Add Valgrind client requests

Valgrind client requests can be used by code to provide extra debugging
information about memory ranges, or to request checks at specific points.

Reference:
  http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs

Client requests are safe to compile into code for running outside of
valgrind.  Therefore, enable client requests whenever autoconf can find
memcheck.h and debug builds are enabled.
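
A sketch of how such a conditional could look. VALGRIND_MAKE_MEM_DEFINED
is a real memcheck client request; the HAVE_VALGRIND_MEMCHECK_H guard
name and the mark_filled_by_hypervisor() helper are assumptions made for
this example.

```c
#include <assert.h>
#include <stddef.h>

#if defined(HAVE_VALGRIND_MEMCHECK_H) && !defined(NDEBUG)
#include <valgrind/memcheck.h>
#else
/* No-op fallback so callers need no #ifdefs of their own. */
#define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

/* Hypothetical helper: tell memcheck that a hypercall filled in `buf`. */
static void mark_filled_by_hypervisor(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    (void)VALGRIND_MAKE_MEM_DEFINED(buf, len);
}
```

Outside of valgrind the request does nothing observable, so the helper
can be called unconditionally.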

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- reran autogen.sh ]

10 years agox86/VPMU: mark context LOADED before registers are loaded
Boris Ostrovsky [Wed, 11 Jun 2014 08:55:43 +0000 (10:55 +0200)]
x86/VPMU: mark context LOADED before registers are loaded

Because a PMU interrupt may be generated as soon as PMU registers are
loaded (or, more precisely, as soon as HW PMU is "armed") we don't want
to delay marking context as LOADED until after registers are loaded.
Otherwise during interrupt handling VPMU_CONTEXT_LOADED may not be set
and this could be confusing.

(Technically, only SVM needs this change right now since VMX will "arm"
the PMU later, during VM entry, when the global control register is loaded
from the VMCS. However, both AMD and Intel code will require this patch when we
introduce PV VPMU.)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agolibxl: move some internal functions to libxl_internal.h
Wei Liu [Tue, 10 Jun 2014 21:21:40 +0000 (22:21 +0100)]
libxl: move some internal functions to libxl_internal.h

In 752f181f ("libxl_json: introduce parser functions for builtin types")
a bunch of parser functions were added to libxl_json.h, which breaks
GCC < 4.6.

These functions are internal and libxl_json.h is a public header, so move
them to libxl_internal.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Revert "x86/EFI: improve boot time diagnostics"
Jan Beulich [Tue, 10 Jun 2014 15:56:11 +0000 (17:56 +0200)]
Revert "x86/EFI: improve boot time diagnostics"

This reverts commit 9921387f0c14a3f0ed42f9112efb7260af13db35.
It added an error check where none should be.

10 years ago Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Tue, 10 Jun 2014 15:04:12 +0000 (16:04 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years ago tools/libxc: Introduce ARRAY_SIZE() and replace handrolled examples
Andrew Cooper [Tue, 10 Jun 2014 14:07:59 +0000 (15:07 +0100)]
tools/libxc: Introduce ARRAY_SIZE() and replace handrolled examples

xen-hptool and xen-mfndump include xc_private.h.  This is bad, but not trivial
to fix, so they gain a protective #undef and a stern comment.

MiniOS leaks ARRAY_SIZE into the libxc namespace as part of a stubdom build.
Therefore, xc_private.h gains an #ifndef until MiniOS is fixed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
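
For readers unfamiliar with the idiom, a guarded ARRAY_SIZE() might look like the sketch below. The macro body is the conventional definition and is not copied from the patch. The table and the helpers level_name() and level_count() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Guarded definition: another header may already have defined the macro
 * (the commit message names MiniOS in a stubdom build as one such case). */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

static const char *const level_names[] = { "debug", "info", "warn", "error" };

/* Number of known levels, derived from the table rather than hard-coded. */
static size_t level_count(void)
{
    return ARRAY_SIZE(level_names);
}

/* Bounds-checked lookup that stays correct when entries are added. */
static const char *level_name(size_t level)
{
    if (level >= ARRAY_SIZE(level_names))
        return "unknown";
    return level_names[level];
}
```

The macro only works on true arrays. Applied to a pointer, it silently yields the wrong count, which is one reason to keep a single shared definition instead of hand-rolled copies.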
10 years ago x86/domctl: remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext
Andrew Cooper [Tue, 10 Jun 2014 14:59:11 +0000 (16:59 +0200)]
x86/domctl: remove PV MSR parts of XEN_DOMCTL_[gs]et_ext_vcpucontext

The PV MSR functionality is now implemented as a separate set of domctls.

This is a revert of parts of c/s 65e3554908
  "x86/PV: support data breakpoint extension registers"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago libxc: use an explicit check for PV MSRs in xc_domain_save()
Andrew Cooper [Tue, 10 Jun 2014 14:58:47 +0000 (16:58 +0200)]
libxc: use an explicit check for PV MSRs in xc_domain_save()

Migrating PV domains using MSRs is not supported.  This patch uses the new
XEN_DOMCTL_get_vcpu_msrs to check for them and fails the migration with an
explicit error.

This is an improvement upon the current failure of
  "No extended context for VCPUxx (ENOBUFS)"

Support for migrating PV domains which are using MSRs will be included in the
migration v2 work.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
10 years ago x86/domctl: implement XEN_DOMCTL_{get,set}_vcpu_msrs
Andrew Cooper [Tue, 10 Jun 2014 14:57:16 +0000 (16:57 +0200)]
x86/domctl: implement XEN_DOMCTL_{get,set}_vcpu_msrs

Despite my 'Reviewed-by' tag on c/s 65e3554908 "x86/PV: support data
breakpoint extension registers", I have re-evaluated my position as far as the
hypercall interface is concerned.

Previously, for the sake of not modifying the migration code in libxc,
XEN_DOMCTL_get_ext_vcpucontext would jump through hoops to return -ENOBUFS if
and only if MSRs were in use and no buffer was present.

This is fragile, and awkward from a toolstack point-of-view when actually
sending MSR content in the migration stream.  It also complicates fixing a
further race condition, between querying the number of MSRs for a vcpu, and
the vcpu touching a new one.

As this code is still only in unstable, take this opportunity to redesign the
interface.  This patch introduces the brand new XEN_DOMCTL_{get,set}_vcpu_msrs
subops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
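
The race mentioned in this commit message (a vcpu touching a new MSR between the count query and the fetch) can be modelled with a toy. This is a sketch of one common caller-side pattern, query then fetch with retry, and it does not describe how the Xen domctl or libxc actually behave. struct fake_vcpu, struct msr_entry, and all functions below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct msr_entry { uint32_t index; uint64_t value; };

/* Toy vcpu: the number of MSRs in use, plus a one-shot switch that makes the
 * vcpu "touch" one more MSR right after its count has been queried. */
struct fake_vcpu {
    unsigned int nr_msrs;
    int grow_after_query;
};

/* Hypothetical query: how many MSRs are in use right now? */
static unsigned int fake_msr_count(struct fake_vcpu *v)
{
    unsigned int n = v->nr_msrs;
    if (v->grow_after_query) {      /* simulate the race window */
        v->nr_msrs++;
        v->grow_after_query = 0;
    }
    return n;
}

/* Hypothetical fetch: fills buf, or fails with -ENOBUFS if it is too small. */
static int fake_msr_get(const struct fake_vcpu *v, struct msr_entry *buf,
                        unsigned int cap)
{
    if (cap < v->nr_msrs)
        return -ENOBUFS;
    for (unsigned int i = 0; i < v->nr_msrs; i++) {
        buf[i].index = 0xc0000000u + i;   /* arbitrary illustrative indices */
        buf[i].value = i;
    }
    return (int)v->nr_msrs;
}

/* Query, allocate, fetch; retry if the count grew in between. On success
 * returns the number of entries and stores the buffer in *out (caller frees). */
static int fetch_all_msrs(struct fake_vcpu *v, struct msr_entry **out)
{
    for (int attempt = 0; attempt < 8; attempt++) {
        unsigned int cap = fake_msr_count(v);
        struct msr_entry *buf = calloc(cap ? cap : 1, sizeof(*buf));
        if (buf == NULL)
            return -ENOMEM;
        int rc = fake_msr_get(v, buf, cap);
        if (rc >= 0) {
            *out = buf;
            return rc;
        }
        free(buf);
        if (rc != -ENOBUFS)
            return rc;
    }
    return -EAGAIN;   /* the count kept changing; give up */
}
```

With a toy vcpu that starts with two MSRs and grows once, the first attempt fails with -ENOBUFS and the second returns three entries. Keeping the count query and the fetch in one dedicated interface makes this kind of handling easier to reason about than overloading an unrelated call's error code.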