If an image source page is allocated in kimage_alloc_page() but the
machine_kexec_add_page() fails, the image may appear to load
successfully but it will not execute. The relocation will fault
(rebooting the host) when trying to copy the source page, as it is not
mapped.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
If a bad image type is supplied in a KEXECOP_unload hypercall, the
kexec_lock in kexec_swap_images() was left locked, causing a deadlock
on a subsequent image load or unload.
The kexec_lock is only required to serialize the swap operation
itself.
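A minimal sketch of the locking pattern described above, assuming
hypothetical names (swap_image(), swap_lock_held, image_slots) that are not
the Xen definitions. The image type is validated before the lock is taken,
so the error path cannot return with the lock held:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the Xen definitions. */
enum image_type { IMAGE_DEFAULT = 0, IMAGE_CRASH = 1, IMAGE_TYPE_COUNT };

static bool swap_lock_held;                 /* models kexec_lock */
static void *image_slots[IMAGE_TYPE_COUNT]; /* currently loaded images */

static int swap_image(int type, void *new_image, void **old_image)
{
    /* Validate first: a bad type must never reach the locked region. */
    if (type < 0 || type >= IMAGE_TYPE_COUNT)
        return -EINVAL;

    assert(!swap_lock_held);
    swap_lock_held = true;                  /* lock covers only the swap */
    *old_image = image_slots[type];
    image_slots[type] = new_image;
    swap_lock_held = false;

    return 0;
}
```

With this shape, every return after the lock is taken also releases it.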
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Wed, 13 Nov 2013 08:42:51 +0000 (09:42 +0100)]
pvh tools: libxl changes to create a PVH guest
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:42:14 +0000 (09:42 +0100)]
pvh tools: libxc changes to build a PVH guest
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:41:12 +0000 (09:41 +0100)]
pvh: restrict tsc_mode to NEVER_EMULATE for now
The reason given for this restriction in the first place, given in one
of the comments checking for PVH requirements, had to do with
additional infrastructure required to allow PV RDTSC emulation for PVH
guests.
Since we don't use the PV emulation path at all anymore, we may be
able to remove this restriction.
Experiments show that pvh will boot without apparent issues in
"default", "native", and "native_paravirt" mode, but not in
"always_emulate" mode. We'll leave this restriction in until
we can sort out what's going on.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:40:41 +0000 (09:40 +0100)]
pvh: disable 32-bit guest support for now
To be implemented.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
George Dunlap [Wed, 13 Nov 2013 08:40:03 +0000 (09:40 +0100)]
pvh: use PV handlers for PIO
Register an IO handler for the entire PIO range, and have it call the
PV PIO handlers.
NB at this point this won't do the full "copy and execute on the stack
with full GPRs" work-around; this may need to be sorted out for dom0 to allow
these instructions to happen in guest context.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:37:01 +0000 (09:37 +0100)]
pvh: use PV e820
Allow PV e820 map to be set and read from a PVH domain. This requires
moving the pv e820 struct out from the pv-specific domain struct and
into the arch domain struct.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:35:20 +0000 (09:35 +0100)]
pvh: vmx-specific changes
Changes:
* Enforce HAP mode for now
* Disable exits related to virtual interrupts or emulated APICs
* Disable changing paging mode
  - "unrestricted guest" (i.e., real mode for EPT) disabled
  - write guest EFER disabled
* Start in 64-bit mode
* Paging mode update to happen in arch_set_info_guest
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:30:09 +0000 (09:30 +0100)]
pvh prep: introduce pv guest type and has_hvm_container macros
The goal of this patch is to classify conditionals more clearly, as to
whether they relate to pv guests, hvm-only guests, or guests with an
"hvm container" (which will eventually include PVH).
This patch introduces an enum for guest type, as well as two new macros
for switching behavior on and off: is_pv_* and has_hvm_container_*. At the
moment is_pv_* <=> !has_hvm_container_*. The purpose of having two is that
taking a path because something does *not* have PV structures seems to me
different from taking a path because it *does* have HVM structures, even if the
two happen to coincide 100% at the moment. The exact usage is occasionally a bit
fuzzy though, and a judgement call just needs to be made on which is clearer.
In general, a switch should use is_pv_* (or !is_pv_*) if the code in question
relates directly to a PV guest. Examples include use of pv_vcpu structs or
other behavior directly related to PV domains.
hvm_container is more of a fuzzy concept, but in general:
* Most core HVM behavior will be included in this. Behavior not
appropriate for PVH mode will be disabled in later patches
* Hypercalls related to HVM guests will *not* be included by default;
functionality needed by PVH guests will be enabled in future patches
* The following functionality is not considered part of the HVM
container, and PVH will end up behaving like PV by default: Event
channel, vtsc offset, code related to emulated timers, nested HVM,
emuirq, PoD
* Some features are left to implement for PVH later: vpmu, shadow mode
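As a rough illustration of the two predicates, here is a minimal sketch. The
struct layout and the check_invariant() helper are assumptions made for this
example; only the idea of an enum plus is_pv_* / has_hvm_container_* tests
comes from the description above:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: a two-valued guest type, as before PVH is added. */
enum guest_type { guest_type_pv, guest_type_hvm };

struct domain {
    enum guest_type guest_type;
};

/* Take this path because the domain *does* have PV structures. */
#define is_pv_domain(d)             ((d)->guest_type == guest_type_pv)
/* Take this path because the domain *does* have HVM structures. */
#define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)

/* With only two guest types, the predicates are exact complements. */
static void check_invariant(const struct domain *d)
{
    assert(is_pv_domain(d) == !has_hvm_container_domain(d));
}
```

Once a third guest type exists, the two macros stop being complements, which
is the reason for keeping them separate.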
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
George Dunlap [Wed, 13 Nov 2013 08:29:02 +0000 (09:29 +0100)]
pvh: tolerate HVM guests having no ioreq page
PVH guests don't have a backing device model emulator (qemu); just
tolerate this situation explicitly, rather than special-casing PVH.
For unhandled IO, hvmemul_do_io() will now return X86EMUL_OKAY, which
I believe is the effect we would see if qemu didn't have a handler
for the IO.
This also fixes a potential DoS in the host from the reworked series:
If the guest makes a hypercall which sends an invalidate request, it
would have crashed the host.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Mukesh Rathor [Wed, 13 Nov 2013 08:26:38 +0000 (09:26 +0100)]
pvh prep: code motion
There are many functions where PVH requires some code in common with
HVM. Rearrange some of these functions so that the code is together.
In general, the HVM code that PVH also uses includes:
- cacheattr functionality
- paging
- hvm_funcs
- hvm_assert_evtchn_irq tasklet
- tm_list
- hvm_params
And code that PVH shares with PV but not with HVM:
- updating the domain wallclock
- setting v->is_initialized
There should be no end-to-end changes in behavior.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Roger Pau Monné [Wed, 13 Nov 2013 08:26:13 +0000 (09:26 +0100)]
libxc: move temporary grant table mapping to end of memory
In order to set up the grant table for HVM guests, libxc needs to map
the grant table temporarily. At the moment, it does this by adding the
grant page to the HVM guest's p2m table in the MMIO hole (at gfn 0xFFFFE),
then mapping that gfn, setting up the table, then unmapping the gfn and
removing it from the p2m table.
This breaks with PVH guests with 4G or more of ram, because there is
no MMIO hole; so it ends up clobbering a valid RAM p2m entry, then
leaving a "hole" when it removes the grant map from the p2m table.
Since the guest thinks this is normal ram, when it maps it and tries
to access the page, it crashes.
This patch maps the page at max_gfn+1 instead.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
George Dunlap [Wed, 13 Nov 2013 08:25:36 +0000 (09:25 +0100)]
VMX: allow vmx_update_debug_state to be called when v!=current
Removing the assert allows the PVH code to call this during vmcs
construction in a later patch, making the code more robust by removing
duplicate code.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Eddie Dong <eddie.dong@intel.com>
Ian Jackson [Thu, 18 Apr 2013 15:27:46 +0000 (16:27 +0100)]
libxl: Avoid realloc(,0) when libxl__xs_directory returns empty list
If the named path is a leaf node, libxl__xs_directory can succeed,
returning non-null, but set *nb to 0.
In three places in libxl this may result in a zero size argument being
passed to malloc() or realloc(), which is not advisable.
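A minimal sketch of the guard, assuming a hypothetical helper
copy_entries() that is not a libxl function. The empty list is handled
before any allocation, so malloc() is never called with a zero size:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper for illustration: duplicate an array of `nb`
 * entry pointers. Returns NULL for an empty list (nothing allocated)
 * and also on allocation failure; callers tell the two apart via `nb`.
 */
static char **copy_entries(char *const *entries, unsigned int nb)
{
    char **copy;

    if (nb == 0)
        return NULL;                    /* never call malloc(0) */

    assert(entries != NULL);
    copy = malloc(nb * sizeof(*copy));
    if (copy == NULL)
        return NULL;

    memcpy(copy, entries, nb * sizeof(*copy));
    return copy;
}
```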
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Mon, 14 Oct 2013 16:26:01 +0000 (17:26 +0100)]
libxl: Deprecate synchronous waiting for the device model
libxl__wait_for_device_model blocks, with the ctx lock held, waiting
for a response from the device model. If the dm doesn't respond
quickly (for example, because it has crashed), this may block the
whole process. Explain this in a comment, rename the function to
libxl__wait_for_device_model_deprecated, and explain what to use
instead.
libxl__wait_for_offspring is the core implementation for the above.
Its name leads people to think it might be generally useful for
waiting for children, which is far from the case. It only waits for
xenstore. Also it has the problems described above. Explain this,
rename it to libxl__xenstore_child_wait_deprecated, and explain what
to use instead.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 3 Sep 2013 12:41:46 +0000 (13:41 +0100)]
libxl: Do not generate short block in libxl__datacopier_prefixdata
libxl__datacopier_prefixdata would prepend a deliberately short block
(not just a half-full one, but one with a short buffer) to the
dc->bufs queue. However, this is wrong because datacopier_readable
will find it and try to continue to fill it up.
Instead, allocate a full-sized buffer.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Chunyan Liu <cyliu@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
This allows a long-running ao to avoid accumulating memory. Each
nested ao has its own gc.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
specifically used signed integers, identical to the code copied out of vsprintf.
When committed, these had changed to unsigned integers, which causes a
functional change. This causes glacial boot performance and an excessive
quantity of spaces printed to the serial console, as we loop to the upper
bound of a 32bit integer.
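The effect can be reproduced with a small sketch. It assumes that a negative
width (here NO_WIDTH, a name invented for this example) marks "no field
width given", and it caps the loops so the unsigned variant terminates
quickly instead of running to the 32-bit limit:

```c
#include <assert.h>
#include <limits.h>

/* Assumption for this sketch: -1 means "no field width given". */
#define NO_WIDTH (-1)

/* Count pad characters for a signed width, stopping at `cap`. */
static unsigned long pad_signed(int width, unsigned long cap)
{
    unsigned long emitted = 0;

    assert(cap > 0);
    while (width > 0 && emitted < cap) {
        width--;
        emitted++;
    }
    return emitted;
}

/* Same loop with an unsigned width: -1 converts to UINT_MAX. */
static unsigned long pad_unsigned(unsigned int width, unsigned long cap)
{
    unsigned long emitted = 0;

    assert(cap > 0);
    while (width > 0 && emitted < cap) {
        width--;
        emitted++;
    }
    return emitted;
}
```

Passing NO_WIDTH to pad_signed() emits nothing, while the same value passed
to pad_unsigned() keeps padding until the cap is reached.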
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 12 Nov 2013 15:28:47 +0000 (16:28 +0100)]
x86: eliminate has_arch_mmios()
... as being generally insufficient: Either has_arch_pdevs() or
cache_flush_permitted() should be used (in particular, it is
insufficient to consider MMIO ranges alone - I/O port ranges have the
same requirements if available to a guest).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 12:19:25 +0000 (13:19 +0100)]
evtchn/fifo: don't spin indefinitely when setting LINK
A malicious or buggy guest can cause another domain to spin
indefinitely by repeatedly writing to an event word when the other
guest is trying to link a new event. The cmpxchg() in
evtchn_fifo_set_link() will repeatedly fail and the loop may never
terminate.
Fixing this requires a change to the ABI which is documented in draft
H of the design.
Since a well-behaved guest only makes a limited set of state changes,
the loop can terminate early if the guest makes an invalid state
transition.
The guest may:
- clear LINKED and LINK.
- clear PENDING
- set MASKED
- clear MASKED
It is valid for the guest to mask and unmask an event at any time so
specify that it is not valid for a guest to clear MASKED if Xen is
trying to update LINK. Indicate this to the guest with an additional
BUSY bit in the event word. The guest must not clear MASKED if BUSY
is set and it should spin until BUSY is cleared.
The remaining valid writes (clear LINKED, clear PENDING, set MASKED,
clear MASKED by Xen) will limit the number of failures of the
cmpxchg() to at most 4. A clear of LINKED will also terminate the
loop early. Therefore, the loop can then be limited to at most 4
iterations.
If the buggy or malicious guest does cause the loop to exit with
LINKED set and LINK unset then that buggy guest will lose events.
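The bounded loop can be sketched with C11 atomics. This is a simplified
illustration, not the Xen implementation: the bit positions, the EW_* names,
the link_result enum and set_link() itself are assumptions made for this
example:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Bit layout assumed for this sketch only. */
#define EW_MASKED    (UINT32_C(1) << 30)
#define EW_LINKED    (UINT32_C(1) << 29)
#define EW_BUSY      (UINT32_C(1) << 28)
#define EW_LINK_MASK ((UINT32_C(1) << 17) - 1)
#define MAX_TRIES    4

enum link_result { LINK_SET, LINK_UNLINKED, LINK_GAVE_UP };

static enum link_result set_link(_Atomic uint32_t *word, uint32_t link)
{
    uint32_t old, updated;
    unsigned int attempt;

    assert((link & ~EW_LINK_MASK) == 0);

    /* Tell the guest not to clear MASKED while LINK is being updated. */
    atomic_fetch_or(word, EW_BUSY);
    old = atomic_load(word);

    for (attempt = 0; attempt < MAX_TRIES; attempt++) {
        if (!(old & EW_LINKED)) {
            /* The guest unlinked the event: terminate early. */
            atomic_fetch_and(word, ~EW_BUSY);
            return LINK_UNLINKED;
        }

        /* Write the new LINK and drop BUSY in a single exchange. */
        updated = (old & ~(EW_BUSY | EW_LINK_MASK)) | link;

        /* On failure, `old` is refreshed with the current word. */
        if (atomic_compare_exchange_strong(word, &old, updated))
            return LINK_SET;
    }

    /* Too many concurrent writes: give up rather than spin forever. */
    atomic_fetch_and(word, ~EW_BUSY);
    return LINK_GAVE_UP;
}
```

The loop bound is what prevents a misbehaving guest from keeping the caller
spinning; a well-behaved guest never causes the LINK_GAVE_UP case.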
Reported-by: Anthony Liguori <aliguori@amazon.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 12 Nov 2013 10:52:19 +0000 (11:52 +0100)]
VMX: don't crash processing 'd' debug key
There's a window during scheduling where "current" and the active VMCS
may disagree: The former gets set much earlier than the latter. Since
both vmx_vmcs_enter() and vmx_vmcs_exit() immediately return when the
subject vCPU is "current", accessing VMCS fields would, depending on
whether there is any currently active VMCS, either read wrong data, or
cause a crash.
Going forward we might want to consider reducing the window during
which vmx_vmcs_enter() might fail (e.g. doing a plain __vmptrld() when
v->arch.hvm_vmx.vmcs != this_cpu(current_vmcs) but arch_vmx->active_cpu
== -1), but that would add complexities (acquiring and - more
importantly - properly dropping v->arch.hvm_vmx.vmcs_lock) that don't
look worthwhile adding right now.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 12 Nov 2013 10:51:15 +0000 (11:51 +0100)]
nested SVM: adjust guest handling of structure mappings
For one, nestedsvm_vmcb_map() error checking must not consist of using
assertions: Global (permanent) mappings can fail, and hence failure
needs to be dealt with properly. And non-global (transient) mappings
can't fail anyway.
And then the I/O port access bitmap handling was broken: It checked
only the first of the accessed ports rather than each of them.
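A minimal sketch of a per-port check, assuming a flat bitmap with one bit
per port. The io_intercepted() helper and its parameters are invented for
this example and do not reflect the nested SVM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if any port touched by an access of `bytes` bytes starting
 * at `port` has its bit set in `bitmap` (one bit per port, `nr_ports`
 * bits in total). Checking only `port` would miss the trailing ports.
 */
static bool io_intercepted(const uint8_t *bitmap, uint32_t nr_ports,
                           uint32_t port, unsigned int bytes)
{
    unsigned int i;

    assert(port + bytes <= nr_ports);

    for (i = 0; i < bytes; i++) {
        uint32_t p = port + i;

        if (bitmap[p / 8] & (1u << (p % 8)))
            return true;
    }

    return false;
}
```

For instance, a two-byte access at port 0x70 also touches port 0x71, so it
has to be intercepted when only 0x71 is marked in the bitmap.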
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Christoph Egger <chegger@amazon.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
David Vrabel [Tue, 12 Nov 2013 10:47:26 +0000 (11:47 +0100)]
x86: check kexec relocation code fits in a page
The kexec relocation (control) code must fit in a single page so add a
link time check for this.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:47:07 +0000 (11:47 +0100)]
libxc: add API for kexec hypercall
Add xc_kexec_exec(), xc_kexec_get_ranges(), xc_kexec_load(), and
xc_kexec_unload(). The load and unload calls require the v2 load and
unload ops.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:46:39 +0000 (11:46 +0100)]
libxc: add hypercall buffer arrays
Hypercall buffer arrays are used when a hypercall takes a variable
length array of buffers.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:46:06 +0000 (11:46 +0100)]
kexec crash image when dom0 crashes
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:44:41 +0000 (11:44 +0100)]
kexec: extend hypercall with improved load/unload ops
In the existing kexec hypercall, the load and unload ops depend on
internals of the Linux kernel (the page list and code page provided by
the kernel). The code page is used to transition between Xen context
and the image so using kernel code doesn't make sense and will not
work for PVH guests.
Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
that no longer require a code page to be provided by the guest -- Xen
now provides the code for calling the image directly.
The new load op looks similar to the Linux kexec_load system call and
allows the guest to provide the image data to be loaded. The guest
specifies the architecture of the image which may be a 32-bit subarch
of the hypervisor's architecture (i.e., an EM_386 image on an
EM_X86_64 hypervisor).
The toolstack can now load images without kernel involvement. This is
required for supporting kexec when using a dom0 with an upstream
kernel.
Crash images are copied directly into the crash region on load.
Default images are copied into domheap pages and a list of source and
destination machine addresses is created. This list is used in
kexec_reloc() to relocate the image to its destination.
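Conceptually, the relocation step walks the list and copies each source page
to its destination. The sketch below is a heavily simplified model: plain
pointers stand in for machine addresses, and reloc_entry, relocate_pages()
and SKETCH_PAGE_SIZE are names invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* One (source, destination) pair; pointers model machine addresses. */
struct reloc_entry {
    const unsigned char *src;
    unsigned char *dst;
};

/* Copy every source page to its destination, in list order. */
static void relocate_pages(const struct reloc_entry *list, size_t count)
{
    size_t i;

    assert(list != NULL || count == 0);

    for (i = 0; i < count; i++)
        memcpy(list[i].dst, list[i].src, SKETCH_PAGE_SIZE);
}
```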
The old load and unload sub-ops are still available (as
KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
of the new infrastructure.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:41:02 +0000 (11:41 +0100)]
kexec: add infrastructure for handling kexec images
Add the code needed to handle and load kexec images into Xen memory or
into the crash region. This is needed for the new KEXEC_CMD_load and
KEXEC_CMD_unload hypercall sub-ops.
Much of this code is derived from the Linux kernel.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:39:29 +0000 (11:39 +0100)]
kexec: add public interface for improved load/unload sub-ops
Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
kexec hypercall. These new sub-ops allow a privileged guest to
provide the image data to be loaded into Xen memory or the crash
region instead of guests loading the image data themselves and
providing the relocation code and metadata.
The old interface is provided to guests requesting an interface
version prior to 4.4.
Bump __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
David Vrabel [Tue, 12 Nov 2013 10:37:19 +0000 (11:37 +0100)]
x86: give FIX_EFI_MPF its own fixmap entry
FIX_EFI_MPF was the same as FIX_KEXEC_BASE_0 which is going away. So
add its own entry.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Tested-by: Don Slutz <dslutz@verizon.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 12 Nov 2013 10:11:30 +0000 (11:11 +0100)]
common/symbols: Remove print_symbol() and associated infrastructure
Also adjust the one common user of print_symbol() to use the new printk()
format. While adjusting the format string, increase the width so a
long-to-expire plt_overflow() timer doesn't break the column alignment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Dario Faggioli [Tue, 12 Nov 2013 09:54:28 +0000 (10:54 +0100)]
numa-sched: leave node-affinity alone if not in "auto" mode
If the domain's NUMA node-affinity is being specified by the
user/toolstack (instead of being automatically computed by Xen),
we really should stick to that. This means domain_update_node_affinity()
is wrong when it filters out some stuff from there even in "!auto"
mode.
This commit fixes that. Of course, this does not mean node-affinity
is always honoured (e.g., a vcpu won't run on a pcpu of a different
cpupool) but the necessary logic for taking into account all the
possible situations lives in the scheduler code, where it belongs.
What could happen without this change is that, under certain
circumstances, the node-affinity of a domain may change when the
user modifies the vcpu-affinity of the domain's vcpus. This, even
if probably not a real bug, is at least something the user does
not expect, so let's avoid it.
Zheng Li [Thu, 31 Oct 2013 16:32:56 +0000 (16:32 +0000)]
oxenstored: allow updates regardless of quota
Allow a domain to update existing xenstore keys even if it has already reached
its max entries limit.
As updating an existing key won't increase the number of entries belonging to a
domain, we should avoid checking the max entries limit prematurely. The patch
addresses this issue in the following functions: write/add, mkdir, setperms.
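The ordering of the checks can be sketched as follows. oxenstored itself is
written in OCaml; this C model, with its store struct and store_write()
helper, is an assumption made purely to illustrate the ordering: look the
key up first, and consult the quota only when a new entry would be created:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 8

/* Toy per-domain store; not the oxenstored data structure. */
struct store {
    const char *keys[SKETCH_MAX_ENTRIES];
    const char *values[SKETCH_MAX_ENTRIES];
    unsigned int count;
    unsigned int quota;
};

static int store_find(const struct store *s, const char *key)
{
    unsigned int i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->keys[i], key) == 0)
            return (int)i;
    return -1;
}

/* Returns false only when creating a new entry would exceed the quota. */
static bool store_write(struct store *s, const char *key, const char *value)
{
    int idx;

    assert(s && key && value);

    idx = store_find(s, key);
    if (idx >= 0) {
        /* Update in place: the entry count does not change. */
        s->values[idx] = value;
        return true;
    }

    if (s->count >= s->quota || s->count >= SKETCH_MAX_ENTRIES)
        return false;

    s->keys[s->count] = key;
    s->values[s->count] = value;
    s->count++;
    return true;
}
```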
Signed-off-by: Zheng Li <zheng.li@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:50:03 +0000 (17:50 +0000)]
libxl: ocaml: provide defaults for libxl types
Libxl functions such as libxl_domain_create_new take large structs
of configuration parameters. Often, we would like to use the default
values for many of these parameters.
The struct and keyed-union types in libxl have init functions, which
fill in the defaults for a given type. This commit provides an OCaml
interface to obtain records of defaults by calling the relevant init
function.
These default records can be used as a base to construct your own
records, and to selectively override parameters where needed.
For example, a Domain_create_info record can now be created as follows:
  Xenlight.Domain_create_info.({ default ctx () with
    ty = Xenlight.DOMAIN_TYPE_PV;
    name = Some vm_name;
    uuid = vm_uuid;
  })
For types with KeyedUnion fields, such as Domain_build_info, a record
with defaults is obtained by specifying the type key:
Rob Hoes [Wed, 6 Nov 2013 17:49:53 +0000 (17:49 +0000)]
libxl: ocaml: use the "string option" type for IDL strings
The libxl IDL is based on C type "char *", and therefore "strings" can
be NULL, or be an actual string. In ocaml, it is common to encode such
things as option types.
Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:49:52 +0000 (17:49 +0000)]
libxl: ocaml: fix the handling of enums in the bindings generator
Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:49:51 +0000 (17:49 +0000)]
libxl: ocaml: add domain_build/create_info/config and events to the bindings.
We now have enough infrastructure in place to do this trivially.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:49:50 +0000 (17:49 +0000)]
libxl: ocaml: make Val_defbool GC-proof
In order to prevent newly created OCaml values from being GC'ed, they must be
registered as roots with the GC, before an iteration of the GC may happen. The
Val_* functions potentially allocate new values on the OCaml heap, and may
trigger an iteration of the OCaml GC.
The way to register a value with the GC is to assign it to a variable declared
with a CAMLparam or CAMLlocal macro, which put the value into a struct that
can be reached from a GC root.
This leads to slightly weird looking C code, but avoids hard to find segfaults.
Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:49:45 +0000 (17:49 +0000)]
libxl: ocaml: switch all functions over to take a context.
Since the context has a logger we can get rid of the logger built into these
bindings and use the xentoollog bindings instead.
The gc is of limited use when most things are freed with libxl_FOO_dispose,
so get rid of that too.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:49:44 +0000 (17:49 +0000)]
libxl: ocaml: allocate a long lived libxl context.
Rather than allocating a new context for every libxl call, begin to
switch to a model where a context is allocated by the caller and may
then be used for multiple calls down into the library.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
Rob Hoes [Wed, 6 Nov 2013 17:49:43 +0000 (17:49 +0000)]
libxc: ocaml: add simple binding for xentoollog (output only).
These bindings allow ocaml code to receive log messages via xentoollog
but do not support injecting messages into xentoollog from ocaml.
Receiving log messages from libx{c,l} and forwarding them to ocaml is
the use case which is needed by the following patches.
Add a simple noddy test case (tools/ocaml/test).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
[ ijc -- dropped the xtl test harness, it failed to link ]
Rob Hoes [Wed, 6 Nov 2013 17:49:41 +0000 (17:49 +0000)]
libxl: ocaml: support for KeyedUnion in the bindings generator.
A KeyedUnion consists of two fields in the containing struct. First an
enum field ("e") used as a discriminator and second a union ("u")
containing potentially anonymous structs associated with each enum
value.
We map the anonymous structs to structs named after the discriminator
field ("e") and the specific enum values. We then declare an ocaml
variant type named e__union mapping each enum value to its associated
struct.
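On the C side, the pattern being mapped looks roughly like the sketch below.
The type and field names other than "e" and "u" (guest_kind, build_info,
video_kb, ramdisk_kb, extra_kb()) are invented for this example and are not
taken from the libxl IDL:

```c
#include <assert.h>

/* Hypothetical discriminator values. */
enum guest_kind { KIND_HVM, KIND_PV };

struct build_info {
    enum guest_kind e;                  /* discriminator */
    union {
        struct { int video_kb; } hvm;   /* valid when e == KIND_HVM */
        struct { int ramdisk_kb; } pv;  /* valid when e == KIND_PV */
    } u;
};

/* Read the member selected by the discriminator, never the other one. */
static int extra_kb(const struct build_info *b)
{
    switch (b->e) {
    case KIND_HVM:
        return b->u.hvm.video_kb;
    case KIND_PV:
        return b->u.pv.ramdisk_kb;
    }

    assert(!"unknown discriminator value");
    return -1;
}
```

A variant type on the OCaml side plays the role of the switch: each
constructor carries the record for exactly one enum value.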
Ian Campbell [Tue, 29 Oct 2013 11:39:50 +0000 (11:39 +0000)]
tools: support system supplied ovmf binary
Debian Jessie at least contains an ovmf package that includes
/usr/share/ovmf/OVMF.fd. It's also possible that user may want to supply
his/her own ovmf binary.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Kelley Nielsen [Mon, 11 Nov 2013 10:27:54 +0000 (02:27 -0800)]
libxl: use macro GCNEW in libxl_qmp.c
The new coding style uses the convenience macro GCNEW as declared in
libxl_internal.h. Substitute an invocation of this macro for its
body at the one place it occurs in libxl_qmp.c.
Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Kelley Nielsen [Mon, 11 Nov 2013 10:08:58 +0000 (02:08 -0800)]
libxl: use macro CTX in libxl_qmp.c
The new coding style uses the convenience macro CTX as declared in
libxl_internal.h. Substitute an invocation of this macro for its body
at the one place it occurs in libxl_qmp.c.
Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Kelley Nielsen [Mon, 11 Nov 2013 10:08:57 +0000 (02:08 -0800)]
libxl: add convenience macros to qmp_send() in libxl_qmp.c
Update qmp_send() in libxl_qmp.c to use the new convenience macros
declared in libxl_internal.h. Uses GC_INIT at the top of the function,
and GC_FREE at the exit. Since GC_INIT returns a libxl__gc by reference
and not by value, remove the address operator from the left of the
variable gc where it is passed as a parameter.
Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Kelley Nielsen [Sun, 10 Nov 2013 03:05:05 +0000 (19:05 -0800)]
libxl: macro LOG() used in place of LIBXL__LOG in libxl_qmp.c
Code cleanup -- no functional changes
Coding style has recently been changed for libxl. The convenience macro
LOG() has been introduced, and it is intended that calls to the old
macro LIBXL__LOG() be replaced with it. Change 7 occurrences of the old
macro (in functions that have a local libxl_gc *gc) to the new one.
Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Mon, 11 Nov 2013 10:01:04 +0000 (11:01 +0100)]
x86/idle: reduce contention on ACPI register accesses
Other than when they're located in I/O port space, accessing them when
in MMIO space (currently) implies usage of some sort of global lock: In
-unstable this would be due to the use of vmap(), in older trees the
necessary locking was introduced by 2ee9cbf9 ("ACPI: fix
acpi_os_map_memory()"). This contention was observed to result in Dom0
kernel soft lockups during the loading of the ACPI processor driver
there on systems with very many CPU cores.
There are a couple of things being done for this:
- re-order elements of an if() condition so that the register access
only happens when we really need it
- turn off arbitration disabling only when the first CPU leaves C3
(paralleling how arbitration disabling gets turned on)
- only set the (global) bus master reload flag once (when the first
target CPU gets processed)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 11 Nov 2013 10:00:21 +0000 (11:00 +0100)]
x86/Intel: don't probe CPUID faulting on family 0xf CPUs
These are known to not support the feature, so we can save ourselves
from emitting the resulting #GP fault recovery related message (which
might worry people looking at the logs).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Liu Jinsong <jinsong.liu@intel.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 11 Nov 2013 08:15:04 +0000 (09:15 +0100)]
nested VMX: VMLAUNCH/VMRESUME emulation must check permission first thing
Otherwise uninitialized data may be used, leading to crashes.
This is CVE-2013-4551 / XSA-75.
Reported-and-tested-by: Jeff Zimmerman <Jeff_Zimmerman@McAfee.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 8 Nov 2013 10:08:32 +0000 (11:08 +0100)]
x86/EFI: make trampoline allocation more flexible
Certain UEFI implementations reserve all memory below 1Mb at boot time,
making it impossible to properly allocate the chunk necessary for the
trampoline. Fall back to simply grabbing a chunk from EfiBootServices*
regions immediately prior to calling ExitBootServices().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Kouya Shimura [Fri, 8 Nov 2013 10:07:14 +0000 (11:07 +0100)]
x86/hvm: fix restart of RTC periodic timer with vpt_align=1
The commit 58afa7ef "x86/hvm: Run the RTC periodic timer on a
consistent time series" aligns the RTC periodic timer to the VM's boot time.
However, it's aligned later again to the system time in create_periodic_time()
with vpt_align=1. The next tick might be skipped.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>