Keir Fraser [Wed, 9 Mar 2011 16:16:11 +0000 (16:16 +0000)]
x86: make get_page_from_l1e() return a proper error code
... so that the guest can actually know the reason for the (hypercall)
failure.
ptwr_do_page_fault() could propagate the error indicator received from
get_page_from_l1e() back to the guest in the high half of the error
code (entry_vector), provided we're sure all existing guests can deal
with that (or indicate so by means of a to-be-added guest feature
flag). Alternatively, a second virtual status register (like CR2)
could be introduced.
Jan Beulich [Wed, 9 Mar 2011 16:15:36 +0000 (16:15 +0000)]
x86: run-time callers of map_pages_to_xen() must check for errors
Again, (out-of-memory) errors must not cause hypervisor crashes, and
hence ought to be propagated.
This also adjusts the cache attribute changing loop in
get_page_from_l1e() to not go through an unnecessary iteration. While
this could be considered mere cleanup, it is actually a requirement
for the subsequent now necessary error recovery path.
Also make a few functions static, easing the check for potential
callers needing adjustment.
Keir Fraser [Tue, 8 Mar 2011 16:30:30 +0000 (16:30 +0000)]
Fix rcu domain locking for transitive grants
When acquiring a transitive grant for copy, the owning domain
needs to be locked down as well as the granting domain. This was being
done, but the unlocking was not. The acquire code now stores the
struct domain * of the owning domain (rather than the domid) in the
active entry in the granting domain. The release code then does the
unlock on the owning domain. Note that I believe I also fixed a bug
where, for non-transitive grants the active entry contained a
reference to the acquiring domain rather than the granting
domain. From my reading of the code this would stop the release code
for transitive grants from terminating its recursion correctly.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Make the DISK_TYPE_* id numbering in tapdisk-disktypes.h contiguous.
Currently, id 8 is unallocated causing a null disk type entry in
tapdisk_disk_drivers array in tapdisk-disktypes.c. This causes the
function tapdisk_disktype_find() to return an error on encountering
disk types >7 (remus:, log:, etc.).
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Daniel Stodden <daniel.stodden@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
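To illustrate why a hole in the id numbering matters, here is a minimal sketch, not the tapdisk code itself. It assumes a driver table indexed by id and a lookup that treats a NULL entry as an error; the names disk_info, gapped, contiguous and find_type are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for a disk type descriptor. */
struct disk_info { const char *name; };

static const struct disk_info aio = { "aio" }, remus = { "remus" };

/* Id 1 is unallocated here, so the table has a NULL hole at index 1. */
static const struct disk_info *gapped[] = { [0] = &aio, [2] = &remus };

/* Contiguous numbering leaves no hole. */
static const struct disk_info *contiguous[] = { [0] = &aio, [1] = &remus };

/* Return the id of the named type, or -1. A NULL entry is treated as an
 * error (an assumption for this sketch), so every type stored after a
 * hole becomes unreachable. */
static int find_type(const struct disk_info **tbl, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i] == NULL)
            return -1;
        if (strcmp(tbl[i]->name, name) == 0)
            return (int)i;
    }
    return -1;
}
```

With the gapped table, looking up "remus" fails even though the entry exists; with the contiguous table it is found at id 1.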
Tim Deegan [Mon, 7 Mar 2011 11:34:09 +0000 (11:34 +0000)]
xen: add "lto=y" option to build Xen with link-time optimizations.
This involves gathering object files from .asm (which will be binary)
and object files from .c (which will be in LTO format) separately
until the final link.
Only tested for x86_64 Xen builds using Clang/LLVM bitcode; it should be
possible to do the same with newer GCCs and GIMPLE.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Mon, 7 Mar 2011 11:21:11 +0000 (11:21 +0000)]
xen: add "clang=y" option to build Xen with clang/llvm instead of gcc.
Tested with svn snapshot of clang and llvm from 17 February 2011.
Only x86_64 hypervisor builds (make dist-xen clang=y) are supported
and I haven't even begun to look at cross-compiling.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Mon, 7 Mar 2011 11:21:11 +0000 (11:21 +0000)]
credit2: remove two nested functions, replacing them with static ones
and passing the outer scope's variables in explicitly.
This is needed to compile xen with clang.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
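The transformation described for credit2 can be sketched on a toy example. This is not the scheduler code; sum_above and consider are hypothetical names. The GNU C nested-function form is shown in the comment, and the portable form passes the outer scope's variables explicitly.

```c
/* GNU C nested-function form (a GCC extension that clang rejects):
 *
 *   int sum_above(const int *v, int n, int threshold)
 *   {
 *       int total = 0;
 *       void consider(int x) { if (x > threshold) total += x; }
 *       for (int i = 0; i < n; i++)
 *           consider(v[i]);
 *       return total;
 *   }
 */

/* Portable form: the helper is static at file scope, and the outer
 * variables it used to capture (threshold, total) are passed in. */
static void consider(int x, int threshold, int *total)
{
    if (x > threshold)
        *total += x;
}

static int sum_above(const int *v, int n, int threshold)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        consider(v[i], threshold, &total);
    return total;
}
```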
Keir Fraser [Sat, 5 Mar 2011 11:34:41 +0000 (11:34 +0000)]
x86: On CPU offline, fix master waiting for slave to be fully dead.
On two back-to-back CPU offline operations, on the second offline the
cpu_state var will still be CPU_STATE_DEAD from the first offline. Hence
__cpu_die() will incorrectly not wait for the second slave to fully
die and set cpu_state itself.
The fix is to set cpu_state to a new value, CPU_STATE_DYING, earlier
during CPU offline, before __cpu_die() starts to execute.
Original diagnosis and patch by Liu, Jinsong <jinsong.liu@intel.com>
Wei Gang [Thu, 3 Mar 2011 18:51:13 +0000 (18:51 +0000)]
tools: gtracestate: fix several problems
Fixed problems include:
* Running without command-line parameters printed an error instead
of the help text.
* -u and -n led to a segmentation fault.
* With -c and the default ranges, the default range was not
50us... but 50000/tsc2us....
Signed-off-by: Wei Gang <gang.wei@intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 3 Mar 2011 17:11:31 +0000 (17:11 +0000)]
libxl: correctly initialise yylineno
Sometimes xl would read an uninitialised variable when printing error
messages, resulting in things like this:
/etc/xen/thing.cfg:1030057088: config parsing error near `"ws08r2-x64-2': lexical error
This is because yylineno is a variable inside the scanner created by
yylex_init, but it is not initialised by yylex_init.
(Debian bug #616099.)
On the way I discovered a lot of complication to do with the calling
convention between bison and flex in reentrant parsers/scanners which
use locations (Debian bug #616100) but as the above change makes the
current code in xen-unstable work I don't propose to do anything else
about that now in our tree.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 3 Mar 2011 17:07:40 +0000 (17:07 +0000)]
xl: add "device_model_args" to pass arbitrary extra arguments to device model
The libxl support was already in place so simply plumb it through.
This allows for passing debug options to the device model and provides
a method to work around missing toolstack functionality.
e.g. xl does not currently support floppy disks but adding:
device_model_args = [ "-fda", "/scratch/fdboot.img" ]
allowed me to boot FreeDOS from a floppy image.
I was unable to find any equivalent functionality in xend so this is a
new xl feature.
Moved xmalloc/xrealloc earlier to allow use from parse_config_data.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxc: fix incorrect scanning of pfn array in pagebuf during migration
xc_domain_restore.c:apply_batch function makes two passes over the pfn_types
array in pagebuf to allocate the needed MFNs. The curbatch parameter to this
function specifies the array offset in pfn_types, from where the current scan
should begin. But this variable is not taken into account (index always starts
at 0) during the two passes. While this [bug] does not manifest itself during
save/restore or live migration, under Remus, xc_domain_restore fails due to
corrupt guest page tables.
(This appears to have been broken by 21588:6c3d8aec202d which reverted
two changesets from before Remus support was added and hence
reintroduced some non-Remus-compatible bits.)
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
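The indexing problem in the apply_batch commit above can be sketched as follows. This is a simplified illustration, not the xc_domain_restore.c code: count_needing_alloc is a hypothetical helper, and treating a non-zero type as "needs an MFN" is an assumption made only for the example.

```c
#include <stddef.h>

/* Scan one batch of a larger type array. 'curbatch' is the offset of the
 * batch within 'types' and 'batch_len' is the number of entries in it. */
static size_t count_needing_alloc(const unsigned int *types,
                                  size_t curbatch, size_t batch_len)
{
    size_t n = 0;

    for (size_t i = 0; i < batch_len; i++) {
        /* The buggy form would read types[i], i.e. always the first
         * batch, whatever 'curbatch' says. */
        if (types[curbatch + i] != 0)
            n++;
    }
    return n;
}
```

When every call starts at offset 0 the two forms agree, which is consistent with the bug staying hidden outside Remus.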
Liu, Jinsong [Wed, 2 Mar 2011 10:31:06 +0000 (10:31 +0000)]
x86: Fix cpu online/offline bug: mce memory leak.
The current Xen MCE logic does not free mcabanks, which leaks memory
whenever a cpu goes offline. Repeated cpu online/offline cycles make
xenpool shrink until, at some point, alloc_heap_pages calls
flush_area_mask, which does ASSERT(local_irq_is_enabled()). However,
cpu online runs with irqs disabled, so the result is a Xen crash.
This patch fixes the memory leak, and tested OK over 50,000 rounds of
cpu online/offline.
Juergen Gross [Mon, 28 Feb 2011 15:09:33 +0000 (15:09 +0000)]
Avoid possible live-lock in vcpu_migrate
If vcpu_migrate is called for two vcpus active on different cpus
resulting in swapping the cpus, a live-lock could occur as both
instances try to take the scheduling lock of the physical cpus in
opposite order.
To avoid this problem the locks are always taken in the same order
(sorted by the address of the lock).
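The address-ordering rule can be sketched with ordinary mutexes. This is a minimal illustration using pthreads rather than Xen's scheduler locks; first_lock, lock_pair and unlock_pair are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

/* Return whichever of the two locks has the lower address. */
static pthread_mutex_t *first_lock(pthread_mutex_t *a, pthread_mutex_t *b)
{
    return ((uintptr_t)a <= (uintptr_t)b) ? a : b;
}

/* Take both locks, lowest address first, so that every caller uses the
 * same order no matter how the arguments are passed. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *lo = first_lock(a, b);
    pthread_mutex_t *hi = (lo == a) ? b : a;

    pthread_mutex_lock(lo);
    if (hi != lo)               /* same lock passed twice: take it once */
        pthread_mutex_lock(hi);
}

/* Release both locks; the release order does not matter for correctness. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (b != a)
        pthread_mutex_unlock(b);
}
```

Two threads calling lock_pair(&x, &y) and lock_pair(&y, &x) both take the lower-addressed lock first, so neither can hold one lock while waiting for the other in the opposite order.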
Ian Jackson [Fri, 25 Feb 2011 18:43:48 +0000 (18:43 +0000)]
xl: allow config filename to precede options
"xm create" supports options which follow the domain config filename.
So xl should do as well.
This is an ad-hoc fixup to the "xl create" command line parser. We
should revisit the xl command line parser in 4.2.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: W. Michael Petullo <mike@flyn.org> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Fri, 25 Feb 2011 17:26:18 +0000 (17:26 +0000)]
libxc: Handle failed xc_get_hvm_param in domain save
The domain save code will read an HVM param, and if it's not zero,
make an entry for it. However, if the hypercall fails for any reason,
the data may not be written, and the value for the previous parameter
may be written in the save file as the parameter that failed.
Initialize the value to zero before each hypercall, so that in case of
a failure, no value will be written.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
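The zero-before-each-query pattern from the domain save fix above can be sketched like this. It is an illustration under stated assumptions, not the libxc code: get_param_fn, demo_get and collect_params are hypothetical, and demo_get simulates a query that fails without writing its output.

```c
#include <stddef.h>

/* Hypothetical query callback: returns 0 on success, and on failure
 * returns -1 without touching *value (as a failed hypercall might). */
typedef int (*get_param_fn)(int param, unsigned long *value);

/* Demo callback for the sketch: parameter 3 always fails. */
static int demo_get(int param, unsigned long *value)
{
    if (param == 3)
        return -1;
    *value = (unsigned long)param * 10;
    return 0;
}

/* Record every parameter whose value is non-zero; returns the number of
 * entries written to out_ids/out_vals. */
static size_t collect_params(get_param_fn get, const int *params, size_t n,
                             int *out_ids, unsigned long *out_vals)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long value = 0;    /* reset before every query */

        get(params[i], &value);     /* on failure, value is still 0 */
        if (value != 0) {
            out_ids[count] = params[i];
            out_vals[count] = value;
            count++;
        }
    }
    return count;
}
```

If value were declared once outside the loop, the failed query for parameter 3 would be recorded with the value left over from parameter 2.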
Fix tapdisk-disktype.c's initialization for remus' disk_info_t,
which is currently initializing the disk name with disk description.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 25 Feb 2011 17:15:20 +0000 (17:15 +0000)]
libxl: Multi-device passthrough coldplug: do not wait for unstarted guest
When doing a PCI passthrough, the code checks to see whether there is
an existing backend directory in xenstore with a nonzero "num_devs".
If there isn't, it creates the backend directory with just the
required device.
If there is, it would assume that it was doing hotplug. If doing
hotplug, it needs to set the "state" node in xenstore to "7"
(reconfiguring) and, to avoid racing with the backend, needs to wait
for the backend to be "4" (connected).
However during guest creation, the presence of "num_devs" doesn't
necessarily mean hotplug. If we are still creating the initial
xenstore setup (ie, adding devices as a subroutine of domain
creation), we can just write the new devices to xenstore. So do that.
This involves adding a new parameter "starting", indicating that we
are still in domain creation, to libxl_device_pci_add_xenstore (a
misnamed internal function) and its callers. Its callers include
libxl_device_pci_add which we therefore split into an internal version
with the new parameter, and an external version used only for hotplug
by libxl-using applications.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 25 Feb 2011 17:13:53 +0000 (17:13 +0000)]
libxl: do not ignore errors from libxl_device_pci_add_xenstore in do_pci_add
Without this, some failures of PCI device passthrough would be
ignored.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 25 Feb 2011 17:03:55 +0000 (17:03 +0000)]
xl/libxl: treat vif "ip" fields as a simple string
Currently we parse the string as an IPv4 address but this does not
handle IPv6. We then format the IP address as a string into xenstore.
Rather than add further parsing and formatting to support IPv6 simply
treat the field as a string, which it turns out is all xend does.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
The memory parity error is only valid for the IBM PC-AT; newer machines
use bit 7 (0x80) of port 0x61 for PCI SERR, while memory errors are
usually reported via MCE.
Rename the memory parity error handler to PCI SERR handler, and
print a warning and continue instead of crashing.
Juergen Gross [Fri, 25 Feb 2011 11:28:15 +0000 (11:28 +0000)]
cpupool: Avoid race when moving cpu between cpupools
Moving cpus between cpupools is done under the schedule lock of the
moved cpu. Checking whether a cpu is a member of a cpupool must
therefore be done with the lock of that cpu held. Hot-unplugging of
physical cpus might encounter the same problems, but this should
happen only very rarely.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Andre Przywara <andre.przywara@amd.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Thu, 24 Feb 2011 09:33:19 +0000 (09:33 +0000)]
amd-k8-mce: remove a stray break statement
This was a leftover of converting from a switch to an if/else
somewhere between 3.4 and 4.0.
It also looks suspicious that MCEQUIRK_K7_BANK0 is not actually used
anywhere. Perhaps amd_k7_mcheck_init() and amd_k8_mcheck_init() were
intended to get (partially) folded?
Ian Campbell [Fri, 18 Feb 2011 15:32:02 +0000 (15:32 +0000)]
libxl/xl: enable support for routed network configurations.
Add "vifscript" option to xl.conf which configures the default vif
script to use (default remains "vif-bridge")
Write each VIF's "ip" option to xenstore so the vif-route script can
pick it up.
Reported-by: W. Michael Petullo <mike@flyn.org>. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
When multiple disks are passed to libxl_wait_for_disk_ejects, watch the
correct disk paths.
Parse the backend type and backend domid from xenstore in
libxl_event_get_disk_eject_info.
libxl_event_get_disk_eject_info must return a valid string in
disk->vdev, whereas at the moment it is freed before returning.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: remove the entries from xenstore when destroying a disk
Currently we only change the backend state, but that is not enough to
entirely destroy a disk device: remove all the entries from xenstore
as well.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reuse the same code used to parse the disk line from the VM config file
in cd_insert.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 17 Feb 2011 19:48:05 +0000 (19:48 +0000)]
docs: vbd-interface.txt: correct behaviour for modern Linux pv-on-hvm
Modern PV on HVM kernels map hd* devices to corresponding xvd*.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 17 Feb 2011 19:40:17 +0000 (19:40 +0000)]
libxl: do slow resume after failed migration attempt
Both of the current callers of libxl_domain_resume call it after
a migration has failed: one on failure to suspend on the sender and
the other on failure to start on the destination, both leading to a
resume attempt on the sender.
However in the first case, failure to suspend, there is no guarantee
that the guest has made it as far as the suspend hypercall and
therefore the fast resume method, which frobs the hypercall return to
indicate a cancelled suspend, cannot safely be used since it will
corrupt %eax/%rax.
For the second case, failure to start on destination, I don't think it
really matters if the resume is fast or slow.
Therefore always use the slow/uncooperative version of xc_domain_resume from
libxl_domain_resume.
This makes a PV domain which failed to suspend (e.g. because the core
Linux PM infrastructure within the guest didn't allow it) recover
gracefully.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tim Deegan [Wed, 16 Feb 2011 09:48:05 +0000 (09:48 +0000)]
x86/shadow: unconditionally set the p2m/log-dirty allocation functions.
Otherwise enabling log-dirty mode on a PV guest that already has
a shadow allocation can leave the alloc/free function pointers NULL,
and later try to dereference them.
p2m internals should always gate on whether HAP is enabled for the
domain, not whether a HAP paging mode is currently advertised.
This lets us revert the change to hap_enable() that advertises the
new mode before it's safe to use it.
docs: document disk configuration string syntax (particularly, xl's syntax)
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Special case how we validate vhd image files. Without this patch when
tap:aio:vhd prefixed image files are specified in the config file,
disk validation and thus vm creation will fail.
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Currently we pile all the backend and format information pertaining to
disk option in a single enum. This check-in separates the two and
uses two enums, one for disk format and another for disk backend.
This helps clearly differentiate between disk format and backend
within the implementation and also helps cleanup the code in this area
in preparation for the impending parser revamping to be done post 4.1.
Along with separating format and backend, this check-in also removes
unwanted types and renames variables in the disk interface and fixes
the code affected by the interface changes.
Specifically, the disk interface changes are as follows. In the
libxl_device_disk structure, physpath was renamed to pdev_path,
virtpath was renamed to vdev, and phystype was removed and replaced
with backend and format enums. Previously a single enum,
libxl_disk_phystype, held the values for qcow, qcow2, vhd, aio, file,
phy and empty; that was refactored into two enums, libxl_disk_format
to hold unknown, qcow, qcow2, vhd, raw and empty, and
libxl_disk_backend to hold unknown, phy, tap and qdisk.
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Tue, 15 Feb 2011 19:39:05 +0000 (19:39 +0000)]
tools: Include cpupool example in /etc/xen
xl cpupool-create at the moment requires a config file. Make
sure to include the example config file in the install.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 14 Feb 2011 17:02:55 +0000 (17:02 +0000)]
libxl: fix migrate for HVM guests
Prior to 22909:6868f7f3ab3f libxl would loop waiting simultaneously
for the domain to acknowledge a PV suspend request (by clearing the
XenStore node) and for the domain to actually suspend. For HVM guests
without PV drivers this same loop was simply waiting for the domain to
suspend.
In 22909:6868f7f3ab3f the original loop was split into two loops
(first waiting for the acknowledgement and then for the actual
suspend). This caused libxl to incorrectly wait for an HVM guest
without PV drivers to acknowledge the XenStore request, which is not
something it would ever do.
Fix this by only waiting for an acknowledgement from a guest which
contains PV drivers.
Previously we were also making the request regardless of whether the
guest had PV drivers; change that to only make the request if the
guest has PV drivers.
Lastly there is no need to sample HVM_PARAM_ACPI_S_STATE twice and not
doing so simplifies the test for PVHVM vs. normal HVM guests.
Tested with:
Windows with GPL PV drivers (event channel suspend mode)
Windows without PV drivers (xc_domain_shutdown mode)
Linux PV (PV with XenBus control node mode)
Linux HVM (PVHVM with XenBus control node mode (*))
Linux HVM (xc_domain_shutdown mode)
(*) In this case the kernel didn't actually suspend, due to:
PM: Device input1 failed to suspend: error -22
xen suspend: dpm_suspend_start -22
which may be a misconfiguration in my setup or may be a kernel
bug, but the libxl side dealt with this as gracefully as it could.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Juergen Gross [Mon, 14 Feb 2011 16:56:20 +0000 (16:56 +0000)]
xl: Support more than 32 vcpus for xl vcpu-set
xl vcpu-set currently uses a 32 bit mask for specifying which cpus are to be
set online. This restricts the number of cpus supported by this command.
The patch switches to libxl_cpumap, the interface of libxl_set_vcpuonline()
is changed accordingly.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
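The limitation of a fixed 32-bit mask, and the general shape of a variable-length bitmap that avoids it, can be sketched as below. This is only an illustration of the idea: struct cpu_bitmap and its helpers are hypothetical and do not reproduce the layout or API of libxl_cpumap.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical byte-array bitmap: one bit per cpu, sized at run time,
 * so cpu numbers of 32 and above are representable. */
struct cpu_bitmap {
    size_t nbytes;
    uint8_t *bits;
};

/* Allocate a zeroed bitmap for max_cpus cpus; returns 0 or -1. */
static int cpu_bitmap_alloc(struct cpu_bitmap *m, unsigned int max_cpus)
{
    m->nbytes = ((size_t)max_cpus + 7) / 8;
    m->bits = calloc(m->nbytes, 1);
    return m->bits ? 0 : -1;
}

/* Mark one cpu; the caller must keep cpu below max_cpus. */
static void cpu_bitmap_set(struct cpu_bitmap *m, unsigned int cpu)
{
    m->bits[cpu / 8] |= (uint8_t)(1u << (cpu % 8));
}

/* Return 1 if the cpu is marked, 0 otherwise. */
static int cpu_bitmap_test(const struct cpu_bitmap *m, unsigned int cpu)
{
    return (m->bits[cpu / 8] >> (cpu % 8)) & 1;
}

static void cpu_bitmap_free(struct cpu_bitmap *m)
{
    free(m->bits);
    m->bits = NULL;
    m->nbytes = 0;
}
```

A uint32_t mask has no bit for vcpu 40, whereas a bitmap allocated for 64 cpus does.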
Juergen Gross [Mon, 14 Feb 2011 16:55:00 +0000 (16:55 +0000)]
xl: correct xl cpupool-create with extra parameters
xl cpupool-create does not always accept extra parameters specified on the
command line, as a 0-byte is missing at the end of the configuration file contents.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
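Why the terminating 0-byte matters can be sketched as follows. This is an illustration under assumptions, not the xl code: join_config is a hypothetical helper that combines raw file contents (which carry a length but no terminator) with extra command-line text into one properly terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Join 'len' bytes of file data and a NUL-terminated 'extra' string,
 * separated by a newline. The file data is not assumed to be
 * NUL-terminated, so it is copied by length and the terminator is
 * written explicitly. Returns a malloc'd string, or NULL on failure. */
static char *join_config(const char *data, size_t len, const char *extra)
{
    size_t extra_len = strlen(extra);
    char *buf = malloc(len + 1 + extra_len + 1);

    if (!buf)
        return NULL;
    memcpy(buf, data, len);
    buf[len] = '\n';
    memcpy(buf + len + 1, extra, extra_len);
    buf[len + 1 + extra_len] = '\0';    /* the 0-byte string functions rely on */
    return buf;
}
```

Any code that applies strlen() or strcat() directly to the raw file buffer reads past its end unless a 0-byte has been appended first.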
This is the equivalent of xm trigger s3resume and it is implemented the
same way: using an ACPI state change.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Gang [Mon, 14 Feb 2011 10:41:12 +0000 (10:41 +0000)]
x86: Fix S3 resume for HPET MSI IRQ case
Jan Beulich found that for S3 resume on platforms without the ARAT
feature but with an MSI-capable HPET, request_irq() is called in
hpet_setup_msi_irq() for an irq that is already set up (no
release_irq() is called during S3 suspend), so the code always falls
back to using legacy_hpet_event.
Fix it by conditionally calling request_irq() for 4.1. The plan is to
split the S3 resume path from the boot path post 4.1, as Jan suggested.
Signed-off-by: Wei Gang <gang.wei@intel.com> Acked-by: Jan Beulich <jbeulich@novell.com>
Ian Jackson [Fri, 11 Feb 2011 18:21:35 +0000 (18:21 +0000)]
tools/hotplug/Linux: Use correct device name for vifs in setup scripts
In vif-common.sh, set the shell variable "dev" to the new interface
name when interfaces are renamed, and consistently use this variable
in all the vif scripts.
This fixes hotplug of renamed interfaces.
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
From: Patrick Scharrenberg <pittipatti@web.de> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Patrick Scharrenberg <pittipatti@web.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 11 Feb 2011 17:57:32 +0000 (17:57 +0000)]
libxl/xl: improve behaviour when guest fails to suspend itself.
The PV suspend protocol requires guest co-operation, whereby the guest
must respond to a suspend request written to the xenstore control node
by clearing the node and then making a suspend hypercall.
Currently when a guest fails to do this libxl times out and returns
a generic failure code to the caller.
In response to this failure xl attempts to resume the guest. However
if the guest has not responded to the suspend request then there is no
guarantee that the guest has made the suspend hypercall (in fact it is
quite unlikely). Since the resume process attempts to modify the
return value of the hypercall (to indicate a cancelled suspend) this
results in the guest eax/rax register being corrupted!
To fix this change libxl to do the following:
* Wait for the guest to acknowledge the suspend request.
- on timeout cancel the suspend request.
- if cancellation is successful then return a new error code to
indicate that the guest is not responding.
- if the cancel does not succeed then we raced with the guest
which actually did acknowledge at the last minute, so
continue.
* Wait for the guest to suspend.
- on timeout return the standard error code as before
* Guest successfully suspended, return success.
Lastly in xl do not attempt to resume a guest if it has not responded
to the suspend request.
Tested by live migration of PVops kernels which ignore the suspend
request, which have already crashed, and which suspend/resume
correctly. In the first two cases the source domain is left alone (and
continues to function in the first case) and in the third the
migration is successful.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
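The wait, cancel, wait sequence listed in the commit above can be sketched as a small decision function. This is a simplified model, not the libxl implementation: the suspend_ops callbacks, the demo stubs yes/no and the result names are all hypothetical.

```c
/* Hypothetical result codes for the sketch. */
enum suspend_result {
    SUSPEND_OK = 0,
    SUSPEND_FAILED = -1,            /* the "standard" failure */
    SUSPEND_GUEST_UNRESPONSIVE = -2 /* new code: request never acknowledged */
};

/* Hypothetical hooks; each returns 1 for yes and 0 for no/timeout. */
struct suspend_ops {
    int (*wait_ack)(void *ctx);       /* did the guest acknowledge in time? */
    int (*cancel_request)(void *ctx); /* did we manage to cancel the request? */
    int (*wait_suspended)(void *ctx); /* did the guest suspend in time? */
};

/* Demo stubs so the sketch can be exercised without a guest. */
static int yes(void *ctx) { (void)ctx; return 1; }
static int no(void *ctx)  { (void)ctx; return 0; }

static enum suspend_result suspend_guest(const struct suspend_ops *ops,
                                         void *ctx)
{
    /* Step 1: wait for the acknowledgement; on timeout try to cancel. */
    if (!ops->wait_ack(ctx) && ops->cancel_request(ctx))
        return SUSPEND_GUEST_UNRESPONSIVE;

    /* Either the guest acknowledged, or the cancel lost the race with a
     * last-minute acknowledgement: continue in both cases. */

    /* Step 2: wait for the guest to actually suspend. */
    if (!ops->wait_suspended(ctx))
        return SUSPEND_FAILED;

    return SUSPEND_OK;
}
```

A caller that gets the unresponsive result knows the guest never reached the suspend hypercall, so it can skip the resume step that would otherwise corrupt the guest's eax/rax.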
Ian Campbell [Fri, 11 Feb 2011 17:56:24 +0000 (17:56 +0000)]
libxl: allow guest to write "control/shutdown" xenstore node.
The PV shutdown/reboot/suspend protocol requires that the guest
acknowledge a request by clearing the node therefore it is necessary
to allow the guest to write to the node.
Currently libxl is quite relaxed about this protocol and doesn't
really seem to mind that the guest is unable to write the node to
perform the acknowledgement. However in a followup patch libxl needs
to be able to detect that a guest has acknowledged a suspend request.
A side effect of this change is that an empty "control/shutdown" node
is created upon domain creation instead of only being created when a
shutdown/reboot/suspend is requested. This should not (and does not
in my tests) have any negative impact on the guest.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: do not call libxl__file_reference_unmap twice
Fix double free due to libxl__file_reference_unmap(&info->kernel) called
multiple times: first at the end of libxl__domain_build and then in
libxl_domain_build_info_destroy.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 11 Feb 2011 17:49:13 +0000 (17:49 +0000)]
libxc: increase lzma max memory constant to 128Mby
According to lzma's configure.ac (!) the minimum memory limit to cope
with arbitrary input is 128Mby (!)
This is obviously an unreasonable amount of memory for this kind of
task, but we need to increase the constant limit for it not to
randomly fail. So do so.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 10 Feb 2011 14:19:54 +0000 (14:19 +0000)]
x86: suppress HPET broadcast initialization in the presence of ARAT
This follows Linux commit 39fe05e58c5e448601ce46e6b03900d5bf31c4b0,
noticing that all this setup is pointless when ARAT support is there,
and knowing that on SLED11's native kernel it has actually caused S3
resume issues.
A question would be whether HPET legacy interrupts should be forced
off in this case (rather than leaving whatever came from firmware).
Keir Fraser [Thu, 10 Feb 2011 14:19:23 +0000 (14:19 +0000)]
x86: tighten conditions under which writing certain MSRs is permitted
MSRs that control physical CPU aspects generally are pointless (and
possibly dangerous) to be written when the writer isn't sufficiently
aware that it's running virtualized.