Ian Jackson [Fri, 25 Feb 2011 18:43:48 +0000 (18:43 +0000)]
xl: allow config filename to precede options
"xm create" supports options which follow the domain config filename.
So xl should do as well.
This is an ad-hoc fixup to the "xl create" command line parser. We
should revisit the xl command line parser in 4.2.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: W. Michael Petullo <mike@flyn.org> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Fri, 25 Feb 2011 17:26:18 +0000 (17:26 +0000)]
libxc: Handle failed xc_get_hvm_param in domain save
The domain save code will read an HVM param, and if it's not zero,
make an entry for it. However, if the hypercall fails for any reason,
the data may not be written, and the value for the previous parameter
may be written in the save file as the parameter that failed.
Initialize the value to zero before each hypercall, so that in case of
a failure, no value will be written.
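
The failure mode can be sketched in a few lines of C. This is only an illustration: fake_get_param() is a hypothetical stand-in for the real hypercall wrapper, and the parameter values are made up. The point is the reset of "value" before every call, so a failed call cannot leak the previous parameter's value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a parameter query that can fail.
 * On failure it leaves *value untouched, like a failed hypercall. */
static int fake_get_param(int index, uint64_t *value)
{
    static const uint64_t params[] = { 0x1234, 0, 0x5678 };

    if (index == 1)
        return -1;              /* simulate a failing call */
    *value = params[index];
    return 0;
}

/* Collect the non-zero parameters into out[]; returns how many were saved. */
size_t save_params(uint64_t *out, size_t max)
{
    uint64_t value;
    size_t n = 0;

    assert(out != NULL);
    for (int i = 0; i < 3 && n < max; i++) {
        value = 0;              /* reset before every call */
        fake_get_param(i, &value);
        if (value != 0)
            out[n++] = value;
    }
    return n;
}
```

Without the reset, the failing call at index 1 would cause 0x1234 to be recorded a second time.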
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Fix tapdisk-disktype.c's initialization for remus' disk_info_t,
which is currently initializing the disk name with disk description.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 25 Feb 2011 17:15:20 +0000 (17:15 +0000)]
libxl: Multi-device passthrough coldplug: do not wait for unstarted guest
When doing a PCI passthrough, the code checks to see whether there is
an existing backend directory in xenstore with a nonzero "num_devs".
If there isn't, it creates the backend directory with just the
required device.
If there is, it would assume that it was doing hotplug. If doing
hotplug, it needs to set the "state" node in xenstore to "7"
(reconfiguring) and, to avoid racing with the backend, needs to wait
for the backend to be "4" (connected).
However during guest creation, the presence of "num_devs" doesn't
necessarily mean hotplug. If we are still creating the initial
xenstore setup (ie, adding devices as a subroutine of domain
creation), we can just write the new devices to xenstore. So do that.
This involves adding a new parameter "starting", indicating that we
are still in domain creation, to libxl_device_pci_add_xenstore (a
misnamed internal function) and its callers. Its callers include
libxl_device_pci_add which we therefore split into an internal version
with the new parameter, and an external version used only for hotplug
by libxl-using applications.
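
The decision described above can be summarised as a small sketch. The enum and function names are hypothetical and are not the libxl identifiers; only the state values 7 and 4 come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Xenbus state values quoted in the description above. */
enum { STATE_CONNECTED = 4, STATE_RECONFIGURING = 7 };

enum pci_add_action {
    PCI_CREATE_BACKEND,     /* no backend directory yet: create it with this device */
    PCI_WRITE_DEVICE_ONLY,  /* domain still being created: just add the device */
    PCI_HOTPLUG             /* running guest: set state 7, then wait for state 4 */
};

/* Pick what to do for one device, given the current "num_devs" and
 * whether we are still inside domain creation. */
enum pci_add_action choose_pci_add_action(int num_devs, bool starting)
{
    assert(num_devs >= 0);
    if (num_devs == 0)
        return PCI_CREATE_BACKEND;
    if (starting)
        return PCI_WRITE_DEVICE_ONLY;
    return PCI_HOTPLUG;
}
```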
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 25 Feb 2011 17:13:53 +0000 (17:13 +0000)]
libxl: do not ignore errors from libxl_device_pci_add_xenstore in do_pci_add
Without this, some failures of PCI device passthrough would be
ignored.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 25 Feb 2011 17:03:55 +0000 (17:03 +0000)]
xl/libxl: treat vif "ip" fields as a simple string
Currently we parse the string as an IPv4 address but this does not
handle IPv6. We then format the IP address as a string into xenstore.
Rather than add further parsing and formatting to support IPv6 simply
treat the field as a string, which it turns out is all xend does.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Memory parity error is only valid for IBM PC-AT; newer machines use
bit 7 (0x80) of port 0x61 for PCI SERR, while memory errors are
usually reported via MCE.
Rename the memory parity error handler to a PCI SERR handler, and
print a warning and continue instead of crashing.
Juergen Gross [Fri, 25 Feb 2011 11:28:15 +0000 (11:28 +0000)]
cpupool: Avoid race when moving cpu between cpupools
Moving cpus between cpupools is done under the schedule lock of the
moved cpu. When checking whether a cpu is a member of a cpupool, this
must be done with the lock of that cpu held. Hot-unplugging of physical
cpus might encounter the same problems, but this should happen only
very rarely.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Andre Przywara <andre.przywara@amd.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Thu, 24 Feb 2011 09:33:19 +0000 (09:33 +0000)]
amd-k8-mce: remove a stray break statement
This was a leftover of converting from a switch to an if/else
somewhere between 3.4 and 4.0.
It also looks suspicious that MCEQUIRK_K7_BANK0 is not actually used
anywhere. Perhaps amd_k7_mcheck_init() and amd_k8_mcheck_init() were
intended to get (partially) folded?
Ian Campbell [Fri, 18 Feb 2011 15:32:02 +0000 (15:32 +0000)]
libxl/xl: enable support for routed network configurations.
Add "vifscript" option to xl.conf which configures the default vif
script to use (default remains "vif-bridge")
Write each VIF's "ip" option to xenstore so the vif-route script can
pick it up.
Reported-by: W. Michael Petullo <mike@flyn.org>. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
When multiple disks are passed to libxl_wait_for_disk_ejects, watch the
correct disk paths.
Parse the backend type and backend domid from xenstore in
libxl_event_get_disk_eject_info.
libxl_event_get_disk_eject_info must return a valid string in
disk->vdev, whereas at the moment it is freed before returning.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: remove the entries from xenstore when destroying a disk
Currently we only change the backend state, but that is not enough to
entirely destroy a disk device: remove all the entries from xenstore
as well.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reuse the same code used to parse the disk line from the VM config file
in cd_insert.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 17 Feb 2011 19:48:05 +0000 (19:48 +0000)]
docs: vbd-interface.txt: correct behaviour for modern Linux pv-on-hvm
Modern PV on HVM kernels map hd* devices to corresponding xvd*.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 17 Feb 2011 19:40:17 +0000 (19:40 +0000)]
libxl: do slow resume after failed migration attempt
Both of the current callers of libxl_domain_resume call it after
a migration has failed: one is failure to suspend on the sender and
the other is failure to start on the destination, both leading to a
resume attempt on the sender.
However in the first case, failure to suspend, there is no guarantee
that the guest has made it as far as the suspend hypercall and
therefore the fast resume method, which frobs the hypercall return to
indicate a cancelled suspend, cannot safely be used since it will
corrupt %eax/%rax.
For the second case, failure to start on destination, I don't think it
really matters if the resume is fast or slow.
Therefore always use the slow/uncooperative version of xc_domain_resume from
libxl_domain_resume.
This makes a PV domain which failed to suspend (e.g. because the core
Linux PM infrastructure within the guest didn't allow it) recover
gracefully.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tim Deegan [Wed, 16 Feb 2011 09:48:05 +0000 (09:48 +0000)]
x86/shadow: unconditionally set the p2m/log-dirty allocation functions.
Otherwise enabling log-dirty mode on a PV guest that already has
a shadow allocation can leave the alloc/free function pointers NULL,
and later try to dereference them.
p2m internals should always gate on whether HAP is enabled for the
domain, not whether a HAP paging mode is currently advertised.
This lets us revert the change to hap_enable() that advertises the
new mode before it's safe to use it.
docs: document disk configuration string syntax (particularly, xl's syntax)
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Special case how we validate vhd image files. Without this patch when
tap:aio:vhd prefixed image files are specified in the config file,
disk validation and thus vm creation will fail.
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Currently we pile all the backend and format information pertaining to
disk option in a single enum. This check-in separates the two and
uses two enums, one for disk format and another for disk backend.
This helps clearly differentiate between disk format and backend
within the implementation and also helps cleanup the code in this area
in preparation for the impending parser revamping to be done post 4.1.
Along with separating format and backend, this check-in also removes
unwanted types and renames variables in the disk interface and fixes
the code affected by the interface changes.
Specifically, here are the disk interface changes made. In
libxl_device_disk structure physpath was renamed to pdev_path,
virtpath was renamed to vdev, phystype was removed and replaced with
backend and format enums. Also previously a single enum
libxl_disk_phystype held the values for qcow, qcow2, vhd, aio, file,
phy, empty and that got refactored into two enums, libxl_disk_format
to hold unknown, qcow, qcow2, vhd, raw, empty and libxl_disk_backend
to hold unknown, phy, tap and qdisk.
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Tue, 15 Feb 2011 19:39:05 +0000 (19:39 +0000)]
tools: Include cpupool example in /etc/xen
xl cpupool-create at the moment requires a config file. Make
sure to include the example config file in the install.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 14 Feb 2011 17:02:55 +0000 (17:02 +0000)]
libxl: fix migrate for HVM guests
Prior to 22909:6868f7f3ab3f libxl would loop waiting simultaneously
for the domain to acknowledge a PV suspend request (by clearing the
XenStore node) and for the domain to actually suspend. For HVM guests
without PV drivers this same loop was simply waiting for the domain to
suspend.
In 22909:6868f7f3ab3f the original loop was split into two loops
(first waiting for the acknowledgement and then for the actual
suspend). This caused libxl to incorrectly wait for an HVM guest
without PV drivers to acknowledge the XenStore request, which is not
something it would ever do.
Fix this by only waiting for an acknowledgement from a guest which
contains PV drivers.
Previously we were also making the request regardless of whether the
guest had PV drivers; change that to only make the request if the
guest has PV drivers.
Lastly there is no need to sample HVM_PARAM_ACPI_S_STATE twice and not
doing so simplifies the test for PVHVM vs. normal HVM guests.
Tested with:
Windows with GPL PV drivers (event channel suspend mode)
Windows without PV drivers (xc_domain_shutdown mode)
Linux PV (PV with XenBus control node mode)
Linux HVM (PVHVM with XenBus control node mode (*))
Linux HVM (xc_domain_shutdown mode)
(*) In this case the kernel didn't actually suspend, due to:
PM: Device input1 failed to suspend: error -22
xen suspend: dpm_suspend_start -22
which may be a misconfiguration in my setup or may be a kernel
bug, but the libxl side dealt with this as gracefully as it could.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Juergen Gross [Mon, 14 Feb 2011 16:56:20 +0000 (16:56 +0000)]
xl: Support more than 32 vcpus for xl vcpu-set
xl vcpu-set currently uses a 32 bit mask for specifying which cpus are to be
set online. This restricts the number of cpus supported by this command.
The patch switches to libxl_cpumap, the interface of libxl_set_vcpuonline()
is changed accordingly.
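
A byte-array bitmap removes the 32-vcpu ceiling of a single 32 bit mask. The sketch below is an assumption about the general shape of such a map, not the libxl_cpumap definition itself; struct cpu_bitmap and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cpu bitmap: "size" is the length of "map" in bytes. */
struct cpu_bitmap {
    uint32_t size;
    uint8_t *map;
};

/* Allocate a zeroed bitmap large enough for max_cpus bits. */
int cpu_bitmap_alloc(struct cpu_bitmap *b, unsigned int max_cpus)
{
    assert(b != NULL);
    b->size = (max_cpus + 7) / 8;
    b->map = calloc(b->size ? b->size : 1, 1);
    return b->map ? 0 : -1;
}

/* Mark one cpu as set; out-of-range cpus are ignored. */
void cpu_bitmap_set(struct cpu_bitmap *b, unsigned int cpu)
{
    if (cpu / 8 < b->size)
        b->map[cpu / 8] |= (uint8_t)(1u << (cpu % 8));
}

/* Return 1 if the cpu is set, 0 otherwise (including out of range). */
int cpu_bitmap_test(const struct cpu_bitmap *b, unsigned int cpu)
{
    return cpu / 8 < b->size && ((b->map[cpu / 8] >> (cpu % 8)) & 1);
}
```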
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Juergen Gross [Mon, 14 Feb 2011 16:55:00 +0000 (16:55 +0000)]
xl: correct xl cpupool-create with extra parameters
xl cpupool-create won't always take extra parameters specified on the command
line, as a 0-byte is missing at the end of the configuration file contents.
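
Why the terminator matters can be shown with a minimal sketch, assuming the config file is read into a heap buffer and extra parameters are appended with string functions. Both helper names are hypothetical; this is not the xl code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole stream into a freshly allocated, NUL-terminated buffer.
 * Returns NULL on failure; the caller frees the result. */
char *read_all_terminated(FILE *f, size_t *len_out)
{
    size_t cap = 256, len = 0;
    char *buf;

    assert(f != NULL && len_out != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        /* Always leave one spare byte for the terminator. */
        size_t got = fread(buf + len, 1, cap - len - 1, f);
        len += got;
        if (got == 0)
            break;
        if (cap - len <= 1) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(f)) { free(buf); return NULL; }
    buf[len] = '\0';            /* the 0-byte the string functions rely on */
    *len_out = len;
    return buf;
}

/* Append one extra "key=value" line to a NUL-terminated config buffer.
 * On failure NULL is returned and the original buffer is left untouched. */
char *append_param(char *cfg, const char *extra)
{
    size_t a = strlen(cfg), b = strlen(extra);
    char *tmp = realloc(cfg, a + b + 2);

    if (tmp == NULL)
        return NULL;
    memcpy(tmp + a, extra, b);
    tmp[a + b] = '\n';
    tmp[a + b + 1] = '\0';
    return tmp;
}
```

Without the terminating byte, strlen() in append_param() would run past the end of the file contents.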
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
This is the equivalent of xm trigger s3resume and it is implemented the
same way: using an ACPI state change.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Gang [Mon, 14 Feb 2011 10:41:12 +0000 (10:41 +0000)]
x86: Fix S3 resume for HPET MSI IRQ case
Jan Beulich found that for S3 resume on platforms without the ARAT feature
but with an MSI-capable HPET, request_irq() will be called in
hpet_setup_msi_irq() for an irq that is already set up (no release_irq() is
called during S3 suspend), so that it always falls back to using
legacy_hpet_event.
Fix it by conditionally calling request_irq() for 4.1. The plan is to split
the S3 resume path from the boot path post 4.1, as Jan suggested.
Signed-off-by: Wei Gang <gang.wei@intel.com> Acked-by: Jan Beulich <jbeulich@novell.com>
Ian Jackson [Fri, 11 Feb 2011 18:21:35 +0000 (18:21 +0000)]
tools/hotplug/Linux: Use correct device name for vifs in setup scripts
In vif-common.sh, set the shell variable "dev" to the new interface
name when interfaces are renamed, and consistently use this variable
in all the vif scripts.
This fixes hotplug of renamed interfaces.
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
From: Patrick Scharrenberg <pittipatti@web.de> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Patrick Scharrenberg <pittipatti@web.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 11 Feb 2011 17:57:32 +0000 (17:57 +0000)]
libxl/xl: improve behaviour when guest fails to suspend itself.
The PV suspend protocol requires guest co-operation whereby the guest
must respond to a suspend request written to the xenstore control node
by clearing the node and then making a suspend hypercall.
Currently when a guest fails to do this libxl times out and returns
a generic failure code to the caller.
In response to this failure xl attempts to resume the guest. However
if the guest has not responded to the suspend request then there is no
guarantee that the guest has made the suspend hypercall (in fact it is
quite unlikely). Since the resume process attempts to modify the
return value of the hypercall (to indicate a cancelled suspend) this
results in the guest eax/rax register being corrupted!
To fix this change libxl to do the following:
* Wait for the guest to acknowledge the suspend request.
- on timeout cancel the suspend request.
- if cancellation is successful then return a new error code to
indicate that the guest is not responding.
- if the cancel does not succeed then we raced with the guest
which actually did acknowledge at the last minute, so
continue.
* Wait for the guest to suspend.
- on timeout return the standard error code as before
* Guest successfully suspended, return success.
Lastly in xl do not attempt to resume a guest if it has not responded
to the suspend request.
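
The steps above can be sketched as follows. Everything here is hypothetical naming: struct guest_model replaces the real xenstore and domain-state polling with three flags, and the error code names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum suspend_result {
    SUSPEND_OK = 0,
    SUSPEND_ERR_GENERIC = -1,        /* the standard error code, as before */
    SUSPEND_ERR_NOT_RESPONDING = -2  /* hypothetical name for the new code */
};

/* Toy model of guest behaviour, standing in for real polling. */
struct guest_model {
    bool acks_in_time;      /* clears the control node before the timeout */
    bool acks_late;         /* acknowledges just as we try to cancel */
    bool suspends_in_time;  /* makes the suspend hypercall before the timeout */
};

/* Cancelling only succeeds if the guest has not consumed the request. */
static bool cancel_request(const struct guest_model *g)
{
    return !g->acks_late;
}

enum suspend_result request_suspend(const struct guest_model *g)
{
    assert(g != NULL);
    if (!g->acks_in_time) {
        if (cancel_request(g))
            return SUSPEND_ERR_NOT_RESPONDING;
        /* Cancel failed: we raced with a last-minute ack, so continue. */
    }
    if (!g->suspends_in_time)
        return SUSPEND_ERR_GENERIC;
    return SUSPEND_OK;
}

/* Caller side: only resume a guest that got as far as acknowledging. */
bool should_resume_after(enum suspend_result r)
{
    return r == SUSPEND_ERR_GENERIC;
}
```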
Tested by live migration of PVops kernels which either ignore the
suspend request, have already crashed and those which suspend/resume
correctly. In the first two cases the source domain is left alone (and
continues to function in the first case) and in the third the
migration is successful.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 11 Feb 2011 17:56:24 +0000 (17:56 +0000)]
libxl: allow guest to write "control/shutdown" xenstore node.
The PV shutdown/reboot/suspend protocol requires that the guest
acknowledge a request by clearing the node therefore it is necessary
to allow the guest to write to the node.
Currently libxl is quite relaxed about this protocol and doesn't
really seem to mind that the guest is unable to write the node to
perform the acknowledgement. However in a followup patch libxl needs
to be able to detect that a guest has acknowledged a suspend request.
A side effect of this change is that an empty "control/shutdown" node
is created upon domain creation instead of only being created when a
shutdown/reboot/suspend is requested. This should not (and does not
in my tests) have any negative impact on the guest.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: do not call libxl__file_reference_unmap twice
Fix double free due to libxl__file_reference_unmap(&info->kernel) called
multiple times: first at the end of libxl__domain_build and then in
libxl_domain_build_info_destroy.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 11 Feb 2011 17:49:13 +0000 (17:49 +0000)]
libxc: increase lzma max memory constant to 128Mby
According to lzma's configure.ac (!) the minimum memory limit to cope
with arbitrary input is 128Mby (!)
This is obviously an unreasonable amount of memory for this kind of
task, but we need to increase the constant limit for it not to
randomly fail. So do so.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 10 Feb 2011 14:19:54 +0000 (14:19 +0000)]
x86: suppress HPET broadcast initialization in the presence of ARAT
This follows Linux commit 39fe05e58c5e448601ce46e6b03900d5bf31c4b0,
noticing that all this setup is pointless when ARAT support is there,
and knowing that on SLED11's native kernel it has actually caused S3
resume issues.
A question would be whether HPET legacy interrupts should be forced
off in this case (rather than leaving whatever came from firmware).
Keir Fraser [Thu, 10 Feb 2011 14:19:23 +0000 (14:19 +0000)]
x86: tighten conditions under which writing certain MSRs is permitted
MSRs that control physical CPU aspects generally are pointless (and
possibly dangerous) to be written when the writer isn't sufficiently
aware that it's running virtualized.
Juergen Gross [Thu, 10 Feb 2011 09:02:50 +0000 (09:02 +0000)]
Cpupools: vcpu affinity handling
If a vcpu is pinned to multiple physical cpus, the pinning is not
removed if all those physical cpus are removed from the cpupool. When
disabling the scheduler on a cpu, the affinity mask must be checked
against the cpumask of the cpupool.
Wei Wang [Wed, 9 Feb 2011 08:57:12 +0000 (08:57 +0000)]
amd iommu: dynamic page table depth adjustment.
IO Page table growth is triggered by amd_iommu_map_page and grows to
upper level. I have tested it well for different devices (nic and gfx)
and different guests (linux and Win7) with different guest memory
sizes (512M, 1G, 4G and above).
Keir Fraser [Wed, 9 Feb 2011 08:44:38 +0000 (08:44 +0000)]
cpupool: Strict parameter checking for cpupool operations
Some cpupool actions didn't check the cpupool_id exactly. For some
actions this doesn't make any sense, so refuse those actions if the
specified cpupool doesn't exist.
Keir Fraser [Wed, 9 Feb 2011 08:40:05 +0000 (08:40 +0000)]
[VTD][QUIRK] add spin lock across snb pre/postamble functions
Added a spinlock across snb_vtd_ops_preamble() and
snb_vtd_ops_postamble() to make modifications to IGD registers atomic.
Continue keeping snb_igd_quirk default off.
James Harper [Tue, 8 Feb 2011 16:35:35 +0000 (16:35 +0000)]
xend: canonicalise symlinks found in /dev for vbds (helps vscsi)
By default, vscsi expects to be passed the final device name (eg
/dev/st3) instead of one of the various udev symlinks (eg
/dev/tape/by-path/pci-0000:01:08.0-scsi-0:0:2:0-st). The following patch
resolves the path to the real path if the name starts with /dev/
Signed-off-by: James Harper <james.harper@bendigoit.com.au> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: initialise some variables in print_bitmap, to suppress bogus warning
GCC 4.2.4 cannot figure out that three variables aren't used before
initialisation:
xl_cmdimpl.c: In function `print_domain_vcpuinfo':
xl_cmdimpl.c:3351: warning: `firstset' may be used uninitialized in this function
[etc]
Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com> Acked-by: Andre Przywara <andre.przywara@amd.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Olaf Hering [Mon, 7 Feb 2011 16:55:25 +0000 (16:55 +0000)]
tools/hotplug: set mtu from bridge also on vif interface
Apply mtu size from bridge interface also in vif interface.
This depends on a kernel change which allows arbitrary mtu sizes until
the frontend driver has connected to the backend driver. Without this
kernel change, the vif mtu size will be limited to 1500 even with this
change to the vif-bridge script.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Mon, 7 Feb 2011 15:04:32 +0000 (15:04 +0000)]
cpupool: Correct cpupool diag printing
Some of the cpupool_dprintk() calls are using undefined or
uninitialized variables. Correct the argument lists to be able to
define cpupool_printk as printk.
Ian Campbell [Mon, 7 Feb 2011 12:13:24 +0000 (12:13 +0000)]
minios: do not export {test,set,clear}_bit etc to applications
Fixes ioemu stubdom build:
CC i386-stubdom/piix4acpi.o
[...]/stubdom/ioemu/hw/piix4acpi.c:272: error: expected ')' before '?' token
[...]/stubdom/ioemu/hw/piix4acpi.c:277: error: conflicting types for 'set_bit'
[...]/stubdom/../extras/mini-os/include/x86/mini-os/os.h:396: error: previous definition of 'set_bit' was here
[...]/stubdom/ioemu/hw/piix4acpi.c:282: error: conflicting types for 'clear_bit'
[...]/stubdom/../extras/mini-os/include/x86/mini-os/os.h:414: error: previous definition of 'clear_bit' was here
[...]/stubdom/ioemu/hw/piix4acpi.c: In function 'gpe_sts_write':
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Mon, 7 Feb 2011 09:58:11 +0000 (09:58 +0000)]
Pack some hvmop memory structures better
Some of the hvmop memory structures have a shocking amount of
unnecessary padding in them. Elements which can have only 3 values
are given 64 bits of memory, and then aligned (so that there is
padding behind them).
This patch resizes and reorganizes in the following way, (hopefully)
without introducing any differences between the layout for 32- and
64-bit.
xen_hvm_set_mem_type:
hvmmem_type -> 16 bits
nr -> 32 bits (limiting us to setting 16TB at a time)
xen_hvm_set_mem_access:
hvmmem_access -> 16 bits
nr -> 32 bits
xen_hvm_get_mem_access:
hvmmem_access -> 16 bits
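
The effect of the resizing can be illustrated with two layouts for the set_mem_type case. These structs are a sketch: only the hvmmem_type and nr widths come from the list above, while the domid and first_pfn fields and their order are assumptions. The 16TB figure assumes 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative "before" layout: a three-valued type stored in 64 bits. */
struct set_mem_type_wide {
    uint16_t domid;
    uint64_t hvmmem_type;
    uint64_t first_pfn;
    uint64_t nr;
};

/* Illustrative "after" layout: small fields grouped, no internal padding. */
struct set_mem_type_packed {
    uint16_t domid;
    uint16_t hvmmem_type;   /* 16 bits */
    uint32_t nr;            /* 32 bits */
    uint64_t first_pfn;
};

/* Largest range one call can cover with a 32-bit page count and the
 * given page shift (12 for 4 KiB pages). */
uint64_t max_bytes_per_call(unsigned int page_shift)
{
    assert(page_shift < 32);
    return ((uint64_t)UINT32_MAX + 1) << page_shift;
}
```

With 4 KiB pages, max_bytes_per_call(12) is 2^44 bytes, which is the 16TB limit mentioned for nr.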
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Sun, 6 Feb 2011 17:26:31 +0000 (17:26 +0000)]
hvm: fix XSAVE leaf 0 EBX size calculation
Fixes a size calculation bug when enabled bits in XFEATURE_MASK (xcr0)
aren't contiguous.
The current for loop stops when an xcr0 feature bit is 0. But in reality,
the bits can be non-contiguous. One example is that LWP is bit 62 on
AMD platform. This patch iterates through all bits to calculate the
size for enabled features.
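
A minimal sketch of the corrected calculation, assuming the per-feature offset and size are available in a table (on real hardware they come from CPUID; the table and the function name here are hypothetical). The loop skips clear bits rather than stopping at the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 512-byte legacy FXSAVE region plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 576u

struct xsave_component {
    uint32_t offset;
    uint32_t size;
};

/* Size of the save area needed for the features enabled in xcr0. */
uint32_t xsave_area_size(uint64_t xcr0, const struct xsave_component comp[64])
{
    uint32_t size = XSAVE_LEGACY_AND_HEADER;

    assert(comp != NULL);
    for (int bit = 2; bit < 64; bit++) {
        uint32_t end;

        if (!(xcr0 & (1ULL << bit)))
            continue;           /* skip holes instead of stopping */
        end = comp[bit].offset + comp[bit].size;
        if (end > size)
            size = end;
    }
    return size;
}
```

With a feature at bit 62 enabled and bit 2 clear, a loop that stops at the first zero bit would never reach bit 62 and would report only the base size.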
Keir Fraser [Sun, 6 Feb 2011 17:10:31 +0000 (17:10 +0000)]
xsm/flask: Fix permission tables
At some point, it seems that someone manually added Flask permission
definitions to one header file without updating the corresponding
policy configuration or the other related table. The end result is
that we can get uninterpretable AVC messages like this:
# xl dmesg | grep avc
(XEN) avc: denied { 0x4000000 } for domid=0
scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t
tclass=domain
Fix this by updating the flask config and regenerating the headers
from it. In the future, this can be further improved by integrating
the automatic generation of the headers into the build process as is
presently done in SELinux.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Keir Fraser [Sun, 6 Feb 2011 17:03:09 +0000 (17:03 +0000)]
hvm amd: Fix 32bit guest VM save/restore issues associated with SYSENTER MSRs
This patch turns on SYSENTER MSR interception for 32bit guest VMs on
AMD CPUs. With it, hvm_svm.guest_sysenter_xx fields always contain the
canonical version of SYSENTER MSRs and are used in guest save/restore.
The data fields in VMCB save area are updated as necessary.
Reported-by: James Harper <james.harper@bendigoit.com.au> Signed-off-by: Wei Huang <wei.huang2@amd.com>
Keir Fraser [Sun, 6 Feb 2011 16:54:01 +0000 (16:54 +0000)]
amd iommu: Fix a xen crash after pci-attach
pci-detach triggers IO page table deallocation if the last passthru
device has been removed from the pdev list, and this will result in a BUG on
AMD systems on the next pci-attach. This patch fixes this issue.
Keir Fraser [Sun, 6 Feb 2011 16:07:27 +0000 (16:07 +0000)]
cpupool: Check for memory allocation failure on switching schedulers
When switching schedulers on a physical cpu due to a cpupool operation
check for a potential memory allocation failure and stop the operation
gracefully.
Ian Jackson [Fri, 4 Feb 2011 18:47:39 +0000 (18:47 +0000)]
libxl: vncviewer: make autopass work properly
The file we write the vnc password to must be rewound back to the
beginning, or the vnc viewer will simply get EOF.
When the syscalls for communicating the password to the vnc client
fail, bomb out with an error message rather than blundering on (and
probably producing a spurious password prompt).
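
Both points can be illustrated with a small stdio sketch. It is an assumption-laden stand-in (the real code may well work on file descriptors rather than FILE streams, and make_password_stream is a hypothetical name): write the secret, check every call, and seek back to the start before handing the file to the reader.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the password to a temporary file and return the stream
 * positioned at the beginning, ready to be read. NULL on any failure. */
FILE *make_password_stream(const char *password)
{
    size_t len;
    FILE *f;

    assert(password != NULL);
    len = strlen(password);
    f = tmpfile();
    if (f == NULL) {
        perror("tmpfile");
        return NULL;
    }
    if (fwrite(password, 1, len, f) != len || fflush(f) == EOF) {
        perror("writing password");
        fclose(f);
        return NULL;
    }
    /* Without this the reader starts at the end and sees only EOF. */
    if (fseek(f, 0L, SEEK_SET) != 0) {
        perror("fseek");
        fclose(f);
        return NULL;
    }
    return f;
}
```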
Following this patch, xl vncviewer --autopass works, provided the qemu
patch for writing the password to xenstore has also been applied.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 4 Feb 2011 18:47:20 +0000 (18:47 +0000)]
libxl: vncviewer: unconditionally read listen port address and password
The /local/domain/DOMID/device/vfb/0/backend path is irrelevant.
libxl does not create it, so the branch would never be taken.
Instead, simply read the target paths of interest.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 4 Feb 2011 18:46:22 +0000 (18:46 +0000)]
libxl: vncviewer: fix use-after-free
This bug can prevent xl vncviewer from working at all.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 4 Feb 2011 18:46:00 +0000 (18:46 +0000)]
libxl: actually print an error when execve (in libxl__exec) fails
The header comment says libxl__exec logs errors. So it should do so.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 4 Feb 2011 18:45:26 +0000 (18:45 +0000)]
libxl: SECURITY: always honour request for vnc password
qemu only sets a password on its vnc display if the value for the -vnc
option has the ",password" modifier. The code for constructing
qemu-dm options was broken and only added this modifier for one of the
cases.
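
One way to avoid missing the modifier in some cases is to build the option value in a single helper. The sketch below is hypothetical (format_vnc_arg is not a libxl function) and only shows the ",password" suffix being applied whenever a password is configured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format the value for the -vnc option as "listen:display", adding the
 * ",password" modifier whenever a non-empty password is configured.
 * Returns 0 on success, -1 if the output buffer is too small. */
int format_vnc_arg(char *out, size_t outlen, const char *listen,
                   int display, const char *passwd)
{
    int want_password = passwd != NULL && strlen(passwd) > 0;
    int n;

    assert(out != NULL && listen != NULL);
    n = snprintf(out, outlen, "%s:%d%s", listen, display,
                 want_password ? ",password" : "");
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```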
Unfortunately there does not appear to be any code for passing the vnc
password to upstream qemu (ie, in the case where
libxl_build_device_model_args_new is called). To avoid accidentally
running the domain without a password, check for this situation and
fail an assertion. This will have to be revisited after 4.1.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: fix console autoconnect with pygrub, by invoking xenconsole twice
When using pygrub we have to connect to the console twice: once at the
beginning to connect to pygrub and a second time after creating the pv
console to connect to the guest's console.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andre Przywara [Fri, 4 Feb 2011 17:32:39 +0000 (17:32 +0000)]
xl: fix broken xl vcpu-list output (tool hangs on large machines)
The algorithm for printing the CPU affinity in a condensed way
looks for a set bit in a zero-byte:
for (i = 0; !(pcpumap & 1); ++i, pcpumap >>= 1)
Looking at the code I found that it is entirely broken if more than 8
CPUs are used. Besides that endless loop issue the output is totally
bogus except for the "any CPU" case, which is handled explicitly earlier.
I tried to fix it, but the whole approach does not work if the outer
loop actually iterates (executing more than once).
This fix reimplements the whole algorithm in a clean (though not much
optimized) way. It survived some unit-testing.
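
For illustration, a condensed range printer over a byte-array bitmap might look like the sketch below. It is not the reimplementation from this changeset: the function name and the "0-3,8,10-11" output format are assumptions. Every iteration advances the bit index, so the loop terminates for any map size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int bit_is_set(const uint8_t *map, int bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/* Format the set bits of map[0..nbits) as ranges such as "0-3,8,10-11".
 * Returns the length written, or -1 if the buffer is too small. */
int format_cpu_ranges(const uint8_t *map, int nbits, char *out, size_t outlen)
{
    size_t used = 0;
    int i = 0;

    assert(map != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';
    while (i < nbits) {
        int first, n;

        if (!bit_is_set(map, i)) {
            i++;
            continue;
        }
        first = i;
        while (i + 1 < nbits && bit_is_set(map, i + 1))
            i++;                /* extend the run of consecutive set bits */
        if (first == i)
            n = snprintf(out + used, outlen - used, "%s%d",
                         used ? "," : "", first);
        else
            n = snprintf(out + used, outlen - used, "%s%d-%d",
                         used ? "," : "", first, i);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        i++;
    }
    return (int)strlen(out);
}
```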
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Allen Kay [Wed, 2 Feb 2011 17:06:36 +0000 (17:06 +0000)]
libxl: pass gfx_passthru parameter to QEMU
Pass gfx_passthru parameter to QEMU. Keep it boolean for now as QEMU
does not expect any other integer value.
Signed-off-by: Allen Kay <allen.m.kay@intel.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 2 Feb 2011 17:05:27 +0000 (17:05 +0000)]
libxl: change default HVM emulated network card to rtl8139
xend uses rtl8139, and we want xl to be compatible with xm. Some
older operating systems don't have e1000 drivers, and we want widest
compatibility rather than best performance (people who want good
performance are best advised to use PV-on-HVM drivers).
We'll probably switch to a new default when switching to upstream
qemu, in the Xen 4.2 release cycle.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 1 Feb 2011 19:26:36 +0000 (19:26 +0000)]
libxc: maintain a small, per-handle, cache of hypercall buffer memory
Constantly m(un)locking memory can have significant overhead on
systems with large numbers of CPUs. This was previously fixed by
20841:fbe8f32fa257 but this was dropped during the transition to
hypercall buffers.
Introduce a small cache of single page hypercall buffer allocations
which can be reused to avoid this overhead.
Add some statistics tracking to the hypercall buffer allocations.
The cache size of 4 was chosen based on these statistics since they
indicated that 2 pages was sufficient to satisfy all concurrent single
page hypercall buffer allocations seen during "xl create", "xl
shutdown" and "xl destroy" of both a PV and HVM guest therefore 4
pages should cover the majority of important cases.
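
The idea can be sketched as a tiny fixed-size free list. This is an illustration under stated assumptions, not the libxc code: the names are hypothetical, plain malloc() stands in for the page-aligned, locked allocation, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_BYTES 4096
#define CACHE_SLOTS 4

/* Small cache of single-page buffers plus simple statistics. */
struct page_cache {
    void *slot[CACHE_SLOTS];
    int nr;
    unsigned long hits, misses;
};

/* Hand out a cached page if one is available, else allocate a new one. */
void *cache_alloc_page(struct page_cache *c)
{
    assert(c != NULL);
    if (c->nr > 0) {
        c->hits++;
        return c->slot[--c->nr];
    }
    c->misses++;
    return malloc(PAGE_SIZE_BYTES);   /* stand-in for a locked page */
}

/* Return a page to the cache, or really free it when the cache is full. */
void cache_free_page(struct page_cache *c, void *p)
{
    assert(c != NULL);
    if (p == NULL)
        return;
    if (c->nr < CACHE_SLOTS) {
        c->slot[c->nr++] = p;
        return;
    }
    free(p);
}

/* Drop everything still cached, e.g. when the handle is closed. */
void cache_release(struct page_cache *c)
{
    assert(c != NULL);
    while (c->nr > 0)
        free(c->slot[--c->nr]);
}
```

The hit and miss counters play the role of the statistics mentioned above: they show how many allocations a given cache size would have satisfied.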
This fixes http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1719.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reported-by: Zheng, Shaohui <shaohui.zheng@intel.com> Tested-by: Haitao Shan <maillists.shan@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
The current libxl_set_memory_target function subtracts a negative amount
from an uint32_t variable without checking if the operation wraps
around.
This patch fixes this bug (that I previously believed to be a
hypervisor issue):
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1729
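
A minimal sketch of a wrap-safe update, assuming the target is held in a uint32_t and the adjustment is a signed amount. The function name and units are hypothetical; the technique is simply to do the arithmetic in a wider signed type and range-check before narrowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Apply a signed delta to an unsigned 32-bit target. Returns false and
 * leaves *result_kb untouched if the result would not fit in uint32_t. */
bool apply_memory_delta(uint32_t current_kb, int32_t delta_kb,
                        uint32_t *result_kb)
{
    int64_t wide = (int64_t)current_kb + delta_kb;

    assert(result_kb != NULL);
    if (wide < 0 || wide > (int64_t)UINT32_MAX)
        return false;           /* would wrap around */
    *result_kb = (uint32_t)wide;
    return true;
}
```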
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>