Keir Fraser [Tue, 17 Mar 2009 14:22:50 +0000 (14:22 +0000)]
x86 mcheck: Replace hypervisor MCA telemetry structures with something
more robust and designed to make terminal error telemetry available to
the dom0 panic flow for diagnosis on reboot.
Use common code for a lot of the AMD and Intel MCE handling code.
Signed-off-by: Gavin Maltby <gavin.maltby@sun.com>
Signed-off-by: Frank van der Linden <frank.vanderlinden@sun.com>
Keir Fraser [Tue, 17 Mar 2009 10:49:42 +0000 (10:49 +0000)]
xend: Add s3_integrity attribute for XenAPI
When XenAPI is available, xm new and xm create fail.
# xm new vm1
Using config file "/etc/xen/vm1".
Attribute 's3_integrity' not declared
# xm create vm1
Using config file "/etc/xen/vm1".
Attribute 's3_integrity' not declared
Keir Fraser [Tue, 17 Mar 2009 10:40:47 +0000 (10:40 +0000)]
xend: Accept udev events and update physical resource information
When a udev event is received, udevevent.py parses the udev data and
tells XendNode.py to update the physical resource information.
This patch also adds a boolean parameter, 'xend-udev-event-server',
that lets users enable or disable this function.
Keir Fraser [Tue, 17 Mar 2009 10:36:20 +0000 (10:36 +0000)]
xend: Implement DGRAM (connectionless) type socket listeners
Introduce SocketDgramListener and UnixDgramListener classes.
We already have STREAM (connection) type socket listener classes in
the source tree, but we need DGRAM (connectionless) type listeners to
receive udev events.
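A minimal sketch of a connectionless Unix-domain listener follows. It
is not the SocketDgramListener or UnixDgramListener code from the
patch: the function names, the socket path and the payload are all
hypothetical, and only the standard Python socket module is used.

```python
import os
import socket
import tempfile


def open_unix_dgram_listener(path):
    """Bind a connectionless (SOCK_DGRAM) Unix socket at `path`."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from a previous run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def receive_event(sock, bufsize=4096):
    """Block until one datagram arrives and return its payload as bytes."""
    data, _addr = sock.recvfrom(bufsize)
    return data


def demo():
    """Send one made-up datagram to a listener and return what it received."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "events.sock")  # hypothetical path
        listener = open_unix_dgram_listener(path)
        sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            sender.sendto(b"ACTION=add", path)  # hypothetical payload
            return receive_event(listener)
        finally:
            sender.close()
            listener.close()
```

Unlike a STREAM listener there is no accept() step: each recvfrom()
call returns one complete message, which suits one-shot event
notifications.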
Keir Fraser [Fri, 13 Mar 2009 10:09:25 +0000 (10:09 +0000)]
xenpm: Add CPU topology info (thread/core/socket)
CPU topology info is necessary for power management analysis. For
example, analyzing the effect of Px state coordination or Cx
package/core coordination requires the thread/core/socket topology
information.
This patch adds a new command, "get-cpu-topology", to xenpm to print
the CPU topology info:
Keir Fraser [Fri, 13 Mar 2009 07:37:24 +0000 (07:37 +0000)]
minios: allow to allocate machine contiguous pages
This is a port of XenLinux xen_alloc_contig_memory() to mini-os. A
sufficiently privileged mini-os guest can exchange a small number of
its pages with machine contiguous pages.
Keir Fraser [Thu, 12 Mar 2009 18:48:09 +0000 (18:48 +0000)]
blktap: Move error signaling to blktapctrl
Until now the udev script for blktap devices has had to decide whether
to signal success or failure to xend. As this script runs completely
independently of blktapctrl and tapdisk/ioemu, which do the real work,
the udev script can't even theoretically know if tapdisk is happy.
This patch removes the udev script and replaces its checks with new
ones in libblktap.
Keir Fraser [Thu, 12 Mar 2009 18:46:26 +0000 (18:46 +0000)]
blktapctrl: Fix too early close of pipes
Connections to ioemu have single_handler set, so they are closed as
soon as all images of a certain type are closed. This is wrong with
ioemu: All images that belong to the same domain are handled by the
same backend process (usually qemu-dm, but also tapdisk-ioemu for
domains without device model), regardless of the image type.
This patch checks for the same-domain condition for ioemu connections.
Keir Fraser [Thu, 12 Mar 2009 18:42:59 +0000 (18:42 +0000)]
blktap: Export disk type constants for ioemu
Currently all disk types that are supported are defined in a header
file private to blktapctrl and tapdisk. When restoring ioemu as a
backend for blktap these constants are needed by ioemu, so move them
to a more public header file.
Keir Fraser [Thu, 12 Mar 2009 18:42:31 +0000 (18:42 +0000)]
blktapctrl: Select backend by prefix
This patch adds support for specifying the backend (tapdisk or ioemu)
to blktapctrl. Images can be specified e.g. as tap:tapdisk:aio,
tap:ioemu:qcow2 or tap:vmdk. When the backend is omitted, a default is
chosen based on the image type (currently always tapdisk, because ioemu
as a backend is broken until a follow-up patch series against qemu-xen
is applied).
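The prefix handling described above could be sketched roughly as
follows. This is an illustration only, not the blktapctrl code (which
is C); the function name and constants are hypothetical, and the
example strings are the ones quoted in the message.

```python
KNOWN_BACKENDS = ("tapdisk", "ioemu")  # the two backends named in the message
DEFAULT_BACKEND = "tapdisk"            # stated default: currently always tapdisk


def split_tap_prefix(spec):
    """Split 'tap:[backend:]type' into a (backend, image_type) pair."""
    parts = spec.split(":")
    if len(parts) < 2 or parts[0] != "tap":
        raise ValueError("not a tap specification: %r" % spec)
    if parts[1] in KNOWN_BACKENDS:
        # An explicit backend was given, so the image type must follow it.
        if len(parts) < 3:
            raise ValueError("missing image type in %r" % spec)
        return parts[1], parts[2]
    # No backend given: fall back to the default for this image type.
    return DEFAULT_BACKEND, parts[1]
```

With this sketch, "tap:ioemu:qcow2" yields the ioemu backend with a
qcow2 image, while "tap:vmdk" falls back to tapdisk.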
Keir Fraser [Thu, 12 Mar 2009 15:08:08 +0000 (15:08 +0000)]
xend: Fix removing /vm/UUID/device paths when device cannot be disconnected
This is a corrected version of a patch committed as c/s 19250 and
reverted by c/s 19314. Thanks to "trap sigerr ERR" in
xen-hotplug-common.sh, xen-hotplug-cleanup would exit when reading
/local/domain/ID/vm failed, thus skipping all the xenstore-rm lines in
the rest of the script.
Change deviceDestroy behavior to remove /vm/UUID/device/...
path only when force was used (as it already does so for both frontend
and backend) and do the removing from xen-hotplug-cleanup script when
we are sure the device is really not attached to the guest any more.
Keir Fraser [Thu, 12 Mar 2009 11:16:54 +0000 (11:16 +0000)]
x86: Fix APIC 0x40 error when CPU online and Host s3 resume
disable_APIC_timer is not actually useful here. In fact, on the Intel
P6 family it triggers a local APIC error when masking the LVT entry
while the vector is zero (before the timer is initialized). This APIC
error (0x40) appears when onlining an offlined CPU and on host S3
resume.
Signed-off-by: Liping Ke <liping.ke@intel.com>
Signed-off-by: Gang Wei <gang.wei@intel.com>
Keir Fraser [Thu, 12 Mar 2009 11:09:57 +0000 (11:09 +0000)]
Domain core-dumping fixes
The code was attempting to use the domain's current number of pages
(info.nr_pages) as a maximum index. We then walk the memory map and
can easily over-write past the end of the nr_pages-sized array, if the
domain has more pages mapped in than earlier (live dump). Restrict
ourselves to the current number of pages.
Also fix the dump core method in xend to actually implement the crash
and live options. In particular this means that xend clients other
than xm now get non-live dumps by default.
Keir Fraser [Thu, 12 Mar 2009 11:07:00 +0000 (11:07 +0000)]
Fix qemu spawn for Solaris
On Solaris, xend runs in a 'process contract' such that all children
are killed when the service is restarted. Spawn qemu processes in a
new contract to avoid this.
The Solaris curses library has a broken timeout() function: after a
first timeout() call with a positive value for an argument, subsequent
calls will fail to reset it. So, getch() always times out, confusing
the pygrub timer in the main loop. Add an extra check to avoid exiting
prematurely.
Signed-off-by: Frank van der Linden <frank.vanderlinden@sun.com>
Keir Fraser [Thu, 12 Mar 2009 10:56:55 +0000 (10:56 +0000)]
xenconsole: Solaris ptys have different semantics.
Make sure that tty semantics are active for Solaris ptys or, if they
aren't (and aren't needed), do not do tcget/setattr on the file
descriptor in Python code.
Also work around a bug in the Solaris ptm streams driver, which will
cause a write error on the master side of a pty (because of e.g. a
missing slave) to persist forever.
Signed-off-by: Frank van der Linden <frank.vanderlinden@sun.com>
Keir Fraser [Wed, 11 Mar 2009 10:14:33 +0000 (10:14 +0000)]
xend: Discard error messages of lsscsi
On a host OS without the lsscsi command, the following error message
is recorded in xend-debug.log when xend is started. The message is
always recorded at least once. If SCSI devices are connected to the
host OS, it is recorded once per SCSI device.
sh: lsscsi: command not found
This patch discards the error message to /dev/null.
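One way to discard a helper command's error output from Python is
sketched below. It is an assumption that this resembles the patch;
xend may invoke lsscsi differently. The function name is hypothetical,
and only the standard subprocess module is used.

```python
import subprocess


def run_quietly(command):
    """Run a shell command, return its stdout, and discard its stderr."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # equivalent to appending 2>/dev/null
        universal_newlines=True,
    )
    return result.stdout
```

If the command does not exist, the shell's "command not found" message
goes to the discarded stderr stream and the function simply returns an
empty string.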
Keir Fraser [Wed, 11 Mar 2009 10:10:15 +0000 (10:10 +0000)]
xend: Test tap devices in testDeviceComplete()
XendDomainInfo.testDeviceComplete() should check that block devices
have shut down correctly, but it only considers vbd class devices and
ignores tap devices. The attached patch changes testDeviceComplete() to
wait for both vbd and tap devices to shut down correctly.
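The idea of waiting on both device classes can be sketched as a simple
polling loop. This is not the testDeviceComplete() implementation: the
function name, the list_devices callback and the timeout values are
hypothetical.

```python
import time

BLOCK_DEVICE_CLASSES = ("vbd", "tap")  # both classes named in the message


def wait_for_block_devices(list_devices, timeout=5.0, interval=0.1):
    """Return True once no block device remains, or False on timeout.

    `list_devices(devclass)` is a hypothetical callback returning the IDs
    of devices of that class that have not yet shut down.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = [dev
                     for devclass in BLOCK_DEVICE_CLASSES
                     for dev in list_devices(devclass)]
        if not remaining:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Checking only "vbd" here would report completion while a tap device
was still shutting down, which is the gap the message describes.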
Keir Fraser [Wed, 11 Mar 2009 10:05:00 +0000 (10:05 +0000)]
passthrough: allow pass-through devices to share virtual GSI
Allow multiple pass-through devices to use the same guest_gsi.
The motivation for this is:
* Allow multi-function devices to be passed through as multi-function
devices
* Allow more than two pass-through devices.
- This will place more contention on the GSI-space, and allocation
becomes a lot simpler if GSI sharing is allowed.
Keir Fraser [Mon, 9 Mar 2009 15:01:34 +0000 (15:01 +0000)]
xentrace: trace when we continue with the same task
Trace when the scheduler decides to continue running the same process.
This lets us see that this is happening for one; it also lets us see
domains in a trace which are actively running on pcpu but never
scheduled out.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Mon, 9 Mar 2009 13:50:45 +0000 (13:50 +0000)]
hvm/vpt: Check that an irq is not blocked before waking the vcpu
Currently, when a timer fires for a vpt interrupt, the interrupt
handler calls vcpu_kick() without checking to see if the IRQ is
blocked. This causes the vcpu to wake up out of a halt when it
shouldn't.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Mon, 9 Mar 2009 09:37:52 +0000 (09:37 +0000)]
Add vcpu_migration_delay=<microsecs> boot option to scheduler
The idea is borrowed from the Linux kernel: if a vCPU has just been
scheduled out and put on the run-queue, it is likely cache-hot on its
current pCPU, and it may be scheduled in again within a short period
of time; however, if the vCPU is migrated to another pCPU, it needs to
re-warm the cache.
The patch introduces an option, vcpu_migration_delay, to avoid
aggressive vCPU migration (in practice we see that migration frequency
is very high most of the time), while still keeping load balancing
over slightly longer time scales.
The Linux kernel uses 0.5ms by default. Considering that the cost may
be higher (e.g. VMCS impact) than on native, vcpu_migration_delay=1000
is chosen for our tests, which are performed on a 4x 6-core Dunnington
platform. In the 24-VM case, there is a stable ~2% performance gain
for enterprise workloads like SPECjbb and sysbench. If the HVM guests
use stubdom, the gain is larger: 4% for the same workloads.
Signed-off-by: Xiaowei Yang <xiaowei.yang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
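The cache-hot test behind vcpu_migration_delay can be sketched as
below. This is a toy model, not the scheduler code (which is C): the
function names are hypothetical, and treating a zero delay as
"disabled" is an assumption of this sketch.

```python
def is_cache_hot(now_us, last_descheduled_us, migration_delay_us):
    """True if the vCPU stopped running less than the delay ago."""
    if migration_delay_us <= 0:
        return False  # assumption: a zero delay disables the check
    return (now_us - last_descheduled_us) < migration_delay_us


def may_migrate(now_us, last_descheduled_us, migration_delay_us=1000):
    """A vCPU is a migration candidate only once it is no longer cache-hot."""
    return not is_cache_hot(now_us, last_descheduled_us, migration_delay_us)
```

With the 1000-microsecond value used in the tests above, a vCPU
descheduled 500 microseconds ago stays on its pCPU, while one
descheduled 1500 microseconds ago may be moved.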
Keir Fraser [Mon, 9 Mar 2009 09:19:10 +0000 (09:19 +0000)]
pygrub: Enable domU boot without xen specific arg
This patch lets domUs come up without Xen-specific args being passed
to guest kernels. A domU should be bootable without an args parameter
because tools/examples/xmexample1 doesn't have one.
Keir Fraser [Mon, 9 Mar 2009 09:18:08 +0000 (09:18 +0000)]
[IA64] fix issue of can not find the qemu-dm in ia64
IA64 uses /usr/lib/ for Xen, so we do not need to check arch_libdir
on ia64. The check returns the wrong value for arch_libdir, and the
guest cannot boot.
Keir Fraser [Fri, 6 Mar 2009 19:18:39 +0000 (19:18 +0000)]
Page offline support in Xen side
This patch adds support for offlining a page. The basic idea is: if a
page is assigned, it is marked offline-pending and moved out of the
buddy allocator when freed; if a page is free, it is moved out of the
buddy allocator directly.
One thing to note after this change is that page->count_info is no
longer always 0, especially for shadow pages, since the PGC_offlining
bit may be set.
Keir Fraser [Fri, 6 Mar 2009 19:14:50 +0000 (19:14 +0000)]
x86/mm: Do not set page's count_info directly
The page offline patch adds several flags to page_info->count_info.
However, some code currently sets count_info after alloc_domheap_pages
by plain assignment rather than with "&" or "|" operations, which can
lose the new flags since nothing protects them. This patch tries to
make sure every write to count_info affects only the specific field
intended.
Also, the shadow code currently assumes count_info is 0 for shadow
pages, which is no longer valid with the new flags. Change some
asserts in the shadow code accordingly.
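The difference between assigning count_info and updating one field can
be illustrated with plain integers. The bit positions and the helper
name below are hypothetical and are not the values in the Xen headers.

```python
PGC_OFFLINING = 1 << 30     # hypothetical position for the offlining flag
COUNT_MASK = (1 << 26) - 1  # hypothetical width of the reference count field


def set_count(count_info, new_count):
    """Replace only the count field, preserving every flag bit."""
    return (count_info & ~COUNT_MASK) | (new_count & COUNT_MASK)


def demo():
    """Compare a masked update with a plain assignment."""
    count_info = PGC_OFFLINING | 3      # flag set, count of 3
    masked = set_count(count_info, 1)   # flag survives
    assigned = 1                        # plain assignment drops the flag
    return bool(masked & PGC_OFFLINING), bool(assigned & PGC_OFFLINING)
```

The masked update keeps the offlining flag; the plain assignment
silently clears it, which is the failure the message warns about.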
Keir Fraser [Fri, 6 Mar 2009 19:10:29 +0000 (19:10 +0000)]
tboot: Fix return code for S3 integrity
The original patch left in a debug return value from one of the memory
integrity checks. This patch returns the correct error code in case of a
failure. This was re-tested to ensure that it still passes for the
expected case.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Keir Fraser [Fri, 6 Mar 2009 19:06:30 +0000 (19:06 +0000)]
vt-d: Better restrict memory ranges considered to be in Xen
The current implementation of xen_in_range() misses several memory
ranges that are used by the hypervisor and thus shouldn't get mapped
into dom0's VT-d tables. This patch should make the check complete.
This patch is only against x86 because I'm not familiar enough with
IA64 to know how much, if any, of these checks apply there.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Keir Fraser [Fri, 6 Mar 2009 18:58:41 +0000 (18:58 +0000)]
tool: make management of PCI D-states by guest optional
The D3hot state in some PCI devices causes domain
creation/destruction to fail.
The default is "pci_power_mgmt=0", which prevents the guest OS from
managing D-states, because avoiding this trouble is considered
preferable to the benefit of lower power consumption.
Keir Fraser [Fri, 6 Mar 2009 18:56:28 +0000 (18:56 +0000)]
xm-test: Identifying the network env specified in xend config fails,
if an additional parameter is given for the network-bridge
(e.g. netdev=eth1). The patch splits the network command into the
command name and its parameters to determine the netenv (bridge,
route, nat).
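Splitting the network command might look like the sketch below. It is
not the xm-test code: the function names are hypothetical, and deriving
the netenv by stripping a "network-" prefix from the command name is an
assumption.

```python
def split_network_command(command):
    """Split 'name key=value ...' into (name, {key: value})."""
    parts = command.split()
    if not parts:
        raise ValueError("empty network command")
    name = parts[0]
    params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return name, params


def netenv_from_command(command):
    """Derive the netenv from the command name alone, ignoring parameters."""
    name, _params = split_network_command(command)
    prefix = "network-"  # assumption: command names share this prefix
    if name.startswith(prefix):
        return name[len(prefix):]
    return None
```

Because the parameters are separated out first, an extra argument such
as netdev=eth1 no longer prevents the command name from being matched.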