Keir Fraser [Wed, 20 Feb 2008 14:36:45 +0000 (14:36 +0000)]
x86 hvm: Replace old MMIO emulator with x86_emulate()-based harness.
Re-factor VMX real-mode emulation to use the same harness. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 18 Feb 2008 13:50:25 +0000 (13:50 +0000)]
x86: Fix mod_l3_entry() for PAE-on-64 guests. The adjustment of
_PAGE_RW and _PAGE_USER cannot happen before get_page_from_l3e().
Original patch by Manuel Bouyer <bouyer@netbsd.org>. Signed-off-by: Christoph Egger <Christoph.Egger@amd.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 15 Feb 2008 14:13:17 +0000 (14:13 +0000)]
Enable HVM guest VT-d device hotplug via a simple ACPI hotplug device model.
** Currently only 2 virtual hotplug PCI slots (6~7) are created, so no
more than 2 VT-d devices can be hotplugged, but this can easily be
extended in future.
Three new commands are added:
"xm pci-list domid" shows the currently assigned VT-d devices, e.g.:
VSlt domain bus slot func
0x6 0x0 0x02 0x00 0x0
"xm pci-detach" hot-removes the specified VT-d device by its virtual
slot, e.g.:
xm pci-detach EdwinHVMDomainVtd 6
"xm pci-attach DomainID dom bus dev func [vslot]" hot-adds a new VT-d
device in the given vslot. If no vslot is specified, a free slot is
picked. E.g. to insert '0000:03:00.0':
xm pci-attach EdwinHVMDomainVtd 0 3 0 0
** guest pci hotplug
linux: use a 2.6.x kernel and enable ACPI PCI hotplug (Bus options =>
PCI hotplug => ACPI PCI hotplug driver)
windows: 2000/XP/2003/Vista all work
Keir Fraser [Fri, 15 Feb 2008 12:33:11 +0000 (12:33 +0000)]
Provide fast write emulation path to release shadow lock.
The shadow fault logic can be split into two parts: the first
covers housekeeping such as validating the guest page table and
fixing up the shadow table, and the second performs write
emulation.
There is one scenario in which the first part can be skipped.
If a write to a virtual frame was emulated successfully, the
next shadow fault is very likely to reach the write emulation
logic again when it hits the same virtual frame. Re-walking the
first part, already covered by the previous shadow fault, is
wasteful. In this case we can jump to the emulation code early
and take no lock until the final shadow validation for write
emulation. According to perfc counts on a 64-bit SMP HVM guest,
89% of all shadow write emulations take this fast path during
a kernel build in the guest.
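The fast-path decision described above can be illustrated with a minimal
Python sketch. It is not the hypervisor code: VcpuState, its fields,
handle_write_fault and the 4 KiB page size are all hypothetical names and
assumptions, and the final shadow validation under the lock is omitted.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # 4 KiB pages, assumed for the illustration


@dataclass
class VcpuState:
    # Hypothetical per-vcpu record of the last virtual frame whose write
    # was emulated successfully; None means no fast-path candidate yet.
    last_emulated_frame: Optional[int] = None
    fast_hits: int = 0
    slow_hits: int = 0


def handle_write_fault(vcpu: VcpuState, fault_va: int) -> str:
    """Return which path a write fault at fault_va takes."""
    frame = fault_va >> PAGE_SHIFT
    if vcpu.last_emulated_frame == frame:
        # Same frame as the previous emulated write: skip the guest
        # table walk and shadow fix-up and go straight to emulation.
        vcpu.fast_hits += 1
        return "fast"
    # Otherwise run the full first part, emulate, and remember the frame.
    vcpu.slow_hits += 1
    vcpu.last_emulated_frame = frame
    return "slow"
```

Two adjacent faults on the same frame take the slow path once and the
fast path afterwards; a fault on a different frame resets the record.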
Keir Fraser [Thu, 14 Feb 2008 11:14:17 +0000 (11:14 +0000)]
x86 iommu: Define vendor-neutral interface for access to IOMMU. Signed-off-by: Wei Wang <wei.wang2@amd.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 14 Feb 2008 10:33:12 +0000 (10:33 +0000)]
x86 shadow: Reduce scope of shadow lock.
emulate_map_dest does not need to hold the lock: the only
shadow-related operation it may involve is removing a
shadow, which is infrequent and can acquire the lock
internally. The rest is either a guest table walk or
per-vcpu monitor table manipulation.
Keir Fraser [Wed, 13 Feb 2008 18:09:27 +0000 (18:09 +0000)]
vmx realmode: Only check for pending interrupts every 16th
instruction, since it is a moderately expensive operation. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
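A rough sketch of the idea, in Python rather than the hypervisor's C:
poll for pending interrupts only when the instruction count is a multiple
of 16. The names CHECK_INTERVAL, emulate_instructions and check_pending
are made up for the illustration, and the per-instruction emulation is
left out.

```python
CHECK_INTERVAL = 16  # power of two, so a mask can replace a modulo


def emulate_instructions(count, check_pending):
    """Run `count` emulated instructions, polling every 16th one.

    Returns how many times the (assumed expensive) check was made.
    """
    checks = 0
    for insn_nr in range(1, count + 1):
        # ... emulate one instruction here (omitted) ...
        if (insn_nr & (CHECK_INTERVAL - 1)) == 0:
            checks += 1
            if check_pending():
                # An interrupt is pending: leave the emulation loop.
                break
    return checks
```

The trade-off is up to 15 instructions of extra interrupt latency in
exchange for making the check one sixteenth as often.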
Keir Fraser [Wed, 13 Feb 2008 16:35:51 +0000 (16:35 +0000)]
vmx realmode: __hvm_copy() should not call hvm_get_segment_register()
when we are emulating. Firstly it is bogus, since VMCS segment state is
stale in this context. Secondly, real mode and real->protected
contexts are rather unlikely to happen with SS.DPL == 3. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 13 Feb 2008 16:28:38 +0000 (16:28 +0000)]
x86 vmx: Streamline vmx_interrupt_blocked() to avoid a VMREAD if
interrupt delivery is blocked by EFLAGS.IF. This speeds up real-mode
emulation in some cases (where we are currently executing
hvm_local_events_need_delivery() after every instruction). Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 13 Feb 2008 10:43:13 +0000 (10:43 +0000)]
Tools: fix save/restore of 32-bit PV guests with 64-bit tools
by removing some obvious typos, handling CR3 folding and hvirt_start
based on guest word-size, and understanding 32-bit INVALID_MFN.
Keir Fraser [Wed, 13 Feb 2008 10:42:09 +0000 (10:42 +0000)]
pv-on-hvm: Signal crash to Xen tools when HVM guest panics.
The attached patch adds a function that automatically dumps a core file
when guest Linux on an HVM domain panics, in the same way as for a PV
domain.
I tested this patch with kernels 2.6.9 and 2.6.18 on both x86 and
ia64 (to build for ia64, some patches in the ia64 tree are needed) using
the following steps, and confirmed that it works well:
1. Build xen-platform-pci.ko.
2. In /etc/xen/xend-config.sxp, set (enable-dump yes).
3. On guest linux, execute insmod:
# insmod xen-platform-pci.ko
4. When guest linux panics, a core file is dumped.
Keir Fraser [Tue, 12 Feb 2008 16:46:23 +0000 (16:46 +0000)]
stubdom: Rename stubdom/*.build into stubdom/*-build, newlib into
newlib-cvs, lwip into lwip-cvs. Fix .hgignore to ignore only them and
not the patches.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Keir Fraser [Tue, 12 Feb 2008 14:59:22 +0000 (14:59 +0000)]
[BUILD] Disable LOCALVERSION_AUTO in upstream Linux builds.
If this option is enabled then the Xen mercurial version ID gets
tacked onto the kernel version (e.g. 2.6.24-git22-hg2593b69b183b)
which is unlikely to be useful or desirable. All the trees which we
build using this method already have uniquely identifying versions
(e.g. 2.6.24-git22 or 2.6.24-mm1).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 12 Feb 2008 14:59:01 +0000 (14:59 +0000)]
[BUILD] Fixup support for building upstream kernels.
In particular:
- support merged x86 architecture. To facilitate this it made sense
to encode some existing logic in shell scripts rather than
increasingly complicated make conditionals.
- set CONFIG_PARAVIRT_GUEST=y which is required for newer kernels.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 12 Feb 2008 14:35:39 +0000 (14:35 +0000)]
Add stubdomain support. See stubdom/README for usage details.
- Move PAGE_SIZE and STACK_SIZE into __PAGE_SIZE and __STACK_SIZE in
arch_limits.h so as to permit getting them from there without
pulling all the internal Mini-OS defines.
- Setup a xen-elf cross-compilation environment in stubdom/cross-root
- Add a POSIX layer on top of Mini-OS by linking against the newlib C
library and lwIP, and implementing the Unixish part in mini-os/lib/sys.c
- Cross-compile zlib and libpci too.
- Add an xs.h-compatible layer on top of Mini-OS' xenbus.
- Cross-compile libxc with an additional xc_minios.c and a few things
disabled.
- Cross-compile ioemu with an additional block-vbd, but without sound,
tpm and other details. A few hacks are needed:
- Align ide and scsi buffers at least on sector size to permit
direct transmission to the block backend. While we are at it, just
page-align it to possibly save a segment. Also, limit the scsi
buffer size because of limitations of the block paravirtualization
protocol.
- Allocate big tables dynamically rather than letting them go to
bss: when Mini-OS gets installed in memory, bss is not lazily
allocated, and doing so in Mini-OS would be unnecessarily tricky
when we can simply use malloc.
- Had to change the Mini-OS compilation somehow, so as to export
Mini-OS compilation flags to the Makefiles of libxc and ioemu.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Keir Fraser [Tue, 12 Feb 2008 11:37:45 +0000 (11:37 +0000)]
libxenctrl headers should not pollute macro namespace with
mb/rmb/wmb. Instead add a xen_ prefix. Modify Xen's public headers to
expect the prefixed names instead of bare mb/rmb/wmb, but gate this
expectation on a bump of __XEN_INTERFACE_VERSION__. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 12 Feb 2008 10:57:49 +0000 (10:57 +0000)]
device-dm: Use SIGHUP before SIGKILL
Make qemu unblock SIGHUP and make sure the default handler is in
place. Have the domain killer send SIGHUP to the device-model script,
allow the script 10s to clean up, and if still not dead, send
SIGKILL.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
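The escalation described above (SIGHUP, a grace period, then SIGKILL) can
be sketched with Python's subprocess module. This is an illustration
only, not the xend code; stop_device_model and grace_seconds are
hypothetical names, and a POSIX host is assumed.

```python
import signal
import subprocess


def stop_device_model(proc, grace_seconds=10.0):
    """Ask a child process to exit with SIGHUP, then force it with SIGKILL.

    `proc` is a subprocess.Popen object. Returns the name of the signal
    that was needed to stop the process.
    """
    proc.send_signal(signal.SIGHUP)
    try:
        # Give the process time to clean up after itself.
        proc.wait(timeout=grace_seconds)
        return "SIGHUP"
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: no more politeness.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"
```

The grace period only helps if the target has not blocked or ignored
SIGHUP, which is why the patch also makes qemu unblock it and restore the
default handler.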
Keir Fraser [Tue, 12 Feb 2008 10:16:20 +0000 (10:16 +0000)]
Add timestamp option to xenconsoled
Similar to the --log option, --timestamp or -t takes:
- none : No timestamping
- hv : Timestamp hypervisor logs
- guest: Timestamp guest logs
- all : Timestamp guest and hypervisor logs
Keir Fraser [Mon, 11 Feb 2008 15:59:49 +0000 (15:59 +0000)]
Rendezvous selected cpus in softirq (stop_machine).
This is similar to the stop_machine_run stub from Linux: it pulls the
selected cpus into a rendezvous point and then does some batch work
in a safe environment. The one current user is the S3 path,
where each cpu is taken down and its related online
footprints are cleared. It is dangerous for other cpus to be
checking clobbered data structures in the meantime, such as
cpu_online_map, cpu_sibling_map, etc.
Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 11 Feb 2008 14:55:33 +0000 (14:55 +0000)]
ioemu: Dynamic VNC colour depth.
The qemu vnc server changes its internal colour depth based on the
client request. This way just one colour conversion is done: the one
in vga_template.h, from the guest colour depth to the vnc server
internal colour depth.
This patch is meant to remove this colour conversion to improve
performance. It accomplishes the goal by making the qemu internal
colour depth always the same as the guest colour depth.
The basic idea is that the vnc client is the one that should do the
colour conversion, if necessary. In general it should accept the pixel
format suggested by the server during the initial negotiation. This
behaviour can be set in most vnc clients (vncviewer included).
If the guest changes colour depth, the qemu vnc server changes colour
depth too and notifies the client. The problem is that the vnc
protocol doesn't provide a message from the server to the client to
ask for a colour depth change. So what I am doing is either:
1) quietly starting to do the conversion on vnc server (not gaining
any performance here);
2) closing the vnc connection with the client, so the client can
reconnect and choose the new pixel format.
By default I am doing 1); however, the second choice can be enabled by
passing the -vnc-switch-bpp command line option.
In order to do the colour conversion on the vnc server I had to
improve the colour conversion code already in place because it only
supported conversions from 32 bpp. The patch adds colour conversion
code that supports conversions from any colour depth to any colour depth.
A last note: to get the most out of this patch it is best to set Windows
to 16 bit colour depth, because the 24 bit mode is 24 bit depth and 24
bpp, meaning no alpha channel. The vnc protocol doesn't support 24
bpp, only 32 bpp, so this conversion is unavoidable.
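As an illustration of what an any-depth-to-any-depth conversion involves,
here is a small Python sketch that converts one pixel between two RGB
layouts. It is not the code from the patch: convert_pixel and the
(shift, bits) layout tables are assumptions made for the example.

```python
# Each layout maps a channel name to (shift, bits). These two layouts are
# common examples chosen for illustration.
RGB565 = {"r": (11, 5), "g": (5, 6), "b": (0, 5)}
RGB888 = {"r": (16, 8), "g": (8, 8), "b": (0, 8)}


def convert_pixel(pixel, src, dst):
    """Convert one pixel value from layout `src` to layout `dst`."""
    out = 0
    for channel, (d_shift, d_bits) in dst.items():
        s_shift, s_bits = src[channel]
        # Extract the channel from the source pixel.
        value = (pixel >> s_shift) & ((1 << s_bits) - 1)
        # Rescale by shifting: widen with zero fill, or drop low bits.
        if d_bits >= s_bits:
            value <<= d_bits - s_bits
        else:
            value >>= s_bits - d_bits
        out |= value << d_shift
    return out
```

Driving the conversion from per-channel shift and width, instead of
hard-coding 32 bpp as the source, is what lets one routine handle any
pair of formats.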
Keir Fraser [Mon, 11 Feb 2008 14:45:29 +0000 (14:45 +0000)]
x86 hvm: Allow HPET to be configured as a per-domain config option.
A new platform variable 'hpet' is added, which defaults to 0 (HPET
disabled) for new guests. It is off by default because HPET is
currently less accurate in keeping time than PIT (because there are no
timer_mode adjustments).
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 11 Feb 2008 14:42:52 +0000 (14:42 +0000)]
xend: Better support for legacy HVM config with hvmloader configured
via the 'kernel' config option:
1. Look for any string containing 'hvmloader'.
2. The 'kernel' option must be scrubbed to avoid taking
PV-kernel-loading paths during later guest setup. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
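The two steps above might look roughly like this in Python. This is a
sketch, not the xend code: normalise_legacy_hvm_config and the 'loader'
and 'is_hvm' keys are hypothetical names chosen for the illustration.

```python
def normalise_legacy_hvm_config(config):
    """Return a copy of `config` with a legacy hvmloader 'kernel' handled."""
    result = dict(config)
    kernel = result.get("kernel", "")
    # Step 1: any 'kernel' string containing 'hvmloader' marks an HVM guest.
    if "hvmloader" in kernel:
        result["loader"] = kernel  # hypothetical key
        result["is_hvm"] = True    # hypothetical key
        # Step 2: scrub 'kernel' so later guest setup does not take the
        # PV kernel-loading path.
        result["kernel"] = ""
    return result
```

A PV config whose 'kernel' does not mention hvmloader is returned
unchanged.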
Keir Fraser [Mon, 11 Feb 2008 10:16:53 +0000 (10:16 +0000)]
x86 shadow: Move the shadow linear mapping for n-on-3-on-4 shadows so
that guest mappings of the bottom 4GB are not reflected in the monitor
pagetable. This ensures in particular that page 0 is not mapped,
allowing us to catch NULL dereferences in the hypervisor.
Keir Fraser [Mon, 11 Feb 2008 10:15:07 +0000 (10:15 +0000)]
xend: Remove redundant xc.domain_setcpuweight() all the way down to libxenctrl. Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 11 Feb 2008 09:57:38 +0000 (09:57 +0000)]
qemu: Queue mouse clicks.
qemu doesn't enqueue mouse events; it just records the latest mouse
state. This can cause mouse double clicks to be lost if the events are
not processed fast enough. This patch implements a simple queue for
left mouse click events.
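The difference between recording only the latest state and queueing
clicks can be shown with a short Python sketch. MouseState and its
methods are invented for the example and do not mirror the qemu code.

```python
from collections import deque


class MouseState:
    """Latest pointer position plus a FIFO of left-button transitions."""

    def __init__(self, max_queued=32):
        self.x = 0
        self.y = 0
        self._last_left = False
        # Bounded so a stalled consumer cannot grow the queue forever.
        self.left_events = deque(maxlen=max_queued)

    def on_host_event(self, x, y, left_down):
        # Position only needs the latest value, so overwrite it.
        self.x, self.y = x, y
        # Button transitions must not be merged, so queue each change.
        if left_down != self._last_left:
            self.left_events.append(left_down)
            self._last_left = left_down

    def next_left_event(self):
        """Pop the oldest unprocessed left-button state, or None."""
        return self.left_events.popleft() if self.left_events else None
```

With only a "latest state" variable, a quick down/up/down/up sequence
that arrives before the consumer runs collapses to a single "up"; with
the queue all four transitions survive.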
Keir Fraser [Mon, 11 Feb 2008 09:47:19 +0000 (09:47 +0000)]
xentrace: Remove redundant tb_done_init checks, and add missing ones.
Hand inspection of gcc -O2 output confirms significantly shorter
codepaths for inactive (i.e. normal case) tracing.
Signed-off-by: Michael A Fetterman <Michael.Fetterman@cl.cam.ac.uk>
Keir Fraser [Mon, 11 Feb 2008 09:46:53 +0000 (09:46 +0000)]
xentrace: Improve xentrace to use VIRQ_TBUF interrupts as well as a
user-specified polling interval in order to determine when to empty
the trace buffers. Removed the old and unused/unimplemented
new_data_threshold logic.
Signed-off-by: Michael A Fetterman <Michael.Fetterman@cl.cam.ac.uk>
Keir Fraser [Mon, 11 Feb 2008 09:46:21 +0000 (09:46 +0000)]
xentrace: Allow xentrace to handle >4G of trace data.
It was previously assert'ing when it hit 4G.
Also, because the trace buffer is not a power of 2 in size,
using modulo arithmetic to address the buffer does not work
when the index wraps around 2^32.
This patch fixes both issues, and as a side effect, removes all
integer division from the hypervisor side of the trace mechanism.
Signed-off-by: Michael A Fetterman <Michael.Fetterman@cl.cam.ac.uk>
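The modulo problem mentioned above can be reproduced with a few lines of
Python. This is a sketch of the arithmetic only: the buffer size of 3000
bytes and both function names are made up, and the actual patch may keep
its indices differently.

```python
U32 = 1 << 32


def offset_by_modulo(counter, size):
    """Buffer offset derived from a free-running 32-bit counter."""
    return (counter % U32) % size


def advance(offset, nbytes, size):
    """Advance an offset kept in [0, size) without any division."""
    assert 0 <= offset < size and 0 <= nbytes <= size
    offset += nbytes
    if offset >= size:
        # At most one subtraction is needed because nbytes <= size.
        offset -= size
    return offset
```

Because 2^32 is not a multiple of 3000, offset_by_modulo() jumps
discontinuously when the counter wraps, while an offset that is kept
inside the buffer and wrapped by compare-and-subtract stays continuous
and needs no integer division.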
Keir Fraser [Mon, 11 Feb 2008 09:45:36 +0000 (09:45 +0000)]
xentrace: Fix bug in logic for bytes_to_wrap in trace buffer.
Admittedly, the bug could only be manifest with much larger trace
records than are currently allowed (or equivalently, much smaller
trace buffers), but the old code was harder to read, and thus hid the
logic bug well, too.
Signed-off-by: Michael A Fetterman <Michael.Fetterman@cl.cam.ac.uk>
Keir Fraser [Thu, 7 Feb 2008 10:31:48 +0000 (10:31 +0000)]
hvm: Clean up CPUID_0000_0001 return values.
The fix to EBX.ApicID was pointed out by Andre Przywara
<andre.przywara@amd.com>. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 7 Feb 2008 09:28:55 +0000 (09:28 +0000)]
Add 'coredump-destroy' and 'coredump-restart' actions for crashed domains.
Xen-API already specifies these actions for the 'on_crash' domain exit
event. This patch makes them available for use in traditional domU
config files and through the xm tool as well.
Keir Fraser [Thu, 7 Feb 2008 09:27:46 +0000 (09:27 +0000)]
xm reboot: Fix wait option of xm reboot command
When I rebooted a domain with the wait option of the xm reboot command,
the following message was shown even though the reboot succeeded.
Domain vm1 destroyed for failed in rebooting
The message appears because the domain is temporarily destroyed
while the xm reboot command is processed. Until the domain has
been recreated, the server.xend.domains() function returns no
information about it.
This patch fixes the handling of the xm reboot command on the xm
side. It waits briefly until the domain has been recreated, and
only then determines whether the reboot succeeded or failed.
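A minimal Python sketch of the waiting behaviour, under the assumption
that success is judged by the domain becoming visible again.
wait_for_reboot, lookup and the timeout values are hypothetical; the real
xm code queries xend and may use different criteria.

```python
import time


def wait_for_reboot(lookup, name, timeout=30.0, interval=0.5):
    """Poll lookup(name) until the domain reappears or `timeout` expires.

    `lookup` is any callable returning None while the domain is absent.
    Returns True if the domain came back in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if lookup(name) is not None:
            return True
        if time.monotonic() >= deadline:
            # Only now is it fair to report the reboot as failed.
            return False
        time.sleep(interval)
```

The point is that a single failed lookup right after the old domain is
destroyed is not treated as a failed reboot.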
Keir Fraser [Thu, 7 Feb 2008 09:21:19 +0000 (09:21 +0000)]
ioemu: avoid name clashes due to LIST_* macros
Here is what I wrote in my submission to qemu upstream:
qemu's audio subdirectory contains a copy of BSD's sys-queue.h, which
defines a bunch of LIST_ macros. This makes it difficult to build a
program made partly out of qemu and partly out of the Linux
kernel[1], since Linux has a different set of LIST_ macros. It might
also cause trouble when mixing with BSD-derived code.
Under the circumstances it's probably best to rename the versions in
qemu. The attached patch does this.
[1] You might well ask why anyone would want to do this. In Xen we
are moving our emulation of IO devices from processes which run on
the host into a dedicated VM (one per actual VM) which we call a
`stub domain'. This dedicated VM runs a very cut-down `operating
system' which uses some code from Linux.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 7 Feb 2008 09:19:12 +0000 (09:19 +0000)]
ioemu: config cleanup re AF_UNIX sockets on non-Windows
Here is what I wrote in my submission to qemu upstream:
The patch below makes it possible to disable AF_UNIX (unix-domain)
sockets in host environments which do not define _WIN32, by adding
-DNO_UNIX_SOCKETS to the compiler flags. This is useful in the
effectively-embedded qemu host which we are going to be using for
device emulation in Xen.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>