Keir Fraser [Tue, 18 Mar 2008 11:29:18 +0000 (11:29 +0000)]
minios: Fix lost events
evtchn_bind_interdomain used to clear any already pending event before
binding a handler, because otherwise the handler might be called before
it is ready. That, however, leads to missed events, which I had to work
around for the HVM case.
This changes the semantics of bind_evtchn, and thus of all the
event channel binding functions (bind_virq, evtchn_alloc_unbound,
evtchn_bind_interdomain) into not unmasking the event itself, hence
letting the caller initialize properly before unmasking the port (e.g.
record the port number in an appropriate place).
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
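The bind-then-unmask ordering described above can be sketched with a toy model. This is only an illustration in Python, not the Mini-OS code: the EventChannels class and its bind, notify and unmask methods are hypothetical names chosen for the sketch.

```python
class EventChannels:
    """Toy model of masked/pending event-channel ports (not the Mini-OS API)."""

    def __init__(self):
        self.handlers = {}    # port -> handler
        self.masked = set()   # ports whose delivery is held back
        self.pending = set()  # ports with an undelivered event

    def bind(self, port, handler):
        # Bind only: the port stays masked and any pending event is kept.
        self.handlers[port] = handler
        self.masked.add(port)

    def notify(self, port):
        # An event arrives; it is delivered now only if the port is
        # bound and unmasked, otherwise it stays pending.
        self.pending.add(port)
        self._deliver(port)

    def unmask(self, port):
        # The caller unmasks once its own state is ready; a pending event
        # is delivered at this point rather than being dropped.
        self.masked.discard(port)
        self._deliver(port)

    def _deliver(self, port):
        ready = port in self.handlers and port not in self.masked
        if port in self.pending and ready:
            self.pending.discard(port)
            self.handlers[port](port)


seen = []
state = {}

chans = EventChannels()
chans.notify(7)                  # event arrives before anyone is bound
chans.bind(7, lambda port: seen.append((port, state["port"])))
state["port"] = 7                # caller records the port number first
chans.unmask(7)                  # the early event is delivered only now
```

In this model the handler never runs before the caller has recorded the port number, and the event that arrived early is not lost.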
Keir Fraser [Tue, 18 Mar 2008 11:27:36 +0000 (11:27 +0000)]
stubdom: support save/restore by passing the -loadvm parameter,
letting the stubdomain access the save area, and having it watch the
correct xenstore node.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Keir Fraser [Tue, 18 Mar 2008 11:07:00 +0000 (11:07 +0000)]
Convert XenAPI platform values to appropriate types.
XenAPI defines the platform attribute of a VM as a string-string map,
but various code paths in xend expect the platform entries to be of
another type, e.g. int. This patch defines the type of each platform
entry and converts the entry values to the appropriate type when a new
domU configuration is created via XenAPI.
Alternatively, the values could be cast to the appropriate type when
used, but it seems prudent to do the conversion when the domU
configuration is created.
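A minimal sketch of converting at creation time, assuming a per-key type table. The PLATFORM_TYPES table, its entries and the convert_platform helper are hypothetical and are not taken from xend.

```python
# Hypothetical type table; the real xend table and its entries may differ.
PLATFORM_TYPES = {
    "acpi": int,
    "apic": int,
    "pae": int,
    "rtc_timeoffset": int,
    "serial": str,
}


def convert_platform(platform):
    """Convert a string-string platform map to typed values, once, up front."""
    converted = {}
    for key, value in platform.items():
        caster = PLATFORM_TYPES.get(key, str)  # unknown keys stay strings
        converted[key] = caster(value)
    return converted


result = convert_platform(
    {"acpi": "1", "rtc_timeoffset": "-3600", "serial": "pty"}
)
```

Converting once when the configuration is created means later code paths can rely on the types instead of each casting on use.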
Keir Fraser [Tue, 18 Mar 2008 11:06:26 +0000 (11:06 +0000)]
Set device model when creating rfb console via XenAPI
When creating an rfb console via XenAPI, the device model is not being
set to qemu-dm, resulting in no pvfb since qemu-dm is not launched
when the domU is started. This patch sets the device model when an rfb
console is created via XenAPI.
Keir Fraser [Tue, 18 Mar 2008 11:05:53 +0000 (11:05 +0000)]
Initialization of new domU config options via XenAPI
Many of the new domU config options related to hvm guests (e.g. hpet,
rtc_timeoffset, etc.) are not initialized with default values via
XenAPI, which prevents starting an hvm domU created through XenAPI.
This patch ensures the new options are set with appropriate default
values in XendConfig platform sanity check.
Keir Fraser [Tue, 18 Mar 2008 11:02:00 +0000 (11:02 +0000)]
Each TAP/TUN device name for a HVM guest includes its domain ID.
The TAP/TUN devices are used to connect to an HVM domain, but a
device's name bears no relation to the ID of the domain it serves.
This patch includes the HVM domain ID in each TAP/TUN device name.
Keir Fraser [Tue, 18 Mar 2008 10:58:47 +0000 (10:58 +0000)]
Use ioemu block drivers through blktap.
Add support for a tap:ioemu pseudo driver. Devices using this driver
won't use tapdisk (containing the code duplication) any more, but will
connect to the qemu-dm of the domain. In this way no working
configuration should be broken right now as you can still choose to
use the tapdisk drivers.
Keir Fraser [Tue, 18 Mar 2008 10:51:20 +0000 (10:51 +0000)]
x86: Clean ups and fixes after bitops changes.
Firstly, the vlapic bitops need fewer casts.
Secondly, the minimum-alignment check is unnecessary and also breaks
the build (page_info's type_info field has alignment == 1). It is an
unnecessary check because bitops operate on only one bit of the word
they access, so lack of atomicity of the read and writeback does not
matter -- furthermore the LOCKed variants are guaranteed atomic
regardless of alignment.
Keir Fraser [Mon, 17 Mar 2008 11:39:50 +0000 (11:39 +0000)]
SVM: handle page faults in emulated instruction fetches
Deal with failures in hvm_copy_from_guest_virt when fetching
instructions in the various SVM emulation paths. Since we know that
the instruction was fetchable by the hardware, we can usually just
return from the VMEXIT and try again; whatever caused us to fail will
cause the hardware to fail next time and we'll get the correct exit
code.
Keir Fraser [Sun, 16 Mar 2008 14:11:34 +0000 (14:11 +0000)]
x86: Allow bitop functions to be applied only to fields of at least 4
bytes. Otherwise the 'longword' processor instructions used will
overlap with adjacent fields with unpredictable consequences.
This change requires some code fixup and just a few casts (mainly when
operating on guest-shared fields which cannot be changed, and which by
observation are clearly safe).
Based on ideas from Jan Beulich <jbeulich@novell.com>
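The overlap concern can be illustrated with a simplified model in Python. It emulates a 32-bit read-modify-write on a byte buffer; the record layout and the set_bit32 helper are assumptions made for the illustration and are not Xen code.

```python
import struct


def set_bit32(buf, offset, bit):
    """Emulate a 32-bit read-modify-write that sets `bit` at `offset`."""
    (word,) = struct.unpack_from("<I", buf, offset)
    struct.pack_into("<I", buf, offset, word | (1 << bit))


# Assumed layout: a 1-byte flags field, a 1-byte counter, 2 padding bytes.
record = bytearray(4)
FLAGS, COUNTER = 0, 1

set_bit32(record, FLAGS, 3)  # bit 3 stays inside the flags byte
set_bit32(record, FLAGS, 9)  # bit 9 lands in the neighbouring counter byte
```

Because the access is four bytes wide, an operation aimed at the 1-byte flags field also covers the counter next to it, which is why the change restricts bitops to fields of at least 4 bytes.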
Keir Fraser [Wed, 5 Mar 2008 10:52:51 +0000 (10:52 +0000)]
x86: New vcpu_op call to get physical CPU identity.
Some AMD machines have APIC IDs that are not equal to CPU IDs. In
the default Xen configuration, ACPI calls on these machines
can get confused. This shows up most noticeably when running
AMD PowerNow!. The only solution is for dom0 to get the
hypervisor's cpuid to apicid table when needed (i.e., when dom0
vcpus are pinned).
Add a vcpu op to Xen to allow dom0 to query the hypervisor for
architecture dependent physical cpu information if dom0 vcpus are
pinned.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 4 Mar 2008 13:30:07 +0000 (13:30 +0000)]
ioemu: improve colordepth negotiation
By moving the colourdepth callback a bit earlier, we can let the
display decide the actual depth to be used before the draw and whether
sharing is possible or not.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Keir Fraser [Tue, 4 Mar 2008 10:33:50 +0000 (10:33 +0000)]
x86: On CPU shutdown, clear pending FPU exceptions.
I've seen at least one BIOS which fails warm reboot if FPU exceptions
are pending.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 4 Mar 2008 10:32:13 +0000 (10:32 +0000)]
x86_emulate: Emit emulated forms of most FPU instructions as '.byte
xx,yy'. This is arguably clearer than using the mnemonic opcode, since
it is more clearly the instruction we have just decoded. Furthermore,
gas likes to reverse FPU operands on some two-operand FPU instructions
for historical reasons. Finally, '.byte xx,yy' is potentially more
amenable to further macro-isation down the road.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 3 Mar 2008 15:19:39 +0000 (15:19 +0000)]
hvm emulate: Correctly probe when we are in 64-bit mode and set
address-size default appropriately.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 3 Mar 2008 13:19:44 +0000 (13:19 +0000)]
ioemu: xenfb shared memory patch
Share the internal xenfb backend buffer with sdl or vnc. All the
needed functions are already in place because they were implemented
for the previous cirrus vga shared memory patch.
Keir Fraser [Mon, 3 Mar 2008 11:47:40 +0000 (11:47 +0000)]
x86_emulate: INS/OUTS need Mov attribute to force writeback (since
dst.orig_val is not initialised). Also, Mov attribute on cmpxchg is
not necessary -- when destination is memory (i.e., successful cmpxchg)
then dst.orig_val is already correctly filled in. In case that
dst.orig_val == dst.val then the instruction linearises at the point
we first read the destination (and initialised dst.orig_val).
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 3 Mar 2008 11:09:33 +0000 (11:09 +0000)]
Fix xm vcpu-pin for inactive managed domains
The xm vcpu-pin command cannot currently change the cpu affinity
of inactive managed domains. This patch enables changing the
cpu affinity of inactive managed domains. However, the
affinity cannot be changed for an individual vcpu, because the domain
configuration cannot currently define cpu affinity per vcpu.
Therefore, 'all' must be specified as the
VCPU argument of the xm vcpu-pin command.
Keir Fraser [Mon, 3 Mar 2008 11:06:31 +0000 (11:06 +0000)]
ioemu: sdl blitting
Right now qemu takes care of converting pixels between the guest pixel
format and the sdl pixel format; after that, qemu also memcpys the
converted pixels to the sdl buffer in video ram. This process can be
improved using the SDL blit capabilities: the patch I am attaching
creates an SDL Surface from the Cirrus VGA framebuffer and uses SDL
blitting functions to convert and write pixels to video ram. SDL
blitting functions are optimized and can even be hardware accelerated
on some platforms.
Keir Fraser [Mon, 3 Mar 2008 11:05:18 +0000 (11:05 +0000)]
ioemu: fix xenfb slow case update by shifting to the left before
masking low bits instead of shifting to the right and masking high
bits. Also adds 24bpp support.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Keir Fraser [Mon, 3 Mar 2008 10:56:09 +0000 (10:56 +0000)]
kexec: Add XLAT_kexec_range
Add XLAT_kexec_range and use it to translate between xen_kexec_range_t
and compat_kexec_range_t. I missed this in my previous patch, which
created the explicit definition of kexec_get_range_compat().
Alex Williamson [Fri, 29 Feb 2008 16:18:01 +0000 (09:18 -0700)]
[IA64] kexec: Unpin TLB in the hypervisor
The dom0 relocate_new_kernel code makes a large number of assumptions about
various compile time constants, and thus assumes that these constants are
the same for the hypervisor and dom0. Despite extensive #ifdef work this
has proved to be both fragile and incomplete.
This patch changes things around so that the unpinning work is done
by code provided by the hypervisor, reusing existing code there.
Apart from being a solution that works, it's also likely
a much more maintainable solution: as TLB changes in the hypervisor
code are made, the code paths in the hypervisor are much more likely
to be checked than this one, which lies in a completely different tree.
There is also a dom0 Linux kernel portion to this patch.
Its commit message has comments detailing various implementation
issues. See linux-2.6.18-xen.hg ee7015727bd15e80e17e725f70c0a5336e45607a
Keir Fraser [Thu, 28 Feb 2008 13:44:28 +0000 (13:44 +0000)]
NUMA node migration
Adds NUMA node migration based on live migration to
xend. By adding another parameter to "xm migrate" the target NUMA node
number gets propagated to the target host (which can be either localhost
or a remote host). The restore function then sets the VCPU affinity
accordingly. Only changes Python code in xend. I hope that the patch
doesn't break XenAPI compatibility (adding a parameter seems fine?).
# xm migrate --live --node=<nodenr> <domid> localhost
<nodenr> is the number as shown with 'xm info' under node_to_cpu
I am aware that using live migration isn't the best approach (takes
twice the memory and quite some time), but it's less intrusive and
works fine (given localhost migration stability...)
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Thu, 28 Feb 2008 13:40:30 +0000 (13:40 +0000)]
x86 shadow: Audit tables and guest walk when we know they are consistent.
From: Gianluca Guida <gianluca.guida@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 28 Feb 2008 13:21:49 +0000 (13:21 +0000)]
Add ACPI tables support for AMD IOMMU
Configuration information for the AMD IOMMU control fields is described
by the I/O Virtualization Reporting Structure (IVRS) table. This patch set
parses the IVRS table and updates the iommu control flags according to the result.
Keir Fraser [Thu, 28 Feb 2008 13:18:29 +0000 (13:18 +0000)]
x86 shadow: Remove lock on first guest table walk.
The existing shadow fault path grabs the big lock before walking
the guest tables, to ensure consistency with the shadow content
in case of a concurrent change from another vcpu in a bad OS.
But this lock brings more lock contention when scaled up
for a good guest, which already prevents the above case from happening.
So this patch removes the lock on the first guest
table walk, and delays the check to some special points.
The key is to check whether any guest table update happens
between the first walk and taking the shadow lock. Here we take
two hints for a guest table update:
* write permission removal
* write emulation
If either of the above two operations is observed within the race
window, it indicates the possibility that the previous walk result
may be inaccurate and a re-check is required. On a mismatch,
simply return to trigger another fault.
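The walk-then-recheck scheme can be sketched as an optimistic read validated under the lock. This is a toy Python model, not the shadow code: the GuestTables class, its version counter (standing in for the two update hints) and handle_fault are hypothetical.

```python
import threading


class GuestTables:
    """Toy guest page table with a counter bumped on each hinted update."""

    def __init__(self, entries):
        self.entries = dict(entries)  # virtual page -> frame
        self.version = 0              # stands in for the two update hints
        self.shadow_lock = threading.Lock()

    def guest_update(self, vpage, frame):
        # Stand-in for write permission removal or write emulation.
        with self.shadow_lock:
            self.entries[vpage] = frame
            self.version += 1


def handle_fault(tables, vpage, racing_update=None):
    # First walk, done without holding the shadow lock.
    seen_version = tables.version
    walked = tables.entries.get(vpage)

    if racing_update is not None:
        racing_update()  # simulate another vcpu inside the race window

    with tables.shadow_lock:
        # Re-check only if an update was observed in the race window.
        if tables.version != seen_version:
            if tables.entries.get(vpage) != walked:
                return None  # mismatch: return and let the guest re-fault
        return walked


tables = GuestTables({0x10: 0xA})
clean = handle_fault(tables, 0x10)
raced = handle_fault(tables, 0x10, lambda: tables.guest_update(0x10, 0xB))
unrelated = handle_fault(tables, 0x10, lambda: tables.guest_update(0x20, 0xC))
```

The common case pays no lock cost for the walk itself; only a fault that raced with an update to the same entry is abandoned and retried.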
I ran some experiments to sample perfc counts:
<64bit guest>
3.7% of gwalks are re-checked
For re-check, 68% comes from write permission removal
<32bit pae guest>
7.2% of gwalks are re-checked
For re-check, 54.9% comes from write permission removal
Actually, the previous fast emulation optimization already skips
lots of guest table walks, and thus the above ratios can be
smaller if compared to the total shadow fault count.
Basically, shadow promotion with write permission removal
does suffer higher overhead, but the benefit of reduced
lock contention is more obvious.
Improvement on kernel compile for this patch is:
(64bit Xen)
32bit guest: 1.1%
pae guest: 0.4%
64bit guest: 0.5%
Keir Fraser [Thu, 28 Feb 2008 13:09:28 +0000 (13:09 +0000)]
Rename struct xenkbd_position member abs_z to rel_z. Z-axis motion is
relative, not absolute.
From: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 28 Feb 2008 10:48:42 +0000 (10:48 +0000)]
kexec: extend hypercall for efi memory map, boot parameter and xenheap regions
Extend the kexec hypercall to allow it to return the efi memory map,
boot parameter and xen heap regions.
The efi memory map and boot parameter regions need to be supplied
by xen to dom0, rather than established by dom0 as dom0's memory
is not machine memory and thus the regions are not in the correct
location. These regions are inserted into machine_iomem by dom0.
The xen heap region is needed as on xen the hypervisor code and
heap exist in two different EFI memory regions, which are
reflected in machine_iomem. Thus a single xen code region can't be
inserted into machine_iomem.
Keir Fraser [Thu, 28 Feb 2008 10:45:47 +0000 (10:45 +0000)]
kexec: limit scope of the use of compat_kexec_range_t
Unless I am mistaken, the compat functions are provided as a stable ABI.
This includes providing a stable version of xen_kexec_range_t in the
form of compat_kexec_range_t. However, internally it doesn't really
matter how xen represents the data.
Currently the code provides for the creation of a compat version of
all kexec range functions, which use the compat_kexec_range_t
type. This is difficult to extend if range code exists outside of
xen/common/kexec.c.
The existence of "#ifdef CONFIG_X86_64" in the code suggests that some
of the range code might be better off in architecture specific code.
Furthermore, subsequent patches will introduce ia64-specific range
handling code, which really would be much better off somewhere in
arch/ia64/.
With this in mind, the handling of compat_kexec_range_t is changed
such that the code which reads and returns data from user-space
translates between compat_kexec_range_t and xen_kexec_range_t. Since,
padding aside, the two structures are currently the same, this is quite
easy. Things may get more tricky in the future, but I don't believe
this change is likely to make things significantly worse (or better)
in that regard. In any case, refactoring can occur again as required.
Keir Fraser [Thu, 28 Feb 2008 10:29:25 +0000 (10:29 +0000)]
ioemu: Send logs to stderr and have xend redirect stderr to the
correct log file.
At the same time, this patch renames the logfile to be
'qemu-dm-{NAME}.log' instead of qemu-dm-{ID}.log. This makes it
easier to track/find the QEMU logfile associated with a VM. It will
also save 1 backup qemu-dm-{NAME}.log.1 so if a domain crashes and
restarts, you don't lose/overwrite the logfile immediately.
Finally it changes the QEMU monitor prompt back to '(qemu)' instead of
'(HVMXen)' because automated tools/scripts interacting with QEMU's
monitor need a consistent prompt to look for, and changing it for Xen
serves no useful purpose.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
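The single-backup behaviour described above could look roughly like the following. This is a sketch under assumptions, not xend's code: the open_domain_log helper and the "guest1" domain name are hypothetical.

```python
import os
import tempfile


def open_domain_log(log_dir, name):
    """Open qemu-dm-{NAME}.log, keeping the previous one as a .1 backup."""
    path = os.path.join(log_dir, "qemu-dm-%s.log" % name)
    if os.path.exists(path):
        # Keep exactly one previous log; an older .1 file is overwritten.
        os.replace(path, path + ".1")
    return open(path, "w")


with tempfile.TemporaryDirectory() as log_dir:
    for run in ("first", "second", "third"):
        with open_domain_log(log_dir, "guest1") as log:
            log.write(run)

    with open(os.path.join(log_dir, "qemu-dm-guest1.log")) as f:
        current = f.read()
    with open(os.path.join(log_dir, "qemu-dm-guest1.log.1")) as f:
        backup = f.read()
    files = sorted(os.listdir(log_dir))
```

After three starts of the same domain, only the current log and the one before it remain.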
Keir Fraser [Wed, 27 Feb 2008 13:16:02 +0000 (13:16 +0000)]
ioemu: Fix e1000 mmio range size.
Per the Intel 82540EM Software Developer's Manual, p. 211, the mmio size
is 0x20000; otherwise address overlap occurs and causes the second card
to fail, which happened to me earlier.
From: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
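A small sketch of why an undersized range makes two cards collide. Only the 0x20000 size comes from the message above; the base address, the 0x10000 "too small" size and both helpers are hypothetical values chosen for illustration.

```python
E1000_MMIO_SIZE = 0x20000  # size stated in the commit message


def ranges_overlap(base_a, size_a, base_b, size_b):
    """True if [base_a, base_a+size_a) intersects [base_b, base_b+size_b)."""
    return base_a < base_b + size_b and base_b < base_a + size_a


def place_cards(first_base, registered_size, count=2):
    # Cards are placed back to back using the size that was registered.
    return [first_base + i * registered_size for i in range(count)]


BASE = 0xF2000000  # hypothetical guest-physical base address

too_small = place_cards(BASE, 0x10000)       # hypothetical undersized range
correct = place_cards(BASE, E1000_MMIO_SIZE)

# Each card really decodes E1000_MMIO_SIZE bytes, whatever was registered.
overlap_before = ranges_overlap(
    too_small[0], E1000_MMIO_SIZE, too_small[1], E1000_MMIO_SIZE
)
overlap_after = ranges_overlap(
    correct[0], E1000_MMIO_SIZE, correct[1], E1000_MMIO_SIZE
)
```

With the undersized registration, the second card's base falls inside the window the first card decodes; with 0x20000 the two windows are adjacent but disjoint.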
Keir Fraser [Tue, 26 Feb 2008 14:50:45 +0000 (14:50 +0000)]
ioemu: VNC updates should be sent only when requested.
Reading qemu code I realized that the qemu vnc server sometimes sends
framebuffer updates even if the client didn't request any. This is not
consistent with the RFB protocol spec and can break some clients.
This patch strictly enforces compliance with the RFB protocol making
sure framebuffer updates are sent only if the client requested one.
Doing so is more difficult than it seems because some framebuffer
pseudo-encoding updates cannot be discarded but must be sent anyway:
for example desktop resize and pixel format change messages. To solve
the problem I wrote a queue that stores those messages and sends them
as soon as the client asks for an update. Since 90% of the time the
queue is used to store only a few elements, the queue allocates 10
elements at the beginning and, every time it runs out of elements,
allocates another 10 elements. This should drastically limit the
number of malloc and free calls needed to maintain the queue. I did some
stress tests in the last couple of days and it seems to work well.
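The deferral logic can be sketched as follows. This is a simplified Python model, not the qemu vnc server: the UpdateQueue class and the message strings are hypothetical, and the allocation in blocks of 10 elements is not modelled because Python's deque manages its own storage.

```python
from collections import deque


class UpdateQueue:
    """Hold must-send messages until the client asks for an update."""

    def __init__(self):
        self.queued = deque()
        self.update_requested = False
        self.sent = []  # stands in for writes to the client socket

    def push(self, message):
        if self.update_requested:
            # A request is outstanding, so this message may go out now.
            self.sent.append(message)
            self.update_requested = False
        else:
            self.queued.append(message)

    def client_requests_update(self):
        if self.queued:
            # Flush everything that was waiting for a request.
            self.sent.extend(self.queued)
            self.queued.clear()
        else:
            self.update_requested = True


q = UpdateQueue()
q.push("desktop-resize")
q.push("pixel-format-change")
sent_before_request = list(q.sent)

q.client_requests_update()
sent_after_request = list(q.sent)

q.client_requests_update()          # nothing queued: remember the request
q.push("desktop-resize")            # goes out immediately
sent_final = list(q.sent)
```

Nothing reaches the client until it has asked, yet messages that cannot be discarded are kept in order and delivered with the next request.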
Keir Fraser [Tue, 26 Feb 2008 14:38:57 +0000 (14:38 +0000)]
xentrace: Add option to reserve disk space
Before writing records, xentrace will check to make sure that there is
a minimum amount of space left on the output filesystem.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
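A free-space check before each write might look like this. It is a Python sketch rather than xentrace's C code: has_reserved_space and write_record are hypothetical helpers, and skipping the record when space is low is an assumption, since the message does not say what xentrace does in that case.

```python
import io
import shutil
import tempfile


def has_reserved_space(path, min_free_bytes):
    """True if the filesystem holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def write_record(out, record, path, min_free_bytes):
    # Check before every write; skip the record when space is too low
    # (an assumption about the policy, made for this sketch).
    if not has_reserved_space(path, min_free_bytes):
        return False
    out.write(record)
    return True


out = io.BytesIO()                 # stands in for the trace output file
target = tempfile.gettempdir()     # filesystem to check

written = write_record(out, b"rec1", target, 0)
refused = write_record(out, b"rec2", target, 2 ** 80)
```

A reserve of 0 bytes always passes, while an impossibly large reserve is refused, leaving only the first record in the output.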