Keir Fraser [Mon, 29 Oct 2007 09:17:38 +0000 (09:17 +0000)]
x86, hvm: Clean up code style in stdvga code and do not compile for
32-bit hypervisor (it doesn't work). Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Mon, 29 Oct 2007 08:46:34 +0000 (08:46 +0000)]
xend, acm: Trigger a script when a resource's label changes
This patch triggers a script when a resource's label changes. The xend
config file should provide a variable 'resource-label-change-script'
that can then be launched.
Keir Fraser [Fri, 26 Oct 2007 16:16:14 +0000 (17:16 +0100)]
Fix PVFB device initialization
The final series of patches I sent out lost 2 hunks in the big
refactoring patches I did thanks to a messed up rebase/rediff :-( This
patch fixes the device nodename initialization so that watches work
correctly.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Fri, 26 Oct 2007 15:06:49 +0000 (16:06 +0100)]
x86: Replace FLUSH_LEVEL() parameter to flush_area() with rather
clearer FLUSH_ORDER(). Also remove bogus assertion. Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Fri, 26 Oct 2007 14:14:38 +0000 (15:14 +0100)]
serial: Check index argument before indexing into an array.
Pointed out by Christoph Egger, and worth fixing for clarity even if
it's not necessarily a bug. Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Fri, 26 Oct 2007 10:40:10 +0000 (11:40 +0100)]
x86/64: paravirt 32-on-64 call gate support
As we realized while trying out NetWare's ring 3 support, call gates
didn't work for 32-bit guests on 64-bit hypervisor. Since x86-64
doesn't know 16- or 32-bit call gates, the only option was to emulate
them. The code here was developed against 3.0.4, so hasn't been
checked for potential integration possibilities with the much improved
emulator; nevertheless I want to supply this patch.
As was realized in the course of creating this patch, 64-bit gates
don't work either, and will also need to be emulated if any
environment intends to use them. The patch changes behavior here in
that rather than silently permitting the use of 64-bit gates (with possibly
difficult to understand exceptions happening on the first instruction
of the call/jump target) the call/jump itself will now fault, with the
error code indicating the gate that was attempted to be used. I intend
to complete the emulation to also cover 64-bit gates, but there is one
issue that first needs to be addressed: Whether a gate transitions
from user to kernel mode doesn't depend on the gate, but rather on the
descriptor referenced by the selector held in the gate. As the two can
change independently, this decision can be made only at the point of
use of the gate, and consequently descriptors for kernel code segments
must become distinguishable from user ones, which they currently
aren't as they both get their DPL forced to 3. An initial thought here
is to possibly leverage the otherwise meaningless conforming bit
(i.e. forcing it on for all user code segments, and off for kernel
ones, where then the distinction can be made at the point the
descriptor gets verified/fixed up based on the kernel supplied DPL
[wouldn't work for old guests when setting the DPL to 3 was still
required to be done by the guest]).
The patch also changes behavior of check_descriptor() in that no
modification is done to the descriptor anymore unless all verification
steps passed, and in that the selector RPL of selectors in call gates
no longer gets fixed up (a comment elsewhere in the code correctly
states that the RPL field here isn't used for anything by the
processor); really, this field is now used on 64-bits to store the
original DPL of the gate, because the architectural one now gets
forced to zero.
Keir Fraser [Fri, 26 Oct 2007 09:55:50 +0000 (10:55 +0100)]
LAPIC timer accounting fix
Offset emulated local APIC timer so it doesn't tick during guest's
timer related processing. Otherwise, guests using the local APIC for
process accounting can see long sequences of process ticks incorrectly
charged to interrupt processing.
Signed-off-by: Ben Guthro <bguthro@virtualron.com> Signed-off-by: Gary Grebus <ggrebus@virtualiron.com>
Keir Fraser [Fri, 26 Oct 2007 09:32:20 +0000 (10:32 +0100)]
x86, hvm: Improve standard VGA performance
This patch improves the performance of Standard VGA,
the mode used during Windows boot and by the Linux
splash screen.
It does so by buffering all the stdvga programmed output ops
and memory mapped ops (both reads and writes) that are sent to QEMU.
We maintain locally essential VGA state so we can respond
immediately to input and read ops without waiting for
QEMU. We snoop output and write ops to keep our state
up-to-date.
PIO input ops are satisfied from cached state without
bothering QEMU.
PIO output and mmio ops are passed through to QEMU, including
mmio read ops. This is necessary because mmio reads
can have side effects.
I have changed the format of the buffered_iopage.
It used to contain 80 elements of type ioreq_t (48 bytes each).
Now it contains 672 elements of type buf_ioreq_t (6 bytes each).
Being able to pipeline 8 times as many ops improves
VGA performance by a factor of 8.
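The capacity figures above can be checked with a little arithmetic. This is only a sketch: the element counts and sizes come from the commit message, while the variable names and the 4096-byte page size (the usual x86 page size) are assumptions made for illustration.

```python
# Element counts and sizes as quoted in the commit message.
OLD_SLOTS, OLD_SLOT_BYTES = 80, 48    # ioreq_t entries
NEW_SLOTS, NEW_SLOT_BYTES = 672, 6    # buf_ioreq_t entries
PAGE_BYTES = 4096                     # assumption: one 4 KiB x86 page

old_bytes = OLD_SLOTS * OLD_SLOT_BYTES   # bytes used by the old layout
new_bytes = NEW_SLOTS * NEW_SLOT_BYTES   # bytes used by the new layout
ratio = NEW_SLOTS / OLD_SLOTS            # how many more ops can be queued

print(old_bytes, new_bytes, ratio)
```

Both layouts stay within a single page under that assumption, and the slot count grows by a factor of 8.4, which the message rounds to 8.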
I changed hvm_buffered_io_intercept to use the same
registration and callback mechanism as hvm_portio_intercept
rather than the hacky hardcoding it used before.
In platform.c, I fixed send_timeoffset_req() to set its
ioreq size to 8 (rather than 4), and its count to 1 (which
was missing).
Signed-off-by: Ben Guthro <bguthro@virtualron.com> Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
Keir Fraser [Fri, 26 Oct 2007 08:56:54 +0000 (09:56 +0100)]
hvm, x86: Allow virtual timer mode to be specified.
In HVM config file:
timer_mode=0 # Default: virtual time is delayed when timer ticks are
# missed due to preemption
timer_mode=1 # Virtual time always equals wall time, even while missed
# ticks are pending
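The difference between the two modes can be pictured with a toy model. This is a hedged sketch, not the hypervisor's logic: the function name, its parameters, and the idea of deriving mode-0 time from delivered ticks are all assumptions chosen to illustrate the two comments above.

```python
def guest_time(wall_ns, ticks_delivered, period_ns, timer_mode):
    """Toy model of the time a guest would observe (hypothetical helper)."""
    if timer_mode == 0:
        # Virtual time lags behind while missed ticks are still pending:
        # it only advances as ticks are actually delivered.
        return ticks_delivered * period_ns
    if timer_mode == 1:
        # Virtual time always equals wall time, pending ticks or not.
        return wall_ns
    raise ValueError("unknown timer_mode")
```

For example, with a 10 ms tick, 50 ms of wall time elapsed and only three ticks delivered, the model reports 30 ms in mode 0 and 50 ms in mode 1.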
Keir Fraser [Thu, 25 Oct 2007 14:54:19 +0000 (15:54 +0100)]
pv-on-hvm: fixes for unmodified drivers build and modern Linux
- The adjustments to README and overrides.mk are generic.
- The removal of explicit linux/config.h inclusion should also not
cause any issues.
- The introduction of irq_handler_t should eliminate warnings on
2.6.19+ kernels (I didn't check they're there, but since the
request_irq prototype changed, I'm sure there's at least
one). However, as a result changes to the Linux tree are expected to
be required.
- The change setup_xen_features -> xen_setup_features follows the
naming in mainline 2.6.23 but would apparently also require changes
to the Linux tree.
- The changes SA_* -> IRQF_ and pci_module_init ->
pci_register_driver should also not cause issues.
Keir Fraser [Thu, 25 Oct 2007 14:04:33 +0000 (15:04 +0100)]
qemu: Add extra tracing around logdirty bitmap setup. Signed-off-by: Ben Guthro <bguthro@virtualron.com> Signed-off-by: Gary Grebus <ggrebus@virtualiron.com>
Keir Fraser [Thu, 25 Oct 2007 14:01:59 +0000 (15:01 +0100)]
hvm: In xenstore_process_logdirty_event(), if a stale shared memory
key is encountered reset 'seg' to NULL so the shared memory
initialization can be retried later.
Signed-off-by: Ben Guthro <bguthro@virtualron.com> Signed-off-by: Robert Phillips <rphillips@virtualiron.com> Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Thu, 25 Oct 2007 13:55:37 +0000 (14:55 +0100)]
hvm,x86: Add more vmxassist opcodes for Ubuntu 7.0.4 support Signed-off-by: Ben Guthro <bguthro@virtualron.com> Signed-off-by: Gary Grebus <ggrebus@virtualiron.com>
Keir Fraser [Thu, 25 Oct 2007 13:45:47 +0000 (14:45 +0100)]
pv-qemu 10/10: Make xenconsoled ignore doms with qemu-dm
This patch writes a field /local/vm/DOMID/console/type taking the
value 'ioemu' or 'xenconsoled'. If xenconsoled sees a type that is
not its own, then it skips handling of that guest. The qemu-dm
process doesn't need to read this field since it will only attach
to the console if given the -serial pty arg which XenD already
ensures matches this xenstore field.
The overall behaviour is that if a paravirt guest has a qemu-dm
process running then that handles the console, otherwise the
xenconsoled handles it. The former is more functional, with the
exception of not currently supporting persistent logging to a
file at the same time as exposing a PTY.
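The ownership check described above might look roughly like this. It is a minimal sketch: xenstore is modelled as a plain dict, the function name is hypothetical, and treating a missing type field as "handle it" is an assumption the commit message does not spell out.

```python
def xenconsoled_should_handle(xenstore, domid, own_type="xenconsoled"):
    """Decide whether xenconsoled should service this guest's console."""
    # Path layout as given in the commit message.
    path = f"/local/vm/{domid}/console/type"
    console_type = xenstore.get(path)
    if console_type is None:
        # Assumption: no type field means nobody else claimed the console.
        return True
    # Skip any guest whose console type is not our own (e.g. 'ioemu').
    return console_type == own_type
```

With this model, a guest marked 'ioemu' is left to qemu-dm and every other guest stays with xenconsoled.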
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:45:07 +0000 (14:45 +0100)]
pv-qemu 9/10: XenD device model re-factoring
This patch adapts XenD so that it is capable of starting a qemu-dm
device model for both paravirt and fullyvirt guests. A paravirt guest
will only be given a device model if it has a VFB configured, or the
user explicitly includes the device_model option in the config.
This avoids unnecessary overhead for those wanting a minimal
paravirt guest.
The bulk of this patch involves moving code from the HVMImageHandler
into the base ImageHandler class. The HVMImageHandler and
LinuxImageHandler subclasses now merely contain a couple of
overrides to set some specific command line flags. The most important
is -M xenpv, vs -M xenfv.
The XenConfig class has a minor refactoring to add a has_rfb() method
to avoid duplicating code in a couple of places. Instead of hardcoding
DEFAULT_DM it now uses the xen.util.auxbin APIs to locate it - this
works on platforms where qemu-dm is in /usr/lib64 instead of
/usr/lib. As before paravirt only gets a default qemu-dm if using a
VFB.
The vfbif.py class is trimmed out since it no longer needs to spawn a
daemon. A few other misc fixes deal with qemu-dm interactions when
saving/restoring, and in particular recovering from save failures (or
checkpointing).
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:42:40 +0000 (14:42 +0100)]
pv-qemu 8/10: Add pv console to QEMU paravirt machine
This patch adds a paravirt console driver to qemu-dm. This is used
when the QEMU machine type is 'xenpv', connecting to the ring buffer
provided by the guest kernel. The '-serial' command line flag controls
how the guest console is exposed.
For parity with xenconsoled the '-serial pty' arg can be used. For
guests which are running a qemu-dm device model, the xenconsoled
daemon is no longer needed for guest consoles. The code for the
xen_console.c is based on the original code in
tools/console/daemon/io.c, but simplified; since it's only dealing with
a single guest there's no state tracking to worry about.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:41:35 +0000 (14:41 +0100)]
pv-qemu 7/10: Async negotiation with xenfb frontend
This patch re-factors the paravirt console xenfb_attach_dom
method. The original method blocks the caller until the front &
backends have both switched to the connected state. This isn't an
immediate problem, but patches which follow will extend qemu to also
handle the text console so blocking on graphics console startup will
block the text console processing.
The new code is basically a state machine. It starts off with a watch
waiting for the KBD backend to switch to 'initialized' mode, then does
the same for the FB backend. Now it waits for KBD & FB frontend
devices to initialize, reading & mapping the framebuffer & its config
at the appropriate step. When the KBD frontend finally reaches the
connected state it registers a graphical console with QEMU and sets up
the various framebuffer, mouse & keyboard event handlers. If a client
connects to the VNC server before this is completed, then they will
merely see a text console (or perhaps the monitor if configured that
way).
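The sequence described above can be sketched as a small table-driven state machine. The state and event names are invented for illustration and do not come from xenfb.c; the sketch only mirrors the order of steps in the description and leaves out the actual xenstore watches and framebuffer mapping.

```python
# (current state, event) -> next state; all names are hypothetical.
TRANSITIONS = {
    ("wait_kbd_backend", "kbd_backend_initialized"): "wait_fb_backend",
    ("wait_fb_backend", "fb_backend_initialized"): "wait_frontends",
    ("wait_frontends", "frontends_initialized"): "wait_kbd_connected",
    ("wait_kbd_connected", "kbd_frontend_connected"): "console_registered",
}

def step(state, event):
    """Advance on an expected event; ignore anything else without blocking."""
    return TRANSITIONS.get((state, event), state)

def run(events, state="wait_kbd_backend"):
    """Feed a sequence of watch events through the state machine."""
    for event in events:
        state = step(state, event)
    return state
```

Because each watch callback only advances the state and returns, nothing blocks the caller; events arriving out of order simply leave the state unchanged.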
The main difference from previous versions of this patch, is that at
the suggestion of Markus Armbruster, I've re-ordered the individual
static functions so they are in order-of-call, rather than
reversed. Although I now have to pre-declare them, it is much easier
to read the code. I have also fixed the keycode -> keysym translations
to match previous behaviour.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:40:19 +0000 (14:40 +0100)]
pv-qemu 6/10: Merge private & public xenfb structs
This patch merges the public & private structs from the paravirt FB
into a single struct. Since QEMU is the only consumer of this code
there is no need for the artificial pub/priv split. Merging the two
will make it possible to more tightly integrate with QEMU's event
loop and do asynchronous non-blocking negotiation with the frontend
devices (see next patch).
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:39:33 +0000 (14:39 +0100)]
pv-qemu 5/10: Refactor QEMU console integration
This patch moves a bunch of code out of the xen_machine_pv.c file and
into the xenfb.c file. This is simply a re-factoring to facilitate the
two patches which follow.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:38:47 +0000 (14:38 +0100)]
pv-qemu 4/10: Refactor xenfb event handlers
This patch is a simple code re-factoring to move the event loop
integration directly into the xenfb.c file. It is to facilitate
the patches which follow.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:37:23 +0000 (14:37 +0100)]
pv-qemu: Remove standalone xenfb code
This patch removes all trace of the standalone paravirt framebuffer
daemon. With this there is no longer any requirement for
LibVNCServer. Everything is handled by the QEMU device model. The
xenfb.c and xenfb.h files are now moved (without code change) into
tools/ioemu/hw/ & the temporary Makefile hack from the previous patch
is removed.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:35:04 +0000 (14:35 +0100)]
pv-qemu 2/10: Add a QEMU machine type for paravirt guests
This patch adds a paravirt machine type to QEMU. This can be requested
by passing the arg '-M xenpv' to qemu-dm. Aside from -d, and
-domain-name, the only other args that are processed are the VNC / SDL
graphics related args. Any others will be ignored. A tweak to
helper2.c was made to stop it setting up a file handler watch when
there are no CPUs registered.
The paravirt machine is in hw/xen_machine_pv.c and registers an
instance of the xenfb class, integrating it with the QEMU event loop
and key/mouse handlers. A couple of methods were added to xenfb.h to
allow direct access to the file handles for xenstore & the event
channel.
The vfbif.py device controller is modified to launch qemu-dm instead
of the old xen-vncfb / sdlfb daemons.
When receiving framebuffer updates from the guest, the update has to
be copied into QEMU's copy of the framebuffer. This is because QEMU
stores the framebuffer in the format that is native to the SDL
display, or VNC client. This is not necessarily the same as the guest
framebuffer which is always 32bpp. If there is an exact depth match we
use memcpy for speed, but in the non-matching case we have to fall back
to slow code to convert pixel formats. It fully supports all features
of the paravirt framebuffer including the choice between absolute &
relative pointers. The overall VIRT memory image size is about the same as
old xen-vncfb, but the resident memory size is a little increased due
to copy of the framebuffer & some QEMU static state overhead. Most of
this is shared across QEMU processes.
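The fast path and the conversion fallback described above can be illustrated as follows. This is a sketch under stated assumptions: the guest pixels are taken to be XRGB8888 and the 16bpp target to be RGB565, neither of which the commit message specifies, and the function names are hypothetical.

```python
def xrgb8888_to_rgb565(pixel):
    """Convert one 32bpp pixel to 16bpp (assumed channel layouts)."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # Keep the top 5/6/5 bits of each channel.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def copy_update(guest_pixels, display_depth):
    """Copy a guest update into a display-format buffer."""
    if display_depth == 32:
        # Exact depth match: plain copy, analogous to the memcpy path.
        return list(guest_pixels)
    if display_depth == 16:
        # Depth mismatch: slower per-pixel conversion.
        return [xrgb8888_to_rgb565(p) for p in guest_pixels]
    raise NotImplementedError("depth not covered by this sketch")
```

The per-pixel loop is what makes the non-matching case slow compared with a single block copy.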
To avoid both moving the xenfb.c and making changes to it in the same
patch, this just uses a Makefile hack to link against the xenfb.o from
the tools/xenfb/ directory. This will be removed in the following
patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 13:33:01 +0000 (14:33 +0100)]
pv-qemu 1/10: Add a QEMU machine type for fullvirt guests
This patch does a (no functional change) re-arrangement of the code
for starting up a fully virtualized guest. In particular it creates a
new QEMU machine type for Xen fullyvirt guests which can be specified
with '-M xenfv'. For compatibility this is in fact made to be the
default. The code for setting up memory maps is moved out of vl.c, and
into hw/xen_machine_fv.c. This is basically to ensure that it can be
easily skipped when we add a paravirt machine type in the next patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Thu, 25 Oct 2007 08:43:42 +0000 (09:43 +0100)]
x86: GDTR must be reset after using real-mode BIOS services. Some
BIOSes clobber GDTR. While we're here reset IDTR too, although it's
not really necessary. Signed-off-by: John Byrne <john.l.byrne@hp.com> Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Thu, 25 Oct 2007 08:24:28 +0000 (09:24 +0100)]
xend, acm: Put the __UNLABELED__ label into the mapfile if policy specifies it
Put the __UNLABELED__ label into the mapfile if policy specifies this
label rather than keeping the NULL_LABEL there. Also lock the map file
when it's rewritten and propagate the return code from compiling the
policy to callers.
Keir Fraser [Thu, 25 Oct 2007 08:22:28 +0000 (09:22 +0100)]
xm-test: various fixes
- recently I added an other_config field to the VTPM record which now
needs to be accounted for otherwise the test determines a bad key
- the dry-run command was throwing a different type of exception
(ACMError) than what was caught (XSMError)
- the tests based on the raw Xen-API need to build the PV_args
parameters from the old 'root' and 'extra' parameters.
Keir Fraser [Wed, 24 Oct 2007 09:20:03 +0000 (10:20 +0100)]
x86, cpufreq: Allow dom0 kernel to govern cpufreq via the Intel
Enhanced SpeedStep MSR.
From: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Keir Fraser <keir@xensource.com>
Alex Williamson [Tue, 23 Oct 2007 16:21:31 +0000 (10:21 -0600)]
[IA64] Prevent softlock when destroying VTi domain
Prevent soft lockups during VTi domain destruction by making
relinquish_memory() continuable. It was assumed that
mm_teardown() frees most of page_list, so that the list which
is passed to relinquish_memory() is short. However, that
assumption doesn't hold for VTi domains because qemu-dm
maps all the domain pages. Making relinquish_memory()
continuable avoids the soft lockup message.
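A "continuable" operation of this kind can be sketched as bounded batches of work with a flag telling the caller to come back. The budget parameter, the boolean return value, and the driver loop are assumptions for illustration; the hypervisor's actual preemption check and return codes are not shown in the commit message.

```python
def relinquish_memory(page_list, budget):
    """Free at most `budget` pages; return True if work remains."""
    freed = 0
    while page_list and freed < budget:
        page_list.pop()   # stand-in for releasing one page
        freed += 1
    return bool(page_list)

def destroy_domain(page_list, budget=4):
    """Hypothetical caller: retry until the list is empty, counting calls."""
    calls = 0
    more = True
    while more:
        more = relinquish_memory(page_list, budget)
        calls += 1
    return calls
```

Each call does a bounded amount of work, so a long page list no longer keeps the CPU busy in one uninterrupted stretch.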
Keir Fraser [Tue, 23 Oct 2007 13:38:47 +0000 (14:38 +0100)]
hvm, vt-d: Add memory cache-attribute pinning domctl for HVM
guests. Use this to pin virtual framebuffer VRAM as attribute WB, even
if guest tries to map with other attributes. Signed-off-by: Disheng Su <disheng.su@intel.com>
Keir Fraser [Tue, 23 Oct 2007 08:26:43 +0000 (09:26 +0100)]
xenmon: Fix security vulnerability CVE-2007-3919.
The xenbaked daemon and xenmon utility communicate via a mmap'ed
shared file. Since this file is located in /tmp, unprivileged users
can cause arbitrary files to be truncated by creating a symlink from
the well-known /tmp filename to e.g., /etc/passwd.
The fix is to place the shared file in a directory to which only root
should have access (in this case /var/run/).
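The truncation through a symlink can be reproduced harmlessly inside a private temporary directory. This sketch uses made-up file names and stands in for neither xenbaked nor xenmon; it assumes a platform where unprivileged symlink creation works.

```python
import os
import tempfile

def demo_symlink_truncation():
    """Return the victim file's size after a write through a planted symlink."""
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "victim.txt")        # plays /etc/passwd
        well_known = os.path.join(d, "shared-file")   # plays the /tmp name
        with open(victim, "w") as f:
            f.write("important data")
        # Attacker's step: plant a symlink at the well-known name.
        os.symlink(victim, well_known)
        # Daemon's step: open the well-known name for writing, which
        # follows the link and truncates the victim.
        with open(well_known, "w"):
            pass
        return os.path.getsize(victim)
```

Keeping the shared file in a directory that only root can write to removes the attacker's step, since the symlink can no longer be planted.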
This bug was reported, and the fix suggested, by Steve Kemp
<skx@debian.org>. Thanks!
Keir Fraser [Mon, 22 Oct 2007 20:06:11 +0000 (21:06 +0100)]
x86: small boot-time changes:
* use memory 0x8c000-0x90000 to avoid trampling the area above
0x90000 -- some bootloaders may leave droppings in that region
* reserve 2kB for vga mode table -- limit of 128 VESA modes could
overflow the original 1kB allocation
* remove unnecessary alignment of trampoline GDT
Alex Williamson [Mon, 22 Oct 2007 18:26:53 +0000 (12:26 -0600)]
[IA64] Don't share privregs with hvm domain
Don't share privregs with HVM domains, and tweak the IA64 Xen dump-core
format slightly. Xen shares privregs pages with IA64 HVM domains so that
xm dump-core can dump the pages. However, sharing the pages allows an HVM
guest domain to peek at or destroy the page contents, which might crash Xen.
The Xen dump-core file doesn't need the privregs pages anyway, because for
an IA64 HVM domain the CPU context should be obtained from the vcpu context.
Although this patch modifies the Xen dump-core format, the current crash
utility (at least crash 4.0-4.7) doesn't look into the
.xen_ia64_mmapped_regs section, and I don't know of any other tools that
understand Xen dump-core files, so this format modification doesn't cause
incompatibility issues.
Alex Williamson [Mon, 22 Oct 2007 18:19:42 +0000 (12:19 -0600)]
[IA64] Kdump: 64-bit aligned access to elf-note data
xen_core_regs, as passed by kexec_crash_save_info(), is 32-bit aligned as
it is the data section of an ELF-note. In order to ensure 64-bit aligned
access when xen_core_regs is filled in, shift it a bit and then memmove()
the data back into the 32-bit aligned location after the values have been
written.
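The shift-then-move-back approach can be modelled with byte offsets. Python has no alignment faults, so this sketch only illustrates the offset arithmetic; the function name, the 8-byte alignment parameter, and the use of a bytearray as the note buffer are all assumptions.

```python
def fill_via_aligned_scratch(buf, offset, payload, align=8):
    """Write payload at an aligned offset, then move it back to `offset`."""
    # Round the 32-bit-aligned offset up to the next 64-bit boundary.
    aligned = (offset + align - 1) & ~(align - 1)
    end = aligned + len(payload)
    assert end <= len(buf), "buffer needs slack for the shifted copy"
    # Fill in the values at the aligned location.
    buf[aligned:end] = payload
    # memmove() equivalent: copy back to the original location.
    buf[offset:offset + len(payload)] = bytes(buf[aligned:end])
    return aligned
```

The buffer needs a few spare bytes beyond the data so that the shifted copy still fits.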
Without this change kdump panics on an unaligned-access.
Keir Fraser [Mon, 22 Oct 2007 13:22:39 +0000 (14:22 +0100)]
A few small fixes for xenstored:
- Proper sizeof parameter to snprintf
- Return proper xs_domain_dev for netbsd. Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Mon, 22 Oct 2007 12:04:32 +0000 (13:04 +0100)]
x86: Allow NMI callback CS to be specified via set_trap_table()
hypercall.
Based on a patch by Jan Beulich. Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Mon, 22 Oct 2007 06:44:25 +0000 (07:44 +0100)]
x86: Allow BOOT_TRAMPOLINE to be changed without needing manual
modification of the trampoline GDT. Adjust trampoline base to
0x94000. Signed-off-by: Keir Fraser <keir@xensource.com>
Alex Williamson [Sun, 21 Oct 2007 21:52:25 +0000 (15:52 -0600)]
[IA64] New features for xenitp
Add auto-repeat feature
(Just press enter to re-execute the last go/sstep/cb/disass command).
Do not flush stdout in the signal handler.
Single step over a breakpoint.
Can quit with domain paused (quit paused)
'disp db' now displays watchpoint.
Keir Fraser [Fri, 19 Oct 2007 17:00:10 +0000 (18:00 +0100)]
Replace sysctl.physinfo.sockets_per_node with more directly useful
sysctl.physinfo.nr_cpus. This also avoids miscalculation of
sockets_per_node by Xen where the number of CPUs in the system is
clipped.
From: Elizabeth Kon <eak@us.ibm.com> Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Fri, 19 Oct 2007 16:47:12 +0000 (17:47 +0100)]
Avoid passing uninitialised ACPI tables to dom0 when checksums fail.
If during boot, ACPI checksum failures disable ACPI support in Xen,
pass 'acpi=off' to the domain 0 kernel to avoid a fatal page fault
as domain 0 attempts to access the uninitialized ACPI tables.
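The decision described above can be sketched as follows. ACPI tables are valid when all their bytes sum to zero modulo 256; the function names and the way the command line is assembled are hypothetical and not taken from the Xen source.

```python
def acpi_checksum_ok(table):
    """An ACPI table is valid when its bytes sum to 0 modulo 256."""
    return sum(table) % 256 == 0

def dom0_cmdline(cmdline, tables):
    """Append 'acpi=off' when any table fails its checksum (sketch only)."""
    if all(acpi_checksum_ok(t) for t in tables):
        return cmdline
    # ACPI was disabled, so tell the domain 0 kernel not to touch
    # the uninitialized tables.
    return (cmdline + " acpi=off").strip()
```

A single failing table is enough to add the option in this model, matching the case where checksum failures disable ACPI support as a whole.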
Signed-off-by: David Lively <dlively@virtualiron.com> Signed-off-by: Steve Ofsthun <sofsthun@virtualiron.com>
Keir Fraser [Fri, 19 Oct 2007 13:49:08 +0000 (14:49 +0100)]
Fix x86/64 build for *BSD.
- Config.mk: uname -m prints "amd64". Deal with this.
- do not assume python is always in /usr/bin
- get-fields.sh: make it portable and non-bash specific Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 19 Oct 2007 10:32:18 +0000 (11:32 +0100)]
x86: Remove io_apic fake-vector style of IRQ acknowledgement. Not
needed now that pass-through IRQs can use the 'new' ack method. Signed-off-by: Keir Fraser <keir@xensource.com>