Keir Fraser [Wed, 7 Nov 2007 11:44:05 +0000 (11:44 +0000)]
x86: Change cache attributes of Xen 1:1 page mappings in response to
guest mapping requests.
Based on a patch by Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Wed, 7 Nov 2007 09:22:31 +0000 (09:22 +0000)]
Enable loopback disk image files on read-only NFS filesystems.
When we losetup a file on a read-only NFS filesystem, it fails with:
# losetup /dev/loop7 /data/vm/xen_el5_i386_para/system.raw
/data/vm/xen_el5_i386_para/system.raw: Permission denied
Newer versions of losetup have added a "-r" option for read-only loop
devices, which the Linux kernel has supported for a long time. Some
distributions (EL5 update, Fedora 8, etc.) already ship it. This patch
takes advantage of the option without breaking older versions of losetup.
Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Keir Fraser [Tue, 6 Nov 2007 11:49:15 +0000 (11:49 +0000)]
[PV-ON-HVM] Fix evtchn of unbind_from_irqhandler()
When the xm block-detach command was run on PV-ON-HVM, responses from
other disks were lost. This was because the wrong event channel was
invalidated on detach: the irq number, rather than the evtchn number,
was passed when invalidating it.
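A minimal sketch of the corrected unbind logic, assuming a simple irq-to-evtchn table. The table names and sizes are hypothetical and this is not the actual PV-on-HVM driver code; the point is only that the irq must be translated to its event channel before anything is invalidated.

```c
#include <stdbool.h>

/* Hypothetical sizes and tables, for illustration only. */
#define NR_IRQS_SKETCH   16
#define NR_EVTCHN_SKETCH 64

static int  irq_to_evtchn[NR_IRQS_SKETCH];   /* irq -> bound event channel */
static bool evtchn_valid[NR_EVTCHN_SKETCH];  /* per-channel "in use" flag */

/* Bind an event channel to an irq. */
static void bind_sketch(int irq, int evtchn)
{
    irq_to_evtchn[irq] = evtchn;
    evtchn_valid[evtchn] = true;
}

/* Unbind: look up the irq's event channel and invalidate that channel.
 * The bug amounted to indexing evtchn_valid[] with the irq number. */
static void unbind_sketch(int irq)
{
    int evtchn = irq_to_evtchn[irq];
    evtchn_valid[evtchn] = false;
    irq_to_evtchn[irq] = -1;
}
```

For example, with irq 3 bound to event channel 10 and irq 5 bound to event channel 3, unbind_sketch(3) invalidates channel 10 and leaves channel 3 alone; indexing by irq would have invalidated channel 3, which belongs to the other disk.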
Keir Fraser [Tue, 6 Nov 2007 09:43:22 +0000 (09:43 +0000)]
vt-d: Free memory of g2m_ioport_list.
This patch frees the memory of g2m_ioport_list when a g2m_ioport is
removed or an IOMMU domain is destroyed, avoiding a memory leak. In
addition, it does some cleanup in domctl.c.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Tue, 6 Nov 2007 09:41:57 +0000 (09:41 +0000)]
Users manual updates:
1) PAE as 32-bit Xen default
2) IA64 and Power are supported
3) AMD Virtualization is supported
4) Add console_timestamps boot param
Keir Fraser [Tue, 6 Nov 2007 09:40:44 +0000 (09:40 +0000)]
xenstored: Recover from corrupt tdb on reboot
Xen cannot work when xenstored's tdb is corrupt. When that happens
somehow (and we've seen it happen), even reboot doesn't recover from
it. It could: there is no state in tdb that needs to be persisted
across reboots.
This patch arranges for tdb to be removed before xenstored is started,
provided xenstored is not already running. This is safe, because:
* xenstored cannot be restarted. If it dies, Xen's screwed until
reboot.
* /usr/sbin/xend always starts xenstored anyway.
* xenstored locks its pid-file (see write_pidfile() in
tools/xenstore/xenstored_core.c), and refuses to start when it
can't.
* My patch makes /usr/sbin/xend remove tdb iff it can lock the
pid-file. In other words, it removes tdb only when xenstored is not
running, and locks it out until it is done.
Bonus fix: it also removes stale copies of the tdb xenstored tends
to leave behind when it exits uncleanly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Keir Fraser [Mon, 5 Nov 2007 16:38:47 +0000 (16:38 +0000)]
[SHADOW] Fix error paths in guest-pagetable walker.
Real hardware sets PFEC_page_present regardless of the access bits,
and doesn't write back _PAGE_ACCESSED except after a successful walk.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 5 Nov 2007 10:45:07 +0000 (10:45 +0000)]
cpufreq, amd: Xen support for architectural AMD pstate driver
With the third generation Opteron parts, AMD switched to an
architecturally defined interface for PowerNow! that uses
different MSRs than previous versions.
Add support in msr-index.h and traps.c for the new interface.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Keir Fraser [Fri, 2 Nov 2007 16:34:54 +0000 (16:34 +0000)]
hvm: Timer fixes:
1. Do not record more than one pending interrupt in
no-missed-tick-accounting mode. We do not stack up missed interrupts
in this timer mode.
2. Always record all missed ticks when we are in a
missed-tick-accounting mode. Do not have a ceiling for this as it
simply causes guests to lose track of wall time.
3. General bits of cleanup and simplification.
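Points 1 and 2 differ only in how missed ticks are recorded. A minimal sketch, with a hypothetical vtimer_sketch structure rather than Xen's periodic-timer code:

```c
#include <stdbool.h>

/* Hypothetical periodic-timer state, for illustration only. */
struct vtimer_sketch {
    bool missed_tick_accounting;  /* true: every missed tick is delivered */
    unsigned int pending;         /* ticks recorded but not yet injected */
};

/* Record `missed` expired periods at once, e.g. after the vCPU was descheduled. */
static void record_missed_ticks(struct vtimer_sketch *t, unsigned int missed)
{
    if (missed == 0)
        return;
    if (t->missed_tick_accounting)
        t->pending += missed;   /* point 2: no ceiling on the backlog */
    else
        t->pending = 1;         /* point 1: never stack up more than one */
}
```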
From: Dave Winchell <dwinchell@virtualiron.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Fri, 2 Nov 2007 16:06:06 +0000 (16:06 +0000)]
x86, svm: Add hunk I missed from Jan's debug-register handling
patch. We need to handle SVM debug-register read access intercepts.
Signed-off-by: Keir Fraser <keir@xensource.com>
Tim Deegan [Fri, 2 Nov 2007 15:41:57 +0000 (15:41 +0000)]
[SHADOW] Make the guest PT walker more complete.
We now check access rights and write back the _PAGE_ACCESSED and
_PAGE_DIRTY bits into the guest entries as we walk the tables.
This makes the shadow fault handler simpler, and the various emulation
paths more correct.
This patch doesn't add checking and write-back to the HAP pagetable walker;
it just fixes up its arguments to match the new shadow one.
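A simplified sketch of a walker with these properties, assuming 64-bit entries with the standard x86 P/RW/US/A/D bits and page-fault error-code bits, and ignoring CR0.WP, NX, reserved bits and superpages. The function is hypothetical, not the shadow code's interface. In line with the later [SHADOW] error-path fix, a protection fault on present entries sets the present bit in the error code, and accessed/dirty bits are written back only after a successful walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry bits. */
#define PTE_P  0x001u  /* present */
#define PTE_RW 0x002u  /* writable */
#define PTE_US 0x004u  /* user-accessible */
#define PTE_A  0x020u  /* accessed */
#define PTE_D  0x040u  /* dirty (leaf only) */

/* x86 page-fault error-code bits. */
#define PFEC_PRESENT 0x1u
#define PFEC_WRITE   0x2u
#define PFEC_USER    0x4u

/* Walk `levels` entries, top level first. On failure returns false with
 * the error code in *pfec; on success sets A (and D on the leaf for writes). */
static bool walk_sketch(uint64_t *pte[], int levels, bool is_write,
                        bool is_user, uint32_t *pfec)
{
    bool rights_ok = true;

    *pfec = (is_write ? PFEC_WRITE : 0) | (is_user ? PFEC_USER : 0);
    for (int i = 0; i < levels; i++) {
        uint64_t e = *pte[i];
        if (!(e & PTE_P))
            return false;              /* not present: PFEC_PRESENT stays clear */
        if ((is_write && !(e & PTE_RW)) || (is_user && !(e & PTE_US)))
            rights_ok = false;         /* keep walking: a later non-present entry wins */
    }
    if (!rights_ok) {
        *pfec |= PFEC_PRESENT;         /* protection violation on present pages */
        return false;                  /* nothing written back */
    }
    for (int i = 0; i < levels; i++)   /* success: only now write back A/D */
        *pte[i] |= PTE_A;
    if (is_write)
        *pte[levels - 1] |= PTE_D;
    *pfec = 0;
    return true;
}
```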
Signed-off-by: Tim Deegan <Tim.Deegan@eu.citrix.com>
Keir Fraser [Fri, 2 Nov 2007 09:30:51 +0000 (09:30 +0000)]
x86, svm: Add a nested paging performance counter to the svmexit
PERFCOUNTER_ARRAY while keeping the array compact.
Signed-off-by: Stephen Wilson <stephen.wilson@amd.com>
Keir Fraser [Thu, 1 Nov 2007 16:34:43 +0000 (16:34 +0000)]
Fix use-after-free in xenconsoled.
shutdown_domain() MUST NOT call cleanup_domain(); just flagging the
domain as dead is enough. cleanup_domains() for dead domains is
already called safely by the main loop in handle_io().
Having shutdown_domain() also call cleanup_domain() leads to struct
domain being accessed after it has been freed, and to a double-free.
Fixed by simply dropping the cleanup_domain() call and by making the
functions called by the main loop in handle_io() ignore dead domains.
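The ownership rule (shutdown only flags, the main loop frees, handlers skip dead domains) can be sketched as follows. The list and function names are hypothetical and much simpler than xenconsoled's own structures.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified domain record. */
struct domain_sketch {
    int domid;
    bool is_dead;
    struct domain_sketch *next;
};

static struct domain_sketch *dom_head;

/* Shutdown only flags the domain; callers up the stack may still hold
 * a pointer to it, so it must not be freed here. */
static void shutdown_domain_sketch(struct domain_sketch *d)
{
    d->is_dead = true;
}

/* Run once per main-loop iteration, where no other code holds a domain
 * pointer: unlink and free every dead domain exactly once. */
static void cleanup_dead_domains_sketch(void)
{
    struct domain_sketch **pp = &dom_head;

    while (*pp != NULL) {
        struct domain_sketch *d = *pp;
        if (d->is_dead) {
            *pp = d->next;
            free(d);
        } else {
            pp = &d->next;
        }
    }
}

/* Example main-loop handler: it ignores domains already flagged dead. */
static int count_live_domains_sketch(void)
{
    int n = 0;
    for (struct domain_sketch *d = dom_head; d != NULL; d = d->next)
        if (!d->is_dead)
            n++;
    return n;
}
```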
Keir Fraser [Thu, 1 Nov 2007 16:34:20 +0000 (16:34 +0000)]
ioemu: ioemu portion of buffered-io fix.
Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
Signed-off-by: Ben Guthro <bguthro@virtualiron.com>
Keir Fraser [Thu, 1 Nov 2007 16:16:25 +0000 (16:16 +0000)]
x86: Fix various problems with debug-register handling.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
It is possible to double-free the sal queue entry when multiple
ia64_sal_get_state_info() calls from Dom0 run simultaneously.
In the worst case, the kernel might panic.
Keir Fraser [Wed, 31 Oct 2007 09:36:45 +0000 (09:36 +0000)]
x86, hvm: Allow HAP to be enabled on 32-bit Xen (but still disabled by default).
HAP remains disabled on 32-bit Xen by default because AMD NPT
restrictions mean that, on a 32-bit host, guests are limited to a 4GB
pseudophysical memory map.
Alex Williamson [Tue, 30 Oct 2007 17:33:55 +0000 (11:33 -0600)]
[IA64] Make Xen relocatable
1. Put the xenheap at 0xf400000004000000, so that it doesn't
overlap the identity mapping.
2. Xen itself can be relocated by the OS loader if the platform has
no low memory.
3. Use another DTR for mapping the xenheap.
Keir Fraser [Tue, 30 Oct 2007 16:15:17 +0000 (16:15 +0000)]
x86, hvm: Flush local TLB after any change to linear pagetable mapping.
This was not needed when vmenter/vmexit always had the side effect of
flushing host TLBs.
But, with SVM ASIDs, it is possible to:
(1) update CR3,
(2) vmenter the guest, and
(3) vmexit due to a page fault,
all without an intervening host TLB flush.
Then the page fault code could use the linear pagetable
to read a top-level shadow page table entry.
But, without this change, it would fetch the wrong value
due to a stale TLB.
Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
Signed-off-by: Ben Guthro <bguthro@virtualiron.com>
Keir Fraser [Tue, 30 Oct 2007 10:39:52 +0000 (10:39 +0000)]
vt-d: Do dpci eoi outside of irq_lock.
Deadlock may occur if hvm_dpci_eoi() is called inside irq_lock on an
MP platform. For example, suppose there are two physical CPUs. If an
interrupt is injected on cpu0, but the vcpu has migrated to cpu1 and
does the EOI inside irq_lock, then an IPI will be issued to cpu0. At
the same time, cpu0 may have interrupts disabled while trying to
acquire the same irq_lock. In addition, the current code cannot
guarantee that hvm_dpci_eoi() is called inside irq_lock on the timeout
path. This patch calls hvm_dpci_eoi() outside irq_lock, which solves
both problems.
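A sketch of the pattern: read and clear the pending state under the lock, release it, and only then perform the EOI. A hypothetical lock flag stands in for irq_lock, and the stand-in EOI function records whether it was called with the lock held; none of these names are Xen's.

```c
#include <stdbool.h>

/* Hypothetical per-interrupt state guarded by a lock. */
struct irq_state_sketch {
    bool lock_held;             /* stand-in for irq_lock */
    int pending_eoi_irq;        /* machine irq awaiting EOI, or -1 */
    int eoi_calls;
    int eoi_calls_under_lock;   /* must stay 0 */
};

static void lock_sketch(struct irq_state_sketch *s)   { s->lock_held = true; }
static void unlock_sketch(struct irq_state_sketch *s) { s->lock_held = false; }

/* Stand-in for the EOI path, which may IPI another CPU and therefore
 * must not run with the lock held. */
static void do_eoi_sketch(struct irq_state_sketch *s, int irq)
{
    (void)irq;
    s->eoi_calls++;
    if (s->lock_held)
        s->eoi_calls_under_lock++;
}

/* Decide what to EOI while holding the lock, drop the lock, then EOI. */
static void dpci_eoi_sketch(struct irq_state_sketch *s)
{
    lock_sketch(s);
    int irq = s->pending_eoi_irq;
    s->pending_eoi_irq = -1;
    unlock_sketch(s);

    if (irq >= 0)
        do_eoi_sketch(s, irq);
}
```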
Keir Fraser [Tue, 30 Oct 2007 09:32:10 +0000 (09:32 +0000)]
qemu vnc auth 4/4: XenD config for VNC TLS protocol
This patch adds support to XenD for configuring the previously added
TLS encryption and x509 certificate validation. At this time I have
only enabled this config to be done system-wide via
/etc/xen/xend-config.sxp. Since it requires the admin to add
certificates on the local FS, there's not much point in making it per
VM. The x509 certificates are located in /etc/xen/vnc. Since this
requires a special VNC client program (GTK-VNC,
virt-viewer/virt-manager or VeNCrypt viewer) the use of TLS is
disabled by default. Admins can enable it if they are using a suitable
client.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Tue, 30 Oct 2007 09:30:49 +0000 (09:30 +0000)]
qemu vnc auth 3/4: Add VNC auth support from upstream QEMU
This patch adds in the upstream QEMU VNC authentication code. This
supports the previous VNC password auth scheme, as well as the VeNCrypt
protocol extension. The latter allows for performing a TLS handshake,
and client verification of the server identity using x509
certificates. It is also possible for the server to request a client
certificate and validate that as a simple auth scheme. The code
depends on GnuTLS for its SSL/TLS APIs, and the configure script will
auto-detect it.
The image.py code is changed to deal with the new syntax for the -vnc
option. In particular, password authentication is now enabled by
giving the 'password' flag,
e.g. -vnc 0.0.0:1,password
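To make the new argument shape concrete (display first, comma-separated flags after), here is a hypothetical parser sketch. It is not QEMU's parser and only recognises the 'password' flag mentioned above; any other flag is ignored.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of parsing a "-vnc DISPLAY[,flag...]" argument. */
struct vnc_opts_sketch {
    char display[64];
    bool password;
};

/* Returns false if the display field does not fit in the buffer. */
static bool parse_vnc_arg_sketch(const char *arg, struct vnc_opts_sketch *out)
{
    const char *comma = strchr(arg, ',');
    size_t len = comma ? (size_t)(comma - arg) : strlen(arg);

    if (len >= sizeof out->display)
        return false;
    memcpy(out->display, arg, len);
    out->display[len] = '\0';
    out->password = false;

    /* Walk the remaining comma-separated flags. */
    while (comma != NULL) {
        const char *flag = comma + 1;
        comma = strchr(flag, ',');
        size_t flen = comma ? (size_t)(comma - flag) : strlen(flag);
        if (flen == strlen("password") && strncmp(flag, "password", flen) == 0)
            out->password = true;
    }
    return true;
}
```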
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Tue, 30 Oct 2007 09:24:17 +0000 (09:24 +0000)]
qemu vnc auth 2/4: Revert current VNC auth support
This patch reverts the current Xen specific implementation of VNC
authentication from the QEMU code. This is basically reverting
11840:02506a744315. The idea here is to get the VNC code back to more
closely match upstream QEMU, before applying the upstream auth
patches.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Tue, 30 Oct 2007 09:22:27 +0000 (09:22 +0000)]
qemu vnc auth 1/4: QEMU event handler bug fix
This patch pulls in an upstream QEMU fix for dealing with a problem in
the event dispatcher where a write callback gets unregistered while a
write event is pending from poll. Without this, the QEMU process will
dereference a NULL pointer and crash.
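A sketch of the defensive dispatch pattern, assuming a simple handler table (the types and names are hypothetical, not QEMU's): each callback pointer is re-read immediately before it is called, since an earlier callback in the same pass may have cleared it.

```c
#include <poll.h>
#include <stddef.h>

typedef void io_cb_sketch(void *opaque);

/* Hypothetical fd-handler record. */
struct io_handler_sketch {
    int fd;
    io_cb_sketch *fd_read;
    io_cb_sketch *fd_write;   /* may be cleared by a callback mid-dispatch */
    void *opaque;
};

/* Dispatch after poll(): revents[i] belongs to h[i]. */
static void dispatch_sketch(struct io_handler_sketch *h, const short *revents, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((revents[i] & POLLIN) && h[i].fd_read != NULL)
            h[i].fd_read(h[i].opaque);
        /* Re-check: the read callback above may have unregistered fd_write. */
        if ((revents[i] & POLLOUT) && h[i].fd_write != NULL)
            h[i].fd_write(h[i].opaque);
    }
}

/* Example callbacks: a write handler, and a read handler that tears
 * down its own write handler (e.g. when the peer has gone away). */
static int writes_done_sketch;
static void count_write_sketch(void *opaque) { (void)opaque; writes_done_sketch++; }
static void read_eof_sketch(void *opaque)
{
    struct io_handler_sketch *self = opaque;
    self->fd_write = NULL;
}
```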
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Tue, 30 Oct 2007 09:19:43 +0000 (09:19 +0000)]
xend: Reduce xenstore transactions when listing domains
In summary, this allows a xenstore transaction object to be passed
around the various device controllers, so that they don't have to do
lots of singleton transactions. Transactions have very heavy I/O
impact from xenstored so reducing their number is important.
When running 3 guests, this patch reduces the impact of 'xm list
--long' from 176 transactions, scaling O(n) with guests, to 26
transactions with O(1) scaling.
I have previously attempted to also address the same issue with 'xm
create' but that's much harder since the device front/back handshake
requires that XenD use a number of small transactions. So I've not
changed anything here.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Mon, 29 Oct 2007 16:49:02 +0000 (16:49 +0000)]
x86: allow pv guests to disable TSC for applications
Linux, under CONFIG_SECCOMP, has been capable of hiding the TSC from
processes for quite a while. This patch enables this to actually work
for pv kernels, by allowing them to control CR4.TSD (and, as a simple
thing to do at the same time, CR4.DE).
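A sketch of how a hypervisor might merge a PV guest's CR4 request so that only the guest-controllable bits take effect. TSD is bit 2 and DE is bit 3 of CR4; the mask and function name are assumptions for illustration, not Xen's code.

```c
#include <stdint.h>

#define X86_CR4_TSD 0x00000004u  /* bit 2: RDTSC restricted to CPL 0 */
#define X86_CR4_DE  0x00000008u  /* bit 3: debugging extensions */

/* CR4 bits a PV guest may control, per the text: TSD and DE. */
#define PV_GUEST_CR4_MASK (X86_CR4_TSD | X86_CR4_DE)

/* Keep every bit the hypervisor owns, and take only the guest-controllable
 * bits from the guest's requested value. */
static uint32_t pv_merge_cr4_sketch(uint32_t host_cr4, uint32_t guest_cr4)
{
    return (host_cr4 & ~PV_GUEST_CR4_MASK) | (guest_cr4 & PV_GUEST_CR4_MASK);
}
```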
Applies cleanly only on top of the previously submitted debug register
handling patch.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Also clean up CR4 and EFER handling, and hack-n-slash header file
inclusion madness to get the tree building again.
Keir Fraser [Mon, 29 Oct 2007 14:43:19 +0000 (14:43 +0000)]
xenconsoled: Rate-limit activity caused by each domU.
Allow each domU to fire its event channel at most 30 times every 200ms.
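One simple way to enforce "at most 30 events per 200ms" is a fixed-window counter per domU. This is a sketch with hypothetical names; xenconsoled's actual mechanism may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define RATE_LIMIT_EVENTS 30   /* at most 30 events ... */
#define RATE_PERIOD_MS    200  /* ... per 200 ms window */

/* Hypothetical per-domU counters. */
struct rate_sketch {
    uint64_t window_start_ms;
    unsigned int events;
};

/* Returns true if an event arriving at now_ms may be handled, false if
 * this domU has used up its budget for the current window. */
static bool rate_allow_sketch(struct rate_sketch *r, uint64_t now_ms)
{
    if (now_ms - r->window_start_ms >= RATE_PERIOD_MS) {
        r->window_start_ms = now_ms;   /* start a new window */
        r->events = 0;
    }
    if (r->events >= RATE_LIMIT_EVENTS)
        return false;
    r->events++;
    return true;
}
```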
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Mon, 29 Oct 2007 09:49:39 +0000 (09:49 +0000)]
x86: Clean up NMI delivery logic. Allow set_trap_table vector 2 to be
specified as not disabling event delivery, just like any other vector.
Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Mon, 29 Oct 2007 09:17:38 +0000 (09:17 +0000)]
x86, hvm: Clean up code style in stdvga code and do not compile for
32-bit hypervisor (it doesn't work).
Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Mon, 29 Oct 2007 08:46:34 +0000 (08:46 +0000)]
xend, acm: Trigger a script when a resource's label changes
This patch triggers a script when a resource's label changes. The xend
config file should provide a variable 'resource-label-change-script'
naming the script to be launched.
Keir Fraser [Fri, 26 Oct 2007 16:16:14 +0000 (17:16 +0100)]
Fix PVFB device initialization
The final series of patches I sent out lost 2 hunks in the big
refactoring patches I did thanks to a messed up rebase/rediff :-( This
patch fixes the device nodename initialization so that watches work
correctly.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keir Fraser [Fri, 26 Oct 2007 15:06:49 +0000 (16:06 +0100)]
x86: Replace the FLUSH_LEVEL() parameter to flush_area() with the
clearer FLUSH_ORDER(). Also remove a bogus assertion.
Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Fri, 26 Oct 2007 14:14:38 +0000 (15:14 +0100)]
serial: Check index argument before indexing into an array.
Pointed out by Christoph Egger, and worth fixing for clarity even if
it's not necessarily a bug.
Signed-off-by: Keir Fraser <keir@xensource.com>
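A sketch of the check-before-index pattern with a hypothetical port table. In C, even forming a pointer further out of range than one past the end of an array is undefined behaviour, so the range check has to come before the indexing.

```c
#include <stddef.h>

#define MAX_SERHND_SKETCH 2   /* hypothetical number of serial ports */

struct serial_port_sketch { int irq; };

static struct serial_port_sketch com_sketch[MAX_SERHND_SKETCH];

/* Validate the index first; return NULL for anything out of range. */
static struct serial_port_sketch *serial_lookup_sketch(int idx)
{
    if (idx < 0 || idx >= MAX_SERHND_SKETCH)
        return NULL;
    return &com_sketch[idx];
}
```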
Keir Fraser [Fri, 26 Oct 2007 10:40:10 +0000 (11:40 +0100)]
x86/64: paravirt 32-on-64 call gate support
As we realized while trying out NetWare's ring 3 support, call gates
didn't work for 32-bit guests on a 64-bit hypervisor. Since x86-64
doesn't support 16- or 32-bit call gates, the only option was to emulate
them. The code here was developed against 3.0.4, so it hasn't been
checked for possible integration with the much improved emulator;
nevertheless, I want to submit this patch.
As was realized in the course of creating this patch, 64-bit gates
don't work either, and will also need to be emulated if any
environment intends to use them. The patch changes behavior here in
that, rather than silently permitting the use of 64-bit gates (with
possibly hard-to-understand exceptions on the first instruction
of the call/jump target), the call/jump itself will now fault, with the
error code identifying the gate the guest attempted to use. I intend
to complete the emulation to also cover 64-bit gates, but there is one
issue that first needs to be addressed: whether a gate transitions
from user to kernel mode doesn't depend on the gate, but rather on the
descriptor referenced by the selector held in the gate. As the two can
change independently, this decision can be made only at the point of
use of the gate, and consequently descriptors for kernel code segments
must become distinguishable from user ones, which they currently
aren't, as both get their DPL forced to 3. An initial thought here
is to leverage the otherwise meaningless conforming bit
(i.e. forcing it on for all user code segments and off for kernel
ones, so that the distinction can be made at the point where the
descriptor gets verified/fixed up based on the kernel-supplied DPL
[this wouldn't work for old guests, from when setting the DPL to 3
was still required to be done by the guest]).
The patch also changes the behavior of check_descriptor(): the
descriptor is no longer modified unless all verification steps pass,
and the RPL of selectors in call gates no longer gets fixed up (a
comment elsewhere in the code correctly states that the processor
doesn't use the RPL field here for anything). In fact, on 64-bit this
field is now used to store the original DPL of the gate, because the
architectural one now gets forced to zero.
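A sketch of the verify-then-commit behaviour for check_descriptor(): every fix-up is applied to a local copy, which is written back only once all checks pass. The bit positions follow the standard x86 segment-descriptor layout; the single rejection rule is a hypothetical placeholder, not Xen's actual policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits in the high dword of an x86 segment descriptor. */
#define DESC_P        (1u << 15)   /* present */
#define DESC_DPL_MASK (3u << 13)   /* descriptor privilege level */
#define DESC_S        (1u << 12)   /* 1 = code/data, 0 = system (gates, TSS, LDT) */
#define DESC_CODE     (1u << 11)   /* code segment, when S = 1 */

static bool check_descriptor_sketch(uint32_t d[2])
{
    uint32_t lo = d[0], hi = d[1];   /* all work happens on this copy */

    if (!(hi & DESC_P))
        return true;                 /* not present: accepted unchanged */

    hi |= DESC_DPL_MASK;             /* fix-up: force DPL to 3, on the copy only */

    if (!(hi & DESC_S))
        return false;                /* hypothetical rule: reject system descriptors;
                                        the caller's descriptor is untouched */

    d[0] = lo;                       /* every check passed: commit */
    d[1] = hi;
    return true;
}
```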
Keir Fraser [Fri, 26 Oct 2007 09:55:50 +0000 (10:55 +0100)]
LAPIC timer accounting fix
Offset the emulated local APIC timer so that it doesn't tick during the
guest's timer-related processing. Otherwise, guests using the local APIC for
process accounting can see long sequences of process ticks incorrectly
charged to interrupt processing.
Signed-off-by: Ben Guthro <bguthro@virtualron.com>
Signed-off-by: Gary Grebus <ggrebus@virtualiron.com>
Keir Fraser [Fri, 26 Oct 2007 09:32:20 +0000 (10:32 +0100)]
x86, hvm: Improve standard VGA performance
This patch improves the performance of Standard VGA,
the mode used during Windows boot and by the Linux
splash screen.
It does so by buffering all the stdvga programmed output ops
and memory mapped ops (both reads and writes) that are sent to QEMU.
We maintain essential VGA state locally so we can respond
immediately to input and read ops without waiting for
QEMU. We snoop output and write ops to keep our state
up-to-date.
PIO input ops are satisfied from cached state without
bothering QEMU.
PIO output and mmio ops are passed through to QEMU, including
mmio read ops. This is necessary because mmio reads
can have side effects.
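A minimal sketch of the snoop-and-cache idea for one register pair, the VGA sequencer (index port 0x3C4, data port 0x3C5). The structure and function names are hypothetical, forwarding to QEMU is modelled as a counter, and indices are masked to 3 bits for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

#define VGA_SEQ_INDEX 0x3C4   /* sequencer index port */
#define VGA_SEQ_DATA  0x3C5   /* sequencer data port */

struct stdvga_sketch {
    uint8_t seq_index;
    uint8_t seq_regs[8];
    unsigned int forwarded;   /* ops that would be buffered for QEMU */
};

/* PIO output: snoop into the local copy, then pass it through to QEMU. */
static void stdvga_outb_sketch(struct stdvga_sketch *s, uint16_t port, uint8_t val)
{
    if (port == VGA_SEQ_INDEX)
        s->seq_index = val & 0x7;
    else if (port == VGA_SEQ_DATA)
        s->seq_regs[s->seq_index] = val;
    s->forwarded++;
}

/* PIO input: answered from cached state without involving QEMU.
 * Returns false for ports this sketch does not cache. */
static bool stdvga_inb_sketch(const struct stdvga_sketch *s, uint16_t port, uint8_t *val)
{
    if (port == VGA_SEQ_INDEX)
        *val = s->seq_index;
    else if (port == VGA_SEQ_DATA)
        *val = s->seq_regs[s->seq_index];
    else
        return false;
    return true;
}
```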
I have changed the format of the buffered_iopage.
It used to contain 80 elements of type ioreq_t (48 bytes each).
Now it contains 672 elements of type buf_ioreq_t (6 bytes each).
Being able to pipeline 8 times as many ops improves
VGA performance by a factor of 8.
I changed hvm_buffered_io_intercept to use the same
registration and callback mechanism as hvm_portio_intercept
rather than the hacky hardcoding it used before.
In platform.c, I fixed send_timeoffset_req() to set its
ioreq size to 8 (rather than 4), and its count to 1 (which
was missing).
Signed-off-by: Ben Guthro <bguthro@virtualron.com>
Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
Keir Fraser [Fri, 26 Oct 2007 08:56:54 +0000 (09:56 +0100)]
hvm, x86: Allow virtual timer mode to be specified.
In HVM config file:
timer_mode=0 # Default: virtual time is delayed when timer ticks are
# missed due to preemption
timer_mode=1 # Virtual time always equals wall time, even while missed
# ticks are pending
Keir Fraser [Thu, 25 Oct 2007 14:54:19 +0000 (15:54 +0100)]
pv-on-hvm: fixes for unmodified drivers build and modern Linux
- The adjustments to README and overrides.mk are generic.
- The removal of explicit linux/config.h inclusion should also not
cause any issues.
- The introduction of irq_handler_t should eliminate warnings on
2.6.19+ kernels (I didn't check that they're there, but since the
request_irq prototype changed, I'm sure there's at least
one). However, as a result, changes to the Linux tree are expected to
be required.
- The change setup_xen_features -> xen_setup_features follows the
naming in mainline 2.6.23, but would apparently also require changes
to the Linux tree.
- The changes SA_* -> IRQF_* and pci_module_init ->
pci_register_driver should also not cause issues.
Keir Fraser [Thu, 25 Oct 2007 14:04:33 +0000 (15:04 +0100)]
qemu: Add extra tracing around logdirty bitmap setup.
Signed-off-by: Ben Guthro <bguthro@virtualron.com>
Signed-off-by: Gary Grebus <ggrebus@virtualiron.com>
Keir Fraser [Thu, 25 Oct 2007 14:01:59 +0000 (15:01 +0100)]
hvm: In xenstore_process_logdirty_event(), if a stale shared memory
key is encountered reset 'seg' to NULL so the shared memory
initialization can be retried later.
Signed-off-by: Ben Guthro <bguthro@virtualron.com>
Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
Signed-off-by: Keir Fraser <keir@xensource.com>
Keir Fraser [Thu, 25 Oct 2007 13:55:37 +0000 (14:55 +0100)]
hvm,x86: Add more vmxassist opcodes for Ubuntu 7.04 support
Signed-off-by: Ben Guthro <bguthro@virtualron.com>
Signed-off-by: Gary Grebus <ggrebus@virtualiron.com>
Keir Fraser [Thu, 25 Oct 2007 13:45:47 +0000 (14:45 +0100)]
pv-qemu 10/10: Make xenconsoled ignore doms with qemu-dm
This patch writes a field /local/vm/DOMID/console/type taking the
value 'ioemu' or 'xenconsoled'. If xenconsoled sees a type that is
not its own, then it skips handling of that guest. The qemu-dm
process doesn't need to read this field since it will only attach
to the console if given the -serial pty arg which XenD already
ensures matches this xenstore field.
The overall behaviour is that if a paravirt guest has a qemu-dm
process running then that handles the console, otherwise the
xenconsoled handles it. The former is more functional, with the
exception of not currently supporting persistent logging to a
file at the same time as exposing a PTY.
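On the xenconsoled side, the check reduces to a string comparison. A sketch with the xenstore read left out; the function name is hypothetical, and treating an absent console/type node as "xenconsoled" is an assumption made here so that guests without the field are still handled.

```c
#include <stdbool.h>
#include <string.h>

/* `type` is the value of the domain's console/type node, or NULL if the
 * node is absent. Returns true if xenconsoled should handle the console. */
static bool xenconsoled_should_handle_sketch(const char *type)
{
    if (type == NULL)
        return true;                          /* assumption: no field, handle it */
    return strcmp(type, "xenconsoled") == 0;  /* 'ioemu' means qemu-dm owns it */
}
```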
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>