Keir Fraser [Wed, 12 May 2010 07:51:26 +0000 (08:51 +0100)]
vmx: Change the default Pause-Loop-Exiting "Gap" parameter
PLE_Gap controls the maximum allowable time between executions of
PAUSE in a busy loop. Essentially this controls the sensitivity of the
processor's busy-loop detection.
Change the default PLE_Gap to 128 in order to:
1. avoid using an odd number like 41
2. get slightly more PLE vmexits, which improves performance
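The effect of the gap can be illustrated with a small model of the detection rule: a PAUSE that follows the previous one by more than PLE_Gap is treated as the start of a new loop; otherwise, once the current loop has lasted longer than PLE_Window, a VM exit is raised. This is only a sketch of that rule, not Xen or hardware code; first_ple_exit() and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of Pause-Loop Exiting.
 * pause_tsc[] holds the timestamps (in increasing order) at which the
 * guest executed PAUSE. Returns the index of the PAUSE that would
 * trigger a VM exit, or -1 if none does.
 */
int first_ple_exit(const uint64_t *pause_tsc, size_t n,
                   uint64_t ple_gap, uint64_t ple_window)
{
    uint64_t loop_start = 0, prev = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t now = pause_tsc[i];

        if (i == 0 || now - prev > ple_gap) {
            /* Too long since the last PAUSE: this starts a new loop. */
            loop_start = now;
        } else if (now - loop_start > ple_window) {
            /* Same loop, and it has been spinning for too long. */
            return (int)i;
        }
        prev = now;
    }
    return -1;
}
```

With PAUSEs spaced 100 ticks apart, a gap of 41 never recognises a loop, while a gap of 128 does, which matches the intent of getting more PLE vmexits.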
Keir Fraser [Wed, 12 May 2010 07:49:13 +0000 (08:49 +0100)]
xs: avoid pthread_join deadlock in xs_daemon_close
Doing a pthread_cancel and join on the reader thread while holding all
the request/reply/watch mutexes can deadlock if the thread needs to
take any of those mutexes to exit. Kill off the reader thread before
taking any mutexes (which should be redundant if we're
single-threaded at that point).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
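The ordering described above can be sketched as follows. This is a minimal illustration, not the libxenstore code: reader_loop(), demo_start(), demo_close() and request_mutex are hypothetical names, and a real reader would block on a socket rather than spin.

```c
#include <pthread.h>

static pthread_mutex_t request_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_t reader;
static unsigned long reads_done;

/* Hypothetical reader thread: it needs request_mutex on every iteration. */
static void *reader_loop(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&request_mutex);
        reads_done++;
        pthread_mutex_unlock(&request_mutex);
        /* Explicit cancellation point, reached with no mutex held. */
        pthread_testcancel();
    }
    return NULL;
}

int demo_start(void)
{
    return pthread_create(&reader, NULL, reader_loop, NULL);
}

int demo_close(void)
{
    int rc;

    /* Stop the reader first, while this thread holds no mutex. */
    rc = pthread_cancel(reader);
    if (rc != 0)
        return rc;
    rc = pthread_join(reader, NULL);
    if (rc != 0)
        return rc;

    /* Only now take the mutex; the reader can no longer contend for it. */
    pthread_mutex_lock(&request_mutex);
    /* ... tear down pending requests here ... */
    pthread_mutex_unlock(&request_mutex);
    return 0;
}
```

If demo_close() took request_mutex before calling pthread_join(), the reader could block in pthread_mutex_lock(), which is not a cancellation point, and the join would never return.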
Keir Fraser [Tue, 11 May 2010 07:35:45 +0000 (08:35 +0100)]
VT-d: prevent watchdog timer from kicking in when initializing on systems with huge amounts of memory
Process pending soft-IRQs every 4G worth of pages initialized for Dom0
to keep timekeeping happy and prevent the NMI watchdog (when enabled)
from kicking in.
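A minimal sketch of the "every 4G worth of pages" pattern, assuming 4 KiB pages. The loop body and the stub standing in for softirq processing are hypothetical; only the batching arithmetic is the point.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
/* Number of 4 KiB pages in 4 GiB of memory. */
#define SKETCH_PAGES_PER_4G (1ULL << (32 - SKETCH_PAGE_SHIFT))

unsigned long softirq_runs;   /* counts how often the stub was called */

/* Hypothetical stand-in for processing pending soft-IRQs. */
static void process_pending_softirqs_stub(void)
{
    softirq_runs++;
}

void init_dom0_pages_sketch(uint64_t nr_pages)
{
    for (uint64_t pfn = 0; pfn < nr_pages; pfn++) {
        /* Per-page initialisation work would go here. */

        /* After each full 4 GiB worth of pages, let soft-IRQs run. */
        if (((pfn + 1) & (SKETCH_PAGES_PER_4G - 1)) == 0)
            process_pending_softirqs_stub();
    }
}
```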
Keir Fraser [Tue, 11 May 2010 07:34:58 +0000 (08:34 +0100)]
x86: TSC handling cleanups
"I am removing the tsc_scaled variable that is never actually used
because when tsc needs to be scaled vtsc is 1. I am also making this
more explicit in tsc_set_info. I am also removing hvm_domain.gtsc_khz
that is a duplicate of d->arch.tsc_khz. I am using scale_delta(delta,
&d->arch.ns_to_vtsc) to scale the tsc value before returning it to the
guest like in the pv case. I added a feature flag to specify that the
pvclock algorithm is safe to be used in an HVM guest so that the guest
can now use it without hanging."
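The scaling step mentioned above is a fixed-point multiplication. The sketch below assumes a scale made of a binary shift and a 32-bit fractional multiplier; the struct and function names are hypothetical, and the real scale_delta() may differ in detail. It relies on the unsigned __int128 extension available in GCC and Clang on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical fixed-point scale: result = (delta << shift) * mul_frac / 2^32 */
struct time_scale_sketch {
    int shift;
    uint32_t mul_frac;
};

uint64_t scale_delta_sketch(uint64_t delta, const struct time_scale_sketch *s)
{
    /* Apply the power-of-two part of the scale first. */
    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;

    /* 64x32-bit multiply, keeping the bits above the 32-bit fraction. */
    return (uint64_t)(((unsigned __int128)delta * s->mul_frac) >> 32);
}
```

For example, a shift of 0 with mul_frac = 0x80000000 halves the input.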
Keir Fraser [Mon, 10 May 2010 08:22:52 +0000 (09:22 +0100)]
xentrace: fix bug in t_info size
t_info size should be in bytes, not pages. This fixes a bug
that crashes the hypervisor if the total number of all pages
is more than 1024 but less than 2048.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
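The difference between the two units can be sketched as below. This assumes, purely for illustration, that t_info holds one 32-bit entry per trace page, that pages are 4 KiB, and that any header fields can be ignored; none of this is stated in the changeset, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Size of the table in bytes, given the total number of trace pages. */
size_t t_info_bytes_sketch(size_t total_trace_pages)
{
    return total_trace_pages * sizeof(uint32_t);
}

/* The same size rounded up to whole pages. */
size_t t_info_pages_sketch(size_t total_trace_pages)
{
    return (t_info_bytes_sketch(total_trace_pages) + SKETCH_PAGE_SIZE - 1)
           / SKETCH_PAGE_SIZE;
}
```

Under these assumptions, 1024 entries fit in one page and the 1025th needs a second one, so a value in pages passed where bytes are expected understates the size by a factor of 4096.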
Keir Fraser [Fri, 7 May 2010 18:22:28 +0000 (19:22 +0100)]
svm: Avoid VINTR injection during NMI shadow
It is invalid because we get vmexit via IRET interception in this
case. VINTR is unaware of NMI shadows and may vmexit early, leaving us
in an endless loop of VINTR injections and interceptions.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 7 May 2010 08:50:17 +0000 (09:50 +0100)]
xenpm: remove wrong and pointless "current" indicator
Using the CPU number to compare with an index into an array containing
only a subset of CPUs isn't valid. An indicator isn't necessary here
at all since the CPU number being dealt with gets printed right before
this line.
Keir Fraser [Fri, 7 May 2010 08:46:50 +0000 (09:46 +0100)]
x86/cpufreq: fix turbo mode detection
{acpi,powernow}_cpufreq_cpu_init() generally don't run on the CPU that
the policy they handle applies to, hence using cpuid() directly
works only as long as all CPUs in the system are identical (which
admittedly is commonly the case).
Further add a per-policy flag indicating the availability of
APERF/MPERF MSRs, so that globally setting the .getavg accessor won't
be a problem on heterogeneous configurations.
Keir Fraser [Thu, 6 May 2010 16:00:08 +0000 (17:00 +0100)]
Reduce 'd' debug key's global impact
On large systems, dumping state may cause time management to get
stalled for so long a period that it wouldn't recover. Therefore alter
the state dumping logic to block each CPU in turn as it prints,
rather than blocking one CPU for a very long time (using the alternative
key handling toggle introduced with an earlier patch).
Further, instead of using on_selected_cpus(), which is unsafe when
the dumping happens from a hardware interrupt, introduce and use a
dedicated IPI sending function (which each architecture can implement
to its liking).
Finally, don't print useless data (e.g. the hypervisor context of the
interrupt that is used for triggering the printing, but isn't part of
the context that's actually interesting).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 6 May 2010 14:54:52 +0000 (15:54 +0100)]
Remove CPUID4 emulation for AMD CPUs
The CPUID4 emulation code for AMD CPUs in intel_cacheinfo.c is never
executed. It came from the upstream Linux kernel, where CPUID4 is used
to report cache information in sysfs. In Xen, however, init_amd() uses
display_cacheinfo() to determine the CPU cache size instead.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Keir Fraser [Thu, 6 May 2010 10:59:55 +0000 (11:59 +0100)]
Reduce '0' debug key's global impact
On large systems, dumping state may cause time management to get
stalled for so long a period that it wouldn't recover. Therefore add
a tasklet-based alternative mechanism to handle Dom0 state dumps.
Keir Fraser [Thu, 6 May 2010 10:43:54 +0000 (11:43 +0100)]
svm: support EFER.LMSLE for guests
Now that the feature is officially documented (see
http://support.amd.com/us/Processor_TechDocs/24593.pdf), I think it
makes sense to also allow HVM guests to make use of it.
Keir Fraser [Tue, 4 May 2010 11:52:48 +0000 (12:52 +0100)]
CPUIDLE: shorten hpet spin_lock holding time
Try to reduce spin_lock overhead for deep C state entry/exit. This
will benefit systems with a lot of CPUs which need the HPET broadcast
to wake up from deep C state.
Keir Fraser [Tue, 4 May 2010 11:51:33 +0000 (12:51 +0100)]
x86: Relocate boot trampoline to avoid BIOS conflicts.
Fix booting through iSCSI protocol with Broadcom network cards.
These boards use the option ROM feature to implement the TCP/IP
protocol stack and the iSCSI software initiator. The memory address
normally used by the PMM is 0x87000, which conflicts with the memory
allocation for Xen's trampoline routine, currently 0x88000.
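A conflict of this kind is a plain range-overlap check. The sketch below is illustrative only: the changeset gives just the two start addresses, so any region lengths passed to the hypothetical ranges_overlap() helper are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [a_start, a_start + a_len) and [b_start, b_start + b_len) intersect. */
bool ranges_overlap(uint64_t a_start, uint64_t a_len,
                    uint64_t b_start, uint64_t b_len)
{
    return a_start < b_start + b_len && b_start < a_start + a_len;
}
```

For instance, a region starting at 0x87000 collides with one starting at 0x88000 as soon as it is assumed to span more than 0x1000 bytes.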
Keir Fraser [Tue, 4 May 2010 11:48:28 +0000 (12:48 +0100)]
CPUIDLE: re-implement mwait wakeup process
It MWAITs on a completely new flag field, avoiding the IPI-avoidance
semantics of softirq_pending. It also does wakeup-waiting checks on
timer_deadline_start, that being the field that initiates wakeup via
the MONITORed memory region.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Tue, 4 May 2010 11:42:56 +0000 (12:42 +0100)]
linux pvdrv: generalize location of autoconf.h
The location of the file in the build tree changed in recent Linux;
since there can be only one such file, using a wild card instead of
an explicit directory name seems the easiest solution.
Keir Fraser [Tue, 4 May 2010 11:42:21 +0000 (12:42 +0100)]
x86: fix Dom0 booting time regression
Unfortunately the changes in c/s 21035 caused boot time to go up
significantly on certain large systems. To rectify this without going
back to the old behavior, introduce a new memory allocation flag so
that Dom0 allocations can exhaust non-DMA memory before starting to
consume DMA memory. For the latter, the behavior introduced in the
aforementioned c/s is retained, while for the former we can now even
try larger chunks first.
This builds on the fact that alloc_chunk() gets called with
non-increasing 'max_pages' arguments, and hence it can store locally the
allocation order last used (as larger order allocations can't succeed
during subsequent invocations if they failed once).
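The idea of remembering the last successful order can be sketched as follows. This is not the Xen implementation: alloc_chunk_sketch(), try_alloc_order() and the counters are hypothetical, and the allocator is replaced by a stub that succeeds only up to a configurable order.

```c
#define SKETCH_MAX_ORDER 18U

/* Stub allocator state: orders above this limit always fail. */
unsigned int largest_free_order = SKETCH_MAX_ORDER;
unsigned int alloc_attempts;   /* counts calls into the stub allocator */

static int try_alloc_order(unsigned int order)
{
    alloc_attempts++;
    return order <= largest_free_order;
}

/* Returns the order that was allocated, or -1 if nothing fits. */
int alloc_chunk_sketch(unsigned long max_pages)
{
    /* Orders above this one failed on an earlier call; don't retry them. */
    static unsigned int last_order = SKETCH_MAX_ORDER;
    unsigned int order = last_order;

    /* Never hand out more than the caller asked for. */
    while (order > 0 && (1UL << order) > max_pages)
        order--;

    for (;;) {
        if ((1UL << order) <= max_pages && try_alloc_order(order)) {
            last_order = order;   /* remember for the next invocation */
            return (int)order;
        }
        if (order == 0)
            return -1;
        order--;
    }
}
```

Because max_pages never increases between calls, starting from the remembered order skips all the larger orders that are already known to fail.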
Keir Fraser [Tue, 4 May 2010 11:41:11 +0000 (12:41 +0100)]
x86: add support for domain-initiated global cache flush
Newer Linux AGP code wants to flush caches on all CPUs under certain
circumstances. Since doing this on all vCPUs of the domain in
question doesn't yield the intended effect, this needs to be done in
the hypervisor. Add a new MMUEXT operation for this.
Keir Fraser [Tue, 4 May 2010 11:38:19 +0000 (12:38 +0100)]
blktap: Fix old QCow tapdisk image handling
When I tried to use a QCow image, I found that only every second boot
is successful. As I discovered, this is caused by incorrect handling of
old qcow tapdisk images. The extended header flag is not stored
correctly, so blktap tries to change the endianness of the L1 table on
each startup.
From: Miroslav Rezanina <mrezanin@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>