Keir Fraser [Tue, 4 May 2010 11:52:48 +0000 (12:52 +0100)]
CPUIDLE: shorten hpet spin_lock holding time
Try to reduce spin_lock overhead for deep C state entry/exit. This
will benefit systems with a lot of cpus which need the hpet broadcast
to wake up from deep C state.
Keir Fraser [Tue, 4 May 2010 11:51:33 +0000 (12:51 +0100)]
x86: Relocate boot trampoline to avoid BIOS conflicts.
Fix booting through iSCSI protocol with Broadcom network cards.
These boards use the option ROM feature to implement the TCP/IP stack
protocol, and the iSCSI software initiator. The memory address
normally used by the PMM is 0x87000 which conflicts with the memory
allocation for Xen's trampoline routine, currently 0x88000.
Keir Fraser [Tue, 4 May 2010 11:48:28 +0000 (12:48 +0100)]
CPUIDLE: re-implement mwait wakeup process
It MWAITs on a completely new flag field, avoiding the IPI-avoidance
semantics of softirq_pending. It also does wakeup-waiting checks on
timer_deadline_start, that being the field that initiates wakeup via
the MONITORed memory region.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Tue, 4 May 2010 11:42:56 +0000 (12:42 +0100)]
linux pvdrv: generalize location of autoconf.h
The location of the file in the build tree changed in recent Linux;
since there can be only one such file, using a wild card instead of
an explicit directory name seems the easiest solution.
Keir Fraser [Tue, 4 May 2010 11:42:21 +0000 (12:42 +0100)]
x86: fix Dom0 booting time regression
Unfortunately the changes in c/s 21035 caused boot time to go up
significantly on certain large systems. To rectify this without going
back to the old behavior, introduce a new memory allocation flag so
that Dom0 allocations can exhaust non-DMA memory before starting to
consume DMA memory. For the latter, the behavior introduced in
aforementioned c/s gets retained, while for the former we can now even
try larger chunks first.
This builds on the fact that alloc_chunk() gets called with
non-increasing 'max_pages' arguments, and hence it can store locally the
allocation order last used (as larger order allocations can't succeed
during subsequent invocations if they failed once).
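The order-caching idea described above can be sketched as follows. This is a minimal Python illustration, not Xen's alloc_chunk(): the class name, the `try_alloc` callback, and the default maximum order are assumptions made for the example.

```python
class ChunkAllocator:
    """Hypothetical sketch: remember the allocation order last used."""

    def __init__(self, try_alloc, max_order=9):
        # try_alloc(order) -> bool stands in for the real page allocator.
        self.try_alloc = try_alloc
        # Largest order still worth trying; it only ever decreases.
        self.last_order = max_order

    def alloc_chunk(self, max_pages):
        """Return the order that was allocated, or None if nothing fits."""
        if max_pages < 1:
            return None
        # Never request more than max_pages pages (2**order <= max_pages),
        # and never retry an order above the one that last succeeded.
        order = min(self.last_order, max_pages.bit_length() - 1)
        while order >= 0:
            if self.try_alloc(order):
                self.last_order = order
                return order
            order -= 1
        return None
```

Because callers pass non-increasing `max_pages` values, lowering `last_order` after a clamped request never hides an order that a later call could still use.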
Keir Fraser [Tue, 4 May 2010 11:41:11 +0000 (12:41 +0100)]
x86: add support for domain-initiated global cache flush
Newer Linux' AGP code wants to flush caches on all CPUs under certain
circumstances. Since doing this on all vCPU-s of the domain in
question doesn't yield the intended effect, this needs to be done in
the hypervisor. Add a new MMUEXT operation for this.
Keir Fraser [Tue, 4 May 2010 11:38:19 +0000 (12:38 +0100)]
blktap: Fix old QCow tapdisk image handling
When I tried to use a QCow image, I found that only every second boot
was successful. As I discovered, this is caused by incorrect handling
of old qcow tapdisk images. The extended header flag is not stored
correctly, so blktap tries to change the endianness of the L1 table on
each startup.
From: Miroslav Rezanina <mrezanin@redhat.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 4 May 2010 11:16:37 +0000 (12:16 +0100)]
Make sure git clone gets the right kernel branch
When cloning kernel repo:
1. make remote called "xen" rather than the default "origin"
2. directly checkout the desired branch, rather than the default
then the desired one
Git 1.5 doesn't support -b on git clone, and seems to do something odd
with the checkout branch argument, so avoid using the newer
commandline options.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Keir Fraser [Tue, 4 May 2010 08:30:53 +0000 (09:30 +0100)]
Remus: python netlink fixes
Fix deprecation warning in Qdisc class under python 2.6.
Fix rtattr length and padding (rta_len is unaligned).
Null-terminate qdisc name in rtnl messages.
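The rtattr length and padding rule can be sketched as below. This is an illustration only, not the Remus code: the helper names are made up for the example, and the qdisc name used in the usage note is a placeholder. It relies on the standard rtnetlink layout (a 4-byte header of rta_len and rta_type, 4-byte alignment) and on TCA_KIND being 1.

```python
import struct

RTA_ALIGNTO = 4
RTA_HDRLEN = struct.calcsize("=HH")  # rta_len, rta_type (native byte order)
TCA_KIND = 1  # attribute type carrying the qdisc name


def rta_align(length):
    """Round length up to the next RTA_ALIGNTO boundary."""
    return (length + RTA_ALIGNTO - 1) & ~(RTA_ALIGNTO - 1)


def pack_rtattr(rta_type, payload):
    """Pack one attribute: rta_len excludes padding, the buffer includes it."""
    rta_len = RTA_HDRLEN + len(payload)   # unaligned length goes in the header
    pad = rta_align(rta_len) - rta_len    # zero bytes up to the next boundary
    return struct.pack("=HH", rta_len, rta_type) + payload + b"\0" * pad


def pack_qdisc_kind(name):
    """Pack a qdisc name as a null-terminated TCA_KIND attribute."""
    return pack_rtattr(TCA_KIND, name.encode("ascii") + b"\0")
```

For a placeholder name such as "demo", the payload is 5 bytes including the terminator, so rta_len is 9 while the packed attribute occupies 12 bytes.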
x86, shadow: propagate pat caching on the shadow l1
PAT caching was only propagated if has_arch_pdevs(),
causing the hvm_get_mem_pinned_cacheattr() to be ignored
in the non passthrough case.
l1_disallow_mask() needs to be relaxed.
Signed-off-by: Jean Guyader <jean.guyader@citrix.com>
The motivation comes from distributors that configure their
crashkernel command line automatically with some configuration tool
(YaST, you know ;)). Of course that tool knows the value of System
RAM, but if the user removes RAM, then the system becomes unbootable
or at least unusable and error handling is very difficult."
For x86, other than Linux we pass the actual amount of RAM rather than
the highest page's address (to cope with sparse physical address
maps).
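The difference between the two quantities can be seen on a small example. The memory map below is hypothetical, and the helper names are invented for the illustration.

```python
GIB = 1 << 30


def total_ram(regions):
    """Sum of the sizes of all RAM regions, in bytes."""
    return sum(size for _start, size in regions)


def highest_address(regions):
    """End address of the highest RAM region, in bytes."""
    return max(start + size for start, size in regions)


# Hypothetical sparse map: 2 GiB at 0, a 2 GiB hole, then 2 GiB at 4 GiB.
SPARSE_MAP = [(0, 2 * GIB), (4 * GIB, 2 * GIB)]
```

With this map the highest address is 6 GiB although only 4 GiB of RAM is present, so sizing anything from the highest address would overestimate the available memory.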
console: Make initial static console buffer __initdata.
The previous scheme --- freeing an area of BSS --- did not interact
nicely with device passthrough as IOMMU will not have any Xen BSS area
in guest device pagetables. Hence if the freed BSS space gets
allocated to a guest, DMAs to guest's own memory can fail.
The simple solution here is to always free the static buffer at end of
boot (initmem is specially handled for IOMMUs) and require a
dynamically-allocated buffer always to be created.
xend: remove the backend of the tapdisk device from xenstore
earlier, to release the resources allocated by the backend driver
in the dom0 kernel
The Blktapctl thread will use the qemu-dm connection instead of
tapdisk-ioemu in the case of an FV VM. We found that resources such as
memory allocated for the guest could not be freed, because the backend
driver could not be closed in qemu-dm.
This patch removes the backend of the tapdisk device from xenstore
earlier, to trigger qemu-dm to notify the backend driver to release the
allocated resources.
I have tested this patch in the following cases:
1, save && restore
2, destroy && shutdown
3, snapshot
Signed-off-by: James ( Song Wei ) <jsong@novell.com>
The info subcommand was missing from the xl tool. Use the new libxl
wrapper functions to create a clone of "xm info". The splitting into
several smaller functions is inspired by the implementation in
XendNode.py.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Xen provides a xen_version hypercall to query the values of several
interesting things (like hypervisor version, commandline used, actual
changeset, etc.). Create a user-friendly and efficient wrapper around
the libxc function to provide values for xl info output.
Since the information is static during the whole runtime, we store
it within the libxl_ctx structure and just deliver the pointer on
subsequent calls.
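The query-once, return-the-cached-pointer pattern described above can be sketched like this. It is not the libxl implementation: the class, the method name, and the `query` callback standing in for the hypercall wrapper are all assumptions made for the illustration.

```python
class Ctx:
    """Hypothetical context object that caches static version info."""

    def __init__(self, query):
        self._query = query          # stand-in for the hypercall wrapper
        self._version_info = None    # filled in on first use

    def get_version_info(self):
        """Query once, then hand back the same cached object every time."""
        if self._version_info is None:
            self._version_info = self._query()
        return self._version_info
```

Callers receive the same object on every call, so they should treat it as read-only and leave its lifetime to the context.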
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
The libxl version of the physinfo sysctl does not contain some
fields like nr_nodes or capabilities needed for xl info output.
Add them to the structure and the retrieving function.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
BSD sed does not support the '+' in the basic re while gnu sed does.
BSD sed supports '+' in the extended re and uses the -E flag while
gnu sed uses -r.
The only difference with the original version is that the '+'
qualifier is replaced with '\{1\,\}' which should work with both BSD
sed and GNU sed.
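The equivalence the patch relies on, that "one or more" can be written as an interval with a lower bound of 1 and no upper bound, can be checked with any regex engine. The sketch below uses Python's re module, whose syntax differs from sed's basic REs (the braces are not escaped here); the function names and the sample pattern are made up for the illustration.

```python
import re


def squeeze_plus(text):
    """Replace each run of digits using the '+' quantifier."""
    return re.sub(r"[0-9]+", "N", text)


def squeeze_interval(text):
    """Same substitution, written with the '{1,}' interval instead."""
    return re.sub(r"[0-9]{1,}", "N", text)
```

Both functions return the same result for any input, which is why the interval form is a safe portable replacement for '+'.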
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Exporting cpu on/offline and memory on/offline hotplug interfaces,
so that users can do those (memory/cpu) hotplug actions with
following command line freely:
x86: No need to sync_local_execstate() during CPU hot-unplug.
This is done implicitly when we enter stopmachine_run() context,
because the underlying tasklet mechanism performs the sync before
running a tasklet handler.
Synchronise lazy execstate before calling tasklet handlers.
This ensures we are properly running on idle-vcpu state, which certain
things (e.g., use of vmx_vmcs_{enter,exit}) rely on. It also means we
don't need to do the same thing in the stopmachine_run handler.
Late in the 4.0 release it was discovered that certain order>0
allocations could fail and had no fallback. This conflicted with
tmem especially when combined with aggressive ballooning.
A hack-y workaround patch was added in time for 4.0 that has
reduced (but not completely eliminated) the problem but
tmem was left disabled-by-default for the 4.0 release.
Re-enable it in xen-unstable by default to help identify cases
where the workaround is insufficient. Tmem can be
disabled with the no-tmem Xen boot option. Please report
failures (that are fixed with the no-tmem option) to me.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Implement tasklets as running in VCPU context (specifically, idle-VCPU context)
...rather than in softirq context. This is expected to avoid a lot of
subtle deadlocks relating to the fact that softirqs can interrupt a
scheduled vcpu.
Move tasklet implementation into its own source files.
This is preparation for implementing tasklets in vcpu context rather
than softirq context. There is no change to the implementation of
tasklets in this patch.
I noticed that 2 scripts in Xen 4.0.0 are calling "gawk". Normally, in
most distributions, gawk is considered a specific version of awk.
Calling "gawk" and not "awk" generally means that you need
specificities of the "g" version of awk, as opposed to "mawk" which is
another implementation of the same tool.
So, unless I misread the scripts, Xen doesn't need to use gawk but
just any implementation of awk, and the attached patch can safely be
applied.
If I am wrong (which I don't think I am at the first look) and that
there's a reason why gawk is used and not awk, then IMHO, the toplevel
README should mention it in the prerequisites.
From: Thomas Goirand <thomas@goirand.fr> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
mce: Add a x86_mcinfo_reserve() function, to reserve space from mc_info.
With this method, we don't need to collect bank and global information
in a local variable and then call x86_mcinfo_add() to copy that
information to mc_info. This avoids the copy, and we also learn earlier
whether there is enough space in mc_info.
Also extract the code that gets global/bank information into separate
functions, mca_init_bank/mca_init_global.
It's meaningless to get the current information in MCE context; it is
kept here but should be removed in future.
Also add a flag to mc_info, to indicate that some information was lost
due to OOM.
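The reserve-then-fill approach, as opposed to build-locally-then-copy, can be sketched as follows. This is a Python illustration of the idea only: the class, its fields, and the method name are hypothetical and do not mirror the real mc_info layout or the x86_mcinfo_reserve() signature.

```python
class McInfo:
    """Hypothetical fixed-size record buffer with in-place reservation."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.used = 0
        self.lost = False  # set when a record could not be stored

    def reserve(self, size):
        """Return a writable view of `size` bytes, or None if out of space."""
        if self.used + size > len(self.buf):
            self.lost = True  # remember that some information was dropped
            return None
        view = memoryview(self.buf)[self.used:self.used + size]
        self.used += size
        return view
```

The caller writes the record straight into the returned view, so no second copy is needed, and a None result reports the lack of space before any data is gathered.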
Clean up MCA MSR virtualization and vMCE injection
Move all virtual MCE related work into a separate file.
It also tries to do some clean-up on the vMCE, including:
a) rename some functions, like mce_init_msr/mce_rdmsr, to
vmce_init_msr/vmce_rdmsr to make them more straightforward,
b) make vmca_msrs a pointer in arch_domain,
to decrease arch_domain's size,
c) extract per-bank MCA MSR access into separate functions
(bank_mce_wrmsr/bank_mce_rdmsr) to make it a bit cleaner,
d) add a new file, xen/include/asm-x86/mce.h, for vMCE related
headers.