Keir Fraser [Wed, 5 Jan 2011 10:01:11 +0000 (10:01 +0000)]
relax vCPU pinned checks
Both writing of certain MSRs and VCPUOP_get_physid make sense also for
dynamically (perhaps temporarily) pinned vcpus.
Likely a couple of other MSR writes (MSR_K8_HWCR, MSR_AMD64_NB_CFG,
MSR_FAM10H_MMIO_CONF_BASE) would make sense to be restricted by an
is_pinned() check too, possibly also some MSR reads.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22649:39194f457534
xen-unstable date: Wed Jan 05 09:57:15 2011 +0000
Keir Fraser [Fri, 24 Dec 2010 10:29:50 +0000 (10:29 +0000)]
VT-d: fix and improve print_vtd_entries()
Fix leaking of mapped domain pages (root_entry and ctxt_entry when
falling out of the level traversing loop). Do this by re-arranging
things slightly so that a mapping is retained only as long as it
really is needed.
Fix the failure to use map_domain_page() in the level traversing loop
of the function.
Add a missing return statement in one of the error paths.
Also, I wonder whether it is still correct that print_vtd_entries()
cannot be called from iommu_page_fault_do_one() on ix86, now that
map_domain_page() is IRQ safe.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22632:7cc87dcf30a1
xen-unstable date: Fri Dec 24 10:14:01 2010 +0000
Keir Fraser [Fri, 24 Dec 2010 10:29:14 +0000 (10:29 +0000)]
re-add calls accidentally deleted from run_all_nonirq_keyhandlers()
c/s 22538:a3a29e67aa7e, having got applied in a form different from
the one submitted, resulted in the calls to
console_{start,end}_log_everything() getting removed without
replacement. Add them back since, unlike run_all_keyhandlers(),
this function doesn't run with log-everything already in effect.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22631:dca1b7cf2e2c
xen-unstable date: Fri Dec 24 10:12:58 2010 +0000
Keir Fraser [Fri, 24 Dec 2010 10:28:35 +0000 (10:28 +0000)]
x86 hvm ept: Remove EPT guest linear address validation
For an EPT violation resulting from an attempt to load the guest PDPTEs
as part of the execution of a MOV CR instruction, the EPT_GLA_VALID bit
is not valid. This should not happen in most cases, since we always
populate guest memory. But this is not true for PAE guests under
PoD/page sharing: there, the page pointed to by CR3 may be unpopulated,
and we need to handle that case.
Keir Fraser [Mon, 20 Dec 2010 10:21:20 +0000 (10:21 +0000)]
tools/python: fix xm list for Python 2.7
This patch fixes
Unexpected error: <type 'exceptions.AttributeError'>
This is due to xmlrpc changes in Python 2.7. This patch should
fix it for both old and new versions.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22045:2940165380de
xen-unstable date: Thu Aug 19 17:09:30 2010 +0100
Keir Fraser [Fri, 17 Dec 2010 17:57:33 +0000 (17:57 +0000)]
tools: fetch remote changesets when force refetching/resetting qemu
This makes "make tools/ioemu-dir-force-update" usable for picking up
an entirely new QEMU_TAG.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22425:d6c2695f05eb
xen-unstable date: Tue Nov 23 19:29:13 2010 +0000
Keir Fraser [Fri, 17 Dec 2010 17:56:52 +0000 (17:56 +0000)]
tools: provide explicit target for refetching/resetting qemu
This patch adds an explicit update mechanism:
make tools/ioemu-dir-force-update
This isn't brilliant but is better than doing "cd tools/ioemu-remote
&& git reset --hard <sha1...>" by hand.
Note that invoking this target will destroy all working tree changes
made to qemu-xen.
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22381:2bedffabbcab
xen-unstable date: Tue Nov 09 18:15:25 2010 +0000
Keir Fraser [Fri, 17 Dec 2010 17:56:00 +0000 (17:56 +0000)]
tools/python: Replace python string exceptions with ValueError exceptions
There are at least some syntax errors when trying to use the xen utils
with python2.6. The attached patch changes these string exceptions
into ValueErrors:
- tools/python/xen/util/bugtool.py (getBugTitle)
- tools/python/xen/xend (class XendBase): not caught
- tools/python/xen/xm/xenapi_create.py (sxp2xmlconvert_sxp_to_xml):
the method already raises a ValueError for a similar condition.
- tools/python/xen/xm/main.py (xm_network_attach): not caught.
For all but maybe the first one, replacing the string exceptions
with ValueErrors seems to be safe.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22153:95c90bd63aed
xen-unstable date: Tue Sep 14 17:46:21 2010 +0100
Keir Fraser [Fri, 17 Dec 2010 16:13:54 +0000 (16:13 +0000)]
tools/hotplug/Linux: Avoid dependency on iptables conntrack module.
Checking for RELATED,ESTABLISHED traffic being sent to a domU requires
connection tracking, which adds unexpected (to most users) load to
dom0. Heavily loaded systems can fill the conntrack tables.
To avoid this, be more liberal in what we accept, and leave it to domU
to police its own input.
tools/hotplug/Linux: supply --physdev-is-bridged in iptables runes
With newer (pvops) kernels logs get flooded with this iptables
warning: physdev match: using --physdev-out in the OUTPUT, FORWARD and
POSTROUTING chains for non-bridged traffic is not supported anymore
Using the --physdev-is-bridged option prevents this.
See also: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=571634#10
Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22385:b0fe8260cefa
xen-unstable date: Wed Nov 10 14:37:19 2010 +0000
Keir Fraser [Wed, 15 Dec 2010 10:47:52 +0000 (10:47 +0000)]
ept: Remove lock in ept_get_entry, replace with access-once semantics.
This mirrors the RVI/shadow situation, where p2m read access is
lockless because it's done in the hardware (linear map of the p2m
table).
This fixes the original bug (call it bug A) without introducing bug B
(a deadlock).
Bug A was caused by a race when updating p2m entries: between testing
if it's valid, and testing if it's populate-on-demand, it may have
been changed from populate-on-demand to valid.
My original patch simply introduced a lock into ept_get_entry, but
that caused bug B, a deadlock due to circular locking order:
  p2m_change_type [grabs p2m lock] -> set_p2m_entry -> ept_set_entry ->
  ept_set_middle_level -> p2m_alloc [grabs hap lock]
versus
  write cr4 -> hap_update_paging_modes [grabs hap lock] ->
  hap_update_cr3 -> gfn_to_mfn -> ept_get_entry [grabs p2m lock]
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 22526:7a5ee3800417
xen-unstable date: Wed Dec 15 10:47:05 2010 +0000
Keir Fraser [Wed, 15 Dec 2010 10:32:22 +0000 (10:32 +0000)]
tmem: two wrongs (or three lefts and a wrong) make a right
These two bugs apparently complement each other enough that
they escaped problems in my testing, but eventually gum
up the works and are obviously horribly wrong.
Found while developing tmem for native Linux.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 22525:01f3b3509023
xen-unstable date: Wed Dec 15 10:27:18 2010 +0000
Keir Fraser [Wed, 15 Dec 2010 10:31:59 +0000 (10:31 +0000)]
x86/iommu: account for necessary allocations when calculating Dom0's
initial allocation size
As of c/s 21812:e382656e4dcc, IOMMU related allocations for Dom0
happen only after it got all of its memory allocated, and hence the
reserve (mainly for setting up its swiotlb) may get exhausted without
accounting for the necessary allocations up front.
While not precise, the estimate has been found to be within a couple
of pages for the systems it got tested on.
For the calculation to be reasonably correct, this depends on the
patch titled "x86/iommu: don't map RAM holes above 4G" sent out
yesterday.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22506:618ba64260fa
xen-unstable date: Tue Dec 14 09:54:10 2010 +0000
Keir Fraser [Wed, 15 Dec 2010 10:31:08 +0000 (10:31 +0000)]
x86 acpi: Follow Windows behaviour more closely during reset.
This follows some changes proposed for upstream Linux:
1. Do not check the FADT reset register size/offset
2. Try ACPI poking twice during our reset attempt sequence
Hopefully this will help us reset reliably on a wider range of
platforms.
Keir Fraser [Fri, 10 Dec 2010 11:34:28 +0000 (11:34 +0000)]
x86 hvm: Add a new HVMOP to get the current Xen system time
This lets HVM PV drivers get the Xen absolute system time, so that they
can use SCHEDOP_poll in a sensible fashion. HVM PV drivers can't use the
normal PV clock because they might have TSC offsets that they don't
know about.
Keir Fraser [Thu, 9 Dec 2010 10:14:57 +0000 (10:14 +0000)]
x86/mm: change ASSERTs to BUG_ONs in mem_sharing.c
These two ASSERTs have important side effects, so make them into
BUG_ONs, consistent with the rest of the file.
Bug found by Jui-Hao Chiang <juihaochiang@gmail.com>.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 22467:89116f28083f
xen-unstable date: Wed Dec 08 10:46:31 2010 +0000
Keir Fraser [Tue, 7 Dec 2010 18:37:31 +0000 (18:37 +0000)]
x86: remove BUG_ON() from QUIRK_IOAPIC_*_REGSEL handler
Since (non-pvops, 32-bit only up to 2.6.27) Linux would report "BAD"
unconditionally on all SiS chipset versions (it only looks for a PCI
device at 0000:00:00.0 with SiS as the vendor), we must not crash if
the report on a 64-bit hypervisor doesn't match the #define (which is
zero).
While we could honor the quirk indication even on 64-bit, it doesn't
seem worthwhile, as there's no evidence that newer SiS chipsets
(supporting 64-bit CPUs) are actually affected.
This should also address bug 1687 (mis-reported, however, afaict).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22466:bfd13358b8bf
xen-unstable date: Tue Dec 07 18:32:04 2010 +0000
Keir Fraser [Wed, 1 Dec 2010 20:14:56 +0000 (20:14 +0000)]
x86: fix IRQ migration when using directed EOI (broken with c/s 20465)
In directed-EOI mode, there is no chance to do the migration in
mask_and_ack_level_ioapic_irq(), as the remote IRR bit can't possibly
be clear after issuing the EOI to the LAPIC. Consequently, there's no
point in even trying. Instead, migration must be done in
end_level_ioapic_irq(), and it requires masking the interrupt source
prior to issuing the EOI to the IO-APIC.
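A minimal sketch of the resulting end-of-interrupt ordering in directed-EOI mode, assuming a toy model rather than Xen's IO-APIC code: the `irq_state` structure, its flags and `end_level_directed_eoi()` are hypothetical, and each hardware action is only recorded in a log so the sequence can be checked.

```c
#include <stdbool.h>

/* Hypothetical hardware actions, recorded in the order they happen. */
enum action { ACT_MASK, ACT_EOI, ACT_MOVE, ACT_UNMASK };

struct irq_state {
    bool move_pending;   /* a migration has been requested */
    bool disabled;       /* the IRQ is administratively disabled */
    enum action log[4];  /* at most mask, EOI, move, unmask */
    int nlog;
};

static void record(struct irq_state *s, enum action a)
{
    s->log[s->nlog++] = a;
}

/*
 * Sketch of end_level_ioapic_irq() in directed-EOI mode: the source is
 * masked before the EOI goes to the IO-APIC, so the IRQ can then be
 * moved without a new interrupt arriving mid-migration.
 */
static void end_level_directed_eoi(struct irq_state *s)
{
    if (!s->move_pending && !s->disabled) {
        record(s, ACT_EOI);          /* fast path: nothing to migrate */
        return;
    }

    record(s, ACT_MASK);             /* mask the source first ...     */
    record(s, ACT_EOI);              /* ... then EOI the IO-APIC      */

    if (s->move_pending) {
        record(s, ACT_MOVE);         /* safe: the source is masked    */
        s->move_pending = false;
    }

    if (!s->disabled)
        record(s, ACT_UNMASK);
}
```

With a migration pending, the log reads mask, EOI, move, unmask; without one, only the EOI is issued.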
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22452:62bf12040b0f
xen-unstable date: Wed Dec 01 20:10:27 2010 +0000
Keir Fraser [Tue, 30 Nov 2010 11:38:16 +0000 (11:38 +0000)]
x86 hvm: Do not overwrite boot-cpu capability data on VMX/SVM startup.
This was apparently required back in the earliest days of Xen, but we
now properly initialise CPU capabilities early during bootstrap. Re-writing
capability data later now causes problems if specific features have
been deliberately masked out.
Thanks to Weidong Han at Intel for finding such a bug where XSAVE
feature is masked out by default, but then erroneously written back
during VMX initialisation. This would cause memory corruption problems
during boot for XSAVE-capable systems.
Keir Fraser [Mon, 29 Nov 2010 14:46:43 +0000 (14:46 +0000)]
x86: tighten filter on ptwr_do_page_fault()
Even not-so-recent Linux may, due to post-2.6.18 changes to the
process creation code, cause quite a number (depending on environment
and argument size) of faulting accesses to user space originating from
kernel mode. Generally those happen for non-present pages and would
lead to a nested page fault from guest_get_eff_l1e(). They can be
avoided by checking for PFEC_page_present as long as the guest isn't
running on shadow page tables.
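A simplified sketch of the tightened filter, assuming the architectural x86 page-fault error-code bits (P = bit 0, W/R = bit 1, RSVD = bit 3). `ptwr_fault_candidate()` and its `shadow_mode` parameter are hypothetical, and the real check in Xen involves further conditions not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PFEC_page_present  (1u << 0)
#define PFEC_write_access  (1u << 1)
#define PFEC_reserved_bit  (1u << 3)

/*
 * Only write faults without reserved-bit violations are candidates for
 * writable-pagetable emulation. Unless the guest runs on shadow page
 * tables, the faulting page must also be present; non-present faults
 * (e.g. kernel accesses to not-yet-populated user pages) are skipped,
 * avoiding the nested fault in guest_get_eff_l1e().
 */
static bool ptwr_fault_candidate(uint32_t error_code, bool shadow_mode)
{
    if ((error_code & (PFEC_write_access | PFEC_reserved_bit)) !=
        PFEC_write_access)
        return false;

    return shadow_mode || (error_code & PFEC_page_present);
}
```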
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 22449:3afb5ecbf69f
xen-unstable date: Mon Nov 29 14:40:55 2010 +0000
Keir Fraser [Mon, 29 Nov 2010 14:46:01 +0000 (14:46 +0000)]
x86-64: don't crash Xen upon direct pv guest access to GDT/LDT mapping area
handle_gdt_ldt_mapping_fault() is intended to deal with indirect
accesses (i.e. those caused by descriptor loads) to the GDT/LDT
mapping area only. While on 32-bit, segment limits indeed prevent the
function from being entered for direct accesses (i.e. a #GP fault will
be raised even before the address translation gets done), on 64-bit even
user mode accesses would lead to control reaching the BUG_ON() at the
beginning of that function.
Fortunately the fix is simple: Since the guest kernel runs in ring 3,
any guest direct access will have the "user mode" bit set, whereas
descriptor loads always do the translations to access the actual
descriptors as kernel mode ones.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Further, relax the BUG_ON() in handle_gdt_ldt_mapping_fault() to a
check-and-bail. This avoids any problems in future, if we don't
execute x86_64 guest kernels in ring 3 (e.g., because we use a
lightweight HVM container).
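The user-mode test combined with the check-and-bail behaviour can be sketched as below, assuming the architectural U/S bit (bit 2) of the page-fault error code; the enum and `classify_gdt_ldt_fault()` are hypothetical names, not Xen's.

```c
#include <stdint.h>

#define PFEC_user_mode (1u << 2)   /* architectural U/S error-code bit */

/* Hypothetical classification of a fault in the GDT/LDT mapping area. */
enum gdt_ldt_fault {
    GDT_LDT_DESCRIPTOR_LOAD,   /* indirect access: handle as before */
    GDT_LDT_DIRECT_ACCESS,     /* guest touched the area itself: bail */
};

/*
 * Descriptor loads translate the descriptor address as a kernel-mode
 * access, whereas a 64-bit PV guest kernel runs in ring 3, so any
 * direct access arrives with the user-mode bit set. Such faults are
 * returned to the caller instead of tripping a BUG_ON().
 */
static enum gdt_ldt_fault classify_gdt_ldt_fault(uint32_t error_code)
{
    if (error_code & PFEC_user_mode)
        return GDT_LDT_DIRECT_ACCESS;
    return GDT_LDT_DESCRIPTOR_LOAD;
}
```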
Keir Fraser [Wed, 10 Nov 2010 14:16:45 +0000 (14:16 +0000)]
hvmloader: fix off-by-one-bit error when initialising PCI devices
hvmloader is responsible for - amongst other things - initialising
the PCI device BARs prior to loading the guest BIOS. The previous
code only probed for devfn up to 128. The lower 3 bits are function
IDs so this meant that only devices in slots 0-15 were actually being
initialized.
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com> Acked-by: Gianni Tedesco <gianni.tedesco@citrix.com>
xen-unstable changeset: 22383:cba667fb80cf
xen-unstable date: Wed Nov 10 13:58:16 2010 +0000
hvmloader: Fix 22383:cba667fb80cf iterating over devfns 0..255
We need to declare devfn as wider than 8 bits for a loop over
0 <= devfn < 256 to terminate.
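A stand-alone sketch (not hvmloader's actual code) of the devfn encoding and the loop-width issue; `devfn_slot()`, `devfn_func()` and `slots_visited()` are hypothetical helpers. A devfn holds the slot in bits 7..3 and the function in bits 2..0, so stopping at 128 only reaches slots 0-15.

```c
#include <stdint.h>

/* devfn: bits 7..3 = slot (0-31), bits 2..0 = function (0-7). */
static unsigned int devfn_slot(unsigned int devfn) { return devfn >> 3; }
static unsigned int devfn_func(unsigned int devfn) { return devfn & 7; }

/*
 * Count the distinct slots visited when iterating devfn over [0, limit).
 * The loop counter is deliberately wider than 8 bits: with a uint8_t
 * counter, "devfn < 256" is always true and the loop never ends.
 */
static unsigned int slots_visited(unsigned int limit)
{
    uint32_t seen = 0;                      /* one bit per slot */
    unsigned int n = 0;

    for (unsigned int devfn = 0; devfn < limit; devfn++)
        seen |= UINT32_C(1) << devfn_slot(devfn);

    for (; seen; seen &= seen - 1)          /* clear lowest set bit */
        n++;
    return n;
}
```

Probing up to 128 visits 16 slots, while the full 0..255 range visits all 32.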
Keir Fraser [Mon, 8 Nov 2010 15:36:58 +0000 (15:36 +0000)]
Fix "Error: Device 51952 not connected" error when using pygrub
The following is the process of booting a DomU with 'mounted-blktap2'
(VHD for example) and 'pygrub' as bootloader:
1. Connect boot-device to Dom0 as '/dev/xvdp'
2. Pygrub gets the information needed to load the DomU
3. Disconnect boot-device from Dom0
4. Boot DomU
During step 3 the created device is disconnected from Dom0, but its
xenstore entries are not cleaned up after the disconnection, so you
get the following error:
   "Error: Device /dev/xvdp (51952, tap2) is already connected."
During step 3 xend always calls destroyDevice with 'tap' as the argument.
Keir Fraser [Mon, 8 Nov 2010 15:35:30 +0000 (15:35 +0000)]
tools/xenpaging: Add _XOPEN_SOURCE to fix build problems with recent gcc
This patch fixes compilation issues with
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21).
Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22023:af6799abc6e9
xen-unstable date: Wed Aug 18 16:48:25 2010 +0100
Keir Fraser [Wed, 3 Nov 2010 08:28:36 +0000 (08:28 +0000)]
VT-d: fix device assignment failure (regression from Xen c/s 19805:2f1fa2215e60)
If the device at <secbus>:00.0 is the device the mapping operation was
initiated for, trying to map it a second time will fail, and hence
this second mapping attempt must be prevented (as was done prior to
said c/s).
While at it, simplify the code a little, too.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Weidong Han <weidong.han@intel.com>
xen-unstable changeset: 22348:2dfba250c50b
xen-unstable date: Wed Nov 03 08:18:51 2010 +0000
Keir Fraser [Sun, 24 Oct 2010 12:26:45 +0000 (13:26 +0100)]
x86/kexec: fix very old regression and make compatible with modern Linux
c/s 13829 lost the (32-bit only) cpu_has_pae argument passed to the
primary kernel's stub (in the 32-bit Xen case only), and Linux
2.6.27/.30 (32-/64-bit) introduced a new argument (for KEXEC_JUMP)
which for now simply gets passed a hardcoded value.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22280:d6e3cd10a9a6
xen-unstable date: Sun Oct 24 13:15:06 2010 +0100
Keir Fraser [Sun, 24 Oct 2010 12:26:17 +0000 (13:26 +0100)]
Allow max_pages to be set to less than tot_pages
The memory allocation code sometimes needs to enforce that a guest
that's been told to balloon down isn't going to expand further
(because it's still executing a previous balloon-up operation). That
means being able to set the desired max_pages even before the balloon
driver has brought tot_pages down to the right level.
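A minimal sketch of the relaxed policy, with a hypothetical `struct dom` and helper functions standing in for Xen's domain structure and allocator: setting max_pages below tot_pages is accepted, and every further allocation is then refused until the balloon driver has brought tot_pages back under the limit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant domain fields. */
struct dom {
    unsigned long tot_pages;   /* pages currently allocated */
    unsigned long max_pages;   /* allocation ceiling */
};

/* No longer rejects new_max < tot_pages. */
static void dom_set_max_pages(struct dom *d, unsigned long new_max)
{
    d->max_pages = new_max;
}

/* Allocating nr pages succeeds only if the result stays within max_pages. */
static bool dom_alloc(struct dom *d, unsigned long nr)
{
    if (d->tot_pages > d->max_pages || nr > d->max_pages - d->tot_pages)
        return false;   /* guest is over (or would exceed) its limit */
    d->tot_pages += nr;
    return true;
}

/* Ballooning down returns pages. */
static void dom_free(struct dom *d, unsigned long nr)
{
    d->tot_pages -= nr;
}
```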
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 22279:2208a036f8d9
xen-unstable date: Sun Oct 24 13:13:04 2010 +0100
Keir Fraser [Wed, 20 Oct 2010 12:34:36 +0000 (13:34 +0100)]
x86-64: workaround for BIOSes wrongly enabling LAHF_LM feature indicator
This workaround is taken from Linux, and the main motivation (besides
such workarounds indeed belonging in the hypervisor rather than each
kernel) is to suppress the warnings in the Xen log each Linux guest
would cause due to the disallowed wrmsr.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 22232:eb964c4b4f31
xen-unstable date: Mon Oct 11 09:02:36 2010 +0100
Keir Fraser [Sat, 2 Oct 2010 14:13:01 +0000 (15:13 +0100)]
x86 shadow: reset up-pointers on all l3s when l3s stop being pinnable.
Walking the pinned-shadows list isn't enough: there could be an
unpinned (but still shadowed) l3 somewhere and if we later try to
unshadow it it'll have an up-pointer of PAGE_LIST_NULL:PAGE_LIST_NULL.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 22224:a4016a257672
xen-unstable date: Sat Oct 02 15:05:50 2010 +0100
Keir Fraser [Sat, 2 Oct 2010 14:10:53 +0000 (15:10 +0100)]
VT-d: fix dom0 graphics problem on Lenovo T410.
The patch is derived from a similar quirk in the Linux kernel by David
Woodhouse and Adam Jackson. It checks for the VT enable bit in the IGD
GGC register. If VT is not enabled correctly in the IGD, Xen does not
enable VT-d translation for the IGD VT-d engine. In that case, if the
iommu boot parameter is set to force, Xen calls panic().
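The decision described above can be sketched as below. The bit mask is an assumption borrowed from the Linux quirk this patch is derived from (bit 11 of the 16-bit GGC register), while the enum and `igd_vtd_decide()` are hypothetical names; panic() is modelled as a return value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed GGC bit, mirroring the constant used by the Linux quirk. */
#define GGC_MEMORY_VT_ENABLED  (0x8u << 8)

enum igd_vtd_action {
    IGD_VTD_ENABLE,   /* VT enabled in the IGD: translate as normal */
    IGD_VTD_SKIP,     /* leave the IGD VT-d engine untranslated */
    IGD_VTD_PANIC,    /* iommu=force was given but cannot be honoured */
};

static enum igd_vtd_action igd_vtd_decide(uint16_t ggc, bool iommu_force)
{
    if (ggc & GGC_MEMORY_VT_ENABLED)
        return IGD_VTD_ENABLE;
    return iommu_force ? IGD_VTD_PANIC : IGD_VTD_SKIP;
}
```

Reading GGC from the IGD's PCI configuration space is deliberately left out.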
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
xen-unstable changeset: 22223:4beee5779122
xen-unstable date: Sat Oct 02 15:04:21 2010 +0100
Keir Fraser [Sat, 2 Oct 2010 14:10:22 +0000 (15:10 +0100)]
x86: fix boot failure (regression from pre-4.0 IRQ handling changes)
With the change to index irq_desc[] by IRQ rather than by vector, the
prior implicit change of the used flow handler when altering the IRQ
routing path to go through the 8259A didn't work anymore, and hence
boards needing the ExtINT delivery workaround failed to boot.
Make make_8259A_irq() a real function again, thus allowing the flow
handler to be changed there.
Also eliminate the generally superfluous and (at least theoretically)
dangerous hard coded setting of the flow handler for IRQ0: Earlier
code should have set this already based on information coming from
ACPI/MPS, and non-standard systems may e.g. have this IRQ level
triggered.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Tested-by: Markus Schuster <ml@markus.schuster.name>
xen-unstable changeset: 22222:aed9fd361340
xen-unstable date: Sat Oct 02 15:03:15 2010 +0100
Keir Fraser [Sat, 2 Oct 2010 14:10:01 +0000 (15:10 +0100)]
VT-d: fix feature boot messages
Changed vt-d feature boot messages from "supported" to "enabled" since
they reflect what is currently enabled in this Xen boot - not what is
supported by VT-d hardware.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
xen-unstable changeset: 22221:3518149c4d5d
xen-unstable date: Sat Oct 02 15:00:05 2010 +0100
While not as relevant after c/s 21894, it still seems safer to check
the CPUID level here, just like Linux does. This is particularly
relevant for the 4.0 tree (which doesn't have said c/s), but also
possibly for nested environments where writing MSR_IA32_MISC_ENABLE
may not actually take effect (Xen itself ignores such writes).
MFNs for PV domains were not properly checked, potentially
allowing a buggy or malicious PV guest to crash Xen. Also,
use get_page/put_page to claim a reference to the pages
so they can't disappear out from under tmem's feet.
Revert 22186:7167d6dd5c7c "x86: Retry do_mmu_update() a few times"
It does not work reliably for a couple of reasons:
(1) page_lock() fails if a page is !PGT_validated, and a page can
remain in that state for unbounded time.
(2) in the kernel-side race that motivated this patch, pgd_pin() can
lose to vmalloc_sync_all() -- pgd_pin() can try to change a pmd page's
type to l2_pagetable while
vmalloc_sync_all()->set_pmd()->do_mmu_update() has it temporarily
pinned as writable. This is hard to fix on the Xen side.
Hence I give up on this approach, revert the patch, and settle for
kernel-side patching only.
x86: Retry do_mmu_update() a few times when called on a pte whose type is in flux.
This can really happen -- all our PV Linux kernels have a race
between vmalloc_sync_all() and pgdir pinning/unpinning. The former is
protected by pgd_lock while the latter by mm->page_table_lock. Hence
they can happen concurrently, and vmalloc_sync_all() can attempt to
set_pmd() on a page directory which is in the process of being
pinned. This can confuse the hypervisor which may see a type change,
and hence fail do_mmu_update(). Until this patch. :-)
sched_credit: Raise bar for inter-socket migrations on mostly-idle systems
The credit scheduler tries to keep work balanced, even on a mostly idle
system. Unfortunately, if you have one VM burning cpu and another VM
idle, the effect is that the busy VM will flip back and forth between
sockets.
This patch addresses this by only migrating to a different socket if
that socket has twice as many idle processors as the socket the vcpu
is currently on.
This will only affect mostly-idle systems; as the system becomes more
busy, other load-balancing code will come into effect.
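A minimal sketch of the migration rule, assuming idle-CPU counts are already available for the current and candidate locations. `should_migrate()` and its parameters are hypothetical, and this sketch reads "twice" as a strict comparison (the candidate needs more than twice as many idle CPUs); within a socket, any increase is enough.

```c
#include <stdbool.h>

/*
 * idle_here:  idle CPUs near the vcpu's current CPU
 * idle_there: idle CPUs near the candidate CPU
 * Crossing a socket doubles the bar the candidate has to clear.
 */
static bool should_migrate(unsigned int idle_here, unsigned int idle_there,
                           bool same_socket)
{
    unsigned int factor = same_socket ? 1 : 2;

    return idle_here * factor < idle_there;
}
```

On a mostly idle two-socket box, a small difference in idle CPUs between sockets no longer triggers a move, which is what stops the busy VM from flip-flopping.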
Several seconds of backward time drift per minute can be seen on a
RHEL6 HVM guest by switching the clocksource to 'acpi_pm' and then
running gettimeofday() in a loop. This is due to the accumulation
of small inaccuracies that are caused by shifting out the lower 32
bits when pmt_update_time() computes 'tmr_val'.
The patch makes sure that the lower 32 bits of the computed value
are not lost. They are saved in a new field 'not_accounted' in the
PMTState structure and are accounted the next time pmt_update_time()
is called.
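A minimal model of the fix, assuming `scale` is a 32.32 fixed-point factor converting elapsed time into timer ticks. `struct pmt` and both update functions are hypothetical simplifications of pmt_update_time(); only the `tmr_val` and `not_accounted` fields are taken from the description above.

```c
#include <stdint.h>

struct pmt {
    uint32_t tmr_val;        /* visible timer value */
    uint32_t not_accounted;  /* fractional ticks carried to the next update */
};

/* Old behaviour: the low 32 bits (fractional ticks) are shifted out. */
static void pmt_update_lossy(struct pmt *s, uint64_t delta, uint64_t scale)
{
    s->tmr_val += (uint32_t)((delta * scale) >> 32);
}

/*
 * Fixed behaviour: keep the low 32 bits and add them in next time.
 * Assumes delta * scale + not_accounted fits in 64 bits.
 */
static void pmt_update_exact(struct pmt *s, uint64_t delta, uint64_t scale)
{
    uint64_t tmp = delta * scale + s->not_accounted;

    s->not_accounted = (uint32_t)tmp;
    s->tmr_val += (uint32_t)(tmp >> 32);
}
```

With a scale of 0.5 ticks per time unit (0x80000000) and ten updates of one unit each, the lossy version never advances, while the exact version reaches 5 with nothing left unaccounted.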
C6 state with EOI issue fix for some Intel processors
There is an erratum in some Intel processors:
AAJ72. EOI Transaction May Not be Sent if Software Enters Core C6
During an Interrupt Service Routine
If core C6 is entered after the start of an interrupt service routine
but before a write to the APIC EOI register, the core may not send an
EOI transaction (if needed) and further interrupts from the same
priority level or lower may be blocked.
This patch fixes the issue by checking whether an ISR is pending before
entering a deep Cx state. If one is, power->safe_state is used instead
of the deep Cx state to prevent the problem.
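A minimal sketch of the check, assuming the local APIC's 256-bit In-Service Register has already been read into eight 32-bit words; `isr_pending()` and `choose_cstate()` are hypothetical names, C-states are plain integers, and `requested` is assumed to be a deep Cx state.

```c
#include <stdbool.h>
#include <stdint.h>

/* The local APIC ISR is 256 bits wide, exposed as eight 32-bit registers. */
#define APIC_ISR_WORDS 8

/* True if any interrupt is in service, i.e. its EOI has not been written. */
static bool isr_pending(const uint32_t isr[APIC_ISR_WORDS])
{
    for (int i = 0; i < APIC_ISR_WORDS; i++)
        if (isr[i])
            return true;
    return false;
}

/* Fall back to the safe state so core C6 is never entered mid-ISR. */
static int choose_cstate(int requested, int safe_state,
                         const uint32_t isr[APIC_ISR_WORDS])
{
    return isr_pending(isr) ? safe_state : requested;
}
```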
tmem (tools): move to new ABI version to handle long object-ids
After a great deal of discussion and review with Linux
kernel developers, it appears there are "next-generation"
filesystems (such as btrfs, xfs, Lustre) that will not
be able to use tmem due to an ABI limitation: a field
that represents a unique file identifier is 64 bits in
the tmem ABI and may need to be as large as 192 bits.
So to support these guest filesystems, the tmem ABI must be
revised, from "v0" to "v1".
I *think* it is still the case that tmem is experimental
and is not used anywhere yet in production.
The tmem ABI is designed to support multiple revisions,
so the Xen tmem implementation could be updated to
handle both v0 and v1. However this is a bit
messy and would require data structures for both v0
and v1 to appear in public Xen header files.
I am inclined to update the Xen tmem implementation
to only support v1 and gracefully fail v0. This would
result in only a performance loss (as if tmem were
disabled) for newly launched tmem-v0-enabled guests,
but live-migration between old tmem-v0 Xen and new
tmem-v1 Xen machines would fail, and saved tmem-v0
guests will not be able to be restored on a tmem-v1
Xen machine. I would plan to update both pre-4.0.2
and unstable (future 4.1) to only support v1.
I believe these restrictions are reasonable at this
point in the tmem lifecycle, though they may not
be reasonable in the near future; should the tmem
ABI need to be revised from v1 to v2, I understand
backwards compatibility will be required.
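To make the size change concrete, a hypothetical comparison of the two object-id layouts (the type and function names are illustrative, not copied from Xen's public headers): v0 carries a single 64-bit identifier, while v1 needs room for 192 bits, i.e. three 64-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: a v0-style object id is a single 64-bit value. */
typedef uint64_t oid_v0_t;

/* Illustrative only: a v1-style object id holds up to 192 bits. */
typedef struct {
    uint64_t oid[3];
} oid_v1_t;

/* Widen a v0 id into the v1 layout, leaving the upper words zero. */
static oid_v1_t oid_from_v0(oid_v0_t v0)
{
    oid_v1_t v1 = { { v0, 0, 0 } };
    return v1;
}

/* All three words take part in the comparison. */
static bool oid_equal(const oid_v1_t *a, const oid_v1_t *b)
{
    return a->oid[0] == b->oid[0] && a->oid[1] == b->oid[1] &&
           a->oid[2] == b->oid[2];
}
```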
tmem (hv): move to new ABI version to handle long object-ids
After a great deal of discussion and review with Linux
kernel developers, it appears there are "next-generation"
filesystems (such as btrfs, xfs, Lustre) that will not
be able to use tmem due to an ABI limitation: a field
that represents a unique file identifier is 64 bits in
the tmem ABI and may need to be as large as 192 bits.
So to support these guest filesystems, the tmem ABI must be
revised, from "v0" to "v1".
I *think* it is still the case that tmem is experimental
and is not used anywhere yet in production.
The tmem ABI is designed to support multiple revisions,
so the Xen tmem implementation could be updated to
handle both v0 and v1. However this is a bit
messy and would require data structures for both v0
and v1 to appear in public Xen header files.
I am inclined to update the Xen tmem implementation
to only support v1 and gracefully fail v0. This would
result in only a performance loss (as if tmem were
disabled) for newly launched tmem-v0-enabled guests,
but live-migration between old tmem-v0 Xen and new
tmem-v1 Xen machines would fail, and saved tmem-v0
guests will not be able to be restored on a tmem-v1
Xen machine. I would plan to update both pre-4.0.2
and unstable (future 4.1) to only support v1.
I believe these restrictions are reasonable at this
point in the tmem lifecycle, though they may not
be reasonable in the near future; should the tmem
ABI need to be revised from v1 to v2, I understand
backwards compatibility will be required.
Keir Fraser [Mon, 30 Aug 2010 07:59:46 +0000 (08:59 +0100)]
ept: Put locks around ept_get_entry
There's a subtle race in ept_get_entry, such that if it tries to read
an entry that ept_set_entry is modifying, it gets neither the old entry
nor the new entry, but an empty one. In the case of multi-cpu
populate-on-demand guests, this manifests as a guest crash when one
vcpu tries to read a page which another vcpu is trying to populate,
and ept_get_entry returns p2m_mmio_dm.
This bug can also be fixed by making both ept_set_entry and
ept_next_level access-once (i.e., ept_next_level reads the full
ept_entry and then works with the local value; ept_set_entry constructs
the entry locally and then sets it in one write). But there don't seem
to be any major performance implications of just making ept_get_entry
use locks; so the simpler, the better.
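The access-once alternative (the approach later taken by "ept: Remove lock in ept_get_entry") can be sketched with C11 atomics. This is a minimal model, not Xen's code: the entry is a plain 64-bit word and the bit assignments (`ENTRY_VALID`, `ENTRY_POD`, `ENTRY_MFN_SHIFT`) are hypothetical. The reader loads the entry exactly once and decides everything from that snapshot; the writer builds the new entry locally and publishes it with a single store.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout for a 64-bit p2m entry. */
#define ENTRY_VALID      (UINT64_C(1) << 0)
#define ENTRY_POD        (UINT64_C(1) << 1)
#define ENTRY_MFN_SHIFT  12

enum entry_kind { KIND_INVALID, KIND_POD, KIND_RAM };

/* Writer: construct the entry locally, then publish it in one store. */
static void set_entry(_Atomic uint64_t *slot, uint64_t mfn, int pod)
{
    uint64_t e = pod ? ENTRY_POD : (ENTRY_VALID | (mfn << ENTRY_MFN_SHIFT));

    atomic_store_explicit(slot, e, memory_order_release);
}

/*
 * Reader: load once and work only on the local copy, so "is it valid?"
 * and "is it PoD?" are answered for the same snapshot and can never
 * straddle a concurrent update.
 */
static enum entry_kind get_entry(_Atomic uint64_t *slot, uint64_t *mfn)
{
    uint64_t e = atomic_load_explicit(slot, memory_order_acquire);

    if (e & ENTRY_VALID) {
        *mfn = e >> ENTRY_MFN_SHIFT;
        return KIND_RAM;
    }
    if (e & ENTRY_POD)
        return KIND_POD;
    return KIND_INVALID;
}
```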
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 22071:c5aed2e049bc
xen-unstable date: Mon Aug 30 08:39:52 2010 +0100
Keir Fraser [Mon, 30 Aug 2010 07:50:52 +0000 (08:50 +0100)]
x2APIC: Improve x2APIC suspend/resume
x2APIC depends on interrupt remapping, so interrupt remapping should be
disabled only after x2APIC has been disabled. This patch also wraps
__enable_x2apic to get rid of duplicated code.
Signed-off-by: Weidong Han <weidong.han@intel.com>
xen-unstable changeset: 3cee41690fa2
xen-unstable date: Fri Aug 13 14:58:06 2010 +0100