Jan Beulich [Sat, 17 Sep 2011 15:38:31 +0000 (16:38 +0100)]
x86/vmx: don't call __vmxoff() blindly
If vmx_vcpu_up() failed, __vmxon() would generally not have got
(successfully) executed, and in that case __vmxoff() will #UD.
Additionally, any panic() during early resume (namely the tboot
related one) would cause vmx_cpu_down() to get executed without
vmx_cpu_up() having run before.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23848:cf37d2eec2ef
xen-unstable date: Sat Sep 17 16:26:37 2011 +0100
George Dunlap [Sat, 17 Sep 2011 15:37:56 +0000 (16:37 +0100)]
xen: Move tsc reliability check until after CPUs have booted
AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a
later check to disable this feature if TSC drift is detected.
Unfortunately, this check is done in time.c:init_xen_time(), which is
done before any secondary CPUs are brought up, and is thus guaranteed
to succed.
This patch moves the check into its own function, and calls it after
cpus are brought up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 23846:bf2aaf21e8e7
xen-unstable date: Sat Sep 17 16:22:54 2011 +0100
Latest Intel processor add cpuid faulting feature. This patch is used
to support cpuid faulting in Xen. Like cpuid spoofing, cpuid faulting
mainly used to support live migration. When cpuid faulting enabled,
cpuid instruction runs at cpl>0 will produce GP, vmm then emulate
execution of the cpuid instruction. Hence will appear to guest
software the value chosen by the vmm.
Andrew Cooper [Tue, 13 Sep 2011 09:38:34 +0000 (10:38 +0100)]
IRQ: IO-APIC support End Of Interrupt for older IO-APICs
The old io_apic_eoi() function using the EOI register only works for
IO-APICs with a version of 0x20. Older IO-APICs do not have an EOI
register so line level interrupts have to be EOI'd by flipping the
mode to edge and back, which clears the IRR and Delivery Status bits.
This patch replaces the current io_apic_eoi() function with one which
takes into account the version of the IO-APIC and EOI's
appropriately.
v2: make recursive call to __io_apic_eoi() to reduce code size.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23833:ffe8e65f6687
xen-unstable date: Tue Sep 13 10:33:10 2011 +0100
xen: if mapping GSIs we run out of pirq < nr_irqs_gsi, use the others
PV on HVM guests can have more GSIs than the host, in that case we
could run out of pirq < nr_irqs_gsi. When that happens use pirq >=
nr_irqs_gsi rather than returning an error.
xen: __hvm_pci_intx_assert should check for gsis remapped onto pirqs
If the isa irq corresponding to a particular gsi is disabled while the
gsi is enabled, __hvm_pci_intx_assert will always inject the gsi
through the violapic, even if the gsi has been remapped onto a pirq.
This patch makes sure that even in this case we inject the
notification appropriately.
hvm_domain_use_pirq should return true when the guest is using a
certain pirq, no matter if the corresponding event channel is
currently enabled or disabled. As an additional complication, qemu is
going to request pirqs for passthrough devices even for Xen unaware
HVM guests, so we need to wait for an event channel to be connected
before considering the pirq of a passthrough device as "in use".
Andrew Cooper [Wed, 31 Aug 2011 14:31:22 +0000 (15:31 +0100)]
IRQ: manually EOI migrating line interrupts
When migrating IO-APIC line level interrupts between PCPUs, the
migration code rewrites the IO-APIC entry to point to the new
CPU/Vector before EOI'ing it.
The EOI process says that EOI'ing the Local APIC will cause a
broadcast with the vector number, which the IO-APIC must listen to to
clear the IRR and Status bits.
In the case of migrating, the IO-APIC has already been
reprogrammed so the EOI broadcast with the old vector fails to match
the new vector, leaving the IO-APIC with an outstanding vector,
preventing any more use of that line interrupt. This causes a lockup
especially when your root device is using PCI INTA (megaraid_sas
driver *ehem*)
However, the problem is mostly hidden because send_cleanup_vector()
causes a cleanup of all moving vectors on the current PCPU in such a
way which does not cause the problem, and if the problem has occured,
the writes it makes to the IO-APIC clears the IRR and Status bits
which unlocks the problem.
This fix is distinctly a temporary hack, waiting on a cleanup of the
irq code. It checks for the edge case where we have moved the irq,
and manually EOI's the old vector with the IO-APIC which correctly
clears the IRR and Status bits. Also, it protects the code which
updates irq_cfg by disabling interrupts.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23805:7048810180de
xen-unstable date: Wed Aug 31 15:19:24 2011 +0100
libxl: Do not SEGV when no 'removable' disk parameter in xenstore
Just assume disk as not removable when no 'removable' paremeter
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changest: 23607:2f63562df1c4 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Previously PyString_FromString(NULL) was called, which caused assertion
failure.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changest: 23606:cc2f376d0cd9 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: Allocate memory for strings in libxl_device_disk
Memory for strings in libxl_device_disk must be allocated from outside of
libxl__gc to not be freed at the end of function (by libxl__free_all).
Fixes xl block-detach
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changest: 23603:6656d80b4de4 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: Accept disk name in libxl_devid_to_device_disk
Accept disk name in xl block-detach.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changest: 23604:5d7998be2252 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: Remove frontend and backend devices from xenstore after destroy
Cleanup frontend and backend devices from xenstore for all dev types - not only
disks. Because backend cleanup moved to libxl__device_destroy,
libxl__devices_destroy is somehow simpler.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changest: 23605:ff8d170852b3 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
remus: handle exceptions while installing/unstalling net buffer
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23600:15fc211a13bf Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
At the end of a checkpoint, when a new flush (of buffered disk writes)
is merged with ongoing flush, we have to make sure that none of the new
disk I/O requests overlap with ones in in progress. If it does, hold the
request and dont issue I/O until the overlapping one finishes. If we allow
the I/O to proceed, we might end up with two overlapping requests in the
disk's queue and the disk may not offer any guarantee on which one is
written first.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23414:ecff559bf474 Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
DRBD disk backends can be used instead of tapdisk backends for Remus.
This requires a Remus style disk replication protocol (asynchronous
replication with output buffering at backup), that is not available in
standard DRBD code. A modified version that supports this new replication
protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus
Use of DRBD disk backends provides a means for efficient
resynchronization of data after the crashed machine comes back
online. Since DRBD allows for online resynchronization, a DRBD backed
Remus VM does not have to be stopped or shutdown while the disks are
resynchronizing. Once resynchronization is complete, Remus can be
started at will.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23413:62c0dfc9efbf Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
This was introduced in 23195:13ec53a59a42
It is a problem for Python 2.4 and earlier, only.
So use try...(try...except)...finally as suggested by Ian Campbell.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23379:b04e57ec4671 Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
The new --null option allows one to test and play with just the
memory checkpointing and network buffering aspect of remus, without
the need for a second host. The disk is not replicated. All replication
data is sent to /dev/null. This option is pretty handy when a user
wants to see the page churn for his workload or observe the latency hit
though the latter will not be accurate.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23196:29d81623dc14 Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
current check includes ingress and pfifo_fast.
Add mq to the list of allowed qdiscs already installed
on ifb. This patch fixes cases where remus fails to start,
due to an mq qdisc already present on the vif.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23109:c8ae80a11d47
Backport-requested: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 30 Aug 2011 15:57:05 +0000 (16:57 +0100)]
xl: print sxp on dry-run of create.
The help text for xm create's --dry-run says "Dry run - prints the
resulting configuration in SXP but does not create the domain." so
update xl implementation to match. At least the xendomains initscript
relies on this (for better or worse).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Carsten Schiers <carsten@schiers.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23467:2ae357405850
Backport-requested: Carsten Schiers <carsten@schiers.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Fabio Fantoni [Tue, 30 Aug 2011 15:56:22 +0000 (16:56 +0100)]
tools: Improved LSB headers in init.d scripts
xendomains service now working also without xend service
Signed-off-by: Fabio Fantoni <fabio.fantoni@heliman.it> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23673:0648846b4d17
Backport-requested: Carsten Schiers <carsten@schiers.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
/etc/init.d/xendomains relies on simple pattern matching from sructures
being printed by "xl list -l" command. so update xl implementation to
match.
Signed-off-by: Carsten Schiers <carsten@schiers.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23567:c2995f0555af Backported-by: Carsten Schiers <carsten@schiers.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
David Vrabel [Mon, 22 Aug 2011 09:16:15 +0000 (10:16 +0100)]
x86: use 'dom0_mem' to limit the number of pages for dom0
Use the 'dom0_mem' command line option to set the maximum number of
pages for dom0. dom0 can use then use the XENMEM_maximum_reservation
memory op to automatically find this limit and reduce the size of any
page tables etc.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
xen-unstable changeset: 23779:c56dd5eb0fa2
xen-unstable date: Mon Aug 22 10:05:27 2011 +0100
Kevin Tian [Mon, 22 Aug 2011 09:14:14 +0000 (10:14 +0100)]
cpuidle: initialize default Cstate information
C0/C1 should be always available when cpuidle is enabled in Xen.
When there's case that Dom0 doesn't register ACPI Cstate information,
e.g. due to BIOS issue or acpi processor module is not installed,
this patch provides basic C0/C1 information available to xenpm tool.
Andrew Cooper [Fri, 19 Aug 2011 09:00:25 +0000 (10:00 +0100)]
x86/KEXEC: disable hpet legacy broadcasts earlier
On x2apic machines which booted in xapic mode,
hpet_disable_legacy_broadcast() sends an event check IPI to all online
processors. This leads to a protection fault as the genapic blindly
pokes x2apic MSRs while the local apic is in xapic mode.
One option is to change genapic when we shut down the local apic, but
there are still problems with trying to IPI processors in the online
processor map which are actually sitting in NMI loops
Another option is to have each CPU take itself out of the online CPU
map during the NMI shootdown.
Realistically however, disabling hpet legacy broadcasts earlier in the
kexec path is the easiest fix to the problem.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23776:0ddb4481f883
xen-unstable date: Fri Aug 19 09:58:22 2011 +0100
Jan Beulich [Tue, 16 Aug 2011 14:21:46 +0000 (15:21 +0100)]
x86/PCI-MSI: properly determine VF BAR values
As was discussed a couple of times on this list, SR-IOV virtual
functions have their BARs read as zero - the physical function's
SR-IOV capability structure must be consulted instead. The bogus
warnings people complained about are being eliminated with this
change.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23766:8d6edc3d26d2
xen-unstable date: Sat Aug 13 10:14:58 2011 +0100
PCI: consolidate interface for adding devices
The functionality of pci_add_device_ext() can be easily folded into
pci_add_device(), and eliminates the need to change two functions for
future adjustments.
Andrew Cooper [Tue, 16 Aug 2011 14:17:43 +0000 (15:17 +0100)]
x86: IRQ fix incorrect logic in __clear_irq_vector
In the old code, tmp_mask is the cpu_and of cfg->cpu_mask and
cpu_online_map. However, in the usual case of moving an IRQ from one
PCPU to another because the scheduler decides its a good idea,
cfg->cpu_mask and cfg->old_cpu_mask do not intersect. This causes the
old cpu vector_irq table to keep the irq reference when it shouldn't.
This leads to a resource leak if a domain is shut down wile an irq has
a move pending, which results in Xen's create_irq() eventually failing
with -ENOSPC when all vector_irq tables are full of stale references.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23765:68b903bb1b01
xen-unstable date: Sat Aug 13 10:14:28 2011 +0100
Jan Beulich [Tue, 16 Aug 2011 14:17:06 +0000 (15:17 +0100)]
VT-d: don't reject valid DMAR/ATSR tables on systems with multiple PCI segments
On multi-PCI-segment systems, each segment has to be expected to have
an include-all DRHD and an all-ports ATSR, so the firmware consistency
check incorrectly rejects valid configurations there (which is
particularly problematic when the firmware also pre-enabled x2apic
mode, as the system will panic in that case due to being unable to
enable interrupt remapping). Thus constrain the check to just segment
0 for now; once full multi-segment support is there (which I'm working
on), it can be revisited whether we'd want to track this per segment,
or whether we trust the firmware of such large systems.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23763:8f647d409196
xen-unstable date: Sat Aug 13 10:12:49 2011 +0100
Tim Deegan [Mon, 25 Jul 2011 15:48:39 +0000 (16:48 +0100)]
VT-d: always clean up dpci timers.
If a VM has all its PCI devices deassigned, need_iommu(d) becomes
false but it might still have DPCI EOI timers that were init_timer()d
but not yet kill_timer()d. That causes xen to crash later because the
linked list of inactive timers gets corrupted, e.g.:
tools: xencommons NetBSD init script: Multiple bugfixes and improvements
Added a cleanup of the xenstore database, to purge old entries,
prevented the restart of xenstore and set Domain-0 name. Also
replaced the sleep 5 (wait for xenstore to come up) with the method
used in the linux init script.
Signed-off-by: Roger Pau Monne <roger.pau@entel.upc.edu> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Christoph Egger <Christoph.Egger@amd.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23699:6fe9f26bb9ae
xen-unstable date: Fri Jul 15 18:22:03 2011 +0100
xend: NetBSD portability fix for LVM raw volume names
Xen 4.1.1 was incorrectly passing /dev/rmapper/vg-lvname to pygrub
(notice the r in front of mapper), when it should pass
/dev/mapper/rvg-lvname (add the r to the last file) when using NetBSD.
I've patched it to work correctly. I'm attaching a unified diff with
the patch made against Xen 4.1.1 (it's a really simple modification).
From: Roger Pau Monne <roger.pau@entel.upc.edu> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23629:89ce3439686b
xen-unstable date: Tue Jun 28 13:56:53 2011 +0100
Mike McClurg [Thu, 21 Jul 2011 13:40:39 +0000 (14:40 +0100)]
tools/ocaml: ask compiler for correct library
OCaml libraries will live in /usr/local/ if the user compiles OCaml
from source. This patch asks the OCaml compiler where we should look
for libraries.
NB: it may be that we should do the same thing for the NetBSD case,
but I don't have a BSD box to test this out.
Signed-off-by: Mike McClurg <mike.mcclurg@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23566:7e5b54d1643e
xen-unstable date: Tue Jun 21 18:01:51 2011 +0100
hvmloader: Switch to absolute addressing for calling hypercall stubs.
This is clearer and less fragile than trying to make relative calls
work. In particular, the old approach failed if _start was not
== HVMLOADER_PHYSICAL_ADDRESS. This was the case for some modern
toolchains which reorder functions.
Jan Beulich [Sat, 16 Jul 2011 08:33:46 +0000 (09:33 +0100)]
x86: fix guest migration after c/s 20892:d311d1efc25e
Guests would not manage to run successfully after being migrated to a
host having sufficiently much more memory than the host they were
originally started on.
Subsequently the plan is to re-enable the changes behavior under the
control of a guest kernel announced feature flag.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 23706:3dd399873c9e
xen-unstable date: Sat Jul 16 09:18:21 2011 +0100
David Vrabel [Sat, 16 Jul 2011 08:33:07 +0000 (09:33 +0100)]
xen/libxc: set CPUID topology leaf as unsupported for PV guests
The result of a CPUID Extended Topology Enumeration leaf for PV guests
is invalid as the level in ECX is ignored. This can cause some guests
to loop endlessly when trying to enumerate the topology.
Since the physical topology isn't useful to PV guests set the topology
leaf as unsupported.
Guests affected include Linux kernels prior 2.6.32 where a workaround
was applied ("xen: mask extended topology info in cpu", 82d6469916c6fcfa345636a49004c9d1753905d1).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
xen-unstable changeset: 23700:867bb675b57b
xen-unstable date: Sat Jul 16 09:05:45 2011 +0100
23408:1fc3347850c7 causes the following error:
machine_kexec.c:106: error: static declaration of
'machine_kexec_get_xen' follows non-static declaration
/xen-unstable.hg/xen/include/xen/kexec.h:39: error: previous
declaration of 'machine_kexec_get_xen' was here
Andrew Cooper [Fri, 8 Jul 2011 07:57:11 +0000 (08:57 +0100)]
KEXEC: disconnect all PCI devices from the PCI bus on crash
In the case of a crash, IOMMU DMA remapping gets turned off so that
the kdump kernel may boot. However, this is warned as being dangerous
in the VTD specification if a DMA transaction is in progress.
Also, in the case of a crash, DMA transactions and interrupts from
peripheral devices such as network cards are likely to keep coming in.
Without DMA remapping enabled, the transactions will be writing over
low memory, corrupting the crash state, and perhaps even the kdump
reserved memory.
Therefore, on the crash path, we can disconnect all PCI devices from
their respective buses so that they are no longer able to be DMA
busmasters. This reduces the risk of DMA transactions corrupting
state (and will also reduce spurious interrupts arriving to the kdump
kernel) until the kdump kernel and properly reset the PCI devices.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23666:b96f8bdcaa15
xen-unstable date: Fri Jul 08 08:38:35 2011 +0100
Paul Durrant [Fri, 8 Jul 2011 07:56:42 +0000 (08:56 +0100)]
x86/hvm: Don't expose CPUID time leaf when not using PVRDTSCP
Some versions of Oracle's Solaris PV drivers make a check that the
maximal Xen hypervisor CPUID leaf is <= base leaf + 2 and refuse to
work if this is not the case. The addition of the time leaf makes the
maximal leaf == base leaf + 3 so this patch introduces a workaround
that obscures the time leaf unless PVRDTSCP is in operation.
x86 cpu: Fix bug: unify cpu_dev attr as __cpuinitdata
Currently different x86 cpu define different attr for cpu_dev.
Some cpu define as __initdata, this would be risk under cpu hotplug.
This patch fix the bug, unify them as __cpuinitdata, as what AMD cpu
define now.
Tim Deegan [Tue, 28 Jun 2011 08:32:00 +0000 (09:32 +0100)]
x86: fix boot-time watchdog test.
Since the perf counter that the LAPIC NMI watchdog uses only
runs while the core isn't halted, and all APs are idle at
this point in the boot process, it's possible that remote
CPUs won't see any NMIs during the 10-tick waiting period.
Force all CPUs to busy-wait so we know the timers are running.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 23612:6c7a23e08a04
xen-unstable date: Tue Jun 28 09:16:13 2011 +0100
pv-on-hvm: hvm_domain_use_pirq return positive no matter if the evtchn is bound
This patch fixes PV on HVM interrupt remapping with recent Linux
kernels and upstream qemu. hvm_domain_use_pirq should return positive
even if the evtchn is not currently bound. If it doesn't assert_irq
ends up injecting legacy interrupts even after the guest disabled the
irq.
Keir Fraser [Thu, 23 Jun 2011 10:54:53 +0000 (11:54 +0100)]
x86/hvm: add SMEP support to HVM guest
Intel new CPU supports SMEP (Supervisor Mode Execution
Protection). SMEP
prevents software operating with CPL < 3 (supervisor mode) from
fetching
instructions from any linear address with a valid translation for
which the U/S
flag (bit 2) is 1 in every paging-structure entry controlling the
translation
for the linear address.
This patch adds SMEP support to HVM guest.
Signed-off-by: Yang Wei <wei.y.yang@intel.com> Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Li Xin <xin.li@intel.com> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 23504:c34604d5a293
xen-unstable date: Mon Jun 06 13:46:48 2011 +0100
Intel new CPU supports SMEP (Supervisor Mode Execution
Protection). SMEP prevents software operating with CPL < 3 (supervisor
mode) from fetching instructions from any linear address with a valid
translation for which the U/S flag (bit 2) is 1 in every
paging-structure entry controlling the translation for the linear
address.
This patch enables SMEP in Xen to protect Xen hypervisor from
executing pv guest instructions, whose translation paging-structure
entries' U/S flags are all set.
Signed-off-by: Yang Wei <wei.y.yang@intel.com> Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Li Xin <xin.li@intel.com> Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 23481:0c0884fd8b49
xen-unstable date: Fri Jun 03 21:39:00 2011 +0100
Keir Fraser [Thu, 23 Jun 2011 10:48:18 +0000 (11:48 +0100)]
kexec: Backport fixes from xen-unstable
KEXEC: prevent panic on the kexec path when talking to the DMAR
hardware
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23547:b5955b9fc26c
xen-unstable date: Thu Jun 16 16:11:13 2011 +0100
KEXEC: correctly revert x2apic state when kexecing
Introduce the boolean variable 'kexecing' which indicates to functions
whether we are on the kexec path or not. This is used by
disable_local_APIC() to try and revert the APIC mode back to how it
was found on boot.
We also need some fudging of the x2apic_enabled variable. It is used
in multiple places over the codebase to mean multiple things,
including:
What did the user specifify on the command line?
Did the BIOS boot me in x2apic mode?
Is the BSP Local APIC in x2apic mode?
What mode is my Local APIC in?
Therefore, set it up to prevent a protection fault when disabling the
IOAPICs. (In this case, it is used in the "What mode is my Local APIC
in?" case, so the processor doesnt suffer a protection fault because
of trying to use x2apic MSRs when it should be using xapic MMIO)
Finally, make sure that interrupts are disabled when jumping into the
purgatory code. It would be bad to service interrupts in the Xen
context when the next kernel is booting.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23542:23c068b10923
xen-unstable date: Wed Jun 15 16:16:41 2011 +0100
IOMMU: add crash_shutdown iommu_op
The kdump kernel has problems booting with interrupt/dma
remapping enabled, so we need a new iommu_ops called
crash_shutdown which is basically suspend but doesn't
need to bother saving state.
Make sure that crash_shutdown is called on the kexec
path. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23541:c6307ddd3ab1
xen-unstable date: Wed Jun 15 16:10:11 2011 +0100
Experimental evidence shows that Extended Interrupt Mode remains in
effect even after Interrupt Remapping is disabled in each DMAR Global
Command Register. A consiquence of this is that when we switch from
x2apic mode back to xapic mode, and disable interrupt remapping for
the kdump kernel, interrupts passing through the IO APICs are in
x2apic format as opposed xapic. This causes a triple fault in the
kexec kernel.
As EIM is explicitly set up each time Interrup Remapping is enabled,
it is safe for us to clobber this when taring down.
Also, change the header definition of IRTA_REG_EIME_SHIFT. It caused
verbose and error-prone code, and was only used in 1 place before. We
now have IRTA_EIME which is the specific bit in the register.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23540:96f53d2b966e
xen-unstable date: Wed Jun 15 16:07:45 2011 +0100
Experimental evidence shows that Extended Interrupt Mode remains in
effect even after Interrupt Remapping is disabled in each DMAR Global
Command Register. A consiquence of this is that when we switch from
x2apic mode back to xapic mode, and disable interrupt remapping for
the kdump kernel, interrupts passing through the IO APICs are in
x2apic format as opposed xapic. This causes a triple fault in the
kexec kernel.
As EIM is explicitly set up each time Interrup Remapping is enabled,
it is safe for us to clobber this when taring down.
Also, change the header definition of IRTA_REG_EIME_SHIFT. It caused
verbose and error-prone code, and was only used in 1 place before. We
now have IRTA_EIME which is the specific bit in the register.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23515:337520d94cba
xen-unstable date: Tue Jun 14 13:04:09 2011 +0100
x86/apic: record local APIC state on boot
Xen does not store the boot local APIC state which leads to problems
when shutting down for a kexec jump. This patch records the boot
state so we can return to the boot state when kexecing.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Keir Fraser <keir@xen.org> Acked-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23514:d04608ad70f8
xen-unstable date: Tue Jun 14 13:02:00 2011 +0100
x86/kexec: nmi_shootdown_cpus() should leave irqs disabled
Jan Beulich [Wed, 15 Jun 2011 19:45:54 +0000 (20:45 +0100)]
x86-64: fix incorrect assertion in __maddr_to_virt()
When memory map sparseness reduction is in use, machine address ranges
can't validly be compared directly against the total size of the
direct mapping range.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23543:a8edfacd4b5e
xen-unstable date: Wed Jun 15 20:24:09 2011 +0100
George Dunlap [Wed, 15 Jun 2011 19:45:20 +0000 (20:45 +0100)]
x86/hvm: Crash domain rather than guest on unexpected PIO IO state
Under certain conditions, if an IO gets into an unexpected state,
hvmemul_do_io can return X86EMUL_UNHANDLEABLE. Unfortunately,
handle_pio() does not expect this state, and calls BUG() if it sees
it, crashing the host.
Other HVM io-related code crashes the guest in this case. This patch
makes handle_pio() do the same.
The crash was seen when executing crash_guest in dom0 to forcibly
crash the guest.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 23538:35b4220c98bc
xen-unstable date: Wed Jun 15 16:05:14 2011 +0100
Andrew Cooper [Wed, 15 Jun 2011 19:44:44 +0000 (20:44 +0100)]
x86/apic: fix potential Protection Fault during shutdown
This is a rare case, but if the BIOS is set to uniprocessor, and Xen
is booted with 'lapic x2apic', Xen will switch into x2apic mode, which
will cause a protection fault when disabling the local APIC. This
leads to a general protection fault as this code is also in the fault
handler.
When x2apic mode is enabled, the only tranlsation which does
not result in a protection fault is to clear both the EN and EXTD
bits, which is safe to do in all cases, even if you are in xapic
mode rather than x2apic mode.
The linux code from which this is derrived is protected by an
if ( ! x2apic_mode ...) clause which is how they get away with it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23512:0feb98534a87
xen-unstable date: Tue Jun 14 12:47:45 2011 +0100
mem_event: Revert pointless, unrelated, and broken (on i386) change in 23434:ef410f262299
vcpu_pause() is nestable in the hypervisor, hence checking for
already-paused is not required.
Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 23435:c15f06b99bbe
xen-unstable date: Sat May 28 08:33:54 2011 +0100
mem_event: Allow memory access listener to perform single step execution.
Add a new memory event that handles single step. This allows the
memory access listener to handle instructions that modify data within
the execution page. This can be enabled in the listener by doing:
xc_set_hvm_param(xch, domain_id, HVM_PARAM_MEMORY_EVENT_SINGLE_STEP,
HVMPME_mode_sync)
Now the listener can start single stepping by:
xc_domain_debug_control(xch, domain_id,
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON, vcpu_id)
And stop single stepping by: xc_domain_debug_control(xch, domain_id,
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF, vcpu_id)
Signed-off-by: Aravindh Puthiyaparambil <aravindh@virtuata.com> Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 23434:ef410f262299
xen-unstable date: Fri May 27 18:44:26 2011 +0100
Markus Gross [Sat, 28 May 2011 08:09:40 +0000 (09:09 +0100)]
libxc: obtain correct length of p2m during core dumping
while implementing core dumping functionality for the libxl driver
of libvirt, I discovered an issue with mapping pages of a pv guest.
After dumping the core of a pv guest the domain was not cleared up
properly and some pages were not unmapped. This issue is similar
to the one reported here:
http://lists.xensource.com/archives/html/xen-devel/2011-05/msg01314.html
In xc_domain_dumpcore_via_callback in the file xc_core.c the function
xc_core_arch_map_p2m is called to map P2M_FL_ENTRIES pages to the
variable p2m.
But to unmap the pages later, the dinfo->p2m_size has to be set
accordingly.
This was not done, instead a variable named p2m_size was set.
This way P2M_FL_ENTRIES was always zero and the pages were left
mapped.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23374:8bd7b5e98f2a
xen-unstable date: Tue May 24 15:00:16 2011 +0100
Jim Fehlig [Sat, 28 May 2011 08:08:21 +0000 (09:08 +0100)]
libxc: after saving, unmap correct amount for live_m2p
With some help from Olaf, I've finally got to the bottom of an issue I
came across while trying to implement save/restore in the libvirt
libxenlight driver. After issuing the save operation, the saved
domain was not being cleaned up properly and left in this state from
xl's perspective
xen33:# xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 6821 8 r----- 122.5
(null) 2 2 2 --pssd 10.8
Checking the libvirtd /proc/$pid/maps I found this
So not all all pages belonging to the domain were unmapped from
libvirtd. In tools/libxc/xc_domain_save.c we found that
P2M_FL_ENTRIES were being mapped but only P2M_FLL_ENTRIES were being
unmapped. The attached patch changes the unmapping to use the same
P2M_FL_ENTRIES macro. I'm not too familiar with this code though so
posting here for review.
I suspect this was not noticed before since most (all?) processes
doing save terminate after the save and are not long-running like
libvirtd.
Ian Campbell writes:
> Looks like I introduced this in 18558:ccf0205255e1, sorry!
>
> I guess it is also wrong in the error path out of map_and_save_p2m_table
> and so we also need [another hunk].
This change should be backported to relevant earlier trees. -iwj
From: Jim Fehlig <jfehlig@novell.com>
From: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23373:171007b4e2c4
xen-unstable date: Tue May 24 14:50:00 2011 +0100
Tim Deegan [Tue, 24 May 2011 07:19:39 +0000 (08:19 +0100)]
drivers/passthrough: fix error paths in pci_add_device*()
When a device can't be allocated to dom0 by the IOMMU, don't leave
dom0 in the "domain" field. It causes pci_remove_device()
to crash trying to remove the dev from the domain's list of devices
(and was probably the wrong thing to do anyway).
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 23371:4326bcd88b33
xen-unstable date: Mon May 23 18:35:32 2011 +0100
Keir Fraser [Tue, 24 May 2011 07:18:42 +0000 (08:18 +0100)]
Fix Config.mk's cc-option for -Wno-* options.
These disable-warning options are handled specially by GCC:
(a) they are ignored unless the compiler emits a warning; and
(b) even then they produce a warning rather than an error
To handle this, modify the test invocation of GCC to compile a
fragment of code that will always provoke a warning (integer assigned
to pointer). This works around (a) above.
Then, we grep the compiler's stdout/stderr for the option-under-test,
the presence of which would indicate an "unrecognized command-line
option" warning/error. This works around (b) above, letting us
distinguish between the "integer assigned to pointer" and
"unrecognized command-line option" warnings.