x86: re-inject emulated level pirqs in PV on HVM guests if still asserted
PV on HVM guests can lose level interrupts coming from emulated
devices if they have been remapped onto event channels. The reason is
that we are missing the code to inject a pirq again in the guest when
the guest EOIs it, if it corresponds to an emulated level interrupt
and the interrupt is still asserted.
Fix this issue and also return error when the guest tries to get the
irq_status of a non-existing pirq.
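A minimal sketch of the EOI-time check described above, not the actual Xen code. The pirq_state structure, the table size, and the choice of -EINVAL as the error value are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PIRQS 8                /* hypothetical table size */

struct pirq_state {               /* hypothetical per-pirq bookkeeping */
    bool in_use;                  /* the pirq exists for this guest */
    bool emulated_level;          /* backed by an emulated level-triggered line */
    bool line_asserted;           /* the emulated device still asserts the line */
    unsigned int injections;      /* times the guest has been re-notified */
};

struct pirq_state pirqs[NR_PIRQS];

/* Guest EOI: inject again if the emulated level line is still asserted. */
int pirq_eoi(unsigned int pirq)
{
    if (pirq >= NR_PIRQS || !pirqs[pirq].in_use)
        return -EINVAL;
    if (pirqs[pirq].emulated_level && pirqs[pirq].line_asserted)
        pirqs[pirq].injections++;
    return 0;
}

/* Status query: report an error for a pirq that does not exist. */
int pirq_irq_status(unsigned int pirq, bool *asserted)
{
    if (pirq >= NR_PIRQS || !pirqs[pirq].in_use)
        return -EINVAL;
    *asserted = pirqs[pirq].line_asserted;
    return 0;
}
```

Without the re-injection in pirq_eoi(), a line that stays high across the EOI would never produce another event for the guest.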
Mark Langsdorf [Sat, 12 Nov 2011 16:11:21 +0000 (16:11 +0000)]
x86/amd: Eliminate cache flushing when entering C3 on select AMD processors
AMD Fam15h processors have a shared cache. It does not need
to be flushed when entering C3, and doing so reduces
performance. Modify acpi_processor_power_init_bm_check to
prevent these processors from flushing when entering C3.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
xen-unstable changeset: 23511:450f1d198e1e
xen-unstable date: Tue Jun 14 12:46:29 2011 +0100 Committed-by: Keir Fraser <keir@xen.org>
Ian Jackson [Wed, 2 Nov 2011 15:02:18 +0000 (15:02 +0000)]
tools/ocaml: unify build process
Unify ocaml build process for different platforms.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ Backport had a conflict; fixed up in the obvious way. -iwj ]
xen-unstable.hg changeset: 24050:068d3d55ce6e Backport-requested-by: Christoph Egger <Christoph.Egger@amd.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 27 Oct 2011 15:24:01 +0000 (16:24 +0100)]
Return -EINVAL when trying to kick/kill a nonexistent domain watchdog
... to be more in line with the NR_DOMAIN_WATCHDOG_TIMERS check at the
top of domain_watchdog(), and also to follow the EINVAL return value of
the POSIX timer_(delete|settime) API.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Also, replace EEXIST with ENOSPC when failing to allocate a new
domain watchdog.
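The error convention above can be sketched as follows. This is a simplified model, not the real domain_watchdog(): the slot arrays and the "id 0 allocates, timeout 0 kills" calling convention are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_DOMAIN_WATCHDOG_TIMERS 2

/* Simplified stand-in for the per-domain watchdog slots. */
bool wd_in_use[NR_DOMAIN_WATCHDOG_TIMERS];
unsigned int wd_timeout[NR_DOMAIN_WATCHDOG_TIMERS];

/*
 * id == 0: allocate a watchdog and return its 1-based id.
 * id  > 0: kick (timeout > 0) or kill (timeout == 0) an existing one.
 */
long domain_watchdog_model(unsigned int id, unsigned int timeout)
{
    if (id > NR_DOMAIN_WATCHDOG_TIMERS)
        return -EINVAL;

    if (id == 0) {
        for (unsigned int i = 0; i < NR_DOMAIN_WATCHDOG_TIMERS; i++) {
            if (!wd_in_use[i]) {
                wd_in_use[i] = true;
                wd_timeout[i] = timeout;
                return (long)i + 1;
            }
        }
        return -ENOSPC;           /* no free slot left (previously EEXIST) */
    }

    id--;
    if (!wd_in_use[id])
        return -EINVAL;           /* kick/kill of a nonexistent watchdog */
    if (timeout == 0)
        wd_in_use[id] = false;
    else
        wd_timeout[id] = timeout;
    return 0;
}
```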
Boris Ostrovsky [Thu, 27 Oct 2011 15:22:53 +0000 (16:22 +0100)]
x86/AMD: Do not enable ARAT feature on AMD processors below family 0x12
Determining whether an AMD processor is affected by erratum 400 may
have some corner cases and handling these cases is somewhat
complicated.
In the interest of simplicity we won't claim ARAT support on processor
families below 0x12.
Wei Wang2 [Thu, 27 Oct 2011 15:14:36 +0000 (16:14 +0100)]
Backport per-device vector map patches to xen 4.1.3
Recently we found an issue in xen 4.1. Under heavy I/O stress such as
running bonnie++, Dom0 would lose its hard disk with lots of I/O
errors. We found that some PCI-E devices were using the same vector as
SMBus on AMD platforms, and George's patch set that enables per-device
vector maps can fix this problem.
23752 xen: Infrastructure to allow irqs to share vector maps
23753 xen: Option to allow per-device vector maps for MSI IRQs
23754 xen: AMD IOMMU: Automatically enable per-device vector maps
23786 x86: Fix up irq vector map logic
23812 xen: Add global irq_vector_map option
23899 AMD-IOMMU: remove dead variable references
xen: Add global irq_vector_map option, set if using AMD global intremap tables
As mentioned in previous changesets, AMD IOMMU interrupt
remapping tables only look at the vector, not the destination
id of an interrupt. This means that all IRQs going through
the same interrupt remapping table need to *not* share vectors.
The irq "vector map" functionality was originally introduced
after a patch which disabled global AMD IOMMUs entirely. That
patch has since been reverted, meaning that AMD intremap tables
can either be per-device or global.
This patch therefore introduces a global irq vector map option,
and enables it if we're using an AMD IOMMU with a global
interrupt remapping table.
This patch removes the "irq-perdev-vector-map" boolean
command-line option and replaces it with "irq_vector_map",
which can have one of three values: none, global, or per-device.
Setting the irq_vector_map to any value will override the
default that the AMD code sets.
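A sketch of how the three-valued option and its override rule could be modelled. The enum and function names are hypothetical and not taken from the Xen source; only the option strings (none, global, per-device) come from the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation of the irq_vector_map setting. */
enum vector_map_opt {
    VMAP_DEFAULT,                 /* nothing given on the command line */
    VMAP_NONE,
    VMAP_GLOBAL,
    VMAP_PERDEV
};

/* Parse the option string; returns 0 on success, -1 on an unknown value. */
int parse_irq_vector_map(const char *s, enum vector_map_opt *out)
{
    if (strcmp(s, "none") == 0)
        *out = VMAP_NONE;
    else if (strcmp(s, "global") == 0)
        *out = VMAP_GLOBAL;
    else if (strcmp(s, "per-device") == 0)
        *out = VMAP_PERDEV;
    else
        return -1;
    return 0;
}

/* Any explicit user setting overrides the default chosen by the AMD code. */
enum vector_map_opt effective_vector_map(enum vector_map_opt user,
                                         enum vector_map_opt amd_default)
{
    return user != VMAP_DEFAULT ? user : amd_default;
}
```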
We need to make sure that cfg->used_vector is only cleared once;
otherwise there may be a race condition that allows the same vector to
be assigned twice, defeating the whole purpose of the map.
This makes two changes:
* __clear_irq_vector() only clears the vector if the irq is not being
moved
* smp_irq_move_cleanup_interrupt() only clears used_vector if this
is the last place it's being used (move_cleanup_count==0 after
decrement).
Also make use of asserts more consistent, to catch this kind of logic
bug in the future.
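The two rules above can be illustrated with a small model. The struct layout, the 64-bit bitmap, and the function names are assumptions for the sketch; the real code works on irq_cfg and per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an irq's configuration. */
struct irq_cfg_model {
    unsigned int vector;              /* current vector (< 64 in this model) */
    bool move_in_progress;            /* a migration has not finished yet */
    unsigned int move_cleanup_count;  /* CPUs that still have to clean up */
    uint64_t *used_vectors;           /* shared vector map, one bit per vector */
};

/* Rule 1: only release the vector if the irq is not being moved. */
void clear_irq_vector_model(struct irq_cfg_model *cfg)
{
    if (cfg->move_in_progress)
        return;
    assert(*cfg->used_vectors & (UINT64_C(1) << cfg->vector));
    *cfg->used_vectors &= ~(UINT64_C(1) << cfg->vector);
}

/* Rule 2: only the last cleanup (count reaches 0) releases the old vector. */
void move_cleanup_model(struct irq_cfg_model *cfg, unsigned int old_vector)
{
    assert(cfg->move_cleanup_count > 0);
    if (--cfg->move_cleanup_count == 0)
        *cfg->used_vectors &= ~(UINT64_C(1) << old_vector);
}
```

With both rules in place each bit in the map is cleared exactly once, so a vector cannot be handed out again while an earlier user may still clear it.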
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 23786:3a05da2dc7c0
xen-unstable date: Mon Aug 22 16:15:33 2011 +0100
xen: Option to allow per-device vector maps for MSI IRQs
Add a vector-map to pci_dev, and add an option to point MSI-related
IRQs to the vector-map of the device.
This prevents irqs from the same device from being assigned
the same vector on different pcpus. This is required for systems
using an AMD IOMMU, since the intremap tables on AMD only look at
vector, and not destination ID.
xen: Infrastructure to allow irqs to share vector maps
Laying the groundwork for per-device vector maps. This generic
code allows any irq to point to a vector map; all irqs sharing the
same vector map will avoid sharing vectors.
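As a rough illustration of the idea, assuming a 64-bit bitmap per map and a hypothetical vector range: irqs that point at the same map never receive the same vector, while irqs without a map are unconstrained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_VECTOR 32           /* hypothetical allocatable range */
#define LAST_VECTOR  63

struct vector_map {               /* hypothetical: one bit per vector in use */
    uint64_t used;
};

/*
 * Pick a vector for an irq. If the irq points at a vector map, skip every
 * vector already taken by another irq sharing that map and record the choice.
 * Returns the vector, or -1 if the map is exhausted.
 */
int assign_vector(struct vector_map *map)
{
    for (int v = FIRST_VECTOR; v <= LAST_VECTOR; v++) {
        uint64_t bit = UINT64_C(1) << v;
        if (map != NULL) {
            if (map->used & bit)
                continue;
            map->used |= bit;
        }
        return v;
    }
    return -1;
}
```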
Keir Fraser [Thu, 6 Oct 2011 18:47:14 +0000 (19:47 +0100)]
build: Make XEN_ROOT an absolute path.
Otherwise make can search the path relative to certain standard paths
such as /usr/include (e.g., the line '-include $(XEN_ROOT)/.config' in
Config.mk suffers from this).
Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 23049:ff3b7749008b Backport-requested-by: Allen M Kay <allen.m.kay@intel.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
pcidevs is an array of ndev elements (ndev is the number of pci devices
assigned to a specific domain), but we access pcidevs + *num
where *num is the global number of pci devices assigned so far to all
domains in the system.
Fix the issue by removing pcidevs and just realloc'ing *list every time we
want to add a new pci device to the array.
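The "realloc on every append" approach might look like the sketch below. The pci_dev_entry type and list_append() helper are hypothetical names, not the libxl ones; the point is that the only index ever used is *num, which always matches the size of *list.

```c
#include <assert.h>
#include <stdlib.h>

struct pci_dev_entry {            /* hypothetical device record */
    unsigned int domain, bus, dev, func;
};

/*
 * Append one entry to a growing array. *list is grown by exactly one
 * element, so *num can never index past the allocation.
 * Returns 0 on success, -1 if the allocation fails (list left unchanged).
 */
int list_append(struct pci_dev_entry **list, int *num, struct pci_dev_entry e)
{
    struct pci_dev_entry *tmp =
        realloc(*list, ((size_t)*num + 1) * sizeof(*tmp));

    if (tmp == NULL)
        return -1;
    tmp[*num] = e;
    *list = tmp;
    (*num)++;
    return 0;
}
```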
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23685:5239811f92e1 Backport-requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 3 Oct 2011 15:33:29 +0000 (16:33 +0100)]
build: fix grep invocation in cc-options
Currently the build produces lots of
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
This is due to the "grep -- $(2)" in cc-options. It seems that the
default of reading stdin is disabled when using "--". I don't know if
this is a bug in grep or how it is supposed to be but we can work
around it by explicitly passing in "-"
Jan Beulich [Mon, 3 Oct 2011 15:32:06 +0000 (16:32 +0100)]
x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
This patch originally comes from the Linux mainline kernel (2.6.33);
the patch details follow:
From: Andreas Herrmann <herrmann.der.user@googlemail.com>
There is no point in warning when there is no ucode available
for a specific CPU revision. Currently the container-file, which
provides the AMD ucode patches for OS load, contains only a few
ucode patches.
It's already clearly indicated by the printed patch_level
whenever new ucode was available and an update happened. So the
warning message is of no help but rather annoying on systems
with many CPUs.
Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23871:503ee256fecf
xen-unstable date: Thu Sep 22 18:35:30 2011 +0100
Jan Beulich [Mon, 3 Oct 2011 15:31:12 +0000 (16:31 +0100)]
VT-d: fix off-by-one error in RMRR validation
(base_addr,end_addr) is an inclusive range, and hence there shouldn't
be a subtraction of 1 in the second invocation of page_is_ram_type().
For RMRRs covering a single page, that actually resulted in the
immediately preceding page getting checked (which could have resulted
in a false warning).
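A small worked example of the off-by-one, assuming 4 KiB pages and assuming the stray "- 1" was applied to the page frame number (which is what the single-page symptom suggests). The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12             /* 4 KiB pages assumed */

uint64_t paddr_to_pfn(uint64_t paddr)
{
    return paddr >> PAGE_SHIFT;
}

/* end_addr is inclusive, so its own frame is the last one to check. */
uint64_t rmrr_last_pfn(uint64_t end_addr)
{
    return paddr_to_pfn(end_addr);
}

/* The pre-fix behaviour as described: one frame too low. */
uint64_t rmrr_last_pfn_buggy(uint64_t end_addr)
{
    return paddr_to_pfn(end_addr) - 1;
}
```

For a single-page RMRR spanning 0x1000 to 0x1fff, the correct last frame is 1, while the old computation lands on frame 0, the page immediately before the region.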
Igor Mammedov [Mon, 3 Oct 2011 15:29:52 +0000 (16:29 +0100)]
Clear IRQ_GUEST in irq_desc->status when setting action to NULL.
Looking more closely at how the action field is used in relation to the
IRQ_GUEST flag, it appears that a set IRQ_GUEST implies that action
is not NULL. As a result, it is not safe to set action to NULL and
leave IRQ_GUEST set.
Hence IRQ_GUEST should be cleared in dynamic_irq_cleanup where
action is set to NULL.
In addition, remove the BUG_ON in __pirq_guest_unbind that appears to be
bogus and not needed anymore.
Thanks Paolo Bonzini for NACKing previous patch, and pointing at the
correct solution.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reinstate the BUG_ON, but after the action==NULL check. Since we then
go and start interpreting action as an irq_guest_action_t, the BUG_ON
is relevant here.
More generally, the brute-force nature of dynamic_irq_cleanup() looks
a bit worrying. Possibly there should be more integration with
pirq_guest_unbind() logic, for cleaning up un-acked EOIs and the like.
libxl: fix double free at get_all_assigned_devices
Do not free() list manually - it will be freed by libxl__free_all.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 21 Sep 2011 16:12:58 +0000 (17:12 +0100)]
libxl: do not start a xenpv qemu solely for tap devices if blktap is available
qemu is used as a fallback for DISK_BACKEND_TAP if no blktap is
available but if blktap is available, or for DISK_BACKEND_PHY, we
don't need a qemu process.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23044:d4ca456c0c25
xen-unstable date: Tue Mar 15 18:19:47 2011 +0000 Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Sat, 17 Sep 2011 15:38:31 +0000 (16:38 +0100)]
x86/vmx: don't call __vmxoff() blindly
If vmx_vcpu_up() failed, __vmxon() would generally not have got
(successfully) executed, and in that case __vmxoff() will #UD.
Additionally, any panic() during early resume (namely the tboot
related one) would cause vmx_cpu_down() to get executed without
vmx_cpu_up() having run before.
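One way to avoid the blind call is to remember whether VMXON actually succeeded. The sketch below models that with a plain flag and a call counter; the names and the single global flag are assumptions (real state would be per-CPU), and no VMX instruction is executed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; per-CPU in a real hypervisor. */
bool vmxon_done;
unsigned int vmxoff_calls;        /* counts modelled VMXOFF executions */

/* Bring VMX up; only record success if VMXON really worked. */
int vmx_cpu_up_model(bool vmxon_succeeds)
{
    if (!vmxon_succeeds)
        return -1;
    vmxon_done = true;
    return 0;
}

/* Tear VMX down; skip VMXOFF unless a matching VMXON happened. */
void vmx_cpu_down_model(void)
{
    if (!vmxon_done)
        return;                   /* VMXOFF here would raise #UD */
    vmxoff_calls++;
    vmxon_done = false;
}
```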
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23848:cf37d2eec2ef
xen-unstable date: Sat Sep 17 16:26:37 2011 +0100
George Dunlap [Sat, 17 Sep 2011 15:37:56 +0000 (16:37 +0100)]
xen: Move tsc reliability check until after CPUs have booted
AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a
later check to disable this feature if TSC drift is detected.
Unfortunately, this check is done in time.c:init_xen_time(), which is
done before any secondary CPUs are brought up, and is thus guaranteed
to succeed.
This patch moves the check into its own function, and calls it after
cpus are brought up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 23846:bf2aaf21e8e7
xen-unstable date: Sat Sep 17 16:22:54 2011 +0100
The latest Intel processors add a cpuid faulting feature. This patch
adds support for cpuid faulting in Xen. Like cpuid spoofing, cpuid faulting
is mainly used to support live migration. When cpuid faulting is enabled,
a cpuid instruction executed at cpl>0 produces a #GP fault, and the vmm then
emulates execution of the cpuid instruction. Hence guest software sees
the values chosen by the vmm.
Andrew Cooper [Tue, 13 Sep 2011 09:38:34 +0000 (10:38 +0100)]
IRQ: IO-APIC support End Of Interrupt for older IO-APICs
The old io_apic_eoi() function using the EOI register only works for
IO-APICs with a version of 0x20. Older IO-APICs do not have an EOI
register so line level interrupts have to be EOI'd by flipping the
mode to edge and back, which clears the IRR and Delivery Status bits.
This patch replaces the current io_apic_eoi() function with one which
takes into account the version of the IO-APIC and EOI's
appropriately.
v2: make recursive call to __io_apic_eoi() to reduce code size.
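The version-dependent EOI can be modelled as below. This is a toy state machine rather than register-level code, and treating "version 0x20 or later" as "has an EOI register" is an assumption; the description above only names version 0x20.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one IO-APIC pin. */
struct ioapic_model {
    unsigned int version;
    bool level;                   /* trigger mode of the pin: true = level */
    bool irr;                     /* Remote IRR: interrupt awaiting EOI */
    unsigned int eoi_reg_writes;  /* writes to the EOI register */
    unsigned int mode_flips;      /* edge/level round trips performed */
};

void io_apic_eoi_model(struct ioapic_model *a)
{
    if (a->version >= 0x20) {
        /* Newer parts: a write to the EOI register clears the IRR. */
        a->eoi_reg_writes++;
        a->irr = false;
    } else {
        /* Older parts: flip the pin to edge and back to level. */
        a->level = false;
        a->irr = false;           /* cleared as a side effect of edge mode */
        a->level = true;
        a->mode_flips++;
    }
}
```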
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23833:ffe8e65f6687
xen-unstable date: Tue Sep 13 10:33:10 2011 +0100
xen: if mapping GSIs we run out of pirq < nr_irqs_gsi, use the others
PV on HVM guests can have more GSIs than the host, in that case we
could run out of pirq < nr_irqs_gsi. When that happens use pirq >=
nr_irqs_gsi rather than returning an error.
xen: __hvm_pci_intx_assert should check for gsis remapped onto pirqs
If the isa irq corresponding to a particular gsi is disabled while the
gsi is enabled, __hvm_pci_intx_assert will always inject the gsi
through the vioapic, even if the gsi has been remapped onto a pirq.
This patch makes sure that even in this case we inject the
notification appropriately.
hvm_domain_use_pirq should return true when the guest is using a
certain pirq, no matter if the corresponding event channel is
currently enabled or disabled. As an additional complication, qemu is
going to request pirqs for passthrough devices even for Xen unaware
HVM guests, so we need to wait for an event channel to be connected
before considering the pirq of a passthrough device as "in use".
Andrew Cooper [Wed, 31 Aug 2011 14:31:22 +0000 (15:31 +0100)]
IRQ: manually EOI migrating line interrupts
When migrating IO-APIC line level interrupts between PCPUs, the
migration code rewrites the IO-APIC entry to point to the new
CPU/Vector before EOI'ing it.
The EOI process says that EOI'ing the Local APIC will cause a
broadcast with the vector number, which the IO-APIC must listen for to
clear the IRR and Status bits.
In the case of migrating, the IO-APIC has already been
reprogrammed so the EOI broadcast with the old vector fails to match
the new vector, leaving the IO-APIC with an outstanding vector,
preventing any more use of that line interrupt. This causes a lockup
especially when your root device is using PCI INTA (megaraid_sas
driver *ehem*)
However, the problem is mostly hidden because send_cleanup_vector()
causes a cleanup of all moving vectors on the current PCPU in such a
way which does not cause the problem, and if the problem has occurred,
the writes it makes to the IO-APIC clear the IRR and Status bits
which unlocks the problem.
This fix is distinctly a temporary hack, waiting on a cleanup of the
irq code. It checks for the edge case where we have moved the irq,
and manually EOI's the old vector with the IO-APIC which correctly
clears the IRR and Status bits. Also, it protects the code which
updates irq_cfg by disabling interrupts.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23805:7048810180de
xen-unstable date: Wed Aug 31 15:19:24 2011 +0100
libxl: Do not SEGV when no 'removable' disk parameter in xenstore
Just assume the disk is not removable when there is no 'removable' parameter.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changeset: 23607:2f63562df1c4 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Previously PyString_FromString(NULL) was called, which caused assertion
failure.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changeset: 23606:cc2f376d0cd9 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: Allocate memory for strings in libxl_device_disk
Memory for strings in libxl_device_disk must be allocated from outside of
libxl__gc to not be freed at the end of function (by libxl__free_all).
Fixes xl block-detach
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changeset: 23603:6656d80b4de4 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: Accept disk name in libxl_devid_to_device_disk
Accept disk name in xl block-detach.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changeset: 23604:5d7998be2252 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: Remove frontend and backend devices from xenstore after destroy
Cleanup frontend and backend devices from xenstore for all dev types - not only
disks. Because backend cleanup moved to libxl__device_destroy,
libxl__devices_destroy is somewhat simpler.
Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
xen-unstable changeset: 23605:ff8d170852b3 Backport-requested-by: Marek Marczykowski <marmarek@mimuw.edu.pl> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
remus: handle exceptions while installing/uninstalling net buffer
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23600:15fc211a13bf Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
At the end of a checkpoint, when a new flush (of buffered disk writes)
is merged with ongoing flush, we have to make sure that none of the new
disk I/O requests overlap with ones in progress. If one does, hold the
request and don't issue I/O until the overlapping one finishes. If we allow
the I/O to proceed, we might end up with two overlapping requests in the
disk's queue and the disk may not offer any guarantee on which one is
written first.
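The overlap test itself is a standard interval check. In the sketch below the io_req structure (start sector plus length) and the helper names are hypothetical; the actual tapdisk request representation is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct io_req {                   /* hypothetical request: [start, start + nsectors) */
    uint64_t start;
    uint64_t nsectors;
};

/* Two half-open sector ranges overlap if each starts before the other ends. */
bool io_overlaps(const struct io_req *a, const struct io_req *b)
{
    return a->start < b->start + b->nsectors &&
           b->start < a->start + a->nsectors;
}

/* A new request must be held while any in-flight request overlaps it. */
bool must_hold(const struct io_req *req,
               const struct io_req *inflight, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (io_overlaps(req, &inflight[i]))
            return true;
    return false;
}
```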
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23414:ecff559bf474 Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
DRBD disk backends can be used instead of tapdisk backends for Remus.
This requires a Remus style disk replication protocol (asynchronous
replication with output buffering at backup), that is not available in
standard DRBD code. A modified version that supports this new replication
protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus
Use of DRBD disk backends provides a means for efficient
resynchronization of data after the crashed machine comes back
online. Since DRBD allows for online resynchronization, a DRBD backed
Remus VM does not have to be stopped or shutdown while the disks are
resynchronizing. Once resynchronization is complete, Remus can be
started at will.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23413:62c0dfc9efbf Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
This was introduced in 23195:13ec53a59a42
It is a problem for Python 2.4 and earlier, only.
So use try...(try...except)...finally as suggested by Ian Campbell.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23379:b04e57ec4671 Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
The new --null option allows one to test and play with just the
memory checkpointing and network buffering aspect of remus, without
the need for a second host. The disk is not replicated. All replication
data is sent to /dev/null. This option is pretty handy when a user
wants to see the page churn for his workload or observe the latency hit
though the latter will not be accurate.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23196:29d81623dc14 Backport-requested-by: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
current check includes ingress and pfifo_fast.
Add mq to the list of allowed qdiscs already installed
on ifb. This patch fixes cases where remus fails to start,
due to an mq qdisc already present on the vif.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23109:c8ae80a11d47
Backport-requested: Shriram Rajagopalan <rshriram@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 30 Aug 2011 15:57:05 +0000 (16:57 +0100)]
xl: print sxp on dry-run of create.
The help text for xm create's --dry-run says "Dry run - prints the
resulting configuration in SXP but does not create the domain." so
update xl implementation to match. At least the xendomains initscript
relies on this (for better or worse).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Carsten Schiers <carsten@schiers.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23467:2ae357405850
Backport-requested: Carsten Schiers <carsten@schiers.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Fabio Fantoni [Tue, 30 Aug 2011 15:56:22 +0000 (16:56 +0100)]
tools: Improved LSB headers in init.d scripts
The xendomains service now also works without the xend service.
Signed-off-by: Fabio Fantoni <fabio.fantoni@heliman.it> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23673:0648846b4d17
Backport-requested: Carsten Schiers <carsten@schiers.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
/etc/init.d/xendomains relies on simple pattern matching against the structures
printed by the "xl list -l" command, so update the xl implementation to
match.
Signed-off-by: Carsten Schiers <carsten@schiers.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23567:c2995f0555af Backported-by: Carsten Schiers <carsten@schiers.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
David Vrabel [Mon, 22 Aug 2011 09:16:15 +0000 (10:16 +0100)]
x86: use 'dom0_mem' to limit the number of pages for dom0
Use the 'dom0_mem' command line option to set the maximum number of
pages for dom0. dom0 can then use the XENMEM_maximum_reservation
memory op to automatically find this limit and reduce the size of any
page tables etc.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
xen-unstable changeset: 23779:c56dd5eb0fa2
xen-unstable date: Mon Aug 22 10:05:27 2011 +0100
Kevin Tian [Mon, 22 Aug 2011 09:14:14 +0000 (10:14 +0100)]
cpuidle: initialize default Cstate information
C0/C1 should always be available when cpuidle is enabled in Xen.
When Dom0 doesn't register ACPI Cstate information,
e.g. due to a BIOS issue or because the acpi processor module is not installed,
this patch makes basic C0/C1 information available to the xenpm tool.
Andrew Cooper [Fri, 19 Aug 2011 09:00:25 +0000 (10:00 +0100)]
x86/KEXEC: disable hpet legacy broadcasts earlier
On x2apic machines which booted in xapic mode,
hpet_disable_legacy_broadcast() sends an event check IPI to all online
processors. This leads to a protection fault as the genapic blindly
pokes x2apic MSRs while the local apic is in xapic mode.
One option is to change genapic when we shut down the local apic, but
there are still problems with trying to IPI processors in the online
processor map which are actually sitting in NMI loops.
Another option is to have each CPU take itself out of the online CPU
map during the NMI shootdown.
Realistically however, disabling hpet legacy broadcasts earlier in the
kexec path is the easiest fix to the problem.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23776:0ddb4481f883
xen-unstable date: Fri Aug 19 09:58:22 2011 +0100
Jan Beulich [Tue, 16 Aug 2011 14:21:46 +0000 (15:21 +0100)]
x86/PCI-MSI: properly determine VF BAR values
As was discussed a couple of times on this list, SR-IOV virtual
functions have their BARs read as zero - the physical function's
SR-IOV capability structure must be consulted instead. The bogus
warnings people complained about are being eliminated with this
change.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23766:8d6edc3d26d2
xen-unstable date: Sat Aug 13 10:14:58 2011 +0100
PCI: consolidate interface for adding devices
The functionality of pci_add_device_ext() can be easily folded into
pci_add_device(), and eliminates the need to change two functions for
future adjustments.
Andrew Cooper [Tue, 16 Aug 2011 14:17:43 +0000 (15:17 +0100)]
x86: IRQ fix incorrect logic in __clear_irq_vector
In the old code, tmp_mask is the cpu_and of cfg->cpu_mask and
cpu_online_map. However, in the usual case of moving an IRQ from one
PCPU to another because the scheduler decides it's a good idea,
cfg->cpu_mask and cfg->old_cpu_mask do not intersect. This causes the
old cpu vector_irq table to keep the irq reference when it shouldn't.
This leads to a resource leak if a domain is shut down while an irq has
a move pending, which results in Xen's create_irq() eventually failing
with -ENOSPC when all vector_irq tables are full of stale references.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23765:68b903bb1b01
xen-unstable date: Sat Aug 13 10:14:28 2011 +0100
Jan Beulich [Tue, 16 Aug 2011 14:17:06 +0000 (15:17 +0100)]
VT-d: don't reject valid DMAR/ATSR tables on systems with multiple PCI segments
On multi-PCI-segment systems, each segment has to be expected to have
an include-all DRHD and an all-ports ATSR, so the firmware consistency
check incorrectly rejects valid configurations there (which is
particularly problematic when the firmware also pre-enabled x2apic
mode, as the system will panic in that case due to being unable to
enable interrupt remapping). Thus constrain the check to just segment
0 for now; once full multi-segment support is there (which I'm working
on), it can be revisited whether we'd want to track this per segment,
or whether we trust the firmware of such large systems.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23763:8f647d409196
xen-unstable date: Sat Aug 13 10:12:49 2011 +0100
Tim Deegan [Mon, 25 Jul 2011 15:48:39 +0000 (16:48 +0100)]
VT-d: always clean up dpci timers.
If a VM has all its PCI devices deassigned, need_iommu(d) becomes
false but it might still have DPCI EOI timers that were init_timer()d
but not yet kill_timer()d. That causes xen to crash later because the
linked list of inactive timers gets corrupted, e.g.:
tools: xencommons NetBSD init script: Multiple bugfixes and improvements
Added a cleanup of the xenstore database, to purge old entries,
prevented the restart of xenstore and set Domain-0 name. Also
replaced the sleep 5 (wait for xenstore to come up) with the method
used in the linux init script.
Signed-off-by: Roger Pau Monne <roger.pau@entel.upc.edu> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Christoph Egger <Christoph.Egger@amd.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23699:6fe9f26bb9ae
xen-unstable date: Fri Jul 15 18:22:03 2011 +0100
xend: NetBSD portability fix for LVM raw volume names
Xen 4.1.1 was incorrectly passing /dev/rmapper/vg-lvname to pygrub
(notice the r in front of mapper), when it should pass
/dev/mapper/rvg-lvname (add the r to the last file) when using NetBSD.
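The intended path transformation, shown as a standalone C sketch. The function name and buffer-based interface are hypothetical, and this is not the code the patch touches; it only demonstrates prefixing "r" to the last path component instead of to the directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the NetBSD raw-device name by prefixing 'r' to the final path
 * component, e.g. /dev/mapper/vg-lvname becomes /dev/mapper/rvg-lvname.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int netbsd_raw_path(const char *path, char *out, size_t outlen)
{
    const char *slash = strrchr(path, '/');
    size_t dirlen = slash ? (size_t)(slash - path) + 1 : 0;
    int n = snprintf(out, outlen, "%.*sr%s",
                     (int)dirlen, path, path + dirlen);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```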
I've patched it to work correctly. I'm attaching a unified diff with
the patch made against Xen 4.1.1 (it's a really simple modification).
From: Roger Pau Monne <roger.pau@entel.upc.edu> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23629:89ce3439686b
xen-unstable date: Tue Jun 28 13:56:53 2011 +0100
Mike McClurg [Thu, 21 Jul 2011 13:40:39 +0000 (14:40 +0100)]
tools/ocaml: ask compiler for correct library
OCaml libraries will live in /usr/local/ if the user compiles OCaml
from source. This patch asks the OCaml compiler where we should look
for libraries.
NB: it may be that we should do the same thing for the NetBSD case,
but I don't have a BSD box to test this out.
Signed-off-by: Mike McClurg <mike.mcclurg@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23566:7e5b54d1643e
xen-unstable date: Tue Jun 21 18:01:51 2011 +0100
hvmloader: Switch to absolute addressing for calling hypercall stubs.
This is clearer and less fragile than trying to make relative calls
work. In particular, the old approach failed if _start was not
== HVMLOADER_PHYSICAL_ADDRESS. This was the case for some modern
toolchains which reorder functions.
Jan Beulich [Sat, 16 Jul 2011 08:33:46 +0000 (09:33 +0100)]
x86: fix guest migration after c/s 20892:d311d1efc25e
Guests would not manage to run successfully after being migrated to a
host having sufficiently much more memory than the host they were
originally started on.
Subsequently the plan is to re-enable the changed behavior under the
control of a guest kernel announced feature flag.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 23706:3dd399873c9e
xen-unstable date: Sat Jul 16 09:18:21 2011 +0100
David Vrabel [Sat, 16 Jul 2011 08:33:07 +0000 (09:33 +0100)]
xen/libxc: set CPUID topology leaf as unsupported for PV guests
The result of a CPUID Extended Topology Enumeration leaf for PV guests
is invalid as the level in ECX is ignored. This can cause some guests
to loop endlessly when trying to enumerate the topology.
Since the physical topology isn't useful to PV guests set the topology
leaf as unsupported.
Guests affected include Linux kernels prior to 2.6.32 where a workaround
was applied ("xen: mask extended topology info in cpu", 82d6469916c6fcfa345636a49004c9d1753905d1).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
xen-unstable changeset: 23700:867bb675b57b
xen-unstable date: Sat Jul 16 09:05:45 2011 +0100
23408:1fc3347850c7 causes the following error:
machine_kexec.c:106: error: static declaration of
'machine_kexec_get_xen' follows non-static declaration
/xen-unstable.hg/xen/include/xen/kexec.h:39: error: previous
declaration of 'machine_kexec_get_xen' was here
Andrew Cooper [Fri, 8 Jul 2011 07:57:11 +0000 (08:57 +0100)]
KEXEC: disconnect all PCI devices from the PCI bus on crash
In the case of a crash, IOMMU DMA remapping gets turned off so that
the kdump kernel may boot. However, the VT-d specification warns that
this is dangerous if a DMA transaction is in progress.
Also, in the case of a crash, DMA transactions and interrupts from
peripheral devices such as network cards are likely to keep coming in.
Without DMA remapping enabled, the transactions will be writing over
low memory, corrupting the crash state, and perhaps even the kdump
reserved memory.
Therefore, on the crash path, we can disconnect all PCI devices from
their respective buses so that they are no longer able to be DMA
busmasters. This reduces the risk of DMA transactions corrupting
state (and will also reduce spurious interrupts arriving to the kdump
kernel) until the kdump kernel can properly reset the PCI devices.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23666:b96f8bdcaa15
xen-unstable date: Fri Jul 08 08:38:35 2011 +0100
Paul Durrant [Fri, 8 Jul 2011 07:56:42 +0000 (08:56 +0100)]
x86/hvm: Don't expose CPUID time leaf when not using PVRDTSCP
Some versions of Oracle's Solaris PV drivers make a check that the
maximal Xen hypervisor CPUID leaf is <= base leaf + 2 and refuse to
work if this is not the case. The addition of the time leaf makes the
maximal leaf == base leaf + 3 so this patch introduces a workaround
that obscures the time leaf unless PVRDTSCP is in operation.
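The leaf arithmetic can be sketched as follows. The base value 0x40000000 is the conventional hypervisor CPUID base and is an assumption here, and both function names are hypothetical; the guest-side check only paraphrases the behaviour described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed hypervisor CPUID base leaf (illustrative). */
#define XEN_CPUID_BASE 0x40000000u

/* Maximal leaf reported to the guest: the time leaf (base + 3) is only
 * advertised while PVRDTSCP is in operation. */
static uint32_t max_hypervisor_leaf(uint32_t base, bool pvrdtscp_in_use)
{
    return base + (pvrdtscp_in_use ? 3u : 2u);
}

/* Paraphrase of the check made by the affected PV drivers. */
static bool driver_accepts(uint32_t base, uint32_t max_leaf)
{
    return max_leaf <= base + 2u;
}
```

With PVRDTSCP off, the reported maximum is base + 2 and the driver check passes; with it on, the maximum is base + 3 and the check fails as before.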
x86 cpu: Fix bug: unify cpu_dev attr as __cpuinitdata
Currently, different x86 CPU types declare cpu_dev with different
attributes. Some declare it as __initdata, which is risky under CPU
hotplug. This patch fixes the bug by unifying them as __cpuinitdata,
as the AMD CPU code already does.
Tim Deegan [Tue, 28 Jun 2011 08:32:00 +0000 (09:32 +0100)]
x86: fix boot-time watchdog test.
Since the perf counter that the LAPIC NMI watchdog uses only
runs while the core isn't halted, and all APs are idle at
this point in the boot process, it's possible that remote
CPUs won't see any NMIs during the 10-tick waiting period.
Force all CPUs to busy-wait so we know the timers are running.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 23612:6c7a23e08a04
xen-unstable date: Tue Jun 28 09:16:13 2011 +0100
pv-on-hvm: hvm_domain_use_pirq return positive no matter if the evtchn is bound
This patch fixes PV on HVM interrupt remapping with recent Linux
kernels and upstream qemu. hvm_domain_use_pirq should return positive
even if the evtchn is not currently bound. If it doesn't, assert_irq
ends up injecting legacy interrupts even after the guest disabled the
irq.
Keir Fraser [Thu, 23 Jun 2011 10:54:53 +0000 (11:54 +0100)]
x86/hvm: add SMEP support to HVM guest
New Intel CPUs support SMEP (Supervisor Mode Execution Protection). SMEP
prevents software operating with CPL < 3 (supervisor mode) from fetching
instructions from any linear address with a valid translation for which
the U/S flag (bit 2) is 1 in every paging-structure entry controlling
the translation for the linear address.
This patch adds SMEP support to HVM guests.
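The rule above can be sketched as a predicate. CR4.SMEP is bit 20 of CR4 and the U/S flag is bit 2 of each paging-structure entry; the function name and the array-of-entries representation of a page walk are hypothetical simplifications, and at least one entry is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_SMEP (1u << 20)        /* CR4.SMEP enable bit */
#define PTE_USER     (UINT64_C(1) << 2) /* U/S flag in a paging entry */

/*
 * Return true if SMEP would block an instruction fetch.
 * 'entries' holds the paging-structure entries used by the walk
 * (for example PML4E, PDPTE, PDE, PTE); 'n' is assumed to be >= 1.
 */
static bool smep_blocks_fetch(uint32_t cr4, unsigned int cpl,
                              const uint64_t *entries, unsigned int n)
{
    unsigned int i;

    /* SMEP only applies to supervisor-mode fetches when enabled. */
    if (!(cr4 & X86_CR4_SMEP) || cpl >= 3)
        return false;

    /* The page is a user page only if U/S is set at every level. */
    for (i = 0; i < n; i++)
        if (!(entries[i] & PTE_USER))
            return false;

    return true;
}
```

A single entry with U/S clear makes the mapping a supervisor page, so the fetch is allowed.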
Signed-off-by: Yang Wei <wei.y.yang@intel.com> Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Li Xin <xin.li@intel.com> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
xen-unstable changeset: 23504:c34604d5a293
xen-unstable date: Mon Jun 06 13:46:48 2011 +0100
New Intel CPUs support SMEP (Supervisor Mode Execution
Protection). SMEP prevents software operating with CPL < 3 (supervisor
mode) from fetching instructions from any linear address with a valid
translation for which the U/S flag (bit 2) is 1 in every
paging-structure entry controlling the translation for the linear
address.
This patch enables SMEP in Xen to protect the Xen hypervisor from
executing PV guest instructions, whose translation paging-structure
entries' U/S flags are all set.
Signed-off-by: Yang Wei <wei.y.yang@intel.com> Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Li Xin <xin.li@intel.com> Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 23481:0c0884fd8b49
xen-unstable date: Fri Jun 03 21:39:00 2011 +0100
Keir Fraser [Thu, 23 Jun 2011 10:48:18 +0000 (11:48 +0100)]
kexec: Backport fixes from xen-unstable
KEXEC: prevent panic on the kexec path when talking to the DMAR
hardware
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23547:b5955b9fc26c
xen-unstable date: Thu Jun 16 16:11:13 2011 +0100
KEXEC: correctly revert x2apic state when kexecing
Introduce the boolean variable 'kexecing' which indicates to functions
whether we are on the kexec path or not. This is used by
disable_local_APIC() to try and revert the APIC mode back to how it
was found on boot.
We also need some fudging of the x2apic_enabled variable. It is used
in multiple places over the codebase to mean multiple things,
including:
What did the user specify on the command line?
Did the BIOS boot me in x2apic mode?
Is the BSP Local APIC in x2apic mode?
What mode is my Local APIC in?
Therefore, set it up to prevent a protection fault when disabling the
IOAPICs. (In this case, it is used in the "What mode is my Local APIC
in?" case, so the processor doesn't suffer a protection fault because
of trying to use x2apic MSRs when it should be using xapic MMIO)
Finally, make sure that interrupts are disabled when jumping into the
purgatory code. It would be bad to service interrupts in the Xen
context when the next kernel is booting.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23542:23c068b10923
xen-unstable date: Wed Jun 15 16:16:41 2011 +0100
IOMMU: add crash_shutdown iommu_op
The kdump kernel has problems booting with interrupt/dma
remapping enabled, so we need a new iommu_ops called
crash_shutdown which is basically suspend but doesn't
need to bother saving state.
Make sure that crash_shutdown is called on the kexec
path.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
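As a rough illustration of the crash_shutdown idea, the sketch below shows an ops table with a hook that disables remapping without saving state. Everything here (the trimmed-down `iommu_ops_sketch` structure, the demo callbacks and the two state variables) is hypothetical and only models the distinction from suspend drawn above.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down ops table for illustration only. */
struct iommu_ops_sketch {
    void (*suspend)(void);        /* saves state, then disables remapping */
    void (*crash_shutdown)(void); /* disables remapping, saves nothing */
};

/* Simulated hardware/driver state. */
static int state_saved;
static int remapping_enabled = 1;

static void demo_suspend(void)
{
    state_saved = 1;
    remapping_enabled = 0;
}

static void demo_crash_shutdown(void)
{
    /* No state is saved: the system never resumes from a crash. */
    remapping_enabled = 0;
}

static const struct iommu_ops_sketch demo_ops = {
    .suspend = demo_suspend,
    .crash_shutdown = demo_crash_shutdown,
};

/* Called on the crash path; tolerates drivers without the hook. */
static void iommu_crash_shutdown_sketch(const struct iommu_ops_sketch *ops)
{
    if (ops != NULL && ops->crash_shutdown != NULL)
        ops->crash_shutdown();
}
```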
xen-unstable changeset: 23541:c6307ddd3ab1
xen-unstable date: Wed Jun 15 16:10:11 2011 +0100
Experimental evidence shows that Extended Interrupt Mode remains in
effect even after Interrupt Remapping is disabled in each DMAR Global
Command Register. A consequence of this is that when we switch from
x2apic mode back to xapic mode, and disable interrupt remapping for
the kdump kernel, interrupts passing through the IO APICs are in
x2apic format as opposed to xapic. This causes a triple fault in the
kexec kernel.
As EIM is explicitly set up each time Interrupt Remapping is enabled,
it is safe for us to clobber this when tearing down.
Also, change the header definition of IRTA_REG_EIME_SHIFT. It caused
verbose and error-prone code, and was only used in 1 place before. We
now have IRTA_EIME which is the specific bit in the register.
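The difference between a shift definition and a bit definition can be sketched as below. The bit position 11 is an assumption taken from the VT-d Interrupt Remapping Table Address register layout, and the two helper functions are hypothetical.

```c
#include <stdint.h>

/* Old style: the header exposes only the bit position. */
#define IRTA_REG_EIME_SHIFT 11

/* New style: the header exposes the bit itself (position assumed). */
#define IRTA_EIME (UINT64_C(1) << 11)

/* Clearing EIME with the shift: the caller must rebuild the mask. */
static uint64_t clear_eim_old(uint64_t irta)
{
    return irta & ~(UINT64_C(1) << IRTA_REG_EIME_SHIFT);
}

/* Clearing EIME with the bit definition: shorter and harder to get wrong. */
static uint64_t clear_eim_new(uint64_t irta)
{
    return irta & ~IRTA_EIME;
}
```

Both helpers compute the same value; the second avoids repeating the 64-bit cast and shift at each use.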
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23540:96f53d2b966e
xen-unstable date: Wed Jun 15 16:07:45 2011 +0100
Experimental evidence shows that Extended Interrupt Mode remains in
effect even after Interrupt Remapping is disabled in each DMAR Global
Command Register. A consequence of this is that when we switch from
x2apic mode back to xapic mode, and disable interrupt remapping for
the kdump kernel, interrupts passing through the IO APICs are in
x2apic format as opposed to xapic. This causes a triple fault in the
kexec kernel.
As EIM is explicitly set up each time Interrupt Remapping is enabled,
it is safe for us to clobber this when tearing down.
Also, change the header definition of IRTA_REG_EIME_SHIFT. It caused
verbose and error-prone code, and was only used in 1 place before. We
now have IRTA_EIME which is the specific bit in the register.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen-unstable changeset: 23515:337520d94cba
xen-unstable date: Tue Jun 14 13:04:09 2011 +0100
x86/apic: record local APIC state on boot
Xen does not store the boot local APIC state which leads to problems
when shutting down for a kexec jump. This patch records the boot
state so we can return to the boot state when kexecing.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Keir Fraser <keir@xen.org> Acked-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23514:d04608ad70f8
xen-unstable date: Tue Jun 14 13:02:00 2011 +0100
x86/kexec: nmi_shootdown_cpus() should leave irqs disabled