Jan Beulich [Thu, 22 Sep 2011 17:28:03 +0000 (18:28 +0100)]
PCI multi-seg: AMD-IOMMU specific adjustments
There are two places here where it is entirely unclear to me where the
necessary PCI segment number should be taken from (IVMD descriptors
don't carry one; only IVHD ones do). AMD confirmed that, for the time
being, it is acceptable to assume that only segment 0 exists.
Jan Beulich [Sat, 17 Sep 2011 23:26:52 +0000 (00:26 +0100)]
x86: split MSI IRQ chip
With the .end() accessor having become optional, and noting that the
behavior of several of the accessors really depends on the result of
msi_maskable_irq(), this splits the MSI IRQ chip type into two: one
for the maskable ones, and the other for the (MSI only) non-maskable
ones.
At the same time, the implementation of those methods is moved from
io_apic.c to msi.c.
Jan Beulich [Sat, 17 Sep 2011 23:25:57 +0000 (00:25 +0100)]
pass struct irq_desc * to all other IRQ accessors
This is again because the descriptor is generally more useful (with
the IRQ number being accessible in it if necessary), and going forward
it will hopefully allow removing all direct accesses to the IRQ
descriptor array, in turn making it possible to replace it with some
other, more efficient data structure.
This additionally makes the .end() accessor optional, noting that in a
number of cases the functions were empty.
Jan Beulich [Sat, 17 Sep 2011 23:24:37 +0000 (00:24 +0100)]
pass struct irq_desc * to set_affinity() IRQ accessors
This is because the descriptor is generally more useful (with the IRQ
number being accessible in it if necessary), and going forward it will
hopefully allow removing all direct accesses to the IRQ descriptor
array, in turn making it possible to replace it with some other, more
efficient data structure.
This patch fixes XSave CPUID virtualization for PV guests. The XSave
area size returned by CPUID leaf 0xD changes dynamically depending on
XCR0, but tools/libxc only assigns a static value. The fix adjusts the
XSave area size at runtime.
Note: HVM CPUID virtualization already contains this fix, and Dom0 is
not affected either.
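To illustrate why the reported size has to follow XCR0 rather than be a fixed value, here is a minimal sketch of deriving the standard (non-compacted) XSAVE area size from XCR0. The component table and function name are hypothetical; only AVX is listed, and real code should query CPUID leaf 0xD sub-leaf i (EBX = offset, EAX = size) instead of hard-coding offsets.

```c
#include <assert.h>
#include <stdint.h>

/* 512-byte legacy (x87/SSE) region plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HDR 576u

/* Hypothetical layout table for state components >= 2 (standard format).
 * Assumed values; real code should read them from CPUID leaf 0xD. */
struct xstate_comp { uint32_t offset, size; };
static const struct xstate_comp xstate_layout[] = {
    [2] = { 576, 256 },   /* AVX: upper halves of the YMM registers */
};

/* Size of the XSAVE area needed for the components enabled in xcr0. */
static uint32_t xsave_size_for_xcr0(uint64_t xcr0)
{
    uint32_t size = XSAVE_LEGACY_AND_HDR;
    unsigned int n = sizeof(xstate_layout) / sizeof(xstate_layout[0]);

    assert(xcr0 & 1);   /* x87 state must always be enabled in XCR0 */
    for (unsigned int i = 2; i < n; i++) {
        if (!(xcr0 & (1ull << i)))
            continue;
        uint32_t end = xstate_layout[i].offset + xstate_layout[i].size;
        if (end > size)
            size = end;
    }
    return size;
}
```

With only x87/SSE enabled (XCR0 = 0x3) this gives 576 bytes; enabling AVX (XCR0 = 0x7) grows it to 832, which is why a static value from libxc goes stale.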
Igor Mammedov [Sat, 17 Sep 2011 23:00:26 +0000 (00:00 +0100)]
Clear IRQ_GUEST in irq_desc->status when setting action to NULL.
Looking more closely at how the action field is used in relation to
the IRQ_GUEST flag, it appears that a set IRQ_GUEST implies that action
is not NULL. As a result, it is not safe to set action to NULL and
leave IRQ_GUEST set.
Hence IRQ_GUEST should be cleared in dynamic_irq_cleanup(), where
action is set to NULL.
In addition, remove the BUG_ON in __pirq_guest_unbind() that appears to
be bogus and no longer needed.
Thanks to Paolo Bonzini for NACKing the previous patch and pointing at the
correct solution.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reinstate the BUG_ON, but after the action==NULL check. Since we then
go and start interpreting action as an irq_guest_action_t, the BUG_ON
is relevant here.
More generally, the brute-force nature of dynamic_irq_cleanup() looks
a bit worrying. Possibly there should be more integration with the
pirq_guest_unbind() logic, for cleaning up un-acked EOIs and the like.
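The invariant discussed above ("IRQ_GUEST set implies action != NULL") and the reordered check can be sketched as follows. The structures and function names are hypothetical, heavily simplified stand-ins, not Xen's actual irq_desc code; assert() stands in for BUG_ON.

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_GUEST 0x1u

/* Hypothetical simplified stand-ins for Xen's irqaction / irq_desc. */
struct irqaction { int dummy; };

struct irq_desc {
    unsigned int status;
    struct irqaction *action;
};

/* Invariant: IRQ_GUEST set implies action != NULL. */
static void check_invariant(const struct irq_desc *desc)
{
    assert(!(desc->status & IRQ_GUEST) || desc->action != NULL);
}

/* Cleanup path: clear IRQ_GUEST together with action. */
static void dynamic_irq_cleanup_sketch(struct irq_desc *desc)
{
    desc->action = NULL;
    desc->status &= ~IRQ_GUEST;
    check_invariant(desc);
}

/* Unbind path: bail out on NULL action first, and only then check the
 * flag, since action is about to be interpreted as guest action data.
 * Returns 1 if something was unbound. */
static int pirq_guest_unbind_sketch(struct irq_desc *desc)
{
    if (desc->action == NULL)
        return 0;                        /* already cleaned up */
    assert(desc->status & IRQ_GUEST);    /* the reinstated BUG_ON */
    desc->action = NULL;
    desc->status &= ~IRQ_GUEST;
    check_invariant(desc);
    return 1;
}
```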
Jan Beulich [Sat, 17 Sep 2011 15:27:36 +0000 (16:27 +0100)]
x86-64/EFI: 2.0 hypercall extensions
Flesh out the interface to EFI 2.0 runtime calls, and implement what
reasonably can be implemented without active call paths reaching it
(i.e. without being able to actually debug it: the capsule interfaces
certainly require an environment where an initial implementation can
actually be tested).
Jan Beulich [Sat, 17 Sep 2011 15:26:37 +0000 (16:26 +0100)]
x86/vmx: don't call __vmxoff() blindly
If vmx_cpu_up() failed, __vmxon() generally won't have been
(successfully) executed, and in that case __vmxoff() will #UD.
Additionally, any panic() during early resume (namely the tboot
related one) would cause vmx_cpu_down() to get executed without
vmx_cpu_up() having run before.
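One way to avoid calling __vmxoff() blindly is to record per CPU whether VMXON actually succeeded, and to consult that record on the way down. The sketch below assumes such a per-CPU flag; all names are hypothetical, and the counter merely simulates executing VMXOFF.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical per-CPU record of a successful VMXON. */
static bool vmxon_done[NR_CPUS_SKETCH];
static int vmxoff_calls;   /* counts simulated VMXOFF executions */

static void vmxoff_sketch(void) { vmxoff_calls++; }   /* stands in for __vmxoff() */

static int vmx_cpu_up_sketch(unsigned int cpu, bool vmxon_ok)
{
    assert(cpu < NR_CPUS_SKETCH);
    if (!vmxon_ok)
        return -1;           /* VMXON failed or never ran */
    vmxon_done[cpu] = true;
    return 0;
}

static void vmx_cpu_down_sketch(unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    /* VMXOFF outside VMX operation raises #UD, so skip it unless VMXON
     * is known to have succeeded (covers both a failed vmx_cpu_up() and
     * a panic() before vmx_cpu_up() ever ran). */
    if (!vmxon_done[cpu])
        return;
    vmxoff_sketch();
    vmxon_done[cpu] = false;
}
```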
Jan Beulich [Sat, 17 Sep 2011 15:25:53 +0000 (16:25 +0100)]
x86/tboot: make resume error messages visible
With tboot_s3_resume() running before console_resume(), the error
messages printed by it so far are almost guaranteed to be lost.
Instead, latch the MACs into a static variable, and issue the messages
right before calling panic().
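The "latch now, report later" pattern described above can be sketched like this. The structure and function names are hypothetical, and the MACs are modeled as plain 64-bit values; report_latched_mac_error() returns instead of panicking so the sketch stays runnable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical latch for values that cannot be printed yet because the
 * console has not been resumed. */
static struct {
    int mismatch;
    uint64_t expected, computed;
} latched;

/* Runs before console_resume(): record the MACs, don't print. */
static void check_mac_sketch(uint64_t expected, uint64_t computed)
{
    if (expected != computed) {
        latched.mismatch = 1;
        latched.expected = expected;
        latched.computed = computed;
    }
}

/* Runs once the console works, right before panic() would be called.
 * Returns nonzero if an error was reported. */
static int report_latched_mac_error(void)
{
    if (!latched.mismatch)
        return 0;
    assert(latched.expected != latched.computed);
    fprintf(stderr, "MAC mismatch: expected %#" PRIx64 ", got %#" PRIx64 "\n",
            latched.expected, latched.computed);
    return 1;
}
```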
George Dunlap [Sat, 17 Sep 2011 15:22:54 +0000 (16:22 +0100)]
xen: Move tsc reliability check until after CPUs have booted
AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a
later check to disable this feature if TSC drift is detected.
Unfortunately, this check is done in time.c:init_xen_time(), which
runs before any secondary CPUs are brought up, and is thus guaranteed
to succeed.
This patch moves the check into its own function, and calls it after
the CPUs are brought up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Paul Durrant [Sat, 17 Sep 2011 15:22:13 +0000 (16:22 +0100)]
x86/hvm: Tidy up the viridian code a little and flesh out the APIC
assist MSR handling code.
We don't advertise that we handle that MSR, but Windows assumes we
do. Windows 7 only wrote to the MSR, and we used to handle that OK.
Windows 8 also reads from the MSR, so we need to keep a record of its
contents.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
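Keeping a record of the MSR contents amounts to storing the value on write and returning it on read. The sketch below is illustrative only: the per-vCPU structure and handler names are hypothetical, and the MSR index is an assumed value that should be checked against the Hyper-V specification and Xen's viridian headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed index of the Viridian APIC assist MSR; verify before use. */
#define APIC_ASSIST_MSR_SKETCH 0x40000073u

/* Hypothetical per-vCPU Viridian state. */
struct viridian_vcpu_sketch {
    uint64_t apic_assist;   /* last value the guest wrote */
};

/* Returns 1 if the MSR was handled here. */
static int wrmsr_viridian_sketch(struct viridian_vcpu_sketch *v,
                                 uint32_t idx, uint64_t val)
{
    assert(v != 0);
    if (idx != APIC_ASSIST_MSR_SKETCH)
        return 0;
    v->apic_assist = val;   /* remember it: Windows 8 reads it back */
    return 1;
}

static int rdmsr_viridian_sketch(const struct viridian_vcpu_sketch *v,
                                 uint32_t idx, uint64_t *val)
{
    assert(v != 0 && val != 0);
    if (idx != APIC_ASSIST_MSR_SKETCH)
        return 0;
    *val = v->apic_assist;
    return 1;
}
```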
James Carter [Sat, 17 Sep 2011 15:20:58 +0000 (16:20 +0100)]
xen/xsm: Compile error due to naming clash between XSM and EFI runtime
The problem is that efi_runtime_call is the name of both a function in
xen/arch/x86/efi/runtime.c and a member of the xsm_operations struct
in xen/include/xsm/xsm.h. As a result, the macro "#define
efi_runtime_call(x) efi_compat_runtime_call(x)" on line 15 of
xen/arch/x86/x86_64/platform_hypercall.c also rewrites uses of the
struct member, which causes a compile error.
Renaming the XSM struct member fixes the problem.
Signed-off-by: James Carter <jwcart2@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
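The clash comes from the preprocessor: a function-like macro expands wherever its name is followed by "(", including after "ops->". This sketch reuses the macro from the commit message; the struct, its renamed member `platform_efi_call`, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The compat macro quoted in the commit message. */
#define efi_runtime_call(x) efi_compat_runtime_call(x)

static int efi_compat_runtime_call(int op) { return op + 1; }   /* stub */

/* Hypothetical stand-in for xsm_operations. Had the member been called
 * efi_runtime_call, "ops->efi_runtime_call(op)" would have been rewritten
 * to "ops->efi_compat_runtime_call(op)", a member that does not exist.
 * A renamed member is left alone by the preprocessor. */
struct xsm_ops_sketch {
    int (*platform_efi_call)(int op);   /* hypothetical new name */
};

static int allow_all(int op) { (void)op; return 0; }

static int check_then_call(const struct xsm_ops_sketch *ops, int op)
{
    assert(ops->platform_efi_call != NULL);
    if (ops->platform_efi_call(op) != 0)
        return -1;                    /* denied by the security hook */
    return efi_runtime_call(op);      /* expands to efi_compat_runtime_call(op) */
}
```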
Olaf Hering [Fri, 16 Sep 2011 11:19:26 +0000 (12:19 +0100)]
mem_event: use different ringbuffers for share, paging and access
Up to now a single ring buffer was used for mem_share, xenpaging and
xen-access. Each helper would have to cooperate and pull only its own
requests from the ring. Unfortunately this was not implemented. And
even if it was, it would make the whole concept fragile because a crash
or early exit of one helper would stall the others.
What has happened until now is that, with xenpaging and memory sharing
both active, memory-sharing requests would be pushed into the buffer,
and xenpaging is not prepared for such requests.
This patch creates an independent ring buffer for each of mem_share,
xenpaging and xen-access, and also adds new functions to enable
xenpaging and xen-access. The xc_mem_event_enable/xc_mem_event_disable
functions will be removed. The various XEN_DOMCTL_MEM_EVENT_* macros
were cleaned up.
Because of this removal the API changes, so the SONAME will be changed
too.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Tim Deegan <tim@xen.org>
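The layout these two patches move toward, one ring per consumer with the mem_event functions taking the ring rather than the domain, can be sketched as follows. All names are hypothetical, and a ring is reduced to a request counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ring: only a count of queued requests. */
struct mem_event_domain_sketch {
    unsigned int queued;
};

/* One independent ring per consumer. */
struct domain_mem_events_sketch {
    struct mem_event_domain_sketch share;
    struct mem_event_domain_sketch paging;
    struct mem_event_domain_sketch access;
};

enum mem_event_kind { ME_SHARE, ME_PAGING, ME_ACCESS };

/* Operates on one ring, so a stalled or crashed consumer only affects
 * its own ring. */
static void mem_event_put_request_sketch(struct mem_event_domain_sketch *med)
{
    assert(med != NULL);
    med->queued++;
}

static struct mem_event_domain_sketch *
ring_for(struct domain_mem_events_sketch *d, enum mem_event_kind k)
{
    switch (k) {
    case ME_SHARE:  return &d->share;
    case ME_PAGING: return &d->paging;
    case ME_ACCESS: return &d->access;
    }
    return NULL;
}
```

Sharing requests can no longer land in the paging ring, because each request is routed to its own consumer's ring.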
Olaf Hering [Fri, 16 Sep 2011 11:13:31 +0000 (12:13 +0100)]
mem_event: pass mem_event_domain pointer to mem_event functions
Pass a struct mem_event_domain pointer to the various mem_event
functions. This will be used in a subsequent patch which creates
different ring buffers for the memshare, xenpaging and memaccess
functionality.
Remove the struct domain argument from some functions.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
Olaf Hering [Thu, 15 Sep 2011 10:08:05 +0000 (11:08 +0100)]
xenstored: allow guest to shut down all its watches/transactions
During kexec all old watches have to be removed, otherwise the new
kernel will receive unexpected events. Allow a guest to reset itself
and clean up all of its watches and transactions.
Add a new XS_RESET_WATCHES command to do the reset on behalf of the
guest.
(Changes by iwj: specify the argument to be a single nul byte. Permit
read-only clients to use the new command.)
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
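On the wire, the new command is a normal xenstore message whose payload is the single nul byte specified above. This sketch mirrors the four-field xsd_sockmsg header; the numeric value of XS_RESET_WATCHES is an assumption and should be checked against io/xs_wire.h, and the builder function is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of struct xsd_sockmsg (type, req_id, tx_id, len). */
struct xsd_sockmsg_sketch {
    uint32_t type, req_id, tx_id, len;
};

/* Assumed value of XS_RESET_WATCHES; check io/xs_wire.h. */
#define XS_RESET_WATCHES_SKETCH 21u

/* Build a reset request in buf. Returns the total size in bytes, or 0
 * if buf is too small. */
static size_t build_reset_watches(unsigned char *buf, size_t buflen,
                                  uint32_t req_id)
{
    struct xsd_sockmsg_sketch hdr = {
        .type = XS_RESET_WATCHES_SKETCH,
        .req_id = req_id,
        .tx_id = 0,   /* not part of a transaction */
        .len = 1,     /* payload: a single nul byte */
    };
    size_t total = sizeof(hdr) + hdr.len;

    if (buflen < total)
        return 0;
    memcpy(buf, &hdr, sizeof(hdr));
    buf[sizeof(hdr)] = '\0';
    return total;
}
```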
Ian Jackson [Wed, 14 Sep 2011 10:38:13 +0000 (11:38 +0100)]
tools: Revert seabios and upstream qemu build changes
These have broken the build and it seems to be difficult to fix. So
we will revert the whole lot for now, and await corrected patch(es).
Revert "fix the build when CONFIG_QEMU is specified by the user"
Revert "tools: fix permissions of git-checkout.sh"
Revert "scripts/git-checkout.sh: Is not bash specific. Invoke with /bin/sh."
Revert "Clone and build Seabios by default"
Revert "Clone and build upstream Qemu by default"
Revert "Rename ioemu-dir as qemu-xen-traditional-dir"
Revert "Move the ioemu-dir-find shell script to an external file"
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 13 Sep 2011 13:52:22 +0000 (14:52 +0100)]
tools: fix permissions of git-checkout.sh
23828:0d21b68f528b introduced a new scripts/git-checkout.sh, but it
had the wrong permissions. chmod +x it, and add a blank line at the
end to make sure it actually gets updated.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Tue, 13 Sep 2011 09:33:10 +0000 (10:33 +0100)]
IRQ: IO-APIC support End Of Interrupt for older IO-APICs
The old io_apic_eoi() function using the EOI register only works for
IO-APICs with a version of 0x20. Older IO-APICs do not have an EOI
register so line level interrupts have to be EOI'd by flipping the
mode to edge and back, which clears the IRR and Delivery Status bits.
This patch replaces the current io_apic_eoi() function with one which
takes into account the version of the IO-APIC and EOIs
appropriately.
v2: make recursive call to __io_apic_eoi() to reduce code size.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
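The version-dependent EOI described above can be modeled in software as follows. This is a simulation with hypothetical types, not register-level code: it assumes that IO-APICs of version 0x20 or later have an EOI register, and it models the side effect that switching an entry to edge mode clears Remote IRR. Real code would also mask the entry while flipping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one redirection table entry. */
struct rte_sketch {
    uint8_t vector;
    bool level;        /* trigger mode: true = level */
    bool remote_irr;   /* set while a level interrupt awaits EOI */
};

struct ioapic_sketch {
    uint8_t version;   /* assumed: 0x20 and later have an EOI register */
    struct rte_sketch rte[24];
    unsigned int nr_pins;
};

/* Model of a write to the EOI register: clears Remote IRR on every
 * level-triggered entry programmed with this vector. */
static void eoi_register_write(struct ioapic_sketch *io, uint8_t vector)
{
    for (unsigned int pin = 0; pin < io->nr_pins; pin++)
        if (io->rte[pin].vector == vector && io->rte[pin].level)
            io->rte[pin].remote_irr = false;
}

static void io_apic_eoi_sketch(struct ioapic_sketch *io, unsigned int pin)
{
    assert(pin < io->nr_pins);
    struct rte_sketch *e = &io->rte[pin];

    if (io->version >= 0x20) {
        eoi_register_write(io, e->vector);
        return;
    }
    /* No EOI register: switch to edge (which clears Remote IRR in this
     * model) and back to level. */
    if (e->level) {
        e->level = false;
        e->remote_irr = false;
        e->level = true;
    }
}
```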
xen: if, when mapping GSIs, we run out of pirq < nr_irqs_gsi, use the others
PV on HVM guests can have more GSIs than the host, in that case we
could run out of pirq < nr_irqs_gsi. When that happens use pirq >=
nr_irqs_gsi rather than returning an error.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Benjamin Schweikert <b.schweikert@googlemail.com>
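The fallback can be pictured as a two-range search: prefer a free pirq below nr_irqs_gsi, then try the range above it before giving up. This is a minimal sketch with a hypothetical allocator over a plain in-use array, not Xen's actual pirq allocation code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator: returns a free pirq for a GSI, preferring
 * pirq < nr_irqs_gsi and falling back to pirq >= nr_irqs_gsi.
 * Returns -1 only if every pirq is taken. */
static int alloc_gsi_pirq(bool *in_use, int nr_pirqs, int nr_irqs_gsi)
{
    assert(nr_irqs_gsi <= nr_pirqs);
    for (int p = 0; p < nr_irqs_gsi; p++)
        if (!in_use[p]) {
            in_use[p] = true;
            return p;
        }
    for (int p = nr_irqs_gsi; p < nr_pirqs; p++)   /* the fallback range */
        if (!in_use[p]) {
            in_use[p] = true;
            return p;
        }
    return -1;
}
```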
Ian Campbell [Tue, 13 Sep 2011 09:22:03 +0000 (10:22 +0100)]
hvmloader: don't clear acpi_info after filling in some fields
In particular the madt_lapic0_addr and madt_csum_addr fields are
filled in while building the tables.
This fixes a bluescreen on shutdown with certain versions of Windows.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Christoph Egger <Christoph.Egger@amd.com>
Tested-and-acked-by: Christoph Egger <Christoph.Egger@amd.com>
Olaf Hering [Mon, 5 Sep 2011 14:10:09 +0000 (15:10 +0100)]
mem_event: add ref counting for free request slots
If many vcpus call mem_event_check_ring() at the same time, before any
of them has called mem_event_put_request(), all of them will assume
that enough free slots are available in the ring.
Record the number of request producers in mem_event_check_ring() to
keep track of available free slots.
Add a new mem_event_put_req_producers() function to release a request
attempt made in mem_event_check_ring(). It is required for
p2m_mem_paging_populate() because that function can only modify the
p2m type if there are free request slots. But in some cases
p2m_mem_paging_populate() does not actually have to produce another
request, when it is known that the same request was already made
earlier by a different vcpu.
mem_event_check_ring() cannot return a reference to a free request
slot because there could be multiple references for different vcpus
and the order of mem_event_put_request() calls is not known. As a
result, incomplete requests could be consumed by the ring user.
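The slot accounting described above behaves like a reservation counter. This is a single-threaded sketch with hypothetical names; the real code performs these updates under the ring lock.

```c
#include <assert.h>

/* Hypothetical model of a ring's slot accounting. */
struct ring_sketch {
    unsigned int free_slots;     /* slots not yet holding a request */
    unsigned int req_producers;  /* callers holding a reservation */
};

/* Like mem_event_check_ring(): reserve a slot unless every free slot is
 * already promised to another producer. Returns 0 on success, -1 if full. */
static int check_ring_sketch(struct ring_sketch *r)
{
    if (r->req_producers >= r->free_slots)
        return -1;
    r->req_producers++;
    return 0;
}

/* Like mem_event_put_req_producers(): drop a reservation without
 * producing a request. */
static void put_req_producers_sketch(struct ring_sketch *r)
{
    assert(r->req_producers > 0);
    r->req_producers--;
}

/* Like mem_event_put_request(): turn a reservation into a queued request. */
static void put_request_sketch(struct ring_sketch *r)
{
    assert(r->req_producers > 0 && r->free_slots > 0);
    r->req_producers--;
    r->free_slots--;
}
```

With two free slots, a third concurrent caller is refused until one of the first two either produces its request or releases its reservation.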
Andrew Cooper [Mon, 5 Sep 2011 14:09:24 +0000 (15:09 +0100)]
IRQ: Introduce old_vector to irq_cfg
Introduce old_vector to irq_cfg with the same principle as
old_cpu_mask. This removes a brute force loop from
__clear_irq_vector(), and paves the way to correct bitrotten logic
elsewhere in the irq code.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 5 Sep 2011 14:08:38 +0000 (15:08 +0100)]
IRQ: Fold irq_status into irq_cfg
irq_status is an int for each of nr_irqs which represents a single
boolean variable. Fold it into the bitfield in irq_cfg, which saves
768 bytes per CPU with per-cpu IDTs in use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Mon, 5 Sep 2011 14:00:46 +0000 (15:00 +0100)]
xen, vtd: Fix device check for devices behind PCIe-to-PCI bridges
On some systems, requests from devices behind a PCIe-to-PCI bridge all
appear to the IOMMU as though they come from slot 0, function 0 on
that bus; so the mapping code must punch a hole for X:0.0 in the
IOMMU for such devices. When punching the hole, if that device has
already been mapped once, we simply need to check ownership to make
sure it's legal. To do so, domain_context_mapping_one() will look up
the device for the mapping with pci_get_pdev() and look for the owner.
However, if there is no device in X:0.0, this look up will fail.
Rather than returning -ENODEV in this situation (causing a failure in
mapping the device), try to get the domain ownership from the iommu
context mapping itself.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
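The ownership lookup with the described fallback can be sketched as follows. The structures are hypothetical simplifications of a PCI device record and a VT-d context entry; a NULL pdev represents the failed lookup when no device exists at X:0.0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical simplified structures. */
struct pdev_sketch { int owner_domid; };
struct context_entry_sketch { int present; int domid; };

/* Returns the owning domain id, or -ENODEV if it cannot be determined. */
static int lookup_owner(const struct pdev_sketch *pdev,
                        const struct context_entry_sketch *ctx)
{
    if (pdev != NULL)
        return pdev->owner_domid;   /* normal case: device found */

    /* No device at X:0.0: fall back to the domain recorded in the
     * existing IOMMU context entry instead of failing the mapping. */
    if (ctx != NULL && ctx->present)
        return ctx->domid;
    return -ENODEV;
}
```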
George Dunlap [Mon, 5 Sep 2011 14:00:15 +0000 (15:00 +0100)]
xen: Add global irq_vector_map option, set if using AMD global intremap tables
As mentioned in previous changesets, AMD IOMMU interrupt
remapping tables only look at the vector, not the destination
id of an interrupt. This means that all IRQs going through
the same interrupt remapping table need to *not* share vectors.
The irq "vector map" functionality was originally introduced
after a patch which disabled global AMD IOMMUs entirely. That
patch has since been reverted, meaning that AMD intremap tables
can either be per-device or global.
This patch therefore introduces a global irq vector map option,
and enables it if we're using an AMD IOMMU with a global
interrupt remapping table.
This patch removes the "irq-perdev-vector-map" boolean
command-line option and replaces it with "irq_vector_map",
which can have one of three values: none, global, or per-device.
Setting the irq_vector_map to any value will override the
default that the AMD code sets.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
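A sketch of how a three-valued option can override a default chosen by the IOMMU code. The enum, parser, and helper names are hypothetical, and the AMD default is passed in as a parameter rather than guessed.

```c
#include <assert.h>
#include <string.h>

enum irq_vector_map_sketch {
    OPT_VMAP_DEFAULT,   /* not given on the command line */
    OPT_VMAP_NONE,
    OPT_VMAP_GLOBAL,
    OPT_VMAP_PERDEV,
};

/* Hypothetical parser for "irq_vector_map=none|global|per-device".
 * Returns -1 for an unrecognized value. */
static int parse_irq_vector_map(const char *s, enum irq_vector_map_sketch *out)
{
    assert(s != NULL && out != NULL);
    if (strcmp(s, "none") == 0)
        *out = OPT_VMAP_NONE;
    else if (strcmp(s, "global") == 0)
        *out = OPT_VMAP_GLOBAL;
    else if (strcmp(s, "per-device") == 0)
        *out = OPT_VMAP_PERDEV;
    else
        return -1;
    return 0;
}

/* Any user-supplied value overrides the default the AMD code picks. */
static enum irq_vector_map_sketch
effective_map(enum irq_vector_map_sketch user,
              enum irq_vector_map_sketch amd_default)
{
    return user != OPT_VMAP_DEFAULT ? user : amd_default;
}
```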
ns16550: Simplify UART and UART-interrupt probing logic.
1. No need to check for UART existence in the polling routine. We
already check for UART existence during boot-time initialisation (see
check_existence() function).
2. No obvious need to send a dummy character. The poll routine will
run until a character is eventually sent, but for the most common use
of serial ports (console logging) that will happen almost immediately.
xen: __hvm_pci_intx_assert should check for gsis remapped onto pirqs
If the ISA IRQ corresponding to a particular GSI is disabled while the
GSI is enabled, __hvm_pci_intx_assert will always inject the GSI
through the vIOAPIC, even if the GSI has been remapped onto a pirq.
This patch makes sure that even in this case we inject the
notification appropriately.
hvm_domain_use_pirq should return true when the guest is using a
certain pirq, no matter whether the corresponding event channel is
currently enabled or disabled. As an additional complication, qemu is
going to request pirqs for passthrough devices even for Xen-unaware
HVM guests, so we need to wait for an event channel to be connected
before considering the pirq of a passthrough device as "in use".
Andrew Cooper [Wed, 31 Aug 2011 14:19:24 +0000 (15:19 +0100)]
IRQ: manually EOI migrating line interrupts
When migrating IO-APIC line level interrupts between PCPUs, the
migration code rewrites the IO-APIC entry to point to the new
CPU/Vector before EOI'ing it.
The EOI process says that EOI'ing the Local APIC will cause a
broadcast with the vector number, which the IO-APIC must listen for in
order to clear the IRR and Status bits.
In the case of migrating, the IO-APIC has already been
reprogrammed so the EOI broadcast with the old vector fails to match
the new vector, leaving the IO-APIC with an outstanding vector,
preventing any further use of that line interrupt. This causes a
lockup, especially when your root device is using PCI INTA
(megaraid_sas driver *ahem*).
However, the problem is mostly hidden because send_cleanup_vector()
causes a cleanup of all moving vectors on the current PCPU in a way
which does not trigger the problem, and if the problem has occurred,
the writes it makes to the IO-APIC clear the IRR and Status bits,
which resolves the lockup.
This fix is distinctly a temporary hack, pending a cleanup of the
irq code. It checks for the edge case where we have moved the irq,
and manually EOIs the old vector with the IO-APIC, which correctly
clears the IRR and Status bits. Also, it protects the code which
updates irq_cfg by disabling interrupts.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Laszlo Ersek [Wed, 31 Aug 2011 14:16:14 +0000 (15:16 +0100)]
x86: Increase the default NR_CPUS to 256
Changeset 21012:ef845a385014 bumped the default to 128 about one and a
half years ago. Increase it now to 256, as systems with e.g. 160
logical CPUs are becoming (have become) common.
Jan Beulich [Sat, 27 Aug 2011 11:14:38 +0000 (12:14 +0100)]
x86: work around certain Intel BIOSes causing (transient) hangs during boot
They apparently leave the USB legacy emulation bits set in ICH10's
SMI Control and Enable register, but fail to handle the resulting SMIs
gracefully. The hangs can apparently extend indefinitely, but are
commonly observed to last between a few seconds and a minute.
This assumes that only ICH10-based systems on Intel main boards with
Intel BIOS may be affected. Until Intel comes up with a more precise
identification of affected BIOSes, all Intel ones on Intel boards
will get this workaround applied.
Jan Beulich [Sat, 27 Aug 2011 11:13:39 +0000 (12:13 +0100)]
x86-64: allow mapping mmcfg space for high numbered PCI segments
Rather than using the segment number directly when determining the
virtual address for a particular mmconfig block, use the array index
instead. Thus a system with (perhaps significantly) fewer than 2048 PCI
segments, but with some numbered beyond 2047, can actually have all
its mmconfig blocks mapped.
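The difference between the two addressing schemes can be sketched as follows. The virtual base address and helper names are hypothetical; a full segment's config space is 256 buses x 1 MiB = 256 MiB, hence a 28-bit slot shift, and 2048 slots fit in the assumed window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical virtual window with room for 2048 slots of 256 MiB each
 * (256 buses x 32 devices x 8 functions x 4 KiB). */
#define MMCFG_VIRT_BASE_SKETCH 0xffff840000000000ull
#define MMCFG_SLOT_SHIFT       28
#define MMCFG_MAX_SLOTS        2048u

struct mmcfg_block_sketch { uint16_t segment; };

/* Old scheme: slot chosen by segment number; segment 3000 cannot fit. */
static int virt_by_segment(const struct mmcfg_block_sketch *b, uint64_t *va)
{
    if (b->segment >= MMCFG_MAX_SLOTS)
        return -1;
    *va = MMCFG_VIRT_BASE_SKETCH + ((uint64_t)b->segment << MMCFG_SLOT_SHIFT);
    return 0;
}

/* New scheme: slot chosen by the block's index in the mmconfig array. */
static int virt_by_index(unsigned int idx, uint64_t *va)
{
    if (idx >= MMCFG_MAX_SLOTS)
        return -1;
    *va = MMCFG_VIRT_BASE_SKETCH + ((uint64_t)idx << MMCFG_SLOT_SHIFT);
    return 0;
}
```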
Without the 'break', assigning a PCI device to a PV guest results in an
abort, since the code always falls through to the default abort case in
the switch statement.
Signed-off-by: Kaushik Kumar Ram <kaushik@rice.edu>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
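The bug pattern is ordinary C switch fall-through. In this sketch with hypothetical names, removing the marked `break` would make the DEV_PCI case run into `default:` and call abort().

```c
#include <assert.h>
#include <stdlib.h>

enum dev_kind_sketch { DEV_PCI, DEV_OTHER };

/* Returns 1 if the device was handled. */
static int handle_device(enum dev_kind_sketch kind)
{
    int handled = 0;

    switch (kind) {
    case DEV_PCI:
        handled = 1;
        break;        /* without this, control falls into default */
    default:
        abort();
    }
    assert(handled);
    return handled;
}
```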
George Dunlap [Mon, 22 Aug 2011 15:15:33 +0000 (16:15 +0100)]
x86: Fix up irq vector map logic
We need to make sure that cfg->used_vector is only cleared once;
otherwise there may be a race condition that allows the same vector to
be assigned twice, defeating the whole purpose of the map.
This makes two changes:
* __clear_irq_vector() only clears the vector if the irq is not being
moved
* smp_irq_move_cleanup_interrupt() only clears used_vector if this
is the last place it's being used (move_cleanup_count==0 after
decrement).
Also make use of asserts more consistent, to catch this kind of logic
bug in the future.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
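The "clear only once, by the last user" rule from the second bullet can be sketched as a decrement-and-test. The structure and function are hypothetical simplifications of the per-IRQ move state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified per-IRQ move state. */
struct irq_move_sketch {
    bool used_vector[256];            /* vector map: true = taken */
    unsigned int old_vector;
    unsigned int move_cleanup_count;  /* CPUs still holding old_vector */
};

/* Called on each CPU that finishes cleaning up the old vector. Only the
 * last one releases the map entry, so it is cleared exactly once and
 * cannot be handed out again while another CPU still uses it. */
static void move_cleanup_one_cpu(struct irq_move_sketch *m)
{
    assert(m->move_cleanup_count > 0);
    if (--m->move_cleanup_count == 0) {
        assert(m->used_vector[m->old_vector]);   /* not cleared before */
        m->used_vector[m->old_vector] = false;
    }
}
```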
Jan Beulich [Mon, 22 Aug 2011 09:12:36 +0000 (10:12 +0100)]
ACPI: add _PDC input override mechanism
In order to have Dom0 call _PDC with input fully representing Xen's
capabilities, and in order to avoid building knowledge of Xen
implementation details into Dom0, this provides a mechanism by which
the Dom0 kernel can, once it has filled the _PDC input buffer according to
its own knowledge, present the buffer to Xen to apply overrides for
the parts of the C-, P-, and T-state management that it controls. In
particular, this addresses Xen's dependency, when using MWAIT to enter
certain C-states, on the availability of the break-on-interrupt
extension (which the Dom0 kernel should have no need to know about).
Jan Beulich [Mon, 22 Aug 2011 09:11:10 +0000 (10:11 +0100)]
x86/IO-APIC: clear remoteIRR in clear_IO_APIC_pin()
It was found that in a crash scenario, the remoteIRR bit in an IO-APIC
RTE could be left set, causing problems when bringing up a kdump
kernel. While this generally is most important to be taken care of in
the new kernel (which usually would be a native one), it still seems
desirable to also address this problem in Xen so that (a) the problem
doesn't bite Xen when used as a secondary emergency kernel and (b) an
attempt is being made to save un-fixed secondary kernels from running
into said problem.
Based on a Linux patch from suresh.b.siddha@intel.com.
David Vrabel [Mon, 22 Aug 2011 09:05:27 +0000 (10:05 +0100)]
x86: use 'dom0_mem' to limit the number of pages for dom0
Use the 'dom0_mem' command line option to set the maximum number of
pages for dom0. dom0 can then use the XENMEM_maximum_reservation
memory op to automatically find this limit and reduce the size of any
page tables etc.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Tim Deegan [Fri, 19 Aug 2011 12:29:27 +0000 (13:29 +0100)]
nestedhvm: avoid endless loop of nested page faults
Stop sending IPIs to flush the nested-on-nested pagetable
after write operations. Instead flush the TLB only.
This fixes an endless loop of nested page faults after
adding an entry to the nested-on-nested pagetable.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Fri, 19 Aug 2011 08:58:22 +0000 (09:58 +0100)]
x86/KEXEC: disable hpet legacy broadcasts earlier
On x2apic machines which booted in xapic mode,
hpet_disable_legacy_broadcast() sends an event check IPI to all online
processors. This leads to a protection fault as the genapic blindly
pokes x2apic MSRs while the local apic is in xapic mode.
One option is to change genapic when we shut down the local apic, but
there are still problems with trying to IPI processors in the online
processor map which are actually sitting in NMI loops.
Another option is to have each CPU take itself out of the online CPU
map during the NMI shootdown.
Realistically, however, disabling hpet legacy broadcasts earlier in the
kexec path is the easiest fix to the problem.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
mini-os: work around ld bug causing stupid CTOR count
I'm seeing pvgrub crash when running CTORs. It appears it's because
the magic in the linker script is generating junk. If I get ld to
output a map, I see:
Jan Beulich [Fri, 19 Aug 2011 08:54:53 +0000 (09:54 +0100)]
x86: trampoline cleanup
To make future changes less error-prone, and to slightly simplify a
possible future conversion to a relocatable trampoline even for the
multiboot path (pretty desirable given that we had to change the
trampoline base a number of times to escape collisions with firmware
placed data),
- remove final uses of bootsym_phys() from trampoline.S, allowing the
symbol to be undefined before including this file (to make sure no
new references get added)
- replace two easy to deal with uses of bootsym_phys() in head.S
- remove an easy to replace reference to BOOT_TRAMPOLINE
Jan Beulich [Fri, 19 Aug 2011 08:54:26 +0000 (09:54 +0100)]
x86: make run-time part of trampoline relocatable
In order to eliminate an initial hack in the EFI boot code (where
memory for the trampoline was just "claimed" instead of properly
allocated), the trampoline code must no longer make assumptions about the
address at which it will be located. For the time being, the fixed
address is being retained for the traditional multiboot path.
As an additional benefit (at least from my POV), it allows confining
the visibility of the BOOT_TRAMPOLINE definition to just the boot
code.
Jan Beulich [Tue, 16 Aug 2011 14:05:55 +0000 (15:05 +0100)]
x86: simplify (and fix) clear_IO_APIC{,_pin}()
These are used during bootup and (emergency) shutdown only, and their
only purpose is to get the actual IO-APIC's RTE(s) cleared.
Consequently, only the "raw" accessors should be used (and the ones
going through interrupt remapping code can be skipped), with the
exception of determining the delivery mode: This one must always go
through the interrupt remapping path, as in the VT-d case the actual
IO-APIC's RTE will have the delivery mode always set to zero (which
before possibly could have resulted in such an entry getting cleared
in the "raw" pass, though I haven't observed this case in practice).
Jan Beulich [Tue, 16 Aug 2011 14:05:30 +0000 (15:05 +0100)]
passthrough: don't use open coded IO-APIC accesses
This makes the respective functions quite a bit more legible.
Since this requires fiddling with __ioapic_{read,write}_entry()
anyway, make them and their wrappers have their argument types match
those of __io_apic_{read,write}() (int -> unsigned int).
Jan Beulich [Tue, 16 Aug 2011 14:05:03 +0000 (15:05 +0100)]
x86-64/mmcfg: relax base address restriction
Following what Linux did quite a while ago, don't generally disallow
MMCFG base addresses above the 4GB boundary: new systems are
assumed to be fine, and SGI ones are, too.
I have constructed the attached patch which reverts c/s 23733
(adjusted for conflicts due to subsequent patches). With this
reversion Xen once more boots on these machines.
23733 has been in the tree for some time now, causing this breakage,
and has already been fingered by the automatic bisector and discussed
on xen-devel as the cause of boot failures. I think it is now time to
revert it pending a correct fix to the original problem.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Wang [Tue, 16 Aug 2011 14:03:11 +0000 (15:03 +0100)]
amd iommu: Automatic page coalescing
This patch implements automatic page coalescing when a separate IO page
table is used. It uses the ignored bits in IOMMU PDEs to cache how many
entries in the next lower page level are suitable for coalescing, and
then builds a superpage entry when all lower entries are contiguous.
This patch has been tested OK for weeks, mainly with graphics devices
and 3DMark Vantage.
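The condition under which a lower-level table can be replaced by one superpage entry can be sketched as a full scan. This is a simplification: the patch instead caches a running count in the ignored PDE bits to avoid rescanning, and real code must also check that permission bits match. The PTE layout here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTES_PER_TABLE 512u

/* Hypothetical simplified PTE: present bit plus machine frame number. */
struct pte_sketch { bool present; uint64_t mfn; };

/* True if all entries map one naturally aligned, physically contiguous
 * run, so the parent could hold a single superpage entry instead. */
static bool can_coalesce(const struct pte_sketch tbl[PTES_PER_TABLE])
{
    assert(tbl != 0);
    if (!tbl[0].present || tbl[0].mfn % PTES_PER_TABLE != 0)
        return false;
    for (unsigned int i = 1; i < PTES_PER_TABLE; i++)
        if (!tbl[i].present || tbl[i].mfn != tbl[0].mfn + i)
            return false;
    return true;
}
```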
Jan Beulich [Sat, 13 Aug 2011 09:14:58 +0000 (10:14 +0100)]
x86/PCI-MSI: properly determine VF BAR values
As was discussed a couple of times on this list, SR-IOV virtual
functions have their BARs read as zero; the physical function's
SR-IOV capability structure must be consulted instead. This change
eliminates the bogus warnings people complained about.
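In the SR-IOV capability, each VF BAR register holds the base for VF 0, and each subsequent VF's BAR follows at a stride equal to that BAR's size. The sketch below assumes the base and size have already been read and sized from the PF's capability; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of one BAR of a given VF (0-based vf_index), computed from
 * the PF's SR-IOV capability values. */
static uint64_t vf_bar_address(uint64_t vf_bar_base, uint64_t vf_bar_size,
                               unsigned int vf_index, unsigned int num_vfs)
{
    assert(vf_index < num_vfs);
    return vf_bar_base + (uint64_t)vf_index * vf_bar_size;
}
```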
Andrew Cooper [Sat, 13 Aug 2011 09:14:28 +0000 (10:14 +0100)]
x86: IRQ fix incorrect logic in __clear_irq_vector
In the old code, tmp_mask is the cpu_and of cfg->cpu_mask and
cpu_online_map. However, in the usual case of moving an IRQ from one
PCPU to another because the scheduler decides it's a good idea,
cfg->cpu_mask and cfg->old_cpu_mask do not intersect. This causes the
old cpu vector_irq table to keep the irq reference when it shouldn't.
This leads to a resource leak if a domain is shut down while an irq has
a move pending, which results in Xen's create_irq() eventually failing
with -ENOSPC when all vector_irq tables are full of stale references.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
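The mask logic can be illustrated with CPU masks as 64-bit bitmaps. This sketch of a possible fix assumes that, while a move is pending, the CPUs in old_cpu_mask must also have their vector_irq entries cleared; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_sketch;   /* bit n = CPU n */

/* Old logic: only CPUs in the current mask are cleaned. */
static cpumask_sketch cpus_to_clean_old(cpumask_sketch cpu_mask,
                                        cpumask_sketch online)
{
    return cpu_mask & online;
}

/* Sketch of the fix: with a move pending, the old CPUs still hold stale
 * vector_irq references and must be cleaned too. */
static cpumask_sketch cpus_to_clean_fixed(cpumask_sketch cpu_mask,
                                          cpumask_sketch old_cpu_mask,
                                          int move_pending,
                                          cpumask_sketch online)
{
    cpumask_sketch m = cpu_mask;

    if (move_pending)
        m |= old_cpu_mask;
    assert((m & cpu_mask) == cpu_mask);
    return m & online;
}
```

Moving an IRQ from CPU 1 to CPU 2 gives disjoint masks (0x2 and 0x4), so the old logic never cleans CPU 1, while the fixed version cleans both.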