Keir Fraser [Wed, 4 Feb 2009 12:26:00 +0000 (12:26 +0000)]
linux: fix IRQ handling for PV passthrough
For DomU-s, registering PIRQ-s must be done separately, as they don't
use the IO-APIC code.
Additionally make sure the IRQ chip doesn't get set twice (and the
event channel information overwritten) for an IRQ possibly in use by
more than one device.
Keir Fraser [Wed, 4 Feb 2009 12:25:09 +0000 (12:25 +0000)]
linux: remove xen specific member from pci_dev
Move the MSI-related variable irq_old out of struct pci_dev. This is
logically more consistent and has the additional benefit that the Xen
kernel and the vanilla kernel now have the same pci_dev layout.
Keir Fraser [Tue, 3 Feb 2009 13:59:17 +0000 (13:59 +0000)]
fbfront: Improve diagnostics when kthread_run() fails
Failure is reported with xenbus_dev_fatal(..."register_framebuffer"),
which was already suboptimal before it got moved away from
register_framebuffer(), and is outright misleading now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Keir Fraser [Wed, 14 Jan 2009 14:03:42 +0000 (14:03 +0000)]
revert: "netfront/back: do not mark packets of length < MSS as GSO"
changeset: 774:107e10e0e07c
user: Keir Fraser <keir.fraser@citrix.com>
date: Tue Jan 13 15:17:54 2009 +0000
summary: netfront/back: do not mark packets of length < MSS as GSO
Herbert Xu suggested a better fix in the network
stack which will follow.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 13 Jan 2009 15:17:54 +0000 (15:17 +0000)]
netfront/back: do not mark packets of length < MSS as GSO
Linux assumes that skbs marked for GSO are longer than MSS. In
particular tcp_tso_segment assumes that skb_segment will return a
chain of at least 2 skbs.
Both netfront and netback should therefore not pass such a packet up the
stack.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
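The length check described above can be illustrated with a minimal user-space sketch. It is not the netfront/netback code: struct pkt and both helpers are hypothetical stand-ins for the relevant sk_buff fields, and the comparison assumes a packet only needs GSO when its payload is strictly longer than the MSS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the sk_buff fields that matter
 * here; this is not the kernel's real layout. */
struct pkt {
    size_t len;       /* payload length in bytes */
    size_t gso_size;  /* MSS requested for segmentation, 0 if not GSO */
};

/* A packet only needs GSO when segmentation would yield at least two
 * segments, i.e. when the payload is strictly longer than the MSS. */
static bool pkt_needs_gso(const struct pkt *p)
{
    return p->gso_size != 0 && p->len > p->gso_size;
}

/* Clear the GSO marking on packets that are too short to segment, so
 * the stack never sees a GSO packet that fits in a single segment. */
static void pkt_sanitize_gso(struct pkt *p)
{
    if (!pkt_needs_gso(p))
        p->gso_size = 0;
}
```

With an assumed MSS of 1448, a 3000-byte packet keeps its GSO marking while a 1000-byte packet has it cleared.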
This patch fixes some weird issues in upstream.
Dom0 uses one page shared with the hypervisor to indicate which PIRQs need EOI
writes, but the page is set up incorrectly on ia64 for the following reasons:
1. The two related hypercalls are not enabled correctly, so this page
is not actually used by dom0, and the hypervisor does nothing when dom0 writes EOI.
Keir Fraser [Thu, 18 Dec 2008 11:51:36 +0000 (11:51 +0000)]
netback: handle non-netback foreign pages
An SKB can contain pages which are foreign but not tracked by netback,
such as those created by gnttab_copy_grant_page when in
NETBK_DELAYED_COPY_SKB mode. These pages do not have a mapping field
which points to a valid offset in the pending_tx_info array.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 11 Dec 2008 13:38:48 +0000 (13:38 +0000)]
add hvc compatibility mode to xencons.
Makes switching back and forth with a pvops kernel easier. Taken from
http://lists.alioth.debian.org/pipermail/pkg-xen-devel/2008-October/002098.html
http://svn.debian.org/viewsvn/kernel?rev=12337&view=rev with thanks to
Bastian Blank.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Isaku Yamahata [Wed, 3 Dec 2008 02:38:32 +0000 (11:38 +0900)]
IA64: xencomm support for multi call with physdev_op and event_channel_op.
Recently, c/s d545a95fca73 started making use of multicalls
with __HYPERVISOR_event_channel_op and __HYPERVISOR_physdev_op.
This patch adds support for those hypercalls.
Keir Fraser [Tue, 2 Dec 2008 11:54:47 +0000 (11:54 +0000)]
Fix buggy mask_base in saving/restoring MSI-X table during S3
Fix mask_base (actually the MSI-X table base; the name is copied from
native) to be a virtual address rather than a physical address. Also
remove an incorrect printk in pci_disable_msix.
Keir Fraser [Fri, 28 Nov 2008 13:07:36 +0000 (13:07 +0000)]
dom0 linux: Fix and cleanup reassigning memory resource code.
When we use PCI pass-through, we have to assign page-aligned resources
to the device. To do this, we round up the alignment to PAGE_SIZE if the
device is specified by the "reassigndev=" boot parameter.
The "pdev_sort_resources" function uses the alignment, but it does not
round it up to PAGE_SIZE. This patch makes "pdev_sort_resources" round
up the alignment to PAGE_SIZE.
The "pbus_size_mem" function rounds up the alignment of a bridge's resource
window as well as that of a normal resource, but this is not needed.
This patch makes "pbus_size_mem" exclude bridges' resource windows.
This patch also cleans up the code for reassigning memory resources.
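The rounding step can be sketched in plain C as follows. This is only an illustration, not the code from the patch: SKETCH_PAGE_SIZE (4096) and page_align_up() are assumed names, and the real kernel code works on struct resource rather than bare integers.

```c
#include <stdint.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Round an alignment (or size) up to the next multiple of the page
 * size. Values that are already page multiples are returned unchanged. */
static uint64_t page_align_up(uint64_t value)
{
    return (value + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

For example, a 256-byte BAR alignment becomes 4096, while an 8192-byte alignment stays as it is.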
Keir Fraser [Mon, 24 Nov 2008 11:04:54 +0000 (11:04 +0000)]
pciback: error handler for PCIE_AER.
This patch is the main implementation for enabling PCIE_AER handling,
adding related pci error handler in pciback and pcifront.
When a device sends a PCIE error message to the root port, it will
trigger an interrupt. The irq handler will then collect the root error
status register, then schedule work to process the error based on
the error type.
If the error is a non-correctable error (fatal or non-fatal), the AER
service driver will call the callback functions of the endpoint's
driver. For a bridge, it will broadcast the error to the downstream
ports. The pciback error handler will be called accordingly. Pciback then
asks pcifront to help call the end-device driver to complete
the related PCI error handling jobs.
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Ke Liping <liping.ke@intel.com>
Keir Fraser [Wed, 19 Nov 2008 13:15:46 +0000 (13:15 +0000)]
linux/x86: remove broken HYPERVISOR_acm_op()
That hypercall apparently never really worked (it's being passed two
arguments, but the hypercall entry point code only loaded one, while
do_acm_op() again consumed two), appears to be pointless in the kernel
anyway, and there's been no __HYPERVISOR_acm_op for quite a while.
Keir Fraser [Tue, 18 Nov 2008 16:04:04 +0000 (16:04 +0000)]
linux, S3: dom0 doesn't need save ioapic state
Dom0 doesn't need to save/restore ioapic state across S3
suspend/resume, as Xen already does it. More importantly, this
avoids warnings on some platforms that may have uninitialized
RTEs set to odd values (such as SMI mode) but masked. When dom0
saves those entries and later writes them back, it can easily
trigger Xen's sanity check in ioapic_guest_write.
Keir Fraser [Fri, 7 Nov 2008 17:04:20 +0000 (17:04 +0000)]
xen: Shouldn't remove device in pci_bus_probe_wrapper()
pci_bus_probe_wrapper() first adds (assigns) a device to dom0,
but if pci_bus_probe() for the device fails (no driver is available), the
device is removed (deassigned) from dom0. PCIe-to-PCI
bridges are therefore removed from dom0 when they are hooked by
pci_bus_probe_wrapper(); that is, they are not mapped in the VT-d
page table, so the PCI devices under these bridges cannot work. This
situation occurs when pciback is installed as a module, because pciback
probes these bridges and removes them from dom0. Built-in pciback does not
cause this problem because these bridges (for example 00:1e.0) are
probed before their devices (for example 02:00.0). (When a PCI
device (02:00.0) is mapped to VT-d, its PCIe-to-PCI bridge
(00:1e.0) is mapped to VT-d as well.)
So I think devices should not be removed (deassigned) from dom0 when
pci_bus_probe() fails. Every device that can DMA should be mapped in
VT-d when VT-d is enabled, but the current code makes it possible for some
of these devices not to be mapped into VT-d.
From: Weidong Han <weidong.han@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 5 Nov 2008 15:43:55 +0000 (15:43 +0000)]
prevent invalid or unsupportable PIRQs from being used (v2)
By keeping the respective irq_desc[] entries pointing to no_irq_type,
setup_irq() (and thus request_irq()) will fail for such IRQs. This
matches native behavior, which also only installs ioapic_*_type out of
ioapic_register_intr().
At the same time, make assign_irq_vector() fail not only when Xen
doesn't support the PIRQ, but also if the IRQ requested doesn't fall
in the kernel's PIRQ space.
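The effect of leaving an IRQ on no_irq_type can be modelled with a small user-space sketch. Everything here is hypothetical: the sketch_* names, the table size of 16 and the error codes are assumptions for illustration, not the kernel's implementation.

```c
#include <errno.h>

/* Assumed size of the PIRQ space for the sketch. */
#define SKETCH_NR_PIRQS 16u

/* CHIP_NONE plays the role of no_irq_type: the default for every IRQ. */
enum chip_kind { CHIP_NONE = 0, CHIP_PIRQ };

static enum chip_kind irq_chip_of[SKETCH_NR_PIRQS];

/* Model of assign_irq_vector(): fail if the IRQ lies outside the PIRQ
 * space or Xen does not support it, and leave the chip untouched. */
static int sketch_assign_irq_vector(unsigned int irq, int xen_supports_it)
{
    if (irq >= SKETCH_NR_PIRQS || !xen_supports_it)
        return -EINVAL;
    irq_chip_of[irq] = CHIP_PIRQ;
    return 0;
}

/* Model of setup_irq(): refuse any IRQ that never got a real chip. */
static int sketch_setup_irq(unsigned int irq)
{
    if (irq >= SKETCH_NR_PIRQS || irq_chip_of[irq] == CHIP_NONE)
        return -ENOSYS;
    return 0;
}
```

In this model, requesting an IRQ succeeds only after a successful assignment; an unsupported or out-of-range PIRQ keeps CHIP_NONE and every later setup attempt fails.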
Keir Fraser [Wed, 5 Nov 2008 14:45:34 +0000 (14:45 +0000)]
linux: prevent invalid or unsupportable PIRQs from being used
By keeping the respective irq_desc[] entries pointing to no_irq_type,
setup_irq() (and thus request_irq()) will fail for such IRQs. This
matches native behavior, which also only installs ioapic_*_type out of
ioapic_register_intr().
At the same time, make assign_irq_vector() fail not only when Xen
doesn't support the PIRQ, but also if the IRQ requested doesn't fall
in the kernel's PIRQ space.
Intel processors starting with the Core Duo support
processor native C-states using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm
Platform firmware exports the support for Native C-state to the OS
using ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm
With Processor Native C-state, we use the 'MWAIT' instruction on the
processor to enter different C-states (C1, C2, C3). We won't use the
special IO ports to enter C-state, and no SMM mode etc. is required to
enter C-state. Overall this will mean better C-state support.
One major advantage of using MWAIT for all C-states is, with this
and "treat interrupt as break event" feature of MWAIT, we can now get
accurate timing for the time spent in C1, C2, .. states.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
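As a rough illustration of how an MWAIT hint value relates to a target C-state, the sketch below decodes a hint under the assumption that bits 7:4 hold the target C-state minus one and bits 3:0 hold the sub-state. The helper names are hypothetical, and the real hint values come from the platform's _CST data rather than from code like this.

```c
#include <stdint.h>

/* Assumed layout of the MWAIT hint passed in EAX:
 * bits 7:4 = target C-state minus one, bits 3:0 = sub-state. */
#define SKETCH_MWAIT_SUBSTATE_BITS 4u
#define SKETCH_MWAIT_FIELD_MASK    0xFu

/* Target C-state number encoded in the hint (1 for C1, 2 for C2, ...).
 * The final mask makes a field value of 0xF wrap around to 0. */
static unsigned int mwait_hint_cstate(uint32_t hint)
{
    return (((hint >> SKETCH_MWAIT_SUBSTATE_BITS) & SKETCH_MWAIT_FIELD_MASK) + 1)
           & SKETCH_MWAIT_FIELD_MASK;
}

/* Sub-state within the target C-state. */
static unsigned int mwait_hint_substate(uint32_t hint)
{
    return hint & SKETCH_MWAIT_FIELD_MASK;
}
```

Under this assumed layout, a hint of 0x00 targets C1, 0x10 targets C2, and 0x21 targets C3 with sub-state 1.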
Keir Fraser [Mon, 27 Oct 2008 10:43:45 +0000 (10:43 +0000)]
Fix IRQ-from-evtchn delivery so that softirq handling does not happen
while IRQ delivery is blocked. We do this by moving irq_enter/irq_exit
outside the mutual-exclusion region in evtchn_do_upcall(). We then
have to remove irq_enter/irq_exit from do_IRQ(), otherwise the
preempt_count check in the rcu code will always fail and we hang
during boot.
Thanks to Eduard Guzovsky of Stratus for help with this patch.
Keir Fraser [Fri, 17 Oct 2008 11:01:56 +0000 (12:01 +0100)]
dom0 linux: Fix issue on reassigning resources to PCI-PCI bridge.
This patch fixes the issue on reassigning resources to PCI-PCI bridge,
which was found by Zhao, Yu.
The current "quirk_align_mem_resources" updates the Base/Limit registers of
a PCI-PCI bridge if IORESOURCE_MEM is set in "dev->resource[i].flags".
But when "quirk_align_mem_resources" is called, dev->resource[i].flags
is not yet initialized, because "quirk_align_mem_resources" is called
before "pci_read_bridge_bases". As a result, the current code does not
update the Base/Limit registers.
This patch sets the Base register to all Fs and the Limit register to 0,
regardless of "dev->resource[i].flags". After that,
"pci_assign_unassigned_resources" calculates the resource window size and
assigns resources to the PCI-PCI bridge.
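Why "all F" in Base and 0 in Limit closes the window can be seen from how the 16-bit memory Base/Limit registers of a PCI-PCI bridge decode into addresses. The following is a minimal sketch with hypothetical helper names; it is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* The upper 12 bits of each 16-bit register supply address bits 31:20.
 * The low 20 bits of the base are treated as zeros. */
static uint32_t bridge_mem_base(uint16_t reg)
{
    return (uint32_t)(reg & 0xFFF0u) << 16;
}

/* The low 20 bits of the limit are treated as ones. */
static uint32_t bridge_mem_limit(uint16_t reg)
{
    return ((uint32_t)(reg & 0xFFF0u) << 16) | 0x000FFFFFu;
}

/* The bridge forwards memory cycles only while base <= limit. */
static bool bridge_mem_window_open(uint16_t base_reg, uint16_t limit_reg)
{
    return bridge_mem_base(base_reg) <= bridge_mem_limit(limit_reg);
}
```

With Base set to 0xFFFF and Limit set to 0x0000, the decoded base (0xFFF00000) is above the decoded limit (0x000FFFFF), so the window is closed until resources are assigned again.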
Keir Fraser [Fri, 10 Oct 2008 08:58:50 +0000 (09:58 +0100)]
This patch adds power management support to the ahci driver,
which is necessary for S3.
It is back-ported from the Linux kernel mainline tree.
More precisely, the patch is the diff between commit c1332875cbe0c148c7f200d4f9b36b64e34d9872 and tag v2.6.18.
[PATCH] ahci: separate out ahci_reset_controller() and ahci_init_controller()
Separate out ahci_reset_controller() and ahci_init_controller() from
ata_host_init(). These will be used by PM callbacks. This patch
doesn't introduce any behavior change.
[PATCH] libata: improve driver initialization and deinitialization
Implement ahci_[de]init_port() and use it during initialization and
de-initialization. ahci_[de]init_port() are supersets of what used to
be done during driver [de-]initialization. This patch makes the
following behavior changes.
* Per-port IRQ mask is cleared on driver load as done in other
drivers. The mask will be configured properly during probe.
* During init_one(), HOST_IRQ_STAT is cleared after masking port IRQs
such that there is no race window.
* CMD_SPIN_UP is cleared during init_one() instead of being set. It
is set in port_start(). This is more consistent with overall
structure of initialization. Note that CMD_SPIN_UP simply controls
PHY activation.
* Slumber and staggered spin-up are handled properly.
* All init/deinit operations are done in step-by-step manner as
described in the spec instead of issued as single merged command.
Original implementation is from Zhao, Forrest <forrest.zhao@intel.com>
Simplify ahci_start_engine() by killing prerequisite condition checks.
Rationales are..
* No user checks error return from ahci_start_engine()
* Code flow guarantees the prerequisite conditions unless the
controller is malfunctioning. In such cases, the driver had chances
to learn about the problem _before_ calling this function.
* Closely related to the above two, driver calls into this function
even when prerequisites fail hoping for the best.
Basically, ahci_start_engine() should only do the operation itself.
It isn't the right place to check for prerequisites.
* move ahci_port_start/stop() below EH functions. This makes ahci
more consistent with other drivers and makes prototypes for
ahci_start/stop_engine() unnecessary.
* swap positions between ahci_start_engine() and ahci_stop_engine()
for readability.
[PATCH] The redefinition of ahci_start_engine() and ahci_stop_engine()
- Make ahci_start_engine() and ahci_stop_engine() more consistent with
AHCI spec 1.1
- Change their input parameter from ap to port_mmio
- Update the existing users of ahci_start_engine() and
ahci_stop_engine()
Signed-off-by: Forrest Zhao <forrest.zhao@intel.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
=============
Some of the commits above may not be directly related to the ahci pm
problem, but they lay the basic ground for the final commit.
Keir Fraser [Thu, 9 Oct 2008 10:10:43 +0000 (11:10 +0100)]
xen/dom0: Reassign memory resources to device for pci passthrough.
This patch adds a function to dom0 Linux that reassigns page-aligned
memory resources. The function is useful when we assign an I/O
device to an HVM domain using PCI passthrough.
When we assign a device to an HVM domain using PCI passthrough,
the device needs to be assigned page-aligned memory resources. If the
memory resource is not page-aligned, the following error occurs.
Error: pci: 0000:00:1d.7: non-page-aligned MMIO BAR found.
On many systems, the BIOS assigns memory resources to the device and
enables it. So my patch disables the device and releases its resources,
then assigns page-aligned memory resources to the device.
To reassign resources, please add boot parameters to dom0 Linux as
follows.
reassign_resources reassigndev=00:1d.7,01:00.0
reassign_resources
Enables reassigning resources.
reassigndev=
Specifies the devices, including I/O devices and PCI-PCI bridges,
whose resources are to be reassigned. A PCI-PCI bridge can be
specified if its resource windows need to be expanded.
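How a "reassigndev=" list such as 00:1d.7,01:00.0 might be matched against a device can be sketched as follows. This is an assumption-laden illustration: reassigndev_matches() is a hypothetical helper, it uses sscanf() from the C library rather than kernel string routines, and it only understands the bus:slot.func form shown above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if bus:slot.func appears in a comma-separated list of
 * hexadecimal "bus:slot.func" entries such as "00:1d.7,01:00.0". */
static bool reassigndev_matches(const char *param, unsigned int bus,
                                unsigned int slot, unsigned int func)
{
    const char *p = param;

    while (p != NULL && *p != '\0') {
        unsigned int b, s, f;

        /* Parse one entry; malformed entries are simply skipped. */
        if (sscanf(p, "%x:%x.%x", &b, &s, &f) == 3 &&
            b == bus && s == slot && f == func)
            return true;

        /* Advance to the entry after the next comma, if any. */
        p = strchr(p, ',');
        if (p != NULL)
            p++;
    }
    return false;
}
```

With the example parameter, devices 00:1d.7 and 01:00.0 match and any other device does not.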
Keir Fraser [Thu, 9 Oct 2008 09:11:13 +0000 (10:11 +0100)]
dom0: Fix bad pte at booting time
Backport upstream kernel patch to fix Dom0's bad pte bug.
- In the Dom0 kernel, at boot time, the system calls bt_ioremap() to set up
mappings for the Boot Time Fix Memory region. The system also calls
bt_iounmap() to unmap the memory region by setting phys=0. In this
case, the system encounters pte_ERROR(). This patch backports the
upstream kernel patch by Ingo Molnar <mingo@elte.hu>, commit 70c9f590ffc3f959cc81c1a3cecb6b8133caf35d.
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
In arch/i386/cpu/common.c there is:
cpu_devs[X86_VENDOR_INTEL]
cpu_devs[X86_VENDOR_CYRIX]
cpu_devs[X86_VENDOR_AMD]
...
They are all filled with data early.
The data (struct) got set to NULL for all, but Intel in different
late_initcall (exit_cpu_vendor) calls.
I don't see what sense this makes at all, maybe something that got
forgotten with the HOTPLUG_CPU extensions?
Please check/review whether initdata, cpuinitdata is still ok and
this still works with HOTPLUG_CPU and without, it should...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: davej@redhat.com
[PATCH] i386: mark cpu init functions as __cpuinit, data as __cpuinitdata
Mark i386-specific cpu init functions as __cpuinit. They are all
only called from arch/i386/common.c:identify_cpu() that already is
marked as __cpuinit. This patch also removes the empty function
init_umc().
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>