Keir Fraser [Wed, 18 Mar 2009 11:40:10 +0000 (11:40 +0000)]
PCI: add SR-IOV API for Physical Function driver
Add or remove the Virtual Function when the SR-IOV is enabled or
disabled by the device driver. This can happen anytime rather than
only at the device probe stage.
Keir Fraser [Wed, 18 Mar 2009 11:39:04 +0000 (11:39 +0000)]
PCI: initialize and release SR-IOV capability
If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.
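As a rough illustration of how the probed VF Offset and Stride are used, the sketch below computes the routing ID of a Virtual Function from the PF's routing ID following the SR-IOV rule (PF routing ID + First VF Offset + index * VF Stride). The struct and function names are hypothetical and are not taken from the patch; the index is assumed to be zero-based.

```c
#include <stdint.h>

/* Hypothetical container for values read from the SR-IOV capability. */
struct sriov_info {
    uint8_t  pf_bus;    /* bus number of the Physical Function */
    uint8_t  pf_devfn;  /* device/function byte of the PF */
    uint16_t vf_offset; /* First VF Offset */
    uint16_t vf_stride; /* VF Stride */
};

/* Routing ID (bus << 8 | devfn) of the VF with zero-based index id. */
static uint16_t vf_routing_id(const struct sriov_info *s, unsigned int id)
{
    uint16_t pf_rid = (uint16_t)((s->pf_bus << 8) | s->pf_devfn);

    return (uint16_t)(pf_rid + s->vf_offset + s->vf_stride * id);
}

/* Bus number the VF lands on; may be higher than the PF's bus. */
static uint8_t vf_bus(const struct sriov_info *s, unsigned int id)
{
    return (uint8_t)(vf_routing_id(s, id) >> 8);
}

/* Device/function byte of the VF on that bus. */
static uint8_t vf_devfn(const struct sriov_info *s, unsigned int id)
{
    return (uint8_t)(vf_routing_id(s, id) & 0xff);
}
```

With a large enough offset or stride the sum carries into the bus byte, which is why a VF bus allocation lock is needed at all.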
PCI: Restore PCI Express capability registers after PM event
Restore PCI Express capability registers after PM event.
This includes the maximum MTU for PCI Express and other vital data.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cc692a5f1e9816671b77da77c6d6c463156ba1c7
Author: Stephen Hemminger <shemminger@osdl.org>
Date: Wed Nov 8 16:17:15 2006 -0800
PCI: save/restore PCI-X state
Shouldn't PCI-X state be saved/restored? No device really needs this
right now. qla24xx (fc HBA) and mthca (infiniband) don't do suspend,
and sky2 resets its tweaks when links are brought up.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
This patch moves all definitions of the PCI resource names to an 'enum',
and also replaces some hard-coded resource variables with symbol
names. This change eases introduction of device specific resources.
PCI: remove unnecessary arg of pci_update_resource()
This cleanup removes the unnecessary argument 'struct resource *res' in
pci_update_resource(), so it takes the same arguments as other companion
functions (pci_assign_resource(), etc.).
PCI: allow pci_alloc_child_bus() to handle a NULL bridge
Allow pci_alloc_child_bus() to allocate buses without bridge devices.
Some SR-IOV devices can occupy more than one bus number, but there are
no explicit bridges because they have an internal routing mechanism.
Change parameter of pci_ari_enabled() from 'pci_dev' to 'pci_bus'.
ARI forwarding on the bridge mostly concerns the subordinate devices
rather than the bridge itself. So this change will make the function
easier to use.
PCI: fix ARI code to be compatible with mixed ARI/non-ARI systems
The original ARI support code has a compatibility problem with non-ARI
devices. If a device doesn't support ARI, turning on ARI forwarding on
its upper level bridge will cause undefined behavior.
This fix turns on ARI forwarding only when the subordinate devices
support it.
This patch adds support for PCI Express Alternative Routing-ID
Interpretation (ARI) capability.
The ARI capability extends the Function Number field of the PCI Express
Endpoint by reusing the Device Number which is otherwise hardwired to 0.
With ARI, an Endpoint can have up to 256 functions.
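A small sketch of the two interpretations of the 8-bit device/function byte may help here. Without ARI the byte splits into a 5-bit Device Number and a 3-bit Function Number; with ARI the whole byte is the Function Number and the Device Number is implicitly 0. The helper names below are hypothetical and do not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Conventional layout: device in bits 7:3, function in bits 2:0. */
static unsigned int devfn_slot(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

static unsigned int devfn_func(uint8_t devfn)
{
    return devfn & 0x07;
}

/* Function Number as seen with or without ARI (hypothetical helper). */
static unsigned int function_number(uint8_t devfn, bool ari_enabled)
{
    return ari_enabled ? devfn : devfn_func(devfn);
}

/* Device Number as seen with or without ARI (hypothetical helper). */
static unsigned int device_number(uint8_t devfn, bool ari_enabled)
{
    return ari_enabled ? 0 : devfn_slot(devfn);
}
```

Eight bits of Function Number give the 256 functions mentioned above, compared with 8 functions per device in the conventional layout.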
Since patch 6ac665c63dcac8fcec534a1d224ecbb8b867ad59 my infiniband
controller hasn't worked. This is because it has 64-bit prefetchable
memory, which was mistakenly being taken to be 32-bit memory. The
resource flags in this case are PCI_BASE_ADDRESS_MEM_TYPE_64 |
PCI_BASE_ADDRESS_MEM_PREFETCH.
This patch checks only for the PCI_BASE_ADDRESS_MEM_TYPE_64 bit; thus
whether the region is prefetchable or not is ignored. This fixes my
Infiniband.
Reviewed-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
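The difference between comparing the whole flags value and testing a single bit can be shown in a few lines. The two constants below mirror the values in the kernel's pci_regs.h; the equality check is only an assumed illustration of the kind of test that misclassifies a prefetchable 64-bit BAR, since the log does not show the original code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the kernel's pci_regs.h. */
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x04u
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08u

/* Assumed faulty form: fails as soon as any other flag bit is set. */
static bool is_mem64_equality(uint32_t flags)
{
    return flags == PCI_BASE_ADDRESS_MEM_TYPE_64;
}

/* Bit test: ignores whether the region is prefetchable. */
static bool is_mem64_bit_test(uint32_t flags)
{
    return (flags & PCI_BASE_ADDRESS_MEM_TYPE_64) != 0;
}
```

For flags of PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH, only the bit test reports a 64-bit BAR.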
PCI: handle 64-bit resources better on 32-bit machines
If the kernel is configured to support 64-bit resources on a 32-bit
machine, we can support 64-bit BARs properly. Just change the condition
to check sizeof(resource_size_t) instead of BITS_PER_LONG.
Factor out the code to read one BAR from the loop in pci_read_bases into
a new function, __pci_read_base. The new code is slightly more
readable, better commented and removes the ifdef.
Keir Fraser [Mon, 2 Mar 2009 11:06:52 +0000 (11:06 +0000)]
netfront: Unregister inetdev notifiers on failure
If you attempt to modprobe the pv-on-hvm netfront driver on a machine
not running under Xen (say, bare-metal, or under another hypervisor), the
netfront code correctly returns -ENODEV and fails to load. However, if you
then shut down that machine, you will oops while tearing down the network.
This is because we forget to unregister the inetaddr_notifier on failure,
and so the kernel takes a fatal page fault. The attached patch just unregisters
the notifier on failure, and solves the problem for me.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
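The error-path pattern described above can be sketched in user-space C. Everything here is a stand-in: the flag, the register/unregister helpers and the frontend registration function are hypothetical and only model the ordering, not the real netfront code.

```c
#include <errno.h>

/* Stand-in for the notifier chain: 1 while our notifier is registered. */
static int notifier_registered;

static void sketch_register_notifier(void)
{
    notifier_registered = 1;
}

static void sketch_unregister_notifier(void)
{
    notifier_registered = 0;
}

/* Stand-in for frontend registration: fails when not running on Xen. */
static int sketch_register_frontend(int running_on_xen)
{
    return running_on_xen ? 0 : -ENODEV;
}

/* Module init: undo the notifier registration if a later step fails. */
static int sketch_init(int running_on_xen)
{
    int err;

    sketch_register_notifier();

    err = sketch_register_frontend(running_on_xen);
    if (err)
        sketch_unregister_notifier(); /* the step that was missing */

    return err;
}
```

Without the unregister call, the notifier chain would keep a pointer into a module that failed to load, which matches the oops at shutdown described in the message.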
Keir Fraser [Mon, 2 Mar 2009 10:57:56 +0000 (10:57 +0000)]
pciback: Fix invalid use of pci_match_id()
We cannot use pci_match_id() because the first argument (tmp_quirk->devid)
is not an array of pci device ids. Instead this patch adds a utility
function to compare a pci_device_id and a pci_dev.
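A comparison helper of this kind might look like the sketch below. The structs are cut-down stand-ins for pci_device_id and pci_dev, and the function name is hypothetical; the matching rule (wildcard or equal for each ID field, masked comparison for the class) follows the usual PCI ID matching convention rather than the exact code of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PCI_ANY_ID (~0u) /* wildcard, as PCI_ANY_ID in the kernel */

/* Cut-down stand-in for struct pci_device_id. */
struct sketch_device_id {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class, class_mask;
};

/* Cut-down stand-in for the ID fields of struct pci_dev. */
struct sketch_dev {
    uint16_t vendor, device;
    uint16_t subsystem_vendor, subsystem_device;
    uint32_t class;
};

/* Hypothetical helper: does a single ID entry match this device? */
static bool sketch_match_one_device(const struct sketch_device_id *id,
                                    const struct sketch_dev *dev)
{
    return (id->vendor == SKETCH_PCI_ANY_ID || id->vendor == dev->vendor) &&
           (id->device == SKETCH_PCI_ANY_ID || id->device == dev->device) &&
           (id->subvendor == SKETCH_PCI_ANY_ID ||
            id->subvendor == dev->subsystem_vendor) &&
           (id->subdevice == SKETCH_PCI_ANY_ID ||
            id->subdevice == dev->subsystem_device) &&
           !((id->class ^ dev->class) & id->class_mask);
}
```

Because it takes a single entry rather than a zero-terminated table, a helper like this never reads past the one ID it is given.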
The ACPI_PDC_SMP_T_SWCOORD bit is set by an OS that is capable of
native ACPI throttling software coordination for multi-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
ACPI: Get throttling info from BIOS only after evaluating _PDC
Previously _PDC was evaluated later, and thus we'd not get
the chance to tell the BIOS that we can support FixedHW registers (MSRs)
and the BIOS would always ask us to use System I/O access
for throttling.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
Add throttling control via MSR when T-states use
the FixedHW Control and Status registers.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Tue, 17 Feb 2009 11:17:11 +0000 (11:17 +0000)]
pvSCSI: add new device assignment mode
Add a new device assignment mode, which assigns a whole HBA
(SCSI host) to the guest domain. The current implementation requires SCSI
command emulation in the backend driver, which imposes limitations on
some SCSI commands. (Please see
"http://www.xen.org/files/xensummit_tokyo/24_Hitoshi%20Matsumoto_en.pdf"
for details about why we need the new assignment mode.)
SCSI command emulation in the backend driver is bypassed when "host" mode
is specified.
Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com>
Signed-off-by: Jun Kamada <kama@jp.fujitsu.com>
Keir Fraser [Wed, 4 Feb 2009 12:26:00 +0000 (12:26 +0000)]
linux: fix IRQ handling for PV passthrough
For DomU-s registering PIRQ-s must be done separately, as they don't
use the IO-APIC code.
Additionally make sure the IRQ chip doesn't get set twice (and the
event channel information overwritten) for an IRQ possibly in use by
more than one device.
Keir Fraser [Wed, 4 Feb 2009 12:25:09 +0000 (12:25 +0000)]
linux: remove xen specific member from pci_dev
Move the MSI-related variable irq_old out of struct pci_dev. This is
logically more consistent and has the additional benefit that the xen
kernel and the vanilla kernel now have the same pci_dev layout.
Keir Fraser [Tue, 3 Feb 2009 13:59:17 +0000 (13:59 +0000)]
fbfront: Improve diagnostics when kthread_run() fails
Failure is reported with xenbus_dev_fatal(..."register_framebuffer"),
which was already suboptimal before it got moved away from
register_framebuffer(), and is outright misleading now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Keir Fraser [Wed, 14 Jan 2009 14:03:42 +0000 (14:03 +0000)]
revert: "netfront/back: do not mark packets of length < MSS as GSO"
changeset: 774:107e10e0e07c
user: Keir Fraser <keir.fraser@citrix.com>
date: Tue Jan 13 15:17:54 2009 +0000
summary: netfront/back: do not mark packets of length < MSS as GSO
Herbert Xu suggested a better fix in the network
stack which will follow.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 13 Jan 2009 15:17:54 +0000 (15:17 +0000)]
netfront/back: do not mark packets of length < MSS as GSO
Linux assumes that skbs marked for GSO are longer than MSS. In
particular tcp_tso_segment assumes that skb_segment will return a
chain of at least 2 skbs.
Both netfront and back should therefore not pass such a packet up the
stack.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
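The condition can be illustrated with a minimal sketch: a payload that fits in a single MSS-sized segment yields only one segment, so it should not be marked for GSO. The function names are hypothetical, and the length is assumed to be the payload length with headers excluded, which the message does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of MSS-sized segments a payload splits into (0 if mss is 0). */
static size_t segment_count(size_t payload_len, size_t mss)
{
    return mss ? (payload_len + mss - 1) / mss : 0;
}

/* Only mark a packet as GSO if segmentation yields at least 2 segments. */
static bool may_mark_gso(size_t payload_len, size_t mss)
{
    return mss != 0 && payload_len > mss;
}
```

A payload longer than the MSS always produces two or more segments, which is the assumption the message attributes to tcp_tso_segment.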
This patch fixes some weird issues in upstream.
Dom0 uses one page shared with the hypervisor to indicate which pirqs need
EOI writes, but the page is set up incorrectly for ia64 for the following
reasons:
1. the two related hypercalls are not enabled in the correct way, so this page
is not really used by dom0, and the hypervisor does nothing when dom0 writes EOI.
Keir Fraser [Thu, 18 Dec 2008 11:51:36 +0000 (11:51 +0000)]
netback: handle non-netback foreign pages
An SKB can contain pages which are foreign but not tracked by netback,
such as those created by gnttab_copy_grant_page when in
NETBK_DELAYED_COPY_SKB mode. These pages do not have a mapping field
which points to a valid offset in the pending_tx_info array.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 11 Dec 2008 13:38:48 +0000 (13:38 +0000)]
add hvc compatibility mode to xencons.
Makes switching back and forth with a pvops kernel easier. Taken from
http://lists.alioth.debian.org/pipermail/pkg-xen-devel/2008-October/002098.html
http://svn.debian.org/viewsvn/kernel?rev=12337&view=rev with thanks to
Bastian Blank.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Isaku Yamahata [Wed, 3 Dec 2008 02:38:32 +0000 (11:38 +0900)]
IA64: xencomm support for multi call with physdev_op and event_channel_op.
Changeset d545a95fca73 recently started using multicall
with __HYPERVISOR_event_channel_op and __HYPERVISOR_physdev_op.
This patch adds support for those hypercalls.
Keir Fraser [Tue, 2 Dec 2008 11:54:47 +0000 (11:54 +0000)]
Fix buggy mask_base in saving/restoring MSI-X table during S3
Fix mask_base (actually the MSI-X table base; the name is copied from
native) to be a virtual address rather than a physical address. Also
remove an incorrect printk in pci_disable_msix.
Keir Fraser [Fri, 28 Nov 2008 13:07:36 +0000 (13:07 +0000)]
dom0 linux: Fix and cleanup reassigning memory resource code.
When we use PCI pass-through, we have to assign page-aligned resources
to the device. To do this, we round up the alignment to PAGE_SIZE if the
device is specified by the "reassigndev=" boot parameter.
The "pdev_sort_resources" function uses the alignment, but it does not
round it up to PAGE_SIZE. This patch makes "pdev_sort_resources"
round up the alignment to PAGE_SIZE.
The "pbus_size_mem" function rounds up the alignment of a bridge's resource
window as well as that of a normal resource, but we don't need to do
this. This patch makes "pbus_size_mem" exclude the bridge's
resource window.
This patch also cleans up the code for reassigning memory resources.
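The rounding rule described above can be sketched as follows. The page size, the function name and the two boolean parameters are assumptions made for illustration; the real code works on struct resource and the "reassigndev=" device list.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uint64_t)4096) /* assumed 4 KiB pages */

/*
 * Alignment to use for a resource: rounded up to a page multiple only
 * for devices listed in "reassigndev=", and never for bridge windows.
 */
static uint64_t sketch_resource_alignment(uint64_t align, bool reassigned,
                                          bool bridge_window)
{
    if (!reassigned || bridge_window)
        return align;

    return (align + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

A 256-byte BAR on a reassigned device would thus be placed on a page boundary, so that it can be mapped into a guest without sharing a page with another resource.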
Keir Fraser [Mon, 24 Nov 2008 11:04:54 +0000 (11:04 +0000)]
pciback: error handler for PCIE_AER.
This patch is the main implementation for enabling PCIE_AER handling,
adding the related pci error handlers in pciback and pcifront.
When a device sends a PCIE error message to the root port, it will
trigger an interrupt. The irq handler will then collect the root error
status register and schedule work to process the error based on
the error type.
If the error is a non-correctable error (fatal or non-fatal), the AER
service driver will call the callback functions of the endpoint's
driver. For a bridge, it will broadcast the error to the downstream
ports. The pciback error handler will be called accordingly. Pciback then
asks pcifront to call the end-device driver to complete
the related pci error handling jobs.
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Ke Liping <liping.ke@intel.com>