Keir Fraser [Thu, 19 Mar 2009 10:21:46 +0000 (10:21 +0000)]
PCI: pass ARI and SR-IOV device information to the hypervisor
PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
Function -- a function whose function number is greater than 7 within an
ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
Function is under the scope of the same remapping unit as the traditional
function. The hypervisor needs to know if a function is Extended
Function so it can find proper DMAR for it.
And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
scope of the same remapping unit as the Physical Function. The hypervisor
also needs to know if a function is the Virtual Function and which
Physical Function it's associated with for same reason.
Keir Fraser [Thu, 19 Mar 2009 10:21:21 +0000 (10:21 +0000)]
PCI: save and restore PCIe 2.0 registers
PCIe 2.0 defines several new registers (Device Control 2, Link Control
2, and Slot Control 2). Save and retore them in pci_save_pcie_state()
and pci_restore_pcie_state().
Keir Fraser [Thu, 19 Mar 2009 10:06:52 +0000 (10:06 +0000)]
linux pciback/pcifront: work queue management fixes
flush_scheduled_work() only flushes work queued to the global
keventd_wq, but pciback is using its own local work queue, so that is
what needs to be flushed.
Calling cancel_delayed_work() on something never inserted through
queue_delayed_work() or schedule_delayed_work() is pointless.
Keir Fraser [Wed, 18 Mar 2009 11:51:05 +0000 (11:51 +0000)]
x86: Fix interaction of NTP and dom0->xen time updates
Don't discard NTP sync when updating Xen wallclock time from dom0,
as that's almost the first thing we do when we become synced.
Move the call to ntp_clear() into do_settimeofday(), which is the
only caller of __update_wallclock() that looks like it should break
NTP sync.
This fixes the timer chain that sets Xen's wallclock every minute when
dom0 is NTP synced, which in turn greatly improves wallclock accuracy
in PV domU.
Keir Fraser [Wed, 18 Mar 2009 11:45:30 +0000 (11:45 +0000)]
backport: allocate cap save buffers for PCIe/PCI-X.
Changeset 819:e8a9f8910a3f doesn't backport all the necessary code.
This patch adds the missing part. It is also backported from upstream
Linux kernel and the git commit is:
PCI: handle PCI state saving with interrupts disabled
Since interrupts will soon be disabled at PCI resume time, we need
to
pre-allocate memory to save/restore PCI config space (or use
GFP_ATOMIC=, but this is safer).
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>=
Keir Fraser [Wed, 18 Mar 2009 11:40:10 +0000 (11:40 +0000)]
PCI: add SR-IOV API for Physical Function driver
Add or remove the Virtual Function when the SR-IOV is enabled or
disabled by the device driver. This can happen anytime rather than
only at the device probe stage.
Keir Fraser [Wed, 18 Mar 2009 11:39:04 +0000 (11:39 +0000)]
PCI: initialize and release SR-IOV capability
If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.
PCI: Restore PCI Express capability registers after PM event
Restore PCI Express capability registers after PM event.
This includes maxumum MTU for PCI express and other vital data.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cc692a5f1e9816671b77da77c6d6c463156ba1c7
Author: Stephen Hemminger <shemminger@osdl.org>
Date: Wed Nov 8 16:17:15 2006 -0800
PCI: save/restore PCI-X state
Shouldn't PCI-X state be saved/restored? No device really needs
this
right now. qla24xx (fc HBA) and mthca (infiniband) don't do
suspend,
and sky2 resets its tweaks when links are brought up.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
This patch moves all definitions of the PCI resource names to an
'enum',
and also replaces some hard-coded resource variables with symbol
names. This change eases introduction of device specific
resources.
PCI: remove unnecessary arg of pci_update_resource()
This cleanup removes unnecessary argument 'struct resource *res'
in
pci_update_resource(), so it takes same arguments as other
companion
functions (pci_assign_resource(), etc.).
PCI: allow pci_alloc_child_bus() to handle a NULL bridge
Allow pci_alloc_child_bus() to allocate buses without bridge
devices.
Some SR-IOV devices can occupy more than one bus number, but there
is no
explicit bridges because that have internal routing mechanism.
Change parameter of pci_ari_enabled() from 'pci_dev' to 'pci_bus'.
ARI forwarding on the bridge mostly concerns the subordinate
devices
rather than the bridge itself. So this change will make the
function
easier to use.
PCI: fix ARI code to be compatible with mixed ARI/non-ARI systems
The original ARI support code has a compatibility problem with
non-ARI
devices. If a device doesn't support ARI, turning on ARI
forwarding on
its upper level bridge will cause undefined behavior.
This fix turns on ARI forwarding only when the subordinate devices
support it.
This patch adds support for PCI Express Alternative Routing-ID
Interpretation (ARI) capability.
The ARI capability extends the Function Number field of the PCI
Express
Endpoint by reusing the Device Number which is otherwise hardwired
to 0.
With ARI, an Endpoint can have up to 256 functions.
Since patch 6ac665c63dcac8fcec534a1d224ecbb8b867ad59 my infiniband
controller hasn't worked. This is because it has 64-bit
prefetchable
memory, which was mistakenly being taken to be 32-bit memory.
The
resource flags in this case are PCI_BASE_ADDRESS_MEM_TYPE_64 |
PCI_BASE_ADDRESS_MEM_PREFETCH.
This patch checks only for the PCI_BASE_ADDRESS_MEM_TYPE_64 bit;
thus
whether the region is prefetchable or not is ignored. This fixes
my
Infiniband.
Reviewed-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Yu Zhao <yu.zhao@intel.com>
PCI: handle 64-bit resources better on 32-bit machines
If the kernel is configured to support 64-bit resources on a
32-bit
machine, we can support 64-bit BARs properly. Just change the
condition
to check sizeof(resource_size_t) instead of BITS_PER_LONG.
Factor out the code to read one BAR from the loop in
pci_read_bases into
a new function, __pci_read_base. The new code is slightly more
readable, better commented and removes the ifdef.
Keir Fraser [Mon, 2 Mar 2009 11:06:52 +0000 (11:06 +0000)]
netfront: Unregister inetdev notifiers on failure
If you attempt to modprobe the pv-on-hvm netfront driver on a machine
not running under Xen (say, bare-metal, or under another hypervisor), the
netfront code correctly returns an ENODEV and fails to load. However, if you
then shutdown that machine, you will oops while tearing down the network.
This is because we forget to unregister the the inetaddr_notifier on failure,
and so the kernel takes a fatal page fault. The attached patch just unregisters
the notifier on failure, and solves the problem for me.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Keir Fraser [Mon, 2 Mar 2009 10:57:56 +0000 (10:57 +0000)]
pciback: Fix invalid use of pci_match_id()
We cannot use pci_match_id() because the first argument (tmp_quirk->devid)
is not an array of pci device ids. Instead this patch adds a utility
function to compare a pci_device_id and a pci_dev.
The ACPI_PDC_SMP_T_SWCOORD bit is set by and OS that is capable of
native ACPI throttling software coordination for mutli-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
ACPI: Get throttling info from BIOS only after evaluating _PDC
Previously _PDC was evaluated later, and thus we'd not get
the chance to tell the BIOS that we can suport FixedHW registers
(MSRs)
and the BIOS would always ask us to use System I/O access
for throttling.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
Add throttling control via MSR when T-states uses
the FixHW Control Status registers.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Tue, 17 Feb 2009 11:17:11 +0000 (11:17 +0000)]
pvSCSI: add new device assignment mode
Add a new device assignment mode, which assigns whole HBA
(SCSI host) to guest domain. Current implementation requires SCSI
command emulation on backend driver, and it causes limitations for
some SCSI commands. (Please see
"http://www.xen.org/files/xensummit_tokyo/24_Hitoshi%20Matsumoto_en.pdf"
for detail about why we need the new assignment mode.
SCSI command emulation on backend driver is bypassed when "host" mode
is specified.
Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com> Signed-off-by: Jun Kamada <kama@jp.fujitsu.com>
Keir Fraser [Wed, 4 Feb 2009 12:26:00 +0000 (12:26 +0000)]
linux: fix IRQ handling for PV passthrough
For DomU-s registering PIRQ-s must be done separately, as they don't
use the IO-APIC code.
Additionally make sure the IRQ chip doesn't get set twice (and the
event channel information overwritten) for an IRQ possibly in use by
more than one device.
Keir Fraser [Wed, 4 Feb 2009 12:25:09 +0000 (12:25 +0000)]
linux: remove xen specific member from pci_dev
Move msi related variable irq_old out of struct pci_dev. This is
logically more consistent and has the additional benefit that xen
kernel and vanilla kernel now have the same pci_dev layout
Keir Fraser [Tue, 3 Feb 2009 13:59:17 +0000 (13:59 +0000)]
fbfront: Improve diagnostics when kthread_run() fails
Failure is reported with xenbus_dev_fatal(..."register_framebuffer"),
which was already suboptimal before it got moved away from
register_framebuffer(), and is outright misleading now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Keir Fraser [Wed, 14 Jan 2009 14:03:42 +0000 (14:03 +0000)]
revert: "netfront/back: do not mark packets of length < MSS as GSO"
changeset: 774:107e10e0e07c
user: Keir Fraser <keir.fraser@citrix.com>
date: Tue Jan 13 15:17:54 2009 +0000
summary: netfront/back: do not mark packets of length < MSS as GSO
Herbert Xu suggested a better fix in the network
stack which will follow.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 13 Jan 2009 15:17:54 +0000 (15:17 +0000)]
netfront/back: do not mark packets of length < MSS as GSO
Linux assumes that skbs marked for GSO are longer than MSS. In
particular tcp_tso_segment assumes that skb_segment will return a
chain of at least 2 skbs.
Both netfront and back should therefor not pass such a packet up the
stack.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
This patch fixes some weird issues in upstream.
Dom0 uses one page shared with hypervisor to notify which pirqs need EOI
writes, but the page is set incorrectly for ia64 due to following reasons:
1. the related two hypercalls are not enabled in the correct way, so this page
is not really used by dom0 and hypervisor do nothing when dom0 writes eoi.
Keir Fraser [Thu, 18 Dec 2008 11:51:36 +0000 (11:51 +0000)]
netback: handle non-netback foreign pages
An SKB can contain pages which are foreign but not tracked by netback,
such as those created by gnttab_copy_grant_page when in
NETBK_DELAYED_COPY_SKB mode. These pages do not have a mapping field
which points to a valid offset in the pending_tx_info array.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 11 Dec 2008 13:38:48 +0000 (13:38 +0000)]
add hvc compatibility mode to xencons.
Makes switching back and forth with a pvops kernel easier. Taken from
http://lists.alioth.debian.org/pipermail/pkg-xen-devel/2008-October/002098.html
http://svn.debian.org/viewsvn/kernel?rev=12337&view=rev with thanks to
Bastian Blank.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Isaku Yamahata [Wed, 3 Dec 2008 02:38:32 +0000 (11:38 +0900)]
IA64: xencomm support for multi call with physdev_op and event_channel_op.
Recently the c/s of d545a95fca73 makes use of multi call
with __HYPERVISOR_event_channel_op and __HYPERVISOR_physdev_op.
This patch adds support of those hypercall.
Keir Fraser [Tue, 2 Dec 2008 11:54:47 +0000 (11:54 +0000)]
Fix buggy mask_base in saving/restoring MSI-X table during S3
Fix mask_base (actually MSI-X table base, copy name from native) to be
a virtual address rather than a physical address. And remove wrong
printk in pci_disable_msix.
Keir Fraser [Fri, 28 Nov 2008 13:07:36 +0000 (13:07 +0000)]
dom0 linux: Fix and cleanup reassigning memory resource code.
When we use PCI pass-through, we have to assign page-aligned resources
to device. To do this, we round up the alignment to PAGE_SIZE, if
device is specified by "reassigndev=" boot parameter.
"pdev_sort_resources" function uses the alignment. But it does not
round up the alignment to PAGE_SIZE. This patch makes
"pdev_sort_resources" function round up the alignment to PAGE_SIZE.
"pbus_size_mem" function round up the alignment of bridge's resource
window as well as that of normal resource. But we don't need to do
this. This patch makes "pbus_size_mem" function exclude bridges's
resource window.
This patch also cleanups code of reassigning memory resource.