Keir Fraser [Mon, 2 Mar 2009 10:57:56 +0000 (10:57 +0000)]
pciback: Fix invalid use of pci_match_id()
We cannot use pci_match_id() because the first argument (tmp_quirk->devid)
is not an array of pci device ids. Instead this patch adds a utility
function to compare a pci_device_id and a pci_dev.
The ACPI_PDC_SMP_T_SWCOORD bit is set by and OS that is capable of
native ACPI throttling software coordination for mutli-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
ACPI: Get throttling info from BIOS only after evaluating _PDC
Previously _PDC was evaluated later, and thus we'd not get
the chance to tell the BIOS that we can suport FixedHW registers
(MSRs)
and the BIOS would always ask us to use System I/O access
for throttling.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
Add throttling control via MSR when T-states uses
the FixHW Control Status registers.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Tue, 17 Feb 2009 11:17:11 +0000 (11:17 +0000)]
pvSCSI: add new device assignment mode
Add a new device assignment mode, which assigns whole HBA
(SCSI host) to guest domain. Current implementation requires SCSI
command emulation on backend driver, and it causes limitations for
some SCSI commands. (Please see
"http://www.xen.org/files/xensummit_tokyo/24_Hitoshi%20Matsumoto_en.pdf"
for detail about why we need the new assignment mode.
SCSI command emulation on backend driver is bypassed when "host" mode
is specified.
Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com> Signed-off-by: Jun Kamada <kama@jp.fujitsu.com>
Keir Fraser [Wed, 4 Feb 2009 12:26:00 +0000 (12:26 +0000)]
linux: fix IRQ handling for PV passthrough
For DomU-s registering PIRQ-s must be done separately, as they don't
use the IO-APIC code.
Additionally make sure the IRQ chip doesn't get set twice (and the
event channel information overwritten) for an IRQ possibly in use by
more than one device.
Keir Fraser [Wed, 4 Feb 2009 12:25:09 +0000 (12:25 +0000)]
linux: remove xen specific member from pci_dev
Move msi related variable irq_old out of struct pci_dev. This is
logically more consistent and has the additional benefit that xen
kernel and vanilla kernel now have the same pci_dev layout
Keir Fraser [Tue, 3 Feb 2009 13:59:17 +0000 (13:59 +0000)]
fbfront: Improve diagnostics when kthread_run() fails
Failure is reported with xenbus_dev_fatal(..."register_framebuffer"),
which was already suboptimal before it got moved away from
register_framebuffer(), and is outright misleading now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Keir Fraser [Wed, 14 Jan 2009 14:03:42 +0000 (14:03 +0000)]
revert: "netfront/back: do not mark packets of length < MSS as GSO"
changeset: 774:107e10e0e07c
user: Keir Fraser <keir.fraser@citrix.com>
date: Tue Jan 13 15:17:54 2009 +0000
summary: netfront/back: do not mark packets of length < MSS as GSO
Herbert Xu suggested a better fix in the network
stack which will follow.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 13 Jan 2009 15:17:54 +0000 (15:17 +0000)]
netfront/back: do not mark packets of length < MSS as GSO
Linux assumes that skbs marked for GSO are longer than MSS. In
particular tcp_tso_segment assumes that skb_segment will return a
chain of at least 2 skbs.
Both netfront and back should therefor not pass such a packet up the
stack.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
This patch fixes some weird issues in upstream.
Dom0 uses one page shared with hypervisor to notify which pirqs need EOI
writes, but the page is set incorrectly for ia64 due to following reasons:
1. the related two hypercalls are not enabled in the correct way, so this page
is not really used by dom0 and hypervisor do nothing when dom0 writes eoi.
Keir Fraser [Thu, 18 Dec 2008 11:51:36 +0000 (11:51 +0000)]
netback: handle non-netback foreign pages
An SKB can contain pages which are foreign but not tracked by netback,
such as those created by gnttab_copy_grant_page when in
NETBK_DELAYED_COPY_SKB mode. These pages do not have a mapping field
which points to a valid offset in the pending_tx_info array.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 11 Dec 2008 13:38:48 +0000 (13:38 +0000)]
add hvc compatibility mode to xencons.
Makes switching back and forth with a pvops kernel easier. Taken from
http://lists.alioth.debian.org/pipermail/pkg-xen-devel/2008-October/002098.html
http://svn.debian.org/viewsvn/kernel?rev=12337&view=rev with thanks to
Bastian Blank.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Isaku Yamahata [Wed, 3 Dec 2008 02:38:32 +0000 (11:38 +0900)]
IA64: xencomm support for multi call with physdev_op and event_channel_op.
Recently the c/s of d545a95fca73 makes use of multi call
with __HYPERVISOR_event_channel_op and __HYPERVISOR_physdev_op.
This patch adds support of those hypercall.
Keir Fraser [Tue, 2 Dec 2008 11:54:47 +0000 (11:54 +0000)]
Fix buggy mask_base in saving/restoring MSI-X table during S3
Fix mask_base (actually MSI-X table base, copy name from native) to be
a virtual address rather than a physical address. And remove wrong
printk in pci_disable_msix.
Keir Fraser [Fri, 28 Nov 2008 13:07:36 +0000 (13:07 +0000)]
dom0 linux: Fix and cleanup reassigning memory resource code.
When we use PCI pass-through, we have to assign page-aligned resources
to device. To do this, we round up the alignment to PAGE_SIZE, if
device is specified by "reassigndev=" boot parameter.
"pdev_sort_resources" function uses the alignment. But it does not
round up the alignment to PAGE_SIZE. This patch makes
"pdev_sort_resources" function round up the alignment to PAGE_SIZE.
"pbus_size_mem" function round up the alignment of bridge's resource
window as well as that of normal resource. But we don't need to do
this. This patch makes "pbus_size_mem" function exclude bridges's
resource window.
This patch also cleanups code of reassigning memory resource.
Keir Fraser [Mon, 24 Nov 2008 11:04:54 +0000 (11:04 +0000)]
pciback: error handler for PCIE_AER.
This patch is the main implementation for enabling PCIE_AER handling,
adding related pci error handler in pciback and pcifront.
When a device sends a PCIE error message to the root port, it will
trigger an interrupt. The irq handler will then collect roor error
status register, then schedule a work to process the error based on
the error type.
If the error is non-correctable error (fatal or non-fatal), AER
service driver will call the callback funtions of the endpoint's
driver. For bridge, it will broadcast the error to the downstream
ports. Pciback error handler will be called accordingly. Pciback then
ask pcifront help to call the end-device driver for finally completing
the related pci error handling jobs.
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com> Signed-off-by: Ke Liping <liping.ke@intel.com>
Keir Fraser [Wed, 19 Nov 2008 13:15:46 +0000 (13:15 +0000)]
linux/x86: remove broken HYPERVISOR_acm_op()
That hypercall apparently never really worked (it's being passed two
arguments, but the hypercall entry point code only loaded one, while
do_acm_op() again consumed two), appears to be pointless in the kernel
anyway, and there's been no __HYPERVISOR_acm_op for quite a while.
Keir Fraser [Tue, 18 Nov 2008 16:04:04 +0000 (16:04 +0000)]
linux, S3: dom0 doesn't need save ioapic state
Dom0 doesn't need to save/restore ioapic state across S3
suspend/resume, as Xen already does it. The more important
is to avoid warnings on some platforms which may have
uninitialized RTEs to be weird value (like smi mode) but
masked. When dom0 saves those entries and then write back
later, it's easy to trigger Xen's sanity check from
ioapic_guest_write.
Keir Fraser [Fri, 7 Nov 2008 17:04:20 +0000 (17:04 +0000)]
xen: Shouldn't remove device in pci_bus_probe_wrapper()
In pci_bus_probe_wrapper(), it adds (assign) a device to dom0 firstly,
but if pci_bus_probe() for the device fails (don't have driver), the
device will be removed (deassigned) from dom0. For PCIe-to-PCI
bridges, they are removed from dom0 when they are hooked by
pci_bus_probe_wrapper(). That's to say they are not mapped in VT-d
page table. Thus the PCI devices under these bridges cannot work. This
situation happens when install pciback module, because pciback will
probe these bridges and removed them from dom0. Built-in pciback won't
result in this problem due to these bridges (for example 00:1e.0) are
probed before their devices (for example 02:00.0). (When map a pci
device (02:00.0) to VT-d, it will also map its pcie-to-pci bridge
(00:1e.0) to VT-d)
So I think should not remove (deassign) devices from dom0 when
pci_bus_probe() fails. Each device which can DMA should be mapped in
VT-d when VT-d is enabled. But current code make it possible some
these devices are not mapped into VT-d.
From: Weidong Han <weidong.han@intel.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 5 Nov 2008 15:43:55 +0000 (15:43 +0000)]
prevent invalid or unsupportable PIRQs from being used (v2)
By keeping the respective irq_desc[] entries pointing to no_irq_type,
setup_irq() (and thus request_irq()) will fail for such IRQs. This
matches native behavior, which also only installs ioapic_*_type out of
ioapic_register_intr().
At the same time, make assign_irq_vector() fail not only when Xen
doesn't support the PIRQ, but also if the IRQ requested doesn't fall
in the kernel's PIRQ space.
Keir Fraser [Wed, 5 Nov 2008 14:45:34 +0000 (14:45 +0000)]
linux: prevent invalid or unsupportable PIRQs from being used
By keeping the respective irq_desc[] entries pointing to no_irq_type,
setup_irq() (and thus request_irq()) will fail for such IRQs. This
matches native behavior, which also only installs ioapic_*_type out of
ioapic_register_intr().
At the same time, make assign_irq_vector() fail not only when Xen
doesn't support the PIRQ, but also if the IRQ requested doesn't fall
in the kernel's PIRQ space.
Intel processors starting with the Core Duo support
support processor native C-state using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm
Platform firmware exports the support for Native C-state to OS
using
ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface
Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm
With Processor Native C-state, we use 'MWAIT' instruction on the
processor
to enter different C-states (C1, C2, C3). We won't use the
special IO
ports to enter C-state and no SMM mode etc required to enter
C-state.
Overall this will mean better C-state support.
One major advantage of using MWAIT for all C-states is, with this
and "treat interrupt as break event" feature of MWAIT, we can now get
accurate timing for the time spent in C1, C2, .. states.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Mon, 27 Oct 2008 10:43:45 +0000 (10:43 +0000)]
Fix IRQ-from-evtchn delivery so that softirq handling does not happen
while IRQ delivery is blocked. We do this by moving irq_enter/irq_exit
outside the mutual-exclusion region in evtchn_do_upcall(). We then
have to remove irq_enter/irq_exit from do_IRQ(), otherwise the
preempt_coutn check in the rcu code will always fail and we hang
during boot.
Thanks to Eduard Guzovsky of Stratus for help with this patch.