Keir Fraser [Thu, 18 Dec 2008 11:51:36 +0000 (11:51 +0000)]
netback: handle non-netback foreign pages
An SKB can contain pages which are foreign but not tracked by netback,
such as those created by gnttab_copy_grant_page when in
NETBK_DELAYED_COPY_SKB mode. These pages do not have a mapping field
which points to a valid offset in the pending_tx_info array.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
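A minimal sketch of the kind of check this implies. It assumes a table (here the hypothetical `mmap_pages`) that records which page netback placed in each pending slot. A foreign page's `mapping` value is trusted as a `pending_tx_info` index only if it is in range and the slot really holds that page. `struct sketch_page`, `sketch_page_index()` and the value of `MAX_PENDING_REQS` are illustrative assumptions, not netback's actual definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed size of the pending ring; illustrative only. */
#define MAX_PENDING_REQS 256

/* Hypothetical stand-in for struct page: only the fields the check needs. */
struct sketch_page {
    bool foreign;          /* page belongs to another domain */
    unsigned long mapping; /* for netback's own pages: pending slot index */
};

/* Hypothetical record of the page netback placed in each pending slot. */
static struct sketch_page *mmap_pages[MAX_PENDING_REQS];

/*
 * Return the pending_tx_info index for a page, or -1 if the page is not
 * one of netback's own foreign pages (e.g. a grant-copied page).
 */
static int sketch_page_index(const struct sketch_page *pg)
{
    unsigned long idx;

    if (!pg->foreign)
        return -1;
    idx = pg->mapping;
    /* Bounds check first, then confirm the slot really points at this page. */
    if (idx >= MAX_PENDING_REQS || mmap_pages[idx] != pg)
        return -1;
    return (int)idx;
}
```

The second comparison is what rejects a foreign page whose `mapping` happens to look like a valid index but was never set up by netback.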
Ian Campbell [Thu, 11 Dec 2008 13:38:48 +0000 (13:38 +0000)]
add hvc compatibility mode to xencons.
Makes switching back and forth with a pvops kernel easier. Taken from
http://lists.alioth.debian.org/pipermail/pkg-xen-devel/2008-October/002098.html
http://svn.debian.org/viewsvn/kernel?rev=12337&view=rev with thanks to
Bastian Blank.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Isaku Yamahata [Wed, 3 Dec 2008 02:38:32 +0000 (11:38 +0900)]
IA64: xencomm support for multi call with physdev_op and event_channel_op.
Changeset d545a95fca73 recently started using multicalls with
__HYPERVISOR_event_channel_op and __HYPERVISOR_physdev_op.
This patch adds support for those hypercalls.
Keir Fraser [Tue, 2 Dec 2008 11:54:47 +0000 (11:54 +0000)]
Fix buggy mask_base in saving/restoring MSI-X table during S3
Fix mask_base (actually the MSI-X table base; the name is copied from
native code) so that it is a virtual address rather than a physical
address. Also remove an incorrect printk in pci_disable_msix.
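To illustrate why `mask_base` must be a virtual address: the per-vector mask bit is reached by plain pointer arithmetic on the mapped table. This sketch assumes the MSI-X table layout from the PCI specification (16-byte entries, Vector Control dword at offset 12, mask in bit 0) and uses an ordinary memory buffer in place of an `ioremap()`ed BAR. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout from the PCI spec: 16-byte entries, Vector Control at offset 12. */
#define MSIX_ENTRY_SIZE        16
#define MSIX_ENTRY_VECTOR_CTRL 12
#define MSIX_VECTOR_MASKBIT    0x1u

/*
 * mask_base must be the *virtual* address the table is mapped at
 * (in the kernel, the result of ioremap()), never the physical BAR address.
 */
static volatile uint32_t *msix_vector_ctrl(volatile void *mask_base,
                                           unsigned int entry)
{
    volatile uint8_t *base = (volatile uint8_t *)mask_base;

    return (volatile uint32_t *)(base + entry * MSIX_ENTRY_SIZE +
                                 MSIX_ENTRY_VECTOR_CTRL);
}

/* Set or clear the mask bit of one table entry. */
static void msix_set_mask(volatile void *mask_base, unsigned int entry,
                          bool masked)
{
    volatile uint32_t *ctrl = msix_vector_ctrl(mask_base, entry);

    /* Kernel code would use readl()/writel(); plain accesses keep this portable. */
    if (masked)
        *ctrl |= MSIX_VECTOR_MASKBIT;
    else
        *ctrl &= ~MSIX_VECTOR_MASKBIT;
}
```

Dereferencing a physical address this way would touch an unrelated kernel virtual address, which is the class of bug the fix addresses.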
Keir Fraser [Fri, 28 Nov 2008 13:07:36 +0000 (13:07 +0000)]
dom0 linux: Fix and cleanup reassigning memory resource code.
When we use PCI pass-through, we have to assign page-aligned resources
to the device. To do this, we round up the alignment to PAGE_SIZE if
the device is specified by the "reassigndev=" boot parameter.
The "pdev_sort_resources" function uses the alignment, but it does not
round the alignment up to PAGE_SIZE. This patch makes the
"pdev_sort_resources" function round up the alignment to PAGE_SIZE.
The "pbus_size_mem" function rounds up the alignment of a bridge's
resource window as well as that of normal resources, but we don't need
to do this. This patch makes the "pbus_size_mem" function exclude
bridges' resource windows.
This patch also cleans up the code for reassigning memory resources.
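A minimal sketch of the alignment rule described above, assuming 4 KiB pages. `sketch_resource_align()` and its flags are hypothetical and stand in for logic that the patch spreads across `pdev_sort_resources` and `pbus_size_mem`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/*
 * Alignment to use for a memory resource. Only devices named in
 * reassigndev= are affected, and bridge windows keep their own alignment.
 */
static uint64_t sketch_resource_align(uint64_t align, bool reassign_dev,
                                      bool is_bridge_window)
{
    if (!reassign_dev || is_bridge_window)
        return align;
    /* Resource alignments are powers of two, so "round up" is a max(). */
    return align < SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : align;
}
```

Excluding bridge windows matters because a window already covers the page-aligned resources behind it; forcing the window itself to page alignment adds nothing.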
Keir Fraser [Mon, 24 Nov 2008 11:04:54 +0000 (11:04 +0000)]
pciback: error handler for PCIE_AER.
This patch is the main implementation for enabling PCIE_AER handling,
adding the related PCI error handlers to pciback and pcifront.
When a device sends a PCIe error message to the root port, it will
trigger an interrupt. The IRQ handler then collects the root error
status register and schedules work to process the error based on
the error type.
If the error is uncorrectable (fatal or non-fatal), the AER
service driver calls the callback functions of the endpoint's
driver. For a bridge, it broadcasts the error to the downstream
ports. The pciback error handler is called accordingly. Pciback then
asks pcifront to call the end-device driver to complete the related
PCI error handling.
Signed-off-by: Jiang Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Ke Liping <liping.ke@intel.com>
Keir Fraser [Wed, 19 Nov 2008 13:15:46 +0000 (13:15 +0000)]
linux/x86: remove broken HYPERVISOR_acm_op()
That hypercall apparently never really worked (it's being passed two
arguments, but the hypercall entry point code only loaded one, while
do_acm_op() again consumed two), appears to be pointless in the kernel
anyway, and there's been no __HYPERVISOR_acm_op for quite a while.
Keir Fraser [Tue, 18 Nov 2008 16:04:04 +0000 (16:04 +0000)]
linux, S3: dom0 doesn't need to save ioapic state
Dom0 doesn't need to save/restore ioapic state across S3
suspend/resume, as Xen already does it. More importantly, this
avoids warnings on some platforms whose uninitialized RTEs may
hold odd values (such as SMI mode) while masked. When dom0 saves
those entries and writes them back later, it can easily trigger
Xen's sanity check in ioapic_guest_write.
Keir Fraser [Fri, 7 Nov 2008 17:04:20 +0000 (17:04 +0000)]
xen: Shouldn't remove device in pci_bus_probe_wrapper()
In pci_bus_probe_wrapper(), a device is first added (assigned) to dom0,
but if pci_bus_probe() fails for the device (because it has no driver),
the device is removed (deassigned) from dom0. PCIe-to-PCI bridges are
therefore removed from dom0 when they are hooked by
pci_bus_probe_wrapper(), which means they are not mapped in the VT-d
page table, so the PCI devices behind these bridges cannot work. This
happens when pciback is loaded as a module, because pciback probes
these bridges and removes them from dom0. A built-in pciback does not
cause this problem because the bridges (for example 00:1e.0) are
probed before the devices behind them (for example 02:00.0). (When a
PCI device (02:00.0) is mapped to VT-d, its PCIe-to-PCI bridge
(00:1e.0) is mapped to VT-d as well.)
So I think we should not remove (deassign) devices from dom0 when
pci_bus_probe() fails. Every device that can DMA should be mapped in
VT-d when VT-d is enabled, but the current code makes it possible for
some of these devices not to be mapped into VT-d.
From: Weidong Han <weidong.han@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 5 Nov 2008 15:43:55 +0000 (15:43 +0000)]
prevent invalid or unsupportable PIRQs from being used (v2)
By keeping the respective irq_desc[] entries pointing to no_irq_type,
setup_irq() (and thus request_irq()) will fail for such IRQs. This
matches native behavior, which also only installs ioapic_*_type out of
ioapic_register_intr().
At the same time, make assign_irq_vector() fail not only when Xen
doesn't support the PIRQ, but also if the IRQ requested doesn't fall
in the kernel's PIRQ space.
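A sketch of the tightened `assign_irq_vector()` policy. The PIRQ range constants, the `xen_vector` argument (standing in for whatever the hypervisor returned) and the specific error codes are assumptions for illustration, not the kernel's actual values.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed layout: PIRQs occupy [SKETCH_PIRQ_BASE, SKETCH_PIRQ_BASE + SKETCH_NR_PIRQS). */
#define SKETCH_PIRQ_BASE 0
#define SKETCH_NR_PIRQS  256

static bool irq_in_pirq_space(int irq)
{
    return irq >= SKETCH_PIRQ_BASE && irq < SKETCH_PIRQ_BASE + SKETCH_NR_PIRQS;
}

/*
 * Hypothetical model: fail if the IRQ lies outside the kernel's PIRQ space,
 * or if Xen could not provide a vector (xen_vector <= 0).
 */
static int sketch_assign_irq_vector(int irq, int xen_vector)
{
    if (!irq_in_pirq_space(irq))
        return -EINVAL;
    if (xen_vector <= 0)
        return -ENOSPC;
    return xen_vector;
}
```

Checking the range first means an out-of-range IRQ is rejected without ever asking the hypervisor about it.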
Keir Fraser [Wed, 5 Nov 2008 14:45:34 +0000 (14:45 +0000)]
linux: prevent invalid or unsupportable PIRQs from being used
By keeping the respective irq_desc[] entries pointing to no_irq_type,
setup_irq() (and thus request_irq()) will fail for such IRQs. This
matches native behavior, which also only installs ioapic_*_type out of
ioapic_register_intr().
At the same time, make assign_irq_vector() fail not only when Xen
doesn't support the PIRQ, but also if the IRQ requested doesn't fall
in the kernel's PIRQ space.
Intel processors starting with the Core Duo support processor native
C-states using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm
Platform firmware exports support for native C-states to the OS using
the ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm
With processor native C-states, we use the MWAIT instruction on the
processor to enter the different C-states (C1, C2, C3). We no longer
use the special I/O ports to enter a C-state, and no SMM mode etc. is
required to enter a C-state.
Overall this means better C-state support.
One major advantage of using MWAIT for all C-states is that, with this
and the "treat interrupt as break event" feature of MWAIT, we can now
get accurate timing for the time spent in the C1, C2, ... states.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
Keir Fraser [Mon, 27 Oct 2008 10:43:45 +0000 (10:43 +0000)]
Fix IRQ-from-evtchn delivery so that softirq handling does not happen
while IRQ delivery is blocked. We do this by moving irq_enter/irq_exit
outside the mutual-exclusion region in evtchn_do_upcall(). We then
have to remove irq_enter/irq_exit from do_IRQ(), otherwise the
preempt_count check in the RCU code will always fail and we hang
during boot.
Thanks to Eduard Guzovsky of Stratus for help with this patch.
Keir Fraser [Fri, 17 Oct 2008 11:01:56 +0000 (12:01 +0100)]
dom0 linux: Fix issue on reassigning resources to PCI-PCI bridge.
This patch fixes an issue with reassigning resources to a PCI-PCI
bridge, which was found by Zhao, Yu.
The current "quirk_align_mem_resources" updates the Base/Limit
registers of a PCI-PCI bridge if IORESOURCE_MEM is set in
"dev->resource[i].flags". However, when "quirk_align_mem_resources" is
called, dev->resource[i].flags is not yet initialized, because
"quirk_align_mem_resources" is called before "pci_read_bridge_bases".
As a result, the current code does not update the Base/Limit registers.
This patch sets the Base register to all Fs and the Limit register to 0,
regardless of "dev->resource[i].flags". After that,
"pci_assign_unassigned_resources" calculates the resource window size
and assigns resources to the PCI-PCI bridge.
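Why all Fs in Base and 0 in Limit works: a PCI-to-PCI bridge forwards memory transactions only when the decoded base is at or below the decoded limit, so this pair leaves the window closed until it is recalculated. The sketch decodes the 16-bit Memory Base/Limit registers as the PCI-to-PCI Bridge specification describes them (bits 15:4 are address bits 31:20; the limit's low 20 bits are implied ones). Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory Base register: bits 15:4 hold address bits 31:20. */
static uint32_t bridge_mem_base(uint16_t base_reg)
{
    return (uint32_t)(base_reg & 0xFFF0u) << 16;
}

/* Memory Limit register: same encoding, low 20 address bits are all ones. */
static uint32_t bridge_mem_limit(uint16_t limit_reg)
{
    return ((uint32_t)(limit_reg & 0xFFF0u) << 16) | 0xFFFFFu;
}

/* The window forwards transactions only when base <= limit. */
static bool bridge_mem_window_enabled(uint16_t base_reg, uint16_t limit_reg)
{
    return bridge_mem_base(base_reg) <= bridge_mem_limit(limit_reg);
}
```

With Base = 0xFFFF and Limit = 0x0000 the decoded range is 0xFFF00000 to 0x000FFFFF, an empty window, which gives `pci_assign_unassigned_resources` a clean starting point.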
Keir Fraser [Fri, 10 Oct 2008 08:58:50 +0000 (09:58 +0100)]
This patch adds power management support to the ahci driver, which is
necessary for S3.
It is backported from the Linux kernel mainline tree.
More precisely, the patch is the diff between commit c1332875cbe0c148c7f200d4f9b36b64e34d9872 and tag v2.8.18.
[PATCH] ahci: separate out ahci_reset_controller() and ahci_init_controller()
Separate out ahci_reset_controller() and ahci_init_controller() from
ata_host_init(). These will be used by PM callbacks. This patch
doesn't introduce any behavior change.
[PATCH] libata: improve driver initialization and deinitialization
Implement ahci_[de]init_port() and use it during initialization and
de-initialization. ahci_[de]init_port() are supersets of what used to
be done during driver [de-]initialization. This patch makes the
following behavior changes.
* Per-port IRQ mask is cleared on driver load as done in other
  drivers. The mask will be configured properly during probe.
* During init_one(), HOST_IRQ_STAT is cleared after masking port IRQs
  such that there is no race window.
* CMD_SPIN_UP is cleared during init_one() instead of being set. It
  is set in port_start(). This is more consistent with the overall
  structure of initialization. Note that CMD_SPIN_UP simply controls
  PHY activation.
* Slumber and staggered spin-up are handled properly.
* All init/deinit operations are done in a step-by-step manner as
  described in the spec instead of being issued as a single merged
  command.
Original implementation is from Zhao, Forrest <forrest.zhao@intel.com>
Simplify ahci_start_engine() by killing prerequisite condition checks.
Rationales are:
* No user checks the error return from ahci_start_engine().
* Code flow guarantees the prerequisite conditions unless the
  controller is malfunctioning. In such cases, the driver had chances
  to learn about the problem _before_ calling this function.
* Closely related to the above two, the driver calls into this
  function even when prerequisites fail, hoping for the best.
Basically, ahci_start_engine() should only do the operation itself.
It isn't the right place to check for prerequisites.
* Move ahci_port_start/stop() below EH functions. This makes ahci
  more consistent with other drivers and makes prototypes for
  ahci_start/stop_engine() unnecessary.
* Swap positions between ahci_start_engine() and ahci_stop_engine()
  for readability.
[PATCH] The redefinition of ahci_start_engine() and ahci_stop_engine()
- Make ahci_start_engine() and ahci_stop_engine() more consistent with
  AHCI spec 1.1
- Change their input parameter from ap to port_mmio
- Update the existing users of ahci_start_engine() and
  ahci_stop_engine()
Signed-off-by: Forrest Zhao <forrest.zhao@intel.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
=============
Some of the commits above may not be directly related to the ahci PM
problem, but they lay the groundwork for the final commit.
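A sketch of engine start/stop taking a port's MMIO base, in the spirit of the redefinition above. It assumes the PxCMD register at offset 0x18 with ST in bit 0 and CR in bit 15, and replaces the driver's timed register wait with a bounded poll loop. The `sketch_` names and the poll budget are illustrative, not the driver's code.

```c
#include <errno.h>
#include <stdint.h>

/* PxCMD offset and bits as given in the AHCI 1.1 spec. */
#define PORT_CMD         0x18        /* PxCMD offset within a port block */
#define PORT_CMD_START   (1u << 0)   /* ST: process the command list */
#define PORT_CMD_LIST_ON (1u << 15)  /* CR: command list engine running */

/* Assumed poll budget; the spec allows up to 500 ms for CR to clear. */
#define STOP_POLL_TRIES 500

static volatile uint32_t *port_cmd_reg(volatile uint8_t *port_mmio)
{
    return (volatile uint32_t *)(port_mmio + PORT_CMD);
}

/* Start: just set ST; prerequisites are the caller's business. */
static void sketch_start_engine(volatile uint8_t *port_mmio)
{
    *port_cmd_reg(port_mmio) |= PORT_CMD_START;
}

/* Stop: clear ST, then wait for the HBA to drop CR. */
static int sketch_stop_engine(volatile uint8_t *port_mmio)
{
    volatile uint32_t *cmd = port_cmd_reg(port_mmio);
    uint32_t tmp = *cmd;

    if ((tmp & (PORT_CMD_START | PORT_CMD_LIST_ON)) == 0)
        return 0;                       /* already idle */

    *cmd = tmp & ~PORT_CMD_START;
    for (int i = 0; i < STOP_POLL_TRIES; i++) {
        if ((*cmd & PORT_CMD_LIST_ON) == 0)
            return 0;
        /* A real driver would delay between reads. */
    }
    return -EIO;                        /* CR never cleared */
}
```

Passing `port_mmio` instead of the `ata_port` lets PM callbacks drive the engine before the higher-level port structures are in a usable state.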
Keir Fraser [Thu, 9 Oct 2008 10:10:43 +0000 (11:10 +0100)]
xen/dom0: Reassign memory resources to device for pci passthrough.
This patch adds a function to dom0 Linux that reassigns page-aligned
memory resources. The function is useful when we assign an I/O
device to an HVM domain using PCI passthrough.
When we assign a device to an HVM domain using PCI passthrough,
the device needs to be assigned page-aligned memory resources. If the
memory resource is not page-aligned, the following error occurs:
Error: pci: 0000:00:1d.7: non-page-aligned MMIO BAR found.
On many systems, the BIOS assigns memory resources to the device and
enables it. So my patch disables the device and releases its
resources, then assigns page-aligned memory resources to the device.
To reassign resources, add the following boot parameters to dom0
Linux:
    reassign_resources reassigndev=00:1d.7,01:00.0
reassign_resources
    Enables reassigning resources.
reassigndev=
    Specifies the devices, including I/O devices and PCI-PCI bridges,
    whose resources are to be reassigned. A PCI-PCI bridge can be
    specified if its resource windows need to be expanded.
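A sketch of parsing a `reassigndev=`-style device list. It assumes each entry is `bus:device.function` in hexadecimal, separated by commas, as in the example above. `struct sketch_bdf` and `sketch_parse_reassigndev()` are hypothetical and not the parser dom0 Linux actually uses.

```c
#include <stdio.h>

struct sketch_bdf {
    unsigned int bus, dev, fn;
};

/*
 * Parse a list such as "00:1d.7,01:00.0".
 * Returns the number of entries stored, or -1 on a malformed entry
 * or when more than max entries are given.
 */
static int sketch_parse_reassigndev(const char *s, struct sketch_bdf *out,
                                    int max)
{
    int n = 0;

    while (*s) {
        unsigned int bus, dev, fn;
        int used = 0;

        if (n >= max)
            return -1;
        if (sscanf(s, "%x:%x.%x%n", &bus, &dev, &fn, &used) != 3)
            return -1;
        /* PCI limits: 8-bit bus, 5-bit device, 3-bit function. */
        if (bus > 0xff || dev > 0x1f || fn > 0x7)
            return -1;
        out[n++] = (struct sketch_bdf){ bus, dev, fn };
        s += used;
        if (*s == ',')
            s++;
        else if (*s != '\0')
            return -1;
    }
    return n;
}
```

Rejecting out-of-range fields at parse time keeps a typo on the command line from silently matching no device.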
Keir Fraser [Thu, 9 Oct 2008 09:11:13 +0000 (10:11 +0100)]
dom0: Fix bad pte at booting time
Backport an upstream kernel patch to fix Dom0's bad pte bug.
- In the Dom0 kernel, at boot time, the system calls bt_ioremap() to set
up mappings for the Boot Time Fix Memory region. The system also calls
bt_iounmap() to unmap the memory region by setting phys=0. In this
case, the system hits pte_ERROR(). This patch backports the
upstream kernel patch by Ingo Molnar <mingo@elte.hu>, with commit: 70c9f590ffc3f959cc81c1a3cecb6b8133caf35d
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
In arch/i386/cpu/common.c there is:
cpu_devs[X86_VENDOR_INTEL]
cpu_devs[X86_VENDOR_CYRIX]
cpu_devs[X86_VENDOR_AMD]
...
They are all filled with data early.
The data (struct) got set to NULL for all but Intel in different
late_initcall (exit_cpu_vendor) calls.
I don't see what sense this makes at all; maybe something that got
forgotten with the HOTPLUG_CPU extensions?
Please check/review whether initdata/cpuinitdata is still OK and
this still works with and without HOTPLUG_CPU; it should...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: davej@redhat.com
[PATCH] i386: mark cpu init functions as __cpuinit, data as __cpuinitdata
Mark i386-specific cpu init functions as __cpuinit. They are all
only called from arch/i386/common.c:identify_cpu() that already is
marked as __cpuinit. This patch also removes the empty function
init_umc().
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
[PATCH] i386: mark cpu cache functions as __cpuinit
Mark i386-specific cpu cache functions as __cpuinit. They are all
only called from arch/i386/common.c:display_cache_info() that
already is marked as __cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
[PATCH] i386: mark cpu_dev structures as __cpuinitdata
The different cpu_dev structures are all used from __cpuinit
callers as far as I can tell. So mark them as __cpuinitdata instead of
__initdata. I am a little bit unsure about
arch/i386/common.c:default_cpu, especially when it
comes to the purpose of this_cpu.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Keir Fraser [Fri, 3 Oct 2008 08:39:37 +0000 (09:39 +0100)]
linux: restrict IRQ probing
Since IRQ probing may touch all currently unused interrupts, we must
prevent probing for those where it doesn't make sense (to avoid
triggering BUG()s or de-referencing NULL function pointers):
- dynamic IRQs must never be probed
- physical IRQs should only be probed when registered or identity-mapped
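The probing rule can be sketched as a predicate. The IRQ layout (PIRQs followed by dynamic IRQs) and the `registered`/`identity_mapped` inputs are illustrative assumptions standing in for the kernel's real bookkeeping.

```c
#include <stdbool.h>

/* Assumed layout, for illustration: PIRQs first, then dynamic IRQs. */
#define SKETCH_PIRQ_BASE   0
#define SKETCH_NR_PIRQS    256
#define SKETCH_DYNIRQ_BASE (SKETCH_PIRQ_BASE + SKETCH_NR_PIRQS)
#define SKETCH_NR_DYNIRQS  256

/*
 * May probing touch this IRQ? 'registered' means a PIRQ binding has been
 * set up; 'identity_mapped' means the PIRQ equals the hardware IRQ.
 * Both are hypothetical inputs.
 */
static bool sketch_irq_probe_allowed(int irq, bool registered,
                                     bool identity_mapped)
{
    if (irq >= SKETCH_DYNIRQ_BASE &&
        irq < SKETCH_DYNIRQ_BASE + SKETCH_NR_DYNIRQS)
        return false;                     /* dynamic IRQs: never probe */
    if (irq >= SKETCH_PIRQ_BASE && irq < SKETCH_PIRQ_BASE + SKETCH_NR_PIRQS)
        return registered || identity_mapped;
    return false;                         /* outside both ranges */
}
```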
Keir Fraser [Thu, 2 Oct 2008 10:29:02 +0000 (11:29 +0100)]
xen: fix kdump kernel crash on Xen 3.2
The kernel is supposed to create some "Crash note" resources (children
of the "Hypervisor code and data" resource in /proc/iomem). However,
when running on Xen 3.2, xen_machine_kexec_setup_resources()
encounters an error and returns prior to doing this.
The error occurs when it calls the "kexec_get_range" hypercall to
determine the location of the "vmcoreinfo". This was only implemented
in Xen 3.3.
This patch makes the kernel handle this error gracefully by simply not
creating the sysfs file "hypervisor/vmcoreinfo" if the hypervisor is
unable to provide the info, rather than bailing out of
xen_machine_kexec_setup_resources() early.
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
The udp6_sendmsg function uses a shared buffer to store the
flow without taking any locks. This leads to races with SMP.
This patch moves the flowi object onto the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
UDP tracks corking status through the pending variable. The
IP layer also tracks it through the socket write queue. It
is possible for the two to get out of sync when MSG_PROBE is
used.
This patch changes UDP to check the write queue to ensure
that the two stay in sync.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement support within the AML interpreter for
package objects that contain a mismatch between the AML
length and package element count. In this case, the lesser
of the two is used. Some BIOS code apparently modifies
the package length on the fly, and this change supports
this. Provides compatibility with the MS AML interpreter.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
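A minimal sketch of the "lesser of the two" rule when building a package. The integer element type and the function name are hypothetical; real AML packages hold operand objects rather than integers.

```c
#include <stddef.h>

/*
 * Copy package elements, honouring the smaller of the declared element
 * count and the number of elements actually parsed from the AML stream.
 * Returns the element count of the resulting package.
 */
static size_t sketch_build_package(const int *parsed, size_t parsed_count,
                                   size_t declared_count, int *out)
{
    size_t n = declared_count < parsed_count ? declared_count : parsed_count;

    for (size_t i = 0; i < n; i++)
        out[i] = parsed[i];
    return n;
}
```

Taking the minimum means neither a shrunk declared count nor a truncated element list can cause reads or writes past either bound.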
acpi: fix processor handling in presence of external control
- avoid leaking stuff in acpi_processor_remove()
- remove a pointless change to native code in acpi_processor_hotplug()
(struct acpi_processor's id field is unsigned)
- don't set up processor_extcntl_ops when nothing is controlled by Xen
(thus processor_cntl_external() will always return false, allowing
ACPI code to retain native behavior)
Dom0 ACPI: handle I/O access widths that are not a multiple of 8 bits
Backported the following patch from upstream to support the 4-bit
access width required by T-state control.
Below are the original commit comments from upstream.
ACPI: Handle I/O access width requests that are not a multiple of 8 bits.
We've run into BIOSes that hand us 4-bit access width requests
for T-state control when the code expected only multiples of
8 bits.
Round up.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
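One way to sketch the rounding: pick the smallest port accessor that covers the requested width, so a 4-bit T-state field is read with an 8-bit access. The helper name is hypothetical, and this is not the upstream patch itself.

```c
/*
 * Map an ACPI register bit width to an I/O access size in bytes,
 * rounding up to the next supported size. Returns 0 if unsupported.
 */
static unsigned int sketch_io_access_bytes(unsigned int bit_width)
{
    if (bit_width == 0 || bit_width > 32)
        return 0;
    if (bit_width <= 8)
        return 1;   /* inb()/outb() */
    if (bit_width <= 16)
        return 2;   /* inw()/outw() */
    return 4;       /* inl()/outl() */
}
```

Callers that only care about the low bits (such as a 4-bit T-state field) can mask the result after the wider read.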
* Remove dynirq/pirq_to/from_irq() macros. It's clearer to use *_BASE
and NR_* macros directly.
* Avoid and fix confusion between a Linux 'pirq' and a Xen
'pirq'. This is basically done by avoiding the notion of a Linux
'pirq' at all.
* Fix IA64 build.