xenbits.xensource.com Git - legacy/linux-2.6.18-xen.git/log
legacy/linux-2.6.18-xen.git
15 years ago usbfront: fix compile error and disable PM feature
Keir Fraser [Thu, 8 Oct 2009 07:53:36 +0000 (08:53 +0100)]
usbfront: fix compile error and disable PM feature

Fix the compilation error of usbfront, and disable bus suspend/resume
by default.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
15 years ago PVUSB: Fixes and updates
Keir Fraser [Wed, 7 Oct 2009 07:42:00 +0000 (08:42 +0100)]
PVUSB: Fixes and updates

- xenbus state flow changed.
  The whole flow now follows the netback/netfront model.
  The Reconfiguring/Reconfigured states are removed.

- New RING for hotplug notification added.

- USBIF_MAX_SEGMENTS_PER_REQUEST is changed from 10 to 16.
  As a result, RING_SIZE decreases from 32 to 16.
  This affects performance: my flash drive's read throughput
  dropped from 29MB/s to 18MB/s in the Linux environment.
  However, Windows guests send urbs with 64kB buffers (64kB = 4kB * 16),
  so this change is required.

- New port-setting interface
  xenbus_watch_path2 is added to usbback, port-setting interface
  is moved from sysfs to xenstore.
  Now, the port-rule is directly written to xenstore entry.
  Example.
  # xenstore-write /local/domain/0/backend/vusb/1/0/port/1 "2-1"
    (adding physical bus 2-1 to vusb-1-0 port 1)

- urb dequeue function completed.
  usbfront sends an unlink request to usbback and can cancel an urb
  that has been submitted in the backend.

- New USB Spec version (USB1.1/USB2.0) selection support.
  usbfront can act as both USB1.1 and USB2.0 virtual host controller
  according to the xenstore entry key "usb-ver".

- experimental bus_suspend/bus_resume added to usbfront.

- various cleanups, bugfix, refactoring and codestyle-fix.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
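
As a rough illustration of the ring-size trade-off described in this commit, the sketch below recomputes both figures. The 64-byte ring header, the 20-byte fixed request part and the 8-byte segment descriptor are assumptions made for the example, not values taken from the patch; request_bytes, ring_size and max_transfer_bytes are hypothetical helper names.

```python
PAGE_SIZE = 4096          # one shared ring page
SRING_HEADER_BYTES = 64   # assumed: producer/consumer indices plus padding


def request_bytes(max_segments, fixed_bytes=20, segment_bytes=8):
    """Size of one request slot; both field sizes are assumptions."""
    return fixed_bytes + segment_bytes * max_segments


def ring_size(slot_bytes):
    """Largest power-of-two number of slots that fits in the ring page."""
    slots = (PAGE_SIZE - SRING_HEADER_BYTES) // slot_bytes
    return 1 << (slots.bit_length() - 1)


def max_transfer_bytes(max_segments):
    """Each segment addresses at most one 4 kB page."""
    return max_segments * PAGE_SIZE


for segs in (10, 16):
    print(segs, ring_size(request_bytes(segs)), max_transfer_bytes(segs) // 1024)
```

Under those assumptions the numbers agree with the message: 10 segments leave room for 32 slots, while 16 segments leave room for 16 slots and allow a 64 kB transfer per request.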
15 years ago xen: re-synchronize ring.h public header
Keir Fraser [Wed, 7 Oct 2009 06:33:40 +0000 (07:33 +0100)]
xen: re-synchronize ring.h public header

Patch 20267:e9366bed077e modified the definition of sring in the xen
repo's version of ring.h, but not the version in the linux kernel
repo. That change broke pause/resume/shutdown messages from the
blktap2 kernel module, which (for the time being) relies on pad[0]
being at a consistent location in the sring struct.  This patch fixes
this regression by resynchronizing the two files.

Signed-off-by: Jake Wires <Jake.Wires@citrix.com>

15 years ago mce: support machine check logging left over from previous reset
Keir Fraser [Tue, 29 Sep 2009 10:23:06 +0000 (11:23 +0100)]
mce: support machine check logging left over from previous reset

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
15 years ago xen/usb: force proper address translation in USB monitor
Keir Fraser [Tue, 22 Sep 2009 07:04:07 +0000 (08:04 +0100)]
xen/usb: force proper address translation in USB monitor

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago xen/edac: force proper address translation in EDAC
Keir Fraser [Tue, 22 Sep 2009 07:03:39 +0000 (08:03 +0100)]
xen/edac: force proper address translation in EDAC

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago Fix BUG in unlock_cpu_hotplug().
Keir Fraser [Sun, 30 Aug 2009 07:54:15 +0000 (08:54 +0100)]
Fix BUG in unlock_cpu_hotplug().

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago xen/x86: make do_settimeofday() return -EPERM when clock can't be changed
Keir Fraser [Tue, 25 Aug 2009 13:55:22 +0000 (14:55 +0100)]
xen/x86: make do_settimeofday() return -EPERM when clock can't be changed

Rather than returning success here (without actually having done
anything), it seems more appropriate/conforming to let the caller know
that what he intended to do didn't succeed.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago ixgbe: memset size in netif_napi_del()
Keir Fraser [Wed, 19 Aug 2009 12:00:40 +0000 (13:00 +0100)]
ixgbe: memset size in netif_napi_del()

By inspection the memset appears to be long as napi->poll_dev
is a struct net_device not a struct napi_struct.

Cc: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
15 years ago Revert 927:56df01ffed10
Keir Fraser [Sun, 16 Aug 2009 07:42:29 +0000 (08:42 +0100)]
Revert 927:56df01ffed10

15 years ago drivers/xen/blkback: Add the kernel side of the blkback queueing feature.
Keir Fraser [Fri, 14 Aug 2009 16:28:43 +0000 (17:28 +0100)]
drivers/xen/blkback: Add the kernel side of the blkback queueing feature.

This is similar to the credit scheduler used in netif, except that it
allows occasional burstability (for use with e2fsck as an example).

Signed-off-by: William Pitcock <nenolod@dereferenced.org>
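
The idea of a credit scheme that tolerates occasional bursts can be sketched as a capped credit bucket. This is a generic illustration, not the code from the patch; CreditBucket and its rate and burst parameters are hypothetical names.

```python
class CreditBucket:
    """Credits refill at a fixed rate and may accumulate up to a burst cap."""

    def __init__(self, rate, burst):
        self.rate = rate      # credits added per tick
        self.burst = burst    # upper bound on saved-up credits
        self.credits = rate   # start with one tick's worth

    def tick(self):
        """Periodic refill; unused credits carry over up to the cap."""
        self.credits = min(self.credits + self.rate, self.burst)

    def try_consume(self, cost):
        """Admit a request if enough credits are available."""
        if cost <= self.credits:
            self.credits -= cost
            return True
        return False
```

A consumer that stays idle for a few ticks can then issue a request larger than the per-tick rate, which is the kind of burst an occasional heavy job would need.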
15 years ago net: Fix NULL pointer deref of sock->ops->sendpage in sock_sendpage().
Keir Fraser [Fri, 14 Aug 2009 16:05:54 +0000 (17:05 +0100)]
net: Fix NULL pointer deref of sock->ops->sendpage in sock_sendpage().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago PCI x86: always use conf1 to access config space below 256 bytes
Keir Fraser [Fri, 14 Aug 2009 09:54:33 +0000 (10:54 +0100)]
PCI x86: always use conf1 to access config space below 256 bytes

Back-ported to 2.6.18.8 by Simon Horman

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
15 years ago mmconfig: Fix x86_64 ioremap base_address
Keir Fraser [Fri, 14 Aug 2009 09:53:56 +0000 (10:53 +0100)]
mmconfig: Fix x86_64 ioremap base_address

Current mmconfig has some problems of remapped range.

a) In the case of broken MCFG tables on Asus etc., we need to remap
   256M range, but currently only remap 1M.

b) The base address always corresponds to bus number 0, but currently
   we are assuming it corresponds to start bus number.

This patch fixes the above problems.

(akpm: Arjan suggests that if the MCFG table is broken we just
shouldn't use it, rather than try to work around things).

Back-ported to 2.6.18 by Simon Horman

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
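
To make points a) and b) concrete, the sketch below uses the standard memory-mapped configuration layout, in which each bus occupies 1 MB and the base address corresponds to bus 0. It illustrates the addressing only and is not code from the patch; the function names are hypothetical.

```python
BYTES_PER_BUS = 1 << 20                 # 32 devices * 8 functions * 4 kB
FULL_WINDOW_BYTES = 256 * BYTES_PER_BUS  # all 256 buses: 256 MB


def mmconfig_offset(bus, device, function, register):
    """Offset of a config register from the base address (bus 0)."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 4096):
        raise ValueError("field out of range")
    return (bus << 20) | (device << 15) | (function << 12) | register


def mmconfig_address(base_address, bus, device, function, register):
    """The base always maps bus 0, regardless of the table's start bus."""
    return base_address + mmconfig_offset(bus, device, function, register)
```

Mapping only 1 MB therefore covers a single bus, while a table that has to be treated as spanning every bus needs the full 256 MB window.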
15 years ago xen/x86-64: fix Dom0 boot on AMD K8 CPUs
Keir Fraser [Wed, 5 Aug 2009 11:05:34 +0000 (12:05 +0100)]
xen/x86-64: fix Dom0 boot on AMD K8 CPUs

The workaround in question here should be (and is being) applied by
the hypervisor (which doesn't allow any guest - including Dom0 - to
write other than all zeroes or all ones into MCi_CTL).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago xen-blkfront: beyond ARRAY_SIZE of info->shadow
Keir Fraser [Wed, 29 Jul 2009 08:21:40 +0000 (09:21 +0100)]
xen-blkfront: beyond ARRAY_SIZE of info->shadow

Import upstream pv-ops change
b9ed7252d219c1c663944bf03846eabb515dbe75:

Do not go beyond ARRAY_SIZE of info->shadow
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago blktap2: properly suppress compiler warning
Keir Fraser [Tue, 28 Jul 2009 15:29:11 +0000 (16:29 +0100)]
blktap2: properly suppress compiler warning

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago blktap2: make blktap2 work for auto translated mode with hvm domain.
Keir Fraser [Fri, 24 Jul 2009 09:28:33 +0000 (10:28 +0100)]
blktap2: make blktap2 work for auto translated mode with hvm domain.

This patch makes blktap2 work for hvm domains in auto translated
mode (i.e. the IA64 HVM domain case, for which Kuwamura reported the bug).

blktap2 has introduced a new feature whereby pages from the self domain
can be handled. However, it doesn't work in auto translated mode
because blktap2 relies on p2m table manipulation, and the p2m
doesn't make sense in auto translated mode.
So self grant mapping is used instead.

Just passing the same page to the blktap2 daemon doesn't work: the
page is locked while IO is in progress, so the page handed over by the
blktap2 block device is already locked. When the blktap2 daemon issues
IO on the page, it tries to lock it again, resulting in a deadlock.
Hence the resort to self grants.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago linux/x86-64: MCE: truly do Dom0 stuff only on Dom0
Keir Fraser [Mon, 20 Jul 2009 09:11:23 +0000 (10:11 +0100)]
linux/x86-64: MCE: truly do Dom0 stuff only on Dom0

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago blkback: pagemap bug fixes
Keir Fraser [Mon, 20 Jul 2009 09:03:44 +0000 (10:03 +0100)]
blkback: pagemap bug fixes

Signed-off-by: Jake Wires <jake.wires@citrix.com>
15 years ago buildconfigs: INPUT_EVDEV=y as default for xen0_x86
Keir Fraser [Wed, 15 Jul 2009 08:10:37 +0000 (09:10 +0100)]
buildconfigs: INPUT_EVDEV=y as default for xen0_x86

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago xen: fix missing Crash note in /proc/iomem
Keir Fraser [Mon, 13 Jul 2009 10:55:13 +0000 (11:55 +0100)]
xen: fix missing Crash note in /proc/iomem

The "Crash note" entry is missing from /proc/iomem (dom 0) on Xen 3.4.*.
As a result, crash dumps cannot be analyzed normally.

Signed-off-by: Itsuro Oda <oda@valinux.co.jp>
15 years ago xen/x86: allow non-SMP builds of blktap2 to succeed
Keir Fraser [Fri, 10 Jul 2009 10:01:34 +0000 (11:01 +0100)]
xen/x86: allow non-SMP builds of blktap2 to succeed

c/s 893 introduced a regression here, since xen_invlpg_all() is an
SMP-only function.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago x86: Handle dynamic Cx state changes correctly.
Keir Fraser [Mon, 6 Jul 2009 14:20:52 +0000 (15:20 +0100)]
x86: Handle dynamic Cx state changes correctly.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago blktap2: remove warnings.
Keir Fraser [Mon, 29 Jun 2009 09:57:46 +0000 (10:57 +0100)]
blktap2: remove warnings.

This patch removes the following warnings on ia64.

> linux-2.6.18-xen.hg/drivers/xen/blktap2/device.c: In function
  'blktap_device_finish_request':
> linux-2.6.18-xen.hg/drivers/xen/blktap2/device.c:403: warning:
  format '%lld' expects type 'long long int', but argument 7 has type 'uint64_t'
> linux-2.6.18-xen.hg/drivers/xen/blktap2/sysfs.c: In function
  'blktap_sysfs_debug_device':
> linux-2.6.18-xen.hg/drivers/xen/blktap2/sysfs.c:276: warning: format
  '%llu' expects type 'long long unsigned int', but argument 4 has type
  'uint64_t'

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago [IA64] Build blktap2 driver by default in x86 builds.
Isaku Yamahata [Mon, 29 Jun 2009 03:09:16 +0000 (12:09 +0900)]
[IA64] Build blktap2 driver by default in x86 builds.

add CONFIG_XEN_BLKDEV_TAP2=y to buildconfigs/linux-defconfig_xen_ia64.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago [IA64] xencomm: tmem support.
Isaku Yamahata [Mon, 29 Jun 2009 02:23:16 +0000 (11:23 +0900)]
[IA64] xencomm: tmem support.

add tmem support to xencomm.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago [IA64] xencomm: add XENVER_commandline support.
Isaku Yamahata [Mon, 29 Jun 2009 02:22:41 +0000 (11:22 +0900)]
[IA64] xencomm: add XENVER_commandline support.

add XENVER_commandline support to xencomm.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago xenbus: fix timeout with PV guest and physical CDROM
Keir Fraser [Tue, 23 Jun 2009 10:12:38 +0000 (11:12 +0100)]
xenbus: fix timeout with PV guest and physical CDROM

Specifying a physical CDROM in the configuration of a PV guest, like

    disk = ['tap:aio:/....,xvda,w', 'phy:/dev/cdrom,hdc:cdrom,r' ]

will cause the 300 seconds timeout to occur if there is no physical
CDROM in the tray.  The bug is due to the device being Closed (as shown by
the timeout message) but not ready.  The configuration is quite bogus, but
this is a regression from when the timeout was 10 seconds only, and
the fix is easy and safe: only check is_ready for connected devices.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
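
The rule "only check is_ready for connected devices" can be sketched as a small predicate. This is an illustration of the described behaviour rather than the patched function; still_waiting is a hypothetical name, and the state numbers follow the usual xenbus state ordering, stated here as an assumption.

```python
# Assumed xenbus state ordering (Unknown = 0 ... Closed = 6).
INITIALISING, INIT_WAIT, INITIALISED, CONNECTED, CLOSING, CLOSED = range(1, 7)


def still_waiting(state, is_ready=None):
    """Return True if boot should keep waiting for this device.

    is_ready is an optional driver callback; it is consulted only when
    the device has actually reached the connected state.
    """
    if state < CONNECTED:
        return True
    if state == CONNECTED and is_ready is not None:
        return not is_ready()
    return False
```

With this shape, a device that is already Closed (such as an empty CDROM drive) no longer holds up the wait, whatever its readiness callback would report.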
15 years ago x86-64: do not pass unmanageable amounts of memory to Dom0
Keir Fraser [Thu, 18 Jun 2009 09:32:16 +0000 (10:32 +0100)]
x86-64: do not pass unmanageable amounts of memory to Dom0

Due to address space restrictions it is not possible to successfully
pass more than about 500GB to a Linux Dom0 unless its kernel specifies
a non-default phys-to-machine map location via XEN_ELFNOTE_INIT_P2M.

For non-Linux Dom0 kernels I can't say whether the limit could be set
to close to 1TB, but since passing such huge amounts of memory isn't
very useful anyway (and can be enforced via dom0_mem=), the patch
doesn't attempt to guess the kernel type and restricts the memory
amount in all cases.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
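
To give a feel for the scale involved, the sketch below computes how large a flat phys-to-machine map becomes for a given amount of memory. It assumes 4 kB pages and one 8-byte entry per page; it shows only how the map grows and does not model the kernel address-space layout that sets the actual limit.

```python
PAGE_BYTES = 4096
P2M_ENTRY_BYTES = 8   # assumed: one 64-bit entry per page
MIB = 1 << 20
GIB = 1 << 30


def p2m_bytes(memory_bytes):
    """Size of a flat phys-to-machine map covering memory_bytes."""
    return (memory_bytes // PAGE_BYTES) * P2M_ENTRY_BYTES


print(p2m_bytes(500 * GIB) // MIB, "MiB of map for 500 GiB")
print(p2m_bytes(1024 * GIB) // MIB, "MiB of map for 1 TiB")
```

Under these assumptions the map alone approaches 1 GiB at 500 GiB of memory and reaches 2 GiB at 1 TiB.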
15 years ago Transcendent memory ("tmem") for Linux
Keir Fraser [Thu, 18 Jun 2009 09:24:18 +0000 (10:24 +0100)]
Transcendent memory ("tmem") for Linux

Tmem, when called from a tmem-capable (paravirtualized) guest, makes
use of otherwise unutilized ("fallow") memory to create and manage
pools of pages that can be accessed from the guest either as
"ephemeral" pages or as "persistent" pages. In either case, the pages
are not directly addressable by the guest, only copied to and fro via
the tmem interface. Ephemeral pages are a nice place for a guest to
put recently evicted clean pages that it might need again; these pages
can be reclaimed synchronously by Xen for other guests or other uses.
Persistent pages are a nice place for a guest to put "swap" pages to
avoid sending them to disk. These pages retain data as long as the
guest lives, but count against the guest memory allocation.

This patch contains the Linux paravirtualization changes to
complement the tmem Xen patch (xen-unstable c/s 19646). It
implements "precache" (ext3 only as of now), "preswap",
and limited "shared precache" (ocfs2 only as of now) support.
CONFIG options are required to turn on
the support (but in this patch they default to "y").  If
the underlying Xen does not have tmem support or has it
turned off, this is sensed early to avoid nearly all
hypercalls.

Lots of useful prose about tmem can be found at
http://oss.oracle.com/projects/tmem

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
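
The difference between ephemeral and persistent pools can be illustrated with a toy model. This is a deliberately simplified sketch, not the tmem interface: TmemPool, put, get and reclaim are hypothetical names, and real tmem semantics are richer than shown.

```python
class TmemPool:
    """Toy pool: pages are copied in and out, never mapped by the guest."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.pages = {}

    def put(self, key, data):
        self.pages[key] = bytes(data)   # copy into the pool

    def get(self, key):
        """Return a copy of the page, or None if it is no longer there."""
        data = self.pages.get(key)
        return None if data is None else bytes(data)

    def reclaim(self):
        """Model the hypervisor taking memory back.

        Only ephemeral pages may disappear; persistent pages stay for
        the life of the guest. Returns the number of pages dropped.
        """
        if self.persistent:
            return 0
        dropped = len(self.pages)
        self.pages.clear()
        return dropped
```

A guest using an ephemeral pool must therefore always be prepared for get to miss, whereas a persistent pool behaves like reliable storage that is charged to the guest.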
15 years ago xen/x86: Fix build failure after platform_op header changes.
Keir Fraser [Wed, 17 Jun 2009 08:07:23 +0000 (09:07 +0100)]
xen/x86: Fix build failure after platform_op header changes.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago linux: follow-up adjustments for platform.h interface header change
Keir Fraser [Wed, 17 Jun 2009 06:26:52 +0000 (07:26 +0100)]
linux: follow-up adjustments for platform.h interface header change

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago Sync Xen public interface headers with 19775:bda5ab0cb387
Keir Fraser [Wed, 17 Jun 2009 06:26:00 +0000 (07:26 +0100)]
Sync Xen public interface headers with 19775:bda5ab0cb387

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago video/sstfb: Fix #elif->#else typo
Keir Fraser [Tue, 16 Jun 2009 11:00:56 +0000 (12:00 +0100)]
video/sstfb: Fix #elif->#else typo
Signed-off-by: Keir Fraser <keir.fraser@eu.citrix.com>
15 years ago x86: add MCA logging support in DOM0
Keir Fraser [Tue, 16 Jun 2009 10:58:55 +0000 (11:58 +0100)]
x86: add MCA logging support in DOM0

When an MCE/CMCI error happens (or by polling), the related error
information will be sent to DOM0 by XEN. This patch will help to fetch
the xen-logged information by hypercall and then convert XEN-format
log into Linux format MCELOG. It makes using current available mcelog
tools for native Linux possible.

With this patch, after mce/cmci error log information is sent to DOM0,
running mcelog tools in DOM0, you will get same detailed decoded mce
information as in Native Linux.

Signed-Off-By: Liping Ke <liping.ke@intel.com>
Signed-Off-By: Yunhong Jiang <yunhong.jiang@intel.com>
Acked-By: Jan Beulich <jbeulich@novell.com>
15 years ago blktap: Indirection in vm_area_struct->vm_private_data
Keir Fraser [Tue, 16 Jun 2009 10:09:39 +0000 (11:09 +0100)]
blktap: Indirection in vm_area_struct->vm_private_data

The recent patch in linux-2.6.18.hg (878:eba6fe6d8d53) changed the
way that the foreign map is stored in vm_area_struct. Currently the
blktap (not blktap2) implementation is internally inconsistent, which
triggers a kernel bug when a tap:aio disk is used (dump attached at
the end of the email).

Signed-off-by: Grzegorz Milos <gm281@cam.ac.uk>
15 years ago blktap2: fix further compiler warnings
Keir Fraser [Tue, 16 Jun 2009 10:07:19 +0000 (11:07 +0100)]
blktap2: fix further compiler warnings

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago Fix Makefile.xen generation when building external modules
Keir Fraser [Tue, 16 Jun 2009 10:06:10 +0000 (11:06 +0100)]
Fix Makefile.xen generation when building external modules

Otherwise, the file will be (attempted to be) put in the (possibly
read-only) source tree.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago pci: fix pcie-aer recovery mechanism defects.
Keir Fraser [Mon, 8 Jun 2009 11:23:24 +0000 (12:23 +0100)]
pci: fix pcie-aer recovery mechanism defects.

When an AER error occurs, exit if the device is neither hidden nor
assigned. If the device is assigned but not yet connected by a PV guest,
or is owned by an HVM guest, kill the guest. [sh_info is NULL]

Signed-Off-By: Liping Ke <liping.ke@intel.com>
Signed-Off-By: Yunhong Jiang <yunhong.jiang@intel.com>
15 years ago balloon: try harder to balloon up under memory pressure.
Keir Fraser [Fri, 5 Jun 2009 13:01:20 +0000 (14:01 +0100)]
balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive fewer pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
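
The retry behaviour described above can be sketched as follows. This is an illustration of the policy, not the driver code; balloon_step, balloon_until_done and the grant callback are hypothetical names, and each loop iteration stands in for one timer tick.

```python
def balloon_step(current, target, grant):
    """Try once to grow the reservation towards target.

    grant(n) is a hypothetical callback returning how many of the n
    requested pages the host could provide (between 0 and n).
    Returns (new_current, retry_needed).
    """
    wanted = target - current
    if wanted <= 0:
        return current, False
    current += grant(wanted)           # keep whatever was handed over
    return current, current < target   # short of target: retry next tick


def balloon_until_done(current, target, grant, max_ticks):
    """Repeat balloon_step once per tick until the target is reached."""
    for _ in range(max_ticks):
        current, retry = balloon_step(current, target, grant)
        if not retry:
            break
    return current
```

There is no recorded hard limit here: a failed or partial attempt simply leaves the remaining shortfall for the next tick.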
15 years ago blktap2: use blk_rq_map_sg() here too
Keir Fraser [Thu, 4 Jun 2009 09:46:54 +0000 (10:46 +0100)]
blktap2: use blk_rq_map_sg() here too

Just like in blkfront, not doing so can cause the maximum number of
segments check to trigger.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago pci/guestdev, iomul: use strlcpy()
Keir Fraser [Thu, 4 Jun 2009 09:45:49 +0000 (10:45 +0100)]
pci/guestdev, iomul: use strlcpy()

use strlcpy() to make them robust.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago linux: fix blkback/blktap2 interaction
Keir Fraser [Thu, 4 Jun 2009 09:33:52 +0000 (10:33 +0100)]
linux: fix blkback/blktap2 interaction

blkback's page map code needs to be accessible to both blkback and
blktap2, irrespective of whether either or both are modules. The
most immediate solution is to break it out into a separate, library-
like component that doesn't need building if either of the two
consumers is configured off, and that gets built as a module if both
consumers are modules.

Also fix the dummy implementation of blkback_pagemap_read(), since
using BUG() there doesn't compile.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago linux/blktap2: reduce TLB flush scope
Keir Fraser [Thu, 4 Jun 2009 09:32:57 +0000 (10:32 +0100)]
linux/blktap2: reduce TLB flush scope

c/s 885 added very coarse TLB flushing. Since these flushes always
follow single page updates, single page flushes (when available) are
sufficient.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago linux/blktap2: allow to build as module
Keir Fraser [Thu, 4 Jun 2009 09:32:34 +0000 (10:32 +0100)]
linux/blktap2: allow to build as module

... and also allow to interact with blkback when that's also built as
a module.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago Dom0 PCI: fix SR-IOV function dependency link problem
Keir Fraser [Wed, 3 Jun 2009 10:22:24 +0000 (11:22 +0100)]
Dom0 PCI: fix SR-IOV function dependency link problem

PCIe Root Complex Integrated Endpoint does not implement ARI, so this
kind of endpoint uses 3-bit function number. The function dependency
link of the integrated endpoint should be calculated using the device
number field in conjunction with the value from function dependency
link register.

Normal SR-IOV endpoint always implements ARI and the function
dependency link register contains 8-bit function number (i.e. `devfn'
from software perspective).

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
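
The two cases can be sketched with the usual devfn encoding (device number in the upper five bits, function number in the lower three). This illustrates the calculation described in the message and is not the patched code; dependency_devfn and the ari_enabled flag are hypothetical names.

```python
def dependency_devfn(devfn, link, ari_enabled):
    """Resolve the devfn that a function dependency link refers to.

    devfn: the 8-bit devfn of the function whose link register was read.
    link:  the raw value of the function dependency link register.
    """
    if ari_enabled:
        # With ARI the register already holds a full 8-bit function number.
        return link & 0xFF
    # Without ARI only 3 function bits are meaningful, so the device
    # number has to be taken from the function's own devfn.
    device = (devfn >> 3) & 0x1F
    return (device << 3) | (link & 0x07)
```

For an integrated endpoint at device 5, a link value of 1 thus resolves to device 5, function 1, rather than to device 0, function 1.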
15 years ago Dom0 PCI: fix a regression introduced by the SR-IOV change
Keir Fraser [Wed, 3 Jun 2009 10:21:52 +0000 (11:21 +0100)]
Dom0 PCI: fix a regression introduced by the SR-IOV change

The device class may be changed during the early fixup. So need to
re-read the device class from pci_dev after the fixup.

The patch "PCI: centralize device setup code" (c/s 825) wrongly
cleaned up the device class re-read. This patch reverts that change.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
15 years ago Build blktap2 driver by default in x86 builds.
Keir Fraser [Tue, 2 Jun 2009 22:43:55 +0000 (23:43 +0100)]
Build blktap2 driver by default in x86 builds.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago xen/x86-64: fix phys_pmd_init() (regression from c/s 547)
Keir Fraser [Fri, 29 May 2009 08:17:16 +0000 (09:17 +0100)]
xen/x86-64: fix phys_pmd_init() (regression from c/s 547)

I didn't pay attention to the fact that 'end' must always be an upper
bound, while xen_start_info->nr_pages must additionally be one during
boot.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago linux/blktap2: fix compiler warnings
Keir Fraser [Fri, 29 May 2009 08:16:37 +0000 (09:16 +0100)]
linux/blktap2: fix compiler warnings

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years ago blktap2: introduce blktap2 dedicated config.
Keir Fraser [Thu, 28 May 2009 09:05:02 +0000 (10:05 +0100)]
blktap2: introduce blktap2 dedicated config.

Introduce CONFIG_XEN_BLKDEV_TAP2 instead of CONFIG_XEN_BLKDEV_TAP.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago blktap2: add tlb flush properly.
Keir Fraser [Thu, 28 May 2009 09:04:26 +0000 (10:04 +0100)]
blktap2: add tlb flush properly.

xen_invlpg() flushes the tlb only on its own cpu, but a tlb flush is
needed on all cpus. So replace xen_invlpg() with more appropriate calls.
It may be possible to reduce the amount of tlb flushing.
This patch also makes blktap2 compile on ia64, because xen_invlpg()
is x86 specific.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago linux/pci: reserve io/memory space for bridge
Keir Fraser [Thu, 28 May 2009 09:00:03 +0000 (10:00 +0100)]
linux/pci: reserve io/memory space for bridge

reserve io/memory space for bridge which will be used later
by PCI hotplug.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago pci/guestdev: enhance guestdev to accept +iomul.
Keir Fraser [Thu, 28 May 2009 08:59:18 +0000 (09:59 +0100)]
pci/guestdev: enhance guestdev to accept +iomul.

enhance guestdev to accept +iomul and use it.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
15 years ago PCI pass through: PCIe IO space multiplexing
Keir Fraser [Thu, 28 May 2009 08:57:49 +0000 (09:57 +0100)]
PCI pass through: PCIe IO space multiplexing

This is required for more than 16 HVM domains to boot from
PCIe pass through devices.

Linux as dom0 exclusively assigns IO space to downstream PCI bridges,
and the assignment unit of PCI bridge IO space is 4K. So only up
to 16 PCIe devices can be accessed via IO space within the 64K IO ports.
A PCI expansion ROM BIOS often uses IO port access to boot from the
device, so in a virtualized environment only up to 16 guest
domains can boot from a pass-through device.

This patch allows PCIe IO space sharing of pass-through device.
- reassign IO space of PCIe devices specified by
  "guestiomuldev=[<segment>:]<bus>:<dev>[,[<segment>:]<bus>:<dev>][,...]"
  to be shared.
  This is implemented as Linux PCI quirk fixup.

  The sharing unit is the PCIe switch, i.e. the IO space of the end point
  devices under the same switch will be shared. If there is more than
  one switch, two areas of IO space will be used.

- And the driver which arbitrates the accesses to the multiplexed PCIe
  IO space. Later qemu-dm will use this.

Limitation:
IO port of IO shared devices can't be accessed from dom0 Linux device
driver.  But this wouldn't be a big issue because PCIe specification
discourages the use of IO space and recommends that IO space should be
used only for bootable device with ROM code. OS device driver should
work without IO space access.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
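
The limit of 16 devices follows directly from the sizes quoted above, as the short calculation below shows. It is a simplified illustration that ignores IO ports already claimed by other hardware; windows_needed and its devices_per_switch argument are hypothetical names.

```python
IO_PORT_SPACE = 64 * 1024     # size of the IO port space in bytes
BRIDGE_IO_WINDOW = 4 * 1024   # assignment unit of PCI bridge IO space


def max_exclusive_devices():
    """How many devices fit when each gets its own bridge IO window."""
    return IO_PORT_SPACE // BRIDGE_IO_WINDOW


def windows_needed(devices_per_switch, shared):
    """Number of 4K windows needed.

    devices_per_switch: one device count per PCIe switch.
    shared: if True, all devices under a switch share a single window.
    """
    if shared:
        return sum(1 for n in devices_per_switch if n > 0)
    return sum(devices_per_switch)
```

Two switches with 12 pass-through devices each would need 24 exclusive windows, more than the 16 available, but only 2 windows once the IO space is shared per switch.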
15 years ago pcie io space multiplex: backport of bus event notification patch
Keir Fraser [Thu, 28 May 2009 08:56:11 +0000 (09:56 +0100)]
pcie io space multiplex: backport of bus event notification patch

Backport of 116af378201ef793424cd10508ccf18b06d8a021 and
ec0676ee28528dc8dda13a93ee4b1f215a0c2f9d.

commit 116af378201ef793424cd10508ccf18b06d8a021
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date:   Wed Oct 25 13:44:59 2006 +1000

    Driver core: add notification of bus events

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ec0676ee28528dc8dda13a93ee4b1f215a0c2f9d
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Fri Dec 5 14:10:31 2008 -0500

    Driver core: move the bus notifier call points

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
15 years ago Upgrade forcedeth net driver to 0.62 (driver package v1.25)
Keir Fraser [Thu, 28 May 2009 08:53:22 +0000 (09:53 +0100)]
Upgrade forcedeth net driver to 0.62 (driver package v1.25)

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago netback: optionally return TX responses out-of-order
Keir Fraser [Wed, 27 May 2009 10:21:00 +0000 (11:21 +0100)]
netback: optionally return TX responses out-of-order

Add a mode to netback in which it tries to return TX responses in a
slightly less predictable order.  This can make some otherwise
hardware-specific frontend bugs reproduce much more easily, which is
obviously rather useful when trying to fix them.  It'd also be quite
useful for making sure they didn't happen in the first place.

Randomisation is only done if a module parameter is set, and defaults
to off.  It certainly isn't something you'd ever want to run with in
production, but it might be useful for other people developing
frontend drivers.  I don't know if that's considered an adequate
reason to apply it, but, if anyone wants it, here it is.

Signed-off-by: Steven Smith <steven.smith@eu.citrix.com>
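
The idea of an opt-in reordering mode can be sketched as below. This is only an illustration of the concept; how the patch actually perturbs the order is not stated in the message, and response_order, randomise and rng are hypothetical names.

```python
import random


def response_order(pending_ids, randomise=False, rng=None):
    """Return the order in which pending TX responses would be sent.

    With randomise left at its default of False the submission order
    is preserved; otherwise the same ids are returned shuffled.
    """
    order = list(pending_ids)
    if randomise:
        (rng or random).shuffle(order)
    return order
```

A frontend that is correct must cope with any permutation returned here, which is what makes such a mode useful for flushing out ordering assumptions.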
15 years ago blktap2: a completely rewritten blktap implementation
Keir Fraser [Tue, 26 May 2009 10:23:16 +0000 (11:23 +0100)]
blktap2: a completely rewritten blktap implementation

Benefits to blktap2 over the old version of blktap:

* Isolation from xenstore - Blktap devices are now created directly on
   the linux dom0 command line, rather than being spawned in response
   to XenStore events.  This is handy for debugging, makes blktap
   generally easier to work with, and is a step toward a generic
   user-level block device implementation that is not Xen-specific.

* Improved tapdisk infrastructure: simpler request forwarding, new
   request scheduler, request merging, more efficient use of AIO.

* Improved tapdisk error handling and memory management.  No
   allocations on the block data path, IO retry logic to protect
   guests from transient block device failures.  This has been tested
   and is known to work on weird environments such as NFS soft mounts.

* Pause and snapshot of live virtual disks (see xmsnap script).

* VHD support.  The VHD code in this release has been rigorously
   tested, and represents a very mature implementation of the VHD
   image format.

* No more duplication of mechanism with blkback.  The blktap kernel
   module has changed dramatically from the original blktap.  Blkback
   is now always used to talk to Xen guests, blktap just presents a
   Linux gendisk that blkback can export.  This is done while
   preserving the zero-copy data path from domU to physical device.

These patches deprecate the old blktap code, which can hopefully be
removed from the tree completely at some point in the future.

Signed-off-by: Jake Wires <jake.wires@citrix.com>
Signed-off-by: Dutch Meyer <dmeyer@cs.ubc.ca>
15 years ago PV-on-HVM: xenbus - check HAVE_UNLOCKED_IOCTL for old Linux kernels.
Keir Fraser [Tue, 26 May 2009 08:53:55 +0000 (09:53 +0100)]
PV-on-HVM: xenbus - check HAVE_UNLOCKED_IOCTL for old Linux kernels.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years ago xenbus: Allow lazy init in case xenstored runs in a separate minios domain.
Keir Fraser [Tue, 19 May 2009 13:45:50 +0000 (14:45 +0100)]
xenbus: Allow lazy init in case xenstored runs in a separate minios domain.

Here's an explanation of the states:

It starts out in XENBUS_XSD_UNCOMMITTED.

As the master xenbus (the one local to xenstored), it will receive an
mmap from xenstore, putting it in XENBUS_XSD_LOCAL_INIT. This enables
the wake_waiting IRQ, which will put it in XENBUS_XSD_LOCAL_READY.

Alternatively, as a slave xenbus, it will receive an ioctl from the
xenstore domain builder, putting it in XENBUS_XSD_FOREIGN_INIT. This
enables the wake_waiting IRQ, which will put it in
XENBUS_XSD_FOREIGN_READY.

DomU's are immediately initialized to XENBUS_XSD_FOREIGN_READY.

Signed-off-by: Diego Ongaro <diego.ongaro@citrix.com>
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
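
The state flow explained in this commit can be written down as a small transition table. The state names come from the message; the event labels ("mmap", "ioctl", "irq") and the helper names are chosen for this sketch only and do not appear in the patch.

```python
from enum import Enum


class XsdState(Enum):
    UNCOMMITTED = "XENBUS_XSD_UNCOMMITTED"
    LOCAL_INIT = "XENBUS_XSD_LOCAL_INIT"
    LOCAL_READY = "XENBUS_XSD_LOCAL_READY"
    FOREIGN_INIT = "XENBUS_XSD_FOREIGN_INIT"
    FOREIGN_READY = "XENBUS_XSD_FOREIGN_READY"


# (current state, event) -> next state
TRANSITIONS = {
    (XsdState.UNCOMMITTED, "mmap"): XsdState.LOCAL_INIT,      # master xenbus
    (XsdState.LOCAL_INIT, "irq"): XsdState.LOCAL_READY,
    (XsdState.UNCOMMITTED, "ioctl"): XsdState.FOREIGN_INIT,   # slave xenbus
    (XsdState.FOREIGN_INIT, "irq"): XsdState.FOREIGN_READY,
}


def initial_state(is_domu):
    """DomUs start out ready; other domains start uncommitted."""
    return XsdState.FOREIGN_READY if is_domu else XsdState.UNCOMMITTED


def step(state, event):
    """Apply an event; events with no listed transition leave the state alone."""
    return TRANSITIONS.get((state, event), state)
```

Following the local path, an uncommitted xenbus that sees an mmap and then the wake_waiting interrupt ends up in XENBUS_XSD_LOCAL_READY; the ioctl path ends in XENBUS_XSD_FOREIGN_READY in the same way.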
15 years ago xenbus: Remove an assumption that 'initial domain' is dom0.
Keir Fraser [Tue, 19 May 2009 13:42:04 +0000 (14:42 +0100)]
xenbus: Remove an assumption that 'initial domain' is dom0.
Signed-off-by: Diego Ongaro <diego.ongaro@citrix.com>
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
15 years ago xenbus: allow any xenbus command over /proc/xen/xenbus.
Keir Fraser [Tue, 19 May 2009 13:28:48 +0000 (14:28 +0100)]
xenbus: allow any xenbus command over /proc/xen/xenbus.

Signed-off-by: Diego Ongaro <diego.ongaro@citrix.com>
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@eu.citrix.com>
15 years ago Sync Xen public headers with 3.4.0 release.
Keir Fraser [Mon, 18 May 2009 13:14:15 +0000 (14:14 +0100)]
Sync Xen public headers with 3.4.0 release.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years ago Added tag xen-3.4.0 for changeset 9cbcc9008446
Keir Fraser [Mon, 18 May 2009 12:21:07 +0000 (13:21 +0100)]
Added tag xen-3.4.0 for changeset 9cbcc9008446

15 years ago xen/x86: don't initialize cpu_data[]'s apicid field on generic code xen-3.4.0
Keir Fraser [Thu, 14 May 2009 09:09:15 +0000 (10:09 +0100)]
xen/x86: don't initialize cpu_data[]'s apicid field on generic code

Afaict, this is not only redundant with the initialization done in
drivers/xen/core/smpboot.c, but actually results - at least for
secondary CPUs - in the Xen-specific value written to be later
overwritten with whatever the generic code determines (with no
guarantee that the two values are identical).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agoxen/i386: hypervisor_callback adjustments
Keir Fraser [Thu, 14 May 2009 09:08:40 +0000 (10:08 +0100)]
xen/i386: hypervisor_callback adjustments

The missing check of the interrupted code's code selector in
hypervisor_callback() allowed a user mode application to oops (and
perhaps crash) the kernel.

Further adjustments:
- the 'main' critical region does not include the jmp following the
  disabling of interrupts
- the sysexit_[se]crit range checks got broken at some point - the
  sysexit critical region is always at higher addresses than the
  'main' one, rendering the check pointless (but consuming execution
  time); since the supervisor mode kernel isn't actively used afaict,
  I moved that code into an #ifdef using a hypothetical config option
- the use of a numeric label across more than 300 lines of code always
  seemed pretty fragile to me, so the patch replaces this with a local
  named label
- streamlined the critical_region_fixup code to eliminate a branch

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agoxen: miscellaneous cleanup
Keir Fraser [Thu, 14 May 2009 09:08:10 +0000 (10:08 +0100)]
xen: miscellaneous cleanup

- add two missing unwind annotations
- mark remaining struct file_operations instances const
- use get_capacity() instead of raw access to the capacity field
- use assert_spin_locked() instead of BUG_ON(!spin_is_locked())
- use clear_tsk_thread_flag() instead of clear_ti_thread_flag()
- remove dead variable cpu_state

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agolinux/blktap: fix blktap_clear_pte().
Keir Fraser [Tue, 5 May 2009 12:32:55 +0000 (13:32 +0100)]
linux/blktap: fix blktap_clear_pte().

Fix blktap_clear_pte(): the vma->vm_file == NULL case
wasn't handled correctly.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agolinux/blktap: fix vma_close() for partial munmap.
Keir Fraser [Tue, 28 Apr 2009 12:44:22 +0000 (13:44 +0100)]
linux/blktap: fix vma_close() for partial munmap.

vm_area_struct::vm_private_data is used by get_user_pages(), so we
can't override it. To make blktap work, set it to an array of
struct page*.

Without holding mm->mmap_sem, the virtual mapping can change, so
remembering the vma that was passed to the mmap callback is bogus:
the vma can later be freed or changed. So don't remember the vma;
put the necessary information into tap_blkif_t and use find_vma()
to look up the vma when needed.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agoblktap: don't use vma->vm_start to calculate offset.
Keir Fraser [Tue, 28 Apr 2009 12:43:46 +0000 (13:43 +0100)]
blktap: don't use vma->vm_start to calculate offset.

Because a vma can be split by a partial munmap(), we can't depend on
vm_start. Instead, use tap_blkif_t::rings_vstart.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
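
The point of the commit above can be illustrated with a little arithmetic. This is a sketch under stated assumptions, not the driver code: a 4 KiB page size, a made-up base address, and the hypothetical helper `page_index()`. Only the idea of measuring from a fixed base (rings_vstart) rather than from the current vm_start comes from the commit message.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def page_index(uvaddr, base):
    """Index of the page containing uvaddr, counted from base."""
    return (uvaddr - base) >> PAGE_SHIFT


# Hypothetical addresses, chosen only for illustration.
rings_vstart = 0x7F0000000000             # start of the original mapping
uvaddr = rings_vstart + 5 * PAGE_SIZE     # an address in the sixth page

# After a partial munmap() of the first two pages, the surviving vma
# starts two pages later than the original mapping did.
split_vm_start = rings_vstart + 2 * PAGE_SIZE

stable_index = page_index(uvaddr, rings_vstart)     # 5
shifted_index = page_index(uvaddr, split_vm_start)  # 3
```

The index computed from the fixed base stays at 5 however the vma is split, whereas the one computed from the split vma's start silently becomes 3.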
16 years agoblktap: fix race memory reference with ring_ok.
Keir Fraser [Tue, 28 Apr 2009 12:43:06 +0000 (13:43 +0100)]
blktap: fix race memory reference with ring_ok.

Fix a racy memory reference to ring_ok.
ring_ok is shared by mmapping process and blktap kernel thread.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agoblktap: add one static.
Keir Fraser [Tue, 28 Apr 2009 12:42:32 +0000 (13:42 +0100)]
blktap: add one static.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agoblktap: don't access deallocated data
Keir Fraser [Fri, 17 Apr 2009 12:03:22 +0000 (13:03 +0100)]
blktap: don't access deallocated data

Dereferencing filp->private_data->vma in the file_operations.release
actor isn't permitted, as the vma generally has been destroyed by that
time. The kfree()ing of vma->vm_private_data must be done in the
vm_operations.close actor, and the call to zap_page_range() seems
redundant with the caller of that actor altogether.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agopci: clean up of changeset 860
Keir Fraser [Thu, 16 Apr 2009 10:47:44 +0000 (11:47 +0100)]
pci: clean up of changeset 860

The fixing logic was somewhat confused and didn't produce the right
result. This patch cleans it up.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agobackport Linux changeset of bf4162bcf82ebc3258d6bc0ddd6453132abde72d
Keir Fraser [Tue, 14 Apr 2009 10:17:47 +0000 (11:17 +0100)]
backport Linux changeset of bf4162bcf82ebc3258d6bc0ddd6453132abde72d

Without this patch, fakephp with reassigndev fails
to allocate memory resource.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
commit bf4162bcf82ebc3258d6bc0ddd6453132abde72d
Author: Darrick J. Wong <djwong@us.ibm.com>
Date:   Tue Nov 25 13:51:44 2008 -0800

    PCI hotplug: fakephp: Allocate PCI resources before adding the
    device

    For PCI devices, pci_bus_assign_resources() must be called to set
    up the pci_device->resource array before pci_bus_add_devices() can
    be called, else attempts to load drivers results in BAR collision
    errors where there are none.
    This is not done in fakephp, so devices can be "unplugged" but
    scanning the parent bus won't bring the devices back due to
    resource unallocation.  Move the pci_bus_add_device-calling logic
    into pci_rescan_bus and preface it with a call to
    pci_bus_assign_resources so that we only have to (re)allocate
    resources once per bus where a new device is found.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
16 years agolinux/pci/reassign: fix alignment calculation
Keir Fraser [Tue, 14 Apr 2009 10:16:26 +0000 (11:16 +0100)]
linux/pci/reassign: fix alignment calculation

Later r_align is incremented, so it must be decremented
as compensation.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agolinux/pci_back: fix NULL pointer ref.
Keir Fraser [Tue, 14 Apr 2009 10:16:08 +0000 (11:16 +0100)]
linux/pci_back: fix NULL pointer ref.

pcistub_device_release() can be called during
initialization. Thus pci_get_drvdata() can return NULL.
Fix it by inserting a NULL check.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agomerge with linux-2.6.18-xen.hg
Isaku Yamahata [Tue, 14 Apr 2009 05:05:30 +0000 (14:05 +0900)]
merge with linux-2.6.18-xen.hg

16 years agodom0 linux: support SBDF with "guestdev=" and remove "reassigndev="
Keir Fraser [Thu, 9 Apr 2009 07:44:25 +0000 (08:44 +0100)]
dom0 linux: support SBDF with "guestdev=" and remove "reassigndev="

When we need neither to reassign resources nor to use a device path,
the pciback.hide= boot parameter can be used. The parameter is also
needed for backward compatibility.

    pciback.hide=(00:01.0)(00:02.0)

When we need to reassign resources or use a device path, the guestdev=
boot parameter can be used. The reassign_resources boot parameter is
also required in order to reassign resources.

    guestdev=00:01.0,00:02.0 reassign_resources
    guestdev=PNP0A08:0-1.0,PNP0A08:0-2.0
    guestdev=PNP0A08:0-1.0,PNP0A08:0-2.0 reassign_resources

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
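
To make the two parameter syntaxes above concrete, here is a minimal parsing sketch. It is not the kernel's parser: the helper names are hypothetical, the accepted grammar (an optional hex segment, then bus:device.function) is an assumption based on the examples in the commit message, and ACPI device paths such as PNP0A08:0-1.0 are deliberately not handled.

```python
import re

# Optional segment, then bus:device.function, all hexadecimal.
# Device numbers above 0x1f are not rejected in this sketch.
_SBDF = re.compile(
    r"(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})\.([0-7])"
)


def parse_sbdf(text):
    """Parse '00:01.0' or '0000:00:01.0' into (segment, bus, device, function)."""
    m = _SBDF.fullmatch(text)
    if m is None:
        raise ValueError(f"not an SBDF: {text!r}")
    seg, bus, dev, fn = m.groups()
    return (int(seg or "0", 16), int(bus, 16), int(dev, 16), int(fn, 16))


def parse_hide(value):
    """Parse a pciback.hide= value such as '(00:01.0)(00:02.0)'."""
    return [parse_sbdf(item) for item in re.findall(r"\(([^()]*)\)", value)]


def parse_guestdev(value):
    """Parse a guestdev= value such as '00:01.0,00:02.0' (SBDF form only)."""
    return [parse_sbdf(item) for item in value.split(",")]
```

Both `parse_hide("(00:01.0)(00:02.0)")` and `parse_guestdev("00:01.0,00:02.0")` yield the same two devices, which shows that the two parameters differ in syntax and in what else they enable, not in how a plain device is named.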
16 years agonetfront accel: Better watch handling across suspend/resume
Keir Fraser [Tue, 7 Apr 2009 09:29:30 +0000 (10:29 +0100)]
netfront accel: Better watch handling across suspend/resume

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>
16 years ago[IA64] fix fsys.S paravirtualization
Isaku Yamahata [Tue, 7 Apr 2009 02:31:17 +0000 (11:31 +0900)]
[IA64] fix fsys.S paravirtualization

fix fsys.S paravirtualization.
event_mask must be cleared before checking event_pending.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agousbback: fix urb interval value for interrupt urbs.
Keir Fraser [Mon, 6 Apr 2009 12:51:20 +0000 (13:51 +0100)]
usbback: fix urb interval value for interrupt urbs.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
16 years agoPCI: sync up the SR-IOV changes between Dom0 and upstream kernel
Keir Fraser [Mon, 6 Apr 2009 12:48:03 +0000 (13:48 +0100)]
PCI: sync up the SR-IOV changes between Dom0 and upstream kernel

The SR-IOV patches for the upstream kernel are finally in-tree. This
patch backports some minor changes that appeared in the upstream
kernel after the Dom0 patches were checked-in.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
16 years agopci: Do not disable I/O decoding on reassigning resource.
Keir Fraser [Mon, 6 Apr 2009 12:47:27 +0000 (13:47 +0100)]
pci: Do not disable I/O decoding on reassigning resource.

When I reserved a UHCI controller for a guest domain with the
"guestdev=" and "reassign_resources" parameters, spurious interrupts
occurred. The reason is that the UHCI controller is not reset by
uhci_check_and_reset_hc because I/O decoding is disabled, so it keeps
asserting the interrupt line.

With this patch, I/O decoding is no longer disabled; only memory
decoding is. So UHCI is reset and spurious interrupts do not occur.

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
16 years agoSkip vcpu_hotplug for VCPU 0 in smp_resume.
Keir Fraser [Wed, 1 Apr 2009 10:43:01 +0000 (11:43 +0100)]
Skip vcpu_hotplug for VCPU 0 in smp_resume.
This function can occasionally take up to 2 seconds to complete,
and smp_suspend also skips VCPU 0.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
16 years agousbfront: do not assume sequentially mapped pages
Keir Fraser [Tue, 31 Mar 2009 11:01:50 +0000 (12:01 +0100)]
usbfront: do not assume sequentially mapped pages

xenhcd_gnttab_map in usbfront-q.c looks up the mfn of the start of the
usb transfer buffer.  But the buffer may span several pages, and the
current code simply increments the obtained mfn.  Needless to say this
is an unwarranted assumption.  It causes large transfers to be
corrupted and/or to overwrite other parts of memory.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
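
The bug described in the commit above can be shown with a toy pfn-to-mfn table. This is a sketch, not the usbfront code: the function names and the table contents are hypothetical, addresses are treated as pseudo-physical for simplicity, and 4 KiB pages are assumed.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def buffer_mfns(addr, length, pfn_to_mfn):
    """Look up the machine frame for every page the buffer touches."""
    first = addr >> PAGE_SHIFT
    last = (addr + length - 1) >> PAGE_SHIFT
    return [pfn_to_mfn[pfn] for pfn in range(first, last + 1)]


def buffer_mfns_buggy(addr, length, pfn_to_mfn):
    """The flawed approach: look up the first frame, then just increment."""
    first = addr >> PAGE_SHIFT
    last = (addr + length - 1) >> PAGE_SHIFT
    start = pfn_to_mfn[first]
    return [start + i for i in range(last - first + 1)]


# Hypothetical mapping: consecutive pfns backed by scattered machine frames.
table = {16: 900, 17: 42, 18: 7}
addr = (16 << PAGE_SHIFT) + 100      # buffer starts 100 bytes into pfn 16
length = 2 << PAGE_SHIFT             # 8 KiB, so it spills into pfn 18

correct = buffer_mfns(addr, length, table)
wrong = buffer_mfns_buggy(addr, length, table)
```

Here `correct` lists the frames that actually back the buffer, while `wrong` names two frames that belong to something else entirely, which is how large transfers end up corrupted or overwriting unrelated memory.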
16 years agosfc_netfront: Only clear tx_skb when ready for netif_wake_queue
Keir Fraser [Tue, 31 Mar 2009 11:00:53 +0000 (12:00 +0100)]
sfc_netfront: Only clear tx_skb when ready for netif_wake_queue
(doing otherwise could result in a lost packet) and document use of
locks to protect tx_skb

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>
16 years agonetfront accel: Simplify, document, and fix a theoretical bug in use
Keir Fraser [Tue, 31 Mar 2009 11:00:03 +0000 (12:00 +0100)]
netfront accel: Simplify, document, and fix a theoretical bug in use
of spin locks by netfront acceleration plugins

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>
16 years agonet sfc: Update sfc and sfc_resource driver to latest release
Keir Fraser [Tue, 31 Mar 2009 10:59:10 +0000 (11:59 +0100)]
net sfc: Update sfc and sfc_resource driver to latest release

...and update sfc_netfront, sfc_netback, sfc_netutil for any API changes

sfc_netback: Fix asymmetric use of SFC buffer table alloc and free
sfc_netback: Clean up if no SFC accel device found
sfc_netback: Gracefully handle case where page grant fails
sfc_netback: Disable net acceleration if the physical link goes down
sfc_netfront: Less verbose error messages, more verbose counters for
rx discard errors
sfc_netfront: Gracefully handle case where SFC netfront fails during
initialisation

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>
16 years agopvusbback: fix a compilation error.
Keir Fraser [Tue, 31 Mar 2009 10:49:12 +0000 (11:49 +0100)]
pvusbback: fix a compilation error.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
16 years agopvusb: Fix license headers.
Keir Fraser [Tue, 31 Mar 2009 10:11:23 +0000 (11:11 +0100)]
pvusb: Fix license headers.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
16 years agoxen: swiotlb allocations do not need to come from low memory
Keir Fraser [Fri, 20 Mar 2009 09:00:58 +0000 (09:00 +0000)]
xen: swiotlb allocations do not need to come from low memory

Unlike on native, where using the _low variants of alloc_bootmem()
is indeed a requirement for swiotlb, on Xen this is not needed. Using
the _low variants can prevent systems with lots of memory from
booting, because of the way the bootmem allocator works: it allocates
memory from bottom to top. Thus, if other large (but not _low)
allocations (memmap, large system hash tables) consume most of the
memory below 4GB, the swiotlb allocations can fail. (This is equally
so for native, but cannot be fixed as easily there.)

Signed-off-by: Jan Beulich <jbeulich@novell.com>
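
The failure mode described in the commit above can be reproduced with a toy bottom-up allocator. This is only a sketch: the real bootmem allocator is bitmap-based, and the class name, method signature, and sizes below are assumptions chosen to illustrate the ordering problem.

```python
MB = 1 << 20
GB = 1 << 30


class BootmemSketch:
    """Toy bump allocator that hands out memory from the bottom up."""

    def __init__(self, total):
        self.total = total
        self.next_free = 0

    def alloc(self, size, limit=None):
        """Allocate size bytes; a limit models the _low variants (below 4GB)."""
        top = self.total if limit is None else min(limit, self.total)
        if self.next_free + size > top:
            return None          # nothing left below the requested limit
        addr = self.next_free
        self.next_free += size
        return addr


mem = BootmemSketch(64 * GB)

# A large early allocation without a _low constraint (memmap, hash tables)
# still lands at the bottom and eats almost everything below 4GB.
early = mem.alloc(4 * GB - 32 * MB)

low_swiotlb = mem.alloc(64 * MB, limit=4 * GB)   # _low style request
any_swiotlb = mem.alloc(64 * MB)                 # unconstrained request
```

The constrained request fails even though 60GB remain free, while the unconstrained one succeeds, which is why dropping the _low requirement on Xen avoids the boot failure.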
16 years agopci: Fix the non-PCI_IOV build.
Keir Fraser [Thu, 19 Mar 2009 13:48:52 +0000 (13:48 +0000)]
pci: Fix the non-PCI_IOV build.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoXen: Sync physdev.h public header.
Keir Fraser [Thu, 19 Mar 2009 10:25:31 +0000 (10:25 +0000)]
Xen: Sync physdev.h public header.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoPCI: pass ARI and SR-IOV device information to the hypervisor
Keir Fraser [Thu, 19 Mar 2009 10:21:46 +0000 (10:21 +0000)]
PCI: pass ARI and SR-IOV device information to the hypervisor

PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
Function -- a function whose function number is greater than 7 within an
ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
Function is under the scope of the same remapping unit as the traditional
function. The hypervisor needs to know whether a function is an
Extended Function so it can find the proper DMAR for it.

And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
scope of the same remapping unit as the Physical Function. The hypervisor
also needs to know whether a function is a Virtual Function and which
Physical Function it's associated with, for the same reason.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
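
The notion of an Extended Function in the commit above can be sketched with the usual devfn bit layout. The helper names are hypothetical and this is not the interface the patch adds; it only illustrates that a conventional devfn splits into a 5-bit device and a 3-bit function, whereas with ARI the whole 8-bit field is the function number.

```python
def pci_slot(devfn):
    """Conventional decoding: the upper five bits are the device number."""
    return (devfn >> 3) & 0x1F


def pci_func(devfn):
    """Conventional decoding: the lower three bits are the function number."""
    return devfn & 0x07


def ari_func(devfn):
    """ARI decoding: all eight bits form the function number."""
    return devfn & 0xFF


def is_extended_function(devfn, ari_enabled):
    """An Extended Function is an ARI function whose number exceeds 7."""
    return ari_enabled and ari_func(devfn) > 7
```

The same byte 0x0A reads as device 1, function 2 under conventional decoding, but as function 10 of an ARI Device, so only the latter interpretation makes it an Extended Function.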
16 years agoPCI: save and restore PCIe 2.0 registers
Keir Fraser [Thu, 19 Mar 2009 10:21:21 +0000 (10:21 +0000)]
PCI: save and restore PCIe 2.0 registers

PCIe 2.0 defines several new registers (Device Control 2, Link Control
2, and Slot Control 2). Save and restore them in pci_save_pcie_state()
and pci_restore_pcie_state().

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
16 years agoPCI: add a SR-IOV quirk for Intel 82576 NIC
Keir Fraser [Thu, 19 Mar 2009 10:20:59 +0000 (10:20 +0000)]
PCI: add a SR-IOV quirk for Intel 82576 NIC

If the BIOS doesn't allocate resources for VF BARs, zero the Flash BAR
and program the VF BARs to use the old Flash Memory Space.

Please refer to Intel 82576 Gigabit Ethernet Controller Datasheet
section 7.9.2.14.2 for details.
http://download.intel.com/design/network/datashts/82576_Datasheet.pdf

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
16 years agolinux/PCI-MSI: fix compiler warnings resulting from c/s 790
Keir Fraser [Thu, 19 Mar 2009 10:07:31 +0000 (10:07 +0000)]
linux/PCI-MSI: fix compiler warnings resulting from c/s 790

The one in pci_enable_msix() is rather meaningful, as the
uninitialized inner msi_dev_entry was indeed hiding the initialized
outer one.

Signed-off-by: Jan Beulich <jbeulich@novell.com>