xenbus: Do not avoid state transitions on shutdown in kernels for which
we do not provide a shutdown hook. This is because we don't drive
xenbus during shutdown in this case, and such kernels often do not
expose a 'system_state' variable, so the xenbus driver doesn't build!
Signed-off-by: David Lively <dlively@virtualiron.com>
Signed-off-by: Ben Guthro <bguthro@virtualrion.com>
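The following is a minimal C sketch of the guard described above. It assumes a hypothetical HAVE_XENBUS_SHUTDOWN_HOOK macro that is defined only for kernels with a shutdown hook. The reference to system_state is compiled only under that guard, so a kernel without the hook never needs the variable and the driver still builds.

```c
#include <assert.h>
#include <stdbool.h>

/* HAVE_XENBUS_SHUTDOWN_HOOK is a hypothetical macro, defined only for
 * kernels for which we provide a shutdown hook. It is left undefined
 * here to mimic a kernel without the hook. */

#ifdef HAVE_XENBUS_SHUTDOWN_HOOK
/* Only these kernels are assumed to expose system_state. */
enum system_states { SYSTEM_RUNNING, SYSTEM_POWER_OFF };
static enum system_states system_state = SYSTEM_RUNNING;
#endif

/* Returns true when a xenbus state transition should be skipped. */
static bool xenbus_skip_transition(void)
{
#ifdef HAVE_XENBUS_SHUTDOWN_HOOK
	/* We drive xenbus ourselves during shutdown, so skip transitions. */
	return system_state != SYSTEM_RUNNING;
#else
	/* No shutdown hook: never avoid transitions, never touch system_state. */
	return false;
#endif
}
```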
privcmd: Take write lock on mm semaphore when calling
*remap_pfn_range(), as these functions modify fields in the vma
structure.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
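This is a userspace model of the locking rule, under stated assumptions. A pthread_rwlock_t stands in for the mm semaphore, and struct vma_model and remap_model() are hypothetical stand-ins for the vma and *remap_pfn_range(). Because the remap step writes vma fields, the caller takes the lock for writing rather than for reading.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the vma fields touched by the remap. */
struct vma_model {
	unsigned long vm_flags;
	unsigned long vm_pgoff;
};

/* Stands in for the mm semaphore. */
static pthread_rwlock_t mm_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical stand-in for *remap_pfn_range(): it writes vma fields,
 * so callers must hold mm_sem for writing. */
static void remap_model(struct vma_model *vma, unsigned long pfn)
{
	vma->vm_flags |= 0x1UL;   /* hypothetical flag bit */
	vma->vm_pgoff = pfn;
}

static int privcmd_map_model(struct vma_model *vma, unsigned long pfn)
{
	/* Write lock, not read lock: the vma is being modified. */
	int rc = pthread_rwlock_wrlock(&mm_sem);
	if (rc != 0)
		return rc;
	remap_model(vma, pfn);
	return pthread_rwlock_unlock(&mm_sem);
}
```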
[XEN][LINUX][POWERPC] Add PowerPC Xen architecture support.
Most of this code was written by Jimi Xenidis <jimix@watson.ibm.com>.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
[PPC] Add architecture-generic xencomm infrastructure.
Xencomm is the mechanism by which userspace can pass virtual addresses
to Xen on architectures that cannot perform page table walks in
software, including PowerPC and IA64.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
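Here is a sketch of the general xencomm idea, assuming a simplified descriptor layout. The guest walks its own address space and hands Xen one physical address per page-bounded chunk of the buffer, so Xen never needs to walk page tables. The names xencomm_desc_sketch, MAX_ADDRS and sketch_virt_to_phys() are hypothetical; the real descriptor format and translation differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL
#define MAX_ADDRS 16   /* hypothetical capacity */

/* Simplified descriptor: one physical address per page-bounded chunk. */
struct xencomm_desc_sketch {
	uint32_t nr_addrs;
	uint64_t address[MAX_ADDRS];
};

/* Hypothetical virtual-to-physical translation done by the guest. */
static uint64_t sketch_virt_to_phys(uint64_t vaddr)
{
	return vaddr + 0x100000000ULL;
}

/* Fill desc for [vaddr, vaddr + len). Returns 0, or -1 if it won't fit. */
static int xencomm_fill_sketch(struct xencomm_desc_sketch *desc,
			       uint64_t vaddr, uint64_t len)
{
	desc->nr_addrs = 0;
	while (len > 0) {
		/* A chunk never crosses a page boundary. */
		uint64_t chunk = SKETCH_PAGE_SIZE - (vaddr % SKETCH_PAGE_SIZE);
		if (chunk > len)
			chunk = len;
		if (desc->nr_addrs == MAX_ADDRS)
			return -1;
		desc->address[desc->nr_addrs++] = sketch_virt_to_phys(vaddr);
		vaddr += chunk;
		len -= chunk;
	}
	return 0;
}
```

A 32-byte buffer straddling a page boundary yields two entries, one for each page it touches.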
[PPC] Create Xen-specific interface for xlate_dev_mem_*
PowerPC builds both drivers/char/mem.c and drivers/xen/char/mem.c at
once, so we cannot hijack the xlate_dev_mem_ptr() interface.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Ian Campbell [Mon, 2 Jul 2007 16:19:24 +0000 (17:19 +0100)]
Do not call clock_was_set() from interrupt context.
Currently clock_was_set() is a nop but on newer kernels it is not and
cannot be called from interrupt context. Prepare for that by deferring
to a workqueue. Since a timer interrupt can occur before
init_workqueue() is called, we need to protect against the possibility
that keventd hasn't started yet.
(drop unused variable max_ntp_tick).
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
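The following is a single-threaded model of the deferral, with every name hypothetical. The interrupt path only queues work, and the process-context worker makes the actual clock_was_set() call. A keventd_ready flag, similar in spirit to the kernel's keventd_up() check, covers timer interrupts that arrive before the workqueue exists.

```c
#include <assert.h>
#include <stdbool.h>

static bool keventd_ready;       /* set once the workqueue thread exists */
static bool work_queued;         /* stands in for a pending work item */
static int clock_was_set_calls;  /* counts (modelled) clock_was_set() calls */

static void clock_was_set_model(void) { clock_was_set_calls++; }

/* Timer-interrupt path: never call clock_was_set() directly. */
static void on_clock_stepped_in_irq(void)
{
	if (keventd_ready)
		work_queued = true;   /* models schedule_work() */
	/* else: too early in boot, there is nothing to queue work on yet */
}

/* Process-context worker: the safe place to call clock_was_set(). */
static void keventd_run(void)
{
	if (work_queued) {
		work_queued = false;
		clock_was_set_model();
	}
}
```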
Ian Campbell [Wed, 27 Jun 2007 15:31:36 +0000 (16:31 +0100)]
Fix kexec compatibility with highmem.
Stop abusing xen_create_contiguous_region() to move pages below the
MFN limit. Instead introduce xen_limit_pages_to_max_mfn() which works
for both low and highmem but doesn't bother making the pages
contiguous.
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
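This userspace model of the approach assumes a hypothetical pool of free frames below the limit. Each page whose MFN is at or above the limit is swapped for any low frame, independently of its neighbours. The result is below the limit but not contiguous. The real function exchanges memory with the hypervisor; this only models the selection logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool of free frames, all below the limit. */
struct low_pool {
	unsigned long *mfns;
	size_t count;
};

/* Returns 0 if every page now has mfn < limit, -1 if the pool ran dry. */
static int limit_pages_model(unsigned long *mfns, size_t n,
			     unsigned long limit, struct low_pool *pool)
{
	for (size_t i = 0; i < n; i++) {
		if (mfns[i] < limit)
			continue;   /* already low enough: leave it alone */
		if (pool->count == 0)
			return -1;
		/* Swap in any low frame; neighbours need not be adjacent. */
		mfns[i] = pool->mfns[--pool->count];
	}
	return 0;
}
```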
kfraser [Thu, 21 Jun 2007 12:39:05 +0000 (13:39 +0100)]
Fix amd64-agp aperture validation
Under Xen, pfn_valid() on a machine address makes no sense. But even
on native, under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid()
implies all subsequent pfns are also invalid is wrong. Thus replace
this with an explicit check against the E820 map.
Patch is in 2.6.22-rc4.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
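Here is a simplified sketch of checking the aperture against the E820 map. It assumes the rule is to reject an aperture that overlaps any RAM entry (type 1 in the E820 map). The struct and function names are hypothetical, and the real driver performs further checks that are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1   /* E820 type for usable RAM */

struct e820_entry_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* True if any entry of the given type overlaps [start, end). */
static bool e820_any_mapped_sketch(const struct e820_entry_sketch *map,
				   size_t n, uint64_t start, uint64_t end,
				   uint32_t type)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].type != type)
			continue;
		if (map[i].addr >= end || map[i].addr + map[i].size <= start)
			continue;   /* no overlap */
		return true;
	}
	return false;
}

/* Reject an aperture overlapping RAM, with no per-pfn walk. */
static bool aperture_valid_sketch(const struct e820_entry_sketch *map,
				  size_t n, uint64_t aper, uint64_t size)
{
	return !e820_any_mapped_sketch(map, n, aper, aper + size, E820_RAM);
}
```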
kfraser [Thu, 21 Jun 2007 08:45:00 +0000 (09:45 +0100)]
xen/firmware: Simplify Xen-interfacing code. The firmware-info maps
are guaranteed to be densely populated from index 0 to max-1,
hence when we see an error it is always appropriate to bail from the
loop.
Signed-off-by: Keir Fraser <keir@xensource.com>
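The following sketch of the simplified loop assumes a hypothetical query_firmware_entry() that fails once the index runs past the last entry. Because the map is dense, the first error means there are no more entries, so the loop simply stops.

```c
#include <assert.h>

/* Hypothetical firmware query: 0 and *value for a valid index,
 * a negative error once index is past the end of the (dense) map. */
static int query_firmware_entry(int index, int *value)
{
	static const int table[] = { 10, 20, 30 };
	if (index < 0 || index >= (int)(sizeof(table) / sizeof(table[0])))
		return -1;
	*value = table[index];
	return 0;
}

/* Dense map: the first error marks the end, so just stop there. */
static int collect_entries(int *out, int max_out)
{
	int n;
	for (n = 0; n < max_out; n++) {
		if (query_firmware_entry(n, &out[n]) != 0)
			break;
	}
	return n;
}
```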
Tim Deegan [Wed, 20 Jun 2007 18:17:36 +0000 (19:17 +0100)]
Don't clip the TSC-derived usec value to <1tick in gettimeofday.
This fixes some very odd gettimeofday() results when NTP is
adjtimex()ing the clock backwards.
Signed-off-by: Tim Deegan <Tim.Deegan@xensource.com>
virt_to_bus() must be called after the gnttab_dma_use_page() loop.
Otherwise gnttab unmap_and_replace may happen between them, leaving
a wrong bus address. Thanks, Isaku.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
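This is a single-threaded model of the required ordering. The type struct page_model and the helper functions are hypothetical. Every page is marked as in use for DMA before any bus address is read, so a later replace attempt is refused and the addresses already computed stay correct. The model illustrates the ordering only, not the concurrency.

```c
#include <assert.h>

/* Hypothetical page: its frame may be replaced unless DMA holds it. */
struct page_model {
	unsigned long mfn;
	int dma_users;
};

static void dma_use_page_model(struct page_model *p) { p->dma_users++; }

/* Stand-in for unmap-and-replace: refused while DMA holds the page. */
static int replace_frame_model(struct page_model *p, unsigned long new_mfn)
{
	if (p->dma_users > 0)
		return -1;
	p->mfn = new_mfn;
	return 0;
}

/* Assumes 4 KiB pages and an identity mfn-to-bus mapping. */
static unsigned long bus_addr_model(const struct page_model *p)
{
	return p->mfn << 12;
}

/* Correct order: pin every page first, then read the addresses. */
static void map_pages_for_dma(struct page_model *pages, int n,
			      unsigned long *bus)
{
	for (int i = 0; i < n; i++)
		dma_use_page_model(&pages[i]);
	for (int i = 0; i < n; i++)
		bus[i] = bus_addr_model(&pages[i]);
}
```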
kfraser [Tue, 12 Jun 2007 10:36:58 +0000 (11:36 +0100)]
pciback: "Controller" pcibackend and frontend extensions.
On ia64, we've run into the case where the I/O hierarchies are more
complicated than the current set of driver domain backends can describe.
Some platforms make use of translation offsets for I/O port and MMIO
ranges. Without knowledge of these translation offsets, devices are
unusable by driver domains. For instance, here is a tulip card that
lives under a PCI root bus and makes use of an I/O port translation:
# lspci -v -s 02:05.0
02:05.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
	Subsystem: Hewlett-Packard Company Unknown device 125a
	Flags: bus master, medium devsel, latency 128, IRQ 67
	I/O ports at 2001100 [size=128]
	Memory at 90102000 (32-bit, non-prefetchable) [size=1K]
	Expansion ROM at 90080000 [disabled] [size=256K]
I/O port spaces are of course limited to 64k, but on this system
multiple I/O port spaces are available (one per PCI root bridge in
this case). On ia64, I/O port spaces are typically a sparse encoding
of an MMIO range. The legacy I/O port range is decoded directly by
the processor; additional ranges are decoded by the I/O hardware. To
access I/O port 0x1100 on this device, the driver needs to do an
inb/outb to address 0x2001100. The kernel will then swizzle the bits
to create an MMIO transaction within the MMIO range for that set of
I/O ports.
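The following sketch of the port decoding uses constants modelled on ia64 Linux's <asm/io.h>, namely an IO_SPACE_BITS of 24 and the sparse encoding macro. Treat the exact values as assumptions. The upper bits of 0x2001100 select I/O port space 2, the low bits give port 0x1100, and the sparse encoding turns that port into an offset within the space's MMIO window.

```c
#include <assert.h>

/* Modelled on ia64 Linux's <asm/io.h>; treat as a sketch. */
#define IO_SPACE_BITS 24
#define IO_SPACE_SIZE (1UL << IO_SPACE_BITS)

/* Which I/O port space (here, which root bridge) a port lives in. */
static unsigned long io_space_nr(unsigned long port)
{
	return port >> IO_SPACE_BITS;
}

/* The port number within that space. */
static unsigned long io_space_port(unsigned long port)
{
	return port & (IO_SPACE_SIZE - 1);
}

/* Sparse encoding: each group of four ports starts a new 4 KiB page,
 * and the low 12 bits of the port are ORed in as the page offset. */
static unsigned long io_space_sparse_encoding(unsigned long p)
{
	return ((p >> 2) << 12) | (p & 0xfff);
}
```

In this sketch, the resulting offset would be added to the MMIO base of the selected space, which is where the root bridge's translation comes in.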
To support this, I've created the "controller" backend as shown below.
This is unfortunately an ia64-specific backend, but I don't see
any mechanism to generically support the kinds of things this backend
needs to do. PCI controllers on ia64 are created to represent the PCI
root bridges found in ACPI. These root bridge ACPI nodes have _CRS
(Current Resource Setting) methods that describe the address ranges
consumed by the bus below the root bridge. Address ranges described
with a translation attribute make use of a translation offset to reach
the desired address. This information must be provided to a driver
domain guest to allow it to access the devices.
Given this architecture, the obvious choice is to create virtual PCI
buses based on controllers. All devices physically under the same
controller are virtualized under the same domain:bus. Within a bus,
device slots are virtualized much like the slot backend. The tricky
part comes with how to describe the address translation for a
controller to the guest driver domain. For this, I chose to store the
information in xenbus. We already make use of the following keys for
driver domains:
root_num /* Number of PCI roots exposed */
root-X /* domain:bus information for root X */
To this, I've added:
root-X-resources /* number of resources for root X */
root-X-resource-Y /* resource number Y for root X */
root-resource-magic /* synchronization/versioning for resource info */
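Here is a small sketch of building the key names listed above. The helper functions are hypothetical, and the plain decimal formatting of X and Y is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers; plain decimal X and Y are an assumption. */
static int root_key(char *buf, size_t len, int x)
{
	return snprintf(buf, len, "root-%d", x);
}

static int root_resources_key(char *buf, size_t len, int x)
{
	return snprintf(buf, len, "root-%d-resources", x);
}

static int root_resource_key(char *buf, size_t len, int x, int y)
{
	return snprintf(buf, len, "root-%d-resource-%d", x, y);
}
```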
I debated for a while how to expose the root-X-resource-Y information
and came up with a simple ASCII dump of the struct acpi_resource
returned from the ACPI _CRS method. This isn't quite as silly as it
sounds because the structure is a fixed size regardless of word
length, and its contents are largely based on fixed tables found in
the ACPI spec. This makes it relatively immune to frequent changes.
The PCI backend stores the ASCII byte stream of the controller
resources into xenbus, the PCI frontend then extracts the byte stream,
and decodes it back into a struct acpi_resource for use.
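The following sketch shows such an ASCII round trip. It assumes a plain two-hex-digits-per-byte encoding (the actual format pciback uses may differ) and a hypothetical fixed-size struct in place of struct acpi_resource. Because the bytes are copied verbatim, this only works when the backend and frontend agree on structure layout and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for struct acpi_resource. */
struct resource_sketch {
	uint32_t type;
	uint32_t length;
	uint64_t translation_offset;
};

/* Assumed encoding: two lowercase hex digits per byte.
 * out must hold 2 * len + 1 bytes. */
static void bytes_to_ascii(const void *src, size_t len, char *out)
{
	const uint8_t *p = src;
	for (size_t i = 0; i < len; i++)
		sprintf(out + 2 * i, "%02x", p[i]);
	out[2 * len] = '\0';
}

static int hex_val(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode back into dst; 0 on success, -1 on malformed input. */
static int ascii_to_bytes(const char *in, void *dst, size_t len)
{
	uint8_t *p = dst;
	if (strlen(in) != 2 * len)
		return -1;
	for (size_t i = 0; i < len; i++) {
		int hi = hex_val(in[2 * i]);
		int lo = hex_val(in[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		p[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```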
The only changes to the existing code needed to support the frontend
are a trivial addition of passing the bus number to pcifront_init_sd()
and a hook to set up the root windows after the bus is scanned. No
changes are required for the controller backend.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
kfraser [Mon, 11 Jun 2007 14:53:26 +0000 (15:53 +0100)]
Make dma address conversion logic of gnttab dma arch specific.
gnttab_dma_map_page() and gnttab_dma_unmap_page() use machine addresses
and DMA addresses interchangeably. However, this doesn't work with
auto-translated mode enabled (i.e. on ia64) because:
- The bus address space (dma_addr_t) is different from the machine
address space (maddr_t). With the terminology in
xen/include/public/mm.h, dma_addr_t is maddr and maddr_t is gmaddr,
so they should be handled differently with auto-translated physmap
mode enabled.
- DMA address conversion depends on the DMA API implementation and
its paravirtualization.
The "pfn_valid(mfn_to_local_pfn(maddr >> PAGE_SHIFT))" check in
gnttab_dma_map_page() doesn't make sense with auto-translated physmap
mode enabled.
To address these issues, split this logic out of gnttab_dma_map_page()
and gnttab_dma_unmap_page() and put it into arch-specific files.
This patch doesn't change the already existing x86 logic.
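This is a minimal sketch of the split. It assumes a hypothetical build-time macro SKETCH_AUTO_TRANSLATED. Generic code calls one arch hook, x86 keeps the identity mapping between machine and bus addresses, and an auto-translated architecture supplies its own conversion. The offset in that branch is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_maddr;
typedef uint64_t sketch_dma_addr;

/* Hypothetical per-architecture hook, selected at build time. */
#ifdef SKETCH_AUTO_TRANSLATED
/* Auto-translated physmap: bus address is not the machine address.
 * The offset here is illustrative only. */
static sketch_dma_addr arch_maddr_to_dma(sketch_maddr m)
{
	return m + 0x80000000ULL;
}
#else
/* x86: existing behaviour kept, machine address used as bus address. */
static sketch_dma_addr arch_maddr_to_dma(sketch_maddr m)
{
	return m;
}
#endif

/* Generic gnttab code only calls the hook; it no longer assumes
 * that a machine address equals a DMA address. */
static sketch_dma_addr gnttab_dma_map_sketch(sketch_maddr m)
{
	return arch_maddr_to_dma(m);
}
```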
kfraser [Thu, 7 Jun 2007 10:04:08 +0000 (11:04 +0100)]
[LINUX] gnttab: Fix copy_grant_page race with seqlock
Previously gnttab_copy_grant_page would always unmap the grant table
entry, even if DMA operations were outstanding. This would allow a
hostile guest to free a page that is still in use for DMA back to the
hypervisor.
This patch fixes this by making sure that we don't free the grant
table entry if a DMA operation has taken place. To achieve this a
seqlock is used to synchronise the DMA operations and
copy_grant_page.
The DMA operations use the read side of the seqlock so performance
should be largely unaffected.
Thanks to Isaku Yamahata for noticing the race condition.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
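The following is a minimal userspace seqlock in C11, for illustration only. It assumes a single writer; the kernel's seqlock also serialises writers with a spinlock. Readers take no lock. They record the sequence number, read the data, and retry if the number changed, which is why the DMA paths on the read side stay cheap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Odd sequence value means a write is in progress. */
typedef struct {
	atomic_uint seq;
} seqlock_sketch;

static unsigned read_seqbegin_sketch(seqlock_sketch *s)
{
	unsigned v;
	/* Wait until no write is in progress (sequence is even). */
	while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1u)
		;
	return v;
}

/* True if a writer ran since read_seqbegin: the reader must retry. */
static bool read_seqretry_sketch(seqlock_sketch *s, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

static void write_seqlock_sketch(seqlock_sketch *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void write_sequnlock_sketch(seqlock_sketch *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}
```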
Ian Campbell [Mon, 4 Jun 2007 09:05:24 +0000 (10:05 +0100)]
Imported patch allow-i386-crash-kernels-to-handle-x86_64-dumps.patch from xen-unstable.hg 15200:bd3d6b4c52ec
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.
It's possibly less likely to be useful in a purely native scenario, but I
see no reason to disallow it.
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
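Here is a userspace sketch of the distinguishing check, using the standard <elf.h> constants. A dump-capture kernel can look at e_ident[EI_CLASS] in the dump's ELF header and handle ELFCLASS64 as well as ELFCLASS32. The function name is hypothetical, and this is not the code from the patch.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 32 or 64 for the ELF class, or 0 if buf isn't an ELF header. */
static int elf_class_bits(const unsigned char *buf, size_t len)
{
	if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
		return 0;
	switch (buf[EI_CLASS]) {
	case ELFCLASS32:
		return 32;
	case ELFCLASS64:
		return 64;   /* e.g. a dump produced under a 64-bit hypervisor */
	default:
		return 0;
	}
}
```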
---
o Currently there is no specific alignment restriction in the linker
script, and in some cases the data segment can be placed at a non-4K-aligned
address. This makes kexec fail, since kexec checks that the segment to be
loaded is page aligned.
o I guess it does no harm for the data segment to be 4K aligned.
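The following small sketch, with hypothetical helper names, shows the alignment involved. Kexec-style code rejects a segment whose start is not page aligned. Rounding up to a 4K boundary, for example with an ALIGN(4096) directive in the linker script, makes the check pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* The kind of check kexec applies: segment start must be page aligned. */
static bool segment_page_aligned(uint64_t start)
{
	return (start & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Round up to the next 4K boundary (no-op if already aligned). */
static uint64_t align_up_4k(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}
```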