One more instance of an issue that was thought to be fixed: unlike
x86-64, i386 allows set_fixmap() to replace already present mappings.
Consequently, on PAE, care must be taken not to update the high half
of a pte while the low half still holds the old value.
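The ordering rule can be sketched in C. This is a minimal userspace model, assuming a PAE entry made of two 32-bit halves with the present bit in the low half. `struct pae_pte`, `set_pte_replace()` and `write_barrier()` are hypothetical names. The barrier is only a compiler barrier standing in for the kernel's write barrier, and a real fixmap update would also need a TLB flush, which is not shown.

```c
#include <stdint.h>

/* Hypothetical model of a PAE page-table entry: two 32-bit halves.
 * Bit 0 of the low half is the present bit. */
struct pae_pte {
    volatile uint32_t pte_low;
    volatile uint32_t pte_high;
};

#define PTE_PRESENT 0x1u

/* Compiler-only barrier standing in for the kernel's write barrier
 * (assumption: GCC/Clang inline asm syntax). */
#define write_barrier() __asm__ __volatile__("" ::: "memory")

/* Replace a possibly present mapping without ever exposing the new high
 * half combined with the old low half. */
static void set_pte_replace(struct pae_pte *ptep, uint64_t val)
{
    ptep->pte_low = 0;                       /* 1. entry becomes not present */
    write_barrier();
    ptep->pte_high = (uint32_t)(val >> 32);  /* 2. safe: entry is ignored */
    write_barrier();
    ptep->pte_low = (uint32_t)val;           /* 3. low half, incl. present bit, last */
}

/* Read the full 64-bit value back (for checking). */
static uint64_t pte_value(const struct pae_pte *ptep)
{
    return ((uint64_t)ptep->pte_high << 32) | ptep->pte_low;
}
```

Writing the low half last matters because that is where the present bit lives: until it is set, the hardware ignores whatever the high half contains.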
This is a backport of some code from Linux that is needed by
my backport of kexec to IA64 Xen.
From: Simon Horman <horms@verge.net.au>
sysctl: implement CTL_UNNUMBERED
This patch takes the CTL_UNNUMBERED concept from NFS and makes it
available to all new sysctl users.
At the same time, the sysctl binary interface maintenance documentation
is updated to describe what is needed to successfully maintain the
sysctl binary interface.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
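A simplified model of how an unnumbered entry behaves. The struct, the lookup helpers, and the value given to `CTL_UNNUMBERED` here are local assumptions for illustration, not the kernel's definitions. The point is only that such an entry is reachable by its /proc/sys name but never by a binary sysctl(2) number.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in; the value is an assumption for this sketch. */
#define CTL_UNNUMBERED 0

/* Hypothetical, trimmed-down sysctl table entry. */
struct ctl_entry {
    int ctl_name;          /* binary sysctl(2) number, or CTL_UNNUMBERED */
    const char *procname;  /* name under /proc/sys */
};

/* The table ends at an entry with neither a number nor a name. */
static const struct ctl_entry table[] = {
    { 1, "legacy_knob" },            /* older entry with a binary number */
    { CTL_UNNUMBERED, "new_knob" },  /* new entry: /proc/sys only */
    { 0, NULL }
};

/* Binary lookup: unnumbered entries never match a sysctl(2) number. */
static const struct ctl_entry *lookup_binary(const struct ctl_entry *t, int name)
{
    for (; t->ctl_name || t->procname; t++)
        if (t->ctl_name != CTL_UNNUMBERED && t->ctl_name == name)
            return t;
    return NULL;
}

/* /proc/sys lookup: every named entry is reachable. */
static const struct ctl_entry *lookup_proc(const struct ctl_entry *t, const char *name)
{
    for (; t->ctl_name || t->procname; t++)
        if (t->procname && strcmp(t->procname, name) == 0)
            return t;
    return NULL;
}
```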
linux/x86: match native behavior of 'make install'
Placement of the final image and handling of the install process
should match native behavior (whether to implicitly create an initrd is
just one example). This removes the need for a special boot-xen
subdirectory and unifies more of the arch/*/Makefile pieces.
Add the CDROM_GET_CAPABILITY ioctl to blkfront.
Return 0 instead of -EINVAL for CDROM_GET_CAPABILITY if the blkfront
device is a CD-ROM, i.e. has the VDISK_CDROM attribute. This allows
udev's cdrom_id to correctly detect the device as a CD-ROM. Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
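A sketch of the ioctl decision described above. `CDROM_GET_CAPABILITY` and `VDISK_CDROM` are redefined locally with values assumed to match <linux/cdrom.h> and Xen's io/blkif.h, and `struct blkfront_info_sketch` is a hypothetical stand-in for blkfront's per-device state. Other commands are simply rejected here for brevity.

```c
#include <errno.h>

/* Local stand-ins; values assumed to match the real headers. */
#define CDROM_GET_CAPABILITY 0x5331
#define VDISK_CDROM          0x1

/* Hypothetical, trimmed-down per-device info. */
struct blkfront_info_sketch {
    unsigned int vdisk_info;   /* VDISK_* attribute bits from the backend */
};

/* Succeed on CDROM_GET_CAPABILITY only for CD-ROM devices, so probes
 * such as udev's cdrom_id classify the device correctly. */
static int blkif_ioctl_sketch(const struct blkfront_info_sketch *info,
                              unsigned int cmd)
{
    switch (cmd) {
    case CDROM_GET_CAPABILITY:
        return (info->vdisk_info & VDISK_CDROM) ? 0 : -EINVAL;
    default:
        return -EINVAL;        /* everything else rejected in this sketch */
    }
}
```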
rebind_irq_to_cpu needs to mask the evtchn before binding, just as
evtchn_rebind_cpu already does. Otherwise, cpu_disable fails in
fixup_irqs.
Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Keir Fraser <keir@xensource.com>
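The mask-then-bind ordering can be modelled as follows. Everything here is hypothetical: `struct evtchn_sketch` and the helpers only simulate the mask bit and the bind hypercall. Restoring the previous mask state afterwards is an assumption about how the real code behaves, not something the commit states.

```c
#include <stdbool.h>

/* Hypothetical event-channel model: a mask bit and a bound vCPU. */
struct evtchn_sketch {
    bool masked;
    int  vcpu;
};

/* Simulated helpers standing in for the real mask/bind operations. */
static void mask_evtchn(struct evtchn_sketch *e)   { e->masked = true; }
static void unmask_evtchn(struct evtchn_sketch *e) { e->masked = false; }
static int  bind_vcpu_hypercall(struct evtchn_sketch *e, int vcpu)
{
    e->vcpu = vcpu;
    return 0;
}

/* Mask, rebind, then restore the previous mask state, so no event is
 * delivered while the binding is changing. */
static int rebind_irq_to_cpu_sketch(struct evtchn_sketch *e, int cpu)
{
    bool was_masked = e->masked;
    int rc;

    mask_evtchn(e);
    rc = bind_vcpu_hypercall(e, cpu);
    if (!was_masked)
        unmask_evtchn(e);
    return rc;
}
```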
The wake status bit only has to be checked for S1; for S3 the
implementation is free not to touch it. In the latter case we have
observed the code getting stuck in a loop polling the wake status bit
after the sleep hypercall returns. We actually only need to check the
hypercall return value here; the wake status checks belong in Xen
instead.
Bind different tasks' evtchns to different vcpus of Dom0
Currently, all user-space event channels notify Dom0's vcpu0, which
limits scalability. This patch binds different tasks' evtchns to
different vcpus of Dom0 when the bindings are initialized, and it can
also change the binding dynamically if a task later runs on another
vcpu. Tests (Inb and OLTP) show a notable improvement in scalability.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Keir Fraser <keir@xensource.com>
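A minimal model of the binding policy, assuming the driver can see which vcpu the calling task is on at bind time and again when the task later uses the channel. `struct user_evtchn_sketch`, `bind_initial()` and `maybe_rebind()` are hypothetical. In a real driver the rebind would be a hypercall rather than a field update.

```c
/* Hypothetical per-task binding record for a user-space event channel. */
struct user_evtchn_sketch {
    int port;
    int bound_vcpu;
    int rebinds;      /* counts dynamic rebinds, for illustration only */
};

/* At bind time, target the vcpu the task is running on, not vcpu0. */
static void bind_initial(struct user_evtchn_sketch *u, int port, int cur_cpu)
{
    u->port = port;
    u->bound_vcpu = cur_cpu;
    u->rebinds = 0;
}

/* If the task has since moved to another vcpu, follow it there. */
static void maybe_rebind(struct user_evtchn_sketch *u, int cur_cpu)
{
    if (u->bound_vcpu != cur_cpu) {
        u->bound_vcpu = cur_cpu;
        u->rebinds++;
    }
}
```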
This fixes a bug causing a bogus symbol reference (to init_page_count)
in the PV-on-HVM drivers for 2.6 kernels that define the set_page_count
macro.
Based on an original patch by: Signed-off-by: Ben Guthro <bguthro@virtualiron.com> Signed-off-by: Dave Lively <dlively@virtualiron.com> Signed-off-by: Keir Fraser <keir@xensource.com>
Fix potential kthread deadlock during Xen suspend.
kthread_create() depends on keventd, so it cannot be executed from
keventd. Replace use of kthread_create() with an approach based on
kernel_thread().
Based on an original patch by: Signed-off-by: Ben Guthro <bguthro@virtualiron.com> Signed-off-by: Robert Phillips <rphillips@virtualiron.com> Signed-off-by: Keir Fraser <keir@xensource.com>
kfraser [Fri, 31 Aug 2007 09:45:38 +0000 (10:45 +0100)]
xen: Use unlocked_ioctl in the evtchn, gntdev and privcmd drivers to avoid
acquiring the BKL semaphore. The performance improvement is particularly
significant for heavy users of the evtchn-notify ioctl.
Suggested by Dexuan Cui <dexuan.cui@intel.com> Signed-off-by: Keir Fraser <keir@xensource.com>
kfraser [Tue, 28 Aug 2007 14:33:15 +0000 (15:33 +0100)]
Remove xencomm page size limit.
Currently xencomm has a page size limit, so a domain with a lot of
memory (e.g. 100GB or more) can't be created.
The Xen side of xencomm now accepts a struct xencomm_desc whose address
array crosses a page boundary. Thus it is no longer necessary to
allocate a whole page just to avoid crossing a page boundary; we can
allocate exactly the size needed. Note that struct xencomm_desc itself
must not cross a page boundary, and the slab allocator only returns
sizeof(void*)-aligned pointers.
Where sizeof(*desc) > sizeof(void*), e.g. in a 32-bit environment, a
pointer returned by the slab allocator doesn't guarantee that
struct xencomm_desc doesn't cross a page boundary, so we fall back to
the page allocator.
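The page-boundary reasoning reduces to two small checks, sketched here assuming 4 KiB pages. `crosses_page()` and `must_use_page_allocator()` are hypothetical helpers, not functions from the xencomm code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096UL   /* assumed 4 KiB pages */

/* True if the object [addr, addr + size) spans more than one page. */
static bool crosses_page(uintptr_t addr, size_t size)
{
    return (addr % PAGE_SIZE_SKETCH) + size > PAGE_SIZE_SKETCH;
}

/* The slab allocator only guarantees sizeof(void *) alignment. Since the
 * page size is a multiple of sizeof(void *), a header no larger than that
 * can never straddle a page; a larger one might, so the caller should
 * fall back to the page allocator. */
static bool must_use_page_allocator(size_t desc_header_size)
{
    return desc_header_size > sizeof(void *);
}
```

On a 32-bit build, an 8-byte descriptor header exceeds the 4-byte pointer size, which is the case the commit describes.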
Alex Williamson [Wed, 22 Aug 2007 14:09:20 +0000 (08:09 -0600)]
[IA64] Work around for xencomm memory reservation op.
- Xencomm has a single-page size limit caused by xencomm_alloc()/xencomm_free(),
so we have to repeat the hypercall. Repeating the hypercall allows us
to create domains larger than ~63G. This limitation could also be removed
by allowing xencomm calls to cross pages.
- Even if the above limitation is removed, a hypercall with a large number of
extents may trigger the soft lockup warning.
To avoid the warning, we limit the number of extents per call and repeat
the hypercall.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Alex Williamson <alex.williamson@hp.com>
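The chunking strategy can be sketched as a loop. `fake_memory_op()` only simulates the reservation hypercall, and `MAX_EXTENTS_PER_CALL` is an assumed cap; the commit does not state the actual limit.

```c
/* Simulated hypercall: "processes" every extent it is given and
 * returns how many were done (hypothetical stand-in). */
static long fake_memory_op(unsigned long start, unsigned long nr)
{
    (void)start;
    return (long)nr;
}

/* Assumed per-call cap keeping each hypercall short. */
#define MAX_EXTENTS_PER_CALL 1024UL

/* Issue the reservation in bounded chunks, repeating the hypercall until
 * all extents are handled or a call stops making progress. A real kernel
 * loop might also yield (e.g. with cond_resched()) between calls.
 * Returns the number of extents processed; *calls counts hypercalls. */
static unsigned long reserve_in_chunks(unsigned long nr_extents,
                                       unsigned long *calls)
{
    unsigned long done = 0;

    *calls = 0;
    while (done < nr_extents) {
        unsigned long chunk = nr_extents - done;
        long ret;

        if (chunk > MAX_EXTENTS_PER_CALL)
            chunk = MAX_EXTENTS_PER_CALL;
        ret = fake_memory_op(done, chunk);
        (*calls)++;
        if (ret <= 0)
            break;                 /* error or no progress */
        done += (unsigned long)ret;
        if ((unsigned long)ret < chunk)
            break;                 /* partial completion */
    }
    return done;
}
```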
kfraser [Tue, 14 Aug 2007 15:20:55 +0000 (16:20 +0100)]
[xencomm] introduce opaque type struct xencomm_handle* for xencommized
values. This patch is preparation for xencomm consolidation.
powerpc uses void * for xencommized values, whereas IA64 uses
struct xencomm_handle *. Unify on struct xencomm_handle *.
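A sketch of what the opaque type buys. `struct xencomm_handle` is declared but never defined, so it cannot be dereferenced, and unlike void * a value of this type is not silently accepted where some other struct pointer is expected. The conversion helpers are hypothetical.

```c
#include <stdint.h>

/* Opaque: declared, never defined, so it cannot be dereferenced. */
struct xencomm_handle;

/* Hypothetical helpers converting between a raw address and a handle. */
static struct xencomm_handle *addr_to_handle(uintptr_t addr)
{
    return (struct xencomm_handle *)addr;
}

static uintptr_t handle_to_addr(const struct xencomm_handle *h)
{
    return (uintptr_t)h;
}
```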
kfraser [Tue, 14 Aug 2007 15:04:09 +0000 (16:04 +0100)]
[linux, xencomm] Various fixes to common xencomm.c for ia64 xencomm consolidation
- move the xen_guest_handle() macro into include/xen/xencomm.h, since
ia64 also uses it.
- is_kern_addr() is powerpc-specific, and other architectures don't
implement it.
It will be defined in linux/include/asm-ia64/xen/xencomm.h.
- fix the error recovery path of xencomm_create():
xencomm_free() requires a pseudo-physical address, not a virtual
address.
- add a BUG_ON() to xencomm_create_mini() for the alignment
requirement.
- use xencomm_pa() instead of __pa() in xencomm_map() and
xencomm_map_no_alloc().
They should work for statically allocated areas. On ia64 such an area
isn't in the straight mapping area, so xencomm_pa() is necessary.
- add a gcc bug workaround: gcc 4.1.2 doesn't properly handle
variables on the stack with the aligned attribute.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660
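A gcc workaround like the one in the last point usually amounts to aligning the buffer by hand instead of relying on the aligned attribute. This is a generic sketch of that technique, not the actual xencomm code. `align_up()`, `use_mini_area()` and both size constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two). */
static void *align_up(void *p, size_t align)
{
    uintptr_t v = (uintptr_t)p;
    return (void *)((v + align - 1) & ~(uintptr_t)(align - 1));
}

#define MINI_AREA_SIZE 64   /* hypothetical size of the on-stack area */
#define MINI_ALIGN     16   /* hypothetical required alignment */

/* Instead of __attribute__((aligned(MINI_ALIGN))) on a stack variable,
 * over-allocate by MINI_ALIGN - 1 bytes and align the pointer manually.
 * Returns 1 if the resulting area is aligned and fits inside the buffer. */
static int use_mini_area(void)
{
    char raw[MINI_AREA_SIZE + MINI_ALIGN - 1];
    char *area = align_up(raw, MINI_ALIGN);

    area[0] = 1;
    area[MINI_AREA_SIZE - 1] = 1;
    return ((uintptr_t)area % MINI_ALIGN) == 0 &&
           area + MINI_AREA_SIZE <= raw + sizeof(raw);
}
```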
kfraser [Mon, 13 Aug 2007 15:40:36 +0000 (16:40 +0100)]
[LINUX] drivers: Add missing includes
This patch adds missing includes that currently work only through indirect
inclusion. That cannot be relied on, and indeed it breaks on older
kernels (2.4 with PV-on-HVM).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kfraser [Mon, 13 Aug 2007 15:33:35 +0000 (16:33 +0100)]
[LINUX] netfront: Cleanup and fix TSO/GSO/CHECKSUM conditionals
This patch tries to minimise the amount of code that is conditionally
compiled. This is desirable (and the Linux way) because it helps
prevent people from unwittingly breaking code, since conditionals may
hide compile problems.
It also adds a missing conditional around the TSO ethtool operations.
This also helps when building netfront under Linux 2.4, which
doesn't have TSO.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
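A sketch of the style this commit moves toward: a plain if on a compile-time constant instead of #ifdef, so both paths are parsed and type-checked on every build. `HAVE_TSO_SKETCH` and `struct nic_features` are hypothetical stand-ins, not netfront's actual symbols.

```c
#include <stdbool.h>

/* Hypothetical feature switch. Even when set to 0, the branch below is
 * still compiled and type-checked; the compiler just drops it as dead. */
#define HAVE_TSO_SKETCH 1

struct nic_features {
    bool sg;
    bool tso;
};

static void set_features(struct nic_features *f, bool backend_gso)
{
    f->sg = true;
    if (HAVE_TSO_SKETCH && backend_gso)
        f->tso = true;
    else
        f->tso = false;
}
```

The trade-off is that every identifier used inside such a branch must exist in all configurations, which is why some conditionals, such as the one around the TSO ethtool operations for 2.4, still have to stay.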
kfraser [Wed, 1 Aug 2007 14:16:46 +0000 (15:16 +0100)]
[NET] netloop: Do not clobber cloned skb page frags
The netloop driver tries to localise foreign mappings by
copying them. Unfortunately, it does so by directly modifying
skb page frags without checking whether the skb is cloned or
not. In fact, the packet is going to be cloned more often
than not.
This may result in either data corruption on DMA or a
page fault in dom0, which kills the whole machine.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
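The fix pattern is copy-on-write: check whether the data is shared before modifying it, and modify a private copy if it is. In the kernel the check is skb_cloned(). The sketch below is a hypothetical userspace model in which a reference count on shared data stands in for a cloned skb's shared page frags.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: data shared by clones via a reference count. */
struct shared_data {
    int  refcnt;
    char bytes[16];
};

struct pkt {
    struct shared_data *data;
};

static bool pkt_cloned(const struct pkt *p) { return p->data->refcnt > 1; }

/* Before modifying, give this packet a private copy if another clone
 * still references the data. Returns 0 on success, -1 on allocation
 * failure. */
static int pkt_make_writable(struct pkt *p)
{
    struct shared_data *copy;

    if (!pkt_cloned(p))
        return 0;
    copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    memcpy(copy->bytes, p->data->bytes, sizeof(copy->bytes));
    copy->refcnt = 1;
    p->data->refcnt--;
    p->data = copy;
    return 0;
}
```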
Alex Williamson [Thu, 26 Jul 2007 20:33:50 +0000 (14:33 -0600)]
[IA64] defconfig update
Remove the Radeon framebuffer: even if you have a Radeon VGA, you
don't necessarily want the framebuffer. Add framebuffer console
support for pvfb. Other changes were pulled in automatically by
make oldconfig.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
xenbus: Allow building with old kernels (pre-2.6.6).
Original patch by Ben Guthro <bguthro@virtualiron.com> Signed-off-by: Keir Fraser <keir@xensource.com>
Alex Williamson [Wed, 25 Jul 2007 19:16:28 +0000 (13:16 -0600)]
[IA64] Fix Linux VGA autodetection
This patch re-orders setup_arch in xenlinux slightly. This allows us
to check whether conswitchp was set up early by the HCDP code to point to a
VGA console. Also fix a bug with zeroing the preferred console name string.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
xenbus: Improvements to wait_for_devices().
1. When printing a warning about a timed-out device, print the
current state of both ends of the device connection (i.e., backend as
well as frontend).
2. A device is 'not yet connected' only when the local state is *less
than* XenbusStateConnected. If the state is Closing or Closed
(usually because of an explicit failure when trying to make the
connection) then we should not wait for the connection to occur: it
will never happen!
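Point 2 can be expressed as a simple ordered comparison. The enum below mirrors the ordering of the XenbusState values in Xen's public io/xenbus.h (reproduced here under a local name, as an assumption), and `still_connecting()` is a hypothetical helper.

```c
#include <stdbool.h>

/* Local copy of the XenbusState ordering (assumed to match Xen's header). */
enum xenbus_state_sketch {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Worth waiting for only while strictly below Connected. A test of
 * s != XenbusStateConnected would keep waiting on Closing/Closed,
 * where the connection will never happen. */
static bool still_connecting(enum xenbus_state_sketch s)
{
    return s < XenbusStateConnected;
}
```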
xenbus: Wait for 30s for devices to connect (previously 10s).
Give a visual update to the user on the console every 5s during this
period. Signed-off-by: Keir Fraser <keir@xensource.com>
Enable CONFIG_ACPI_SLEEP in xenlinux to allow ACPI-based
power management. Users can now trigger a power event
with "echo *** > /sys/power/state". Also adapt to the PM
interface defined between xenlinux and Xen, and sync the
Xen interface headers accordingly.
Signed-off-by: Ke Yu <ke.yu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Implements module autoloading for the xen frontend drivers by adding a
uevent function for the frontend xenbus and some module aliases to the
individual drivers.
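A sketch of the alias string involved, assuming the `xen:<devicetype>` scheme (for example `xen:vbd` for the block frontend). `xen_modalias()` is hypothetical. In the kernel, the uevent callback adds the variable to the event environment, and each driver declares a matching MODULE_ALIAS().

```c
#include <stdio.h>
#include <string.h>

/* Format "MODALIAS=xen:<devicetype>" into buf (assumed alias scheme).
 * Returns 0 on success, -1 on an empty type or if buf is too small. */
static int xen_modalias(char *buf, size_t len, const char *devicetype)
{
    int n;

    if (strlen(devicetype) == 0)
        return -1;
    n = snprintf(buf, len, "MODALIAS=xen:%s", devicetype);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```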
blktap: Fix page reference count/file rss count leak when auto-translate is enabled.
The tapdisk process's rss grows far too large with auto-translation
enabled. In the following example, dom0 has only several hundred
megabytes of memory:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 6758 root 15 0 39824 1.7t 1.7t S 0 188932.0 14:10.28 tapdisk
The rss grows because the page reference count/file rss size is
incremented when an I/O request is accepted, but isn't decremented
when the request is done. This patch fixes it by using vm_insert_page()
in blktap_mmap() instead of remap_pfn_range().
Details:
The tapdisk daemon mmaps the blktap device, and the blktap driver maps
pages from the front end into the mmapped area and unmaps them when the
I/O request is done.
When an I/O request is accepted, dispatch_rw_block_io() is called.
With auto-translated mode disabled, it directly manipulates the page
table without incrementing the rss size. With auto-translated mode
enabled, it calls vm_insert_page(), which increments the page reference
count/file rss. When the I/O request is done, fast_flush_area() is called.
With auto-translated mode disabled, it directly manipulates the page
table without decrementing the rss size. With auto-translated mode
enabled, it calls zap_page_range(), which should decrement the page
reference count/file rss. However, because (vma->vm_flags & VM_PFNMAP)
is true, it doesn't decrement them, so the page reference count and file
rss are leaked. The blktap driver allocates pages and never frees them,
so the page reference count leak doesn't cause an issue (probably until
it overflows).
Without auto-translation, it makes sense for blktap_mmap() to set
VM_PFNMAP with remap_pfn_range() because the blktap driver directly
manipulates page tables. With auto-translation, on the other hand, the
VM_PFNMAP bit shouldn't be set. This can be achieved by using
vm_insert_page() in blktap_mmap() instead of remap_pfn_range().
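The leak can be modelled with a counter. This is a hypothetical simplification: `struct vma_sketch` folds the page reference count and file rss into one field, and `VM_PFNMAP_SKETCH` stands in for VM_PFNMAP. remap_pfn_range() marks the VMA VM_PFNMAP while vm_insert_page() does not, which is what the two runs in the test contrast.

```c
/* Stand-in for the kernel's VM_PFNMAP flag (hypothetical value). */
#define VM_PFNMAP_SKETCH 0x1u

/* Hypothetical VMA: one counter models page refcount and file rss. */
struct vma_sketch {
    unsigned int vm_flags;
    long file_rss;
};

/* Dispatch path: inserting a page takes a reference. */
static void insert_page(struct vma_sketch *vma) { vma->file_rss++; }

/* Completion path: the zap skips PFN-mapped VMAs, so nothing is dropped. */
static void zap_page(struct vma_sketch *vma)
{
    if (vma->vm_flags & VM_PFNMAP_SKETCH)
        return;
    vma->file_rss--;
}

/* Run n I/O requests (map on dispatch, unmap on completion) and return
 * the counter left over afterwards. */
static long run_requests(struct vma_sketch *vma, int n)
{
    for (int i = 0; i < n; i++) {
        insert_page(vma);
        zap_page(vma);
    }
    return vma->file_rss;
}
```

With the flag set, every request leaves one reference behind; without it, the counter returns to zero.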
The changed logic allows having *-xen.[cS] files anywhere in the tree,
without needing to modify the corresponding Makefiles. The patch also
cleans up Makefiles modified for Xen as far as possible.
xenbus: Do not avoid state transitions on shutdown in kernels for which
we do not provide a shutdown hook. This is because we don't drive
xenbus during shutdown in this case, and such kernels often do not
expose a 'system_state' variable, so the xenbus driver doesn't build!
Signed-off-by: David Lively <dlively@virtualiron.com> Signed-off-by: Ben Guthro <bguthro@virtualiron.com>