- array indices got checked after having indexed the array already
- several were off by one
- BLKTAP_IOCTL_FREEINTF should not be used on other than the control
device (or the logic should be changed to that when thus used only
the respective device can be freed)
- BLKTAP_IOCTL_MINOR can reasonably also be used on non-control
devices (returning that device's minor and ignoring the passed in
argument)
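The first two items above (index validated only after use, and off-by-one limits) can be illustrated with a minimal userspace sketch. The table size MAX_TAP_DEV, struct tap_dev and tap_lookup() are hypothetical names, not the driver's own:

```c
#include <stddef.h>

#define MAX_TAP_DEV 4                 /* hypothetical table size */

struct tap_dev { int minor; };        /* hypothetical device record */

static struct tap_dev *tap_table[MAX_TAP_DEV];

/*
 * Return the device stored at idx, or NULL when idx is out of range.
 * The range test comes before the array access and uses ">=" rather
 * than ">", so idx == MAX_TAP_DEV is rejected as well.
 */
static struct tap_dev *tap_lookup(long idx)
{
    if (idx < 0 || idx >= MAX_TAP_DEV)
        return NULL;
    return tap_table[idx];
}
```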
xen/blktap: fix cleanup after unclean application exit
When an application using blktap devices doesn't close the file handle
(or mmap-s) of /dev/xen/blktapN, we cannot defer the mmput() on the
stored mm until blktap_release(), as that will never be called without
the mm's reference count dropping to zero.
Keir Fraser [Tue, 30 Mar 2010 17:28:34 +0000 (18:28 +0100)]
xen/balloon: Fix return value interpretation for XENMEM_get_pod_target
Unfortunately c/s 989 didn't consider what I would call a quirk in
pre-3.4 Xen, resulting in XENMEM_get_pod_target calls not returning
-ENOSYS as one would normally expect.
Keir Fraser [Mon, 1 Mar 2010 09:56:15 +0000 (09:56 +0000)]
blktap2: Fix queue restart, racing block device removal.
Makes tapdisk context test dev->gd before attempting a queue restart,
with the device lock held. Fixes a race lost against device
destruction, which may be issued anywhere on the control path.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
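A userspace sketch of the pattern described here, assuming a pthread mutex in place of the device lock. struct tap_device, struct gendisk_stub and the function names are stand-ins invented for illustration:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct gendisk_stub { int restarts; };   /* stand-in for the disk object */

struct tap_device {
    pthread_mutex_t lock;                /* stand-in for the device lock */
    struct gendisk_stub *gd;             /* NULL once the device is destroyed */
};

static void tap_init(struct tap_device *dev, struct gendisk_stub *gd)
{
    pthread_mutex_init(&dev->lock, NULL);
    dev->gd = gd;
}

/* Restart the queue only if the disk still exists; the test is made
 * with the lock held so destruction cannot slip in between. */
static bool tap_restart_queue(struct tap_device *dev)
{
    bool restarted = false;

    pthread_mutex_lock(&dev->lock);
    if (dev->gd != NULL) {
        dev->gd->restarts++;
        restarted = true;
    }
    pthread_mutex_unlock(&dev->lock);
    return restarted;
}

/* Destruction clears dev->gd under the same lock. */
static void tap_destroy(struct tap_device *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->gd = NULL;
    pthread_mutex_unlock(&dev->lock);
}
```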
Keir Fraser [Mon, 1 Mar 2010 09:55:09 +0000 (09:55 +0000)]
Guest SR-IOV: Replace previous changeset with a more complete implementation from Intel
"""Guest SR-IOV support for PV guest
These changes are for PV guest to use Virtual Function. Because the
VF's vendor, device registers in cfg space are 0xffff, which are
invalid and ignored by PCI device scan. Values in 'struct pci_dev' are
fixed up by SR-IOV code, and using these values will present correct
VID and DID to PV guest kernel.
And command registers in the cfg space are read only 0, which means we
have to emulate MMIO enable bit (VF only uses MMIO resource) so PV
kernel can work properly."""
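The two fix-ups in the quoted description might look roughly like the sketch below. The register offsets and the memory-enable bit are the standard PCI ones; struct vf_dev and both functions are hypothetical and do not reflect the actual patch:

```c
#include <stdint.h>

#define PCI_VENDOR_ID      0x00
#define PCI_DEVICE_ID      0x02
#define PCI_COMMAND        0x04
#define PCI_COMMAND_MEMORY 0x0002     /* memory space enable bit */

struct vf_dev {                       /* hypothetical per-VF state */
    uint16_t vendor;                  /* fixed-up IDs, as held in struct pci_dev */
    uint16_t device;
    uint16_t emulated_cmd;            /* software copy of the MMIO enable bit */
};

/* hw_value is whatever the VF's real config space returned. */
static uint16_t vf_cfg_read16(const struct vf_dev *vf, int offset,
                              uint16_t hw_value)
{
    switch (offset) {
    case PCI_VENDOR_ID:
        return vf->vendor;            /* hardware reads 0xffff */
    case PCI_DEVICE_ID:
        return vf->device;            /* hardware reads 0xffff */
    case PCI_COMMAND:
        return hw_value | (vf->emulated_cmd & PCI_COMMAND_MEMORY);
    default:
        return hw_value;
    }
}

/* Only the memory enable bit is remembered; the hardware register is
 * read-only zero on a VF. */
static void vf_cfg_write_command(struct vf_dev *vf, uint16_t value)
{
    vf->emulated_cmd = value & PCI_COMMAND_MEMORY;
}
```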
Keir Fraser [Mon, 22 Feb 2010 10:03:18 +0000 (10:03 +0000)]
linux/x86: fix long timeout handling in stop_hz_timer()
Other than for HYPERVISOR_set_timer_op, zero doesn't mean "no timeout"
for VCPUOP_set_singleshot_timer (but should be retained rather than
adjusted by NS_PER_TICK/2 for the former).
Also properly cancel the singleshot timer in start_hz_timer().
Keir Fraser [Mon, 1 Feb 2010 14:12:36 +0000 (14:12 +0000)]
xen/balloon: fix balloon driver accounting for HVM-with-PoD case
With PoD, ballooning down a guest to the target set through xenstore
based on its totalram_pages value isn't sufficient, since that value
doesn't include all the pages assigned to the guest. Since the delta
is static, determine it once at load time.
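As a rough illustration of the accounting problem, the sketch below computes a one-time bias and applies it when working out how many pages to release. All function names and the sample figures are assumptions, not the driver's code:

```c
/* Pages assigned to the guest that totalram_pages does not account for.
 * Computed once (at load time), since the difference is static. */
static unsigned long compute_bias(unsigned long assigned_pages,
                                  unsigned long totalram_pages)
{
    return assigned_pages > totalram_pages
        ? assigned_pages - totalram_pages : 0;
}

/* Number of pages to give back so that the guest really ends up at
 * target; with bias == 0 this is the old, insufficient calculation. */
static unsigned long pages_to_release(unsigned long totalram_pages,
                                      unsigned long bias,
                                      unsigned long target)
{
    unsigned long current_pages = totalram_pages + bias;

    return current_pages > target ? current_pages - target : 0;
}
```

With made-up numbers: a guest assigned 262144 pages whose totalram_pages is 258000 has a bias of 4144. For a target of 200000 the unbiased calculation releases 58000 pages and leaves the guest 4144 pages above target.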
Keir Fraser [Mon, 18 Jan 2010 10:46:43 +0000 (10:46 +0000)]
xen/blkfront: fixes for 'xm block-detach ... --force'
Prevent prematurely freeing 'struct blkfront_info' instances (when the
xenbus data structures are gone, but the Linux ones are still needed).
Prevent adding a disk with the same (major, minor) [and hence the same
name and sysfs entries, which leads to oopses] when the previous
instance wasn't fully de-allocated yet.
This still doesn't address all issues resulting from forced detach:
I/O submitted after the detach still blocks forever, likely preventing
subsequent un-mounting from completing. It's not clear to me (not
knowing much about the block layer) how this can be avoided.
This also doesn't address issues with duplicate device creation caused
by re-using the hdXX and sdXX name spaces - this would require
synchronization with the respective native code.
Keir Fraser [Wed, 13 Jan 2010 08:11:51 +0000 (08:11 +0000)]
privcmd: add new (replacement) mmap-batch ioctl
While the error indicator of IOCTL_PRIVCMD_MMAPBATCH should be in the
top nibble (it is documented that way in include/xen/public/privcmd.h
and include/xen/compat_ioctl.h), it really wasn't for 64-bit
implementations. With MFNs now possibly being 32 or more bits wide on
x86-64, using bits 28-31 as failure indicator (and bit 31 as paged-out
indicator) is no longer acceptable. Instead, a new ioctl with a
separate error indication array is being introduced.
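The difference between in-band and out-of-band error reporting can be sketched as follows. This is only an illustration: in_band_failed(), demo_map_one() and map_batch() are hypothetical helpers, and the real ioctl structures are not shown:

```c
#include <stddef.h>
#include <stdint.h>

/* In-band scheme: a slot whose bits 28-31 are all set counts as failed.
 * A valid wide MFN such as 0xf0000123 is indistinguishable from that. */
static int in_band_failed(uint64_t slot)
{
    return ((slot >> 28) & 0xf) == 0xf;
}

/* Hypothetical stand-in for the real mapping step: fails for MFN 0x42. */
static int demo_map_one(uint64_t mfn)
{
    return mfn == 0x42 ? -1 : 0;
}

/* Out-of-band scheme: mfns[] is left untouched and err[i] receives the
 * per-frame status. Returns the number of failures. */
static size_t map_batch(const uint64_t *mfns, int *err, size_t n)
{
    size_t i, failures = 0;

    for (i = 0; i < n; i++) {
        err[i] = demo_map_one(mfns[i]);
        if (err[i])
            failures++;
    }
    return failures;
}
```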
Keir Fraser [Fri, 8 Jan 2010 13:07:17 +0000 (13:07 +0000)]
Update sfc_netback driver to match sfc_resource 3.0.2.2074
Add support for direct guest access and acceleration of SFC9000 series
NICs.
Improve handling of NIC reset in sfc_netback
Remove nic_index state and replace with if_index from struct net_device
Remove duplication of header files with sfc_resource driver
Keir Fraser [Fri, 8 Jan 2010 13:05:49 +0000 (13:05 +0000)]
Update Solarflare Communications net driver to version 3.0.2.2074
Bring net driver in Xen tree in line with kernel.org tree
Add support for new SFC9000 series NICs
Keir Fraser [Wed, 6 Jan 2010 08:38:09 +0000 (08:38 +0000)]
xen/privcmd: fix for proper operation in compat mode
- sizeof(struct privcmd_mmapbatch_32) was wrong
- MFN array must be translated for IOCTL_PRIVCMD_MMAPBATCH
Also, the error indicator of IOCTL_PRIVCMD_MMAPBATCH should be in the
top nibble (it is documented that way in include/xen/public/privcmd.h
and include/xen/compat_ioctl.h), but since that is an incompatible
change it is not being done here (instead, a new ioctl with proper
behavior will need to be added).
Keir Fraser [Wed, 6 Jan 2010 08:15:35 +0000 (08:15 +0000)]
xenoprof: dynamic buffer array allocation
The recent change to locally define MAX_VIRT_CPUS wasn't really
appropriate - with there not being a hard limit on the number of
vCPU-s anymore, these arrays should be allocated dynamically.
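A minimal sketch of sizing such an array at run time instead of by a compile-time maximum. struct prof_buf and alloc_vcpu_bufs() are hypothetical, and calloc() stands in for whatever kernel allocator the patch uses:

```c
#include <stdlib.h>

struct prof_buf {                     /* hypothetical per-vCPU buffer */
    unsigned long head, tail;
};

/* Allocate one zeroed pointer slot per vCPU actually present. The
 * caller frees the array; NULL is returned for zero vCPUs or on
 * allocation failure. */
static struct prof_buf **alloc_vcpu_bufs(unsigned int nr_vcpus)
{
    if (nr_vcpus == 0)
        return NULL;
    return calloc(nr_vcpus, sizeof(struct prof_buf *));
}
```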
Keir Fraser [Wed, 6 Jan 2010 08:14:10 +0000 (08:14 +0000)]
xen/privcmd: convert single shot check to be per-page
For the sake of not breaking the ia64 build, old behavior is being
retained when HAVE_ARCH_PRIVCMD_MMAP is defined. Hopefully someone able to
test ia64 can fix this up in the not too distant future.
Keir Fraser [Wed, 16 Dec 2009 16:44:12 +0000 (16:44 +0000)]
xen/backends: simplify address translations
There are quite a number of places where e.g. page->va->page
translations happen.
Besides yielding smaller code (source and binary), a second goal is to
make it easier to determine where virtual addresses of pages allocated
through alloc_empty_pages_and_pagevec() are really used (in turn in
order to determine whether using highmem pages would be possible
there).
Keir Fraser [Mon, 7 Dec 2009 14:14:28 +0000 (14:14 +0000)]
netback: Fixes for delayed copy of tx network packets.
- Should call net_tx_action_dealloc() even when dealloc ring is
empty, as there may in any case be work to do on the
pending_inuse list.
- Should not exit directly from the middle of the tx_action tasklet,
as the tx_pending_timer should always be checked and updated at the
end of the tasklet.
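Both points amount to a control-flow shape like the one sketched below: the dealloc step runs unconditionally, and every path leaves through a single exit that updates the timer. struct tx_state and the helpers are simplified stand-ins, not netback's data structures:

```c
struct tx_state {                     /* hypothetical tasklet state */
    int pending_requests;
    int processed;
    int dealloc_calls;
    int timer_checks;
};

static void tx_dealloc(struct tx_state *s)      { s->dealloc_calls++; }
static void tx_update_timer(struct tx_state *s) { s->timer_checks++; }

static void tx_action(struct tx_state *s)
{
    /* Always called: there may be work on the in-use list even when
     * the dealloc ring itself is empty. */
    tx_dealloc(s);

    if (s->pending_requests == 0)
        goto out;                     /* no early return */

    s->processed += s->pending_requests;
    s->pending_requests = 0;
out:
    /* Reached on every path through the function. */
    tx_update_timer(s);
}
```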
Keir Fraser [Thu, 3 Dec 2009 13:53:06 +0000 (13:53 +0000)]
xenfb: Only start one xenfb kthread
When doing save/restore testing with the linux-2.6.18-xen.hg tree it
was discovered that every time a restore happened we would get a new
xenfb thread. While the framebuffer continues to work, this is an
obvious resource leak. The attached patch only starts up a new xenfb
thread the first time the backend connects, and continues to re-use
that in the future. Jeremy's upstream LKML tree doesn't suffer from
this since it uses a completely different mechanism to do screen
updates. Original patch from John Haxby @ Oracle; slightly modified
by me to apply to the linux-2.6.18-xen.hg tree.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
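The idea of starting the thread only on the first connect can be sketched like this. struct fb_state and fb_backend_connected() are hypothetical, and a counter stands in for the actual thread creation:

```c
struct fb_state {                     /* hypothetical frontend state */
    int thread_started;               /* set once the update thread exists */
    int threads_created;              /* counts creations, for illustration */
};

/* Called each time the backend connects, including after a restore. */
static void fb_backend_connected(struct fb_state *fb)
{
    if (fb->thread_started)
        return;                       /* re-use the existing thread */

    fb->threads_created++;            /* stand-in for creating the thread */
    fb->thread_started = 1;
}
```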
When mem= is being used to specify a value below the amount a domain
got passed from Xen, init_memory_mapping() got called with the higher
original value (end_pfn_map), triggering the BUG()s in maddr.h
checking PFNs against end_pfn.
Keir Fraser [Tue, 24 Nov 2009 14:45:19 +0000 (14:45 +0000)]
xen: Dont call msi_unmap_pirq() if did not enable msi
When a device driver is unloaded, it may call pci_disable_msi(). If MSI
was never enabled but msi_unmap_pirq() is performed anyway, a later
reload of the driver without MSI fails in request_irq(), because the
irq_desc[irq]->chip value is no_irq_chip. So when MSI was not enabled
during driver initialization, unloading the driver no longer tries to
disable it.
How to reproduce it:
On a server with a QLogic 25xx adapter, reloading the qla2xxx driver
triggers the failure.
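A simplified sketch of the guard, assuming a per-device flag that records whether MSI was ever enabled. struct dev_stub and disable_msi() are hypothetical, and a counter replaces the real unmap call:

```c
struct dev_stub {                     /* hypothetical device state */
    int msi_enabled;                  /* nonzero only after MSI was enabled */
    int unmap_calls;                  /* counts unmap operations performed */
};

/* Disable path: skip the unmap entirely when MSI was never enabled. */
static void disable_msi(struct dev_stub *dev)
{
    if (!dev->msi_enabled)
        return;

    dev->unmap_calls++;               /* stand-in for the pirq unmap */
    dev->msi_enabled = 0;
}
```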
Keir Fraser [Wed, 4 Nov 2009 18:13:32 +0000 (18:13 +0000)]
xenbus: do not hold transaction_mutex when returning to userspace
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
xenstore-list/3522 is leaving the kernel with locks still held!
1 lock held by xenstore-list/3522:
#0: (&xs_state.transaction_mutex){......}, at: [<c026dc6f>]
xenbus_dev_request_and_reply+0x8f/0xa0
The canonical fix for this type of issue appears to be to maintain a
count manually rather than using an rwsem so do that here.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
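A userspace sketch of replacing a held lock with a manually maintained count. It assumes pthread primitives in place of the kernel ones, and all names are hypothetical. It is also simplified: a real implementation would presumably keep new transactions from starting while a suspend is waiting.

```c
#include <pthread.h>

struct xs_state {
    pthread_mutex_t lock;             /* protects the count only */
    pthread_cond_t idle;              /* signalled when count reaches zero */
    int transaction_count;
};

static void xs_init(struct xs_state *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->idle, NULL);
    s->transaction_count = 0;
}

/* No lock is held on return, so the caller may go back to user space. */
static void transaction_start(struct xs_state *s)
{
    pthread_mutex_lock(&s->lock);
    s->transaction_count++;
    pthread_mutex_unlock(&s->lock);
}

static void transaction_end(struct xs_state *s)
{
    pthread_mutex_lock(&s->lock);
    if (--s->transaction_count == 0)
        pthread_cond_broadcast(&s->idle);
    pthread_mutex_unlock(&s->lock);
}

/* What a suspend path would call: wait until no transaction is open. */
static void wait_for_idle(struct xs_state *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->transaction_count > 0)
        pthread_cond_wait(&s->idle, &s->lock);
    pthread_mutex_unlock(&s->lock);
}
```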
Keir Fraser [Fri, 23 Oct 2009 09:07:22 +0000 (10:07 +0100)]
xen/x86: fix GFP mask handling in dma_alloc_coherent()
Ever since no longer pushing all memory into the DMA zone (c/s 355),
explicitly setting GFP_DMA as well as not masking off GFP_DMA32 was
unnecessarily restricting the pool from which suitable memory could be
taken.
Keir Fraser [Wed, 7 Oct 2009 07:42:00 +0000 (08:42 +0100)]
PVUSB: Fixes and updates
- xenbus state flow changed.
The whole flow is changed to be like netback/netfront.
Reconfiguring/Reconfigured are removed.
- New RING for hotplug notification added.
- USBIF_MAX_SEGMENTS_PER_REQUEST value is changed from 10 to 16.
As a result of this change, RING_SIZE is decreased from 32 to 16.
This affects performance: my flash drive's read throughput
dropped from 29MB/s to 18MB/s in the Linux environment.
However, Windows guests send urbs with 64kB buffers (64kB = 4kB * 16),
so this change is required.
- New port-setting interface
xenbus_watch_path2 is added to usbback, port-setting interface
is moved from sysfs to xenstore.
Now, the port-rule is directly written to xenstore entry.
Example.
# xenstore-write /local/domain/0/backend/vusb/1/0/port/1 "2-1"
(adding physical bus 2-1 to vusb-1-0 port 1)
- urb dequeue function completed.
usbfront sends an unlink-request to usbback, and can cancel an urb
that has been submitted in the backend.
- New USB Spec version (USB1.1/USB2.0) selection support.
usbfront can act as either a USB1.1 or a USB2.0 virtual host controller
according to the xenstore entry key "usb-ver".
- experimental bus_suspend/bus_resume added to usbfront.
- various cleanups, bugfixes, refactoring and codestyle fixes.
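The segment arithmetic from the USBIF_MAX_SEGMENTS_PER_REQUEST item can be checked with a few lines, assuming one 4kB page per segment. The macro and function names are illustrative only:

```c
#define SEGMENT_BYTES 4096u           /* assumed: one 4kB page per segment */

/* Largest buffer a single request can describe with the given number
 * of segments. */
static unsigned int max_request_bytes(unsigned int segments)
{
    return segments * SEGMENT_BYTES;
}
```

With 10 segments a request covers at most 40kB, which is too small for a 64kB urb. With 16 segments it covers exactly 64kB.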
Keir Fraser [Wed, 7 Oct 2009 06:33:40 +0000 (07:33 +0100)]
xen: re-synchronize ring.h public header
Patch 20267:e9366bed077e modified the definition of sring in the xen
repo's version of ring.h, but not the version in the linux kernel
repo. That change broke pause/resume/shutdown messages from the
blktap2 kernel module, which (for the time being) relies on pad[0]
being at consistent location in the sring struct. This patch fixes
this regression by resynchronizing the two files.
Keir Fraser [Tue, 25 Aug 2009 13:55:22 +0000 (14:55 +0100)]
xen/x86: make do_settimeofday() return -EPERM when clock can't be changed
Rather than returning success here (without actually having done
anything), it seems more appropriate/conforming to let the caller know
that what he intended to do didn't succeed.
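In outline, the change in behavior is as sketched below. set_clock() and its parameters are hypothetical and only show the return-value convention:

```c
#include <errno.h>

/* Returns 0 after updating *clock, or -EPERM when the clock cannot be
 * changed, instead of silently reporting success. */
static int set_clock(int can_change_clock, long *clock, long new_value)
{
    if (!can_change_clock)
        return -EPERM;

    *clock = new_value;
    return 0;
}
```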
Keir Fraser [Wed, 5 Aug 2009 11:05:34 +0000 (12:05 +0100)]
xen/x86-64: fix Dom0 boot on AMD K8 CPUs
The workaround in question here should be (and is being) applied by
the hypervisor (which doesn't allow any guest - including Dom0 - to
write other than all zeroes or all ones into MCi_CTL).
Do not go beyond ARRAY_SIZE of info->shadow
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
blktap2: make blktap2 work for auto translated mode with hvm domain.
This patch makes blktap2 work for HVM domains in auto translated
mode (i.e. the IA64 HVM domain case, for which Kuwamura reported a bug).
blktap2 has introduced a new feature whereby pages from the self
domain can be handled. However, it doesn't work in auto translated
mode because blktap2 relies on p2m table manipulation, and the p2m
doesn't make sense in auto translated mode.
So self grant mapping is used instead.
Just passing the same page to the blktap2 daemon doesn't work because
the page is locked while doing I/O, so the page handed over by the
blktap2 block device is already locked. When the blktap2 daemon issues
I/O on the page, it tries to lock it again, resulting in a deadlock.
Hence the resort to self grant mapping.