Because the cpumap member of struct xen_sysctl_cpupool_op is used only
when the operation is XEN_SYSCTL_CPUPOOL_OP_INFO or
XEN_SYSCTL_CPUPOOL_OP_FREEINFO, for any other operation the xencomm_map of
cpumap fails, and thus XEN_SYSCTL_cpupool_op fails.
Keir Fraser [Tue, 10 Aug 2010 14:47:41 +0000 (15:47 +0100)]
xen/x86: eliminate nesting of run-queue locks inside xtime_lock
From: Zdenek Salvet <salvet@ics.muni.cz>
According to Debian bug 591362 this has been causing problems. While
no proof was given that the inverse lock order does actually occur
anywhere (with interrupts enabled), it is plain unnecessary to take
the risk.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Tue, 10 Aug 2010 14:46:56 +0000 (15:46 +0100)]
blktap2: eliminate bogus clearing of PG_reserved
While making sure PG_reserved is set for pages allocated from the
balloon driver (and to be used for I/O) is a necessary thing to do
(as 2.6.18's as well as pv-ops' balloon drivers don't guarantee this
for the pages returned from alloc_empty_pages_and_page_vec()),
clearing this flag again when a page is no longer in use for I/O is
bogus at best (after all, the page at that point is not associated
with any MFN anymore), and causes problems when the balloon driver
properly marks all such pages as reserved and checks, upon their
return, that they are still marked this way.
Keir Fraser [Mon, 2 Aug 2010 10:02:18 +0000 (11:02 +0100)]
xenoprofile: Add IBS support
Add IBS support for AMD family 10h processors. The major
implementation is derived from latest Linux. Two hypercalls are added,
which are necessary for IBS feature detection and user mode parameter
read.
Keir Fraser [Fri, 18 Jun 2010 13:11:57 +0000 (14:11 +0100)]
xen/x86: fix for special behavior of first sys_settimeofday(NULL, &tz) invocation
The data Xen's time implementation maintains to make do_gettimeofday()
return values monotonic needs to be reset not only during normal
do_gettimeofday() invocations, but also when the clock gets warped
due to the hardware (CMOS) clock running on local (rather than UTC)
time.
Additionally there was a time window in do_gettimeofday() (between
the end of the xtime read loop and the acquiring of the monotonicity
data lock) where, if on another processor do_settimeofday() would
execute to completion, the zeroes written by the latter could get
overwritten by the former with values obtained before the time was
updated. This now gets prevented by maintaining a version for the
monotonicity data.
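The versioning idea described above can be sketched as follows. This is a minimal illustration, not the actual patch: the structure, field and function names are all hypothetical, and the lock that would protect this data in the kernel is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the monotonicity data; names are illustrative. */
struct mono_state {
    unsigned int version;   /* bumped on every reset */
    uint64_t last_nsec;     /* last value handed out to a reader */
};

/* Writer side (settimeofday-like): zero the data and bump the version. */
static void mono_reset(struct mono_state *s)
{
    s->last_nsec = 0;
    s->version++;
}

/* Reader side, step 1: remember the version before reading the time. */
static unsigned int mono_snapshot(const struct mono_state *s)
{
    return s->version;
}

/*
 * Reader side, step 2: publish the computed value only if no reset
 * happened since the snapshot. Returns the value to report.
 */
static uint64_t mono_publish(struct mono_state *s, unsigned int seen,
                             uint64_t nsec)
{
    if (s->version != seen)
        return nsec;            /* stale: do not overwrite the reset */
    if (nsec < s->last_nsec)
        nsec = s->last_nsec;    /* keep returned values monotonic */
    s->last_nsec = nsec;
    return nsec;
}
```

A reader that took its snapshot before a concurrent reset sees a version mismatch and leaves the freshly zeroed data alone.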
This fixes the following errors:
/arch/ia64/xen/xcom_privcmd.c: In function `xencomm_privcmd_sysctl':
/arch/ia64/xen/xcom_privcmd.c:295: error: case label not within a
switch statement
/arch/ia64/xen/xcom_privcmd.c:305: error: break statement not within
loop or switch
These arose because 1018:b7eb9756e522 inserted lines outside of a switch
statement. This patch corrects that.
blktap: fix cleanup after unclean application exit #2
When an application using blktap devices doesn't close the mmap-s of
/dev/xen/blktapN and the frontend driver never connects, we cannot
defer the mmput() on the stored mm until blktap_release() or the exit
path of the worker thread, as the former will never be called without
the mm's reference count dropping to zero, and the worker thread
would never get started.
- array indices got checked after having indexed the array already
- several were off by one
- BLKTAP_IOCTL_FREEINTF should not be used on other than the control
device (or the logic should be changed to that when thus used only
the respective device can be freed)
- BLKTAP_IOCTL_MINOR can reasonably also be used on non-control
devices (returning that device's minor and ignoring the passed in
argument)
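The first two items in the list above concern a common pattern: validate an index before using it, and use the right comparison. A minimal sketch, with a hypothetical table and helper that are not taken from the blktap source:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 4   /* hypothetical table size */

static int dev_table[MAX_DEVS] = { 10, 11, 12, 13 };

/* Validate the index first; only then touch the array. */
static int lookup_dev(unsigned long idx, int *out)
{
    if (idx >= MAX_DEVS)    /* using ">" here would be the off-by-one */
        return -1;
    *out = dev_table[idx];
    return 0;
}
```

Taking the index as an unsigned type means a negative value passed in from user space wraps to a large number and is rejected by the same comparison.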
xen/blktap: fix cleanup after unclean application exit
When an application using blktap devices doesn't close the file handle
(or mmap-s) of /dev/xen/blktapN, we cannot defer the mmput() on the
stored mm until blktap_release(), as that will never be called without
the mm's reference count dropping to zero.
Keir Fraser [Tue, 30 Mar 2010 17:28:34 +0000 (18:28 +0100)]
xen/balloon: Fix return value interpretation for XENMEM_get_pod_target
Unfortunately c/s 989 didn't consider what I would call a quirk in
pre-3.4 Xen, resulting in XENMEM_get_pod_target calls not returning
-ENOSYS as one would normally expect.
Keir Fraser [Mon, 1 Mar 2010 09:56:15 +0000 (09:56 +0000)]
blktap2: Fix queue restart, racing block device removal.
Makes tapdisk context test dev->gd before attempting a queue restart,
with the device lock held. Fixes a race lost against device
destruction, which may be issued anywhere on the control path.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 1 Mar 2010 09:55:09 +0000 (09:55 +0000)]
Guest SR-IOV: Replace previous changeset with a more complete implementation from Intel
"""Guest SR-IOV support for PV guest
These changes are for PV guest to use Virtual Function. Because the
VF's vendor, device registers in cfg space are 0xffff, which are
invalid and ignored by PCI device scan. Values in 'struct pci_dev' are
fixed up by SR-IOV code, and using these values will present correct
VID and DID to PV guest kernel.
And command registers in the cfg space are read only 0, which means we
have to emulate MMIO enable bit (VF only uses MMIO resource) so PV
kernel can work properly."""
Keir Fraser [Mon, 22 Feb 2010 10:03:18 +0000 (10:03 +0000)]
linux/x86: fix long timeout handling in stop_hz_timer()
Other than for HYPERVISOR_set_timer_op, zero doesn't mean "no timeout"
for VCPUOP_set_singleshot_timer (but should be retained rather than
adjusted by NS_PER_TICK/2 for the former).
Also properly cancel the singleshot timer in start_hz_timer().
Keir Fraser [Mon, 1 Feb 2010 14:12:36 +0000 (14:12 +0000)]
xen/balloon: fix balloon driver accounting for HVM-with-PoD case
With PoD, ballooning down a guest to the target set through xenstore
based on its totalram_pages value isn't sufficient, since that value
doesn't include all the pages assigned to the guest. Since the delta
is static, determine it once at load time.
Keir Fraser [Mon, 18 Jan 2010 10:46:43 +0000 (10:46 +0000)]
xen/blkfront: fixes for 'xm block-detach ... --force'
Prevent prematurely freeing 'struct blkfront_info' instances (when the
xenbus data structures are gone, but the Linux ones are still needed).
Prevent adding a disk with the same (major, minor) [and hence the same
name and sysfs entries, which leads to oopses] when the previous
instance wasn't fully de-allocated yet.
This still doesn't address all issues resulting from forced detach:
I/O submitted after the detach still blocks forever, likely preventing
subsequent un-mounting from completing. It's not clear to me (not
knowing much about the block layer) how this can be avoided.
This also doesn't address issues with duplicate device creation caused
by re-using the hdXX and sdXX name spaces - this would require
synchronization with the respective native code.
Keir Fraser [Wed, 13 Jan 2010 08:11:51 +0000 (08:11 +0000)]
privcmd: add new (replacement) mmap-batch ioctl
While the error indicator of IOCTL_PRIVCMD_MMAPBATCH should be in the
top nibble (it is documented that way in include/xen/public/privcmd.h
and include/xen/compat_ioctl.h), it really wasn't for 64-bit
implementations. With MFNs now possibly being 32 or more bits wide on
x86-64, using bits 28-31 as failure indicator (and bit 31 as paged-out
indicator) is no longer acceptable. Instead, a new ioctl with a
separate error indication array is being introduced.
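To illustrate why in-band error bits stop working once frame numbers need 28 or more bits, here is a small sketch. The mask follows the bit positions given above; the structure and helper names are hypothetical and do not reproduce the real ioctl definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits 28-31 of a 32-bit slot, as described in the text above. */
#define OLD_ERROR_BITS 0xf0000000u

/* Nonzero if a valid MFN would be mistaken for an error by the old scheme. */
static int mfn_clobbers_error_bits(uint64_t mfn)
{
    return (mfn & OLD_ERROR_BITS) != 0;
}

/* Hypothetical replacement layout: frame numbers and errors kept apart. */
struct batch_sketch {
    size_t num;
    const uint64_t *mfns;   /* input only, never modified */
    int *errs;              /* one status per entry, 0 on success */
};

/* Record a per-entry status without touching the frame number. */
static void batch_set_error(struct batch_sketch *b, size_t i, int err)
{
    if (i < b->num)
        b->errs[i] = err;
}
```

With a separate array the caller can read back every frame number unchanged, whatever its width, and check each entry's status independently.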
Keir Fraser [Fri, 8 Jan 2010 13:07:17 +0000 (13:07 +0000)]
Update sfc_netback driver to match sfc_resource 3.0.2.2074
Add support for direct guest access and acceleration of SFC9000 series
NICs.
Improve handling of NIC reset in sfc_netback
Remove nic_index state and replace with if_index from struct net_device
Remove duplication of header files with sfc_resource driver
Keir Fraser [Fri, 8 Jan 2010 13:05:49 +0000 (13:05 +0000)]
Update Solarflare Communications net driver to version 3.0.2.2074
Bring net driver in Xen tree in line with kernel.org tree
Add support for new SFC9000 series NICs
Keir Fraser [Wed, 6 Jan 2010 08:38:09 +0000 (08:38 +0000)]
xen/privcmd: fix for proper operation in compat mode
- sizeof(struct privcmd_mmapbatch_32) was wrong
- MFN array must be translated for IOCTL_PRIVCMD_MMAPBATCH
Also, the error indicator of IOCTL_PRIVCMD_MMAPBATCH should be in the
top nibble (it is documented that way in include/xen/public/privcmd.h
and include/xen/compat_ioctl.h), but since that is an incompatible
change it is not being done here (instead, a new ioctl with proper
behavior will need to be added).
Keir Fraser [Wed, 6 Jan 2010 08:15:35 +0000 (08:15 +0000)]
xenoprof: dynamic buffer array allocation
The recent change to locally define MAX_VIRT_CPUS wasn't really
appropriate - with there not being a hard limit on the number of
vCPU-s anymore, these arrays should be allocated dynamically.
Keir Fraser [Wed, 6 Jan 2010 08:14:10 +0000 (08:14 +0000)]
xen/privcmd: convert single shot check to be per-page
For the sake of not breaking the ia64 build, old behavior is being
retained when HAVE_ARCH_PRIVCMD_MMAP is defined. Hopefully someone able to
test ia64 can fix this up in the not too distant future.
Keir Fraser [Wed, 16 Dec 2009 16:44:12 +0000 (16:44 +0000)]
xen/backends: simplify address translations
There are quite a number of places where e.g. page->va->page
translations happen.
Besides yielding smaller code (source and binary), a second goal is to
make it easier to determine where virtual addresses of pages allocated
through alloc_empty_pages_and_pagevec() are really used (in turn in
order to determine whether using highmem pages would be possible
there).
Keir Fraser [Mon, 7 Dec 2009 14:14:28 +0000 (14:14 +0000)]
netback: Fixes for delayed copy of tx network packets.
- Should call net_tx_action_dealloc() even when dealloc ring is
empty, as there may in any case be work to do on the
pending_inuse list.
- Should not exit directly from the middle of the tx_action tasklet,
as the tx_pending_timer should always be checked and updated at the
end of the tasklet.
Keir Fraser [Thu, 3 Dec 2009 13:53:06 +0000 (13:53 +0000)]
xenfb: Only start one xenfb kthread
When doing save/restore testing with the linux-2.6.18-xen.hg tree it
was discovered that every time a restore happened we would get a new
xenfb thread. While the framebuffer continues to work, this is an
obvious resource leak. The attached patch only starts up a new xenfb
thread the first time the backend connects, and continues to re-use
that in the future. Jeremy's upstream LKML tree doesn't suffer from
this since it uses a completely different mechanism to do screen
updates. Original patch from John Haxby @ Oracle; slightly modified
by me to apply to the linux-2.6.18-xen.hg tree.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
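The "start the thread only on first connect" behaviour described above amounts to a guard on the stored thread handle. A minimal sketch with hypothetical names, using a counter in place of real thread creation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state; not the real xenfb structure. */
struct fb_sketch {
    void *kthread;          /* NULL until the first connect */
    int threads_started;    /* counts how many threads were created */
};

static int dummy_thread;    /* stands in for a real thread handle */

/* Called on every backend connect, including after a restore. */
static void fb_connect(struct fb_sketch *info)
{
    if (info->kthread == NULL) {
        info->kthread = &dummy_thread;
        info->threads_started++;
    }
    /* otherwise keep re-using the existing thread */
}
```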
When mem= is being used to specify a value below the amount a domain
got passed from Xen, init_memory_mapping() got called with the higher
original value (end_pfn_map), triggering the BUG()s in maddr.h
checking PFNs against end_pfn.
Keir Fraser [Tue, 24 Nov 2009 14:45:19 +0000 (14:45 +0000)]
xen: Dont call msi_unmap_pirq() if did not enable msi
When a device driver unloads, it may call pci_disable_msi(). If MSI was
not enabled but msi_unmap_pirq() is called anyway, a later reload of the
driver without MSI will fail in request_irq(), because the
irq_desc[irq]->chip value is no_irq_chip. So when MSI was not enabled
during driver initialization, unloading the driver will not try to
disable it.
How to reproduce it:
On a server with a QLogic 25xx, reloading qla2xxx will hit it.
Keir Fraser [Wed, 4 Nov 2009 18:13:32 +0000 (18:13 +0000)]
xenbus: do not hold transaction_mutex when returning to userspace
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
xenstore-list/3522 is leaving the kernel with locks still held!
1 lock held by xenstore-list/3522:
#0: (&xs_state.transaction_mutex){......}, at: [<c026dc6f>]
xenbus_dev_request_and_reply+0x8f/0xa0
The canonical fix for this type of issue appears to be to maintain a
count manually rather than using an rwsem so do that here.
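A manual count in place of a held lock can be sketched as below. This is only an illustration of the idea under stated assumptions: the names are hypothetical, C11 atomics stand in for the kernel's primitives, and the waiting that a suspend path would need is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of transactions currently open; nothing stays locked across
 * the return to user space, only this counter is adjusted. */
static atomic_int transaction_count;

static void transaction_start(void)
{
    atomic_fetch_add(&transaction_count, 1);
}

static void transaction_end(void)
{
    atomic_fetch_sub(&transaction_count, 1);
}

/* A suspend path would wait until this becomes true. */
static bool transactions_idle(void)
{
    return atomic_load(&transaction_count) == 0;
}
```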
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Fri, 23 Oct 2009 09:07:22 +0000 (10:07 +0100)]
xen/x86: fix GFP mask handling in dma_alloc_coherent()
Ever since no longer pushing all memory into the DMA zone (c/s 355),
explicitly setting GFP_DMA as well as not masking off GFP_DMA32 was
unnecessarily restricting the pool from which suitable memory could be
taken.
Keir Fraser [Wed, 7 Oct 2009 07:42:00 +0000 (08:42 +0100)]
PVUSB: Fixes and updates
- xenbus state flow changed.
The whole flow is changed to be like netback/netfront.
Reconfiguring/Reconfigured are removed.
- New RING for hotplug notification added.
- USBIF_MAX_SEGMENTS_PER_REQUEST value is changed from 10 to 16.
As a result of this change, RING_SIZE is decreased from 32 to 16.
This affects performance: my flash drive's read throughput
dropped from 29MB/s to 18MB/s in the Linux environment.
However, Windows guests send URBs with 64kB buffers (64kB = 4kB * 16),
so this is required.
- New port-setting interface
xenbus_watch_path2 is added to usbback, port-setting interface
is moved from sysfs to xenstore.
Now, the port-rule is directly written to xenstore entry.
Example.
# xenstore-write /local/domain/0/backend/vusb/1/0/port/1 "2-1"
(adding physical bus 2-1 to vusb-1-0 port 1)
- urb dequeue function completed.
usbfront sends an unlink-request to usbback, and can cancel an urb
that has been submitted in the backend.
- New USB Spec version (USB1.1/USB2.0) selection support.
usbfront can act as both USB1.1 and USB2.0 virtual host controller
according to the xenstore entry key "usb-ver".
- experimental bus_suspend/bus_resume added to usbfront.
- various cleanups, bugfixes, refactoring and codestyle fixes.
Keir Fraser [Wed, 7 Oct 2009 06:33:40 +0000 (07:33 +0100)]
xen: re-synchronize ring.h public header
Patch 20267:e9366bed077e modified the definition of sring in the xen
repo's version of ring.h, but not the version in the linux kernel
repo. That change broke pause/resume/shutdown messages from the
blktap2 kernel module, which (for the time being) relies on pad[0]
being at a consistent location in the sring struct. This patch fixes
this regression by resynchronizing the two files.