Alex Williamson [Tue, 8 Jan 2013 21:10:03 +0000 (14:10 -0700)]
vfio-pci: Loosen sanity checks to allow future features
VFIO_PCI_NUM_REGIONS and VFIO_PCI_NUM_IRQS should never have been
used in this manner as it locks a specific kernel implementation.
Future features may introduce new regions or interrupt entries
(VGA may add legacy ranges, AER might add an IRQ for error
signalling). Fix this before it gets us into trouble.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org
Alex Williamson [Tue, 8 Jan 2013 21:09:03 +0000 (14:09 -0700)]
vfio-pci: Make host MSI-X enable track guest
Guests typically enable MSI-X with all of the vectors in the MSI-X
vector table masked. Only when the vector is enabled does the vector
get unmasked, resulting in a vector_use callback. These two points,
enable and unmask, correspond to pci_enable_msix() and request_irq()
for Linux guests. Some drivers rely on VF/PF or PF/fw communication
channels that expect the physical state of the device to match the
guest visible state of the device. They don't appreciate lazily
enabling MSI-X on the physical device.
To solve this, enable MSI-X with a single vector when the MSI-X
capability is enabled and immediate disable the vector. This leaves
the physical device in exactly the same state between host and guest.
Furthermore, the brief gap where we enable vector 0, it fires into
userspace, not KVM, so the guest doesn't get spurious interrupts.
Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters
to enable MSI-X with zero vectors, but this will currently return an
error as the Linux MSI-X interfaces do not allow it.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org
Anthony Liguori [Tue, 8 Jan 2013 16:36:20 +0000 (10:36 -0600)]
Merge remote-tracking branch 'kraxel/usb.75' into staging
* kraxel/usb.75: (32 commits)
uhci: stop using portio lists
usbredir: Add support for buffered bulk input (v2)
exynos4210: Add EHCI support
usb/ehci: Add SysBus EHCI device for Exynos4210
usb/ehci: Move capsbase and opregbase into SysBus EHCI class
usb/ehci: Clean up SysBus and PCI EHCI split
xhci: call set-address with dummy usbpacket
usb-redir: Add debugging to bufpq save / restore
usbredir: Add usbredir_init_endpoints() helper
usbredir: Verify we have 32 bits bulk length cap when redirecting to xhci
usbredir: Add ep_stopped USBDevice method
usbredir: Add USBEP2I and I2USBEP helper macros
usbredir: Add an usbredir_stop_ep helper function
usb: Add an usb_device_ep_stopped USBDevice method
usb: Fix usb_ep_find_packet_by_id
hid: Change idle handling to use a timer
uhci: Maximize how many frames we catch up when behind
uhci: Limit amount of frames processed in one go
uhci: Add a QH_VALID define
uhci: Fix pending interrupts getting lost on migration
...
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Anthony Liguori [Tue, 8 Jan 2013 16:36:13 +0000 (10:36 -0600)]
Merge remote-tracking branch 'stefanha/net' into staging
* stefanha/net:
rtl8139: preserve link state across device reset
e1000: no need auto-negotiation if link was down
net: clean up network at qemu process termination
e1000: Discard oversized packets based on SBP|LPE
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Petar Jovanovic [Wed, 2 Jan 2013 04:08:48 +0000 (05:08 +0100)]
target-mips: Fix helper and tests for dot/cross-dot product instructions
Helper function for dpa_w_ph, dpax_w_ph, dps_w_ph and dpsx_w_ph incorrectly
defines halfword vector elements as unsigned values. This results in wrong
output which is not triggered in the tests as they also follow this logic.
Signed-off-by: Petar Jovanovic <petarj@mips.com> Reviewed-by: Eric Johnson <ericj@mips.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Stefan Weil [Tue, 1 Jan 2013 18:44:31 +0000 (19:44 +0100)]
target-mips: Replace macros by inline functions
The macros RESTORE_ROUNDING_MODE and RESTORE_FLUSH_MODE silently used
variable env from their callers. Using inline functions with env passed
as a function argument is more transparent.
This modification was proposed by Peter Maydell.
Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Johnson <ericj@mips.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Hans de Goede [Wed, 19 Dec 2012 14:08:33 +0000 (15:08 +0100)]
usbredir: Add support for buffered bulk input (v2)
Buffered bulk mode is intended for bulk *input* endpoints, where the data is
of a streaming nature (not part of a command-response protocol). These
endpoints' input buffer may overflow if data is not read quickly enough.
So in buffered bulk mode the usb-host takes care of the submitting and
re-submitting of bulk transfers.
Buffered bulk mode is necessary for reliable operation with the bulk in
endpoints of usb to serial convertors. Unfortunatelty buffered bulk input
mode will only work with certain devices, therefor this patch also adds a
usb-id table to enable it for devices which need it, while leaving the
bulk ep handling for other devices unmodified.
Note that the bumping of the required usbredir from 0.5.3 to 0.6 does
not mean that we will now need a newer usbredir release then qemu-1.3,
.pc files reporting 0.5.3 have only ever existed in usbredir builds directly
from git, so qemu-1.3 needs the 0.6 release too.
Changes in v2:
-Split of quirk handling into quirks.c
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Andreas Färber [Sun, 16 Dec 2012 03:49:43 +0000 (04:49 +0100)]
usb/ehci: Clean up SysBus and PCI EHCI split
SysBus EHCI was introduced in a hurry before 1.3 Soft Freeze.
To use QOM casts in place of DO_UPCAST() / FROM_SYSBUS(), we need an
identifying type. Introduce generic abstract base types for PCI and
SysBus EHCI to allow multiple types to access the shared fields.
While at it, move the state structs being amended with macros to the
header file so that they can be embedded.
The VMSTATE_PCI_DEVICE() macro does not play nice with the QOM
parent_obj naming convention, so defer that cleanup.
Signed-off-by: Andreas Färber <andreas.faerber@web.de> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Gerd Hoffmann [Fri, 14 Dec 2012 12:10:39 +0000 (13:10 +0100)]
xhci: call set-address with dummy usbpacket
Due to the way devices are addressed with xhci (done by hardware, not
the guest os) there is no packet when invoking the set-address control
request. Create a dummy packet in that case to avoid null pointer
dereferences.
Hans de Goede [Fri, 14 Dec 2012 13:35:44 +0000 (14:35 +0100)]
usbredir: Verify we have 32 bits bulk length cap when redirecting to xhci
The xhci-hcd may submit bulk transfers > 65535 bytes even when not using
bulk-in pipeling, so usbredir can only be used in combination with an xhci
hcd if the client has the 32 bits bulk length capability.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:40 +0000 (14:35 +0100)]
usb: Add an usb_device_ep_stopped USBDevice method
Some usb devices (host or network redirection) can benefit from knowing when
the guest stops using an endpoint. Redirection may involve submitting packets
independently from the guest (in combination with a fifo buffer between the
redirection code and the guest), to ensure that buffers of the real usb device
are timely emptied. This is done for example for isoc traffic and for interrupt
input endpoints. But when the (re)submission of packets is done by the device
code, then how does it know when to stop this?
For isoc endpoints this is handled by detecting a set interface (change alt
setting) command, which works well for isoc endpoints. But for interrupt
endpoints currently the redirection code never stops receiving data from
the device, which is less then ideal.
However the controller emulation is aware when a guest looses interest, as
then the qh for the endpoint gets unlinked (ehci, ohci, uhci) or the endpoint
is explicitly stopped (xhci). This patch adds a new ep_stopped USBDevice
method and modifies the hcd code to call this on queue unlink / ep stop.
This makes it possible for the redirection code to properly stop receiving
interrupt input (*) data when the guest no longer has interest in it.
*) And in the future also buffered bulk input.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:38 +0000 (14:35 +0100)]
hid: Change idle handling to use a timer
This leads to cleaner code in usb-hid, and removes up to a 1000 calls / sec to
qemu_get_clock_ns(vm_clock) if idle-time is set to its default value of 0.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:36 +0000 (14:35 +0100)]
uhci: Limit amount of frames processed in one go
Before this patch uhci would process an unlimited amount of frames when
behind on schedule, by setting the timer to a time already past, causing the
timer subsys to immediately recall the frame_timer function gain.
This would cause invalid cancellations of bulk queues when the catching up
processed more then 32 frames at a moment when the bulk qh was temporarily
unlinked (which the Linux uhci driver does).
This patch fixes this by processing maximum 16 frames in one go, and always
setting the timer one ms later, making the code behave more like the ehci
code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:33 +0000 (14:35 +0100)]
uhci: Fix 1 ms delay in interrupt reporting to the guest
Re-arrange how we process frames / increase frnum / report pending interrupts,
to avoid a 1 ms delay in interrupt reporting to the guest. This increases
the packet throughput for cases where the guest submits a single packet,
then waits for its completion then re-submits from 500 pkts / sec to
1000 pkts / sec. This impacts for example the use of redirected / virtual
usb to serial convertors.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:31 +0000 (14:35 +0100)]
ehci: Further speedup rescanning if async schedule after raising an interrupt
I tried lowering the time between raising an interrupt and rescanning the
async schedule to see if the guest has queued a new transfer before, but
that did not have any positive effect. I now believe the cause for this is
that lowering this time made it more likely to hit the 1 ms interrupt
threshold penalty for the next packet, as described in my
"ehci: Use uframe precision for interrupt threshold checking" commit.
Now that we do interrupt threshold handling with uframe precision, futher
lowering this time from .5 to .25 ms gives an extra 15% improvement in speed
(MB/s) reading from a simple USB-2.0 thumb-drive.
While at it also properly set the int_req_by_async flag for short packet
completions.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Tue, 18 Dec 2012 13:17:02 +0000 (14:17 +0100)]
ehci: Use uframe precision for interrupt threshold checking (v2)
Before this patch, the following could happen:
1) Transfer completes, raises interrupt
2) .5 ms later we check if the guest has queued up any new transfers
3) We find and execute a new transfer
4) .2 ms later the new transfer completes
5) We re-run our frame_timer to write back the completion, but less then
1 ms has passed since our last run, so frindex is not changed, so the
interrupt threshold code delays the interrupt
6) 1 ms from the re-run our frame-timer runs again and finally delivers
the interrupt
This leads to unnecessary large delays of interrupts, this code fixes this
by changing frindex to uframe precision and using that for interrupt threshold
control, making the interrupt fire at step 5 for guest which have low interrupt
threshold settings (like Linux).
Note that the guest still sees the frindex move in steps of 8 for migration
compatibility.
This boosts Linux read speed of a simple cheap USB thumb drive by 6 %.
Changes in v2:
-Make the guest see frindex move in steps of 8 by modifying ehci_opreg_read,
rather then using a shadow variable
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:29 +0000 (14:35 +0100)]
ehci: Verify a queue's ep direction does not change
ehci_fill_queue assumes that there is a one on one relationship between an ep
and a qh, this patch adds a check to ensure this.
Note I don't expect this to ever trigger, this is just something I noticed
the guest might do while working on other stuff. The only way this check can
trigger is if a guest mixes in and out qtd-s in a single qh for a non
control ep.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:24 +0000 (14:35 +0100)]
ehci: Verify guest does not change the token of inflight qtd-s
This is not allowed, except for clearing active on cancellation, so don't
warn when the new token does not have its active bit set.
This unifies the cancellation path for modified qtd-s, and prepares
ehci_verify_qtd to be used ad an extra check inside
ehci_writeback_async_complete_packet().
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Hans de Goede [Fri, 14 Dec 2012 13:35:22 +0000 (14:35 +0100)]
ehci: Add a ehci_writeback_async_complete_packet helper function
Also drop the warning printf, which was there mainly because this was an
untested code path (as the previous bug fixes to it show), but that no
longer is the case now :)
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Amos Kong [Fri, 28 Dec 2012 09:29:10 +0000 (17:29 +0800)]
e1000: no need auto-negotiation if link was down
Commit b9d03e352cb6b31a66545763f6a1e20c9abf0c2c added link
auto-negotiation emulation, it would always set link up by
callback function. Problem exists if original link status
was down, link status should not be changed in auto-negotiation.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Amos Kong [Tue, 11 Dec 2012 14:20:15 +0000 (22:20 +0800)]
net: clean up network at qemu process termination
We don't clean up network if fails to parse "-device" parameters without
calling net_cleanup(). I touch a problem, the tap device which is
created by qemu-ifup script could not be removed by qemu-ifdown script.
Some similar problems also exist in vl.c
In this patch, if network initialization successes, a cleanup function
will be registered to be called at qemu process termination.
Signed-off-by: Amos Kong <akong@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Weil [Sat, 5 Jan 2013 08:33:43 +0000 (09:33 +0100)]
hw/i386: Fix broken build for non POSIX hosts
pc-testdev.c cannot be compiled with MinGW (and other non POSIX hosts):
CC i386-softmmu/hw/i386/../pc-testdev.o
qemu/hw/i386/../pc-testdev.c:38:22: warning: sys/mman.h: file not found
qemu/hw/i386/../pc-testdev.c: In function ‘test_flush_page’:
qemu/hw/i386/../pc-testdev.c:103: warning: implicit declaration of function ‘mprotect’
...
Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The interface to normalizeRoundAndPackFloat64 requires that the
high bit be clear. Perform one shift-right-and-jam if needed.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
John Spencer [Tue, 25 Dec 2012 23:49:49 +0000 (00:49 +0100)]
linux-user/syscall.c: remove forward declarations
instead use the correct headers that define these functions.
Requested-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: John Spencer <maillist-qemu@barfooze.de> Reviewed-by: Amos Kong <kongjianjun@gmail.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
pc_fw_add_pflash_drv() ignores qemu_find_file() failure, and happily
creates a drive without a medium.
When pc_system_flash_init() asks for its size, bdrv_getlength() fails
with -ENOMEDIUM, which isn't checked either. It fails relatively
cleanly only because -ENOMEDIUM isn't a multiple of 4096:
$ qemu-system-x86_64 -S -vnc :0 -bios nonexistant
qemu: PC system firmware (pflash) must be a multiple of 0x1000
[Exit 1 ]
Fix by handling the qemu_find_file() failure.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Fix this by moving the label to the end of the line, this way the
libvirt parser does still recognise the message. libvirt looks
for "char device redirected to ${ptsname}<whitespace>".
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Stefan Hajnoczi [Thu, 3 Jan 2013 10:56:16 +0000 (11:56 +0100)]
dataplane: use linux-headers/ for virtio includes
The hw/dataplane/vring.c code includes linux/virtio_ring.h. Ensure that
we use linux-headers/ instead of the system-wide headers, which may be
out-of-date on older distros.
This resolves the following build error on Debian 6:
CC hw/dataplane/vring.o
cc1: warnings being treated as errors
hw/dataplane/vring.c: In function 'vring_enable_notification':
hw/dataplane/vring.c:71: error: implicit declaration of function 'vring_avail_event'
hw/dataplane/vring.c:71: error: nested extern declaration of 'vring_avail_event'
hw/dataplane/vring.c:71: error: lvalue required as left operand of assignment
Note that we now build dataplane/ for each target instead of only once.
There is no way around this since linux-headers/ is only available for
per-target objects - and it's how virtio, vfio, kvm, and friends are
built.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Michael Tokarev [Mon, 31 Dec 2012 11:30:31 +0000 (15:30 +0400)]
savevm.c: cleanup system includes
savevm.c suffers from the same problem as some other files.
Some years ago savevm.c was created from vl.c, moving some
code from there into a separate file. At that time, all
includes were just copied from vl.c to savevm.c, without
checking which ones are needed and which are not.
But actually most of that stuff is _not_ needed. More, some
stuff is wrong, for example, *BSD #ifdef'ery around <util.h>
vs <libutil.h> - for one, it fails to build on Debian/kFreebsd.
Just remove all this. Maybe there's a possibility to clean
it up further - like removing <windows.h> (and maybe including
winsock.h for htons etc), and maybe it's possible to remove
some internal #includes too, but I didn't check this.
While at it, remove duplicate #include of qemu/timer.h.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Curses display requires stdin/out to stay on the terminal,
so -daemonize makes no sense in this case. Instead of
leaving display uninitialized like is done since 995ee2bf469de6bb,
explicitly detect this case earlier and error out.
-nographic can actually be used with -daemonize, by redirecting
everything to a null device, but the problem is that according
to documentation and historical behavour, -nographic redirects
guest ports to stdin/out, which, again, makes no sense in case
of -daemonize. Since -nographic is a legacy option, don't bother
fixing this case (to allow -nographic and -daemonize by redirecting
guest ports to null instead of stdin/out in this case), but disallow
it completely instead, to stop garbling host terminal.
If no display display needed and user wants to use -nographic,
the right way to go is to use
-serial null -parallel null -monitor none -display none -vga none
instead of -nographic.
Also prevent the same issue -- it was possible to get garbled
host tty after
-nographic -daemonize
and it is still possible to have it by using
-serial stdio -daemonize
Fix this by disallowing opening stdio chardev when -daemonize
is specified.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Liu Yuan [Mon, 17 Dec 2012 06:17:27 +0000 (14:17 +0800)]
sheepdog: pass oid directly to send_pending_req()
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Liu Yuan [Mon, 17 Dec 2012 06:17:26 +0000 (14:17 +0800)]
sheepdog: don't update inode when create_and_write fails
For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap
to avoid the scenario that the object is allocated but wasn't created at the
server side. This will result in VM's IO error on the failed object.
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
liguang [Mon, 17 Dec 2012 01:49:23 +0000 (09:49 +0800)]
qemu-img: report size overflow error message
qemu-img will complain when qcow or qcow2
size overflow for 64 bits, report the right
message in this condition.
$./qemu-img create -f qcow2 /tmp/foo 0x10000000000000000
before change:
qemu-img: Invalid image size specified! You may use k, M, G or T suffixes for
qemu-img: kilobytes, megabytes, gigabytes and terabytes.
after change:
qemu-img: Image size must be less than 8 EiB!
[Resolved conflict with a9300911 goto removal -- Stefan]
Signed-off-by: liguang <lig.fnst@cn.fujitsu.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The virtio-blk-data-plane feature is easy to integrate into
hw/virtio-blk.c. The data plane can be started and stopped similar to
vhost-net.
Users can take advantage of the virtio-blk-data-plane feature using the
new -device virtio-blk-pci,x-data-plane=on property.
The x-data-plane name was chosen because at this stage the feature is
experimental and likely to see changes in the future.
If the VM configuration does not support virtio-blk-data-plane an error
message is printed. Although we could fall back to regular virtio-blk,
I prefer the explicit approach since it prompts the user to fix their
configuration if they want the performance benefit of
virtio-blk-data-plane.
Limitations:
* Only format=raw is supported
* Live migration is not supported
* Block jobs, hot unplug, and other operations fail with -EBUSY
* I/O throttling limits are ignored
* Only Linux hosts are supported due to Linux AIO usage
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 14 Nov 2012 14:39:30 +0000 (15:39 +0100)]
dataplane: add virtio-blk data plane code
virtio-blk-data-plane is a subset implementation of virtio-blk. It only
handles read, write, and flush requests. It does this using a dedicated
thread that executes an epoll(2)-based event loop and processes I/O
using Linux AIO.
This approach performs very well but can be used for raw image files
only. The number of IOPS achieved has been reported to be several times
higher than the existing virtio-blk implementation.
Eventually it should be possible to unify virtio-blk-data-plane with the
main body of QEMU code once the block layer and hardware emulation is
able to run outside the global mutex.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Mon, 10 Dec 2012 12:14:39 +0000 (13:14 +0100)]
virtio-blk: restore VirtIOBlkConf->config_wce flag
Two slightly different versions of a patch to conditionally set
VIRTIO_BLK_F_CONFIG_WCE through the "config-wce" qdev property have been
applied (ea776abca and eec7f96c2). David Gibson
<david@gibson.dropbear.id.au> noticed that the "config-wce"
property is broken as a result and fixed it recently.
The fix sets the host_features VIRTIO_BLK_F_CONFIG_WCE bit from a qdev
property. Unfortunately, the virtio device then has no chance to test
for the presence of the feature bit during virtio_blk_init().
Therefore, reinstate the VirtIOBlkConf->config_wce flag. Drop the
duplicate qdev property to set the host_features bit. The
VirtIOBlkConf->config_wce flag will be used by virtio-blk-data-plane in
a later patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Thu, 22 Nov 2012 15:06:06 +0000 (16:06 +0100)]
iov: add qemu_iovec_concat_iov()
The qemu_iovec_concat() function copies a subset of a QEMUIOVector. The
new qemu_iovec_concat_iov() function does the same for a iov/cnt pair.
It is easy to define qemu_iovec_concat() in terms of
qemu_iovec_concat_iov(). The existing code is mostly unchanged, except
for the assertion src->size >= soffset, which cannot be efficiently
checked upfront on a iov/cnt pair. Instead we assert upon hitting the
end of src with an unsatisfied soffset.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 14 Nov 2012 14:30:09 +0000 (15:30 +0100)]
dataplane: add Linux AIO request queue
The IOQueue has a pool of iocb structs and a function to add new
read/write requests. Multiple requests can be added before calling the
submit function to actually tell the host kernel to begin I/O. This
allows callers to batch requests and submit them in one go.
The actual I/O is performed using Linux AIO.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 14 Nov 2012 14:23:00 +0000 (15:23 +0100)]
dataplane: add event loop
Outside the safety of the global mutex we need to poll on file
descriptors. I found epoll(2) is a convenient way to do that, although
other options could replace this module in the future (such as an
AioContext-based loop or glib's GMainLoop).
One important feature of this small event loop implementation is that
the loop can be terminated in a thread-safe way. This allows QEMU to
stop the data plane thread cleanly.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 14 Nov 2012 14:15:50 +0000 (15:15 +0100)]
dataplane: add virtqueue vring code
The virtio-blk-data-plane cannot access memory using the usual QEMU
functions since it executes outside the global mutex and the memory APIs
are this time are not thread-safe.
This patch introduces a virtqueue module based on the kernel's vhost
vring code. The trick is that we map guest memory ahead of time and
access it cheaply outside the global mutex.
Once the hardware emulation code can execute outside the global mutex it
will be possible to drop this code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Tue, 20 Nov 2012 09:30:08 +0000 (10:30 +0100)]
dataplane: add host memory mapping code
The data plane thread needs to map guest physical addresses to host
pointers. Normally this is done with cpu_physical_memory_map() but the
function assumes the global mutex is held. The data plane thread does
not touch the global mutex and therefore needs a thread-safe memory
mapping mechanism.
Hostmem registers a MemoryListener similar to how vhost collects and
pushes memory region information into the kernel. There is a
fine-grained lock on the regions list which is held during lookup and
when installing a new regions list.
When the physical memory map changes the MemoryListener callbacks are
invoked. They build up a new list of memory regions which is finally
installed when the list has been completed.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Wed, 14 Nov 2012 10:43:23 +0000 (11:43 +0100)]
raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled. This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Anthony Liguori [Wed, 2 Jan 2013 14:01:36 +0000 (08:01 -0600)]
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci,virtio
This optimizes MSIX handling in virtio-pci.
Also included is pci express capability bugfix.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* mst/tags/for_anthony:
virtio-pci: don't poll masked vectors
msix: expose access to masked/pending state
msi: add API to get notified about pending bit poll
pcie: Fix bug in pcie_ext_cap_set_next
virtio: make bindings typesafe
target-mips: Use EXCP_SC rather than a magic number
From the discussion on the ML [1], the exception limit defined by
magic number 0x100 is actually EXCP_SC defined in cpu.h. Replace the
magic number with EXCP_SC. Remove "#if 1 .. #endif" as well.
Jovanovic, Petar [Tue, 11 Dec 2012 15:06:35 +0000 (15:06 +0000)]
target-mips: Make repl_ph to sign extend to target-long
The immediate value is 9bits, should sign-extend to 16bits. The return value to
register should sign-extend to target_long, as Richard says, removing an
unnecessary cast works fun.