Stefan Hajnoczi [Mon, 18 Aug 2014 15:07:12 +0000 (16:07 +0100)]
block: acquire AioContext in do_drive_del()
Make drive_del safe for dataplane where another thread may be running
the BlockDriverState's AioContext.
Note the assumption that AioContext's lifetime exceeds DriveInfo and
BlockDriverState. We release AioContext after DriveInfo and
BlockDriverState are potentially freed.
This is clearly safe with the global AioContext but also with -object
iothread and implicit iothreads created by -device
virtio-blk-pci,x-data-plane=on (their lifetime is tied to DeviceState,
not BlockDriverState).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Mon, 4 Aug 2014 15:56:33 +0000 (16:56 +0100)]
linux-aio: avoid deadlock in nested aio_poll() calls
If two Linux AIO request completions are fetched in the same
io_getevents() call, QEMU will deadlock if request A's callback waits
for request B to complete using an aio_poll() loop. This was reported
to happen with the mirror blockjob.
This patch moves completion processing into a BH and makes it resumable.
Nested event loops can resume completion processing so that request B
will complete and the deadlock will not occur.
Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ming Lei <ming.lei@canonical.com> Cc: Marcin Gibuła <m.gibula@beyond.pl> Reported-by: Marcin Gibuła <m.gibula@beyond.pl> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Marcin Gibuła <m.gibula@beyond.pl>
This patch fixes data loss but this code path is probably rare. Since
guests cannot assume ordering between in-flight requests, few
applications submit overlapping write requests.
Reported-by: Slava Pestov <sviatoslav.pestov@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
Max Reitz [Fri, 20 Jun 2014 19:57:34 +0000 (21:57 +0200)]
nbd: Follow the BDS' AIO context
Keep the NBD server always in the same AIO context as the exported BDS
by calling bdrv_add_aio_context_notifier() and implementing the required
callbacks.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Max Reitz [Fri, 20 Jun 2014 19:57:33 +0000 (21:57 +0200)]
block: Add AIO context notifiers
If a long-running operation on a BDS wants to always remain in the same
AIO context, it somehow needs to keep track of the BDS changing its
context. This adds a function for registering callbacks on a BDS which
are called whenever the BDS is attached or detached from an AIO context.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Max Reitz [Fri, 20 Jun 2014 19:57:32 +0000 (21:57 +0200)]
nbd: Drop nbd_can_read()
There is no variant of aio_set_fd_handler() like qemu_set_fd_handler2(),
so we cannot give a can_read() callback function. Instead, unregister
the nbd_read() function whenever we cannot read and re-register it as
soon as we can read again.
All this is hidden behind the functions nbd_set_handlers() (which
registers all handlers for the AIO context and file descriptor belonging
to the given client), nbd_unset_handlers() (which unregisters them) and
nbd_update_can_read() (which checks whether NBD can read for the given
client and acts accordingly).
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Liu Yuan [Thu, 28 Aug 2014 10:27:55 +0000 (18:27 +0800)]
sheepdog: fix a core dump while do auto-reconnecting
We should reinit local_err as NULL inside the while loop or g_free() will report
corrupption and abort the QEMU when sheepdog driver tries reconnecting.
qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: (null)
[xcb] Unknown sequence number while awaiting reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
qemu-system-x86_64: ../../src/xcb_io.c:298: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted (core dumped)
Cc: qemu-devel@nongnu.org Cc: Markus Armbruster <armbru@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:10 +0000 (11:53 +0200)]
aio-win32: add support for sockets
Uses the same select/WSAEventSelect scheme as main-loop.c.
WSAEventSelect() is edge-triggered, so it cannot be used
directly, but it is still used as a way to exit from a
blocking g_poll().
Before g_poll() is called, we poll sockets with a non-blocking
select() to achieve the level-triggered semantics we require:
if a socket is ready, the g_poll() is made non-blocking too.
Based on a patch from Or Goshen.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:08 +0000 (11:53 +0200)]
AioContext: introduce aio_prepare
This will be used to implement socket polling on Windows.
On Windows, select() and g_poll() are completely different;
sockets are polled with select() before calling g_poll,
and the g_poll must be nonblocking if select() says a
socket is ready.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:05 +0000 (11:53 +0200)]
AioContext: export and use aio_dispatch
So far, aio_poll's scheme was dispatch/poll/dispatch, where
the first dispatch phase was used only in the GSource case in
order to avoid a blocking poll. Earlier patches changed it to
dispatch/prepare/poll/dispatch, where prepare is aio_compute_timeout.
By making aio_dispatch public, we can remove the first dispatch
phase altogether, so that both aio_poll and the GSource use the same
prepare/poll/dispatch scheme.
This patch breaks the invariant that aio_poll(..., true) will not block
the first time it returns false. This used to be fundamental for
qemu_aio_flush's implementation as "while (qemu_aio_wait()) {}" but
no code in QEMU relies on this invariant anymore. The return value
of aio_poll() is now comparable with that of g_main_context_iteration.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:04 +0000 (11:53 +0200)]
AioContext: run bottom halves after polling
Make the dispatching phase the same before blocking and afterwards.
The next patch will make aio_dispatch public and use it directly
for the GSource case, instead of aio_poll. aio_poll can then be
simplified heavily.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:03 +0000 (11:53 +0200)]
aio-win32: Factor out duplicate code into aio_dispatch_handlers
Later, the call to aio_dispatch will move int the GSource wrapper, while the
standalone case will still be call the component functions aio_bh_poll,
aio_dispatch_handlers and timerlistgroup_run_timers.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Wed, 9 Jul 2014 09:53:01 +0000 (11:53 +0200)]
AioContext: take bottom halves into account when computing aio_poll timeout
Right now, QEMU invokes aio_bh_poll before the "poll" phase
of aio_poll. It is simpler to do it afterwards and skip the
"poll" phase altogether when the OS-dependent parts of AioContext
are invoked from GSource. This way, AioContext behaves more
similarly when used as a GSource vs. when used as stand-alone.
As a start, take bottom halves into account when computing the
poll timeout. If a bottom half is ready, do a non-blocking
poll. As a side effect, this makes idle bottom halves work
with aio_poll; an improvement, but not really an important
one since they are deprecated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Liu Yuan [Mon, 18 Aug 2014 09:41:05 +0000 (17:41 +0800)]
block/quorum: add simple read pattern support
This patch adds single read pattern to quorum driver and quorum vote is default
pattern.
For now we do a quorum vote on all the reads, it is designed for unreliable
underlying storage such as non-redundant NFS to make sure data integrity at the
cost of the read performance.
For some use cases as following:
VM
--------------
| |
v v
A B
Both A and B has hardware raid storage to justify the data integrity on its own.
So it would help performance if we do a single read instead of on all the nodes.
Further, if we run VM on either of the storage node, we can make a local read
request for better performance.
This patch generalize the above 2 nodes case in the N nodes. That is,
vm -> write to all the N nodes, read just one of them. If single read fails, we
try to read next node in FIFO order specified by the startup command.
The 2 nodes case is very similar to DRBD[1] though lack of auto-sync
functionality in the single device/node failure for now. But compared with DRBD
we still have some advantages over it:
- Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed
storage. And if A crashes, we need to restart all the VMs on node B. But for
practice case, we can't because B might not have enough resources to setup 20 VMs
at once. So if we run our 20 VMs with quorum driver, and scatter the replicated
images over the data center, we can very likely restart 20 VMs without any
resource problem.
After all, I think we can build a more powerful replicated image functionality
on quorum and block jobs(block mirror) to meet various High Availibility needs.
[Dropped \n from an error_setg() error message
--Stefan]
Cc: Benoit Canet <benoit@irqsave.net> Cc: Eric Blake <eblake@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Liu Yuan [Mon, 18 Aug 2014 09:41:04 +0000 (17:41 +0800)]
qapi: add read-pattern enum for quorum
Cc: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Hitoshi Mitake [Mon, 11 Aug 2014 05:43:46 +0000 (14:43 +0900)]
sheepdog: improve error handling for a case of failed lock
Recently, sheepdog revived its VDI locking functionality. This patch
updates sheepdog driver of QEMU for this feature. It changes an error
code for a case of failed locking. -EBUSY is a suitable one.
Reported-by: Valerio Pachera <sirio81@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Liu Yuan <namei.unix@gmail.com> Cc: MORITA Kazutaka <morita.kazutaka@gmail.com> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Hitoshi Mitake [Mon, 11 Aug 2014 05:43:45 +0000 (14:43 +0900)]
sheepdog: adopting protocol update for VDI locking
The update is required for supporting iSCSI multipath. It doesn't
affect behavior of QEMU driver but adding a new field to vdi request
struct is required.
Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Liu Yuan <namei.unix@gmail.com> Cc: MORITA Kazutaka <morita.kazutaka@gmail.com> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:56 +0000 (19:17 +0100)]
qemu-img: always goto out in img_snapshot() error paths
The out label has the qemu_progress_end() and other cleanup calls.
Always goto out in error paths so the cleanup happens. These error
paths now return 1 instead of -1.
Note that bdrv_unref(NULL) is safe. We just need to initialize bs to
NULL at the top of the function.
We can now remove the obsolete bs_old_backing = NULL and bs_new_backing
= NULL for safe mode. Originally it was necessary in commit 3e85c6fd
("qemu-img rebase") but became useless in commit c2abcce ("qemu-img:
avoid calling exit(1) to release resources properly") because the
variables are already initialized during declaration.
Reported-by: John Snow <jsnow@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:55 +0000 (19:17 +0100)]
qemu-img: fix img_compare() flags error path
If img_compare() fails to parse the cache flags the goto out3 code path
will call qemu_progress_end(). Make sure we actually call
qemu_progress_init() first.
Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
The curl hardcoded timeout (5 seconds) sometimes is not long
enough depending on the remote server configuration and network
traffic. The user should be able to set how much long he is
willing to wait for the connection.
Adding a new option to set this timeout gives the user this
flexibility. The previous default timeout of 5 seconds will be
used if this option is not present.
Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Benoit Canet <benoit.canet@nodalink.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We identify devices by their Open Firmware device paths. The encoding
of bus numbers is incorrect: idebus_get_fw_dev_path() formats them in
decimal, while SeaBIOS uses hexadecimal. With bus number > 9, SeaBIOS
will miss the bootindex (lucky case), or apply it to another device
(unlucky case).
Bug can't bite right now: ich9-ahci has six ports, and the sysbus-ahci
created by Calxeda Highbank has just one.
Fix it anyway, by changing %d to %x.
I couldn't find an Open Firmware spec covering this. For what it's
worth, OVMF agrees with SeaBIOS.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Thu, 28 Aug 2014 16:08:13 +0000 (17:08 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
SCSI patches include bug fixes from Fam and Peter, improved error
reporting from Fam and a fix for DPRINTF bitrot. Memory patches try
again to initialize name from the QOM name.
# gpg: Signature made Thu 28 Aug 2014 15:10:31 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/bonzini/tags/for-upstream:
memory: Lazy init name from QOM name as needed
xen: hvm: Abstract away memory region name ref
xen-hvm: Constify string
virtio-scsi: Report error if num_queues is 0 or too large
scsi-generic: remove superfluous DPRINTF avoid to break compiling
block/iscsi: fix memory corruption on iscsi resize
scsi-bus: Convert DeviceClass init to realize
block: Pass errp in blkconf_geometry
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 28 Aug 2014 15:07:23 +0000 (16:07 +0100)]
Merge remote-tracking branch 'remotes/kvm/tags/for-upstream' into staging
Mostly bugfixes + Alexey's interface-based implementation
of the NMI monitor command.
# gpg: Signature made Thu 28 Aug 2014 15:07:22 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/kvm/tags/for-upstream:
mc146818rtc: reinitialize irq_reinject_on_ack_count on reset
target-i386: Add "tsc_adjust" CPU feature name
target-i386: Add "mpx" CPU feature name
vl: process -object after other backend options
checkpatch.pl: adjust typedef definition to QEMU coding style
x86: Clear MTRRs on vCPU reset
x86: kvm: Add MTRR support for kvm_get|put_msrs()
x86: Use common variable range MTRR counts
target-i386: Don't forbid NX bit on PAE PDEs and PTEs
spapr: Add support for new NMI interface
s390x: Migrate to new NMI interface
s390x: Convert QEMUMachine to MachineClass
cpus: Define callback for QEMU "nmi" command
kvm: run cpu state synchronization on target vcpu thread
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
To support name retrieval of MemoryRegions that were created
dynamically (that is, not via memory_region_init and friends). We
cache the name in MemoryRegion's state as
object_get_canonical_path_component mallocs the returned value
so it's not suitable for direct return to callers. Memory already
frees the name field, so this will be garbage collected along with
the MR object.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mr->name field is removed. This slipped through compile testing.
Fix.
Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It's constant, and sourced from existing const strings. Avoid dodgy
casts by converting to const.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Thu, 28 Aug 2014 13:51:12 +0000 (14:51 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/fix-buildbot-12082014-pull-request' into staging
Pull request
# gpg: Signature made Thu 28 Aug 2014 13:43:00 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
* remotes/stefanha/tags/fix-buildbot-12082014-pull-request:
Revert "qemu-img: sort block formats in help message"
block: sort formats alphabetically in bdrv_iterate_format()
mirror: fix uninitialized variable delay_ns warnings
trace: avoid Python 2.5 all() in tracetool
libqtest: launch QEMU with QEMU_AUDIO_DRV=none
qapi.py: avoid Python 2.5+ any() function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:51 +0000 (12:08 +0100)]
qapi.py: avoid Python 2.5+ any() function
There is one instance of any() in qapi.py that breaks builds on older
distros that ship Python 2.4 (like RHEL5):
GEN qmp-commands.h
Traceback (most recent call last):
File "build/scripts/qapi-commands.py", line 445, in ?
exprs = parse_schema(input_file)
File "build/scripts/qapi.py", line 329, in parse_schema
schema = QAPISchema(open(input_file, "r"))
File "build/scripts/qapi.py", line 110, in __init__
if any(include_path == elem[1]
NameError: global name 'any' is not defined
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Paolo Bonzini [Tue, 10 Jun 2014 08:52:02 +0000 (10:52 +0200)]
checkpatch.pl: adjust typedef definition to QEMU coding style
Most QEMU typedefs are camelcase, starting with one uppercase letter
and containing at least one lowercase letter. There are a few
all-uppercase types, add the most common too.
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Markus Armbruster <armbru@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Gonglei [Fri, 22 Aug 2014 02:01:50 +0000 (10:01 +0800)]
scsi-generic: remove superfluous DPRINTF avoid to break compiling
variables lun and tag had been eliminated, break compiling
when enable debug switch. Meanwhile traces provide the same
information with this DPRINTF, so remove it.
Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Lieven [Fri, 22 Aug 2014 08:08:49 +0000 (10:08 +0200)]
block/iscsi: fix memory corruption on iscsi resize
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.
CC: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fam Zheng [Tue, 12 Aug 2014 02:12:55 +0000 (10:12 +0800)]
scsi-bus: Convert DeviceClass init to realize
Replace "init/destroy" with "realize/unrealize" in SCSIDeviceClass,
which has errp as a parameter. So all the implementations now use
error_setg instead of error_report for reporting error.
Also in scsi_bus_legacy_handle_cmdline, report the error when
initializing the if=scsi devices, before returning it, because in the
callee, error_report is changed to error_setg. And the callers don't
have the right locations (e.g. "-drive if=scsi").
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alex Williamson [Mon, 25 Aug 2014 18:10:15 +0000 (12:10 -0600)]
vfio: Enable NVIDIA 88000 region quirk regardless of VGA
If we make use of OVMF for the BIOS then we can use GPUs without VGA
space access, but we still need this quirk. Disassociate it from the
x-vga option and enable it on all NVIDIA VGA display class devices.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Peter Maydell [Mon, 25 Aug 2014 17:49:25 +0000 (18:49 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features
A bunch of bugfixes - these will make sense for 2.1.1
ACPI support for TPM and partial ARI support for PCIE.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 24 Aug 2014 23:16:35 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream:
pcie: fix trailing whitespace
ioh3420: Enable ARI forwarding
ioh3420: Remove obsoleted, unused ioh3420_init function
pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
pcie: Fix incorrect write to the ari capability next function field
ssdt-tpm: add generated hex file to git
Add ACPI tables for TPM
pc: reserve more memory for ACPI for new machine types
pcihp: fix possible array out of bounds
pci_bridge: manually destroy memory regions within PCIBridgeWindows
hostmem: set MPOL_MF_MOVE
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alex Williamson [Thu, 14 Aug 2014 21:39:39 +0000 (15:39 -0600)]
x86: Clear MTRRs on vCPU reset
The SDM specifies (June 2014 Vol3 11.11.5):
On a hardware reset, the P6 and more recent processors clear the
valid flags in variable-range MTRRs and clear the E flag in the
IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
MTRRs are undefined.
We currently do none of that, so whatever MTRR settings you had prior
to reset is what you have after reset. Usually this doesn't matter
because KVM often ignores the guest mappings and uses write-back
anyway. However, if you have an assigned device and an IOMMU that
allows NoSnoop for that device, KVM defers to the guest memory
mappings which are now stale after reset. The result is that OVMF
rebooting on such a configuration takes a full minute to LZMA
decompress the firmware volume, a process that is nearly instant on
the initial boot.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alex Williamson [Thu, 14 Aug 2014 21:39:33 +0000 (15:39 -0600)]
x86: kvm: Add MTRR support for kvm_get|put_msrs()
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*. This means that on migration, the
target loses MTRR state from the source. Generally that's ok though
because KVM ignores it and maps everything as write-back anyway. The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent. In that case KVM trusts the guest mapping of memory as
configured in the MTRR.
This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration. kvm_put_msrs() is also used on vCPU reset and therefore
allows future modificaitons of MTRR state at reset to be realized.
Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alex Williamson [Thu, 14 Aug 2014 21:39:27 +0000 (15:39 -0600)]
x86: Use common variable range MTRR counts
We currently define the number of variable range MTRR registers as 8
in the CPUX86State structure and vmstate, but use MSR_MTRRcap_VCNT
(also 8) to report to guests the number available. Change this to
use MSR_MTRRcap_VCNT consistently.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
William Grant [Sun, 24 Aug 2014 05:13:48 +0000 (15:13 +1000)]
target-i386: Don't forbid NX bit on PAE PDEs and PTEs
Commit e8f6d00c30ed88910d0d985f4b2bf41654172ceb ("target-i386: raise
page fault for reserved physical address bits") added a check that the
NX bit is not set on PAE PDPEs, but it also added it to rsvd_mask for
the rest of the function. This caused any PDEs or PTEs with NX set to be
erroneously rejected, making PAE guests with NX support unusable.
Signed-off-by: William Grant <wgrant@ubuntu.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell [Mon, 25 Aug 2014 16:34:30 +0000 (17:34 +0100)]
Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-08-24' into staging
trivial patches for 2014-08-24
# gpg: Signature made Sun 24 Aug 2014 14:28:49 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg: aka "Michael Tokarev <mjt@corpit.ru>"
# gpg: aka "Michael Tokarev <mjt@debian.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 6F67 E18E 7C91 C5B1 5514 66A7 BEE5 9D74 A4C3 D7DB
* remotes/mjt/tags/trivial-patches-2014-08-24:
vmxnet3: Pad short frames to minimum size (60 bytes)
libdecnumber: Fix warnings from smatch (missing static, boolean operations)
linux-user: fix file descriptor leaks
po: Fix Makefile rules for in-tree builds without configuration
slirp/misc: Use the GLib memory allocation APIs
configure: no need to mkdir QMP
dma: axidma: Variablise repeated s->streams[i] sub-expr
microblaze: ml605: Get rid of ddr_base variable
tests/bios-tables-test: check the value returned by fopen()
tcg: dump op count into qemu log
util/path: Use the GLib memory allocation routines
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements an NMI interface for s390 and s390-ccw machines.
This removes #ifdef s390 branch in qmp_inject_nmi so new s390's
nmi_monitor_handler() callback is going to be used for NMI.
Since nmi_monitor_handler()-calling code is platform independent,
CPUState::cpu_index is used instead of S390CPU::env.cpu_num.
There should not be any change in behaviour as both @cpu_index and
@cpu_num are global CPU numbers.
Note that s390_cpu_restart() already takes care of the specified cpu,
so we don't need to schedule via async_run_on_cpu().
Since the only error s390_cpu_restart() can return is ENOSYS, convert
it to QERR_UNSUPPORTED.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This converts s390-virtio and s390-ccw-virtio machines to QOM MachineClass.
This brings ability to add interfaces to the machine classes. The first
interface for addition will be NMI.
The patch is mechanical so no change in behavior is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This introduces an NMI (Non Maskable Interrupt) interface with
a single nmi_monitor_handler() method. A machine or a device can
implement it. This searches for an QOM object with this interface
and if it is implemented, calls it. The callback implements an action
required to cause debug crash dump on in-kernel debugger invocation.
The callback returns Error**.
This adds a nmi_monitor_handle() helper which walks through
all objects to find the interface. The interface method is called
for all found instances.
This adds support for it in qmp_inject_nmi(). Since no architecture
supports it at the moment, there is no change in behaviour.
This changes inject-nmi command description for HMP and QMP.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Knut Omang [Sun, 24 Aug 2014 13:32:18 +0000 (15:32 +0200)]
pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
Rename helper functions to make a clearer distinction between
the PCIe capability/control register feature ARI forwarding and a
device that supports the ARI feature via an ARI extended PCIe capability.
Signed-off-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch has Michael Tsirkin's patches folded in.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
pc: reserve more memory for ACPI for new machine types
commit 868270f23d8db2cce83e4f082fe75e8625a5fbf9
acpi-build: tweak acpi migration limits
broke kernel loading with -kernel/-initrd: it doubled
the size of ACPI tables but did not reserve
enough memory.
As a result, issues on boot and halt are observed.
Fix this up by doubling reserved memory for new machine types.
Cc: qemu-stable@nongnu.org Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Gonglei [Wed, 20 Aug 2014 05:52:30 +0000 (13:52 +0800)]
pcihp: fix possible array out of bounds
Prevent out-of-bounds array access on
acpi_pcihp_pci_status.
Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Paolo Bonzini [Wed, 20 Aug 2014 15:50:05 +0000 (17:50 +0200)]
pci_bridge: manually destroy memory regions within PCIBridgeWindows
The regions are destroyed and recreated on configuration space accesses.
We need to destroy them before the containing PCIBridgeWindows object
is freed.
Reported-by: Gonglei <arei.gonglei@huawei.com> Reported-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Ben Draper [Wed, 20 Aug 2014 12:27:14 +0000 (13:27 +0100)]
vmxnet3: Pad short frames to minimum size (60 bytes)
When running VMware ESXi under qemu-kvm the guest discards frames
that are too short. Short ARP Requests will be dropped, this prevents
guests on the same bridge as VMware ESXi from communicating. This patch
simply adds the padding on the network device itself.
Signed-off-by: Ben Draper <ben@xrsa.net> Reviewed-by: Dmitry Fleytman <dmitry@daynix.com> Cc: qemu-stable@nongnu.org Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Stefan Weil [Fri, 18 Jul 2014 14:52:29 +0000 (16:52 +0200)]
po: Fix Makefile rules for in-tree builds without configuration
Adding 'update' to the phony targets fixes this error:
$ LANG=C make -C po update
make: Entering directory `/qemu/po'
LINK update
/qemu/po/de_DE.po: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make: *** [update] Error 1
make: Leaving directory `/qemu/po'
Some other phony targets (build, install) were also added, and the
existing .PHONY statement was moved to a more prominent position at
the beginning of the Makefile.
The patch also fixes a 2nd bug. The default target should be 'all',
but instead 'modules' (from rules.mak) was the default. Fix this by
adding 'all' as a target before any include statement.
Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Tue, 19 Aug 2014 08:30:17 +0000 (16:30 +0800)]
slirp/misc: Use the GLib memory allocation APIs
Here we don't check the return value of malloc() which may fail.
Use the g_new() instead, which will abort the program when
there is not enough memory.
Also, use g_strdup instead of strdup and remove the unnecessary
strdup function.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Mon, 18 Aug 2014 07:54:33 +0000 (15:54 +0800)]
tests/bios-tables-test: check the value returned by fopen()
The function fopen() may fail, so check its return value.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Liu <john.liuli@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Mon, 18 Aug 2014 07:58:08 +0000 (15:58 +0800)]
tcg: dump op count into qemu log
fopen() may fail and it does not check its return vaule here,
it is better to dump op count to the normal log file.
Signed-off-by: Li Liu <john.liuli@huawei.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhanghailiang [Mon, 18 Aug 2014 07:49:22 +0000 (15:49 +0800)]
util/path: Use the GLib memory allocation routines
In this file, we don't check the return value of malloc/strdup/realloc which may fail.
Instead of using these routines, we use the GLib memory APIs g_malloc/g_strdup/g_realloc.
They will exit on allocation failure, so there is no need to test for failure,
which would be fine for setup.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Maydell [Fri, 22 Aug 2014 15:12:51 +0000 (16:12 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 22 Aug 2014 14:47:53 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (29 commits)
qemu-img: Allow cache mode specification for amend
qemu-img: Allow source cache mode specification
vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted
blkdebug: Delete BH in bdrv_aio_cancel
qemu-iotests: add test case 101 for short file I/O
raw-posix: fix O_DIRECT short reads
block/iscsi: fix memory corruption on iscsi resize
block/vvfat.c: remove debugging code to reinit stderr if NULL
iotests: Add test for image filename construction
quorum: Implement bdrv_refresh_filename()
nbd: Implement bdrv_refresh_filename()
blkverify: Implement bdrv_refresh_filename()
blkdebug: Implement bdrv_refresh_filename()
block: Add bdrv_refresh_filename()
virtio-blk: fix reference a pointer which might be freed
virtio-blk: allow block_resize with dataplane
block: acquire AioContext in qmp_block_resize()
qemu-iotests: Fix 028 reference output for qed
test-coroutine: test cost introduced by coroutine
iotests: Add test for qcow2's cache options
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Max Reitz [Tue, 22 Jul 2014 20:58:43 +0000 (22:58 +0200)]
qemu-img: Allow cache mode specification for amend
qemu-img amend may extensively modify the target image, depending on the
options to be amended (e.g. conversion to qcow2 compat level 0.10 from
1.1 for an image with many unallocated zero clusters). Therefore it
makes sense to allow the user to specify the cache mode to be used.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 22 Jul 2014 20:58:42 +0000 (22:58 +0200)]
qemu-img: Allow source cache mode specification
Many qemu-img subcommands only read the source file(s) once. For these
use cases, a full write-back cache is unnecessary and mainly clutters
host cache memory. Though this is generally no concern as cache memory
is freely available and can be scaled by the host OS, it may become a
concern with thin provisioning.
For these cases, it makes sense to allow users to freely specify the
source cache mode (e.g. use no cache at all).
This commit adds a new switch (-T) for the qemu-img subcommands check,
compare, convert and rebase to specify the cache to be used for source
images (the backing file in case of rebase).
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tom Musta [Tue, 12 Aug 2014 18:53:43 +0000 (13:53 -0500)]
linux-user: writev Partial Writes
Although not technically not required by POSIX, the writev system call will
typically write out its buffers individually. That is, if the first buffer
is written successfully, but the second buffer pointer is invalid, then
the first chuck will be written and its size is returned.
Signed-off-by: Tom Musta <tommusta@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:42 +0000 (13:53 -0500)]
linux-user: Support target-to-host translation of mlockall argument
The argument to the mlockall system call is not necessarily the same on
all platforms and thus may require translation prior to passing to the
host.
For example, PowerPC 64 bit platforms define values for MCL_CURRENT
(0x2000) and MCL_FUTURE (0x4000) which are different from Intel platforms
(0x1 and 0x2, respectively)
Signed-off-by: Tom Musta <tommusta@gmail.com> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:41 +0000 (13:53 -0500)]
linux-user: clock_nanosleep errno Handling on PPC
The clock_nanosleep syscall is unusual in that it returns positive
numbers in error handling situations, versus returning -1 and setting
errno, or returning a negative errno value. On POWER, the kernel will
set the SO bit of CR0 to indicate failure in a syscall. QEMU has
generic handling to do this for syscalls with standard return values.
Add special case code for clock_nanosleep to handle CR0 properly.
Signed-off-by: Tom Musta <tommusta@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Wed, 13 Aug 2014 19:04:44 +0000 (14:04 -0500)]
linux-user: Move get_ppc64_abi
The get_ppc64_abi is used to determine the ELF ABI (i.e. V1 or V2). This
routine is currently implemented in the linux-user/elfload.c file but
is useful in other scenarios. Move the routine to a more generally
available location (linux-user/ppc/target_cpu.h).
Signed-off-by: Tom Musta <tommusta@gmail.com> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:39 +0000 (13:53 -0500)]
linux-user: Detect fault in sched_rr_get_interval
Properly detect a fault when attempting to store into an invalid
struct timespec pointer.
Signed-off-by: Tom Musta <tommusta@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:38 +0000 (13:53 -0500)]
linux-user: Handle NULL sched_param argument to sched_*
The sched_getparam, sched_setparam and sched_setscheduler system
calls take a pointer argument to a sched_param structure. When
this pointer is null, errno should be set to EINVAL.
Signed-off-by: Tom Musta <tommusta@gmail.com> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:37 +0000 (13:53 -0500)]
linux-user: Detect Negative Message Sizes in msgsnd System Call
The msgsnd system call takes an argument that describes the message
size (msgsz) and is of type size_t. The system call should set
errno to EINVAL in the event that a negative message size is passed.
Signed-off-by: Tom Musta <tommusta@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Tom Musta [Tue, 12 Aug 2014 18:53:36 +0000 (13:53 -0500)]
linux-user: Conditionally Pass Attribute Pointer to mq_open()
The mq_open system call takes an optional struct mq_attr pointer
argument in the fourth position. This pointer is used when O_CREAT
is specified in the flags (second) argument. It may be NULL, in
which case the queue is created with implementation defined attributes.
Change the code to properly handle the case when NULL is passed in the
arg4 position.
Signed-off-by: Tom Musta <tommusta@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>