Maxime Coquelin [Thu, 16 Nov 2017 18:48:35 +0000 (19:48 +0100)]
vhost: restore avail index from vring used index on disconnection
vhost_virtqueue_stop() gets avail index value from the backend,
except if the backend is not responding.
It happens when the backend crashes, and in this case, internal
state of the virtio queue is inconsistent, making packets
to corrupt the vring state.
With a Linux guest, it results in following error message on
backend reconnection:
[ 22.444905] virtio_net virtio0: output.0:id 0 is not a head!
[ 22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
[ 22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5
Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down") Cc: qemu-stable@nongnu.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2ae39a113af311cb56a0c35b7f212dafcef15303) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Maxime Coquelin [Thu, 16 Nov 2017 18:48:34 +0000 (19:48 +0100)]
virtio: Add queue interface to restore avail index from vring used index
In case of backend crash, it is not possible to restore internal
avail index from the backend value as vhost_get_vring_base
callback fails.
This patch provides a new interface to restore internal avail index
from the vring used index, as done by some vhost-user backend on
reconnection.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Tue, 14 Nov 2017 23:22:23 +0000 (00:22 +0100)]
util/stats64: Fix min/max comparisons
stat64_min_slow() and stat64_max_slow() compare the wrong way. This
makes iotest 136 fail with clang and -m32.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20171114232223.25207-1-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 26a5db322be1e424a815d070ddd04442a5e5df50) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Mon, 13 Nov 2017 15:24:24 +0000 (09:24 -0600)]
nbd/client: Use error_prepend() correctly
When using error prepend(), it is necessary to end with a space
in the format string; otherwise, messages come out incorrectly,
such as when connecting to a socket that hangs up immediately:
can't open device nbd://localhost:10809/: Failed to read dataUnexpected end-of-file before all bytes were read
Originally botched in commit e44ed99d, then several more instances
added in the meantime.
Pre-existing and not fixed here: we are inconsistent on capitalization;
some of our messages start with lower case, and others start with upper,
although the use of error_prepend() is much nicer to read when all
fragments consistently start with lower.
CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171113152424.25381-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit cb6b1a3fc30c52ffd94ed0b69ca5991b19651724) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
net: fix check for number of parameters to -netdev socket
Since commit 0f8c289ad "net: fix -netdev socket,fd= for UDP sockets"
we allow more than one parameter for -netdev socket. But now
we run into an assert when no parameter at all is specified
Fix this by reverting the change of the if condition done in 0f8c289ad.
Cc: Jason Wang <jasowang@redhat.com> Cc: qemu-stable@nongnu.org Fixes: 0f8c289ad539feb5135c545bea947b310a893f4b Reported-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Signed-off-by: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit ff86d5762552787f1fcb7da695ec4f8c1be754b4)
Conflicts:
net/socket.c
* drop context dep on 0522a959 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jens Freimann [Mon, 6 Nov 2017 14:05:46 +0000 (15:05 +0100)]
net/socket: fix coverity issue
This fixes coverity issue CID1005339.
Make sure that saddr is not used uninitialized if the
mcast parameter is NULL.
Cc: qemu-stable@nongnu.org Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit bb160b571fe469b03228d4502c75a18045978a74) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Auger [Tue, 7 Nov 2017 13:03:52 +0000 (13:03 +0000)]
hw/intc/arm_gicv3_its: Don't abort on table save failure
The ITS is not fully properly reset at the moment. Caches are
not emptied.
After a reset, in case we attempt to save the state before
the bound devices have registered their MSIs and after the
1st level table has been allocated by the ITS driver
(device BASER is valid), the first level entries are still
invalid. If the device cache is not empty (devices registered
before the reset), vgic_its_save_device_tables fails with -EINVAL.
This causes a QEMU abort().
Cc: qemu-stable@nongnu.org Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: wanghaibin <wanghaibin.wang@huawei.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 8a7348b5d62d7ea16807e6bea54b448a0184bb0f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Tue, 7 Nov 2017 13:03:51 +0000 (13:03 +0000)]
translate.c: Fix usermode big-endian AArch32 LDREXD and STREXD
For AArch32 LDREXD and STREXD, architecturally the 32-bit word at the
lowest address is always Rt and the one at addr+4 is Rt2, even if the
CPU is big-endian. Our implementation does these with a single
64-bit store, so if we're big-endian then we need to put the two
32-bit halves together in the opposite order to little-endian,
so that they end up in the right places. We were trying to do
this with the gen_aa32_frob64() function, but that is not correct
for the usermode emulator, because there there is a distinction
between "load a 64 bit value" (which does a BE 64-bit access
and doesn't need swapping) and "load two 32 bit values as one
64 bit access" (where we still need to do the swapping, like
system mode BE32).
Fixes: https://bugs.launchpad.net/qemu/+bug/1725267 Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1509622400-13351-1-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 3448d47b3172015006b79197eb5a69826c6a7b6d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Tue, 17 Oct 2017 19:49:14 +0000 (21:49 +0200)]
ppc: fix setting of compat mode
While trying to make KVM PR usable again, commit 5dfaa532ae introduced a
regression: the current compat_pvr value is passed to KVM instead of the
new one. This means that we always pass 0 instead of the max-cpu-compat
PVR during the initial machine reset. And at CAS time, we either pass
the PVR from the command line or even don't call kvmppc_set_compat() at
all, ie, the PCR will not be set as expected.
For example if we start a big endian fedora26 guest in power7 compat
mode on a POWER8 host, we get this in the guest:
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
MMU : Hash
but the guest can still execute POWER8 instructions, and the following
program succeeds:
int main()
{
asm("vncipher 0,0,0"); // ISA 2.07 instruction
}
Let's pass the new compat_pvr to kvmppc_set_compat() and the program fails
with SIGILL as expected.
Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit e4f0c6bb1a9f72ad9e32c3171d36bae17ea1cd67) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
io: monitor encoutput buffer size from websocket GSource
The websocket GSource is monitoring the size of the rawoutput
buffer to determine if the channel can accepts more writes.
The rawoutput buffer, however, is merely a temporary staging
buffer before data is copied into the encoutput buffer. Thus
its size will always be zero when the GSource runs.
This flaw causes the encoutput buffer to grow without bound
if the other end of the underlying data channel doesn't
read data being sent. This can be seen with VNC if a client
is on a slow WAN link and the guest OS is sending many screen
updates. A malicious VNC client can act like it is on a slow
link by playing a video in the guest and then reading data
very slowly, causing QEMU host memory to expand arbitrarily.
This issue is assigned CVE-2017-15268, publically reported in
Paolo Bonzini [Tue, 10 Oct 2017 15:14:44 +0000 (17:14 +0200)]
nios2: define tcg_env
This should be done by all target and, since commit 53f6672bcf
("gen-icount: use tcg_ctx.tcg_env instead of cpu_env", 2017-06-30),
is causing the NIOS2 target to hang.
This is because the test for "should I exit to the main loop"
was being done with the correct offset to the icount decrementer,
but using TCG temporary 0 (the frame pointer) rather than the
env pointer.
Cc: qemu-stable@nongnu.org Cc: Marek Vasut <marex@denx.de> Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 17bd9597be45b96ae00716b0ae01a4d11bbee1ab) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 9 Oct 2017 21:55:33 +0000 (23:55 +0200)]
iotests: Add cluster_size=64k to 125
Apparently it would be a good idea to test that, too.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009215533.12530-4-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 4c112a397c2f61038914fa315a7896ce6d645d18) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 9 Oct 2017 21:55:32 +0000 (23:55 +0200)]
qcow2: Always execute preallocate() in a coroutine
Some qcow2 functions (at least perform_cow()) expect s->lock to be
taken. Therefore, if we want to make use of them, we should execute
preallocate() (as "preallocate_co") in a coroutine so that we can use
the qemu_co_mutex_* functions.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009215533.12530-3-mreitz@redhat.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 572b07bea1d1a0f7726fd95c2613c76002a379bc) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Mon, 9 Oct 2017 21:55:31 +0000 (23:55 +0200)]
qcow2: Fix unaligned preallocated truncation
A qcow2 image file's length is not required to have a length that is a
multiple of the cluster size. However, qcow2_refcount_area() expects an
aligned value for its @start_offset parameter, so we need to round
@old_file_size up to the next cluster boundary.
Reported-by: Ping Li <pingl@redhat.com>
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1414049 Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009215533.12530-2-mreitz@redhat.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit e400ad1e1f0127b4fdabcb1c8de1e99be91788df) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michael Olbrich [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
hw/sd: fix out-of-bounds check for multi block reads
The current code checks if the next block exceeds the size of the card.
This generates an error while reading the last block of the card.
Do the out-of-bounds check when starting to read a new block to fix this.
This issue became visible with increased error checking in Linux 4.13.
Cc: qemu-stable@nongnu.org Signed-off-by: Michael Olbrich <m.olbrich@pengutronix.de> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 20170916091611.10241-1-m.olbrich@pengutronix.de Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 8573378e62d19e25a2434e23462ec99ef4d065ac) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Xu [Tue, 10 Oct 2017 09:42:46 +0000 (11:42 +0200)]
exec: simplify address_space_get_iotlb_entry
This patch let address_space_get_iotlb_entry() to use the newly
introduced page_mask parameter in flatview_do_translate(). Then we
will be sure the IOTLB can be aligned to page mask, also we should
nicely support huge pages now when introducing a764040.
Fixes: a764040 ("exec: abstract address_space_do_translate()") Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20171010094247.10173-3-maxime.coquelin@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 076a93d7972c9c1e3839d2f65edc32568a2cce93) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Xu [Tue, 10 Oct 2017 09:42:45 +0000 (11:42 +0200)]
exec: add page_mask for flatview_do_translate
The function is originally used for flatview_space_translate() and what
we care about most is (xlat, plen) range. However for iotlb requests, we
don't really care about "plen", but the size of the page that "xlat" is
located on. While, plen cannot really contain this information.
A simple example to show why "plen" is not good for IOTLB translations:
E.g., for huge pages, it is possible that guest mapped 1G huge page on
device side that used this GPA range:
0x100000000 - 0x13fffffff
Then let's say we want to translate one IOVA that finally mapped to GPA
0x13ffffe00 (which is located on this 1G huge page). Then here we'll
get:
(xlat, plen) = (0x13fffe00, 0x200)
So the IOTLB would be only covering a very small range since from
"plen" (which is 0x200 bytes) we cannot tell the size of the page.
Actually we can really know that this is a huge page - we just throw the
information away in flatview_do_translate().
This patch introduced "page_mask" optional parameter to capture that
page mask info. Also, I made "plen" an optional parameter as well, with
some comments for the whole function.
No functional change yet.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20171010094247.10173-2-maxime.coquelin@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d5e5fafd11be4458443c43f19c1ebdd24d99a751) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This shares an cached empty FlatView among address spaces. The empty
FV is used every time when a root MR renders into a FV without memory
sections which happens when MR or its children are not enabled or
zero-sized. The empty_view is not NULL to keep the rest of memory
API intact; it also has a dispatch tree for the same reason.
On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves
the amount of FlatView's in use (557 -> 260) and dispatch tables
(~800000 -> ~370000). In an unrelated experiment with 112 non-virtio
devices on x86 ("-M pc"), only 4 FlatViews are alive, and about ~2000
are created at startup.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-16-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 092aa2fc65b7a35121616aad8f39d47b8f921618) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Thu, 21 Sep 2017 10:28:16 +0000 (12:28 +0200)]
memory: seek FlatView sharing candidates among children subregions
A container can be used instead of an alias to allow switching between
multiple subregions. In this case we cannot directly share the
subregions (since they only belong to a single parent), but if the
subregions are aliases we can in turn walk those.
This is not enough to remove all source of quadratic FlatView creation,
but it enables sharing of the PCI bus master FlatViews (and their
AddressSpaceDispatch structures) across all PCI devices. For 112
virtio-net-pci devices, boot time is reduced from 25 to 10 seconds and
memory consumption from 1.4 to 1 G.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e673ba9af9bf8fd8e0f44025ac738b8285b3ed27) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Thu, 21 Sep 2017 10:34:00 +0000 (12:34 +0200)]
memory: trace FlatView creation and destruction
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 02d9651d6a46479e9d70b72dca34e43605d06cda) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This avoids usual memory_region_transaction_commit() which rebuilds
all FVs.
On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this brings
down the boot time from 25s to 20s and reduces the amount of temporary FVs
allocated during machine constructon (~800000 -> ~640000) and amount of
temporary dispatch trees (~370000 -> ~300000), the total memory footprint
goes down (18G -> 17G).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-18-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 202fc01b05572ecb258fdf4c5bd56cf6de8140c7) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Since FlatViews are shared now and ASes not, this gets rid of
address_space_init_shareable().
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-17-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b516572f31c0ea0937cd9d11d9bd72dd83809886)
Conflicts:
target/arm/cpu.c
* drop context deps on 1d2091bc and 1e577cc7 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
memory: Share FlatView's and dispatch trees between address spaces
This allows sharing flat views between address spaces (AS) when
the same root memory region is used when creating a new address space.
This is done by walking through all ASes and caching one FlatView per
a physical root MR (i.e. not aliased).
This removes search for duplicates from address_space_init_shareable() as
FlatViews are shared elsewhere and keeping as::ref_count correct seems
an unnecessary and useless complication.
This should cause no change and memory use or boot time yet.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-13-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 967dc9b1194a9281124b2e1ce67b6c3359a2138f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Address spaces get to keep a root MR (alias or not) but FlatView stores
the actual MR as this is going to be used later on to decide whether to
share a particular FlatView or not.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-10-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 89c177bbdd6cf8e50b3fd4831697d50e195d6432) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Thu, 21 Sep 2017 12:32:47 +0000 (14:32 +0200)]
memory: avoid "resurrection" of dead FlatViews
It's possible for address_space_get_flatview() as it currently stands
to cause a use-after-free for the returned FlatView, if the reference
count is incremented after the FlatView has been replaced by a writer:
Since FlatViews are not updated very often, we can just detect the
situation using a new atomic op atomic_fetch_inc_nonzero, similar to
Linux's atomic_inc_not_zero, which performs the refcount increment only if
it hasn't already hit zero. This is similar to Linux commit de09a9771a53
("CRED: Fix get_task_cred() and task_state() to not resurrect dead
credentials", 2010-07-29).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 447b0d0b9ee8a0ac216c3186e0f3c427a1001f0c)
Conflicts:
docs/devel/atomics.txt
* drop documentation ref to atomic_fetch_xor
* prereq for 166206845f Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
memory: Move AddressSpaceDispatch from AddressSpace to FlatView
As we are going to share FlatView's between AddressSpace's,
and AddressSpaceDispatch is a structure to perform quick lookup
in FlatView, this moves ASD to FlatView.
After previosly open coded ASD rendering, we can also remove
as->next_dispatch as the new FlatView pointer is stored
on a stack and set to an AS atomically.
flatview_destroy() is executed under RCU instead of
address_space_dispatch_free() now.
This makes mem_begin/mem_commit to work with ASD and mem_add with FV
as later on mem_add will be taking FV as an argument anyway.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-5-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 66a6df1dc6d5b28cc3e65db0d71683fbdddc6b62) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This moves a FlatView allocation and initialization to a helper.
While we are nere, replace g_new with g_new0 to not to bother if we add
new fields in the future.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-4-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cc94cd6d36602d976a5e7bc29134d1eaefb4102e) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We are going to share FlatView's between AddressSpace's and per-AS
memory listeners won't suit the purpose anymore so open code
the dispatch tree rendering.
Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there is
no registered listeners; this should improve starting time.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-3-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 9a62e24f45bc97f8eaf198caf58906b47c50a8d5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Thu, 5 Oct 2017 19:02:47 +0000 (14:02 -0500)]
block: Perform copy-on-read in loop
Improve our braindead copy-on-read implementation. Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G. While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read
Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits. Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.
Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.
CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit cb2e28780c7080af489e72227683fe374f05022d)
Conflicts:
block/io.c
* remove context dep on d855ebcd3 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jim Somerville [Fri, 29 Sep 2017 16:00:19 +0000 (12:00 -0400)]
kvmclock: use the updated system_timer_msr
Fixes e2b6c17 (kvmclock: update system_time_msr address forcibly)
which makes a call to get the latest value of the address
stored in system_timer_msr, but then uses the old address anyway.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Message-Id: <59b67db0bd15a46ab47c3aa657c81a4c11f168ea.1506702472.git.Jim.Somerville@windriver.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 346b1215b1e9f7cc6d8fe9fb6f3c2778b890afb6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
block/mirror: check backing in bdrv_mirror_top_flush
Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit ce960aa9062a407d0ca15aee3dcd3bd84a4e24f9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Wed, 27 Sep 2017 15:28:26 +0000 (17:28 +0200)]
hw/usb/bus: Remove bad object_unparent() from usb_try_create_simple()
Valgrind detects an invalid read operation when hot-plugging of an
USB device fails:
$ valgrind x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598== Memcheck, a memory error detector
==30598== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30598== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==30598== Command: x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598==
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
==30598== Invalid read of size 8
==30598== at 0x60EF50: object_unparent (object.c:445)
==30598== by 0x580F0D: usb_try_create_simple (bus.c:346)
==30598== by 0x581BEB: usb_claim_port (bus.c:451)
==30598== by 0x582310: usb_qdev_realize (bus.c:257)
==30598== by 0x4CB399: device_set_realized (qdev.c:914)
==30598== by 0x60E26D: property_set_bool (object.c:1886)
==30598== by 0x61235E: object_property_set_qobject (qom-qobject.c:27)
==30598== by 0x61000F: object_property_set_bool (object.c:1162)
==30598== by 0x4567C3: qdev_device_add (qdev-monitor.c:630)
==30598== by 0x456D52: qmp_device_add (qdev-monitor.c:807)
==30598== by 0x470A99: hmp_device_add (hmp.c:1933)
==30598== by 0x3679C3: handle_hmp_command (monitor.c:3123)
The object_unparent() here is not necessary anymore since commit 69382d8b3e8600b3 ("qdev: Fix object reference leak in case device.realize()
fails"), so let's remove it now.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1506526106-30971-1-git-send-email-thuth@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f3b2bea3c76ba9283b957f1373e7cebdbf863059) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].
At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.
In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.
A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.
With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.
This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:
- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.
- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 10f12e6450407b18b4d5a6b50d3852dcfd7fff75) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Lieven [Tue, 26 Sep 2017 10:33:16 +0000 (12:33 +0200)]
migration: disable auto-converge during bulk block migration
auto-converge and block migration currently do not play well together.
During block migration the auto-converge logic detects that ram
migration makes no progress and thus throttles down the vm until
it nearly stalls completely. Avoid this by disabling the throttling
logic during the bulk phase of the block migration.
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 9ac78b6171bec47083a9b6ce88dc1f114caea2f9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This patch prevents PCI passthrough hotplug on Xen. Even if the Xen tool
stack prepares its own ACPI tables, we still rely on QEMU for hotplug
ACPI notifications.
The original issue is fixed by the two previous patch:
hw/acpi: Limit hotplug to root bus on legacy mode
hw/acpi: Move acpi_set_pci_info to pcihp
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2bed1ba77fae50bc8b5e68ede2d80b652b30c3b8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Anthony PERARD [Wed, 6 Sep 2017 13:40:32 +0000 (14:40 +0100)]
hw/acpi: Move acpi_set_pci_info to pcihp
HW part of ACPI PCI hotplug in QEMU depends on ACPI_PCIHP_PROP_BSEL
being set on a PCI bus that supports ACPI hotplug. It should work
regardless of the source of ACPI tables (QEMU generator/legacy SeaBIOS/Xen).
So move ACPI_PCIHP_PROP_BSEL initialization into HW ACPI implementation
part from QEMU's ACPI table generator.
To do PCI passthrough with Xen, the property ACPI_PCIHP_PROP_BSEL needs
to be set, but this was done only when ACPI tables are built which is
not needed for a Xen guest. The need for the property starts with commit
"pc: pcihp: avoid adding ACPI_PCIHP_PROP_BSEL twice"
(f0c9d64a68b776374ec4732424a3e27753ce37b6).
Adding find_i440fx into stubs so that mips-softmmu target can be built.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ab938ae43f8a3a71a3525566edf586081b7a7452) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Anthony PERARD [Wed, 6 Sep 2017 13:40:31 +0000 (14:40 +0100)]
hw/acpi: Limit hotplug to root bus on legacy mode
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f5855994fee2f8815dc86b8453e4a63e290aea05) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Stefan Hajnoczi [Tue, 29 Aug 2017 12:27:43 +0000 (13:27 +0100)]
nbd-client: avoid read_reply_co entry if send failed
The following segfault is encountered if the NBD server closes the UNIX
domain socket immediately after negotiation:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
441 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
(gdb) bt
#0 0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
#1 0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207
#2 0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237
#3 0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836
#4 0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, f
+lags=0) at block/io.c:1086
#5 0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182
#6 0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032
#7 0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079
#8 0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#9 0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6
The problem is that nbd_client_init() uses
nbd_client_attach_aio_context() -> aio_co_schedule(new_context,
client->read_reply_co). Execution of read_reply_co is deferred to a BH
which doesn't run until later.
In the mean time blk_co_preadv() can be called and nbd_coroutine_end()
calls aio_wake() on read_reply_co. At this point in time
read_reply_co's ctx isn't set because it has never been entered yet.
This patch simplifies the nbd_co_send_request() ->
nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just
nbd_co_send_request() -> nbd_co_receive_reply(). The request is "ended"
if an error occurs at any point. Callers no longer have to invoke
nbd_coroutine_end().
This cleanup also eliminates the segfault because we don't call
aio_co_schedule() to wake up s->read_reply_co if sending the request
failed. It is only necessary to wake up s->read_reply_co if a reply was
received.
Note this only happens with UNIX domain sockets on Linux. It doesn't
seem possible to reproduce this with TCP sockets.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20170829122745.14309-2-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 3c2d5183f9fa4eac3d17d841e26da65a0181ae7b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The mmio path (see exec.c:prepare_mmio_access) already protects itself
against recursive locking and it makes sense to do the same for
io_readx/writex. Otherwise any helper running in the BQL context will
assert when it attempts to write to device memory as in the case of
the bug report.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> CC: Richard Jones <rjones@redhat.com> CC: Paolo Bonzini <bonzini@gnu.org> CC: qemu-stable@nongnu.org
Message-Id: <20170921110625.9500-1-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 8b81253332b5a3f3c67b6462f39caef47a00dd29) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
block/qcow2-bitmap: fix use of uninitialized pointer
Without initialization to zero dirty_bitmap field may be not zero
for a bitmap which should not be stored and
qcow2_store_persistent_dirty_bitmaps will erroneously call
store_bitmap for it which leads to SIGSEGV on bdrv_dirty_bitmap_name.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170922144353.4220-1-vsementsov@virtuozzo.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 5330f32b71b1868bdb3b444733063cb5adc4e8e6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
block/throttle-groups.c: allocate RestartData on the heap
RestartData is the opaque data of the throttle_group_restart_queue_entry
coroutine. By being stack allocated, it isn't available anymore if
aio_co_enter schedules the coroutine with a bottom half and runs after
throttle_group_restart_queue returns.
Cc: qemu-stable@nongnu.org Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 43a5dc02fd6070827d5c4ff652b885219fa8cbe1)
Conflicts:
block/throttle-groups.c
* reworked to avoid functional dep on 022cdc9, since that involves
refactoring for a feature not present in 2.10 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Thu, 14 Sep 2017 13:49:23 +0000 (08:49 -0500)]
osdep: Fix ROUND_UP(64-bit, 32-bit)
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.
Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).
While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.
CC: qemu-trivial@nongnu.org CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 2098b073f398cd628c09c5a78537a6854e85830d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The migration interface for ais was introduced with kernel 4.13
but the capability itself had been active since 4.12. As migration
support is considered necessary lets disable ais in the 2.10
stable version. A proper fix and re-enablement will be done
for qemu 2.11.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20170921140834.14233-2-borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 3f2d07b3b01ea61126b382633ab4006320923048) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jan Dakinevich [Wed, 20 Sep 2017 06:48:52 +0000 (08:48 +0200)]
9pfs: check the size of transport buffer before marshaling
v9fs_do_readdir_with_stat() should check for a maximum buffer size
before an attempt to marshal gathered data. Otherwise, buffers assumed
as misconfigured and the transport would be broken.
The patch brings v9fs_do_readdir_with_stat() in conformity with
v9fs_do_readdir() behavior.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
[groug, regression caused my commit 8d37de41cab1 # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 772a73692ecb52bace0cff6f95df62f59b8cabe0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jan Dakinevich [Wed, 20 Sep 2017 06:48:52 +0000 (08:48 +0200)]
9pfs: fix name_to_path assertion in v9fs_complete_rename()
The third parameter of v9fs_co_name_to_path() must not contain `/'
character.
The issue is most likely related to 9p2000.u protocol only.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
[groug, regression caused by commit f57f5878578a # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 4d8bc7334b06ef01a21cad3d1eb8dc183037a06b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jan Dakinevich [Wed, 20 Sep 2017 06:48:51 +0000 (08:48 +0200)]
9pfs: fix readdir() for 9p2000.u
If the client is using 9p2000.u, the following occurs:
$ cd ${virtfs_shared_dir}
$ mkdir -p a/b/c
$ ls a/b
ls: cannot access 'a/b/a': No such file or directory
ls: cannot access 'a/b/b': No such file or directory
a b c
instead of the expected:
$ ls a/b
c
This is a regression introduced by commit f57f5878578a;
local_name_to_path() now resolves ".." and "." in paths,
and v9fs_do_readdir_with_stat()->stat_to_v9stat() then
copies the basename of the resulting path to the response.
With the example above, this means that "." and ".." are
turned into "b" and "a" respectively...
stat_to_v9stat() currently assumes it is passed a full
canonicalized path and uses it to do two different things:
1) to pass it to v9fs_co_readlink() in case the file is a symbolic
link
2) to set the name field of the V9fsStat structure to the basename
part of the given path
It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat().
v9fs_stat() really needs 1) and 2) to be performed since it starts
with the full canonicalized path stored in the fid. It is different
for v9fs_do_readdir_with_stat() though because the name we want to
put into the V9fsStat structure is the d_name field of the dirent
actually (ie, we want to keep the "." and ".." special names). So,
we only need 1) in this case.
This patch hence adds a basename argument to stat_to_v9stat(), to
be used to set the name field of the V9fsStat structure, and moves
the basename logic to v9fs_stat().
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
(groug, renamed old name argument to path and updated changelog) Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 6069537f4336a59054afda91a6545d3648c64619) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
virtio-gpu can trigger the assert added by commit "6905b93447 console:
add same surface replace pre-condition" in multihead setups (where
surface can be NULL for secondary displays). Allow surface being NULL.
Igor Mammedov [Mon, 18 Sep 2017 19:01:25 +0000 (15:01 -0400)]
ide: ahci: unparent children buses before freeing their memory
Fixes read after freeing error reported
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04243.html
Message-Id: <59a56959-ca12-ea75-33fa-ff07eba1b090@redhat.com>
ich9-ahci device creates ide buses and attaches them as QOM children
at realize time, however it forgets to properly clean them up
at unrealize time and frees memory containing these children,
with following call-chain:
qdev_device_add()
object_property_set_bool('realized', true)
device_set_realized()
...
pci_qdev_realize() -> pci_ich9_ahci_realize() -> ahci_realize()
...
s->dev = g_new0(AHCIDevice, ports);
...
AHCIDevice *ad = &s->dev[i];
ide_bus_new(&ad->port, sizeof(ad->port), qdev, i, 1);
^^^ creates bus in memory allocated by above gnew()
and adds it as child propety to ahci device
...
hotplug_handler_plug(); -> goto post_realize_fail;
pci_qdev_unrealize() -> pci_ich9_uninit() -> ahci_uninit()
...
g_free(s->dev);
^^^ free memory that holds children busses
return with error from device_set_realized()
As result later when qdev_device_add() tries to unparent ich9-ahci
after failed device_set_realized(),
object_unparent() -> object_property_del_child()
iterates over existing QOM children including buses added by
ide_bus_new() and tries to unparent them, which causes access to
freed memory where they where located.
Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1503938085-169486-1-git-send-email-imammedo@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 955f5c7ba127746345a3d43b4d7c885ca159ae6b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Mon, 18 Sep 2017 19:01:25 +0000 (15:01 -0400)]
hw/ide/microdrive: Mark the dscm1xxxx device with user_creatable = false
QEMU currently aborts with an assertion message when the user is trying
to remove a dscm1xxxx again:
$ aarch64-softmmu/qemu-system-aarch64 -S -M integratorcp -nographic
QEMU 2.9.93 monitor - type 'help' for more information
(qemu) device_add dscm1xxxx,id=xyz
(qemu) device_del xyz
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)
Looks like this device has to be wired up in code and is not meant
to be hot-pluggable, so let's mark it with user_creatable = false.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1503543783-17192-1-git-send-email-thuth@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 4c93950659487c7ad4f85571ee78524c1e3a94b3) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Mon, 4 Sep 2017 14:21:55 +0000 (15:21 +0100)]
hw/arm/aspeed_soc: Mark devices as user_creatable = false
QEMU currently aborts if the user is accidentially trying to
do something like this:
$ aarch64-softmmu/qemu-system-aarch64 -S -M integratorcp -nographic
QEMU 2.9.93 monitor - type 'help' for more information
(qemu) device_add ast2400
Unexpected error in error_set_from_qdev_prop_error()
at hw/core/qdev-properties.c:1032:
Aborted (core dumped)
The ast2400 SoC devices are clearly not creatable by the user since
they are using the serial_hds and nd_table arrays directly in their
realize function, so mark them with user_creatable = false.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 469f3da42ef4af347fa7831e1cc0bd35d17f5b83) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Mon, 4 Sep 2017 14:21:55 +0000 (15:21 +0100)]
hw/arm/digic: Mark device with user_creatable = false
QEMU currently shows some unexpected behavior when the user trys to
do a "device_add digic" on an unrelated ARM machine like integratorcp
in "-nographic" mode (the device_add command does not immediately
return to the monitor prompt), and trying to "device_del" the device
later results in a "qemu/qdev-monitor.c:872:qdev_unplug: assertion
failed: (hotplug_ctrl)" error condition.
Looking at the realize function of the device, it uses serial_hds
directly and this means that the device can not be added a second
time, so let's simply mark it with "user_creatable = false" now.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit f58f25599b72c7479e6a1ff67c7f671823aa14da) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Wed, 16 Aug 2017 05:30:58 +0000 (07:30 +0200)]
s390x/ipl: The s390-ipl device is not hot-pluggable
The s390-ipl device can not be created by the user, since it is meant only
to be instantiated once internally to load the ROMs and kernel. If the user
tries to do a "device_add s390-ipl" via the monitor later, QEMU aborts with
a "ROM images must be loaded at startup" error message.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1502861458-30270-1-git-send-email-thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 0d4fa4996fc5ee56ea7d072e272b8e69948460a5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Wed, 16 Aug 2017 14:08:48 +0000 (16:08 +0200)]
watchdog/wdt_diag288: Mark diag288 watchdog as non-hotpluggable
QEMU currently aborts when the user tries to hot-unplug a diag288
device:
$ qemu-system-s390x -nographic -nodefaults -S -monitor stdio
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) device_add diag288,id=x
(qemu) device_del x
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)
The device is not designed as hot-pluggable (it should only be used
via the "-watchdog" parameter), so let's simply remove the possibility
to hotplug it to prevent that users can run into this ugly situation.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1502892528-22618-1-git-send-email-thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 84ebd3e8c7d4fe955b359b9aac84395907b0412e) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
While loading kernel via multiboot-v1 image, (flags & 0x00010000)
indicates that multiboot header contains valid addresses to load
the kernel image. These addresses are used to compute kernel
size and kernel text offset in the OS image. Validate these
address values to avoid an OOB access issue.
This is CVE-2017-14167.
Reported-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20170907063256.7418-1-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ed4f86e8b6eff8e600c69adee68c7cd34dd2cccb) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Gerd Hoffmann [Mon, 28 Aug 2017 12:29:06 +0000 (14:29 +0200)]
vga: stop passing pointers to vga_draw_line* functions
Instead pass around the address (aka offset into vga memory).
Add vga_read_* helper functions which apply vbe_size_mask to
the address, to make sure the address stays within the valid
range, similar to the cirrus blitter fixes (commits ffaf857778
and 026aeffcb4).
Impact: DoS for privileged guest users. qemu crashes with
a segfault, when hitting the guard page after vga memory
allocation, while reading vga memory for display updates.
Fixes: CVE-2017-13672 Cc: P J P <ppandit@redhat.com> Reported-by: David Buchanan <d@vidbuchanan.co.uk> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170828122906.18993-1-kraxel@redhat.com
(cherry picked from commit 3d90c6254863693a6b13d918d2b8682e08bbc681) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Gerd Hoffmann [Mon, 28 Aug 2017 12:33:07 +0000 (14:33 +0200)]
vga: fix display update region calculation (split screen)
vga display update mis-calculated the region for the dirty bitmap
snapshot in case split screen mode is used. This can trigger an
assert in cpu_physical_memory_snapshot_get_dirty().
Impact: DoS for privileged guest users.
Fixes: CVE-2017-13673 Fixes: fec5e8c92becad223df9d972770522f64aafdb72 Cc: P J P <ppandit@redhat.com> Reported-by: David Buchanan <d@vidbuchanan.co.uk> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170828123307.15392-1-kraxel@redhat.com
(cherry picked from commit e65294157d4b69393b3f819c99f4f647452b48e3) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 672339f7eff5e9226f302037290e84e783d2b5cd) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
libvhost-user: support resuming vq->last_avail_idx based on used_idx
This is the same workaround as commit 523b018dde3b765, which was lost
with libvhost-user transition in commit e10e798c85c2331.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 35480cbfcb73143af66c8de4b444d686a46c2e88) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Thu, 14 Sep 2017 17:43:19 +0000 (18:43 +0100)]
mps2-an511: Fix wiring of UART overflow interrupt lines
Fix an error that meant we were wiring every UART's overflow
interrupts into the same inputs 0 and 1 of the OR gate,
rather than giving each its own input.
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1505232834-20890-1-git-send-email-peter.maydell@linaro.org
(cherry picked from commit ce3bc112cdb1d462e2d52eaa17a7314e7f3af504) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alex Williamson [Thu, 7 Sep 2017 20:27:09 +0000 (14:27 -0600)]
vhost: Release memory references on cleanup
vhost registers a MemoryListener where it adds and removes references
to MemoryRegions as the MemoryRegionSections pass through. The
region_add callback is invoked for each existing section when the
MemoryListener is registered, but unregistering the MemoryListener
performs no reciprocal region_del callback. It's therefore the
owner of the MemoryListener's responsibility to cleanup any persistent
changes, such as these memory references, after unregistering.
The consequence of this bug is that if we have both a vhost device
and a vfio device, the vhost device will reference any mmap'd MMIO of
the vfio device via this MemoryListener. If the vhost device is then
removed, those references remain outstanding. If we then attempt to
remove the vfio device, it never gets finalized and the only way to
release the kernel file descriptors is to terminate the QEMU process.
Fixes: dfde4e6e1a86 ("memory: add ref/unref calls") Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-stable@nongnu.org # v1.6.0+ Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ee4c112846a0f2ac4fe5601918b0a2642ac8e2ed) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Pavel Butsykin [Mon, 4 Sep 2017 10:18:00 +0000 (13:18 +0300)]
qcow2: move qcow2_store_persistent_dirty_bitmaps() before cache flushing
After calling qcow2_inactivate(), all qcow2 caches must be flushed, but this
may not happen, because the last call qcow2_store_persistent_dirty_bitmaps()
can lead to marking l2/refcont cache as dirty.
Let's move qcow2_store_persistent_dirty_bitmaps() before the caсhe flushing
to fix it.
Cc: qemu-stable@nongnu.org Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 83a8c775a8bf134eb18a719322939b74a818d750) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Thu, 7 Sep 2017 12:54:51 +0000 (13:54 +0100)]
hw/arm/allwinner-a10: Mark the allwinner-a10 device with user_creatable = false
QEMU currently exits unexpectedly when the user accidentially
tries to do something like this:
$ aarch64-softmmu/qemu-system-aarch64 -S -M integratorcp -nographic
QEMU 2.9.93 monitor - type 'help' for more information
(qemu) device_add allwinner-a10
Unsupported NIC model: smc91c111
Exiting just due to a "device_add" should not happen. Looking closer
at the the realize and instance_init function of this device also
reveals that it is using serial_hds and nd_table directly there, so
this device is clearly not creatable by the user and should be marked
accordingly.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 1503416789-32080-1-git-send-email-thuth@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit dc89a180caf143a5d596d3f2f776d13be83a687d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
/home/pranith/qemu/hw/intc/arm_gicv3_kvm.c:296:17: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
if (!c->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) {
^ ~
/home/pranith/qemu/hw/intc/arm_gicv3_kvm.c:296:17: note: add parentheses after the '!' to evaluate the bitwise operator first
if (!c->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) {
^
/home/pranith/qemu/hw/intc/arm_gicv3_kvm.c:296:17: note: add parentheses around left hand side expression to silence this warning
if (!c->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) {
^
This logic error meant we were not setting the PTZ
bit when we should -- luckily as the comment suggests
this wouldn't have had any effects beyond making GIC
initialization take a little longer.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-id: 20170829173226.7625-1-bobby.prani@gmail.com Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 7229ec5825df6b933f150b54a8a2bedd2de1864c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Mon, 4 Sep 2017 07:59:01 +0000 (09:59 +0200)]
virtfs: error out gracefully when mandatory suboptions are missing
We internally convert -virtfs to -fsdev/-device. If the user doesn't
provide the path or security_model suboptions, and the fsdev backend
requires them, we hit an assertion when populating the internal -fsdev
option:
Let's test the suboption presence on the command line before trying
to set it in the internal -fsdev option, and let the backend code
error out gracefully (ie, like it already does when the user passes
-fsdev on the command line).
Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 32b6943699948f7adc35ada233fbd25daffad5e9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
For "ldp x0, x1, [x0]", if the second load is on a second page and
the second page is unmapped, the exception would be raised with x0
already modified. This means the instruction couldn't be restarted.
Cc: qemu-arm@nongnu.org Cc: qemu-stable@nongnu.org Reported-by: Andrew <andrew@fubar.geek.nz> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20170825224833.4463-1-richard.henderson@linaro.org Fixes: https://bugs.launchpad.net/qemu/+bug/1713066 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMM: tweaked comment format] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 3e4d91b94ce400326fae0850578d9e9f30a71adb) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Samuel Thibault [Thu, 24 Aug 2017 23:35:53 +0000 (01:35 +0200)]
slirp: fix clearing ifq_so from pending packets
The if_fastq and if_batchq contain not only packets, but queues of packets
for the same socket. When sofree frees a socket, it thus has to clear ifq_so
from all the packets from the queues, not only the first.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 1201d308519f1e915866d7583d5136d03cc1d384) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The following scenario leads to an assertion failure in
qio_channel_yield():
1. Request coroutine calls qio_channel_yield() successfully when sending
would block on the socket. It is now yielded.
2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because
nbd_receive_reply() failed.
3. Request coroutine is entered and returns from qio_channel_yield().
Note that the socket fd handler has not fired yet so
ioc->write_coroutine is still set.
4. Request coroutine attempts to send the request body with nbd_rwv()
but the socket would still block. qio_channel_yield() is called
again and assert(!ioc->write_coroutine) is hit.
The problem is that nbd_read_reply_entry() does not distinguish between
request coroutines that are waiting to receive a reply and those that
are not.
This patch adds a per-request bool receiving flag so
nbd_read_reply_entry() can avoid spurious aio_wake() calls.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20170822125113.5025-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
Stefan Hajnoczi [Wed, 23 Aug 2017 13:42:42 +0000 (21:42 +0800)]
block: Update open_flags after ->inactivate() callback
In the ->inactivate() callbacks, permissions are updated, which
typically involves a recursive check of the whole graph. Setting
BDRV_O_INACTIVE right before doing that creates a state that
bdrv_is_writable() returns false, which causes permission update
failure.
Reorder them so the flag is updated after calling the function. Note
that this doesn't break the assert in bdrv_child_cb_inactivate() because
for any specific BDS, we still update its flags first before calling
->inactivate() on it one level deeper in the recursion.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170823134242.12080-5-famz@redhat.com> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
Fam Zheng [Wed, 23 Aug 2017 13:42:40 +0000 (21:42 +0800)]
block-backend: Allow more "can inactivate" cases
These two conditions corresponds to mirror job's source and target,
which need to be allowed as they are part of the non-shared storage
migration workflow: failing to inactivate either will result in a
failure during migration completion.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170823134242.12080-3-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: improve comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>
Eduardo Habkost [Fri, 18 Aug 2017 19:09:43 +0000 (16:09 -0300)]
numa: Move numa_legacy_auto_assign_ram to pc-i440fx-2.9
The 'm->numa_auto_assign_ram = numa_legacy_auto_assign_ram;' line
was supposed to be in pc_i440fx_2_9_machine_options() (see commit 3bfe5716 "numa: equally distribute memory on nodes"), but the
merge commit adb354dd ("Merge remote-tracking branch
'mst/tags/for_upstream' into staging") moved it to the
pc_i440fx_2_10_machine_options().
Move the line back to pc_i440fx_2_9_machine_options().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 20170818190943.23858-1-ehabkost@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Igor Mammedov [Thu, 17 Aug 2017 16:14:13 +0000 (18:14 +0200)]
fix build failure in nbd_read_reply_entry()
travis builds fail at HEAD at rc3 master with
block/nbd-client.c: In function ‘nbd_read_reply_entry’:
block/nbd-client.c:110:8: error: ‘ret’ may be used uninitialized in this function [-Werror=uninitialized]
fix it by initializing 'ret' to 0
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Wed, 23 Aug 2017 08:04:20 +0000 (09:04 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170823' into staging
ppc patch queue 2017-08-23
This is identical to the pull request from yesterday (20180822),
except that a bug in one patch is fixed so that it doesn't break TCG
on a ppc host.
Last minute ppc related fixes for qemu-2.10. I'm not sure if these
are critical enough to prompt another rc, but I'm submitting them for
consideration.
First, is Cornelia's fix for 480bc11e6 which meant "make check" would
always fail on a ppc host. Tracking that down delayed submission of
the rest of these patches, sorry.
The rest are all fairly important bugfixes for qemu crashes or guest
behaviour regression on ppc. Patches 2-4 specifically are fixes for
regressions from qemu-2.9, caused by the compatibility mode and
hotplug handling cleanups for the pseries machine type.
* remotes/dgibson/tags/ppc-for-2.10-20170823:
hw/ppc/spapr_iommu: Fix crash when removing the "spapr-tce-table" device
hw/ppc/spapr_rtc: Mark the RTC device with user_creatable = false
hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev'
spapr: Allow configure-connector to be called multiple times
ppc: fix ppc_set_compat() with KVM PR
target/ppc: 'PVR != host PVR' in KVM_SET_SREGS workaround
boot-serial-test: prefer tcg accelerator
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Thomas Huth [Thu, 17 Aug 2017 05:15:10 +0000 (07:15 +0200)]
hw/ppc/spapr_rtc: Mark the RTC device with user_creatable = false
QEMU currently aborts unexpectedly when a user tries to do something
like this:
$ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) device_add spapr-rtc,id=spapr-rtc
(qemu) device_del spapr-rtc
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)
The RTC device is not meant to be hot-pluggable - it's an internal
device only and it even should not be possible to create it a
second time with the "-device" parameter, so let's mark this
with "user_creatable = false".
Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Thomas Huth [Mon, 21 Aug 2017 06:30:29 +0000 (08:30 +0200)]
hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev'
QEMU currently crashes when trying to use a 'pc-dimm' on the pseries
machine without specifying its 'memdev' property. This happens because
pc_dimm_get_memory_region() does not check whether the 'memdev' property
has properly been set by the user. Looking closer at this function, it's
also obvious that it is using &error_abort to call another function - and
this is bad in a function that is used in the hot-plugging calling chain
since this can also cause QEMU to exit unexpectedly.
So let's fix these issues in a proper way now: Add a "Error **errp"
parameter to pc_dimm_get_memory_region() which we use in case the 'memdev'
property has not been set by the user, and which we can use instead of
the &error_abort, and change the callers of get_memory_region() to make
use of this "errp" parameter for proper error checking.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Bharata B Rao [Thu, 17 Aug 2017 05:16:42 +0000 (10:46 +0530)]
spapr: Allow configure-connector to be called multiple times
In case of in-kernel memory hot unplug, when the guest is not able
to remove all the LMBs that are requested for removal, it will add back
any LMBs that have been successfully removed. The DR Connectors of
these LMBs wouldn't have been unconfigured and hence the addition of
these LMBs will result in configure-connector call being issued on
LMB DR connectors that are already in configured state. Such
configure-connector calls will fail resulting in a DIMM which is
partially unplugged.
This however worked till recently before we overhauled the DRC
implementation in QEMU. Commit 9d4c0f4f0a71e: "spapr: Consolidate
DRC state variables" is the first commit where this problem shows up
as per git bisect.
Ideally guest shouldn't be issuing configure-connector call on an
already configured DR connector. However for now, work around this in
QEMU by allowing configure-connector to be called multiple times for
all types of DR connectors.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Corrected buglet that would have initialized fdt pointers ready
for reading on a device not present at reset] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Greg Kurz [Mon, 14 Aug 2017 17:49:16 +0000 (19:49 +0200)]
ppc: fix ppc_set_compat() with KVM PR
When running in KVM PR mode, kvmppc_set_compat() always fail because the
current PR implementation doesn't handle KVM_REG_PPC_ARCH_COMPAT. Now that
the machine code inconditionally calls ppc_set_compat_all() at reset time
to restore the compat mode default value (commit 66d5c492dd3a9), it is
impossible to start a guest with PR:
qemu-system-ppc64: Unable to set CPU compatibility mode in KVM:
Invalid argument
A tentative patch [1] was recently sent by Suraj to address the issue, but
it would prevent the compat mode to be turned off on reset. And we really
don't want to explicitely check for KVM PR. During the patch's review,
David suggested that we should only call the KVM ioctl() if the compat
PVR changes. This allows at least to run with KVM PR, provided no compat
mode is requested from the command line (which should be the case when
running PR nested). This is what this patch does.
While here, we also fix the side effect where KVM would fail but we would
change the CPU state in QEMU anyway.
[1] http://patchwork.ozlabs.org/patch/782039/
Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: 'PVR != host PVR' in KVM_SET_SREGS workaround
Commit d5fc133eed ("ppc: Rework CPU compatibility testing
across migration") changed the way cpu_post_load behaves with
the PVR setting, causing an unexpected bug in KVM-HV migrations
between hosts that are compatible (POWER8 and POWER8E, for example).
Even with pvr_match() returning true, the guest freezes right after
cpu_post_load. The reason is that the guest kernel can't handle a
different PVR value other that the running host in KVM_SET_SREGS.
In [1] it was discussed the possibility of a new KVM capability
that would indicate that the guest kernel can handle a different
PVR in KVM_SET_SREGS. Even if such feature is implemented, there is
still the problem with older kernels that will not have this capability
and will fail to migrate.
This patch implements a workaround for that scenario. If running
with KVM, check if the guest kernel does not have the capability
(named here as 'cap_ppc_pvr_compat'). If it doesn't, calls
kvmppc_is_pr() to see if the guest is running in KVM-HV. If all this
happens, set env->spr[SPR_PVR] to the same value as the current
host PVR. This ensures that we allow migrations with 'close enough'
PVRs to still work in KVM-HV but also makes the code ready for
this new KVM capability when it is done.
A new function called 'kvmppc_pvr_workaround_required' was created
to encapsulate the conditions said above and to avoid calling too
many kvm.c internals inside cpu_post_load.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Fix for the case of using TCG on a PPC host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Peter Maydell [Tue, 15 Aug 2017 17:17:02 +0000 (18:17 +0100)]
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-08-15' into staging
nbd patches for 2017-08-15
- Eric Blake: nbd: Fix trace message for disconnect
- Stefan Hajnoczi: qemu-iotests: step clock after each test iteration
- Fam Zheng: 0/4 block: Fix non-shared storage migration
- Eric Blake: nbd-client: Fix regression when server sends garbage
# gpg: Signature made Tue 15 Aug 2017 16:06:02 BST
# gpg: using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2017-08-15:
nbd-client: Fix regression when server sends garbage
iotests: Add non-shared storage migration case 192
block-backend: Defer shared_perm tightening migration completion
nbd: Fix order of bdrv_set_perm and bdrv_invalidate_cache
stubs: Add vm state change handler stubs
qemu-iotests: step clock after each test iteration
nbd: Fix trace message for disconnect
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>