In commit 7073fbada733c8d10992f00772c9b9299d740e9b, the `andn` instruction
was implemented via `tcg_gen_andc` but passed the operands in the wrong
order:
- X86 defines `andn dest,src1,src2` as: dest = ~src1 & src2
- TCG defines `andc dest,src1,src2` as: dest = src1 & ~src2
This patch fixes the problem by simply swapping the order of the last two
arguments in `tcg_gen_andc_tl`.
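As a standalone illustration (plain C, not the QEMU source) of why the operand swap matters:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t x86_andn(uint64_t src1, uint64_t src2) { return ~src1 & src2; }
static uint64_t tcg_andc(uint64_t src1, uint64_t src2) { return src1 & ~src2; }

int main(void)
{
    uint64_t a = 0x00f0, b = 0x0ff0;

    /* Emitting andc(a, b) for "andn a, b" computes the wrong result;
     * swapping the last two operands, andc(b, a), matches x86 ANDN. */
    assert(x86_andn(a, b) != tcg_andc(a, b));
    assert(x86_andn(a, b) == tcg_andc(b, a));
    printf("andn(a,b) = %#llx\n", (unsigned long long)x86_andn(a, b));
    return 0;
}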
Reported-by: Alexandro Sanchez Bach <alexandro@phi.nz> Signed-off-by: Alexandro Sanchez Bach <alexandro@phi.nz> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5cd10051c2e02b7a86eae49919d6c65a87dbea46) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Failure to do so results in the tcg optimizer sign-extending
any constant fold from 32-bits. This turns out to be visible
in the RISC-V testsuite using a host that emits these opcodes
(e.g. any non-x86_64).
Reported-by: Michael Clark <mjc@sifive.com> Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit f2f1dde75160cac6ede330f3db50dc817d01a2d6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Wed, 28 Feb 2018 13:13:15 +0000 (14:13 +0100)]
iotests: Test preallocated truncate of 2G image
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180228131315.30194-3-mreitz@redhat.com Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 733d1dce0f3c8ab7b79a173f6482781d3718f844) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Wed, 28 Feb 2018 13:13:14 +0000 (14:13 +0100)]
block/file-posix: Fix fully preallocated truncate
Storing the lseek() result in an int results in it overflowing when the
file is at least 2 GB in size. Then we have a 50% chance of the result
being "negative" and thus thinking an error occurred when actually
everything went just fine.
So we should use the correct type for storing the result: off_t.
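A standalone illustration of the truncation (not the file-posix code; the value just stands in for an lseek() result on a 3 GB file, assuming a 64-bit off_t):

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    off_t end = (off_t)3 << 30;   /* pretend lseek(fd, 0, SEEK_END) returned 3 GB */
    int truncated = end;          /* what storing the result in an int does */

    printf("off_t: %lld, int: %d\n", (long long)end, truncated);
    /* With a 32-bit int, "truncated" comes out negative here, so the caller
     * wrongly concludes that lseek() failed. Keeping the result in off_t
     * avoids the problem. */
    return 0;
}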
Reported-by: Daniel P. Berrange <berrange@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549231 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180228131315.30194-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 82b45e0a0b824787bd79ce3f6453eaa2afddd138) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michal Privoznik [Sat, 24 Mar 2018 05:14:49 +0000 (06:14 +0100)]
qemu-pr-helper: Actually allow users to specify pidfile
Due to a wrong specification of arguments to getopt_long(), any
attempt to set the pidfile resulted in:
1) the default being leaked
2) the @pidfile variable being set to NULL (because optarg is
NULL without this patch).
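For illustration, a standalone sketch of the getopt_long() pitfall (not the qemu-pr-helper source): an option that takes a value must be declared with required_argument and a matching "p:" in the short-option string, otherwise optarg is NULL in the handler and the default string is simply leaked.

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *pidfile = strdup("/var/run/example.pid");     /* default value */
    static struct option lopts[] = {
        { "pidfile", required_argument, NULL, 'p' },     /* not no_argument */
        { NULL, 0, NULL, 0 },
    };
    int c;

    while ((c = getopt_long(argc, argv, "p:", lopts, NULL)) != -1) {
        if (c == 'p') {
            free(pidfile);              /* don't leak the default */
            pidfile = strdup(optarg);   /* optarg is non-NULL only when declared correctly */
        }
    }
    printf("pidfile: %s\n", pidfile);
    free(pidfile);
    return 0;
}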
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <6f10cd53d361a395aa0e85a9311ec4e9a8fc11e5.1521868451.git.mprivozn@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f8e1a989644f22ba2f7afb0e13b6ce2309ea9503) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Tue, 20 Mar 2018 10:44:56 +0000 (11:44 +0100)]
virtio_net: flush uncompleted TX on reset
If the backend could not transmit a packet right away for some reason,
the packet is queued for asynchronous sending. The corresponding vq
element is tracked in the async_tx.elem field of the VirtIONetQueue,
for later freeing when the transmission is complete.
If a reset happens before completion, virtio_net_tx_complete() will push
async_tx.elem back to the guest anyway, and we end up with the inuse flag
of the vq being equal to -1. The next call to virtqueue_pop() is then
likely to fail with "Virtqueue size exceeded".
This can be reproduced easily by starting a guest with a hubport backend
that is not connected to a functional network, e.g. with the NIC attached
to a -netdev hubport,hubid=0 backend
and no other -netdev hubport,hubid=0 on the command line.
The appropriate fix is to ensure that such an asynchronous transmission
cannot survive a device reset. So for all queues, we first try to send
the packet again, and we purge it if the backend still could not
deliver it.
CC: qemu-stable@nongnu.org Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com> Buglink: https://github.com/open-power-host-os/qemu/issues/37 Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: R. Nageswara Sastry <nasastry@in.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 94b52958b77a2a040564cf7ed716d3a9545d94e5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Victor Kamensky [Fri, 23 Mar 2018 18:26:45 +0000 (18:26 +0000)]
arm/translate-a64: treat DISAS_UPDATE as variant of DISAS_EXIT
In the OE project, a 4.15 Linux kernel boot hang was observed under
single-CPU aarch64 qemu. The kernel was spinning in TCG-generated blocks,
waiting for a vtimer interrupt to arrive, while that interrupt was pending
unprocessed. This happened because, when qemu tried to handle the vtimer
interrupt, the target had interrupts disabled; as a result the flag
indicating a TCG exit, cpu->icount_decr.u16.high, was cleared, but
arm_cpu_exec_interrupt did not call arm_cpu_do_interrupt to process the
interrupt. Later, when the target re-enabled interrupts, it did so without
exiting to the main loop, so the code that waited for the result of the
interrupt execution ran in an infinite loop.
To solve the problem, instructions that operate on CPU system state
(i.e. enable/disable interrupts) and are marked as DISAS_UPDATE should
be treated as a variant of DISAS_EXIT and forced to exit back to the
main loop, so qemu gets a chance to process pending CPU state updates,
including pending interrupts.
This change brings consistency with how DISAS_UPDATE is treated
in aarch32 case.
CC: Peter Maydell <peter.maydell@linaro.org> CC: Alex Bennée <alex.bennee@linaro.org> CC: qemu-stable@nongnu.org Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Victor Kamensky <kamensky@cisco.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1521526368-1996-1-git-send-email-kamensky@cisco.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit a75a52d62418dafe462be4fe30485501d1010bb9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Wed, 14 Mar 2018 12:34:40 +0000 (13:34 +0100)]
tests/multiboot: Add .gitignore
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit e2679395d598bd40770c22a793c0152576ac211f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Wed, 14 Mar 2018 12:33:49 +0000 (13:33 +0100)]
tests/multiboot: Add tests for the a.out kludge
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit 1c8c426fb44bf5b3ffbcad1b00c7def4b89b03ec) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Wed, 14 Mar 2018 12:29:46 +0000 (13:29 +0100)]
tests/multiboot: Test exit code for every qemu run
Testing the exit code only once after a whole group of tests has
completed is not enough; it catches errors only in the very last qemu
invocation. We need to have the check after each qemu run.
The logging and diff with the reference output is still done once per
group to keep things more manageable. This is not a problem because the
log file accumulates the output of all runs.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit 49713c413a65ab4b02124aabe83f8539cc6ece5e) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Wed, 14 Mar 2018 16:57:45 +0000 (17:57 +0100)]
multiboot: Check validity of mh_header_addr
I couldn't find a case where this prevents something bad from happening
that isn't already caught by other checks, but let's err on the safe
side and check that mh_header_addr is as expected.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit dbf2dce7aabb7723542bd182175904846d70b0f9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Wed, 14 Mar 2018 16:46:38 +0000 (17:46 +0100)]
multiboot: Reject kernels exceeding the address space
The code path where mh_load_end_addr is non-zero in the Multiboot
header checks that mh_load_end_addr >= mh_load_addr and so
mb_load_size is checked. However, mb_load_size is not checked when
calculated from the file size, when mh_load_end_addr is 0.
If the kernel binary size is larger than can fit in the address space
after load_addr, we ended up with a kernel_size that is smaller than
load_size, which means that we read the file into a buffer that is too small.
Add a check to reject kernel files with such Multiboot headers.
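One way to express such a check, as a standalone sketch with made-up values (the real code works on the Multiboot header fields such as mh_load_addr and the computed mb_load_size):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t load_addr = 0xfff00000;   /* made-up example values */
    uint32_t load_size = 0x00200000;   /* 2 MB: would wrap past 4 GB */
    uint64_t end = (uint64_t)load_addr + load_size;

    if (end > UINT32_MAX + 1ULL) {
        fprintf(stderr, "kernel does not fit in the 32-bit address space\n");
        return 1;
    }
    printf("kernel occupies [%#x, %#llx)\n", (unsigned)load_addr,
           (unsigned long long)end);
    return 0;
}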
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit b17a9054a0652a1481be48a6729e972abf02412f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jack Schwartz [Thu, 21 Dec 2017 17:25:18 +0000 (09:25 -0800)]
multiboot: fprintf(stderr...) -> error_report()
Change all fprintf(stderr...) calls in hw/i386/multiboot.c to call
error_report() instead, including the mb_debug macro. Remove the "\n"
from strings passed to all modified calls, since error_report() appends
one.
Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4b9006a41ea8818f2385ae5228e07f211bb4a33d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jack Schwartz [Thu, 21 Dec 2017 17:25:17 +0000 (09:25 -0800)]
multiboot: Use header names when displaying fields
Refer to field names when displaying fields in printf and debug statements.
Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ce5eb6dc4dc5652f7e360a1db817f1d5dafab90f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7a2e43cc96fd017883973caf9ee076ae23a3bebd) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Jack Schwartz [Thu, 21 Dec 2017 17:25:15 +0000 (09:25 -0800)]
multiboot: bss_end_addr can be zero
The multiboot spec (https://www.gnu.org/software/grub/manual/multiboot/),
section 3.1.3, allows for bss_end_addr to be zero.
A zero bss_end_addr signifies there is no .bss section.
Suggested-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2a8fcd119eb7c6bb3837fc3669eb1b2dfb31daf8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Lieven [Thu, 8 Mar 2018 11:18:25 +0000 (12:18 +0100)]
migration/block: reset dirty bitmap before read in bulk phase
Reset the dirty bitmap before reading to make sure we don't miss
any new data.
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1520507908-16743-3-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 86b124bc76bd7137d0fb20696c4e349571b8533d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
address_space_rw is calling address_space_to_flatview but it can
be called outside the RCU lock. To fix it, transform flatview_rw
into address_space_rw, since flatview_rw is otherwise unused.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit db84fd973eba3f1e121416dcab73a4e8a60f2526) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
address_space_map is calling address_space_to_flatview but it can
be called outside the RCU lock. The function itself is calling
rcu_read_lock/rcu_read_unlock, just in the wrong place, so the
fix is easy.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ad0c60fa572d4050255b698ecdb67294dd4c0125) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
address_space_access_valid is calling address_space_to_flatview but it can
be called outside the RCU lock. To fix it, push the rcu_read_lock/unlock
pair up from flatview_access_valid to address_space_access_valid.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 11e732a5ed46903f997985bed4c3767ca28a7eb6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
address_space_read is calling address_space_to_flatview but it can
be called outside the RCU lock. To fix it, push the rcu_read_lock/unlock
pair up from flatview_read_full to address_space_read's constant size
fast path and address_space_read_full.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b2a44fcad74f1cc7a6786d38eba7db12ab2352ba) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
address_space_write is calling address_space_to_flatview but it can
be called outside the RCU lock. To fix it, push the rcu_read_lock/unlock
pair up from flatview_write to address_space_write.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 4c6ebbb364aa6f42c5d8e83e932e967eb83f0e44) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Sun, 4 Mar 2018 23:31:20 +0000 (00:31 +0100)]
memory: inline some performance-sensitive accessors
These accessors are called from inlined functions, and the call sequence
is much more expensive than just inlining the access. Move the
struct declaration to memory-internal.h so that exec.c and memory.c
can both use an inline function.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 785a507ec78bbda1c346f3d3593e5a58b62e73ef) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Mon, 5 Mar 2018 08:18:26 +0000 (09:18 +0100)]
openpic_kvm: drop address_space_to_flatview call
The MemoryListener is registered on address_space_memory, so there is
not much to assert. This currently works because the callback
is invoked only once, when the listener is registered, but section->fv
is the _new_ FlatView, not the old one, on later calls, and that
would break.
This confines address_space_to_flatview to exec.c and memory.c.
Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 80d2b933f9fe3e53d4f76a45a1bc1a0175669468) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
KONRAD Frederic [Fri, 2 Mar 2018 09:59:25 +0000 (10:59 +0100)]
sparc: fix leon3 casa instruction when MMU is disabled
Since commit af7a06bac7d3abb2da48ef3277d2a415772d2ae8,
`casa [..](10), .., ..` (and probably other alternate space instructions)
triggers a data access exception when the MMU is disabled.
When we enter get_asi(...), dc->mem_idx is set to MMU_PHYS_IDX when the MMU
is disabled. Just keep mem_idx unchanged in this case so we pass through the
MMU when it is disabled.
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
(cherry picked from commit 6e10f37c86068e35151f982c976a85f1bec07ef2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The target_mprotect/target_munmap return value goes through get_errno at the
call site, thus the functions must either set errno to the host error code
and return -1, or return a negative guest error code. Do the latter.
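A standalone sketch of the convention being described (the real get_errno() and TARGET_* error codes live in linux-user; the names below are stand-ins):

#include <errno.h>
#include <stdio.h>

#define TARGET_ENOMEM 12   /* example guest errno value */

/* Mimics the call-site wrapper: -1 means "use the (negated) host errno",
 * any other negative value is already a guest error code. */
static long get_errno_like(long ret)
{
    return ret == -1 ? -errno : ret;
}

static long fake_target_mprotect(int return_guest_error)
{
    if (return_guest_error) {
        return -TARGET_ENOMEM;   /* "do the latter" */
    }
    errno = ENOMEM;              /* the alternative: set host errno, return -1 */
    return -1;
}

int main(void)
{
    printf("%ld %ld\n",
           get_errno_like(fake_target_mprotect(1)),
           get_errno_like(fake_target_mprotect(0)));
    return 0;
}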
Max Filippov [Wed, 28 Feb 2018 22:16:04 +0000 (14:16 -0800)]
linux-user: fix assertion in shmdt
shmdt fails to call mmap_lock/mmap_unlock around page_set_flags,
resulting in the following assertion:
page_set_flags: Assertion `have_mmap_lock()' failed.
Max Filippov [Wed, 7 Mar 2018 21:50:10 +0000 (13:50 -0800)]
linux-user: fix mmap/munmap/mprotect/mremap/shmat
In linux-user QEMU running for a target with TARGET_ABI_BITS bigger
than L1_MAP_ADDR_SPACE_BITS, an assertion in page_set_flags fires when
mmap, munmap, mprotect, mremap or shmat is called for an address outside
the guest address space. mmap and mprotect should return ENOMEM in such
a case.
Change definition of GUEST_ADDR_MAX to always be the last valid guest
address. Account for this change in open_self_maps.
Add a guest_addr_valid macro that verifies if a guest address is valid.
Add a guest_range_valid function that verifies if an address range is within
the guest address space and does not wrap around. Use these in
mmap/munmap/mprotect/mremap/shmat for error checking.
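A standalone sketch of the two helpers, following the description above (the real definitions live in the linux-user headers; GUEST_ADDR_MAX here is just an example value for a 32-bit guest):

#include <stdbool.h>
#include <stdio.h>

#define GUEST_ADDR_MAX 0xffffffffUL   /* example: last valid 32-bit guest address */

#define guest_addr_valid(x) ((x) <= GUEST_ADDR_MAX)

static bool guest_range_valid(unsigned long start, unsigned long len)
{
    /* valid iff the range fits in the guest address space and doesn't wrap */
    return len - 1 <= GUEST_ADDR_MAX && start <= GUEST_ADDR_MAX - len + 1;
}

int main(void)
{
    printf("%d %d %d\n",
           guest_range_valid(0x1000, 0x1000),        /* fine               */
           guest_range_valid(GUEST_ADDR_MAX, 2),     /* runs past the end  */
           guest_range_valid(0xfffff000UL, 0x2000)); /* wraps around       */
    return 0;
}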
Max Filippov [Wed, 28 Feb 2018 19:48:04 +0000 (11:48 -0800)]
target/xtensa: dump correct physical registers
xtensa_cpu_dump_state outputs CPU physical registers as-is, without
synchronizing them from the current window. That may result in different
values printed for the current window and the corresponding physical
registers. Synchronize physical registers from the window before dumping.
Cc: qemu-stable@nongnu.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
(cherry picked from commit b55b1afda942306e4e40420aced1524bd83ba16d)
Conflicts:
target/xtensa/translate.c
* drop context dependencies
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Mark Cave-Ayland [Fri, 23 Feb 2018 11:10:17 +0000 (11:10 +0000)]
loader: don't perform overlapping address check for memory region ROM images
All memory region ROM images have a base address of 0 which causes the overlapping
address check to fail if more than one memory region ROM image is present, or an
existing ROM image is loaded at address 0.
Make sure that we ignore the overlapping address check in
rom_check_and_register_reset() if this is a memory region ROM image. In particular
this fixes the "rom: requested regions overlap" error on startup when trying to
run qemu-system-sparc with a -kernel image since commit 7497638642: "tcx: switch to
load_image_mr() and remove prom_addr hack".
Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
(cherry picked from commit ca316c11526a1bc221fb542bdce6bac7238dde69) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Fri, 16 Feb 2018 18:14:55 +0000 (19:14 +0100)]
rbd: Fix use after free in qemu_rbd_set_keypairs() error path
If we want to include the invalid option name in the error message, we
can't free the string earlier than that.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 71c87815f9e0386b6f3e22942adc956fd603c82f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alberto Garcia [Wed, 21 Feb 2018 14:08:49 +0000 (16:08 +0200)]
specs/qcow2: Fix documentation of the compressed cluster descriptor
This patch fixes several mistakes in the documentation of the
compressed cluster descriptor:
1) the documentation claims that the cluster descriptor contains the
number of sectors used to store the compressed data, but what it
actually contains is the number of sectors *minus one* or, in other
words, the number of additional sectors after the first one.
2) the width of the fields is incorrectly specified. The number of bits
used by each field is
x = 62 - (cluster_bits - 8) for the offset field
y = (cluster_bits - 8) for the size field
So the offset field's location is [0, x-1], not [0, x] as stated.
3) the size field does not contain the size of the compressed data,
but rather the number of sectors where that data is stored. The
compressed data starts at the exact point specified in the offset
field and ends when there's enough data to produce a cluster of
decompressed data. Both points can be in the middle of a sector,
allowing several compressed clusters to be stored next to one
another, sharing sectors if necessary.
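A standalone sketch that decodes a descriptor using the corrected widths from points 2 and 3 above (made-up descriptor value, cluster_bits = 16):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned cluster_bits = 16;                   /* 64 KB clusters */
    unsigned x = 62 - (cluster_bits - 8);         /* offset field: bits 0..x-1 */
    unsigned y = cluster_bits - 8;                /* size field:   bits x..61  */

    uint64_t descriptor = 0x40123456789abcdeULL;  /* made-up example */
    uint64_t host_offset = descriptor & ((1ULL << x) - 1);
    uint64_t extra_sectors = (descriptor >> x) & ((1ULL << y) - 1);

    /* the size field holds the number of 512-byte sectors minus one */
    printf("x=%u y=%u offset=%#llx sectors=%llu\n", x, y,
           (unsigned long long)host_offset,
           (unsigned long long)(extra_sectors + 1));
    return 0;
}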
Cc: qemu-stable@nongnu.org Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 156b46ded3853dfc6b34c5afae019ff61798491b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Thu, 15 Feb 2018 03:29:05 +0000 (21:29 -0600)]
nbd: Honor server's advertised minimum block size
Commit 79ba8c98 (v2.7) changed the setting of request_alignment
to occur only during bdrv_refresh_limits(), rather than at
bdrv_open() time; but at the time, NBD was unaffected, because
it still used sector-based callbacks, so the block layer
defaulted NBD to use 512 request_alignment.
Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based
callbacks, without setting request_alignment. This resulted in
NBD using request_alignment of 1, which works great when the
server supports it (as is the case for qemu-nbd), but falls apart
miserably if the server requires alignment (but only if qemu
actually sends a sub-sector request; qemu-io can do it, but
most qemu operations still perform on sectors or larger).
Even later, the NBD protocol was updated to document that clients
should learn the server's minimum alignment during NBD_OPT_GO;
and recommended that clients should assume a minimum size of 512
unless the server understands NBD_OPT_GO and replied with a smaller
size. Commit 081dd1fe (v2.10) attempted to do that, by assigning
request_alignment to whatever was learned from the server; but
it has two flaws: the assignment is done during bdrv_open() so
it gets unconditionally wiped out back to 1 during any later
bdrv_refresh_limits(); and the code is not using a default of 512
when the server did not report a minimum size.
Fix these issues by moving the assignment to request_alignment
to the right function, and by using a sane default when the
server does not advertise a minimum size.
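The default itself can be sketched as a one-line decision (standalone sketch, not the NBD driver code):

#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512u

/* Pick the request alignment from the server-advertised minimum block
 * size, falling back to 512 bytes when nothing was advertised. */
static uint32_t pick_request_alignment(uint32_t advertised_min_block)
{
    return advertised_min_block ? advertised_min_block : SECTOR_SIZE;
}

int main(void)
{
    printf("%u %u\n", pick_request_alignment(0), pick_request_alignment(4096));
    return 0;
}

The other half of the fix is simply performing this assignment from the driver's bdrv_refresh_limits() hook rather than at open time, so later limit refreshes recompute it instead of wiping it back to 1.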
CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180215032905.27146-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
(cherry picked from commit fd8d372dd36e839568a718684914d9960d8b1ebd) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Wed, 20 Jun 2018 12:54:15 +0000 (14:54 +0200)]
spapr: make pseries-2.11 the default machine type
The spapr capability framework was introduced in QEMU 2.12. It allows
explicit control over how host features are exposed to the guest. This is
especially needed to handle migration between heterogeneous hosts (e.g.,
POWER8 to POWER9). It is also used to expose fixes/workarounds against
speculative execution vulnerabilities to guests.
The framework was hence backported to QEMU 2.11.1, especially these
commits:
0fac4aa93074 spapr: Add pseries-2.12 machine type
9070f408f491 spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
0fac4aa93074 has the confusing effect of making pseries-2.12 the default
machine type for QEMU 2.11.1, instead of the expected pseries-2.11. This
patch changes the default machine back to pseries-2.11.
Unfortunately, 9070f408f491 enforces the HTM capability for pseries-2.11
to be enabled by default, i.e., when not passing cap-htm on the command
line. This breaks several 'make check' testcases that run qemu-system-ppc64
with TCG.
The only sane way to fix this is to adapt the impacted testcases so that
they all pass cap-htm=off in this case. This patch does that as well.
Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tiwei Bie [Thu, 25 Jan 2018 07:12:43 +0000 (15:12 +0800)]
virtio-balloon: unref the memory region before continuing
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b86107ab43b804e899a226fe287e34ab8acef596) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
"i82801b11-bridge" does not clear the bridge specific registers at
platform reset.
This is a problem because devices on "i82801b11-bridge" continue to
respond to config space cycles after platform reset, when addressed with
the bus number that was previously programmed into the secondary bus
number register of "i82801b11-bridge". This error breaks OVMF's search for
extra (PXB) root buses, for example.
The device class reset method for "i82801b11-bridge" is currently NULL;
set it directly to pci_bridge_reset(), like the last three bridge models
in the above listing do.
Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: qemu-stable@nongnu.org
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1541839 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ed247f40db84c8bd4bb7d10948702cd47cc4d5ae) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This fix was inspired by
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg00883.html.
Fixes: 8a6a80896d6af03b8ee0c17cdf37219eca2588a7 ("block/ssh: Use QemuOpts for runtime options") Cc: Max Reitz <mreitz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit fbd5c4c0db47e578e3fdd88a0ebc4314a1ed3d42) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Tue, 27 Feb 2018 15:22:58 +0000 (16:22 +0100)]
spapr: register dummy ICPs later
Some older machine types create more ICPs than needed. We hence
need to register up to xics_max_server_number() dummy ICPs to
accommodate the migration of these machine types.
Recent VSMT rework changed xics_max_server_number() to return a value
derived from spapr->vsmt. The change is okay but it requires spapr->vsmt
to be set, which isn't the case with the current code. This causes the
formula to return zero, so we don't create dummy ICPs. This breaks
migration of older guests as reported here:
The dummy ICP workaround doesn't really have a dependency on XICS
itself. But it does depend on proper VCPU id numbering and it must
be applied before creating vCPUs (i.e., creating real ICPs). So this
patch moves the workaround to spapr_init_cpus(), which already
assumes VSMT to be set.
Fixes: 72194664c8a1 ("spapr: use spapr->vsmt to compute VCPU ids") Reported-by: Lukas Doktor <ldoktor@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 72fdd4de8e5fdc1a6078e000835fb54592a3fe97) Signed-off-by: Greg Kurz <groug@kaod.org>
Greg Kurz [Fri, 16 Feb 2018 18:58:06 +0000 (19:58 +0100)]
spapr: fix missing CPU core nodes in DT when running with TCG
Commit 5d0fb1508e2d "spapr: consolidate the VCPU id numbering logic
in a single place" introduced a helper to detect thread0 of a virtual
core based on its VCPU id. This is used to create CPU core nodes in
the DT, but it is broken in TCG.
This happens because spapr_get_vcpu_id() maps VCPU ids to
cs->cpu_index in TCG mode. This confuses the code in
spapr_is_thread0_in_vcore(), since it assumes thread0 VCPU
ids to have a spapr->vsmt spacing:

    spapr_get_vcpu_id(cpu) % spapr->vsmt == 0
Actually, there's no real reason to expose cs->cpu_index instead
of the VCPU id, since we also generate it with TCG. Also we already
set it explicitly in spapr_set_vcpu_id(), so there's no real reason
either to call kvm_arch_vcpu_id() with KVM.
This patch unifies spapr_get_vcpu_id() to always return the computed
VCPU id both in TCG and KVM. This is one step forward towards KVM<->TCG
migration.
Fixes: 5d0fb1508e2d Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit b1a568c1c2192f090536b8ac76d135ce1f46a0ee) Signed-off-by: Greg Kurz <groug@kaod.org>
or guess that the VCPU id of a given VCPU is the first thread of a virtual
core:
index % spapr->vsmt != 0
Even if the numbering logic isn't that complex, it is rather fragile to
have these assumptions open-coded in several places. FWIW this was
proved with recent issues related to VSMT.
This patch moves the VCPU id formula to a single function to be called
everywhere the code needs to compute one. It also adds a helper to
guess if a VCPU is the first thread of a VCORE.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Rename spapr_is_vcore() to spapr_is_thread0_in_vcore() for clarity] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 5d0fb1508e2d279da74ef4a103e8def9b52c6304) Signed-off-by: Greg Kurz <groug@kaod.org>
Greg Kurz [Wed, 14 Feb 2018 19:40:44 +0000 (20:40 +0100)]
spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id()
The spapr_vcpu_id() function is actually an accessor. Let's rename it
for symmetry with the recently added spapr_set_vcpu_id() helper.
The motivation behind this is that a later patch will consolidate
the VCPU id formula in a function and spapr_vcpu_id looks like an
appropriate name.
Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 14bb4486c819ea797a151b3e0fe53d6f5c7b3fc5) Signed-off-by: Greg Kurz <groug@kaod.org>
David Gibson [Mon, 15 Jan 2018 05:40:23 +0000 (16:40 +1100)]
target/ppc: Clarify compat mode max_threads value
We recently had some discussions that were sidetracked for a while, because
nearly everyone misapprehended the purpose of the 'max_threads' field in
the compatibility modes table. It's all about guest expectations, not host
expectations or support (that's handled elsewhere).
In an attempt to avoid a repeat of that confusion, rename the field to
'max_vthreads' and add an explanatory comment.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
(cherry picked from commit abbc124753896f72e3715813ea20dd1924202ff0) Signed-off-by: Greg Kurz <groug@kaod.org>
Greg Kurz [Wed, 14 Feb 2018 19:40:35 +0000 (20:40 +0100)]
spapr: move VCPU calculation to core machine code
The VCPU ids are currently computed and assigned to each individual
CPU thread in spapr_cpu_core_realize(). But the numbering logic
of VCPU ids is actually a machine-level concept, and many places
in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes.
The current formula used in spapr_cpu_core_realize() is:
vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i
where:
cc->core_id is a multiple of smp_threads
cpu_index = cc->core_id + i
0 <= i < smp_threads
This formula was used before VSMT, at the time VCPU ids were computed
at the target emulation level. It has the advantage of being usable
to derive a VCPU id out of a CPU index only. It is suited to all the
places where the machine code has to compute a VCPU id.
This patch introduces an accessor to set the VCPU id in a PowerPCCPU object
using the above formula. It is a first step to consolidate all the VCPU id
logic in a single place.
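As a standalone worked example of the formula above, using smp_threads = 4 and spapr->vsmt = 8 for two cores:

#include <stdio.h>

int main(void)
{
    int smp_threads = 4, vsmt = 8;

    /* two cores: core_id is a multiple of smp_threads */
    for (int core_id = 0; core_id <= 4; core_id += 4) {
        for (int i = 0; i < smp_threads; i++) {
            int cpu_index = core_id + i;
            int vcpu_id = core_id * vsmt / smp_threads + i;
            printf("cpu_index=%d -> vcpu_id=%d\n", cpu_index, vcpu_id);
        }
    }
    return 0;
}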
Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 648edb64751ea0e550f36302fa66f9f11e480824) Signed-off-by: Greg Kurz <groug@kaod.org>
Greg Kurz [Wed, 14 Feb 2018 19:40:26 +0000 (20:40 +0100)]
spapr: use spapr->vsmt to compute VCPU ids
Since the introduction of VSMT in 2.11, the spacing of VCPU ids
between cores is controllable through a machine property instead
of being only dictated by the SMT mode of the host:
cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i
Until recently, the machine code would try to change the SMT mode
of the host to be equal to VSMT or exit. This allowed the rest of
the code to assume that kvmppc_smt_threads() == spapr->vsmt is
always true.
Recent commit "8904e5a75005 spapr: Adjust default VSMT value for
better migration compatibility" relaxed the rule. If the VSMT
mode cannot be set in KVM for some reason, but the requested
CPU topology is compatible with the current SMT mode, then we
let the guest run with kvmppc_smt_threads() != spapr->vsmt.
This breaks quite a few places in the code, in particular when
calculating DRC indexes.
This is what happens on a POWER host with subcores-per-core=2 (i.e.,
supporting up to SMT4) when passing the following topology:
But since the spacing of VCPU ids is 8, the DRC for core1 points to a
VCPU that doesn't exist, the DRC for core2 points to the first VCPU of
core1, and so on...
(qemu) device_del core1
pseries-hotplug-cpu: Cannot find CPU (drc index 10000004) to remove
(qemu) device_del core2
cpu 4 (hwid 8) Ready to die...
cpu 5 (hwid 9) Ready to die...
cpu 6 (hwid 10) Ready to die...
cpu 7 (hwid 11) Ready to die...
These are the VCPU ids of core1 actually
(qemu) device_add host-spapr-cpu-core,core-id=12,id=core3
(qemu) device_del core3
pseries-hotplug-cpu: Cannot find CPU (drc index 1000000c) to remove
This patch updates all the code in hw/ppc/spapr.c to assume the VSMT
spacing when manipulating VCPU ids.
Fixes: 8904e5a75005 Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 72194664c8a16b67865eb95054f984dd169cfa86)
Signed-off-by: Greg Kurz <groug@kaod.org>
Laurent Vivier [Fri, 9 Feb 2018 08:18:58 +0000 (09:18 +0100)]
spapr: set vsmt to MAX(8, smp_threads)
We silently ignore the value of smp_threads when we set
the default VSMT value, and if smp_threads is greater than VSMT the
kernel gets into trouble later.
Fixes: 8904e5a750
("spapr: Adjust default VSMT value for better migration compatibility")
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4ad64cbd0c3f9df15be5f7d1c920285551e802ca) Signed-off-by: Greg Kurz <groug@kaod.org>
David Gibson [Mon, 15 Jan 2018 06:51:33 +0000 (17:51 +1100)]
spapr: Adjust default VSMT value for better migration compatibility
fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the
"vsmt" parameter for the pseries machine type, which controls the spacing
of the vcpu ids of thread 0 for each virtual core. This was done to bring
some consistency and stability to how that was done, while still allowing
backwards compatibility for migration and otherwise.
The default value we used for vsmt was set to the max of the host's
advertised default number of threads and the number of vthreads per vcore
in the guest. This was done to continue running without extra parameters
on older KVM versions which don't allow the VSMT value to be changed.
Unfortunately, even that smaller than before leakage of host configuration
into guest visible configuration still breaks things. Specifically a guest
with 4 (or less) vthread/vcore will get a different vsmt value when
running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host. That means the
vcpu ids don't line up so you can't migrate between them, though you should
be able to.
Long term we really want to make vsmt == smp_threads for sufficiently
new machine types. However, that means that qemu will then require a
sufficiently recent KVM (one which supports changing VSMT) - that's still
not widely enough deployed to be really comfortable to do.
In the meantime we need some default that will work as often as
possible. This patch changes that default to 8 in all circumstances.
This does change guest visible behaviour (including for existing
machine versions) for many cases - just not the most common/important
case.
Following is case by case justification for why this is still the least
worst option. Note that any of the old behaviours can still be duplicated
after this patch, it's just that it requires manual intervention by
setting the vsmt property on the command line.
KVM HV on POWER8 host:
This is the overwhelmingly common case in production setups, and is
unchanged by design. POWER8 hosts will advertise a default VSMT mode
of 8, and > 8 vthreads/vcore isn't permitted
KVM HV on POWER7 host:
Will break, but POWER7s allowing KVM were never released to the public.
KVM HV on POWER9 host:
Not yet released to the public, breaking this now will reduce other
breakage later.
KVM HV on PowerPC 970:
Will theoretically break it, but it was barely supported to begin with
and already required various user visible hacks to work. Also so old
that I just don't care.
TCG:
This is the nastiest one; it means migration of TCG guests (without
manual vsmt setting) will break. Since TCG is rarely used in production
I think this is worth it for the other benefits. It does also remove
one more barrier to TCG<->KVM migration which could be interesting for
debugging applications.
KVM PR:
As with TCG, this will break migration of existing configurations,
without adding extra manual vsmt options. As with TCG, it is rare in
production so I think the benefits outweigh breakages.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8904e5a75005fe579c28806003892d8ae4a27dfa) Signed-off-by: Greg Kurz <groug@kaod.org>
David Gibson [Tue, 16 Jan 2018 04:37:37 +0000 (15:37 +1100)]
spapr: Allow some cases where we can't set VSMT mode in the kernel
At present if we require a vsmt mode that's not equal to the kernel's
default, and the kernel doesn't let us change it (e.g. because it's an old
kernel without support) then we always fail.
But in fact we can cope with the kernel having a different vsmt as long as
a) it's >= the actual number of vthreads/vcore (so that guest threads
that are supposed to be on the same core act like it)
b) it's a submultiple of the requested vsmt mode (so that guest threads
spaced by the vsmt value will act like they're on different cores)
Allowing this case gives us a bit more freedom to adjust the vsmt behaviour
without breaking existing cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1f20f2e0ee61d91abff4e86ed1cda1b5244647d3) Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20180307154258.9313-1-kraxel@redhat.com
(cherry picked from commit 2ca5c43091324a68772dc348cdf157c63888c168) Signed-off-by: Greg Kurz <groug@kaod.org>
linzhecheng [Thu, 11 Jan 2018 13:27:24 +0000 (21:27 +0800)]
vga: check the validation of memory addr when draw text
Start a VM with qemu-kvm -enable-kvm -vnc :66 -smp 1 -m 1024 -hda
redhat_5.11.qcow2 -device pcnet -vga cirrus,
then use a VNC client to connect to the VM; executing the code below in the
guest OS will lead to a qemu crash:
#include <stdlib.h>
#include <time.h>
#include <sys/io.h>

int main(void)
{
    iopl(3);                          /* gain access to the I/O ports */
    srand(time(NULL));
    int a, b;
    while (1) {
        a = rand() % 0x100;           /* random value */
        b = 0x3c0 + (rand() % 0x20);  /* random VGA I/O port */
        outb(a, b);
    }
    return 0;
}
The above code writes the VGA registers randomly.
We can write VGA CRT controller registers index 0x0C or 0x0D
(which is the start address register) to modify the
display memory address of the upper left pixel
or character of the screen. The address may be out of the
range of VGA RAM. So we should validate the memory address
when reading or writing it to avoid a segfault.
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-id: 20180111132724.13744-1-linzhecheng@huawei.com Fixes: CVE-2018-5683 Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 191f59dc17396bb5a8da50f8c59b6e0a430711a4) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: track how much decoded data we consumed when doing SASL encoding
I attempted to fix a flaw with tracking how much data had actually been
processed when encoding with SASL. With that flaw, the VNC server could
mistakenly discard queued data that had not been sent.
The fix was not quite right though, because it merely decremented the
vs->output.offset value. This is effectively discarding data from the
end of the pending output buffer. We actually need to discard data from
the start of the pending output buffer. We also want to free memory that
is no longer required. The correct way to handle this is to use the
buffer_advance() helper method instead of directly manipulating the
offset value.
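A standalone toy illustration of the difference (this just mimics the semantics of QEMU's real buffer_advance() helper):

#include <stdio.h>
#include <string.h>

typedef struct { char data[64]; size_t offset; } Buffer;

/* drop len bytes from the front of the pending data, like buffer_advance() */
static void buffer_advance_like(Buffer *b, size_t len)
{
    memmove(b->data, b->data + len, b->offset - len);
    b->offset -= len;
}

int main(void)
{
    Buffer b = { .offset = 0 };

    memcpy(b.data, "SENTDATA", 8);
    b.offset = 8;                        /* 8 bytes were encoded and sent */
    memcpy(b.data + b.offset, "NEWDATA", 7);
    b.offset += 7;                       /* queued while the send was in flight */

    buffer_advance_like(&b, 8);          /* correct: keeps "NEWDATA" */
    /* The buggy code effectively did "b.offset -= 8;", which keeps "SENTDATA"
     * and silently drops the newly queued bytes at the end. */
    printf("pending: %.*s\n", (int)b.offset, b.data);
    return 0;
}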
Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 20180201155841.27509-1-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 627ebec208a8809818589e17f4fce55a59420ad2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: avoid sign extension using client width/height
Pixman returns a signed int for the image width/height, but the VNC
protocol only permits an unsigned int16. Effective framebuffer size
is determined by the guest, limited by the video RAM size, so the
dimensions are unlikely to exceed the range of an unsigned int16,
but this is not currently validated.
With the current use of 'int' for client width/height, the calculation
of offsets in vnc_update_throttle_offset() suffers from integer size
promotion and sign extension, causing coverity warnings
*** CID 1385147: Integer handling issues (SIGN_EXTENSION)
/ui/vnc.c: 979 in vnc_update_throttle_offset()
973 * than that the client would already suffering awful audio
974 * glitches, so dropping samples is no worse really).
975 */
976 static void vnc_update_throttle_offset(VncState *vs)
977 {
978 size_t offset =
>>> CID 1385147: Integer handling issues (SIGN_EXTENSION)
>>> Suspicious implicit sign extension:
"vs->client_pf.bytes_per_pixel" with type "unsigned char" (8 bits,
unsigned) is promoted in "vs->client_width * vs->client_height *
vs->client_pf.bytes_per_pixel" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"vs->client_width * vs->client_height * vs->client_pf.bytes_per_pixel"
is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.
979 vs->client_width * vs->client_height * vs->client_pf.bytes_per_pixel;
Change client_width / client_height to be a size_t to avoid sign
extension and integer promotion. Then validate that dimensions are in
range wrt the RFB protocol u16 limits.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20180118155254.17053-1-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 4c956bd81e2e16afd19d38d1fdeba6d9faa8a1ae) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
While the QIOChannel APIs for reading/writing data return ssize_t, with a negative
value indicating an error, the VNC code passes this return value through the
vnc_client_io_error() method. This detects the error condition, disconnects the
client and returns 0 to indicate error. Thus all the VNC helper methods should
return size_t (unsigned), and misleading comments which refer to the possibility
of negative return values need fixing.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-14-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 30b80fd5269257f55203b7072c505b4ebaab5115) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: place a hard cap on VNC server output buffer size
The previous patches fix problems with throttling of forced framebuffer updates
and audio data capture that would cause the QEMU output buffer size to grow
without bound. Those fixes are graceful in that once the client catches up with
reading data from the server, everything continues operating normally.
There is some data which the server sends to the client that is impractical to
throttle. Specifically there are various pseudo framebuffer update encodings to
inform the client of things like desktop resizes, pointer changes, audio
playback start/stop, LED state and so on. These generally only involve sending
a very small amount of data to the client, but a malicious guest might be able
to do things that trigger these changes at a very high rate. Throttling them is
not practical as missed or delayed events would cause broken behaviour for the
client.
This patch thus takes a more forceful approach of setting an absolute upper
bound on the amount of data we permit to be present in the output buffer at
any time. The previous patch set a threshold for throttling the output buffer
by allowing an amount of data equivalent to one complete framebuffer update and
one second's worth of audio data. On top of this it allowed for one further
forced framebuffer update to be queued.
To be conservative, we thus take that throttling threshold and multiply it by
5 to form an absolute upper bound. If this bound is hit during vnc_write() we
forcibly disconnect the client, refusing to queue further data. This limit is
high enough that it should never be hit unless a malicious client is trying to
exploit the server, or the network is completely saturated, preventing any
sending of data on the socket.
This completes the fix for CVE-2017-15124 started in the previous patches.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-12-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f887cf165db20f405cb8805c716bd363aaadf815) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: fix VNC client throttling when forced update is requested
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).
The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check is disabled if the client has requested
a forced update, because we want to send these as soon as possible.
As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates, e.g. play a YouTube video. Then repeatedly send full framebuffer update
requests, but never read data back from the server. This can easily make QEMU's
VNC server send buffer consume 100MB of RAM per second, until the OOM killer
starts reaping processes (hopefully the rogue QEMU process, but it might pick
others...).
To address this we make the throttling more intelligent, so we can throttle
full updates. When we get a forced update request, we keep track of exactly how
much data we put on the output buffer. We will not process a subsequent forced
update request until this data has been fully sent on the wire. We always allow
one forced update request to be in flight, regardless of what data is queued
for incremental updates or audio data. The slight complication is that we do
not initially know how much data an update will send, as this is done in the
background by the VNC job thread. So we must track the fact that the job thread
has an update pending, and not process any further updates until this job
has completed and put data on the output buffer.
This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but it's possible other processes can get taken out as
collateral damage.
This is a more general variant of the similar unbounded memory usage flaw in
the websockets server that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:
ui: fix VNC client throttling when audio capture is active
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).
The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check must be disabled if audio capture is
enabled, because when streaming audio the output buffer offset will rarely be
zero due to queued audio data, and so this would starve framebuffer updates.
As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates, e.g. play a YouTube video. Then enable audio capture, and simply never
read data back from the server. This can easily make QEMU's VNC server send
buffer consume 100MB of RAM per second, until the OOM killer starts reaping
processes (hopefully the rogue QEMU process, but it might pick others...).
To address this we make the throttling more intelligent, so we can throttle
when audio capture is active too. To determine how to throttle incremental
updates or audio data, we calculate a size threshold. Normally the threshold is
the approximate number of bytes associated with a single complete framebuffer
update. ie width * height * bytes per pixel. We'll send incremental updates
until we hit this threshold, at which point we'll stop sending updates until
data has been written to the wire, causing the output buffer offset to fall
back below the threshold.
If audio capture is enabled, we increase the size of the threshold to also
allow for up to 1 second's worth of audio data samples, i.e. nchannels * bytes
per sample * frequency. This allows the output buffer to have a mixture of
incremental framebuffer updates and audio data queued, but once the threshold
is exceeded, audio data will be dropped and incremental updates will be
throttled.
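As a standalone sketch of that size calculation (illustrative numbers only; the real logic lives in the VNC server code):

#include <stddef.h>
#include <stdio.h>

static size_t throttle_threshold(size_t width, size_t height,
                                 size_t bytes_per_pixel, int audio_enabled,
                                 size_t nchannels, size_t bytes_per_sample,
                                 size_t frequency)
{
    /* roughly one complete framebuffer update */
    size_t threshold = width * height * bytes_per_pixel;

    if (audio_enabled) {
        /* plus up to one second's worth of audio samples */
        threshold += nchannels * bytes_per_sample * frequency;
    }
    return threshold;
}

int main(void)
{
    printf("%zu bytes\n", throttle_threshold(1024, 768, 4, 1, 2, 2, 44100));
    return 0;
}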
This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but it's possible other processes can get taken out as
collateral damage.
This is a more general variant of the similar unbounded memory usage flaw in
the websockets server that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:
ui: refactor code for determining if an update should be sent to the client
The logic for determining if it is possible to send an update to the client
will become more complicated shortly, so pull it out into a separate method
for easier extension later.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-9-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0bad834228b9ee63e4239108d02dcb94568254d0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: correctly reset framebuffer update state after processing dirty regions
According to the RFB protocol, a client sends one or more framebuffer update
requests to the server. The server can reply with a single framebuffer update
response, that covers all previously received requests. Once the client has
read this update from the server, it may send further framebuffer update
requests to monitor future changes. The client is free to delay sending the
framebuffer update request if it needs to throttle the amount of data it is
reading from the server.
The QEMU VNC server, however, has never correctly handled the framebuffer
update requests. Once QEMU has received an update request, it will continue to
send client updates forever, even if the client hasn't asked for further
updates. This prevents the client from throttling back data it gets from the
server. This change fixes the flawed logic such that after a set of updates are
sent out, QEMU waits for a further update request before sending more data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-8-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 728a7ac95484a7ba5e624ccbac4c1326571576b0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: introduce enum to track VNC client framebuffer update request state
Currently the VNC server tracks whether a client has requested an incremental
or forced update with two boolean flags. There are only really 3 distinct
states to track, so create an enum to more accurately reflect permitted states.
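A sketch of such a three-state enum (names chosen for illustration; the actual definition lives in the VNC headers):

typedef enum {
    VNC_UPDATE_NONE,          /* client has not asked for an update        */
    VNC_UPDATE_INCREMENTAL,   /* client asked for an incremental update    */
    VNC_UPDATE_FORCE,         /* client asked for a full (forced) update   */
} VncUpdateState;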
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-7-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fef1bbadfb2c3027208eb3d14b43e1bdb51166ca) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: track how much decoded data we consumed when doing SASL encoding
When we encode data for writing with SASL, we encode the entire pending output
buffer. The subsequent write, however, may not be able to send the full encoded
data in one go, particularly with a slow network. So we delay setting the
output buffer offset back to zero until all the SASL encoded data is sent.
Between encoding the data and completing sending of the SASL encoded data,
however, more data might have been placed on the pending output buffer. So it
is not valid to set offset back to zero. Instead we must keep track of how much
data we consumed during encoding and subtract only that amount.
With the current bug we would be throwing away some pending data without having
sent it at all. By sheer luck this did not previously cause any serious problem
because appending data to the send buffer is always an atomic action, so we
only ever throw away complete RFB protocol messages. In the case of frame buffer
updates we'd catch up fairly quickly, so no obvious problem was visible.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-6-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8f61f1c5a6bc06438a1172efa80bc7606594fa07) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ui: avoid pointless VNC updates if framebuffer isn't dirty
The vnc_update_client() method checks the 'has_dirty' flag to see if there are
dirty regions that are pending to send to the client. Regardless of this flag,
if a forced update is requested, updates must be sent. For unknown reasons
though, the code also tries to send updates if audio capture is enabled. This
makes no sense as audio capture state does not impact framebuffer contents, so
this check is removed.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-5-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3541b08475d51bddf8aded36576a0ff5a547a978) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
A check for vs->disconnecting was added at the very start of the
vnc_update_client method. This means that the very next "if"
statement's check for !vs->disconnecting always evaluates true,
and is thus redundant. This in turn means the vs->disconnecting
check at the very end of the method never evaluates true, and
is thus unreachable code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-3-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c53df961617736f94731d94b62c2954c261d2bae) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Tue, 6 Feb 2018 11:23:30 +0000 (12:23 +0100)]
migration: incoming postcopy advise sanity checks
If postcopy-ram was set on the source but not on the destination,
migration doesn't occur; the destination prints an error and boots
the guest:
qemu-system-ppc64: Expected vmdescription section, but got 0
We end up with two running instances.
This behaviour was introduced in 2.11 by commit 58110f0acb1a "migration:
split common postcopy out of ram postcopy" to prepare ground for the
upcoming dirty bitmap postcopy support. It adds a new case where the
source may send an empty postcopy advise because dirty bitmap doesn't
need to check page sizes like RAM postcopy does.
If the source has enabled postcopy-ram, then it sends an advise with
the page size values. If the destination hasn't enabled postcopy-ram,
then loadvm_postcopy_handle_advise() leaves the page size values on
the stream and returns. This confuses qemu_loadvm_state() later on
and causes the destination to start execution.
As discussed several times, postcopy-ram should be enabled on both sides
to be functional. This patch changes the destination to perform some
extra checks on the advise length to ensure this is the case. Otherwise
an error is returned and migration is aborted.
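The kind of length check added can be sketched as follows (function name, messages and error codes are assumptions based on the description above):

    /* Hedged sketch: the advise payload carries page sizes only when RAM
     * postcopy is enabled, so the length tells us whether both sides agree. */
    static int check_postcopy_advise_len(uint16_t len, bool postcopy_ram_enabled)
    {
        if (len == 0 && postcopy_ram_enabled) {
            error_report("postcopy-ram enabled here but empty advise received");
            return -EINVAL;
        }
        if (len != 0 && !postcopy_ram_enabled) {
            error_report("postcopy-ram disabled here but advise carries page sizes");
            return -EINVAL;
        }
        return 0;
    }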
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <151791621042.19120.3103118434734245776.stgit@bahia> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 875fcd013ab68c64802998b22f54f0184479d21b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32
MAX_VM_CMD_PACKAGED_SIZE is a constant used in qemu_savevm_send_packaged
and loadvm_handle_cmd_packaged to determine whether a package is too
big to be sent or received. qemu_savevm_send_packaged is called inside
postcopy_start (migration/migration.c) to send the MigrationState
in a single blob to the destination, using the MIG_CMD_PACKAGED subcommand,
which the destination will read back using loadvm_handle_cmd_packaged. If the blob is
larger than MAX_VM_CMD_PACKAGED_SIZE, an error is thrown and the postcopy
migration is aborted. Both MAX_VM_CMD_PACKAGED_SIZE and MIG_CMD_PACKAGED
were introduced by commit 11cf1d984b ("MIG_CMD_PACKAGED: Send a packaged
chunk ..."). The constant has its original value of 1ul << 24 (16MB).
The current MAX_VM_CMD_PACKAGED_SIZE value is not enough to support postcopy
migration of bigger pseries guests. The blob size for a postcopy migration of
a pseries guest with the following setup:
goes to around 12MB. Bumping the RAM to 128G makes the blob size go to 20MB.
With 256G the blob goes to 37MB - more than twice the current maximum size.
At this moment the pseries machine can handle guests with up to 1TB of RAM,
making the postcopy blob approximately 128MB in size.
Following the discussions made in [1], there is a need to understand what
devices are aggressively consuming the blob in that manner and see if that
can be mitigated. Until then, we can set MAX_VM_CMD_PACKAGED_SIZE to the
maximum value allowed. Since the size is a 32 bit int variable, we can set
it to 1ul << 32, giving a maximum blob size of 4G, which is enough to support
postcopy migration of 32TB RAM guests given the above constraints.
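For reference, the limit and the way it is enforced look roughly like this (a sketch; only the constant's new value is taken from the description above, the helper name is hypothetical):

    /* Hedged sketch of the guard used on both the send and receive side. */
    #define MAX_VM_CMD_PACKAGED_SIZE (1ul << 32)   /* previously 1ul << 24 (16MB) */

    static bool vm_cmd_package_size_ok(size_t len)
    {
        if (len > MAX_VM_CMD_PACKAGED_SIZE) {
            error_report("Unreasonably large packaged state: %zu", len);
            return false;
        }
        return true;
    }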
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit ee555cdf4d495ddd83633406e3099c5d1ef22e0a) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
migration: Recover block devices if failure in device state
In e91d895 I added the new pause-before-switchover mechanism
to allow migration completion to be delayed; this changes the
last state prior to completion to MIGRATE_STATUS_DEVICE rather
than MIGRATE_STATUS_ACTIVE.
Fix the failure path in migration_completion to recover the block
devices if it fails in MIGRATE_STATUS_DEVICE, not just in
MIGRATE_STATUS_ACTIVE as it did previously.
This corresponds to rh bz:
https://bugzilla.redhat.com/show_bug.cgi?id=1538494
whose symptom is an occasional source crash on a failed migration.
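The fix amounts to widening the state check on the failure path, roughly as sketched below (enum names as in migration/migration.h; surrounding error handling omitted):

    /* Hedged sketch of the failure path in migration_completion(): also
     * re-activate block devices when we fail in the DEVICE state, not only
     * in the ACTIVE state. */
    if (s->state == MIGRATION_STATUS_ACTIVE ||
        s->state == MIGRATION_STATUS_DEVICE) {
        /* hand the block devices back to the source so it can keep running */
        bdrv_invalidate_cache_all(&local_err);
    }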
Fixes: e91d8951d59d483f085f Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 6039dd5b1c45d76403b9dcadd2afd7efd8f42330) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Ross Lagerwall [Wed, 1 Nov 2017 14:25:23 +0000 (14:25 +0000)]
migration: Don't leak IO channels
Since qemu_fopen_channel_{in,out}put take references on the underlying
IO channels, make sure to release our references to them.
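The pattern is the usual reference-count one; a minimal sketch assuming the caller created the channel itself:

    /* The QEMUFile wrapper holds its own reference to the channel, so the
     * caller drops the reference it took when creating ioc. */
    QEMUFile *f = qemu_fopen_channel_input(ioc);
    object_unref(OBJECT(ioc));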
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-Id: <20171101142526.1006-2-ross.lagerwall@citrix.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 032b79f7173051e7f8742a43d106c7fc526856f9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
target/ppc/spapr_caps: Add support for tristate spapr_capabilities
spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.
Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.
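The three levels map naturally onto small integer constants, for example (values as described above; the macro names are illustrative):

    /* Illustrative constants for the tristate capability levels. */
    #define SPAPR_CAP_BROKEN      0x00  /* feature unavailable / unsafe */
    #define SPAPR_CAP_WORKAROUND  0x01  /* available via a workaround */
    #define SPAPR_CAP_FIXED       0x02  /* available natively */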
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 6898aed77f4636c3e77af9c12631f583f22cb5db) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.
Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and behaviour doublewords respectively.
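The four quantities returned by the ioctl can be pictured as a struct like the sketch below (field names taken from the description above; the actual kernel uapi names and types, e.g. __u64, may differ):

    /* Hedged sketch of the data returned by KVM_PPC_GET_CPU_CHAR. */
    struct ppc_cpu_char_sketch {
        uint64_t character;  /* available characteristics of the cpu */
        uint64_t behaviour;  /* behaviour of the cpu */
        uint64_t c_mask;     /* which character bits are known/valid */
        uint64_t b_mask;     /* which behaviour bits are known/valid */
    };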
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
patch version] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 8acc2ae5e91681ceda3ff4cf946ebf163f6012e9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
target/ppc/spapr_caps: Add macro to generate spapr_caps migration vmstate
The vmstate description and the associated "needed" function for migration
of spapr_caps are the same for each cap, with only the name of the cap
substituted. As such, introduce a macro to allow for easier generation of
these.
Convert the three existing spapr_caps (htm, vsx, and dfp) to use this
macro.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1f63ebaa91f73f469c8f107dbd266cabdbea3a40) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
and use them in a couple of obvious places. Other macros will be used
in the model of the XIVE interrupt controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2a83f9976efa9a85e8ceb9d1035a68f25c321334) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Wed, 17 Jan 2018 09:20:42 +0000 (10:20 +0100)]
spapr: fix device tree properties when using compatibility mode
Commit 51f84465dd98 changed the compatibility mode setting logic:
- machine reset only sets compatibility mode for the boot CPU
- compatibility mode is set for other CPUs when they are put online
by the guest with the "start-cpu" RTAS call
This causes a regression for machines started with max-cpu-compat:
the device tree nodes related to secondary CPU cores contain wrong
"cpu-version" and "ibm,pa-features" values, as shown below.
Guest started on a POWER8 host with:
-smp cores=2 -machine pseries,max-cpu-compat=compat7
The second core is advertised in raw POWER8 mode. This happens because
CAS assumes that all CPUs have the same compatibility mode. Since the
boot CPU already has the requested compatibility mode, the CAS code
does not set it for the secondary one, and exposes the bogus device
tree properties in the CAS response to the guest.
A similar situation is observed when hot-plugging a CPU core. The
related device tree properties are generated and exposed to the guest
with the "ibm,configure-connector" RTAS call before "start-cpu" is called.
The CPU core is advertised to the guest in raw mode as well.
In both cases, it boils down to the fact that "start-cpu" happens too
late. This can be fixed globally by propagating the compatibility mode
of the boot CPU to the other CPUs during reset. For this to work, the
compatibility mode of the boot CPU must be set before the machine code
actually resets all CPUs.
Setting the compatibility mode in "start-cpu" is no longer needed,
so the code is dropped.
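Conceptually, the reset path now does something like the sketch below; the helper wrapper, the field names and the exact placement in the real patch are assumptions:

    /* Hedged sketch: at machine reset, propagate the boot CPU's negotiated
     * compat PVR to every other CPU before the CPUs themselves are reset. */
    static void spapr_propagate_compat_sketch(void)
    {
        PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
        CPUState *cs;

        CPU_FOREACH(cs) {
            PowerPCCPU *cpu = POWERPC_CPU(cs);
            if (cpu != first_ppc_cpu) {
                ppc_set_compat(cpu, first_ppc_cpu->compat_pvr, &error_fatal);
            }
        }
    }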
Fixes: 51f84465dd98 Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 9012a53f067a78022947e18050b145c34a3dc599)
Conflicts:
hw/ppc/spapr_cpu_core.c
hw/ppc/spapr_rtas.c
* drop context dep on d6322252b32 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ppc: Change Power9 compat table to support at most 8 threads/core
Increases the max smt mode to 8 for Power9. That's because KVM supports
smt emulation on this platform, so QEMU should allow users to use it as
well.
Today if we try to pass -smp ...,threads=8, QEMU will silently truncate
it to smt4 mode and may cause a crash if we try to perform a cpu
hotplug.
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
[dwg: Added an explanatory comment] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 03ee51d3548f5f553a3089f466483c1c6d5c666b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representation
Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.
Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explicitly set on the command line, with the assumption that
otherwise the default will match.
On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long as this remains the case, the migration is considered
compatible and allowed to continue.
This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existing cap-htm, cap-vsx and
cap-dfp capabilities to this new format.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4e5fe3688e23d61b45cc549ff1322aff8f50ef45)
Conflicts:
include/hw/ppc/spapr.h
*drop context dep on 60c6823b9bc Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Mon, 11 Dec 2017 06:34:30 +0000 (17:34 +1100)]
spapr: Handle Decimal Floating Point (DFP) as an optional capability
Decimal Floating Point has been available on POWER7 and later (server)
cpus. However, it can be disabled on the hypervisor, meaning that it's
not available to guests.
We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host. That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.
This patch handles it by treating it as an optional capability for the
pseries machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 2d1fb9bc8e6e78931d8e1bfeb0ed7a4d223b0480) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 7 Dec 2017 06:08:47 +0000 (17:08 +1100)]
spapr: Handle VMX/VSX presence as an spapr capability flag
We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.
This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities. We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.
Rework the advertisement of VSX and VMX based on a new VSX capability. We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.
NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 2938664286499c0c30d6e455a7e2e5d3e6c3f63d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Mon, 11 Dec 2017 06:41:34 +0000 (17:41 +1100)]
target/ppc: Clean up probing of VMX, VSX and DFP availability on KVM
When constructing the "host" cpu class we modify whether the VMX and VSX
vector extensions and DFP (Decimal Floating Point) are available
based on whether KVM can support those instructions. This can depend on
policy in the host kernel as well as on the actual host cpu capabilities.
However, the way we probe for this is not very nice: we explicitly check
the host's device tree. That works in practice, but it's not really
correct, since the device tree is a property of the host kernel's platform
which we don't really know about. We get away with it because the only
modern POWER platforms happen to encode VMX, VSX and DFP availability in
the device tree in the same way.
Arguably we should have an explicit KVM capability for this, but we haven't
needed one so far. Barring specific KVM policies which don't yet exist,
each of these instruction classes will be available in the guest if and
only if they're available in the qemu userspace process. We can determine
that from the ELF AUX vector we're supplied with.
Once reworked like this, there are no more callers for kvmppc_get_vmx() and
kvmppc_get_dfp() so remove them.
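The probe reduces to reading hardware-capability bits from the AUX vector, roughly as below (HWCAP bit names from the Linux powerpc ABI; this is a sketch rather than the exact QEMU helper):

    #include <stdbool.h>
    #include <sys/auxv.h>        /* getauxval(), AT_HWCAP */
    #include <asm/cputable.h>    /* PPC_FEATURE_* bits, powerpc hosts only */

    /* If qemu's own userspace has the feature, the guest can have it too. */
    static bool host_has_vsx(void)
    {
        return getauxval(AT_HWCAP) & PPC_FEATURE_HAS_VSX;
    }

    static bool host_has_dfp(void)
    {
        return getauxval(AT_HWCAP) & PPC_FEATURE_HAS_DFP;
    }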
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 3f2ca480eb872b4946baf77f756236b637a5b15a) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Mon, 11 Dec 2017 04:09:37 +0000 (15:09 +1100)]
spapr: Validate capabilities on migration
Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration. Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.
This adds code to migrate the capabilities flags. These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence. However,
they are checked against the destination capabilities.
If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour. If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.
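The comparison on the destination boils down to something like the sketch below (the array, name table and count are assumptions for illustration):

    /* Hedged sketch: fail if the source used a higher capability level than
     * the destination provides; only warn in the opposite direction. */
    static int spapr_caps_check_sketch(const uint8_t *src_caps,
                                       const uint8_t *dst_caps,
                                       const char *const *cap_names,
                                       int num_caps)
    {
        for (int i = 0; i < num_caps; i++) {
            if (src_caps[i] > dst_caps[i]) {
                error_report("cap-%s higher level (%d) on source than on "
                             "destination (%d)", cap_names[i],
                             src_caps[i], dst_caps[i]);
                return -EINVAL;            /* fail the migration */
            } else if (src_caps[i] < dst_caps[i]) {
                warn_report("cap-%s lower level (%d) on source than on "
                            "destination (%d)", cap_names[i],
                            src_caps[i], dst_caps[i]);
            }
        }
        return 0;
    }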
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit be85537d654565e35e359a74b46fc08b7956525c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Mon, 11 Dec 2017 02:10:44 +0000 (13:10 +1100)]
spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
This adds an spapr capability bit for Hardware Transactional Memory. It is
enabled by default for pseries-2.11 and earlier machine types with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it). However it is disabled by default for the latest pseries-2.12
machine type.
This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:
* This way running with -M pseries,accel=tcg will start with whatever cpu
and will provide the same guest visible model as with accel=kvm.
- More specifically, this means existing make check tests don't have
to be modified to use cap-htm=off in order to run with TCG
* We hope to add a new "HTM without suspend" feature in the not too
distant future which could work on both POWER8 and POWER9 cpus, and
could be enabled by default.
* Best guesses suggest that future POWER cpus may well only support the
HTM-without-suspend model, not the (frankly, horribly overcomplicated)
POWER8 style HTM with suspend.
* Anecdotal evidence suggests that problems caused by HTM being enabled when
it wasn't wanted are more common than problems caused by it being missing
when it was.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ee76a09fc72cfbfab2bb5529320ef7e460adffd8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 7 Dec 2017 23:35:35 +0000 (10:35 +1100)]
spapr: Capabilities infrastructure
Because PAPR is a paravirtual environment, access to certain CPU (or other)
facilities can be blocked by the hypervisor. PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.
In some places we automatically determine whether to make a feature
available based on whether our host can support it; in most cases this is
based on limitations in the available KVM implementation.
Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generally reducing reproducibility, it means that a migration
between different host environments can easily go bad.
We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability. But it's still not right, and some limitations on
POWER9 are going to make it more of an issue in future.
This introduces an infrastructure for defining "sPAPR capabilities". These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overridden with machine properties.
The intention is that at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations). If not we simply fail, rather than silently modifying the
advertised featureset to the guest.
This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 33face6b8981add8eba1f7cdaf4cf6cede415d2e)
Conflicts:
include/hw/ppc/spapr.h
*drop context dep on 60c6823b9bc Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Mon, 13 Nov 2017 05:50:40 +0000 (16:50 +1100)]
spapr: Add pseries-2.12 machine type
While we're at it, fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2b6154120cbd7f5514cefd3c6084d39922d26d88) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Laurent Vivier [Thu, 14 Dec 2017 18:09:48 +0000 (19:09 +0100)]
spapr: don't initialize PATB entry if max-cpu-compat < power9
If KVM is enabled and the KVM MMU radix capability is available,
the partition table entry (patb_entry) for the radix mode is
initialized by default in ppc_spapr_reset().
This is a problem if we want to migrate the guest to a POWER8 host
while the guest kernel has not yet started and set the value to the
one expected for a POWER8 CPU.
The "-machine max-cpu-compat=power8" should allow to migrate
a POWER9 KVM host to a POWER8 KVM host, but because patb_entry
is set, the destination QEMU tries to enable radix mode on the
POWER8 host. This fails and cancels the migration:
Process table config unsupported by the host
error while loading state for instance 0x0 of device 'spapr'
load of migration failed: Invalid argument
This patch doesn't set the PATB entry if the user provides
a CPU compatibility mode that doesn't support radix mode.
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1481fe5fcfeb7fcf3c1ebb9d8c0432e3e0188ccf) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Tue, 30 Jan 2018 13:17:19 +0000 (13:17 +0000)]
linux-user/signal.c: Rename MC_* defines
The SPARC code in linux-user/signal.c defines a set of
MC_* constants. On some SPARC hosts these are also defined
by sys/ucontext.h, resulting in build failures:
In file included from /usr/include/signal.h:302:0,
from include/qemu/osdep.h:86,
from linux-user/signal.c:19:
/usr/include/sparc64-linux-gnu/sys/ucontext.h:59:0: note: this is the location of the previous definition
# define MC_NGREG __MC_NGREG
Rename all these constants to SPARC_MC_* to avoid the clash.
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1517318239-15764-1-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 8ebb314b957403c1c9a3f1cf995f73c6ae9d5d10) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Fri, 26 Jan 2018 22:25:24 +0000 (23:25 +0100)]
spapr_pci: fix MSI/MSIX selection
In various places we don't correctly check if the device supports MSI or
MSI-X. This can cause devices to be advertised with MSI support, even
if they only support MSI-X (like virtio-pci-* devices for example):
Worse, this can also cause the "ibm,change-msi" RTAS call to corrupt the
PCI status and cause migration to fail:
qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x6
read: 0 device: 10 cmask: 10 wmask: 0 w1cmask:0
^^
PCI_STATUS_CAP_LIST bit which is assumed to be constant
This patch changes spapr_populate_pci_child_dt() to properly check for
MSI support using msi_present(): this ensures that PCIDevice::msi_cap
was set by msi_init() and that msi_nr_vectors_allocated() will look at
the right place in the config space.
Checking PCIDevice::msix_entries_nr is enough for MSI-X but let's add
a call to msix_present() there as well for consistency.
It also changes rtas_ibm_change_msi() to select the appropriate MSI
type in Function 1 instead of always selecting plain MSI. This new
behaviour is compliant with LoPAPR 1.1, as described in "Table 71.
ibm,change-msi Argument Call Buffer":
Function 1: If Number Outputs is equal to 3, request to set to a new
number of MSIs (including set to 0).
If the “ibm,change-msix-capable” property exists and Number
Outputs is equal to 4, request is to set to a new number of
MSI or MSI-X (platform choice) interrupts (including set to
0).
Since MSI is the platform default (LoPAPR 6.2.3 MSI Option), let's
check for MSI support first.
And finally, it checks the input parameters are valid, as described in
LoPAPR 1.1 "R1–7.3.10.5.1–3":
For the MSI option: The platform must return a Status of -3 (Parameter
error) from ibm,change-msi, with no change in interrupt assignments if
the PCI configuration address does not support MSI and Function 3 was
requested (that is, the “ibm,req#msi” property must exist for the PCI
configuration address in order to use Function 3), or does not support
MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property
must exist for the PCI configuration address in order to use Function 4),
or if neither MSIs nor MSI-Xs are supported and Function 1 is requested.
This ensures that the ret_intr_type variable contains a valid MSI type
for this device, and that spapr_msi_setmsg() won't corrupt the PCI status.
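The device-tree side of the fix can be sketched as below (property names follow the LoPAPR text quoted above; dev is the PCIDevice being described and fdt/offset the node being populated; this is a simplified sketch, not the full function):

    /* Only advertise what the device actually supports. */
    if (msi_present(dev)) {
        uint32_t max_msi = msi_nr_vectors_allocated(dev);
        if (max_msi) {
            _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi", max_msi));
        }
    }
    if (msix_present(dev)) {
        uint32_t max_msix = dev->msix_entries_nr;
        if (max_msix) {
            _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x", max_msix));
        }
    }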
Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 9cbe305b60cc49cfcd134765b85c28be95b1b57d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Fam Zheng [Wed, 17 Jan 2018 00:52:22 +0000 (08:52 +0800)]
usb-storage: Fix share-rw option parsing
Because usb-storage creates an internal scsi device, we should propagate
options. We already do so for bootindex etc, but failed to take care of
share-rw. Fix it in the obvious way: add a new parameter to
scsi_bus_legacy_add_drive and pass in s->conf.share_rw.
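In other words, the creation call gains one more propagated option, roughly as sketched here (the parameter order and surrounding variable names are assumptions):

    /* Hedged sketch: usb-storage forwards its share-rw setting when creating
     * the internal SCSI device. */
    scsidev = scsi_bus_legacy_add_drive(&s->bus, blk, 0, !!s->removable,
                                        s->conf.bootindex, s->conf.share_rw,
                                        dev->serial, &err);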
Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-id: 20180117005222.4781-1-famz@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 395b95395934785ca86baafd314d0c31b307d16d)
Conflicts:
hw/usb/dev-storage.c
* dropped context dep on ceff3e1f01e Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Fam Zheng [Tue, 26 Dec 2017 06:53:00 +0000 (14:53 +0800)]
osdep: Retry SETLK upon EINTR
We could hit a lock failure if a signal makes fcntl return
-1 with errno set to EINTR. In this case we should retry.
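The retry is the classic EINTR loop around fcntl(); a minimal sketch (the wrapper name is hypothetical):

    #include <errno.h>
    #include <fcntl.h>

    /* Retry F_SETLK when interrupted by a signal. */
    static int lock_fcntl(int fd, struct flock *fl)
    {
        int ret;
        do {
            ret = fcntl(fd, F_SETLK, fl);
        } while (ret == -1 && errno == EINTR);
        return ret;
    }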
Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit f86428a1f4f91a460ed585682af70d3e8c31dc06) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>