Eric Blake [Fri, 15 Jul 2016 23:23:07 +0000 (17:23 -0600)]
nbd: Convert to byte-based interface
The NBD protocol doesn't have any notion of sectors, so it is
a fairly easy conversion to use byte-based read and write.
Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:06 +0000 (17:23 -0600)]
block: Kill .bdrv_co_discard()
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
no longer need to worry about the sector-based version. We can
also relax our minimum alignment to 1 for drivers that support it.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:05 +0000 (17:23 -0600)]
sheepdog: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-17-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:04 +0000 (17:23 -0600)]
raw_bsd: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-16-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:03 +0000 (17:23 -0600)]
qcow2: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-15-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:02 +0000 (17:23 -0600)]
nbd: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
While at it, call directly into nbd-client.c instead of having
a pointless trivial wrapper in nbd.c.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-14-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:01 +0000 (17:23 -0600)]
iscsi: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
Unlike write_zeroes, where we can be handed unaligned requests
and must fail gracefully with -ENOTSUP for a fallback, we are
guaranteed that discard requests are always aligned because the
block layer already ignored unaligned head/tail.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-13-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:23:00 +0000 (17:23 -0600)]
gluster: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-12-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:59 +0000 (17:22 -0600)]
blkreplay: Switch .bdrv_co_discard() to byte-based
Another step towards killing off sector-based block APIs.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-11-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:58 +0000 (17:22 -0600)]
block: Add .bdrv_co_pdiscard() driver callback
There's enough drivers with a sector-based callback that it will
be easier to switch one at a time. This patch adds a byte-based
callback, and then after all drivers are swapped, we'll drop the
sector-based callback.
[checkpatch doesn't like the space after coroutine_fn in
block_int.h, but it's consistent with the rest of the file]
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:57 +0000 (17:22 -0600)]
block: Convert .bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere. Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard(). Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:56 +0000 (17:22 -0600)]
rbd: Switch rbd_start_aio() to byte-based
The internal function converts to byte-based before calling into
RBD code; hoist the conversion to the callers so that callers
can then be switched to byte-based themselves.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-8-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:55 +0000 (17:22 -0600)]
raw-posix: Switch paio_submit() to byte-based
The only remaining uses of paio_submit() were flush (with no
offset or count) and discard (which we are switching to byte-based);
furthermore, the similarly named paio_submit_co() is already
byte-based.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-7-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:54 +0000 (17:22 -0600)]
block: Convert BB interface to byte-based discards
Change sector-based blk_discard(), blk_co_discard(), and
blk_aio_discard() to instead be byte-based blk_pdiscard(),
blk_co_pdiscard(), and blk_aio_pdiscard(). NBD gets a lot
simpler now that ignoring the unaligned portion of a
byte-based discard request is handled under the hood by
the block layer.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:53 +0000 (17:22 -0600)]
block: Convert bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere. Replace
the sector-based bdrv_aio_discard() with a new byte-based
bdrv_aio_pdiscard(), which silently ignores any unaligned head
or tail. Driver callbacks will be converted in followup patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-5-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:52 +0000 (17:22 -0600)]
block: Switch BlockRequest to byte-based
BlockRequest is the internal struct used by bdrv_aio_*. At the
moment, all such calls were sector-based, but we will eventually
convert to byte-based; start by changing the internal variables
to be byte-based. No change to behavior, although the read and
write code can now go byte-based through more of the stack.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-4-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:51 +0000 (17:22 -0600)]
block: Convert bdrv_discard() to byte-based
Another step towards byte-based interfaces everywhere. Replace
the sector-based bdrv_discard() with a new byte-based
bdrv_pdiscard(), which silently ignores any unaligned head
or tail.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 23:22:50 +0000 (17:22 -0600)]
block: Convert bdrv_co_discard() to byte-based
Another step towards byte-based interfaces everywhere. Replace
the sector-based bdrv_co_discard() with a new byte-based
bdrv_co_pdiscard(), which silently ignores any unaligned head
or tail. Driver callbacks will be converted in followup patches.
By calculating the alignment outside of the loop, and clamping
the max discard to an aligned value, we can simplify the actions
done within the loop.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-2-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 18:32:04 +0000 (12:32 -0600)]
iscsi: Rely on block layer to break up large requests
Now that the block layer honors max_request, we don't need to
bother with an EINVAL on overlarge requests, but can instead
assert that requests are well-behaved.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-7-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 18:32:03 +0000 (12:32 -0600)]
nbd: Drop unused offset parameter
Now that NBD relies on the block layer to fragment things, we no
longer need to track an offset argument for which fragment of
a request we are actually servicing.
While at it, use true and false instead of 0 and 1 for a bool
parameter.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 18:32:02 +0000 (12:32 -0600)]
nbd: Rely on block layer to break up large requests
Now that the block layer will honor max_transfer, we can simplify
our code to rely on that guarantee.
The readv code can call directly into nbd-client, just as the
writev code has done since commit 52a4650.
Interestingly enough, while qemu-io 'w 0 40m' splits into a 32M
and 8M transaction, 'w -z 0 40m' splits into two 16M and an 8M,
because the block layer caps the bounce buffer for writing zeroes
at 16M. When we later introduce support for NBD_CMD_WRITE_ZEROES,
we can get a full 32M zero write (or larger, if the client and
server negotiate that write zeroes can use a larger size than
ordinary writes).
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-5-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 18:32:01 +0000 (12:32 -0600)]
block: Fragment writes to max transfer length
Drivers should be able to rely on the block layer honoring the
max transfer length, rather than needing to return -EINVAL
(iscsi) or manually fragment things (nbd). We already fragment
write zeroes at the block layer; this patch adds the fragmentation
for normal writes, after requests have been aligned (fragmenting
before alignment would lead to multiple unaligned requests, rather
than just the head and tail).
When fragmenting a large request where FUA was requested, but
where we know that FUA is implemented by flushing all requests
rather than the given request, then we can still get by with
only one flush. Note, however, that we need a followup patch
to the raw format driver to avoid a regression in the number of
flushes actually issued.
The return value was previously nebulous on success (sometimes
zero, sometimes the length written); since we never have a short
write, and since fragmenting may store yet another positive
value in 'ret', change the function to always return 0 on success,
matching what we do in bdrv_aligned_preadv().
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-4-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 18:32:00 +0000 (12:32 -0600)]
raw_bsd: Don't advertise flags not supported by protocol layer
The raw format layer supports all flags via passthrough - but
it only makes sense to pass through flags that the lower layer
actually supports.
The next patch gives stronger reasoning for why this is correct.
At the moment, the raw format layer ignores the max_transfer
limit of its protocol layer, and an attempt to do the qemu-io
'w -f 0 40m' to an NBD server that lacks FUA will pass the entire
40m request to the NBD driver, which then fragments the request
itself into a 32m write, 8m write, and flush. But once the block
layer starts honoring limits and fragmenting packets, the raw
driver will hand the NBD driver two separate requests; if both
requests have BDRV_REQ_FUA set, then this would result in a 32m
write, flush, 8m write, and second flush. By having the raw
layer no longer advertise FUA support when the protocol layer
lacks it, we are back to a single flush at the block layer for
the overall 40m request.
Note that 'w -f -z 0 40m' does not currently exhibit the same
problem, because there, the fragmentation does not occur until
at the NBD layer (the raw layer has .bdrv_co_pwrite_zeroes, and
the NBD layer doesn't advertise max_pwrite_zeroes to constrain
things at the raw layer) - but the problem is latent and we
would again have too many flushes without this patch once the
NBD layer implements support for the new NBD_CMD_WRITE_ZEROES
command, if it sets max_pwrite_zeroes to the same 32m limit as
recommended by the NBD protocol.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468607524-19021-3-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 15 Jul 2016 18:31:59 +0000 (12:31 -0600)]
block: Fragment reads to max transfer length
Drivers should be able to rely on the block layer honoring the
max transfer length, rather than needing to return -EINVAL
(iscsi) or manually fragment things (nbd). This patch adds
the fragmentation in the block layer, after requests have been
aligned (fragmenting before alignment would lead to multiple
unaligned requests, rather than just the head and tail).
The return value was previously nebulous on success on whether
it was zero or the length read; and fragmenting may introduce
yet other non-zero values if we use the last length read. But
as at least some callers are sloppy and expect only zero on
success, it is easiest to just guarantee 0.
[Fix uninitialized ret local variable in bdrv_aligned_preadv().
--Stefan]
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468607524-19021-2-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* remotes/pmaydell/tags/pull-target-arm-20160719:
arm_gicv3: Add assert()s to tell Coverity that offsets are aligned
target-arm: Fix unreachable code in gicv3_class_name()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/riku/tags/pull-linux-user-20160719-2:
linux-user: AArch64 has sync_file_range, not sync_file_range2
linux-user: Fix type for SIOCATMARK ioctl
linux-user: define missing sparc syscalls
linux-user: Fix terminal control ioctls
linux-user: Add some new blk ioctls
linux-user: Handle short lengths in host_to_target_sockaddr()
linux-user: Forget about synchronous signal once it is delivered
linux-user: Correct type for LOOP_GET_STATUS{,64} ioctls
linux-user: Correct type for BLKSSZGET
linux-user: Add loop control ioctls
linux-user: Check sigsetsize argument to syscalls
linux-user: add nested netlink types
linux-user: convert sockaddr_ll from host to target
linux-user: add fd_trans helper in do_recvfrom()
linux-user: fix netlink memory corruption
linux-user: fd_trans_*_data() returns the length
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 11 Jul 2016 18:22:52 +0000 (19:22 +0100)]
arm_gicv3: Add assert()s to tell Coverity that offsets are aligned
Coverity complains that the GICR_IPRIORITYR case in gicv3_readl()
can overflow an array, because it doesn't know that the offsets
passed to that function must be word aligned. Add some assert()s
which hopefully tell Coverity that this isn't possible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1468261372-17508-1-git-send-email-peter.maydell@linaro.org
Peter Maydell [Mon, 11 Jul 2016 18:09:12 +0000 (19:09 +0100)]
target-arm: Fix unreachable code in gicv3_class_name()
Coverity complains that the exit() in gicv3_class_name()
can be unreachable, because if TARGET_AARCH64 is defined
then all code paths return before reaching it. Move the
exit() up to the error_report() that it belongs with.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1468260552-8400-1-git-send-email-peter.maydell@linaro.org
Peter Maydell [Tue, 19 Jul 2016 14:04:36 +0000 (15:04 +0100)]
disas: Fix ATTRIBUTE_UNUSED define clash with ALSA headers
disas/bfd.h defines ATTRIBUTE_UNUSED, but unfortunately the
ALSA system headers also define this macro, which means that
you can get a compilation failure if building with ALSA and
any files happen to include the alsa headers before bfd.h
rather than the other way around.
This is unfortunate namespace pollution by the ALSA headers but
we can work around it. Add an #ifndef guard to bfd.h and remove
the unnecessary extra definition in disas/arm.c to fix this.
Reported-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468937076-21503-1-git-send-email-peter.maydell@linaro.org
Peter Maydell [Tue, 19 Jul 2016 14:08:05 +0000 (15:08 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* two old patches from prospective GSoC students
* i386 -kernel device tree support
* Coverity fix
* memory usage improvement from Peter
* checkpatch fix
* g_path_get_dirname cleanup
* caching of block status for iSCSI
* remotes/bonzini/tags/for-upstream:
target-i386: Remove redundant HF_SOFTMMU_MASK
block/iscsi: allow caching of the allocation map
block/iscsi: fix rounding in iscsi_allocationmap_set
Move README to markdown
cpu-exec: Move down some declarations in cpu_exec()
exec: avoid realloc in phys_map_node_reserve
checkpatch: consider git extended headers valid patches
megasas: remove useless check for cmd->frame
compiler: never omit assertions if using a static analysis tool
hw/i386: add device tree support
Changed malloc to g_malloc, free to g_free in bsd-user/qemu.h
use g_path_get_dirname instead of dirname
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 15 Jul 2016 16:28:06 +0000 (17:28 +0100)]
linux-user: AArch64 has sync_file_range, not sync_file_range2
The AArch64 Linux ABI syscall 84 is sync_file_range, not
sync_file_range2 (in the kernel it uses the asm-generic
headers and does not define __ARCH_WANT_SYNC_FILE_RANGE2).
Update our TARGET_NR_* definitions accordingly.
This fixes the sync_file_range syscall which otherwise
gets its arguments in the wrong order.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Fri, 15 Jul 2016 11:09:31 +0000 (12:09 +0100)]
linux-user: Fix type for SIOCATMARK ioctl
The SIOCATMARK ioctl takes an argument which should be a
pointer to an integer where the kernel will write the result.
We were incorrectly declaring it as TYPE_NULL which would mean
it would always fail (with EFAULT) when it should succeed.
Correct the type.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Timothy Pearson [Sun, 19 Jun 2016 00:15:35 +0000 (19:15 -0500)]
linux-user: Fix terminal control ioctls
TIOCGPTN and related terminal control ioctls were not converted to the guest ioctl format on x86_64 targets. Convert these ioctls to enable terminal functionality on x86_64 guests.
Peter Maydell [Thu, 7 Jul 2016 14:44:43 +0000 (15:44 +0100)]
linux-user: Handle short lengths in host_to_target_sockaddr()
If userspace specifies a short buffer for a target sockaddr,
the kernel will only copy in as much as it has space for
(or none at all if the length is zero) -- see the kernel
move_addr_to_user() function. Mimic this in QEMU's
host_to_target_sockaddr() routine.
In particular, this fixes a segfault running the LTP
recvfrom01 test, where the guest makes a recvfrom()
call with a bad buffer pointer and other parameters which
cause the kernel to set the addrlen to zero; because we
did not skip the attempt to swap the sa_family field we
segfaulted on the bad address.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Wed, 6 Jul 2016 14:09:29 +0000 (15:09 +0100)]
linux-user: Forget about synchronous signal once it is delivered
Commit 655ed67c2a248cf which switched synchronous signals to
benig recorded in ts->sync_signal rather than in a queue
with every other signal had a bug: we failed to clear
the flag indicating that a synchronous signal was pending
when we delivered it. This meant that we would take the signal
again and again every time the guest made a syscall.
(This is a bug introduced in my refactoring of Timothy Baldwin's
original code.)
Fix this by passing in the struct emulated_sigtable* to
handle_pending_signal(), so that we clear the pending flag
in the ts->sync_signal struct when handling a synchronous signal.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Tue, 5 Jul 2016 15:36:29 +0000 (16:36 +0100)]
linux-user: Correct type for LOOP_GET_STATUS{,64} ioctls
The LOOP_GET_STATUS and LOOP_GET_STATUS64 ioctls were incorrectly
defined as IOC_W rather than IOC_R, which meant we weren't
correctly copying the information back from the kernel to the guest.
The loop_info64 structure definition was also missing a member
and using the wrong type for several 32-bit fields.
In particular, this meant that "kpartx -d image.img" didn't work
and "losetup -a" behaved strangely. Correct the ioctl type definitions.
Reported-by: Chanho Park <chanho61.park@samsung.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Mon, 4 Jul 2016 16:06:18 +0000 (17:06 +0100)]
linux-user: Correct type for BLKSSZGET
The BLKSSZGET ioctl takes an argument which is a pointer to an int.
We were incorrectly declaring it to take a pointer to a long, which
meant that we would incorrectly write to memory which we should not
if the guest is a 64-bit architecture.
In particular, kpartx uses this ioctl to write to an int on the
stack, which tends to result in it crashing immediately.
Reported-by: Chanho Park <chanho61.park@samsung.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Mon, 4 Jul 2016 16:06:17 +0000 (17:06 +0100)]
linux-user: Add loop control ioctls
Add support for the /dev/loop-control ioctls:
LOOP_CTL_ADD
LOOP_CTL_REMOVE
LOOP_CTL_GET_FREE
[RV: fixed to apply to new header guards] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Thu, 30 Jun 2016 13:23:24 +0000 (14:23 +0100)]
linux-user: Check sigsetsize argument to syscalls
Many syscalls which take a sigset_t argument also take an argument
giving the size of the sigset_t. The kernel insists that this
matches its idea of the type size and fails EINVAL if it is not.
Implement this logic in QEMU. (This mostly just means some LTP test
cases which check error cases now pass.)
Laurent Vivier [Mon, 27 Jun 2016 16:54:30 +0000 (18:54 +0200)]
linux-user: add nested netlink types
Nested types are used by the kernel to send link information and
protocol properties.
We can see following errors with "ip link show":
Unimplemented nested type 26
Unimplemented nested type 26
Unimplemented nested type 18
Unimplemented nested type 26
Unimplemented nested type 18
Unimplemented nested type 26
This patch implements nested types 18 (IFLA_LINKINFO) and
26 (IFLA_AF_SPEC).
Laurent Vivier [Tue, 21 Jun 2016 17:51:14 +0000 (19:51 +0200)]
linux-user: fix netlink memory corruption
Netlink is byte-swapping data in the guest memory (it's bad).
It's ok when the data come from the host as they are generated by the
host.
But it doesn't work when data come from the guest: the guest can
try to reuse these data whereas they have been byte-swapped.
This is what happens in glibc:
glibc generates a sequence number in nlh.nlmsg_seq and calls
sendto() with this nlh. In sendto(), we byte-swap nlmsg.seq.
Later, after the recvmsg(), glibc compares nlh.nlmsg_seq with
sequence number given in return, and of course it fails (hangs),
because nlh.nlmsg_seq is not valid anymore.
The involved code in glibc is:
sysdeps/unix/sysv/linux/check_pf.c:make_request()
...
req.nlh.nlmsg_seq = time (NULL);
...
if (TEMP_FAILURE_RETRY (__sendto (fd, (void *) &req, sizeof (req), 0,
(struct sockaddr *) &nladdr,
sizeof (nladdr))) < 0)
<here req.nlh.nlmsg_seq has been byte-swapped>
...
do
{
...
ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
...
struct nlmsghdr *nlmh;
for (nlmh = (struct nlmsghdr *) buf;
NLMSG_OK (nlmh, (size_t) read_len);
nlmh = (struct nlmsghdr *) NLMSG_NEXT (nlmh, read_len))
{
<we compare nlmh->nlmsg_seq with corrupted req.nlh.nlmsg_seq>
if (nladdr.nl_pid != 0 || (pid_t) nlmh->nlmsg_pid != pid
|| nlmh->nlmsg_seq != req.nlh.nlmsg_seq)
continue;
...
else if (nlmh->nlmsg_type == NLMSG_DONE)
/* We found the end, leave the loop. */
done = true;
}
}
while (! done);
As we have a continue on "nlmh->nlmsg_seq != req.nlh.nlmsg_seq",
"done" cannot be set to "true" and we have an infinite loop.
It's why commands like "apt-get update" or "dnf update hangs".
Peter Maydell [Tue, 19 Jul 2016 12:00:35 +0000 (13:00 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Tue 19 Jul 2016 03:33:40 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
e1000e: fix building without CONFIG_VMXNET3_PCI
MAINTAINERS: release Scott from being a rocker maintainer
tap: fix memory leak on failure to create a multiqueue tap device
net: fix incorrect argument to iov_to_buf
net: fix incorrect access to pointer
e1000e: fix incorrect access to pointer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/jnsnow/tags/ide-pull-request:
block: ignore flush requests when storage is clean
tests: in IDE and AHCI tests perform DMA write before flushing
ide: set retry_unit for PIO and FLUSH requests
ide: refactor retry_unit set and clear into separate function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/stefanha/tags/tracing-pull-request:
trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state
trace: Allow event name pattern in "info trace-events"
trace: Conditionally trace events based on their per-vCPU state
trace: Add per-vCPU tracing states for events with the 'vcpu' property
trace: Cosmetic changes on fast-path tracing
disas: Remove unused macro '_'
trace: Identify events with the 'vcpu' property
trace: [bsd-user] Commandline arguments to control tracing
trace: [linux-user] Commandline arguments to control tracing
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 19 Jul 2016 08:02:05 +0000 (09:02 +0100)]
Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160718.0' into staging
VFIO update 2016-07-18
One fix for 2.7-rc0 which hides the ARI extended capability, fixing
multifunction support in PCIe configurations where the assigned device
function topology does not match the host (Alex Williamson)
Peter Lieven [Mon, 18 Jul 2016 08:52:20 +0000 (10:52 +0200)]
block/iscsi: allow caching of the allocation map
until now the allocation map was used only as a hint if a cluster
is allocated or not. If a block was not allocated (or Qemu had
no info about the allocation status) a get_block_status call was
issued to check the allocation status and possibly avoid
a subsequent read of unallocated sectors. If a block known to be
allocated the get_block_status call was omitted. In the other case
a get_block_status call was issued before every read to avoid
the necessity for a consistent allocation map. To avoid the
potential overhead of calling get_block_status for each and
every read request this took only place for the bigger requests.
This patch enhances this mechanism to cache the allocation
status and avoid calling get_block_status for blocks where
the allocation status has been queried before. This allows
for bypassing the read request even for smaller requests and
additionally omits calling get_block_status for known to be
unallocated blocks.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the README file to markdown so that it makes the github page look
prettier. I know that github repo is a mirror and not the official
repo, but I think it doesn't hurt to have it in markdown format.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160715043111.29007-1-bobby.prani@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
block: ignore flush requests when storage is clean
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.
This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.
The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.
This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).
This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>
tests: in IDE and AHCI tests perform DMA write before flushing
Due to changes in flush behaviour clean disks stopped generating
flush_to_disk events and IDE and AHCI tests that test flush commands
started to fail.
This change adds additional DMA writes to affected tests before sending
flush commands so that bdrv_flush actually generates flush_to_disk event.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-4-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>
The following sequence of tests discovered a problem in IDE emulation:
1. Send DMA write to IDE device 0
2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block
layer using blkdebug script in tests/ide-test:test_retry_flush
When doing DMA request ide/core.c will set s->retry_unit to s->unit in
ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
After that ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after previous DMA completion
and flush does not use anything related to retry.
This patch restricts retry unit assertion only to ops that actually use
retry logic.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-3-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>
ide: refactor retry_unit set and clear into separate function
Code to set and clear state associated with retry in moved into
ide_set_retry and ide_clear_retry to make adding retry setups easier.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-2-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>
trace: Conditionally trace events based on their per-vCPU state
Events with the 'vcpu' property are conditionally emitted according to
their per-vCPU state. Other events are emitted normally based on their
global tracing state.
Note that the per-vCPU condition check applies to all tracing backends.
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eliminates a future compilation error when UI code includes the tracing
headers (indirectly pulling "disas/bfd.h" through "qom/cpu.h") and
GLib's i18n '_' macro.
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
trace: [bsd-user] Commandline arguments to control tracing
[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed. This type is also
returned by trace_opt_parse() and used in vl.c.
Also fixed coding style on for(;;) and else statement as suggested by
Eric Blake <eblake@redhat.com> since the patch modifies these lines or
close enough.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860252322.30668.18276041739086338328.stgit@fimbulvetr.bsc.es Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
trace: [linux-user] Commandline arguments to control tracing
[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed. This type is also
returned by trace_opt_parse() and used in vl.c.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860251784.30668.17339867835129075077.stgit@fimbulvetr.bsc.es Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Alex Williamson [Mon, 18 Jul 2016 16:55:17 +0000 (10:55 -0600)]
vfio/pci: Hide ARI capability
QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities. The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function, however this next function is relative to the
host, not the guest. This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest. For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero. When we combine multiple VFs together as a multi-function
device in the guest, the guest OS identifies ARI is enabled, relies on
this next-function field, and stops looking for additional function
after the first is found.
Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices. In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Pranith Kumar [Mon, 27 Jun 2016 18:13:22 +0000 (14:13 -0400)]
.travis.yml: Disable IRC build status updates from forks
We want the travis build bot to post notifications on IRC only for the
master qemu repository and not the various forks/branches of
others. Currently there is no direct option to restrict the updates to
one repository. This is being worked upon by the developers and
tracked in https://github.com/travis-ci/travis-ci/issues/1094.
Until such time, we can use the workaround as posted in
ref. https://github.com/facebook/flow/pull/1822.
This basically creates an ecrypted string which decrypts to qemu IRC
channel only on "qemu/qemu" repo and not on the forks. This enables
the build bot to notify the IRC only for the main repo.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cao jin [Mon, 18 Jul 2016 04:05:49 +0000 (12:05 +0800)]
virtio-blk: dataplane cleanup
No need duplicate the judgment, there is one in function entry.
Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468814749-14510-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Renames look like this with git-diff(1) when diff.renames = true is set:
diff --git a/a b/b
similarity index 100%
rename from a
rename to b
This raises the "Does not appear to be a unified-diff format patch"
error because checkpatch.pl only considers a diff valid if it contains
at least one "@@" hunk.
This patch accepts renames and copies too so that checkpatch.pl exits
successfully when a diff only renames/copies files. The git diff
extended header format is described on the git-diff(1) man page.
Reported-by: Colin Lord <clord@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468576014-28788-1-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cao jin [Fri, 15 Jul 2016 10:28:44 +0000 (18:28 +0800)]
aio-posix: remove useless parameter
Parameter **errp of aio_context_setup() is useless, remove it
and clean up the related code.
Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Fam Zheng <famz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468578524-23433-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Roman Pen [Wed, 13 Jul 2016 13:03:24 +0000 (15:03 +0200)]
linux-aio: prevent submitting more than MAX_EVENTS
Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us
with specified number of events. But kernel ring buffer allocation logic
is a bit tricky (ring buffer is page size aligned + some percpu allocation
are required) so eventually more than requested events number is allocated.
From a userspace side we have to follow the convention and should not try
to io_submit() more or logic, which consumes completed events, should be
changed accordingly. The pitfall is in the following sequence:
MAX_EVENTS = 128
io_setup(MAX_EVENTS)
io_submit(MAX_EVENTS)
io_submit(MAX_EVENTS)
/* now 256 events are in-flight */
io_getevents(MAX_EVENTS) = 128
/* we can handle only 128 events at once, to be sure
* that nothing is pended the io_getevents(MAX_EVENTS)
* call must be invoked once more or hang will happen. */
To prevent the hang or reiteration of io_getevents() call this patch
restricts the number of in-flights, which is now limited to MAX_EVENTS.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cao jin [Thu, 14 Jul 2016 13:10:43 +0000 (21:10 +0800)]
aio_ctx_check: follow CODING_STYLE
replace tab with spaces
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468501843-14927-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Mon, 4 Jul 2016 16:33:20 +0000 (18:33 +0200)]
linux-aio: share one LinuxAioState within an AioContext
This has better performance because it executes fewer system calls
and does not use a bottom half per disk.
Originally proposed by Ming Lei.
[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]
Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
squash! linux-aio: share one LinuxAioState within an AioContext
We have only one flag for now - Empty Image flag. The patch fixes unused
bits specification and marks bit 1 as usused.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Mon, 18 Jul 2016 10:24:15 +0000 (11:24 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160718' into staging
ppc patch queue 2016-07-18
Here's what ought to be the final ppc pull request before the 2.7 hard
freeze. This set contains a rework of the DBDMA device for Mac
platforms, and some assorted cleanups and bugfixes.
# gpg: Signature made Mon 18 Jul 2016 05:35:27 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160718:
ppc: Yet another fix for the huge page support detection mechanism
target-ppc: fix left shift overflow in hpte_page_shift
ppc/mmu-hash64: Remove duplicated #include statement
ppc: abort if compat property contains an unknown value
spapr: Ensure CPU cores are added contiguously and removed in LIFO order
vfio/spapr: Remove stale ioctl() call
ppc: Fix support for odd MSR combinations
dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
dbdma: fix load_word/store_word value endianness
dbdma: fix endian of DBDMA_CMDPTR_LO during branch
dbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK
dbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA
spapr: fix core unplug crash
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 12 Jul 2016 08:28:23 +0000 (16:28 +0800)]
e1000e: fix building without CONFIG_VMXNET3_PCI
e1000e needs net_tx_pkt.o and net_rx_pkt.o too.
Cc: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Cc: Leonid Bloch <leonid.bloch@ravellosystems.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Thomas Huth [Fri, 15 Jul 2016 08:10:25 +0000 (10:10 +0200)]
ppc: Yet another fix for the huge page support detection mechanism
Commit 86b50f2e1bef ("Disable huge page support if it is not available
for main RAM") already made sure that huge page support is not announced
to the guest if the normal RAM of non-NUMA configurations is not backed
by a huge page filesystem. However, there is one more case that can go
wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
with huge page support (and only the memory of a DIMM is configured with
it). When QEMU is started with the following command line for example,
the Linux guest currently crashes because it is trying to use huge pages
on a memory region that does not support huge pages:
To fix this issue, we've got to make sure to disable huge page support,
too, when there is a NUMA node that is not using a memory backend with
huge page support.
Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Greg Kurz [Wed, 13 Jul 2016 10:00:17 +0000 (12:00 +0200)]
ppc: abort if compat property contains an unknown value
It is not possible to set the compat property to an unknown value with
powerpc_set_compat(). Something must have gone terribly wrong in QEMU,
if we detect an "Internal error" in powerpc_get_compat(). Let's abort then.
This patch also drops the "max_compat ? *max_compat : -1" construct. It is
useless since max_compat is dereferenced a few lines above.
Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr: Ensure CPU cores are added contiguously and removed in LIFO order
If CPU core addition or removal is allowed in random order leading to
holes in the core id range (and hence in the cpu_index range), migration
can fail as migration with holes in cpu_index range isn't yet handled
correctly.
Prevent this situation by enforcing the addition in contiguous order
and removal in LIFO order so that we never end up with holes in
cpu_index range.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
David Gibson [Tue, 12 Jul 2016 06:54:03 +0000 (16:54 +1000)]
vfio/spapr: Remove stale ioctl() call
This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an
earlier version of the code and has since been folded into
vfio_spapr_remove_window().
It wasn't caught because although the argument structure has been removed,
the libc function remove() means this didn't trigger a compile failure.
The ioctl() was also almost certain to fail silently and harmlessly with
the bogus argument, so this wasn't caught in testing.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
MacOS uses an architecturally illegal MSR combination that
seems nonetheless supported by 32-bit processors, which is
to have MSR[PR]=1 and one or more of MSR[DR/IR/EE]=0.
This adds support for it. To work properly we need to also
properly include support for PR=1,{I,D}R=0 to the MMU index
used by the qemu TLB.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:58 +0000 (19:08 +0100)]
dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
Otherwise MacOS 9 hangs upon shutdown.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:57 +0000 (19:08 +0100)]
dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
This fixes MacOS 9 whereby it continually flushes and polls the status bits
until they are set to indicate a successful flush.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:56 +0000 (19:08 +0100)]
dbdma: fix load_word/store_word value endianness
The values to read/write to/from physical memory are copied directly to the
physical address with no endian swapping required.
Also add some extra information to debugging output while we are here.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:55 +0000 (19:08 +0100)]
dbdma: fix endian of DBDMA_CMDPTR_LO during branch
The current DBDMA command is stored in little-endian format, so make sure
we convert it to match our CPU when updating the DBDMA_CMDPTR_LO register.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>