]> xenbits.xensource.com Git - qemu-upstream-4.3-testing.git/log
qemu-upstream-4.3-testing.git
9 years agortl8139: check IP Total Length field
Stefan Hajnoczi [Wed, 15 Jul 2015 17:17:02 +0000 (18:17 +0100)]
rtl8139: check IP Total Length field

The IP Total Length field includes the IP header and data.  Make sure it
is valid and does not exceed the Ethernet payload size.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agortl8139: check IP Header Length field
Stefan Hajnoczi [Wed, 15 Jul 2015 17:17:01 +0000 (18:17 +0100)]
rtl8139: check IP Header Length field

The IP Header Length field was only checked in the IP checksum case, but
is used in other cases too.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agortl8139: skip offload on short Ethernet/IP header
Stefan Hajnoczi [Wed, 15 Jul 2015 17:17:00 +0000 (18:17 +0100)]
rtl8139: skip offload on short Ethernet/IP header

Transmit offload features access Ethernet and IP headers the packet.  If
the packet is too short we must not attempt to access header fields:

  int proto = be16_to_cpu(*(uint16_t *)(saved_buffer + 12));
  ...
  eth_payload_data = saved_buffer + ETH_HLEN;
  ...
  ip = (ip_header*)eth_payload_data;
  if (IP_HEADER_VERSION(ip) != IP_HEADER_VERSION_4) {

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agortl8139: drop tautologous if (ip) {...} statement
Stefan Hajnoczi [Wed, 15 Jul 2015 17:16:59 +0000 (18:16 +0100)]
rtl8139: drop tautologous if (ip) {...} statement

The previous patch stopped using the ip pointer as an indicator that the
IP header is present.  When we reach the if (ip) {...} statement we know
ip is always non-NULL.

Remove the if statement to reduce nesting.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agortl8139: avoid nested ifs in IP header parsing
Stefan Hajnoczi [Wed, 15 Jul 2015 17:16:58 +0000 (18:16 +0100)]
rtl8139: avoid nested ifs in IP header parsing

Transmit offload needs to parse packet headers.  If header fields have
unexpected values the offload processing is skipped.

The code currently uses nested ifs because there is relatively little
input validation.  The next patches will add missing input validation
and a goto label is more appropriate to avoid deep if statement nesting.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agofix off-by-one error in pci_piix3_xen_ide_unplug
James Harper [Thu, 30 Oct 2014 10:08:28 +0000 (10:08 +0000)]
fix off-by-one error in pci_piix3_xen_ide_unplug

Fix off-by-one error when unplugging disks, which would otherwise leave the last ATA disk plugged, with obvious consequences. Also rewrite loop to be more readable.

upstream-commmit-id: ab44b867744b2246302da73a8ef0ed26876f2981

Signed-off-by: James Harper <james.harper@ejbdigital.com.au>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoide: Clear DRQ after handling all expected accesses (CVE-2015-5154)
Kevin Wolf [Mon, 27 Jul 2015 03:42:53 +0000 (23:42 -0400)]
ide: Clear DRQ after handling all expected accesses (CVE-2015-5154)

This is additional hardening against an end_transfer_func that fails to
clear the DRQ status bit. The bit must be unset as soon as the PIO
transfer has completed, so it's better to do this in a central place
instead of duplicating the code in all commands (and forgetting it in
some).

upstream-commit-id: cb72cba83021fa42719e73a5249c12096a4d1cfc

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoide/atapi: Fix START STOP UNIT command completion (CVE-2015-5154)
Kevin Wolf [Mon, 27 Jul 2015 03:42:53 +0000 (23:42 -0400)]
ide/atapi: Fix START STOP UNIT command completion (CVE-2015-5154)

The command must be completed on all code paths. START STOP UNIT with
pwrcnd set should succeed without doing anything.

upstream-commit-id: 03441c3a4a42beb25460dd11592539030337d0f8

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoide: Check array bounds before writing to io_buffer (CVE-2015-5154)
Kevin Wolf [Mon, 27 Jul 2015 03:42:53 +0000 (23:42 -0400)]
ide: Check array bounds before writing to io_buffer (CVE-2015-5154)

If the end_transfer_func of a command is called because enough data has
been read or written for the current PIO transfer, and it fails to
correctly call the command completion functions, the DRQ bit in the
status register and s->end_transfer_func may remain set. This allows the
guest to access further bytes in s->io_buffer beyond s->data_end, and
eventually overflowing the io_buffer.

One case where this currently happens is emulation of the ATAPI command
START STOP UNIT.

This patch fixes the problem by adding explicit array bounds checks
before accessing the buffer instead of relying on end_transfer_func to
function correctly.

upstream-commit-id: d2ff85854512574e7209f295e87b0835d5b032c6

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agopcnet: force the buffer access to be in bounds during tx
Petr Matousek [Sun, 24 May 2015 08:53:44 +0000 (10:53 +0200)]
pcnet: force the buffer access to be in bounds during tx

4096 is the maximum length per TMD and it is also currently the size of
the relay buffer pcnet driver uses for sending the packet data to QEMU
for further processing. With packet spanning multiple TMDs it can
happen that the overall packet size will be bigger than sizeof(buffer),
which results in memory corruption.

Fix this by only allowing to queue maximum sizeof(buffer) bytes.

This is CVE-2015-3209.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Matt Tait <matttait@google.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agopcnet: fix Negative array index read
Gonglei [Wed, 10 Jun 2015 11:49:18 +0000 (11:49 +0000)]
pcnet: fix Negative array index read

s->xmit_pos maybe assigned to a negative value (-1),
but in this branch variable s->xmit_pos as an index to
array s->buffer. Let's add a check for s->xmit_pos.

upstream-commit-id: 7b50d00911ddd6d56a766ac5671e47304c20a21b

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9 years agoxen/pt: unknown PCI config space fields should be read-only
Jan Beulich [Tue, 2 Jun 2015 16:08:08 +0000 (16:08 +0000)]
xen/pt: unknown PCI config space fields should be read-only

... by default. Add a per-device "permissive" mode similar to pciback's
to allow restoring previous behavior (and hence break security again,
i.e. should be used only for trusted guests).

This is part of XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>)
9 years agoxen/pt: add a few PCI config space field descriptions
Jan Beulich [Tue, 2 Jun 2015 16:08:07 +0000 (16:08 +0000)]
xen/pt: add a few PCI config space field descriptions

Since the next patch will turn all not explicitly described fields
read-only by default, those fields that have guest writable bits need
to be given explicit descriptors.

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agoxen/pt: mark reserved bits in PCI config space fields
Jan Beulich [Tue, 2 Jun 2015 16:08:07 +0000 (16:08 +0000)]
xen/pt: mark reserved bits in PCI config space fields

The adjustments are solely to make the subsequent patches work right
(and hence make the patch set consistent), namely if permissive mode
(introduced by the last patch) gets used (as both reserved registers
and reserved fields must be similarly protected from guest access in
default mode, but the guest should be allowed access to them in
permissive mode).

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agoxen/pt: mark all PCIe capability bits read-only
Jan Beulich [Tue, 2 Jun 2015 16:08:07 +0000 (16:08 +0000)]
xen/pt: mark all PCIe capability bits read-only

xen_pt_emu_reg_pcie[]'s PCI_EXP_DEVCAP needs to cover all bits as read-
only to avoid unintended write-back (just a precaution, the field ought
to be read-only in hardware).

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoxen/pt: split out calculation of throughable mask in PCI config space handling
Jan Beulich [Tue, 2 Jun 2015 16:08:07 +0000 (16:08 +0000)]
xen/pt: split out calculation of throughable mask in PCI config space handling

This is just to avoid having to adjust that calculation later in
multiple places.

Note that including ->ro_mask in get_throughable_mask()'s calculation
is only an apparent (i.e. benign) behavioral change: For r/o fields it
doesn't matter > whether they get passed through - either the same flag
is also set in emu_mask (then there's no change at all) or the field is
r/o in hardware (and hence a write won't change it anyway).

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
9 years agoxen/pt: correctly handle PM status bit
Jan Beulich [Tue, 2 Jun 2015 16:08:07 +0000 (16:08 +0000)]
xen/pt: correctly handle PM status bit

xen_pt_pmcsr_reg_write() needs an adjustment to deal with the RW1C
nature of the not passed through bit 15 (PCI_PM_CTRL_PME_STATUS).

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoxen/pt: consolidate PM capability emu_mask
Jan Beulich [Tue, 2 Jun 2015 16:08:07 +0000 (16:08 +0000)]
xen/pt: consolidate PM capability emu_mask

There's no point in xen_pt_pmcsr_reg_{read,write}() each ORing
PCI_PM_CTRL_STATE_MASK and PCI_PM_CTRL_NO_SOFT_RESET into a local
emu_mask variable - we can have the same effect by setting the field
descriptor's emu_mask member suitably right away. Note that
xen_pt_pmcsr_reg_write() is being retained in order to allow later
patches to be less intrusive.

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/MSI: don't open-code pass-through of enable bit modifications
Jan Beulich [Tue, 2 Jun 2015 16:08:06 +0000 (16:08 +0000)]
xen/MSI: don't open-code pass-through of enable bit modifications

Without this the actual XSA-131 fix would cause the enable bit to not
get set anymore (due to the write back getting suppressed there based
on the OR of emu_mask, ro_mask, and res_mask).

Note that the fiddling with the enable bit shouldn't really be done by
qemu, but making this work right (via libxc and the hypervisor) will
require more extensive changes, which can be postponed until after the
security issue got addressed.

This is a preparatory patch for XSA-131.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoxen/MSI-X: limit error messages
Jan Beulich [Tue, 2 Jun 2015 16:08:06 +0000 (16:08 +0000)]
xen/MSI-X: limit error messages

Limit error messages resulting from bad guest behavior to avoid allowing
the guest to cause the control domain's disk to fill.

The first message in pci_msix_write() can simply be deleted, as this
is indeed bad guest behavior, but such out of bounds writes don't
really need to be logged.

The second one is more problematic, as there guest behavior may only
appear to be wrong: For one, the old logic didn't take the mask-all bit
into account. And then this shouldn't depend on host device state (i.e.
the host may have masked the entry without the guest having done so).
Plus these writes shouldn't be dropped even when an entry is unmasked.
Instead, if they can't be made take effect right away, they should take
effect on the next unmasking or enabling operation - the specification
explicitly describes such caching behavior. Until we can validly drop
the message (implementing such caching/latching behavior), issue the
message just once per MSI-X table entry.

Note that the log message in pci_msix_read() similar to the one being
removed here is not an issue: "addr" being of unsigned type, and the
maximum size of the MSI-X table being 32k, entry_nr simply can't be
negative and hence the conditonal guarding issuing of the message will
never be true.

This is XSA-130.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoxen: don't allow guest to control MSI mask register
Jan Beulich [Tue, 2 Jun 2015 16:08:06 +0000 (16:08 +0000)]
xen: don't allow guest to control MSI mask register

It's being used by the hypervisor. For now simply mimic a device not
capable of masking, and fully emulate any accesses a guest may issue
nevertheless as simple reads/writes without side effects.

This is XSA-129.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoxen: properly gate host writes of modified PCI CFG contents
Jan Beulich [Tue, 2 Jun 2015 16:08:06 +0000 (16:08 +0000)]
xen: properly gate host writes of modified PCI CFG contents

The old logic didn't work as intended when an access spanned multiple
fields (for example a 32-bit access to the location of the MSI Message
Data field with the high 16 bits not being covered by any known field).
Remove it and derive which fields not to write to from the accessed
fields' emulation masks: When they're all ones, there's no point in
doing any host write.

This fixes a secondary issue at once: We obviously shouldn't make any
host write attempt when already the host read failed.

This is XSA-128.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agofdc: force the fifo access to be in bounds of the allocated buffer
Petr Matousek [Wed, 6 May 2015 07:48:59 +0000 (09:48 +0200)]
fdc: force the fifo access to be in bounds of the allocated buffer

During processing of certain commands such as FD_CMD_READ_ID and
FD_CMD_DRIVE_SPECIFICATION_COMMAND the fifo memory access could
get out of bounds leading to memory corruption with values coming
from the guest.

Fix this by making sure that the index is always bounded by the
allocated memory.

This is CVE-2015-3456.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
10 years agoxen: limit guest control of PCI command register
Jan Beulich [Tue, 31 Mar 2015 14:14:36 +0000 (14:14 +0000)]
xen: limit guest control of PCI command register

Otherwise the guest can abuse that control to cause e.g. PCIe
Unsupported Request responses (by disabling memory and/or I/O decoding
and subsequently causing [CPU side] accesses to the respective address
ranges), which (depending on system configuration) may be fatal to the
host.

This is CVE-2015-2756 / XSA-126.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agodmg: sanitize chunk length and sectorcount (CVE-2014-0145)
Stefan Hajnoczi [Thu, 5 Mar 2015 11:27:26 +0000 (11:27 +0000)]
dmg: sanitize chunk length and sectorcount (CVE-2014-0145)

Chunk length and sectorcount are used for decompression buffers as well
as the bdrv_pread() count argument.  Ensure that they have reasonable
values so neither memory allocation nor conversion from uint64_t to int
will cause problems.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agodmg: prevent chunk buffer overflow (CVE-2014-0145)
Stefan Hajnoczi [Thu, 5 Mar 2015 11:20:06 +0000 (11:20 +0000)]
dmg: prevent chunk buffer overflow (CVE-2014-0145)

Both compressed and uncompressed I/O is buffered.  dmg_open() calculates
the maximum buffer size needed from the metadata in the image file.

There is currently a buffer overflow since ->lengths[] is accounted
against the maximum compressed buffer size but actually uses the
uncompressed buffer:

  switch (s->types[chunk]) {
  case 1: /* copy */
      ret = bdrv_pread(bs->file, s->offsets[chunk],
                       s->uncompressed_chunk, s->lengths[chunk]);

We must account against the maximum uncompressed buffer size for type=1
chunks.

This patch fixes the maximum buffer size calculation to take into
account the chunk type.  It is critical that we update the correct
maximum since there are two buffers ->compressed_chunk and
->uncompressed_chunk.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agobochs: Use unsigned variables for offsets and sizes (CVE-2014-0147)
Kevin Wolf [Thu, 5 Mar 2015 11:11:27 +0000 (11:11 +0000)]
bochs: Use unsigned variables for offsets and sizes (CVE-2014-0147)

Gets us rid of integer overflows resulting in negative sizes which
aren't correctly checked.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoqcow1: Validate image size (CVE-2014-0223)
Kevin Wolf [Thu, 5 Mar 2015 11:02:39 +0000 (11:02 +0000)]
qcow1: Validate image size (CVE-2014-0223)

A huge image size could cause s->l1_size to overflow. Make sure that
images never require a L1 table larger than what fits in s->l1_size.

This cannot only cause unbounded allocations, but also the allocation of
a too small L1 table, resulting in out-of-bounds array accesses (both
reads and writes).

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoqcow1: Validate L2 table size (CVE-2014-0222)
Kevin Wolf [Thu, 5 Mar 2015 10:59:35 +0000 (10:59 +0000)]
qcow1: Validate L2 table size (CVE-2014-0222)

Too large L2 table sizes cause unbounded allocations. Images actually
created by qemu-img only have 512 byte or 4k L2 tables.

To keep things consistent with cluster sizes, allow ranges between 512
bytes and 64k (in fact, down to 1 entry = 8 bytes is technically
working, but L2 table sizes smaller than a cluster don't make a lot of
sense).

This also means that the number of bytes on the virtual disk that are
described by the same L2 table is limited to at most 8k * 64k or 2^29,
preventively avoiding any integer overflows.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoqcow2: Don't rely on free_cluster_index in alloc_refcount_block() (CVE-2014-0147)
Kevin Wolf [Thu, 5 Mar 2015 10:52:50 +0000 (10:52 +0000)]
qcow2: Don't rely on free_cluster_index in alloc_refcount_block() (CVE-2014-0147)

free_cluster_index is only correct if update_refcount() was called from
an allocation function, and even there it's brittle because it's used to
protect unfinished allocations which still have a refcount of 0 - if it
moves in the wrong place, the unfinished allocation can be corrupted.

So not using it any more seems to be a good idea. Instead, use the
first requested cluster to do the calculations. Return -EAGAIN if
unfinished allocations could become invalid and let the caller restart
its search for some free clusters.

The context of creating a snapsnot is one situation where
update_refcount() is called outside of a cluster allocation. For this
case, the change fixes a buffer overflow if a cluster is referenced in
an L2 table that cannot be represented by an existing refcount block.
(new_table[refcount_table_index] was out of bounds)

[Bump the qemu-iotests 026 refblock_alloc.write leak count from 10 to
11.
--Stefan]

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoqcow2: Fix NULL dereference in qcow2_open() error path (CVE-2014-0146)
Kevin Wolf [Thu, 5 Mar 2015 10:42:00 +0000 (10:42 +0000)]
qcow2: Fix NULL dereference in qcow2_open() error path (CVE-2014-0146)

The qcow2 code assumes that s->snapshots is non-NULL if s->nb_snapshots
!= 0. By having the initialisation of both fields separated in
qcow2_open(), any error occuring in between would cause the error path
to dereference NULL in qcow2_free_snapshots() if the image had any
snapshots.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoqcow2: Fix L1 allocation size in qcow2_snapshot_load_tmp() (CVE-2014-0145)
Kevin Wolf [Thu, 5 Mar 2015 10:38:10 +0000 (10:38 +0000)]
qcow2: Fix L1 allocation size in qcow2_snapshot_load_tmp() (CVE-2014-0145)

For the L1 table to loaded for an internal snapshot, the code allocated
only enough memory to hold the currently active L1 table. If the
snapshot's L1 table is actually larger than the current one, this leads
to a buffer overflow.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agocirrus: don't overflow CirrusVGAState->cirrus_bltbuf
Gerd Hoffmann [Wed, 4 Mar 2015 18:02:44 +0000 (18:02 +0000)]
cirrus: don't overflow CirrusVGAState->cirrus_bltbuf

This is CVE-2014-8106.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agocirrus: fix blit region check
Gerd Hoffmann [Wed, 4 Mar 2015 18:02:43 +0000 (18:02 +0000)]
cirrus: fix blit region check

Issues:
 * Doesn't check pitches correctly in case it is negative.
 * Doesn't check width at all.

Turn macro into functions while being at it, also factor out the check
for one region which we then can simply call twice for src + dst.

This is CVE-2014-8106.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovnc: sanitize bits_per_pixel from the client
Petr Matousek [Mon, 27 Oct 2014 11:41:44 +0000 (12:41 +0100)]
vnc: sanitize bits_per_pixel from the client

bits_per_pixel that are less than 8 could result in accessing
non-initialized buffers later in the code due to the expectation
that bytes_per_pixel value that is used to initialize these buffers is
never zero.

To fix this check that bits_per_pixel from the client is one of the
values that the rfb protocol specification allows.

This is CVE-2014-7815.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
[ kraxel: apply codestyle fix ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
10 years agovmware-vga: CVE-2014-3689: turn off hw accel
Gerd Hoffmann [Wed, 4 Mar 2015 17:55:51 +0000 (17:55 +0000)]
vmware-vga: CVE-2014-3689: turn off hw accel

Quick & easy stopgap for CVE-2014-3689:  We just compile out the
hardware acceleration functions which lack sanity checks.  Thankfully
we have capability bits for them (SVGA_CAP_RECT_COPY and
SVGA_CAP_RECT_FILL), so guests should deal just fine, in theory.

Subsequent patches will add the missing checks and re-enable the
hardware acceleration emulation.

Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoslirp: udp: fix NULL pointer dereference because of uninitialized socket
Petr Matousek [Thu, 18 Sep 2014 06:35:37 +0000 (08:35 +0200)]
slirp: udp: fix NULL pointer dereference because of uninitialized socket

When guest sends udp packet with source port and source addr 0,
uninitialized socket is picked up when looking for matching and already
created udp sockets, and later passed to sosendto() where NULL pointer
dereference is hit during so->slirp->vnetwork_mask.s_addr access.

Fix this by checking that the socket is not just a socket stub.

This is CVE-2014-3640.

Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Xavier Mehrenberger <xavier.mehrenberger@airbus.com>
Reported-by: Stephane Duverger <stephane.duverger@eads.net>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-id: 20140918063537.GX9321@dhcp-25-225.brq.redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10 years agospice: make sure we don't overflow ssd->buf
Gerd Hoffmann [Wed, 3 Sep 2014 13:50:08 +0000 (15:50 +0200)]
spice: make sure we don't overflow ssd->buf

Related spice-only bug.  We have a fixed 16 MB buffer here, being
presented to the spice-server as qxl video memory in case spice is
used with a non-qxl card.  It's also used with qxl in vga mode.

When using display resolutions requiring more than 16 MB of memory we
are going to overflow that buffer.  In theory the guest can write,
indirectly via spice-server.  The spice-server clears the memory after
setting a new video mode though, triggering a segfault in the overflow
case, so qemu crashes before the guest has a chance to do something
evil.

Fix that by switching to dynamic allocation for the buffer.

CVE-2014-3615

Cc: qemu-stable@nongnu.org
Cc: secalert@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Conflicts:
ui/spice-display.c

10 years agovbe: rework sanity checks
Gerd Hoffmann [Wed, 4 Mar 2015 17:49:25 +0000 (17:49 +0000)]
vbe: rework sanity checks

Plug a bunch of holes in the bochs dispi interface parameter checking.
Add a function doing verification on all registers.  Call that
unconditionally on every register write.  That way we should catch
everything, even changing one register affecting the valid range of
another register.

Some of the holes have been added by commit
e9c6149f6ae6873f14a12eea554925b6aa4c4dec.  Before that commit the
maximum possible framebuffer (VBE_DISPI_MAX_XRES * VBE_DISPI_MAX_YRES *
32 bpp) has been smaller than the qemu vga memory (8MB) and the checking
for VBE_DISPI_MAX_XRES + VBE_DISPI_MAX_YRES + VBE_DISPI_MAX_BPP was ok.

Some of the holes have been there forever, such as
VBE_DISPI_INDEX_X_OFFSET and VBE_DISPI_INDEX_Y_OFFSET register writes
lacking any verification.

Security impact:

(1) Guest can make the ui (gtk/vnc/...) use memory rages outside the vga
frame buffer as source  ->  host memory leak.  Memory isn't leaked to
the guest but to the vnc client though.

(2) Qemu will segfault in case the memory range happens to include
unmapped areas  ->  Guest can DoS itself.

The guest can not modify host memory, so I don't think this can be used
by the guest to escape.

CVE-2014-3615

Cc: qemu-stable@nongnu.org
Cc: secalert@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Conflicts:
hw/display/vga.c

10 years agousb: fix up post load checks
Michael S. Tsirkin [Tue, 13 May 2014 09:33:16 +0000 (12:33 +0300)]
usb: fix up post load checks

Correct post load checks:
1. dev->setup_len == sizeof(dev->data_buf)
    seems fine, no need to fail migration
2. When state is DATA, passing index > len
   will cause memcpy with negative length,
   resulting in heap overflow

First of the issues was reported by dgilbert.

Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
10 years agoide: Correct improper smart self test counter reset in ide core.
Benoît Canet [Wed, 4 Mar 2015 17:17:12 +0000 (17:17 +0000)]
ide: Correct improper smart self test counter reset in ide core.

The SMART self test counter was incorrectly being reset to zero,
not 1. This had the effect that on every 21st SMART EXECUTE OFFLINE:
 * We would write off the beginning of a dynamically allocated buffer
 * We forgot the SMART history
Fix this.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1397336390-24664-1-git-send-email-benoit.canet@irqsave.net
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Cc: qemu-stable@nongnu.org
Acked-by: Kevin Wolf <kwolf@redhat.com>
[PMM: tweaked commit message as per suggestions from Markus]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio: validate config_len on load
Michael S. Tsirkin [Wed, 4 Mar 2015 17:14:22 +0000 (17:14 +0000)]
virtio: validate config_len on load

Malformed input can have config_len in migration stream
exceed the array size allocated on destination, the
result will be heap overflow.

To fix, that config_len matches on both sides.

CVE-2014-0182

Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio-net: fix guest-triggerable buffer overrun
Michael S. Tsirkin [Wed, 4 Mar 2015 17:12:57 +0000 (17:12 +0000)]
virtio-net: fix guest-triggerable buffer overrun

When VM guest programs multicast addresses for
a virtio net card, it supplies a 32 bit
entries counter for the number of addresses.
These addresses are read into tail portion of
a fixed macs array which has size MAC_TABLE_ENTRIES,
at offset equal to in_use.

To avoid overflow of this array by guest, qemu attempts
to test the size as follows:
-    if (in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {

however, as mac_data.entries is uint32_t, this sum
can overflow, e.g. if in_use is 1 and mac_data.entries
is 0xffffffff then in_use + mac_data.entries will be 0.

Qemu will then read guest supplied buffer into this
memory, overflowing buffer on heap.

CVE-2014-0150

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1397218574-25058-1-git-send-email-mst@redhat.com
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio: avoid buffer overrun on incoming migration
Michael Roth [Wed, 4 Mar 2015 16:57:44 +0000 (16:57 +0000)]
virtio: avoid buffer overrun on incoming migration

CVE-2013-6399

vdev->queue_sel is read from the wire, and later used in the
emulation code as an index into vdev->vq[]. If the value of
vdev->queue_sel exceeds the length of vdev->vq[], currently
allocated to be VIRTIO_PCI_QUEUE_MAX elements, subsequent PIO
operations such as VIRTIO_PCI_QUEUE_PFN can be used to overrun
the buffer with arbitrary data originating from the source.

Fix this by failing migration if the value from the wire exceeds
VIRTIO_PCI_QUEUE_MAX.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
10 years agovirtio-scsi: fix buffer overrun on invalid state load
Michael S. Tsirkin [Wed, 4 Mar 2015 16:50:31 +0000 (16:50 +0000)]
virtio-scsi: fix buffer overrun on invalid state load

CVE-2013-4542

hw/scsi/scsi-bus.c invokes load_request.

 virtio_scsi_load_request does:
    qemu_get_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));

this probably can make elem invalid, for example,
make in_num or out_num huge, then:

    virtio_scsi_parse_req(s, vs->cmd_vqs[n], req);

will do:

    if (req->elem.out_num > 1) {
        qemu_sgl_init_external(req, &req->elem.out_sg[1],
                               &req->elem.out_addr[1],
                               req->elem.out_num - 1);
    } else {
        qemu_sgl_init_external(req, &req->elem.in_sg[1],
                               &req->elem.in_addr[1],
                               req->elem.in_num - 1);
    }

and this will access out of array bounds.

Note: this adds security checks within assert calls since
SCSIBusInfo's load_request cannot fail.
For now simply disable builds with NDEBUG - there seems
to be little value in supporting these.

Cc: Andreas Färber <afaerber@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
10 years agousb: sanity check setup_index+setup_len in post_load
Michael S. Tsirkin [Thu, 3 Apr 2014 16:52:25 +0000 (19:52 +0300)]
usb: sanity check setup_index+setup_len in post_load

CVE-2013-4541

s->setup_len and s->setup_index are fed into usb_packet_copy as
size/offset into s->data_buf, it's possible for invalid state to exploit
this to load arbitrary data.

setup_len and setup_index should be checked to make sure
they are not negative.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio: validate num_sg when mapping
Michael S. Tsirkin [Wed, 4 Mar 2015 16:43:53 +0000 (16:43 +0000)]
virtio: validate num_sg when mapping

CVE-2013-4535
CVE-2013-4536

Both virtio-block and virtio-serial read,
VirtQueueElements are read in as buffers, and passed to
virtqueue_map_sg(), where num_sg is taken from the wire and can force
writes to indicies beyond VIRTQUEUE_MAX_SIZE.

To fix, validate num_sg.

Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agohpet: fix buffer overrun on invalid state load
Michael S. Tsirkin [Wed, 4 Mar 2015 16:39:58 +0000 (16:39 +0000)]
hpet: fix buffer overrun on invalid state load

CVE-2013-4527 hw/timer/hpet.c buffer overrun

hpet is a VARRAY with a uint8 size but static array of 32

To fix, make sure num_timers is valid using VMSTATE_VALID hook.

Reported-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agoscsi: Allocate SCSITargetReq r->buf dynamically [CVE-2013-4344]
Asias He [Wed, 4 Mar 2015 16:22:18 +0000 (16:22 +0000)]
scsi: Allocate SCSITargetReq r->buf dynamically [CVE-2013-4344]

r->buf is hardcoded to 2056 which is (256 + 1) * 8, allowing 256 luns at
most. If more than 256 luns are specified by user, we have buffer
overflow in scsi_target_emulate_report_luns.

To fix, we allocate the buffer dynamically.

Signed-off-by: Asias He <asias@redhat.com>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio: out-of-bounds buffer write on invalid state load
Michael S. Tsirkin [Wed, 4 Mar 2015 16:14:08 +0000 (16:14 +0000)]
virtio: out-of-bounds buffer write on invalid state load

CVE-2013-4151 QEMU 1.0 out-of-bounds buffer write in
virtio_load@hw/virtio/virtio.c

So we have this code since way back when:

    num = qemu_get_be32(f);

    for (i = 0; i < num; i++) {
        vdev->vq[i].vring.num = qemu_get_be32(f);

array of vqs has size VIRTIO_PCI_QUEUE_MAX, so
on invalid input this will write beyond end of buffer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio-net: out-of-bounds buffer write on load
Michael S. Tsirkin [Wed, 4 Mar 2015 16:07:30 +0000 (16:07 +0000)]
virtio-net: out-of-bounds buffer write on load

CVE-2013-4149 QEMU 1.3.0 out-of-bounds buffer write in
virtio_net_load()@hw/net/virtio-net.c

>         } else if (n->mac_table.in_use) {
>             uint8_t *buf = g_malloc0(n->mac_table.in_use);

We are allocating buffer of size n->mac_table.in_use

>             qemu_get_buffer(f, buf, n->mac_table.in_use * ETH_ALEN);

and read to the n->mac_table.in_use size buffer n->mac_table.in_use *
ETH_ALEN bytes, corrupting memory.

If adversary controls state then memory written there is controlled
by adversary.

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agovirtio-net: fix buffer overflow on invalid state load
Michael S. Tsirkin [Thu, 3 Apr 2014 16:50:39 +0000 (19:50 +0300)]
virtio-net: fix buffer overflow on invalid state load

CVE-2013-4148 QEMU 1.0 integer conversion in
virtio_net_load()@hw/net/virtio-net.c

Deals with loading a corrupted savevm image.

>         n->mac_table.in_use = qemu_get_be32(f);

in_use is int so it can get negative when assigned 32bit unsigned value.

>         /* MAC_TABLE_ENTRIES may be different from the saved image */
>         if (n->mac_table.in_use <= MAC_TABLE_ENTRIES) {

passing this check ^^^

>             qemu_get_buffer(f, n->mac_table.macs,
>                             n->mac_table.in_use * ETH_ALEN);

with good in_use value, "n->mac_table.in_use * ETH_ALEN" can get
positive and bigger than mac_table.macs. For example 0x81000000
satisfies this condition when ETH_ALEN is 6.

Fix it by making the value unsigned.
For consistency, change first_multi as well.

Note: all call sites were audited to confirm that
making them unsigned didn't cause any issues:
it turns out we actually never do math on them,
so it's easy to validate because both values are
always <= MAC_TABLE_ENTRIES.

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Conflicts:
include/hw/virtio/virtio-net.h

10 years agoblock/curl: only restrict protocols with libcurl>=7.19.4
Stefan Hajnoczi [Wed, 13 Feb 2013 08:25:34 +0000 (09:25 +0100)]
block/curl: only restrict protocols with libcurl>=7.19.4

The curl_easy_setopt(state->curl, CURLOPT_PROTOCOLS, ...) interface was
introduced in libcurl 7.19.4.  Therefore we cannot protect against
CVE-2013-0249 when linking against an older libcurl.

This fixes the build failure introduced by
fb6d1bbd246c7a57ef53d3847ef225cd1349d602.

Reported-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Andreas Färber <andreas.faeber@web.de>
Message-id: 1360743934-8337-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
10 years agoblock/curl: disable extra protocols to prevent CVE-2013-0249
Stefan Hajnoczi [Fri, 8 Feb 2013 07:49:10 +0000 (08:49 +0100)]
block/curl: disable extra protocols to prevent CVE-2013-0249

There is a buffer overflow in libcurl POP3/SMTP/IMAP.  The workaround is
simple: disable extra protocols so that they cannot be exploited.  Full
details here:

  http://curl.haxx.se/docs/adv_20130206.html

QEMU only cares about HTTP, HTTPS, FTP, FTPS, and TFTP.  I have tested
that this fix prevents the exploit on my host with
libcurl-7.27.0-5.fc18.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
10 years agoxen_disk: fix unmapping of persistent grants qemu-xen-4.3.4 qemu-xen-4.3.4-rc1
Roger Pau Monne [Mon, 17 Nov 2014 15:00:52 +0000 (15:00 +0000)]
xen_disk: fix unmapping of persistent grants

This patch fixes two issues with persistent grants and the disk PV backend
(Qdisk):

 - Keep track of memory regions where persistent grants have been mapped
   since we need to unmap them as a whole. It is not possible to unmap a
   single grant if it has been batch-mapped. A new check has also been added
   to make sure persistent grants are only used if the whole mapped region
   can be persistently mapped in the batch_maps case.
 - Unmap persistent grants before switching to the closed state, so the
   frontend can also free them.

upstream-commit-id: 2f01dfacb56bc7a0d4639adc9dff9aae131e6216

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
11 years agoxen_disk: mark ioreq as mapped before unmapping in error case qemu-xen-4.3.1 qemu-xen-4.3.1-rc2 qemu-xen-4.3.2 qemu-xen-4.3.2-rc1 qemu-xen-4.3.3
Matthew Daley [Thu, 10 Oct 2013 14:15:47 +0000 (14:15 +0000)]
xen_disk: mark ioreq as mapped before unmapping in error case

Commit 4472beae modified the semantics of ioreq_{un,}map so that they are
idempotent if called when they're not needed (ie., twice in a row). However,
it neglected to handle the case where batch mapping is not being used (the
default), and one of the grants fails to map. In this case, ioreq_unmap will
be called to unwind and unmap any mappings already performed, but ioreq_unmap
simply returns due to the aforementioned change (the ioreq has not already
been marked as mapped).

The frontend user can therefore force xen_disk to leak grant mappings, a
per-domain limited resource.

Fix by marking the ioreq as mapped before calling ioreq_unmap in this
situation.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agoqga: set umask 0077 when daemonizing (CVE-2013-2007)
Laszlo Ersek [Tue, 1 Oct 2013 15:13:33 +0000 (15:13 +0000)]
qga: set umask 0077 when daemonizing (CVE-2013-2007)

The qemu guest agent creates a bunch of files with insecure permissions
when started in daemon mode. For example:

  -rw-rw-rw- 1 root root /var/log/qemu-ga.log
  -rw-rw-rw- 1 root root /var/run/qga.state
  -rw-rw-rw- 1 root root /var/log/qga-fsfreeze-hook.log

In addition, at least all files created with the "guest-file-open" QMP
command, and all files created with shell output redirection (or
otherwise) by utilities invoked by the fsfreeze hook script are affected.

For now mask all file mode bits for "group" and "others" in
become_daemon().

Temporarily, for compatibility reasons, stick with the 0666 file-mode in
case of files newly created by the "guest-file-open" QMP call. Do so
without changing the umask temporarily.

upstream-commit-id: c689b4f1bac352dcfd6ecb9a1d45337de0f1de67

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agoAdd -f FMT / --format FMT arg to qemu-nbd
Daniel P. Berrange [Tue, 1 Oct 2013 14:59:14 +0000 (14:59 +0000)]
Add -f FMT / --format FMT arg to qemu-nbd

Currently the qemu-nbd program will auto-detect the format of
any disk it is given. This behaviour is known to be insecure.
For example, if qemu-nbd initially exposes a 'raw' file to an
unprivileged app, and that app runs

   'qemu-img create -f qcow2 -o backing_file=/etc/shadow /dev/nbd0'

then the next time the app is started, the qemu-nbd will now
detect it as a 'qcow2' file and expose /etc/shadow to the
unprivileged app.

The only way to avoid this is to explicitly tell qemu-nbd what
disk format to use on the command line, completely disabling
auto-detection. This patch adds a '-f' / '--format' arg for
this purpose, mirroring what is already available via qemu-img
and qemu commands.

  qemu-nbd --format raw -p 9000 evil.img

will now always use raw, regardless of what format 'evil.img'
looks like it contains

upstream-commit-id: e6b636779b51c97e67694be740ee972c52460c59

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[Use errx, not err. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agoxen_disk: support "direct-io-safe" backend option qemu-xen-4.3.0
Stefano Stabellini [Fri, 28 Jun 2013 11:23:16 +0000 (11:23 +0000)]
xen_disk: support "direct-io-safe" backend option

Support backend option "direct-io-safe".  This is documented as
follows in the Xen backend specification:

 * direct-io-safe
 *      Values:         0/1 (boolean)
 *      Default Value:  0
 *
 *      The underlying storage is not affected by the direct IO memory
 *      lifetime bug.  See:
 *        http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
 *
 *      Therefore this option gives the backend permission to use
 *      O_DIRECT, notwithstanding that bug.
 *
 *      That is, if this option is enabled, use of O_DIRECT is safe,
 *      in circumstances where we would normally have avoided it as a
 *      workaround for that bug.  This option is not relevant for all
 *      backends, and even not necessarily supported for those for
 *      which it is relevant.  A backend which knows that it is not
 *      affected by the bug can ignore this option.
 *
 *      This option doesn't require a backend to use O_DIRECT, so it
 *      should not be used to try to control the caching behaviour.

Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the
default flags passed to the qemu block layer.

The original proposal for a "cache" backend option has been dropped
because it was believed too wide, especially considering that at the
moment the backend doesn't have a way to tell the toolstack that it is
capable of supporting it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
11 years agoMerge remote branch 'perard/cpu-hotplug-port-v2' into xen-staging-master-7
Stefano Stabellini [Tue, 25 Jun 2013 11:34:24 +0000 (11:34 +0000)]
Merge remote branch 'perard/cpu-hotplug-port-v2' into xen-staging-master-7

11 years agoxen: Implement hot_add_cpu hook.
Anthony PERARD [Mon, 10 Jun 2013 14:29:31 +0000 (15:29 +0100)]
xen: Implement hot_add_cpu hook.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agoxen: Fix vcpus initialisation.
Anthony PERARD [Fri, 14 Jun 2013 13:43:05 +0000 (14:43 +0100)]
xen: Fix vcpus initialisation.

Each vcpu needs a call to xc_evtchn_bind_interdomain in QEMU, even those
that are unplug at QEMU initialisation.

Without this patch, any hot-plugged CPU will be "Stuck ??" when Linux
will try to use them.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agoQMP: Add cpu-add command
Igor Mammedov [Tue, 30 Apr 2013 13:41:25 +0000 (15:41 +0200)]
QMP: Add cpu-add command

Adds "cpu-add id=xxx" QMP command.

cpu-add's "id" argument is a CPU number in a range [0..max-cpus)

Example QMP command:
 -> { "execute": "cpu-add", "arguments": { "id": 2 } }
 <- { "return": {} }

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
(cherry picked from QEMU commit 69ca3ea5e192251f27510554611bcff6f036a00b)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agoAdd hot_add_cpu hook to QEMUMachine
Igor Mammedov [Tue, 30 Apr 2013 13:41:24 +0000 (15:41 +0200)]
Add hot_add_cpu hook to QEMUMachine

Hook should be set by machines that implement CPU hot-add
via cpu-add QMP command.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
(cherry picked from QEMU commit b4fc7b4326112538e0dbdc7fd019652ba8cc3281)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agoacpi_piix4: Add infrastructure to send CPU hot-plug GPE to guest
Igor Mammedov [Thu, 25 Apr 2013 14:05:25 +0000 (16:05 +0200)]
acpi_piix4: Add infrastructure to send CPU hot-plug GPE to guest

* introduce processor status bitmask visible to guest at 0xaf00 addr,
  where ACPI asl code expects it
* set bit corresponding to APIC ID in processor status bitmask on
  receiving CPU hot-plug notification
* trigger CPU hot-plug SCI, to notify guest about CPU hot-plug event

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
(cherry picked from QEMU commit b8622725cf0196f672f272922b0941dc8ba1c408)

The function piix4_cpu_hotplug_req() has been modified to take an integer
instead of a CPU object.

There was a cpu_added_notifier in the original commit, this haven't
been back-ported, as it can't be used.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agocpu: Add qemu_for_each_cpu()
Michael S. Tsirkin [Wed, 24 Apr 2013 20:58:04 +0000 (22:58 +0200)]
cpu: Add qemu_for_each_cpu()

Wrapper to avoid open-coded loops and to make CPUState iteration
independent of CPUArchState.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
(cherry picked from QEMU commit d6b9e0d60cc511eca210834428bb74508cff3d33)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agocpu: Introduce get_arch_id() method and override it for X86CPU
Igor Mammedov [Tue, 23 Apr 2013 08:29:41 +0000 (10:29 +0200)]
cpu: Introduce get_arch_id() method and override it for X86CPU

get_arch_id() adds possibility for generic code to get a guest-visible
CPU ID without accessing CPUArchState.
If derived classes don't override it, it will return cpu_index.

Override it on target-i386 in X86CPU to return the APIC ID.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
(cherry picked from QEMU commit 997395d3888fcde6ce41535a8208d7aa919d824b)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
11 years agoRevert "xen: start PCI hole at 0xe0000000 (same as pc_init1 and qemu-xen-traditional)"
Stefano Stabellini [Thu, 13 Jun 2013 17:39:42 +0000 (17:39 +0000)]
Revert "xen: start PCI hole at 0xe0000000 (same as pc_init1 and qemu-xen-traditional)"

This reverts commit 4597594c61add43725bd207bb498268a058f9cfb.

Changing the start of the PCI hole requires a corresponding change in
hvmloader and libxc. Revert the commit for the moment.

11 years agoxen: start PCI hole at 0xe0000000 (same as pc_init1 and qemu-xen-traditional)
Stefano Stabellini [Wed, 5 Jun 2013 11:36:10 +0000 (11:36 +0000)]
xen: start PCI hole at 0xe0000000 (same as pc_init1 and qemu-xen-traditional)

We are currently setting the PCI hole to start at HVM_BELOW_4G_RAM_END,
that is 0xf0000000.
Start the PCI hole at 0xe0000000 instead, that is the same value used by
pc_init1 and qemu-xen-traditional.

upstream-commit-id: 9f24a8030a70ea4954b5b8c48f606012f086f65f

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agomain_loop: do not set nonblocking if xen_enabled()
Stefano Stabellini [Mon, 3 Jun 2013 15:38:43 +0000 (15:38 +0000)]
main_loop: do not set nonblocking if xen_enabled()

upstream-commit-id: a7d4207d378069a5bb3175a131e8fdedd39ef97d

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: qemu-stable@nongnu.org
11 years agoxen: simplify xen_enabled
Stefano Stabellini [Wed, 5 Jun 2013 11:33:02 +0000 (11:33 +0000)]
xen: simplify xen_enabled

No need for preprocessor conditionals in xen_enabled: xen_allowed is
always defined.

upstream-commit-id: 49fa9881b2358e390e9e9466ddde74e995927efa

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agoMerge commit 'v1.3.1' into xen-staging-master
Stefano Stabellini [Tue, 4 Jun 2013 15:33:31 +0000 (15:33 +0000)]
Merge commit 'v1.3.1' into xen-staging-master

12 years agoAllow xen guests to plug disks of 1 TiB or more qemu-xen-4.3.0-rc1
Felipe Franciosi [Fri, 5 Apr 2013 15:47:59 +0000 (15:47 +0000)]
Allow xen guests to plug disks of 1 TiB or more

The current xen backend driver implementation uses int64_t variables
to store the size of the corresponding backend disk/file. It also uses
an int64_t variable to store the block size of that image. When writing
the number of sectors (file_size/block_size) to xenstore, however, it
passes these values as 32 bit signed integers. This will cause an
overflow for any disk of 1 TiB or more.

This patch changes the xen backend driver to use a 64 bit integer write
xenstore function.

upstream-commit-id: 9246ce881128df2a69178779c1ef33c83df3c70d

Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoIntroduce 64 bit integer write interface to xenstore
Felipe Franciosi [Fri, 5 Apr 2013 15:37:32 +0000 (15:37 +0000)]
Introduce 64 bit integer write interface to xenstore

The current implementation of xen_backend only provides 32 bit integer
functions to write to xenstore. This patch adds two functions that
allow writing 64 bit integers (one generic function and another for
the backend only).

This patch also fixes the size of the char arrays used to represent
these integers as strings (originally 32 bytes, however no more than
12 bytes are needed for 32 bit integers and no more than 21 bytes are
needed for 64 bit integers).

upstream-commit-id: 10bb3c623478117aee5117c312736f10833decc2

Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoXen PV backend: Disable use of O_DIRECT by default as it results in crashes.
Alex Bligh [Fri, 5 Apr 2013 15:45:15 +0000 (15:45 +0000)]
Xen PV backend: Disable use of O_DIRECT by default as it results in crashes.

Due to what is almost certainly a kernel bug, writes with O_DIRECT may
continue to reference the page after the write has been marked as
completed, particularly in the case of TCP retransmit. In other
scenarios, this "merely" risks data corruption on the write, but with
Xen pages from domU are only transiently mapped into dom0's memory,
resulting in kernel panics when they are subsequently accessed.

This brings PV devices in line with emulated devices.  Removing
O_DIRECT is safe as barrier operations are now correctly passed
through.

See:
   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
for more details.

upstream-commit-id: c1a88ad1f4ac994cd70695bf08141d161e21533e

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoXen PV backend: Move call to bdrv_new from blk_init to blk_connect
Alex Bligh [Fri, 5 Apr 2013 15:45:10 +0000 (15:45 +0000)]
Xen PV backend: Move call to bdrv_new from blk_init to blk_connect

This commit delays the point at which bdrv_new (and hence blk_open
on the underlying device) is called from blk_init to blk_connect.
This ensures that in an inbound live migrate, the block device is
not opened until it has been closed at the other end. This is in
preparation for supporting devices with open/close consistency
without using O_DIRECT. This commit does NOT itself change O_DIRECT
semantics.

upstream-commit-id: 86f425db3b1c4b6c4a2927eaec35627f9ab2e703

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoRevert "xen: Disable use of O_DIRECT by default as it results in crashes."
Stefano Stabellini [Fri, 5 Apr 2013 23:30:14 +0000 (23:30 +0000)]
Revert "xen: Disable use of O_DIRECT by default as it results in crashes."

This reverts commit f3903bbac78a81fcbce1350cdce860764a62783a.

12 years agoxen-mapcache: pass the right size argument to test_bits
Hanweidong [Tue, 2 Apr 2013 13:22:41 +0000 (13:22 +0000)]
xen-mapcache: pass the right size argument to test_bits

Compute the correct size for test_bits().
qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call xen_map_cache()
with size is 0 if the requested address is in the RAM.  Then
xen_map_cache() will pass the size 0 to test_bits() for checking if the
corresponding pfn was mapped in cache. But test_bits() will always
return 1 when size is 0 without any bit testing. Actually, for this
case, test_bits should check one bit. So this patch introduced a
__test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE,
then test_bits can work correctly with __test_bit_size
>> XC_PAGE_SHIFT as its size.

upstream-commit-id: 044d4e1aae539bd4214175bd9591b3de7986cf18

Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com>
Signed-off-by: Weidong Han <hanweidong@huawei.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen-mapcache: replace last_address_index with a last_entry pointer
Stefano Stabellini [Tue, 2 Apr 2013 13:23:40 +0000 (13:23 +0000)]
xen-mapcache: replace last_address_index with a last_entry pointer

Replace last_address_index and last_address_vaddr with a single pointer
to the last MapCacheEntry used.

upstream-commit-id: e2deee3ea6136b6189e8cfd26379420b9a398d96

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen: Disable use of O_DIRECT by default as it results in crashes.
Alex Bligh [Mon, 11 Mar 2013 14:02:49 +0000 (14:02 +0000)]
xen: Disable use of O_DIRECT by default as it results in crashes.

Due to what is almost certainly a kernel bug, writes with O_DIRECT may
continue to reference the page after the write has been marked as
completed, particularly in the case of TCP retransmit. In other
scenarios, this "merely" risks data corruption on the write, but with
Xen pages from domU are only transiently mapped into dom0's memory,
resulting in kernel panics when they are subsequently accessed.

This brings PV devices in line with emulated devices.  Removing
O_DIRECT is safe as barrier operations are now correctly passed
through.

See:
  http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
for more details.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoupdate VERSION for v1.3.1
Michael Roth [Mon, 28 Jan 2013 16:38:28 +0000 (10:38 -0600)]
update VERSION for v1.3.1

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agoqxl: Fix SPICE_RING_PROD_ITEM(), SPICE_RING_CONS_ITEM() sanity check
Markus Armbruster [Thu, 10 Jan 2013 13:24:49 +0000 (14:24 +0100)]
qxl: Fix SPICE_RING_PROD_ITEM(), SPICE_RING_CONS_ITEM() sanity check

The pointer arithmetic there is safe, but ugly.  Coverity grouses
about it.  However, the actual comparison is off by one: <= end
instead of < end.  Fix by rewriting the check in a cleaner way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit bc5f92e5db6f303e73387278e32f8669f0abf0e5)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agoFix compile errors when enabling Xen debug logging.
Sander Eikelenboom [Mon, 17 Dec 2012 11:37:43 +0000 (11:37 +0000)]
Fix compile errors when enabling Xen debug logging.

Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit f1b8caf1d927f30f66054733a783651a24db4999)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agoxen: fix trivial PCI passthrough MSI-X bug
Stefano Stabellini [Mon, 17 Dec 2012 11:36:58 +0000 (11:36 +0000)]
xen: fix trivial PCI passthrough MSI-X bug

We are currently passing entry->data as address parameter. Pass
entry->addr instead.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Xen-devel: http://marc.info/?l=xen-devel&m=135515462613715
(cherry picked from commit 044b99c6555f562254ae70dc39f32190eecbc1f2)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agoxen_disk: fix memory leak
Roger Pau Monne [Mon, 14 Jan 2013 18:26:53 +0000 (18:26 +0000)]
xen_disk: fix memory leak

On ioreq_release the full ioreq was memset to 0, loosing all the data
and memory allocations inside the QEMUIOVector, which leads to a
memory leak. Create a new function to specifically reset ioreq.

Reported-by: Maik Wessler <maik.wessler@yahoo.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit 282c6a2f292705f823554447ca0b7731b6f81a97)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agotcg/target-arm: Add missing parens to assertions
Peter Maydell [Thu, 17 Jan 2013 20:04:16 +0000 (20:04 +0000)]
tcg/target-arm: Add missing parens to assertions

Silence a (legitimate) complaint about missing parentheses:

tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_ld’:
tcg/arm/tcg-target.c:1148:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]
tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_st’:
tcg/arm/tcg-target.c:1357:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]

which meant that we would mistakenly always assert if running
a QEMU built with debug enabled on ARM.

Signed-off-by: Peter Maydell <peter.maydelL@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
(cherry picked from commit 5256a7208a7c2af19baf8f99bd4f06632f9f9ba9)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agowin32-aio: Fix memory leak
Kevin Wolf [Wed, 16 Jan 2013 20:20:00 +0000 (21:20 +0100)]
win32-aio: Fix memory leak

The buffer is allocated for both reads and writes, and obviously it
should be freed even if an error occurs.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit e8bccad5ac6095b5af7946cd72d9aacb57f7c0a3)

Conflicts:

block/win32-aio.c

*addressed conflict due to buggy g_free() still in use instead of
qemu_vfree() as it is upstream (via commit 7479acdb)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agowin32-aio: Fix vectored reads
Kevin Wolf [Wed, 16 Jan 2013 20:19:59 +0000 (21:19 +0100)]
win32-aio: Fix vectored reads

Copying data in the right direction really helps a lot!

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit bcbbd234d42f1111e42b91376db61922d42e7e9e)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agoaio: Fix return value of aio_poll()
Kevin Wolf [Wed, 16 Jan 2013 18:25:51 +0000 (19:25 +0100)]
aio: Fix return value of aio_poll()

aio_poll() must return true if any work is still pending, even if it
didn't make progress, so that bdrv_drain_all() doesn't stop waiting too
early. The possibility of stopping early occasionally lead to a failed
assertion in bdrv_drain_all(), when some in-flight request was missed
and the function didn't really drain all requests.

In order to make that change, the return value as specified in the
function comment must change for blocking = false; fortunately, the
return value of blocking = false callers is only used in test cases, so
this change shouldn't cause any trouble.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 2ea9b58f0bc62445b7ace2381b4c4db7d5597e19)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agoe1000: Discard oversized packets based on SBP|LPE
Michael Contreras [Wed, 5 Dec 2012 18:31:30 +0000 (13:31 -0500)]
e1000: Discard oversized packets based on SBP|LPE

Discard packets longer than 16384 when !SBP to match the hardware behavior.

upstream-commit-id: 2c0331f4f7d241995452b99afaf0aab00493334a
security-tags: XSA-41, CVE-2012-6075

Signed-off-by: Michael Contreras <michael@inetric.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12 years agoxen_disk: implement BLKIF_OP_FLUSH_DISKCACHE, remove BLKIF_OP_WRITE_BARRIER
Stefano Stabellini [Mon, 14 Jan 2013 18:30:30 +0000 (18:30 +0000)]
xen_disk: implement BLKIF_OP_FLUSH_DISKCACHE, remove BLKIF_OP_WRITE_BARRIER

upstream-commit-id: 7e7b7cba16faa7b721b822fa9ed8bebafa35700f

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen_disk: add persistent grant support to xen_disk backend
Roger Pau Monne [Mon, 14 Jan 2013 18:28:19 +0000 (18:28 +0000)]
xen_disk: add persistent grant support to xen_disk backend

This protocol extension reuses the same set of grant pages for all
transactions between the front/back drivers, avoiding expensive tlb
flushes, grant table lock contention and switches between userspace
and kernel space. The full description of the protocol can be found in
the public blkif.h header.

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=xen/include/public/io/blkif.h

Speed improvement with 15 guests performing I/O is ~450%.

upstream-commit-id: 9e496d7458bb01b717afe22db10a724db57d53fd

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen_disk: fix memory leak
Roger Pau Monne [Mon, 14 Jan 2013 18:26:53 +0000 (18:26 +0000)]
xen_disk: fix memory leak

On ioreq_release the full ioreq was memset to 0, loosing all the data
and memory allocations inside the QEMUIOVector, which leads to a
memory leak. Create a new function to specifically reset ioreq.

upstream-commit-id: 282c6a2f292705f823554447ca0b7731b6f81a97

Reported-by: Maik Wessler <maik.wessler@yahoo.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoraw-posix: fix bdrv_aio_ioctl
Paolo Bonzini [Thu, 10 Jan 2013 14:28:35 +0000 (15:28 +0100)]
raw-posix: fix bdrv_aio_ioctl

When the raw-posix aio=thread code was moved from posix-aio-compat.c
to block/raw-posix.c, there was an unintended change to the ioctl code.
The code used to return the ioctl command, which posix_aio_read()
would later morph into a zero.  This hack is not necessary anymore,
and in fact breaks scsi-generic (which expects a zero return code).
Remove it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b608c8dc02c78ee95455a0989bdf1b41c768b2ef)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agovfio-pci: Loosen sanity checks to allow future features
Alex Williamson [Tue, 8 Jan 2013 21:10:03 +0000 (14:10 -0700)]
vfio-pci: Loosen sanity checks to allow future features

VFIO_PCI_NUM_REGIONS and VFIO_PCI_NUM_IRQS should never have been
used in this manner as it locks a specific kernel implementation.
Future features may introduce new regions or interrupt entries
(VGA may add legacy ranges, AER might add an IRQ for error
signalling).  Fix this before it gets us into trouble.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
(cherry picked from commit 8fc94e5a8046e349e07976f9bcaffbcd5833f3a2)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agopci-assign: Enable MSIX on device to match guest
Alex Williamson [Mon, 7 Jan 2013 04:30:31 +0000 (21:30 -0700)]
pci-assign: Enable MSIX on device to match guest

When a guest enables MSIX on a device we evaluate the MSIX vector
table, typically find no unmasked vectors and don't switch the device
to MSIX mode.  This generally works fine and the device will be
switched once the guest enables and therefore unmasks a vector.
Unfortunately some drivers enable MSIX, then use interfaces to send
commands between VF & PF or PF & firmware that act based on the host
state of the device.  These therefore may break when MSIX is managed
lazily.  This change re-enables the previous test used to enable MSIX
(see qemu-kvm a6b402c9), which basically guesses whether a vector
will be used based on the data field of the vector table.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit feb9a2ab4b0260d8d680a7ffd25063dafc7ec628)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agovfio-pci: Make host MSI-X enable track guest
Alex Williamson [Tue, 8 Jan 2013 21:09:03 +0000 (14:09 -0700)]
vfio-pci: Make host MSI-X enable track guest

Guests typically enable MSI-X with all of the vectors in the MSI-X
vector table masked.  Only when the vector is enabled does the vector
get unmasked, resulting in a vector_use callback.  These two points,
enable and unmask, correspond to pci_enable_msix() and request_irq()
for Linux guests.  Some drivers rely on VF/PF or PF/fw communication
channels that expect the physical state of the device to match the
guest visible state of the device.  They don't appreciate lazily
enabling MSI-X on the physical device.

To solve this, enable MSI-X with a single vector when the MSI-X
capability is enabled and immediate disable the vector.  This leaves
the physical device in exactly the same state between host and guest.
Furthermore, the brief gap where we enable vector 0, it fires into
userspace, not KVM, so the guest doesn't get spurious interrupts.
Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters
to enable MSI-X with zero vectors, but this will currently return an
error as the Linux MSI-X interfaces do not allow it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
(cherry picked from commit b0223e29afdc88cc262a764026296414396cd129)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agotarget-xtensa: fix search_pc for the last TB opcode
Max Filippov [Wed, 19 Dec 2012 20:04:09 +0000 (00:04 +0400)]
target-xtensa: fix search_pc for the last TB opcode

Zero out tcg_ctx.gen_opc_instr_start for instructions representing the
last guest opcode in the TB.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
(cherry picked from commit 36f25d2537c40c6c47f4abee5d31a24863d1adf7)

*modified to use older global version of gen_opc_instr_start

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agobuffered_file: do not send more than s->bytes_xfer bytes per tick
Paolo Bonzini [Tue, 20 Nov 2012 11:48:19 +0000 (12:48 +0100)]
buffered_file: do not send more than s->bytes_xfer bytes per tick

Sending more was possible if the buffer was large.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit bde54c08b4854aceee3dee25121a2b835cb81166)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
12 years agomigration: fix migration_bitmap leak
Paolo Bonzini [Wed, 12 Dec 2012 11:54:43 +0000 (12:54 +0100)]
migration: fix migration_bitmap leak

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 244eaa7514a944b36273eb8428f32da8e9124fcf)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>