Petr Matousek [Mon, 27 Oct 2014 11:41:44 +0000 (12:41 +0100)]
vnc: sanitize bits_per_pixel from the client
bits_per_pixel values less than 8 could result in accessing
uninitialized buffers later in the code, because that code expects
that the bytes_per_pixel value used to initialize these buffers is
never zero.
To fix this, check that bits_per_pixel from the client is one of the
values that the rfb protocol specification allows.
This is CVE-2014-7815.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
[ kraxel: apply codestyle fix ]
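The check described in this commit can be sketched as below. This is a minimal illustration, not the patch itself: the function name is hypothetical, and it assumes the RFB rule that bits-per-pixel must be 8, 16 or 32.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from the QEMU tree. Accepts only the
 * bits-per-pixel values the RFB protocol defines (8, 16 and 32), so a
 * derived bytes-per-pixel value (bits_per_pixel / 8) is never zero. */
bool rfb_bits_per_pixel_is_valid(int bits_per_pixel)
{
    switch (bits_per_pixel) {
    case 8:
    case 16:
    case 32:
        return true;
    default:
        return false;
    }
}
```

A caller would reject the client's pixel format message when this returns false, before any buffer is sized from the value.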
Gerd Hoffmann [Wed, 4 Mar 2015 17:55:51 +0000 (17:55 +0000)]
vmware-vga: CVE-2014-3689: turn off hw accel
Quick & easy stopgap for CVE-2014-3689: We just compile out the
hardware acceleration functions which lack sanity checks. Thankfully
we have capability bits for them (SVGA_CAP_RECT_COPY and
SVGA_CAP_RECT_FILL), so guests should deal just fine, in theory.
Subsequent patches will add the missing checks and re-enable the
hardware acceleration emulation.
Petr Matousek [Thu, 18 Sep 2014 06:35:37 +0000 (08:35 +0200)]
slirp: udp: fix NULL pointer dereference because of uninitialized socket
When the guest sends a udp packet with source port and source addr 0,
an uninitialized socket is picked up when looking for matching, already
created udp sockets, and is later passed to sosendto(), where a NULL pointer
dereference is hit during the so->slirp->vnetwork_mask.s_addr access.
Fix this by checking that the socket is not just a socket stub.
This is CVE-2014-3640.
Signed-off-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Xavier Mehrenberger <xavier.mehrenberger@airbus.com> Reported-by: Stephane Duverger <stephane.duverger@eads.net> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-id: 20140918063537.GX9321@dhcp-25-225.brq.redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Related spice-only bug. We have a fixed 16 MB buffer here, being
presented to the spice-server as qxl video memory in case spice is
used with a non-qxl card. It's also used with qxl in vga mode.
When using display resolutions requiring more than 16 MB of memory we
are going to overflow that buffer. In theory the guest can write,
indirectly via spice-server. The spice-server clears the memory after
setting a new video mode though, triggering a segfault in the overflow
case, so qemu crashes before the guest has a chance to do something
evil.
Fix that by switching to dynamic allocation for the buffer.
Gerd Hoffmann [Wed, 4 Mar 2015 17:51:39 +0000 (17:51 +0000)]
vbe: rework sanity checks
Plug a bunch of holes in the bochs dispi interface parameter checking.
Add a function doing verification on all registers. Call that
unconditionally on every register write. That way we should catch
everything, even changing one register affecting the valid range of
another register.
Some of the holes have been added by commit e9c6149f6ae6873f14a12eea554925b6aa4c4dec. Before that commit the
maximum possible framebuffer (VBE_DISPI_MAX_XRES * VBE_DISPI_MAX_YRES *
32 bpp) has been smaller than the qemu vga memory (8MB) and the checking
for VBE_DISPI_MAX_XRES + VBE_DISPI_MAX_YRES + VBE_DISPI_MAX_BPP was ok.
Some of the holes have been there forever, such as
VBE_DISPI_INDEX_X_OFFSET and VBE_DISPI_INDEX_Y_OFFSET register writes
lacking any verification.
Security impact:
(1) Guest can make the ui (gtk/vnc/...) use memory ranges outside the vga
frame buffer as source -> host memory leak. Memory isn't leaked to
the guest but to the vnc client though.
(2) Qemu will segfault in case the memory range happens to include
unmapped areas -> Guest can DoS itself.
The guest can not modify host memory, so I don't think this can be used
by the guest to escape.
Benoît Canet [Wed, 4 Mar 2015 17:17:12 +0000 (17:17 +0000)]
ide: Correct improper smart self test counter reset in ide core.
The SMART self test counter was incorrectly being reset to zero,
not 1. This had the effect that on every 21st SMART EXECUTE OFFLINE:
* We would write off the beginning of a dynamically allocated buffer
* We forgot the SMART history
Fix this.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1397336390-24664-1-git-send-email-benoit.canet@irqsave.net Reviewed-by: Markus Armbruster <armbru@redhat.com> Cc: qemu-stable@nongnu.org Acked-by: Kevin Wolf <kwolf@redhat.com>
[PMM: tweaked commit message as per suggestions from Markus] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Malformed input can have config_len in migration stream
exceed the array size allocated on destination, the
result will be heap overflow.
To fix, check that config_len matches on both sides.
CVE-2014-0182
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
When VM guest programs multicast addresses for
a virtio net card, it supplies a 32 bit
entries counter for the number of addresses.
These addresses are read into tail portion of
a fixed macs array which has size MAC_TABLE_ENTRIES,
at offset equal to in_use.
To avoid overflow of this array by guest, qemu attempts
to test the size as follows:
- if (in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {
however, as mac_data.entries is uint32_t, this sum
can overflow, e.g. if in_use is 1 and mac_data.entries
is 0xffffffff then in_use + mac_data.entries will be 0.
Qemu will then read guest supplied buffer into this
memory, overflowing buffer on heap.
CVE-2014-0150
Cc: qemu-stable@nongnu.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1397218574-25058-1-git-send-email-mst@redhat.com Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
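The wrap-around described in this commit can be reproduced with a small sketch. The function names are hypothetical and MAC_TABLE_ENTRIES is given an assumed value; the second function shows one common overflow-safe way to phrase the test, which may differ from the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_TABLE_ENTRIES 64  /* assumed value, for illustration only */

/* The test as quoted in the commit message: the 32-bit sum can wrap. */
bool mac_entries_fit_wrapping(uint32_t in_use, uint32_t entries)
{
    return (uint32_t)(in_use + entries) <= MAC_TABLE_ENTRIES;
}

/* Overflow-safe variant: compare against the space that is left, so no
 * addition of guest-controlled values takes place. */
bool mac_entries_fit(uint32_t in_use, uint32_t entries)
{
    if (in_use > MAC_TABLE_ENTRIES) {
        return false;
    }
    return entries <= MAC_TABLE_ENTRIES - in_use;
}
```

With in_use = 1 and entries = 0xffffffff the first function accepts the request (the sum wraps to 0), while the second rejects it.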
Michael Roth [Wed, 4 Mar 2015 16:57:56 +0000 (16:57 +0000)]
virtio: avoid buffer overrun on incoming migration
CVE-2013-6399
vdev->queue_sel is read from the wire, and later used in the
emulation code as an index into vdev->vq[]. If the value of
vdev->queue_sel exceeds the length of vdev->vq[], currently
allocated to be VIRTIO_PCI_QUEUE_MAX elements, subsequent PIO
operations such as VIRTIO_PCI_QUEUE_PFN can be used to overrun
the buffer with arbitrary data originating from the source.
Fix this by failing migration if the value from the wire exceeds
VIRTIO_PCI_QUEUE_MAX.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com>
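A sketch of the kind of validation this commit describes. The helper name and the value of VIRTIO_PCI_QUEUE_MAX are assumptions. Because queue_sel is used as an array index, the sketch also rejects a value equal to the array length, which is an interpretation rather than something the message states.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_PCI_QUEUE_MAX 64  /* assumed value, for illustration only */

/* Hypothetical check: a selector used as an index into an array of
 * VIRTIO_PCI_QUEUE_MAX elements is valid only in [0, MAX - 1]. */
bool queue_sel_is_valid(uint32_t queue_sel)
{
    return queue_sel < VIRTIO_PCI_QUEUE_MAX;
}
```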
In both virtio-block and virtio-serial, VirtQueueElements are read in
as buffers and passed to
virtqueue_map_sg(), where num_sg is taken from the wire and can force
writes to indices beyond VIRTQUEUE_MAX_SIZE.
To fix, validate num_sg.
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
hpet is a VARRAY with a uint8 size, but the static array has only 32 entries.
To fix, make sure num_timers is valid using VMSTATE_VALID hook.
Reported-by: Anthony Liguori <anthony@codemonkey.ws> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
r->buf is hardcoded to 2056 which is (256 + 1) * 8, allowing 256 luns at
most. If more than 256 luns are specified by user, we have buffer
overflow in scsi_target_emulate_report_luns.
To fix, we allocate the buffer dynamically.
Signed-off-by: Asias He <asias@redhat.com> Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
virtio: out-of-bounds buffer write on invalid state load
CVE-2013-4151 QEMU 1.0 out-of-bounds buffer write in
virtio_load@hw/virtio/virtio.c
So we have this code since way back when:
num = qemu_get_be32(f);
for (i = 0; i < num; i++) {
vdev->vq[i].vring.num = qemu_get_be32(f);
array of vqs has size VIRTIO_PCI_QUEUE_MAX, so
on invalid input this will write beyond end of buffer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
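The loop quoted in this commit becomes safe once the count is checked before it is used as a loop bound. The sketch below is hypothetical: it reads from an already decoded input array instead of a migration stream, and VIRTIO_PCI_QUEUE_MAX has an assumed value.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_PCI_QUEUE_MAX 64  /* assumed value, for illustration only */

/* Hypothetical loader: copies `num` ring sizes from an already decoded
 * input array into a fixed table, refusing counts that do not fit.
 * Returns 0 on success, -1 on invalid input. */
int load_ring_sizes(uint32_t ring_num[VIRTIO_PCI_QUEUE_MAX],
                    const uint32_t *input, uint32_t num)
{
    if (num > VIRTIO_PCI_QUEUE_MAX) {
        return -1;              /* would write past the end of ring_num[] */
    }
    for (uint32_t i = 0; i < num; i++) {
        ring_num[i] = input[i];
    }
    return 0;
}
```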
and read to the n->mac_table.in_use size buffer n->mac_table.in_use *
ETH_ALEN bytes, corrupting memory.
If adversary controls state then memory written there is controlled
by adversary.
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
with good in_use value, "n->mac_table.in_use * ETH_ALEN" can get
positive and bigger than mac_table.macs. For example 0x81000000
satisfies this condition when ETH_ALEN is 6.
Fix it by making the value unsigned.
For consistency, change first_multi as well.
Note: all call sites were audited to confirm that
making them unsigned didn't cause any issues:
it turns out we actually never do math on them,
so it's easy to validate because both values are
always <= MAC_TABLE_ENTRIES.
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Conflicts:
include/hw/virtio/virtio-net.h
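The 0x81000000 example from the commit above can be worked through as follows: 0x81000000 * 6 = 0x306000000, which truncated to 32 bits is 0x06000000 (about 100 MB), a positive size far larger than the MAC table. The sketch does the multiplication in unsigned arithmetic, because signed overflow is undefined behaviour in C; the function name is hypothetical.

```c
#include <stdint.h>

#define ETH_ALEN 6

/* Byte count for in_use MAC addresses, reduced modulo 2^32 the way a
 * 32-bit multiplication wraps. Unsigned on purpose: unsigned wrap-around
 * is well defined, signed overflow is not. */
uint32_t mac_bytes_wrapped(uint32_t in_use)
{
    return in_use * (uint32_t)ETH_ALEN;
}
```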
Stefan Hajnoczi [Wed, 13 Feb 2013 08:25:34 +0000 (09:25 +0100)]
block/curl: only restrict protocols with libcurl>=7.19.4
The curl_easy_setopt(state->curl, CURLOPT_PROTOCOLS, ...) interface was
introduced in libcurl 7.19.4. Therefore we cannot protect against
CVE-2013-0249 when linking against an older libcurl.
Reported-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Andreas Färber <andreas.faeber@web.de>
Message-id: 1360743934-8337-1-git-send-email-stefanha@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Stefan Hajnoczi [Fri, 8 Feb 2013 07:49:10 +0000 (08:49 +0100)]
block/curl: disable extra protocols to prevent CVE-2013-0249
There is a buffer overflow in libcurl POP3/SMTP/IMAP. The workaround is
simple: disable extra protocols so that they cannot be exploited. Full
details here:
http://curl.haxx.se/docs/adv_20130206.html
QEMU only cares about HTTP, HTTPS, FTP, FTPS, and TFTP. I have tested
that this fix prevents the exploit on my host with
libcurl-7.27.0-5.fc18.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.
get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.
This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652
Signed-off-by: Jim Meyering <meyering@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
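A minimal sketch of a temp-file helper that reports the three failures this commit names (overlong $TMPDIR, mkstemp failure, close failure). The function name, the "vl.XXXXXX" template and the choice of error codes are assumptions for illustration, not necessarily what the patch uses.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch only: build "<tmpdir>/vl.XXXXXX" in `filename` (capacity `size`),
 * create the file with mkstemp() and close it again.
 * Returns 0 on success or a negative errno value on failure. */
int get_tmp_filename_sketch(char *filename, size_t size)
{
    const char *tmpdir = getenv("TMPDIR");
    int fd, n, saved_errno;

    if (!tmpdir) {
        tmpdir = "/tmp";
    }
    n = snprintf(filename, size, "%s/vl.XXXXXX", tmpdir);
    if (n < 0 || (size_t)n >= size) {
        return -EOVERFLOW;      /* path would have been truncated */
    }
    fd = mkstemp(filename);
    if (fd < 0) {
        return -errno;
    }
    if (close(fd) != 0) {
        saved_errno = errno;    /* unlink() may overwrite errno */
        unlink(filename);
        return -saved_errno;
    }
    return 0;
}
```

Callers can then propagate the negative value instead of opening a file that was never created.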
Paolo Bonzini [Fri, 23 Dec 2011 14:39:03 +0000 (15:39 +0100)]
virtio-blk: refuse SG_IO requests with scsi=off
QEMU does have a "scsi" option (to be used like -device
virtio-blk-pci,drive=foo,scsi=off). However, it only
masks the feature bit, and does not reject the command
if a malicious guest disregards the feature bits and
issues a request.
Without this patch, using scsi=off does not protect you
from CVE-2011-4127.
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Matthew Daley [Thu, 10 Oct 2013 14:15:47 +0000 (14:15 +0000)]
xen_disk: mark ioreq as mapped before unmapping in error case
Commit 4472beae modified the semantics of ioreq_{un,}map so that they are
idempotent if called when they're not needed (ie., twice in a row). However,
it neglected to handle the case where batch mapping is not being used (the
default), and one of the grants fails to map. In this case, ioreq_unmap will
be called to unwind and unmap any mappings already performed, but ioreq_unmap
simply returns due to the aforementioned change (the ioreq has not already
been marked as mapped).
The frontend user can therefore force xen_disk to leak grant mappings, a
per-domain limited resource.
Fix by marking the ioreq as mapped before calling ioreq_unmap in this
situation.
Signed-off-by: Matthew Daley <mattjd@gmail.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
In addition, at least all files created with the "guest-file-open" QMP
command, and all files created with shell output redirection (or
otherwise) by utilities invoked by the fsfreeze hook script are affected.
For now mask all file mode bits for "group" and "others" in
become_daemon().
Temporarily, for compatibility reasons, stick with the 0666 file-mode in
case of files newly created by the "guest-file-open" QMP call. Do so
without changing the umask temporarily.
Currently the qemu-nbd program will auto-detect the format of
any disk it is given. This behaviour is known to be insecure.
For example, if qemu-nbd initially exposes a 'raw' file to an
unprivileged app, and that app runs
then the next time the app is started, the qemu-nbd will now
detect it as a 'qcow2' file and expose /etc/shadow to the
unprivileged app.
The only way to avoid this is to explicitly tell qemu-nbd what
disk format to use on the command line, completely disabling
auto-detection. This patch adds a '-f' / '--format' arg for
this purpose, mirroring what is already available via qemu-img
and qemu commands.
qemu-nbd --format raw -p 9000 evil.img
will now always use raw, regardless of what format 'evil.img'
looks like it contains
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[Use errx, not err. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The current xen backend driver implementation uses int64_t variables
to store the size of the corresponding backend disk/file. It also uses
an int64_t variable to store the block size of that image. When writing
the number of sectors (file_size/block_size) to xenstore, however, it
passes these values as 32 bit signed integers. This will cause an
overflow for any disk of 1 TiB or more.
This patch changes the xen backend driver to use a 64 bit integer write
xenstore function.
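The 1 TiB threshold follows from simple arithmetic, assuming a 512-byte block size (the message does not state it): 2^40 / 512 = 2^31 sectors, one more than INT32_MAX. A small sketch with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the sector count file_size / block_size can still be
 * represented as a signed 32-bit integer. */
bool sector_count_fits_int32(int64_t file_size, int64_t block_size)
{
    return file_size / block_size <= INT32_MAX;
}
```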
Introduce 64 bit integer write interface to xenstore
The current implementation of xen_backend only provides 32 bit integer
functions to write to xenstore. This patch adds two functions that
allow writing 64 bit integers (one generic function and another for
the backend only).
This patch also fixes the size of the char arrays used to represent
these integers as strings (originally 32 bytes, however no more than
12 bytes are needed for 32 bit integers and no more than 21 bytes are
needed for 64 bit integers).
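The 12-byte and 21-byte figures can be checked by measuring the longest decimal representations, including the sign and the terminating NUL. The helper names below are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed, including the terminating NUL, to print the most
 * negative 32-bit value in decimal ("-2147483648"). */
int int32_decimal_bytes(void)
{
    return snprintf(NULL, 0, "%" PRId32, (int32_t)INT32_MIN) + 1;
}

/* Same for the most negative 64-bit value ("-9223372036854775808"). */
int int64_decimal_bytes(void)
{
    return snprintf(NULL, 0, "%" PRId64, (int64_t)INT64_MIN) + 1;
}
```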
Alex Bligh [Fri, 5 Apr 2013 23:37:41 +0000 (23:37 +0000)]
Xen PV backend: Disable use of O_DIRECT by default as it results in crashes.
Due to what is almost certainly a kernel bug, writes with O_DIRECT may
continue to reference the page after the write has been marked as
completed, particularly in the case of TCP retransmit. In other
scenarios, this "merely" risks data corruption on the write, but with
Xen pages from domU are only transiently mapped into dom0's memory,
resulting in kernel panics when they are subsequently accessed.
This brings PV devices in line with emulated devices. Removing
O_DIRECT is safe as barrier operations are now correctly passed
through.
See:
http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
for more details.
Alex Bligh [Fri, 5 Apr 2013 23:37:19 +0000 (23:37 +0000)]
Xen PV backend: Move call to bdrv_new from blk_init to blk_connect
This commit delays the point at which bdrv_new (and hence blk_open
on the underlying device) is called from blk_init to blk_connect.
This ensures that in an inbound live migrate, the block device is
not opened until it has been closed at the other end. This is in
preparation for supporting devices with open/close consistency
without using O_DIRECT. This commit does NOT itself change O_DIRECT
semantics.
xen-mapcache: pass the right size argument to test_bits
Compute the correct size for test_bits().
qemu_get_ram_ptr() and qemu_safe_ram_ptr() will call xen_map_cache()
with size 0 if the requested address is in the RAM. Then
xen_map_cache() will pass the size 0 to test_bits() for checking if the
corresponding pfn was mapped in cache. But test_bits() will always
return 1 when size is 0 without any bit testing. Actually, for this
case, test_bits should check one bit. So this patch introduced a
__test_bit_size which is greater than 0 and a multiple of XC_PAGE_SIZE,
then test_bits can work correctly with __test_bit_size
>> XC_PAGE_SHIFT as its size.
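One way to compute such a size is sketched below. The function name is hypothetical, XC_PAGE_SHIFT is assumed to be 12, and including the in-page offset of the address in the rounding is an assumption about what "correct size" means here.

```c
#include <stdint.h>

#define XC_PAGE_SHIFT 12                       /* assumed: 4 KiB pages */
#define XC_PAGE_SIZE  (1UL << XC_PAGE_SHIFT)

/* Hypothetical helper: returns a byte count that is greater than 0 and
 * a multiple of XC_PAGE_SIZE. A size of 0 is treated as "one page";
 * otherwise the range starting at phys_addr is rounded up to whole pages. */
uint64_t test_bit_size_for(uint64_t phys_addr, uint64_t size)
{
    uint64_t bytes;

    if (size == 0) {
        return XC_PAGE_SIZE;
    }
    bytes = size + (phys_addr & (XC_PAGE_SIZE - 1));
    return (bytes + XC_PAGE_SIZE - 1) & ~(uint64_t)(XC_PAGE_SIZE - 1);
}
```

Shifting the result right by XC_PAGE_SHIFT then gives the number of bits to test, which is at least 1.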
Alex Bligh [Wed, 6 Mar 2013 14:59:27 +0000 (14:59 +0000)]
xen: xen_sync_dirty_bitmap: attempt to fix SEGV
When xc_hvm_track_dirty_vram fails, iterate through pages based on
vram_offset and npages, rather than start_addr and size. DPRINTF
before the loop too.
This backport is less clean than it might be because there is no
memory_region_set_dirty that copes with more than one page in 4.2,
and the case where the call to xc_hvm_track_dirty_vram is
successful also needs to ensure xen_modified_memory is
called (which would on unstable have been done within
memory_region_set_dirty).
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Alex Bligh <alex@alex.org.uk>
Note a call to xen_modified_memory has been added to qemu_ram_alloc_from_ptr
as the upstream version does:
cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);
and this function does not exist in 4.2.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Avi Kivity <avi@redhat.com> Signed-off-by: Alex Bligh <alex@alex.org.uk>
David Gibson [Thu, 21 Feb 2013 12:16:41 +0000 (12:16 +0000)]
cpu_physical_memory_write_rom() needs to do TB invalidates
cpu_physical_memory_write_rom(), despite the name, can also be used to
write images into RAM - and will often be used that way if the machine
uses load_image_targphys() into RAM addresses.
However, cpu_physical_memory_write_rom(), unlike cpu_physical_memory_rw()
doesn't invalidate any cached TBs which might be affected by the region
written.
This was breaking reset (under full emu) on the pseries machine - we loaded
our firmware image into RAM, and while executing, it rewrote the code at
the entry point (correctly causing a TB invalidate/refresh). When we
reset, the firmware image was reloaded, but the TB from the rewrite was
still active and caused us to get an illegal instruction trap.
This patch fixes the bug by duplicating the tb invalidate code from
cpu_physical_memory_rw() in cpu_physical_memory_write_rom().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Alex Bligh <alex@alex.org.uk>
Anthony PERARD [Thu, 21 Feb 2013 12:16:37 +0000 (12:16 +0000)]
xen: Introduce xen_modified_memory.
This function is to be used during live migration. Every write access to the
guest memory should call this function so the Xen tools know which pages are
dirty.
e1000: Discard packets that are too long if !SBP and !LPE
The e1000_receive function for the e1000 needs to discard packets longer than
1522 bytes if the SBP and LPE flags are disabled. The linux driver assumes
this behavior and allocates memory based on this assumption.
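The discard rule can be written as a single predicate. This is a sketch: the function name and the macro names are hypothetical, and the bit positions for SBP and LPE are assumptions modelled on the e1000 receive control register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RCTL_SBP 0x00000004u            /* assumed bit: store bad packets */
#define RCTL_LPE 0x00000020u            /* assumed bit: long packet enable */
#define MAX_ETHERNET_VLAN_SIZE 1522     /* limit named in the commit */

/* Hypothetical predicate: true if a received packet of `size` bytes
 * must be dropped given the receive control register value `rctl`. */
bool e1000_should_discard(size_t size, uint32_t rctl)
{
    return size > MAX_ETHERNET_VLAN_SIZE
        && !(rctl & RCTL_SBP)
        && !(rctl & RCTL_LPE);
}
```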
Fix invalidate if memory requested was not bucket aligned
When memory is mapped in qemu_map_cache with lock != 0, a reverse mapping
is created pointing to the virtual address of the location requested.
The cached mapped entry is saved in last_address_vaddr with the memory
location of the base virtual address (without bucket offset).
However, when this entry is invalidated, the virtual address saved in the
reverse mapping is used. As a result, the mapping is freed but
last_address_vaddr is not reset.
xen-all.c: fix multiply issue for int and uint types
If the two operands of a multiplication are of int and uint type respectively,
the int operand is first converted to uint, which is not what our code
intends. The fix is to add an (int64_t) cast
to the uint operand before the multiply.
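The conversion rule can be seen in a two-function sketch. The names are hypothetical, and the concrete value in the note below assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* int * unsigned int: the int operand is converted to unsigned first,
 * so a negative value turns into a large positive one. */
int64_t scale_mixed(int delta, unsigned int unit)
{
    return delta * unit;
}

/* Casting the unsigned operand to int64_t makes the whole multiplication
 * signed 64-bit, which keeps the sign of delta. */
int64_t scale_signed(int delta, unsigned int unit)
{
    return delta * (int64_t)unit;
}
```

For delta = -1 and unit = 2, the first function yields 4294967294 and the second yields -2.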
Jan Beulich [Wed, 13 Jun 2012 10:45:07 +0000 (10:45 +0000)]
qemu/xendisk: set maximum number of grants to be used
Legacy (non-pvops) gntdev drivers may require this to be done when the
number of grants intended to be used simultaneously exceeds a certain
driver specific default limit.
Anthony PERARD [Mon, 21 May 2012 16:12:43 +0000 (16:12 +0000)]
xen: Fix PV-on-HVM
In the context of PV-on-HVM under Xen, the emulated nics are supposed to be
unplugged before the guest drivers are initialized, when the guest writes to a
specific IO port.
Without this patch, the guest ends up with two nics with the same MAC, the
emulated nic and the PV nic.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Paolo Bonzini [Tue, 20 Mar 2012 09:49:17 +0000 (10:49 +0100)]
main loop: use msec-based timeout in glib_select_fill
The timeval-based timeout is not needed until we actually invoke select,
so compute it only then. Also group the two calls that modify the
timeout, glib_select_fill and os_host_main_loop_wait.
Roger Pau Monne [Fri, 18 May 2012 12:05:31 +0000 (12:05 +0000)]
audio: split IN_T into two separate constants
Split IN_T into BSIZE and ITYPE, to avoid expansion if the OS has
defined macros for the intX_t and uintX_t types. The IN_T constant is
then defined in mixeng_template.h so it can be used by the
functions/macros on this header file.
This change has been tested successfully under Debian Linux and NetBSD
6.0BETA.
timers: the rearm function should be able to handle delta = INT64_MAX
Fix win32_rearm_timer and mm_rearm_timer: they should be able to handle
INT64_MAX as a delta parameter without overflowing.
Also, the next deadline in ms should be calculated rounding down rather
than up (see unix_rearm_timer and dynticks_rearm_timer).
Finally ChangeTimerQueueTimer takes an unsigned long and timeSetEvent
takes an unsigned int as delta, so cast the ms delta to the appropriate
unsigned integer.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
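The three points of the timer commit (no overflow for INT64_MAX, rounding down, cast to the unsigned type the API takes) can be sketched as one conversion. The function name is hypothetical, and treating the delta as nanoseconds and clamping negative values to 0 are assumptions.

```c
#include <limits.h>
#include <stdint.h>

#define NS_PER_MS 1000000

/* Hypothetical conversion: nanosecond delta to a millisecond count that
 * fits an unsigned int. Integer division rounds down; out-of-range
 * results are clamped instead of being truncated by the cast. */
unsigned int delta_ns_to_ms(int64_t delta_ns)
{
    int64_t ms;

    if (delta_ns < 0) {
        return 0;               /* assumption: a past deadline means "now" */
    }
    ms = delta_ns / NS_PER_MS;
    if (ms > (int64_t)UINT_MAX) {
        return UINT_MAX;
    }
    return (unsigned int)ms;
}
```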
xen: do not initialize the interval timer and PCSPK emulator
PIT and PCSPK are emulated by the hypervisor so we don't need to emulate
them in Qemu: this patch prevents Qemu from waking up needlessly at
PIT_FREQ on Xen.
Jan Beulich [Mon, 14 May 2012 16:46:33 +0000 (16:46 +0000)]
xen_disk: properly update stats in ioreq_release()
While for the "normal" case (called from blk_send_response_all())
decrementing requests_finished is correct, doing so in the parse error
case is wrong; requests_inflight needs to be decremented instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
xen_disk: use bdrv_aio_flush instead of bdrv_flush
Use bdrv_aio_flush instead of bdrv_flush.
Make sure to call bdrv_aio_writev/readv after the presync bdrv_aio_flush is fully
completed and make sure to call the postsync bdrv_aio_flush after
bdrv_aio_writev/readv is fully completed.
John V. Baboval [Tue, 17 Apr 2012 15:42:41 +0000 (15:42 +0000)]
xen: Support guest reboots
Call xc_domain_shutdown with the reboot flag when the guest requests a reboot.
Signed-off-by: John V. Baboval <john.baboval@virtualcomputer.com> Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
xen: introduce an event channel for buffered io event notifications
Use the newly introduced HVM_PARAM_BUFIOREQ_EVTCHN to receive
notifications for buffered io events.
After the first notification is received leave the event channel masked
and set up a timer to process the rest of the batch.
Once we have completed processing the batch, unmask the event channel
and delete the timer.
xen_console: ignore console disconnect events from console/0
The first console has a different location compared to other PV devices
(console, rather than device/console/0) and doesn't obey the xenstore
state protocol. We already special case the first console in con_init
and con_initialise, we should also do it in con_disconnect.
Anthony PERARD [Wed, 18 Jan 2012 12:21:38 +0000 (12:21 +0000)]
xen mapcache: check if memory region has moved.
This patch changes the xen_map_cache behavior. Before trying to map a guest
addr, mapcache will look into the list of address ranges that have been moved
(physmap/set_memory). There is currently one memory space like this, the vram,
"moved" from where it's allocated to where the guest will look for it.
This helps to have a successful migration.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Kenneth Salerno [Sun, 19 Feb 2012 00:05:44 +0000 (16:05 -0800)]
qemu-1.0.1/VERSION
Hello,
The VERSION file in stable release qemu-1.0.1 has what I believe might be a typo: "1.0,1" rather than "1.0.1". This is causing a parsing issue for windres.exe in Win32 which chokes on:
#define CONFIG_FILEVERSION 1,0,1,0,1,0
#define CONFIG_PRODUCTVERSION 1,0,1,0,1,0
when it should be seeing this:
#define CONFIG_FILEVERSION 1,0,1,0
#define CONFIG_PRODUCTVERSION 1,0,1,0
Patch:
Signed-off-by: Justin M. Forbes <jforbes@redhat.com>
s390: fix cpu hotplug / cpu activity on interrupts
The add_del/running_cpu code and env->halted are tracking stopped cpus.
Sleeping cpus (idle and enabled for interrupts) are waiting inside the
kernel.
No interrupt besides the restart can move a cpu from stopped to
operational. This is already handled over there. So let's just remove
the bogus wakeup from the common interrupt delivery, otherwise any
interrupt will wake up a cpu, even if this cpu is stopped (thus leading
to strange hangs on sigp restart).
This fixes
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 1 > /sys/devices/system/cpu/cpu0/online
in the guest
Signed-off-by: Christian Borntraeger<borntraeger@de.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
(cherry picked from commit 93116ac0cf9734e7b28886aedf03848b37d6785e)
David Gibson [Wed, 11 Jan 2012 19:46:27 +0000 (19:46 +0000)]
pseries: Don't try to munmap() a malloc()ed TCE table
For the pseries machine, TCE (IOMMU) tables can either be directly
malloc()ed in qemu or, when running on a KVM which supports it, mmap()ed
from a KVM ioctl. The latter option is used when available, because it
allows the (frequent bottleneck) H_PUT_TCE hypercall to be KVM accelerated.
However, even when KVM is present, TCE acceleration is not always possible.
Only KVM HV supports this ioctl(), not KVM PR, or the kernel could run out
of contiguous memory to allocate the new table. In this case we need to
fall back on the malloc()ed table.
When a device is removed, and we need to remove the TCE table, we need to
either munmap() or free() the table as appropriate for how it was
allocated. The code is supposed to do that, but we buggily fail to
initialize the tcet->fd variable in the malloc() case, which is used as a
flag to determine which is the right choice.
This patch fixes the bug, and cleans up error messages relating to this
path while we're at it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
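The fd-as-flag pattern from the TCE table commit can be sketched as follows. The struct and function names are hypothetical, and using -1 as the "malloc()ed" marker is an assumption about the convention.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical table descriptor, not the QEMU structure. */
typedef struct {
    void *table;
    size_t size;
    int fd;        /* >= 0: table was mmap()ed from this fd; -1: malloc()ed */
} TceTableSketch;

/* Fallback allocation path: heap memory, with fd explicitly set so the
 * release path can tell the two cases apart. */
int tce_table_alloc_fallback(TceTableSketch *t, size_t size)
{
    t->table = calloc(1, size);
    if (!t->table) {
        return -1;
    }
    t->size = size;
    t->fd = -1;    /* the initialization the commit says was missing */
    return 0;
}

/* Release the table the way it was obtained. */
void tce_table_release(TceTableSketch *t)
{
    if (t->fd >= 0) {
        munmap(t->table, t->size);
        close(t->fd);
    } else {
        free(t->table);
    }
    t->table = NULL;
    t->fd = -1;
}
```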
David Gibson [Tue, 13 Dec 2011 04:24:34 +0000 (15:24 +1100)]
pseries: Populate "/chosen/linux,stdout-path" in the FDT
There is a device tree property "/chosen/linux,stdout-path" which indicates
which device should be used as stdout - ie. "the console".
Currently we don't specify anything, which means both firmware and Linux
choose something arbitrarily. Use the routine we added in the last patch
to pick a default vty and specify it as stdout.
Currently SLOF doesn't use the property, but we are hoping to update it
to do so.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
(cherry picked from commit 68f3a94c64bbaaf8c7f2daa70de1b5d87a432f86)
David Gibson [Mon, 12 Dec 2011 18:24:33 +0000 (18:24 +0000)]
pseries: Add a routine to find a stable "default" vty and use it
In vty_lookup() we have a special case for supporting early debug in
the kernel. This accepts reg == 0 as a special case to mean "any vty".
We implement this by searching the vtys on the bus and returning the
first we find. This means that the vty we choose depends on the order
the vtys are specified on the QEMU command line - because that determines
the order of the vtys on the bus.
We'd rather the command line order was irrelevant, so instead return
the vty with the lowest reg value. This is still a guess as to what the
user really means, but it is at least stable WRT command line ordering.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf] fix braces
(cherry picked from commit 98331f8ad6a3e2cfbb402d72e6be47eac7706251)
David Gibson [Mon, 12 Dec 2011 18:24:32 +0000 (18:24 +0000)]
pseries: Emit device tree nodes in reg order
Although in theory the device tree has no inherent ordering, in practice
the order of nodes in the device tree does affect the order that devices
are detected by software.
Currently the ordering is determined by the order the devices appear on
the QEMU command line. Although that does give the user control over the
ordering, it is fragile, especially when the user does not generate the
command line manually - eg. when using libvirt etc.
So order the device tree based on the reg value, ie. the address on
the VIO bus of the devices. This gives us a sane and stable ordering.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf] add braces
(cherry picked from commit 05c194384f836240ea4c2da5fa3be43a54bff021)
David Gibson [Mon, 28 Nov 2011 20:21:39 +0000 (20:21 +0000)]
pseries: Fix array overrun bug in PCI code
spapr_populate_pci_devices() contained a loop with PCI_NUM_REGIONS (7)
iterations. However this overruns the 'bars' global array, which only has
6 elements. In fact we only want to run this loop for things listed in the
bars array, so this patch corrects the loop bounds to reflect that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
(cherry picked from commit 135712de61dfa22368e98914d65b8b0860ec8505)
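A common way to keep a loop bound tied to the array it walks is an ARRAY_SIZE-style macro, sketched below. The table contents are placeholders and the function is hypothetical; the actual patch may bound the loop differently.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder table with six entries, standing in for the 'bars' array. */
static const int bars[] = { 0x10, 0x14, 0x18, 0x1c, 0x20, 0x24 };

/* Visits exactly the entries that exist in bars[] and returns how many
 * were visited, independent of any separately defined constant. */
size_t count_bars(void)
{
    size_t visited = 0;

    for (size_t i = 0; i < ARRAY_SIZE(bars); i++) {
        (void)bars[i];          /* a real loop would use the entry here */
        visited++;
    }
    return visited;
}
```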
Alexander Graf [Fri, 18 Nov 2011 15:41:59 +0000 (16:41 +0100)]
console: Fix segfault on screendump without VGA adapter
When trying to create a screen dump without having any VGA adapter
inside the guest, QEMU segfaults.
This is because it's trying to switch back to the "previous" screen
it was on before dumping the VGA screen. Unfortunately, in my case
there simply is no previous screen so it accesses a NULL pointer.
Fix it by checking if previous_active_console is actually available.
Kevin Wolf [Wed, 7 Dec 2011 11:42:10 +0000 (12:42 +0100)]
qemu-img rebase: Fix for undersized backing files
Backing files may be smaller than the corresponding COW file. When
reading directly from the backing file, qemu-img rebase must consider
this and assume zero sectors after the end of backing files.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Avi Kivity [Mon, 5 Dec 2011 17:20:12 +0000 (19:20 +0200)]
coroutine: switch per-thread free pool to a global pool
ucontext-based coroutines use a free pool to reduce allocations and
deallocations of coroutine objects. The pool is per-thread, presumably
to improve locality. However, as coroutines are usually allocated in
a vcpu thread and freed in the I/O thread, the pool accounting gets
screwed up and we end up allocating and freeing a coroutine for every I/O
request. This is expensive since large objects are allocated via the
kernel, and are not cached by the C runtime.
Fix by switching to a global pool. This is safe since we're protected
by the global mutex.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Fri, 25 Nov 2011 11:06:22 +0000 (12:06 +0100)]
qiov: prevent double free or use-after-free
qemu_iovec_destroy does not clear the QEMUIOVector fully, and the data
could thus be used after free or freed again. While I do not know any
example in the tree, I observed this using virtio-scsi (and SCSI
scatter/gather) when canceling DMA requests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
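The general pattern behind this fix is to reset every field when destroying, so that a second destroy or a stale use sees an empty vector. The struct below is a hypothetical stand-in, not QEMUIOVector itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an I/O vector with a heap-allocated array. */
typedef struct {
    void *iov;      /* heap-allocated element array */
    int niov;       /* elements in use */
    int nalloc;     /* elements allocated */
    size_t size;    /* total bytes described */
} IOVectorSketch;

/* Frees the array and clears all fields. free(NULL) is a no-op, so
 * calling this twice on the same object is harmless. */
void iovector_destroy(IOVectorSketch *q)
{
    free(q->iov);
    q->iov = NULL;
    q->niov = 0;
    q->nalloc = 0;
    q->size = 0;
}
```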
Aurelien Jarno [Sat, 7 Jan 2012 14:20:12 +0000 (15:20 +0100)]
target-sh4: ignore ocbp and ocbwb instructions
ocbp and ocbwb control the writeback of a cache line to memory. They
are supposed to do nothing in case of a cache miss. Given QEMU only
partially emulates caches, it is safe to ignore these instructions.
This fixes a kernel oops when trying to access an rtl8139 NIC with
recent versions.
Andriy Gapon [Thu, 22 Dec 2011 09:34:30 +0000 (11:34 +0200)]
usb-ohci: td.cbp incorrectly updated near page end
The current code that updates the cbp value after a transfer looks like this:
td.cbp += ret;
if ((td.cbp & 0xfff) + ret > 0xfff) {
<handle page overflow>
Because the 'ret' value is effectively added twice, the check may fire too early,
when the overflow hasn't happened yet.
Below is one of the possible changes that correct the behavior:
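A sketch of one such change: test for the page crossing before adding 'ret', so the length is only counted once. The function name and standalone signature are hypothetical, and the page-crossing branch is an assumption (continue at the same offset in the page that holds the buffer end, here `be`), since the handling is elided in the message above.

```c
#include <stdint.h>

/* Hypothetical helper: returns the new current-buffer pointer after
 * `ret` bytes were transferred. `be` is the buffer end address. */
uint32_t ohci_advance_cbp(uint32_t cbp, uint32_t be, uint32_t ret)
{
    if ((cbp & 0xfff) + ret > 0xfff) {
        /* Transfer crossed the end of the first page: keep the offset,
         * switch to the page of the buffer end (assumed handling). */
        return (be & ~0xfffu) + ((cbp + ret) & 0xfff);
    }
    return cbp + ret;
}
```

With cbp = 0x1800 and ret = 0x400 the result is 0x1c00 and no page switch happens, whereas the original ordering would test 0xc00 + 0x400 and fire early.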
Gerd Hoffmann [Thu, 5 Jan 2012 14:49:18 +0000 (15:49 +0100)]
usb-host: properly release port on unplug & exit
Factor out port release into a separate function. Call release function
in exit notifier too. Add an explicit call to the USBDEVFS_RELEASE_PORT
ioctl; just closing the hub file handle seems not to be enough. Make
sure we release the port before resetting the device, otherwise host
drivers will not re-attach.