Add commands to exercise asynchronous reads/writes and to flush all
outstanding aio commands. Commands to exercise aio cancellations will
follow in a separate patch.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Stefan Weil [Sun, 21 Jun 2009 16:35:03 +0000 (18:35 +0200)]
Fix dump output in qemu-io.
The dump output was not nicely formatted for bytes
larger than 0x7f, because signed values expanded to
sizeof(int) bytes. So for example 0xab did not print
as "ab", but as "ffffffab".
I also cleaned the function prototype, which avoids
new type casts and allows to remove an existing
type cast.
Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
qemu/net: flag to control the number of vectors a nic has
Add an option to specify the number of MSI-X vectors for PCI NIC cards. This
can also be used to disable MSI-X, for compatibility with old qemu. This
option currently only affects virtio cards.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
qemu/virtio: virtio support for many interrupt vectors
Extend virtio to support many interrupt vectors, and rearrange code in
preparation for multi-vector support (mostly move reset out to bindings,
because we will have to reset the vectors in transport-specific code).
Actual bindings in pci, and use in net, to follow.
Load and save are not connected to bindings yet, so they are left
stubbed out for now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
qemu/apic: minimal MSI/MSI-X implementation for PC
Implement MSI support in APIC. Note that MSI and MMIO APIC registers
are at the same memory location, but actually not on the global bus: MSI
is on PCI bus, APIC is connected directly to the CPU. We map them on the
global bus at the same address which happens to work because MSI
registers are reserved in APIC MMIO and vice versa.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add functions implementing MSI-X support. First user will be virtio-pci.
Note that platform must set a flag to declare MSI supported: this
is a safety measure to avoid breaking platforms which should support
MSI-X but currently lack this in the interrupt controller emulation.
For PC this will be set by APIC.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add "cmask" table of constant register masks: if a bit is not writeable
and is set in cmask table, this bit is checked on load. An attempt to
load an image that would change such a register causes load to fail.
Use this table to make sure that load does not modify registers that
guest can not change (directly or indirectly).
Note: we can't just assume that read-only registers never change,
because the guest could change a register indirectly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
qemu/pci: make default_write_config use mask table
Change much of hw/pci to use symbolic constants and a table-driven
design: add a mask table with writable bits set and readonly bits unset.
Detect change by comparing original and new registers.
This makes it easy to support capabilities where read-only/writeable
bit layout differs between devices, depending on capabilities present.
As a result, writing a single byte in BAR registers now works as
it should. Writing to upper limit registers in the bridge
also works as it should. Code is also shorter.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
microblaze: Support the latest mmu-kernel stat64 ABI.
Microblaze recently changed their ABI. The new is not backwards compatible
and there doesn't seem to be a way to distinguish old/new binaries.
Let's support the latest ABI for now and hope someone figures out a way to
hande both ABI's later.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Nathan Froyd [Fri, 5 Jun 2009 01:45:03 +0000 (18:45 -0700)]
target-ppc: permit linux-user to read PVR
Access to the PVR SPR is normally forbidden from userspace apps. The
Linux kernel, however, fixes up reads in the appropriate trap handler.
To permit applications that read PVR to run on QEMU, then, we need to
implement the same handling of PVR reads.
Mark McLoughlin [Thu, 18 Jun 2009 17:21:34 +0000 (18:21 +0100)]
net: add '-net tap,sndbuf=nbytes'
2.6.30 adds a new TUNSETSNDBUF ioctl() which allows a send buffer limit
for the tap device to be specified. When this limit is reached, a tap
write() will return EAGAIN and poll() will indicate the fd isn't
writable.
This allows people to tune their setups so as to avoid e.g. UDP packet
loss when the sending application in the guest out-runs the NIC in the
host.
There is no obviously sensible default setting - a suitable value
depends mostly on the capabilities of the physical NIC through which the
packets are being sent.
Also, note that when using a bridge with netfilter enabled, we currently
never get EAGAIN because netfilter causes the packet to be immediately
orphaned. Set /proc/sys/net/bridge/bridge nf-call-iptables to zero to
disable this behaviour.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Mark McLoughlin [Thu, 18 Jun 2009 17:21:29 +0000 (18:21 +0100)]
net: add qemu_purge_queued_packets()
If net client sends packets asynchronously, it needs to purge its queued
packets in cleanup() so as to prevent sent callbacks being invoked with
a freed client.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gleb Natapov [Thu, 18 Jun 2009 13:29:18 +0000 (16:29 +0300)]
Don't register cpu reset handler for cpu with APIC.
APIC reset handler already resets cpu, no need to reset it twice.
Also register cpu_reset handler directly to make it impossible to
add additional code to main_cpu_reset() by mistake.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Support addr=... in option argument of -drive if=virtio
Make drive_init() accept addr=, put the value into struct DriveInfo.
Use it in all the places that create virtio-blk-pci devices:
pc_init1(), bamboo_init(), mpc8544ds_init().
Don't support addr= in third argument of monitor command pci_add and
second argument of drive_add, because that clashes with their first
arguments. Admittedly unelegant.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Make net_client_init() accept addr=, put the value into struct
NICinfo. Use it in pci_nic_init(), and remove arguments bus and
devfn.
Don't support addr= in third argument of monitor command pci_add,
because that clashes with its first argument. Admittedly unelegant.
Machines "malta" and "r2d" have a default NIC with a well-known PCI
address. Deal with that the same way as the NIC model: make
pci_nic_init() take an optional default to be used when the user
doesn't specify one.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Mark McLoughlin [Thu, 18 Jun 2009 10:31:07 +0000 (11:31 +0100)]
virtio-net: enable mergeable receive buffers
When virtio-net was merged in from qemu-kvm.git, the VNET_HDR related
features were dropped from the code.
However, VIRTIO_NET_F_MRG_RXBUF appears to have accidentally been
dropped too. Re-instate that now.
Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gleb Natapov [Wed, 17 Jun 2009 20:26:59 +0000 (23:26 +0300)]
Handle init/sipi in a main cpu exec loop. (v2)
This should fix compilation problem in case of CONFIG_USER_ONLY.
Currently INIT/SIPI is handled in the context of CPU that sends IPI.
This patch changes this to handle them like all other events in a main
cpu exec loop. When KVM will gain thread per vcpu capability it will
be much more clear to handle those event by cpu thread itself and not
modify one cpu's state from the context of the other.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Mark McLoughlin [Wed, 17 Jun 2009 10:38:28 +0000 (11:38 +0100)]
virtio: add support for indirect ring entries
Support a new feature flag for indirect ring entries. These are ring
entries which point to a table of buffer descriptors.
The idea here is to increase the ring capacity by allowing a larger
effective ring size whereby the ring size dictates the number of
requests that may be outstanding, rather than the size of those
requests.
This should be most effective in the case of block I/O where we can
potentially benefit by concurrently dispatching a large number of
large requests. Even in the simple case of single segment block
requests, this results in a threefold increase in ring capacity.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Glauber Costa [Wed, 17 Jun 2009 13:05:30 +0000 (09:05 -0400)]
Make nic option rom loading less painful.
The code how it is today, is totally painful to read and keep.
To begin with, the code is duplicated with the option rom loading
code that linux_boot and vga are already using.
This patch introduces a "bootable" state in NICInfo structure,
that we can use to keep track of whether or not a given nic should
be bootable, avoiding the introduction of yet another global state.
With that in hands, we move the code in vl.c to hw/pc.c, and use
the already existing infra structure to load those option roms.
Error checking code suggested by Mark McLoughlin
Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Riku Voipio [Thu, 18 Jun 2009 19:51:31 +0000 (22:51 +0300)]
linux-user: strace now handles guest stringscorrectly [v2]
On Tue, Jun 16, 2009 at 08:19:23PM -0500, Anthony Liguori wrote:
> malc wrote:
>>
>> On my system the above line causes gcc to emit:
>>
>> In file included from /home/malc/x/rcs/git/qemu/linux-user/strace.c:12:
>> /usr/include/linux/futex.h:48: error: field `__user' has incomplete type
>> /usr/include/linux/futex.h:48: error: syntax error before '*' token
>> /usr/include/linux/futex.h:63: error: field `list' has incomplete type
>> /usr/include/linux/futex.h:83: error: field `__user' has incomplete type
>> /usr/include/linux/futex.h:83: error: syntax error before '*' token
>> make[1]: *** [strace.o] Error 1
> We had the same problem with usb-linux.c. It's broken system headers,
> the __user stuff is supposed to get removed as part of the headers
> installation.
> It builds fine on my system (Fedora 10).
Howabout something like this:
commit eb8387cb0eda32a18880664eb5f0ca5c8bf05b45
Author: Riku Voipio <riku.voipio@iki.fi>
Date: Thu Jun 18 22:44:31 2009 +0300
Subject: linux-user: include futex defines directly
Since some common distributions have broken linux/futex.h, stop
including it. Instead add the defines directly.
Isaku Yamahata [Wed, 20 May 2009 02:31:43 +0000 (11:31 +0900)]
exec.c: remove unnecessary #if NB_MMU_MODES
remove unnecessary #if NB_MMU_MODES by using loop.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Eduardo Habkost [Mon, 25 May 2009 21:20:05 +0000 (18:20 -0300)]
Fix vga_screen_dump_blank() PPM generation
vga_screen_dump_blank() was not generating a valid PPM file: the width of the
image made no sense (why it was multiplied by sizeof(uint32_t)?), and there was
only one sample per pixel, instead of three.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Mark McLoughlin [Wed, 27 May 2009 09:06:11 +0000 (10:06 +0100)]
Prevent CD-ROM media eject while device is locked
Section 10.8.25 ("START/STOP UNIT Command") of SFF-8020i states that
if the device is locked we should refuse to eject if the device is
locked.
ASC_MEDIA_REMOVAL_PREVENTED is the appropriate return in this case.
In order to stop itself from ejecting the media it is running from,
Fedora's installer (anaconda) requires the CDROMEJECT ioctl() to fail
if the drive has been previously locked.
See also https://bugzilla.redhat.com/501412
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Glauber Costa [Thu, 28 May 2009 19:22:58 +0000 (15:22 -0400)]
set migration max downtime
provide a monitor command to allow one to set the maximum
downtime he is willing to suffer during migration, in seconds.
"ms", "us", "ns" and "s" are accepted as modifiers.
This parameter will be used by ram_save_live() code to determine
a safe moment to enter stage 3
Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Glauber Costa [Thu, 28 May 2009 19:22:57 +0000 (15:22 -0400)]
add non-arbitrary migration stop condition
Currently, we're entering migration's stage 3 when
a treshold of 10 pages remain to be transferred in the system.
This has hurt some users. However, any proposed threshold is
arbitrary by nature, and would only shift the annoyance.
The proposal of this patch is to define a max_downtime variable,
which represents the maximum downtime a migration user is willing
to suffer. Then, based on the bandwidth of last iteration, we
calculate how much data we can transfer in such a window of time.
Whenever we reach that value (or lower), we know is safe to enter
stage3.
This has largely improved the situation for me.
On localhost migrations, where one would expect things to go as
quickly as me running away from the duty of writting software for
windows, a kernel compile was enough to get the migration stuck.
It takes 20 ~ 30 iterations now.
Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jan Kiszka [Sat, 30 May 2009 08:01:45 +0000 (10:01 +0200)]
kvm: Fix IRQ injection into full queue
User space may only inject interrupts during kvm_arch_pre_run if
ready_for_interrupt_injection is set in kvm_run. But that field is
updated on exit from KVM_RUN, so we must ensure that we enter the
kernel after potentially queuing an interrupt, otherwise we risk to
loose one - like it happens with the current code against latest
kernel modules (since kvm-86) that started to queue only a single
interrupt.
Fix the problem by reordering kvm_cpu_exec.
Credits go to Gleb Natapov for analyzing the issue in details.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Nitin A Kamble [Thu, 4 Jun 2009 21:29:50 +0000 (14:29 -0700)]
QEMU KVM: i386: Fix the cpu reset state
As per the IA32 processor manual, the accessed bit is set to 1 in the
processor state after reset. qemu pc cpu_reset code was missing this
accessed bit setting.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Andre Przywara [Fri, 5 Jun 2009 23:03:29 +0000 (01:03 +0200)]
allow CPUID vendor override
KVM-enabled QEMU will always report the vendor ID of the physical CPU it is
running on. Allow to override this if explicitly requested on the
command line. It will not suffice to name a CPU type (like -cpu phenom),
but you have to explicitly set the vendor: -cpu phenom,vendor=AuthenticAMD
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Glauber Costa [Tue, 9 Jun 2009 16:15:18 +0000 (12:15 -0400)]
provide cpu_index to env mapping
There are some people interested in, given a cpu number,
pick its CPUState. KVM is an example, although not yet in tree.
This patch provides a way of doing that.
Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gerd Hoffmann [Thu, 11 Jun 2009 09:32:14 +0000 (11:32 +0200)]
vnc: improve numpad support for qemu console.
Reorganize qemu console emulation code. Make it look at the numlock
state and interpret numpad keys as arrow+friends (numlock off) or
digits (numlock on). While being at it also wind up the other numpad
keys.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity [Sun, 14 Jun 2009 08:38:51 +0000 (11:38 +0300)]
Remove io_index argument from cpu_register_io_memory()
The parameter is always zero except when registering the three internal
io regions (ROM, unassigned, notdirty). Remove the parameter to reduce
the API's power, thus facilitating future change.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Kevin Wolf [Tue, 16 Jun 2009 09:31:28 +0000 (11:31 +0200)]
l2_allocate: Write complete sectors
When modifying the L1 table, l2_allocate() needs to write complete sectors
instead of single entries. The L1 table is already in memory, reading it from
disk in the block layer to align the request is wasted performance.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Kevin Wolf [Thu, 28 May 2009 14:07:07 +0000 (16:07 +0200)]
qcow2: Rename global functions
The qcow2 source is now split into several more manageable files. During the
conversion quite some functions that were static before needed to be changed to
be global to make the source compile again.
We were lucky enough not to get name conflicts with these additional global
names, but they are not nice. This patch adds a qcow2_ prefix to all of the
global functions in qcow2.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>