Ed Swierk [Fri, 21 Aug 2009 02:00:31 +0000 (19:00 -0700)]
slirp: Read host DNS config on demand
Currently the qemu user-mode networking stack reads the host DNS
configuration (/etc/resolv.conf or the Windows equivalent) only once
when qemu starts. This causes name lookups in the guest to fail if the
host is moved to a different network from which the original DNS servers
are unreachable, a common occurrence when the host is a laptop.
This patch changes the slirp code to read the host DNS configuration on
demand, caching the results for at most 1 second to avoid unnecessary
overhead if name lookups occur in rapid succession. On non-Windows
hosts, /etc/resolv.conf is re-read only if the file has been replaced or
if its size or mtime has changed.
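The re-read policy can be sketched roughly as below. This is only an illustration, not the slirp code itself: dns_cache_fresh() and resolv_conf_changed() are hypothetical helper names, and comparing st_dev/st_ino is assumed here as the way to notice that the file has been replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

#define DNS_CACHE_TTL 1 /* seconds, as stated in the commit message */

/* True if a result cached at 'last' may still be reused at 'now'.
 * A 'last' of 0 is taken to mean "nothing cached yet". */
static bool dns_cache_fresh(time_t now, time_t last)
{
    return last != 0 && now >= last && now - last < DNS_CACHE_TTL;
}

/* True if the file described by 'cur' differs from the one seen as 'old':
 * replaced (different device or inode), or changed size or mtime. */
static bool resolv_conf_changed(const struct stat *old, const struct stat *cur)
{
    assert(old != NULL && cur != NULL);
    return old->st_dev != cur->st_dev || old->st_ino != cur->st_ino ||
           old->st_size != cur->st_size || old->st_mtime != cur->st_mtime;
}
```

A caller would stat() the file on each lookup that misses the one-second cache and re-parse it only when resolv_conf_changed() returns true.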
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Ed Swierk [Fri, 21 Aug 2009 02:00:25 +0000 (19:00 -0700)]
slirp: Remove our_addr code
Three problems with our_addr:
- It's determined only once when qemu starts, but the address can change
(just like the DNS configuration can).
- It's supposed to be the IP address of a host network interface, but
there's no guarantee that gethostbyname(gethostname()) actually does
that: the host might be a laptop that has only a loopback interface up,
or the hostname might be localhost.localdomain, etc.
- It's useless at best: get_dns_addr() calls it, but there's no reason to
send DNS requests to a different IP address if you're running a DNS
server on the host and resolv.conf points to 127.0.0.1.
These problems are easily solved by removing the code.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gerd Hoffmann [Fri, 14 Aug 2009 08:34:22 +0000 (10:34 +0200)]
switch balloon initialization to -device.
With that patch applied "-balloon virtio,args" becomes a shortcut for
"-device virtio-balloon-pci,args".
Side effects:
- balloon device gains support for id=<tag>.
- balloon device is off by default now.
- initialization order changes, which may result in different PCI slot
assignment depending on the VM configuration.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gerd Hoffmann [Thu, 20 Aug 2009 13:22:18 +0000 (15:22 +0200)]
ide: split away ide-internal.h
Move lots of IDE defines to the new file.
Also make a bunch of functions non-static
and add declarations for them. Needed by
the following patches of this series.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gerd Hoffmann [Thu, 20 Aug 2009 13:22:17 +0000 (15:22 +0200)]
ide: add IDEBus struct, cleanups
The current IDE code uses an array of two IDEState structs to maintain
the IDE bus. This patch adds an IDEBus to be used instead and does a
bunch of cleanups:
* move ide bus state from IDEState to IDEBus.
* drop a bunch of ugly pointer arithmetic to figure out the active
interface, explicitly save the interface number instead.
* add helper functions to save/restore idebus state.
It also fixes a save/restore bug: loadvm always stores the command in
the master's IDEState, even when it was saved from the slave.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gerd Hoffmann [Fri, 14 Aug 2009 08:36:06 +0000 (10:36 +0200)]
add qemu_error() + friends
This patch adds some functions for error reporting to address the
problem that error messages should be routed to different destinations
depending on the context of the caller, i.e. monitor command errors
should go to the monitor, command line errors to stderr.
qemu_error() is a printf-like function to report errors.
qemu_errors_to_file() and qemu_errors_to_mon() switch the destination
for the error message to the specified file or monitor. When setting a
new destination the old one will be kept. One can switch back using
qemu_errors_to_previous(), i.e. it works like a stack.
main() calls qemu_errors_to_file(stderr), so errors go to stderr by
default. Monitor callbacks are wrapped into qemu_errors_to_mon() +
qemu_errors_to_previous(), so any errors triggered by monitor commands
will go to the monitor.
Each thread has its own error message destination. qemu-kvm probably
should add a qemu_errors_to_file(stderr) call to the i/o-thread
initialization code.
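The stack behaviour can be sketched as below. This is a simplified illustration under stated assumptions, not the patch itself: destinations are plain FILE pointers (no monitor), the stack is a single global rather than per-thread, and errors_to_file(), errors_to_previous(), error_dest() and report_error() are hypothetical names modelled on the functions described above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define ERR_STACK_MAX 8

static FILE *err_stack[ERR_STACK_MAX]; /* destinations, innermost last */
static int err_depth;

/* Push a new destination; the previous one is kept underneath. */
static void errors_to_file(FILE *fp)
{
    assert(fp != NULL && err_depth < ERR_STACK_MAX);
    err_stack[err_depth++] = fp;
}

/* Pop back to whatever destination was active before. */
static void errors_to_previous(void)
{
    assert(err_depth > 0);
    err_depth--;
}

/* Current destination, or NULL if none has been set. */
static FILE *error_dest(void)
{
    return err_depth > 0 ? err_stack[err_depth - 1] : NULL;
}

/* printf-like reporting to the current destination. */
static void report_error(const char *fmt, ...)
{
    FILE *fp = error_dest();
    va_list ap;

    if (fp == NULL) {
        return; /* nowhere to report to */
    }
    va_start(ap, fmt);
    vfprintf(fp, fmt, ap);
    va_end(ap);
}
```

Wrapping a callback in a push/pop pair means errors raised inside it reach the pushed destination, and the earlier destination is restored afterwards.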
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Gerd Hoffmann [Fri, 14 Aug 2009 08:36:05 +0000 (10:36 +0200)]
qdev: add return value to init() callbacks.
Sorry folks, but it has to be. One more of these invasive qdev patches.
We have a serious design bug in the qdev interface: device init
callbacks can't signal failure because the init() callback has no
return value. This patch fixes it.
We already have one case in-tree where this is needed:
Try -device virtio-blk-pci (without drive= specified) and watch qemu
segfault. This patch fixes it.
With usb+scsi being converted to qdev we'll get more devices where the
init callback can fail for various reasons.
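As a rough sketch of the idea (not the qdev API: Device, DeviceInitFn, blk_init() and device_create() are hypothetical stand-ins), an init callback that returns int lets the caller refuse a half-configured device instead of crashing later:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device and its configuration. */
typedef struct Device {
    const char *drive;   /* backing drive name, NULL if not configured */
    int initialized;
} Device;

/* Init callback: 0 on success, -1 on failure. */
typedef int (*DeviceInitFn)(Device *dev);

static int blk_init(Device *dev)
{
    if (dev->drive == NULL) {
        return -1;       /* signal failure rather than use a NULL drive later */
    }
    dev->initialized = 1;
    return 0;
}

/* The caller can now see that init failed and back out. */
static int device_create(Device *dev, DeviceInitFn init)
{
    assert(dev != NULL && init != NULL);
    return init(dev) < 0 ? -1 : 0;
}
```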
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Reimar Döffinger [Thu, 20 Aug 2009 10:34:22 +0000 (12:34 +0200)]
fix stack buffer overflows in eepro100.c tx
Hello,
the real world issue is that the hardware allows sends up to 2600 bytes,
and for some reason FreeBSD sometimes sends frames larger than the
ethernet frame size (102+1460 is the maximum I have seen so far),
overflowing the on-stack tx buffer of the driver.
Independent of that, the code should avoid allowing the guest to
overwrite the stack.
This is a minimal patch to fix the issue (you could leave out the size
change of the buf array as well, networking still seems to work either
way). Obviously there are better ways to handle it, but a proper fix IMO
would involve first getting rid of the code duplication and given the
number of patches pending for that code I see no point in working on that now.
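The kind of check involved can be sketched as below. This is a generic illustration, not the eepro100 code: tx_append() is a hypothetical helper that refuses to copy a guest-supplied fragment that would not fit into the remaining space of a fixed-size buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Append 'len' bytes to a fixed-size frame buffer.
 * Returns 0 on success, -1 if the fragment would not fit.
 */
static int tx_append(uint8_t *buf, size_t bufsize, size_t *used,
                     const uint8_t *src, size_t len)
{
    assert(*used <= bufsize);
    if (len > bufsize - *used) {
        return -1;      /* guest-supplied size too large: do not copy */
    }
    memcpy(buf + *used, src, len);
    *used += len;
    return 0;
}
```

Comparing against bufsize - *used rather than computing *used + len avoids an integer overflow in the check itself.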
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Vijay Kumar [Fri, 21 Aug 2009 04:57:38 +0000 (10:27 +0530)]
Check block driver read error in pflash_cfi0x
If a flash file of size smaller than the flash size is specified in
the -pflash option, the block driver returns an error. But
pflash_cfi0x ignores the error. This results in flash content of all
zeroes, and the simulation aborts while executing code.
This patch adds the checks for errors from bdrv_read and escalates it
to the calling code.
Signed-off-by: Vijay Kumar B. <vijaykumar@bravegnu.org> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity [Mon, 17 Aug 2009 20:19:53 +0000 (23:19 +0300)]
kvm: Simplify cpu_synchronize_state()
cpu_synchronize_state() is a little unreadable since the 'modified'
argument isn't self-explanatory. Simplify it by making it always
synchronize the kernel state into qemu, and automatically flush the
registers back to the kernel if they've been synchronized on this
exit.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
-watchdog NAME is now equivalent to -device NAME, except it treats
option argument '?' specially, and supports only one watchdog.
A side effect is that a device created with -watchdog may now receive
a different PCI address.
i6300esb is now available on any machine with a PCI bus, not just PCs.
ib700 is still PC only, but that could be changed easily.
The only remaining use of struct WatchdogTimerModel and
watchdog_add_model() is supporting '-watchdog ?'. Should be replaced
by searching device_info_list for watchdog devices when we can
identify them there.
Also fixes ib700 not to use vm_clock before it is initialized: in
wdt_ib700_init(), called from register_watchdogs(), which runs before
init_timers(). The bug made ib700_write_enable_reg() crash in
qemu_del_timer().
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
virtio-blk: handle NULL returns from bdrv_aio_{read, write}
The bdrv_aio_{read,write} routines can return a NULL pointer when the
I/O submission fails. Currently we ignore this and will wait forever
for an I/O completion, leading to a hang of the guest.
I can easily reproduce this using the native Linux AIO patch, but it's
also possible using normal pthreads-based AIO.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now that we have a nicer interface to work against we can add Linux native
AIO support. It's an extremely thin layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to complete the iocbs directly
from there.
This started out based on Anthony's earlier AIO patch, but after an
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.
To enable native kernel aio use the aio=native sub-command on the
drive command line. I have also added an option to qemu-io to
test the aio support without needing a guest.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently the raw-posix.c code contains a lot of knowledge about the
asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c.
All this code does not really belong here and is getting a bit in the
way of implementing native AIO on Linux.
So instead move all the guts of the AIO implementation into
posix-aio-compat.c (which might need a better name, btw).
There's now a very small interface between the AIO providers and raw-posix.c:
- an init routine is called from raw_open_common to return an AIO context
for this drive. An AIO implementation may either re-use one context
for all drives, or use a different one for each as the Linux native
AIO support will do.
- a submit routine is called from the aio_readv/writev methods to submit
an AIO request
There are no indirect calls involved in this interface as we need to
decide which one to call manually. We will only call the Linux AIO native
init function if we were requested to by vl.c, and we will only call
the native submit function if we are asked to and the request is properly
aligned. That's also the reason why the alignment check actually does
the inverse move and now goes into raw-posix.c.
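The manual dispatch can be sketched as follows. This is an assumption-laden illustration, not the raw-posix.c code: request_is_aligned() and choose_aio_path() are hypothetical names, and the check is assumed to cover buffer address, file offset and length against one alignment value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum aio_path { AIO_THREADS, AIO_NATIVE };

/* True if buffer address, file offset and length are multiples of 'align'. */
static bool request_is_aligned(const void *buf, uint64_t offset,
                               size_t len, size_t align)
{
    assert(align != 0);
    return (uintptr_t)buf % align == 0 &&
           offset % align == 0 &&
           len % align == 0;
}

/* Use the native path only if it was requested and the request qualifies;
 * everything else falls back to the thread-based implementation. */
static enum aio_path choose_aio_path(bool native_requested, const void *buf,
                                     uint64_t offset, size_t len, size_t align)
{
    if (native_requested && request_is_aligned(buf, offset, len, align)) {
        return AIO_NATIVE;
    }
    return AIO_THREADS;
}
```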
The old posix-aio-compat.h header is removed now that most of its
content is private to posix-aio-compat.c, and instead a new
block/raw-posix-aio.h header is created containing only the tiny interface
between raw-posix.c and the AIO implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Amit Shah [Wed, 12 Aug 2009 19:20:28 +0000 (00:50 +0530)]
pci ids: remove redundant defines
Remove some redundant definitions for PCI classes:
PCI_CLASS_SERIAL_OTHER already exists as PCI_CLASS_COMMUNICATION_OTHER
and PCI_CLASS_PROCESSOR_CO is redefined.
PCI_CLASS_SERIAL_OTHER is not used anywhere.
Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Juan Quintela [Thu, 20 Aug 2009 17:42:25 +0000 (19:42 +0200)]
New VMstate save/load infrastructure
This patch introduces VMState infrastructure, to convert the save/load
functions of devices to a table approach. This new approach has the
following advantages:
- it is type-safe
- you can't have load/save functions out of sync
- will allow us to have new interesting commands, like dump <device>, that
show all its internal state.
- Just now, the only added type is arrays, but we can add structures.
- Uses old load_state() function for loading old state.
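The table approach can be illustrated with the sketch below. It is not the VMState API: FieldDesc, FIELD(), state_save() and state_load() are hypothetical, fields are copied as raw host-endian bytes, and no type checking is attempted. The point shown is only that one table drives both directions, so save and load cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct FieldDesc {
    const char *name;
    size_t offset;      /* offset of the field inside the device struct */
    size_t size;        /* size of the field in bytes */
} FieldDesc;

typedef struct Timer {
    uint32_t count;
    uint8_t regs[4];
    void *host_ptr;     /* runtime-only, deliberately not in the table */
} Timer;

#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const FieldDesc timer_fields[] = {
    FIELD(Timer, count),
    FIELD(Timer, regs),
};
#define N_TIMER_FIELDS (sizeof(timer_fields) / sizeof(timer_fields[0]))

/* Write every described field, in table order. Returns bytes written. */
static size_t state_save(const void *dev, const FieldDesc *f, size_t n,
                         uint8_t *out)
{
    size_t i, pos = 0;

    assert(dev != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        memcpy(out + pos, (const uint8_t *)dev + f[i].offset, f[i].size);
        pos += f[i].size;
    }
    return pos;
}

/* Read the same fields back, in the same order. Returns bytes consumed. */
static size_t state_load(void *dev, const FieldDesc *f, size_t n,
                         const uint8_t *in)
{
    size_t i, pos = 0;

    assert(dev != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        memcpy((uint8_t *)dev + f[i].offset, in + pos, f[i].size);
        pos += f[i].size;
    }
    return pos;
}
```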
Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Juan Quintela [Thu, 20 Aug 2009 17:42:20 +0000 (19:42 +0200)]
split do_loadvm() into do_loadvm() and load_vmstate()
do_loadvm() is now called from the monitor.
load_vmstate() is called by do_loadvm() and when -loadvm command line is used.
The command line path no longer has to play games with vmstop()/vmstart().
Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Kevin Wolf [Mon, 17 Aug 2009 13:50:10 +0000 (15:50 +0200)]
qcow2: Metadata preallocation
This introduces a qemu-img create option for qcow2 which allows the metadata to
be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
tables, but no data is written to them. Metadata is quite small, so this
happens in almost no time.
Especially with qcow2 on virtio this helps to gain a bit of performance during
the initial writes. However, as soon as you create a snapshot, we're back to the
normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
while we're still having trouble with qcow2 on virtio.
Note that the option is disabled by default and needs to be specified
explicitly using qemu-img create -f qcow2 -o preallocation=metadata.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jes Sorensen [Fri, 14 Aug 2009 09:36:15 +0000 (11:36 +0200)]
Add isa_reserve_irq().
Introduce isa_reserve_irq() which marks an irq reserved and returns
the appropriate qemu_irq entry from the i8259 table.
isa_reserve_irq() is a temporary interface to be used to allocate ISA
IRQs for devices which have not yet been converted to qdev, and for
special cases which are not suited for qdev conversions, such as the
'ferr'.
This patch goes on top of Gerd Hoffmann's which makes isa-bus.c own
the ISA irq table.
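The reservation bookkeeping can be sketched as below. This is a simplified model, not the isa-bus.c code: irq_reserve() is a hypothetical name and returns the IRQ number rather than a qemu_irq entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define ISA_NUM_IRQS 16

static bool irq_taken[ISA_NUM_IRQS];

/*
 * Mark an ISA IRQ as in use. Returns the IRQ number on success, or -1
 * (after complaining on stderr) if it has already been handed out.
 */
static int irq_reserve(int irq)
{
    assert(irq >= 0 && irq < ISA_NUM_IRQS);
    if (irq_taken[irq]) {
        fprintf(stderr, "isa irq %d already assigned\n", irq);
        return -1;
    }
    irq_taken[irq] = true;
    return irq;
}
```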
[ added isa-bus.o to some targets to fix build failures -- kraxel ]
Gerd Hoffmann [Fri, 14 Aug 2009 09:36:14 +0000 (11:36 +0200)]
isa bus irq changes and fixes.
Changes:
(1) make isa-bus maintain isa irqs, complain when allocating
already taken irqs.
(2) note that (1) works only for isa devices converted to qdev
already (floppy and ps2/kbd/mouse right now), so more work
is needed to make this really useful.
(3) split floppy init into isa and sysbus versions.
(4) add sysbus->isa bridge & fix -M isapc breakage.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Naphtali Sprei [Thu, 13 Aug 2009 12:01:20 +0000 (15:01 +0300)]
hw/eepro100.c: Use extended TBD only where applicable
Bug fix for segfault when run as i82551 HW:
Use Extended TBD only when HW supports it (i82558 and up).
Added assertions to guard from such buffer overflow
Introduce the MAX_TCB_BYTE_COUNT macro
Allocate buf big enough as HW needs (MAX_ETH_FRAME_SIZE -> MAX_TCB_BYTE_COUNT)
I don't feel 100% OK with the "s->device >= i82558B" condition
since it relies on the numeric (hex) value of those defines, which currently
is correct, but changes (which I don't foresee now) might break it.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
pci-hotplug: initialize dinfo to NULL in pci_device_hot_add
Suppress the following compiler warning emitted by at least gcc version 4.2.1 (SUSE Linux)
and gcc version 3.4.5 (mingw32 special):
hw/pci-hotplug.c: In function 'pci_device_hot_add':
hw/pci-hotplug.c:102: warning: 'dinfo' may be used uninitialized in this function
hw/pci-hotplug.c:102: note: 'dinfo' was declared here
Signed-off-by: Sebastian Herbszt <herbszt@gmx.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Stefan Weil [Fri, 14 Aug 2009 19:50:02 +0000 (21:50 +0200)]
block/vdi.c: Fix several bugs
* The code for option '-static' was wrong, so image creation
always created static images.
* Static images created with qemu-img did not set header entry
blocks_allocated.
* The size of the block map must be rounded to the next multiple
of SECTOR_SIZE, otherwise the block map is only read partially
for block map sizes which are not a multiple of SECTOR_SIZE.
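The rounding in the last item amounts to the arithmetic below. The helper names are hypothetical and the entry size is passed in as a parameter rather than taken from the VDI format definition.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Round 'n' up to the next multiple of SECTOR_SIZE (unchanged if it
 * already is one). */
static uint64_t round_up_to_sector(uint64_t n)
{
    return (n + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/* Bytes to read for a block map of 'blocks' entries of 'entry_size' bytes. */
static uint64_t block_map_bytes(uint32_t blocks, uint32_t entry_size)
{
    assert(entry_size != 0);
    return round_up_to_sector((uint64_t)blocks * entry_size);
}
```

With 4-byte entries, a map of 129 blocks occupies 516 bytes and has to be read as two full sectors (1024 bytes); reading only 516 / 512 = 1 sector would drop the last entry.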
Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Andre Przywara [Thu, 20 Aug 2009 21:34:17 +0000 (23:34 +0200)]
introduce kvm64 CPU
In addition to the TCG based qemu64 type let's introduce a kvm64 CPU type,
which is the least common denominator of all KVM-capable x86-CPUs
(based on Intel Pentium 4 Prescott). It can be used as a base type
for migration.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Andre Przywara [Thu, 20 Aug 2009 19:03:48 +0000 (21:03 +0200)]
allow overriding of CPUID level on command line
The CPUID level determines how many CPUID leafs are exposed to the guest.
Some features (like multi-core) cannot be propagated without the proper
level, but guests may be confused by bogus entries in some leafs.
So add level= and xlevel= to the list of -cpu options to allow the user to
override the default settings. While at it, merge unnecessary local
variables into one and allow hexadecimal arguments.
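Accepting both decimal and hexadecimal values can be done with strtoul() and base 0, as in the sketch below. parse_cpuid_level() is a hypothetical helper, not the function used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a value such as "10", "0xA" or "0x80000008".
 * Base 0 lets strtoul accept decimal, hexadecimal (0x...) and octal (0...).
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_cpuid_level(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoul(s, &end, 0);
    if (end == s || *end != '\0' || errno == ERANGE || v > UINT32_MAX) {
        return -1;
    }
    *out = (uint32_t)v;
    return 0;
}
```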
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Andre Przywara [Wed, 19 Aug 2009 13:42:42 +0000 (15:42 +0200)]
set CPUID bits to present cores and threads topology
Controlled by the enhanced -smp option set the CPUID bits to present the
guest the desired topology. This is vendor specific, but (with the exception
of the CMP_LEGACY bit) not conflicting, so we set all bits every time.
There is no real multithreading support for AMD CPUs, so report cores
instead.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Andre Przywara [Wed, 19 Aug 2009 13:42:40 +0000 (15:42 +0200)]
extend -smp parsing to include cores= and threads= options
For injecting multi-core and multi-threading CPU topology into guests
extend the -smp syntax to accommodate cores and threads specification.
Syntax: -smp smp_value[,cores=nr_cores][,threads=nr_threads]\
[,sockets=nr_sockets][,maxcpus=max_cpus]
smp_value is the legacy value specifying the total number of vCPUs for
the guest. If you specify one of cores, threads or sockets this value
can be omitted. Missing values will be computed to fulfill:
smp_value = nr_cores * nr_threads * nr_sockets
where it will favour sockets over cores over threads (to mimic the
current behavior, which will only inject multiple sockets.)
So -smp 4,threads=2 will inject two sockets with 2 threads each,
-smp cores=4 is an abbreviation for -smp 4,cores=4,threads=1,sockets=1.
If max_cpus (the number of hotpluggable CPUs) is omitted, it will
be set to smp_value.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Blue Swirl [Tue, 25 Aug 2009 18:29:36 +0000 (18:29 +0000)]
Sparc32: improve interrupt handling
Level 15 interrupts are broadcast to all CPUs, each CPU can clear the
interrupt using the local Clear Pending register.
Update intbit_to_level table.
Don't try to raise level 0 interrupts.
Calculate pending interrupts based on the separate inputs from master
register. Setting or resetting the pending level isn't correct because of
overlap of levels.
Level 14 is always used for CPU timer interrupts, remove the property.
Nathan Froyd [Tue, 25 Aug 2009 15:20:00 +0000 (08:20 -0700)]
target-mips: fix conditional moves off fp condition codes
Conditional moves off fp condition codes were using the result of
get_fp_bit to isolate and test the relevant condition code. However,
get_fp_bit returns the bit number of the condition code, not a
bitmask. (Compare the use of get_fp_bit in gen_compute_branch1, for
instance.)
Fixed by shifting a bitmask into place using the result of get_fp_bit in
the relevant functions (gen_mov{ci,cf_s,cf_d,cf_ps}).
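The difference between a bit number and a bitmask is shown in plain C below. This is not the TCG code from the patch: fp_cc_bit() is a hypothetical stand-in for get_fp_bit(), and the bit layout (condition code 0 at bit 23, codes 1 to 7 at bits 25 to 31) is an assumption about the FP control register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position of FP condition code 'cc' (assumed layout, see above). */
static int fp_cc_bit(int cc)
{
    assert(cc >= 0 && cc <= 7);
    return cc ? 24 + cc : 23;
}

/* Buggy: uses the bit number itself as if it were a mask. */
static bool cc_set_buggy(uint32_t fcr31, int cc)
{
    return (fcr31 & (uint32_t)fp_cc_bit(cc)) != 0;
}

/* Fixed: shift a single bit into place, then test it. */
static bool cc_set(uint32_t fcr31, int cc)
{
    return (fcr31 & (UINT32_C(1) << fp_cc_bit(cc))) != 0;
}
```

With only bit 23 set in the register, the fixed test reports condition code 0 as set, while the buggy one ANDs with the value 23 and reports it as clear.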
Anthony Liguori [Fri, 14 Aug 2009 16:20:47 +0000 (11:20 -0500)]
Make the e1000 the default network adapter for the pc target.
The ne2k is an ancient card that performs pretty terribly under QEMU. In many
modern OSes, there are no longer drivers available for the ne2k.
Switch the default network adapter to e1000. This card is more widely
supported and performs rather well under QEMU. There may be very old OSes
that had a ne2k driver but not an e1000 driver but I think this is likely the
exception.
I think the average user is better served with an e1000 vs ne2k.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Nathan Froyd [Tue, 11 Aug 2009 19:47:59 +0000 (12:47 -0700)]
eliminate errors about unused results in block/vpc.c
These errors come up when compiling with gcc-4.3.3 and some older headers:
/scratch/froydnj/qemu.git/block/vpc.c: In function 'vpc_create':
/scratch/froydnj/qemu.git/block/vpc.c:514: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:516: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:517: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:566: error: value computed is not used
Use memcpy to copy the strings instead of strncpy.
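For fixed-width header fields that are not NUL-terminated, memcpy with the exact length does the same job, as sketched below. The struct layout and the strings are assumed for illustration and are not copied from vpc.c.

```c
#include <assert.h>
#include <string.h>

/* Illustrative header with fixed-width, non-NUL-terminated text fields. */
struct footer {
    char creator[8];
    char creator_app[4];
};

static void footer_set_ids(struct footer *f)
{
    assert(f != NULL);
    /* Each literal is exactly as long as its field; no terminator is stored. */
    memcpy(f->creator, "conectix", 8);
    memcpy(f->creator_app, "qemu", 4);
}
```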
Signed-off-by: Nathan Froyd <froydnj@codesourcery.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>