Stefan Hajnoczi [Mon, 6 Dec 2010 16:08:03 +0000 (16:08 +0000)]
qed: Consistency check support
This patch adds support for the qemu-img check command. It also
introduces a dirty bit in the qed header to mark modified images as
needing a check. This bit is cleared when the image file is closed
cleanly.
If an image file is opened and it has the dirty bit set, a consistency
check will run and try to fix corrupted table offsets. These
corruptions may occur if there is power loss while an allocating write
is performed. Once the image is fixed it opens as normal again.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Mon, 6 Dec 2010 16:08:02 +0000 (16:08 +0000)]
qed: Read/write support
This patch implements the read/write state machine. Operations are
fully asynchronous and multiple operations may be active at any time.
Allocating writes lock tables to ensure metadata updates do not
interfere with each other. If two allocating writes need to update the
same L2 table they will run sequentially. If two allocating writes need
to update different L2 tables they will run in parallel.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Mon, 6 Dec 2010 16:08:01 +0000 (16:08 +0000)]
qed: Table, L2 cache, and cluster functions
This patch adds code to look up data cluster offsets in the image via
the L1/L2 tables. The L2 tables are writethrough cached in memory for
performance (each read/write requires a lookup so it is essential to
cache the tables).
With cluster lookup code in place it is possible to implement
bdrv_is_allocated() to query the number of contiguous
allocated/unallocated clusters.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes. Support for other
hole punching mechanisms can be added when they become available.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Support discards via the WRITE SAME command with the unmap bit set, and
tell the initiator about the support for it via the block limit and the
new thin provisioning EVPD pages. Also fix the comment which incorrectly
describedthe block limits EVPD page.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard. If no discard
granularity support is set discard support is disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 16 Dec 2010 15:54:06 +0000 (15:54 +0000)]
ide: Register vm change state handler once only
We register the vm change state handler in a PCI BAR map() function.
This function can be called multiple times throughout the lifetime of a
PCI IDE device. This results in duplicate vm change state handlers
being register, none of which are ever unregistered.
Instead, register the vm change state handler in the device's init
function once and for all.
piix tested, cmd646 and via not tested.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jes Sorensen [Thu, 16 Dec 2010 12:52:15 +0000 (13:52 +0100)]
qemu-img.c: Re-factor img_create()
This patch re-factors img_create() moving the code doing the actual
work into block.c where it can be shared with QEMU. This is needed to
be able to create images from QEMU to be used for live snapshots.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alexander Graf [Tue, 14 Dec 2010 00:34:40 +0000 (01:34 +0100)]
ahci: add ahci emulation
This patch adds an emulation layer for an ICH-9 AHCI controller. For now
this controller does not do IDE legacy emulation. It is a pure AHCI controller.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Roland Elek [Tue, 14 Dec 2010 00:34:37 +0000 (01:34 +0100)]
ide: add ncq identify data for ahci sata drives
I modified ide_identify() to include the zero-based queue length
value in word 75, and set bit 8 in word 76 to signal NCQ support
in the identify data for AHCI SATA drives.
Signed-off-by: Roland Elek <elek.roland@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alexander Graf [Tue, 14 Dec 2010 23:23:01 +0000 (00:23 +0100)]
ide: move transfer_start after variable modification
We hook into transfer_start and immediately call the end function
for ahci. This means that everything needs to be in place for the
end function when we start the transfer, so let's move the function
down to where all state is in place.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alexander Graf [Tue, 14 Dec 2010 23:23:00 +0000 (00:23 +0100)]
ide: Split out BMDMA code from ATA core
The ATA core is currently heavily intertwined with BMDMA code. Let's loosen
that a bit, so we can happily replace the DMA backend with different
implementations.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alexander Graf [Tue, 14 Dec 2010 00:34:34 +0000 (01:34 +0100)]
ide: fix whitespace gap in ide_exec_cmd
Now that we have the function split out, we have to reindent it.
In order to increase the readability of the actual functional change,
this is split out.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jes Sorensen [Thu, 9 Dec 2010 13:17:25 +0000 (14:17 +0100)]
qemu-img.c: Clean up handling of image size in img_create()
This cleans up the handling of image size in img_create() by parsing
the value early, and then only setting it once if a value has been
added as the last argument to the command line.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jes Sorensen [Thu, 9 Dec 2010 13:17:24 +0000 (14:17 +0100)]
Introduce strtosz_suffix()
This introduces strtosz_suffix() which allows the caller to specify a
default suffix in case the non default of MB is wanted.
strtosz() is kept as a wrapper for strtosz_suffix() which keeps it's
current default of MB.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 2 Dec 2010 16:54:13 +0000 (16:54 +0000)]
block: Fix the use of protocols in backing files
Backing filenames may contain a protocol. The code currently doesn't
consider this case and produces filenames that embed "<protocol>:".
Don't combine filenames if the backing filename contains a protocol.
Based on an earlier patch by Anthony Liguori <aliguori@us.ibm.com>.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 9 Dec 2010 11:53:00 +0000 (11:53 +0000)]
block: Introduce path_has_protocol() function
The bdrv_find_protocol() function returns NULL if an unknown protocol
name is given. It returns the "file" protocol when the filename
contains no protocol at all. This makes it difficult to distinguish
between paths which contain a protocol and those which do not.
Factor out a helper function that tests whether or not a filename has a
protocol. The next patch makes use of this function.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Ryan Harper [Wed, 8 Dec 2010 16:05:00 +0000 (10:05 -0600)]
blockdev: check dinfo ptr before using
If a user decides to punish a guest by revoking its block device via
drive_del, and subsequently also attempts to remove the pci device
backing it, and the device is using blockdev_auto_del() then we get a
segfault when we attempt to access dinfo->auto_del.[1]
The fix is to check if drive_get_by_blockdev() actually returns a valid
dinfo pointer or not.
Stefan Hajnoczi [Tue, 7 Dec 2010 09:35:56 +0000 (09:35 +0000)]
qemu-img: Fail creation if backing format is invalid
The qemu-img create command should check the backing format to ensure
only image files with valid backing formats are created. By checking in
qemu-img.c we can print a useful error message.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
parse_option_parameters() may need to create a new option parameter list
from a template list. Use append_option_parameters() instead of
duplicating the code.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
RBD is an block driver for the distributed file system Ceph
(http://ceph.newdream.net/). This driver uses librados (which is part
of the Ceph server) for direct access to the Ceph object store and is
running entirely in userspace (Yehuda also wrote a driver for the
linux kernel, that can be used to access rbd volumes as a block
device).
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Christian Brunner <chb@muc.de> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jes Sorensen [Mon, 6 Dec 2010 14:25:39 +0000 (15:25 +0100)]
Fix formatting and missing braces in qemu-img.c
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jes Sorensen [Mon, 6 Dec 2010 14:25:38 +0000 (15:25 +0100)]
Consolidate printing of block driver options
This consolidates the printing of block driver options in
print_block_option_help() which is called from both img_create() and
img_convert().
This allows for the "?" detection to be done just after the parsing of
options and the filename, instead of half way down the codepath of
these functions.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Tue, 30 Nov 2010 15:14:14 +0000 (15:14 +0000)]
block: Make bdrv_create_file() ':' handling consistent
Filenames may start with "<protocol>:" to explicitly use a protocol like
nbd. Filenames with unknown protocols are rejected in most of QEMU
except for bdrv_create_file(). Even if a file with an invalid filename
can be created, QEMU cannot use it since all the other relevant
functions reject such paths. Make bdrv_create_file() consistent.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Watchdog: disable watchdog timer when hard-rebooting a guest.
This commit causes the watchdog timer to be reset when a guest is
hard-rebooted.
The failure case previously was as follows:
(a) guest boots, watchdog is enabled
(b) guest does a reset eg:
echo 'b' > /proc/sysrq-trigger
(note that an ordinary /sbin/reboot wouldn't hit this case
since as the watchdog daemon is shut down, the daemon would
properly disable the watchdog device)
(c) the reboot takes longer than the remaining time on the
watchdog
(d) the watchdog therefore fires during the reboot
(e) probably the VM would just reboot again at this point which
is pretty benign, but it could depend on the action that the
user had selected for the watchdog
Now we use the qdev reset function to register a reset handler
which disables the timer. Note the handler is called _either_
just after init _or_ when the guest reboots.
In the i6300esb case there is a small refactoring of the code so
that the device's internal state is now fully restored to defaults
on a reboot.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Gleb Natapov [Wed, 8 Dec 2010 11:35:06 +0000 (13:35 +0200)]
Change fw_cfg_add_file() to get full file path as a parameter.
Change fw_cfg_add_file() to get full file path as a parameter instead
of building one internally. Two reasons for that. First caller may need
to know how file is named. Second this moves policy of file naming out
from fw_cfg. Platform may want to use more then two levels of
directories for instance.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Gleb Natapov [Wed, 8 Dec 2010 11:35:05 +0000 (13:35 +0200)]
Add bootindex parameter to net/block/fd device
If bootindex is specified on command line a string that describes device
in firmware readable way is added into sorted list. Later this list will
be passed into firmware to control boot order.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Gleb Natapov [Wed, 8 Dec 2010 11:34:54 +0000 (13:34 +0200)]
Introduce fw_name field to DeviceInfo structure.
Add "fw_name" to DeviceInfo to use in device path building. In
contrast to "name" "fw_name" should refer to functionality device
provides instead of particular device model like "name" does.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Bernhard Kohl [Wed, 8 Dec 2010 14:59:55 +0000 (15:59 +0100)]
wdt_i6300esb: register a reset function
The device shall set its default hardware state after each reset.
This includes that the timer is stopped which is especially important
if the guest does a reboot independantly of a watchdog bite. I moved
the initialization of the state variables completely from the init
to the reset function which is called right after init during the
first boot and afterwards during each reboot.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Alexander Graf [Wed, 8 Dec 2010 11:05:49 +0000 (12:05 +0100)]
isa_mmio: Always use little endian
This patch converts the ISA MMIO bridge code to always use little endian mmio.
All bswap code that existed was only there to convert from native cpu
endianness to little endian ISA devices.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Alexander Graf [Wed, 8 Dec 2010 11:05:40 +0000 (12:05 +0100)]
pci-host: Delegate bswap to mmio layer
The only reason we have bswap versions of the pci host code is that
most pci host devices are little endian. The ppc e500 is the only
odd one here, being big endian.
So let's directly pass the endianness down to the mmio layer and not
worry about it on the pci host layer.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Alexander Graf [Wed, 8 Dec 2010 11:05:38 +0000 (12:05 +0100)]
Make simple io mem handler endian aware
As an alternative to the 3 individual handlers, there is also a simplified
io mem hook function. To be consistent, let's add an endianness parameter
there too.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Alexander Graf [Wed, 8 Dec 2010 11:05:37 +0000 (12:05 +0100)]
Add endianness as io mem parameter
As stated before, devices can be little, big or native endian. The
target endianness is not of their concern, so we need to push things
down a level.
This patch adds a parameter to cpu_register_io_memory that allows a
device to choose its endianness. For now, all devices simply choose
native endian, because that's the same behavior as before.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Alexander Graf [Wed, 8 Dec 2010 11:05:36 +0000 (12:05 +0100)]
exec: introduce endianness swapped mmio
The way we're currently modeling mmio is too simplified. We assume that
every device has the same endianness as the target CPU. In reality,
most devices are little endian (all PCI and ISA ones I'm aware of). Some
are big endian (special system devices) and a very little fraction is
target native endian (fw_cfg).
So instead of assuming every device to be native endianness, let's move
to a model where the device tells us which endianness it's in.
That way we can compile the devices only once and get rid of all the ugly
swap will be done by the underlying layer.
For the same of readability, this patch only introduces the helper framework
but doesn't allow the registering code to set its endianness yet.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Fix the injection logic upon aer message to follow 6.2.4.1.2 more
closely: specifically only send an msi interrupt when the logical or of
the enabled bits changed, not when a bit which was previously clear
becomes set.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
msi depends on pci but pci should not depend on msi.
The only dependency we have is a recent addition
of pci_msi_ functions, IMO they add little enough to
open-code in the small number of users.
Follow-up patches add more cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Isaku Yamahata [Fri, 26 Nov 2010 12:01:41 +0000 (21:01 +0900)]
pci: make command SERR bit writable
pcie aer needs SERR bit to be writable, and the PCI spec requires
this as well. For compatibility, introduce compat global property
command_serr_enable and make this bit readonly for a pre 0.14 pc
machine.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
DMA into memory while VM is stopped makes it
hard to debug migration (consequitive saves
result in different files).
Fixing this completely is a large effort,
this patch does this for virtio-net.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com>