Hao Liu [Thu, 11 Dec 2014 02:46:15 +0000 (10:46 +0800)]
virsh: Emit error for VSH_OT_DATA without VSH_OFLAG_REQ
Commit 6b9964 enforces checking invalid use of VSH_OT_STRING with
VSH_OFLAG_REQ. This commit tries to do the same thing to stop using
VSH_OT_DATA without VSH_OFLAG_REQ and also fix existing misuse.
Ján Tomko [Wed, 7 Jan 2015 15:49:00 +0000 (16:49 +0100)]
safezero: fall back to writing zeroes even when resizing
Remove the resize flag and use the same code path for all callers.
This flag was added by commit 18f0316 to allow virStorageFileResize
to use 'safezero' while preserving the behavior.
Explicitly return -2 when a fallback to a different method should
be done, to make the code path more obvious.
Fail immediately when ftruncate fails in the mmap method,
as we did before commit 18f0316.
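The "-2 means fall back" convention described above can be modelled in a few lines. This is only a Python sketch of the control flow, not the C code in virfile.c; the function name and the idea of passing the methods as callables are assumptions made for illustration.

```python
from typing import Callable, Sequence

FALLBACK = -2  # sentinel: "this method cannot be used, try the next one"


def zero_with_fallback(methods: Sequence[Callable[[], int]]) -> int:
    """Try each zeroing method in order (hypothetical helper).

    Each method returns 0 on success, -1 on a hard failure, or
    FALLBACK (-2) when a different method should be tried instead.
    """
    for method in methods:
        ret = method()
        if ret != FALLBACK:
            return ret  # success or hard failure: stop immediately
    return -1  # every method asked for a fallback and none is left
```

Keeping -1 and -2 distinct is what lets a hard failure (such as ftruncate failing in the mmap method) stop the chain at once instead of silently trying the next method.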
John Ferlan [Fri, 9 Jan 2015 12:36:01 +0000 (07:36 -0500)]
virsh.pod: Update description
The 'pool-build' command description for --overwrite and --no-overwrite
indicated usage for only 'filesystem' pools; however, the 'disk' pool
also supports the flags as of commit id 'afa1029a'. So add a description
for that usage.
Eric Blake [Thu, 8 Jan 2015 00:05:36 +0000 (17:05 -0700)]
maint: in src/Makefile.am, $(top_srcdir)/src is verbose
I noticed this while working on a previous commit. Why should
we be calling out '../src/' when it is sufficient to refer to just
'./'? Blind copy-and-paste runs rampant in this file :)
* src/Makefile.am (INCLUDES, *_CFLAGS): Shorten to $(srcdir).
Pavel Hrdina [Thu, 8 Jan 2015 06:54:22 +0000 (07:54 +0100)]
src/Makefile: Fix parallel build after xen_xl_disk parser introduction
The parallel build doesn't work because the dependencies are not
set correctly. When running 'make -j' I see this error:
make[2]: Entering directory '/home/zippy/work/libvirt/libvirt.git/src'
GEN util/virkeymaps.h
GEN locking/lock_protocol.h
make[2]: *** No rule to make target 'xenconfig/xen_xl_disk.h', needed by 'all'. Stop.
make[2]: *** Waiting for unfinished jobs....
GEN lxc/lxc_controller_dispatch.h
The fix is to correctly set dependencies by letting make know that .c
and .h are to be generated from .l. Moreover, the section is moved
closer to the other section which uses it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Geoff Hickey [Wed, 7 Jan 2015 23:45:42 +0000 (18:45 -0500)]
vmx: Fix a VMX parsing problem
VMware ESX does not always set the "serialX.fileType" tag in VMX files. The
default value for this tag is "device", and when adding a new serial port
of this type VMware will omit the fileType tag. This caused libvirt to
fail to parse the VMX file. Fixed by making this tag optional and using
"device" as a default value. Also updated vmx2xmltest to test for this
case.
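The optional-tag handling can be illustrated with a tiny lookup. This is a Python sketch under the assumption that the VMX file has already been parsed into a flat key/value dictionary; the helper name is hypothetical and does not correspond to a function in libvirt's VMX code.

```python
def serial_file_type(vmx: dict, port: int) -> str:
    """Return the fileType of serial port `port` (hypothetical helper).

    The tag is optional; "device" is used when it is absent, as the
    commit message describes.
    """
    return vmx.get(f"serial{port}.fileType", "device")
```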
Eric Blake [Wed, 7 Jan 2015 23:32:32 +0000 (16:32 -0700)]
build: fix xenconfig VPATH builds
Ever since commit 2c78051 split out a helper library for the sake of
changing CFLAGS, a VPATH build with xenconfig enabled has failed:
CC xenconfig/libvirt_xenxldiskparser_la-xen_xl_disk.lo
../../src/xenconfig/xen_xl_disk.l:37:21: fatal error: xen_xl.h: No such file or directory
# include "xen_xl.h"
^
compilation terminated.
Makefile:9462: recipe for target 'xenconfig/libvirt_xenxldiskparser_la-xen_xl_disk.lo' failed
The solution is to tell the build to look for xen_xl.h relative
to $(srcdir), since we keep that file under version control.
[Not fixed here - the raw use of -Wno-unused-parameter in CFLAGS
is NOT portable; ideally, we should be doing a configure test
and only supplying that argument when we know the compiler supports
-Wunused-parameter; but that's a patch for another day]
[Not fixed here - there are still issues with parallel builds hitting
a race between generating the files and trying to compile/distribute
them]
* src/Makefile.am (libvirt_xenxldiskparser_la_CFLAGS): Add another
include directory.
qemu: Fix system pages handling in <memoryBacking/>
In one of my previous commits (311b4a67) I tried to allow
passing regular system pages to <hugepages>. However, there was a
little bug that wasn't caught. If a domain has a guest NUMA topology
defined, the qemuBuildNumaArgStr() function takes care of generating
the corresponding command line. The hugepages backing for guest NUMA
nodes is handled there too. And here comes the bug: the hugepages
setting from XML is stored in KiB internally; however, the system
page size was queried and stored in bytes. So the check whether
these two are equal was failing even when it shouldn't have.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
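The unit mismatch is easy to see with concrete numbers. The following Python sketch uses hypothetical function names and is not the code in qemuBuildNumaArgStr(); it only shows why comparing a KiB value with a byte value fails and how normalising the units fixes it.

```python
def pages_match_buggy(hugepage_kib: int, system_page_bytes: int) -> bool:
    # Compares KiB against bytes, so 4 (KiB) never equals 4096 (bytes).
    return hugepage_kib == system_page_bytes


def pages_match_fixed(hugepage_kib: int, system_page_bytes: int) -> bool:
    # Convert the system page size to KiB before comparing.
    return hugepage_kib == system_page_bytes // 1024
```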
Stefan Berger [Wed, 7 Jan 2015 16:41:49 +0000 (11:41 -0500)]
nwfilter: Add support for icmpv6 filtering
Make use of the ebtables functionality to be able to filter certain
parameters of icmpv6 packets. Extend the XML parser for icmpv6 types,
type ranges, codes, and code ranges. Extend the nwfilter documentation,
schema, and test cases.
Being able to filter icmpv6 types and codes helps extend the DHCP
snooper for IPv6 and filtering at least some parameters of IPv6's NDP
(Neighbor Discovery Protocol) packets. However, the filtering will not
be as good as the filtering of ARP packets since we cannot
check on IP addresses in the payload of the NDP packets.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Peter Krempa [Wed, 7 Jan 2015 10:35:08 +0000 (11:35 +0100)]
qemu: Don't unref domain after exit from nested async job
In commit 540c339a2535ec30d79e5ef84d8f50a17bc60723 the whole domain
reference counting was refactored in the qemu driver. Domain jobs now
don't need to reference the domain object as they now expect the
reference from the calling function.
However, the patch forgot to remove the unref call in case we exit the
monitor when we were acquiring a nested job. This caused the daemon to
crash on a subsequent access to the domain object once we've done an
operation requiring a nested job for a monitor access.
An easy reproducer case:
1) Start a vm with qcow disks
2) virsh snapshot-create-as DOMNAME
3) virsh dumpxml DOMNAME
4) daemon crashes in a semi-random spot while accessing a now-removed VM
object.
Fortunately, the commit wasn't released yet, so there are no security
implications.
Reported-by: Shanzi Yu <shyu@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
When migrating a VM, we generate XML via qemuDomainDefFormatLive and
pass it to the target libvirtd. Libvirt uses the current network
state in def->data.network.actual to generate the XML. This makes
migration fail when a guest interface of type network uses a macvtap
network as its source and the VM is migrated to another host that has
different macvtap network settings (different interface name, bridge name...).
Add a flag check in virDomainNetDefFormat: if the VIR_DOMAIN_XML_MIGRATABLE
flag is set when calling virDomainNetDefFormat, the current VM interface
state is not included.
Add missing VNC setup via Parallels SDK.
Parallels Cloud Server starts one VNC server per domain,
so we can process only one VNC server definition.
Network-based listening is currently unimplemented.
Signed-off-by: Alexander Burluka <aburluka@parallels.com>
When setting new bandwidth limits via
virDomainSetInterfaceParameters, the old ones are cleared first.
However, if setting the new ones fails, the old ones are already gone
and the interface is left in an inconsistent state. Therefore, right
before failing we ought to try to restore the old bandwidth.
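The clear-then-set-then-restore sequence can be sketched as follows. This is a Python model with hypothetical names (the interface is a plain dictionary and `apply` stands in for whatever programs the limits); it is not the implementation behind virDomainSetInterfaceParameters.

```python
def replace_bandwidth(iface: dict, new_limits: dict, apply) -> None:
    """Replace the bandwidth limits, restoring the old ones on failure."""
    old_limits = iface.pop("bandwidth", None)  # old limits are cleared first
    try:
        apply(iface, new_limits)
    except Exception:
        if old_limits is not None:
            try:
                apply(iface, old_limits)  # best-effort restore
            except Exception:
                pass  # restore failed too; the original error is reported
        raise  # the caller still sees the original failure
```

The restore is deliberately best effort: the caller is told about the original failure either way, but the interface is no longer left without limits when the old ones can be reapplied.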
Lack of a lease (whether mac is given or not) is a normal expected
scenario, since we are already filling in rv with nleases (which is
okay as 0 if there is no lease). There is no need to raise an error.
This fixes:
> virsh # net-dhcp-leases --mac 00:50:56:c0:00:01 default
> error: Failed to get leases info for default
> error: internal error: no lease with matching MAC address: 00:50:56:c0:00:01
Signed-off-by: Nehal J Wani <nehaljw.kkd1@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We get a warning when a guest is in paused status (caused by a
kernel panic) and libvirtd is restarted. The warning message looks
like this:
Qemu reported unknown VM status: 'guest-panicked'
This seems to be because we set a wrong status name in
qemu_monitor.c; from qemu's qapi-schema.json file we know this
status should be named 'guest-panicked'.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Eric Blake [Mon, 5 Jan 2015 23:38:50 +0000 (16:38 -0700)]
maint: update to latest gnulib
Another update is required to pick up today's gnulib fix for mingw
builds (now that gnulib turns on mingw's replacement printf that
understands %lld, it must also tell the compiler to respect the
improved definition of PRIdMAX and friends).
Cédric Bosdonnat [Wed, 12 Nov 2014 08:30:09 +0000 (09:30 +0100)]
Openvz --ipadd can be provided multiple times
The vzctl man page says that --ipadd can be provided multiple times to
add several IP addresses. Loop over the configured IP addresses and add
one --ipadd for each. This also handles the multiple IPs handled
by openvz_conf.c.
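The loop amounts to emitting one --ipadd pair per configured address. The Python sketch below only illustrates the argument list; the helper name is hypothetical, and the exact set of other arguments the OpenVZ driver passes to vzctl (here "set" and "--save") is an assumption.

```python
from typing import List


def vzctl_ipadd_args(ctid: str, ips: List[str]) -> List[str]:
    """Build a vzctl argument list with one --ipadd per IP (sketch)."""
    args = ["vzctl", "set", ctid]
    for ip in ips:
        args += ["--ipadd", ip]  # the option may be repeated
    args.append("--save")  # assumed here; not stated in the commit message
    return args
```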
Domain conf: allow more than one IP address for net devices
Add the possibility to have more than one IP address configured for a
domain network interface. IP addresses can also have a prefix to define
the corresponding netmask.
Add a default implementation of virNetDevSetIPv4Address using netlink
and libnl. This avoids requiring /usr/sbin/ip or /usr/sbin/ifconfig
external binaries.
Cédric Bosdonnat [Thu, 18 Dec 2014 14:42:06 +0000 (15:42 +0100)]
Fix error when starting a container after an error
The typical case for the problem is starting a domain needing a network
that isn't started. Even after starting the network, we get an unknown error
when starting the container.
This is due to dynamic security label not being removed.
Chunyan Liu [Tue, 23 Dec 2014 06:36:05 +0000 (14:36 +0800)]
Add tests to xmconfigtest
Add tests for the conversion of HVM default features (pae, acpi, apic)
from xm config to libvirt xml. If pae|acpi|apic is not specified
in the xm config, the libvirt xml after conversion should
include by default:
<features>
<pae/>
<apic/>
<acpi/>
</features>
Chunyan Liu [Tue, 23 Dec 2014 06:36:04 +0000 (14:36 +0800)]
xenconfig: set HVM pae/apic/acpi/ default to 1
According to the xm.config manual, the HVM pae|apic|acpi feature default
is 1 (enabled). But in the conversion from xm config to libvirt xml,
if the xm config doesn't contain pae|apic|acpi, the default value is set
to 0, which causes some problems in HVM guests.
Update the parser code to set the HVM pae|apic|acpi default value to 1
to match the xm config convention.
libxl: Add support for parsing/formatting Xen XL config
Now that xenconfig supports parsing and formatting Xen's
XL config format, integrate it into the libxl driver's
connectDomainXML{From,To}Native functions.
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
This parser allows users to convert the new xl disk format and
spice graphics config to libvirt xml format and vice versa. Regarding
the spice graphics config, the code is pretty much straightforward.
For the disk {formatting, parsing}, this parser takes care of the new
xl format, which includes positional parameters and key/value parameters.
In xl format disk config a <diskspec> consists of parameters separated by
commas. If the parameters do not contain an '=' they are automatically
assigned to certain options following the order below
target, format, vdev, access
The above are the only mandatory parameters in the <diskspec> but there
are many more disk config options. These options can be specified as
key=value pairs. This takes care of the rest of the options such as
In xm format, the above diskspec would be written as
phy:/dev/vg/guest-volume,hda,w
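The positional versus key=value rule can be sketched in a few lines. This Python model is a deliberate simplification and is not the flex scanner used by the commit: it ignores the special cases of the real xl syntax, and the function name and the sample specs in the usage note are hypothetical.

```python
POSITIONAL = ("target", "format", "vdev", "access")  # order from the text above


def parse_diskspec(spec: str) -> dict:
    """Parse a simplified xl <diskspec> into a dict (sketch)."""
    disk = {}
    remaining = iter(POSITIONAL)
    for param in spec.split(","):
        param = param.strip()
        if not param:
            continue
        if "=" in param:
            key, value = param.split("=", 1)  # explicit key=value pair
            disk[key.strip()] = value.strip()
        else:
            try:
                name = next(remaining)  # next unused positional slot
            except StopIteration:
                raise ValueError(f"too many positional parameters: {param!r}")
            disk[name] = param
    return disk
```

With this model, a spec such as "/dev/vg/guest-volume,raw,hda,rw" and its fully keyed form "target=/dev/vg/guest-volume,format=raw,vdev=hda,access=rw" produce the same dictionary.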
The disk parser is based on the same parser used successfully by
the Xen project for several years now. Ian Jackson authored the
scanner, which is used by this commit with minimal changes. Only
the PREFIX option is changed, to produce function and file names
more consistent with libvirt's convention.
Signed-off-by: Kiarie Kahurani <davidkiarie4@gmail.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Dmitry Guryanov [Tue, 23 Dec 2014 13:23:34 +0000 (16:23 +0300)]
parallels: report that cdrom image is raw
VIR_STORAGE_FILE_AUTO should be used only in xml provided to
libvirt by the user, if I understood correctly. The driver should
set the storage source format to a specific disk format in
*DomainGetXMLDesc.
Stefan Berger [Mon, 22 Dec 2014 21:57:21 +0000 (16:57 -0500)]
test: fix nwfilter tests following changes in virfirewall.c
Some of the nwfilter tests are now failing since --concurrent shows
up in the ebtables command. To avoid this, implement a function
preventing the probing for lock support in the eb/iptables tools
and use it in the tests.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
There is one problem that causes various errors in the daemon. When
a domain is waiting for a job, it is unlocked while waiting on the
condition. However, if that domain is for example transient and being
removed in another API (e.g. cancelling incoming migration), it gets
unref'd. If the first call, which was waiting, fails to get the job, it
unref's the domain object, and because it was the last reference, it
causes clearing of the whole domain object. However, when finishing the
call, the domain must be unlocked, but there is no way for the API to
know whether it was cleaned or not (unless there is some ugly temporary
variable, but let's scratch that).
The root cause is that our APIs don't ref the objects they are using and
all use the implicit reference that the object has when it is in the
domain list. That reference can be removed when the API is waiting for
a job. And because each domain doesn't do its ref'ing, it results in
the ugly checking of the return value of virObjectUnref() that we have
everywhere.
This patch changes qemuDomObjFromDomain() to ref the domain (using
virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
should be the only function in which the return value of
virObjectUnref() is checked. This makes all reference counting
deterministic and makes the code a bit clearer.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
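The effect of taking an explicit reference per API call can be shown with a small reference-counting model. This is a Python sketch with hypothetical names; it only mirrors the roles of qemuDomObjFromDomain() and qemuDomObjEndAPI() described above and is not their implementation.

```python
class DomainObj:
    """Toy refcounted domain object (model, not virDomainObj)."""

    def __init__(self, name: str):
        self.name = name
        self.refs = 1  # the implicit reference held by the domain list
        self.disposed = False

    def ref(self) -> "DomainObj":
        self.refs += 1
        return self

    def unref(self) -> bool:
        """Drop one reference; return True while the object is still alive."""
        self.refs -= 1
        if self.refs == 0:
            self.disposed = True
        return self.refs > 0


def obj_from_domain(domains: dict, name: str) -> DomainObj:
    return domains[name].ref()  # each API call takes its own reference


def end_api(dom: DomainObj) -> None:
    dom.unref()  # the single place where the API's reference is dropped


def remove_from_list(domains: dict, name: str) -> None:
    domains.pop(name).unref()  # drops only the list's implicit reference
```

Because the waiting API holds its own reference, removing the domain from the list in another call can no longer dispose of the object underneath it; disposal happens only when end_api() drops the last reference.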
Claudio Bley [Thu, 18 Dec 2014 20:50:20 +0000 (21:50 +0100)]
docs: split typedef and struct definition for apibuild.py
The members of struct virSecurityLabel and struct virSecurityModel
were not shown in the libvirt API docs because the corresponding
<field> elements were missing from the libvirt-api.xml.
The reason is that apibuild.py does not cope well with typedefs
using inline struct definitions. It fails to associate the comment
with the typedef and because of this refuses to write out the
field of the struct.
Although QMP returns info about vCPU threads in TCG mode, the
data it returns is mostly lies. Only the first vCPU has a valid
thread_id returned. The thread_id given for the other vCPUs is
in fact the main emulator thread. All vCPUs actually run under
the same thread in TCG mode.
Our vCPU pinning code is not at all able to cope with this
so if you try to set CPU affinity per-vCPU you end up with
weird errors
error: Failed to start domain instance-00000007
error: cannot set CPU affinity on process 24365: Invalid argument
Since few people will care about the performance of TCG with
strict CPU pinning, let's just disable that for now, so we get
a clear error message
error: Failed to start domain instance-00000007
error: Requested operation is not valid: cpu affinity is not supported
The code assumes that def->vcpus == nvcpupids, so when we set up
fake CPU pids for old QEMU with nvcpupids == 1, we cause the
later code to read off the end of the array. This has fun results
like sched_setaffinity(0, ...) which changes libvirtd's own CPU
affinity, or even better sched_setaffinity($RANDOM, ...) which
changes the affinity of a random OS process.
1) -numa to create a guest NUMA node
2) -object memory-backend-{ram,file} to tell qemu which memory
region on which host's NUMA node it should allocate the guest
memory from.
Combining these two together we can instruct qemu to create a
guest NUMA node that is tied to a host NUMA node. And it works
just fine. However, depending on the machine type used, there might
be some issues during migration when OVMF is enabled (see QEMU
BZ). While this truly is a QEMU bug, we can help avoid it. The
problem lies within the memory backend objects somewhere. Having
said that, the fix on our side consists of putting those objects on
the command line if and only if needed. For instance, while
previously we would construct this (in all ways correct) command
line:
Eric Blake [Wed, 17 Dec 2014 23:10:45 +0000 (16:10 -0700)]
qemu: fix memory leak in blockinfo
Coverity flagged commit 0282ca45 as introducing a memory leak;
in all my refactoring to make capacity probing conditional on
whether the image is non-raw, I missed deleting the unconditional
probe.
* src/qemu/qemu_driver.c (qemuStorageLimitsRefresh): Drop
redundant assignment.
John Ferlan [Tue, 16 Dec 2014 14:15:03 +0000 (09:15 -0500)]
logical: Add "--type snapshot" to lvcreate command
A recent lvm change has resulted in a change for the "default" type of
logical volume created when the "--virtualsize" or "-V" is supplied on
the command line (e.g. when the allocation and capacity values of a
to-be-created volume differ). It seems that at the very least the following
change adjusts the default type:
When using the virsh vol-create-as or vol-create xmlfile commands, the
result is that libvirt will now create a "thin logical volume" and a
"thin logical volume pool" rather than just a "thin snapshot logical
volume". For example the following sequence:
# lvcreate --name test -L 2M -V 5M lvm_test
Rounding up size to full physical extent 4.00 MiB
Rounding up size to full physical extent 8.00 MiB
Logical volume "test" created.
# lvs lvm_test
LV    VG       Attr       LSize Pool  Origin Data% Meta% Move Log Cpy%Sync Convert
lvol1 lvm_test twi-a-tz-- 4.00m              0.00  0.98
test  lvm_test Vwi-a-tz-- 8.00m lvol1        0.00
compared to the former code which had the following:
LV   VG       Attr       LSize Pool Origin         Data% Move Log Cpy%Sync Convert
test LVM_Test swi-a-s--- 4.00m      [test_vorigin] 0.00
Since libvirt doesn't know how to parse the thin logical volume
and pool, it will fail to find the newly created volume and pool
even though it exists in the volume group.
Libvirt cannot find them because the command used to find/parse returns a
thin volume 'test' with no associated device, for example the output is:
While it's possible to generate code to handle the new thin lv and pool, this
patch will add a "--type snapshot" onto the lvcreate command libvirt uses
in order to "for now" be able to continue to utilize the thin snapshots
There's nothing we need to do for shared iSCSI devices in
qemuAddSharedHostdev and qemuRemoveSharedHostdev. The iSCSI layer
takes care of that for us.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Eric Blake [Sat, 6 Dec 2014 07:14:43 +0000 (00:14 -0700)]
getstats: crawl backing chain for qemu
Wire up backing chain recursion. For the first time, it is now
possible to get libvirt to expose that qemu tracks read statistics
on backing files, as well as report maximum extent written on a
backing file during a block-commit operation.
For a running domain, where one of the two images has a backing
file, I see the traditional output:
I may later do a patch that trims the output to avoid 0 stats,
particularly for backing files (which are more likely to have
0 stats, at least for write statistics when no block-commit
is performed). Also, I still plan to expose physical size
information (qemu doesn't expose it yet, so it requires a stat,
and for block devices, a further open/seek operation). But
this patch is good enough without worrying about that yet.
* src/qemu/qemu_driver.c (QEMU_DOMAIN_STATS_BACKING): New internal
enum bit.
(qemuConnectGetAllDomainStats): Recognize new user flag, and pass
details to...
(qemuDomainGetStatsBlock): ...here, where we can do longer recursion.
(qemuDomainGetStatsOneBlock): Output new field.
Eric Blake [Tue, 25 Nov 2014 15:46:49 +0000 (08:46 -0700)]
getstats: add new flag for block backing chain
This patch introduces access to allocation information about
a backing chain of a live domain. While querying storage
volumes for read-only disks could provide some of the details,
we do NOT want to read() a file while qemu is writing it.
Also, there is one case where we have to rely on qemu: when
doing a block commit into a backing file, where that file is
stored in qcow2 format on a host block device, we want to know
the current highest write offset into that image, in order to
know if the disk must be resized larger. qemu-img does not
(currently) show this information, and none of the earlier
block APIs were extensible enough to expose it. But
virDomainListGetStats is perfect for the job!
We don't need a new group of statistics, as the existing block
group is sufficient. On the other hand, as existing libvirt
releases already report 1:1 mapping of block.count to <disk>
devices, changing the array size could confuse older clients;
and even with newer clients, the time and memory taken to
report additional statistics is not always necessary (backing
files are generally read-only except for block-commit, so while
read statistics may change, sizing statistics will not). So
the choice here is to add a new flag that only newer callers
will pass, when they are prepared for the additional information.
This patch introduces the new API, but it will take more
patches to get it implemented for qemu.
* include/libvirt/libvirt-domain.h
(VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING): New flag.
* src/libvirt-domain.c (virConnectGetAllDomainStats): Document it,
and add a new field when it is in use.
* tools/virsh-domain-monitor.c (cmdDomstats): Use new flag.
* tools/virsh.pod (domstats): Document it.
Eric Blake [Fri, 5 Dec 2014 23:19:00 +0000 (16:19 -0700)]
getstats: prepare for dynamic block.count stat
A coming patch will make it optionally possible to list backing
chain block stats; in this mode of operation, block.count is no
longer the number of <disks> in the domain, but the number of
blocks in the array being reported. We still want block.count
listed first, but rather than iterate the tree twice (once to
count, and once to list stats), it's easier to just touch things
up after the fact.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Compute count
after the fact.
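The "touch things up after the fact" approach can be modelled as reserving the first slot and patching it once the walk is done. This is a Python sketch, not qemuDomainGetStatsBlock(); the input layout (a list of dicts with an optional "backing" entry) and the function name are assumptions made for illustration.

```python
def format_block_stats(disks: list) -> list:
    """Emit block.count first, computing its value in a single pass."""
    records = [("block.count", None)]  # placeholder, patched below
    count = 0
    for disk in disks:
        node = disk
        while node is not None:  # the disk itself, then its backing chain
            records.append((f"block.{count}.name", disk["name"]))
            count += 1
            node = node.get("backing")
    records[0] = ("block.count", count)  # fix up after the fact
    return records
```

The tree is walked only once, yet block.count still appears before the per-block entries.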
Eric Blake [Sat, 6 Dec 2014 06:01:05 +0000 (23:01 -0700)]
getstats: report block sizes for offline domains
The prior refactoring can now be put to use. With the same domain
as the earlier commit 7b49926 (one qcow2 disk and an empty
cdrom drive):
$ virsh domstats --block foo
Domain: 'foo'
block.count=2
block.0.name=hda
block.0.path=/var/lib/libvirt/images/foo.qcow2
block.0.allocation=1309614080
block.0.capacity=42949672960
block.0.physical=1309671424
block.1.name=hdc
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Use
qemuStorageLimitsRefresh to report offline statistics.
Eric Blake [Wed, 17 Dec 2014 06:18:51 +0000 (23:18 -0700)]
qemu: fix bugs in blockstats
The documentation for virDomainBlockInfo was confusing: it stated
that 'physical' was the size of the container, then gave an example
of it being the amount of storage used by a sparse file (that is,
for a sparse raw image on a regular file, the wording implied
capacity==physical, while allocation was smaller; but the example
instead claimed physical==allocation). Since we use 'physical' for
the last offset of a block device, we should do likewise for
regular files.
Furthermore, the example claimed that for a qcow2 regular file,
allocation==physical. At the time the code was first written,
this was true (qcow2 files were allocated sequentially, and were
never sparse, so the last sector written happened to also match
the disk space occupied); but modern qemu does much better and
can punch holes for a qcow2 with allocation < physical.
Basically, after this patch, the three fields are now reliably
mapped as:
'capacity' - how much storage the guest can see (equal to
physical for raw images, determined by image metadata otherwise)
'allocation' - how much storage the image occupies (similar to
what 'du' would report)
'physical' - the last offset of the image (similar to what 'ls'
would report)
'capacity' can be larger than 'physical' (such as for a qcow2
image that does not vary much from a backing file) or smaller
(such as for a qcow2 file with lots of internal snapshots).
Likewise, 'allocation' can be (slightly) larger than 'physical'
(such as counting the tail of cluster allocations required to
round a file size up to filesystem granularity) or smaller
(for a sparse file). A block-resize operation changes capacity
(which, for raw images, also changes physical); many non-raw
images automatically grow physical and allocation as necessary
when starting with an allocation smaller than capacity; and even
when capacity and physical stay unchanged, allocation can change
when converting sectors from holes to data or back.
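For a regular file, two of the three values map directly onto stat(2) fields, which is what the 'du' and 'ls' comparisons above refer to. The following is a minimal Python sketch with a hypothetical helper name, assuming a Unix host where st_blocks counts 512-byte units; it is not what qemuDomainGetBlockInfo() does internally, and 'capacity' is left out because for non-raw images it comes from image metadata.

```python
import os


def image_sizes(path: str) -> dict:
    """Report 'physical' and 'allocation' for a regular file (sketch)."""
    st = os.stat(path)
    return {
        "physical": st.st_size,            # last offset, as 'ls -l' reports
        "allocation": st.st_blocks * 512,  # space occupied, as 'du' reports
    }
```

For a sparse raw file, the allocation value will typically be smaller than the physical value, and capacity equals physical.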
Note that this does not change semantics for qcow2 images stored
on block devices; there, we still rely on qemu to report the
highest written extent for allocation. So using this API to
track when to extend a block device because a qcow2 image is
about to exceed a threshold will not see any changes.
Also, note that virStorageVolInfo is unfortunately limited to
just 'capacity' and 'allocation' (we can't expand it to add
'physical', although we can expand the XML to add it there);
historically, that struct's 'allocation' value has reported
file size for qcow2 files (what this patch terms 'physical'
for a domain block device), but disk usage for raw files (what
this patch terms 'allocation'). So follow-up patches will be
needed to make storage volumes report the same allocation
values and get at physical values, where those differ.
* include/libvirt/libvirt-domain.h (_virDomainBlockInfo): Tweak
documentation to match saner definition.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): For regular
files, physical size is capacity, not allocation.
Eric Blake [Wed, 17 Dec 2014 06:13:04 +0000 (23:13 -0700)]
getstats: rearrange blockinfo gathering
Ultimately, we want to avoid read()ing a file while qemu is running.
We still have to open() block devices to determine their physical
size, but that is safer. This patch rearranges code to group
together all code that reads the image, to make it easier for later
patches to skip the metadata collection when possible.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Check for empty
disk up front. Place metadata reading next to use.
Eric Blake [Fri, 12 Dec 2014 16:53:33 +0000 (09:53 -0700)]
getstats: perform recursion in monitor collection
When requested in a later patch, the QMP command results are now
examined recursively. As qemu_driver will eventually have to
read items out of the hash table as stored by this patch, the
computation of backing alias string is done in a shared location.
* src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
* src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
Eric Blake [Thu, 11 Dec 2014 22:28:41 +0000 (15:28 -0700)]
getstats: prepare monitor collection for recursion
A future patch will allow recursion into backing chains when
collecting block stats. This patch should not change behavior,
but merely moves out the common code that will be reused once
recursion is enabled, and adds the parameter that will turn on
recursion.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
although it is ignored for now.
* src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
(qemuMonitorBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.h
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetAllBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
split...
(qemuMonitorJSONGetOneBlockStatsInfo)
(qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
(qemuMonitorJSONGetBlockStatsInfo): Update caller.
* src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
* src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
Eric Blake [Sun, 16 Nov 2014 04:54:33 +0000 (21:54 -0700)]
qemu: let blockinfo reuse virStorageSource
Right now, grabbing blockinfo always calls stat on the disk, then
opens the image to determine the capacity, using a throw-away
virStorageSourcePtr. This has a couple of drawbacks:
1. We are calling stat and opening a file on every invocation of
the API. However, there are cases where the stats should NOT be
changing between successive calls (if a domain is running, no
one should be changing the physical size of a block device or raw
image behind our backs; capacity of read-only files should not
be changing; and we are the gateway to the block-resize command
to know when the capacity of read-write files should be changing).
True, we still have to use stat in some cases (a sparse raw file
changes allocation if it is read-write and the amount of holes is
changing, and a read-write qcow2 image stored in a file changes
physical size if it was not fully pre-allocated). But for
read-only images, even this should be something we can remember
from the previous time, rather than repeating every call.
2. We want to enhance the power of virDomainListGetStats, by
sharing code. But we already have a virStorageSourcePtr for
each disk, and it would be easier to reuse the common structure
than to have to worry about the one-off virDomainBlockInfoPtr.
While this patch does not optimize reuse of information in point
1, it does get us closer to being able to do so; by updating a
structure that survives between consecutive calls.
* src/util/virstoragefile.h (_virStorageSource): Add physical, to
mirror virDomainBlockInfo; rearrange fields to match public struct.
(virStorageSourceCopy): Copy the new field.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Store into
storage source, then copy to block info.
Eric Blake [Fri, 14 Nov 2014 16:44:40 +0000 (09:44 -0700)]
qemu: refactor blockinfo job handling
In order for a future patch to virDomainListGetStats to reuse
some code for determining disk usage of offline domains, we
need to make it easier to pull out part of the guts of grabbing
blockinfo. The current implementation grabs a job fairly late
in the game, while getstats will already own a job; reordering
things so that the job is always grabbed up front in both
functions will make it easier to pull out the common code.
This patch results in grabbing a job in cases where one was not
previously needed, but as it is a query job, it should not be
noticeably slower.
This patch touches the same code as the fix for CVE-2014-6458
(commit b799259); in that patch, we avoided hotplug changing
a disk reference during the time of obtaining a monitor lock
by copying all data we needed and no longer referencing disk;
this patch goes the other way and ensures that by holding the
job, the disk cannot be changed so we no longer need to worry
about the disk being invalidated across the monitor lock.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Rearrange job
control to be outside of disk information.
When any of the functions modified in commit 214c687b took the false
branch, the function itself used none of its parameters, resulting in an
"unused parameter" error. Rewriting these functions to the stubs we use
elsewhere should fix the problem.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit e3435caf added cleanup code to qemuDomainSetVcpusFlags() that was
not supposed to reset the error. The usual procedure was followed, saving
the error to a temporary variable, but that variable was never freed and
so it leaked.
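The save/restore/free pattern described above can be sketched as follows.
This is a minimal illustration only: every demo_* name is a hypothetical
stand-in for this sketch and is not the libvirt error API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for an error object and the "last error" slot. */
typedef struct {
    char message[64];
} demo_error;

static demo_error current_error;   /* stand-in for the last reported error */
static int outstanding_copies;     /* saved copies that were not freed yet */

static void demo_report_error(const char *msg)
{
    strncpy(current_error.message, msg, sizeof(current_error.message) - 1);
    current_error.message[sizeof(current_error.message) - 1] = '\0';
}

/* Return a heap-allocated copy of the current error; the caller owns it. */
static demo_error *demo_save_last_error(void)
{
    demo_error *copy = malloc(sizeof(*copy));

    if (copy) {
        *copy = current_error;
        outstanding_copies++;
    }
    return copy;
}

static void demo_restore_error(const demo_error *saved)
{
    if (saved)
        current_error = *saved;
}

static void demo_free_error(demo_error *saved)
{
    if (saved) {
        outstanding_copies--;
        free(saved);
    }
}

/* Cleanup step that overwrites the current error as a side effect. */
static void demo_cleanup(void)
{
    demo_report_error("cleanup noise");
}

/* Run cleanup while preserving the original error, without leaking. */
static void demo_cleanup_preserving_error(void)
{
    demo_error *saved = demo_save_last_error();

    demo_cleanup();

    demo_restore_error(saved);
    demo_free_error(saved);   /* the step a leaky version would omit */
}
```

Dropping the final demo_free_error() call keeps the restored error intact
but leaves one copy outstanding on every call, which is the kind of leak
the message describes.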
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
qemu: Add missing goto error in qemuRestoreCgroupState
Commit af2a1f05 tried to clearly separate each condition in
qemuRestoreCgroupState() for the sake of readability; however, somehow
one condition body was missing. That means that the body of the next
condition got executed only if both of them were true, which is
impossible, thus resulting in dead code and a logic error.
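A minimal sketch of how a missing condition body changes control flow.
The function and parameter names are hypothetical and this is not the
actual qemuRestoreCgroupState() code.

```c
#include <stdbool.h>

/* Buggy shape: the first `if` lost its body, so the second `if` becomes
 * its body. The indentation below shows how the compiler parses it. */
static int demo_restore_buggy(bool first_failed, bool second_failed)
{
    if (first_failed)
        if (second_failed)
            goto error;

    return 0;

 error:
    return -1;
}

/* Fixed shape: each condition has its own `goto error`. */
static int demo_restore_fixed(bool first_failed, bool second_failed)
{
    if (first_failed)
        goto error;

    if (second_failed)
        goto error;

    return 0;

 error:
    return -1;
}
```

In the buggy variant a single failure is silently ignored; the error path
is reached only when both conditions hold at once.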
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
conf: Fix invalid condition when parsing storage owner
In commit d2632d60 we agreed that we want the parsed uid to properly
overflow but only to -1; however, the value was read into a long and then
wrapped into uid_t. That meant it failed on 32-bit systems.
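One way to make such parsing independent of the width of long is to read
the value into a long long and allow only -1 to wrap. The sketch below
assumes a 32-bit unsigned ID type (uint32_t stands in for uid_t) and uses
a hypothetical helper name; it is not the code from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse an owner ID, accepting -1 as "unspecified".
 * uint32_t is an assumed stand-in for uid_t. Returns 0 on success. */
static int demo_parse_owner(const char *str, uint32_t *out)
{
    char *end = NULL;
    long long val;

    errno = 0;
    val = strtoll(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0')
        return -1;

    /* Only -1 may wrap around; every other value must fit unchanged. */
    if (val != -1 && (val < 0 || val > (long long)UINT32_MAX))
        return -1;

    *out = (uint32_t)val;
    return 0;
}
```

Because long long is at least 64 bits wide, the range check behaves the
same on 32-bit and 64-bit builds.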
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
John Ferlan [Mon, 8 Dec 2014 13:06:57 +0000 (08:06 -0500)]
virstoragefile: Have virStorageFileResize use safezero
Currently virStorageFileResize() function uses build conditionals to
choose either the posix_fallocate() or syscall(SYS_fallocate) with no
fallback in order to preallocate the space in the newly resized file.
Since the safezero code has a similar set of conditionals, modify the
resize and safezero code in order to allow the resize logic to make use
of safezero to unify the look/feel of the code paths.
Add a new boolean (resize) to safezero() to make the optional decision
whether to try syscall(SYS_fallocate) if the posix_fallocate fails because
HAVE_POSIX_FALLOCATE is not defined (eg, return -1 and errno == 0).
Create a local safezero_sys_fallocate in order to handle the resize
code paths that support that. If not present, set errno = ENOSYS
in order to allow the caller to handle the failure scenarios.
John Ferlan [Fri, 5 Dec 2014 21:18:29 +0000 (16:18 -0500)]
virfile: Refactor safezero
Currently build conditionals decide which of two safezero() functions
should be built - either the posix_fallocate() or mmap() with a fallback
to a slower safewrite() algorithm in order to preallocate space in a raw file.
This patch will refactor safezero to utilize static functions for either
posix_fallocate or mmap/safewrite. The build conditionals still exist, but
only around shorter sections of code.
The posix_fallocate path will make use of the ret/errno setting to contain
the logic for safezero to decide whether it needs to fall back to other
algorithms. A return of -1 with errno not changed will indicate the conditional
is not present; otherwise, a return of -1 with errno changed indicates the
call was made and it failed (no functional difference to current algorithm).
The mmap/safewrite option changes only slightly to handle the ftruncate
failure for mmap. That is, previously if the ftruncate failed, there was
no fallback to the slow safewrite option.
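The ret/errno convention described above can be sketched roughly as
follows. This is a simplified illustration with hypothetical demo_* names:
it omits the mmap method entirely and is not the libvirt implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fast path. When HAVE_POSIX_FALLOCATE is not defined, return -1 and
 * leave errno untouched so the caller can tell "not available" apart
 * from "tried and failed". */
static int demo_zero_fallocate(int fd, off_t offset, off_t len)
{
#ifdef HAVE_POSIX_FALLOCATE
    int rc = posix_fallocate(fd, offset, len);

    if (rc == 0)
        return 0;
    errno = rc;   /* posix_fallocate returns the error number directly */
    return -1;
#else
    (void)fd;
    (void)offset;
    (void)len;
    return -1;
#endif
}

/* Slow path: write zero-filled chunks until len bytes have been written. */
static int demo_zero_write(int fd, off_t offset, off_t len)
{
    static const char zeros[4096];

    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t written = write(fd, zeros, chunk);

        if (written < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        len -= written;
    }
    return 0;
}

static int demo_safezero(int fd, off_t offset, off_t len)
{
    int ret;

    errno = 0;
    ret = demo_zero_fallocate(fd, offset, len);
    if (ret == 0 || errno != 0)
        return ret;   /* success, or the call was made and failed */

    /* -1 with errno still 0: the fast path is not built in, fall back. */
    return demo_zero_write(fd, offset, len);
}
```

Clearing errno before the call is what lets the caller distinguish the two
meanings of -1 without an extra out-parameter.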
conf: Rework virDomainObjListFindByUUID to allow more concurrent APIs
Currently, when there is an API that's blocking with a locked domain and
a second API that's waiting in virDomainObjListFindByUUID() for the domain
lock (with the domain list locked), no other API can be executed on any
domain on the whole hypervisor because all would wait for the domain
list to be locked. This patch adds a new optional approach to this in
which the domain is only ref'd (reference counter is incremented)
instead of being locked and is locked *after* the list itself is
unlocked. We might consider only ref'ing the domain in the future and
leaving locking on particular APIs, but that's not tonight's fairy tale.
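The "ref first, lock after the list is unlocked" ordering can be sketched
with plain pthread mutexes. All types and names below are hypothetical,
and the reference counter is simply guarded by the list lock here, which
is a simplification made for this sketch.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    int refs;
    int id;
} demo_domain;

typedef struct {
    pthread_mutex_t lock;
    demo_domain *doms;
    size_t ndoms;
} demo_domain_list;

/* Look up a domain and return it referenced and locked. The domain lock
 * is taken only after the list lock has been released, so a long wait on
 * one busy domain does not block lookups of other domains. */
static demo_domain *demo_find_ref_then_lock(demo_domain_list *list, int id)
{
    demo_domain *found = NULL;
    size_t i;

    pthread_mutex_lock(&list->lock);
    for (i = 0; i < list->ndoms; i++) {
        if (list->doms[i].id == id) {
            found = &list->doms[i];
            found->refs++;   /* keeps the object alive once the list is unlocked */
            break;
        }
    }
    pthread_mutex_unlock(&list->lock);

    if (found)
        pthread_mutex_lock(&found->lock);   /* may block; list is free */
    return found;
}

/* Unlock the domain and drop the reference taken by the lookup. */
static void demo_release(demo_domain_list *list, demo_domain *dom)
{
    pthread_mutex_unlock(&dom->lock);

    pthread_mutex_lock(&list->lock);
    dom->refs--;
    pthread_mutex_unlock(&list->lock);
}
```

The reference is what makes it safe to touch the domain after the list
lock is dropped; without it the object could disappear before its own
lock is acquired.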
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Volume and pool formatting functions took different approaches to
unspecified uids/gids. When unknown, it is always parsed as -1, but one
of the functions formatted it as unsigned int (wrong) and one as
int (better). Due to that, two of our XML files from tests cannot
be parsed on 32-bit machines.
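The difference between the two formatting approaches can be shown with a
short sketch. The helper names are hypothetical, and a 32-bit unsigned int
is assumed.

```c
#include <stdio.h>

/* Format an ID as unsigned: an unspecified ID (-1) becomes 4294967295. */
static void demo_format_id_unsigned(char *buf, size_t len, unsigned int id)
{
    snprintf(buf, len, "%u", id);
}

/* Format an ID as signed: an unspecified ID stays -1. The conversion of
 * an out-of-range unsigned value to int is implementation-defined; common
 * compilers wrap it to -1. */
static void demo_format_id_signed(char *buf, size_t len, unsigned int id)
{
    snprintf(buf, len, "%d", (int)id);
}
```

A parser that reads the ID into a 32-bit long cannot represent 4294967295,
which is consistent with the failure on 32-bit machines described above,
whereas "-1" round-trips regardless of the width of long.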
The RNG schema needs to be modified as well, but because both
storagepool.rng and storagevol.rng need the same schema for the permission
element, save some space by moving it to storagecommon.rng.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
qemu: Fix hotplugging cpus with strict memory pinning
When hot-plugging a VCPU into the guest, kvm needs to allocate some data
from the DMA zone, which might be in a memory node that's not allowed in
cpuset.mems. This is basically the same problem that occurred when starting
the domain, which is why commit 7e72ac787848b7434c9359a57c1e2789d92350f8
exists. This patch just extends it to hotplugging as well.
Instead of setting the value of cpuset.mems once when the domain starts
and then re-calculating the value every time we need to change the child
cgroup values, leave the cgroup alone and rather set the child data
every time a new cgroup is created. We don't leave any task in the
parent group anyway. This will ease both current and future code.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>