qemu: return -1 on error paths in qemuDomainSaveImageStartVM
Error paths after sending the event that domain is started written as if ret = -1
which is set at the beginning of the function. It's common idioma to keep 'ret'
equal to -1 until the end of function where it is set to 0. But here we use ret
to keep result of restore operation too and thus breaks the idioma and its users :)
Let's use different variable to hold restore result.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
John Ferlan [Mon, 1 Feb 2016 15:26:02 +0000 (10:26 -0500)]
logical: Clean up allocation when building regex on the fly
Rather than a loop reallocating space to build the regex, just allocate
it once up front, then if there's more than 1 nextent, append a comma and
another regex_unit string.
John Ferlan [Mon, 1 Feb 2016 14:59:44 +0000 (09:59 -0500)]
logical: Use 'stripes' value for mirror/raid segtype
The 'stripes' value is described as the "Number of stripes or mirrors in
a logical volume". So add "mirror" and anything that starts with "raid"
to the list of segtypes that can have an 'nextents' value greater than one.
Use of raid segtypes (raid1, raid4, raid5*, raid6*, and raid10) is favored
over mirror in more recent lvm code.
John Ferlan [Mon, 1 Feb 2016 14:39:00 +0000 (09:39 -0500)]
logical: Use VIR_APPEND_ELEMENT instead of VIR_REALLOC_N
Rather than preallocating a set number of elements, then walking through
the extents and adjusting the specific element in place, use the APPEND
macros to handle that chore.
In my previous commit a70f3b1c779120129 I've tried to fix case
when building from VPATH and a file wasn't being installed.
However, my fix broke non-VPATH build.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michael Chapman [Wed, 27 Jan 2016 02:24:54 +0000 (13:24 +1100)]
virsh: improve waiting for block job readiness
After a block job hits 100%, we only need to apply a timeout waiting for
a block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2
callbacks were able to be registered.
If neither callback could be registered, there's clearly no need for a
timeout.
If both callbacks were registered, then we're guaranteed to eventually
get one of the events. The path being used by virsh must be exactly the
source path or target device in the domain's disk definition, and these
are the respective strings sent back in these two events.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Michael Chapman [Wed, 27 Jan 2016 02:24:52 +0000 (13:24 +1100)]
virsh: be consistent with style of loop exit
When waiting for a block job, the various statuses (COMPLETED, READY,
CANCELED, etc.) should all be treated consistently by having the loop be
exited with "break". Use "goto cleanup" for the error cases only, when
no block job status is available.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Boris Fiuczynski [Wed, 20 Jan 2016 12:13:45 +0000 (13:13 +0100)]
qemu: Align dump options for watchdog and on_crash events
Having on_crash set to either coredump-destroy or coredump-restart
creates core dumps with option memory-only in the directory specified
by auto_dump_path. When a watchdog is triggered with the action dump
the core dump is also placed into the directory specified by auto_dump_path
but is created without the option memory-only.
This patch sets the option memory-only also for core dumps created by the
watchdog event.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Michal Privoznik [Sat, 30 Jan 2016 10:55:45 +0000 (11:55 +0100)]
includes: Install libvirt-common.h
The libvirt-common.h is build time generated file from .in.
Obviously, it's generated into builddir and not srcdir. Problem
is, the list of header files to install, virinc_HEADERS contains
only $(srcdir)/*.h and this misses libvirt-common.h. This problem
is pretty obvious when doing a VPATH build.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Create a helper routine in order to parse any extents information
including the extent size, length, and the device string contained
within the generated 'lvs' output string.
A future patch would then be able to avoid the code more cleanly
RBD supports cloning by creating a snapshot, protecting it and create
a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between
the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using
the rbd_clone2() function from librbd.
rbd_clone2() is available in librbd since Ceph version Dumpling (0.67) which
dates back to August 2013.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if
never changes. This reduces the amount of snapshots created for that RBD image
which benefits the performance of the Ceph cluster.
During build the decision will be made to use either rbd_diff_iterate() or
rbd_diff_iterate2().
The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt
are already format 2.
If a RBD format 1 image is used as the original volume the backend will report
a VIR_ERR_OPERATION_UNSUPPORTED error.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
rbd: Add support for wiping RBD volumes using TRIM.
Using VIR_STORAGE_VOL_WIPE_ALG_TRIM a RBD volume can be trimmed down
to 0 bytes using rbd_discard()
Effectively all the data on the volume will be lost/gone, but the volume
remains available for use afterwards.
Starting at offset 0 the storage pool will call rbd_discard() in stripe
size * count increments which is usually 4MB. Stripe size being 4MB and
count 1.
rbd_discard() is available since Ceph version Dumpling (0.67) which dates
back to August 2013.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When wiping the RBD image will be filled with zeros started
at offset 0 and until the end of the volume.
This will result in the RBD volume growing to it's full allocation
on the Ceph cluster. All data on the volume will be overwritten
however, making it unavailable.
It does NOT take any RBD snapshots into account. The original data
might still be in a snapshot of that RBD volume.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Use the cast of (virStorageVolWipeAlgorithm) adding the missing case:'s
(VIR_STORAGE_VOL_WIPE_ALG_ZERO and VIR_STORAGE_VOL_WIPE_ALG_LAST).
Additionally, the old code would also still run the SCRUB command on
default since it didn't go to cleanup when a invalid flag was supplied.
We now go to cleanup and exit if a invalid flag would be provided.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
John Ferlan [Fri, 22 Jan 2016 16:01:05 +0000 (11:01 -0500)]
logical: Fix comment examples for virStorageBackendLogicalFindLVs
When commit id '82c1740a' made changes to the output format (changing from
using a ',' separator to '#'), the examples in the lvs output from the
comments weren't changed.
Additionally, the two new fields added ('segtype' and 'stripes') were
not included in the output, leaving it well confusing.
This patch fixes the sample output, adds a 'striped' example, and makes
other comment related adjustments for long line and spacing between followup
'NB' remarks (while I'm there).
Change their return type from unsigned int to bool: the corresponding
members in struct _virPCIDevice are defined as bool, and even the
corresponding virPCIDeviceSet*() functions take a bool value as input
so there's no point in these functions having unsigned int as return
type.
Michal Privoznik [Thu, 28 Jan 2016 14:20:17 +0000 (15:20 +0100)]
gendispatch: Don't output spaces on empty line
In our generator for some code we put empty lines in the output
to separate blocks of code. However, in some cases we put couple
of spaces on the empty line too. It's not bug, it just isn't
nice.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Andrea Bolognani [Fri, 15 Jan 2016 10:57:26 +0000 (11:57 +0100)]
pci: Add debug messages when unbinding from stub driver
Unbinding a PCI device from the stub driver can require several steps,
and it can be useful for debugging to be able to trace which of these
steps are performed and which are skipped for each device.
Andrea Bolognani [Tue, 19 Jan 2016 14:58:31 +0000 (15:58 +0100)]
pci: Phase out virPCIDeviceReattachInit()
The name is confusing, and there are just two uses: one is a test case,
and the other will be removed as part of an upcoming refactoring of
the hostdev code.
Andrea Bolognani [Wed, 27 Jan 2016 09:35:17 +0000 (10:35 +0100)]
virnetdevopenvswitch: Don't call strlen() twice on the same string
Commit 871e10f fixed a memory corruption error, but called strlen()
twice on the same string to do so. Even though the compiler is
probably smart enough to optimize the second call away, having a
single invocation makes the code slightly cleaner.
Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Wed, 27 Jan 2016 08:49:54 +0000 (09:49 +0100)]
virnetdevmacvlan: Provide stubs for build without macvtap
In 370608b4c76f we have introduced two new internal APIs.
However, there are no stubs for build without macvtap. Therefore
build on systems lacking macvtap support (e.g. mingw or freebds)
fails when trying to link.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The problem is that virCommandRun can return an empty string in the event that
the port being queried does not exist. When this happens then we are
unconditionally overwriting a newline character at position strlen()-1. When
strlen is 0, we overwrite memory that does not belong to the string.
The fix: Only overwrite the newline if the string is not empty.
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Laine Stump [Tue, 19 Jan 2016 19:20:54 +0000 (14:20 -0500)]
util: keep/use a bitmap of in-use macvtap devices
This patch creates two bitmaps, one for macvlan device names and one
for macvtap. The bitmap position is used to indicate that libvirt is
currently using a device with the name macvtap%d/macvlan%d, where %d
is the position in the bitmap. When requested to create a new
macvtap/macvlan device, libvirt will now look for the first clear bit
in the appropriate bitmap and derive the device name from that rather
than just starting at 0 and counting up until one works.
When libvirtd is restarted, the qemu driver code that reattaches to
active domains calls the appropriate function to "re-reserve" the
device names as it is scanning the status of running domains.
Note that it may seem strange that the retry counter now starts at
8191 instead of 5. This is because we now don't do a "pre-check" for
the existence of a device once we've reserved it in the bitmap - we
move straight to creating it; although very unlikely, it's possible
that someone has a running system where they have a large number of
network devices *created outside libvirt* named "macvtap%d" or
"macvlan%d" - such a setup would still allow creating more devices
with the old code, while a low retry max in the new code would cause a
failure. Since the objective of the retry max is just to prevent an
infinite loop, and it's highly unlikely to do more than 1 iteration
anyway, having a high max is a reasonable concession in order to
prevent lots of new failures.
Leno Hou [Mon, 11 Jan 2016 06:59:00 +0000 (14:59 +0800)]
util: increase libnl buffer size
In the following cases nl_recv() was returning the error "No buffer
space available":
* When switching CPUs to offline/online in a system more than 128 cpus
* When using virsh to destroy domain in a system with many interfaces
This patch sets the buffer size for all netlink sockets created by
libnl to 128K and turns on message peeking for nl_recv(). This
eliminates the "No buffer space available" errors seen in the cases
above, and also preempts other future errors the smaller buffers could
have caused.
Signed-off-by: Leno Hou <houqy@linux.vnet.ibm.com> Signed-off-by: Laine Stump <laine@laine.org>
Pavel Hrdina [Mon, 11 Jan 2016 11:40:32 +0000 (12:40 +0100)]
device: cleanup input device code
The current code was a little bit odd. At first we've removed all
possible implicit input devices from domain definition to add them later
back if there was any graphics device defined while parsing XML
description. That's not all, while formating domain definition to XML
description we at first ignore any input devices with bus different to
USB and VIRTIO and few lines later we add implicit input devices to XML.
This seems to me as a lot of code for nothing. This patch may look
to be more complicated than original approach, but this is a preferred
way to modify/add driver specific stuff only in those drivers and not
deal with them in common parsing/formating functions.
The update is to add those implicit input devices into config XML to
follow the real HW configuration visible by guest OS.
There was also inconsistence between our behavior and QEMU's in the way,
that in QEMU there is no way how to disable those implicit input devices
for x86 architecture and they are available always, even without graphics
device. This applies also to XEN hypervisor. VZ driver already does its
part by putting correct implicit devices into live XML.
Michal Privoznik [Tue, 26 Jan 2016 16:37:29 +0000 (17:37 +0100)]
vircgroup: Finish renaming of virCgroupIsolateMount
In dc576025c360 we renamed virCgroupIsolateMount function to
virCgroupBindMount. However, we forgot about one occurrence in
section of the code which provides stubs for platforms without
support for CGroups like *BSD for instance.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
lxc: don't try to hide parent cgroups inside container
On the host when we start a container, it will be
placed in a cgroup path of
/machine.slice/machine-lxc\x2ddemo.scope
under /sys/fs/cgroup/*
Inside the containers' namespace we need to setup
/sys/fs/cgroup mounts, and currently will bind
mount /machine.slice/machine-lxc\x2ddemo.scope on
the host to appear as / in the container.
While this may sound nice, it confuses applications
dealing with cgroups, because /proc/$PID/cgroup
now does not match the directory in /sys/fs/cgroup
This particularly causes problems for systems and
will make it create repeated path components in
the cgroup for apps run in the container eg
This also causes any systemd service that uses
sd-notify to fail to start, because when systemd
receives the notification it won't be able to
identify the corresponding unit it came from.
In particular this break rabbitmq-server startup
Future kernels will provide proper cgroup namespacing
which will handle this problem, but until that time
we should not try to play games with hiding parent
cgroups.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The VIR_DOMAIN_STATS_VCPU flag to virDomainListGetStats
enables reporting of stats about vCPUs. Currently we
only report the cumulative CPU running time and the
execution state.
This adds reporting of the wait time - time the vCPU
wants to run, but the host scheduler has something else
running ahead of it.
In implementing this I notice our reporting of CPU execute
time has very poor granularity, since we are getting it
from /proc/$PID/stat. As a future enhancement we should
prefer to get CPU execute time from /proc/$PID/schedstat
or /proc/$PID/sched (if either exist on the running kernel)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Peter Krempa [Mon, 4 Jan 2016 18:46:31 +0000 (19:46 +0100)]
test: Touch up error message when attempting to pin invalid vCPU
Report
error: invalid argument: requested vcpu '100' is not present in the domain
instead of
error: invalid argument: requested vcpu is higher than allocated vcpus
Peter Krempa [Mon, 4 Jan 2016 16:00:51 +0000 (17:00 +0100)]
tests: qemuxml2xml: Order pinning information numerically
A future patch will refactor the storage of the pinning information in a
way where the ordering will be lost. Order them numerically to avoid
changing the tests later.
Peter Krempa [Mon, 7 Dec 2015 11:49:12 +0000 (12:49 +0100)]
virsh: cpu-stats: Remove unneeded flags
virDomainGetCPUStats doesn't support flags so there's no need to carry
the 'flags' variable around. Additionally since the API is poorly
designed I doubt that it will be extended.
Michal Privoznik [Mon, 25 Jan 2016 06:57:21 +0000 (07:57 +0100)]
virt-host-validate: Fix error level for user namespace check
From the code it seems to me that we need user namespace if
configured in domain XML. Otherwise we don't use it at all.
However our tool is more strict about that. Fix this discrepancy.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Sun, 24 Jan 2016 14:11:32 +0000 (15:11 +0100)]
virt-host-validate: Check those CGroups that we actually use
Since the introduction of virt-host-validate tool the set of
cgroup controllers we use has changed so the tool is checking for
some cgroups that we don't need (e.g. net_cls, although I doubt
we have ever used that one) and is not checking for those we
actually use (e.g. cpuset).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
It all works like this. The change-media command dumps domain
XML, finds the corresponding cdrom device we want to change media
in and returns it in the xmlNodePtr form. This way we don't have
to bother with keeping all the subelements or attributes that we
don't care about in the XML that is fed back to libvirt for the
update API.
Now, the problem is we try to be clever here and detect if disk
already has a source (indicated by <source/> subelement).
However, bare fact that the element is there does not mean disk
has source. Make our clever check better.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Fri, 15 Jan 2016 12:01:30 +0000 (13:01 +0100)]
qemu: snapshot: Correctly report qemu error on 'savevm'
Since 'savevm' was not converted to QMP libvirt has to parse for error
strings in the text monitor output. One of the unhandled errors is
produced when qemu treats a device as unmigratable.
As current qemu actually does support AHCI migration this bug is
applicable only to older versions of qemu.
Make bhyveload respect boot order as specified by os.boot section of the
domain XML or by "boot order" for specific devices. As bhyve does not
support a real boot order specification right now, it's just about
choosing a single device to boot from.
Laine Stump [Thu, 21 Jan 2016 19:19:56 +0000 (14:19 -0500)]
util: reset MAC address of macvtap passthrough physdev after disassociate
libvirt always resets the MAC address of the physdev used for macvtap
passthrough when the guest is finished with it. This was happening
prior to the 802.1Qb[gh] DISASSOCIATE command, and was quite often
failing, presumably because the driver wouldn't allow the MAC address
to be reset while the association was still active, with a log message
like this:
virNetDevSetMAC:168 : Cannot set interface MAC to 00:00:00:00:00:00 on 'eth13': Cannot assign requested address
This patch changes the order - we now do the 802.1Qb[gh] disassociate
and delete the macvtap interface first, then and reset the MAC
address.
Cole Robinson [Thu, 21 Jan 2016 18:18:04 +0000 (13:18 -0500)]
lxc: fuse: Fill in MemAvailable for /proc/meminfo
'free' on Fedora 23 will use MemAvailable to calculate its 'available'
field, but we are passing through the host's value. Set it to match
MemFree, which is what 'free' will do for older linux that don't have
MemAvailable
Cole Robinson [Thu, 21 Jan 2016 18:14:54 +0000 (13:14 -0500)]
lxc: fuse: Fix /proc/meminfo size calculation
We virtualize bits of /proc/meminfo by replacing host values with
values specific to the container.
However for calculating the final size of the returned data, we are
using the size of the original file and not the altered copy, which
could give garbelled output.
On the formatting side switch to producing cmdline= instead of extra=.
Update a few tests and add serveral more.
- test-cmdline is added to test the exclusive use of cmdline.
- test-fullvirt-direct-kernel-boot.cfg is updated due to the switch
on the formatting side and now tests the exclusive use of cmdline=.
- Tests are added for both paravirt and fullvirt where the .cfg uses
extra= and (paravirt only) root=. These are format (xl->xml) only
since the inverse will generate cmdline= hence is not a round trip
(which was already true if using root=, which used to generate
extra= on the way back).
- Tests are added for both paravirt and fullvirt where the .cfg
declares cmdline= as well as bogus extra= and (paravirt only) root=
entries which should be ignored. Again these are format only tests
since the inverse won't include the bogus lines.
The last two bullets here required splitting the DO_TEST macro into
two halves, as is done in the xmconfigtest.c case.
In order to introduce a use of VIR_WARN for logging I had to add
virerror.h and VIR_LOG_INIT.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Joao Martins [Thu, 21 Jan 2016 10:21:10 +0000 (10:21 +0000)]
libxl: dispose libxl_dominfo after libxl_domain_info()
As suggested in a previous thread [0] this patch adds some missing calls
to libxl_dominfo_{init,dispose} when doing some of the libxl_domain_info
operations which would otherwise lead to memory leaks.
Jim Fehlig [Wed, 20 Jan 2016 18:36:26 +0000 (11:36 -0700)]
Xen: VIR_FROM_THIS cleanup
The virErrorDomain enum has VIR_FROM_XEN, VIR_FROM_XEND,
VIR_FROM_XENSTORE, VIR_FROM_SEXPR, and VIR_FROM_XENXM. Use
these elements in the corresponding .c files. While at it,
remove the VIR_FROM_THIS define in src/xenconfig/xenxs_private.h.
Jiri Denemark [Thu, 10 Dec 2015 15:09:09 +0000 (16:09 +0100)]
Introduce migration iteration event
The VIR_DOMAIN_EVENT_ID_MIGRATION_ITERATION event will be triggered
whenever VIR_DOMAIN_JOB_MEMORY_ITERATION changes its value, i.e.,
whenever a new iteration over guest memory pages is started during
migration.
Dmitry Andreev [Wed, 20 Jan 2016 13:10:09 +0000 (16:10 +0300)]
qemuDomainReboot: use fakeReboot=true only for acpi mode
When acpi is used to reboot/shutdown qemu domain, qemu emits
SHUTDOWN event. Libvirt uses fakeReboot variable in order to
differentiate reboot or shutdown. fakeReboot value is reseted
to false after domain restart/reset.
When mode=agent is used to reboot qemu domain, qemu doesn't emit
SHUTDOWN event and libvirt doesn't reset fakeReboot value to false.
In this case next 'shutdown -h now' performs reboot. That's why
we don't need to set fakeReboot=true for mode=agent.
Michal Privoznik [Wed, 20 Jan 2016 14:44:45 +0000 (15:44 +0100)]
virsh: Don't fetch status for all domains in cmdList
We are getting the list of domains and after that we iterate over
the list and try to get status for each domain hoping it will
skip over domains that disappeared meanwhile. However, this
solution to race is bogus - domain may disappear right after we
have checked its state and before we exec another API over it
(e.g. virDomainHasManagedSaveImage()). Also, when printing just
names or uuids (list --name / --uuid) we issue APIs to obtain the
values, however these require no RPC call as all requested info
is in virDomain object that client already has.
Therefore move the status obtaining only to the place that really
needs it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The generated output is dependent on perl hashtable ordering, which
gives different results for i686 and x86_64. Fix this by sorting
the hash keys before iterating over them
When generating docs in a VPATH build we get a failure to
create a file due to the 'internals' subdir not existing:
Generating internals/locking.html.tmp
/bin/sh: line 3: internals/locking.html.tmp: No such file or directory
rm: cannot remove ‘internals/locking.html.tmp’: No such file or directory
Makefile:2229: recipe for target 'internals/locking.html.tmp' failed
make: *** [internals/locking.html.tmp] Error 1
For some reason, make has decided to run the target
Ján Tomko [Thu, 14 Jan 2016 12:32:23 +0000 (13:32 +0100)]
leaseshelper: split out custom leases file read
Introduce virLeaseReadCustomLeaseFile which will populate
the new leases array with all the leases, except for expired
ones and the ones matching 'ip_to_delete'.
In order to be able to process disk storage pool's using a multipath
device to handle the partitions, libvirt_parthelper will need a way to
not automatically add a partition separator "p" to the generated device
name for each partition found. This is designed to mimic the multipath
features known as 'user_friendly_names' and custom 'alias' name.
If the part_separator attribute is set to "no", then generation of the
multipath partition name will not include the "p" partition separator
unless the source device path name ends with a number. The generated
partition names that get passed back to libvirt are processed in order
to find the device mapper multipath (dm-#) path device.
For example, device path "/dev/mapper/mpatha" would create partitions
"/dev/mapper/mpatha1", "/dev/mapper/mpatha2", etc. instead of
"/dev/mapper/mpathap1", "/dev/mapper/mpathap2", etc. If the device
path ends with a number "/dev/mapper/mpatha1", then the algorithm
to generate names "/dev/mapper/mpatha1p1", "/dev/mapper/mpatha1p2", etc.
would be utilized.
John Ferlan [Thu, 7 Jan 2016 11:57:28 +0000 (06:57 -0500)]
conf: Add storage pool device attribute part_separator
Add a new storage pool source device attribute 'part_separator=[yes|no]'
in order to allow a 'disk' storage pool using a device mapper multipath
device to not add the "p" partition separator to the generated device
name when libvirt_parthelper is run.
This will allow libvirt to find device mapper multipath devices which were
configured in /etc/multipath.conf to use 'user_friendly_names' or custom
'alias' names for the LUN.
Michal Privoznik [Mon, 18 Jan 2016 10:13:22 +0000 (11:13 +0100)]
virLogManagerDomainReadLogFile: Don't do dummy allocs
Since we pass dummy variables @fdout and @fdoutlen into
virNetClientProgramCall() we make it alloc @fdout array (even
though it's an array of 0 elements since vitlogd can hardly pass
us some FDs at this stage). Nevertheless, it's an allocation not
followed by free():
==29385== 0 bytes in 60 blocks are definitely lost in loss record 2 of 1,009
==29385== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==29385== by 0x54B99EF: virAllocN (viralloc.c:191)
==29385== by 0x56821B1: virNetClientProgramCall (virnetclientprogram.c:359)
==29385== by 0x563B304: virLogManagerDomainReadLogFile (log_manager.c:272)
==29385== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485)
==29385== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660)
==29385== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696)
==29385== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957)
==29385== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
==29385== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152)
==29385== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396)
==29385== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Mon, 18 Jan 2016 09:50:14 +0000 (10:50 +0100)]
qemuProcessReadLog: Fix memmove arguments
So I can observe this crasher that with freshly started daemon
(and virtlogd enabled) I am trying to startup a domain that
immediately dies (because it's said to use huge pages but I
haven't allocated a single one in the pool). Hardly reproducible
with -O0 or under valgrind. But I just got lucky:
==20469== Invalid write of size 8
==20469== at 0x4C2E99B: memcpy@GLIBC_2.2.5 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20469== by 0x217EDD07: qemuProcessReadLog (qemu_process.c:1670)
==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696)
==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957)
==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
==20469== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152)
==20469== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396)
==20469== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450)
==20469== by 0x21846845: qemuDomainCreate (qemu_driver.c:7468)
==20469== by 0x5611CD0: virDomainCreate (libvirt-domain.c:6753)
==20469== by 0x125D9A: remoteDispatchDomainCreate (remote_dispatch.h:3613)
==20469== by 0x125CB7: remoteDispatchDomainCreateHelper (remote_dispatch.h:3589)
==20469== Address 0x27a52ad0 is 0 bytes after a block of size 5,584 alloc'd
==20469== at 0x4C29F80: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20469== by 0x9B8D1DB: xdr_string (in /lib64/libc-2.21.so)
==20469== by 0x563B39C: xdr_virLogManagerProtocolNonNullString (log_protocol.c:24)
==20469== by 0x563B6B7: xdr_virLogManagerProtocolDomainReadLogFileRet (log_protocol.c:123)
==20469== by 0x164B34: virNetMessageDecodePayload (virnetmessage.c:407)
==20469== by 0x5682360: virNetClientProgramCall (virnetclientprogram.c:379)
==20469== by 0x563B30E: virLogManagerDomainReadLogFile (log_manager.c:272)
==20469== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485)
==20469== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660)
==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696)
==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957)
==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
This points to memmove() in qemuProcessReadLog(). Imagine we just
read the following string from qemu:
After the first pass of the while() loop in the
qemuProcessReadLog() (in which we have taken the false branch in
the if) @buf still points to the beginning of the string,
@filter_next points to the beginning of the second line. So we
start second iteration because there is yet another newline
character at the end. In this iteration @eol points to it
actually. Now, the control gets inside true branch of if(). Just
to remind you:
got = 58
filter_next = buf + 5,
eol = buf + 58.
Therefore skip = 54 which is correct. The message we want to skip
is 54 bytes long. However:
memmove(filter_next, eol + 1, (got - skip) +1);
which is
memmove(filter_next, eol + 1, 5)
is obviously wrong as there is only one byte we can access, not 5!
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
That happens because without any specific options gcc doesn't keep enum
information in the resulting binary object. I managed to isolate the
parameters of gcc that caused this issue to disappear, however I
remember that they influenced the resulting binaries quite a bit and
were definitely not something we would want to add as mandatory to the
build process.
So to deal with this cleanly, let's take that enum and separate it out
to its own header file. Since it is only used in the lockd driver and
the protocol, lock_driver_lockd.h feels like a suitable name.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Michal Privoznik [Mon, 18 Jan 2016 08:11:19 +0000 (09:11 +0100)]
qemuTestDriverInit: fill driver with zeroes
In the commit aea47e48c473a we have fixed a single pointer within
driver structure. Since all callers pass statically allocated
driver on stack other pointers within driver may contain random
values too. Before touching it lets overwrite it with zeroes and
thus fix all dangling pointers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Jiri Denemark [Fri, 15 Jan 2016 15:34:37 +0000 (16:34 +0100)]
security: Do not restore labels on device tree binary
A device tree binary file specified by /domain/os/dtb element is a
read-only resource similar to kernel and initrd files. We shouldn't
restore its label when destroying a domain to avoid breaking other
domains configure with the same device tree.
Jiri Denemark [Fri, 15 Jan 2016 09:55:58 +0000 (10:55 +0100)]
security: Do not restore kernel and initrd labels
Kernel/initrd files are essentially read-only shareable images and thus
should be handled in the same way. We already use the appropriate label
for kernel/initrd files when starting a domain, but when a domain gets
destroyed we would remove the labels which would make other running
domains using the same files very unhappy.
Yaniv Kaul [Fri, 15 Jan 2016 07:33:49 +0000 (08:33 +0100)]
qemu: Print better warning in qemuAgentNotifyEvent
We have this function qemuAgentNotifyEvent() which is supposed to
be called from thread pool responsible for processing qemu
monitor events. The function then should wake up other thread
that is waiting for a guest to shutdown or reboot. However, if we
have received a different error a warning is printed out. This
warning lacks info on which event is expected.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Tue, 22 Dec 2015 14:46:01 +0000 (09:46 -0500)]
cgroup: Fix possible bug as a result of code motion for vcpu cgroup setup
Commit id '90b721e43' moved where the virCgroupAddTask was made until
after the check for the vcpupin checks. However, in doing so it missed
an option where if the cpumap didn't exist, then the code would continue
back to the top of the current vcpu loop. The results was that the
virCgroupAddTask wouldn't be called.