Peter Krempa [Thu, 15 Apr 2021 15:15:26 +0000 (17:15 +0200)]
conf: domain: Move default setting from virDomainDiskDefParseXML to virDomainDiskDefPostParse
Move the setting of read-only state, the default disk bus and setting of
'snapshot' state for read-only disks to the post parse callback to clean
up the disk parser.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Wed, 14 Apr 2021 14:48:53 +0000 (16:48 +0200)]
conf: domain: Introduce an internal variant of virDomainDiskDefNew
The <disk> XML element parser is going to be modified so that the
virStorageSource bits are pre-parsed. Add virDomainDiskDefNewSource,
which uses an existing 'src' pointer.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Wed, 14 Apr 2021 14:43:22 +0000 (16:43 +0200)]
qemu: driver: Use virDomainDiskDefParseSource in qemuDomainBlockCopy
qemuDomainBlockCopy needs just the source portion of the disk but uses
the disk parser for it. Since we have a specific function now, refactor
the code to avoid having to deal with the unused virDomainDiskDef.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Sat, 17 Apr 2021 07:59:51 +0000 (09:59 +0200)]
lxc: Format --handshakefd for controller cmd fully
The command line argument is called --hanshakefd (check out
lxc_controller.c:main()). But the command line builder puts only
--handshake. This works, because there is no other argument
sharing the prefix.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
I've encountered the following bug, but only on Gentoo with
systemd and CGroupsV2. I've started an LXC container successfully
but destroying it reported the following error:
error: Failed to destroy domain 'amd64'
error: internal error: failed to get cgroup backend for 'pathOfController'
Debugging showed, that CGroup hierarchy is full of surprises:
Anyway, those nested directories should not be a problem, because
virCgroupKillRecursiveInternal() removes them recursively, right?
Sort of. The function really does remove nested directories, but
it assumes that every directory has the same controller as the
rest. Just take a look at virCgroupV2KillRecursive() - it gets
'Any' controller (the first one it found in ".scope") and then
passes it to virCgroupKillRecursiveInternal().
This assumption is not true though. The controllers found in
".scope" are the following:
cpuset cpu io memory pids
while "libvirt" has fewer:
cpuset cpu io memory
Up until now it's not problem, because of how we order
controllers internally - "cpu" is the first and thus picking
"Any" controller returns just that. But the rest of directories
has no controllers, their "cgroup.controllers" is just empty.
What fixes the bug is dropping @controller argument from
virCgroupKillRecursiveInternal() and letting each iteration work
pick its own controller.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Fri, 16 Apr 2021 14:14:33 +0000 (16:14 +0200)]
vircgroupbackend: Extend error messages in VIR_CGROUP_BACKEND_CALL()
The VIR_CGROUP_BACKEND_CALL() macro gets a backend for controller
and calls corresponding callback in it. If either is NULL then an
error message is printed out. However, the error message contains
only the intended callback func and not controller or backend
found.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Mon, 19 Apr 2021 06:11:55 +0000 (08:11 +0200)]
cmdDomBlkError: Fix crash when initial call to virDomainGetDiskErrors fails
virDomainGetDiskErrors uses the weird semantics where we make the
caller query for the number of elements and then pass pre-allocated
structure.
The cleanup section errorneously used the 'count' variable to free the
allocated elements for the API but 'count' can be '-1' in cases when the
API returns failure, thus attempting to free beyond the end of the
array.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/155 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 15 Apr 2021 11:26:37 +0000 (13:26 +0200)]
virsh: snapshot: Don't validate schema of XML generated by 'virsh snapshot-create-as'
Commit 95f8e3237e5486f487324c6 which introduced XML schema validation
for snapshot XMLs always asserted the validation for the XML generated
by 'virsh snapshot-create-as' on the basis that it's libvirt-generated,
thus valid.
This unfortunately isn't true as users can influence certain bits of the
XML such as the disk image path which must be a full path. Thus if a
user tries to invoke virsh as:
$ virsh snapshot-create-as upstream --diskspec vda,file=relative.qcow2
error: XML document failed to validate against schema: Unable to validate doc against /path/to/domainsnapshot.rng
Extra element disks in interleave
Element domainsnapshot failed to validate content
They get a rather useless error from the libxml2 RNG validator.
With this fix applied, we get to the XML parser in libvirtd which has a
more reasonable error:
$ virsh snapshot-create-as upstream --diskspec vda,file=relative.qcow2
error: XML error: disk snapshot image path 'relative.qcow2' must be absolute
Instead users can force validation of the XML generated by 'virsh
snapshot-create-as' by passing the '--validate' flag.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 14 Apr 2021 11:03:58 +0000 (13:03 +0200)]
virXMLParseHelper: Rework error reporting
Move the reporting of parsing error on the error path of the parser as
other code paths report their own errors already.
Additionally prefer printing the 'url' as document name if provided
instead of "[inline data]" as that usually gives a better hint at least
which kind of XML is being parsed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
Currently virMediatedDeviceGetIOMMUGroupDev() looks up the iommu group
number and uses that to construct a path to the iommu group device.
virMediatedDeviceGetIOMMUGroupNum() then uses that device path and takes
the basename to get the group number. That's unnecessary extra string
manipulation for *GroupNum(). Reverse the implementations and make
*GroupDev() call *GroupNum().
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Tim Wiederhake [Thu, 15 Apr 2021 08:12:13 +0000 (10:12 +0200)]
virlog: Remove stray "todo" in comment
Fixes: 8fe30b2167b5b56461b11dbf02aca83030070caf Signed-off-by: Tim Wiederhake <twiederh@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com>
Jim Fehlig [Tue, 13 Apr 2021 23:29:19 +0000 (17:29 -0600)]
libxl: Add debug statements
Over several years of debugging reports related to VM shutdown, destruction,
and cleanup, I've found that logging of all events received from libxl and
logging the entry of libxlDomainCleanup has proven useful. Add the these
debug messages upstream to aid in future debugging.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Wed, 14 Apr 2021 08:09:35 +0000 (10:09 +0200)]
qemu: Expose disk serial in virDomainGetGuestInfo()
When querying guest info via virDomainGetGuestInfo() the
'guest-get-disks' agent command is called. It may report disk
serial number which we parse, but never report nor use for
anything else.
As it turns out, it may help management application find matching
disk in their internals.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-By: Tomáš Golembiovský <tgolembi@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When running on systemd host the cgroup itself is removed by machined
so when we reach this code the directory no longer exist. If libvirtd
was running the whole time between starting and destroying VM the
detection is skipped because we still have both FD in memory. But if
libvirtd was restarted and no operation requiring cgroup devices
executed the FDs would be 0 and libvirt would try to detect them using
the cgroup directory. This results in reporting following errors:
libvirtd[955]: unable to open '/sys/fs/cgroup/machine.slice/machine-qemu\x2d1\x2dguest.scope/': No such file or directory
libvirtd[955]: Failed to remove cgroup for guest
When running on non-systemd host where we handle cgroups manually this
would not happen.
When destroying VM it is not necessary to detect the BPF prog and map
because the following code only closes the FDs without doing anything
else. We could run code that would try to detach the BPF prog from the
cgroup but that is not necessary as well. If the cgroup is removed and
there is no other FD open to the prog kernel will cleanup the prog and
map eventually.
Reported-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Tested-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Pavel Hrdina [Fri, 9 Apr 2021 16:28:09 +0000 (18:28 +0200)]
vircgroupv2: properly free BPF prog and map FDs
When nested cgroup was introduced it did not properly free file
descriptors for BPF prog and map. With nested cgroups we create the BPF
bits in the nested cgroup instead of the VM root cgroup.
This would leak the FDs which would be the last reference to the prog
and map so kernel would not remove the resources as well. It would only
happen once libvirtd process exits.
Fixes: 184245f53b94fc84f727eb6e8a2aa52df02d69c0 Reported-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Tested-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Tue, 13 Apr 2021 09:50:11 +0000 (11:50 +0200)]
nodedev: Don't fail device enumeration if MDEVCTL is missing
After all devices were enumerated, the enumeration thread call
nodeDeviceUpdateMediatedDevices() to refresh the state of
mediated devices. This means that 'mdevctl' will be executed. But
it may be missing on some systems (e.g. mine) in which case we
should just skip the update of mdevs instead of failing whole
device enumeration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Michal Privoznik [Tue, 13 Apr 2021 08:45:24 +0000 (10:45 +0200)]
nodedev: Mark device initialization complete even in case of an error
To speed up nodedev driver initialization, the device enumeration
is done in a separate thread. Once finished, the thread sets a
boolean variable that allows public APIs to be called (instead of
waiting for the thread to finish).
However, if there's an error in the device enumeration thread
then the control jumps over at the 'error' label and the boolean
is never set. This means, that any virNodeDev*() API is stuck
forever. Mark the initialization as complete (the thread is
quitting anyway) and let the APIs proceed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Michal Privoznik [Tue, 13 Apr 2021 09:18:23 +0000 (11:18 +0200)]
nodedev: Wait for device initialization in all public API callbacks
Although I have not experienced this in real life, there is a
possible race condition when creating new device, getting its XML
or parent or listing its capabilities. If the nodedev driver is
still enumerating devices (in a separate thread) and one of
virNodeDeviceGetXMLDesc(), virNodeDeviceGetParent(),
virNodeDeviceNumOfCaps(), virNodeDeviceListCaps() or
virNodeDeviceCreate() is called then it can lead to spurious
results because the device enumeration thread is removing devices
from or adding them to the internal list of devices (among with
their states).
Therefore, wait for things to settle down before proceeding with
any of the APIs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Michal Privoznik [Tue, 13 Apr 2021 08:52:07 +0000 (10:52 +0200)]
nodedev: Signal initCond with driver locked
This is more academic dispute than a real bug, but this is taken
from pthread_cond_broadcast(3p) man:
The pthread_cond_broadcast() or pthread_cond_signal() functions
may be called by a thread whether or not it currently owns the
mutex that threads calling pthread_cond_wait() or
pthread_cond_timedwait() have associated with the condition
variable during their waits; however, if predictable scheduling
behavior is required, then that mutex shall be locked by the
thread calling pthread_cond_broadcast() or
pthread_cond_signal().
Therefore, broadcast the initCond while the nodedev driver is
still locked.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Michal Privoznik [Tue, 13 Apr 2021 09:14:15 +0000 (11:14 +0200)]
nodedev: Rename nodeDeviceWaitInit()
The consensus is to put the verb last. Therefore, the new name is
nodeDeviceInitWait(). This allows us to introduce new function
(done later in a separate commit) that will "complete" the device
initialization.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.
This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:
Pavel Hrdina [Mon, 12 Apr 2021 20:55:02 +0000 (22:55 +0200)]
qemu_conf: properly set 'deprecation_behavior' default value
The comment for that option states that the default value is 'none' but
it was not set by the code. By default the value is NULL which results
into the following warning:
warning : qemuBuildCompatDeprecatedCommandLine:10393 : Unsupported deprecation behavior '(null)' for VM 'test'
Fixes: 700450449377be4bf923e91d00f8fe8cf0975f66 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Luke Yue [Mon, 12 Apr 2021 14:04:27 +0000 (22:04 +0800)]
virfile: Replace AbsPath judgement method with g_path_is_absolute()
The g_path_is_absolute() considers more situations
than just a simply "path[0] == '/'".
Related issue: https://gitlab.com/libvirt/libvirt/-/issues/12 Signed-off-by: Luke Yue <lukedyue@gmail.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 12 Apr 2021 15:42:23 +0000 (17:42 +0200)]
qemuBlockJobProcessEventCompletedPull: Add backingStore terminators if base is NULL
When doing a blockpull with NULL base the full contents of the disk are
pulled into the topmost image which then becomes fully self-contained.
qemuBlockJobProcessEventCompletedPull doesn't install the backing chain
terminators though, although it's guaranteed that there will be no
backing chain behind disk->src.
Add the terminators for completness and for disabling backing chain
detection on further boots.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Mon, 12 Apr 2021 15:24:22 +0000 (17:24 +0200)]
qemuBlockJobProcessEventCompletedPull: Avoid dangling pointer after blockpull
When doing a full block pull job (base == NULL) and the config XML
contains a compatible disk, the completer function would leave a
dangling pointer in 'cfgdisk->src->backingStore' as cfgdisk->src would
be set to the value of 'cfgbase' which was always set to
'cfgdisk->src->backingStore'.
This is wrong though since for the live definition XML we set the
respective counterpart to 'job->data.pull.base' which is NULL in the
above scenario.
This leads to a invalid pointer read when saving the config XML and may
end up in a crash.
Resolve it by setting 'cfgbase' only when 'job->data.pull.base' is
non-NULL.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1946918 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Mon, 12 Apr 2021 15:26:06 +0000 (17:26 +0200)]
nodedev: Only set up mdevctl monitors if mdevctl.d exist
During its initialization, the nodedev driver tries to set up
monitors for /etc/mdevctl.d directory, so that it can register
mdevs as they come and go. However, if the file doesn't exist
there is nothing to monitor and therefore we can exit early. In
fact, we have to otherwise monitorFileRecursively() fails and
whole driver initialization fails with it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Mon, 12 Apr 2021 14:25:13 +0000 (16:25 +0200)]
nodedev: Don't join not spawned threads
During the nodedev driver initialization two threads are created:
one for listening on udev events (like device plug/unplug) and
the other for enumerating devices (so that the main thread doing
the driver init is not blocked). If something goes wrong at any
point then nodeStateCleanup() is called which joins those two
threads (possibly) created before. But it tries to join them even
they weren't created which is undefined behaviour (and it just so
happens that it crashes on my system).
If those two virThread variables are turned into pointers then we
can use comparison against NULL to detect whether threads were
created.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Mon, 12 Apr 2021 14:20:57 +0000 (16:20 +0200)]
nodedev: Lock @priv sooner
The nodedev driver private data object @priv is created by
calling udevEventDataNew(). After that, driver->privateData
pointer is set to the freshly allocated object and only a few
lines after all of this the object is locked. Technically it is
safe because there should not be any other thread at this point,
but defensive style of programming says it's better if the object
is locked before driver's privateData is set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Mon, 12 Apr 2021 14:18:24 +0000 (16:18 +0200)]
nodedev: Unlock @priv if initialization of mdevctlMonitors fails
If initialization of priv->mdevctlMonitors fails, then the
control jumps over to cleanup label where nodeStateCleanup() is
called which tries to lock @priv. But since @priv was already
locked before taking the jump a deadlock occurs. The solution is
to jump onto @unlock label, just like the code around is doing.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Mon, 12 Apr 2021 16:18:49 +0000 (18:18 +0200)]
bhyve: Fix declaration of 'params' in 'bhyveParsePCIFbuf'
In commit ad80bba90a3 I mistakenly didn't delete '**' from the
variable declaration when converting it to 'GStrv' and deleted the
'separator' variable since it was declared on the same line as a
different variable.
Fixes: ad80bba90a3 Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Peter Krempa [Thu, 8 Apr 2021 14:50:18 +0000 (16:50 +0200)]
tests: qemucapabilitiesdata: Fix formatting of manually added hunk
Commit 66c5674e797 added a query for the device properties of 'usb-host'
but the command header isn't formated the same way as if it were
autogenerated. Reformat all the files.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Fri, 9 Apr 2021 13:18:41 +0000 (15:18 +0200)]
qemuxml2argvtest: Parse 'arch' from XML early
If we want to provide correct (fake) caps already for the XML parser we
need to be able to parse the arch early so that we can properly
initialize the caps cache prior to calling the XML parser.
This patch adds code which parses the arch and updates the caps cache
prior to the parse step.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Wed, 31 Mar 2021 14:11:42 +0000 (16:11 +0200)]
qemuxml2argvtest: Rewrite parsing of XMLs to provide earlier parsing
In upcoming patches we'll need to parse a certain bit of XML before
calling the full XML parser. This effectively open-codes what
virDomainDefParseFile to reach virDomainDefParseNode.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>