conf: use a hash table for storing nwfilter object list
The current use of an array for nwfilter objects requires
the caller to iterate over all elements to find a filter,
and also requires locking each filter.
Switching to a pair of hash tables enables O(1) lookups
both by name and uuid, with no locking required.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
nwfilter: update comment about locking filter updates
The comment against the 'updateMutex' refers to a problem with
lock ordering when looking up filters in the virNWFilterObjList
which uses an array. That problem does indeed exist.
Unfortunately it claims that switching to a hash table would
solve the lock ordering problems during instantiation. That
is not correct because there is a second lock ordering
problem related to how we traverse related filters when
instantiating filters. Consider a set of filters:
Filter A:
Reference Filter C
Reference Filter D
Filter B:
Reference Filter D
Reference Filter C
In one example, we lock A, C, D, in the other example
we lock A, D, C.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
nwfilter: fix crash when counting number of network filters
The virNWFilterObjListNumOfNWFilters method iterates over the
driver->nwfilters, accessing virNWFilterObj instances. As such
it needs to be protected against concurrent modification of
the driver->nwfilters object.
This API allows unprivileged users to connect, so users with
read-only access to libvirt can cause a denial of service
crash if they are able to race with a call of virNWFilterUndefine.
Since network filters are usually statically defined, this is
considered a low severity problem.
This is assigned CVE-2022-0897.
Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Boris Fiuczynski [Thu, 17 Mar 2022 09:48:30 +0000 (10:48 +0100)]
nodedev: trigger mdev device definition update on udev add and remove
When nodedev objects are added and removed if possible check if mdev-types is
supported by the object and trigger a mdev device definition update to correct
the associated parent nodedevs.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Boris Fiuczynski [Thu, 17 Mar 2022 09:48:29 +0000 (10:48 +0100)]
nodedev: update mdevs on parent change
The parent of the mdev definition can change due to the existance of the
parent device. The parents existance can e.g. depend on the device
driver load state.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Michal Privoznik [Thu, 17 Mar 2022 08:19:39 +0000 (09:19 +0100)]
virnetdev: Use VIR_WITH_MUTEX_LOCK_GUARD in virNetDevGenerateName()
The virNetDevGenerateName() function uses a global array of
virNetDevGenName structs to find next unused name for network
device. This obviously needs some locking and in fact each member
of the array has its own lock. However, these members are not
virObjects, they are just plain structs, therefore
VIR_WITH_MUTEX_LOCK_GUARD() must be used instead of
VIR_WITH_OBJECT_LOCK_GUARD() to lock individual mutexes.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
qemu: domainjob: Allow InitJob if cb is not set in qemuDomainObjInitJob()
This allows init job even if cb structure is not set. This patch
also includes slight rewriting of the function to make it look
cleaner when freeing resources, by allocating privateData at the
end.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemu: domainjob: Allow operations if cb is not set in job structure
We should allow resetting / freeing / restoring / parsing /
formatting qemuDomainJobObj even if 'cb' attribute is not set.
This is theoretical for now, but the attribute must not be always
set in the future. It is sufficient to check if 'cb' exists
before dereferencing it.
Michal Privoznik [Tue, 15 Mar 2022 11:45:54 +0000 (12:45 +0100)]
qemu_cgroup: Don't deny devices from cgroupDeviceACL
On domain startup a couple of devices are allowed in the devices
controller no matter the domain configuration. The aim is to
allow devices crucial for QEMU or one of its libraries, or user
is passing through a device (e.g. through additional cmd line
arguments) and wants QEMU to access it.
However, during unplug it may happen that a device is configured
to use one of such devices and since we deny /dev nodes on
hotplug we would deny such device too. For example,
/dev/urandom belongs onto the list of implicit devices and users
can hotplug and hotunplug an RNG device with /dev/urandom as
backend.
The fix is fortunately simple - just consult the list of implicit
devices before removing the device from the namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Tue, 15 Mar 2022 11:37:44 +0000 (12:37 +0100)]
qemu_cgroup: Drop ENOENT special case for RNG devices
When allowing or denying RNG device in CGroups there's a special
check if the backend device exists (errno == ENOENT) in which
case success is returned to caller. This is in contrast with the
rest of the functions and in fact wrong too - if the backend
device doesn't exist then QEMU will fail opening it. Might as
well signal error here.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Mon, 14 Mar 2022 12:35:15 +0000 (13:35 +0100)]
qemu_namespace: Be less aggressive in removing /dev nodes from namespace
When creating /dev nodes in a QEMU domain's namespace the first
thing we simply do is unlink() the path and create it again. This
aims to solve the case when a file changed type/major/minor in
the host and thus we need to reflect this in the guest's
namespace. Fair enough, except we can be a bit more clever about
it: firstly check whether the path doesn't already exist or isn't
already of the correct type/major/minor and do the
unlink+creation only if needed.
Currently, this is implemented only for symlinks and
block/character devices. For regular files/directories (which are
less common) this might be implemented one day, but not today.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Michal Privoznik [Mon, 14 Mar 2022 14:05:11 +0000 (15:05 +0100)]
qemu_namespace: Don't unlink paths from cgroupDeviceACL
When building namespace for a domain there are couple of devices
that are created independent of domain config (see
qemuDomainPopulateDevices()). The idea behind is that these
devices are crucial for QEMU or one of its libraries, or user is
passing through a device and wants us to create it in the
namespace too. That's the reason that these devices are allowed
in the devices CGroup controller as well.
However, during unplug it may happen that a device is configured
to use one of such devices and since we remove /dev nodes on
hotplug we would remove such device too. For example,
/dev/urandom belongs onto the list of implicit devices and users
can hotplug and hotunplug an RNG device with /dev/urandom as
backend.
The fix is fortunately simple - just consult the list of implicit
devices before removing the device from the namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Michal Privoznik [Sat, 12 Mar 2022 04:41:56 +0000 (05:41 +0100)]
virsh: Don't open code virshEnumComplete()
Now that we have a function that generates string list for given
enum, let's use that instead of open coding it.
Note, after this there are still some 'candidates' left (e.g,
virshNetworkEventNameCompleter(), or
virshNetworkUpdateCommandCompleter()). These are not converted
because either they don't have a convenient int2str function or
they don't start from the very beginning of the enum.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Sat, 12 Mar 2022 04:37:50 +0000 (05:37 +0100)]
virsh: Introduce virshEnumComplete()
We have plenty of completers which iterate over all values of
given enum and do nothing more than translate every member into
string (using corresponding virXXXTypeToString()).
Introduce a convenience function so that callers can pass just
VIR_XXX_LAST and virXXXTypeToString and the rest is taken care
of.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Michal Privoznik [Fri, 11 Mar 2022 08:13:56 +0000 (09:13 +0100)]
virsh: Properly terminate string list in virshDomainInterfaceSourceModeCompleter()
A completer must return a NULL terminated list of strings, which
means that when dealing with enums, it has to allocate one
pointer more than the value of VIR_XXX_LAST. But this is not
honoured in virshDomainInterfaceSourceModeCompleter() leading to
out of bounds read.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 10 Mar 2022 11:59:30 +0000 (12:59 +0100)]
qemu: migration: Use 'VIR_MIGRATE_PARAM_TLS_DESTINATION' for the NBD connection
The NBD connection for non-shared storage migration can have the same
issue regarding TLS certificate name match as the migration connection
itself.
Propagate the configured name also for the NBD connections.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1901394 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Thu, 10 Mar 2022 09:05:53 +0000 (10:05 +0100)]
conf: Add support for setting expected TLS hostname for NBD disks
In cases when the hostname of the NBD server doesn't match the hostname
in the TLS certificate the new attribute 'tlsHostname' can be used to
override it.
Add the XML infrastructure and tests.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Notable changes:
- 'tls-hostname' field for NBD client to override local hostname
- machine types 'pc-i440fx-1.7' and older are now deprecated
- 'snapshot-access' block driver added
- The 'protocol' field of 'set_password' and 'expire_password'
parameter is now an enum instead of a pure string allowing 'vnc' and
'spice' as value and the arguments are also covered by the schema.
- 'copy-before-write' block driver now has a 'bitmap' property
- 'query-migrate' now reports 'precopy-bytes', 'downtime-bytes',
'postcopy-bytes' for 'ram' and 'disk' statistics
- RTC_CHANGE event now has a 'qom-path' property to identify the RTC
- 'umip' cpu feature is now migratable
- SGX property 'section-size' reinstated after regression
Changes in build setting:
- fuse block export support now enabled
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 9 Mar 2022 14:43:56 +0000 (15:43 +0100)]
virDomainSnapshotDefParse: Decouple parsing of memory snapshot config
Separate the steps of parsing the memory snapshot config from the
post-processing and validation code. The upcoming patch refactoring the
parsing will be simpler.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 8 Mar 2022 17:20:46 +0000 (18:20 +0100)]
conf: snapshot: Remove VIR_DOMAIN_SNAPSHOT_PARSE_DISKS flag
All callers except the one in the 'esx' driver pass the flag. The 'esx'
driver has a check that 'def->ndisks' is zero after parsing the
definition. This means that we can simply always parse the disks.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 8 Mar 2022 17:08:54 +0000 (18:08 +0100)]
qemuDomainSnapshotForEachQcow2Raw: Act only on internal snapshots
Similarly to the external snapshot code the internal inactive snapshot
creation helper should act only when an internal snapshot of the disk is
required. For now the callers ensure that it's either _INTERNAL or _NO
when control reaches this function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Fri, 4 Mar 2022 13:34:02 +0000 (14:34 +0100)]
qemuSnapshotDiskPrepareActiveExternal: Handle only external snapshots
Preparation steps ensure that the 'snapshot' field can only be
'VIR_DOMAIN_SNAPSHOT_LOCATION_NONE' or
VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL' at this point, but upcoming
patches will change that. Handle only external snapshots.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Haonan Wang [Thu, 10 Mar 2022 15:22:24 +0000 (23:22 +0800)]
virsh: Provide completer for vol-wipe algorithms
Related issue: https://gitlab.com/libvirt/libvirt/-/issues/9
Signed-off-by: Haonan Wang <hnwanga1@gmail.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The security label setting for the external images is part of the
'source' element and documented there. Remove the empty definition added
accidentally in commit ac88a8cfad1
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 9 Mar 2022 08:40:31 +0000 (09:40 +0100)]
docs: formatsnapshot: Remove explicit listing of supported snapshot formats
In blockdev mode we support creating snapshots on all kinds of storage
that qemu allows us to format the image. Drop the part of the sentence
enumerating explicitly supported protocols.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 9 Mar 2022 08:34:40 +0000 (09:34 +0100)]
docs: formatsnapshot: Move paragraphs describing 'disk' element together
There was another paragraph describing the attribute 'type' of the
'disk' element under the description of the subelements. Move it to the
top to get all relevant information in one place.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
When using thue 'run' script to launch a daemon, it is intended to
temporarily stop the systemd units and re-start them again after.
When using this script over an SSH connection, it will get SIGHUP
if the connection goes away, and in this case it fails to re-start
the systemd units. We need to catch SIGHUP and turn it into a
normal python exception. For good measure we do the same for
SIGQUIT and SIGTERM too. SIGINT already gets turned into an
exception by default which we handle.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently the 'run' script modifies $PATH to add the 'tools'
directly to pick up client programs. It fails to add the 'src'
directory to pick up the daemons.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
conf: remove misleading comments about access being 'lockless'
For the various structs storing lists of objects, the access
to the hash tables is not lockless. The mutex on the object
owning the hash table must be held.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We are not guaranteed that the string we are printing onto stdout
contains '\n' and thus that the stdout is flushed. In fact, I've
met this problem when virsh asked me whether I want to edit the
domain XML again (vshAskReedit()) but the prompt wasn't displayed
(as it does not contain a newline character) and virsh just sat
there waiting for my input, I sat there waiting for virsh's
output. Flush stdout after all fputs()-s which do not flush
stdout.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
A bit of effort by me and Michal helped make this the case, and it helped us
uncover some potential issues. I am not documenting it as supported or adding
an Alpine container into the CI, but since there were some distribution bugs
mentioning libvirt issues I thing it would be nice of us to notify those
distribution maintainers that read our release news.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Linux netfilter at some point (Linux 2.6.39) inverted the meaning of the
'--ctdir reply' and newer netfilter implementations now expect
'--ctdir original' instead and vice-versa.
We check for the kernel version and assume that all Linux kernels with version
2.6.39 have the newer inverted logic.
Any distro backporting the Linux kernel patch that inverts the --ctdir logic
(Linux commit 96120d86f) must also backport this patch for Linux and
adapt the kernel version being tested for.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Given our supported platform targets, we no longer need to
consider a version of Linux before 2.6.39, so can drop
support for the old direction behaviour.
The test suite updates are triggered because that never
probed for the ctdir direction, and so the iptables syntax
generator unconditionally dropped the ctdir args.
Reviewed-by: Laine Stump <laine@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Since iptables version 1.4.16 '-m state --state NEW' is converted to
'-m conntrack --ctstate NEW'. Therefore, when encountering this or later
versions of iptables use '-m conntrack --ctstate'.
Given our supported platform targets, we no longer need to
consider a version of iptables before 1.4.16, so can drop
support for the old syntax.
The test suite updates are triggered because that never
probed for the new syntax, and so unconditionally
generated the old syntax.
Reviewed-by: Laine Stump <laine@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Mon, 7 Mar 2022 15:02:55 +0000 (16:02 +0100)]
docs: Convert 'governance' page to rST
Extra care is taken to preserve the 'codeofconduct' anchor which is used
in our page template. Upcoming patch will change that but we'll retain
the anchor.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
libvirt-qemu: Don't allow NULL cmd in virDomainQemuMonitorCommandWithFiles()
Nothing in daemon code is prepared for the command in
virDomainQemuMonitorCommandWithFiles() to be NULL. In fact, the
client side doesn't expect this either as our RPC describes the
argument as:
remote_nonnull_string cmd;
Validate the argument in the public API implementation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>