Pavel Hrdina [Tue, 17 Jan 2023 09:33:22 +0000 (10:33 +0100)]
docs: document correct cpu shares limits with both cgroups v1 and v2
The limits differ between cgroups v1 and v2, but our XML
documentation and the virsh manpage mentioned only the cgroups v1 limits
without explicitly saying that they apply only to cgroups v1.
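To illustrate the difference, here is a minimal sketch. The helper and enum names are hypothetical and are not libvirt API. It validates a shares value against the kernel's documented ranges: [2, 262144] for the cgroups v1 `cpu.shares` file and [1, 10000] for the cgroups v2 `cpu.weight` file.

```c
#include <stdbool.h>

/* Hypothetical enum naming the two cgroup hierarchies a host can use. */
typedef enum {
    CGROUP_V1,
    CGROUP_V2,
} cgroup_version;

/* Kernel limits: cgroups v1 cpu.shares accepts [2, 262144],
 * cgroups v2 cpu.weight accepts [1, 10000]. */
#define CGROUP_V1_SHARES_MIN 2ULL
#define CGROUP_V1_SHARES_MAX 262144ULL
#define CGROUP_V2_WEIGHT_MIN 1ULL
#define CGROUP_V2_WEIGHT_MAX 10000ULL

/* Hypothetical helper: returns true if `shares` is within the range
 * accepted by the given cgroups version. The same number can be valid
 * on one host and invalid on another. */
static bool
cpu_shares_in_range(cgroup_version ver, unsigned long long shares)
{
    if (ver == CGROUP_V1)
        return shares >= CGROUP_V1_SHARES_MIN && shares <= CGROUP_V1_SHARES_MAX;
    return shares >= CGROUP_V2_WEIGHT_MIN && shares <= CGROUP_V2_WEIGHT_MAX;
}
```

For example, 20000 is accepted on a cgroups v1 host but rejected on a cgroups v2 host.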
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Pavel Hrdina [Tue, 17 Jan 2023 09:08:08 +0000 (10:08 +0100)]
domain_validate: drop cpu.shares cgroup check
This check is done when a VM is defined, but it doesn't take into account
which cgroups version is currently used on the host system, so it doesn't
work correctly.
To make a proper check at this point we would have to figure out the
cgroups version while defining a VM, but that still would not guarantee
that the VM will start correctly in the future, as the host may be
rebooted with a different cgroups version.
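The cgroups version is a property of the running host, so it can only be determined reliably at VM start. Below is a minimal sketch of such a runtime probe. It assumes the common layout where the cgroup hierarchy is mounted at /sys/fs/cgroup. The function name is hypothetical, and libvirt's own detection logic may differ. The probe uses statfs(2) and the cgroup2 filesystem magic number.

```c
#include <sys/vfs.h>

/* Magic number reported by statfs(2) for a cgroup2 mount
 * (CGROUP2_SUPER_MAGIC in <linux/magic.h>). */
#ifndef CGROUP2_SUPER_MAGIC
# define CGROUP2_SUPER_MAGIC 0x63677270
#endif

typedef enum {
    HOST_CGROUP_UNKNOWN = 0,
    HOST_CGROUP_V1,
    HOST_CGROUP_V2,
} host_cgroup_version;

/* Hypothetical probe: if /sys/fs/cgroup itself is a cgroup2 mount, the host
 * runs the unified (v2) hierarchy; otherwise assume v1 (or a hybrid setup).
 * The result must be computed when the VM starts, not when it is defined,
 * because a reboot can change it. */
static host_cgroup_version
detect_host_cgroup_version(void)
{
    struct statfs sb;

    if (statfs("/sys/fs/cgroup", &sb) < 0)
        return HOST_CGROUP_UNKNOWN;

    if (sb.f_type == CGROUP2_SUPER_MAGIC)
        return HOST_CGROUP_V2;

    return HOST_CGROUP_V1;
}
```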
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Note that with the introduction of SPDX, Fedora no longer wants
maintainers to do effective license analysis, hence we now list
all the licenses that are applicable to the binary package
contents.
Laine Stump [Fri, 13 Jan 2023 04:42:18 +0000 (23:42 -0500)]
tests: remove unused qemu .args file
net-user-passt.args was generated early during testing of the passt
qemu commandline, when qemuxml2argvtest was using
DO_TEST("net-user-passt"). This was later changed to
DO_TEST_CAPS_LATEST(), so the file net-user-passt.x86_64-latest.args
is used instead, but the original (now unused) test file was
accidentally added to the original patch. This patch removes it.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Laine Stump [Fri, 13 Jan 2023 04:42:16 +0000 (23:42 -0500)]
conf: remove <backend upstream='xxx'/> attribute
This attribute was added to support setting the --interface option for
passt, but in a post-push/pre-9.0-release review, danpb pointed out
that it would be better to use the existing <source dev='xxx'/>
attribute to set --interface rather than creating a new attribute (in
the wrong place). So we remove backend/upstream, and change the passt
commandline creation to grab the name for --interface from source/dev.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
qemuBuildThreadContextProps: Generate ThreadContext less frequently
Currently, the ThreadContext object is generated whenever we see the
.host-nodes attribute for a memory-backend-* object. The idea was
that when the backend is pinned to a specific set of host NUMA
nodes, then the allocation could be happening on CPUs from those
nodes too. But this may not always be possible.
Users might configure their guests in such a way that vCPUs and the
corresponding guest NUMA nodes are on different host NUMA nodes
than the emulator thread. In this case, ThreadContext won't work,
because ThreadContext objects live in the context of the emulator
thread (vCPU threads are moved around by us later, once the emulator
thread has finished its setup and spawned the vCPU threads; see
qemuProcessSetupVcpus()). Therefore, memory allocation is done by the
emulator thread, which is pinned to one subset of host NUMA nodes
but tries to create a ThreadContext object with a disjoint subset
of host NUMA nodes, which fails.
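The failure comes down to a set relationship between two nodesets. The sketch below is minimal and makes two assumptions. First, host NUMA nodesets are modelled as 64-bit masks, whereas libvirt itself uses virBitmap. Second, the predicate is hypothetical: it illustrates one plausible criterion, not necessarily the exact condition libvirt adopted. Under that criterion, a ThreadContext is only generated when the emulator thread is allowed to run on every host node of the backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: bit N set means host NUMA node N. */
typedef uint64_t nodemask;

/* Hypothetical predicate: a ThreadContext for a memory backend is created
 * in the emulator thread's context, so it can only be placed on host nodes
 * the emulator thread is allowed to run on. A zero emulator mask is taken
 * to mean "not pinned". */
static bool
want_thread_context(nodemask backend_host_nodes, nodemask emulator_nodes)
{
    if (backend_host_nodes == 0)
        return false;   /* no .host-nodes, nothing to pin */
    if (emulator_nodes == 0)
        return true;    /* emulator may run anywhere */
    /* Only if every backend node is reachable by the emulator thread. */
    return (backend_host_nodes & ~emulator_nodes) == 0;
}
```

Consider the disjoint case from the bug, with the emulator pinned to node 0 and the backend on node 1. The predicate returns false there, so no ThreadContext would be generated.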
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2154750
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
security_selinux: Set and restore /dev/sgx_* labels
For the SGX type of memory, QEMU needs to open and talk to the
/dev/sgx_vepc and /dev/sgx_provision files. But we neither set nor
restore SELinux labels on these files when starting a guest.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Erik Skultety [Thu, 12 Jan 2023 06:57:58 +0000 (07:57 +0100)]
ci: integration: Set an expiration on logs artifacts
The default expiry time is 30 days. Since the RPM artifacts coming from
the previous pipeline stages are set to expire in 1 day, we can set the
failed integration job log artifacts to the same value. The reasoning
here is that if an integration job legitimately failed (i.e. not due to
an infrastructure failure), then unless it is fixed in the meantime it
will fail again the next day with the scheduled pipeline, meaning that
even if the older log artifacts are removed, they'll be immediately
replaced with fresh ones.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Wed, 11 Jan 2023 08:54:59 +0000 (09:54 +0100)]
qemu: Don't check pidfile in qemuPasstStart
The pidfile is guaranteed to be non-NULL (thanks to glib allocation
functions) and it's dereferenced two lines above anyway.
Reported by Coverity:
/src/qemu/qemu_passt.c: 278 in qemuPasstStart()
272 return 0;
273
274 error:
275 ignore_value(virPidFileReadPathIfLocked(pidfile, &pid));
276 if (pid != -1)
277 virProcessKillPainfully(pid, true);
>>> CID 404360: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "pidfile" suggests that it may be null, but it
>>> has already been dereferenced on all paths leading to the check.
278 if (pidfile)
279 unlink(pidfile);
280
281 return -1;
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Peter Krempa [Tue, 10 Jan 2023 13:31:27 +0000 (14:31 +0100)]
qemu: Fix handling of passed FDs in remoteDispatchDomainFdAssociate
To ensure the same behaviour whether or not the remote driver is used,
we must not steal the FDs (and the array holding them) passed to
qemuDomainFDAssociate but rather duplicate them. At the same time the
remote driver must close and free them to prevent a leak.
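The ownership rule can be sketched in plain POSIX C. Everything here is hypothetical and is not libvirt's code: the struct, the function names, and the use of F_DUPFD_CLOEXEC. The driver-side function duplicates the FDs it is given. The caller (the remote dispatcher in this commit) remains responsible for closing and freeing its own copies, on success and on error alike.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-domain storage for associated FDs. */
typedef struct {
    int *fds;
    size_t nfds;
} fd_store;

/* Store duplicates of the caller's FDs (assumes `store` starts empty).
 * The caller keeps ownership of `fds` and must close them itself.
 * Returns 0 on success, -1 on error; nothing is stored or leaked on error. */
static int
fd_store_associate(fd_store *store, const int *fds, size_t nfds)
{
    int *copies = calloc(nfds ? nfds : 1, sizeof(*copies));
    size_t i;

    if (!copies)
        return -1;

    for (i = 0; i < nfds; i++) {
        /* close-on-exec copies so they don't leak into child processes */
        if ((copies[i] = fcntl(fds[i], F_DUPFD_CLOEXEC, 0)) < 0) {
            while (i-- > 0)
                close(copies[i]);
            free(copies);
            return -1;
        }
    }

    store->fds = copies;
    store->nfds = nfds;
    return 0;
}

/* Caller side: whatever the outcome, close and free what was received. */
static int
dispatch_fd_associate(fd_store *store, int *fds, size_t nfds)
{
    int rv = fd_store_associate(store, fds, nfds);
    size_t i;

    for (i = 0; i < nfds; i++)
        close(fds[i]);
    free(fds);
    return rv;
}
```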
Pointed out by Coverity as an FD leak on the error path:
*** CID 404348: Resource leaks (RESOURCE_LEAK)
/src/remote/remote_daemon_dispatch.c: 7484 in remoteDispatchDomainFdAssociate()
7478 rv = 0;
7479
7480 cleanup:
7481 if (rv < 0)
7482 virNetMessageSaveError(rerr);
7483 virObjectUnref(dom);
>>> CID 404348: Resource leaks (RESOURCE_LEAK)
>>> Variable "fds" going out of scope leaks the storage it points to.
7484 return rv;
Fixes: abd9025c2fd
Fixes: f762f87534e
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
spec: Restart libvirtd on upgrade without socket activation
The %posttrans phase has a special case for upgrading the libvirt daemon
with --listen, but it forgot to also restart the daemon in order to
run the newly installed version.
remote: fix double free of migration params on error
The remote_*_args methods will generally borrow pointers
passed in by the caller, so they must not be freed.
On failure of the virTypedParamsSerialize method, however,
xdr_free was being called. This is presumably because it
was thought that the params may have been partially
serialized and needed cleaning up. This is incorrect, as
virTypedParamsSerialize takes care to clean up partially
serialized data. This xdr_free call would lead to freeing
the borrowed cookie pointers, which would be a double free.
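The ownership contract can be shown in a minimal sketch. Hypothetical types stand in for the XDR-generated ones, and none of these names are libvirt's. The args struct only borrows the caller's cookie. The serializer cleans up its own partial output on failure. The error path therefore frees nothing, because freeing the whole struct would also free the borrowed cookie.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a remote_*_args struct: `cookie` is borrowed
 * from the caller; `params` is owned by the struct once serialized. */
typedef struct {
    const char *cookie;      /* borrowed, never freed here */
    char **params;           /* owned, filled by serialize_params() */
    size_t nparams;
} migrate_args;

/* Hypothetical serializer: on failure it frees whatever it had already
 * produced and leaves `args->params` untouched, mirroring how
 * virTypedParamsSerialize cleans up partially serialized data. */
static int
serialize_params(migrate_args *args, const char *const *src, size_t n)
{
    char **out = calloc(n ? n : 1, sizeof(*out));
    size_t i;

    if (!out)
        return -1;
    for (i = 0; i < n; i++) {
        if (!src[i] || !(out[i] = strdup(src[i]))) {
            while (i-- > 0)
                free(out[i]);
            free(out);
            return -1;
        }
    }
    args->params = out;
    args->nparams = n;
    return 0;
}

static int
migrate_begin(const char *cookie, const char *const *params, size_t n)
{
    migrate_args args = { .cookie = cookie };
    size_t i;

    if (serialize_params(&args, params, n) < 0) {
        /* Nothing to free: the serializer cleaned up after itself and
         * args.cookie is borrowed. Freeing the whole struct here (as the
         * xdr_free call did) would free the cookie: a double free. */
        return -1;
    }

    /* ... send the call, then release only what args owns ... */
    for (i = 0; i < args.nparams; i++)
        free(args.params[i]);
    free(args.params);
    return 0;
}
```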
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Laine Stump [Fri, 6 Jan 2023 23:36:44 +0000 (18:36 -0500)]
specfile: require passt for the build if fedora >= 36 or rhel >= 9
The only reason we need it at build time is to find its location in
$PATH so it can be hardcoded into the libvirt binary (and avoid the
possibility of someone adding in a malicious binary somewhere earlier
in the path, I guess).
Only 'recommend' passt during installation though, since it is not
needed unless someone is actually using it.
There is no need to add in a build-time "WITH_PASST" option (IMO),
since it adds very little to the size of the code. If passt isn't found
at build time, "PASST" (the path to the binary) will just be set to
"passt", so if someone does manage to build and install passt on an
older version of Fedora or RHEL, it will still work (as long as it's
installed somewhere in the path).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Thu, 15 Dec 2022 19:19:16 +0000 (14:19 -0500)]
qemu: hook up passt config to qemu domains
This consists of (1) adding the necessary args to the qemu commandline
netdev option, and (2) starting a passt process prior to starting
qemu, and making sure that it is terminated when it's no longer
needed. Under normal circumstances, passt will terminate itself as
soon as qemu closes its socket, but in case of some error where qemu
is never started, or fails to start up completely, we need to terminate
passt manually.
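The lifecycle rule can be sketched with plain fork/exec. This is not libvirt's qemuPasstStart(). The helper command, the `qemu_ok` flag that simulates a failed QEMU startup, and the function names are all hypothetical. On the normal path the helper is left running, since passt exits by itself once QEMU closes its socket. On the error path it is killed and reaped.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: spawn a helper process standing in for passt. */
static pid_t
start_helper(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);   /* exec failed */
    }
    return pid;       /* -1 if fork failed */
}

/* Start the helper before "QEMU"; if QEMU never comes up, terminate the
 * helper ourselves. Returns the helper's PID on success, -1 on failure. */
static pid_t
start_with_helper(char *const helper_argv[], bool qemu_ok)
{
    pid_t helper = start_helper(helper_argv);

    if (helper < 0)
        return -1;

    if (!qemu_ok) {
        /* QEMU failed to start up: nobody will close the helper's socket,
         * so it would never exit on its own. */
        kill(helper, SIGTERM);
        waitpid(helper, NULL, 0);
        return -1;
    }

    return helper;
}
```

In a quick test, `sleep 30` can stand in for the helper.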
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Fri, 11 Nov 2022 19:43:45 +0000 (14:43 -0500)]
conf: parse/format passt-related XML additions
This implements XML config to represent a subset of the features
supported by 'passt' (https://passt.top), which is an alternative
backend for emulated network devices that requires no elevated
privileges (similar to slirp, but "better").
Along with setting the backend to use passt (via <backend
type='passt'/> when the interface type='user'), we also support
passt's --log-file and --interface options (via the logFile and upstream
attributes of the <backend> subelement) and its --tcp-ports and
--udp-ports options (which selectively forward incoming connections to
the host on to the guest) via the new <portForward> subelement of
<interface>. Here is an example of the config for a network interface
that uses passt to connect:
Laine Stump [Fri, 11 Nov 2022 20:24:57 +0000 (15:24 -0500)]
conf: add passt XML additions to schema
Initial support for network devices using passt (https://passt.top)
for the backend connection will require:
* new attributes of the <backend> subelement:
  * "type" that can have the value "passt" (to differentiate from
    slirp, because both slirp and passt will use <interface
    type='user'>)
  * "logFile" (a path to a file that passt should use for its logging)
  * "upstream" (a netdev name, e.g. "eth0")
* a new subelement <portForward> (described in more detail later)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We assume that FD-passed images already exist, so all existence checks
are skipped.
For the case that an FD-passed image is passed without a terminated
backing chain (thus forcing us to detect it), we attempt to read the
header from the FD.
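Reading the header from an FD rather than from a path can be done with pread(2). This call leaves the file offset untouched for whoever consumes the FD later. The minimal sketch below only checks for the qcow2 magic bytes ('Q', 'F', 'I', 0xfb). Real format probing in libvirt inspects much more, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical probe on an already-open FD: no path, no existence check. */
static bool
fd_looks_like_qcow2(int fd)
{
    static const unsigned char magic[4] = { 'Q', 'F', 'I', 0xfb };
    unsigned char buf[4];
    int fl = fcntl(fd, F_GETFL);

    /* We need to read the header, so reject invalid or write-only FDs. */
    if (fl < 0 || (fl & O_ACCMODE) == O_WRONLY)
        return false;

    /* pread() does not move the FD's file offset; it fails on pipes. */
    if (pread(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf))
        return false;

    return memcmp(buf, magic, sizeof(magic)) == 0;
}
```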
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Wed, 4 May 2022 13:00:18 +0000 (15:00 +0200)]
qemu: Prepare data for FD-passed disk image sources
When starting up a VM with FD-passed images we need to look up the
corresponding named FD set and associate it with the virStorageSource
based on the name.
The association is brought into virStorageSource because the security
labelling code will need to access the FD to perform SELinux labelling.
Similarly, once startup is complete, in certain cases we no longer need
to keep the copies of the FDs and can thus close them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 5 Jan 2023 14:07:38 +0000 (15:07 +0100)]
qemu: domain: Introduce qemuDomainStartupCleanup
The new helper qemuDomainStartupCleanup is used to perform cleanup after
a startup of a VM (successful or not). The initial implementation just
calls qemuDomainSecretDestroy, which can be un-exported.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Mon, 2 May 2022 16:51:45 +0000 (18:51 +0200)]
conf: Add 'fdgroup' attribute for 'file' disks
The 'fdgroup' attribute will allow users to specify a passed FD (via the
'virDomainFDAssociate()' API) to be used instead of opening a path.
This is useful in cases when e.g. the file is not accessible from inside
a container.
Since this uses the same disk type as when we open files via names, this
patch also introduces a hypervisor feature by which the hypervisor asserts
that its code paths are ready for this possibility.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 24 Mar 2022 14:50:27 +0000 (15:50 +0100)]
qemu: Implement qemuDomainFDAssociate
Implement passing and storage of FDs for the qemu driver. The FD tuples
are g_object instances stored in a per-domain hash table and are
automatically removed once the connection is closed.
In the future we can consider also supporting passed FDs whose lifetime
is not bound to the connection.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Wed, 4 Jan 2023 14:25:21 +0000 (15:25 +0100)]
conf: storage_source: Introduce type for storing FDs associated for storage
For FD-passing of disk sources we'll need to keep the FDs around.
Introduce a data type helper based on a g_object so that we get
reference counting.
One instance will (due to security labelling) need to be part of
the virStorageSource struct, thus it's declared in the storage_source_conf
module.
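libvirt gets reference counting here from GObject. As a dependency-free illustration of the same idea, here is a minimal sketch with a hand-rolled reference count. The type and function names are hypothetical and are not libvirt's API. The FDs are closed only when the last holder drops its reference; such a holder could be the per-domain table or a virStorageSource.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical refcounted holder for a group of passed FDs. */
typedef struct {
    unsigned int refs;
    int *fds;
    size_t nfds;
} fd_tuple;

/* Takes ownership of `fds` (a heap array). Initial reference count is 1. */
static fd_tuple *
fd_tuple_new(int *fds, size_t nfds)
{
    fd_tuple *t = malloc(sizeof(*t));

    if (!t)
        return NULL;
    t->refs = 1;
    t->fds = fds;
    t->nfds = nfds;
    return t;
}

static fd_tuple *
fd_tuple_ref(fd_tuple *t)
{
    t->refs++;
    return t;
}

/* Drop one reference; the last one closes the FDs and frees everything. */
static void
fd_tuple_unref(fd_tuple *t)
{
    size_t i;

    if (!t || --t->refs > 0)
        return;
    for (i = 0; i < t->nfds; i++)
        close(t->fds[i]);
    free(t->fds);
    free(t);
}
```

Unlike g_object_ref()/g_object_unref(), this counter is not atomic, so it is only safe when used from a single thread.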
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 24 Feb 2022 16:01:40 +0000 (17:01 +0100)]
lib: Introduce virDomainFDAssociate API
The API can be used to associate one or more FDs (e.g. a RO and a RW fd
for a disk backend image) with a VM. They can then be used as referenced
by the domain definition.
The primary use case for now is for complex deployments where
libvirtd/virtqemud may be run inside a container and getting the image
into the container is complicated.
In the future it will also allow passing e.g. vhost FDs and other
resources to a VM without the need to have a filesystem representation
for them.
Passing raw FDs has a few intricacies, and thus libvirt will by default
not restore security labels.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Mon, 9 Jan 2023 13:35:52 +0000 (14:35 +0100)]
qemu: Fix variable sizing issues with 'bandwidth' argument of qemuBlockCommit
The patch moving the code didn't faithfully represent the typecasting
of the 'bandwidth' variable needed to properly convert from the legacy
'unsigned long' argument which resulted in a build failure on 32 bit
systems:
../src/qemu/qemu_block.c: In function ‘qemuBlockCommit’:
../src/qemu/qemu_block.c:3249:23: error: comparison is always false due to limited range of data type [-Werror=type-limits]
3249 | if (bandwidth > LLONG_MAX >> 20) {
| ^
Fix it by moving the check back into qemuDomainBlockCommit, as it's
needed only because of the legacy argument type in the old API, and use
'unsigned long long' for qemuBlockCommit.
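The conversion can be sketched as follows. The function name is hypothetical, and `bandwidth` is assumed to be in MiB/s as in the legacy API. The legacy 'unsigned long' value is widened to 'unsigned long long' before the range check, so the comparison is well-typed on both 32-bit and 64-bit hosts. The result in bytes/s is what the internal helper would then take.

```c
#include <limits.h>

/* Hypothetical legacy entry point: `bandwidth` is in MiB/s and has the old
 * API's 'unsigned long' type. Returns 0 and stores bytes/s in *bytes, or -1
 * if the value would overflow. */
static int
legacy_bandwidth_to_bytes(unsigned long bandwidth, unsigned long long *bytes)
{
    unsigned long long speed = bandwidth;   /* widen first */

    /* Shifting left by 20 (MiB -> bytes) must not exceed LLONG_MAX. On
     * 32-bit hosts this can never trigger, but because 'speed' is 64-bit
     * the comparison does not trip -Wtype-limits. */
    if (speed > LLONG_MAX >> 20)
        return -1;

    *bytes = speed << 20;
    return 0;
}
```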
Fixes: f5a77198bf9
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Peter Krempa [Mon, 9 Jan 2023 13:01:59 +0000 (14:01 +0100)]
qemu: snapshot: Restructure control flow to detect errors sooner and work around compiler
Some compilers aren't happy when an automatically freed variable is used
just to free something (thus it's only assigned in the code).
When compiling qemuSnapshotDelete after recent commits, they complain:
../src/qemu/qemu_snapshot.c:3153:61: error: variable 'delData' set but not used [-Werror,-Wunused-but-set-variable]
g_autoslist(qemuSnapshotDeleteExternalData) delData = NULL;
^
To work around the issue we can restructure the code, which also has the
following semantic implications:
- since qemuSnapshotDeleteExternalPrepare does validation, we error out
sooner, before attempting to start the VM
- we read the temporary variable in at least one code path
Fixes: 4a4d89a9252
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Use a temporary variable to avoid memory alignment issues on ARM:
../src/nwfilter/nwfilter_dhcpsnoop.c: In function ‘virNWFilterSnoopLeaseFileLoad’:
../src/nwfilter/nwfilter_dhcpsnoop.c:1745:20: error: cast increases required alignment of target type [-Werror=cast-align]
1745 | (unsigned long long *) &ipl.timeout,
|
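The pattern behind the fix is shown in the minimal sketch below. The lease struct, its field layout, and the line format are hypothetical and do not reproduce the actual nwfilter code. Casting the address of a field to `unsigned long long *` may require stricter alignment than the field has. The sketch instead parses into a properly typed local and then assigns it to the field.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical lease record: 'timeout' is not declared as
 * unsigned long long, so &lease.timeout must not be cast to one. */
struct lease {
    char ifname[16];
    time_t timeout;
};

static int
parse_lease_line(const char *line, struct lease *lease)
{
    unsigned long long timeout;   /* correctly typed and aligned temporary */

    if (sscanf(line, "%llu %15s", &timeout, lease->ifname) != 2)
        return -1;

    lease->timeout = (time_t) timeout;
    return 0;
}
```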
Fixes: 0d278aa089bf3a00bf2d6e56d2f01ea4677190a7
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Pavel Hrdina [Thu, 1 Dec 2022 14:38:59 +0000 (15:38 +0100)]
qemu_process: abort snapshot delete when daemon starts
If the daemon crashes or is restarted while the snapshot delete is in
progress, we have to handle it gracefully so as not to leave any block
jobs active.
For now we will simply abort the snapshot delete operation so the user
can start it again. We need to refuse deleting external snapshots if
there is already another active job, as we would have to figure out
which jobs we can abort.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Fri, 6 Jan 2023 17:08:45 +0000 (18:08 +0100)]
qemu_domain: store snapshotDelete in qemuDomainJobPrivate
When the daemon is restarted and libvirt tries to recover domain jobs,
we need to know if the snapshot job was a snapshot delete in order to
safely abort running QEMU block jobs.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Mon, 5 Dec 2022 12:03:32 +0000 (13:03 +0100)]
qemu_snapshot: when deleting snapshot invalidate parent snapshot
When deleting external snapshots, the operation may fail at any point,
which could lead to a situation where some disks finished the block
commit operation but for others it failed and the libvirt job ended.
In order to make sure that the qcow2 images are in a consistent state,
introduce a new element "<snapshotDeleteInProgress/>" that will mark the
disk in snapshot metadata as invalid until the snapshot delete is
completed successfully.
This will prevent deleting a snapshot with the invalid disk and, in the
future, reverting to a snapshot with the invalid disk.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Wed, 7 Dec 2022 13:17:24 +0000 (14:17 +0100)]
qemu_snapshot: update metadata when deleting snapshots
With external snapshots we need to modify the metadata a bit more than
what is required for internal snapshots. Mainly, the storage source
location changes with every external snapshot.
This means that if we delete a non-leaf snapshot, we need to update all
children snapshots and modify the disk sources for all affected disks.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Thu, 15 Dec 2022 14:50:47 +0000 (15:50 +0100)]
qemu_snapshot: implement deletion of external snapshot
When deleting a snapshot we start a block-commit job over all disks
that are part of the snapshot.
This operation may fail as it writes data changes to the backing qcow2
image, so we need to wait for all the disks to finish the operation and
wait for the correct signal from QEMU. If deleting the active snapshot we
will get the `ready` signal, and for inactive snapshots we need to
disable autofinalize in order to get the `pending` signal.
At this point, if the commit for any disk fails for some reason and we
abort, the VM is still in a consistent state and the user can fix the
reason why the deletion failed.
After that we do `pivot` or `finalize`, depending on whether or not it's
the active snapshot, to finish the block job. It still may fail but
there is nothing else we can do about it.
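The synchronization described above can be modelled as a small decision over per-disk job states. The sketch below is minimal: the enum and function names are hypothetical, and QEMU job states are reduced to the ones mentioned here. Every commit job must reach `ready` (active snapshot) or `pending` (inactive snapshot with autofinalize disabled). If any job failed, all of them are aborted; otherwise each one is pivoted or finalized.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a per-disk block-commit job. */
typedef enum {
    JOB_RUNNING,
    JOB_READY,      /* active-layer commit is ready to pivot */
    JOB_PENDING,    /* autofinalize disabled, waiting for job-finalize */
    JOB_FAILED,
} job_state;

typedef enum {
    ACTION_WAIT,         /* keep waiting for QEMU events */
    ACTION_ABORT_ALL,    /* a disk failed: cancel every job */
    ACTION_PIVOT_ALL,    /* active snapshot: pivot each job */
    ACTION_FINALIZE_ALL, /* inactive snapshot: finalize each job */
} delete_action;

static delete_action
next_action(const job_state *jobs, size_t njobs, bool active_snapshot)
{
    job_state want = active_snapshot ? JOB_READY : JOB_PENDING;
    bool all_done = true;
    size_t i;

    for (i = 0; i < njobs; i++) {
        if (jobs[i] == JOB_FAILED)
            return ACTION_ABORT_ALL;   /* nothing committed for good yet */
        if (jobs[i] != want)
            all_done = false;
    }

    if (!all_done)
        return ACTION_WAIT;
    return active_snapshot ? ACTION_PIVOT_ALL : ACTION_FINALIZE_ALL;
}
```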
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Mon, 5 Dec 2022 18:15:50 +0000 (19:15 +0100)]
qemu_snapshot: prepare data for external snapshot deletion
In order to save some CPU cycles we will collect all the data necessary
to delete an external snapshot before we even start. It will later be
used by the code that deletes the snapshots and updates metadata when
needed.
With external snapshots we need data that libvirt gets from the running
QEMU process, so if the VM is not running we need to start a paused QEMU
process for the snapshot deletion and kill it afterwards.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
and the user calls `snapshot-delete snap1 --children-only`, the current
snapshot is external but all the children snapshots are internal only,
and we are able to delete them.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Previously the reparent happened before the actual snapshot deletion.
This change moves the code closer to the rest of the code handling
snapshot metadata when deletion happens. This makes the metadata
deletion happen after the data files are deleted.
The following patch will extract it into a separate function.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Thu, 15 Dec 2022 14:40:45 +0000 (15:40 +0100)]
qemu_snapshot: rework snapshot children deletion
This simplifies the code a bit by reusing existing parts that delete
a single snapshot.
The drawback of this change is that we will now call the re-parent bits
to keep the metadata in sync for every child even though it will get
deleted as well.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Wed, 22 Jun 2022 10:13:45 +0000 (12:13 +0200)]
qemu_blockjob: process QEMU_MONITOR_JOB_STATUS_PENDING signal
QEMU emits this signal when the job has finished its work and is about
to be finalized. If the job is started with autofinalize disabled, the
job waits for user input to finalize it.
This will be used by the snapshot delete code.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Tue, 13 Dec 2022 15:40:02 +0000 (16:40 +0100)]
qemu_monitor_json: allow configuring autofinalize for block commit
Deleting external snapshots will require configuring autofinalize to
synchronize the block jobs for disks within a single snapshot in order
to be able to safely abort if one of the jobs fails.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Tue, 21 Jun 2022 13:22:15 +0000 (15:22 +0200)]
qemu_monitor: introduce qemuMonitorJobFinalize
Upcoming snapshot deletion code will require that multiple commit jobs
are finished in sync. To allow aborting them if one fails, we will need
to use manual finalization of the jobs.
This commit implements the monitor code for `job-finalize`.
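For reference, the QMP `job-finalize` command takes the job ID as its only argument. The minimal sketch below formats that command as a string. libvirt builds monitor commands with its own JSON helpers, so the snprintf approach and the function name are purely illustrative, and the job ID is assumed not to need JSON escaping.

```c
#include <stdio.h>

/* Hypothetical formatter; returns the number of bytes written (excluding
 * the terminating NUL), or -1 if the buffer is too small. Assumes
 * `job_id` contains no characters that need JSON escaping. */
static int
format_job_finalize(char *buf, size_t len, const char *job_id)
{
    int n = snprintf(buf, len,
                     "{\"execute\": \"job-finalize\", "
                     "\"arguments\": {\"id\": \"%s\"}}",
                     job_id);

    if (n < 0 || (size_t) n >= len)
        return -1;
    return n;
}
```

With the job ID `commit-vda`, this produces `{"execute": "job-finalize", "arguments": {"id": "commit-vda"}}`.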
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pavel Hrdina [Mon, 5 Dec 2022 12:11:19 +0000 (13:11 +0100)]
qemu_block: add async domain job support to qemuBlockCommit
This will allow using it while an async domain job is active, which we
will need when deleting external snapshots. At the same time we will
need to have the block job started as synchronous.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Up until commit 629282d88454, using mode=restrictive caused
virNumaSetupMemoryPolicy() to be called from qemuProcessHook(),
and that in turn resulted in virNumaNodesetIsAvailable() being
called and the nodeset being validated.
After that change, the only validation for the nodeset is the one
happening in qemuBuildMemoryBackendProps(), which is skipped when
using mode=restrictive.
Make sure virNumaNodesetIsAvailable() is called whenever a
nodeset has been provided by the user, regardless of the mode.
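The intended behaviour is that validation no longer depends on the mode. The minimal sketch below models nodesets as 64-bit masks and uses hypothetical names throughout; the real check is virNumaNodesetIsAvailable().

```c
#include <stdint.h>

/* Hypothetical modes mirroring <numatune><memory mode='...'/>. */
typedef enum {
    NUMA_MODE_STRICT,
    NUMA_MODE_PREFERRED,
    NUMA_MODE_INTERLEAVE,
    NUMA_MODE_RESTRICTIVE,
} numa_mode;

/* Bit N set means host NUMA node N. */
typedef uint64_t nodemask;

/* Hypothetical validation: reject any requested node the host does not
 * have, whatever the mode. Returns 0 if valid, -1 otherwise. */
static int
validate_numatune(numa_mode mode, nodemask requested, nodemask host_nodes)
{
    /* Deliberately independent of the mode: restrictive must be checked
     * just like strict. */
    (void) mode;

    if ((requested & ~host_nodes) != 0)
        return -1;   /* nodeset references host nodes that don't exist */
    return 0;
}
```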
tests: Add cases for numatune with unavailable nodes
The one for mode=strict fails, as expected, while the one for
mode=restrictive currently doesn't even though it should. The
next commit will address the issue.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>