Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Refactor qemuDomainObjSetJobPhase
We will want to update migration phase without affecting job ownership.
Either in the thread that already owns the job or from an event handler
which only changes the phase (of a job no-one owns) without assuming it.
Let's move the ownership change to a new qemuDomainObjStartJobPhase
helper and let qemuDomainObjSetJobPhase set the phase without touching
ownership.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Add new migration phases for post-copy recovery
When recovering from a failed post-copy migration, we need to go through
all migration phases again, but don't need to repeat all the steps in
each phase. Let's create a new set of migration phases dedicated to
post-copy recovery so that we can easily distinguish between normal and
recovery code.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Improve post-copy migration handling on reconnect
When libvirt daemon is restarted during an active post-copy migration,
we do not always mark the migration as broken. In this phase libvirt is
not really needed for migration to finish successfully. In fact the
migration could have even finished while libvirt was not running or it
may still be happily running.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Ignore missing memory statistics in query-migrate
We want to use query-migrate QMP command to check the current migration
state when reconnecting to active domains, but the reply we get to this
command may not contain any statistics at all if called on the
destination host.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Finish completed unattended migration
So far migration could only be completed while a migration API was
running and waiting for the migration to finish. In case such API could
not be called (the connection that initiated the migration is broken)
the migration would just be aborted or left in a "don't know what to do"
state. But this will change soon and we will be able to successfully
complete such migration once we get the corresponding event from QEMU.
This is specific to post-copy migration when vCPUs are already running
on the destination and we're only waiting for all memory pages to be
transferred. Such post-copy migration (which no-one is actively
watching) is called unattended migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Make sure migrationPort is released even in callbacks
Normally migrationPort is released in the Finish phase, but we need to
make sure it is properly released also in case qemuMigrationDstFinish is
not called at all. Currently the only callback which is called in this
situation qemuMigrationDstPrepareCleanup which already releases
migrationPort. This patch adds similar handling to additional callbacks
which will be used in the future.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Handle migration job in qemuMigrationDstFinish
The function which started a migration phase should also finish it by
calling qemuMigrationJobFinish/qemuMigrationJobContinue so that the code
is easier to follow.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Introduce qemuMigrationDstFinishActive
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
Jiri Denemark [Tue, 17 May 2022 12:19:12 +0000 (14:19 +0200)]
qemu: Introduce qemuMigrationDstFinishOffline
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Introduce qemuMigrationDstFinishFresh
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Preserve error in qemuMigrationDstFinish
We want to prevent our error path that can potentially kill the domain
on the destination host from overwriting an error reported earlier, but
we were only doing so in one specific path when starting vCPUs fails.
Let's do it in all paths.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Separate success and failure path in qemuMigrationDstFinish
Most of the code in "endjob" label is executed only on failure. Let's
duplicate the rest so that the label can be used only in error path
making the success path easier to follow and refactor.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Handle 'postcopy-paused' migration state
When connection breaks during post-copy migration, QEMU enters
'postcopy-paused' state. We need to handle this state and make the
situation visible to upper layers.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Fetch paused migration stats
Even though a migration is paused, we still want to see the amount of
data transferred so far and that the migration is indeed not progressing
any further.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Don't wait for migration job when migration is running
Migration is a job which takes some time and if it succeeds, there's
nothing to call another migration on. If a migration fails, it might
make sense to rerun it with different arguments, but this would only be
done once the first migration fails rather than while it is still
running.
If this was not enough, the migration job now stays active even if
post-copy migration fails and anyone possibly retrying the migration
would be waiting for the job timeout just to get a suboptimal error
message.
So let's special case getting a migration job when another one is
already active.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Fri, 13 May 2022 12:15:00 +0000 (14:15 +0200)]
qemu: Restore async job start timestamp on reconnect
Jobs that are supposed to remain active even when libvirt daemon
restarts were reported as started at the time the daemon was restarted.
This is not very helpful, we should restore the original timestamp.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Abort failed post-copy when we haven't called Finish yet
When migration fails after it already switched to post-copy phase on the
source, but early enough that we haven't called Finish on the
destination yet, we know the vCPUs were not started on the destination
and the source host still has a complete state of the domain. Thus we
can just ignore the fact post-copy phase started and normally abort the
migration and resume vCPUs on the source.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Keep migration job active after failed post-copy
When post-copy migration fails, we can't just abort the migration and
resume the domain on the source host as it is already running on the
destination host and no host has a complete state of the domain memory.
Instead of the current approach of just marking the domain on both ends
as paused/running with a post-copy failed sub state, we will keep the
migration job active (even though the migration API will return failure)
so that the state is more visible and we can better control what APIs
can be called on the domains and even allow for resuming the migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Add qemuDomainObjRestoreAsyncJob
The code for setting up a previously active backup job in
qemuProcessRecoverJob is generalized into a dedicated function so that
it can be later reused in other places.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Rename qemuDomainObjRestoreJob as qemuDomainObjPreserveJob
It is used for saving job out of domain object. Just like
virErrorPreserveLast is used for errors. Let's make the naming
consistent as Restore would suggest different semantics.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Explicitly emit events on post-copy failure
The events would normally be triggered only if we're changing domain
state. But most of the time the domain is already in the right state and
we're just changing its substate from {PAUSED,RUNNING}_POSTCOPY to
*_POSTCOPY_FAILED. Let's emit lifecycle events explicitly when post-copy
migration fails to make the failure visible without polling.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Keep domain running on dst on failed post-copy migration
There's no need to artificially pause a domain when post-copy fails
from our point of view unless QEMU connection is broken too as migration
may still be progressing well.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
Introduce VIR_DOMAIN_RUNNING_POSTCOPY_FAILED
This new "post-copy failed" reason for the running state will be used on
the destination host when post-copy migration fails while the domain is
already running there.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
- Icelake-Client cpu model family removed:
"Icelake-Client-noTSX-x86_64-cpu"
"Icelake-Client-v1-x86_64-cpu"
"Icelake-Client-v2-x86_64-cpu"
"Icelake-Client-v3-x86_64-cpu"
"Icelake-Client-x86_64-cpu"
- 'zero-copy-send' migration feature added
- display 'sdl' qapified
- 'arch-lbr' cpu feature added
- new HyperV enlightenments:
'hv-tlbflush-ext'
'hv-tlbflush-direct'
'hv-emsr-bitmap'
'hv-xmm-input'
- 'none-machine' has two new properties:
- "boot" described as "Boot configuration"
- "memory" described as "Memory size configuration"
- 'igd-passthrough-isa-bridge' is now Xen-only
- CXL: Compute eXpress Link related devices:
"CXL"
"cxl-rp",
"cxl-type3",
"pxb-cxl",
"pxb-cxl-bus",
"pxb-cxl-host",
- 'dma-translation' feature of 'intel-iommu'
- 'vmcb-clean' cpu feature now migratable:
- possibly due to host kernel upgrade
- changes commandline generated for the 'cpu-host-model' case of
qemuxml2argvtest
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 6 Jun 2022 08:10:44 +0000 (10:10 +0200)]
qemu: Fix crash in qemuBuildDeviceCommandlineHandleOverrides
'STREQ' is used to compare the override alias with the device alias.
While the parser ensures that the override alias is non-NULL, the device
alias may be NULL and STREQ doesn't handle that.
Fixes: 38ab5c9ead5
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/321 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Fri, 3 Jun 2022 13:49:01 +0000 (15:49 +0200)]
qemu: fd: Fix monitor usage of qemuFDPassDirectGetPath
We need to use the 'name' variable and just overwrite it with the FD
number when FDs are passed on the monitor. Otherwise we will read NULL
path if the FD is accessed before being passed on the monitor. The idea
of this helper is to simplify the monitor code so it would be
counterproductive to have other behaviour.
Fixes the following symptom:
$ virsh attach-interface cd network default --model virtio
error: Failed to attach interface
error: internal error: unable to execute QEMU command 'netdev_add': File descriptor named '(null)' has not been found
Fixes: bca9047906fd73fd30f275dd45b64998fbbcf6de
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/318
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2092856 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemu: Restore label to temp file in qemuDomainScreenshot()
Obtaining a screenshot via virDomainScreenshot() works like this:
1) we create a temp file, label it, then
2) tell QEMU to store the screenshot into it, and
3) finally, open the file for transfer via virStream
Since the file is just temporary and even explicitly unlinked at
the end, no seclabel restoration is done. This makes perfect
sense for security models which attach a label to file itself
(DAC, SELinux) because the label is gone with the file. However,
for models where a list of files and allowed actions is kept on a
side (AppArmor) this approach means we just append files into the
profile and never remove them. In turn, the file grows and policy
update takes longer with each entry.
Restore the seclabel for AppArmor's sake.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Michal Privoznik [Sun, 23 Jan 2022 11:10:35 +0000 (12:10 +0100)]
virStorageSourceGetActualType: Change type of retval
The virStorageSourceGetActualType() function returns either
virStorageSource->type (which is of type virStorageType), or
virStorageSourcePoolDef->type, which really stores a value of the
same enum. Thus, the latter struct can be changed so that the
virStorageSourceGetActualType() function can return correct type
instead of generic int.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Michal Privoznik [Sun, 23 Jan 2022 11:04:44 +0000 (12:04 +0100)]
Drop needless typecast to virStorageType enum
There are three places (two in domain_conf.c and one in
qemu_migration.c) where a virStorageSource->type is typecasted to
virStorageType (for the purpose of catching missing enum member
in a switch() statement at compile time). This is needless,
because as of v8.2.0-rc1~120 the struct member is of proper type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Peter Krempa [Mon, 23 May 2022 11:48:31 +0000 (13:48 +0200)]
docs: domain: Remove extraneous quotes
Certain documentation bits tried to put a reference of a value into
quotes, but that's not needed for both the pure view of the rST source
and the rendered output.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'virNetMessageEncodePayloadRaw' is not supposed to be called with 'NULL'
data, but the code path from 'virNetClientStreamSendPacket' does so.
Now 'virNetMessageEncodePayloadEmpty' is intended for such case, but
since it's just a sub-set of steps from 'virNetMessageEncodePayloadRaw'
it's more straightforward to add NULL-tolerance to 'virNetMessageEncodePayloadRaw'
and subsequently remove 'virNetMessageEncodePayloadEmpty'.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/308 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Wed, 4 May 2022 15:23:56 +0000 (17:23 +0200)]
glibcompat: Provide proper override for 'g_hash_table_steal_extended'
We've emulated the function in virHashSteal, with a note pointing to use
the proper version. Move the code to glibcomapt.c and make it such that
builds using newer glib already use the new function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 31 May 2022 13:15:57 +0000 (15:15 +0200)]
docs: Add HTML reference checker
In many cases we move around or rename internal anchors which may break
links leading to the content.
docutils handle the case of links inside a document, but we are lacking
the same form of checking between documents.
Introduce a script which cross-checks all the anchors and links in HTML
output files and prints problems and use it as a test case for the
'docs' directory.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 31 May 2022 13:07:33 +0000 (15:07 +0200)]
docs: rpc: Fix broken headings
Remove what seems like links from some headings. This error predates the
conversion to RST where an '<a href' was used instead of '<a id' in the
source document.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Mon, 30 May 2022 10:56:39 +0000 (12:56 +0200)]
docs: formatdomain: Remove the 'anchor' role
The role was used to pass through raw HTML to define custom anchor
names. Since all of the document was now converted to use the anchors
generated from headers we can remove it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>