Michal Privoznik [Mon, 16 May 2022 12:16:06 +0000 (14:16 +0200)]
qemu: Generate command line for <defaultiothread/> pool size
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2059511 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Mon, 16 May 2022 11:57:18 +0000 (13:57 +0200)]
qemu_validate: Check if QEMU's capable of setting <defaultiothread/> pool size
Since the main-loop and iothread classes are derived from the
same class (EventLoopBaseClass) we don't need new capability and
can use QEMU_CAPS_IOTHREAD_THREAD_POOL_MAX directly to check
whether QEMU's capable of setting defaultiothread pool size.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Mon, 16 May 2022 11:23:32 +0000 (13:23 +0200)]
conf: Introduce <defaultiothread/>
As of v7.0.0-877-g70ac26b9e5 QEMU exposes its default event loop
for devices with no IOThread assigned as an QMP object. In the
very next commit (v7.0.0-878-g71ad4713cc) it was extended for
thread-pool-min and thread-pool-max attributes. Expose them under
new <defaultiothread/> element.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Wed, 11 May 2022 11:32:26 +0000 (13:32 +0200)]
qemu: Wire up new virDomainSetIOThreadParams parameters
Introduced in previous commit, QEMU driver needs to be taught how
to set VIR_DOMAIN_IOTHREAD_THREAD_POOL_MIN and
VIR_DOMAIN_IOTHREAD_THREAD_POOL_MAX parameters on given IOThread.
Fortunately, this is fairly trivial to do and since these two
parameters are exposed in domain XML too the update of inactive
XML can be wired up too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Wed, 11 May 2022 11:31:47 +0000 (13:31 +0200)]
include: Introduce typed params for virDomainSetIOThreadParams wrt pool size
Our public API offers virDomainSetIOThreadParams() function which
allows users to set various aspects of IOThreads. Introduce two
new typed parameters: VIR_DOMAIN_IOTHREAD_THREAD_POOL_MIN and
VIR_DOMAIN_IOTHREAD_THREAD_POOL_MAX which will allow users to
modify the thread-pool-min and thread-pool-max attributes of an
iothread.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Tue, 10 May 2022 10:26:29 +0000 (12:26 +0200)]
qemu_validate: Check if QEMU's capable of setting iothread pool size
Now that we have a capability that reflects whether QEMU is
capable of setting iothread pool size, let's introduce a
validator check to make sure users are not trying to use this
feature with QEMU that doesn't support it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This capability reflects whether QEMU allows setting
thread-pool-min and thread-pool-max attributes on iothread
object. Since both attributes were introduced in the same commit
(v7.0.0-878-g71ad4713cc) and can't exist independently of each
other we can stick with one capability covering both of them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
conf: Introduce thread_pool_min and thread_pool_max attributes to IOThread
At least in case of QEMU an IOThread is actually a pool of
threads (see iothread_set_aio_context_params() in QEMU's code
base). As such, it can have minimal and maximal number of worker
threads. Allow setting them in domain XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
conf: Introduce allocator for virDomainIOThreadIDDef
So far, iothread configuration structure (virDomainIOThreadIDDef)
is allocated by plain g_new0(). This is perfectly okay because
all members of the struct default to value 0 anyway. But soon
this is going to change. Therefore, replace those g_new0() with a
function so that the default value can be set consistently in one
place.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Mon, 16 May 2022 09:59:48 +0000 (11:59 +0200)]
conf: Move iothread formatter into a separate function
Formatting iothreads is currently open coded inside of
virDomainDefFormatInternalSetRootName(). While this works, it
makes the function needlessly long, especially if the formatting
code will expand in near future. Therefore, move it into a
separate function. At the same time, make
virDomainDefIothreadShouldFormat() accept const domain definition
so that the new function can also accept const domain definition.
Formatters shouldn't need to change definition.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
virDomainIOThreadIDDefArrayInit: Decrease scope of @iothrid
In virDomainIOThreadIDDefArrayInit() the variable @iothrid is
used only inside a loop but is declared for whole function. Bring
the variable into the loop so that it's obvious that the variable
is not used elsewhere.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Mon, 16 May 2022 09:48:15 +0000 (11:48 +0200)]
virDomainDefParseIOThreads: Use g_autoptr() for @iothrid
Using g_autoptr() for @iothrid variable inside
virDomainDefParseIOThreads() allows us to drop explicit call to
virDomainIOThreadIDDefFree() in one case.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
For easier attribute parsing we have virXMLProp*() family of
functions. These accept flags through which a caller can pose
some conditions onto the attribute value, for instance:
VIR_XML_PROP_NONZERO when the attribute may not be zero, etc.
What we are missing is VIR_XML_PROP_NONNEGATIVE when the
attribute value may be non-negative. Obviously, this flag makes
sense only for some members of the virXMLProp*() family.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Erik Skultety [Tue, 7 Jun 2022 12:51:14 +0000 (14:51 +0200)]
ci: integration: Set 'safe.directory' when installing QEMU from git
Since a fix for CVE-2022-24765 was released every git command is now
checked against the context repo in which it's supposed to run
resulting in a fatal error if the repo is owned by other user than the
one running the git command.
This means that in order to be able to do 'sudo make install', we have
to set the 'safe.directory' for the root user. This is because QEMU
runs 'git submodule update' automatically on 'make install'.
Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
virsh: Check whether enough arguments was passed to iothreadset
Virsh has iothreadset command which allows setting various
attributes of IOThreads. However, when the command is called
without any arguments (besides domain and IOThread IDs), then
@params stays NULL and is passed to virDomainSetIOThreadParams()
which produces rather user unfriendly error message:
error: params in virDomainSetIOThreadParams must not be NULL
Introduce a check and produce better error message.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Claudio Fontana <cfontana@suse.de>
Jiri Denemark [Tue, 24 May 2022 15:18:56 +0000 (17:18 +0200)]
qemu: Fix VSERPORT_CHANGE event in post-copy migration
When a domain has a guest agent channel enabled and the agent is running
in the guest, we will get VSERPORT_CHANGE event on a destination host as
soon as we start vCPUs there. This is not an issue for normal migration,
but post-copy migration will remain running after we started vCPUs on
the destination. If it runs for more than 30s, the VSERPORT_CHANGE event
handler will fail to get a job and log the following error message:
Timed out during operation: cannot acquire state change lock (held
by monitor=remoteDispatchDomainMigrateFinish3Params)
and of course we will think the guest agent is not connected and thus
all APIs talking to it will fail. Until the agent or libvirt daemon is
restarted.
Luckily we only need to update the channel state (to mark it as
connected) and connect to the agent neither of which conflicts with
migration. Thus we can safely enable processing this event during
migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 24 May 2022 14:17:23 +0000 (16:17 +0200)]
Introduce VIR_JOB_MIGRATION_SAFE job type
This is a special job for operations that need to modify domain state
during an active migration. The modification must not affect any state
that could conflict with the migration code. This is useful mainly for
event handlers that need to be processed during migration and which
could otherwise time out on acquiring a normal MODIFY job.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Call qemuDomainCleanupAdd from qemuMigrationJobContinue
Every single call to qemuMigrationJobContinue needs to register a
cleanup callback in case the migrating domain dies between phases or
when migration is paused due to a failure in postcopy mode.
Let's integrate registering the callback in qemuMigrationJobContinue to
make sure the current thread does not release a migration job without
setting a cleanup callback.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Register qemuProcessCleanupMigrationJob after Begin phase
The callback will properly cleanup non-p2p migration job in case the
migrating domain dies between Begin and Perform while the client which
controls the migration is not cooperating (normally the API for the next
migration phase would handle this).
The same situation can happen even after Prepare and Perform phases, but
they both already register a suitable callback, so no fix is needed
there.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Create completed jobData in qemuMigrationSrcComplete
Normally the structure is created once the source reports completed
migration, but with post-copy migration we can get here even after
libvirt daemon was restarted. It doesn't make sense to preserve the
structure in our status XML as we're going to rewrite almost all of it
while refreshing the stats anyway. So we just create the structure here
if it doesn't exist to make sure we can properly report statistics of a
completed migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Start a migration phase in qemuMigrationAnyConnectionClosed
Non-postcopy case talks to QEMU monitor and thus needs to create a
nested job. Since qemuMigrationAnyConnectionClosed is called in case
there's no thread processing a migration API, we need to make the
current thread a temporary owner of the migration job to avoid "This
thread doesn't seem to be the async job owner: 0". This is done by
starting a migration phase.
While no monitor interaction happens in postcopy case and just setting
the phase (to indicate a broken postcopy migration) would be enough,
being consistent and setting the owner does not hurt anything.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Rename qemuMigrationSrcCleanup
The function is now called qemuMigrationAnyConnectionClosed to make it
clear it is supposed to handle broken connection during migration. It
will soon be used on both sides of migration so the "Src" part was changed
to "Any" to avoid renaming the function twice in a row.
The original *Cleanup name could easily be confused with cleanup
callbacks called when a domain is destroyed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Add support for migrate-recover QMP command
This command tells QEMU to start listening for an incoming post-copy
recovery connection. Just like migrate-incoming is used for starting
fresh migration on the destination host.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Refactor qemuMigrationDstPrepareFresh
Offline migration jumps over a big part of qemuMigrationDstPrepareFresh.
Let's move that part into a new qemuMigrationDstPrepareActive function
to make the code easier to follow.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Introduce qemuMigrationDstPrepareFresh
Moves most of the code from qemuMigrationDstPrepareAny to a new
qemuMigrationDstPrepareFresh so that qemuMigrationDstPrepareAny can be
shared with post-copy resume.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Perform phase
It just calls migrate QMP command with resume=true without having to
worry about migration capabilities or parameters, storage migration,
etc. since everything has already been done in the normal Perform phase.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Separate starting migration from qemuMigrationSrcRun
qemuMigrationSrcRun does a lot of thing before and after telling QEMU to
start the migration. Let's make the core reusable by moving it to a new
qemuMigrationSrcStart function.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Don't set VIR_MIGRATE_PAUSED for post-copy resume
For historical reasons we automatically enabled VIR_MIGRATE_PAUSED flag
when a migration was started for a paused domain. However, when resuming
failed post-copy migration the domain on the source host will always be
paused (as it is already running on the destination host). We must avoid
enabling VIR_MIGRATE_PAUSED in this case.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Wed, 18 May 2022 12:02:13 +0000 (14:02 +0200)]
qemu: Use QEMU_MIGRATION_PHASE_POSTCOPY_FAILED
This phase marks a migration protocol as broken in a post-copy phase.
Libvirt is no longer actively watching the migration in this phase as
the migration API that started the migration failed.
This may either happen when post-copy migration really fails (QEMU
enters postcopy-paused migration state) or when the migration still
progresses between both QEMU processes, but libvirt lost control of it
because the connection between libvirt daemons (in p2p migration) or a
daemon and client (non-p2p migration) was closed. For example, when one
of the daemons was restarted.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Do not set job owner in qemuMigrationJobSetPhase
Both qemuMigrationJobSetPhase and qemuMigrationJobStartPhase were doing
the same thing, which made little sense. Let's follow the difference
between qemuDomainObjSetJobPhase and qemuDomainObjStartJobPhase and
change job owner only in qemuMigrationJobStartPhase.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Refactor qemuDomainObjSetJobPhase
We will want to update migration phase without affecting job ownership.
Either in the thread that already owns the job or from an event handler
which only changes the phase (of a job no-one owns) without assuming it.
Let's move the ownership change to a new qemuDomainObjStartJobPhase
helper and let qemuDomainObjSetJobPhase set the phase without touching
ownership.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Add new migration phases for post-copy recovery
When recovering from a failed post-copy migration, we need to go through
all migration phases again, but don't need to repeat all the steps in
each phase. Let's create a new set of migration phases dedicated to
post-copy recovery so that we can easily distinguish between normal and
recovery code.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Improve post-copy migration handling on reconnect
When libvirt daemon is restarted during an active post-copy migration,
we do not always mark the migration as broken. In this phase libvirt is
not really needed for migration to finish successfully. In fact the
migration could have even finished while libvirt was not running or it
may still be happily running.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Ignore missing memory statistics in query-migrate
We want to use query-migrate QMP command to check the current migration
state when reconnecting to active domains, but the reply we get to this
command may not contain any statistics at all if called on the
destination host.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Finish completed unattended migration
So far migration could only be completed while a migration API was
running and waiting for the migration to finish. In case such API could
not be called (the connection that initiated the migration is broken)
the migration would just be aborted or left in a "don't know what to do"
state. But this will change soon and we will be able to successfully
complete such migration once we get the corresponding event from QEMU.
This is specific to post-copy migration when vCPUs are already running
on the destination and we're only waiting for all memory pages to be
transferred. Such post-copy migration (which no-one is actively
watching) is called unattended migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Make sure migrationPort is released even in callbacks
Normally migrationPort is released in the Finish phase, but we need to
make sure it is properly released also in case qemuMigrationDstFinish is
not called at all. Currently the only callback which is called in this
situation qemuMigrationDstPrepareCleanup which already releases
migrationPort. This patch adds similar handling to additional callbacks
which will be used in the future.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Handle migration job in qemuMigrationDstFinish
The function which started a migration phase should also finish it by
calling qemuMigrationJobFinish/qemuMigrationJobContinue so that the code
is easier to follow.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Introduce qemuMigrationDstFinishActive
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
Jiri Denemark [Tue, 17 May 2022 12:19:12 +0000 (14:19 +0200)]
qemu: Introduce qemuMigrationDstFinishOffline
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Introduce qemuMigrationDstFinishFresh
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Preserve error in qemuMigrationDstFinish
We want to prevent our error path that can potentially kill the domain
on the destination host from overwriting an error reported earlier, but
we were only doing so in one specific path when starting vCPUs fails.
Let's do it in all paths.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Separate success and failure path in qemuMigrationDstFinish
Most of the code in "endjob" label is executed only on failure. Let's
duplicate the rest so that the label can be used only in error path
making the success path easier to follow and refactor.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Handle 'postcopy-paused' migration state
When connection breaks during post-copy migration, QEMU enters
'postcopy-paused' state. We need to handle this state and make the
situation visible to upper layers.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Fetch paused migration stats
Even though a migration is paused, we still want to see the amount of
data transferred so far and that the migration is indeed not progressing
any further.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Don't wait for migration job when migration is running
Migration is a job which takes some time and if it succeeds, there's
nothing to call another migration on. If a migration fails, it might
make sense to rerun it with different arguments, but this would only be
done once the first migration fails rather than while it is still
running.
If this was not enough, the migration job now stays active even if
post-copy migration fails and anyone possibly retrying the migration
would be waiting for the job timeout just to get a suboptimal error
message.
So let's special case getting a migration job when another one is
already active.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Fri, 13 May 2022 12:15:00 +0000 (14:15 +0200)]
qemu: Restore async job start timestamp on reconnect
Jobs that are supposed to remain active even when libvirt daemon
restarts were reported as started at the time the daemon was restarted.
This is not very helpful, we should restore the original timestamp.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Abort failed post-copy when we haven't called Finish yet
When migration fails after it already switched to post-copy phase on the
source, but early enough that we haven't called Finish on the
destination yet, we know the vCPUs were not started on the destination
and the source host still has a complete state of the domain. Thus we
can just ignore the fact post-copy phase started and normally abort the
migration and resume vCPUs on the source.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Keep migration job active after failed post-copy
When post-copy migration fails, we can't just abort the migration and
resume the domain on the source host as it is already running on the
destination host and no host has a complete state of the domain memory.
Instead of the current approach of just marking the domain on both ends
as paused/running with a post-copy failed sub state, we will keep the
migration job active (even though the migration API will return failure)
so that the state is more visible and we can better control what APIs
can be called on the domains and even allow for resuming the migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Add qemuDomainObjRestoreAsyncJob
The code for setting up a previously active backup job in
qemuProcessRecoverJob is generalized into a dedicated function so that
it can be later reused in other places.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Rename qemuDomainObjRestoreJob as qemuDomainObjPreserveJob
It is used for saving job out of domain object. Just like
virErrorPreserveLast is used for errors. Let's make the naming
consistent as Restore would suggest different semantics.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 10 May 2022 13:20:25 +0000 (15:20 +0200)]
qemu: Explicitly emit events on post-copy failure
The events would normally be triggered only if we're changing domain
state. But most of the time the domain is already in the right state and
we're just changing its substate from {PAUSED,RUNNING}_POSTCOPY to
*_POSTCOPY_FAILED. Let's emit lifecycle events explicitly when post-copy
migration fails to make the failure visible without polling.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>