Michal Privoznik [Fri, 28 Jun 2024 07:56:46 +0000 (09:56 +0200)]
qemu_domain: Set 'passt' net backend if 'default' is unsupported
It may happen that QEMU is compiled without SLIRP but with
support for passt. In such case it is acceptable to alter user
provided configuration and switch backend to passt as it offers
all the features as SLIRP.
Resolves: https://issues.redhat.com/browse/RHEL-45518 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Fri, 28 Jun 2024 07:53:10 +0000 (09:53 +0200)]
qemu_validate: Use domaincaps to validate supported net backend type
Now that the logic for detecting supported net backend types has
been moved to domain capabilities generation, we can just use it
when validating net backend type. Just like we do for device
models and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Fri, 28 Jun 2024 07:36:24 +0000 (09:36 +0200)]
conf: Accept 'default' backend type for <interface type='user'/>
After previous commits, domain capabilities XML reports basically
two possible values for backend type: 'default' and 'passt'.
Despite its misleading name, 'default' really means 'use
hypervisor's builtin SLIRP'. Since it's reported in domain
capabilities as a value accepted, make our parser and XML schema
accept it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
If mgmt apps on top of libvirt want to make a decision on the
backend type for <interface type='user'/> (e.g. whether past is
supported) we currently offer them no way to learn this fact.
Domain capabilities were invented exactly for this reason. Report
supported net backend types there.
Now, because of backwards compatibility, specifying no backend
type (which translates to VIR_DOMAIN_NET_BACKEND_DEFAULT) means
"use hyperviosr's builtin SLIRP". That behaviour can not be
changed. But it may happen that the hypervisor has no support for
SLIRP. So we have to report it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Since -netdev user can be disabled during QEMU compilation, we
can't blindly expect it to just be there. We need a capability
that tracks its presence.
For qemu-4.2.0 we are not able to detect the capability so do the
next best thing - assume the capability is there. This is
consistent with our current behaviour where we blindly assume the
capability, anyway.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jon Kohler [Thu, 27 Jun 2024 18:11:56 +0000 (11:11 -0700)]
qemu: fix switchover-ack regression for old qemu
When enabling switchover-ack on qemu from libvirt, the .party value
was set to both source and target; however, qemuMigrationParamsCheck()
only takes that into account to validate that the remote side of the
migration supports the flag if it is marked optional or auto/always on.
In the case of switchover-ack, when enabled on only the dst and not
the src, the migration will fail if the src qemu does not support
switchover-ack, as the dst qemu will issue a switchover-ack msg:
qemu/migration/savevm.c ->
loadvm_process_command ->
migrate_send_rp_switchover_ack(mis) ->
migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL)
Since the src qemu doesn't understand messages with header_type ==
MIG_RP_MSG_SWITCHOVER_ACK, qemu will kill the migration with error:
qemu-kvm: RP: Received invalid message 0x0007 length 0x0000
qemu-kvm: Unable to write to socket: Bad file descriptor
Looking at the original commit [1] for optional migration capabilities,
it seems that the spirit of optional handling was to enhance a given
existing capability where possible. Given that switchover-ack
exclusively depends on return-path, adding it as optional to that cap
feels right.
[1] 61e34b08568 ("qemu: Add support for optional migration capabilities")
Fixes: 1cc7737f69e ("qemu: add support for qemu switchover-ack") Signed-off-by: Jon Kohler <jon@nutanix.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Avihai Horon <avihaih@nvidia.com> Cc: Jiri Denemark <jdenemar@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: YangHang Liu <yanghliu@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Michal Privoznik [Fri, 14 Jun 2024 11:18:25 +0000 (13:18 +0200)]
remote_daemon_dispatch: Unref sasl session when closing client connection
In ideal world, where clients close connection gracefully their
SASL session is freed in virNetServerClientDispose() as it's
stored in client->sasl. Unfortunately, if client connection is
closed prematurely (e.g. the moment virsh asks for credentials),
the _virNetServerClient member is never set and corresponding
SASL session is never freed. The handler is still stored in
client private data, so free it in remoteClientCloseFunc().
20,862 (288 direct, 20,574 indirect) bytes in 3 blocks are definitely lost in loss record 1,763 of 1,772
at 0x50390C4: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x501BDAF: g_object_new_internal.part.0 (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x501D43D: g_object_new_with_properties (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x501E318: g_object_new (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x49BAA63: virObjectNew (virobject.c:252)
by 0x49BABC6: virObjectLockableNew (virobject.c:274)
by 0x4B0526C: virNetSASLSessionNewServer (virnetsaslcontext.c:230)
by 0x18EEFC: remoteDispatchAuthSaslInit (remote_daemon_dispatch.c:3696)
by 0x15E128: remoteDispatchAuthSaslInitHelper (remote_daemon_dispatch_stubs.h:74)
by 0x4B0FA5E: virNetServerProgramDispatchCall (virnetserverprogram.c:423)
by 0x4B0F591: virNetServerProgramDispatch (virnetserverprogram.c:299)
by 0x4B18AE3: virNetServerProcessMsg (virnetserver.c:135)
Resolves: https://issues.redhat.com/browse/RHEL-22574 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Mon, 24 Jun 2024 07:31:09 +0000 (09:31 +0200)]
virt-host-validate: Detect SEV-ES and SEV-SNP
With a simple cpuid (Section "E.4.17 Function
8000_001Fh—Encrypted Memory Capabilities" in "AMD64 Architecture
Programmer’s Manual Vol. 3") we can detect whether CPU is capable
of running SEV-ES and/or SEV-SNP guests. Report these in
virt-host-validate tool.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Mon, 24 Jun 2024 07:22:16 +0000 (09:22 +0200)]
virt-host-validate: Move AMD SEV into a separate func
The code that validates AMD SEV is going to be expanded soon.
Move it into its own function to avoid lengthening
virHostValidateSecureGuests() where the code lives now, even
more.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 25 Jun 2024 08:51:55 +0000 (10:51 +0200)]
qemu_validate: Use domaincaps to validate supported launchSecurity type
Now that the logic for detecting supported launchSecurity types
has been moved to domain capabilities generation, we can just use
it when validating launchSecurity type. Just like we do for
device models and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 25 Jun 2024 08:45:43 +0000 (10:45 +0200)]
qemu: Fill launchSecurity in domaincaps
The inspiration for these rules comes from
qemuValidateDomainDef().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 25 Jun 2024 07:53:57 +0000 (09:53 +0200)]
domcaps: Report launchSecurity
In order to learn what types of <launchSecurity/> are supported
users can turn to domain capabilities and find <sev/> and
<s390-pv/> elements. While these may expose some additional info
on individual launchSecurity types, we are lacking clean
enumeration (like we do for say device models). And given that
SEV and SEV SNP share the same basis (info found under <sev/> is
applicable to SEV SNP too) we have no other way to report SEV SNP
support.
Therefore, report supported launchSecurity types in domain
capabilities.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Fri, 21 Jun 2024 12:00:32 +0000 (14:00 +0200)]
qemu_capabilities: Probe SEV capabilities even for QEMU_CAPS_SEV_SNP_GUEST
While it's very unlikely to have QEMU that supports SEV-SNP but
doesn't support plain SEV, for completeness sake we ought to
query SEV capabilities if QEMU supports either. And similarly to
QEMU_CAPS_SEV_GUEST we need to clear the capability if talking to
QEMU proves SEV is not really supported.
This in turn removes the 'sev-snp-guest' capability from one of
our test cases as Peter's machine he uses to refresh capabilities
is not SEV capable. But that's okay. It's consistent with
'sev-guest' capability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 25 Jun 2024 07:58:43 +0000 (09:58 +0200)]
qemuxmlconftest; Explicitly enable QEMU_CAPS_SEV_SNP_GUEST for "launch-security-sev-snp"
Soon, QEMU_CAPS_SEV_SNP_GUEST is going to be dependant on more
than plain presence of "sev-snp-guest" object in QEMU. Explicitly
enable the capability for "launch-security-sev-snp" test so that
we can continue testing cmd line and xml2xml.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Rayhan Faizel [Thu, 6 Jun 2024 14:27:51 +0000 (19:57 +0530)]
qemu_block: Validate number of hosts for iSCSI disk device
An iSCSI device with zero hosts will result in a segmentation fault. This patch
adds a check for the number of hosts, which must be one in the case of iSCSI.
Jon Kohler [Mon, 24 Jun 2024 17:38:51 +0000 (10:38 -0700)]
qemu: add support for qemu switchover-ack
Add plumbing for QEMU's switchover-ack migration capability, which
helps lower the downtime during VFIO migrations. This capability is
enabled by default as long as both the source and destination support
it.
Note: switchover-ack depends on the return path capability, so this may
not be used when VIR_MIGRATE_TUNNELLED flag is set.
Extensive details about the qemu switchover-ack implementation are
available in the qemu series v6 cover letter [1] where the highlight is
the extreme reduction in guest visible downtime. In addition to the
original test results below, I saw a roughly ~20% reduction in downtime
for VFIO VGPU devices at minimum.
=== Test results ===
The below table shows the downtime of two identical migrations. In the
first migration swithcover ack is disabled and in the second it is
enabled. The migrated VM is assigned with a mlx5 VFIO device which has
300MB of device data to be migrated.
Switchover ack gives a roughly 4.5 times improvement in downtime.
The 1480ms difference is time that is used for resource allocation for
the VFIO device in the destination. Without switchover ack, this time is
spent when the source VM is stopped and thus the downtime is much
higher. With switchover ack, the time is spent when the source VM is
still running.
Jiri Denemark [Wed, 12 Jun 2024 14:44:28 +0000 (16:44 +0200)]
qemu: Fix migration with disabled vmx-* CPU features
When starting a domain on a host which lacks a vmx-* CPU feature which
is expected to be enabled by the CPU model specified in the domain XML,
libvirt properly marks such feature as disabled in the active domain
XML. But migrating the domain to a similar host which lacks the same
vmx-* feature will fail with libvirt reporting the feature as missing.
This is because of a bug in the hack ensuring backward compatibility
libvirt running on the destination thinks the missing feature is
expected to be enabled.
https://issues.redhat.com/browse/RHEL-40899
Fixes: v10.1.0-85-g5fbfa5ab8a Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jonathon Jongsma [Wed, 12 Jun 2024 17:18:49 +0000 (12:18 -0500)]
qemu: Don't specify vfio-pci.ramfb when ramfb is false
Commit 7c8e606b64c73ca56d7134cb16d01257f39c53ef attempted to fix
the specification of the ramfb property for vfio-pci devices, but it
failed when ramfb is explicitly set to 'off'. This is because only the
'vfio-pci-nohotplug' device supports the 'ramfb' property. Since we use
the base 'vfio-pci' device unless ramfb is enabled, attempting to set
the 'ramfb' parameter to 'off' this will result in an error like the
following:
error: internal error: QEMU unexpectedly closed the monitor
(vm='rhel'): 2024-06-06T04:43:22.896795Z qemu-kvm: -device
{"driver":"vfio-pci","host":"0000:b1:00.4","id":"hostdev0","display":"on
","ramfb":false,"bus":"pci.7","addr":"0x0"}: Property 'vfio-pci.ramfb'
not found.
This also more closely matches what is done for mdev devices.
Laine Stump [Fri, 21 Jun 2024 12:17:58 +0000 (08:17 -0400)]
network: add more firewall test cases
This patch adds some previously missing test cases that test for
proper firewall rule creation when the following are included in the
network definition:
* <forward dev='blah'>
* no forward element (an "isolated" network)
* nat port range when only ipv4 is nat-ed
* nat port range when both ipv4 & ipv6 are nated
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Laine Stump <laine@redhat.com>
Laine Stump [Wed, 12 Jun 2024 19:25:46 +0000 (15:25 -0400)]
tests: fix broken nftables test data so that individual tests are successful
When the chain names and table name used by the nftables firewall
backend were changed in commit 958aa7f274904eb8e4678a43eac845044f0dcc38, I forgot to change the test
data file base.nftables, which has the extra "list" and "add
chain/table" commands that are generated for the first test case of
networkxml2firewalltest.c. When the full set of tests is run, the
first test will be an iptables test case, so those extra commands
won't be added to any of the nftables cases, and so the data in
base.nftables never matches, and the tests are all successful.
However, if the test are limited with, e.g. VIR_TEST_RANGE=2 (test #2
will be the nftables version of the 1st test case), then the commands
to add nftables table/chains *will* be generated in the test output,
and so the test will fail. Because I was only running the entire test
series after the initial commits of nftables tests, I didn't notice
this. Until now.
base.nftables has now been updated to reflect the current names for
chains/table, and running individual test cases is once again
successful.
Fixes: 958aa7f274904eb8e4678a43eac845044f0dcc38 Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Laine Stump <laine@redhat.com>
Adam Julis [Fri, 21 Jun 2024 16:16:55 +0000 (18:16 +0200)]
qemuDomainDiskChangeSupported: Fill in missing check
The attribute 'discard_no_unref' of <disk/> is not allowed to be
changed while the virtual machine is running.
Resolves: https://issues.redhat.com/browse/RHEL-37542 Signed-off-by: Adam Julis <ajulis@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Fri, 7 Jun 2024 16:46:34 +0000 (12:46 -0400)]
network: allow for forward dev to be a transient interface
A user reported that if they set <forward mode='nat|route' dev='blah'>
starting the network would fail if the device 'blah' didn't already
exist.
This is caused by using "iif" and "oif" in nftables rules to check for
the forwarding device - these two commands work by saving the named
interface's ifindex (an unsigned integer) when the rule is added, and
comparing it to the ifindex associated with the packet's path at
runtime. This works great if the interface both 1) exists when the
rule is added, and 2) is never deleted and re-created after the rule
is added (since it would end up with a different ifindex).
When checking for the network's bridge device, it is okay for us to
use "iif" and "oif", because the bridge device is created before the
firewall rules are added, and will continue to exist until just after
the firewall rules are deleted when the network is shutdown.
But since the forward device might be deleted/re-added during the
lifetime of the network's firewall rules, we must instead us "oifname"
and "iifname" - these are much less efficient than "Xif" because they
do a string compare of the interface's name rather than just comparing
two integers (ifindex), but they don't require the interface to exist
when the rule is added, and they can properly cope with the named
interface being deleted and re-added later.
Fixes: a4f38f6ffe6a9edc001d18890ccfc3f38e72fb94 Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Fri, 21 Jun 2024 08:37:35 +0000 (10:37 +0200)]
domain_validate: Add missing 'break' in virDomainDefLaunchSecurityValidate()
A few commits ago (v10.4.0-101-gc65eba1f57) I've introduced
virDomainDefLaunchSecurityValidate() and a switch() statement in
it. Some cases are empty but are lacking 'break' statement which
is not valid. Provide missing 'break' statement.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Thu, 13 Jun 2024 12:35:57 +0000 (14:35 +0200)]
qemu_firmware: Pick the right firmware for SEV-SNP guests
The firmware descriptors have 'amd-sev-snp` feature which
describes whether firmware is suitable for SEV-SNP guests.
Provide necessary implementation to detect the feature and pick
the right firmware if guest is SEV-SNP enabled.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Wed, 12 Jun 2024 13:22:00 +0000 (15:22 +0200)]
qemu: Build cmd line for SEV-SNP
Pretty straightforward as qemu has 'sev-snp-guest' object which
attributes maps pretty much 1:1 to our XML model. Except for
@vcek where QEMU has 'vcek-disabled`, an inverted boolean, while
we model it as virTristateBool. But that's easy to map too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 11 Jun 2024 09:58:41 +0000 (11:58 +0200)]
conf: Introduce SEV-SNP support
SEV-SNP is an enhancement of SEV/SEV-ES and thus it shares some
fields with it. Nevertheless, on XML level, it's yet another type
of <launchSecurity/>.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Mon, 10 Jun 2024 14:17:26 +0000 (16:17 +0200)]
qemu_monitor: Allow querying SEV-SNP state in 'query-sev'
In QEMU commit v9.0.0-1155-g59d3740cb4 the return type of
'query-sev' monitor command changed to accommodate SEV-SNP. Even
though we currently support launching plain SNP guests, this will
soon change.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Wed, 12 Jun 2024 07:29:59 +0000 (09:29 +0200)]
src: Convert some _virDomainSecDef::sectype checks to switch()
In a few instances there is a plain if() check for
_virDomainSecDef::sectype. While this works perfectly for now,
soon there'll be another type and we can utilize compiler to
identify all the places that need adaptation. Switch those if()
statements to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Wed, 12 Jun 2024 08:06:57 +0000 (10:06 +0200)]
Drop needless typecast to virDomainLaunchSecurity
The sectype member of _virDomainSecDef struct is already declared
as of virDomainLaunchSecurity type. There's no need to typecast
it to the very same type when passing it to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 11 Jun 2024 10:12:08 +0000 (12:12 +0200)]
conf: Move some members of virDomainSEVDef into virDomainSEVCommonDef
Some parts of SEV are to be shared with SEV SNP. In order to
reuse XML parsing / formatting code cleanly, let's move those
common bits into a new struct (virDomainSEVCommonDef) and adjust
rest of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Tue, 11 Jun 2024 08:44:24 +0000 (10:44 +0200)]
qemu_monitor_json: Report error in error paths in SEV related code
While working on qemuMonitorJSONGetSEVMeasurement() and
qemuMonitorJSONGetSEVInfo() I've noticed that if these functions
fail, they do so without appropriate error set. Fill in error
reporting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Thu, 13 Jun 2024 14:21:47 +0000 (16:21 +0200)]
qemuMigrationSrcRun: Re-check whether VM is active before accessing job data
'qemuProcessStop()' clears the 'current' job data. While the code under
the 'error' label in 'qemuMigrationSrcRun()' does check that the VM is
active before accessing the job, it also invokes multiple helper
functions to clean up the migration including
'qemuMigrationSrcNBDCopyCancel()' which calls 'qemuDomainObjWait()'
invalidating the result of the liveness check as it unlocks the VM.
Duplicate the liveness check and explain why. The rest of the code e.g.
accessing the monitor is safe as 'qemuDomainEnterMonitorAsync()'
performs a liveness check. The cleanup path just ignores the return
values of those functions.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Thu, 13 Jun 2024 14:15:58 +0000 (16:15 +0200)]
qemu: migration: Properly check for live VM after qemuDomainObjWait()
Similarly to the one change in commit 4d1a1fdffda19a62d62fa2457d162362
we should be checking that the VM is not being yet destroyed if we've
invoked qemuDomainObjWait().
Use the new helper qemuDomainObjIsActive().
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The helper checks whether VM is active including the internal qemu
state. This helper will become useful in situations when an async job
is in use as VIR_JOB_DESTROY can run along async jobs thus both checks
are necessary.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Thu, 6 Jun 2024 15:43:12 +0000 (17:43 +0200)]
qemu: process: Ensure that 'beingDestroyed' gets cleared only after VM id is reset
Prevent the possibility that a VM could be considered as alive while
inside qemuProcessStop.
A recently fixed bug which unlocked the domain object while inside
qemuProcessStop showed that there's possibility to confuse the state of
the VM to be considered active while 'qemuProcessStop' is processing
shutdown of the VM. Ensure that this doesn't happen by clearing the
'beingDestroyed' flag only after the VM id is cleared.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Wed, 12 Jun 2024 13:54:24 +0000 (15:54 +0200)]
qemuProcessStop: Move code not depending on 'vm->def->id' after reset of the ID
There are few function calls done while cleaning up a stopped VM which
do require the old VM id, to e.g. clean up paths containing the 'short'
domain name in the path.
Anything else, which doesn't strictly require it can be moved after
clearing the 'id' in order to decrease likelyhood of potential bugs.
This patch moves all the code which does not require the 'id' (except
for the log entry and closing the monitor socket) after the statement
clearing the id and adds a comment explaining that anything in the
section must not unlock the VM object.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 10 Jun 2024 16:12:16 +0000 (18:12 +0200)]
qemuProcessStop: Prevent crash when qemuDomainObjStopWorker() unlocks the VM
'qemuDomainObjStopWorker()' which is meant to dispose of the event loop
thread for the monitor unlocks the VM object while disposing the thread
to prevent possible deadlocks with events waiting on the monitor thread.
Unfortunately 'qemuDomainObjStopWorker()' is called *before* the VM is
marked as inactive by clearing 'vm->def->id', but at the same time it's
no longer marked as 'beingDestroyed' when we're inside
'qemuProcessStop()'.
If 'vm' would be kept locked this wouldn't be a problem. Same way it's
not a problem for anything that uses non-ASYNC VM jobs, or when the
monitor is accessed in an async job, as the 'destroy' job interlocks
with those.
It is a problem for code inside an async job which uses
'qemuDomainObjWait()' though. The API contract of qemuDomainObjWait()
ensures the caller that the VM on successful return from it, but in this
specific reason it's not the case, as both 'beingDestroyed' is already
false, and 'vm->def->id' is not yet cleared.
To fix the issue move the 'qemuDomainObjStopWorker()' call *after*
clearing 'vm->def->id' and also add a note stating what the function is
doing.
Fixes: 860a999802d3c82538373bb3f314f92a2e258754 Closes: https://gitlab.com/libvirt/libvirt/-/issues/640 Reported-by: luzhipeng <luzhipeng@cestc.cn> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Tue, 11 Jun 2024 13:50:52 +0000 (15:50 +0200)]
qemuDomainDiskPrivateDispose: Prevent dangling 'disk' pointer in blockjob data
Clear the 'disk' member of 'blockjob' as we're freeing the disk object
at this point. While this should not normally happen it was observed
when other bug allowed the VM to be cleared while other threads didn't
yet finish.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to other blockjob handlers, if there's no disk associated with
the blockjob the handler needs to behave correctly. This is needed as
the disk might have been de-associated on unplug or other operations.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Boris Fiuczynski [Wed, 19 Jun 2024 12:29:17 +0000 (14:29 +0200)]
nodedev: add ccw device state and remove fencing
Instead of fencing offline ccw devices add the state to the ccw
capability.
Resolves: https://issues.redhat.com/browse/RHEL-39497 Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Prevent the creation of a new DASD node object when the device does not
exist.
Resolves: https://issues.redhat.com/browse/RHEL-39497 Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Boris Fiuczynski [Wed, 19 Jun 2024 12:29:15 +0000 (14:29 +0200)]
nodedev: improve DASD detection
In newer DASD driver versions the ID_TYPE tag is supported. This tag is
missing after a system reboot but when the ccw device is set offline and
online the tag is included. To fix this version independently we need to
check if devices detected as type disk is actually a DASD to maintain
the node object consistency and not end up with multiple node objects
for DASDs.
Resolves: https://issues.redhat.com/browse/RHEL-39497 Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Boris Fiuczynski [Wed, 19 Jun 2024 12:29:14 +0000 (14:29 +0200)]
nodedev: refactor storage type fixup
Refactor the storage type fixup into a reusable method.
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Fri, 14 Jun 2024 11:40:58 +0000 (13:40 +0200)]
virnetworkobj: Free fwRemoval before setting another one in virNetworkObjSetFwRemoval()
The virNetworkObjSetFwRemoval() function is called at least two
times when there's a network running and network driver
initializes:
1) when loading state XMLs:
#0 virNetworkObjSetFwRemoval (obj=0x7fffd4028250, fwRemoval=0x7fffd4020ad0) at ../src/conf/virnetworkobj.c:258
#1 0x00007ffff7a69c68 in virNetworkLoadState (...) at ../src/conf/virnetworkobj.c:952
#2 0x00007ffff7a6a35d in virNetworkObjLoadAllState (...) at ../src/conf/virnetworkobj.c:1072
#3 0x00007ffff7f9625f in networkStateInitialize (...) at ../src/network/bridge_driver.c:624
2) when firewall rules are being reloaded:
#0 virNetworkObjSetFwRemoval (obj=0x7fffd4028250, fwRemoval=0x7fffd402e5b0) at ../src/conf/virnetworkobj.c:258
#1 0x00007ffff7f997b4 in networkReloadFirewallRulesHelper (obj=0x7fffd4028250, opaque=0x0) at ../src/network/bridge_driver.c:1703
#2 0x00007ffff7a6b09b in virNetworkObjListForEachHelper (payload=0x7fffd4028250, ...) at ../src/conf/virnetworkobj.c:1414
#3 0x00007ffff79287b6 in virHashForEachSafe (...) at ../src/util/virhash.c:387
#4 0x00007ffff7a6b119 in virNetworkObjListForEach (...) at ../src/conf/virnetworkobj.c:1441
#5 0x00007ffff7f99978 in networkReloadFirewallRules (...) at ../src/network/bridge_driver.c:1742
#6 0x00007ffff7f962f2 in networkStateInitialize (...) at ../src/network/bridge_driver.c:645
Since virNetworkObjSetFwRemoval() does not free the object stored
in the first call, the second call just overwrites the stored
pointer leading to a memory leak:
5,530 (48 direct, 5,482 indirect) bytes in 1 blocks are definitely lost in loss record 1,863 of 1,880
at 0x4848C43: calloc (vg_replace_malloc.c:1595)
by 0x4F1E979: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.7800.6)
by 0x4976E32: virFirewallNew (virfirewall.c:118)
by 0x4979BA9: virFirewallParseXML (virfirewall.c:1071)
by 0x4ABEB1E: virNetworkLoadState (virnetworkobj.c:938)
by 0x4ABF35C: virNetworkObjLoadAllState (virnetworkobj.c:1072)
by 0x4E9A25E: networkStateInitialize (bridge_driver.c:624)
by 0x4CB1FA6: virStateInitialize (libvirt.c:665)
by 0x15A6C6: daemonRunStateInit (remote_daemon.c:611)
by 0x49E69F0: virThreadHelper (virthread.c:256)
by 0x532B428: start_thread (in /lib64/libc.so.6)
by 0x5397373: clone (in /lib64/libc.so.6)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Fri, 14 Jun 2024 10:52:54 +0000 (12:52 +0200)]
virfirewall: Fir a memleak in virFirewallParseXML()
As a part of parsing XML, virFirewallParseXML() calls
virXMLNodeContentString() and then passes the return value
further. But virXMLNodeContentString() is documented so that it's
the caller's responsibility to free the returned string, which
virFirewallParseXML() never does. This leads to a memory leak:
14,300 bytes in 220 blocks are definitely lost in loss record 1,879 of 1,891
at 0x4841858: malloc (vg_replace_malloc.c:442)
by 0x5491E3C: xmlBufCreateSize (in /usr/lib64/libxml2.so.2.12.6)
by 0x54C2401: xmlNodeGetContent (in /usr/lib64/libxml2.so.2.12.6)
by 0x49F7791: virXMLNodeContentString (virxml.c:354)
by 0x4979F25: virFirewallParseXML (virfirewall.c:1134)
by 0x4ABEB1E: virNetworkLoadState (virnetworkobj.c:938)
by 0x4ABF35C: virNetworkObjLoadAllState (virnetworkobj.c:1072)
by 0x4E9A25E: networkStateInitialize (bridge_driver.c:624)
by 0x4CB1FA6: virStateInitialize (libvirt.c:665)
by 0x15A6C6: daemonRunStateInit (remote_daemon.c:611)
by 0x49E69F0: virThreadHelper (virthread.c:256)
by 0x532B428: start_thread (in /lib64/libc.so.6)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Commit 23c47944882b added parsing of serial ports connected to vspc, but
the VM can also have a network serial port with an empty filename or no
filename at all. Parse these the same way, as a <serial type='null'>.
Swapnil Ingle [Tue, 18 Jun 2024 16:39:58 +0000 (16:39 +0000)]
Pass shutoff reason to release hook
Sometimes in release hook it is useful to know if the VM shutdown was graceful
or not. This is especially useful to do cleanup based on the VM shutdown failure
reason in release hook. This patch proposes to use the last argument 'extra'
to pass VM shutoff reason in the call to release hook.
Making this change for Qemu and LXC.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:09:05 +0000 (20:09 +0200)]
node_device_udev: Pass the `udevEventData` via parameter and use refcounting
Instead of accessing the global `driver` object pass the `udevEventData` as
parameter to the thread handler and watch callback. This has the advantage that:
1. proper refcounting
2. easier to read and test
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:09:02 +0000 (20:09 +0200)]
node_device_udev: Use a worker pool for processing events and emitting nodedev event
Use a worker pool for processing the events (e.g. udev, mdevctl config changes)
and the initialization instead of a separate initThread and a mdevctl-thread.
This has the large advantage that we can leverage the job API and now this
thread pool is responsible to do all the "costly-work" and emitting the libvirt
nodedev events.
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:09:00 +0000 (20:09 +0200)]
node_device_udev: nodeStateShutdownPrepare: Disconnect the signals explicitly
The documentation of gobject signals reads:
"If you are connecting handlers to signals and using a GObject instance as your
signal handler user data, you should remember to pair calls to
g_signal_connect() with calls to g_signal_handler_disconnect() or
g_signal_handlers_disconnect_by_func(). While signal handlers are automatically
disconnected when the object emitting the signal is finalised..." [1]
This means that the signal handlers are automatically disconnected as soon as
the `priv->mdevCtlMonitors` are finalised/released by `udevEventDataDispose`.
But this also means that it's possible that new work is tried to be scheduled
for the workerpool by the `mdevctlEventHandleCallback` (main thread context)
even if the workerpool has already been stopped by `nodeStateShutdownWait`. To
fully understand this, it's important to know that the main loop of the main
thread is still running for some time even after `nodeStateShutdownPrepare` has
been called. Let's avoid this situation by explicitly disconnect the signals
during `nodeStateShutdownPrepare`, which is called in the main thread, so that
no new work is attempted to be scheduled for the worker pool.
Marc Hartmayer [Tue, 23 Apr 2024 18:08:59 +0000 (20:08 +0200)]
node_device_udev: Introduce and use `stateShutdownPrepare` and `stateShutdownWait`
Introduce and use the driver functions for the node state shutdown preparation
and wait. As they're also called in the error/cleanup path of
`nodeStateInitialize`, they must be written in a way, that they can safely be
executed even if not everything is initialized.
In the next commit, these functions will be extended.
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:08:58 +0000 (20:08 +0200)]
node_device_udev: Fix leak of mdevctlLock, udevThreadCond, and mdevCtlMonitors
Even if `priv->udev_monitor` was never initialized, the mdevctlLock, udevThread
were. Therefore let's match the order of releasing the resources the order of
allocating the resources in `nodeStateInitialize`.
In addition, use `g_steal_pointer` in `g_list_free_full`.
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:08:57 +0000 (20:08 +0200)]
node_device_udev: Move responsibility to release `(init|udev)Thread` to `udevEventDataDispose`
Everything is released in `udevEventDataDispose` except for the threads, change
this as this makes the code easier to read as it can be simplified a little.
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:08:53 +0000 (20:08 +0200)]
node_device_udev: Don't take `mdevctlLock` for `mdevctl list` and add comments about locking
Commit a99d876a0f58 ("node_device: Use automatic mutex management") replaced the
locking mechanism and accidentally removed the comment with the reason why the
lock is taken. The reason was to "ensure only a single thread can query mdevctl
at a time", but this reason is no longer valid or maybe it never was. Therefore,
let's remove this lock and add a comment to `mdevCtl` what it protects.
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Boris Fiuczynski [Tue, 23 Apr 2024 18:08:50 +0000 (20:08 +0200)]
nodedev: reset active config data on udev remove event
When a mdev device is destroyed or stopped the udev remove event
handling needs to reset the active config data of the node object
representing a persisted mdev.
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Boris Fiuczynski [Tue, 23 Apr 2024 18:08:49 +0000 (20:08 +0200)]
nodedev: immediate update of active config on udev events
When an udev add, change or remove event occurs the mdev active config data
requires an update via mdevctl as the udev does not contain all config data.
This update needs to occur immediately and to be finished before the libvirt
nodedev event is issued to keep the API usage reliable.
After this change, scheduleMdevctlUpdate call is already called in
`udevAddOneDevice` and can therefore be removed in `udevHandleOneDevice`.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc Hartmayer [Tue, 23 Apr 2024 18:08:48 +0000 (20:08 +0200)]
node_device_udev: Set @def to NULL
@def is owned by @obj after adding it the node device object list. As soon as
the @obj lock has been released, another thread could free @obj and therefore
@def. If now someone accesses @def this would lead to a heap-use-after-free and
therefore most likely to a segmentation fault, therefore set @def to NULL after
the ownership has moved.
While at it, add comments to other code places why @def is set to NULL.
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Boris Fiuczynski [Tue, 23 Apr 2024 18:08:47 +0000 (20:08 +0200)]
nodedev: fix mdev add udev event data handling
Two situations will trigger an udev add event:
1) the mdev is created when started (transient) or
2) the mdev was defined and is started
In case 1 there is no node object existing and no config data is copied.
In case 2 copying the active config data of an existing node object will
only copy invalid data. Instead copying the defined config data will
store valid data into the newly added node object.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>