qemu: adjust memlock for multiple vfio/vdpa devices
When multiple VFIO or VDPA devices are assigned to a guest, the guest
can fail to start because it cannot map enough memory. The case
reported in https://bugzilla.redhat.com/show_bug.cgi?id=2111317 is one
example of this failure.
The current memlock limit calculation does not work for scenarios where
there are multiple such devices assigned to a guest. The root causes are
a little bit different between VFIO and VDPA devices.
For VFIO devices, the issue only occurs when a vIOMMU is present. In
this scenario, each vfio device is assigned a separate AddressSpace
fully mapping guest RAM. When there is no vIOMMU, the devices are all
within the same AddressSpace so no additional memory limit is needed.
For VDPA devices, each device requires the full memory to be mapped
regardless of whether there is a vIOMMU or not.
In order to enable these scenarios, we need to multiply memlock limit
by the number of VDPA devices plus the number of VFIO devices for guests
with a vIOMMU. This has the potential for pushing the memlock limit
above the host physical memory and negating any protection that these
locked memory limits are providing, but there is no other short-term
solution.
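As a rough illustration of the multiplication described above, the
following C sketch computes the factor and the resulting limit. The
helper names (memlock_factor, memlock_limit_kib) and their parameters
are hypothetical and do not reproduce libvirt's actual code. Counting
VFIO devices without a vIOMMU as one shared mapping is an assumption
drawn from the description of the shared AddressSpace.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: how many full copies of guest RAM may need to be
 * locked for the given device mix. */
static size_t
memlock_factor(size_t nvfio, size_t nvdpa, bool has_viommu)
{
    /* Each VDPA device maps all guest memory on its own. */
    size_t factor = nvdpa;

    if (nvfio > 0) {
        /* With a vIOMMU every VFIO device has its own AddressSpace;
         * without one (assumption) they share a single mapping. */
        factor += has_viommu ? nvfio : 1;
    }

    return factor;
}

/* Hypothetical helper: scale the guest RAM size (in KiB) by the factor. */
static unsigned long long
memlock_limit_kib(unsigned long long guest_ram_kib,
                  size_t nvfio, size_t nvdpa, bool has_viommu)
{
    size_t factor = memlock_factor(nvfio, nvdpa, has_viommu);

    /* Guard against overflow of the multiplication. */
    assert(factor == 0 || guest_ram_kib <= ULLONG_MAX / factor);

    return guest_ram_kib * factor;
}
```

For a 4 GiB guest with two VFIO devices, one VDPA device and a vIOMMU,
this sketch yields three times the guest RAM, which shows how quickly
the limit can exceed host physical memory.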
In the future, there should be a revised userspace iommu interface
(iommufd) that the VFIO and VDPA backends can make use of. This will be
able to share locked memory limits between both vfio and vdpa use cases
and address spaces, and then we can disable these short-term hacks. But
this is still in development upstream.
libxl: Fix build with recent Xen that introduces new disk backend type
The Xen toolstack has recently gained basic Virtio support which, besides
adding various virtio-related pieces, introduces a new disk backend type
LIBXL_DISK_BACKEND_STANDALONE [1].
Unfortunately, this caused a regression in the libvirt build with Xen support
enabled, reported by the osstest today [2]:
CC libxl/libvirt_driver_libxl_impl_la-xen_xl.lo
../../src/libxl/xen_xl.c: In function 'xenParseXLDisk':
../../src/libxl/xen_xl.c:779:17: error: enumeration value 'LIBXL_DISK_BACKEND_STANDALONE'
not handled in switch [-Werror=switch-enum]
switch (libxldisk->backend) {
^~~~~~
cc1: all warnings being treated as errors
Interestingly, the switch already has a default branch (which ought to
cover such a new addition), but the error is still triggered because
-Wswitch-enum warns about an omitted enumeration value even if there is a
default label.
A similar issue exists in libxlUpdateDiskDef(), which I reproduced after
fixing the first one, but in that case the corresponding switch doesn't
have a default branch.
Fix both issues by inserting the required enumeration item to make the
compiler happy and adding an ifdef guard to be able to build against old Xen
libraries as well (without LIBXL_HAVE_DEVICE_DISK_SPECIFICATION). Also add a
default branch to the switch in libxlUpdateDiskDef().
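The pattern behind the fix can be sketched in C as follows. The enum,
the DEMO_HAVE_STANDALONE feature macro and the function names are
hypothetical stand-ins rather than the real libxl definitions. With
-Wswitch-enum every enumerator must appear as a case label even when a
default label exists, so the new value is listed explicitly and guarded
so that older headers still compile.

```c
#include <assert.h>

/* Hypothetical stand-in for a library enum that grows over time. */
typedef enum {
    DEMO_BACKEND_UNKNOWN,
    DEMO_BACKEND_PHY,
    DEMO_BACKEND_QDISK,
#ifdef DEMO_HAVE_STANDALONE
    DEMO_BACKEND_STANDALONE, /* present only in newer library versions */
#endif
} demo_backend;

/* Returns 1 for backends this sketch knows how to handle, 0 otherwise. */
static int
demo_backend_supported(demo_backend backend)
{
    switch (backend) {
    case DEMO_BACKEND_PHY:
    case DEMO_BACKEND_QDISK:
        return 1;

    case DEMO_BACKEND_UNKNOWN:
#ifdef DEMO_HAVE_STANDALONE
    /* Listed explicitly to satisfy -Wswitch-enum; not really handled. */
    case DEMO_BACKEND_STANDALONE:
#endif
    default:
        return 0;
    }
}

/* Small usage example. */
static void
demo_backend_selfcheck(void)
{
    assert(demo_backend_supported(DEMO_BACKEND_QDISK) == 1);
    assert(demo_backend_supported(DEMO_BACKEND_UNKNOWN) == 0);
}
```

The default label still catches values the code was not compiled
against, while the guarded case keeps the build warning-free on both old
and new headers.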
Please note that this patch doesn't implement proper handling of
LIBXL_DISK_BACKEND_STANDALONE and friends; it is just intended to fix
the regression immediately to unblock the osstest. It is also worth
mentioning that this patch won't cope with possible future additions.
Ján Tomko [Thu, 11 Aug 2022 16:50:00 +0000 (18:50 +0200)]
qemu: always assume QEMU_CAPS_DUMP_GUEST_MEMORY
Introduced back in 2012 by QEMU commit:
commit 783e9b4826b95e53e33c42db6b4bd7d89bdff147
introduce a new monitor command 'dump-guest-memory' to dump guest's memory
Released in QEMU 1.2.0
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Andrea Bolognani [Wed, 17 Aug 2022 13:41:57 +0000 (15:41 +0200)]
tests: Reset macOS dyld environment
This is needed to ensure the environment variables that we need
for the test program itself, specifically to load mock libraries,
do not interfere with any command that gets invoked by it, either
directly or indirectly. We already perform the same cleanup step
for LD_* variables.
This makes the test failures
error : virCommandWait:2752 : internal error: Child process
(/usr/libexec/qemu/vhost-user/test-vhost-user-gpu --print-capabilities)
unexpected fatal signal 6: dyld[8896]: symbol not found in flat
namespace '_virQEMUCapsGet'
error : qemuVhostUserFillDomainGPU:394 : operation failed: Unable to
find a satisfying vhost-user-gpu
that were showing up on macOS 12 go away.
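A minimal C sketch of such a cleanup step is shown here. The function
name is hypothetical and the exact list of variables is an assumption;
the message above only says that LD_* and DYLD_* variables are
involved.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: drop dynamic loader variables so that commands
 * spawned by a test do not inherit the test's mock libraries. */
static void
demo_reset_loader_env(void)
{
    /* Assumed variable names; the real list may differ. */
    static const char *const vars[] = {
        "LD_PRELOAD",
        "LD_LIBRARY_PATH",
        "DYLD_INSERT_LIBRARIES",
        "DYLD_FORCE_FLAT_NAMESPACE",
    };
    size_t i;

    for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
        /* unsetenv() fails only for invalid names, which would be a
         * programming error in the table above. */
        int rc = unsetenv(vars[i]);

        assert(rc == 0);
        (void)rc;
    }
}
```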
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Andrea Bolognani [Wed, 17 Aug 2022 13:37:16 +0000 (15:37 +0200)]
util: Preserve macOS dyld environment by default
The DYLD_* environment variables on macOS have the same purpose
as the LD_* variables have on Linux. Since we're preserving the
latter by default, it makes sense to do the same for the former
as well.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Michal Privoznik [Thu, 11 Aug 2022 18:57:02 +0000 (20:57 +0200)]
qemu_tpm: Don't crash if qemuTPMPcrBankBitmapToStr(NULL)
Historically, the tpm->data.emulator.activePcrBanks member was an
unsigned int but since it was used as a bitmap it was converted
to virBitmap type instead. Now, the virBitmap is allocated inside
of virDomainTPMDefParseXML() but only if <activePcrBanks/> was
found with at least one child element. Otherwise it stays NULL.
Fast forward to starting a domain with TPM 2.0 and no
<activePcrBanks/> configured. Eventually,
qemuTPMEmulatorBuildCommand() is called, which subsequently calls
qemuTPMEmulatorReconfigure() and finally
qemuTPMPcrBankBitmapToStr() passing the NULL value. Before the
rewrite to virBitmap this function would return NULL for empty
activePcrBanks, but now it crashes.
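The kind of guard that avoids this crash can be sketched in C as
follows. The demo_bitmap type, the bank name table and the function
name are hypothetical stand-ins, not libvirt's virBitmap API or the
actual fix. The point is only that a NULL bitmap is treated like an
empty one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bitmap type. */
typedef struct {
    unsigned int bits;
} demo_bitmap;

/* Illustrative bank names; bit i selects demo_bank_names[i]. */
static const char *const demo_bank_names[] = {
    "sha1", "sha256", "sha384", "sha512",
};
#define DEMO_NBANKS (sizeof(demo_bank_names) / sizeof(demo_bank_names[0]))

/* Returns a malloc'd comma-separated list of the selected banks, or NULL
 * when the bitmap is NULL or empty. The caller frees the result. */
static char *
demo_pcr_banks_to_str(const demo_bitmap *banks)
{
    char buf[64] = "";
    char *ret;
    size_t i;

    /* The guard: a missing bitmap means "nothing configured". */
    if (!banks)
        return NULL;

    for (i = 0; i < DEMO_NBANKS; i++) {
        if (!(banks->bits & (1u << i)))
            continue;

        assert(strlen(buf) + strlen(demo_bank_names[i]) + 2 <= sizeof(buf));
        if (buf[0] != '\0')
            strcat(buf, ",");
        strcat(buf, demo_bank_names[i]);
    }

    if (buf[0] == '\0')
        return NULL;

    ret = malloc(strlen(buf) + 1);
    if (ret)
        strcpy(ret, buf);
    return ret;
}
```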
Fixes: 52c7c31c8038aa31d502f59a40e4fb4ba9f61113 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Amneesh Singh [Thu, 18 Aug 2022 03:17:20 +0000 (08:47 +0530)]
qemu_driver: use qemuMonitorQueryStats to extract halt poll time
This patch uses qemuMonitorQueryStats to query "halt_poll_success_ns"
and "halt_poll_fail_ns" for every vCPU. The respective values for each
vCPU are then added together.
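The per-vCPU aggregation can be sketched in C like this. The struct and
function names are hypothetical and the sketch works on plain values
that have already been extracted; it does not call
qemuMonitorQueryStats or any other libvirt API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vCPU record holding the two queried statistics. */
typedef struct {
    unsigned long long halt_poll_success_ns;
    unsigned long long halt_poll_fail_ns;
} demo_vcpu_stats;

/* Adds up both counters over all vCPUs. */
static void
demo_sum_halt_poll(const demo_vcpu_stats *vcpus, size_t nvcpus,
                   unsigned long long *success_ns,
                   unsigned long long *fail_ns)
{
    size_t i;

    assert(success_ns != NULL && fail_ns != NULL);

    *success_ns = 0;
    *fail_ns = 0;

    for (i = 0; i < nvcpus; i++) {
        *success_ns += vcpus[i].halt_poll_success_ns;
        *fail_ns += vcpus[i].halt_poll_fail_ns;
    }
}
```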
Signed-off-by: Amneesh Singh <natto@weirdnatto.in> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
This patch adds an API for the "query-stats" QMP command.
The query returns JSON containing the statistics based on the target,
which can either be vCPU or VM, and the providers. The API deserializes
the query result into an array of GHashTables, which can later be used to
extract all the query statistics. GHashTables are used to avoid traversing
the entire array to find the statistics you are looking for. This would
be a singleton array if the target is a VM since the returned JSON is
also a singleton array in that case.
Signed-off-by: Amneesh Singh <natto@weirdnatto.in> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Since we cannot properly plug a new VM into the distributed switch, we can at
least report the provided pieces of information, so that XML editing still works
even for VMs with such interfaces.
vmx: Require networkName for bridged and custom NICs
Commit 70768cda9740 marked this particular config string optional, but
forgot that two of the interface types still require this name to
exist. Mark it as optional only if there is no connectionType.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 70768cda9740 added functionality that was previously (in an unsubmitted
version of the commit) represented differently in the XML, but the filenames
kept the old name. Fix the names so they are not misleading.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jim Fehlig [Thu, 11 Aug 2022 22:36:24 +0000 (16:36 -0600)]
schema: Add maxphysaddr element to hostcpu
The output of "virsh capabilities" was not conformant to the
capability.rng schema. Add the missing element to the schema.
Fixes: c647bf29afb9890c792172ecf7db2c9c27babbb6 Signed-off-by: Tim Wiederhake <twiederh@redhat.com> Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jim Fehlig [Thu, 11 Aug 2022 22:13:36 +0000 (16:13 -0600)]
schema: Remove optional nesting in hostcpu rng
The hostcpu rng has an optional "model" element, with the remaining
elements each within a nested optional. Remove the optional nesting
and have each element explicitly listed as optional.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
While we assume that -blockdev is supported, the validator still had some
corner cases for -drive. Since we use '-drive' exclusively for the
extremely rarely used SD cards, it makes no sense to keep that validation.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Thu, 21 Jul 2022 13:25:24 +0000 (15:25 +0200)]
qemu: command: Generate -drive for SD cards via JSON props
Since we know we have a modern qemu at hand which can interpret the
dotted syntax, we can format the -drive needed for SD cards via the
common infrastructure we have for all blockdev stuff.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 20 Jul 2022 09:26:12 +0000 (11:26 +0200)]
qemu: monitor: Remove unused qemuMonitorQueryNamedBlockNodes and clean up
The top level API is unused so it can be removed but internally the JSON
version is called by other monitor commands which extract information
from the reply.
Thus qemuMonitorJSONQueryNamedBlockNodes is unexported and moved
appropriately.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 19 Jul 2022 13:45:22 +0000 (15:45 +0200)]
qemu: block: Remove legacy spellings for InetSocketAddress
In one of the early iterations of the gluster driver 'tcp' was used instead
of 'inet' and 'socket' instead of 'path' for unix sockets. All of this
can now be removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 20 Jul 2022 18:08:37 +0000 (20:08 +0200)]
qemu: Refactor access to 'qomName' field of the qemu disk private data
The code which fills 'qomName' does so only when the blockdev capability
is enabled, so we don't have to check it separately as it can only be
non-NULL when blockdev is used.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Fri, 15 Jul 2022 14:58:34 +0000 (16:58 +0200)]
qemu: capabilities: Unconditionally set QEMU_CAPS_BLOCKDEV/QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI
The cleanup of the code to always assume support for QEMU_CAPS_BLOCKDEV
will not be simple, so for now we hardcode the support and the code will
be cleaned up gradually.
We also prevent users from clearing the flags via the namespace property or
qemu.conf configuration.
The change to the PPC64 test data originates from the fact that the
capability dump is not from the release version but is lacking one of
the necessary flags to enable -blockdev.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>