Laine Stump [Thu, 14 Jan 2021 16:57:45 +0000 (11:57 -0500)]
network: explicitly set the MTU of the bridge device.
In the past, the MTU of libvirt virtual network bridge devices was
implicitly set by setting the MTU of the "dummy tap device" (which was
being added in order to force a particular MAC address from the
bridge). But the dummy tap device was removed in commit ee6c936fbb
(libvirt-6.8.0), and so the mtu setting in the network is ignored.
The solution is, of course, to explicitly set the bridge device MTU
when it is created.
Note that any guest interface with a larger MTU that is attached will
cause the bridge to (temporarily) assume the larger MTU, but it will
revert to the bridge's own MTU when that device is deleted (this is
not due to anything libvirt does; it's just how Linux host bridges
work).
Fixes: ee6c936fbbfb217175326f0201d59cc6727a0678
Resolves: https://bugzilla.redhat.com/1913561 Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Tue, 12 Jan 2021 19:10:05 +0000 (14:10 -0500)]
qemu: don't set interface MTU when managed='no'
managed='no' on an <interface> allows an unprivileged libvirt to use a
pre-created tap/macvtap device that libvirt has permission to
open/read/write, but no permission to modify (i.e. set the MTU or MAC
address). But when the XML had an <mtu size='blah'/> setting (which
was put there in order to tell the *guest* OS what MTU to set for the
emulated device at the other end of the tap) we were attempting to set
the MTU of the tap device on the host, paying no attention to the
setting of 'managed'. That would of course end in failure.
This patch only sets the MTU if managed='no' is *not* set (so, if it
is 'yes', or just not set at all).
Note that MTU of the tap is also set when connecting the tap to a
bridge device, but managed='no' is only allowed for <interface
type='ethernet'>, which would never attach to a bridge anyway, so we
don't need the check there.
Resolves: https://bugzilla.redhat.com/1905929 Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Shi Lei [Mon, 11 Jan 2021 02:23:37 +0000 (10:23 +0800)]
netlink: Introduce a helper function to simplify netlink functions
Extract common code as helper function virNetlinkTalk, then simplify
the functions virNetlink[DumpLink|NewLink|DelLink|GetNeighbor].
Signed-off-by: Shi Lei <shi_lei@massclouds.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Shi Lei [Mon, 11 Jan 2021 02:23:35 +0000 (10:23 +0800)]
netlink: Minor changes for macros NETLINK_MSG_[NEST_START|NEST_END|PUT]
Move macros NETLINK_MSG_[NEST_START|NEST_END|PUT] from .h into .c;
within these macros, replace 'goto' with reporting error and returning;
simplify virNetlinkDumpLink and virNetlinkDelLink by using NETLINK_MSG_PUT.
Signed-off-by: Shi Lei <shi_lei@massclouds.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Erik Skultety [Thu, 14 Jan 2021 10:20:36 +0000 (11:20 +0100)]
gitlab-ci.yml: Convert only/except to the rules syntax
'rules' syntax replaces the only/except syntax with which it is
mutually exclusive. In some cases the 'rules' syntax is more readable
than the 'only/except' equivalent, in some cases it is not.
The idea behind this conversion is to introduce an explicit env variable
controlling the 'allow_failure' attribute which would then be attached
to a broken build job which would in turn result in a soft failure.
Such behaviour is not possible to achieve with the older 'only/except'
syntax.
Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Erik Skultety [Wed, 13 Jan 2021 16:45:06 +0000 (17:45 +0100)]
gitlab-ci.yml: Replace template anchors with extends
'extends' is slightly more readable and definitely more flexible in
terms of allowing includes of templates.
The main reason for this patch though is that the next patch converts
the 'only/except' syntax to the new (preferable) 'rules' syntax.
Variable anchors are still kept intact because the use case there is
different from regular template anchors.
Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Peter Krempa [Thu, 14 Jan 2021 12:57:52 +0000 (13:57 +0100)]
conf: disk: Parse and format <metadata_cache> also for <mirror>
Commit 154df5840d added support for <metadata_cache> as property of a
<disk>. Since the same parser is used to parse the XML used with
virDomainBlockCopy it starts the copy job with the appropriate cache
configured, but the <mirror> doesn't show this configuration nor it's
preserved if libvirtd is restarted during the mirror.
Add parsing, formatting and tests for <metadata_cache> for a <mirror>.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Andrea Bolognani [Tue, 12 Jan 2021 16:17:44 +0000 (17:17 +0100)]
qemu: Fix memstat for (non-)transitional memballoon
Depending on the memballoon model, the corresponding QOM node
will have a different type and we need to account for this
when searching for it in the QOM tree.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 11 Jan 2021 09:40:43 +0000 (10:40 +0100)]
qemuMigrationSrcNBDStorageCopyReady: Use ready-state of mirror from qemuBlockJobData
Use the per-job state to determine when the non-shared-storage mirror is
complete rather than the per-disk definition one. The qemuBlockJobData
is a newer approach and is always cleared after the blockjob is
terminated while the 'mirrorState' variable in the definition of the
disk may be left over. In such case the disk mirror would be considered
complete prematurely.
openvswitch: Check if OVS_VSCTL exists when getting interface name
So far we assumed that any vhostuser interface is plugged into an
OVS bridge and thus 'ovs-vsctl' exists. But this is not always
true. In testing scenarios it is possible to create a vhostuser
interface with this tool dpdk-testpmd (part of dpdk RPM) which
creates/connects to UNIX socket needed for vhostuser. Of course,
since there is no OVS then there is no interface name in which
case virNetDevOpenvswitchGetVhostuserIfname() should return 0.
The rest of APIs that assume OVS are not 'fixed' because we still
want them to fail (e.g. getting statistics, plugging interface
into an OVS bridge, unplugging it from an OVS bridge, ...).
The only API that is fixed is
virNetDevOpenvswitchGetVhostuserIfname() because it is called
explicitly when starting a guest (and callers are okay if no name
was found).
The other way to fix this bug seems to be to simply require
'ovs-vsctl' on spec file level, but that is too heavy gun given
that vhostuser is used by a small set of our users (assumption
made on requirements for vhostuser). Also, this way would drag in
yet another dependency for all users (even those who want minimal
libvirt).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1913156 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Erik Skultety [Fri, 8 Jan 2021 16:23:32 +0000 (17:23 +0100)]
docs: kbase: sev: Adjust the claims that virtio-blk doesn't work
Using virtio-blk with SEV on host kernels prior to 5.1 didn't work
because of SWIOTLB limitations and the way virtio has to use it over
DMA-API for SEV (see [1] for detailed info). That is no longer true, so
reword the kbase article accordingly.
Tim Wiederhake [Fri, 8 Jan 2021 14:43:02 +0000 (15:43 +0100)]
cpu-data: Pretend to always run on logical processor #0
The output of cpuid depends on the logical processor id the process
runs on, as reflected by the "local apic id" present in cpuid leaves
(eax=1,ebx=0), (eax=11,ebx=0), and (eax=11,ebx=1). This produces
arbitrary changes in the output files that complicate comparisons.
This patch masks the occurences of the local apic id with 0x00, so
that two consecutive runs of "./cpu-data.py gather" produce identical
results.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Laine Stump [Fri, 8 Jan 2021 05:36:31 +0000 (00:36 -0500)]
call virDomainNetNotifyActualDevice() for all interface types
Now that this function can be called regardless of interface type (and
whether or not we have a conn for the network driver), let's actually
call it for all interface types. This will assure that we re-connect
any disconnected bridge devices for <interface type='bridge'> as
mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1730084#c26
(until now we've only been reconnecting bridge devices for <interface
type='network'>)
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 20 Oct 2020 16:35:09 +0000 (12:35 -0400)]
conf: make virDomainNetNotifyActualDevice() callable for all interface types
The bridge reattach functionality in this function should be called
for interface types other than just type='network', so make it
callable for any type - it just becomes a NOP for types where no
action is needed.
In the case of <interface type='network'> we need to create a port in
the network driver, and for both type='network and type='bridge' we
need to reattach the bridge device (note that
virDomainNetGetActualBridgeName() gets the bridge name from the
appropriate (and different!) location for either type of interface).
All other interfaces currently require no action.
modifying callers of this function to actually call it for all
interface types is in the next patch. For now the behavior should be
identical pre and post-patch.
(NB: the conn argument can now legitimately be NULL, so we need to
change the ATTRIBUTE_NONNULL() directive for the function's
declaration - I noticed when making this change that argument 3 (the
NetDefPtr) could never be NULL, so I added ATTRIBUTE_NONNULL(3) while
removing ATTRIBUTE_NONNULL(1) (conn)).
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>Reviewed-by: Michal Privoznik <mprivozn@redhat.com>#Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Fri, 8 Jan 2021 01:03:05 +0000 (20:03 -0500)]
util: Skip over any extra verbiage preceding version in dnsmasq version string
dnsmasq usually prints out a version string like this:
Dnsmasq version 2.82 [...]
but a user reported that the build of dnsmasq included with pihole has
a version string like this:
Dnsmasq version pi-hole-2.81 [...]
We parse the dnsmasq version number to figure out if the dnsmasq
binary supports certain features. Since we expect the version number
(and it must be only numbers!) to start on the first non-space after
the string "Dnsmasq version", we fail to parse this format of the
version string.
Rather than spending a bunch of time trying to get pihole to change
that, we can just make our parsing more permissive - after searching
for "Dnsmasq version", we'll skip ahead to the first decimal digit,
rather than just the first non-space.
(NB: The features we're checking for purely by looking at version
number have been in all releases of dnsmasq since at least 2012, so we
could actually just remove the reading of the version number
completely. However it's possible (although *highly* unlikely)
that some new feature would be added to dnsmasq in the future and we
would need to add that code back.)
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/29 Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Fri, 8 Jan 2021 00:55:43 +0000 (19:55 -0500)]
util: new function virSkipToDigit()
This function skips over the beginning of a string until it reaches a
decimal digit (0-9) or the NULL at the end of the string. The original
pointer is modified in place (similar to virSkipSpaces()).
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Thu, 7 Jan 2021 14:30:21 +0000 (15:30 +0100)]
conf: snapshot: Add support for <metadata_cache>
Similarly to the domain config code it may be beneficial to control the
cache size of images introduced as snapshots into the backing chain.
Wire up handling of the 'metadata_cache' element.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 6 Jan 2021 16:19:03 +0000 (17:19 +0100)]
schema: secret: Relax requirements for usage name
There's plenty of existing documentation [1] which shows as example a
name which contains a space and a dot ('client.admin secret') as ceph
usage name.
Use a more relaxed type in the RNG schema since the usage name is
actually just a string used to look up the secret.
[1]:
https://docs.ceph.com/en/latest/rbd/libvirt/#configuring-the-vm
https://documentation.suse.com/ses/6/html/ses-all/cha-ceph-libvirt.html#ceph-libvirt-cfg-vm
Libvirt docs were correct though:
https://libvirt.org/formatsecret.html#CephUsageType
Peter Krempa [Wed, 6 Jan 2021 15:51:21 +0000 (16:51 +0100)]
schema: Add define for object names
Objects such as domain, pool, etc re-define the regex for the format.
Add more generic types for objects with/without a slash which we'll be
able to reuse also for other objects.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 6 Jan 2021 16:12:03 +0000 (17:12 +0100)]
schema: domaincommon: Remove pointless 'choice' from 'inituser'/'initgroup'
'genericName' allows arbitrary numeric strings so using an explicit
'unsignedInt' choice is pointless. The elements take an username or a
uid which is prefixed by '+', both of which are covered by
'genericName'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Thu, 7 Jan 2021 09:19:22 +0000 (10:19 +0100)]
qemuDomainSetBlockIoTune: Skip monitor call for empty cdrom
Similarly to startup of the VM qemu doesn't like setting throttling for
an empty drive. Just skip it since we do the correct thing once new
media is inserted.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/117 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Han Han <hhan@redhat.com>
Peter Krempa [Mon, 30 Nov 2020 13:14:22 +0000 (14:14 +0100)]
qemuBuildChrChardevStr: Rename 'flags' to 'cdevflags'
The monitor code uses 'flags' for the flags of the monitor builder,
while in this function it's a different set of flags. All callers pass a
variable named 'cdevflags', so rename the argument to suit.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 30 Nov 2020 15:23:55 +0000 (16:23 +0100)]
qemuMonitorAddObject: Refactor cleanup
Remove freeing/clearing of @props as the function doesn't guarantee that
it happens on success, rename the variable hodling copy of the alias and
use g_autofree to automatically free it and remove the cleanup label as
well as 'ret' variable.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 30 Nov 2020 15:21:18 +0000 (16:21 +0100)]
qemuMonitorAddObject: Fix semantics of @alias
The callers of qemuMonitorAddObject rely on the fact that @alias is
filled only when the object is added successfully. This is documented
but the code didn't behave like that.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 30 Nov 2020 14:34:56 +0000 (15:34 +0100)]
qemuMonitorJSONMakeCommandInternal: Clear @arguments when stolen
All callers of qemuMonitorJSONMakeCommandInternal will benefit from
making @arguments a double pointer and passing it to
virJSONValueObjectCreate directly which will clear it if it steals the
value.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Erik Skultety [Thu, 7 Jan 2021 15:53:21 +0000 (16:53 +0100)]
hostdev: mdev: Lookup mdevs by sysfs path rather than mdev struct
The lookup didn't do anything apart from comparing the sysfs paths
anyway since that's what makes each mdev unique.
The most ridiculous usage of the old logic was in
virHostdevReAttachMediatedDevices where in order to drop an mdev
hostdev from the list of active devices we first had to create a new
mdev and use it in the lookup call. Why couldn't we have used the
hostdev directly? Because the hostdev and mdev structures are
incompatible.
The way mdevs are currently removed is via a write to a specific sysfs
attribute. If you do it while the machine which has the mdev assigned
is running, the write call may block (with a new enough kernel, with
older kernels it would return a write error!) until the device
is no longer in use which is when the QEMU process exits.
The interesting part here comes afterwards when we're cleaning up and
call virHostdevReAttachMediatedDevices. The domain doesn't exist
anymore, so the list of active hostdevs needs to be updated and the
respective hostdevs removed from the list, but remember we had to
create an mdev object in the memory in order to find it in the list
first which will fail because the write to sysfs had already removed
the mdev instance from the host system.
And so the next time you try to start the same domain you'll get:
"Requested operation is not valid: mediated device <path> is in use by
driver QEMU, domain <name>"
Fixes: https://gitlab.com/libvirt/libvirt/-/issues/119 Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Erik Skultety [Thu, 7 Jan 2021 15:48:40 +0000 (16:48 +0100)]
hostdev: Update mdev pointer reference after checking device type
We set the pointer to some garbage packed structure data without
knowing whether we were actually handling the type of device we
expected to be handling. On its own, this was harmless, because we'd
never use the pointer as we'd skip the device if it were not the
expected type. However, it's better to make the logic even more
explicit - we first check the device and only when we're sure we have
the expected type we then update the pointer shortcut.
Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Wed, 6 Jan 2021 20:42:47 +0000 (15:42 -0500)]
util: validate pcie_cap_pos != 0 in virDeviceHasPCIExpressLink()
virDeviceHasPCIExpressLink() wasn't checking that pcie_cap_pos was
valid before attempting to use it, which could lead to reading the
byte at offset 0 + PCI_CAP_ID_EXP instead of [valid offset] +
PCI_CAP_ID_EXP. In particular, this could happen for "integrated" PCI
devices (those that are on the PCIe root complex). If it happened that
the byte from the wrong address had the "right" bit set, then it would
lead to us innappropriately believing that Express Link info was
available when it wasn't, and the node device driver would then log an
error like this:
virPCIDeviceGetLinkCapSta:2754 :
internal error: pci device 0000:00:18.0 is not a PCI-Express device
during a libvirtd restart. (this didn't ever occur until after
virPCIDeviceIsPCIExpress() was made more intelligent in commit c00b6b1ae, which hasn't yet been in any official release)
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Wed, 6 Jan 2021 18:44:40 +0000 (13:44 -0500)]
lxc: eliminate leaked and dangling pointers in virLXCProcessSetupInterfaceTap
The two scenarios were found by Coverity after a seemingly-unrelated
change to virLXCProcessSetupInterfaceTap() (in commit ecfc2d5f43), and
explained by John Ferlan here:
a) On entry to virLXCProcessSetupInterfaceTap() if net->ifname != NULL
then a copy of net->ifname is made into parentVeth, and a reference
to *that* pointer is sent down to virNetDevVethCreate().
b) If parentVeth (aka net->ifname) is a template name (e.g. "blah%d"),
then virNetDevVethCreate() calls virNetDevGenerateName(), and if
virNetDevGenerateName() successfully generates a usable name
(e.g. "blah27") then it will free the original template string
(which is pointed to by net->ifname and by parentVeth), then
replace the pointer in parentVeth with a pointer to the new
string. Note that net->ifname still points to the now-freed
template string.
c) returning back up to virLXCProcessSetupInterfaceTap(), we check if
net->ifname == NULL - it *isn't* (still contains stale pointer to
template string), so we don't replace it with the pointer to the new
string that is in parentVeth.
d) Result: the new string is leaked once we return from
virLXCProcessSetupInterfaceTap(), while there is a dangling pointer
to the old string in net->ifname.
There is also a leak if there is a failure somewhere between steps (b)
and (c) above - the failure cleanup in virNetDevVethCreate() will only
free the newly-generated parentVeth string if the original pointer was
NULL (narrator: "It wasn't."). But it's a new string allocated by
virNetDevGenerateName(), not the original string from net->ifname, so
it really does need to be freed.
The solution is to make a copy of the entire original string into a
g_autofree pointer, then iff everything is successful we g_free() the
original net->ifname and replace it by stealing the string returned by
virNetDevVethCreate().
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 21 Dec 2020 01:35:02 +0000 (20:35 -0500)]
lxc: remove unnecessary call to virNetDevReserveName()
In all cases *except* when parsing status XML as libvirt is being
restarted, the XML parser will delete any manually specified interface
name (aka "<target dev='blah'/>" aka net->ifname) that could have been
generated by virNetDevGenerateName(). This means that during the setup
when a domain is being started (e.g. during
virLXCProcessSetupInterfaceTap()) it is pointless to call
virNetDevReserveName() with any setting of net->ifname that has come
from the XML parser - it is guaranteed to not fit the pattern of any
auto-generated name, and so the call is just a NOP anyway.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>