Julio Faracco [Wed, 22 Apr 2020 20:05:57 +0000 (17:05 -0300)]
conf: Add <lease/> option for <dhcp/> settings
If a user is trying to configure DHCP network settings, it is not
possible to change the lease time of a range or a host entry. This is
available using dnsmasq extra options, but they are associated with
dhcp-range or dhcp-host fields. This patch implements a lease time for
range and host tags, defined with a <lease/> element under those settings.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Mon, 20 Apr 2020 14:12:03 +0000 (16:12 +0200)]
udevHandleOneDevice: Remove old instance of device on "move"
When a device is "move"-d (this basically means it was renamed),
we add the new device onto our list but keep the old one there too.
Fortunately, udev sets this DEVPATH_OLD property which points to
the old device path. We can use it to remove the old instance.
To test this try renaming an interface, for instance:
# ip link set tunl0 name tunl1
# ip link set tunl1 name tunl0
One problem with udev is that it sends the old ifname in the INTERFACE
property. That is a problem for us because the property is where
we get the ifname from, which we then use to query all kinds of info
about the interface. If the interface no longer exists under that
name, we can't query anything. This happens if ifname rename is
suppressed (net.ifnames=0 on kernel cmd line for instance).
Fortunately, we can use the "kernel" source for udev events, which
always has fresh info.
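The idea of dropping the stale entry on "move" can be sketched as below.
This is only an illustration under stated assumptions: struct dev_list,
dev_add(), dev_remove() and handle_move() are hypothetical names, the list
is a fixed-size array keyed by sysfs path, and none of it is the actual
node device driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8
#define PATH_LEN 128

/* Hypothetical stand-in for the node device list, keyed by sysfs path. */
struct dev_list {
    char paths[MAX_DEVS][PATH_LEN];
    int count;
};

/* Return the index of path in the list, or -1 if it is not recorded. */
static int dev_find(const struct dev_list *l, const char *path)
{
    for (int i = 0; i < l->count; i++) {
        if (strcmp(l->paths[i], path) == 0)
            return i;
    }
    return -1;
}

static bool dev_add(struct dev_list *l, const char *path)
{
    if (l->count == MAX_DEVS || strlen(path) >= PATH_LEN)
        return false;
    strcpy(l->paths[l->count++], path);
    return true;
}

static bool dev_remove(struct dev_list *l, const char *path)
{
    int i = dev_find(l, path);

    if (i < 0)
        return false;
    /* Fill the hole with the last entry, then shrink the list. */
    if (i != l->count - 1)
        strcpy(l->paths[i], l->paths[l->count - 1]);
    l->count--;
    return true;
}

/* On "move": drop the entry named by DEVPATH_OLD (when udev provided
 * one), then record the device under its new path. */
static bool handle_move(struct dev_list *l,
                        const char *devpath_old,
                        const char *devpath_new)
{
    assert(l != NULL && devpath_new != NULL);
    if (devpath_old != NULL)
        dev_remove(l, devpath_old);
    return dev_add(l, devpath_new);
}
```

Without the dev_remove() step the list would hold both the old and the
new path after a rename, which is the situation this commit addresses.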
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Michal Privoznik [Mon, 20 Apr 2020 13:59:19 +0000 (15:59 +0200)]
node_device_udev: Split udevRemoveOneDevice() into two
Move internals of udevRemoveOneDevice() into a separate function
which accepts sysfs path as an argument and actually removes the
device from the internal list. It will be reused later.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Michal Privoznik [Mon, 20 Apr 2020 13:40:01 +0000 (15:40 +0200)]
udevRemoveOneDevice: Unlock node device obj upon return
When removing a node device object from the internal list the
udevRemoveOneDevice() function does plain unref over the object.
This is not sufficient. If there is another thread that's waiting
for the object lock it will wait forever.
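A minimal sketch of why a plain unref is not enough, assuming a
hypothetical struct obj whose lock is modelled as a simple flag (the
real objects use proper mutexes and libvirt's own helpers):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lockable, reference-counted object. */
struct obj {
    int refs;
    bool locked;
};

static void obj_lock(struct obj *o)
{
    assert(!o->locked);
    o->locked = true;
}

static void obj_unlock(struct obj *o)
{
    assert(o->locked);
    o->locked = false;
}

static void obj_unref(struct obj *o)
{
    assert(o->refs > 0);
    o->refs--;
}

/* Buggy variant: drops our reference but leaves the lock held, so
 * anybody else waiting for the lock never gets it. */
static void release_unref_only(struct obj *o)
{
    obj_unref(o);
}

/* Fixed variant: release the lock first, then drop the reference. */
static void release_unlock_and_unref(struct obj *o)
{
    obj_unlock(o);
    obj_unref(o);
}
```

As long as another thread still holds a reference, the object outlives
the unref, so the lock has to be released explicitly.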
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Ján Tomko [Tue, 21 Apr 2020 16:35:59 +0000 (18:35 +0200)]
conf: split out virDomainFeaturesDefParse
The virDomainDefParseXML function has grown so large it broke the build:
../../src/conf/domain_conf.c:20362:1: error: stack frame size of 4168 bytes
in function 'virDomainDefParseXML' [-Werror,-Wframe-larger-than=]
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Andrea Bolognani [Tue, 21 Apr 2020 17:06:16 +0000 (19:06 +0200)]
virsh: Fix return code for dump and migrate
When the job monitoring logic was refactored, these two commands
were not converted properly and the result is that a successful
dump or migration (char '0') would be reported as a failed one
(int 48) instead.
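The char/int mix-up can be sketched as follows. The function names are
hypothetical and the one-character result protocol is an assumption made
for illustration; this is not the virsh code itself.

```c
#include <assert.h>

/* Hypothetical job monitor that reports its result as a character,
 * with '0' meaning success. */
static char job_result_char(int failed)
{
    return failed ? '1' : '0';
}

/* Buggy conversion: the character code is used as the status, so a
 * successful '0' becomes 48 in ASCII, i.e. a failure. */
static int exit_status_buggy(char result)
{
    return result;
}

/* Fixed conversion: compare against '0' and map to 0 or 1. */
static int exit_status_fixed(char result)
{
    assert(result == '0' || result == '1');
    return result == '0' ? 0 : 1;
}
```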
Fixes: dc0771cfa2e78ffecd7c8234538ee548748d7bef Reported-by: Brian Rak <brak@gameservers.com> Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Andrea Bolognani [Mon, 20 Apr 2020 10:49:09 +0000 (12:49 +0200)]
CONTRIBUTING: Include information on build dependencies
libvirt depends on a ton of packages, so trying to install them
all by using the classic approach of repeatedly running configure
and reacting to each failure by installing the corresponding
missing package will inevitably lead to frustration.
Luckily there's an easy solution to get most dependencies
installed in one fell swoop, and we just need to document it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
Jim Fehlig [Fri, 17 Apr 2020 20:19:16 +0000 (14:19 -0600)]
tests: check conversion of passthrough hypervisor feature
Add a new test to check the 'mode' attribute of the passthrough element
and augment an existing, related test to check enablement of the
passthrough element only.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jim Fehlig [Wed, 15 Apr 2020 22:34:54 +0000 (16:34 -0600)]
conf: add xen hypervisor feature 'passthrough'
'passthrough' is a Xen-specific guest configuration option new to Xen 4.13
that enables IOMMU mappings for a guest and hence determines whether it
supports PCI passthrough. The default is disabled. See the xl.cfg(5) man
page and xen.git commit babde47a3fe for more details.
The default state of disabled prevents hotplugging PCI devices. However,
if the guest configuration contains a PCI passthrough device at time of
creation, libxl will automatically enable 'passthrough' and subsequent
hotplugging of PCI devices will also be possible. It is not possible to
unconditionally enable 'passthrough' since it would introduce a migration
incompatibility due to guest ABI change. Instead, introduce another Xen
hypervisor feature that can be used to enable guest PCI passthrough.
To allow finer control over how IOMMU maps to guest P2M table, the
passthrough element also supports a 'mode' attribute with values
restricted to sync_pt and share_pt, similar to the xl.cfg(5) 'passthrough'
setting.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
e820_host is a Xen-specific option, only available for PV domains, that
provides the domain a virtual e820 memory map based on the host one. It
is enabled with a new Xen hypervisor feature, e.g.
e820_host is required when using PCI passthrough and is generally
considered safe for any PV kernel. e820_host is silently ignored if set
in HVM domain configuration. See xl.cfg(5) man page in the Xen
documentation for more details.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com>
The udev monitor thread "udevEventHandleThread()" will lag the
actual/real view of devices in sysfs as it serially processes udev
monitor events. So, for instance, if you were to create a new veth pair
and rename one of the veth endpoints, the monitor events and the real
state of sysfs might look like this:
                                      time
                                       |  create v0 sysfs entry
wake udevEventHandleThread             |  create v1 sysfs entry
udev_monitor_receive_device(v1-add)    |  move v0 sysfs to v2
udevHandleOneDevice(v1)                |
udev_monitor_receive_device(v0-add)    |
udevHandleOneDevice(v0)                |  <--- error msgs in virNetDevGetLinkInfo()
udev_monitor_receive_device(v2-move)   |       as v0 no longer exists
udevHandleOneDevice(v2)                |
                                       \/
As you can see the changes in sysfs can take place well before we get
to act on the events in the udevEventHandleThread(), so by the time we
get around to processing the v0 add event, the sysfs entry has been
moved to v2.
To work around this we check if the sysfs entry is valid before
attempting to read it and don't bother trying to read link info if
not. This is safe since we will never read a sysfs entry before
it exists, i.e. if the entry is not there it has either been removed
in the time since we enumerated the device or something bigger is
busted. In either case: no sysfs entry, no link info. In the case
described above we will eventually get the link info as we work
through the queue of monitor events and get to the 'move' event.
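A minimal sketch of the existence check, assuming POSIX access() as the
test and two hypothetical helpers, sysfs_entry_exists() and
maybe_read_link_info(); the actual patch may use a different helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Return true if the sysfs entry is still present. */
static bool sysfs_entry_exists(const char *sysfs_path)
{
    assert(sysfs_path != NULL);
    return access(sysfs_path, F_OK) == 0;
}

/* Hypothetical caller: returns 1 if link info would be read, 0 if the
 * read was skipped because the entry is gone. */
static int maybe_read_link_info(const char *sysfs_path)
{
    if (!sysfs_entry_exists(sysfs_path))
        return 0;   /* moved or removed: no sysfs entry, no link info */

    /* ... a real implementation would read link state and speed here ... */
    return 1;
}
```

The check is inherently racy (the entry can vanish right after it), so it
only avoids the common case of noisy errors rather than all of them.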
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Mark Asselstine [Thu, 16 Apr 2020 15:57:45 +0000 (11:57 -0400)]
node_device_udev: handle move events
It is possible and common to rename some devices, this is especially
true for ethernet devices such as veth pairs.
In the udevEventHandleThread() we will be notified of this change but
currently we only process "add", "change" and "remove"
events. Renaming a device such as above results in a "move" event, not
a "remove" followed by an "add" or vice versa. This change will add
the new/destination device to our records but unfortunately there is
no usable mechanism to identify the old/source device to remove it
from the records. So this is admittedly only a partial fix.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Philipp Hahn [Mon, 20 Apr 2020 13:01:11 +0000 (15:01 +0200)]
doc/python: Update to Python 3
Convert the simple example to Python 3 syntax:
- print() is a function
- do not use bare except
- libvirt.open*() does not return None but raises an exception
- APIs related to the graphics adapter are no longer on the
IMachine interface, but on a IGraphicsAdapter interface
- The LaunchVMProcess method takes a list of env variables
instead of a single variable containing a concatenated
list. Since we only ever pass a single env variable, we
can simply stuff it straight into a list.
- The DHCP server start method no longer needs the network
name
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
- The CreatedSharedFolder method now accepts a target mount
point. Since we don't request automount, we're just passing
NULL. We could, however, use this to pass the desired
mount target from the XML config in future.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Long ago we switched the vbox driver to run inside libvirtd to avoid
libvirt.so being polluted with GPLv2-only code. Since libvirtd is not
built on Windows, we disabled vbox on Windows builds. Thus the MSCOM
glue code is not required.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Sun, 19 Apr 2020 05:25:34 +0000 (07:25 +0200)]
virNetDevSwitchdevFeature: Make failure to get 'family_id' non-fatal
I've just got a new machine and I'm still converging on the
kernel config. Anyway, since I haven't enabled any of the SR-IOV
drivers, my kernel doesn't have NET_DEVLINK enabled (i.e.
virNetDevGetFamilyId() returns 0). But this makes the nodedev driver
ignore all interfaces, because when enumerating all devices via
udev, the control reaches virNetDevSwitchdevFeature() eventually
and subsequently virNetDevGetFamilyId() which 'fails'. Well, it's
not really a failure - the virNetDevSwitchdevFeature() stub
simply returns 0.
Also, move the call a few lines below, just around the place
where it's needed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Michal Privoznik [Sun, 19 Apr 2020 06:26:04 +0000 (08:26 +0200)]
virNetDevGetFamilyId: Change signature
Introduced in v3.8.0-rc1~96, the virNetDevGetFamilyId() gets
netlink family ID for passed family name (even though it's used
only for getting "devlink" ID). Nevertheless, the function
returns 0 on an error or if no family ID was found. This makes it
harder for a caller to distinguish these two. Change the retval
so that a negative value is returned upon error, zero means no ID
was found (but no error encountered) and a positive value is returned
on successful translation.
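The three-way return convention can be sketched like this.
get_family_id() and switchdev_supported() are hypothetical stand-ins,
and the ID value 21 is made up; only the negative/zero/positive contract
comes from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lookup following the convention described above:
 *   < 0  an error occurred
 *     0  no error, but no family ID was found
 *   > 0  the family ID */
static int get_family_id(const char *name)
{
    if (name == NULL)
        return -1;
    if (strcmp(name, "devlink") == 0)
        return 21;  /* made-up ID, for illustration only */
    return 0;
}

/* Hypothetical caller: a missing family means "not supported", which
 * is not a failure. Returns -1 on error, 0 if unsupported, 1 if
 * supported. */
static int switchdev_supported(const char *family)
{
    int id = get_family_id(family);

    if (id < 0)
        return -1;
    if (id == 0)
        return 0;
    assert(id > 0);
    return 1;
}
```

With the old "0 means error or not found" contract the caller could not
tell the last two cases apart.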
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemu: Label restore path outside of secdriver transactions
As explained in the previous commit, we need to relabel the file
we are restoring the domain from. That is the FD that is passed
to QEMU. If the file is not under /dev then the file inside the
namespace is the very same as the one in the host. And regardless
of using transactions, the file will be relabeled. But, if the
file is under /dev then when using transactions only the copy
inside the namespace is relabeled and the one in the host is not.
But QEMU is reading from the one in the host, actually.
This API allows drivers to separate out handling of @stdin_path
of virSecurityManagerSetAllLabel(). The thing is, the QEMU driver
uses transactions for virSecurityManagerSetAllLabel() which
relabels devices from inside of domain's namespace. This is what
we usually want. Except when resuming domain from a file. The
file is opened before any namespace is set up and the FD is
passed to QEMU to read the migration stream from. Because of
this, the file lives outside of the namespace and if it so
happens that the file is a block device (i.e. it lives under
/dev) its copy will be created in the namespace. But the FD that
is passed to QEMU points to the original living in the host and
not in the namespace. So relabeling the file inside the namespace
helps nothing.
But if we have a separate API for relabeling the restore file
then the QEMU driver can continue calling
virSecurityManagerSetAllLabel() with transactions enabled and
call this new API without transactions.
We already have an API for relabeling a single file
(virSecurityManagerDomainSetPathLabel()) but in case of SELinux
it uses @imagelabel (which allows RW access) and we want to use
@content_context (which allows RO access).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
The daemons are not supported on Win32 and therefore were not compiled
on that platform. However, with the daemon code sharing, all the code in
utils *is* compiled and it failed because `waitpid`, `fork`, and
`setsid` are not available. So, as before, let's not build them on
Win32 and make the code more portable by using existing vir* wrappers.
Not compiling virDaemonForkIntoBackground on Win32 is good, but the
second part of the original patch incorrectly replaced waitpid and fork
with our virProcessWait and virFork APIs. These APIs are more than just
simple wrappers and we don't want any of the extra functionality.
Especially virFork would reset any setup made before
virDaemonForkIntoBackground is called, such as logging, signal handling,
etc.
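For reference, the plain fork()/waitpid() pattern that the commit goes
back to looks roughly like the sketch below. fork_and_wait() is a
hypothetical helper written for this illustration, not
virDaemonForkIntoBackground() itself; it does not retry on EINTR.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with child_status and return the status the
 * parent observes, or -1 on failure. Nothing in the parent's setup
 * (logging, signal handling, ...) is touched. */
static int fork_and_wait(int child_status)
{
    int status;
    pid_t pid;

    assert(child_status >= 0 && child_status <= 255);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_status);    /* child: exit without running atexit handlers */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```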
As a result of the change the additional fix in v6.2.0-67-ga87e4788d2
(util: virdaemon: fix waiting for child processes) is no longer
needed and it is effectively reverted by this commit.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We previously added a hack to symlink CSS files from the source dir into
the build dir, to allow the website to be browsed locally. We should
have also done this for any images.
This change merges several variables into one "$(assets)" so that we
treat all static files in the root dir the same way.
Reviewed-by: Laine Stump <laine@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Thu, 16 Apr 2020 12:18:28 +0000 (14:18 +0200)]
qemuDomainDefPostParse: Fail if unable to fill machine type
Previously, we used virCapabilitiesDomainDataLookup() to fill
machine type in post parse callback if none was provided in the
domain XML. If machine type couldn't be filled in an error was
reported. After 4a4132b4625 we've changed it to
virQEMUCapsGetPreferredMachine() which returns NULL, but we no
longer report an error and proceed with the post parse callbacks
processing. This may lead to a crash because the code later on
assumes def->os.machine is not NULL.
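A minimal sketch of the fixed control flow, with hypothetical names
(struct domain_def, preferred_machine(), post_parse()) standing in for
the real post parse callback and capabilities lookup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced domain definition. */
struct domain_def {
    const char *machine;
};

/* Hypothetical lookup: returns a machine type, or NULL if none is known. */
static const char *preferred_machine(int have_caps)
{
    return have_caps ? "pc" : NULL;
}

/* Fill in the machine type if the XML did not provide one. Returns 0
 * on success and -1 if no machine type could be determined. */
static int post_parse(struct domain_def *def, int have_caps)
{
    assert(def != NULL);

    if (def->machine == NULL) {
        const char *m = preferred_machine(have_caps);

        if (m == NULL)
            return -1;  /* fail here instead of continuing with NULL */
        def->machine = m;
    }
    return 0;
}
```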
Fixes: 4a4132b4625 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Mores <pmores@redhat.com>
Michal Privoznik [Tue, 14 Apr 2020 09:18:02 +0000 (11:18 +0200)]
qemu: Revoke access to mirror on failed blockcopy
When preparing to do a blockcopy, the mirror image is modified so
that QEMU can access it. For instance, the mirror has seclabels
set, if it is a NVMe disk it is detached from the host and so on.
And usually, the restore is done upon successful finish of the
blockcopy operation. But, if something fails then we need to
explicitly revoke the access to the mirror image (and thus
reattach NVMe disk back to the host).
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1822538 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Mores <pmores@redhat.com>
Andrea Bolognani [Wed, 15 Apr 2020 17:12:46 +0000 (19:12 +0200)]
docs: Remove one example from pci-addresses.rst
The idea behind this document is to show, with actual examples,
that users should not expect PCI addresses in the domain XML and
in the guest OS to match.
The first zPCI example already serves this purpose perfectly, so
in the interest of keeping the page as brief and easy to digest
as possible the second one is removed.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Andrea Bolognani [Wed, 15 Apr 2020 17:11:01 +0000 (19:11 +0200)]
docs: Move sections around in pci-addresses.rst
The section about VFIO devices is kept separate from the rest
because it's less about domain XML and guest OS disagreeing on the
PCI address of a device, and more about which of the two PCI
addresses in the domain XML is even relevant to the guest OS.
The section on zPCI addresses, on the other hand, falls squarely
in the "more complex cases" category, so it should live in the
corresponding section.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Lin Ma [Thu, 16 Apr 2020 04:44:51 +0000 (12:44 +0800)]
qemu: fix hang in p2p + xbzrle compression + parallel migration
When we do parallel migration, the multifd-channels migration parameter
needs to be set on the destination side as well, before the incoming
migration URI, unless we accept the default number of connections (2).
Usually, this is handled correctly by libvirtd. But if we use
p2p + xbzrle compression without the '--comp-xbzrle-cache' parameter,
qemuMigrationParamsDump returns too early and the corresponding migration
parameter is not set on the destination side, which results in QEMU hanging.
Andrea Bolognani [Mon, 30 Mar 2020 16:29:06 +0000 (18:29 +0200)]
gitlab: Enable improved ccache usage
Setting CC="ccache cc" works in most cases, but sometimes it will
break the build: in particular, we have experienced issues in the
past with that approach when using cgo to build our Go bindings.
A more robust approach is to have a directory containing symlinks
from the compiler name to the ccache binary: in that case, ccache
itself will invoke the compiler, and the build system will be none
the wiser.
Since libvirt-ci commit 2563aebb6c5c, container images contain a
suitable symlink directory, so all that's needed to enable the new
approach is to add this directory to $PATH.
Since we're touching this anyway, we make a few more changes:
$CCACHE_DIR is no longer created manually, because ccache will
take care of creating it for us if it doesn't already exist; the
ccache setup is moved out of the job template and into
script_variables, removing unnecessary duplication; a limit is
set on the size of the cache (500 MB, which is twice the amount
used by a fresh build on my Fedora 31 machine).
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Andrea Bolognani [Mon, 30 Mar 2020 16:26:16 +0000 (18:26 +0200)]
gitlab: Don't define $MAKE
Since libvirt-ci commit 27cfddee8835, paths to build tools such as
ninja and make are exported in the container's environment and can
be used directly.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With libpmem support compiled into qemu it will trigger the following
denials on every startup.
apparmor="DENIED" operation="open" name="/"
apparmor="DENIED" operation="open" name="/sys/bus/nd/devices/"
This is due to [1] that tries to auto-detect if the platform supports
auto flush for all regions.
Once we know all the paths that are potentially needed if this feature
is really used we can add them conditionally in virt-aa-helper and labelling
calls in case <pmem/> is enabled.
But until then the change here silences the denial warnings seen above.
Andrea Bolognani [Tue, 14 Apr 2020 17:37:09 +0000 (19:37 +0200)]
docs: Add pci-addresses.rst
This document describes the relationship between PCI addresses as
seen in the domain XML and by the guest OS, which is a topic that
people get confused by time and time again.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
Peter Krempa [Thu, 9 Apr 2020 13:50:40 +0000 (15:50 +0200)]
backup: Allow 'encryption' of backups and scratch images
Add the appropriate entries into the schema to allow encryption of the
backup or scratch image. Since we use blockdev internals for everything
no changes to the code are actually necessary.
Peter Krempa [Thu, 9 Apr 2020 13:25:35 +0000 (15:25 +0200)]
virsh: cmdUndefine: Properly extract delete-storage-volume-snapshots flag
Commit 86608f787ee added the above flag as an alias for ambiguous
'delete-snapshots' flag, but forgot to actually change the code that
extracts it, thus the new version actually doesn't work.
Peter Krempa [Tue, 31 Mar 2020 13:43:46 +0000 (15:43 +0200)]
qemu: backup: Fix handling of backing store for backup target images
We always tried to install backing store for the image even if it didn't
make sense, e.g. for a full backup into a raw image. Additionally we
didn't record the backing file into the qcow2 metadata so the image
itself contained the diff of data but reading from it would be
incomplete as it depends on the backing image.
This patch fixes both issues by carefully installing the correct backing
file when appropriate and also recording it into the metadata when
creating the image.
Andrea Bolognani [Tue, 14 Apr 2020 10:59:04 +0000 (12:59 +0200)]
Convert all remaining Markdown files to reStructuredText
We've adopted reStructuredText as the primary markup language for
our documentation and, given that both GitLab and GitHub can render
documents in this format just fine, it makes sense to get rid of
the few last remaining bits of Markdown and standardize on
reStructuredText across the board.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
CONTRIBUTING: Add entry point for new contributors
It's generally expected that a git repository will contain this file,
which serves as an entry point for people interested in contributing
to the project.
In our case, we have extensive documentation available on the
website which we don't want to duplicate, so let's just point people
there.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Mon, 6 Apr 2020 03:44:16 +0000 (23:44 -0400)]
conf: during PCI hotplug, require that the controller support hotplug
Before this patch we would simply rely on QEMU failing to attach the
device. Since we have a flag in the address set telling us which
controllers support hotplug, we can fail the operation sooner.
This also ensures that when hotplugging with no provided PCI address,
we skip any controllers with hotplug='off' and attempt to assign
the device to a controller that not only supports hotplug, but also
has it enabled.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 6 Apr 2020 02:57:43 +0000 (22:57 -0400)]
conf: check HOTPLUGGABLE connect flag when validating a PCI address
The HOTPLUGGABLE flag is set for appropriate buses in a PCI address
set, and this patch updates virDomainPCIAddressFlagsCompatible() to
check the HOTPLUGGABLE flag when searching for a suitable bus/slot for
a device. No devices request HOTPLUGGABLE though (yet), so there is no
observable effect.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 6 Apr 2020 02:40:37 +0000 (22:40 -0400)]
qemu/conf: set HOTPLUGGABLE connect flag during PCI address set init
virDomainPCIAddressBusSetModel() is called for each PCI controller
when building an address set prior to assigning PCI addresses to
devices.
This patch adds a new argument, allowHotplug, to that function that
can be set to false if we know for certain that a particular
controller won't support hotplug.
The most interesting case is in qemuDomainPCIAddressSetCreate(), where
the config of each existing controller is available while building the
address set, so we can appropriately set allowHotplug = false when the
user has "hotplug='off'" in the config of a controller that normally
would support hotplug. In all other cases, it is set to true or false
in accordance with the capability of the controller model.
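The effect of allowHotplug on the bus flags can be sketched as follows.
The enum values and apply_allow_hotplug() are hypothetical; libvirt's
real connect flags have different names and values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connect flags; the values are illustrative only. */
enum {
    CONNECT_AUTOASSIGN   = 1 << 0,
    CONNECT_HOTPLUGGABLE = 1 << 1,
};

/* model_flags describes what the controller model is capable of.
 * allow_hotplug is false when the controller config carries
 * hotplug='off'. The result is what gets recorded for the bus. */
static unsigned int apply_allow_hotplug(unsigned int model_flags,
                                        bool allow_hotplug)
{
    /* Only the two flags defined above are known to this sketch. */
    assert((model_flags & ~(unsigned int)(CONNECT_AUTOASSIGN |
                                          CONNECT_HOTPLUGGABLE)) == 0);

    if (!allow_hotplug)
        model_flags &= ~(unsigned int)CONNECT_HOTPLUGGABLE;
    return model_flags;
}
```

A model that never supported hotplug stays non-hotpluggable regardless
of allow_hotplug, which matches the "in accordance with the capability
of the controller model" wording.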
So far we aren't doing anything with this bus flag in the address set.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Sun, 5 Apr 2020 22:01:43 +0000 (18:01 -0400)]
conf: simplify logic when checking for AUTOASSIGN PCI addresses
Old behavior: If the address was manually provided by config, copy
device AUTOASSIGN flag into the bus flag, and then later on in the
function *always* check for a match of the flags (which will always
match if the address came from config, since we just copied it).
New behavior: Don't mess with the bus flags - just directly check if
the AUTOASSIGN flag matches in bus and dev, but only make the check if
the address didn't come from config (i.e. it was auto-assigned by
libvirt).
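The new behavior amounts to a check of roughly this shape. The flag
value and autoassign_compatible() are hypothetical names chosen for the
sketch, not the actual libvirt function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connect flag; the value is illustrative only. */
enum {
    CONNECT_AUTOASSIGN = 1 << 0,
};

/* Return true if the device may be placed on the bus as far as
 * AUTOASSIGN is concerned. The flag is only compared when the address
 * was chosen by libvirt rather than provided in the config. */
static bool autoassign_compatible(unsigned int bus_flags,
                                  unsigned int dev_flags,
                                  bool from_config)
{
    if (from_config)
        return true;    /* manually provided address: nothing to check */

    if ((dev_flags & CONNECT_AUTOASSIGN) &&
        !(bus_flags & CONNECT_AUTOASSIGN))
        return false;

    return true;
}
```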
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When the HOTPLUGGABLE flag was originally added, it was set for all
the PCI controllers that accepted hotplugged devices, and requested
for all devices that were auto-assigned to a controller. While we're
still autoassigning to the same list of controllers, those controllers
may or may not support hotplug, so let's use the flag that fits what
we're actually doing.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 23 Mar 2020 02:32:49 +0000 (22:32 -0400)]
conf: add new PCI_CONNECT flag AUTOASSIGN
This new flag will be set for any controller that we decide can have
devices assigned to it automatically during PCI device assignment. In
the past PCI_CONNECT_TYPE_HOTPLUGGABLE was used for this purpose, but
that is overloading that flag, and no longer technically correct; what
we *really* want is to auto-assign devices to any pcie-root-port or
pcie-switch-downstream-port regardless of whether or not that
controller happens to have hotplug enabled.
This patch just adds the flag, but doesn't use it at all. Note that
the numbering of all the other flags was changed in order to insert
the new flag near the beginning of the list; that doesn't cause any
problem because the connect flags aren't stored anywhere between runs
of libvirtd.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Wed, 4 Mar 2020 03:22:14 +0000 (22:22 -0500)]
qemu: hook up pcie-root-port hotplug='off' option
If a pcie-root-port or pcie-downstream-port has hotplug='off' in its
<target> subelement, and if the qemu binary supports the hotplug=false
option, then it will be added to the commandline for the pcie
controller. This controller will then not allow any hotplug/unplug of
devices while the guest is running (and the hotplug capability won't
be advertised to the guest OS, so the guest OS also won't present
unplugging of PCI devices as an option).
For any PCI controllers other than pcie-downstream-port and
pcie-root-port, or for qemu binaries that don't support the hotplug
commandline option, an error will be logged during validation.
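A rough sketch of the two rules above: reject the setting when the
binary lacks the option, otherwise append it to the device string.
format_root_port() is a hypothetical helper and the device string is
reduced to an id for brevity; a real qemu command line carries more
properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Format a simplified pcie-root-port device string into buf.
 * Returns the number of characters written, or -1 if the request is
 * invalid (hotplug disabled but option unsupported) or buf is too
 * small. */
static int format_root_port(char *buf, size_t len, const char *id,
                            bool hotplug_off, bool qemu_has_hotplug_opt)
{
    int n;

    assert(buf != NULL && id != NULL);

    /* Validation: the option must be supported to be used. */
    if (hotplug_off && !qemu_has_hotplug_opt)
        return -1;

    n = snprintf(buf, len, "pcie-root-port,id=%s%s",
                 id, hotplug_off ? ",hotplug=off" : "");
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```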
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 3 Mar 2020 17:23:52 +0000 (12:23 -0500)]
conf: new attribute "hotplug" for pci controllers
A <controller type='pci'...> element can now have a "hotplug"
attribute in the <target> subelement. This is intended to control
whether or not the slot(s) of the controller support
hotplugging/unplugging a device:
Since support for configuring such an option is hypervisor-dependent
(and will vary among different types of PCI controllers even on a
single hypervisor), no validation is done in this patch - that
validation will be done in the patch that wires support for the
setting into the hypervisor.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Thu, 27 Feb 2020 20:22:59 +0000 (15:22 -0500)]
qemu: new capabilities flag pcie-root-port.hotplug
This caps flag is set when the qemu binary supports the option
"hotplug" for pcie-root-port, ioh3420 (Intel pcie-root-port) and
xio3130-downstream (Intel pcie-downstream-port). If it's available,
it's possible to disable hotplugging/unplugging devices on a
particular port by adding ",hotplug=off" to the qemu device
commandline. This option first appears in qemu-5.0.0.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In a guest with only one vcpu, when pinning the emulator in say CPU184
and the vcpu0 in CPU0 of the host, the user might expect that only
CPU0 and CPU184 of the host will be used by the guest.
The reality is that Libvirt takes some time to honor the emulator
and vcpu pinning, taking care of NUMA constraints first. This will
result in other CPUs of the host being potentially used by the
QEMU thread until the emulator/vcpu pinning is done. The user
then might be confused by the output of 'virsh cpu-stats' in this
scenario, showing around 200 microseconds of cycles being spent
in other CPUs.
Let's document this behavior, which is explained in detail in
Libvirt commit v5.0.0-199-gf136b83139, in the cputune section
of formatdomain.html.in.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Jim Fehlig [Tue, 7 Apr 2020 23:33:26 +0000 (17:33 -0600)]
xenconfig: Add support for max_event_channels
Add support in the domXML<->native config converter for max_event_channels.
The parser and formatter functions for max_grant_frames were reworked to
also parse max_event_channels. In doing so the xenbus controller is added
earlier in the config parsing, requiring a small adjustment to one of the
existing tests. Include a new test for the event channel conversion.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jim Fehlig [Tue, 7 Apr 2020 23:15:04 +0000 (17:15 -0600)]
libxl: Add support for max_event_channels
Add support for setting event_channels in libxl domain config object and
include a test to check that it is properly converted from XML to libxl
domain config.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jim Fehlig [Tue, 7 Apr 2020 22:37:09 +0000 (16:37 -0600)]
conf: Add a new xenbus controller option for event channels
Event channels are like PV interrupts and in conjunction with grant frames
form a data transfer mechanism for PV drivers. They are also used for
inter-processor interrupts. Guests with a large number of vcpus and/or
many PV devices may need to increase the maximum default value of 1023.
For this reason the native Xen config format supports the
'max_event_channels' setting. See xl.cfg(5) man page for more details.
Similar to the existing maxGrantFrames option, add a new xenbus controller
option 'maxEventChannels', allowing the maximum value to be adjusted via libvirt.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>