Peter Krempa [Wed, 1 Mar 2017 08:15:33 +0000 (09:15 +0100)]
qemu: snapshot: Forbid internal snapshots with pflash firmware
If the variable store (<nvram>) file is raw qemu can't do a snapshot of
it and thus the snapshot fails. QEMU rejects such snapshot by a message
which would not be properly interpreted as an error by libvirt.
Additionally allowing to use a qcow2 variable store backing file would
solve this issue but then it would become eligible to become target of
the memory dump.
Offline internal snapshot would be incomplete too with either storage
format since libvirt does not handle the pflash file in this case.
Forbid such snapshot so that we can avoid problems.
Laine Stump [Fri, 24 Mar 2017 00:18:25 +0000 (20:18 -0400)]
network: only check for IPv6 RA routes when the network has an IPv6 address
commit 00d28a78 added a check to see if there were any IPv6 routes
added by RA (Router Advertisement) via an interface that had accept_ra
set to something other than "2". The check was being done
unconditionally, but it's only relevant if IPv6 forwarding is going to
be turned on, and that will only happen if the network has an IPv6
address.
Migration was implemented by QEMU commit:
commit 8cdcf3c1e58d04b6811956d7608efeb66c42d719
Author: Peter Xu <peterx@redhat.com>
Date: Fri Jan 6 12:06:13 2017 +0800
Laine Stump [Thu, 2 Mar 2017 19:55:01 +0000 (14:55 -0500)]
util: new function virNetDevPFGetVF()
Given an SRIOV PF netdev name (e.g. "enp2s0f0") and VF#, this new
function returns the netdev name of the referenced VF device
(e.g. "enp2s11f6"), or NULL if the device isn't bound to a net driver.
Laine Stump [Thu, 9 Mar 2017 19:04:16 +0000 (14:04 -0500)]
util: new internal function to permit silent failure of virNetDevSetMAC()
We will want to allow silent failure of virNetDevSetMAC() in the case
that the SIOSIFHWADDR ioctl fails with errno == EADDRNOTAVAIL. (Yes,
that is very specific, but we really *do* want a logged failure in all
other circumstances, and don't want to duplicate code in the caller
for the other possibilities).
This patch renames the 3 different virNetDevSetMAC() functions to
virNetDevSetMACInternal(), adding a 3rd arg called "quiet" and making
them static (because this extra control will only be needed within
virnetdev.c). A new global virNetDevSetMAC() is defined that calls
whichever of the three *Internal() functions gets compiled with quiet
= false. Callers in virnetdev.c that want to notice a failure with
errno == EADDRNOTAVAIL and retry with a different strategy rather than
immediately failing, can call virNetDevSetMACInternal(..., true).
Laine Stump [Tue, 7 Mar 2017 17:58:15 +0000 (12:58 -0500)]
util: new function virPCIDeviceRebind()
This function unbinds a device from its driver, then immediately
rebinds it to its driver again. The code for this new function is just
the 2nd half of virPCIDeviceBindWithDriverOverride(), so that
function's 2nd half is replaced with a call to virPCIDeviceRebind().
Laine Stump [Fri, 3 Mar 2017 16:54:59 +0000 (11:54 -0500)]
util: change virPCIGetNetName() to not return error if device has no net name
...and cleanup the callers to report it when it *is* an error.
In many cases It's useful for virPCIGetNetName() to not log an error
and simply return a NULL pointer when the given device isn't bound to
a net driver (e.g. we're looking at a VF that is permanently bound to
vfio-pci). The existing code would silently return an error in this
case, which could eventually lead to the dreaded "An error occurred
but the cause is unknown" log message.
This patch changes virPCIGetNetName() to still return success if the
device simply isn't bound to a net driver, and adjusts all the callers
that require a non-null netname to check for that condition and log an
error when it happens.
Laine Stump [Sun, 5 Mar 2017 22:32:15 +0000 (17:32 -0500)]
util: eliminate useless local variable
vf in virNetDevMacVLanDeleteWithVPortProfile() is initialized to -1
and never set. It's not set for a good reason - because it doesn't
make sense during macvtap device setup to refer to a VF device as
"PF:VF#". This patch replaces the two uses of "vf" with "-1", and
removes the local variable, so that it's more clear we are always
calling the utility functions with vf set to -1.
Laine Stump [Mon, 20 Feb 2017 03:06:33 +0000 (22:06 -0500)]
util: remove unused args from virNetDevSetVfConfig()
This function is only called in two places, and the ifindex,
nltarget_kernel, and getPidFunc args are never used (and never will
be).
ifindex - we always know the name of the device, and never know the
ifindex - if we really did need the ifindex we would have to get it
from the name using virNetDevGetIndex(). In practice, we just send -1
to virNetDevSetVfConfig(), which doesn't bother to learn the real
ifindex (you only need a name *or* an ifindex for the netlink command
to succeed, not both).
nltarget_kernel - messages to set the config of an SRIOV VF will
always go to netlink in the kernel, not to another user process, so
this arg is always true (there are other uses of netlink messages
where the message might need to go to another user process, but never
in the case of RTM_SETLINK for SRIOV).
getPidFunc - this arg is only used if nltarget_kernel is false, and it
never is.
None of this has any functional effect, it just makes it easier to
follow what's happening when virNetDevSetVfConfig() is called.
Laine Stump [Fri, 17 Feb 2017 19:28:55 +0000 (14:28 -0500)]
util: permit querying a VF MAC address or VLAN tag by itself
virNetDevParseVfConfig() assumed that both the MAC address and VLAN
tag pointers were valid, so even if you only wanted one or the other,
you would need a variable to hold the returned value for both. This
patch checks each for a NULL pointer before filling it in.
John Ferlan [Wed, 22 Mar 2017 11:58:05 +0000 (07:58 -0400)]
util: Remove NONNULL's for virNetDevVPortProfile[Associate|Disassociate]
The source code will check for NULL arguments for 'macvtap_macaddr' and
'vmuuid', so no need for the NONNULL in the prototypes. Following the stack
for both arguments to virNetDevVPortProfileOpSetLink also shows called
functions would handle a NULL value.
Additionally, modified the prototype to use the same 'macvtap_macaddr'
name as the source code for consistency.
John Ferlan [Tue, 21 Mar 2017 18:32:01 +0000 (14:32 -0400)]
util: Remove NONNULL(1) for virHostdevPrepareDomainDevices
Since the code checks 'mgr == NULL' anyway, no need for the prototype
to have the NONNULL arg check. Also add an error message to indicate what
the failure is so that there isn't a failed for some reason error.
John Ferlan [Tue, 21 Mar 2017 17:37:05 +0000 (13:37 -0400)]
qemu: Remove non null 'vm' check from qemuMonitorOpen
The prototype requires not passing a NULL in the parameter and the callers
all would fail far before this code would fail if 'vm' was NULL, so just
remove the check.
Laine Stump [Tue, 21 Mar 2017 15:24:08 +0000 (11:24 -0400)]
network: reconnect tap devices during networkNotifyActualDevice
If a network is destroyed and restarted, or its bridge is changed, any
tap devices that had previously been connected to the bridge will no
longer be connected. As a first step in automating a reconnection of
all tap devices when this happens, this patch modifies
networkNotifyActualDevice() (which is called once for every
<interface> of every active domain whenever libvirtd is restarted) to
reconnect any tap devices that it finds disconnected.
With this patch in place, you will need to restart libvirtd to
reconnect all the taps to their proper bridges. A future patch will
add a callback that hypervisor drivers can register with the network
driver to that the network driver can trigger this behavior
automatically whenever a network is started.
Laine Stump [Sat, 18 Mar 2017 18:03:20 +0000 (14:03 -0400)]
util: new function virNetDevTapAttachBridge()
This patch splits out the part of virNetDevTapCreateInBridgePort()
that would need to be re-done if an existing tap device had to be
re-attached to a bridge, and puts it into a separate function. This
can be used both when an existing domain interface config is updated
to change its connection, and also to re-attach to the "same" bridge
when a network has been stopped and restarted. So far it is used for
nothing.
Laine Stump [Sat, 18 Mar 2017 00:57:18 +0000 (20:57 -0400)]
util: new function virNetDevGetMaster()
This function provides the bridge/bond device that the given network
device is attached to. The return value is 0 or -1, and the master
device is a char** argument to the function - this is needed in order
to allow for a "success" return from a device that has no master.
Laine Stump [Fri, 17 Mar 2017 21:49:45 +0000 (17:49 -0400)]
util: allow retrieving ethtool features when unprivileged
The only reason that the ethtool features weren't being retrieved in
an unprivileged libvirtd was because they required ioctl(), and the
ioctl was using an AF_PACKET socket, which requires root. Now that we
are using AF_UNIX for ioctl(), this restriction can be removed.
Laine Stump [Fri, 17 Mar 2017 21:33:42 +0000 (17:33 -0400)]
util: use AF_UNIX family (not AF_PACKET) for ioctl sockets
The exact family of the socket created for the fd used by ioctl(7)
doesn't matter, it just needs to be a socket and not a file. But for
some reason when macvtap support was added, it used
AF_PACKET/SOCK_DGRAM sockets for its ioctls; we later used the same
AF_PACKET/SOCK_DGRAM socket for new ioctls we added, and eventually
modified the other pre-existing ioctl sockets (for creating/deleting
bridges) to also use AF_PACKET/SOCK_DGRAM (that code originally used
AF_UNIX/SOCK_STREAM).
The problem with using AF_PACKET (intended for sending/receiving "raw"
packets, i.e. packets that can be some protocol other than TCP or UDP)
is that it requires root privileges. This meant that none of the
ioctls in virnetdev.c or virnetdevip.c would work when running
libvirtd unprivileged.
This packet solves that problem by changing the family to AF_UNIX when
creating the socket used for any ioctl().
Michal Privoznik [Tue, 21 Mar 2017 14:23:35 +0000 (15:23 +0100)]
domain_capabilities: Don't report machine type for bhyve
For some drivers the domain's machine type makes no sense. They
just don't use it. A great example is bhyve driver. Therefore it
makes very less sense to report machine in domain capabilities
XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
network: check accept_ra before enabling ipv6 forwarding
When enabling IPv6 on all interfaces, we may get the host Router
Advertisement routes discarded. To avoid this, the user needs to set
accept_ra to 2 for the interfaces with such routes.
See https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
on this topic.
To avoid user mistakenly losing routes on their hosts, check
accept_ra values before enabling IPv6 forwarding. If a RA route is
detected, but neither the corresponding device nor global accept_ra
is set to 2, the network will fail to start.
virNetlinkCommand() processes only one response message, while some
netlink commands, like route dumping, need to process several.
Add virNetlinkDumpCommand() as a virNetlinkCommand() sister.
util: extract the request sending code from virNetlinkCommand()
Allow to reuse as much as possible from virNetlinkCommand(). This
comment prepares for the introduction of virNetlinkDumpCommand()
only differing by how it handles the responses.
John Ferlan [Tue, 21 Mar 2017 16:43:56 +0000 (12:43 -0400)]
qemu: Fix qemuMonitorOpen prototype
Commit id '85af0b8' added a 'timeout' as the 4th parameter to
qemuMonitorOpen, but neglected to update the ATTRIBUTE_NONNULL(4)
to be (5) for the cb parameter.
that we shouldn't be adding a "no-resolv" to the dnsmasq.conf file for
a network if there isn't any <forwarder> element that specifies an IP
address but no qualifying domain. If there is such an element, it will
handle all DNS requests that weren't otherwise handled by one of the
forwarder entries with a matching domain attribute. If not, then DNS
requests that don't match the domain of any <forwarder> would not be
resolved if we added no-resolv.
So, only add "no-resolv" when there is at least one <forwarder>
element that specifies an IP address but no qualifying domain.
John Ferlan [Tue, 7 Mar 2017 20:44:41 +0000 (15:44 -0500)]
conf: Alter coding style of storage conf function prototypes
In an effort to be consistent with the source module, alter the function
prototypes to follow the similar style of source with the "type" on one
line followed by the function name and arguments on subsequent lines with
with argument getting it's own line.
John Ferlan [Tue, 7 Mar 2017 20:28:27 +0000 (15:28 -0500)]
conf: Adjust coding style for storage conf sources
Alter the format of the code to follow more recent style guidelines of
two empty lines between functions, function decls with "[static] type"
on one line followed by function name with arguments to functions each
on one line.
Peter Krempa [Fri, 17 Mar 2017 15:38:47 +0000 (16:38 +0100)]
(log|lock)daemon: Don't spam logs with IO error messages after client disconnects
The log and lock protocol don't have an extra handshake to close the
connection. Instead they just close the socket. Unfortunately that
resulted into a lot of spurious garbage logged to the system log files:
2017-03-17 14:00:09.730+0000: 4714: error : virNetSocketReadWire:1800 : End of file while reading data: Input/output error
or in the journal as:
Mar 13 16:19:33 xxxx virtlogd[32360]: End of file while reading data: Input/output error
Use the new facility in the netserverclient to suppress the IO error
report from the virNetSocket layer.
Jiri Denemark [Mon, 13 Mar 2017 11:32:02 +0000 (12:32 +0100)]
qemu: Update CPU definition according to QEMU
When starting a domain with custom guest CPU specification QEMU may add
or remove some CPU features. There are several reasons for this, e.g.,
QEMU/KVM does not support some requested features or the definition of
the requested CPU model in libvirt's cpu_map.xml differs from the one
QEMU is using. We can't really avoid this because CPU models are allowed
to change with machine types and libvirt doesn't know (and probably
doesn't even want to know) about such changes.
Thus when we want to make sure guest ABI doesn't change when a domain
gets migrated to another host, we need to update our live CPU definition
according to the CPU QEMU created. Once updated, we will change CPU
checking to VIR_CPU_CHECK_FULL to make sure the virtual CPU created
after migration exactly matches the one on the source.
Jiri Denemark [Fri, 10 Mar 2017 22:55:59 +0000 (23:55 +0100)]
qemu: Refactor Hyper-V features check
The checks are now in a dedicated qemuProcessVerifyHypervFeatures
function.
In addition to moving the code this patch also fixes a few bugs: the
original code was leaking cpuFeature and the return value of
virCPUDataCheckFeature was not checked properly.
Jiri Denemark [Wed, 1 Mar 2017 14:18:22 +0000 (15:18 +0100)]
Introduce /domain/cpu/@check XML attribute
The attribute can be used to request a specific way of checking whether
the virtual CPU matches created by the hypervisor matches the
specification in domain XML.
The disk tuning group parameter is ignored by qemu if no other
throttling options are set. Reject such configuration, since the name
would not be honored after setting parameters via the live tuning API.
Peter Krempa [Fri, 17 Mar 2017 08:23:54 +0000 (09:23 +0100)]
qemu: command: Extract tests for subsets of blkdeviotune settings
When checking capabilities for qemu we need to check whether subsets of
the disk throttling settings are supported. Extract the checks into a
separate functions as they will be reused in next patch.
Peter Krempa [Fri, 17 Mar 2017 07:43:27 +0000 (08:43 +0100)]
qemu: Don't steal pointers from 'persistentDef' in qemuDomainGetBlockIoTune
While the code path that queries the monitor allocates a separate copy
of the 'group_name' string the path querying the config would not copy
it. The call to virTypedParameterAssign would then steal the pointer
(without clearing it) and the RPC layer freed it. Any subsequent call
resulted into a crash.
Nitesh Konkar [Thu, 16 Mar 2017 11:55:15 +0000 (17:25 +0530)]
perf: remote: Compare perf nparams against the correct constant
Currently 'virsh perf domain' errors out as the perf nparams is
incorrectly compared against REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX
instead of REMOTE_DOMAIN_PERF_EVENTS_MAX.
Andrea Bolognani [Thu, 16 Mar 2017 16:41:21 +0000 (17:41 +0100)]
tests: Test generic PCIe Root Ports
We want pcie-root-ports to be used when available in QEMU,
but at the same time we need to ensure that hosts running
older QEMU releases keep working and that the user can
override the default at any time.
Add a comment for the original pcie-root-port test cases
to make it clear how these new test cases are different.
Andrea Bolognani [Tue, 14 Mar 2017 13:42:51 +0000 (14:42 +0100)]
qemu: Use generic PCIe Root Ports by default when available
ioh3420 is emulated Intel hardware, so it always looked
quite out of place in aarch64/virt guests. Even for x86/q35
guests, the recently-introduced pcie-root-port is a better
choice because, unlike ioh3420, it doesn't require IO space
(a fairly constrained resource) to work.
If pcie-root-port is available in QEMU, use it; ioh3420 is
still used as fallback for when pcie-root-port is not
available.
Short circuit SASL auth when no mechanisms are available
If the SASL config does not have any mechanisms we currently
just report an empty list to the client which will then
fail to identify a usable mechanism. This is a server config
error, so we should fail immediately on the server side.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When providing explicit x509 cert/key paths in libvirtd.conf,
the user must provide all three. If one or more is missed,
this leads to obscure errors at runtime when negotiating
the TLS session
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Linux still defaults to a 1024 open file handle limit. This causes
scalability problems for libvirtd / virtlockd / virtlogd on large
hosts which might want > 1024 guest to be running. In fact if each
guest needs > 1 FD, we can't even get to 500 guests. This is not
good enough when we see machines with 100's of physical cores and
TBs of RAM.
In comparison to other memory requirements of libvirtd & related
daemons, the resource usage associated with open file handles
is essentially line noise. It is thus reasonable to increase the
limits unconditionally for all installs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Michal Privoznik [Sat, 11 Mar 2017 06:23:42 +0000 (07:23 +0100)]
qemu: Adaptive timeout for connecting to monitor
There were couple of reports on the list (e.g. [1]) that guests
with huge amounts of RAM are unable to start because libvirt
kills qemu in the initialization phase. The problem is that if
guest is configured to use hugepages kernel has to zero them all
out before handing over to qemu process. For instance, 402GiB
worth of 1GiB pages took around 105 seconds (~3.8GiB/s). Since we
do not want to make the timeout for connecting to monitor
configurable, we have to teach libvirt to count with this
fact. This commit implements "1s per each 1GiB of RAM" approach
as suggested here [2].
Michal Privoznik [Mon, 13 Mar 2017 10:05:08 +0000 (11:05 +0100)]
virTimeBackOffWait: Avoid long periods of sleep
While connecting to qemu monitor, the first thing we do is wait
for it to show up. However, we are doing it with some timeout to
avoid indefinite waits (e.g. when qemu doesn't create the monitor
socket at all). After beaa447a29 we are using exponential back
off timeout meaning, after the first connection attempt we wait
1ms, then 2ms, then 4 and so on. This allows us to bring down
wait time for small domains where qemu initializes quickly.
However, on the other end of this scale are some domains with
huge amounts of guest memory. Now imagine that we've gotten up to
wait time of 15 seconds. The next one is going to be 30 seconds,
and the one after that whole minute. Well, okay - with current
code we are not going to wait longer than 30 seconds in total,
but this is going to change in the next commit.
The exponential back off is usable only for first few iterations.
Then it needs to be caped (one second was chosen as the limit)
and switch to constant wait time.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Wed, 15 Mar 2017 19:09:35 +0000 (15:09 -0400)]
test: Don't assume a configFile exists for Storage Pool tests
Fix a "bug" in the storage pool test driver code which "assumed"
testStoragePoolObjSetDefaults should fill in the configFile for
both the Define/Create (persistent) and CreateXML (transient) pools
by just VIR_FREE()'ing it during CreateXML. Because the configFile
was filled in, during Destroy the pool wouldn't be free'd which
could cause issues for future patches which add tests to validate
vHBA creation for the storage pool using the same name.
John Ferlan [Wed, 15 Mar 2017 19:07:21 +0000 (15:07 -0400)]
conf: Alter error message for vHBA creation using parent wwnn/wwpn
Commit id 'bb74a7ffe' added a fairly non specific message when providing
only the <parent wwnn='xxx'/> or <parent wwpn='xxx'/> instead of providing
both wwnn and wwpn. This patch just modifies the message to be more specific
about which was missing.
John Ferlan [Fri, 27 Jan 2017 23:50:57 +0000 (18:50 -0500)]
conf: Return the vHBA name from virNodeDeviceCreateVport
Rather than returning true/false and having the caller check if the
vHBA was actually created, let's do that check within the CreateVport
function. That way the caller can faithfully assume success based
on a name start the thread looking for the LUNs. Prior to this change
it's possible that the vHBA wasn't really created (e.g if the call to
virVHBAGetHostByWWN returned NULL), we'd claim success, but in reality
there'd be no vHBA for the pool. This also fixes a second yet seen
issue that if the nodedev was present, but the parent by name wasn't
provided (perhaps parent by wwnn/wwpn or by fabric_name), then a failure
would be returned. For this path it shouldn't be an error - we should
just be happy that something else is managing the device and we don't
have to create/delete it.
The end result is that the createVport code can now just start the
refresh thread once it gets a non NULL name back.
John Ferlan [Mon, 20 Feb 2017 12:00:51 +0000 (07:00 -0500)]
util: Rename virFileWaitForDevices
The function is actually in virutil.c, but prototyped in virfile.h.
This patch fixes that by renaming the function to virWaitForDevices,
adding the prototype in virutil.h and libvirt_private.syms, and then
changing the callers to use the new name.