Peter Krempa [Wed, 16 Aug 2017 13:44:35 +0000 (15:44 +0200)]
qemu: capabilities: Tolerate missing @qemuCaps in virQEMUCapsSupportsGICVersion
Report the given GIC version as unsupported if @qemuCapsi is NULL. This
will be helpful to run post parse callbacks even if qemu is not
currently installed.
Peter Krempa [Tue, 15 Aug 2017 16:41:59 +0000 (18:41 +0200)]
conf: add infrastructure for tolerating certain post parse callback failures
Some failures of the post parse callback can be tolerated. This is
specifically desired when loading the configs of existing VMs. In such
case the post parse callback should not really be modifying anything
in the definition.
This patch adds a parse flag VIR_DOMAIN_DEF_PARSE_ALLOW_POST_PARSE_FAIL
which will allow the callbacks to report non-fatal failures by returning
a positive return value. In such case the field 'postParseFailed' in the
domain definition is set to true, to notify the drivers that the
callback failed and possibly needs to be re-run.
Peter Krempa [Tue, 15 Aug 2017 16:09:32 +0000 (18:09 +0200)]
conf: Return any non-zero value from virDomainDeviceInfoIterateInternal callback
Post parse callbacks will need to be able to signal that they failed
non-fatally. This means that we need to return the value returned by the
callback without modification.
Peter Krempa [Tue, 15 Aug 2017 13:25:23 +0000 (15:25 +0200)]
qemu: domain: Don't re-allocate qemuCaps in post parse callbacks
The domain post parse callback, domain address callback and the domain
device callback (for every single device) would each grab qemuCaps for
the current emulator. This is quite wasteful. Use the new callback to do
this just once.
Peter Krempa [Tue, 15 Aug 2017 13:18:51 +0000 (15:18 +0200)]
conf: Add callbacks that allocate per-def private data
Some drivers use def-specific private data across callbacks (e.g.
qemuCaps in the qemu driver). Currently it's mostly allocated in every
single callback. This is rather wasteful, given that every single call
to the device callback allocates it.
The new callback will allocate the data (if not provided externally) and
then use it for the VM, address and device post parse callbacks.
Peter Krempa [Tue, 15 Aug 2017 13:11:45 +0000 (15:11 +0200)]
conf: Add 'basic' post parse callback
Add yet another post parse callback, which is executed prior the real
one without @parseOpaque. This is meant to set basics before
@parseOpaque (in case of the qemu driver qemuCaps) can be allocated.
This callback will allow to optimize passing of custom parseOpaque
through the callbacks.
Erik Skultety [Wed, 28 Jun 2017 13:39:51 +0000 (15:39 +0200)]
nodedev: udev: Remove the udevEventHandleCallback on fatal error
So we have a sanity check for the udev monitor fd. Theoretically, it
could happen that the udev monitor fd changes (due to our own wrongdoing,
hence the 'sanity' here) and if that happens it means we are handling an
event from a different entity than we think, thus we should remove the
handle if someone somewhere somehow hits this hypothetical case.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Erik Skultety [Tue, 20 Jun 2017 14:50:26 +0000 (16:50 +0200)]
nodedev: mdev: Report an error when mdev path resolution fails
It might happen that virFileResolveLinkHelper fails on the lstat system
call. virFileResolveLink expects the caller to report an error when it
fails, however this wasn't the case for udevProcessMediatedDevice.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
There is an apparmor deny due to qemu now locking those files:
apparmor="DENIED" operation="file_lock" [...]
name="/home/ubuntu/vm-start-stop/vms/7936-0_CODE.fd"
name="/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow"
[...] comm="qemu-system-aarch64" requested_mask="k" denied_mask="k"
The profile needs to allow locking for loader and nvram files via
the locking (k) rule.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
With that qemu change in place the rules generated for the image
and backing files need the allowance to also lock (k) the files.
Disks are added via add_file_path and with this fix rules now get
that permission, but no other rules are changed, example:
- "/var/lib/uvtool/libvirt/images/kvmguest-artful-normal-a2.qcow" rw,
+ "/var/lib/uvtool/libvirt/images/kvmguest-artful-normal-a2.qcow" rwk
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Pavel Hrdina [Tue, 15 Aug 2017 13:20:55 +0000 (15:20 +0200)]
util: introduce virXMLPropStringLimit
The virXMLPropStringLimit is an equivalent of virXPathStringLimit
which should be preferred if you already have a XML dom node or
if you need to parse more than one property.
Michal Privoznik [Wed, 16 Aug 2017 10:55:03 +0000 (12:55 +0200)]
network: Use self inflating bitmap for class IDs
Back in the day when I was implementing QoS for networks there
were no self inflating virBitmaps. Only the static ones.
Therefore, I had to allocate the whole 8KB of memory in order to
keep track of used/unused class IDs. This is rather wasteful
because nobody is ever gonna use that much classes (kernel
overhead would drastically lower the bandwidth). Anyway, now that
we have self inflating bitmaps we can start small and allocate
more if there's need for it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
There's some specific logic in qemuBuildCpuCommandLine to support
auto adding -cpu qemu 32 for arch=i686 with an x86_64 qemu binary.
Add a test case for it
John Ferlan [Wed, 26 Jul 2017 14:18:39 +0000 (10:18 -0400)]
network: Use @maxnames instead of @nnames
To be consistent with the API definition, use the @maxnames instead
of @nnames when describing/comparing against the maximum names to
be provided for the *ConnectList[Defined]Networks APIs.
John Ferlan [Wed, 26 Jul 2017 13:43:12 +0000 (09:43 -0400)]
network: Move virObjectRef during AssignDef processing
Move the virObjectRef in virNetworkObjAssignDefLocked to after
the virHashAddEntry to make it "clearer" why the @ref is being
incremented. Upon return from the ObjNew we will have 1 ref on
the object already, adding it to the hash table requires the
increment.
John Ferlan [Wed, 10 May 2017 11:29:57 +0000 (07:29 -0400)]
network: Introduce virNetworkObjIsPersistent
In preparation to privatize the virNetworkObj - create an accessor function
to get the current @persistent value. Also change the value to a bool rather
than an unsigned int (since that's how it's generated anyway).
John Ferlan [Wed, 26 Jul 2017 11:42:51 +0000 (07:42 -0400)]
network: Unconditionally initialize macmap when stopping virtual network
Since we can only ever have one reference to obj->macmap, rather
than only clearing obj->macmap during virNetworkObjUnrefMacMap
(e.g. virtual network from networkShutdownNetwork), let's just
unconditionally clear the obj->macmap to ensure that some future
change that created it's own reference to obj->macmap wouldn't
have that reference disappear if virNetworkObjDispose got called.
John Ferlan [Tue, 9 May 2017 20:51:05 +0000 (16:51 -0400)]
network: Move macmap mgmt from bridge_driver to virnetworkobj
In preparation for having a private virNetworkObj - let's create/move some
API's that handle the obj->macmap. The API's will be renamed to have a
virNetworkObj prefix to follow conventions and the arguments slightly
modified to accept what's necessary to complete their task.
John Ferlan [Wed, 26 Jul 2017 10:59:19 +0000 (06:59 -0400)]
network: Move and rename networkMacMgrFileName
Move networkMacMgrFileName into src/util/virmacmap.c and rename to
virMacMapFileName. We're about to move some more MacMgr processing
files into virnetworkobj and it doesn't make sense to have this helper
in the driver or in virnetworkobj.
John Ferlan [Fri, 21 Jul 2017 21:25:57 +0000 (17:25 -0400)]
qemu: Fix bug assuming usage of default UUID for certificate passphrase
If an environment specific _tls_x509_cert_dir is provided, then
do not VIR_STRDUP the defaultTLSx509secretUUID as that would be
for the "default" environment and not the vnc, spice, chardev, or
migrate environments. If the environment needs a secret to decode
it's certificate, then it must provide the secret. If the secrets
happen to be the same, then configuration would use the same UUID
as the default (but we cannot assume that nor can we assume that
the secret would be necessary).
John Ferlan [Fri, 31 Mar 2017 15:35:05 +0000 (11:35 -0400)]
util: Add object checking for virObject{Ref|Unref}
Rather than assuming that what's passed to virObject{Ref|Unref}
would be a virObjectPtr as long as it's not NULL, let's do the
similar checks virObjectIsClass in order to prevent a possible
increment or decrement to some field at the obj->u.s.refs offset.
John Ferlan [Mon, 31 Jul 2017 22:58:55 +0000 (18:58 -0400)]
util: Add magic number check for object validity
The virObjectIsClass API has only ever checked object validity
based on if the @obj is not NULL and it was derived from some class.
While this has worked well in general, there is one additional
check that could be made prior to calling virClassIsDerivedFrom
which loops through the classes checking the magic number against
the klass expected magic number.
If by chance a non virObject is passed, rather than assuming the
void * @obj is a _virObject and thus offsetting to obj->klass,
obj->magic, and obj->parent, let's check that the void * @obj
has at least the "base part" of the magic number in the right
place and generate a more specific VIR_WARN message if not.
There are many consumers to virObjectIsClass, include the locking
primitives virObject{Lock|Unlock}, virObjectRWLock{Read|Write},
and virObjectRWUnlock. For those callers, the locking call will
not fail, but it also will not attempt a virMutex* call which
will "most likely" fail since the &obj->lock is used.
In order to avoid some possible future wrap on the 0xCAFExxxx
value, add a check during initialization that some new class
won't cause the wrap. Should be good for a few years at least!
It is still left up to the caller to handle the failed API calls
just as it would be if it passed a NULL opaque pointer anyobj.
John Ferlan [Fri, 28 Jul 2017 15:09:31 +0000 (11:09 -0400)]
util: Create common error path for invalid object
If virObjectIsClass fails "internally" to virobject.c, create a
macro to generate the VIR_WARN describing what the problem is.
Also improve the checks and message a bit to indicate which was
the failure - whether the obj was NULL or just not the right class
John Ferlan [Fri, 28 Jul 2017 16:03:50 +0000 (12:03 -0400)]
util: Introduce and use virObjectRWUnlock
Rather than overload virObjectUnlock as commit id '77f4593b' has
done, create a separate virObjectRWUnlock API that will force the
consumers to make the proper decision regarding unlocking the
RWLock's. Similar to the RWLockRead and RWLockWrite, use the
virObjectGetRWLockableObj helper. This restores the virObjectUnlock
code to using the virObjectGetLockableObj.
John Ferlan [Fri, 28 Jul 2017 14:27:11 +0000 (10:27 -0400)]
util: Introduce virObjectGetRWLockableObj
Introduce a helper to handle the error path more cleanly. The same
as virObjectGetLockableObj in order to essentially follow the original
logic of commit 'b545f65d' to ensure that the input argument at least
has some validity before using.
John Ferlan [Fri, 28 Jul 2017 14:14:40 +0000 (10:14 -0400)]
util: Only have virObjectLock handle virObjectLockable
Now that virObjectRWLockWrite exists to handle the virObjectRWLockable
objects, let's restore virObjectLock to only handle virObjectLockable
class locks. There still exists the possibility that the input @anyobj
isn't a valid object and the resource isn't truly locked, but that
also exists before commit id '77f4593b'.
This also restores some logic that commit id '77f4593b' removed
with respect to a common code path that commit id '10c2bb2b' had
introduced as virObjectGetLockableObj. This code path merely does
the same checks as the original virObjectLock commit 'b545f65d',
but in callable/reusable helper to ensure the @obj at least has
some validity before using.
John Ferlan [Fri, 28 Jul 2017 14:06:55 +0000 (10:06 -0400)]
util: Introduce and use virObjectRWLockWrite
Instead of making virObjectLock be the entry point for two
different types of locks, let's create a virObjectRWLockWrite API
which will only handle the virObjectRWLockableClass objects.
Use the new virObjectRWLockWrite for the virdomainobjlist code
in order to handle the Add, Remove, Rename, and Load operations
that need to be very synchronous.
John Ferlan [Fri, 28 Jul 2017 13:57:04 +0000 (09:57 -0400)]
util: Rename virObjectLockRead to virObjectRWLockRead
Since the class it represents is based on virObjectRWLockableClass
and in order to make sure we differentiate just in case anyone somehow
believes they could use virObjectLockRead for a virObjectLockableClass,
let's rename the API to use the RW in the name. Besides the RW locks
refer to pthread_rwlock_{init|rdlock|wrlock|unlock|destroy} while the
other locks refer to pthread_mutex_{init|lock|unlock|destroy}.
Laine Stump [Sun, 13 Aug 2017 15:32:31 +0000 (11:32 -0400)]
util: eliminate superfluous saveVlan check in virNetDevSetNetConfig()
Commit 81fb440b further qualified an if statement by adding the
boolean saveVlan to the condition. Coverity pointed out that this
change in the logic eliminated the need to check saveVlan in an
argument to virAsprintf().
Laine Stump [Sun, 13 Aug 2017 15:19:27 +0000 (11:19 -0400)]
util: fix improper assignment of return value in virHostdevReadNetConfig()
Commit 9a94af6d restructured virHostdevReadNetConfig() so that it
would manually set ret = 0 after successfully reading the device's
config, but Coverity pointed out that "ret = 0" was erroneously placed
outside of an "else" clause, meaning that the the value of ret set in
the "if" clause was unnecessarily and incorrectly overwritten.
This patch moves ret = 0 into the else clause, which should silence
Coverity.
Laine Stump [Thu, 10 Aug 2017 19:46:38 +0000 (15:46 -0400)]
util: check for PF online status earlier in guest startup
When using a VF from an SRIOV-capable network card in a guest (either
in macvtap passthrough mode, or via VFIO PCI device assignment), The
associated PF netdev must be online in order for the VF to be usable
by the guest. The guest, however, is not able to change the state of
the PF. And libvirt *could* set the PF online as needed, but that
could lead to the host receiving unexpected IPv6 traffic (since the
default for an unconfigured interface is to participate in IPv6
autoconf). For this reason, before assigning a VF to a guest, libvirt
verifies that the related PF netdev is online - if it isn't, then we
log an error and don't allow the guest startup to continue.
Until now, this check was done during virNetDevSetNetConfig(). This
works nicely because the same function is called both for macvtap
passthrough and for VFIO device assignment. But in the case of VFIO,
the VF has already been unbound from its netdev driver by the time we
get to virNetDevSetNetConfig(), and in the case of dual port Mellanox
NICs that have their VFs setup in single port mode, the *only* way to
determine the proper PF netdev to query for online status is via the
"phys_port_id" file that is in the VF netdev's sysfs directory. *BUT*
if we've unbound the VF from the netdev driver, then it doesn't *have*
a netdev sysfs directory.
So, in order to check the correct PF netdev for online status, this
patch moved the check earlier in the setup, into
virNetDevSaveNetConfig(), which is called *before* unbinding the VF
from its netdev driver.
(Note that this implies that if you are using VFIO device assignment
for the VFs of a Mellanox NIC that has the VFs programmed in single
port mode, you must let the VFs be bound to their net driver and use
"managed='yes'" in the device definition. To be more specific, this is
only true if the VFs in single port mode are using port *2* of the PF
- if the VFs are using only port 1, then the correct PF netdev will be
arrived at by default/chance))
Laine Stump [Thu, 10 Aug 2017 00:49:26 +0000 (20:49 -0400)]
util: restructure virNetDevReadNetConfig() to eliminate false error logs
virHostdevRestoreNetConfig() calls virNetDevReadNetConfig() to try and
read the "original config" of a netdev, and if that fails, it tries
again with a different directory/netdev name. This achieves the
desired effect (we end up finding the config wherever it may be), but
for each failure, virNetDevReadNetConfig() places a nice error message
in the system logs. Experience has shown that false-positive error
logs like this lead to erroneous bug reports, and can often mislead
those searching for *real* bugs.
This patch changes virNetDevReadNetConfig() to explicitly check if the
file exists before calling virFileReadAll(); if it doesn't exist,
virNetDevReadNetConfig() returns a success, but leaves all the
variables holding the results as NULL. (This makes sense if you define
the purpose of the function as "read a netdev's config from its config
file *if that file exists*).
To take advantage of that change, the caller,
virHostdevRestoreNetConfig() is modified to fail immediately if
virNetDevReadNetConfig() returns an error, and otherwise to try the
different directory/netdev name if adminMAC & vlan & MAC are all NULL
after the preceding attempt.
Laine Stump [Tue, 8 Aug 2017 00:25:57 +0000 (20:25 -0400)]
util: save the correct VF's info when using a dual port SRIOV NIC in single port mode
Mellanox ConnectX-3 dual port SRIOV NICs present a bit of a challenge
when assigning one of their VFs to a guest using VFIO device
assignment.
These NICs have only a single PCI PF device, and that single PF has
two netdevs sharing the single PCI address - one for port 1 and one
for port 2. When a VF is created it can also have 2 netdevs, or it can
be setup in "single port" mode, where the VF has only a single netdev,
and that netdev is connected either to port 1 or to port 2.
When the VF is created in dual port mode, you get/set the MAC
address/vlan tag for the port 1 VF by sending a netlink message to the
PF's port1 netdev, and you get/set the MAC address/vlan tag for the
port 2 VF by sending a netlink message to the PF's port 2 netdev. (Of
course libvirt doesn't have any way to describe MAC/vlan info for 2
ports in a single hostdev interface, so that's a bit of a moot point)
When the VF is created in single port mode, you can *set* the MAC/vlan
info by sending a netlink message to *either* PF netdev - the driver
is smart enough to understand that there's only a single netdev, and
set the MAC/vlan for that netdev. When you want to *get* it, however,
the driver is more accurate - it will return 00:00:00:00:00:00 for the
MAC if you request it from the port 1 PF netdev when the VF was
configured to be single port on port 2, or if you request if from the
port 2 PF netdev when the VF was configured to be single port on port
1.
Based on this information, when *getting* the MAC/vlan info (to save
the original setting prior to assignment), we determine the correct PF
netdev by matching phys_port_id between VF and PF.
(IMPORTANT NOTE: this implies that to do PCI device assignment of the
VFs on dual port Mellanox cards using <interface type='hostdev'>
(i.e. if you want the MAC address/vlan tag to be set), not only must
the VFs be configured in single port mode, but also the VFs *must* be
bound to the host VF net driver, and libvirt must use managed='yes')
By the time libvirt is ready to set the new MAC/vlan tag, the VF has
already been unbound from the host net driver and bound to
vfio-pci. This isn't problematic though because, as stated earlier,
when a VF is created in single port mode, commands to configure it can
be sent to either the port 1 PF netdev or the port 2 PF netdev.
When it is time to restore the original MAC/vlan tag, again the VF
will *not* be bound to a host net driver, so it won't be possible to
learn from sysfs whether to use the port 1 or port 2 PF netdev for the
netlink commands. And again, it doesn't matter which netdev you
use. However, we must keep in mind that we saved the original settings
to a file called "${PF}_${VFNUM}". To solve this problem, we just
check for the existence of ${PF1}_${VFNUM} and ${PF2}_${VFNUM}, and
use whichever one we find (since we know that only one can be there)
Laine Stump [Mon, 31 Jul 2017 05:55:49 +0000 (01:55 -0400)]
util: match phys_port_id when converting PF-netdev to/from VF-netdev
This patch updates functions in netdev.c to pay attention to
phys_port_id. It uses the new function virNetDevGetPhysPortID() to
learn the phys_port_id of a VF or PF, then sends that info to
virPCIGetNetName(), which has newly been modified to take an optional
phys_port_id.
Laine Stump [Mon, 31 Jul 2017 04:21:45 +0000 (00:21 -0400)]
util: make virPCIGetNetName() more versatile
A single PCI device may have multiple netdevs associated with it. Each
of those netdevs will have a different phys_port_id entry in
sysfs. This patch modifies virPCIGetNetName() to allow selecting one
of the potential many netdevs in two different ways:
1) by setting the "idx" argument, the caller can select the 1st (0),
2nd (1), etc. netdev from the PCI device's net subdirectory.
2) If the physPortID arg is set (to a null-terminated string) then
virPCIGetNetName() returns the netdev that has that phys_port_id in
the sysfs file of the same name in the netdev's directory.
Laine Stump [Mon, 31 Jul 2017 03:32:43 +0000 (23:32 -0400)]
util: new function virNetDevGetPhysPortID()
On Linux each network device *can* (but not necessarily *does*) have
an attribute called phys_port_id which can be read from the file of
that name in the netdev's sysfs directory. The examples I've seen have
been a many-digit hexadecimal number (as an ASCII string).
This value can be useful when a single PCI device is associated with
multiple netdevs (e.g a dual port Mellanox SR-IOV NIC - this card has
a single PCI Physical Function (PF), and that PF has two netdevs
associated with it (the "net" subdirectory of the PF in sysfs has two
links rather than the usual single link to a netdev directory). Each
of the PF netdevs has a different phys_port_id. The Virtual Functions
(VF) are similar - the PF (a PCI device) has "n" VFs (also each of
these is a PCI device), each VF has two netdevs, and each of the VF
netdevs points back to the VF PCI device (with the "device" entry in
its sysfs directory) as well as having a phys_port_id matching the PF
netdev it is associated with.
virNetDevGetPhysPortID() simply attempts to read the phys_port_id for
the given netdev and return it to the caller. If this particular
netdev driver doesn't support phys_port_id, it returns NULL (*not* a
NULL-terminated string, but a NULL pointer) but still counts it as a
success.
This code is so complicated because we allow enabling the same
bits at many places. Just like in this case: huge pages can be
enabled by global <hugepages/> element under <memoryBacking> or
on per <memory/> basis. To complicate things a bit more, users
are allowed to omit the page size which case the default page
size is used. And this is what is causing this bug. If no page
size is specified, @pagesize is keeping value of zero throughout
whole function. Therefore we need yet another boolean to hold
[use, don't use] information as we can't sue @pagesize for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
So the hostdev manager has some lists to keep track which devices
are active (=assigned to a domain) or inactive. The manager and
its lists are allocated in myInit and freed in myCleanup but one
of them (activeSCSIHostdevs) was missing. Also, the order in
which the cleanup was done doesn't make it easy to spot it,
therefore reoder it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, there's a bug when undefining a domain with NVRAM
store. Basically, the unlink() of the NVRAM store file happens
during the undefine procedure iff domain is inactive. So, if
domain is running and undefine is called the file is left behind.
It won't be removed in the domain cleanup process either
(qemuProcessStop). One of the solutions is to remove if
regardless of the domain state and rely on qemu having the file
opened. This still has a downside that if the domain is defined
back the NVRAM store file is going to be new, any changes to the
current one are lost (just like with any other file that is
deleted while a process has it opened). But is it really a
downside?
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Andrea Bolognani [Tue, 25 Jul 2017 07:34:53 +0000 (09:34 +0200)]
docs: Add "PCI topology and hotplug" guidelines
For all machine types except i440fx, making a guest hotplug
capable requires some sort of planning. Add some information
to help users make educated choices when defining the PCI
topology of guests.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
If there's no content in <script></script>, the XSTL generator
will turn it into <script/> which is not permitted in XHTML.
Adding a single whitespace is enough to guarantee an explicit
closing tag. Without this, the scripts never get loaded by
the browser.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>