When assigning an SRIOV virtual function to a guest using "intelligent
PCI passthrough" (<interface type='hostdev'>, which sets the MAC
address and vlan tag of the VF before passing its info to qemu),
libvirt first learns the current MAC address and vlan tag by sending
an NLM_F_REQUEST message for the VF's PF (physical function) to the
kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
response message's IFLA_VFINFO_LIST section is examined to extract the
info for the particular VF being assigned.
This worked fine with kernels up until kernel commit 115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
the response until a newly introduced IFLA_EXT_MASK field was included
in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
flag set.
The justification for this ABI change was that new fields had been
added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
with large numbers of VFs if the requesting application didn't have a
large enough buffer for all the info. The idea is that most
applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
eliminating it from the response would lower the requirements on
buffer size. Apparently, the people who pushed this patch made the
mistaken assumption that iproute2 (the "ip" command) was the only
package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
(and they made sure that iproute2 was fixed.
The logic of this "fix" is debatable at best (one could claim that the
proper fix would be for the applications in question to be fixed so
that they properly sized the buffer, which is what libvirt does
(purely by virtue of using libnl), but it is what it is and we have to
deal with it.
In order for <interface type='hostdev'> to work properly on systems
with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
Of course we also need to continue working on systems with older
kernels, so that one bit of code is compiled conditionally. The one
time this could cause problems is if the libvirt binary was built on a
system without IFLA_EXT_MASK which was subsequently updated to a
kernel that *did* have it. That could be solved by manually providing
the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
message anyway, but I'm uncertain what that might actually do on a
system that didn't support the message, so for the time being we'll
just fail in that case (which will very likely never happen anyway).
Laine Stump [Thu, 20 Dec 2012 19:52:41 +0000 (14:52 -0500)]
util: add missing error log messages when failing to get netlink VFINFO
This patch fixes the lack of error messages when libvirt fails to find
VFINFO in a returned netlinke response message.
https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
of the error message that was previously logged when the
IFLA_VFINFO_LIST object was missing from the netlink response. The
reason for this failure is detailed in
Even though that root problem has been fixed, the experience of
finding the root cause shows us how important it is to properly log an
error message in these cases. This patch *seems* to replace the entire
function, but really most of the changes are due to moving code that
was previously inside an if() statement out to the top level of the
function (the original if() was reversed and made to log an error and
return).
Eric Blake [Wed, 19 Dec 2012 19:07:39 +0000 (12:07 -0700)]
build: make broken -Wlogical-op test be gcc-only
Commit 8b8fcdea introduced a check for broken gcc -Wlogical-op,
but did not guard the check against non-gcc compilers, which might
lead to spurious failures when another compiler encounters an
unknown pragma. Additionally, all of our compiler warning logic
should belong in a single file, and use cache variables to allow
overriding the decision at configure time if necessary.
* configure.ac (BROKEN_GCC_WLOGICALOP): Move...
* m4/virt-compile-warnings.m4 (LIBVIRT_COMPILE_WARNINGS): ...here,
and update to modern autoconf idioms.
Fix arch datatype in vahControl in virt-aa-helper.c
When changing to virArch, the virt-aa-helper.c file was not
completely changed. The vahControl struct was left with a
char *arch field, instead of virArch arch field.
Change string form of VIR_ARCH_ITANIUM back to ia64
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
Prior to the virArch changes, the CPU baseline method would
free the arch string in the returned CPU. Fix the regression
by setting arch to VIR_ARCH_NONE at the end
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If securityselinuxtest was run on a system with newer SELinux
policy it would fail, due to using svirt_tcg_t instead of
svirt_t. Fixing the domain type to be KVM avoids this issue.
We can use VIR_REALLOC_N with NULL pointer, which behaves the same way
as VIR_ALLOC_N in that case, so no need for a condition that's
checking if some data are allocated already.
---
I tried to find other parts of the code similar to this, so I can do a
full cleanup for the whole repository, so I used this (excuse the long
line, but that's how I was writing it):
git grep -nHC 5 -e VIR_REALLOC_N -e VIR_ALLOC_N | while read line; do if [[ "$line" == "--" ]]; then if [[ ${#tmpbuf} -gt 10 && "$REALLOC_N" == "true" && "$ALLOC_N" == "true" ]]; then echo $line; while [[ ${#tmpbuf[*]} -gt 0 ]]; do echo "${tmpbuf[0]}"; tmpbuf=( "${tmpbuf[@]:1:${#tmpbuf[*]}}" ); done; fi; unset tmpbuf REALLOC_N ALLOC_N; else if [[ "$ALLOC_N" != "true" && "${line/VIR_ALLOC_N//}" != "${line}" ]]; then ALLOC_N="true"; fi; if [[ "$REALLOC_N" != "true" && "${line/VIR_REALLOC_N//}" != "${line}" ]]; then REALLOC_N="true"; fi; tmpbuf[${#tmpbuf[*]}]="$line"; fi; done | less
And reviewed the output just to find out this was the only occurrence of
the inconsistency.
On few places there are too many levels of indentation when some of
them can be fixed with negating the option they are in or omitting
useless condition altogether.
Jiri Denemark [Mon, 17 Dec 2012 18:27:38 +0000 (19:27 +0100)]
spec: Do not install *.py[co] in python examples
Unfortunately, rpm is stupid enough to bytycompile python scripts even
though they are located in /usr/share/doc/libvirt-python-*/examples and
it does so after %install phase is finished. Thus there's no way we
could remove those files from BUILDROOT. As a workaround, we may safely
remove the examples subdirectory completely without losing anything. The
python scripts that were installed there are also copied directly into
/usr/share/doc/libvirt-python-*/ by
%doc python/tests/*.py
rule. And yes, the files are actually tests, not examples.
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a set of APIs for managing architectures
Introduce a 'virArch' enum for CPU architectures. Include
data type providing wordsize and endianness, and APIs to
query this info and convert to/from enum and string form.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It turns out that it would be very intrusive to correctly backport the
entire --bind-dynamic option to older dnsmasq versions
(e.g. dnsmasq-2.48 that is used on RHEL6.x and CentOS 6.x), but very
simple to patch those versions to just use SO_BINDTODEVICE on all
their listening sockets (SO_BINDTODEVICE also has the desired effect
of permitting only traffic that was received on the interface(s) where
dnsmasq was set to listen.)
This patch modifies the dnsmasq capabilities detection to detect the
string:
--bind-interfaces with SO_BINDTODEVICE
in the output of "dnsmasq --version", and in that case realize that
using the old --bind-interfaces option is just as safe as
--bind-dynamic (and therefore *not* forbid creation of networks that
use public IP address ranges).
If -bind-dynamic is available, it is still preferred over
--bind-interfaces.
Note that this patch does no harm in upstream, or in any distro's
downstream if it happens to end up there, but builds for distros that
have a new enough dnsmasq to support --bind-dynamic do *NOT* need to
specifically backport this patch; it's only required for distro
releases that have dnsmasq too old to have --bind-dynamic (and those
distros will need to add the SO_BINDTODEVICE patch to dnsmasq,
*including the extra string in the --version output*, as well.
Jiri Denemark [Fri, 14 Dec 2012 15:14:15 +0000 (16:14 +0100)]
build: Fix AUTHORS generation
Using s/#authorslist#/$$out/ makes perl eat @domain part of all email
addresses from $out since it tries to interpret them as array variables.
I'm not sure if we can escape those in s/// but I know we can use print:
s/#authorslist#// and print '$$out'
to tell perl not to even look inside $out.
This patch also fixes gen-AUTHORS so that it works in VPATH.
Laine Stump [Thu, 13 Dec 2012 16:40:47 +0000 (11:40 -0500)]
network: fix indentation of networkDnsmasqConfContents
Somehow I managed to push the changes to this file with improper
indentation. This patch just re-indents, reformats the comment lines,
and re-groups a couple of multi-line strings so that they fit within
80 columns. The resulting binary should be identical.
Authorization requires authentication but no agent is available.
The error means that polkit needs a password, but there is no polkit
agent registered in your session. Polkit agents are the bit of UI that
pop up and actually ask for your password.
Preface the error with the string 'polkit:' so folks can hopefully
make more sense of it.
Add basic driver API framework for device attach/detach support in LXC
This wires up the LXC driver to support the domain device attach/
detach/update APIs, following the same code design as used in
the QEMU driver. No actual changes are possible with this commit,
it is only providing the framework
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add support for misc host device passthrough with LXC
This extends support for host device passthrough with LXC to
cover misc devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add support for storage host device passthrough with LXC
This extends support for host device passthrough with LXC to
cover storage devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add support for USB host device passthrough with LXC
This adds support for host device passthrough with the
LXC driver. Since there is only a single kernel image,
it doesn't make sense to pass through PCI devices, but
USB devices are fine. For the latter we merely need to
make the /dev/bus/usb/NNN/MMM character device exist
in the container's /dev
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently LXC guests can be given arbitrary pre-mounted
filesystems, however, for some usecases it is more appropriate
to provide block devices which the container can mount itself.
This first impl only allows for <disk type='block'>, in other
words exposing a host disk device to a container. Since LXC
does not have device namespace virtualization, we are cheating
a little bit. If the XML specifies /dev/sdc4 to be given to
the container as /dev/sda1, when we do the mknod /dev/sda1
in the container's /dev, we actually use the major:minor
number of /dev/sdc4, not /dev/sda1.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Skip bulk relabelling of resources in SELinux driver when used with LXC
The virSecurityManager{Set,Restore}AllLabel methods are invoked
at domain startup/shutdown to relabel resources associated with
a domain. This works fine with QEMU, but with LXC they are in
fact both currently no-ops since LXC does not support disks,
hostdevs, or kernel/initrd files. Worse, when LXC gains support
for disks/hostdevs, they will do the wrong thing, since they
run in host context, not container context. Thus this patch
turns then into a formal no-op when used with LXC. The LXC
controller will call out to specific security manager labelling
APIs as required during startup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Refactor LXC NIC creation to allow reuse by hotplug code
The code for creating veth/macvlan devices is part of the
LXC process startup code. Refactor this a little and export
the methods to the rest of the LXC driver. This allows them
to be reused for NIC hotplug code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The <hostdev> device type has long had a redundant "mode"
attribute, which has always been "subsys". This finally
introduces a new mode "capabilities", which will be used
by the LXC driver for device assignment. Since container
based virtualization uses a single kernel, the idea of
assigning physical PCI devices doesn't make sense. It is
still reasonable to assign USB devices, but for assigning
arbitrary nodes in /dev, the new 'capabilities' mode is
to be used.
The first capability support is 'storage', which is for
assignment of block devices. Functionally this is really
pretty similar to the <disk> support. The only difference
is the device node name is identical in both host and
container namespaces.
The second capability support is 'misc', which is for
assignment of character devices. There is no existing
parallel to this. Again the device node is the same
inside & outside the container.
The reason for keeping the char & storage devices
separate in the domain XML, is to mirror the split
in the node device XML. NB the node device XML does
not yet report character devices, but that's another
new patch to come
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Test cases for virSysinfoRead. Initially, there are tests for
x86 (DMI based) and s390 (/proc/... based).
In lack of PPC data, I have stubbed out the test for it, but it
can be added with a minimal effort.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
There was a double free issue caused by virSysinfoRead on s390,
as the same manufacturer string instance was assigned to more
than one processor record.
Cleaned up other potential memory issues and restructured the sysinfo
parsing code by moving repeating patterns into a helper function.
The restructuring made it necessary to conditionally disable
-Wlogical-op for some older GCC versions, using pragma GCC diagnostic.
This is a GCC specific pragma, which is acceptable, since we're
using it to work around a GCC specific bug.
Finally, added a function virSysinfoSetup to configure the sysinfo
data source files/script during run time, to facilitate writing test
programs. This function is not published in sysinfo.h and only
there for testing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Peter Krempa [Mon, 17 Dec 2012 10:49:33 +0000 (11:49 +0100)]
conf: cpu: Refactor parsing of vendor_id and fallback attributes
This patch simplifies the code that parses the fallback and vendor_id
attributes from the domain xml cpu definition.
Changes done:
- free temp variables in the cleanup section instead of local use
- remove checking for presence of the attribute to directly getting the
value (saving call to virXPathBoolean)
- replace loop used to check for ',' in the vendor_id string with strchr
Support custom 'svirt_tcg_t' context for TCG based guests
The current SELinux policy only works for KVM guests, since
TCG requires the 'execmem' privilege. There is a 'virt_use_execmem'
boolean to turn this on globally, but that is unpleasant for users.
This changes libvirt to automatically use a new 'svirt_tcg_t'
context for TCG based guests. This obsoletes the previous
boolean tunable and makes things 'just work(tm)'
Since we can't assume we run with new enough policy, I also
make us log a warning message (once only) if we find the policy
lacks support. In this case we fallback to the normal label and
expect users to set the boolean tunable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Guannan Ren [Fri, 14 Dec 2012 07:08:01 +0000 (15:08 +0800)]
qemu: use newer -device video device in qemu commandline
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
Guannan Ren [Mon, 17 Dec 2012 06:01:20 +0000 (14:01 +0800)]
conf: add optional attribte primary to video <model> element
If there are multiple video devices
primary = 'yes' marks this video device as the primary one.
The rest are secondary video devices. No more than one could be
mark as primary. If none of them has primary attribute, the first
one will be the primary by default like what it was.
The reason of this changing is that for qemu, only one primary video
device is permitted which can be of any type. For secondary video
devices, only qxl is allowd. Primary attribute removes the restriction
that the first have to be the primary one.
We always put the primary video device into the first position of
video device structure array after parsing.
This adds an implementation of virNetSocketGetUNIXIdentity()
using LOCAL_PEERCRED socket option and xucred struct, defined
in <sys/ucred.h> on systems that have it.
Laine Stump [Fri, 14 Dec 2012 17:44:03 +0000 (12:44 -0500)]
network: fix (non)update of dnsmasq config during virDomainUpdateDeviceFlags
A forgotten "!" in recently-modified code at the top of
networkRefreshDaemon() meant an improper early return, which led to 1)
dnsmasq config files not being updated from the newly modified config,
and 2) dnsmasq not being sent a SIGHUP so that it could learn about
the changes to the config.
virNetworkDefGetIpByIndex() returns NULL if there are no ip objects of
the requested type, and if there are no IP elements, then dnsmasq
shouldn't be running, so we can return early. Otherwise we should
rewrite the config files and send a SIGHUP.
Michal Privoznik [Fri, 14 Dec 2012 11:17:55 +0000 (12:17 +0100)]
sanlock: Re-add lockspace unconditionally
Currently, if sanlock is already registering a lockspace other
libvirtd instances (from other hosts) obtain -EINPROGRESS. On
sufficiently new sanlock, sanlock_inq_lockspace() is called,
which suspend execution until lockspace state is changed. With
current libvirt implementation, we fail to retry adding the
lockspace again but continue in error path. Therefore we produce
meaningless error message:
virLockManagerSanlockSetupLockspace:363 : Unable to add lockspace
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Success
qemudLoadDriverConfig:558 : Failed to load lock manager sanlock
We should try to re-add the lockspace after its state change to
be sure it was added successfully. In fact, with sufficiently new
sanlock we can just avoid dummy usleep() which is used if there's
no inquire API.
Eric Blake [Thu, 13 Dec 2012 23:28:27 +0000 (16:28 -0700)]
install: fix virtlockd installation
The virtlockd daemon scripts were lousy, when compared to their
counterparts in daemon/Makefile.am. In particular, when init
scripts were selected, this resulted in 'make distcheck' failing
due to failure to clean up src/virtlockd.init.
* src/Makefile.am (install-systemd): Fix dependencies. Use MKDIR_P.
(uninstall-systemd): Remove empty directory. Use fewer processes.
(install-init, install-sysconfig): Use MKDIR_P.
(uninstall-init): Remove correct file, and also empty directory.
(uninstall-sysconfig): Remove empty directory.
(DISTCLEANFILES): Clean up trivially built sources.
Laine Stump [Fri, 14 Dec 2012 00:09:51 +0000 (19:09 -0500)]
qemu: don't fail update netdev on bridge detach failure
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recover from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).