Osier Yang [Fri, 3 May 2013 18:07:38 +0000 (02:07 +0800)]
qemu: Manage shared device entry for scsi host device
This adds the shared device entry when starting domain (more
exactly, when preparing host devices), and remove the entry
when destroying domain (when reattaching host devices).
Osier Yang [Fri, 3 May 2013 18:07:37 +0000 (02:07 +0800)]
qemu: Refactor the helpers to track shared scsi host device
This changes the helpers qemu{Add,Remove}SharedDisk into
qemu{Add,Remove}SharedDevice, as most of the code in the helpers
can be reused for scsi host device.
To track the shared scsi host device, first it finds out the
device path (e.g. /dev/s[dr]*) which is mapped to the sg device,
and use device ID of the found device path (/dev/s[dr]*) as the
hash key. This is because of the device ID is not unique between
between /dev/s[dr]* and /dev/sg*, e.g.
% sg_map
/dev/sg0 /dev/sda
/dev/sg1 /dev/sr0
% ls -l /dev/sda
brw-rw----. 1 root disk 8, 0 May 2 19:26 /dev/sda
%ls -l /dev/sg0
crw-rw----. 1 root disk 21, 0 May 2 19:26 /dev/sg0
Osier Yang [Fri, 3 May 2013 18:07:35 +0000 (02:07 +0800)]
qemu: Rename qemu_driver->sharedDisks to qemu_driver->sharedDevices
"Shared disk" is not only the thing we should care about after "scsi
hostdev" is introduced. A same scsi device can be used as "disk" for
one domain, and as "scsi hostdev" for another domain at the same time.
That's why this patch renames qemu_driver->sharedDisks. Related functions
and structs are also renamed.
Don't mount selinux fs in LXC if selinux is disabled
Before trying to mount the selinux filesystem in a container
use is_selinux_enabled() to check if the machine actually
has selinux support (eg not booted with selinux=0)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change the build process & driver initialization so that the
VirtualBox driver is built into libvirtd, instead of libvirt.so
This change avoids the VirtualBox GPLv2-only license causing
compatibility problems with libvirt.so which is under the
GPLv2-or-later license.
NB this change prevents use of the VirtualBox driver on the
Windows platform, until such time as libvirtd can be made
to work there.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fix LXC startup when /var/run is an absolute symlink
During startup, the LXC driver uses paths such as
/.oldroot/var/run/libvirt/lxc/...
to access directories from the previous root filesystem
after doing a pivot_root(). Unfortunately if /var/run
is an absolute symlink to /run, instead of a relative
symlink to ../run, these paths break.
At least one Linux distro is known to use an absolute
symlink for /var/run, so workaround this, by resolving
all symlinks before doing the pivot_root().
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Osier Yang [Tue, 14 May 2013 09:03:02 +0000 (17:03 +0800)]
conf: Fix the bug of disk->copy_on_read formating
The reason for it's not exposed for such long time is that the
enums for VirtioEventIdx and CopyOnReadType have same enum values
and Correspondingstrings. This fixes the bug and adds test.
Ján Tomko [Fri, 12 Apr 2013 15:30:56 +0000 (17:30 +0200)]
daemon: fix leak after listing all volumes
CVE-2013-1962
remoteDispatchStoragePoolListAllVolumes wasn't freeing the pool.
The pool also held a reference to the connection, preventing it from
getting freed and closing the netcf interface driver, which held two
sockets open.
qemu: Fix crash in migration of graphics-less guests.
Commit 7f15ebc7a2b599ab10dbc15bca6f823591213e67 introduced a bug
happening when guests without a <graphics> element are migrated.
The initialization of listenAddress happens unconditionally
from the cookie even if the cookie->graphics pointer was NULL.
Moved the initialization to where it is safe.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Jiri Denemark [Thu, 16 May 2013 06:16:13 +0000 (08:16 +0200)]
build: Fix check-driverimpls in VPATH
DRIVER_SOURCE_FILES mixes files with absolute path (inherited from
REMOTE_DRIVER_GENERATED) with file paths that are relative to srcdir but
check-driverimpls.pl needs full paths.
Update the LXC driver documentation to describe the way
containers are setup by default. Also describe the common
virsh commands for managing containers and a little about
the security. Placeholders for docs about configuring
containers still to be filled in.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Stefan Berger [Thu, 16 May 2013 01:02:11 +0000 (21:02 -0400)]
nwfilter: check for inverted ctdir
Linux netfilter at some point (Linux 2.6.39) inverted the meaning of the
'--ctdir reply' and newer netfilter implementations now expect
'--ctdir original' instead and vice-versa.
We check for the kernel version and assume that all Linux kernels with version
2.6.39 have the newer inverted logic.
Any distro backporting the Linux kernel patch that inverts the --ctdir logic
(Linux commit 96120d86f) must also backport this patch for Linux and
adapt the kernel version being tested for.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
FreeBSD ships an old gcc 4.2.1 which generates
bogus code, e.g. getsockopt() call returns
struct xucred with bogus values, which doesn't even
allow to connect to libvirtd:
error: Failed to find group record for gid '1284660778': No error: 0
So roll back to just -fstack-protector on FreeBSD.
John Ferlan [Fri, 26 Apr 2013 18:29:37 +0000 (14:29 -0400)]
Adjust improperly formatted <sysinfo> uuid
If the <sysinfo> system table 'uuid' field is improperly formatted,
then qemu will fail to start the guest with the error:
virsh start dom
error: Failed to start domain dom
error: internal error process exited while connecting to monitor: Invalid SMBIOS UUID string
This was because the parsing rules were lax with respect to allowing extraneous
spaces and dashes in the provided UUID. As long as there were 32 hexavalues
that matched the UUID for the domain the string was accepted. However startup
failed because the string format wasn't correct. This patch will adjust the
string format so that when it's presented to the driver it's in the expected
format.
The lxcContainerMountAllFS method had a 'bool skipRoot'
flag to control whether it mounts the / filesystem. Since
removal of the non-pivot root container setup codepaths,
this flag is obsolete as the only caller always passes
'true'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Many methods accept a string parameter specifying the
old root directory prefix. Since removal of the non-pivot
root container setup codepaths, this parameter is obsolete
in many methods where the callers always pass "/.oldroot".
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The lxcContainerMountBasicFS method had a 'bool pivotRoot'
flag to control whether it mounted a private /dev. Since
removal of the non-pivot root container setup codepaths,
this flag is obsolete as the only caller always passes
'true'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Osier Yang [Tue, 14 May 2013 12:44:54 +0000 (20:44 +0800)]
qemu: Support discard for disk
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
John Ferlan [Tue, 23 Apr 2013 13:02:07 +0000 (09:02 -0400)]
Adjust usage of qemu -no-reboot and -no-shutdown options
During building of the qemu command line determine whether to add/use the
'-no-reboot' option only if each of the 'on' events want to to destroy
the domain; otherwise, use the '-no-shutdown' option.
Prior to this change both could be on the command line, which while allowed
could be construed as a conflict.
Adding a VNC WebSocket support for QEMU driver. This functionality is
in upstream qemu from commit described as v1.3.0-982-g7536ee4, so the
capability is being recognized based on QEMU version for now.
Adding support for new attribute 'websocket' in the '<graphics>'
element, the attribute value is the port to listen on with '-1'
meaning auto-allocation, '0' meaning no websockets.
Osier Yang [Tue, 14 May 2013 05:25:50 +0000 (13:25 +0800)]
qemu: New XML to disable memory merge at guest startup
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
Eric Blake [Tue, 14 May 2013 05:25:49 +0000 (13:25 +0800)]
qemu: detect -machine mem-merge capability
* src/qemu/qemu_capabilities.h: New capability bit.
* src/qemu/qemu_capabilities.c (virQEMUCapsProbeQMPCommandLine): New
function, based on qemuMonitorGetCommandLineOptionParameters, which was
introduced by commit bd56d0d813; use it to set new capability bit.
(virQEMUCapsInitQMP): Use new function.
There is no way to escape the ':' if it appears in the
pool or image name. Thus it must be explicitly forbidden
if it occurs in the libvirt XML. People are known to
be abusing the lack of escaping in current libvirt to
pass arbitrary args to QEMU.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Clang does not like the -export-dynamic flag. The compiler does
not need it in the first place, so we can avoid the problem by
only setting it for the linker
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Clang will happily claim to support any warning flags
unless the -Werror and -Wunknown-warning-option flags
are set. Thus we need to make sure these are set when
testing for clags.
We must also set the clang specific warning flags
-Wno-unused-command-line-argument to avoid a warning
from the ssp-buffer-size flag when linking .o files.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ignore cast alignment warnings in inotify code for Xen.
The inotify Xen code causes a cast alignment warning, but this
is harmless since the kernel inotify interface will ensure
sufficient alignment of the inotify structs in the buffer being
read
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure consistent enablement of gcc 'diagnostic' pragma
The virt-compile-warnings.m4 file would do an explicit
check for whether the compile could use the 'diagnostic'
pragma push/pop feature. The src/internal.h file would
then only enable it for GCC >= 4.6
This breaks with clang which supports the pragma but
does not claim GCC 4.6 compat. Export a variable from
the m4 check to the header file so they are consistent.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Eric Blake [Mon, 13 May 2013 22:48:55 +0000 (16:48 -0600)]
qemu: fix bad free
Commit bd56d0d8 could lead to freeing an uninitialized pointer:
qemu/qemu_monitor_json.c: In function 'qemuMonitorJSONGetCommandLineOptionParameters':
qemu/qemu_monitor_json.c:4284: warning: 'cmd' may be used uninitialized in this function
When combining old gcc (4.2.1) and new gcrypt (1.5.2), such as
when using the Ports repository on FreeBSD, the build fails with:
CC libvirt_driver_la-libvirt.lo
cc1: warnings being treated as errors
In file included from libvirt.c:58:
/usr/local/include/gcrypt.h:1336: warning: 'gcry_ac_io_mode_t' is deprecated [-Wdeprecated-declarations]
Relevant part of gcrypt.h:
1333 typedef struct gcry_ac_io
1334 {
1335 /* This is an INTERNAL structure, do NOT use manually. */
1336 gcry_ac_io_mode_t mode _GCRY_ATTR_INTERNAL;
1337 gcry_ac_io_type_t type _GCRY_ATTR_INTERNAL;
1338 union
The sad part is that we aren't even using the deprecated symbols - their
mere inclusion in the installed header is provoking the problems. It
looks like newer gcc is a bit more tolerant (that is, this is a
shortcoming of FreeBSD's use of an older compiler).
Eric Blake [Fri, 26 Apr 2013 17:13:45 +0000 (11:13 -0600)]
qemu: query command line options in QMP
Ever since the conversion to using only QMP for probing features
of qemu 1.2 and newer, we have been unable to detect features
that are added only by additional command line options. For
example, we'd like to know if '-machine mem-merge=on' (added
in qemu 1.5) is present. To do this, we will take advantage
of qemu 1.5's query-command-line-parameters QMP call [1].
This patch wires up the framework for probing the command results;
if the QMP command is missing, or if a particular command line
option does not output any parameters (for example, -net uses
a polymorphic parser, which showed up as no parameters as of qemu
1.5), we silently treat that command as having no results.
Eric Blake [Fri, 26 Apr 2013 14:59:02 +0000 (08:59 -0600)]
json: support removing a value from an object
In an upcoming patch, I need the way to safely transfer a nested
virJSON object out of its parent container for independent use,
even after the parent is freed.
* src/util/virjson.h (virJSONValueObjectRemoveKey): New function.
(_virJSONObject, _virJSONArray): Use correct type.
* src/util/virjson.c (virJSONValueObjectRemoveKey): Implement it.
* src/libvirt_private.syms (virjson.h): Export it.
* tests/jsontest.c (mymain): Test it.
Gene Czarcinski [Tue, 7 May 2013 17:42:55 +0000 (13:42 -0400)]
Support for static routes on a virtual bridge
network: static route support for <network>
This patch adds the <route> subelement of <network> to define a static
route. the address and prefix (or netmask) attribute identify the
destination network, and the gateway attribute specifies the next hop
address (which must be directly reachable from the containing
<network>) which is to receive the packets destined for
"address/(prefix|netmask)".
These attributes are translated into an "ip route add" command that is
executed when the network is started. The command used is of the
following form:
ip route add <address>/<prefix> via <gateway> \
dev <virbr-bridge> proto static metric <metric>
Tests are done to validate that the input data are correct. For
example, for a static route ip definition, the address must be a
network address and not a host address. Additional checks are added
to ensure that the specified gateway is directly reachable via this
network (i.e. that the gateway IP address is in the same subnet as one
of the IP's defined for the network).
prefix='0' is supported for both family='ipv4' address='0.0.0.0'
netmask='0.0.0.0' or prefix='0', and for family='ipv6' address='::',
prefix=0', although care should be taken to not override a desired
system default route.
Anytime an attempt is made to define a static route which *exactly*
duplicates an existing static route (for example, address=::,
prefix=0, metric=1), the following error message will be sent to
syslog:
RTNETLINK answers: File exists
This can be overridden by decreasing the metric value for the route
that should be preferred, or increasing the metric for the route that
shouldn't be preferred (and is thus in place only in anticipation that
the preferred route may be removed in the future). Caution should be
used when manipulating route metrics, especially for a default route.
Note: The use of the command-line interface should be replaced by
direct use of libnl so that error conditions can be handled better. But,
that is being left as an exercise for another day.
Signed-off-by: Gene Czarcinski <gene@czarc.net> Signed-off-by: Laine Stump <laine@laine.org>
Use of the select() system call is inherantly dangerous since
applications will hit a buffer overrun if any FD number exceeds
the size of the select set size (typically 1024). Replace the
two uses of select() with poll() and use cfg.mk to ban any
future use of select().
NB: This changes the phyp driver so that it uses an infinite
timeout, instead of busy-waiting for 1ms at a time.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Jim Fehlig [Fri, 10 May 2013 18:05:00 +0000 (12:05 -0600)]
Fix starting domains when kernel has no cgroups support
Found that I was unable to start existing domains after updating
to a kernel with no cgroups support
# zgrep CGROUP /proc/config.gz
# CONFIG_CGROUPS is not set
# virsh start test
error: Failed to start domain test
error: Unable to initialize /machine cgroup: Cannot allocate memory
virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
attempting to open /proc/cgroups on such a system, but it was being
dropped in virCgroupSetPartitionSuffix().
Change virCgroupSetPartitionSuffix() to propagate errors returned by
its callees. Also check for ENOENT in qemuInitCgroup() when determining
if cgroups support is available.
Han Cheng [Fri, 3 May 2013 18:07:29 +0000 (02:07 +0800)]
qemu: Introduce activeScsiHostdevs list for scsi host devices
Although virtio-scsi supports SCSI PR (Persistent Reservations),
the device on host may do not support it. To avoid losing data,
Just like PCI and USB pass through devices, only one live guest
is allowed per SCSI host pass through device."
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Support NBD backed disks/filesystems in LXC driver
The LXC driver can already configure <disk> or <filesystem>
devices to use the loop device. This extends it to also allow
for use of the NBD device, to support non-raw formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The <filesystem> element can now accept a <driver type='nbd'/>
as an alternative to 'loop'. The benefit of NBD is support
for non-raw disk image formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Re-arrange code setting up ifs/disk loop devices for LXC
The current code for setting up loop devices to LXC disks first
does a switch() based on the disk format, then looks at the
disk driver name. Reverse this so it first looks at the driver
name, and then the disk format. This is more useful since the
list of supported disk formats depends on what driver is used.
The code for setting loop devices for LXC fs entries also needs
to have the same logic added, now the XML schema supports this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extend the <driver> element in filesystem devices to
allow a storage format to be set. The new attribute
uses 'format' to reflect the storage format. This is
different from the <driver> element in disk devices
which use 'type' to reflect the storage format. This
is because the 'type' attribute on filesystem devices
is already used for the driver backend, for which the
disk devices use the 'name' attribute. Arggggh.
Anyway for disks we have
<driver name="qemu" type="raw"/>
And for filesystems this change means we now have
<driver type="loop" format="raw"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Osier Yang [Fri, 3 May 2013 18:07:28 +0000 (02:07 +0800)]
security: Manage the security label for scsi host device
To not introduce more redundant code, helpers are added for
both "selinux", "dac", and "apparmor" backends.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com> Signed-off-by: Osier Yang <jyang@redhat>
v2.5 - v3:
* Splitted from 8/10 of v2.5
* Don't forget the other backends (DAC, and apparmor)
Han Cheng [Fri, 3 May 2013 18:07:23 +0000 (02:07 +0800)]
qemu: Build qemu command line for scsi host device
Except the scsi host device's controller is "lsilogic", mapping
between the libvirt attributes and scsi-generic properties is:
libvirt qemu
-----------------------------------------
controller bus ($libvirt_controller.0)
bus channel
target scsi-id
unit lun
For scsi host device with "lsilogic" controller, the mapping is:
('target (libvirt)' must be 0, as it's not used; 'unit (libvirt)
must <= 7).
libvirt qemu
----------------------------------------------------------
controller && bus bus ($libvirt_controller.$libvirt_bus)
unit scsi-id
It's not good to hardcode/hard-check limits of these attributes,
and even worse, these limits are not documented, one has to find
out by either testing or reading the qemu code, I'm looking forward
to qemu expose limits like these one day). For example, exposing
"max_target", "max_lun" for megasas:
Controller is implicitly added for scsi hostdev, though the scsi
controller's model defaults to "lsilogic", which might be not what
the user wants (same problem exists for virtio-scsi disk). It's
the existing problem, will be addressed later.
The device address must be specified manually. Later patch will let
libvirt generate it automatically.
This only introduces the generic XMLs for scsi hostdev, later patches
will add other elements, e.g. <readonly>, <shareable>.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com> Signed-off-by: Osier Yang <jyang@redhat.com>
Osier Yang [Mon, 6 May 2013 12:45:17 +0000 (20:45 +0800)]
tests: Add tests for fc_host
Since the NPIV machine is not easy to get, it's very likely to
introduce regressions when doing changes on the existing code.
This patch dumps part of the sysfs files (the necessary ones)
of fc_host as test input data, to test the related util functions.
It could be extended for more fc_host related testing in future.
Osier Yang [Mon, 6 May 2013 12:45:16 +0000 (20:45 +0800)]
util: Honor the passed sysfs_prefix
The helper works for default sysfs_prefix, but for user specified
prefix, it doesn't work. (Detected when writing test cases. A later
patch will add the test cases for fc_host).
Osier Yang [Mon, 6 May 2013 12:45:13 +0000 (20:45 +0800)]
util: Don't miss the slash in constructed path
In case of the caller can pass a "prefix" (or "sysfs_prefix")
without the trailing slash, and Unix-Like system always eats
up the redundant "slash" in the filepath, let's add it explicitly.
Osier Yang [Mon, 6 May 2013 12:45:11 +0000 (20:45 +0800)]
util: Fix regression of wwn reading
Introduced by commit 244ce462e29, which refactored the helper for wwn
reading, however, it forgot to change the old "strndup" and "sizeof(buf)",
"sizeof(buf)" operates on the fixed length array ("buf") in the old code,
but now "buf" is a pointer.
Eric Blake [Sat, 11 May 2013 02:46:36 +0000 (20:46 -0600)]
build: fix use of mmap
Commit bfe7721d introduced a regression, but only on platforms
like FreeBSD that lack posix_fallocate and where mmap serves as
a nice fallback for safezero.
util/virfile.c: In function 'safezero':
util/virfile.c:837: error: 'PROT_READ' undeclared (first use in this function)
* src/util/virutil.c (includes): Move use of <sys/mman.h>...
* src/util/virfile.c (includes): ...to the file that uses mmap.
Add a test case for the fdstream file read/write code
Add a test case which exercises the virFDStreamOpenFile
and virFDStreamCreateFile methods. Ensure that both the
synchronous and non-blocking iohelper code paths work.
This validates the regression recently fixed which
broke reading in non-blocking mode
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Allow the iohelper path to be customized by test programs
Currently the fdstream function hardcodes the location
of the iohelper to LIBEXECDIR "/libvirt_iohelper". This
is not convenient when trying to write test cases which
use this code. Add a virFDStreamSetIOHelper method to
allow the test cases to point to the location of the
un-installed iohelper binary.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>