Separate the guest created QEMU monitor socket location
from the libvirtd create XML / PID data files, to improve
security separation when running QEMU non-root
* libvirt.spec.in: Leave /var/run/libvirt/qemu as root:root
* src/qemu_conf.h: Add libDir and cacheDir directory paths
* src/qemu_driver.c: Move QEMU monitor socket from
stateDir to libDir to avoid making security critical directory
accessible to QEMU guests.
* src/util.c: Delay running hook till after damonizing to
ensure pidfile is still written before changing UID/GID
If a file descriptor with events=0 was added to the libvirtd
event loop, it would still be added to the poll() fds' array.
While it wouldn't see any POLLIN/OUT events, it'd still get
triggered for HANGUP/ERROR events which was not in compliance
with the libvirt events API contract.
* qemud/event.c: Don't poll on FDs with events=0
* tests/eventtest.c: Add test case to validate fix to event.c
Jim Meyering [Thu, 3 Sep 2009 12:09:49 +0000 (14:09 +0200)]
python: let libvirt_virConnectDomainEventCallback indicate success
* python/libvir.c (libvirt_virConnectDomainEventCallback): Return 0
when successful, rather than always returning -1.
clang flagged this function for its dead-store of "ret=0".
Once "ret" was set to 0, it was never used, and
the function would always return -1.
Jim Meyering [Thu, 3 Sep 2009 10:33:11 +0000 (12:33 +0200)]
openvz_conf.c: don't use undefined local, "net"
* src/openvz_conf.c (openvzReadNetworkConf): Initialize "net".
Otherwise, upon openvzRead... failure, we would "goto error;"
where an uninitialized "net" could be dereferenced.
Jim Meyering [Thu, 3 Sep 2009 10:12:19 +0000 (12:12 +0200)]
test.c: don't use undefined local, "def"
* src/test.c (testOpenVolumesForPool): Upon early virAsprintf or
virXPathNodeSet failure, "goto error" would take us to
virStorageVolDefFree(def), but with "def" not defined.
Initialize it to NULL.
Jim Meyering [Thu, 3 Sep 2009 08:22:57 +0000 (10:22 +0200)]
storage_backend.c: assure clang that inputvol can't be NULL
* src/storage_backend.c: Include "internal.h".
(virStorageBackendCopyToFD): Mark inputvol parameter as "nonnull".
Remove test for non-NULL inputvol. Both callers ensure it's non-NULL.
Jim Meyering [Wed, 2 Sep 2009 15:47:51 +0000 (17:47 +0200)]
libvir.c: avoid NULL dereference in virStoragePoolSetAutostart
* src/libvirt.c (virStoragePoolSetAutostart): Return -1 if the pool
argument is invalid, rather than "goto error" where we could dereference
that possibly-NULL "pool".
(virConnectFindStoragePoolSources): Likewise.
(virConnectNumOfDomains): Likewise.
Daniel P. Berrange spotted that the two latter functions
needed the same treatment.
* docs/schemas/domain.rng: Add <serial> element to disks
* src/domain_conf.h, src/domain_conf.c: XML parsing and
formatting for disk serial numbers
* src/qemu_conf.c: Set serial number when launching guests
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-shared.args,
tests/qemuxml2argvdata/qemuxml2argv-disk-drive-shared.xml: Add
serial number to XML test
* configure.in: Add check for mntent.h
* qemud/libvirtd_qemu.aug, qemud/test_libvirtd_qemu.aug, src/qemu.conf
Add 'hugetlbfs_mount' config parameter
* src/qemu_conf.c, src/qemu_conf.h: Check for -mem-path flag in QEMU,
and pass it when hugepages are requested.
Load hugetlbfs_mount config parameter, search for mount if not given.
* src/qemu_driver.c: Free hugetlbfs_mount/path parameter in driver shutdown.
Create directory for QEMU hugepage usage, chowning if required.
* docs/formatdomain.html.in: Document memoryBacking/hugepages elements
* docs/schemas/domain.rng: Add memoryBacking/hugepages elements to schema
* src/util.c, src/util.h, src/libvirt_private.syms: Add virFileFindMountPoint
helper API
* tests/qemuhelptest.c: Add -mem-path constants
* tests/qemuxml2argvtest.c, tests/qemuxml2xmltest.c: Add tests for hugepage
handling
* tests/qemuxml2argvdata/qemuxml2argv-hugepages.xml,
tests/qemuxml2argvdata/qemuxml2argv-hugepages.args: Data files for
hugepage tests
* tests/testutils.c: Run test function twice, once to prime it for
static allocations, once to count the non-static allocations.
* tests/testutilsqemu.c: Initialize variable correctl
* src/capabilities.c: Don't free machines variable upon failure
since caller must do that
* src/xm_internal.c: Add missing check for OOM in building VIF
config param
* docs/schemas/domain.rng: augment the video model with an optional
acceleration element with optional accel2d and accel3d flags
* src/domain_conf.c src/domain_conf.h: exten the virDomainVideoDef
structure with an optional accel field, virDomainVideoAccelDefParseXML
and virDomainVideoAccelDefFormat functions to parse and serialize
the structure.
Chris Lalancette [Tue, 25 Aug 2009 15:23:12 +0000 (17:23 +0200)]
Fix bugs in virDomainMigrate v2 code.
Paolo Bonzini points out that in my refactoring of the code for
virDomainMigrate(), I added a check for the return value from
virDomainMigratePerform(). The problem is that we don't want to
exit if we fail, we actually want to go on and do
virDomainMigrateFinish2() with a non-0 return code to clean things
up. Remove the check.
While reproducing this issue, I also noticed that we wouldn't
always properly propagate an error message. In particular, I
found that if you blocked off the migration ports (with iptables)
and then tried the migration, it would actually fail but we would
get no failure output from Qemu. Therefore, we would think we
succeeded, and leave a huge mess behind us. Execute the monitor
command "info migrate", and look for a failure string in there
as well.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
* src/esx/esx_util.c: esxUtil_ParseQuery() warns if a known query
parameter should be ignored due to the corresponding char/int pointer
being NULL, instead of silently ignoring it. Fix the control flow.
* src/esx/esx_vmx.c: add an extra type of addressType beside 'static'
and 'generated', 'vpx' indicates that the MAC address was generated
by a vCenter.
Calling qsort() on the disks array causes disk to be
unneccessarily re-ordered, potentially breaking the
ability to boot if the boot disk gets moved later in
the list. The new algorithm will insert a new disk as
far to the end of the list as possible, while being
ordered correctly wrt other disks on the same bus.
* src/domain_conf.c, src/domain_conf.h: Remove disk sorting
routines. Add API to insert a disk into existing list at
the optimal position, without resorting disks
* src/libvirt_private.syms: Export virDomainDiskInsert
* src/xend_internal.c, src/xm_internal.c: Remove calls to
qsort, use virDomainDiskInsert instead.
* src/qemu_driver.c: Remove calls to qsort, use virDoaminDiskInsert
instead. Fix reordering bugs when hotunplugging disks and
networks. Fix memory leak in disk/net unplug
* proxy/Makefile.am: Build storage_encryption_conf.c since its a
dependancy of domain_conf.c
* src/storage_encryption_conf.c: Disable XML parsing APis when
build under proxy
* src/test.c: Add a dummy no-op secrets driver for test suite
The if ((nlptr...)) implicitly assumes commptr != NULL (and that "buf"
starts with "cmd"). Make the assumption explicit, it will be broken in
a future patch.
* src/qemu_driver.c: Don't assume buffered monitor output echoes the
command.
Attach encryption information to virDomainDiskDef.
The XML allows <encryption format='unencrypted'/>, this implementation
canonicalizes the internal representation so that "disk->encryption" is
non-NULL iff encryption information is available.
A domain with partial encryption information can be defined,
completeness of the information is not verified. The domain won't
start until the remaining information is added, of course.
* docs/formatdomain.html, docs/formatdomain.html.in: Document
new encryption options for disks
* docs/schemas/domain.rng: Pull in storage encryption schema
rules
* src/domain_conf.h, src/domain_conf.c: Wire up storage encryption
XML parsing/formatting APIs
Supports only virStorageVolCreateXML, not virStorageVolCreateXMLFrom.
Curiously, qemu-img does not need the passphrase for anything to create
an encrypted volume. This implementation thus does not need to touch
any secrets to work with cooperating clients. More generic passphrase
handling is added in the next patch.
* src/storage_backend.c: Request encryption when creating qcow/qcow2
files
* src/storage_backend_disk.c, src/storage_backend_fs.c,
src/storage_backend_logical.c: Refuse to create volumes with
encryption params set.
Attach encryption information to virStorageVolDef.
The XML allows <encryption format='unencrypted'/>, this implementation
canonicalizes the internal representation so that "vol->encryption" is
non-NULL iff the volume is encrypted.
Note that partial encryption information (e.g. specifying an encryption
format, but not the key/passphrase) is valid, libvirt will automatically
choose value for the missing information during volume creation. The
user can read the volume XML, and use the unmodified <encryption> tag in
future operations (without having to be able to understand) its contents.
* docs/formatstorage.html, docs/formatstorage.html.in: Document
storage volume encryption options
* src/storage_conf.c, src/storage_conf.h: Hook up storage
encryption XML handling
* tests/storagevolschemadata/vol-qcow2.xml: Test case for encryption
schema changes
Miloslav Trmač [Wed, 19 Aug 2009 19:50:10 +0000 (21:50 +0200)]
Add volume encryption information handling.
Define an <encryption> tag specifying volume encryption format and
format-depenedent parameters (e.g. passphrase, cipher name, key
length, key).
Currently the only defined parameter is a reference to a "secret"
(passphrase/key) managed using the virSecret* API.
Only the qcow/qcow2 encryption format, and a "default" format used to
let libvirt choose the format during volume creation, is currently
supported.
This patch does not add any users; the <encryption> tag is added in
the following patches to both volumes (to support encrypted volume
creation) and domains.
* docs/*.html: Re-generate
* docs/formatstorageencryption.html.in, docs/sitemap.html.in:
Add page describing storage encryption data format
* docs/schemas/Makefile.am, docs/schemas/storageencryption.rng:
Add RNG schema for storage encryption format
* po/POTFILES.in: Add src/storage_encryption_conf.c
* src/libvirt_private.syms: Export virStorageEncryption* functions
* src/storage_encryption_conf.h, src/storage_encryption_conf.c: Internal
helper APIs for dealing with storage encryption format
* libvirt.spec.in, mingw32-libvirt.spec.in: Add storageencryption.rng
RNG schema
>>> s = c.secretDefineXML("<secret ephemeral='no' private='no'>\n<description>Something for use</description>\n<volume>/foo/bar</volume>\n</secret>\n")
Miloslav Trmač [Fri, 14 Aug 2009 19:42:19 +0000 (21:42 +0200)]
Secret manipulation internal API
* include/libvirt/virterror.h, src/virterror.c: Add VIR_WAR_NO_SECRET
* src/libvirt_private.syms, src/datatypes.h, src/datatypes.c: Type
virSecret struct definition and helper APIs
* src/driver.h: Sub-driver API definitions for secrets
* src/libvirt.c: Define new sub-driver for secrets
This patch adds a "secret" as a separately managed object, using a
special-purpose API to transfer the secret values between nodes and
libvirt users.
* docs/schemas/secret.rng, docs/schemas/Makefilem.am: Add new
schema for virSecret objects
* docs/*html: Re-generated
* docs/formatsecret.html.in, docs/sitemap.html.in: Add page
describing the virSecret XML schema
* include/libvirt/libvirt.h.in: Define the new virSecret public
API
* src/libvirt_public.syms: Export symbols for new public APIs
* mingw32-libvirt.spec.in, libvirt.spec.in: Add secret.rng to
files list
Charles Duffy [Fri, 28 Aug 2009 18:13:47 +0000 (13:13 -0500)]
support lzop save compression for qemu
Per prior discussion -- this was, indeed, trivial.
I'm a little disappointed to be breaking the ordering characteristics of
the enum (as it had been ordered by increasing time requirements and
decreasing output size), but breaking any save files with the old
constants in the headers would of course be worse.
>From 2a9cdcfc88de091a8d34aa3fc3b1208d7681790e Mon Sep 17 00:00:00 2001
From: Charles Duffy <Charles_Duffy@dell.com>
Date: Fri, 28 Aug 2009 11:49:54 -0500
Subject: [PATCH] support lzop save compression for qemu
One of the larger disincentives towards use of compression for migrated-out save
files is performance impact. This patch adds support for lzop; CPU time for
compression is about 5x faster than gzip (the next most performant algorithm)
and decompression is about 3x faster.
Signed-off-by: Charles Duffy <Charles_Duffy@dell.com> Signed-off-by: Chris Lalancette <clalance@redhat.com>
qemudExtractMonitorPath() was doing a VIR_ALLOC_N followed by a
strncpy. However, this isn't necessary; we can do the same thing
using strndup, which is much safer.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Chris Lalancette [Fri, 21 Aug 2009 12:16:13 +0000 (14:16 +0200)]
Fix up virNodeGetCellsFreeMemory
The documentation for virNodeGetCellsFreeMemory claims the values
returned are in kilobytes, but that's actually wrong; the value
returned is actually in bytes. Fix up the documentation to be
correct.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Jim Fehlig [Tue, 25 Aug 2009 21:54:18 +0000 (15:54 -0600)]
Fix sexpr2string() to handle empty list.
S-expression containing empty lists, e.g. (cpus (() () () ())),
was not being handled properly in sexpr2string() serialization.
Emit an empty list when encountering NIL sexpr kind.
Refactor policycode auth code to avoid compiler warnings
* src/remote_internal.c: Split remoteAuthPolkit into separate
impls for v0 and v1 to avoid compile warnings due to unused
variables/params
* qemud/remote.c: Remove accidental tabs
* configure.in: Check for pkcheck which indicates new policykit
* qemud/Makefile.am: Install different versions of policy
* qemud/libvirtd.policy: Rename to libvirtd.policy-0
* qemud/libvirtd.policy-1: new style policy
* qemud/qemud.c, qemud/qemud.h, qemud/remote.c: Support new
policykit API via external pkcheck helper
* src/remote_internal.c: Don't prompt for polkit auth with new
policykit API
* libvirt.spec.in: deal with new policy install locations & deps
Mattias Bolte [Thu, 20 Aug 2009 11:59:07 +0000 (13:59 +0200)]
Fix phypOpen() escape_specialcharacters
Matthias correctly points out that escape_specialcharaters() takes a
length, and since we are now malloc()'ing string in phypOpen instead of
making it a static array, we can't use sizeof(string) anymore. Calculate
the proper strlen and then use that both to allocate the string and also
pass it to escape_specialcharacters().
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Mattias Bolte [Thu, 20 Aug 2009 10:32:19 +0000 (12:32 +0200)]
Power Hypervisor: fix potential segfault
I came across this line in the phypOpen function:
char string[strlen(conn->uri->path)];
Here the path part of the given URI is used without checking it for
NULL, this can cause a segfault as strlen expects a string != NULL.
Beside that uuid_db and connection_data leak in case of an error.
In this line
conn->uri->path = string;
the original path of the URI leaks. The patch adds a VIR_FREE call
before setting the new path.
The attached patch is compile-tested but I don't have a Power
Hypervisor installation at hand to test it for real.
Matthias
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Chris Lalancette [Mon, 17 Aug 2009 10:34:53 +0000 (12:34 +0200)]
Small fixes for qemu save compression.
Fix up a small memory leak pointed out by DanB; I was forgetting
to release memory allocated to driver->saveImageFormat.
Also add the "save_image_format" and "security" entries to
the augeas lens.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
* tests/sexpr2xmldata/sexpr2xml-pv-vfb-type-crash.sexpr,
tests/sexpr2xmldata/sexpr2xml-pv-vfb-type-crash.xml: Data
files exhibiting the crash
* tests/sexpr2xmltest.c: Process new data files
Mark McLoughlin [Mon, 17 Aug 2009 14:05:23 +0000 (15:05 +0100)]
Simplify PCI hostdev prepare/re-attach using a pciDeviceList type
The qemuPrepareHostDevices() and qemuDomainReAttachHostDevices()
functions are clutter with a bunch of calls to pciGetDevice() and
pciFreeDevice() obscuring the basic logic.
Add a pciDeviceList type and add a qemuGetPciHostDeviceList() function
to build a list from a domain definition. Use this in prepare/re-attach
fto simplify things and eliminate the multiple pciGetDevice calls.
This is especially useful because in the next patch we need to iterate
the hostdevs list a third time and we also need a list type for keeping
track of active devices.
* src/pci.[ch]: add pciDeviceList type and also a per-device 'managed'
property
* src/libvirt_private.syms: export the new functions
* src/qemu_driver.c: add qemuGetPciHostDeviceList() and re-write
qemuPrepareHostDevices() and qemuDomainReAttachHostDevices() to use it
Mark McLoughlin [Mon, 17 Aug 2009 14:05:22 +0000 (15:05 +0100)]
Revert changes to allow pciResetDevice() reset multiple devices
It turns out that the previous attempt at this doesn't work well
in the case of hotplug. We need qemuCheckPciHostDevice() to
disallow the reset affecting devices already attach to the guest,
but we still need to avoid double locking the virDomainObjPtr.
Chris Lalancette [Tue, 11 Aug 2009 13:48:59 +0000 (15:48 +0200)]
Fix up connection reference counting.
Currently the reference counting for connections is busted. I
first noticed it while trying to use virConnectRef; it would
eventually cause a crash in the remote_internal driver, although
that was really just a victim. Really, we should only call the
close callbacks on the methods when the references drop to 0. To
accomplish this, move all of the close callbacks into
virUnrefConnect (since there are lots of internal users of that
function), and arrange for virConnectClose to call that.
V2: Make sure to drop the connection lock before we call the close
callbacks, otherwise we could deadlock the daemon
V3: Fix up a crash when we got an error from one of the drivers
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Fix LXC driver crash when kernel doesn't support clone
* src/domain_conf.c: Make virDomainObjListFree a no-op if list
is NULL
* src/domain_event.c: make virDomainEventCallbackListFree a no-op
if event list is NULL
* src/lxc_driver.c: Log a message if LXC driver does not startup
due to lacking kernel support
Implement a compressed save image format for qemu. While ideally
we would have the choice between compressed/non-compressed
available to the libvirt API, unfortunately there is no "flags"
parameter to the virDomainSave() API. Therefore, implement this
as a qemu.conf option. gzip, bzip2, and lzma are implemented, and
it should be very easy to implement additional compression
methods.
One open question is if/how we should detect the compression
binaries. One way to do it is to do compile-time setting of the
paths (via configure.in), but that doesn't seem like a great thing
to do. My preferred solution is not to detect at all;
when we go to run the commands that need them, if they
aren't available, or aren't available in one of the standard paths,
then we'll fail. That's also the solution implemented in this patch.
In the future, we'll have a more robust (managed) save/restore API,
at which time we can expose this functionality properly in the API.
V2: get rid of redundant dd command and just use >> to append data.
V3: Add back the missing pieces for the enum and bumping the save version.
V4: Make the compressed field in the save_header an int.
Implement LZMA compression.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Mark McLoughlin [Fri, 14 Aug 2009 07:31:11 +0000 (08:31 +0100)]
Allow pciResetDevice() to reset multiple devices
When using a Secondary Bus Reset, all devices on the bus are reset.
Extend the pciResetDevice() API so that a 'check' callback can be
supplied which will verify that it is safe to reset the other devices
on the bus.
The virDomainObjPtr parameter is needed so that when the check function
iterates over the domain list, it can avoid double locking.
* src/pci.[ch]: add a 'check' callback to pciResetDevice(), re-work
pciIterDevices() to pass the check function to the iter functions,
use the check function in the bus iterator, return the first unsafe
device from pciBusCheckOtherDevices() and include its details in
the bus reset error message.
* src/qemu_driver.c, src/xen_uninified.c: just pass NULL as the
check function for now
Mark McLoughlin [Fri, 14 Aug 2009 07:31:11 +0000 (08:31 +0100)]
Improve PCI host device reset error message
Currently, if we are unable to reset a PCI device we return a fairly
generic 'No PCI reset capability available' error message.
Fix that by returning an error from the individual reset messages and
using that error to construct the higher level error mesage.
* src/pci.c: set errors in pciTryPowerManagementReset() and
pciTrySecondaryBusReset() on failure; use those error messages
in pciResetDevice(), or explain that no reset support is available
Mark McLoughlin [Fri, 14 Aug 2009 07:31:11 +0000 (08:31 +0100)]
Reset and re-attach PCI host devices on guest shutdown
When the guest shuts down, we should attempt to restore all PCI host
devices to a sane state.
In the case of managed hostdevs, we should reset and re-attach the
devices. In the case of unmanaged hostdevs, we should just reset them.
Note, KVM will already reset assigned devices when the guest shuts
down using whatever means it can, so we are only doing it to cover the
cases the kernel can't handle.
* src/qemu_driver.c: add qemuDomainReAttachHostDevices() and call
it from qemudShutdownVMDaemon()
Mark McLoughlin [Fri, 14 Aug 2009 07:31:11 +0000 (08:31 +0100)]
Allow PM reset on multi-function PCI devices
It turns out that a PCI Power Management reset only affects individual
functions, and not the whole device.
The PCI Power Management spec talks about resetting the 'device' rather
than the 'function', but Intel's Dexuan Cui informs me that it is
actually a per-function reset.
Also, Yu Zhao has added pci_pm_reset() to the kernel, and it doesn't
reject multi-function devices, so it must be true! :-)
(A side issue is that we could defer the PM reset to the kernel if we
could detect that the kernel has PM reset support, but barring version
number checks we don't have a way to detect that support)
* src/pci.c: remove the pciDeviceContainsOtherFunctions() check from
pciTryPowerManagementReset() and prefer PM reset over bus reset
where both are available
Mark McLoughlin [Fri, 14 Aug 2009 07:31:10 +0000 (08:31 +0100)]
Re-factor hostdev hotplug
Re-factor the hostdev hotplug code so that we can easily add PCI
hostdev hotplug to qemudDomainAttachHostDevice().
* src/qemu_driver.c: rename qemudDomainAttachHostDevice() to
qemudDomainAttachHostUsbDevice(); make qemudDomainAttachHostDevice()
handle all hostdev types
* src/libvirt_private.syms: export a couple of hostdev related
ToString() functions
Make LXC / UML drivers robust against NUMA topology brokenness
Some kernel versions expose broken NUMA topology for some machines.
This causes the LXC/UML drivers to fail to start. QEMU driver was
already fixed for this problem
* src/lxc_conf.c: Log and ignore failure to populate NUMA info
* src/uml_conf.c: Log and ignore failure to populate NUMA info
* src/capabilities.c: Reset nnumaCell to 0 after freeing
A couple of minor fixes to phyp escape_specialcharacters. Make it
a static function (since it's only used in phyp/phyp_driver.c), and
make it take a dstlen parameter. This paves the way for removing
strncpy in the future.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Minor fix to openvzGetVPSUUID to make it take a length parameter.
This ensures that it doesn't make assumptions about the length
of the UUID buffer, and paves the way for removal of strncpy in
the future.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
As of qemu 0.10.6, qemu now honors the -S flag on incoming migration.
That means that when the migration completes, we have to issue a
'cont' command to get the VM running again. We do it unconditionally
since it won't hurt on older qemu.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Chris Lalancette [Thu, 30 Jul 2009 14:52:02 +0000 (16:52 +0200)]
Split virDomainMigrate into functions.
Re-factor virDomainMigrate to split out the version 1 and version 2
protocols into their own functions. In reality, the two versions share
very little in common, so forcing them together in the same function was
just confusing. This will also make adding tunnelled migration easier.
Signed-off-by: Chris Lalancette <clalance@redhat.com>