Laine Stump [Sat, 20 Oct 2012 08:39:18 +0000 (04:39 -0400)]
network: don't allow multiple default portgroups
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=868483
virNetworkUpdate, virNetworkDefine, and virNetworkCreate all three
allow network definitions to contain multiple <portgroup> elements
with default='yes'. Only a single default portgroup should be allowed
for each network.
This patch updates networkValidate() (called by both
virNetworkCreate() and virNetworkDefine()) and
virNetworkDefUpdatePortGroup (called by virNetworkUpdate() to not
allow multiple default portgroups.
Previously, the dnsmasq hosts file (used for static dhcp entries, and
addnhosts file (used for additional dns host entries) were only
created/referenced on the dnsmasq commandline if there was something
to put in them at the time the network was started. Once we can update
a network definition while it's active (which is now possible with
virNetworkUpdate), this is no longer a valid strategy - if there were
0 dhcp static hosts (resulting in no reference to the hosts file on the
commandline), then one was later added, the commandline wouldn't have
linked dnsmasq up to the file, so even though we create it, dnsmasq
doesn't pay any attention.
The solution is to just always create these files and reference them
on the dnsmasq commandline (almost always, anyway). That way dnsmasq
can notice when a new entry is added at runtime (a SIGHUP is sent to
dnsmasq by virNetworkUdpate whenever a host entry is added or removed)
The exception to this is that the dhcp static hosts file isn't created
if there are no lease ranges *and* no static hosts. This is because in
this case dnsmasq won't be setup to listen for dhcp requests anyway -
in that case, if the count of dhcp hosts goes from 0 to 1, dnsmasq
will need to be restarted anyway (to get it listening on the dhcp
port). Likewise, if the dhcp hosts count goes from 1 to 0 (and there
are no dhcp ranges) we need to restart dnsmasq so that it will stop
listening on port 67. These special situations are handled in the
bridge driver's networkUpdate() by checking for ((bool)
nranges||nhosts) both before and after the update, and triggering a
dnsmasq restart if the before and after don't match.
pointed out a crash due to virNetworkObjAssignDef free'ing
network->newDef without NULLing it afterward. A fix for this is in
upstream commit b7e9202401ebaa039b8f05acdefda8c24081537a. While the
NULLing of newDef was a legitimate fix, newDef should have already
been empty (NULL) anyway (as indicated in the comment that was deleted
by that commit).
The reason that newDef had a non-NULL value (i.e. the root cause) was
that networkStartNetwork() had failed after populating
network->newDef, but then neglected to free/NULL newDef in the
cleanup.
(A bit of background here: network->newDef should contain the
persistent config of a network when a network is active (and of course
only when it is persisten), and NULL at all other times. There is also
a network->def which should contain the persistent definition of the
network when it is inactive, and the current live state at all other
times. The idea is that you can make changes to network->newDef which
will take effect the next time the network is restarted, but won't
mess with the current state of the network (virDomainObj has a similar
pair of virDomainDefs that behave in the same fashion). Personally I
think there should be a network->live and network->config, and the
location of the persistent config should *always* be in
network->config, but that's for a later cleanup).
Since I love things to be symmetric, I created a new function called
virNetworkObjUnsetDefTransient(), which reverses the effects of
virNetworkObjSetDefTransient(). I don't really like the name of the
new function, but then I also didn't really like the name of the old
one either (it's just named that way to match a similar function in
the domain conf code).
Eric Blake [Sat, 20 Oct 2012 03:13:37 +0000 (21:13 -0600)]
blockjob: avoid segv on early error
Gcc with optimization warns:
../../src/qemu/qemu_driver.c: In function 'qemuDomainBlockCommit':
../../src/qemu/qemu_driver.c:12813:46: error: 'disk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
../../src/qemu/qemu_driver.c:12698:25: note: 'disk' was declared here
cc1: all warnings being treated as errors
so obviously I had only been testing with optimization off.
Eric Blake [Tue, 16 Oct 2012 14:46:41 +0000 (08:46 -0600)]
blockjob: properly label disks for qemu block-commit
I finally have all the pieces in place to perform a block-commit with
SELinux enforcing. There's still missing cleanup work when the commit
completes, but doing that requires tracking both the backing chain and
the base and top files within that chain in domain XML across libvirtd
restarts. Furthermore, from a security standpoint, once you have
granted access, you must assume any damage that can be done will be
done; later revoking access is nice to minimize the window of damage,
but less important as it does not affect the fact that damage can be
done in the first place. Therefore, deferring the revoke efforts until
we have better XML tracking of what chain operations are in effect,
including across a libvirtd restart, is reasonable.
Eric Blake [Wed, 17 Oct 2012 21:48:14 +0000 (15:48 -0600)]
blockjob: refactor qemu disk chain permission grants
Previously, snapshot code did its own permission granting (lock
manager, cgroup device controller, and security manager labeling)
inline. But now that we are adding block-commit and block-copy
which also have to change permissions, it's better to reuse
common code for the task. While snapshot should fall back to
no access if read-write access failed, block-commit will want to
fall back to read-only access. The common code doesn't know
whether failure to grant read-write access should revert to no
access (snapshot, block-copy) or read-only access (block-commit).
This code can also be used to revoke access to unused files after
block-pull.
It might be nice to clean things up in a future patch by adding
new functions to the lock manager, cgroup manager, and security
manager that takes a single file name and applies context of a
disk to that file, rather than the current semantics of applying
context to the entire chain already associated to a disk. That
way, we could avoid the games this patch plays of temporarily
swapping out the disk->src and related fields of the disk. But
that would involve more code changes, so this patch really is
the smallest hack for doing the necessary work; besides, this
patch is more or less code motion (the hack was already employed
by the snapshot creation code, we are just making it reusable).
Eric Blake [Sat, 13 Oct 2012 13:12:31 +0000 (07:12 -0600)]
blockjob: implement shallow commit flag in qemu
Now that we can crawl the chain of backing files, we can do
argument validation and implement the 'shallow' flag. In
testing this, I discovered that it can be handy to pass the
shallow flag and an explicit base, as a means of validating
that the base is indeed the file we expected.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Crawl through
chain to implement shallow flag.
* src/libvirt.c (virDomainBlockCommit): Relax API.
Eric Blake [Wed, 3 Oct 2012 18:54:09 +0000 (12:54 -0600)]
blockjob: wire up online qemu block-commit
This is the bare minimum to kick off a block commit. In particular,
flags support is missing (shallow requires us to crawl the backing
chain to determine the file name to pass to the qemu monitor command;
delete requires us to track what needs to be deleted at the time
the completion event fires). Also, we are relying on qemu to do
error checking (such as validating 'top' and 'base' as being members
of the backing chain), including the fact that the current qemu code
does not support committing the active layer (although it is still
planned to add that before qemu 1.3). Since the active layer won't
change, we have it easy and do not have to alter the domain XML.
Additionally, this will fail if SELinux is enforcing, because we fail
to grant qemu proper read/write access to the files it will modify.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): New function.
(qemuDriver): Register it.
Eric Blake [Tue, 9 Oct 2012 22:08:14 +0000 (16:08 -0600)]
storage: use cache to walk backing chain
We used to walk the backing file chain at least twice per disk,
once to set up cgroup device whitelisting, and once to set up
security labeling. Rather than walk the chain every iteration,
which possibly includes calls to fork() in order to open root-squashed
NFS files, we can exploit the cache of the previous patch.
* src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter
signature.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller
to supply backing chain via disk, if recursion is desired.
* src/security/security_dac.c
(virSecurityDACSetSecurityImageLabel): Adjust caller.
* src/security/security_selinux.c
(virSecuritySELinuxSetSecurityImageLabel): Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
* src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup): Likewise.
(qemuSetupCgroup): Pre-populate chain.
Eric Blake [Tue, 9 Oct 2012 22:08:14 +0000 (16:08 -0600)]
storage: cache backing chain while qemu domain is live
Technically, we should not be re-probing any file that qemu might
be currently writing to. As such, we should cache the backing
file chain prior to starting qemu. This patch adds the cache,
but does not use it until the next patch.
Ultimately, we want to also store the chain in domain XML, so that
it is remembered across libvirtd restarts, and so that the only
kosher way to modify the backing chain of an offline domain will be
through libvirt API calls, but we aren't there yet. So for now, we
merely invalidate the cache any time we do a live operation that
alters the chain (block-pull, block-commit, external disk snapshot),
as well as tear down the cache when the domain is not running.
* src/conf/domain_conf.h (_virDomainDiskDef): New field.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean new field.
* src/qemu/qemu_domain.h (qemuDomainDetermineDiskChain): New
prototype.
* src/qemu/qemu_domain.c (qemuDomainDetermineDiskChain): New
function.
* src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
(qemuDomainChangeDiskMediaLive): Pre-populate chain.
(qemuDomainSnapshotCreateSingleDiskActive): Uncache chain before
snapshot.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Update
chain after block pull.
Eric Blake [Fri, 12 Oct 2012 22:29:14 +0000 (16:29 -0600)]
storage: make it easier to find file within chain
In order to temporarily label files read/write during a commit
operation, we need to crawl the backing chain and find the absolute
file name that needs labeling in the first place, as well as the
name of the file that owns the backing file.
* src/util/storage_file.c (virStorageFileChainLookup): New
function.
* src/util/storage_file.h: Declare it.
* src/libvirt_private.syms (storage_file.h): Export it.
Eric Blake [Tue, 9 Oct 2012 23:47:42 +0000 (17:47 -0600)]
storage: remember relative names in backing chain
In order to search for a backing file name as literally present
in a chain, we need to remember if the chain had relative names.
Also, searching for absolute names is easier if we only have
to canonicalize once, rather than on every iteration.
* src/util/storage_file.h (_virStorageFileMetadata): Add field.
* src/util/storage_file.c (virStorageFileGetMetadataFromBuf):
(virStorageFileFreeMetadata): Manage it
(absolutePathFromBaseFile): Store absolute names in canonical form.
Eric Blake [Sat, 13 Oct 2012 17:01:27 +0000 (11:01 -0600)]
storage: don't require caller to pre-allocate metadata struct
Requiring pre-allocation was an unusual idiom. It allowed iteration
over the backing chain to use fewer mallocs, but made one-shot
clients harder to read. Also, this makes it easier for a future
patch to move away from opening fds on every iteration over the chain.
Eric Blake [Sat, 13 Oct 2012 16:47:15 +0000 (10:47 -0600)]
storage: get entire metadata chain in one call
Previously, no one was using virStorageFileGetMetadata, and for good
reason - it couldn't support root-squash NFS. Change the signature
and make it useful to future patches, including enhancing the metadata
to recursively track the entire chain.
* src/util/storage_file.h (_virStorageFileMetadata): Add field.
(virStorageFileGetMetadata): Alter signature.
* src/util/storage_file.c (virStorageFileGetMetadata): Rewrite.
(virStorageFileGetMetadataRecurse): New function.
(virStorageFileFreeMetadata): Handle recursion.
Eric Blake [Tue, 2 Oct 2012 16:57:06 +0000 (10:57 -0600)]
storage: use enum for snapshot driver type
This is the last use of raw strings for disk formats throughout
the src/conf directory.
* src/conf/snapshot_conf.h (_virDomainSnapshotDiskDef): Store enum
rather than string for disk type.
* src/conf/snapshot_conf.c (virDomainSnapshotDiskDefClear)
(virDomainSnapshotDiskDefParseXML, virDomainSnapshotDefFormat):
Adjust users.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive): Likewise.
Eric Blake [Fri, 28 Sep 2012 21:06:33 +0000 (15:06 -0600)]
storage: use enum for default driver type
Express the default disk type as an enum, for easier handling.
* src/conf/capabilities.h (_virCaps): Store enum rather than
string for disk type.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Adjust
clients.
* src/qemu/qemu_driver.c (qemuCreateCapabilities): Likewise.
Eric Blake [Fri, 28 Sep 2012 18:57:54 +0000 (12:57 -0600)]
storage: match RNG to supported driver types
At one point, the code passed through arbitrary strings for file
formats, which supposedly lets qemu handle a new file type even
before libvirt has been taught to handle it. However, to properly
label files, libvirt has to learn the file type anyway, so we
might as well make our life easier by only accepting file types
that we are prepared to handle. This patch lets the RNG validation
ensure that only known strings are let through.
* docs/schemas/domaincommon.rng (driverFormat): Limit to list of
supported strings.
* docs/schemas/domainsnapshot.rng (driver): Likewise.
Eric Blake [Fri, 28 Sep 2012 23:48:58 +0000 (17:48 -0600)]
storage: treat 'aio' like 'raw' at parse time
We have historically allowed 'aio' as a synonym for 'raw' for
back-compat to xen, but since a future patch will move to using
an enum value, we have to pick one to be our preferred output
name. This is a slight change in the output XML, but the sexpr
and xm outputs should still be identical, and the input XML can
still use either form.
Eric Blake [Fri, 28 Sep 2012 17:11:07 +0000 (11:11 -0600)]
storage: list more file types
When an image has no backing file, using VIR_STORAGE_FILE_AUTO
for its type is a bit confusing. Additionally, a future patch
would like to reserve a default value for the case of no file
type specified in the XML, but different from the current use
of -1 to imply probing, since probing is not always safe.
Also, a couple of file types were missing compared to supported
code: libxl supports 'vhd', and qemu supports 'fat' for directories
passed through as a file system.
* src/util/storage_file.h (virStorageFileFormat): Add
VIR_STORAGE_FILE_NONE, VIR_STORAGE_FILE_FAT, VIR_STORAGE_FILE_VHD.
* src/util/storage_file.c (virStorageFileMatchesVersion): Match
documentation when version probing not supported.
(cowGetBackingStore, qcowXGetBackingStore, qcow1GetBackingStore)
(qcow2GetBackingStoreFormat, qedGetBackingStore)
(virStorageFileGetMetadataFromBuf)
(virStorageFileGetMetadataFromFD): Take NONE into account.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/conf/storage_conf.c (virStorageVolumeFormatFromString): New
function.
(poolTypeInfo): Use it.
Guannan Ren [Fri, 19 Oct 2012 08:44:30 +0000 (16:44 +0800)]
selinux: relabel tapfd in qemuPhysIfaceConnect
Relabeling tapfd right after the tap device is created.
qemuPhysIfaceConnect is common function called both for static
netdevs and for hotplug netdevs.
Jiri Denemark [Fri, 19 Oct 2012 13:08:29 +0000 (15:08 +0200)]
qemu: Do not require hostuuid in migration cookie
Having hostuuid in migration cookie is a nice bonus since it provides an
easy way of detecting migration to the same host. However, requiring it
breaks backward compatibility with older libvirt releases.
Jiri Denemark [Fri, 19 Oct 2012 11:36:33 +0000 (13:36 +0200)]
qemu: Allow migration with host USB devices
Recently, patches were added support for (managed)saving, restoring, and
migrating domains with host USB devices. However, qemu driver would
still forbid migration of such domains because qemuMigrationIsAllowed
was not updated.
Guido Günther [Thu, 18 Oct 2012 23:14:15 +0000 (01:14 +0200)]
qemu: Set arch to i686 if qemu-system-i386 is found
If we can't probe the architecture from QMP we parse the architecture
from the qemu binaries name. This results in the architecture being i386
instead of i686 which then results in QEMU_CAPS_PCI_MULTIBUS being unset
which gives a broken qemu command line.
This probably didn't show up earlier since most of the time there's also
a /usr/bin/qemu around which results in i686 capabilities.
Michal Privoznik [Thu, 18 Oct 2012 14:28:35 +0000 (16:28 +0200)]
network: Set to NULL after virNetworkDefFree()
which frees all allocated memory but doesn't set the passed pointer to
NULL. Therefore, we must do it ourselves. This is causing actual
libvirtd crash: Basically, when doing 'virsh net-edit' the newDef should
be dropped. And the memory is freed, indeed. However, the pointer is
not set to NULL but kept instead. And the next duo of calls 'virsh
net-start' and 'virsh net-destroy' starts the disaster. The latter one
does the same as 'virsh destroy'; it sees that newDef is nonNULL so it
replaces def with newDef (which has been freed already as said a few
lines above). Therefore any subsequent call accessing def will hit the ground.
Jiri Denemark [Tue, 16 Oct 2012 19:11:29 +0000 (21:11 +0200)]
qemu: Always format CPU topology
When libvirt cannot find a suitable CPU model for host CPU (easily
reproducible by running libvirt in a guest), it would not provide CPU
topology in capabilities XML either. Even though CPU topology is known
and can be queried by virNodeGetInfo. With this patch, CPU topology will
always be provided in capabilities XML regardless on the presence of CPU
model.
Jiri Denemark [Thu, 18 Oct 2012 11:03:26 +0000 (13:03 +0200)]
spec: Fix dependency for lock-sanlock subpackage
This should not make a big difference in real world since libvirt-daemon,
which is already required by libvirt-lock-sanlock, requires
libvirt-client and thus libvirt-lock-sanlock gets this dependency
transitively. However, since libvirt-lock-sanlock contains
sanlock_helper binary linked to libvirt.so, we should start requiring
libvirt-client directly.
Michal Privoznik [Thu, 18 Oct 2012 07:38:41 +0000 (09:38 +0200)]
qemu: Correctly wait for spice to migrate
Currently we query-spice after the main migration has completed
before moving to next state. Qemu reports this as boolean (not
enclosed within quotes). Therefore it is not correct to use
virJSONValueObjectGetString but virJSONValueObjectGetBoolean instead.
qemu: Pin the emulator when only cpuset is specified
According to our recent changes (clarifications), we should be pinning
qemu's emulator processes using the <vcpu> 'cpuset' attribute in case
there is no <emulatorpin> specified. This however doesn't work
entirely as expected and this patch should resolve all the remaining
issues.
Jiri Denemark [Wed, 17 Oct 2012 12:08:17 +0000 (14:08 +0200)]
qemu: Clear async job when p2p migration fails early
When p2p migration fails early because qemuMigrationIsAllowed or
qemuMigrationIsSafe say migration should be cancelled, we fail to clear
the migration-out async job. As a result of that, further APIs called
for the same domain may fail with Timed out during operation: cannot
acquire state change lock.
Doug Goldstein [Sun, 14 Oct 2012 22:02:33 +0000 (17:02 -0500)]
interface: add virInterfaceGetXMLDesc() in udev
Added support for retrieving the XML defining a specific interface via
the udev based backend to virInterface. Implement the following APIs
for the udev based backend:
* virInterfaceGetXMLDesc()
Guannan Ren [Wed, 17 Oct 2012 03:23:19 +0000 (11:23 +0800)]
selinux: fix wrong tapfd relablling
It should relabel tapfd of virtual network of type VIR_DOMAIN_NET_TYPE_DIRECT
rather than VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE
(commit ae368ebfcc4923d0b32e83d4ca96a6f599625785 introduced this bug)
Caution: The context of the two hunks is identical other than indentation.
Please be extremely cautious of where the patch gets applied.
Jiri Denemark [Tue, 16 Oct 2012 21:54:18 +0000 (23:54 +0200)]
spec: Require newer sanlock on recent distros 2
The previous commit was incomplete. We need to also add explicit
Requires for the newer version since RPM's automatic dependencies won't
work with sanlock.
Peter Krempa [Tue, 16 Oct 2012 12:34:35 +0000 (14:34 +0200)]
spec: Add runtime requirement for libssh2
libssh2 unfortunately doesn't support symbol versioning so RPM can't
figure out what version is needed for the currently installed libvirt
package. This patch adds a runtime requirement, so that the correct
version of libssh2 can be installed along with libvirt.
Jiri Denemark [Tue, 16 Oct 2012 10:41:20 +0000 (12:41 +0200)]
locking: Fix build with sanlock < 2.4
libvirt started using sanlock_killpath to implement on_lockfailure
action. Since sanlock_killpath was introduced in sanlock 2.4, libvirt
fails to build with older sanlock.
Currently there is a restriction that multi-threaded applications
must manually call virInitialize, before threads start using
libvirt, because it is not thread-safe. By switching it to use
a virOnceControl initializer we gain thread safety, and thus
applications no longer need to manually call it. They can rely
on virConnectOpen invoking it for them.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Win32 platforms don't have SIGKILL defined, but they do have
SIGABRT. Since our virProcess wrapper treats anything which
isn't SIGTERM/SIGINT as equivalent to SIGKILL, just use
SIGABRT on Win32.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add JSON serialization of virNetServerPtr objects for process re-exec()
Add two new APIs virNetServerNewPostExecRestart and
virNetServerPreExecRestart which allow a virNetServerPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of all registered services
and clients
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add JSON serialization of virNetServerClientPtr objects for process re-exec()
Add two new APIs virNetServerClientNewPostExecRestart and
virNetServerClientPreExecRestart which allow a virNetServerClientPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of the connected socket associated
with the client
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add JSON serialization of virNetServerServicePtr objects for process re-exec()
Add two new APIs virNetServerServiceNewPostExecRestart and
virNetServerServicePreExecRestart which allow a virNetServerServicePtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of the listening sockets associated
with the service
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add JSON serialization of virNetSocketPtr objects for process re-exec()
Add two new APIs virNetSocketNewPostExecRestart and
virNetSocketPreExecRestart which allow a virNetSocketPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()
Since it is not possible to serialize SASL or TLS encryption
state, an error will be raised if attempting to perform
serialization on non-raw sockets
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add JSON serialization of virLockSpacePtr objects for process re-exec()
Add two new APIs virLockSpaceNewPostExecRestart and
virLockSpacePreExecRestart which allow a virLockSpacePtr
object to be created from a JSON object and saved to a
JSON object, for the purposes of re-exec'ing a process.
As well as saving the state in JSON format, the second
method will disable the O_CLOEXEC flag so that the open
file descriptors are preserved across the process re-exec()
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce an internal API for handling file based lockspaces
The previously introduced virFile{Lock,Unlock} APIs provide a
way to acquire/release fcntl() locks on individual files. For
unknown reason though, the POSIX spec says that fcntl() locks
are released when *any* file handle referring to the same path
is closed. In the following sequence
you'd expect threadA to come out holding a lock on 'foo', and
indeed it does hold a lock for a very short time. Unfortunately
when threadB does close(fd2) this releases the lock associated
with fd1. For the current libvirt use case for virFileLock -
pidfiles - this doesn't matter since the lock is acquired
at startup while single threaded an never released until
exit.
To provide a more generally useful API though, it is necessary
to introduce a slightly higher level abstraction, which is to
be referred to as a "lockspace". This is to be provided by
a virLockSpacePtr object in src/util/virlockspace.{c,h}. The
core idea is that the lockspace keeps track of what files are
already open+locked. This means that when a 2nd thread comes
along and tries to acquire a lock, it doesn't end up opening
and closing a new FD. The lockspace just checks the current
list of held locks and immediately returns VIR_ERR_RESOURCE_BUSY.
NB, the API as it stands is designed on the basis that the
files being locked are not being otherwise opened and used
by the application code. One approach to using this API is to
acquire locks based on a hash of the filepath.
eg to lock /var/lib/libvirt/images/foo.img the application
might do
This is only safe to do though if no other part of the process
will be opening the files. This will be the case when this
code is used inside the soon-to-be-reposted virlockd daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There was a crash possible when both <boot dev... and <boot
order... were specified due to virDomainDefParseBootXML() erroring out
before setting *tmp (which was free'd in cleanup). As a fix, I
created this cleanup that uses one pointer for all the temporary
stored XPath strings and values, plus this pointer is correctly
initialized to NULL.
Guannan Ren [Mon, 15 Oct 2012 09:03:49 +0000 (17:03 +0800)]
selinux: add security selinux function to label tapfd
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981
When using macvtap, a character device gets first created by
kernel with name /dev/tapN, its selinux context is:
system_u:object_r:device_t:s0
Shortly, when udev gets notification when new file is created
in /dev, it will then jump in and relabel this file back to the
expected default context:
system_u:object_r:tun_tap_device_t:s0
There is a time gap happened.
Sometimes, it will have migration failed, AVC error message:
type=AVC msg=audit(1349858424.233:42507): avc: denied { read write } for
pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524
scontext=unconfined_u:system_r:svirt_t:s0:c598,c908
tcontext=system_u:object_r:device_t:s0 tclass=chr_file
This patch will label the tapfd device before qemu process starts:
system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)
This patch adds support for SUSPEND_DISK event; both lifecycle and
separated. The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off. This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc. This patch is not
aiming for that functionality.
Ján Tomko [Mon, 15 Oct 2012 07:14:26 +0000 (09:14 +0200)]
util: switch virLogEatParams to virLogSource
Commit e8fd8757c89abbd38571092bbb987650b7658aec changed 'const char *'
category to virLogSource enum. This changes it in virLogEatParams as
well, thus fixing the build with --disable-debug.
--
Hopefully moving the enum declarations is less ugly than using int.
Osier Yang [Fri, 12 Oct 2012 08:25:42 +0000 (16:25 +0800)]
node_memory: Add new parameter field to tune the new sysfs knob
Upstream kernel introduced new sysfs knob "merge_across_nodes" to
specify if pages from different numa nodes can be merged. When set
to 0, only pages which physically reside in the memory area of
same NUMA node can be merged. When set to 1, pages from all nodes
can be merged.
This patch supports the tuning by adding new param field
"shm_merge_across_nodes".
to the extent that it can be resolved with current qemu functionality.
It attempts to detect as many situations as possible when the simple
operation of disconnecting an existing tap device from one bridge and
attaching it to another will satisfy the change requested in
virDomainUpdateDeviceFlags() for a network device. Before this patch,
that situation could only be detected if the pre-change interface
*and* the post-change interface definition were both "type='bridge'".
After this patch, it can also be detected if the before or after
interfaces are any combination of type='bridge' and type='network'
(the networks can be <forward mode='nat|route|bridge'>, as long as
they use a Linux host bridge and not macvtap connections).
This extra effort is especially useful since the recent discovery that
a netdev_del+netdev_add combo (to reconnect the network device with
completely different hostside configuration) doesn't work properly
with current qemu (1.2) unless it is accompanied by the matching
device_del+device_add - see this mailing list message for details:
(A slight modification of the patch referenced there has been prepared
to apply on top of this patch, but won't be pushed until qemu can be
made to work with it.)
* qemuDomainChangeNet needs access to the virDomainDeviceDef that
holds the new netdef (so that it can clear out the virDomainDeviceDef
if it ends up using the NetDef to replace the original), so the
virDomainNetDefPtr arg is replaced with a virDomainDeviceDefPtr.
* qemuDomainChangeNet previously checked for *some* changes to the
interface config, but this check was by no means complete. It was also
a bit disorganized.
This refactoring of the code is (I believe) complete in its check of
all NetDef attributes that might be changed, and either returns a
failure (for changes that are simply impossible), or sets one of three
flags:
needLinkStateChange - if the device link state needs to go up/down
needBridgeChange - if everything else is the same, but it needs
to be connected to a difference linux host
bridge
needReconnect - if the entire host side of the device needs
to be torn down and reconstructed (currently
non-working, as mentioned above)
Note that this function will refuse to make any change that requires
the *guest* side of the device to be detached (e.g. changing the PCI
address or mac address). Those would be disruptive enough to the guest
that it's reasonable to require an explicit detach/attach sequence
from the management application.
* As mentioned above, qemuDomainChangeNet also does its best to
understand when a simple change in attached bridge for the existing
tap device will work vs. the need to completely tear down/reconstruct
the host side of the device (including tap device).
This patch *does not* implement the "reconnect" code anyway - there is
a placeholder that turns that into an error. Rather, the purpose of
this patch is to replicate existing behavior with code that is ready
to have that functionality plugged in in a later patch.
* The expanded uses for qemuDomainChangeNetBridge meant that it needed
to be enhanced as well - it no longer replaces the original brname
string in olddev with the new brname; instead, it relies on the
caller to replace the *entire* olddev with newdev (since we've gone
to great lengths to assure they are functionally identical other
than the name of the bridge, this is now not only safe, but more
correct). Additionally, qemuDomainNetChangeBridge can now set the
bridge for type='network' interfaces as well as plain type='bridge'
interfaces. (Note that I had to make this change simultaneous to the
reorganization of qemuDomainChangeNet because the two are too
closely intertwined to separate).
Laine Stump [Thu, 11 Oct 2012 16:34:14 +0000 (12:34 -0400)]
conf: fix virDevicePCIAddressEqual args
This function really should have been taking virDevicePCIAddress*
instead of the inefficient virDevicePCIAddress (results in copying two
entire structs onto the stack rather than just two pointers), and
returning a bool true/false (not matching is not necessarily a
"failure", as a -1 return would imply, and also using "if
(!virDevicePCIAddressEqual(x, y))" to mean "if x == y" is just a bit
counterintuitive).
Osier Yang [Fri, 12 Oct 2012 14:44:35 +0000 (22:44 +0800)]
conf: Ignore emulatorpin if vcpu placement is auto
When vcpu placement is "auto", the domain process will be pinned
to advisory nodeset from querying numad, While emulatorpin will
override the pinning. That means both of them are to set the
pinning policy for domain process, but conflicts with each other.
This patch ingore emulatorpin if vcpu placement is "auto", because
<vcpu> placement can't be simply ignored for <numatune> placement
could default to it.
Osier Yang [Fri, 12 Oct 2012 09:50:47 +0000 (17:50 +0800)]
qemu: Initialize cpuset for hotplugged vcpu as def->cpuset
The onlined vcpu pinning policy should inherit def->cpuset if
it's not specified explicitly, and the affinity should be set
in this case. Oppositely, the offlined vcpu pinning policy should
be free()'ed.
Osier Yang [Fri, 12 Oct 2012 09:50:46 +0000 (17:50 +0800)]
qemu: Create or remove cgroup when doing vcpu hotpluging
Various APIs use cgroup to either set or get the statistics of
host or guest. Hotplug or hot unplug new vcpus without creating
or removing the cgroup for the vcpus could cause problems for
those APIs. E.g.
% virsh vcpucount dom
maximum config 10
maximum live 10
current config 1
current live 1
% virsh setvcpu dom 2
% virsh schedinfo dom --set vcpu_quota=1000
Scheduler : posix
error: Unable to find vcpu cgroup for rhel6.2(vcpu: 1): No such file or
directory
This patch fixes the problem by creating cgroups for each of the
onlined vcpus, and destroying cgroups for each of the offlined
vcpus.
Osier Yang [Fri, 12 Oct 2012 09:50:45 +0000 (17:50 +0800)]
conf: Initialize the pinning policy for vcpus
Document for <vcpu>'s "cpuset" says:
Since 0.4.4, this element can contain an optional cpuset attribute,
which is a comma-separated list of physical CPU numbers that virtual
CPUs can be pinned to.
However, it's not the truth, libvirt actually pins the domain
process to the specified pCPUs by "cpuset" of <vcpu>. And the
vcpu thread are pinned to all available pCPUs if no <vcpupin>
is specified for it.
This patch is to implement the codes to inherit <vcpu>'s "cpuset" for
vcpu that doesn't have <vcpupin> specified, and <vcpupin>
for these vcpu will be ignored when formating. Underlying
driver implementation will make sure the vcpu thread pinned
to correct pCPUs.
% virsh start linux
error: Failed to start domain linux
error: cannot set CPU affinity on process 32534: No such process
We must have some odd codes underlying which produces the
"on process 32534", but the point is why we not to prevent
earlier when parsing? Note that this is only one of the
problem it could cause.
This patch is to ignore the <vcpupin> for not onlined vcpus.
Osier Yang [Fri, 12 Oct 2012 09:50:43 +0000 (17:50 +0800)]
doc: Sort out the relationship between <vcpu>, <vcpupin>, and <emulatorpin>
These 3 elements conflicts with each other in either the doc
or the underlying codes.
Current problems:
Problem 1:
The doc shouldn't simply say "These settings are superseded
by CPU tuning. " for element <vcpu>. As except the tuning, <vcpu>
allows to specify the current, maxmum vcpu number. Apart from that,
<vcpu> also allows to specify the placement as "auto", which binds
the domain process to the advisory nodeset from numad.
Problem 2:
Doc for <vcpu> says its "cpuset" specify the physical CPUs
that the vcpus can be pinned. But it's not the truth, as
actually it only pin domain process to the specified physical
CPUs. So either it's a document bug, or code bug.
Problem 3:
Doc for <vcpupin> says it supersed "cpuset" of <vcpu>, it's
not quite correct, as each <vcpupin> specify the pinning policy
only for one vcpu. How about the ones which doesn't have
<vcpupin> specified? it says the vcpu will be pinned to all
available physical CPUs, but what's the meaning of attribute
"cpuset" of <vcpu> then?
Problem 4:
Doc for <emulatorpin> says it pin the emulator threads (domain
process in other context, perhaps another follow up patch to
cleanup the inconsistency is needed) to the physical CPUs
specified its attribute "cpuset". Which conflicts with
<vcpu>'s "cpuset". And actually in the underlying codes,
it set the affinity for domain process twice if both
"cpuset" for <vcpu> and <emulatorpin> are specified,
and <emulatorpin>'s pinning will override <vcpu>'s.
Problem 5:
When "placement" of <vcpu> is "auto" (I.e. uses numad to
get the advisory nodeset to which the domain process is
pinned to), it will also be overridden by <emulatorpin>,
This patch is trying to sort out the conflicts or bugs by:
1) Don't say <vcpu> is superseded by <cputune>
2) Keep the semanteme for "cpuset" of <vcpu> (I.e. Still says it
specify the physical CPUs the virtual CPUs). But modifying it
to mention it also set the pinning policy for domain process,
and the CPU placement of domain process specified by "cpuset"
of <vcpu> will be ingored if <emulatorpin> specified, and
similary, the CPU placement of vcpu thread will be ignored
if it has <vcpupin> specified, for vcpu which doesn't have
<vcpupin> specified, it inherits "cpuset" of <vcpu>.
3) Don't say <vcpu> is supersed by <vcpupin>. If neither <vcpupin>
nor "cpuset" of <vcpu> is specified, the vcpu will be pinned
to all available pCPUs.
4) If neither <emulatorpin> nor "cpuset" of <vcpu> is specified,
the domain process (emulator threads in the context) will be
pinned to all available pCPUs.
5) If "placement" of <vcpu> is "auto", <emulatorpin> is not allowed.
6) hotplugged vcpus will also inherit "cpuset" of <vcpu>
Codes changes according to above document changes:
1) Inherit def->cpumask for each vcpu which doesn't have <vcpupin>
specified, during parsing.
2) ping the vcpu which doesn't have <vcpupin> specified to def->cpumask
either by cgroup for sched_setaffinity(2), which is actually done
by 1).
3) Error out if "placement" == "auto", and <emulatorpin> is specified.
Otherwise, <emulatorpin> is honored, and "cpuset" of <cpuset> is
ignored.
4) Setup cgroup for each hotplugged vcpu, and setup the pinning policy
by either cgroup or sched_setaffinity(2).
5) Remove cgroup and <vcpupin> for each hot unplugged vcpu.
Patches are following (6 in total except this patch)
Cole Robinson [Fri, 12 Oct 2012 15:56:17 +0000 (11:56 -0400)]
Tweak comments in the policykit rules file
- Add the XML header so vim gives us syntax highlighting
- polkit-policy-file-validate hasn't existed for 3 years
- Permissions comment was not accurate
Peter Krempa [Fri, 12 Oct 2012 14:50:42 +0000 (16:50 +0200)]
spec: Add support for libssh2 transport
Libssh2 transport support was enabled lately but the spec file wasn't
updated to take this into account. This caused libvirt to be built
without libssh2 support in Red Hat based OSes.
We are currently able to work only with non-translated SELinux
contexts, but we are using functions that work with translated
contexts throughout the code. This patch swaps all SELinux context
translation relative calls with their raw sisters to avoid parsing
problems.
The problems can be experienced with mcstrans for example. The
difference is that if you have translations enabled (yum install
mcstrans; service mcstrans start), fgetfilecon_raw() will get you
something like 'system_u:object_r:virt_image_t:s0', whereas
fgetfilecon() will return 'system_u:object_r:virt_image_t:SystemLow'
that we cannot parse.
I was trying to confirm that the _raw variants were here since the dawn of
time, but the only thing I see now is that it was imported together in
the upstream repo [1] from svn, so before 2008.
Jiri Denemark [Thu, 11 Oct 2012 20:33:46 +0000 (22:33 +0200)]
conf: Mark missing optional USB devices in domain XML
When startupPolicy set for a USB devices allows such device to be
missing, there was no way this could be detected from domain XML. With
this patch, libvirt emits a new missing='yes' attribute for such devices
when active domain XML is generated.
Peter Krempa [Thu, 27 Sep 2012 09:17:19 +0000 (11:17 +0200)]
qemu: Fix misleading comment for qemuDomainObjBeginJobWithDriver()
The comment stated that you may call qemuDomainObjBeginJobWithDriver
without passing qemud_driver to signal it's not locked.
qemuDomainObjBeginJobWithDriver still accesses the qemud_driver
structure and the lock singaling is done through a separate parameter.
Jiri Denemark [Tue, 9 Oct 2012 11:15:46 +0000 (13:15 +0200)]
qemu: Make save/restore with USB devices usable
Save/restore with passed through USB devices currently only works if the
USB device can be found at the same USB address where it used to be
before saving a domain. This makes sense in case a user explicitly
configure the USB address in domain XML. However, if the device was
found automatically by vendor/product identification, we should try to
search for that device when restoring the domain and use any device we
find as long as there is only one available. In other words, the USB
device can now be removed and plugged again or the host can be rebooted
between saving and restoring the domain.
Jiri Denemark [Mon, 8 Oct 2012 09:58:05 +0000 (11:58 +0200)]
Add MIGRATABLE flag for virDomainGetXMLDesc
Using VIR_DOMAIN_XML_MIGRATABLE flag, one can request domain's XML
configuration that is suitable for migration or save/restore. Such XML
may contain extra run-time stuff internal to libvirt and some default
configuration may be removed for better compatibility of the XML with
older libvirt releases.
This flag may serve as an easy way to get the XML that can be passed
(after desired modifications) to APIs that accept custom XMLs, such as
virDomainMigrate{,ToURI}2 or virDomainSaveFlags.