Eric Blake [Tue, 23 Jul 2013 23:45:21 +0000 (17:45 -0600)]
build: work around broken kernel headers
Thanks to a lack of coordination between kernel and glibc folks,
it has been impossible to mix code using <linux/in.h> and
<net/in.h> for some time now (see for example commit c308a9a).
On at least RHEL 6, <linux/if_bridge.h> tries to use the kernel
side, and fails due to our desire to use the glibc side elsewhere:
In file included from /usr/include/linux/if_bridge.h:17,
from util/virnetdevbridge.c:42:
/usr/include/linux/in6.h:31: error: redefinition of ‘struct in6_addr’
/usr/include/linux/in6.h:48: error: redefinition of ‘struct sockaddr_in6’
/usr/include/linux/in6.h:56: error: redefinition of ‘struct ipv6_mreq’
Thankfully, the kernel layout of these structs is ABI-compatible,
they only differ in the type system presented to the C compiler.
While there are other versions of kernel headers that avoid the
problem, it is easier to just work around the issue than to expect
all developers to upgrade to working kernel headers.
* src/util/virnetdevbridge.c (includes): Coerce the kernel version
of in.h to not collide with the normal version.
Eric Blake [Tue, 23 Jul 2013 23:06:26 +0000 (17:06 -0600)]
dbus: work with older dbus
dbus 1.2.24 (on RHEL 6) lacks DBUS_TYPE_UNIX_FD; but as we aren't
trying to pass one of those anyways, we can just drop support for
it in our wrapper. Solves this build error introduced in commit 834c9c94:
CC libvirt_util_la-virdbus.lo
util/virdbus.c:242: error: 'DBUS_TYPE_UNIX_FD' undeclared here (not in a function)
* src/util/virdbus.c (virDBusBasicTypes): Drop support for unix fds.
John Ferlan [Tue, 23 Jul 2013 14:06:02 +0000 (10:06 -0400)]
domain_event: Resolve memory leak found by Valgrind
Commit id '4421e257' strdup'd devAlias, but didn't free
Running qemuhotplugtest under valgrind resulted in the following:
==7375== 9 bytes in 1 blocks are definitely lost in loss record 11 of 70
==7375== at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==7375== by 0x37C1085D71: strdup (strdup.c:42)
==7375== by 0x4CBBD5F: virStrdup (virstring.c:554)
==7375== by 0x4CFF9CB: virDomainEventDeviceRemovedNew (domain_event.c:1174)
==7375== by 0x427791: qemuDomainRemoveChrDevice (qemu_hotplug.c:2508)
==7375== by 0x42C65D: qemuDomainDetachChrDevice (qemu_hotplug.c:3357)
==7375== by 0x41C94F: testQemuHotplug (qemuhotplugtest.c:115)
==7375== by 0x41D817: virtTestRun (testutils.c:168)
==7375== by 0x41C400: mymain (qemuhotplugtest.c:322)
==7375== by 0x41DF3A: virtTestMain (testutils.c:764)
==7375== by 0x37C1021A04: (below main) (libc-start.c:225)
Currently the LXC driver creates the VM's cgroup prior to
forking, and then libvirt_lxc moves the child process
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.
Fortunately we simply move the entire cgroups setup into
the libvirt_lxc child process. We make it take place before
fork'ing into the background, so by the time virCommandRun
returns in the LXC driver, the cgroup is guaranteed to be
present.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Create + setup cgroups atomically for QEMU process
Currently the QEMU driver creates the VM's cgroup prior to
forking, and then uses a virCommand hook to move the child
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.
Fortunately we have a handshake taking place between the
QEMU driver and the child process prior to QEMU being exec()d,
which was introduced to allow setup of disk locking. By good
fortune this synchronization point can be used to enable the
QEMU driver to do atomic setup of cgroups removing the use
of the hook script.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupNewDomainDriver and virCgroupNewDriver methods
are obsolete now that we can auto-detect existing cgroup
placement. Delete them to reduce code bloat.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Use the new virCgroupNewDetect function to determine cgroup
placement of existing running VMs. This will allow the legacy
cgroups creation APIs to be removed entirely
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add API for checking if a cgroup is valid for a domain
Add virCgroupIsValidMachine API to check whether an auto
detected cgroup is valid for a machine. This lets us
check if a VM has just been placed into some generic
shared cgroup, or worse, the root cgroup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fix handling of DBus errors emitted by the bus itself
Current code for handling dbus errors only works for errors
received from the remote application itself. We must also
handle errors emitted by the bus itself, for example, when
it fails to spawn the target service.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
John Ferlan [Mon, 15 Jul 2013 20:26:10 +0000 (16:26 -0400)]
storage: Add connection for autostart storage pool
Add a privileged field to storageDriverState
Use the privileged value in order to generate a connection which could
be passed to the various storage backend drivers.
In particular, the iSCSI driver will need a connect in order to perform
pool authentication using the 'chap' secrets and the RBD driver utilizes
the connection during pool refresh for pools using 'ceph' secrets.
For now that connection will be to be to qemu driver until a mechanism
is devised to get a connection to just the secret driver without qemu.
John Ferlan [Mon, 15 Jul 2013 18:44:32 +0000 (14:44 -0400)]
Adjust 'ceph' authentication secret usage for rbd pool.
Update virStorageBackendRBDOpenRADOSConn() to use the internal API to the
secret driver in order to get the secret value instead of the external
virSecretGetValue() path. Without the flag VIR_SECRET_GET_VALUE_INTERNAL_CALL
there is no way to get the value of private secret.
This also requires ensuring there is a connection which wasn't true for
for the refreshPool() path calls from storageDriverAutostart() prior to
adding support for the connection to a qemu driver. It seems calls to
virSecretLookupByUUIDString() and virSecretLookupByUsage() from the
refreshPool() path would have failed with no way to find the secret - that is
theoretically speaking since the 'conn' was NULL the failure would have been
"failed to find the secret".
John Ferlan [Mon, 15 Jul 2013 17:23:45 +0000 (13:23 -0400)]
storage: Support "chap" authentication for iscsi pool
Although the XML for CHAP authentication with plain "password"
was introduced long ago, the function was never implemented. This
patch replaces the login/password mechanism by following the
'ceph' (or RBD) model of using a 'username' with a 'secret' which
has the authentication information.
This patch performs the authentication during startPool() processing
of pools with an authType of VIR_STORAGE_POOL_AUTH_CHAP specified
for iSCSI pools.
There are two types of CHAP configurations supported for iSCSI
authentication:
* Initiator Authentication
Forward, one-way; The initiator is authenticated by the target.
* Target Authentication
Reverse, Bi-directional, mutual, two-way; The target is authenticated
by the initiator; This method also requires Initiator Authentication
This only supports the "Initiator Authentication". (I don't have any
enterprise iSCSI env for testing, only have a iSCSI target setup with
tgtd, which doesn't support "Target Authentication").
"Discovery authentication" is not supported by tgt yet too. So this only
setup the session authentication by executing 3 iscsiadm commands, E.g:
John Ferlan [Fri, 19 Jul 2013 18:38:45 +0000 (14:38 -0400)]
qemu: Add source pool auth info to virDomainDiskDef for iSCSI
During qemuTranslateDiskSourcePool() execution, if the srcpool has been
defined with authentication information, then for iSCSI pools copy the
authentication and host information to virDomainDiskDef.
Peter Krempa [Tue, 23 Jul 2013 13:35:02 +0000 (15:35 +0200)]
qemu: Take error path if acquiring of job fails in qemuDomainSaveInternal
Due to a goto statement missed when refactoring in 2771f8b74c1bf50d1fa
when acquiring of a domain job failed the error path was not taken. This
resulted into a crash afterwards as an extra reference was removed from a
domain object leading to it being freed. An attempt to list the domains
leaded to a crash of the daemon afterwards.
Commit 834c9c94 introduced virDBusMessageEncode and
virDBusMessageDecode functions, however corresponding stubs
were not added to !WITH_DBUS section, therefore 'make check'
started to fail when compiled w/out dbus support like that:
Expected symbol virDBusMessageDecode is not in ELF library
Resolves:https://bugzilla.redhat.com/show_bug.cgi?id=923053
When cdrom is block type, the virsh change-media failed to insert
source info because virsh uses "<source block='/dev/sdb'/>" while
the correct name of the attribute for block disks is "dev".
Osier Yang [Tue, 18 Jun 2013 08:36:42 +0000 (16:36 +0800)]
qemu: Translate the volume type disk source before cgroup setting
The translation must be done before both of cgroup and security
setting, otherwise since the disk source is not translated yet,
it might be skipped on cgroup and security setting.
Osier Yang [Tue, 18 Jun 2013 08:36:41 +0000 (16:36 +0800)]
conf: Ignore the volume type disk if its mode is "direct"
virDomainDiskDefForeachPath is not only used by the security
setting helpers, also used by cgroup setting helpers, so this
is to ignore the volume type disk with mode="direct" for cgroup
setting.
John Ferlan [Thu, 18 Jul 2013 11:00:19 +0000 (07:00 -0400)]
qemu: Translate the iscsi pool/volume disk source
The difference with already supported pool types (dir, fs, block)
is: there are two modes for iscsi pool (or network pools in future),
one can specify it either to use the volume target path (the path
showed up on host) with mode='host', or to use the remote URI qemu
supports (e.g. file=iscsi://example.org:6000/iqn.1992-01.com.example/1)
with mode='direct'.
For 'host' mode, it copies the volume target path into disk->src. For
'direct' mode, the corresponding info in the *one* pool source host def
is copied to disk->hosts[0].
John Ferlan [Thu, 18 Jul 2013 17:18:03 +0000 (13:18 -0400)]
conf: Introduce new XML tag "mode" for disk source
There are two ways to use a iSCSI LUN as disk source for qemu.
* The LUN's path as it shows up on host, e.g.
/dev/disk/by-path/ip-$ip:3260-iscsi-$iqn-fc18:iscsi.iscsi0-lun-1
* The libiscsi URI from the storage pool source element host attribute, e.g.
iscsi://demo.org:6000/iqn.1992-01.com.example/1
For a "volume" type disk, if the specified "pool" is of iscsi
type, we should support to use the LUN in either of above 2 ways.
That's why to introduce a new XML tag "mode" for the disk source
(libvirt should support iscsi pool with libiscsi, but it's another
new feature, which should be done later).
The "mode" can be either of "host" or "direct". Use "host" to indicate
use of the LUN with the path as it shows up on host. Use "direct" to
indicate to use it with the source pool host URI (future patches may support
to use network type libvirt storage too, e.g. Ceph)
tests: Free test at the end of GetDeviceAliases JSON test
Commit 58b147ad07c9432b53e66ca20aff74d812647c57 added a test for
qemuMonitorGetDeviceAliases but forgot to free the test object at the
end which causes all sort of weird errors and failures when new tests
are added after the GetDeviceAliases.
To register virtual machines and containers with systemd-machined,
and thus have cgroups auto-created, we need to talk over DBus.
This is somewhat tedious code, so introduce a dedicated function
to isolate the DBus call in one place.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Doing DBus method calls using libdbus.so is tedious in the
extreme. systemd developers came up with a nice high level
API for DBus method calls (sd_bus_call_method). While
systemd doesn't use libdbus.so, their API design can easily
be ported to libdbus.so.
This patch thus introduces methods virDBusCallMethod &
virDBusMessageRead, which are based on the code used for
sd_bus_call_method and sd_bus_message_read. This code in
systemd is under the LGPLv2+, so we're license compatible.
This code is probably pretty unintelligible unless you are
familiar with the DBus type system. So I added some API
docs trying to explain how to use them, as well as test
cases to validate that I didn't screw up the adaptation
from the original systemd code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
cpu: Let explicit features override model features
Until now CPU features inherited from a specified CPU model could only
be overridden with 'disable' policy. With this patch, any explicitly
specified feature always overrides the same feature inherited from a CPU
model regardless on the specified policy.
The CPU in x86-exact-force-Haswell.xml would previously be incompatible
with x86-host-SandyBridge.xml CPU even though x86-host-SandyBridge.xml
provides all features required by x86-exact-force-Haswell.xml.
A couple of places in LXC setup for filesystems did not do
a "goto cleanup" after reporting errors. While fixing this,
also add in many more debug statements to aid troubleshooting
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
qemu: Unplug devices that disappeared when libvirtd was down
In case libvirtd is asked to unplug a device but the device is actually
unplugged later when libvirtd is not running, we need to detect that and
remove such device when libvirtd starts again and reconnects to running
domains.
Eric Blake [Fri, 19 Jul 2013 15:07:19 +0000 (09:07 -0600)]
security: fix deadlock with prefork
Attempts to start a domain with both SELinux and DAC security
modules loaded will deadlock; latent problem introduced in commit fdb3bde and exposed in commit 29fe5d7. Basically, when recursing
into the security manager for other driver's prefork, we have to
undo the asymmetric lock taken at the manager level.
Reported by Jiri Denemark, with diagnosis help from Dan Berrange.
* src/security/security_stack.c (virSecurityStackPreFork): Undo
extra lock grabbed during recursion.
When debugging a failing test with many test cases, it is useful
to be able to skip most tests. Introducing a new environment
variable VIR_TEST_RANGE=N-M enables execution of only the test
cases numbered N-M inclusive, starting from 1.
For example, to skip all the cgroup tests except 2
$ VIR_TEST_RANGE=2-3 VIR_TEST_DEBUG=1 ./vircgrouptest
TEST: vircgrouptest
2) New cgroup for driver ... Unexpected found LXC cgroup: 1
libvirt: Cgroup error : Failed to create controller cpu for group: No such file or directory
FAILED
3) New cgroup for domain driver ... Cannot find LXC cgroup: 1
libvirt: Cgroup error : Failed to create controller cpu for group: No such file or directory
FAILED
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Eric Blake [Thu, 18 Jul 2013 21:47:41 +0000 (15:47 -0600)]
maint: update to latest gnulib
Upstream gnulib recently patched a bug in bootstrap, for projects
that use a different name than build-aux for a subdirectory. We
don't, but it doesn't hurt to update.
* .gnulib: Update, for bootstrap fix.
* bootstrap: Sync to upstream.
* bootstrap.conf: Match upstream bug fix.
Michal Privoznik [Fri, 19 Jul 2013 07:07:56 +0000 (09:07 +0200)]
autogen: Handle case when libvirt's submodule
Currently, in the autogen.sh script we check whether .git is an existing
directory in which case bootstrap is run. However, if libvirt is a
submodule, then the .git is just a file (with reference to the topmost
.git directory). However, our submodule routines work well. So there's
no real reason why we should prohibit users to build libvirt from
submodule.
Eric Blake [Thu, 18 Jul 2013 15:37:52 +0000 (09:37 -0600)]
maint: split long lines in Makefiles
Makefiles are another easy file to enforce line limits.
Mostly straightforward; interesting tricks worth noting:
src/Makefile.am: $(confdir) was already defined, use it in more places
tests/Makefile.am: path_add and VG required some interesting compression
Eric Blake [Fri, 12 Jul 2013 20:55:21 +0000 (14:55 -0600)]
security_dac: compute supplemental groups before fork
Commit 75c1256 states that virGetGroupList must not be called
between fork and exec, then commit ee777e99 promptly violated
that for lxc's use of virSecurityManagerSetProcessLabel. Hoist
the supplemental group detection to the time that the security
manager needs to fork. Qemu is safe, as it uses
virSecurityManagerSetChildProcessLabel which in turn uses
virCommand to determine supplemental groups.
This does not fix the fact that virSecurityManagerSetProcessLabel
calls virSecurityDACParseIds calls parseIds which eventually
calls getpwnam_r, which also violates fork/exec async-signal-safe
safety rules, but so far no one has complained of hitting
deadlock in that case.
* src/security/security_dac.c (_virSecurityDACData): Track groups
in private data.
(virSecurityDACPreFork): New function, to set them.
(virSecurityDACClose): Clean up new fields.
(virSecurityDACGetIds): Alter signature.
(virSecurityDACSetSecurityHostdevLabelHelper)
(virSecurityDACSetChardevLabel, virSecurityDACSetProcessLabel)
(virSecurityDACSetChildProcessLabel): Update callers.
Eric Blake [Wed, 17 Jul 2013 21:35:50 +0000 (15:35 -0600)]
security: framework for driver PreFork handler
A future patch wants the DAC security manager to be able to safely
get the supplemental group list for a given uid, but at the time
of a fork rather than during initialization so as to pick up on
live changes to the system's group database. This patch adds the
framework, including the possibility of a pre-fork callback
failing.
For now, any driver that implements a prefork callback must be
robust against the possibility of being part of a security stack
where a later element in the chain fails prefork. This means
that drivers cannot do any action that requires a call to postfork
for proper cleanup (no grabbing a mutex, for example). If this
is too prohibitive in the future, we would have to switch to a
transactioning sequence, where each driver has (up to) 3 callbacks:
PreForkPrepare, PreForkCommit, and PreForkAbort, to either clean
up or commit changes made during prepare.
* src/security/security_driver.h (virSecurityDriverPreFork): New
callback.
* src/security/security_manager.h (virSecurityManagerPreFork):
Change signature.
* src/security/security_manager.c (virSecurityManagerPreFork):
Optionally call into driver, and allow returning failure.
* src/security/security_stack.c (virSecurityDriverStack):
Wrap the handler for the stack driver.
* src/qemu/qemu_process.c (qemuProcessStart): Adjust caller.
Eric Blake [Wed, 17 Jul 2013 17:47:01 +0000 (11:47 -0600)]
tests: split long lines
Long lines are harder to read and harder to diff; in fact, if lines get
too long (> 1000 bytes), it starts causing issues where git send-email
refuses to send patches for the file. I've cleaned up the tests
directory in the past (see commits bd6c46f, 3b750d1), but new long
lines have been introduced in the meantime.
Why 90 instead of 80? Because there were too many tests on the fringe
edge, and I didn't want to edit that many files.
Add a syntax check to prevent future long lines.
* cfg.mk (sc_prohibit_long_lines): New rule.
* tests/qemuxml2argvdata/qemuxml2argv-*.args: Split lines of any
file with content longer than 90 columns.
* tests/storagevolxml2argvdata/*.argv: Likewise.
Some versions of kFreeBSD (like 9.0) declare link_addr in a header
but lack an implementation. This makes ./configure pass but breaks
compilation later with a
undefined reference to `link_addr'
Althought that's a bug in the OS header we can detect it easily by also
trying to link.
Osier Yang [Fri, 24 May 2013 09:08:28 +0000 (17:08 +0800)]
qemu: Set cpuset.cpus for domain process
When either "cpuset" of <vcpu> is specified, or the "placement" of
<vcpu> is "auto", only setting the cpuset.mems might cause the guest
starting to fail. E.g. ("placement" of both <vcpu> and <numatune> is
"auto"):
1) Related XMLs
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
to '0-63'
5) The advisory nodeset got from querying numad (from debug log)
2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
Nodeset returned from numad: 1
6) cpuset.mems will be set as: (from debug log)
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
to '0-7'
I.E, the domain process's memory is restricted on the first NUMA node,
however, it can use all of the CPUs, which will likely cause the domain
process to fail to start because of the kernel fails to allocate
memory with the the memory policy as "strict".
Peter Krempa [Thu, 18 Jul 2013 09:21:48 +0000 (11:21 +0200)]
caps: Add helpers to convert NUMA nodes to corresponding CPUs
These helpers use the remembered host capabilities to retrieve the cpu
map rather than query the host again. The intended usage for this
helpers is to fix automatic NUMA placement with strict memory alloc. The
code doing the prepare needs to pin the emulator process only to cpus
belonging to a subset of NUMA nodes of the host.
Add virtio-scsi to fallback models of scsi controller
When user does not specify any model for scsi controller, or worse, no
controller at all, but libvirt automatically adds scsi controller with
no model, we are not searching for virtio-scsi and thus this can fail
for example on qemu which doesn't support lsi logic adapter.
This means that when qemu on x86 doesn't support lsi53c895a and the
user adds the following to an XML without any scsi controller:
<disk ...>
...
<target dev='sda'>
</disk>
libvirt fails like this:
# virsh define asdf.xml
error: Failed to define domain from asdf.xml
error: internal error Unable to determine model for scsi controller
Michal Privoznik [Wed, 17 Jul 2013 07:20:26 +0000 (09:20 +0200)]
Remove lxcDriverLock from almost everywhere
With the majority of fields in the virLXCDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the LXC driver lock. Only a handful
of helper APIs now need it.
Michal Privoznik [Wed, 17 Jul 2013 07:14:42 +0000 (09:14 +0200)]
lxc: Make activeUsbHostdevs use locks
The activeUsbHostdevs item in LXCDriver are lockable, but the lock has
to be called explicitly. Call the virObject(Un)Lock() in order to
achieve mutual exclusion once lxcDriverLock is removed.
Michal Privoznik [Mon, 15 Jul 2013 09:43:10 +0000 (11:43 +0200)]
Stop accessing driver->caps directly in LXC driver
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
Michal Privoznik [Tue, 16 Jul 2013 15:45:05 +0000 (17:45 +0200)]
Introduce a virLXCDriverConfigPtr object
Currently the virLXCDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virLXCDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virLXCDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
Michal Privoznik [Thu, 18 Jul 2013 10:38:02 +0000 (12:38 +0200)]
qemuhotplugtest: Resolve some memleaks
If testQemuHotplugAttach succeeds, the vm->def steals the dev pointer.
However, not the envelope, which needs to be freed. In addition,
driver.config is allocated, but never freed.
Ján Tomko [Thu, 18 Jul 2013 10:13:46 +0000 (12:13 +0200)]
virAsprintf: correctly check return value
When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a "--pass-fds N,M,..." arg to the virsh start/create
methods. This allows pre-opened file descriptors from the
shell to be passed on into the guest
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
LXC: Wire up the virDomainCreate{XML}WithFiles methods
Wire up the new virDomainCreate{XML}WithFiles methods in the
LXC driver, so that FDs get passed down to the init process.
The lxc_container code needs to do a little dance in order
to renumber the file descriptors it receives into linear
order, starting from STDERR_FILENO + 1.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
remote: fix dom->id after virDomainCreateWithFlags
The virDomainCreateWithFlags remote client helper was made to
invoke REMOTE_PROC_DOMAIN_LOOKUP_BY_UUID to refresh the 'id'
of the domain, following the pattern used in the previous
virDomainCreate method impl.
The remote protocol for virDomainCreateWithFlags though did
actually fix the design flaw in virDomainCreate, by directly
returning the new domain info. For some reason, this data was
never used. So we can just use that data now instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce new domain create APIs to pass pre-opened FDs to LXC
With container based virt, it is useful to be able to pass
pre-opened file descriptors to the container init process.
This allows for containers to be auto-activated from incoming
socket connections, passing the active socket into the container.
To do this, introduce a pair of new APIs, virDomainCreateXMLWithFiles
and virDomainCreateWithFiles, which accept an array of file
descriptors. For the LXC driver, UNIX file descriptor passing
will be used to send them to libvirtd, which will them pass
them down to libvirt_lxc, which will then pass them to the container
init process.
This will only be implemented for LXC right now, but the design
is generic enough it could work with other hypervisors, hence
I suggest adding this to libvirt.so, rather than libvirt-lxc.so
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The .ctags file specifies default options for ctags so that it does not
ignore libvirt.h.in and ignores uninteresting files. As a result, you
can just run "ctags" and navigating to a public API won't get you to a
useless entry in api.html.
esx: Support for disk-only and quiescing snapshots.
Add support for creating disk-only (no memory) snapshots in esx, and
for quiescing the VM before taking the snapshot. The VMware API
supports these operations directly, so adding support to libvirt is
just a matter of setting the flags correctly when calling
VMware. VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY and
VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE are now valid flags for esx.
Although, having it depending on Xen >= 4.3 (by using the proper
libxl feature flag).
Xen currently implements a NUMA placement policy which is basically
the same as the 'interleaved' policy of `numactl', although it can
be applied on a subset of the available nodes. We therefore hardcode
"interleave" as 'numa_mode', and we use the newly introduced libxl
interface to figure out what nodes a domain spans ('numa_nodeset').
With this change, it is now possible to query the NUMA node
affinity of a running domain:
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
Id Name State
----------------------------------------------------
23 F18_x64 running
Ján Tomko [Wed, 17 Jul 2013 08:56:05 +0000 (10:56 +0200)]
cgroup: reuse buffer for getline
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.
This fixes the following leak that happened when zero bytes were read:
==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404== at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404== by 0x906F862: getdelim (iogetdelim.c:68)
==404== by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404== by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404== by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)