To register virtual machines and containers with systemd-machined,
and thus have cgroups auto-created, we need to talk over DBus.
This is somewhat tedious code, so introduce a dedicated function
to isolate the DBus call in one place.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Doing DBus method calls using libdbus.so is tedious in the
extreme. systemd developers came up with a nice high level
API for DBus method calls (sd_bus_call_method). While
systemd doesn't use libdbus.so, their API design can easily
be ported to libdbus.so.
This patch thus introduces methods virDBusCallMethod &
virDBusMessageRead, which are based on the code used for
sd_bus_call_method and sd_bus_message_read. This code in
systemd is under the LGPLv2+, so we're license compatible.
This code is probably pretty unintelligible unless you are
familiar with the DBus type system. So I added some API
docs trying to explain how to use them, as well as test
cases to validate that I didn't screw up the adaptation
from the original systemd code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
cpu: Let explicit features override model features
Until now CPU features inherited from a specified CPU model could only
be overridden with 'disable' policy. With this patch, any explicitly
specified feature always overrides the same feature inherited from a CPU
model regardless on the specified policy.
The CPU in x86-exact-force-Haswell.xml would previously be incompatible
with x86-host-SandyBridge.xml CPU even though x86-host-SandyBridge.xml
provides all features required by x86-exact-force-Haswell.xml.
A couple of places in LXC setup for filesystems did not do
a "goto cleanup" after reporting errors. While fixing this,
also add in many more debug statements to aid troubleshooting
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
qemu: Unplug devices that disappeared when libvirtd was down
In case libvirtd is asked to unplug a device but the device is actually
unplugged later when libvirtd is not running, we need to detect that and
remove such device when libvirtd starts again and reconnects to running
domains.
Eric Blake [Fri, 19 Jul 2013 15:07:19 +0000 (09:07 -0600)]
security: fix deadlock with prefork
Attempts to start a domain with both SELinux and DAC security
modules loaded will deadlock; latent problem introduced in commit fdb3bde and exposed in commit 29fe5d7. Basically, when recursing
into the security manager for other driver's prefork, we have to
undo the asymmetric lock taken at the manager level.
Reported by Jiri Denemark, with diagnosis help from Dan Berrange.
* src/security/security_stack.c (virSecurityStackPreFork): Undo
extra lock grabbed during recursion.
When debugging a failing test with many test cases, it is useful
to be able to skip most tests. Introducing a new environment
variable VIR_TEST_RANGE=N-M enables execution of only the test
cases numbered N-M inclusive, starting from 1.
For example, to skip all the cgroup tests except 2
$ VIR_TEST_RANGE=2-3 VIR_TEST_DEBUG=1 ./vircgrouptest
TEST: vircgrouptest
2) New cgroup for driver ... Unexpected found LXC cgroup: 1
libvirt: Cgroup error : Failed to create controller cpu for group: No such file or directory
FAILED
3) New cgroup for domain driver ... Cannot find LXC cgroup: 1
libvirt: Cgroup error : Failed to create controller cpu for group: No such file or directory
FAILED
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Eric Blake [Thu, 18 Jul 2013 21:47:41 +0000 (15:47 -0600)]
maint: update to latest gnulib
Upstream gnulib recently patched a bug in bootstrap, for projects
that use a different name than build-aux for a subdirectory. We
don't, but it doesn't hurt to update.
* .gnulib: Update, for bootstrap fix.
* bootstrap: Sync to upstream.
* bootstrap.conf: Match upstream bug fix.
Michal Privoznik [Fri, 19 Jul 2013 07:07:56 +0000 (09:07 +0200)]
autogen: Handle case when libvirt's submodule
Currently, in the autogen.sh script we check whether .git is an existing
directory in which case bootstrap is run. However, if libvirt is a
submodule, then the .git is just a file (with reference to the topmost
.git directory). However, our submodule routines work well. So there's
no real reason why we should prohibit users to build libvirt from
submodule.
Eric Blake [Thu, 18 Jul 2013 15:37:52 +0000 (09:37 -0600)]
maint: split long lines in Makefiles
Makefiles are another easy file to enforce line limits.
Mostly straightforward; interesting tricks worth noting:
src/Makefile.am: $(confdir) was already defined, use it in more places
tests/Makefile.am: path_add and VG required some interesting compression
Eric Blake [Fri, 12 Jul 2013 20:55:21 +0000 (14:55 -0600)]
security_dac: compute supplemental groups before fork
Commit 75c1256 states that virGetGroupList must not be called
between fork and exec, then commit ee777e99 promptly violated
that for lxc's use of virSecurityManagerSetProcessLabel. Hoist
the supplemental group detection to the time that the security
manager needs to fork. Qemu is safe, as it uses
virSecurityManagerSetChildProcessLabel which in turn uses
virCommand to determine supplemental groups.
This does not fix the fact that virSecurityManagerSetProcessLabel
calls virSecurityDACParseIds calls parseIds which eventually
calls getpwnam_r, which also violates fork/exec async-signal-safe
safety rules, but so far no one has complained of hitting
deadlock in that case.
* src/security/security_dac.c (_virSecurityDACData): Track groups
in private data.
(virSecurityDACPreFork): New function, to set them.
(virSecurityDACClose): Clean up new fields.
(virSecurityDACGetIds): Alter signature.
(virSecurityDACSetSecurityHostdevLabelHelper)
(virSecurityDACSetChardevLabel, virSecurityDACSetProcessLabel)
(virSecurityDACSetChildProcessLabel): Update callers.
Eric Blake [Wed, 17 Jul 2013 21:35:50 +0000 (15:35 -0600)]
security: framework for driver PreFork handler
A future patch wants the DAC security manager to be able to safely
get the supplemental group list for a given uid, but at the time
of a fork rather than during initialization so as to pick up on
live changes to the system's group database. This patch adds the
framework, including the possibility of a pre-fork callback
failing.
For now, any driver that implements a prefork callback must be
robust against the possibility of being part of a security stack
where a later element in the chain fails prefork. This means
that drivers cannot do any action that requires a call to postfork
for proper cleanup (no grabbing a mutex, for example). If this
is too prohibitive in the future, we would have to switch to a
transactioning sequence, where each driver has (up to) 3 callbacks:
PreForkPrepare, PreForkCommit, and PreForkAbort, to either clean
up or commit changes made during prepare.
* src/security/security_driver.h (virSecurityDriverPreFork): New
callback.
* src/security/security_manager.h (virSecurityManagerPreFork):
Change signature.
* src/security/security_manager.c (virSecurityManagerPreFork):
Optionally call into driver, and allow returning failure.
* src/security/security_stack.c (virSecurityDriverStack):
Wrap the handler for the stack driver.
* src/qemu/qemu_process.c (qemuProcessStart): Adjust caller.
Eric Blake [Wed, 17 Jul 2013 17:47:01 +0000 (11:47 -0600)]
tests: split long lines
Long lines are harder to read and harder to diff; in fact, if lines get
too long (> 1000 bytes), it starts causing issues where git send-email
refuses to send patches for the file. I've cleaned up the tests
directory in the past (see commits bd6c46f, 3b750d1), but new long
lines have been introduced in the meantime.
Why 90 instead of 80? Because there were too many tests on the fringe
edge, and I didn't want to edit that many files.
Add a syntax check to prevent future long lines.
* cfg.mk (sc_prohibit_long_lines): New rule.
* tests/qemuxml2argvdata/qemuxml2argv-*.args: Split lines of any
file with content longer than 90 columns.
* tests/storagevolxml2argvdata/*.argv: Likewise.
Some versions of kFreeBSD (like 9.0) declare link_addr in a header
but lack an implementation. This makes ./configure pass but breaks
compilation later with a
undefined reference to `link_addr'
Althought that's a bug in the OS header we can detect it easily by also
trying to link.
Osier Yang [Fri, 24 May 2013 09:08:28 +0000 (17:08 +0800)]
qemu: Set cpuset.cpus for domain process
When either "cpuset" of <vcpu> is specified, or the "placement" of
<vcpu> is "auto", only setting the cpuset.mems might cause the guest
starting to fail. E.g. ("placement" of both <vcpu> and <numatune> is
"auto"):
1) Related XMLs
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
to '0-63'
5) The advisory nodeset got from querying numad (from debug log)
2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
Nodeset returned from numad: 1
6) cpuset.mems will be set as: (from debug log)
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
to '0-7'
I.E, the domain process's memory is restricted on the first NUMA node,
however, it can use all of the CPUs, which will likely cause the domain
process to fail to start because of the kernel fails to allocate
memory with the the memory policy as "strict".
Peter Krempa [Thu, 18 Jul 2013 09:21:48 +0000 (11:21 +0200)]
caps: Add helpers to convert NUMA nodes to corresponding CPUs
These helpers use the remembered host capabilities to retrieve the cpu
map rather than query the host again. The intended usage for this
helpers is to fix automatic NUMA placement with strict memory alloc. The
code doing the prepare needs to pin the emulator process only to cpus
belonging to a subset of NUMA nodes of the host.
Add virtio-scsi to fallback models of scsi controller
When user does not specify any model for scsi controller, or worse, no
controller at all, but libvirt automatically adds scsi controller with
no model, we are not searching for virtio-scsi and thus this can fail
for example on qemu which doesn't support lsi logic adapter.
This means that when qemu on x86 doesn't support lsi53c895a and the
user adds the following to an XML without any scsi controller:
<disk ...>
...
<target dev='sda'>
</disk>
libvirt fails like this:
# virsh define asdf.xml
error: Failed to define domain from asdf.xml
error: internal error Unable to determine model for scsi controller
Michal Privoznik [Wed, 17 Jul 2013 07:20:26 +0000 (09:20 +0200)]
Remove lxcDriverLock from almost everywhere
With the majority of fields in the virLXCDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the LXC driver lock. Only a handful
of helper APIs now need it.
Michal Privoznik [Wed, 17 Jul 2013 07:14:42 +0000 (09:14 +0200)]
lxc: Make activeUsbHostdevs use locks
The activeUsbHostdevs item in LXCDriver are lockable, but the lock has
to be called explicitly. Call the virObject(Un)Lock() in order to
achieve mutual exclusion once lxcDriverLock is removed.
Michal Privoznik [Mon, 15 Jul 2013 09:43:10 +0000 (11:43 +0200)]
Stop accessing driver->caps directly in LXC driver
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
Michal Privoznik [Tue, 16 Jul 2013 15:45:05 +0000 (17:45 +0200)]
Introduce a virLXCDriverConfigPtr object
Currently the virLXCDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virLXCDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virLXCDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
Michal Privoznik [Thu, 18 Jul 2013 10:38:02 +0000 (12:38 +0200)]
qemuhotplugtest: Resolve some memleaks
If testQemuHotplugAttach succeeds, the vm->def steals the dev pointer.
However, not the envelope, which needs to be freed. In addition,
driver.config is allocated, but never freed.
Ján Tomko [Thu, 18 Jul 2013 10:13:46 +0000 (12:13 +0200)]
virAsprintf: correctly check return value
When virAsprintf was changed from a function to a macro
reporting OOM error in dc6f2da, it was documented as returning
0 on success. This is incorrect, it returns the number of bytes
written as asprintf does.
Some of the functions were converted to use virAsprintf's return
value directly, changing the return value on success from 0 to >= 0.
For most of these, this is not a problem, but the change in
virPCIDriverDir breaks PCI passthrough.
The return value check in virhashtest pre-dates virAsprintf OOM
conversion.
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a "--pass-fds N,M,..." arg to the virsh start/create
methods. This allows pre-opened file descriptors from the
shell to be passed on into the guest
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
LXC: Wire up the virDomainCreate{XML}WithFiles methods
Wire up the new virDomainCreate{XML}WithFiles methods in the
LXC driver, so that FDs get passed down to the init process.
The lxc_container code needs to do a little dance in order
to renumber the file descriptors it receives into linear
order, starting from STDERR_FILENO + 1.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
remote: fix dom->id after virDomainCreateWithFlags
The virDomainCreateWithFlags remote client helper was made to
invoke REMOTE_PROC_DOMAIN_LOOKUP_BY_UUID to refresh the 'id'
of the domain, following the pattern used in the previous
virDomainCreate method impl.
The remote protocol for virDomainCreateWithFlags though did
actually fix the design flaw in virDomainCreate, by directly
returning the new domain info. For some reason, this data was
never used. So we can just use that data now instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce new domain create APIs to pass pre-opened FDs to LXC
With container based virt, it is useful to be able to pass
pre-opened file descriptors to the container init process.
This allows for containers to be auto-activated from incoming
socket connections, passing the active socket into the container.
To do this, introduce a pair of new APIs, virDomainCreateXMLWithFiles
and virDomainCreateWithFiles, which accept an array of file
descriptors. For the LXC driver, UNIX file descriptor passing
will be used to send them to libvirtd, which will them pass
them down to libvirt_lxc, which will then pass them to the container
init process.
This will only be implemented for LXC right now, but the design
is generic enough it could work with other hypervisors, hence
I suggest adding this to libvirt.so, rather than libvirt-lxc.so
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The .ctags file specifies default options for ctags so that it does not
ignore libvirt.h.in and ignores uninteresting files. As a result, you
can just run "ctags" and navigating to a public API won't get you to a
useless entry in api.html.
esx: Support for disk-only and quiescing snapshots.
Add support for creating disk-only (no memory) snapshots in esx, and
for quiescing the VM before taking the snapshot. The VMware API
supports these operations directly, so adding support to libvirt is
just a matter of setting the flags correctly when calling
VMware. VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY and
VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE are now valid flags for esx.
Although, having it depending on Xen >= 4.3 (by using the proper
libxl feature flag).
Xen currently implements a NUMA placement policy which is basically
the same as the 'interleaved' policy of `numactl', although it can
be applied on a subset of the available nodes. We therefore hardcode
"interleave" as 'numa_mode', and we use the newly introduced libxl
interface to figure out what nodes a domain spans ('numa_nodeset').
With this change, it is now possible to query the NUMA node
affinity of a running domain:
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
Id Name State
----------------------------------------------------
23 F18_x64 running
Ján Tomko [Wed, 17 Jul 2013 08:56:05 +0000 (10:56 +0200)]
cgroup: reuse buffer for getline
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.
This fixes the following leak that happened when zero bytes were read:
==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404== at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404== by 0x906F862: getdelim (iogetdelim.c:68)
==404== by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404== by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404== by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)
Michal Privoznik [Mon, 15 Jul 2013 13:50:29 +0000 (15:50 +0200)]
virSecurityManagerGenLabel: Skip seclabels without model
While generating seclabels, we check the seclabel stack if required
driver is in the stack. If not, an error is returned. However, it is
possible for a seclabel to not have any model set (happens with LXC
domains that have just <seclabel type='none'>). If that's the case,
we should just skip the iteration instead of calling STREQ(NULL, ...)
and SIGSEGV-ing subsequently.
Currently, if the primary security driver is 'none', we skip
initializing caps->host.secModels. This means, later, when LXC domain
XML is parsed and <seclabel type='none'/> is found (see
virSecurityLabelDefsParseXML), the model name is not copied to the
seclabel. This leads to subsequent crash in virSecurityManagerGenLabel
where we call STREQ() over the model (note, that we are expecting model
to be !NULL).
Eric Blake [Tue, 16 Jul 2013 16:11:32 +0000 (10:11 -0600)]
build: avoid compiler warning on shadowed name
Introduced in commit 24b08219; compilation on RHEL 6.4 complained:
qemu/qemu_hotplug.c: In function 'qemuDomainAttachChrDevice':
qemu/qemu_hotplug.c:1257: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:177: error: shadowed declaration is here [-Wshadow]
* src/qemu/qemu_hotplug.c (qemuDomainAttachChrDevice): Avoid the
name 'remove'.
John Ferlan [Mon, 8 Jul 2013 17:19:43 +0000 (13:19 -0400)]
Allow balloon driver collection to be adjusted dynamically
Use the virDomainSetMemoryStatsPeriodFlags() to pass a period defined by
usage of a new --period option in order to set the collection period for the
balloon driver. This may enable or disable the collection based on the value.
Add the --current, --live, & --config options to dommemstat.
John Ferlan [Mon, 8 Jul 2013 16:47:23 +0000 (12:47 -0400)]
Implement the virDomainSetMemoryStatsPeriod for QEMU driver
Implement the new API that will handle setting the balloon driver statistics
collection period in order to enable or disable the collection dynamically.
John Ferlan [Mon, 8 Jul 2013 14:22:38 +0000 (10:22 -0400)]
Add new public API virDomainSetMemoryStatsPeriod
Add new API in order to set the balloon memory driver statistics collection
period in order to allow dynamic period adjustment for the virsh dommemstats to
display balloon stats data
John Ferlan [Thu, 11 Jul 2013 23:18:48 +0000 (19:18 -0400)]
Add capability to fetch balloon stats
This patch will add the qemuMonitorJSONGetMemoryStats() to execute a
"guest-stats" on the balloonpath using "get-qom" replacing the former
mechanism which looked through the "query-ballon" returned data for
the fields. The "query-balloon" code only returns 'actual' memory.
Rather than duplicating the existing code, have the JSON API use the
GetBalloonInfo API.
A check in the qemuMonitorGetMemoryStats() will be made to ensure the
balloon driver path has been set. Since the underlying JSON code can
return data not associated with the balloon driver, we don't fail on
a failure to get the balloonpath. Of course since we've made the check,
we can then set the ballooninit flag. Getting the path here is primarily
due to the process reconnect path which doesn't attempt to set the
collection period.
John Ferlan [Thu, 27 Jun 2013 15:00:31 +0000 (11:00 -0400)]
Determine whether to start balloon memory stats gathering.
At vm startup and attach attempt to set the balloon driver statistics
collection period based on the value found in the domain xml file. This
is not done at reconnect since it's possible that a collection period
was set on the live guest and making the set period call would reset to
whatever value is stored in the config file.
Setting the stats collection period has a side effect of searching through
the qom-list output for the virtio balloon driver and making sure that it
has the right properties in order to allow setting of a collection period
and eventually fetching of statistics.
The walk through the qom-list is expensive and thus the balloonpath will
be saved in the monitor private structure as well as a flag indicating
that the initialization has already been attempted (in the event that a
path is not found, no sense to keep checking).
This processing model conforms to the qom object model model which
requires setting object properties after device startup. That is, it's
not possible to pass the period along via the startup code as it won't
be recognized.
Alex Jia [Tue, 16 Jul 2013 09:30:20 +0000 (17:30 +0800)]
qemu: Prevent crash of libvirtd without guest agent configuration
If users haven't configured guest agent then qemuAgentCommand() will
dereference a NULL 'mon' pointer, which causes crash of libvirtd when
using agent based cpu (un)plug.
With the patch, when the qemu-ga service isn't running in the guest,
a expected error "error: Guest agent is not responding: Guest agent
not available for now" will be raised, and the error "error: argument
unsupported: QEMU guest agent is not configured" is raised when the
guest hasn't configured guest agent.
GDB backtrace:
(gdb) bt
#0 virNetServerFatalSignal (sig=11, siginfo=<value optimized out>, context=<value optimized out>) at rpc/virnetserver.c:326
#1 <signal handler called>
#2 qemuAgentCommand (mon=0x0, cmd=0x7f39300017b0, reply=0x7f394b090910, seconds=-2) at qemu/qemu_agent.c:975
#3 0x00007f39429507f6 in qemuAgentGetVCPUs (mon=0x0, info=0x7f394b0909b8) at qemu/qemu_agent.c:1475
#4 0x00007f39429d9857 in qemuDomainGetVcpusFlags (dom=<value optimized out>, flags=9) at qemu/qemu_driver.c:4849
#5 0x00007f3957dffd8d in virDomainGetVcpusFlags (domain=0x7f39300009c0, flags=8) at libvirt.c:9843
How to reproduce?
# To start a guest without guest agent configuration
# then run the following cmdline
# virsh vcpucount foobar --guest
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor
When using logical pools, we had to trust the target->path provided.
This parameter, however, can be completely ommited and we can use
'/dev/<source.name>' safely and populate it to target.path.
qemuhotplugtest: Introduce test for chardev hotplug
The test is currently testing just device update function. However,
chardev hotplug is implemented just for device attach and detach. This
fact means, the test needs to be rewritten (the majority of the code is
still shared). Moreover, we are now able to pass VM among multiple test
runs. So for instance, while we add a device in the first run, we can
remove it in the second run.
Michal Privoznik [Tue, 12 Mar 2013 14:59:25 +0000 (15:59 +0100)]
qemu: Implement chardev hotplug on config level
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
Michal Privoznik [Fri, 12 Jul 2013 16:52:17 +0000 (18:52 +0200)]
domain_conf: Auto fill chardev port
Now that we have callbacks, we should auto fill in omitted pieces of
information. It's important for chardev hotplug to fill in the correct
/{serial,parallel,console,channel}/target/@port if no value has been
provided by user.
Until now, the "host-model" cpu mode couldn't be influenced. This patch
allows to use the <feature> elements to either enable or disable
specific CPU flags. This can be used to force flags that can be emulated
even if the host CPU doesn't support them.