Peter Krempa [Fri, 5 Feb 2016 15:57:58 +0000 (16:57 +0100)]
qemu: snapshot: Avoid infinite loop if vCPUs can't be resumed
In b3d2a42e I've refactored the code and moved the 'cleanup' label.
Unfortunately the code that was originally in the 'endjob' label and
wanted to jump to cleanup is now in the cleanup label. Remove the jump
and let the function finish.
Peter Krempa [Fri, 5 Feb 2016 15:55:09 +0000 (16:55 +0100)]
qemu: snapshot: Don't overwrite existing errors when thawing filesystems
If we are attempting to thaw the filesystems on error, the code would
overwrite the error code that caused the snapshot to fail with the error
of thawing the filesystem. Since the thawing function allows control of
error reporting behavior we can use this feature.
Add a stub for nodeDeviceSysfsGetPCIRelatedDevCaps() for non-Linux
platforms. It allows nodedev driver to work on non-Linux platoforms
that, however, have HAL.
cppi: src/bhyve/bhyve_driver.h: line 26: not properly indented
cppi: src/bhyve/bhyve_driver.h: line 27: not properly indented
maint.mk: incorrect preprocessor indentation
After 1036ddadb276e we use bhyveDriverGetCapabilities from other
sources too, not only from bhyve_driver.c. However, the function
was static so not properly expose to other files. In order to
expose it, we need to move couple of #include-s too.
Then, there has been a copy paste error in
virBhyveProcessReconnect: s/privconn/data->driver/.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Cole Robinson [Wed, 13 Jan 2016 16:02:03 +0000 (11:02 -0500)]
tests: qemuargv2xml: separate from qemuxml2argv data
Most of the qemuargv2xml tests are parsing old style qemu command
lines (with -disk, -serial, etc), and it gets its input from
qemuxml2argv output.
But since we've raise the minimum supported qemu version to 0.12.0,
which supports -device, once that changes propagates through libvirt
the vast majority of qemuxml2argv output is _not_ going to be using
old style qemu options.
In preparation for this, switch qemuargv2xml to use its own copies
of input and output, so it's not tied to qemuxml2argv results.
This is just a straight copy of the current tests.
I've noticed that variable @reply is not initialized and if
something at the beginning of the function fails, e.g.
virDBusGetSystemBus(), the control jump straight to cleanup label
where dbus_message_unref() is then called over this uninitialized
variable.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I've noticed couple of warning in dmesg while debugging
something:
[ 9683.973754] HTB: quantum of class 10001 is big. Consider r2q change.
[ 9683.976460] HTB: quantum of class 10002 is big. Consider r2q change.
I've read the HTB documentation and linux kernel code to find out
what's wrong. Basically we need to pass another argument
"quantum" to our tc cmd line because the default computed by HTB
does not always work in which case the warning message is printed
out.
Peter Krempa [Tue, 12 Jan 2016 12:11:56 +0000 (13:11 +0100)]
conf: Extract code for parsing thread resource scheduler info
As the scheduler info elements are represented orthogonally to how it
makes sense to actually store the information, the extracted code will
be later used when converting between XML and internal definitions.
So, systemd-machined has this philosophy that machine names are like
hostnames and hence should follow the same rules. But we always allowed
international characters in domain names. Thus we need to modify the
machine name we are passing to systemd.
In order to change some machine names that we will be passing to systemd,
we also need to call TerminateMachine at the end of a lifetime of a
domain. Even for domains that were started with older libvirt. That
can be achieved thanks to virSystemdGetMachineNameByPID(). And because
we can change machine names, we can get rid of the inconsistent and
pointless escaping of domain names when creating machine names.
So this patch modifies the naming in the following way. It creates the
name as <drivername>-<id>-<name> where invalid hostname characters are
stripped out of the name and if the resulting name is longer, it
truncates it to 64 characters. That way we can start domains we
couldn't start before. Well, at least on systemd.
To make it work all together, the machineName (which is needed only with
systemd) is saved in domain's private data. That way the generation is
moved to the driver and we don't need to pass various unnecessary
arguments to cgroup functions.
The only thing this complicates a bit is the scope generation when
validating a cgroup where we must check both old and new naming, so a
slight modification was needed there.
Joao Martins [Thu, 4 Feb 2016 22:55:05 +0000 (22:55 +0000)]
conf: add caps to virDomainSnapshotDefFormat
The virDomainSnapshotDefFormat calls into virDomainDefFormat,
so should be providing a non-NULL virCapsPtr instance. On the
qemu driver we change qemuDomainSnapshotWriteMetadata to also
include caps since it calls virDomainSnapshotDefFormat.
Joao Martins [Wed, 3 Feb 2016 21:40:37 +0000 (21:40 +0000)]
libxl: set net device prefix
Use the newly added virCapabilitiesSetNetPrefix to set
the network prefix for the driver. This in return will
be use by NetDefFormat() and NetDefParseXML() routines
to free any interface name that start with the registered
prefix.
Acked-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Joao Martins [Wed, 3 Feb 2016 21:40:36 +0000 (21:40 +0000)]
conf: add caps to virDomainSaveConfig
virDomainSaveConfig calls virDomainDefFormat which was setting the caps
to NULL, thus keeping the old behaviour (i.e. not looking at
netprefix). This patch adds the virCapsPtr to the function and allows
the configuration to be saved and skipping interface names that were
registered with virCapabilitiesSetNetPrefix().
Joao Martins [Wed, 3 Feb 2016 21:40:33 +0000 (21:40 +0000)]
conf: add net device prefix to capabilities
In the reverted commit d2e5538b1, the libxl driver was changed to copy
interface names autogenerated by libxl to the corresponding network def
in the domain's virDomainDef object. The copied name is freed when the
domain transitions to the shutoff state. But when migrating a domain,
the autogenerated name is included in the XML sent to the destination
host. It is possible an interface with the same name already exists on
the destination host, causing migration to fail.
This patch defines a new capability for setting the network device
prefix that will be used in the driver. Valid prefixes are
VIR_NET_GENERATED_PREFIX or the one announced by the driver.
ZFS-on-Linux implementation of ZFS starting with version 0.6.4
contains all the features we use. Additionally, as we support
'volmode' option handling that's not available on ZoL but is
available on FreeBSD, there is no need to block ZFS storage driver
on Linux anymore.
There are slight differences in various ZFS implementations.
Specifically, ZFS on FreeBSD requires to set value of 'volmode'
option to 'dev' to expose volumes as raw disk device (that's what
we need) rather than geom provides, for example.
With ZFS on Linux, however, such option is not available and
volumes exposed like we need by default.
To make our implementation more flexible, only pass 'volmode'
when it's supported. Support is checked by parsing usage
information of the 'zfs get' command.
Erik Skultety [Tue, 2 Feb 2016 13:13:15 +0000 (14:13 +0100)]
util: Export remoteSerializeTypedParameters internally via util
Same as for deserializer, this method might get handy for admin one day.
The major reason for this patch is to stay consistent with idea, i.e.
when deserializer can be shared, why not serializer as well. The only
problem to be solved was that the daemon side serializer uses a code
snippet which handles sparse arrays returned by some APIs as well as
removes any string parameters that can't be returned to older clients.
This patch makes of the new virTypedParameterRemote datatype introduced
by one of the pvious patches.
Erik Skultety [Tue, 2 Feb 2016 12:19:35 +0000 (13:19 +0100)]
util: Export remoteFreeTypedParameters internally via util
Since the method is static to remote_driver, it can't even be used by our
daemon. Other than that, it would be useful to be able to use it with admin as
well. This patch uses the new virTypedParameterRemote datatype introduced in
one of previous patches.
Erik Skultety [Thu, 28 Jan 2016 16:27:42 +0000 (17:27 +0100)]
util: Export remoteDeserializeTypedParameters internally via util
Currently, the deserializer is hardcoded into remote_driver which makes
it impossible for admin to use it. One way to achieve a shared implementation
(besides moving the code to another module) would be pass @ret_params_val as a
void pointer as opposed to the remote_typed_param pointer and add a new extra
argument specifying which of those two protocols is being used and typecast
the pointer at the function entry. An example from remote_protocol:
struct remote_typed_param_value {
int type;
union {
int i;
u_int ui;
int64_t l;
uint64_t ul;
double d;
int b;
remote_nonnull_string s;
} remote_typed_param_value_u;
};
typedef struct remote_typed_param_value remote_typed_param_value;
That would leave us with a bunch of if-then-elses that needed to be used across
the method. This patch takes the other approach using the new datatype
introduced in one of earlier commits.
Erik Skultety [Tue, 2 Feb 2016 14:33:30 +0000 (15:33 +0100)]
util: Introduce virTypedParameterRemote datatype
Both admin and remote protocols define their own types
(remote_typed_param vs admin_typed_param). Because of the naming convention,
admin typed params wouldn't be able to reuse the serialization/deserialization
methods, which are tailored for use by remote protocol, even if those method
were exported properly. In that case, introduce a new internal data type
structurally copying both admin and remote protocols which, eventually, would
allow serializer and deserializer to be used in a more generic way.
qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix
A pretty nasty deadlock occurs while trying to rename a VM in parallel
with virDomainObjListNumOfDomains.
The short description of the problem is as follows:
Thread #1:
qemuDomainRename:
------> aquires domain lock by qemuDomObjFromDomain
---------> waits for domain list lock in any of the listed functions:
- virDomainObjListFindByName
- virDomainObjListRenameAddNew
- virDomainObjListRenameRemove
Thread #2:
virDomainObjListNumOfDomains:
------> aquires domain list lock
---------> waits for domain lock in virDomainObjListCount
Introduce generic virDomainObjListRename function for renaming domains.
It aquires list lock in right order to avoid deadlock. Callback is used
to make driver specific domain updates.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Thu, 17 Dec 2015 15:04:23 +0000 (16:04 +0100)]
qemu: domain: Prepare qemuDomainDetectVcpuPids for reuse
Free the old vcpupids array in case when this function is called again
during the run of the VM. It will be later reused in the vCPU hotplug
code. The function now returns the number of detected VCPUs.
Peter Krempa [Mon, 7 Dec 2015 05:36:37 +0000 (06:36 +0100)]
conf: Add helper to retrieve bitmap of active vcpus for a definition
In some cases it may be better to have a bitmap representing state of
individual vcpus rather than iterating the definition. The new helper
creates a bitmap representing the state from the domain definition.
Peter Krempa [Mon, 7 Dec 2015 06:39:34 +0000 (07:39 +0100)]
cgroup: Clean up virCgroupGetPercpuStats
Use 'ret' for return variable name, clarify use of 'param_idx' and avoid
unnecessary 'success' label. No functional changes. Also document the
function.
Since commit 714080791778e3dfbd484ccb3953bffd820b8ba9 we are generating
socket path later than before -- when starting a domain. That makes one
particular inconsistent state of a chardev, which was not possible
before, currently valid. However, SELinux security driver forgot to
guard the main restoring function by a check for NULL-paths. So make it
no-op for NULL paths, as in the DAC driver.
Erik Skultety [Thu, 28 Jan 2016 15:46:32 +0000 (16:46 +0100)]
cfg.mk: Adjust sc_prohibit_int_ijk to support 'exempt from syntax-check'
There might be cases, like with typed params, where triggering this check isn't
desirable. But including the whole module in the exception regex is not always
to right way of doing things. By adding an option to manually disable this check
on a specific occurrence, the module itself will still be checked against the
rule.
Dmitry Andreev [Sat, 23 Jan 2016 16:08:45 +0000 (19:08 +0300)]
qemuDomainResume: allow to resume domain with guest panicked
In case of guest panicked, preserved crashed domain has stopped CPUs.
It's not possible to use tools like WinDbg for the problem investigation
until we start CPUs back.
qemu: return -1 on error paths in qemuDomainSaveImageStartVM
Error paths after sending the event that domain is started written as if ret = -1
which is set at the beginning of the function. It's common idioma to keep 'ret'
equal to -1 until the end of function where it is set to 0. But here we use ret
to keep result of restore operation too and thus breaks the idioma and its users :)
Let's use different variable to hold restore result.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
John Ferlan [Mon, 1 Feb 2016 15:26:02 +0000 (10:26 -0500)]
logical: Clean up allocation when building regex on the fly
Rather than a loop reallocating space to build the regex, just allocate
it once up front, then if there's more than 1 nextent, append a comma and
another regex_unit string.
John Ferlan [Mon, 1 Feb 2016 14:59:44 +0000 (09:59 -0500)]
logical: Use 'stripes' value for mirror/raid segtype
The 'stripes' value is described as the "Number of stripes or mirrors in
a logical volume". So add "mirror" and anything that starts with "raid"
to the list of segtypes that can have an 'nextents' value greater than one.
Use of raid segtypes (raid1, raid4, raid5*, raid6*, and raid10) is favored
over mirror in more recent lvm code.
John Ferlan [Mon, 1 Feb 2016 14:39:00 +0000 (09:39 -0500)]
logical: Use VIR_APPEND_ELEMENT instead of VIR_REALLOC_N
Rather than preallocating a set number of elements, then walking through
the extents and adjusting the specific element in place, use the APPEND
macros to handle that chore.
In my previous commit a70f3b1c779120129 I've tried to fix case
when building from VPATH and a file wasn't being installed.
However, my fix broke non-VPATH build.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michael Chapman [Wed, 27 Jan 2016 02:24:54 +0000 (13:24 +1100)]
virsh: improve waiting for block job readiness
After a block job hits 100%, we only need to apply a timeout waiting for
a block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2
callbacks were able to be registered.
If neither callback could be registered, there's clearly no need for a
timeout.
If both callbacks were registered, then we're guaranteed to eventually
get one of the events. The path being used by virsh must be exactly the
source path or target device in the domain's disk definition, and these
are the respective strings sent back in these two events.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Michael Chapman [Wed, 27 Jan 2016 02:24:52 +0000 (13:24 +1100)]
virsh: be consistent with style of loop exit
When waiting for a block job, the various statuses (COMPLETED, READY,
CANCELED, etc.) should all be treated consistently by having the loop be
exited with "break". Use "goto cleanup" for the error cases only, when
no block job status is available.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Boris Fiuczynski [Wed, 20 Jan 2016 12:13:45 +0000 (13:13 +0100)]
qemu: Align dump options for watchdog and on_crash events
Having on_crash set to either coredump-destroy or coredump-restart
creates core dumps with option memory-only in the directory specified
by auto_dump_path. When a watchdog is triggered with the action dump
the core dump is also placed into the directory specified by auto_dump_path
but is created without the option memory-only.
This patch sets the option memory-only also for core dumps created by the
watchdog event.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Michal Privoznik [Sat, 30 Jan 2016 10:55:45 +0000 (11:55 +0100)]
includes: Install libvirt-common.h
The libvirt-common.h is build time generated file from .in.
Obviously, it's generated into builddir and not srcdir. Problem
is, the list of header files to install, virinc_HEADERS contains
only $(srcdir)/*.h and this misses libvirt-common.h. This problem
is pretty obvious when doing a VPATH build.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Create a helper routine in order to parse any extents information
including the extent size, length, and the device string contained
within the generated 'lvs' output string.
A future patch would then be able to avoid the code more cleanly
RBD supports cloning by creating a snapshot, protecting it and create
a child image based on that snapshot afterwards.
The RBD storage driver will try to find a snapshot with zero deltas between
the current state of the original volume and the snapshot.
If such a snapshot is found a clone/child image will be created using
the rbd_clone2() function from librbd.
rbd_clone2() is available in librbd since Ceph version Dumpling (0.67) which
dates back to August 2013.
It will use the same features, strip size and stripe count as the parent image.
This implementation will only create a single snapshot on the parent image if
never changes. This reduces the amount of snapshots created for that RBD image
which benefits the performance of the Ceph cluster.
During build the decision will be made to use either rbd_diff_iterate() or
rbd_diff_iterate2().
The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
Cloning is only supported if RBD format 2 is used. All images created by libvirt
are already format 2.
If a RBD format 1 image is used as the original volume the backend will report
a VIR_ERR_OPERATION_UNSUPPORTED error.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
rbd: Add support for wiping RBD volumes using TRIM.
Using VIR_STORAGE_VOL_WIPE_ALG_TRIM a RBD volume can be trimmed down
to 0 bytes using rbd_discard()
Effectively all the data on the volume will be lost/gone, but the volume
remains available for use afterwards.
Starting at offset 0 the storage pool will call rbd_discard() in stripe
size * count increments which is usually 4MB. Stripe size being 4MB and
count 1.
rbd_discard() is available since Ceph version Dumpling (0.67) which dates
back to August 2013.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When wiping the RBD image will be filled with zeros started
at offset 0 and until the end of the volume.
This will result in the RBD volume growing to it's full allocation
on the Ceph cluster. All data on the volume will be overwritten
however, making it unavailable.
It does NOT take any RBD snapshots into account. The original data
might still be in a snapshot of that RBD volume.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Use the cast of (virStorageVolWipeAlgorithm) adding the missing case:'s
(VIR_STORAGE_VOL_WIPE_ALG_ZERO and VIR_STORAGE_VOL_WIPE_ALG_LAST).
Additionally, the old code would also still run the SCRUB command on
default since it didn't go to cleanup when a invalid flag was supplied.
We now go to cleanup and exit if a invalid flag would be provided.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
John Ferlan [Fri, 22 Jan 2016 16:01:05 +0000 (11:01 -0500)]
logical: Fix comment examples for virStorageBackendLogicalFindLVs
When commit id '82c1740a' made changes to the output format (changing from
using a ',' separator to '#'), the examples in the lvs output from the
comments weren't changed.
Additionally, the two new fields added ('segtype' and 'stripes') were
not included in the output, leaving it well confusing.
This patch fixes the sample output, adds a 'striped' example, and makes
other comment related adjustments for long line and spacing between followup
'NB' remarks (while I'm there).
Change their return type from unsigned int to bool: the corresponding
members in struct _virPCIDevice are defined as bool, and even the
corresponding virPCIDeviceSet*() functions take a bool value as input
so there's no point in these functions having unsigned int as return
type.
Michal Privoznik [Thu, 28 Jan 2016 14:20:17 +0000 (15:20 +0100)]
gendispatch: Don't output spaces on empty line
In our generator for some code we put empty lines in the output
to separate blocks of code. However, in some cases we put couple
of spaces on the empty line too. It's not bug, it just isn't
nice.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Andrea Bolognani [Fri, 15 Jan 2016 10:57:26 +0000 (11:57 +0100)]
pci: Add debug messages when unbinding from stub driver
Unbinding a PCI device from the stub driver can require several steps,
and it can be useful for debugging to be able to trace which of these
steps are performed and which are skipped for each device.
Andrea Bolognani [Tue, 19 Jan 2016 14:58:31 +0000 (15:58 +0100)]
pci: Phase out virPCIDeviceReattachInit()
The name is confusing, and there are just two uses: one is a test case,
and the other will be removed as part of an upcoming refactoring of
the hostdev code.
Andrea Bolognani [Wed, 27 Jan 2016 09:35:17 +0000 (10:35 +0100)]
virnetdevopenvswitch: Don't call strlen() twice on the same string
Commit 871e10f fixed a memory corruption error, but called strlen()
twice on the same string to do so. Even though the compiler is
probably smart enough to optimize the second call away, having a
single invocation makes the code slightly cleaner.
Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Wed, 27 Jan 2016 08:49:54 +0000 (09:49 +0100)]
virnetdevmacvlan: Provide stubs for build without macvtap
In 370608b4c76f we have introduced two new internal APIs.
However, there are no stubs for build without macvtap. Therefore
build on systems lacking macvtap support (e.g. mingw or freebds)
fails when trying to link.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The problem is that virCommandRun can return an empty string in the event that
the port being queried does not exist. When this happens then we are
unconditionally overwriting a newline character at position strlen()-1. When
strlen is 0, we overwrite memory that does not belong to the string.
The fix: Only overwrite the newline if the string is not empty.
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Laine Stump [Tue, 19 Jan 2016 19:20:54 +0000 (14:20 -0500)]
util: keep/use a bitmap of in-use macvtap devices
This patch creates two bitmaps, one for macvlan device names and one
for macvtap. The bitmap position is used to indicate that libvirt is
currently using a device with the name macvtap%d/macvlan%d, where %d
is the position in the bitmap. When requested to create a new
macvtap/macvlan device, libvirt will now look for the first clear bit
in the appropriate bitmap and derive the device name from that rather
than just starting at 0 and counting up until one works.
When libvirtd is restarted, the qemu driver code that reattaches to
active domains calls the appropriate function to "re-reserve" the
device names as it is scanning the status of running domains.
Note that it may seem strange that the retry counter now starts at
8191 instead of 5. This is because we now don't do a "pre-check" for
the existence of a device once we've reserved it in the bitmap - we
move straight to creating it; although very unlikely, it's possible
that someone has a running system where they have a large number of
network devices *created outside libvirt* named "macvtap%d" or
"macvlan%d" - such a setup would still allow creating more devices
with the old code, while a low retry max in the new code would cause a
failure. Since the objective of the retry max is just to prevent an
infinite loop, and it's highly unlikely to do more than 1 iteration
anyway, having a high max is a reasonable concession in order to
prevent lots of new failures.
Leno Hou [Mon, 11 Jan 2016 06:59:00 +0000 (14:59 +0800)]
util: increase libnl buffer size
In the following cases nl_recv() was returning the error "No buffer
space available":
* When switching CPUs to offline/online in a system more than 128 cpus
* When using virsh to destroy domain in a system with many interfaces
This patch sets the buffer size for all netlink sockets created by
libnl to 128K and turns on message peeking for nl_recv(). This
eliminates the "No buffer space available" errors seen in the cases
above, and also preempts other future errors the smaller buffers could
have caused.
Signed-off-by: Leno Hou <houqy@linux.vnet.ibm.com> Signed-off-by: Laine Stump <laine@laine.org>
Pavel Hrdina [Mon, 11 Jan 2016 11:40:32 +0000 (12:40 +0100)]
device: cleanup input device code
The current code was a little bit odd. At first we've removed all
possible implicit input devices from domain definition to add them later
back if there was any graphics device defined while parsing XML
description. That's not all, while formating domain definition to XML
description we at first ignore any input devices with bus different to
USB and VIRTIO and few lines later we add implicit input devices to XML.
This seems to me as a lot of code for nothing. This patch may look
to be more complicated than original approach, but this is a preferred
way to modify/add driver specific stuff only in those drivers and not
deal with them in common parsing/formating functions.
The update is to add those implicit input devices into config XML to
follow the real HW configuration visible by guest OS.
There was also inconsistence between our behavior and QEMU's in the way,
that in QEMU there is no way how to disable those implicit input devices
for x86 architecture and they are available always, even without graphics
device. This applies also to XEN hypervisor. VZ driver already does its
part by putting correct implicit devices into live XML.
Michal Privoznik [Tue, 26 Jan 2016 16:37:29 +0000 (17:37 +0100)]
vircgroup: Finish renaming of virCgroupIsolateMount
In dc576025c360 we renamed virCgroupIsolateMount function to
virCgroupBindMount. However, we forgot about one occurrence in
section of the code which provides stubs for platforms without
support for CGroups like *BSD for instance.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
lxc: don't try to hide parent cgroups inside container
On the host when we start a container, it will be
placed in a cgroup path of
/machine.slice/machine-lxc\x2ddemo.scope
under /sys/fs/cgroup/*
Inside the containers' namespace we need to setup
/sys/fs/cgroup mounts, and currently will bind
mount /machine.slice/machine-lxc\x2ddemo.scope on
the host to appear as / in the container.
While this may sound nice, it confuses applications
dealing with cgroups, because /proc/$PID/cgroup
now does not match the directory in /sys/fs/cgroup
This particularly causes problems for systems and
will make it create repeated path components in
the cgroup for apps run in the container eg
This also causes any systemd service that uses
sd-notify to fail to start, because when systemd
receives the notification it won't be able to
identify the corresponding unit it came from.
In particular this break rabbitmq-server startup
Future kernels will provide proper cgroup namespacing
which will handle this problem, but until that time
we should not try to play games with hiding parent
cgroups.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The VIR_DOMAIN_STATS_VCPU flag to virDomainListGetStats
enables reporting of stats about vCPUs. Currently we
only report the cumulative CPU running time and the
execution state.
This adds reporting of the wait time - time the vCPU
wants to run, but the host scheduler has something else
running ahead of it.
In implementing this I notice our reporting of CPU execute
time has very poor granularity, since we are getting it
from /proc/$PID/stat. As a future enhancement we should
prefer to get CPU execute time from /proc/$PID/schedstat
or /proc/$PID/sched (if either exist on the running kernel)
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Peter Krempa [Mon, 4 Jan 2016 18:46:31 +0000 (19:46 +0100)]
test: Touch up error message when attempting to pin invalid vCPU
Report
error: invalid argument: requested vcpu '100' is not present in the domain
instead of
error: invalid argument: requested vcpu is higher than allocated vcpus
Peter Krempa [Mon, 4 Jan 2016 16:00:51 +0000 (17:00 +0100)]
tests: qemuxml2xml: Order pinning information numerically
A future patch will refactor the storage of the pinning information in a
way where the ordering will be lost. Order them numerically to avoid
changing the tests later.
Peter Krempa [Mon, 7 Dec 2015 11:49:12 +0000 (12:49 +0100)]
virsh: cpu-stats: Remove unneeded flags
virDomainGetCPUStats doesn't support flags so there's no need to carry
the 'flags' variable around. Additionally since the API is poorly
designed I doubt that it will be extended.
Michal Privoznik [Mon, 25 Jan 2016 06:57:21 +0000 (07:57 +0100)]
virt-host-validate: Fix error level for user namespace check
From the code it seems to me that we need user namespace if
configured in domain XML. Otherwise we don't use it at all.
However our tool is more strict about that. Fix this discrepancy.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>