Pavel Hrdina [Fri, 17 Aug 2018 14:49:33 +0000 (16:49 +0200)]
vircgroup: introduce virCgroupV2AddTask
In cgroups v2 we need to handle threads and processes differently.
If you need to move a process you need to write its pid into
cgrou.procs file and it will move the process with all its threads
as well. The whole process will be moved if you use tid of any thread.
In order to move only threads at first we need to create threaded group
and after that we can write the relevant thread tids into cgroup.threads
file. Threads can be moved only into cgroups that are children of
cgroup of its process.
Pavel Hrdina [Tue, 18 Sep 2018 15:49:19 +0000 (17:49 +0200)]
vircgroup: introduce virCgroupV2MakeGroup
When creating cgroup hierarchy we need to enable controllers in the
parent cgroup in order to be usable. That means writing "+{controller}"
into cgroup.subtree_control file. We can enable only controllers that
are enabled for parent cgroup, that means we need to do that for the
whole cgroup tree.
Cgroups for threads needs to be handled differently in cgroup v2. There
are two types of controllers:
- domain controllers: these cannot be enabled for threads
- threaded controllers: these can be enabled for threads
In addition there are multiple types of cgroups:
- domain: normal cgroup
- domain threaded: a domain cgroup that serves as root for threaded
cgroups
- domain invalid: invalid cgroup, can be changed into threaded, this
is the default state if you create subgroup inside
domain threaded group or threaded group
- threaded: threaded cgroup which can have domain threaded or
threaded as parent group
In order to create threaded cgroup it's sufficient to write "threaded"
into cgroup.type file, it will automatically make parent cgroup
"domain threaded" if it was only "domain". In case the parent cgroup
is already "domain threaded" or "threaded" it will modify only the type
of current cgroup. After that we can enable threaded controllers.
Pavel Hrdina [Tue, 25 Sep 2018 12:37:21 +0000 (14:37 +0200)]
vircgroup: introduce virCgroupV2DetectControllers
Cgroup v2 has only single mount point for all controllers. The list
of controllers is stored in cgroup.controllers file, name of controllers
are separated by space.
In cgroup v2 there is no cpuacct controller, the cpu.stat file always
exists with usage stats.
Pavel Hrdina [Fri, 14 Sep 2018 21:46:42 +0000 (23:46 +0200)]
vircgroup: introduce virCgroupV2DetectPlacement
If the placement was copied from parent or set to absolute path
there is nothing to do, otherwise set the placement based on
process placement from /proc/self/cgroup or /proc/{pid}/cgroup.
When reconnecting to a domain we are validating the cgroup name.
In case of cgroup v2 we need to validate only the new format for host
without systemd '{machinename}.libvirt-{drivername}' or scope name
generated by systemd.
Pavel Hrdina [Fri, 17 Aug 2018 14:28:11 +0000 (16:28 +0200)]
vircgroup: introduce virCgroupV2Available
We cannot detect only mount points to figure out whether cgroup v2
is available because systemd uses cgroup v2 for process tracking and
all controllers are mounted as cgroup v1 controllers.
To make sure that this is no the situation we need to check
'cgroup.controllers' file if it's not empty to make sure that cgroup
v2 is not mounted only for process tracking.
Pavel Hrdina [Tue, 18 Sep 2018 15:48:33 +0000 (17:48 +0200)]
util: introduce cgroup v2 files
Place cgroup v2 backend type before cgroup v1 to make it obvious
that cgroup v2 is preferred implementation.
Following patches will introduce support for hybrid configuration
which will allow us to use both at the same time, but we should
prefer cgroup v2 regardless.
Ján Tomko [Wed, 3 Oct 2018 12:10:13 +0000 (14:10 +0200)]
qemu: fix up permissions for pre-created UNIX sockets
My commit d6b8838 fixed the uid:gid for the pre-created UNIX sockets
but did not account for the different umask of libvirtd and QEMU.
Since commit 0e1a1a8c we set umask to '0002' for the QEMU process.
Manually tune-up the permissions to match what we would have gotten
if QEMU had created the socket.
GlusterFS is typically safe when it comes to migration. It's a
network FS after all. However, it can be mounted via FUSE driver
they provide. If that is the case we fail to identify it and
think migration is not safe and require VIR_MIGRATE_UNSAFE flag.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Ján Tomko [Thu, 27 Sep 2018 14:13:18 +0000 (16:13 +0200)]
security: dac: also label listen UNIX sockets
We switched to opening mode='bind' sockets ourselves:
commit 30fb2276d88b275dc2aad6ddd28c100d944b59a5
qemu: support passing pre-opened UNIX socket listen FD
in v4.5.0-rc1~251
Then fixed qemuBuildChrChardevStr to change libvirtd's label
while creating the socket:
commit b0c6300fc42bbc3e5eb0b236392f7344581c5810
qemu: ensure FDs passed to QEMU for chardevs have correct SELinux labels
v4.5.0-rc1~52
Also add labeling of these sockets to the DAC driver.
Instead of duplicating the logic which decides whether libvirt should
pre-create the socket, assume an existing path meaning that it was created
by libvirt.
Marc Hartmayer [Thu, 20 Sep 2018 17:44:48 +0000 (19:44 +0200)]
qemu: Introduce qemuDomainUpdateQEMUCaps()
This function updates the used QEMU capabilities of @vm by querying
the QEMU capabilities cache.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Marc Hartmayer [Thu, 20 Sep 2018 17:44:47 +0000 (19:44 +0200)]
qemu: Use VIR_STEAL_PTR macro
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
John Ferlan [Fri, 28 Sep 2018 00:36:58 +0000 (20:36 -0400)]
nwfilter: Alter virNWFilterSnoopReqLeaseDel logic
Move the fetch of @ipAddrLeft to after the goto skip_instantiate
and remove the (req->binding) guard since we know that as long
as req->binding is created, then req->threadkey is filled in.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Thu, 27 Sep 2018 23:09:44 +0000 (19:09 -0400)]
tests: Alter logic in testCompareXMLToDomConfig
Rather than initialize actualconfig and expectconfig before
having the possibility that libxlDriverConfigNew could fail
and thus land in cleanup, let's just move them and return
immediately upon failure.
Signed-off-by: John Ferlan <jferlan@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Thu, 27 Sep 2018 22:49:41 +0000 (18:49 -0400)]
util: Data overrun may lead to divide by zero
Commit 87a8a30d6 added the function based on the virsh function,
but used an unsigned long long instead of a double and thus that
limits the maximum result.
Signed-off-by: John Ferlan <jferlan@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Thu, 27 Sep 2018 22:46:36 +0000 (18:46 -0400)]
tests: Inline a sysconf call for linuxCPUStatsToBuf
While unlikely, sysconf(_SC_CLK_TCK) could fail leading to
indeterminate results for the subsequent division. So let's
just remove the # define and inline the same change.
Signed-off-by: John Ferlan <jferlan@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Thu, 27 Sep 2018 21:41:07 +0000 (17:41 -0400)]
libxl: Fix possible object refcnt issue
When libxlDomainMigrationDstPrepare adds the @args to an
virNetSocketAddIOCallback using libxlMigrateDstReceive as
the target of the virNetSocketIOFunc @func with the knowledge
that the libxlMigrateDstReceive will virObjectUnref @args
at the end thus not needing to Unref during normal processing
for libxlDomainMigrationDstPrepare.
However, Coverity believes there's an issue with this. The
problem is there can be @nsocks virNetSocketAddIOCallback's
added, but only one virObjectUnref. That means the first
one done will Unref and the subsequent callers may not get
the @args (or @opaque) as they expected. If there's only
one socket returned from virNetSocketNewListenTCP, then sure
that works. However, if it returned more than one there's
going to be a problem.
To resolve this, since we start with 1 reference from the
virObjectNew for @args, we will add 1 reference for each
time @args is used for virNetSocketAddIOCallback. Then
since libxlDomainMigrationDstPrepare would be done with
@args, move it's virObjectUnref from the error: label to
the done: label (since error: falls through). That way
once the last IOCallback is done, then @args will be freed.
Signed-off-by: John Ferlan <jferlan@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Thu, 27 Sep 2018 10:54:12 +0000 (06:54 -0400)]
lxc: Only check @nparams in lxcDomainBlockStatsFlags
Remove the "!params" check from the condition since it's possible
someone could pass a non NULL value there, but a 0 for the nparams
and thus continue on. The external API only checks if @nparams is
non-zero, then check for NULL @params.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Jim Fehlig [Wed, 26 Sep 2018 17:31:19 +0000 (11:31 -0600)]
tests: reintroduce tests for libxl's legacy nested setting
The preferred location for setting the nested CPU flag changed in
Xen 4.10 and is advertised via the LIBXL_HAVE_BUILDINFO_NESTED_HVM
define. Commit 95d19cd0 changed libxl to use the new preferred
location but unconditionally changed the tests, causing 'make check'
failures against Xen < 4.10 that do not contain the new location.
Commit e94415d5 fixed the failures by only running the tests when
LIBXL_HAVE_BUILDINFO_NESTED_HVM is defined. Since libvirt supports
several versions of Xen that use the old nested location, it is
prudent to test the flag is set correctly. This patch reintroduces
the tests for the legacy location of the nested setting.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Julio Faracco [Fri, 28 Sep 2018 04:13:28 +0000 (01:13 -0300)]
uml: umlConnectOpen: Check the driver pointer before accessing it
The pointer related to uml_driver needs to be checked before its usage
inside the function. Some attributes of the driver are being accessed
while the pointer is NULL considering the current logic.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Michal Privoznik [Thu, 27 Sep 2018 13:03:03 +0000 (15:03 +0200)]
qemu: Temporarily disable metadata locking
Turns out, there are couple of bugs that prevent this feature
from being operational. Given how close to the release we are
disable the feature temporarily. Hopefully, it can be enabled
back after all the bugs are fixed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Pavel Hrdina [Thu, 27 Sep 2018 10:19:17 +0000 (12:19 +0200)]
vircgroup: include system headers only on linux
All the system headers are used only if we are compiling on linux
and they all are present otherwise we would have seen build errors
because in our tests/vircgrouptest.c we use only __linux__ to check
whether to skip the cgroup tests or not.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Pavel Hrdina [Thu, 27 Sep 2018 10:16:57 +0000 (12:16 +0200)]
vircgroup: remove VIR_CGROUP_SUPPORTED
tests/vircgrouptest.c uses #ifdef __linux__ for a long time and no
failure was reported so far so it's safe to assume that __linux__ is
good enough to guard cgroup code.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
virsh: Require explicit --domain for domxml-to-native
The domxml-to-native virsh command accepts either --xml or --domain
option followed by a file or domain name respectively. The --domain
option is documented as required, which means an argument with no option
is treated as --xml. Commit v4.3.0-127-gd86531daf2 broke this by making
--domain optional and thus an argument with no option was treated as
--domain.
Michal Privoznik [Mon, 24 Sep 2018 13:22:25 +0000 (15:22 +0200)]
security: Don't try to lock NULL paths
It may happen that in the list of paths/disk sources to relabel
there is a disk source. If that is the case, the path is NULL. In
that case, we shouldn't try to lock the path. It's likely a
network disk anyway and therefore there is nothing to lock.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
Michal Privoznik [Thu, 20 Sep 2018 12:22:13 +0000 (14:22 +0200)]
security: Grab a reference to virSecurityManager for transactions
This shouldn't be needed per-se. Security manager shouldn't
disappear during transactions - it's immutable. However, it
doesn't hurt to grab a reference either - transaction code uses
it after all.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Marc Hartmayer [Fri, 21 Sep 2018 13:02:04 +0000 (15:02 +0200)]
virdbus: Use the mnemonic macros for dbus_bool_t values
Use the mnemonic macros of libdbus for 1 (TRUE) and 0 (FALSE).
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Marc Hartmayer [Fri, 21 Sep 2018 13:02:03 +0000 (15:02 +0200)]
virdbus: Report a debug message that dbus_watch_handle() has failed
Report a debug message if dbus_watch_handle() returns FALSE.
dbus_watch_handle() returns FALSE if there wasn't enough memory for
reading or writing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Marc Hartmayer [Fri, 21 Sep 2018 13:02:02 +0000 (15:02 +0200)]
virdbus: Unref the D-Bus connection when closing
As documented at
https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga2522ac5075dfe0a1535471f6e045e1ee
the creator of a non-shared D-Bus connection has to release the last
reference after closing for freeing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Marc Hartmayer [Fri, 21 Sep 2018 13:02:01 +0000 (15:02 +0200)]
virdbus: Grab a ref as long as the while loop is executed
Grab a ref for info->bus (a DBus connection) as long as the while loop
is running. With the grabbed reference it is ensured that info->bus
isn't freed as long as the while loop is executed. This is necessary
as it's allowed to drop the last ref for the bus connection in a
handler.
There was already a bug of this kind in libdbus itself:
https://bugs.freedesktop.org/show_bug.cgi?id=15635.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
qemu: Avoid duplicate resume events and state changes
The only place where VIR_DOMAIN_EVENT_RESUMED should be generated is the
RESUME event handler to make sure we don't generate duplicate events or
state changes. In the worse case the duplicity can revert or cover
changes done by other event handlers.
For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events
we could happily mark the domain as running and report
VIR_DOMAIN_EVENT_RESUMED to registered clients.
Thanks to the previous commit the RESUME event handler knows what reason
should be used when changing the domain state to VIR_DOMAIN_RUNNING, but
the emitted VIR_DOMAIN_EVENT_RESUMED event still uses a generic
VIR_DOMAIN_EVENT_RESUMED_UNPAUSED detail. Luckily, the event detail can
be easily deduced from the running reason, which saves us from having to
pass one more value to the handler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Whenever we get the RESUME event from QEMU, we change the state of the
affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED
reason. This is fine if the domain is resumed unexpectedly, but when we
sent "cont" to QEMU we usually have a better reason for the state
change. The better reason is used in qemuProcessStartCPUs which also
sets the domain state to running if qemuMonitorStartCPUs reports
success. Thus we may end up with two state updates in a row, but the
final reason is correct.
This patch is a preparation for dropping the state change done in
qemuMonitorStartCPUs for which we need to pass the actual running reason
to the RESUME event handler and use it there instead of
VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch replaces some rather generic VIR_DOMAIN_RUNNING_UNPAUSED
reasons when changing domain state to running with more specific ones.
All of them are done when libvirtd reconnects to an existing domain
after being restarted and sees an unfinished migration or save.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT was defined but not used anywhere
in our event generation code. This fixes qemuDomainRevertToSnapshot to
properly report why the domain was resumed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Mark Asselstine [Mon, 24 Sep 2018 15:11:35 +0000 (11:11 -0400)]
lxc_monitor: Avoid AB / BA lock race
A deadlock situation can occur when autostarting a LXC domain 'guest'
due to two threads attempting to take opposing locks while holding
opposing locks (AB BA problem). Thread A takes and holds the 'vm' lock
while attempting to take the 'client' lock, meanwhile, thread B takes
and holds the 'client' lock while attempting to take the 'vm' lock.
Since these threads are scheduled independently and are preemptible it
is possible for the deadlock scenario to occur where each thread locks
their first lock but both will fail to get their second lock and just
spin forever. You get something like:
virLXCProcessAutostartDomain (takes vm lock)
--> virLXCProcessStart
--> virLXCProcessConnectMonitor
--> virLXCMonitorNew
<...>
virNetClientIncomingEvent (takes client lock)
--> virNetClientIOHandleInput
--> virNetClientCallDispatch
--> virNetClientCallDispatchMessage
--> virNetClientProgramDispatch
--> virLXCMonitorHandleEventInit
--> virLXCProcessMonitorInitNotify (wants vm lock but spins)
<...>
--> virNetClientSetCloseCallback (wants client lock but spins)
Neither thread ever gets the lock it needs to be able to continue
while holding the lock that the other thread needs.
The actual window for preemption which can cause this deadlock is
rather small, between the calls to virNetClientProgramNew() and
execution of virNetClientSetCloseCallback(), both in
virLXCMonitorNew(). But it can be seen in real world use that this
small window is enough.
By moving the call to virNetClientSetCloseCallback() ahead of
virNetClientProgramNew() we can close any possible chance of the
deadlock taking place. There should be no other implications to the
move since the close callback (in the unlikely event was called) will
spin on the vm lock. The remaining work that takes place between the
old call location of virNetClientSetCloseCallback() and the new
location is unaffected by the move.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>