Peter Krempa [Wed, 19 Feb 2020 07:38:49 +0000 (08:38 +0100)]
qemu: blockcopy: Allow late opening of the backing chain of a shallow copy
oVirt used a quirk in the pre-blockdev semantics of drive-mirror which
opened the backing chain of the mirror destination only once
'block-job-complete' was called.
Our introduction of blockdev made qemu open the backing chain images
right at the start of the job. This broke oVirt's usage of this API
because they copy the data into the backing chain during the time the
block copy job is running.
Re-introduce late open of the backing chain if qemu allows us to use
blockdev-snapshot on write-only nodes as it can be used to install the
backing chain even for an existing image now.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Peter Krempa [Thu, 27 Feb 2020 14:50:08 +0000 (15:50 +0100)]
qemuDomainBlockCopyCommon: Record updated flags to block job
For a long time we've masked out VIR_DOMAIN_BLOCK_COPY_SHALLOW if
there's no backing chain for the copied disk to simplify the code.
One of the refactors of the block copy code caused that we no longer
update the 'flags' variable just the local copies. This was okay until
in ccd4228afff we started storing the job flags in the block job data.
Given that we modify how we call qemu we also should modify @flags so
that the correct value is recorded in the block job data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Peter Krempa [Tue, 18 Feb 2020 16:10:46 +0000 (17:10 +0100)]
qemuDomainBlockPivot: Move check prior to executing the pivot steps
Move the check whether the job is already synchronised to the beginning
of the function so that we don't try to do some of the steps necessary
for pivoting prior to actually wanting to pivot.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
util: ensure min/maj are initialized in virGetDeviceID
The stub impl of virGetDeviceID just returns ENOSYS and does not
initialize the min/maj output parameters. This lead to a false
positive warning on mingw about possible use of uninitialized
variables.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
tests: fix double unlock of monitor in hotplug test
The qemuMonitorTestNew() function returns with the monitor object
locked, and expects it to still be locked when qemuMonitorTestFree
is called. The qemuhotplug test, however, explicitly unlocks the
monitor, but then forgets to lock it again. As a result the
qemuMonitorTestFree function is unlocking a mutex it doesn't own.
This bug has existed forever, but since we use normal POSIX mutexes
and don't check the return value of pthread_mutex_lock/unlock we
didn't see the error. It was harmless until the switch to the per
monitor event loop which requires the thread synchronization to
work reliably, whereupon it started crashing.
Reviewed-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Fri, 28 Feb 2020 16:12:41 +0000 (17:12 +0100)]
conf: Don't generate machine names with a dot
According to the linked BZ, machined expects either valid
hostname or valid FQDN (see systemd commit v239-3092-gd65652f1f2). While in case of multiple dots, a
trailing one doesn't violate FQDN, it does violate the rule in
case of something simple, like "domain.". But it's safe to remove
it in both cases.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1808499 Fixes: 45464db8ba502764cf37ec9335770248bdb3d9a8 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Michal Privoznik [Wed, 11 Mar 2020 15:55:24 +0000 (16:55 +0100)]
virQEMUCaps: Drop unused usedQMP member
The virQEMUCaps structure has usedQMP member which in the past
used to tell if qemu we are dealing with is capable of QMP. Well,
we don't support HMP anymore (minus a few HMP passthrough
commands, which are wrapped into QMP anyways) and the member is
not used really.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
qemu: remove redundant needReply argument of qemuAgentCommand
needReply added in [1] looks redundant. Indeed it is set to false only
when mon->await_event is set too (the only exception qemuAgentFSTrim
which is mistaken).
However it fixes the issue when qemuAgentCommand exits on error path and
mon->await_event is not reset. Let's instead reset mon->await_event properly.
Also remove "Woken up by event" debug message as it can be misleading.
We can get it also if monitor is closed due to serial changed event
currently. Anyway both qemuAgentClose and qemuAgentNotifyEvent log
itself.
[1] qemu: make sure agent returns error when required data are missing
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemu: agent: sync once if qemu has serial port event
Sync was introduced in [1] to check for ga presence. This
check is racy but in the era before serial events are available
there was not better solution I guess.
In case we have the events the sync function is different. It allows us
to flush stateless ga channel from remnants of previous communications.
But we need to do it only once. Until we get timeout on issued command
channel state is ok.
[1] qemu_agent: Issue guest-sync prior to every command
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Wed, 13 Nov 2019 14:34:50 +0000 (15:34 +0100)]
qemu: Create multipath targets for PRs
If a disk has persistent reservations enabled, qemu-pr-helper
might open not only /dev/mapper/control but also individual
targets of the multipath device. We are already querying for them
in CGroups, but now we have to create them in the namespace too.
This was brought up in [1].
tests: start/stop an event thread for QEMU monitor/agent tests
Tests which are using the QEMU monitor / agent need to have an
event thread running a private GMainContext.
There is already a thread running the main libvirt event loop
but this can't be eliminated yet as it is used for more than
just the monitor client I/O.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The event loop thread will be responsible for handling
any per-domain I/O operations, most notably the QEMU
monitor and agent sockets.
We start this event loop when launching QEMU, but stopping
the event loop is a little more complicated. The obvious
idea is to stop it in qemuProcessStop(), but if we do that
we risk loosing the final events from the QEMU monitor, as
they might not have been read by the event thread at the
time we tell the thread to stop.
The solution is to delay shutdown of the event thread until
we have seen EOF from the QEMU monitor, and thus we know
there are no further events to process.
Note that this assumes that we don't have events to process
from the QEMU agent.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Wed, 26 Feb 2020 08:24:27 +0000 (09:24 +0100)]
virbpf: Set errno instead of reporting errors
The virbpf module wraps syscalls to BPF. However, if the kernel
headers used at the compile time don't have support for BPF the
module offers stubs which return a negative one to signal error
to the caller. But there is a slight discrepancy between real
functions and these stubs. While the former set errno and return
-1 the latter report an error (without setting the errno) and
return -1. This is not optimal because the caller might see stale
errno and overwrite the error message with a less accurate one.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virCgroupV2DevicesAvailable: Print stringified errno in the debug log
In the virCgroupV2DevicesAvailable() function we try to determine
whether CGroups version 2 are available. We do this by opening
what we believe is the CGroup mount point and issuing a BPF call.
When the call fails, a debug message is printed. However, the BPF
call sets errno too. Include it in the debug message to help us
with debugging.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virDomainDiskTranslateSourcePool: Check for disk type correctly
When rewriting the virDomainDiskTranslateSourcePool() function in
v6.1.0-rc1~184 a typo was introduced. Previously, we allowed
startup policy only for those volumes which translated to
VIR_STORAGE_TYPE_FILE. But starting with the referenced commit,
the value we checked for was changed to VIR_STORAGE_VOL_FILE
which comes from a different enum and has a different value too.
This is wrong, because virStorageSourceGetActualType() returns a
value from the original enum.
Michal Privoznik [Thu, 27 Feb 2020 10:20:51 +0000 (11:20 +0100)]
qemu: Tell secdrivers which images are top parent
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some it their decisions whether the image
they are working on is top of of the backing chain. Fortunately,
in places where we call secdrivers we know this and the
information can be passed to secdrivers.
The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU the access to the base image. Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.
Michal Privoznik [Thu, 27 Feb 2020 10:06:22 +0000 (11:06 +0100)]
security: Introduce VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP flag
Our decision whether to remember seclabel for a disk image
depends on a few factors. If the image is readonly or shared or
not the chain top the remembering is suppressed for the image.
However, the virSecurityManagerSetImageLabel() is too low level
to determine whether passed @src is chain top or not. Even though
the function has the @parent argument it does not necessarily
reflect the chain top - it only points to the top level image in
the chain we want to relabel and not to the topmost image of the
whole chain. And this can't be derived from the passed domain
definition reliably neither - in some cases (like snapshots or
block copy) the @src is added to the definition only after the
operation succeeded. Therefore, introduce a flag which callers
can use to help us with the decision.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Zhimin Feng [Mon, 2 Mar 2020 08:26:51 +0000 (16:26 +0800)]
rpc: getaddrinfo: also accept IPv4-mapped IPv6 addresses
If only IPv6 is configured on the host, getaddrinfo with AI_ADDRCONFIG
in hints would return EAI_ADDRFAMILY for nodenames that resolve to IPv4.
Also pass AI_V4MAPPED to accept IPv4-mapped addresses on IPv6-only
systems.
Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com>
[rewrote the commit message - jtomko] Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
logging: Use default timeout of 120 seconds for virtlogd
This is the same timeout of all other daemons, and just like them
virtlogd is socket-activated, so it will automatically be started
on demand whenever that's necessary.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
util: add API for reading password from the console
the fact that "bufptr" pointer may point to either heap or stack
allocated data was overlooked. As a result, when the strdup was
removed, we ended up returning a pointer to the local stack to
the caller. When the caller referenced this stack pointer they
got out garbage which fairly quickly resulted in a crash.
We need to copy the stack buffer into heap memory in the username
case.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
virthread: Free thread name only after worker has finished
When spawning a thread via our virThread APIs we let pthread
spawn this helper thread which sets couple of thread local
variables (e.g. thread job name or thread worker name) and as of v6.1.0-40-gc85256b31b it also sets pthread name (which is then
visible in `ps' output for instance). Only after these steps the
intended function is called. However, just before calling it we
free the buffer that holds the thread name which results in
invalid memory reads:
==47027== Invalid read of size 1
==47027== at 0x48389C2: strlen (vg_replace_strmem.c:459)
==47027== by 0x58BB3D6: __vfprintf_internal (vfprintf-internal.c:1645)
==47027== by 0x58CE6E0: __vasprintf_internal (vasprintf.c:57)
==47027== by 0x574BA28: g_vasprintf (in /usr/lib64/libglib-2.0.so.0.6000.7)
==47027== by 0x57240CC: g_strdup_vprintf (in /usr/lib64/libglib-2.0.so.0.6000.7)
==47027== by 0x48E0EFA: vir_g_strdup_vprintf (glibcompat.c:209)
==47027== by 0x493AA05: virLogVMessage (virlog.c:573)
==47027== by 0x493A8FE: virLogMessage (virlog.c:513)
==47027== by 0x4992FC7: virThreadJobClear (virthreadjob.c:121)
==47027== by 0x4992844: virThreadHelper (virthread.c:237)
==47027== by 0x5817496: start_thread (pthread_create.c:486)
==47027== by 0x59563CE: clone (clone.S:95)
The problem is that neither virThreadJobSetWorker() nor
virThreadJobSet() create a copy of passed name. They just set a
thread local variable to point to the buffer which is then
freed. Moving the free towards the end of the wrapper function
solves the issue.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Now that we have more than just the libvirtd daemon, we should be
explaining to users what they are all for & important aspects of their
configuration.
Reviewed-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Thu, 5 Mar 2020 08:42:23 +0000 (09:42 +0100)]
VIR_FREE: Replace internals by g_clear_pointer
Our implementation masks GCC warnings of uninitialized use of the passed
argument. After changing this I got a load of following warnings:
src/conf/virnetworkportdef.c: In function 'virNetworkPortDefSaveStatus':
/usr/include/glib-2.0/glib/gmem.h:136:8: error: 'path' may be used uninitialized in this function [-Werror=maybe-uninitialized]
136 | if (_p) \
| ^
src/conf/virnetworkportdef.c:447:11: note: 'path' was declared here
447 | char *path;
| ^~~~
For the curious, g_clear_pointer is still safe for arguments with
side-effect. Here's the pre-processed output of trying to do a
VIR_FREE(*(test2++)):
src: improve thread naming with human targetted names
Historically threads are given a name based on the C function,
and this name is just used inside libvirt. With OS level thread
naming this name is now visible to debuggers, but also has to
fit in 15 characters on Linux, so function names are too long
in some cases.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Libvirt has never configured the QEMU agent to support
running on a PTY implicitly. In theory an end user may
have written such an XML config, but this is reasonably
unlikely since when a bare <channel> is provided, libvirt
will auto-expand it to a UNIX socket backend.
With this change a user who has use the PTY backend will
have to switch to the UNIX backend if they wish to use
libvirt APIs for interacting with the agent. This will
not have guest ABI impact.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Wed, 4 Mar 2020 11:19:39 +0000 (12:19 +0100)]
qemuhotplugtestcpus: Always use 'query-cpus-fast'
Use the new command in the test suite by asserting the capability
and adjusting test data to the correct field names as they changed
compared to 'query-cpus'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Peter Krempa [Wed, 4 Mar 2020 09:10:56 +0000 (10:10 +0100)]
qemuMonitorJSONSetMigrationParams: Refactor command construction and cleanup
qemuMonitorJSONMakeCommandInternal does the full command construction if
you pass in what would become the value of the 'arguments' key. Refactor
the open-coded implementation to use the helper and use modern cleanup
helpers at the same time.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Michal Privoznik [Fri, 28 Feb 2020 14:36:49 +0000 (15:36 +0100)]
qemu_shim: Ignore SIGPIPE
I've found that if my virtlogd is socket activated but the daemon
doesn't run yet, then the virt-qemu-run is killed right after it
tries to start the domain. The problem is that because the default
setting is to use virtlogd, the domain create code tries to
connect to virtlogd socket, which in turn tries to detect who is
connecting (virNetSocketGetUNIXIdentity()) and as a part of it,
it will try to open /proc/${PID_OF_SHIM}/stat which is denied by
SELinux:
Virtlogd reacts by closing the connection which the shim sees as
SIGPIPE. Since the default response to the signal is Term, we
don't even get to reporting any error nor to removing the
temporary directory.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Michal Privoznik [Fri, 28 Feb 2020 13:42:44 +0000 (14:42 +0100)]
qemu_shim: Allow other users to enter the root dir
When virt-qemu-run is ran without any root directory specified on
the command line, a temporary directory is made and used instead.
But since we are using g_dir_make_tmp() to create the directory
it is going to have 0700 mode. So even though we create the whole
directory structure under it and label everything, QEMU is very
likely to not have the access. This is because in this case there
is no qemu.conf and thus distro default UID:GID is used to run
QEMU (e.g. qemu:kvm on Fedora). Change the mode of the temporary
directory so that everybody has eXecute permission.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Michal Privoznik [Tue, 25 Feb 2020 14:53:12 +0000 (15:53 +0100)]
qemu: Don't compare local and remote hostnames on migration
Libvirt tries to forbid migration onto the same host and it does
that by checking if local and remote hostnames are the same and
whether local and remote UUIDs are the same. Well, the latter
makes sense but the former doesn't really because libvirtd can be
running inside an UTS namespace and hostnames can appear the same
on both sides of migration. On the other hand, host UUIDs are
unique, so rely on them when trying to prevent migration onto the
same host.
Gaurav Agrawal [Wed, 4 Mar 2020 18:06:13 +0000 (23:36 +0530)]
admin: use g_autofree
Signed-off-by: Gaurav Agrawal <agrawalgaurav@gnome.org>
[removed dead assignment] Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Ján Tomko [Wed, 4 Mar 2020 15:48:27 +0000 (16:48 +0100)]
tests: do not include skipped tests in failedTests
We recognize three return values from tests:
* OK -> 0
* SKIP -> EXIT_AM_SKIP
* ERROR -> anything else
Also check for EXIT_AM_SKIP when building a bitmap of failed tests,
otherwise the skipped tests would be printed in the suggested range
of tests that shoud be re-run.
Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: cebb468ef5e82b8d4253e27ef70c67812cf93c5a Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Ján Tomko [Tue, 25 Feb 2020 14:44:23 +0000 (15:44 +0100)]
conf: only allow virtio bus for input passthrough
Other buses are not supported.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1724928 Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Ján Tomko [Tue, 6 Aug 2019 16:21:32 +0000 (18:21 +0200)]
qemu: build vhost-user-fs device command line
Format the 'vhost-user-fs' device on the QEMU command line.
This device provides shared file system access using the FUSE protocol
carried over virtio.
The actual file server is implemented in an external vhost-user-fs device
backend process.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Tue, 28 Jan 2020 14:38:52 +0000 (15:38 +0100)]
qemu: use the vhost-user schemas to find binary
Look into /usr/share/qemu/vhost-user to see whether we can find
a suitable virtiofsd binary, in case the user did not provide one
in the domain XML.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Thu, 30 Jan 2020 16:28:27 +0000 (17:28 +0100)]
qemu: forbid migration with vhost-user-fs device
This is not yet supported.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Wed, 11 Dec 2019 21:30:06 +0000 (22:30 +0100)]
qemu: add virtiofsd_debug to qemu.conf
Add a 'virtiofsd_debug' option for tuning whether to run virtiofsd
in debug mode.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Tue, 21 Jan 2020 08:02:58 +0000 (09:02 +0100)]
schema: wrap fsDriver in a choice group
Allow adding new groups without changing indentation.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Peter Krempa [Thu, 27 Feb 2020 08:08:26 +0000 (09:08 +0100)]
kbase: backing_chains: Add steps how to securely probe image format
We document steps how to fix images if they are rejected for missing
the 'backing file format' field. Document also how to securely probe
the image format if it's unknown.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Pavel Hrdina [Wed, 26 Feb 2020 12:23:00 +0000 (13:23 +0100)]
daemon: set default memlock limit for systemd service
The default memlock limit is 64k which is not enough to start a single
VM. The requirements for one VM are 12k, 8k for eBPF map and 4k for eBPF
program, however, it fails to create eBPF map and program with 64k limit.
By testing I figured out that the minimal limit is 80k to start a single
VM with functional eBPF and if I add 12k I can start another one.
This leads into following calculation:
80k as memlock limit worked to start a VM with eBPF which means there
is 68k of lock memory that I was not able to figure out what was using
it. So to get a number for 4096 VMs:
68 + 12 * 4096 = 49220
If we round it up we will get 64M of memory lock limit to support 4096
VMs with default map size which can hold 64 entries for devices.
This should be good enough as a sane default and users can change it if
the need to.