Michal Privoznik [Wed, 26 Feb 2020 08:24:27 +0000 (09:24 +0100)]
virbpf: Set errno instead of reporting errors
The virbpf module wraps syscalls to BPF. However, if the kernel
headers used at the compile time don't have support for BPF the
module offers stubs which return a negative one to signal error
to the caller. But there is a slight discrepancy between real
functions and these stubs. While the former set errno and return
-1 the latter report an error (without setting the errno) and
return -1. This is not optimal because the caller might see stale
errno and overwrite the error message with a less accurate one.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virCgroupV2DevicesAvailable: Print stringified errno in the debug log
In the virCgroupV2DevicesAvailable() function we try to determine
whether CGroups version 2 are available. We do this by opening
what we believe is the CGroup mount point and issuing a BPF call.
When the call fails, a debug message is printed. However, the BPF
call sets errno too. Include it in the debug message to help us
with debugging.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virDomainDiskTranslateSourcePool: Check for disk type correctly
When rewriting the virDomainDiskTranslateSourcePool() function in
v6.1.0-rc1~184 a typo was introduced. Previously, we allowed
startup policy only for those volumes which translated to
VIR_STORAGE_TYPE_FILE. But starting with the referenced commit,
the value we checked for was changed to VIR_STORAGE_VOL_FILE
which comes from a different enum and has a different value too.
This is wrong, because virStorageSourceGetActualType() returns a
value from the original enum.
Michal Privoznik [Thu, 27 Feb 2020 10:20:51 +0000 (11:20 +0100)]
qemu: Tell secdrivers which images are top parent
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some it their decisions whether the image
they are working on is top of of the backing chain. Fortunately,
in places where we call secdrivers we know this and the
information can be passed to secdrivers.
The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU the access to the base image. Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.
Michal Privoznik [Thu, 27 Feb 2020 10:06:22 +0000 (11:06 +0100)]
security: Introduce VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP flag
Our decision whether to remember seclabel for a disk image
depends on a few factors. If the image is readonly or shared or
not the chain top the remembering is suppressed for the image.
However, the virSecurityManagerSetImageLabel() is too low level
to determine whether passed @src is chain top or not. Even though
the function has the @parent argument it does not necessarily
reflect the chain top - it only points to the top level image in
the chain we want to relabel and not to the topmost image of the
whole chain. And this can't be derived from the passed domain
definition reliably neither - in some cases (like snapshots or
block copy) the @src is added to the definition only after the
operation succeeded. Therefore, introduce a flag which callers
can use to help us with the decision.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Zhimin Feng [Mon, 2 Mar 2020 08:26:51 +0000 (16:26 +0800)]
rpc: getaddrinfo: also accept IPv4-mapped IPv6 addresses
If only IPv6 is configured on the host, getaddrinfo with AI_ADDRCONFIG
in hints would return EAI_ADDRFAMILY for nodenames that resolve to IPv4.
Also pass AI_V4MAPPED to accept IPv4-mapped addresses on IPv6-only
systems.
Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com>
[rewrote the commit message - jtomko] Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
logging: Use default timeout of 120 seconds for virtlogd
This is the same timeout of all other daemons, and just like them
virtlogd is socket-activated, so it will automatically be started
on demand whenever that's necessary.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
util: add API for reading password from the console
the fact that "bufptr" pointer may point to either heap or stack
allocated data was overlooked. As a result, when the strdup was
removed, we ended up returning a pointer to the local stack to
the caller. When the caller referenced this stack pointer they
got out garbage which fairly quickly resulted in a crash.
We need to copy the stack buffer into heap memory in the username
case.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
virthread: Free thread name only after worker has finished
When spawning a thread via our virThread APIs we let pthread
spawn this helper thread which sets couple of thread local
variables (e.g. thread job name or thread worker name) and as of v6.1.0-40-gc85256b31b it also sets pthread name (which is then
visible in `ps' output for instance). Only after these steps the
intended function is called. However, just before calling it we
free the buffer that holds the thread name which results in
invalid memory reads:
==47027== Invalid read of size 1
==47027== at 0x48389C2: strlen (vg_replace_strmem.c:459)
==47027== by 0x58BB3D6: __vfprintf_internal (vfprintf-internal.c:1645)
==47027== by 0x58CE6E0: __vasprintf_internal (vasprintf.c:57)
==47027== by 0x574BA28: g_vasprintf (in /usr/lib64/libglib-2.0.so.0.6000.7)
==47027== by 0x57240CC: g_strdup_vprintf (in /usr/lib64/libglib-2.0.so.0.6000.7)
==47027== by 0x48E0EFA: vir_g_strdup_vprintf (glibcompat.c:209)
==47027== by 0x493AA05: virLogVMessage (virlog.c:573)
==47027== by 0x493A8FE: virLogMessage (virlog.c:513)
==47027== by 0x4992FC7: virThreadJobClear (virthreadjob.c:121)
==47027== by 0x4992844: virThreadHelper (virthread.c:237)
==47027== by 0x5817496: start_thread (pthread_create.c:486)
==47027== by 0x59563CE: clone (clone.S:95)
The problem is that neither virThreadJobSetWorker() nor
virThreadJobSet() create a copy of passed name. They just set a
thread local variable to point to the buffer which is then
freed. Moving the free towards the end of the wrapper function
solves the issue.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Now that we have more than just the libvirtd daemon, we should be
explaining to users what they are all for & important aspects of their
configuration.
Reviewed-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Thu, 5 Mar 2020 08:42:23 +0000 (09:42 +0100)]
VIR_FREE: Replace internals by g_clear_pointer
Our implementation masks GCC warnings of uninitialized use of the passed
argument. After changing this I got a load of following warnings:
src/conf/virnetworkportdef.c: In function 'virNetworkPortDefSaveStatus':
/usr/include/glib-2.0/glib/gmem.h:136:8: error: 'path' may be used uninitialized in this function [-Werror=maybe-uninitialized]
136 | if (_p) \
| ^
src/conf/virnetworkportdef.c:447:11: note: 'path' was declared here
447 | char *path;
| ^~~~
For the curious, g_clear_pointer is still safe for arguments with
side-effect. Here's the pre-processed output of trying to do a
VIR_FREE(*(test2++)):
src: improve thread naming with human targetted names
Historically threads are given a name based on the C function,
and this name is just used inside libvirt. With OS level thread
naming this name is now visible to debuggers, but also has to
fit in 15 characters on Linux, so function names are too long
in some cases.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Libvirt has never configured the QEMU agent to support
running on a PTY implicitly. In theory an end user may
have written such an XML config, but this is reasonably
unlikely since when a bare <channel> is provided, libvirt
will auto-expand it to a UNIX socket backend.
With this change a user who has use the PTY backend will
have to switch to the UNIX backend if they wish to use
libvirt APIs for interacting with the agent. This will
not have guest ABI impact.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Wed, 4 Mar 2020 11:19:39 +0000 (12:19 +0100)]
qemuhotplugtestcpus: Always use 'query-cpus-fast'
Use the new command in the test suite by asserting the capability
and adjusting test data to the correct field names as they changed
compared to 'query-cpus'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Peter Krempa [Wed, 4 Mar 2020 09:10:56 +0000 (10:10 +0100)]
qemuMonitorJSONSetMigrationParams: Refactor command construction and cleanup
qemuMonitorJSONMakeCommandInternal does the full command construction if
you pass in what would become the value of the 'arguments' key. Refactor
the open-coded implementation to use the helper and use modern cleanup
helpers at the same time.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Michal Privoznik [Fri, 28 Feb 2020 14:36:49 +0000 (15:36 +0100)]
qemu_shim: Ignore SIGPIPE
I've found that if my virtlogd is socket activated but the daemon
doesn't run yet, then the virt-qemu-run is killed right after it
tries to start the domain. The problem is that because the default
setting is to use virtlogd, the domain create code tries to
connect to virtlogd socket, which in turn tries to detect who is
connecting (virNetSocketGetUNIXIdentity()) and as a part of it,
it will try to open /proc/${PID_OF_SHIM}/stat which is denied by
SELinux:
Virtlogd reacts by closing the connection which the shim sees as
SIGPIPE. Since the default response to the signal is Term, we
don't even get to reporting any error nor to removing the
temporary directory.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Michal Privoznik [Fri, 28 Feb 2020 13:42:44 +0000 (14:42 +0100)]
qemu_shim: Allow other users to enter the root dir
When virt-qemu-run is ran without any root directory specified on
the command line, a temporary directory is made and used instead.
But since we are using g_dir_make_tmp() to create the directory
it is going to have 0700 mode. So even though we create the whole
directory structure under it and label everything, QEMU is very
likely to not have the access. This is because in this case there
is no qemu.conf and thus distro default UID:GID is used to run
QEMU (e.g. qemu:kvm on Fedora). Change the mode of the temporary
directory so that everybody has eXecute permission.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Michal Privoznik [Tue, 25 Feb 2020 14:53:12 +0000 (15:53 +0100)]
qemu: Don't compare local and remote hostnames on migration
Libvirt tries to forbid migration onto the same host and it does
that by checking if local and remote hostnames are the same and
whether local and remote UUIDs are the same. Well, the latter
makes sense but the former doesn't really because libvirtd can be
running inside an UTS namespace and hostnames can appear the same
on both sides of migration. On the other hand, host UUIDs are
unique, so rely on them when trying to prevent migration onto the
same host.
Gaurav Agrawal [Wed, 4 Mar 2020 18:06:13 +0000 (23:36 +0530)]
admin: use g_autofree
Signed-off-by: Gaurav Agrawal <agrawalgaurav@gnome.org>
[removed dead assignment] Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Ján Tomko [Wed, 4 Mar 2020 15:48:27 +0000 (16:48 +0100)]
tests: do not include skipped tests in failedTests
We recognize three return values from tests:
* OK -> 0
* SKIP -> EXIT_AM_SKIP
* ERROR -> anything else
Also check for EXIT_AM_SKIP when building a bitmap of failed tests,
otherwise the skipped tests would be printed in the suggested range
of tests that shoud be re-run.
Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: cebb468ef5e82b8d4253e27ef70c67812cf93c5a Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Ján Tomko [Tue, 25 Feb 2020 14:44:23 +0000 (15:44 +0100)]
conf: only allow virtio bus for input passthrough
Other buses are not supported.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1724928 Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Ján Tomko [Tue, 6 Aug 2019 16:21:32 +0000 (18:21 +0200)]
qemu: build vhost-user-fs device command line
Format the 'vhost-user-fs' device on the QEMU command line.
This device provides shared file system access using the FUSE protocol
carried over virtio.
The actual file server is implemented in an external vhost-user-fs device
backend process.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Tue, 28 Jan 2020 14:38:52 +0000 (15:38 +0100)]
qemu: use the vhost-user schemas to find binary
Look into /usr/share/qemu/vhost-user to see whether we can find
a suitable virtiofsd binary, in case the user did not provide one
in the domain XML.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Thu, 30 Jan 2020 16:28:27 +0000 (17:28 +0100)]
qemu: forbid migration with vhost-user-fs device
This is not yet supported.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Wed, 11 Dec 2019 21:30:06 +0000 (22:30 +0100)]
qemu: add virtiofsd_debug to qemu.conf
Add a 'virtiofsd_debug' option for tuning whether to run virtiofsd
in debug mode.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Ján Tomko [Tue, 21 Jan 2020 08:02:58 +0000 (09:02 +0100)]
schema: wrap fsDriver in a choice group
Allow adding new groups without changing indentation.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
Peter Krempa [Thu, 27 Feb 2020 08:08:26 +0000 (09:08 +0100)]
kbase: backing_chains: Add steps how to securely probe image format
We document steps how to fix images if they are rejected for missing
the 'backing file format' field. Document also how to securely probe
the image format if it's unknown.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Pavel Hrdina [Wed, 26 Feb 2020 12:23:00 +0000 (13:23 +0100)]
daemon: set default memlock limit for systemd service
The default memlock limit is 64k which is not enough to start a single
VM. The requirements for one VM are 12k, 8k for eBPF map and 4k for eBPF
program, however, it fails to create eBPF map and program with 64k limit.
By testing I figured out that the minimal limit is 80k to start a single
VM with functional eBPF and if I add 12k I can start another one.
This leads into following calculation:
80k as memlock limit worked to start a VM with eBPF which means there
is 68k of lock memory that I was not able to figure out what was using
it. So to get a number for 4096 VMs:
68 + 12 * 4096 = 49220
If we round it up we will get 64M of memory lock limit to support 4096
VMs with default map size which can hold 64 entries for devices.
This should be good enough as a sane default and users can change it if
the need to.
Jiri Denemark [Tue, 25 Feb 2020 15:05:06 +0000 (16:05 +0100)]
qemu: Do not set default CPU for archs without CPU driver
Whenever there is a guest CPU configured in domain XML, we will call
some CPU driver APIs to validate the CPU definition and check its
compatibility with the hypervisor. Thus domains with guest CPU
specification can only be started if the guest architecture is supported
by the CPU driver. But we would add a default CPU to any domain as long
as QEMU reports it causing failures to start any domain on affected
architectures.
Peter Krempa [Mon, 17 Feb 2020 09:08:25 +0000 (10:08 +0100)]
virStorageFileGetMetadataRecurse: Allow format probing under special circumstances
Allow format probing to work around lazy clients which did not specify
their format in the overlay. Format probing will be allowed only, if we
are able to probe the image, the probing result was successful and the
probed image does not have any backing or data file.
This relaxes the restrictions which were imposed in commit 3615e8b39bad
in cases when we know that the image probing will not result in security
issues or data corruption.
We perform the image format detection and in the case that we were able
to probe the format and the format does not specify a backing store (or
doesn't support backing store) we can use this format.
With pre-blockdev configurations this will restore the previous
behaviour for the images mentioned above as qemu would probe the format
anyways. It also improves error reporting compared to the old state as
we now report that the backing chain will be broken in case when there
is a backing file.
In blockdev configurations this ensures that libvirt will not cause data
corruption by ending the chain prematurely without notifying the user,
but still allows the old semantics when the users forgot to specify the
format.
Users thus don't have to re-invent when image format detection is safe
to do.
The price for this is that libvirt will need to keep the image format
detector still current and working or replace it by invocation of
qemu-img.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Tue, 25 Feb 2020 12:28:10 +0000 (13:28 +0100)]
qemu: domain: Convert detected 'iso' image format into 'raw'
While our code can detect ISO as a separate format, qemu does not use it
as such and just passes it through as raw. Add conversion for detected
parts of the backing chain so that the validation code does not reject
it right away.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Julio Faracco [Mon, 24 Feb 2020 14:24:27 +0000 (11:24 -0300)]
lxc: Replacing default strings definitions by g_autofree statement
There are a lots of strings being handled inside some LXC functions.
They can be moved to g_autofree to avoid declaring a return value to get
proper code cleanups. This commit is changing functions from
lxc_{controller,cgroup,fuse} only.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Ján Tomko [Sat, 22 Feb 2020 12:56:19 +0000 (13:56 +0100)]
tests: libxl: do not run the emulator
Ever since commit c5a00350 the libxl parser invokes the emulator
to probe which device model to use.
Commit b90c4b5 introduced a workaround that used a stable path
which was very likely to result in the answer matching the default.
However the test is still affected by the host state and the binary
gets invoked if present.
Mock the libxlDomainGetEmulatorType function to stop wasting CPU
cycles every time a 'make check' is run on a system with xen installed.
For example xlconfigtest gets faster by 90 %
Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: b90c4b5f505698d600303c5b4f03f5d229b329dd Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Ján Tomko [Sat, 22 Feb 2020 11:44:45 +0000 (12:44 +0100)]
testutilsxen: error out on initialization failure
libxlDriverConfigNew can possibly fail on wrong
firmware values (unlikely) or on failure to create
the log directory (possible if you're debugging
tests with VIR_FILE_ACCESS)
Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: 4a4132b4625778cf80acb9c92d06351b44468ac3 Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Michal Privoznik [Thu, 20 Feb 2020 14:38:43 +0000 (15:38 +0100)]
security: Don't fail if locking a file on NFS mount fails
The way that our file locking works is that we open() the file we
want to lock and then use fcntl(fd, F_SETLKW, ...) to lock it.
The problem is, we are doing all of these as root which doesn't
work if the file lives on root squashed NFS, because if it does
then the open() fails. The way to resolve this is to make this a
non fatal error and leave callers deal with this (i.e. disable
remembering) - implemented in the previous commit.