Michal Privoznik [Thu, 25 Sep 2014 12:39:19 +0000 (14:39 +0200)]
qemuPrepareNVRAM: Save domain after NVRAM path generation
On a domain startup, the variable store path is generated if needed.
The path is intended to be generated only once. However, the updated
domain definition is saved only into the state XML, not into the
config dir. So later, whenever the domain is destroyed and the daemon
is restarted, the generated path is forgotten and the file may be left
behind on a virDomainUndefine() call.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There's no one to free() it anyway. Instead, we can just pass the
provided array pointer directly.
==20039== 48 bytes in 4 blocks are definitely lost in loss record 658 of 787
==20039== at 0x4C2A700: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20039== by 0x4EA661F: virAllocN (viralloc.c:191)
==20039== by 0x50386EF: remoteNodeGetFreePages (remote_driver.c:7625)
==20039== by 0x5003504: virNodeGetFreePages (libvirt.c:21379)
==20039== by 0x154625: cmdFreepages (virsh-host.c:374)
==20039== by 0x12F718: vshCommandRun (virsh.c:1935)
==20039== by 0x1339FB: main (virsh.c:3747)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Thu, 25 Sep 2014 15:30:28 +0000 (17:30 +0200)]
qemu: Always re-detect backing chain
Since 363e9a68 we track backing chain metadata when creating snapshots
the right way even for the inactive configuration. As we did not yet
update other code paths that modify the backing chain (blockpull) the
newDef backing chain gets out of sync.
After a VM is stopped, the new definition is copied over for the next
start. The new VM then has incorrect backing chain info. This patch
switches the backing chain detector to always purge the existing backing
chain and force re-detection, avoiding this issue until we have full
backing chain tracking support.
Michal Privoznik [Thu, 25 Sep 2014 15:12:46 +0000 (17:12 +0200)]
virNodeAllocPages: Disallow RO connection
Due to a missing check the API can be successfully called even if
the connection is ReadOnly. Fortunately, the API hasn't been
released yet, so there's no need for a CVE.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
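The kind of check that was missing can be pictured roughly as follows. This is a minimal standalone sketch, not libvirt's actual code: `struct conn`, the `CONN_RO` flag and `node_alloc_pages()` are hypothetical stand-ins for the connection object and the public API entry point.

```c
#include <stdio.h>

/* Hypothetical stand-ins; not libvirt's real types or flag values. */
#define CONN_RO (1u << 0)

struct conn {
    unsigned int flags;
};

/* Returns the number of pages "allocated", or -1 on error. */
int node_alloc_pages(const struct conn *conn, unsigned int npages)
{
    if (!conn)
        return -1;

    /* The check that was missing: refuse to modify host state
     * when the connection was opened read-only. */
    if (conn->flags & CONN_RO) {
        fprintf(stderr, "read only access prevents page allocation\n");
        return -1;
    }

    /* The real work would happen here; the sketch just echoes the count. */
    return (int)npages;
}
```

With a connection whose flags contain `CONN_RO` the call fails before any host state is touched; a read-write connection proceeds as before.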
Add files parallels_sdk.c and parallels_sdk.h for code
which works with the SDK, so that libvirt's own code is not mixed
with code dealing with the Parallels SDK.
To use the Parallels SDK you must first call the PrlApi_InitEx function;
then you are able to connect to a server with the
PrlSrv_LoginLocalEx function. When you're done you must call
PrlApi_Deinit. So let's call PrlApi_InitEx on the first .connectOpen,
count the number of connections and deinitialize when this counter
drops to zero.
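The connection counting described above could look roughly like this. It is only a sketch under assumptions: `sdk_init()` and `sdk_deinit()` are hypothetical stand-ins for PrlApi_InitEx and PrlApi_Deinit (whose real signatures are not reproduced here), and `connection_open()` / `connection_close()` are made-up names for the driver callbacks.

```c
/* Hypothetical stand-ins for PrlApi_InitEx() and PrlApi_Deinit();
 * the real SDK signatures are deliberately not reproduced here.
 * The call counters exist only so the behaviour can be observed. */
int sdk_init_calls;
int sdk_deinit_calls;

int sdk_init(void)    { sdk_init_calls++;   return 0; }
void sdk_deinit(void) { sdk_deinit_calls++; }

/* Number of currently open connections. A real driver would guard
 * this counter with a mutex; the sketch is single-threaded. */
unsigned int open_connections;

/* Called from .connectOpen: initialize the SDK on the first connection. */
int connection_open(void)
{
    if (open_connections == 0 && sdk_init() < 0)
        return -1;
    open_connections++;
    return 0;
}

/* Called from .connectClose: deinitialize once the last one is gone. */
void connection_close(void)
{
    if (open_connections == 0)
        return;                 /* unbalanced close, nothing to do */
    if (--open_connections == 0)
        sdk_deinit();
}
```

Opening two connections initializes the SDK once; it is deinitialized only when the second one is closed.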
Executing the prlctl command is not an optimal way to interact with
Parallels Cloud Server (PCS); it's better to use the Parallels SDK,
which is a remote API to the Parallels dispatcher service.
We prepared an open source version of this SDK and published it on
GitHub; it's distributed under the LGPL license. Here is the git repo:
https://github.com/Parallels/parallels-sdk.
To build with the Parallels SDK, the user should get compiler and linker
options from the 'parallels-sdk' pkg-config file. So fix the checks in
the configure script to build with the Parallels SDK if that pkg-config
file exists, and add the gcc options to the makefile.
Michal Privoznik [Thu, 25 Sep 2014 09:50:04 +0000 (11:50 +0200)]
virnetserver: Raise log level of max_clients related messages
We have these configuration knobs, like max_clients and
max_anonymous_clients. They limit the number of clients
connected. Whenever the limit is reached, the daemon stops
accepting new ones and resumes once one of the connected clients
disconnects. When it stops, a debug message is printed into
the logs, and another one when the daemon starts accepting new
clients again. However, the problem is that the messages have debug
priority. This may be unfortunate, because if the daemon stops
accepting new clients all of a sudden and users don't have debug
logs enabled, they have no idea what's going on. Raise the level of
these messages to at least INFO.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
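A rough model of the stop/resume behaviour and where the two messages are emitted. Everything here is hypothetical (the `struct server` layout, the `LVL_*` levels and the function names); it only illustrates logging both transitions at INFO rather than DEBUG.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical log levels and server state; not libvirt's real types. */
enum log_level { LVL_DEBUG, LVL_INFO };

struct server {
    unsigned int nclients;
    unsigned int max_clients;
    bool accepting;
    enum log_level last_level;   /* level of the most recent message */
};

void server_log(struct server *srv, enum log_level level, const char *msg)
{
    srv->last_level = level;
    fprintf(stderr, "%s: %s\n", level == LVL_INFO ? "info" : "debug", msg);
}

/* Returns 0 if the client was accepted, -1 if the limit is reached. */
int server_add_client(struct server *srv)
{
    if (!srv->accepting)
        return -1;
    srv->nclients++;
    if (srv->nclients >= srv->max_clients) {
        srv->accepting = false;
        server_log(srv, LVL_INFO,
                   "client limit reached, not accepting new clients");
    }
    return 0;
}

void server_remove_client(struct server *srv)
{
    if (srv->nclients == 0)
        return;
    srv->nclients--;
    if (!srv->accepting && srv->nclients < srv->max_clients) {
        srv->accepting = true;
        server_log(srv, LVL_INFO, "accepting new clients again");
    }
}
```

With INFO-level messages an administrator running with default logging can see both when the daemon stopped accepting clients and when it resumed.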
Pavel Hrdina [Thu, 25 Sep 2014 09:13:29 +0000 (11:13 +0200)]
polkit_driver: fix possible segfault
The changes in commit c7542573 introduced a possible segfault. Looking
deeper into the code, and at the original code before the patch series
was applied, I think that we should report an error for each function
failure and also that we shouldn't call some of the functions twice.
Pavel Hrdina [Thu, 25 Sep 2014 09:28:25 +0000 (11:28 +0200)]
blkdeviotune: fix bug with saving values into live XML
When you updated some blkdeviotune values for a running domain, the values
were stored only internally and not saved into the live XML, so they
did not survive a restart of libvirtd.
Pavel Hrdina [Thu, 25 Sep 2014 08:57:24 +0000 (10:57 +0200)]
Fix build without polkit
Commit 1b854c76 introduced a new function 'virPolkitCheckAuth'. In
the #else section, used when you don't have polkit, all parameters should be
followed by ATTRIBUTE_UNUSED.
Pavel Hrdina [Wed, 24 Sep 2014 07:43:31 +0000 (09:43 +0200)]
tunable_event: extend debug message and tweak limit for remote message
It would be nice to also print the params pointer and the number of params
in the debug message. Also, the previous limit for the number of params in
the rpc message was too large; 2048 params will be enough for future events.
Michal Privoznik [Tue, 16 Sep 2014 16:17:22 +0000 (18:17 +0200)]
Introduce virNodeAllocPages
A long time ago in a galaxy far, far away it was decided
that libvirt would manage not only domains but the host as well. And
with my latest work on the qemu driver supporting huge pages, we miss
the cherry on top: an API to allocate huge pages at runtime.
Currently users are forced to log into the host and adjust the
huge pages pool themselves. However, with this API the problem
is gone - they can both size up and size down the pool.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The check for iSCSI devices was missing a check of the subsys type, which
meant we could skip labelling of other host devices as well. This fixes
USB hotplug on F21.
Convert polkit code to use DBus API instead of CLI helper
Spawning the pkcheck program every time a permission check is
required is hugely expensive on CPU. The pkcheck program is just
a dumb wrapper for the DBus API, so rewrite the code to use the
DBus API directly. This also simplifies error handling a bit.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are now two places in libvirt which use polkit. Currently
they use pkcheck, which is set to be replaced by direct DBus API
calls. Add a common API which they will both be able to use for
this purpose.
No tests are added at this time, since the impl will be gutted
in favour of a DBus API call shortly.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ján Tomko [Thu, 11 Sep 2014 10:56:31 +0000 (12:56 +0200)]
conf: add options for disabling segment offloading
Add options for tuning segment offloading:
<driver>
<host csum='off' gso='off' tso4='off' tso6='off'
ecn='off' ufo='off'/>
<guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
</driver>
which control the respective host_ and guest_ properties
of the virtio-net device.
Peter Krempa [Thu, 11 Sep 2014 16:59:32 +0000 (18:59 +0200)]
qemu: Report better errors from broken backing chains
Request erroring out from the backing chain traveller and drop qemu's
internal backing chain integrity tester.
The backing chain traveller reports errors by itself with possibly more
detail than qemuDiskChainCheckBroken ever could.
We also need to make sure that we reconnect to existing qemu instances
even at the cost of losing the backing chain info (this really should be
stored in the XML rather than reloaded from disk, but that needs some
work).
Peter Krempa [Thu, 11 Sep 2014 16:28:47 +0000 (18:28 +0200)]
util: storage: Allow metadata crawler to report useful errors
Add a new parameter to virStorageFileGetMetadata that makes the
backing chain detection process stop and report a useful error message,
rather than having to use virStorageFileChainGetBroken.
This patch just introduces the option, usage will be provided
separately.
Jim Fehlig [Mon, 8 Sep 2014 16:22:14 +0000 (10:22 -0600)]
libvirt-guests: run after time-sync.target
When libvirt-guests is configured to start guests on host
boot, it is possible for guests to start and read the host
clock before it is synchronized. Services such as
libvirt-guests that require correct time should use the
Special Passive System Unit time-sync.target.
Pavel Hrdina [Tue, 9 Sep 2014 14:34:12 +0000 (16:34 +0200)]
cputune_event: queue the event for cputune updates
Now that we have a universal tunable event, we can use it for reporting
changes to the user. The cputune values will be prefixed with "cputune" to
distinguish them from other tunable events.
Pavel Hrdina [Wed, 10 Sep 2014 11:28:24 +0000 (13:28 +0200)]
event: introduce new event for tunable values
This new event uses typedParameters to expose what has actually been
updated. The reason is that in the future we can extend any tunable
values or add new tunable values. With typedParameters we don't have to
worry about creating other events; we will just use this universal
event to inform the user about updates.
Michal Privoznik [Tue, 23 Sep 2014 11:08:39 +0000 (13:08 +0200)]
qemuBuildNumaArgStr: Discard def->cpu check
In the function at one place we check if def->cpu is NULL prior
to accessing def->cpu->ncells. Then, later in the code,
def->cpu->ncells is accessed directly, without the check. This
makes coverity unhappy, because the first check makes it think
def->cpu can be NULL. However, the function is not called if
def->cpu is NULL. Therefore, remove the first check and hopefully
make coverity cheer again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
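The pattern in question, reduced to a standalone sketch. The struct layouts and function names are hypothetical simplifications, not the real qemuBuildNumaArgStr; the point is only that the NULL guarantee lives in the caller, so the callee dereferences def->cpu unconditionally and consistently.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified versions of the definitions. */
struct cpu_def { size_t ncells; };
struct domain_def { struct cpu_def *cpu; };

/* Precondition: def->cpu is non-NULL. There is deliberately no NULL
 * check in here: checking def->cpu in one place and dereferencing it
 * unchecked in another is what made the analyzer assume it could
 * be NULL. */
size_t count_numa_args(const struct domain_def *def)
{
    size_t i;
    size_t nargs = 0;

    for (i = 0; i < def->cpu->ncells; i++)
        nargs += 2;   /* hypothetical: two arguments per NUMA cell */

    return nargs;
}

/* The caller is the single place that guarantees the precondition. */
size_t build_command_line(const struct domain_def *def)
{
    if (def->cpu && def->cpu->ncells)
        return count_numa_args(def);
    return 0;
}
```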
Michael R. Hines [Mon, 13 Jan 2014 06:28:12 +0000 (14:28 +0800)]
qemu: Memory pre-pinning support for RDMA migration
RDMA Live migration requires registering memory with the hardware, and
thus QEMU offers a new 'capability' to pre-register / mlock() the guest
memory in advance for higher RDMA performance before the migration
begins. This capability is disabled by default, which means QEMU will
register the memory with the hardware on an on-demand basis.
This patch exposes this capability with the following example usage:
Since libvirt runs QEMU in a pretty restricted environment, several
files need to be added to cgroup_device_acl (in qemu.conf) for QEMU to
be able to access the host's infiniband hardware. Full documentation of
the feature can be found on the QEMU wiki:
http://wiki.qemu.org/Features/RDMALiveMigration
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemu: Prepare support for arbitrary migration protocol
Currently we only support TCP protocol for native QEMU migration but
this is going to be changed. Let's make the code more general and remove
hardcoded TCP protocol from several places.
Michael R. Hines [Mon, 13 Jan 2014 06:28:10 +0000 (14:28 +0800)]
qemu: Expose additional migration statistics
RDMA migration uses the 'setup' state in QEMU to optionally lock
all memory before the migration starts. The total time spent in
this state is exposed as VIR_DOMAIN_JOB_SETUP_TIME.
Additionally, QEMU also exports migration throughput (mbps) for both
memory and disk, so let's add them too: VIR_DOMAIN_JOB_MEMORY_BPS,
VIR_DOMAIN_JOB_DISK_BPS.
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Peter Krempa [Wed, 17 Sep 2014 12:50:04 +0000 (14:50 +0200)]
qemu: save image: Add possibility to return XML stored in the image
Add a new parameter that allows returning the XML stored in the save
image for further manipulation, and adjust the callers. This option will
be used in later patches.
Jim Fehlig [Thu, 18 Sep 2014 21:05:34 +0000 (15:05 -0600)]
libxl: Drop driver lock in libxlDomainDefineXML
There is no need to acquire the driver-wide lock in
libxlDomainDefineXML. When switching to jobs in the libxl
driver, most driver-wide locks were removed. The locking here
was preserved since I mistakenly thought virDomainObjListAdd
needed protection. This is not the case, so remove the
unnecessary locking.
John Ferlan [Fri, 19 Sep 2014 09:53:04 +0000 (05:53 -0400)]
qemu: Add missing goto on rawio
Commit id '9a2f36ec' added a build conditional of CAP_SYS_RAWIO
in order to determine whether or not a disk definition using rawio
should be allowed on platforms without CAP_SYS_RAWIO. If one was
found, virReportError was used but the code didn't goto cleanup.
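A small illustration of the report-then-jump pattern the fix restores. The `HAVE_RAWIO` switch, `struct disk` and `prepare_disks()` are hypothetical; the sketch only shows why reporting an error without jumping to cleanup lets the function fall through and return success.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build-time switch standing in for
 * "this platform provides CAP_SYS_RAWIO". */
#ifndef HAVE_RAWIO
# define HAVE_RAWIO 0
#endif

struct disk { bool rawio; };

/* Returns 0 on success, -1 on error. */
int prepare_disks(const struct disk *disks, size_t ndisks)
{
    int ret = -1;
    char *scratch = malloc(64);   /* some resource released in cleanup */
    size_t i;

    if (!scratch)
        return -1;

    for (i = 0; i < ndisks; i++) {
        if (disks[i].rawio && !HAVE_RAWIO) {
            fprintf(stderr, "rawio is not supported on this platform\n");
            /* The missing piece: without this jump the error is
             * reported, but the loop carries on and ret becomes 0. */
            goto cleanup;
        }
    }

    ret = 0;

 cleanup:
    free(scratch);
    return ret;
}
```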
Pavel Hrdina [Thu, 18 Sep 2014 15:38:32 +0000 (17:38 +0200)]
Move the FIPS detection from capabilities
We are not detecting the presence of FIPS from QEMU, but from procfs, and
that means it's not a QEMU capability. It was decided that we will pass
this flag to QEMU even if it's not supported by old QEMU binaries.
This patch also reverts the changes done by commit a21cfb0f to
qemucapabilitiestest and implements a new test case in qemuxml2argvtest.
A long time ago I implemented support for so-called multiqueue
net. The idea was to let guest network traffic be processed by
multiple host CPUs, thus increasing performance. However, this
behavior is enabled by QEMU via a special ioctl() iterated over
all the tap FDs passed in by libvirt. Unfortunately, SELinux comes in
and disallows the ioctl() call because the /dev/net/tun has label
system_u:object_r:tun_tap_device_t:s0 and 'attach_queue' ioctl()
is not allowed on tun_tap_device_t type. So after discussion with
a SELinux developer we've decided that the FDs passed to the QEMU
should be labelled with svirt_t type and SELinux policy will
allow the ioctl(). Therefore I've made a patch
(cf976d9dcf4e592261b14f03572) that does exactly this. The patch
was then fixed by a4431931393aeb1ac5893f121151fa3df4fde612 and
b635b7a1af0e64754016d758376f382470bc11e7. However, things are not
that easy - even though the API we call (fsetfilecon_raw) is meant
to label an FD, the underlying file is labelled too! So
effectively we are mangling the /dev/net/tun label. Yes, that broke
dozens of other applications, from openvpn and boxes to qemu
running other domains.
The best solution would be for SELinux to provide a way to label an
FD only, which could then be labelled when passed to qemu.
However, that's a long path to go and we should fix this
regression AQAP. So I went to talk to the SELinux developer again
and we agreed on a temporary solution:
1) All the three patches are reverted
2) SELinux temporarily allows 'attach_queue' on the
tun_tap_device_t
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
- docs/formatstorage.html.in: document the 'zfs' pool type, add it
to the list of pool types that can use source physical devices
- docs/storage.html.in: update the ZFS pool example XML with
source physical devices, mention that starting from 1.2.9 a
pool can be created from these devices by libvirt and that in earlier
versions the user still has to create the pool manually
- docs/drvbhyve.html.in: add an example with ZFS pools
- Provide an implementation for buildPool and deletePool operations
for the ZFS storage backend.
- Add the VIR_STORAGE_POOL_SOURCE_DEVICE flag to the ZFS pool poolOptions,
as we can now specify devices to build the pool from
- storagepool.rng: add an optional 'sourceinfodev' to 'sourcezfs' and
add an optional 'target' to 'poolzfs' entity
- Add a couple of tests to storagepoolxml2xmltest
John Ferlan [Wed, 17 Sep 2014 18:43:12 +0000 (14:43 -0400)]
qemu: Don't fail startup/attach for IOThreads if no JSON
If the qemu being used doesn't support JSON, then querying for IOThread
data would fail. In that case, ensure the *iothreads is NULL and return 0
as the count of iothreads available.
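The intended contract, sketched with hypothetical types: `struct monitor`, `struct iothread_info` and `monitor_get_iothreads()` are illustrative names, and the "query" is faked. What matters is that a non-JSON monitor yields an empty result instead of an error.

```c
#include <stdbool.h>
#include <stdlib.h>

struct iothread_info { unsigned int id; };

/* Hypothetical monitor handle; "json" says whether the QEMU binary
 * speaks the JSON monitor protocol. */
struct monitor { bool json; };

/* Returns the number of IOThreads (possibly 0) or -1 on error.
 * On success with a non-zero count, *iothreads is a heap-allocated
 * array that the caller must free(). */
int monitor_get_iothreads(const struct monitor *mon,
                          struct iothread_info **iothreads)
{
    *iothreads = NULL;

    /* Without JSON there is nothing to query: report "no IOThreads"
     * rather than failing domain startup or attach. */
    if (!mon->json)
        return 0;

    /* Stand-in for the real query: pretend QEMU reported one thread. */
    *iothreads = calloc(1, sizeof(**iothreads));
    if (!*iothreads)
        return -1;
    (*iothreads)[0].id = 1;
    return 1;
}
```

Callers can then treat a return value of 0 with a NULL list as "no IOThreads available" and continue.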
CC qemu/libvirt_driver_qemu_impl_la-qemu_command.lo
qemu/qemu_command.c:6580:58: error: implicit conversion from enumeration type
'virMemAccess' to different enumeration type 'virTristateSwitch'
[-Werror,-Wenum-conversion]
virTristateSwitch memAccess = def->cpu->cells[i].memAccess;
~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
1 error generated.
Fix that by using virMemAccess instead of virTristateSwitch.
Commit f05b6a91 added a virQEMUDriverConfigPtr argument to the
virQEMUCapsFillDomainCaps function, and it uses a forward declaration
of virQEMUDriverConfig and virQEMUDriverConfigPtr that causes the clang
build to fail:
gmake[3]: Entering directory `/usr/home/novel/code/libvirt/src'
CC qemu/libvirt_driver_qemu_impl_la-qemu_capabilities.lo
In file included from qemu/qemu_capabilities.c:43:
In file included from qemu/qemu_hostdev.h:27:
qemu/qemu_conf.h:63:37: error: redefinition of typedef 'virQEMUDriverConfig'
is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef struct _virQEMUDriverConfig virQEMUDriverConfig;
^
qemu/qemu_capabilities.h:328:37: note: previous definition is here
typedef struct _virQEMUDriverConfig virQEMUDriverConfig;
^
Fix that by passing the loader and nloader config attributes directly
instead of passing the complete config.
Commit b20d39a introduced a new argument for the
virNetDevTapCreateInBridgePort function, however, its mock
in bhyve tests wasn't updated, so the build failed.
Fix build by adding this new argument to the mock version.
Peter Krempa [Thu, 11 Sep 2014 14:35:53 +0000 (16:35 +0200)]
CVE-2014-3633: qemu: blkiotune: Use correct definition when looking up disk
The live definition was used to look up the disk index while the
persistent one was indexed, leading to a crash in
qemuDomainGetBlockIoTune. Use the correct def and report a nice error.
Unfortunately it's accessible via a read-only connection, though it can
only crash libvirtd in the cases where the guest is hot-plugging disks
without reflecting those changes to the persistent definition. So
avoiding hotplug, or doing hotplug where the persistent definition is
always modified alongside the live definition, will avoid the
out-of-bounds access.
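The class of bug, reduced to a standalone sketch with hypothetical structures (`struct domain_def`, `disk_index()` and `get_iops()` are not libvirt's real names). The index must come from the same definition that is subsequently indexed; the bug was the equivalent of looking the disk up in the live definition and then reading `persistent->disks[idx]`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified definitions. */
struct disk_def { const char *dst; unsigned int iops; };
struct domain_def { struct disk_def *disks; size_t ndisks; };

/* Returns the index of the disk named "dst" in "def", or -1. */
int disk_index(const struct domain_def *def, const char *dst)
{
    size_t i;

    for (i = 0; i < def->ndisks; i++) {
        if (strcmp(def->disks[i].dst, dst) == 0)
            return (int)i;
    }
    return -1;
}

/* Look the disk up in "def" and index "def": one definition for
 * both steps, so the index can never run past the array. */
int get_iops(const struct domain_def *def, const char *dst,
             unsigned int *iops)
{
    int idx = disk_index(def, dst);

    if (idx < 0)
        return -1;   /* "disk not found": a clean error, no crash */

    *iops = def->disks[idx].iops;
    return 0;
}
```

A disk that was hot-plugged only into the live definition is then simply reported as missing from the persistent one.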
There are two ways to tell qemu to use huge pages. The first one
is suitable for domains with NUMA nodes: the path to hugetlbfs mount
is appended to NUMA node definition on the command line. The second
one is suitable for UMA domains: here there's this global '-mem-path'
argument that accepts path to the hugetlbfs mount point. However, the
latter case was not used for all the cases that it should be. For
instance:
Michal Privoznik [Mon, 15 Sep 2014 09:59:09 +0000 (11:59 +0200)]
conf: Disallow nonexistent NUMA nodes for hugepages
As of 136ad4974 it is possible to specify different huge pages per
guest NUMA node. However, there's no check if nodeset specified in
./hugepages/page contains only those guest NUMA nodes that exist.
In other words, with the current code it is possible to define a meaningless
combination: