The disk tuning group parameter is ignored by qemu if no other
throttling options are set. Reject such configuration, since the name
would not be honored after setting parameters via the live tuning API.
Peter Krempa [Fri, 17 Mar 2017 08:23:54 +0000 (09:23 +0100)]
qemu: command: Extract tests for subsets of blkdeviotune settings
When checking capabilities for qemu we need to check whether subsets of
the disk throttling settings are supported. Extract the checks into a
separate functions as they will be reused in next patch.
Peter Krempa [Fri, 17 Mar 2017 07:43:27 +0000 (08:43 +0100)]
qemu: Don't steal pointers from 'persistentDef' in qemuDomainGetBlockIoTune
While the code path that queries the monitor allocates a separate copy
of the 'group_name' string the path querying the config would not copy
it. The call to virTypedParameterAssign would then steal the pointer
(without clearing it) and the RPC layer freed it. Any subsequent call
resulted into a crash.
Nitesh Konkar [Thu, 16 Mar 2017 11:55:15 +0000 (17:25 +0530)]
perf: remote: Compare perf nparams against the correct constant
Currently 'virsh perf domain' errors out as the perf nparams is
incorrectly compared against REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX
instead of REMOTE_DOMAIN_PERF_EVENTS_MAX.
Andrea Bolognani [Thu, 16 Mar 2017 16:41:21 +0000 (17:41 +0100)]
tests: Test generic PCIe Root Ports
We want pcie-root-ports to be used when available in QEMU,
but at the same time we need to ensure that hosts running
older QEMU releases keep working and that the user can
override the default at any time.
Add a comment for the original pcie-root-port test cases
to make it clear how these new test cases are different.
Andrea Bolognani [Tue, 14 Mar 2017 13:42:51 +0000 (14:42 +0100)]
qemu: Use generic PCIe Root Ports by default when available
ioh3420 is emulated Intel hardware, so it always looked
quite out of place in aarch64/virt guests. Even for x86/q35
guests, the recently-introduced pcie-root-port is a better
choice because, unlike ioh3420, it doesn't require IO space
(a fairly constrained resource) to work.
If pcie-root-port is available in QEMU, use it; ioh3420 is
still used as fallback for when pcie-root-port is not
available.
Short circuit SASL auth when no mechanisms are available
If the SASL config does not have any mechanisms we currently
just report an empty list to the client which will then
fail to identify a usable mechanism. This is a server config
error, so we should fail immediately on the server side.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When providing explicit x509 cert/key paths in libvirtd.conf,
the user must provide all three. If one or more is missed,
this leads to obscure errors at runtime when negotiating
the TLS session
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Linux still defaults to a 1024 open file handle limit. This causes
scalability problems for libvirtd / virtlockd / virtlogd on large
hosts which might want > 1024 guest to be running. In fact if each
guest needs > 1 FD, we can't even get to 500 guests. This is not
good enough when we see machines with 100's of physical cores and
TBs of RAM.
In comparison to other memory requirements of libvirtd & related
daemons, the resource usage associated with open file handles
is essentially line noise. It is thus reasonable to increase the
limits unconditionally for all installs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Michal Privoznik [Sat, 11 Mar 2017 06:23:42 +0000 (07:23 +0100)]
qemu: Adaptive timeout for connecting to monitor
There were couple of reports on the list (e.g. [1]) that guests
with huge amounts of RAM are unable to start because libvirt
kills qemu in the initialization phase. The problem is that if
guest is configured to use hugepages kernel has to zero them all
out before handing over to qemu process. For instance, 402GiB
worth of 1GiB pages took around 105 seconds (~3.8GiB/s). Since we
do not want to make the timeout for connecting to monitor
configurable, we have to teach libvirt to count with this
fact. This commit implements "1s per each 1GiB of RAM" approach
as suggested here [2].
Michal Privoznik [Mon, 13 Mar 2017 10:05:08 +0000 (11:05 +0100)]
virTimeBackOffWait: Avoid long periods of sleep
While connecting to qemu monitor, the first thing we do is wait
for it to show up. However, we are doing it with some timeout to
avoid indefinite waits (e.g. when qemu doesn't create the monitor
socket at all). After beaa447a29 we are using exponential back
off timeout meaning, after the first connection attempt we wait
1ms, then 2ms, then 4 and so on. This allows us to bring down
wait time for small domains where qemu initializes quickly.
However, on the other end of this scale are some domains with
huge amounts of guest memory. Now imagine that we've gotten up to
wait time of 15 seconds. The next one is going to be 30 seconds,
and the one after that whole minute. Well, okay - with current
code we are not going to wait longer than 30 seconds in total,
but this is going to change in the next commit.
The exponential back off is usable only for first few iterations.
Then it needs to be caped (one second was chosen as the limit)
and switch to constant wait time.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Wed, 15 Mar 2017 19:09:35 +0000 (15:09 -0400)]
test: Don't assume a configFile exists for Storage Pool tests
Fix a "bug" in the storage pool test driver code which "assumed"
testStoragePoolObjSetDefaults should fill in the configFile for
both the Define/Create (persistent) and CreateXML (transient) pools
by just VIR_FREE()'ing it during CreateXML. Because the configFile
was filled in, during Destroy the pool wouldn't be free'd which
could cause issues for future patches which add tests to validate
vHBA creation for the storage pool using the same name.
John Ferlan [Wed, 15 Mar 2017 19:07:21 +0000 (15:07 -0400)]
conf: Alter error message for vHBA creation using parent wwnn/wwpn
Commit id 'bb74a7ffe' added a fairly non specific message when providing
only the <parent wwnn='xxx'/> or <parent wwpn='xxx'/> instead of providing
both wwnn and wwpn. This patch just modifies the message to be more specific
about which was missing.
John Ferlan [Fri, 27 Jan 2017 23:50:57 +0000 (18:50 -0500)]
conf: Return the vHBA name from virNodeDeviceCreateVport
Rather than returning true/false and having the caller check if the
vHBA was actually created, let's do that check within the CreateVport
function. That way the caller can faithfully assume success based
on a name start the thread looking for the LUNs. Prior to this change
it's possible that the vHBA wasn't really created (e.g if the call to
virVHBAGetHostByWWN returned NULL), we'd claim success, but in reality
there'd be no vHBA for the pool. This also fixes a second yet seen
issue that if the nodedev was present, but the parent by name wasn't
provided (perhaps parent by wwnn/wwpn or by fabric_name), then a failure
would be returned. For this path it shouldn't be an error - we should
just be happy that something else is managing the device and we don't
have to create/delete it.
The end result is that the createVport code can now just start the
refresh thread once it gets a non NULL name back.
John Ferlan [Mon, 20 Feb 2017 12:00:51 +0000 (07:00 -0500)]
util: Rename virFileWaitForDevices
The function is actually in virutil.c, but prototyped in virfile.h.
This patch fixes that by renaming the function to virWaitForDevices,
adding the prototype in virutil.h and libvirt_private.syms, and then
changing the callers to use the new name.
John Ferlan [Fri, 10 Mar 2017 14:29:57 +0000 (09:29 -0500)]
conf: Extract SCSI adapter type processing into their own helpers
Rather than have lots of ugly inline code, create helpers to try and
make things more readable. While creating the helpers realign the code
as necessary.
John Ferlan [Fri, 10 Mar 2017 14:05:09 +0000 (09:05 -0500)]
conf: Extract FCHost adapter type processing into their own helpers
Rather than have lots of ugly inline code, create helpers to try and
make things more readable. While creating the helpers realign the code
as necessary.
John Ferlan [Fri, 10 Mar 2017 15:04:20 +0000 (10:04 -0500)]
conf: Add missing validate for fchost search fields
Commit id 'bb74a7ffe' added some new fields to search for a fchost by
parent wwnn/wwpn or parent_fabric_name, but neglected to validate that
the data within the fields was valid at parse time. This could lead to
eventual failure at run time, so rather than have the failure then, let's
validate now.
Commit id 'bb74a7ffe' neglected to check that both the parent_wwnn
parent_wwpn are in the XML if one or the other is similar to how
the node device code checked (commit id '2b13361bc').
If only one is provided, the "default" is to use a vHBA capable
adapter (see commit id '78be2e8b'), so the vHBA could start, but
perhaps not on the expected adapter.
Switch to GSSAPI (kerberos) instead of the insecure DIGEST-MD5
RFC 6331 documents a number of serious security weaknesses in
the SASL DIGEST-MD5 mechanism. As such, libvirtd should not
by using it as a default mechanism. GSSAPI is the only other
viable SASL mechanism that can provide secure session encryption
so enable that by defalt as the replacement.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Michal Privoznik [Wed, 22 Feb 2017 15:33:12 +0000 (16:33 +0100)]
qemu: Allow nvdimm in devices CGroups
Some users might want to pass a blockdev or a chardev as a
backend for NVDIMM. In fact, this is expected to be the mostly
used configuration. Therefore libvirt should allow the device in
devices CGroup then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that we have APIs for relabel memdevs on hotplug, fill in the
missing implementation in qemu hotplug code.
The qemuSecurity wrappers might look like overkill for now,
because qemu namespace code does not deal with the nvdimms yet.
Nor does our cgroup code. But hey, there's cgroup_device_acl
variable in qemu.conf. If users add their /dev/pmem* device in
there, the device is allowed in cgroups and created in the
namespace so they can successfully passthrough it to the domain.
It doesn't look like overkill after all, does it?
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Mon, 27 Feb 2017 10:20:26 +0000 (11:20 +0100)]
qemu: Introduce label-size for NVDIMMs
For NVDIMM devices it is optionally possible to specify the size
of internal storage for namespaces. Namespaces are a feature that
allows users to partition the NVDIMM for different uses.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that NVDIMM has found its way into libvirt, users might want
to fine tune some settings for each module separately. One such
setting is 'share=on|off' for the memory-backend-file object.
This setting - just like its name suggest already - enables
sharing the nvdimm module with other applications. Under the hood
it controls whether qemu mmaps() the file as MAP_PRIVATE or
MAP_SHARED.
Yet again, we have such config knob in domain XML, but it's just
an attribute to numa <cell/>. This does not give fine enough
tuning on per-memdevice basis so we need to have the attribute
for each device too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Fri, 29 Jul 2016 09:02:25 +0000 (11:02 +0200)]
qemu: Implement NVDIMM
So, majority of the code is just ready as-is. Well, with one
slight change: differentiate between dimm and nvdimm in places
like device alias generation, generating the command line and so
on.
Speaking of the command line, we also need to append 'nvdimm=on'
to the '-machine' argument so that the nvdimm feature is
advertised in the ACPI tables properly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Thu, 28 Jul 2016 16:54:18 +0000 (18:54 +0200)]
Introduce NVDIMM memory model
NVDIMM is new type of memory introduced into QEMU 2.6. The idea
is that we have a Non-Volatile memory module that keeps the data
persistent across domain reboots.
At the domain XML level, we already have some representation of
'dimm' modules. Long story short, NVDIMM will utilize the
existing <memory/> element that lives under <devices/> by adding
a new attribute 'nvdimm' to the existing @model and introduce a
new <path/> element for <source/> while reusing other fields. The
resulting XML would appear as:
Michal Privoznik [Thu, 16 Feb 2017 14:17:47 +0000 (15:17 +0100)]
qemuBuildMemoryBackendStr: Reorder args and update comment
Frankly, this function is one big mess. A lot of arguments,
complicated behaviour. It's really surprising that arguments were
in random order (input and output arguments were mixed together),
the documentation was outdated, the description of return values
was bogus.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Introduce config file support for the bhyve driver. The only available
setting at present is 'firmware_dir' for specifying a directory with
UEFI firmware files.
Jiri Denemark [Wed, 8 Mar 2017 12:32:46 +0000 (13:32 +0100)]
qemu: Report better host-model CPUs in domain caps
One of the main reasons for introducing host-model CPU definition in a
domain capabilities XML was the inability to express disabled features
in a host capabilities XML. That is, when a host CPU is, e.g., Haswell
without x2apic support, host capabilities XML will have to report it as
Westmere + a bunch of additional features., but we really want to use
Haswell - x2apic when creating a host-model CPU.
Unfortunately, I somehow forgot to do the last step and the code would
just copy the CPU definition found in the host capabilities XML. This
changed recently for new QEMU versions which allow us to query host CPU,
but any slightly older QEMU will not benefit from any change I did. This
patch makes sure the right CPU model is filled in the domain
capabilities even with old QEMU.
The issue was reported in
https://bugzilla.redhat.com/show_bug.cgi?id=1426456
Jiri Denemark [Tue, 7 Mar 2017 18:33:37 +0000 (19:33 +0100)]
qemu: Refactor virQEMUCapsInitCPU
The function is now called virQEMUCapsProbeHostCPU. Both the refactoring
and the change of the name is done for consistency with a new function
which will be introduced in the following commit.
Jiri Denemark [Tue, 7 Mar 2017 11:20:01 +0000 (12:20 +0100)]
cpu: Add list of allowed CPU models to virCPUGetHost
When creating host CPU definition usable with a given emulator, the CPU
should not be defined using an unsupported CPU model. The new @models
and @nmodels parameters can be used to limit CPU models which can be
used in the result.
Jiri Denemark [Mon, 6 Mar 2017 20:35:49 +0000 (21:35 +0100)]
cpu: Replace cpuNodeData with virCPUGetHost
cpuNodeData has always been followed by cpuDecode as no hypervisor
driver is really interested in raw CPUID data for a host CPU. Let's
create a new CPU driver API which returns virCPUDefPtr directly.
Yeah, that's right. A mount point doesn't have to be a directory.
It can be a file too. However, the code that tries to preserve
mount points under /dev for new namespace for qemu does not count
with that option.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
VIR_CONNECT_LIST_STORAGE_POOLS_VSTORAGE and
VIR_CONNECT_LIST_STORAGE_POOLS_ZFS were added to libvirt but the listing
API was not properly updated to use them.
bhyve supports 'gop' video device that allows clients to connect
to VMs using VNC clients. This commit adds support for that to
the bhyve driver:
- Introducr 'gop' video device type
- Add capabilities probing for the 'fbuf' device that's
responsible for graphics
- Update command builder routines to let users configure
domain's VNC via gop graphics.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Extend domain capabilities XML with the information about
available UEFI firmware files. It searches in the location
that the sysutils/bhyve-firmware FreeBSD port installs
files to.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Fabian Freyer [Tue, 24 May 2016 13:30:56 +0000 (15:30 +0200)]
bhyve: virBhyveProbeCaps: BHYVE_CAP_LPC_BOOTROM
Implement the BHACE_CAP_LPC_BOOTROM capability by checking the stderr
output of 'bhyve -l bootrom'. If the bootrom option is unsupported, this
will contain the following output:
bhyve: invalid lpc device configuration 'bootrom'
On newer bhyve versions that do support specifying a bootrom image, the
standard help will be printed.
Add bhyve support to virt-host-validate(1). It checks for the
essential kernel modules to be available so that user can actually
start VMs, have networking and console access.
It uses the kldnext(2)/kldstat(2) routines to retrieve modules list.
As bhyve is only available on FreeBSD and these routines were available
long before bhyve appeared, not adding any specific configure checks
for that.
Also, update tools/Makefile.am to add
virt-host-validate-$driver.[hc] to the build only if the
appropriate driver is enabled.
John Ferlan [Sun, 29 Jan 2017 16:12:48 +0000 (11:12 -0500)]
tests: Add createVHBAByStoragePool-by-parent to fchosttest
Add a new test to fchosttest in order to test creation of our vHBA
via the Storage Pool logic. Unlike the real code, we cannot yet use
the virVHBA* API's because they (currently) traverse the file system
in order to get the parent vport capable scsi_host. Besides there's
no "real" NPIV device here - so we have to take some liberties, at
least for now.
Instead, we'll follow the node device tests partially in order to
create and destroy the vHBA with the test node devices.
If a qemu process has died, we get EOF on its monitor. At this
point, since qemu process was the only one running in the
namespace kernel has already cleaned the namespace up. Any
attempt of ours to enter it has to fail.
This really happened in the bug linked above. We've tried to
attach a disk to qemu and while we were in the monitor talking to
qemu it just died. Therefore our code tried to do some roll back
(e.g. deny the device in cgroups again, restore labels, etc.).
However, during the roll back (esp. when restoring labels) we
still thought that domain has a namespace. So we used secdriver's
transactions. This failed as there is no namespace to enter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Fri, 10 Mar 2017 08:55:43 +0000 (09:55 +0100)]
qemuxml2argvtest: Don't overwrite driver stateDir
This is a very historic artefact. Back in the old days of 830ba76c3e when we had macros to add arguments onto qemu command
line (!) we thought it was a good idea to let qemu write out the
PID file. So we passed -pidfile $stateDir/$domName onto the
command line. Thus, in order for tests to work we needed stable
stateDir in the qemu driver. Unfortunately, after 16efa11aa696
where stateDir is mkdtemp()-d, this approach lead to a leak of
temp dir.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Fri, 3 Mar 2017 15:04:57 +0000 (16:04 +0100)]
qemu: hotplug: Reset device removal waiting code after vCPU unplug
If the delivery of the DEVICE_DELETED event for the vCPU being deleted
would time out, the code would not call 'qemuDomainResetDeviceRemoval'.
Since the waiting thread did not unregister itself prior to stopping the
waiting the monitor code would try to wake it up instead of dispatching
it to the event worker. As a result the unplug process would not be
completed and the definition would not be updated.
So our code is one big mess and we modify domain definition while
building qemu_command line and our hotplug code share only part
of the parsing and command line building code. Let's revert
that change because to fix it properly would require refactor and
move a lot of things.
configure: disable scsi stroage driver on non-Linux
Even though scsi storage driver builds fine on non-Linux, it
will not work properly because it relies on Linux procfs, so
disable that in configure if we're not building for Linux.
Pavel Hrdina [Mon, 27 Feb 2017 16:16:17 +0000 (17:16 +0100)]
conf: properly skip graphics listen element in migratable XML
We should skip <listen type='socket'/> only if the 'socket' path
is specified because if there is no 'socket' path we need to
keep that element in migratable XML.
Pavel Hrdina [Mon, 27 Feb 2017 16:00:15 +0000 (17:00 +0100)]
conf: store "autoGenerated" for graphics listen in status XML
When libvirtd is started we call qemuDomainRecheckInternalPaths
to detect whether a domain has VNC socket path generated by libvirt
based on option from qemu.conf. However if we are parsing status XML
for running domain the existing socket path can be generated also if
the config XML uses the new <listen type='socket'/> element without
specifying any socket.
The current code doesn't make difference how the socket was generated
and always marks it as "fromConfig". We need to store the
"autoGenerated" value in the status XML in order to preserve that
information.
The difference between "fromConfig" and "autoGenerated" is important
for migration, because if the socket is based on "fromConfig" we don't
print it into the migratable XML and we assume that user has properly
configured qemu.conf on both hosts. However if the socket is based
on "autoGenerated" it means that a new feature was used and therefore
we need to leave the socket in migratable XML to make sure that if
this feature is not supported on destination the migration will fail.
John Ferlan [Fri, 17 Feb 2017 15:06:14 +0000 (10:06 -0500)]
qemu: Introduce qemuDomainGetTLSObjects
Split apart and rename qemuDomainGetChardevTLSObjects in order to make a
more generic API that can create the TLS JSON prop objects (secret and
tls-creds-x509) to be used to create the objects
John Ferlan [Wed, 22 Feb 2017 17:54:10 +0000 (12:54 -0500)]
qemu: Refactor qemuDomainGetChardevTLSObjects to converge code
Create a qemuDomainAddChardevTLSObjects which will encapsulate the
qemuDomainGetChardevTLSObjects and qemuDomainAddTLSObjects so that
the callers don't need to worry about the props.
Move the dev->type and haveTLS checks in to the Add function to avoid
an unnecessary call to qemuDomainAddTLSObjects
John Ferlan [Thu, 16 Feb 2017 15:33:35 +0000 (10:33 -0500)]
qemu: Refactor hotplug to introduce qemuDomain{Add|Del}TLSObjects
Refactor the TLS object adding code to make two separate API's that will
handle the add/remove of the "secret" and "tls-creds-x509" objects including
the Enter/Exit monitor commands.
John Ferlan [Wed, 22 Feb 2017 17:39:17 +0000 (12:39 -0500)]
qemu: Move exit monitor calls in failure paths
Since qemuDomainObjExitMonitor can also generate error messages,
let's move it inside any error message saving code on error paths
for various hotplug add activities.