Andrea Bolognani [Mon, 29 May 2017 15:18:35 +0000 (17:18 +0200)]
qemu: Use PHBs when extending the guest PCI topology
When looking for slots suitable for a PCI device, libvirt
might need to add an extra PCI controller: for pSeries guests,
we want that extra controller to be a PHB (pci-root) rather
than a PCI bridge.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Fri, 26 May 2017 17:34:21 +0000 (19:34 +0200)]
qemu: Use PHBs to fill holes in PCI bus numbering
PCI bus has to be numbered sequentially, and no index can be
missing, so libvirt will fill in the blanks automatically for
the user.
Up until now, it has done so using either pci-bridge, for machine
types based on legacy PCI, or pcie-root-port, for machine types
based on PCI Express. Neither choice is good for pSeries guests,
where PHBs (pci-root) should be used instead.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Fri, 26 May 2017 16:33:36 +0000 (18:33 +0200)]
tests: Add baseline tests for automatic PHB usage
These tests demonstrate that, while it's now possible for the
user to create PHB explicitly and manually assign devices to
them, libvirt still defaults to extending the guest PCI
topology using PCI bridges and making suboptimal device
placement choices.
The next few commits will improve on these behaviors and the
tests outputs will automatically be updated to reflect this.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Tue, 28 Feb 2017 09:50:01 +0000 (10:50 +0100)]
qemu: Format additional PHBs on the command line
Additional PHBs (pci-root controllers) will be created for
the guest using the spapr-pci-host-bridge QEMU device, if
available; the implicit default PHB, while present in the
guest configuration, will be skipped.
Usually, a controller with alias 'x' will create a bus with the
same name; however, the bus created by a PHBs with alias 'x' will
be named 'x.0' instead, so we need to account for that.
As an exception to the exception, the implicit PHB that's added
automatically to every pSeries guest creates the 'pci.0' bus.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Tue, 28 Feb 2017 13:58:33 +0000 (14:58 +0100)]
qemu: Automatically pick target index and model for pci-root controllers
pSeries guests will soon need the new information; luckily,
we can figure it out automatically most of the time, so
users won't have to worry about it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Tue, 28 Feb 2017 13:58:08 +0000 (14:58 +0100)]
conf: Add 'spapr-pci-host-bridge' controller model
Adding it to the virDomainControllerPCIModelName enumeration
is enough for existing code to handle it, so parsing and
formatting will work without further tweaking.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Mon, 27 Feb 2017 19:18:32 +0000 (20:18 +0100)]
qemu: Relax pci-root index requirement for pSeries guests
pSeries guests will soon be allowed to have multiple
PHBs (pci-root controllers), meaning the current check
on the controller index no longer applies to them.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Fri, 24 Feb 2017 15:45:13 +0000 (16:45 +0100)]
conf: Move index number checking to drivers
pSeries guests will soon be allowed to have multiple
PHBs (pci-root controllers), which of course means that
all but one of them will have a non-zero index; hence,
we'll need to relax the current check.
However, right now the check is performed in the conf
module, which is generic rather than tied to the QEMU
driver, and where we don't have information such as the
guest machine type available.
To make this change of behavior possible down the line,
we need to move the check from the XML parser to the
drivers. Luckily, only QEMU and bhyve are using PCI
controllers, so this doesn't result in much duplication.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Andrea Bolognani [Tue, 28 Feb 2017 09:46:30 +0000 (10:46 +0100)]
qemu: Allow qemuBuildControllerDevStr() to return NULL
We will soon need to be able to return a NULL pointer
without the caller considering that an error: to make
it possible, change the return type to int and use
an out parameter for the string instead.
Add some documentation for the function as well.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
The current algorithm for slot allocation tries to be clever
and avoid looking at buses / slots more than once unless it's
necessary. Unfortunately that makes the code more complex,
and it will cause problem later on in some situations unless
even more complex code is added.
Since the performance gains are going to be pretty modest
anyway, we can just get rid of the extra complexity and use a
completely straighforward implementation instead.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Later on we're going to need access to information about IOMMU
groups for host devices. Implement the support in virpcimock,
and start using that mock library in a few QEMU test cases.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
Use 0001:01:00.0 instead of 0000:04:02.0 as the source address
for the host device. This doesn't change anything at the moment,
but it will make a difference later on.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Laine Stump <laine@laine.org>
John Ferlan [Tue, 9 May 2017 19:57:48 +0000 (15:57 -0400)]
network: Perform some formatting cleanup in bridge_driver
Modify code to have two spaces between functions, follow function more
recent function formatting w/r/t args per line and function return type
and name on separate lines.
Peter Krempa [Fri, 7 Jul 2017 16:00:04 +0000 (18:00 +0200)]
qemu: block: rename and refactor qemuBuildGlusterDriveJSON
New name is qemuBlockStorageSourceGetGlusterProps and also hardcode the
protocol name rather than calling the ToString function, since this
function can't be made universal.
Peter Krempa [Fri, 7 Jul 2017 15:37:42 +0000 (17:37 +0200)]
qemu: block: Refactor and rename qemuGetDriveSourceProps
Rename it to qemuBlockStorageSourceGetBackendProps and refactor it to
return the JSON object instead of filling a pointer since now it's
always expected to return data.
Peter Krempa [Fri, 7 Jul 2017 12:41:25 +0000 (14:41 +0200)]
qemu: command: Call qemuGetDriveSourceProps only if necessary
Add logic which will call qemuGetDriveSourceProps only in cases where we
need the JSON representation. This will allow qemuGetDriveSourceProps to
generate the JSON representation for all possible disk sources.
Peter Krempa [Fri, 7 Jul 2017 14:11:56 +0000 (16:11 +0200)]
qemu: command: Remove default port numbers for NBD and GLUSTER
The command line generators for the protocols above hardcoded a default
port number. Since we now always assign it when parsing the source
definition, this ad-hoc code is not required any more.
When a binary links against a .a archive (as opposed to a shared library),
any symbols which are marked as 'weak' get silently dropped. As a result
when the binary later runs, those 'weak' functions have an address of
0x0 and thus crash when run.
This happened with virtlogd and virtlockd because they don't link to
libvirt.so, but instead just libvirt_util.a and libvirt_rpc.a. The
virRandomBits symbols was weak and so left out of the virtlogd &
virtlockd binaries, despite being required by virHashTable functions.
Various other binaries like libvirt_lxc, libvirt_iohelper, etc also
link directly to .a files instead of libvirt.so, so are potentially
at risk of dropping symbols leading to a later runtime crash.
This is normal linker behaviour because a weak symbol is not treated
as undefined, so nothing forces it to be pulled in from the .a You
have to force the linker to pull in weak symbols using -u$SYMNAME
which is not a practical approach.
This risk is silent bad linkage that affects runtime behaviour is
not acceptable for a fix that was merely trying to fix the test
suite. So stop using __weak__ again.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When libvirt starts a new QEMU domain, it replaces host-model CPUs with
the appropriate custom CPU definition. However, when reconnecting to a
domain started by older libvirt (< 2.3), the domain would still have a
host-model CPU in its active definition.
In addition to updating a guest CPU definition the function verifies
that all required features are provided to the guest. Let's make it
obvious by calling it qemuProcessUpdateAndVerifyCPU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Separated from qemuProcessUpdateLiveGuestCPU. Its purpose is to fetch
guest CPU data from a running QEMU process. The data can later be used
to verify and update the active guest CPU definition.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Mon, 19 Jun 2017 11:18:52 +0000 (13:18 +0200)]
cpu_x86: Properly disable unknown CPU features
CPU features unknown to a hypervisor will not be present in dataDisabled
even though the features won't naturally be enabled because.
Thus any features we asked for which are not in dataEnabled should be
considered disabled.
Jiri Denemark [Tue, 27 Jun 2017 13:06:10 +0000 (15:06 +0200)]
qemu: Don't update CPU when checking ABI stability
When checking ABI stability between two domain definitions, we first
make migratable copies of them. However, we also asked for the guest CPU
to be updated, even though the updated CPU is supposed to be already
included in the original definitions. Moreover, if we do this on the
destination host during migration, we're potentially updating the
definition with according to an incompatible host CPU.
While updating the CPU when checking ABI stability doesn't make any
sense, it actually just worked because updating the CPU doesn't do
anything for custom CPUs (only host-model CPUs are affected) and we
updated both definitions in the same way.
Less then a year ago commit v2.3.0-rc1~42 stopped updating the CPU in
the definition we got internally and only the user supplied definition
was updated. However, the same commit started updating host-model CPUs
to custom CPUs which are not affected by the request to update the CPU.
So it still seemed to work right, unless a user upgraded libvirt 2.2.0
to a newer version while there were some domains with host-model CPUs
running on the host. Such domains couldn't be migrated with a user
supplied XML since libvirt would complain:
Target CPU mode custom does not match source host-model
The fix is pretty straightforward, we just need to stop updating the CPU
when checking ABI stability.
Juan Hernandez [Thu, 6 Jul 2017 15:03:31 +0000 (17:03 +0200)]
Avoid hidden cgroup mount points
Currently the scan of the /proc/mounts file used to find cgroup mount
points doesn't take into account that mount points may hidden by other
mount points. For, example in certain Kubernetes environments the
/proc/mounts contains the following lines:
In this particular environment the first mount point is hidden by the
second one. The correct mount point is the third one, but libvirt will
never process it because it only checks the first mount point for each
controller (net_cls in this case). So libvirt will try to use the first
mount point, which doesn't actually exist, and the complete detection
process will fail.
To avoid that issue this patch changes the virCgroupDetectMountsFromFile
function so that when there are duplicates it takes the information from
the last line in /proc/mounts. This requires removing the previous
explicit condition to skip duplicates, and adding code to free the
memory used by the processing of duplicated lines.
Related-To: https://bugzilla.redhat.com/1468214 Related-To: https://github.com/kubevirt/libvirt/issues/4 Signed-off-by: Juan Hernandez <jhernand@redhat.com>
Michal Privoznik [Wed, 12 Jul 2017 08:01:25 +0000 (10:01 +0200)]
qemuDomainGetPreservedMountPath: rename @mount
Obviously, old gcc-s ale sad when a variable shares the name with
a function. And we do have such variable (added in 4d8a914be0):
@mount. Rename it to @mountpoint so that compiler's happy again.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Tue, 11 Jul 2017 16:00:08 +0000 (18:00 +0200)]
qemu: Provide non-linux stub for qemuDomainAttachDeviceMknodRecursive
The way we create devices under /dev is highly linux specific.
For instance we do mknod(), mount(), umount(), etc. Some
platforms are even missing some of these functions. Then again,
as declared in qemuDomainNamespaceAvailable(): namespaces are
linux only. Therefore, to avoid obfuscating the code by trying to
make it compile on weird platforms, just provide a non-linux stub
for qemuDomainAttachDeviceMknodRecursive(). At the same time,
qemuDomainAttachDeviceMknodHelper() which actually calls the
non-existent functions is moved under ifdef __linux__ block since
its only caller is in that block too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Commit id 'b9b1aa639' was supposed to add logic to set the allocation
for sparse files when wr_highest_offset was zero; however, an unconditional
setting was done just prior. For block devices, this means allocation is
always returning 0 since 'actual-size' will be zero.
Remove the unconditional setting and add the note about it being possible
to still be zero for block devices. As soon as the guest starts writing to
the volume, the allocation value will then be obtainable from qemu via
the wr_highest_offset.
Peter Krempa [Thu, 11 May 2017 14:50:04 +0000 (16:50 +0200)]
tests: storage: Fully register storage driver
Use the full storage driver registration method that also fails if one
of the storage backends is not present. This makes the test fail if a
submodule fails registration, which is useful for testing.
Additionally return EXIT_FAILURE as usual in tests rather than -1.
If we exceed a fixed limit in RPC code we get a horrible message
like this, if the parameter type is a 'string', because we forgot
to initialize the error message type field:
$ virsh snapshot-list ostack1
error: too many remote undefineds: 1329 > 1024
It would also be useful to know which RPC call and field was
exceeded. So this patch makes us report:
$ virsh snapshot-list ostack1
error: too many remote undefineds: 1329 > 1024,
in parameter 'names' for 'virDomainSnapshotListNames'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Prevent more compiler optimization of mockable functions
Currently all mockable functions are annotated with the 'noinline'
attribute. This is insufficient to guarantee that a function can
be reliably mocked with an LD_PRELOAD. The C language spec allows
the compiler to assume there is only a single implementation of
each function. It can thus do things like propagating constant
return values into the caller at compile time, or creating
multiple specialized copies of the function body each optimized
for a different caller. To prevent these optimizations we must
also set the 'noclone' and 'weak' attributes.
This fixes the test suite when libvirt.so is built with CLang
with optimization enabled.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The TODO macro expands to an fprintf() call and is used in several
places in the Xen driver. Anything that wishes to print such debug
messages should use the logging macros. In this case though, all the
places in the Xen driver should have been raising a formal libvirt
error instead. Add proper error handling and delete the TODO macro
to prevent future misuse.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The HOST_NAME_MAX, INET_ADDRSTRLEN and VIR_LOOPBACK_IPV4_ADDR
constants are only used by a handful of files, so are better
kept in virsocketaddr.h or the source file that uses them.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We only ever test libvirt with GCC or CLang which provides a
GCC compatible compilation environment. Between them, these
compilers cover every important operating system platform,
even Windows.
Mandate their use to make it explicit that we don't care about
compilers like Microsoft VCC or other UNIX vendor C compilers.
GCC 4.4 was picked as the baseline, since RHEL-6 ships 4.4.7
and that lets us remove a large set of checks. There is a slight
issue that CLang reports itself as GCC 4.2, so we must also check
if __clang__ is defined. We could check a particular CLang version
too, but that would require someone to figure out a suitable min
version which is fun because OS-X reports totally different CLang
version numbers from CLang builds on Linux/BSD
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
config-post.h was modified to define __GNUC_PREREQ, but the
original definition was never removed from internal.h, and
that is now dead code since config.h is always the first file
included.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Michal Privoznik [Mon, 19 Jun 2017 15:05:31 +0000 (17:05 +0200)]
qemu ns: Create chardev backends more frequently
Currently, the only type of chardev that we create the backend
for in the namespace is type='dev'. This is not enough, other
backends might have files under /dev too. For instance channels
might have a unix socket under /dev (well, bind mounted under
/dev from a different place).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Just like in the previous commit, when attaching a file based
device which has its source living under /dev (that is not a
device rather than a regular file), calling mknod() is no help.
We need to:
1) bind mount device to some temporary location
2) enter the namespace
3) move the mount point to desired place
4) umount it in the parent namespace from the temporary location
At the same time, the check in qemuDomainNamespaceSetupDisk makes
no longer sense. Therefore remove it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Michal Privoznik [Thu, 15 Jun 2017 13:50:39 +0000 (15:50 +0200)]
qemuDomainAttachDeviceMknodHelper: Fail on unsupported file type
Currently, we silently assume that file we are creating in the
namespace is either a link or a device (character or block one).
This is not always the case. Therefore instead of doing something
wrong, claim about unsupported file type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Michal Privoznik [Mon, 19 Jun 2017 08:56:20 +0000 (10:56 +0200)]
qemuDomainCreateDeviceRecursive: Fail on unsupported file type
Currently, we silently assume that file we are creating in the
namespace is either a link or a device (character or block one).
This is not always the case. Therefore instead of doing something
wrong, claim about unsupported file type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
In 290a00e41d I've tried to fix the process of building a
qemu namespace when dealing with file mount points. What I
haven't realized then is that we might be dealing not with just
regular files but also special files (like sockets). Indeed, try
the following:
Problem with my previous approach is that I wasn't creating the
temporary location (where mount points under /dev are moved) for
anything but directories and regular files.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
Michal Privoznik [Mon, 19 Jun 2017 14:43:25 +0000 (16:43 +0200)]
conf: Rename and expose virDomainChrSourceDefPath
It comes very handy to have source path for chardevs. We already
have such function: virDomainAuditChardevPath() but it's static
and has name not suitable for exposing. Moreover, while exposing
it change its name slightly to virDomainChrSourceDefGetPath.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
If a value of the first level object contains more objects needing
deflattening which would be wrapped in an actual object the function
would not recurse into them.
By this simple addition we can fully deflatten the objects.
Peter Krempa [Tue, 27 Jun 2017 11:48:56 +0000 (13:48 +0200)]
util: json: Properly implement JSON deflattening
As it turns out sometimes users pass in an arbitrarily nested structure
e.g. for the qemu backing chains JSON pseudo protocol. This new
implementation deflattens now a single object fully even with nested
keys.
Additionally it's not necessary now to stick with the "file." prefix for
the properties.
Peter Krempa [Mon, 26 Jun 2017 16:42:07 +0000 (18:42 +0200)]
util: json: Don't remove the 'file' subobject when deflattening
Currently the function would deflatten the object by dropping the 'file'
prefix from the attributes. This does not really scale well or adhere to
the documentation.
Until we refactor the worker to properly deflatten everything we at
least simulate it by adding the "file" wrapper object back.
Users may want to run the init command of a container as a special
user / group. This is achieved by adding <inituser> and <initgroup>
elements. Note that the user can either provide a name or an ID to
specify the user / group to be used.
This commit also fixes a side effect of being able to run the command
as a non-root user: the user needs rights on the tty to allow shell
job control.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Cédric Bosdonnat [Wed, 31 May 2017 13:32:11 +0000 (15:32 +0200)]
lxc: allow user to specify command working directory
Some containers may want the application to run in a special directory.
Add <initdir> element in the domain configuration to handle this case
and use it in the lxc driver.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
All of these four functions (virStreamRecvAll, virStreamSendAll,
virStreamSparseRecvAll, virStreamSparseSendAll) take one or more
callback functions that handle various aspects of streams.
However, if any of them fails no error is reported therefore
caller does not know what went wrong.
At the same time, we silently presumed callbacks to set errno on
failure. With this change we should document it explicitly as the
error is not properly reported.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
If one these four functions fail (virStreamRecvAll,
virStreamSendAll, virStreamSparseRecvAll, virStreamSparseSendAll)
the stream is aborted by calling virStreamAbort(). This is a
public API; therefore, the first thing it does is error reset. At
that point any error that caused us to abort stream in the first
place is gone.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>