Laine Stump [Wed, 23 Mar 2016 19:49:29 +0000 (15:49 -0400)]
qemu: support new pci controller model "pcie-expander-bus"
This is backed by the qemu device pxb-pcie, which will be available in
qemu 2.6.0.
As with pci-expander-bus (which uses qemu's pxb device), the busNr
attribute and <node> subelement of <target> are used to set the bus_nr
and numa_node options.
During post-parse we validate that the domain's machinetype is
q35-based (since the device shows up for 440fx-based machinetypes, but
is unusable), as well as checking that <node> specifies a node that is
actually configured on the guest.
Laine Stump [Wed, 16 Mar 2016 17:37:14 +0000 (13:37 -0400)]
conf: new pci controller model pcie-expander-bus
This controller provides a single PCIe port on a new root. It is
similar to pci-expander-bus, intended to provide a bus that can be
associated with a guest-identifiable NUMA node, but is for
machinetypes with PCIe rather than PCI (e.g. q35-based machinetypes).
Aside from PCIe vs. PCI, the other main difference is that a
pci-expander-bus has a companion pci-bridge that is automatically
attached along with it, but pcie-expander-bus has only a single port,
and that port will only connect to a pcie-root-port, or to a
pcie-switch-upstream-port. In order for the bus to be of any use in
the guest, it must have either a pcie-root-port or a
pcie-switch-upstream-port attached (and one or more
pcie-switch-downstream-ports attached to the
pcie-switch-upstream-port).
Laine Stump [Tue, 8 Mar 2016 21:25:22 +0000 (16:25 -0500)]
qemu: add capabilities bit for device "pxb-pcie"
The pxb device is a PCIe expander bus that can be added to any
Q35-based machinetype. A single PCIe port (*not* hotpluggable) is
provided; if more than one device is desired, or if hotplug
support is needed, either a pcie-root-port, or some combination of
pcie-switch-upstream-port and pcie-swith-downstream-ports must be
added to it. It can have a NUMA node number associated with it, as
well as a bus number.
Laine Stump [Fri, 4 Mar 2016 19:35:20 +0000 (14:35 -0500)]
qemu: support new pci controller model "pci-expander-bus"
This is backed by the qemu device "pxb".
The pxb device always includes a pci-bridge that is at the bus number
of the pxb + 1.
busNr and <node> from the <target> subelement are used to set the
bus_nr and numa_node options for pxb.
During post-parse we validate that the domain's machinetype is
440fx-based (since the pxb device only works on 440fx-based machines),
and <node> also gets a sanity check to assure that the NUMA node
specified for the pxb (if any - it's optional) actually exists on the
guest.
Laine Stump [Fri, 4 Mar 2016 15:26:23 +0000 (10:26 -0500)]
conf: new pci controller model pci-expander-bus
This is a standard PCI root bus (not a bridge) that can be added to a
440fx-based domain. Although it uses a PCI slot, this is *not* how it
is connected into the PCI bus hierarchy, but is only used for
control. Each pci-expander-bus provides 32 slots (0-31) that can
accept hotplug of standard PCI devices.
The usefulness of pci-expander-bus relative to a pci-bridge is that
the NUMA node of the bus can be specified with the <node> subelement
of <target>. This gives guest-side visibility to the NUMA node of
attached devices (presuming that management apps only assign a device
to a bus that has a NUMA node number matching the node number of the
device on the host).
Each pci-expander-bus also has a "busNr" attribute. The expander-bus
itself will take the busNr specified, and all buses that are connected
to this bus (including the pci-bridge that is automatically added to
any expander bus of model "pxb" (see the next commit)) will use
busNr+1, busNr+2, etc, and the pci-root (or the expander-bus with next
lower busNr) will use bus numbers lower than busNr.
Laine Stump [Wed, 24 Feb 2016 21:40:49 +0000 (16:40 -0500)]
qemu: add capabilities bit for device "pxb"
The pxb device is a PCI expander bus that can be added to any
440fx-based machinetype. The PCI bus that is created has 32 standard
PCI slots (hotpluggable). It can have a NUMA node number associated
with it, as well as a bus number.
Laine Stump [Wed, 16 Mar 2016 18:20:52 +0000 (14:20 -0400)]
conf: utility function to convert PCI controller model into connect type
There are two places in qemu_domain_address.c where we have a switch
statement to convert PCI controller models
(VIR_DOMAIN_CONTROLLER_MODEL_PCI*) into the connection type flag that
is matched when looking for an upstream connection for that model of
controller (VIR_PCI_CONNECT_TYPE_*). This patch makes a utility
function in conf/domain_addr.c to do that, so that when a new PCI
controller is added, we only need to add the new model-->connect-type
in a single place.
Laine Stump [Tue, 15 Mar 2016 19:49:22 +0000 (15:49 -0400)]
conf/qemu: change the way VIR_PCI_CONNECT_TYPE_* flags work
The flags used to determine which devices could be plugged into which
controllers were quite confusing, as they tried to create classes of
connections, then put particular devices into possibly multiple
classes, while sometimes setting multiple flags for the controllers
themselves. The attempt to have a single flag indicate, e.g. that a
root-port or a switch-downstream-port could connect was not only
confusing, it was leading to a situation where it would be impossible
to specify exactly the right combinations for a new controller.
The solution is for the VIR_PCI_CONNECT_TYPE_* flags to have a 1:1
correspondence with each type of PCI controller, plus a flag for a PCI
endpoint device and another for a PCIe endpoint device (the only
exception to this is that pci-bridge and pcie-expander-bus controllers
have their upstream connection classified as
VIR_PCI_CONNECT_TYPE_PCI_DEVICE since they can be plugged into
*exactly* the same ports as any endpoint device). Each device then
has a single flag for connect type (plus the HOTPLUG flag if that
device can e hotplugged), and each controller sets the CONNECT bits
for all controllers that can be plugged into it, as well as for either
type of endpoint device that can be plugged in (and the HOTPLUG flag
if it can accept hotplugged devices).
With this change, it is *slightly* easier to understand the matching
of connections (as long as you remember that the flag for a
device/upstream-facing connection of a controller is the same as that
device's type, while the flags for a controller's downstream
connections is the OR of all device types that can be plugged into
that controller). More importantly, it will be possible to correctly
specify what can be plugged into a pcie-switch-expander-bus, when
support for it is added.
Laine Stump [Wed, 2 Mar 2016 20:29:33 +0000 (15:29 -0500)]
conf: allow use of slot 0 in a dmi-to-pci-bridge
When support for dmi-to-pci-bridge was added, it was assumed that,
just as with the pci-root bus, slot 0 was reserved. This is not the
case - it can be used to connect a device just like any other slot, so
remove the restriction and update the test cases that auto-assign an
address on a dmi-to-pci-bridge.
Laine Stump [Wed, 2 Mar 2016 20:31:02 +0000 (15:31 -0500)]
conf: use #define instead of literal for highest slot in upstream port
Every other maxSlot was either set to 0 or to
VIR_PCI_ADDRESS_SLOT_LAST, but this one was for some reason set to the
literal value 31 (which is the same as VIR_PCI_ADDRESS_SLOT_LAST).
This makes them all consistent.
Laine Stump [Tue, 22 Mar 2016 16:24:08 +0000 (12:24 -0400)]
schema: allow pci address attributes to be in decimal
This is especially useful for "bus", since the bus of a device's pci
address is matched to the "index" of a controller to determine which
bus it will be connected to, and "index" is always specified in
decimal - being able to specify both in decimal at least makes it
easier to assure a device is being assigned to the correct bus when it
is added. For the other attributes, it is just a convenience.
(MB: the parser already allows for any of these attributes to be given
in decimal, and there are even examples floating around on the
internet that give them in decimal rather than hex (written in the
days before virsh did schema validation on all XML). This only updates
the schema to match the parser.)
Laine Stump [Tue, 22 Mar 2016 16:00:40 +0000 (12:00 -0400)]
schema: rename uint8range/uint24range to uint8/uint24
nwfilter.rng defines uint16range and uint32range, but in a different
manner (it also allows a variable name as the value, rather than just
a decimal or hex number). I wanted to add uint16range to
basictypes.rng, but my desired definition was parallel to those for
uint8range and uint24range which are defined in basictypes.rng - they
*don't* allow a variable name for the value.
The simplest path to make everyone happy is to make the "plain"
versions in basictypes.rng have simpler names - "uint8", "uint16", and
"uint24". This patch renames uint8range and uint24range to uint8 and
uint24, while the next patch will add uint16.
Laine Stump [Tue, 22 Mar 2016 15:38:53 +0000 (11:38 -0400)]
schema: make pci slot and function optional
The pcie-switch-downstream-port and pcie-root-port controllers have
only a single slot, numbered 0, and the greate majority of all guest
PCI devices are plugged into function 0 of whatever slot they're
using. The parser makes these optional, setting them to 0 when not
specified, and it's logical for the schema to also make them optional.
Take setlocale/gettext error handling pattern from tools/virsh-*
and use it for all standalone binaries via a new shared
virGettextInitialize routine. The virsh* pattern differed slightly
from other callers. All users now consistently:
* Ignore setlocale errors. virsh has done this forever, presumably for
good reason. This has been partially responsible for some bug reports:
We use device-mapper to enumerate all dm devices, and filter out
the list of multipath devices by checking the target_type string
name. The code however cancels all scanning if we encounter
target_type=NULL
I don't know how to reproduce that situation, but a user was hitting
it in their setup, and inspecting the lvm2/device-mapper code shows
many places where !target_type is explicitly ignored and processing
continues on to the next device. So I think we should do the same
The watchdog cli refactoring in 4666b762 dropped the temporary variable
we use to convert to action=dump to action=pause for the qemu cli, and
stored the converted value in the domain structure. Our other watchdog
handling code then treated it as though the user requested action=pause,
which broke action=dump handling.
test: genericxml2xml: test graphics listen= compat
* Add a test for listen=XXX and <listen address=YYY/> collision error
* Add an explicit test for listen=XXX duplicated to <listen address=XXX/>
We implicitly test it elsewhere but I figure it's better to be explicit,
and this test case can be extended in the future for additional listen
back compat if/when we support <listen type='socket'/> syntax
virsh: support up to 64 migration options for command
Upcoming compression options for migration command patch
series hits current limit of 32 possible options for a command.
Lets take one step further and support 64 possible options.
And all it takes is moving from 32 bit integers to 64 bit ones.
The only less then trivial change i found is moving from
'ffs' to 'ffsl'.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
fix build by correcting functions order and src/Makefile.am
commit 30c61901 added new functions to libvirt_private.syms
not alpabetically sorted and erroneously added vz sources to
STATEFUL_DRIVER_SOURCE_FILES, which triggered check-aclrules
running while vz driver isn't ready for it yet.
SDK does not allocate memory when getting strings thus we
need to call every function that returns string twice.
First to obtain string length, second to obtain string
itself. It is tedious so let's create helper functions
for cases when we know length of the result beforehand
and we are not.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
We don't need them anymore as all pointers within vzDriver structure
are not changed during the time it exists.
Where we still need to synchronize we use virObjectLock/Unlock as far
as vzDriver is lockable object.
vz: fix possible vzDomainDefineXMLFlags and prlsdkNewDomainByHandle race
Lock driver when a new domain is created in prlsdkNewDomainByHandle
and try to find it in the list under lock again because it can race
with vzDomainDefineXMLFlags when a domain with the same uuid is added
via vz dispatcher directly and libvirt define.
Maxim Nestratov [Mon, 28 Mar 2016 19:11:30 +0000 (22:11 +0300)]
vz: introduce new vzDriver lockable structure and use it
This patch introduces a new 'vzDriver' lockable object and provides
helper functions to allocate/destroy it and we pass it to prlsdkXxx
functions instead of virConnectPtr.
Now we store domain related objects such as domain list, capabitilies
etc. within a single vz_driver vzDriver structure, which is shared by
all driver connections. It is allocated during daemon initialization or
in a lazy manner when a new connection to 'vz' driver is established.
When a connection to vz daemon drops, vzDestroyConnection is called,
which in turn relays disconnect event to all connection to 'vz' driver.
Maxim Nestratov [Mon, 28 Mar 2016 13:03:00 +0000 (16:03 +0300)]
vz: build driver as module and don't register it on client's side
Make it possible to build vz driver as a module and don't link it with
libvirt.so statically.
Remove registering it on client's side as far as we start relying on daemon
Pavel Hrdina [Wed, 13 Apr 2016 15:26:17 +0000 (17:26 +0200)]
build: fix build on RHEL-6
GCC in RHEL-6 complains about listen:
../../src/conf/domain_conf.c:23718: error: declaration of 'listen' shadows a global declaration [-Wshadow]
/usr/include/sys/socket.h:204: error: shadowed declaration is here [-Wshadow]
Trying to reload/SIGUSR1 virtlogd or virtlockd fails with:
error : virNetDaemonRun:747 : internal error: Not all servers restored, cannot run server
Commit 252610f7 changed the daemon state json to allow tracking
multiple servers. However it missed clearing dmn->srvObject after
the json is empty, like the previous code paths handled. Later on in
virNewDaemonRun, dmn->srvObject is expected to be empty otherwise we
throw the above error.
Peter Krempa [Fri, 1 Apr 2016 15:48:20 +0000 (17:48 +0200)]
qemu: process: Wire up ACPI OST events to notify users of failed memory unplug
Since qemu is now able to notify us that the guest rejected the memory
unplug operation we can relay this to the user and make the API fail
right away.
Additionally document the possible values from the ACPI docs for future
reference.
Peter Krempa [Fri, 1 Apr 2016 14:41:08 +0000 (16:41 +0200)]
qemu: monitor: Add support for ACPI_DEVICE_OST event handling
The event is emitted on ACPI OSPM Status Indication events.
ACPI standard documentation describes the method as:
This object is an optional control method that is invoked by OSPM to
indicate processing status to the platform. During device ejection,
device hot add, or other event processing, OSPM may need to perform
specific handshaking with the platform. OSPM may also need to indicate
to the platform its inability to complete a requested operation; for
example, when a user presses an ejection button for a device that is
currently in use or is otherwise currently incapable of being ejected.
In this case, the processing of the ACPI Eject Request notification by
OSPM fails. OSPM may indicate this failure to the platform through the
invocation of the _OST control method. As a result of the status
notification indicating ejection failure, the platform may take certain
action including reissuing the notification or perhaps turning on an
appropriate indicator light to signal the failure to the user.
Since we didn't opt to use one single event for device lifecycle for a
VM we are missing one last event if the device removal failed. This
event will be emitted once we asked to eject the device but for some
reason it is not possible.
Peter Krempa [Mon, 4 Apr 2016 15:17:43 +0000 (17:17 +0200)]
qemu: hotplug: Add support for signalling device unplug failure
Similarly to the DEVICE_DELETED event we will be able to tell when
unplug of certain device types will be rejected by the guest OS. Wire up
the device deletion signalling code to allow handling this.
Peter Krempa [Mon, 4 Apr 2016 13:08:40 +0000 (15:08 +0200)]
qemu: hotplug: Refactor semantics of qemuDomainWaitForDeviceRemoval
Neither of the callers cares whether the DEVICE_DELETED event isn't
supported or the event was received. Simplify the code and callers by
unifying the two values and changing the return value constants so that
a temporary variable can be omitted.
Peter Krempa [Mon, 4 Apr 2016 13:27:25 +0000 (15:27 +0200)]
qemu: hotplug: Properly handle errors in qemuDomainWaitForDeviceRemoval
Callers ignore if this function returns -1 and continue as though the
DEVICE_DELETED event was not received. Since we can't be sure that the
event was not received we should behave as if the event was not
supported and remove the device definition right away. The error
fortunately won't really happen here.
Ján Tomko [Wed, 13 Apr 2016 07:38:29 +0000 (09:38 +0200)]
qemu: assign addresses before aliases
The address assigning code might add new pci bridges.
We need them to have an alias when building the command line.
In real word usage, this is not a problem because all the code
paths already call qemuDomainAssignAddresses. However moving
this call lets us remove one extra call from qemuxml2argvtest.
Pavel Hrdina [Thu, 7 Apr 2016 11:08:58 +0000 (13:08 +0200)]
domain_conf: call ...ListensParseXML only for appropriate graphics
Instead of calling the virDomainGraphicsListensParseXML function for all
graphics types and ignore the wrong ones move the call only to graphics
types where we supports listen elements.
Pavel Hrdina [Sun, 10 Apr 2016 16:57:12 +0000 (18:57 +0200)]
domain_conf: cleanup virDomainGraphicsGetListen
Removes the check for graphics type, it's not a public API and developer
know what he's doing and this check makes no sense. It also removes
the ability to allocate a new array if there is none. This was used by
the virDomainGraphicsListenAdd* functions and isn't used anymore.
This is now a simple getter with simple check for listens array presence
and whether the index in out of bounds.
This effectively removes virDomainGraphicsListenSetAddress which was
used only to change the address of listen structure and possible change
the listen type. The new function will auto-expand the listens array
and append a new listen.
The old function was used on pre-allocated array of listens and in most
cases it only "add" a new listen. The two remaining uses can access the
listen structure directly.
Peter Krempa [Tue, 12 Apr 2016 13:03:19 +0000 (15:03 +0200)]
conf: virDomainDiskDefIotuneParse: Report malformed number errors
Rest of the fields of the iotune data structure did not check for
malformed integers. Use the previously defined macro to extract them
which will simplify the code and add error reporting.
Since the structure was pre-initialized to 0 we don't need to set every
single member to 0 if it's not present in the XML. Additionally if we
put the name of the field into the error message the code can be
simplified using a macro to parse the members.
Maxim Nestratov [Tue, 29 Mar 2016 08:21:17 +0000 (11:21 +0300)]
vz: remove drivername field from vzConn structure
No need to remember connection name and have corresponding
domain type to keep backward compatibility with former
'parallels' driver. It is enough to be able to accept 'parallels'
uri and domain types.
Ján Tomko [Mon, 11 Apr 2016 12:26:06 +0000 (14:26 +0200)]
conf: also mark the implicit video as primary
Commit 119cd06 started setting the primary bool for the first
user-specified video even if user omitted the 'primary' attribute.
However this was done before the addition of the implicit device.
This broke startup of transient qemu domains with no <video>:
https://bugzilla.redhat.com/show_bug.cgi?id=1325757
Move this default to virDomainDefPostParseInternal,
after the addition of the implicit video device, to catch the implicit
video as well.
Andrea Bolognani [Mon, 11 Apr 2016 14:45:53 +0000 (16:45 +0200)]
cfg.mk: Use single quotes wherever possible
Being consistent is nice, especially when it comes to defining our
regular expression, where using single quotes instead of double
quotes allows us to leave out a few backslashes.
Changing this required altering a few error messages.
The only remaining use of double quotes is one where they are
actually required for the check to work.
Quite straigthforward as vz sdk memory setting function makes
just what we want to that is set "amount of physical memory
allocated to a domain".
'useflags' is introduced for non flag function implementation.
We can't just use combination of flags like "live | config" or
we fail for inactive domains. Other combinations have drawbacks
too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Actually this is not pure refactoring. Part of common code is
replaced with virDomainObjUpdateModificationImpact and this
a good replacement. It includes removed check of inactive
domain and active flags set. Additionally we resolve
current flag in accordance with current state of domain.
Thus it becames possible to attach/detach devices for
inactive domains if this flag is set.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Pavel Hrdina [Mon, 11 Apr 2016 11:05:42 +0000 (13:05 +0200)]
domain_conf: fix graphics parsing
Commit dc98a5bc refactored the code a lot and forget about checking if
listen attribute is specified. This ensures that listen attribute and
first listen element are compared only if both exist.
Peter Krempa [Fri, 8 Apr 2016 08:08:37 +0000 (10:08 +0200)]
qemu: agent: Fix incorrect and weird debug/warning log entries
Replace the nonsensical debug statement by adding the expected event
code into the existing debug statement.
Since the monitor code always notifies the agent on guest
reboot/shutdown even if that was not initiated by the agent the warning
emitted later is bogus and pollutes the logs in such cases. Delete it
and keep just the original debug message where this info can be
inferred.
host-validate: Be more careful when checking for cgroup support
Simply checking whether the cgroup name appears somewhere inside
/proc/self/cgroup is enough most of the time, but there are some
corner cases that require a more mindful parsing.