Laine Stump [Mon, 24 Jun 2013 16:00:51 +0000 (12:00 -0400)]
qemu: don't reset PCI devices being assigned with VFIO
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset to devices
that will be/were assigned using VFIO.
Jiri Denemark [Tue, 7 May 2013 12:29:19 +0000 (14:29 +0200)]
Extensible migration APIs
This patch introduces two new APIs virDomainMigrate3 and
virDomainMigrateToURI3 that may be used in place of their older
variants. These new APIs take optional migration parameters (such as
bandwidth, domain XML, ...) in an array of virTypedParameters, which
makes adding new parameters easier as there's no need to introduce new
APIs whenever a new migration parameter needs to be added. Both APIs are
backward compatible and will automatically use older migration calls in
case the new calls are not supported as long as the typed parameters
array does not contain any parameter which was not supported by the
older calls.
Jiri Denemark [Thu, 6 Jun 2013 16:54:48 +0000 (18:54 +0200)]
Introduce VIR_TYPED_PARAMS_DEBUG macro for dumping typed params
All APIs that take typed parameters are only using params address in
their entry point debug messages. With the new VIR_TYPED_PARAMS_DEBUG
macro, all functions can easily log all individual typed parameters
passed to them.
Jiri Denemark [Tue, 18 Jun 2013 07:24:57 +0000 (09:24 +0200)]
util: Emit proper error code in virTypedParamsValidate
When unsupported parameter is passed to virTypedParamsValidate,
VIR_ERR_ARGUMENT_UNSUPPORTED should be returned rather than
VIR_ERR_INVALID_ARG, which is more appropriate for supported parameters
used incorrectly.
Laine Stump [Tue, 4 Jun 2013 19:54:45 +0000 (15:54 -0400)]
pci: make virPCIDeviceDetach consistent in behavior
virPCIDeviceDetach would previously sometimes consume the input device
object (to put it on the inactive list) and sometimes not. Avoiding
memory leaks required checking beforehand to see if the device was
already on the list, and freeing the device object in the caller only
if there wasn't already an identical object on the inactive list.
This patch makes it consistent - virPCIDeviceDetach will *never*
consume the input virPCIDevice object; if it needs to put one on the
inactive list, it will create a copy and put *that* on the list. This
way the caller knows that it is always their responsibility to free
the device object they created.
Laine Stump [Tue, 4 Jun 2013 20:02:33 +0000 (16:02 -0400)]
pci: eliminate memory leak in virPCIDeviceReattach
virPCIDeviceReattach was making the assumption that the dev object
given to it was one and the same with the dev object on the
inactiveDevs list. If that had been the case, it would not need to
free the dev object it removed from the inactive list, because the
caller of virPCIDeviceReattach always frees the dev object that it
passes in. Since the dev object passed in is *never* the same object
that's on the list (it is a different object with the same name and
attributes, created just for the purpose of searching for the actual
object), simply doing a "ListSteal" to remove the object from the list
results in one leaked object; we need to actually free the object
after removing it from the list.
Laine Stump [Fri, 31 May 2013 15:06:32 +0000 (11:06 -0400)]
pci: new utility functions
* virPCIDeviceFindByIDs - find a device on a list w/o creating an object
This makes searching for an existing device on a list lighter weight.
* virPCIDeviceCopy - make a copy of an existing virPCIDevice object.
* virPCIDeviceGetDriverPathAndName - construct new strings containing
1) the name of the driver bound to this device.
2) the full path to the sysfs config for that driver.
(This code was lifted from virPCIDeviceUnbindFromStub, and replaced
there with a call to this new function).
Laine Stump [Fri, 31 May 2013 18:26:56 +0000 (14:26 -0400)]
pci: change stubDriver from const char* to char*
Previously stubDriver was always set from a string literal, so it was
okay to use a const char * that wasn't freed when the virPCIDevice was
freed. This will not be the case in the near future, so it is now a
char* that is allocated in virPCIDeviceSetStubDriver() and freed
during virPCIDeviceFree().
Jim Fehlig [Mon, 13 May 2013 18:14:21 +0000 (12:14 -0600)]
libxl: support qdisk backend
libxl supports the LIBXL_DISK_BACKEND_QDISK disk backend, where qemu
is used to provide the disk backend. This patch simply maps the
existing <driver name='qemu'/> to LIBXL_DISK_BACKEND_QDISK.
Add a script which parses the driver API code and validates
that every API registered in a virNNNDriverPtr table contains
an ACL check matching the API name.
NB this currently whitelists a few xen driver functions
which are temporarily lacking in access control checks.
The xen driver is considered insecure until these are
fixed.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Auto-generate helpers for checking access control rules
Extend the 'gendispatch.pl' script to be able to generate
three new types of file.
- 'aclheader' - defines signatures of helper APIs for
doing authorization checks. There is one helper API
for each API requiring an auth check. Any @acl
annotations result in a method being generated with
a suffix of 'EnsureACL'. If the ACL check requires
examination of flags, an extra 'flags' param will be
present. Some examples
extern int virConnectBaselineCPUEnsureACL(void);
extern int virConnectDomainEventDeregisterEnsureACL(virDomainDefPtr domain);
extern int virDomainAttachDeviceFlagsEnsureACL(virDomainDefPtr domain, unsigned int flags);
Any @aclfilter annotations resuilt in a method being
generated with a suffix of 'CheckACL'.
extern int virConnectListAllDomainsCheckACL(virDomainDefPtr domain);
These are used for filtering individual objects from APIs
which return a list of objects
- 'aclbody' - defines the actual implementation of the
methods described above. This calls into the access
manager APIs. A complex example:
/* Returns: -1 on error (denied==error), 0 on allowed */
int virDomainAttachDeviceFlagsEnsureACL(virConnectPtr conn,
virDomainDefPtr domain,
unsigned int flags)
{
virAccessManagerPtr mgr;
int rv;
if (!(mgr = virAccessManagerGetDefault()))
return -1;
Declare the access control requirements for the API. May be repeated
multiple times, if multiple rules are required.
<object> is one of 'connect', 'domain', 'network', 'storagepool',
'interface', 'nodedev', 'secret'.
<permission> is one of the permissions in access/viraccessperm.h
<flagname> indicates the rule only applies if the named flag
is set in the API call
@aclfilter: <object>:<permission>
Declare an access control filter that will be applied to a list
of objects being returned by an API. This allows the returned
list to be filtered to only show those the user has permissions
against
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an access control driver that uses the pkcheck command
to check authorization requests. This is fairly inefficient,
particularly for cases where an API returns a list of objects
and needs to check permission for each object.
It would be desirable to use the polkit API but this links
to glib with abort-on-OOM behaviour, so can't be used. The
other alternative is to speak to dbus directly
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new 'access_drivers' config parameter to the libvirtd.conf
configuration file. This allows admins to setup the default
access control drivers to use for API authorization. The same
driver is to be used by all internal drivers & APIs
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Set conn->driver before running driver connectOpen method
The access control checks in the 'connectOpen' driver method
will require 'conn->driver' to be non-NULL. Set this before
running the 'connectOpen' method and NULL-ify it again on
failure.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch introduces the virAccessManagerPtr class as the
interface between virtualization drivers and the access
control drivers. The viraccessperm.h file defines the
various permissions that will be used for each type of object
libvirt manages
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ján Tomko [Mon, 24 Jun 2013 06:35:59 +0000 (08:35 +0200)]
Get rid of useless VIR_STORAGE_FILE_FEATURE_NONE
It's not used anywhere except for the switch in
virStorageBackendCreateQemuImgOpts, where leaving it in causes
a dead code coverity warning and omitting it breaks compilation
because of unhandled enum value.
Jim Fehlig [Thu, 20 Jun 2013 17:38:37 +0000 (11:38 -0600)]
libxl: Allow libxl to set NIC devid
libxl contains logic to determine an appropriate devid for new devices
that do not specify one in their configuration. For all device types
except NICs, the libxl driver allows libxl to determine devid. Do the
same for NICs.
Ján Tomko [Thu, 16 May 2013 10:38:26 +0000 (12:38 +0200)]
conf: add features to volume target XML
Add <features> and <compat> elements to volume target XML.
<compat> is a string which for qcow2 represents the QEMU version
it should be compatible with. Valid values are 0.10 and 1.1.
1.1 is implicit if the <features> element is present, otherwise
qemu-img default is used. 0.10 can be specified to explicitly
create older images after the qemu-img default changes.
<features> contains optional features, so far
<lazy_refcounts/> is available, which enables caching of reference
counters, improving performance for snapshots.
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
When checking for a collision of a new libvirt network's subnet with
any existing routes, we read all of /proc/net/route into memory, then
parse all the entries. The function that we use to read this file
requires a "maximum length" parameter, which had previously been set
to 64*1024. As each line in /proc/net/route is 128 bytes, this would
allow for a maximum of 512 entries in the routing table.
This patch increases that number to 128 * 100000, which allows for
100,000 routing table entries. This means that it's possible that 12MB
would be allocated, but that would only happen if there really were
100,000 route table entries on the system, it's only held for a very
short time.
Since there is no method of specifying and unlimited max (and that
would create a potential denial of service anyway) hopefully this
limit is large enough to accomodate everyone.
Michal Privoznik [Thu, 20 Jun 2013 16:27:35 +0000 (18:27 +0200)]
qemuDomainChangeGraphics: Check listen address change by listen type
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
it's type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case it's type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in 7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compare them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
libxl: populate xenstore memory entries at startup, handle dom0_mem
libxl uses some xenstore entries for hints in memory management
(especially when starting new domain). This includes dom0 memory limit
and Xen free memory margin, based on current system state. Entries are
created at first function usage, so force such call at daemon startup,
which most likely will be before any domain startup.
Also prevent automatic memory management if dom0_mem= option passed to
xen hypervisor - it is known to be incompatible with autoballoon.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
John Ferlan [Tue, 28 May 2013 19:00:59 +0000 (15:00 -0400)]
lxc: Resolve issue with GetScheduler APIs for non running domain
As a consequence of the cgroup layout changes from commit 'cfed9ad4', the
lxcDomainGetSchedulerParameters[Flags]()' and lxcGetSchedulerType() APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
John Ferlan [Tue, 28 May 2013 18:52:39 +0000 (14:52 -0400)]
qemu: Resolve issue with GetScheduler APIs for non running domain
As a consequence of the cgroup layout changes from commit '632f78ca', the
qemuDomainGetSchedulerParameters[Flags]()' and qemuGetSchedulerType() APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Ján Tomko [Thu, 16 May 2013 06:24:56 +0000 (08:24 +0200)]
conf: split out snapshot disk XML formatting
Just to reduce the indentation levels. Remove the unneeded
NULL check for disk->file, as virBufferEscapeString doesn't
print anything with NULL arguments.
This flag is meant for errors happening on the source of the migration
and isn't used on the destination. To allow better migration
compatibility, don't propagate it to the destination.
Peter Krempa [Wed, 12 Jun 2013 14:11:21 +0000 (16:11 +0200)]
migration: Make erroring out on I/O error controllable by flag
Paolo Bonzini pointed out that it's actually possible to migrate a qemu
instance that was paused due to I/O error and it will be able to work on
the destination if the storage is accessible.
This patch introduces flag VIR_MIGRATE_ABORT_ON_ERROR that cancels the
migration in case an I/O error happens while it's being performed and
allows migration without this flag. This flag can be possibly used for
other error reasons that may be introduced in the future.
Michal Privoznik [Mon, 10 Jun 2013 13:35:03 +0000 (15:35 +0200)]
qemu_migration: Move waiting for SPICE migration
Currently, we wait for SPICE to migrate in the very same loop where we
wait for qemu to migrate. This has a disadvantage of slowing seamless
migration down. One one hand, we should not kill the domain until all
SPICE data has been migrated. On the other hand, there is no need to
wait in the very same loop and hence slowing down 'cont' on the
destination. For instance, if users are watching a movie, they can
experience the movie to be stopped for a couple of seconds, as
processors are not running nor on src nor on dst as libvirt waits for
SPICE to migrate. We should move the waiting phase to migration CONFIRM
phase.
Osier Yang [Mon, 3 Jun 2013 10:05:32 +0000 (18:05 +0800)]
nodedev_udev: Enumerate scsi generic device
Since scsi generic device doesn't have DEVTYPE property set, the
only way to know if it's a scsi generic device or not is to read
the "SUBSYSTEM" property.
Guannan Ren [Tue, 18 Jun 2013 06:09:20 +0000 (14:09 +0800)]
qemu: set QEMU_CAPS_DEVICE_VIDEO_PRIMARY cap flag in QMP detection
When qemu >= 1.20, it is safe to use -device for primary video
device as described in 4c993d8ab.
So, we are missing the cap flag in QMP capabilities detection, this
flag can be initialized safely in virQEMUCapsInitQMPBasic.
Osier Yang [Mon, 3 Jun 2013 10:05:31 +0000 (18:05 +0800)]
nodedev_udev: Refactor udevGetDeviceType
Checking if the "devtype" is NULL along with each "if" statements
is bad. It wastes the performance, and also not good for reading.
And also when the "devtype" is NULL, the logic is also not clear.
This reorgnizes the logic of with "if...else" and a bunch of "else if".
Other changes:
* Change the function style.
* Remove the useless debug statement.
* Get rid of the goto
* New helper udevDeviceHasProperty to simplify the logic for checking
if a property is existing for the device.
* Add comment to clarify "PCI devices don't set the DEVTYPE property"
* s/sysfs path/sysfs name/, as udev_device_get_sysname returns the
name instead of the full path. E.g. "sg0"
* Refactor the comment for setting VIR_NODE_DEV_CAP_NET cap type
a bit.
Osier Yang [Mon, 3 Jun 2013 10:05:30 +0000 (18:05 +0800)]
nodedev: Expose sysfs path of device
The name format is constructed by libvirt, it's not that clear to
get what the device's sysfs path should be. This exposes the device's
sysfs path by a new tag <path>.
Since the sysfspath is filled during enumerating the devices by
either udev or HAL. It's an output-only tag.
Peter Krempa [Mon, 10 Jun 2013 12:41:22 +0000 (14:41 +0200)]
remote: Forbid default "/session" connections when using ssh transport
Without the socket path explicitly specified, the remote driver tried to
connect to the "/system" instance socket even if "/session" was
specified in the uri. With this patch this configuration now produces an
error.
It is still possible to initiate a session connection with specifying
the path to the socket manually and also manually starting the session
daemon. This was also possible prior to this patch,
This is a minimal fix. We may decide to support remote session
connections using ssh but this will require changes to the remote driver
code so this fix shouldn't cause regressions in the case we decide to do
that.
Frediano Ziglio [Thu, 13 Jun 2013 15:35:11 +0000 (16:35 +0100)]
Implement dispose method for libxlDomainObjPrivate
When creating a timer/event handler reference counting is used. So it could
be possible (in theory) that libxlDomainObjPrivateFree is called with
reference counting >1. The problem is that libxlDomainObjPrivateFree leave
the object in an invalid state with ctx freed (but still having dandling
pointer). This can lead timer/event handler to core.
This patch implements a dispose method for libxlDomainObjPrivate, and moves
freeing the libxl ctx to the dispose method, ensuring the ctx is valid while
the object's reference count is > 0.
libxl: allow only 'ethernet' and 'bridge' interfaces, allow script there
Actually only those interface types are handled correctly so reject
others instead of ignoring settings (i.e. treating as bridge/ethernet
anyway).
Also allow <script/> in 'ethernet' (which should be the only
script-allowing type). Keep <script/> allowed in bridge to be compatible
with legacy 'xen' driver.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Ján Tomko [Tue, 11 Jun 2013 13:03:17 +0000 (15:03 +0200)]
qemu: allow restore with non-migratable XML input
Convert input XML to migratable before using it in
qemuDomainSaveImageOpen.
XML in the save image is migratable, i.e. doesn't contain implicit
controllers. If these controllers were in a non-default order in the
input XML, the ABI check would fail. Removing and re-adding these
controllers fixes it.
Jim Fehlig [Fri, 17 May 2013 15:30:08 +0000 (09:30 -0600)]
libxl: set bootloader for PV domains if not specified
The legacy xen toolstack will set pygrub as the bootloader if not
specified. For compatibility, do the same in the libxl driver
iff not using direct kernel boot.
Jim Fehlig [Fri, 17 May 2013 15:17:03 +0000 (09:17 -0600)]
libxl: Report connect type as Xen
Currently, the libxl driver reports a connection type of "xenlight".
To be compatible with the legacy Xen driver, it should return "Xen".
Note: I noticed this while testing the libxl driver on OpenStack.
After switching my Xen compute nodes to use the libxl stack, I
could no longer launch instances on those nodes since
hypervisor_type was reported as "xenlight" instead of "xen".