Eric Blake [Tue, 14 Apr 2015 19:15:06 +0000 (13:15 -0600)]
build: include correct header for time()
Found by ./autobuild.sh during a mingw cross-compile:
Commit 8a96e87 was not innocuous - glibc happens to leak the
definition of time() through other headers, so that even without
<sys/select.h>, virrandom.c compiled just fine. But on mingw,
we were not so lucky; <sys/select.h> was important for its side
effect of dragging in <time.h>, and we now have nothing providing
the declaration of time():
../../src/util/virrandom.c: In function 'virRandomOnceInit':
../../src/util/virrandom.c:65:5: error: implicit declaration of function 'time' [-Werror=implicit-function-declaration]
unsigned int seed = time(NULL) ^ getpid();
^
../../src/util/virrandom.c:65:5: error: nested extern declaration of 'time' [-Werror=nested-externs]
John Ferlan [Tue, 14 Apr 2015 11:28:57 +0000 (07:28 -0400)]
qemu: Adjust the prototype to match the function
Changing the prototype to not have "int *index" since we'll soon be
disallowing index as a name. Curiously the original commit (a4504ac)
for the function used 'int idx' in the function - so they didn't match.
Now they do.
Huanle Han [Thu, 2 Apr 2015 15:56:19 +0000 (23:56 +0800)]
hostdev: fix loop index error when resetvfnetconfig
The variable 'last_processed_hostdev_vf' indicates index of the last
successfully configed vf. When resetvfnetconfig because of failure,
hostdevs[last_processed_hostdev_vf] should also be reset.
Huanle Han [Tue, 7 Apr 2015 18:40:15 +0000 (02:40 +0800)]
qemu: fix index error when clean up vport profile
1. 'last_good_net' indicates the index of last successfully configured
net. so def->nets[last_good_net] should also be clean up if error occurs.
2. if error occurs in 'virNetDevMacVLanVPortProfileRegisterCallback'
(second 'goto err_exit' in loop), we should also do
'virNetDevVPortProfileDisassociate' cleanup for the
'virNetDevVPortProfileAssociate'(first code block in loop). So we should
consider the net is successfully configured after first code block in
loop finishes.
Peter Krempa [Wed, 1 Apr 2015 17:00:20 +0000 (19:00 +0200)]
qemu: drivePivot: Fix assumption when 'block-job-complete' fails
QEMU does not abandon the mirror. The job carries on in the synchronised
phase and it might be either pivoted again or cancelled. The commit
hints that the described behavior was happening in a downstream version.
If the command returns false there are two possible options:
1) qemu did not reach the point where it would ask the block job to
pivot
2) pivotting failed in the actual qemu coroutine
If either of those would happen we return failure and reset the
condition that waits for the block job to complete. This makes the API
fail but in case where qemu would actually abandon the mirror the fact
is notified via the event and handled asynchronously.
Peter Krempa [Wed, 1 Apr 2015 07:47:04 +0000 (09:47 +0200)]
qemu: blockjob: Split qemuDomainBlockJobSetSpeed from qemuDomainBlockJobImpl
qemuDomainBlockJobImpl become an unmaintainable mess over the years of
adding new stuff to it. This patch starts splitting up individual
functions from it until it can be killed entirely.
In bulk this will add lines of code rather than delete them but it will
be traded for maintainability.
Peter Krempa [Tue, 31 Mar 2015 15:13:21 +0000 (17:13 +0200)]
qemu: monitor: Extract handling of JSON block job error codes
My intention is to split qemuMonitorJSONBlockJob() into simpler separate
functions for every block job type. Since the error handling code is the
same for all block jobs, this patch extracts the code into a separate
function that will later be reused in more places.
With the new helper qemuMonitorJSONErrorIsClass we can save a few
function calls as we can extract the error object once.
Peter Krempa [Thu, 9 Apr 2015 09:26:43 +0000 (11:26 +0200)]
qemu: monitor: json: Refactor error code class checker
Split out the function that checks the actual error class string into a
separate helper as it will be useful later and refactor
qemuMonitorJSONHasError to return bool type and remove few useless
checks.
Basically virJSONValueObjectHasKey are useless here since the next call
to virJSONValueObjectGet is checking the return value again (which can't
fail at that point). By removing the first check we save a function
call.
Peter Krempa [Tue, 7 Apr 2015 18:44:15 +0000 (20:44 +0200)]
qemu: Fix condition for checking vcpu when pinning vcpus
Previously we checked that the vcpu we are trying to set is in range of
the number of threads presented by qemu. The problem is that if the VM
is offline the count is 0. Since the condition subtracted 1 from the
count the number would overflow and the check would never trigger.
Change the condition for more sensible ones with specific error
messages.
Peter Krempa [Tue, 7 Apr 2015 18:09:04 +0000 (20:09 +0200)]
conf: Refactor virDomainVcpuPinDefParseXML
Refactor the code to parse the vcpupin in a similar way the iothreadpin
code is now structured. This allows to get rid of some very strange
conditions and error messages.
Additionally since a existing bug
( https://bugzilla.redhat.com/show_bug.cgi?id=1208434 ) allows to add
vcpupin definitions for vcpus that don't exist, this patch makes the
parser to ignore all vcpupins that don't have a matching vCPU in the
definition rather than just offlined ones.
Peter Krempa [Fri, 10 Apr 2015 08:48:34 +0000 (10:48 +0200)]
lib: snapshot: Explain that only one layer of images is inserted
When creating a snapshot with _REUSE_EXTERNAL when the pre-created image
does not directly link to the current active layer libvirt would
re-detect the backing chain incorrectly and it would not match with
qemu's view. Since the configuration is an operator mistake, document
that only the top layer image gets inserted.
Parallels Cloud Server supports block devices and virtual NIC
live attachment. I implemented that function for block devices so
OpenStack volume attachment is now works.
Signed-off-by: Alexander Burluka <aburluka@parallels.com>
OpenStack needs disk serial number setup because
nova boot --block-device-mapping command generates that param in
libvirt xml. I took QEMU libvirt driver behavior as a base.
QEMU driver skips inability to set serial and continues work.
So Parallels driver will ignore this param too and let domain
boot.
Erik Skultety [Fri, 10 Apr 2015 09:11:21 +0000 (11:11 +0200)]
virBitmap: Place virBitmapIsAllClear check after virBitmapParse calls
This patch adds checks for empty bitmaps right after the calls of
virBitmapParse. These only include spots where set API's are called and
where domain's XML is parsed.
Also, it partially reverts commit 983f5a which added a check for
invalid nodeset "0,^0" into virBitmapParse function. This change broke
the logic, as an empty bitmap should not cause an error.
When pre-creating storage for domains, we need to find corresponding
disk in the XML on the destination (domain XML may differ there, e.g.
disk is accessible under different path). For better debugging, I'm
printing all info I received on a disk. But there was a typo when
printing the disk capacity: "%lluu" instead of "%llu".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Xing Lin [Thu, 9 Apr 2015 22:02:02 +0000 (16:02 -0600)]
qemu_migration.c: sleep first before checking for migration status.
The problem with the previous implementation is,
even when qemuMigrationUpdateJobStatus() detects a migration job
has completed, it will do a sleep for 50 ms (which is unnecessary
and only adds up to the VM pause time).
Signed-off-by: Xing Lin <xinglin@cs.utah.edu> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
John Ferlan [Wed, 8 Apr 2015 13:25:47 +0000 (09:25 -0400)]
qemu: qemuDomainHotplugVcpus - separate out pin adjustment code
Future IOThread setting patches would copy the code anyway, so create
and generalize the adding of pindef for the vcpu and the pinning of the
thread into their own APIs.
Luyao Huang [Tue, 10 Mar 2015 00:04:58 +0000 (20:04 -0400)]
util: Update virNetDevGetIPAddress to get IPv6 addresses
Add static virNetDevGetifaddrsAddress to attempt to get the interface
IP address. If getifaddrs is not supported, fall back to
virNetDevGetIPv4AddressIoctl to get the IP address.
This allows IPv6 addresses to be used for <listen type='network>
with device-backed networks.
John Ferlan [Tue, 10 Mar 2015 00:04:57 +0000 (20:04 -0400)]
util: Replace virNetDevGetIPv4Address with virNetDevGetIPAddress
Rename it to virNetDevGetIPv4AddressIoctl and make
virNetDevGetIPAddress a wrapper around it, allowing
other ways of getting the address to be implemented,
and still falling back to the old method.
Signed-off-by: John Ferlan <jferlan@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com>
parallels: fix virDomainDefineXML for domain in saved state
PCS doesn't store domain config in managed save state file.
It's forbidden to change config for VMs in this state.
It's possible to change config for containers, but after
restoring domain will have that new config, not a config,
which domain had at the moment of virDomainManagedSave.
So we need to handle this case differently from other states.
Let's forbid this operation, if config is changed and if it's
not changed - just do nothing.
Openstack/nova calls virDomainDefineXML on resume with
current domain config, so we can't forbid this operation
in managed save state.
conf: return proper default video type for parallels
Fix function virDomainVideoDefaultType for
parallels VMs and containers. It should return
VGA for VMs and VIR_DOMAIN_VIDEO_TYPE_PARALLELS
for containers.
conf: add VIR_DOMAIN_VIDEO_TYPE_PARALLELS video type
We support VNC for containers to have the same
interface with VMs. At this moment it just renders
linux text console.
Of course we don't pass any physical devices and
don't emulate virtual devices. Our VNC server
renders text from terminal master and sends
input events from VNC client to terminal.
So add special video type VIR_DOMAIN_VIDEO_TYPE_PARALLELS
for these pseudo-devices.
We handle this parameter for VMs while defining
domains, so let's get this property from PCS and
set corresponding field of virDomainNetDef in
prlsdkLoadDomains function.
Call virDomainDefAddImplicitControllers to add disk
controllers, so virDomainDef, filled by this function
will look exactly like the one returned by virDomainDefParseString.
Implement virDomainManagedSave api function. In PCS
this feature called "suspend". You can suspend VM or
CT while it is in running or paused state. And after
resuming (or starting) it will have the same state, as
before suspend.
Split function prlsdkDomainChangeState into
prlsdkDomainChangeStateLocked and prlsdkDomainChangeState.
So it can be used from places, where virDomainObj already
found and locked.
Return value of functions prlsdkStart/Kill/Stop e.t.c.
is PRL_RESULT in parallels_sdk.c and int in parallels_sdk.h.
PRL_RESULT is int, so compiler didn't report errors.
Let's fix the difference.
If the backend driver updates the pool available and/or allocation values,
then the storage_driver VolCreateXML, VolCreateXMLFrom, and VolDelete APIs
should not change the value; otherwise, it will appear as if the values
were "doubled" for each change. Additionally since unsigned arithmetic will
be used depending on the size and operation, either or both values could be
appear to be much larger than they should be (in the EiB range).
Currently only the disk pool updates the values, but other pools could.
Assume a "fresh" disk pool of 500 MiB using /dev/sde:
John Ferlan [Fri, 27 Mar 2015 15:04:22 +0000 (11:04 -0400)]
storage: Need to update freeExtent at delete primary partition
Commit id '471e1c4e' only considered updating the pool if the extended
partition was removed. As it turns out removing a primary partition
would also need to update the freeExtent list otherwise the following
sequence would fail (assuming a "fresh" disk pool for /dev/sde of 500M):
When creating a volume in a pool, the creation allows the 'capacity'
value to be larger than the available space in the pool. As long as
the 'allocation' value will fit in the space, the volume will be created.
However, resizing the volume checks were made with the new absolute
capacity value against existing capacity + the available space without
regard for whether the new absolute capacity was actually allocating
space or not. For example, a pool with 75G of available space creates
a volume of 10G using a capacity of 100G and allocation of 10G will succeed;
however, if the allocation used a capacity of 10G instead and then tried
to resize the allocation to 100G the code would fail to allow the backend
to try the resize.
Furthermore, when updating the pool "available" and "allocation" values,
the resize code would just "blindly" adjust them regardless of whether
space was "allocated" or just "capacity" was being adjusted. This left
a scenario whereby a resize to 100G would fail; however, a resize to 50G
followed by one to 100G would both succeed. Again, neither was adjusting
the allocation value, just the "capacity" value.
This patch adds more logic to the resize code to understand whether the
new capacity value is actually "allocating" space as well and whether it
shrinking or expanding. Since unsigned arithmatic is involved, the possibility
that we adjust the pool size values incorrectly is probable.
This patch also ensures that updates to the pool values only occur if we
actually performed the allocation.
NB: The storageVolDelete, storageVolCreateXML, and storageVolCreateXMLFrom
each only updates the pool allocation/availability values by the target
volume allocation value.
Peter Krempa [Wed, 1 Apr 2015 08:13:34 +0000 (10:13 +0200)]
qemu: blockPivot: Don't pause the VM any more since we don't use drive-reopen
Support for drive-reopen was never present in the upstream code so we
don't need to pause the VM when doing the block pivot. Kill all the
code related to this semi-upstream artifact.
Open /proc/PID/ns/* read-only to avoid getting permission denied
lxc-enter-namespace stopped working on recent kernels (at least 3.19+)
due to /proc/PID/ns/* file descriptors being opened RW. From outside
the namespace these can only be opened RO.
lxc: create the required directories upon driver start
/var/run may reside on a tmpfs and we fail to create the PID file if
/var/run/lxc does not exist.
Since commit 0a8addc1, the lxc driver's state directory isn't
automatically created before starting a domain. Now, the lxc driver
makes sure the state directory exists when it initializes.
Peter Krempa [Wed, 8 Apr 2015 08:57:07 +0000 (10:57 +0200)]
util: file: Don't carelessly sanitize URIs
rfc3986 states that the separator in URI path is a single slash.
Multiple slashes may potentially lead to different resources and thus we
should not remove them.
Like we are doing in qemu driver (ea576ee543d6fb95583), lets call
virNumaSetupMemoryPolicy() only if really needed. Problem is, if
we numa_set_membind() child, there's no way to change it from the
daemon afterwards. So any later attempts to change the pinning
will fail. But in very weird way - CGroups will be set, but due
to membind child will not allocate memory from any other node.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
131,088 bytes in 16 blocks are definitely lost in loss record 2,174 of 2,176
at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x52A026F: virReallocN (viralloc.c:245)
by 0x52BFCB5: saferead_lim (virfile.c:1268)
by 0x52C00EF: virFileReadLimFD (virfile.c:1328)
by 0x52C019A: virFileReadAll (virfile.c:1351)
by 0x52A5D4F: virCgroupGetValueStr (vircgroup.c:763)
by 0x1DDA0DA3: qemuRestoreCgroupState (qemu_cgroup.c:805)
by 0x1DDA0DA3: qemuConnectCgroup (qemu_cgroup.c:857)
by 0x1DDB7BA1: qemuProcessReconnect (qemu_process.c:3694)
by 0x52FD171: virThreadHelper (virthread.c:206)
by 0x82B8DF4: start_thread (pthread_create.c:308)
by 0x85C31AC: clone (clone.S:113)
Since the holdtime is not supported by VBOX SDK, it's being simulated
by sleeping before sending the key-up codes. The key-up codes are
auto-generated based on XT codeset rules (adding of 0x80 to key-down)
which results in the same behavior as for QEMU implementation.
Oh little me, said the domain, what will I do with so little memory.
If I only had a few megabytes more. But the old admin noticed the
whimpering, barely audible to untrained human ear. And good admin he
was, he gave the domain yet more memory. But the old NUMA topology
witch forbade to allocate more memory on the node zero. So he
decided to allocate it on a different node:
The little domain was happy. For a while. Until bad, sharp teeth
shaped creature came. Every process in the system was afraid of him.
The OOM Killer they called him. Oh no, he's after the little domain.
There's no escape.
Do you kids know why? Because when the little domain was born, her
father, Libvirt, called numa_set_membind(). So even if the admin
allowed her to allocate memory from other nodes in the cgroups, the
membind() forbid it.
So what's the lesson? Libvirt should rely on cgroups, whenever
possible and use numa_set_membind() as the last ditch effort.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Tue, 31 Mar 2015 09:39:13 +0000 (11:39 +0200)]
vircgroup: Introduce virCgroupControllerAvailable
This new internal API checks if given CGroup controller is
available. It is going to be needed later when we need to make a
decision whether pin domain memory onto NUMA nodes using cpuset
CGroup controller or using numa_set_membind().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michael Chapman [Wed, 8 Apr 2015 06:51:51 +0000 (16:51 +1000)]
qemu_driver: check caps after starting block job
Currently we check qemuCaps before starting the block job. But qemuCaps
isn't available on a stopped domain, which means we get a misleading
error message in this case:
# virsh domstate example
shut off
# virsh blockjob example vda
error: unsupported configuration: block jobs not supported with this QEMU binary
Move the qemuCaps check into the block job so that we are guaranteed the
domain is running.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Michael Chapman [Wed, 8 Apr 2015 06:51:42 +0000 (16:51 +1000)]
qemu_migrate: use nested job when adding NBD to cookie
qemuMigrationCookieAddNBD is usually called from within an async
MIGRATION_OUT or MIGRATION_IN job, so it needs to start a nested job.
(The one exception is during the Begin phase when change protection
isn't enabled, but qemuDomainObjEnterMonitorAsync will behave the same
as qemuDomainObjEnterMonitor in this case.)
This bug was encountered with a libvirt client that repeatedly queries
the disk mirroring block job info during a migration. If one of these
queries occurs just as the Perform migration cookie is baked, libvirt
crashes.
Relevant logs are as follows:
6701: warning : qemuDomainObjEnterMonitorInternal:1544 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
[1] 6701: info : qemuMonitorSend:972 : QEMU_MONITOR_SEND_MSG: mon=0x7fefdc004700 msg={"execute":"query-block","id":"libvirt-629"}
[2] 6699: info : qemuMonitorIOWrite:503 : QEMU_MONITOR_IO_WRITE: mon=0x7fefdc004700 buf={"execute":"query-block","id":"libvirt-629"}
[3] 6704: info : qemuMonitorSend:972 : QEMU_MONITOR_SEND_MSG: mon=0x7fefdc004700 msg={"execute":"query-block-jobs","id":"libvirt-630"}
[4] 6699: info : qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_REPLY: mon=0x7fefdc004700 reply={"return": [...], "id": "libvirt-629"}
6699: error : qemuMonitorJSONIOProcessLine:211 : internal error: Unexpected JSON reply '{"return": [...], "id": "libvirt-629"}'
At [1] qemuMonitorBlockStatsUpdateCapacity sends its request, then waits
on mon->notify. At [2] the request is written out to the monitor socket.
At [3] qemuMonitorBlockJobInfo sends its request, and also waits on
mon->notify. The reply from the first request is received at [4].
However, qemuMonitorJSONIOProcessLine is not expecting this reply since
the second request hadn't completed sending. The reply is dropped and an
error is returned.
qemuMonitorIO signals mon->notify twice during its error handling,
waking up both of the threads waiting on it. One of them clears mon->msg
as it exits qemuMonitorSend; the other crashes:
qemuMonitorSend (mon=0x7fefdc004700, msg=<value optimized out>) at qemu/qemu_monitor.c:975
975 while (!mon->msg->finished) {
(gdb) print mon->msg
$1 = (qemuMonitorMessagePtr) 0x0
Signed-off-by: Michael Chapman <mike@very.puzzling.org>