Igor Mammedov [Fri, 1 Jul 2016 15:53:56 +0000 (17:53 +0200)]
apic: Use apic_id as apic's migration instance_id
instance_id is generated as last_used_id + 1 for a given device type,
so for QEMU with 3 CPUs the instance_ids for APICs are the set [0, 1, 2].
When the CPU in the middle is hot-removed and migration is started,
APICs with instance_ids 0 and 2 are transferred in the migration stream.
However, the target starts with 2 CPUs and the APICs' instance_ids are
generated from scratch as [0, 1], hence migration fails with the error:
Unknown savevm section or instance 'apic' 2
Fix the issue by manually registering the APIC's vmsd with apic_id as
instance_id. In this case the instance_id on the target will always
match the instance_id on the source, as apic_id is the same for a given
CPU instance.
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
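A minimal, self-contained simulation of the failure described above, assuming a migration section is matched purely by its instance_id. The helpers (`assign_sequential_ids`, `target_accepts`, `migration_succeeds`) are hypothetical and are not QEMU's savevm API; they only model the two ID schemes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential scheme: instance_id = last_used_id + 1, in creation order. */
static void assign_sequential_ids(uint32_t *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ids[i] = (uint32_t)i;
    }
}

/* True if 'id' is one of the instance_ids registered on the target. */
static bool target_accepts(const uint32_t *target, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (target[i] == id) {
            return true;
        }
    }
    return false;
}

/* Every section in the stream must find a matching instance on the target. */
static bool migration_succeeds(const uint32_t *stream, size_t n_stream,
                               const uint32_t *target, size_t n_target)
{
    for (size_t i = 0; i < n_stream; i++) {
        if (!target_accepts(target, n_target, stream[i])) {
            return false;   /* "Unknown savevm section or instance" */
        }
    }
    return true;
}
```

With the source stream {0, 2} (CPU 1 unplugged), a target that numbers its two APICs sequentially as {0, 1} rejects section 2, while a target that uses apic_id {0, 2} as instance_id accepts both.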
Igor Mammedov [Thu, 14 Jul 2016 14:58:02 +0000 (16:58 +0200)]
apic: kvm-apic: Fix crash due to access to freed memory region
The kvm-apic.io_memory memory region had its parent set to NULL at
memory_region_init_io() time, so it ended up as a child in the
/unattached container.
As a result, when a kvm-apic instance was deleted, the child property
/unattached/kvm-apic-msi[XXX] still contained a reference to the
kvm-apic.io_memory address, which had been freed as part of kvm-apic.
Do the same as 'apic' and make the kvm-apic instance the owner
of the memory region, so that it doesn't end up in /unattached
and is cleanly released along with the related kvm-apic instance.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Igor Mammedov [Fri, 27 May 2016 11:50:48 +0000 (13:50 +0200)]
pc: cpu: Allow device_add to be used with x86 cpu
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Igor Mammedov [Mon, 18 Jul 2016 08:32:36 +0000 (10:32 +0200)]
pc: Enforce adding CPUs contiguously and removing them in opposite order
This still allows us to use cpu_index as the migration instance_id,
since when CPUs are added contiguously (from the first to the last)
and removed in the opposite order, cpu_index stays stable and is
reproducible on the destination side.
While there is work in progress to support migration when there
are holes in the cpu_index range resulting from out-of-order plug or
unplug, this patch is intended as an interim solution until
cpu_index usage is cleaned up.
As a result of this patch it is possible to plug/unplug CPUs,
but only in a restricted order that doesn't break migration.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
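The ordering rule above can be sketched as two checks, assuming the plugged CPUs always occupy indexes 0..n_present-1. `CpuSlots`, `can_plug` and `can_unplug` are hypothetical names for illustration, not the functions in QEMU's pc machine code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: CPUs 0..n_present-1 are plugged, the rest are free. */
typedef struct {
    size_t n_present;
    size_t max_cpus;
} CpuSlots;

/* Plugging is only allowed into the first free slot (no holes). */
static bool can_plug(const CpuSlots *s, size_t cpu_index)
{
    assert(s->n_present <= s->max_cpus);
    return cpu_index == s->n_present && cpu_index < s->max_cpus;
}

/* Unplugging is only allowed for the most recently added CPU. */
static bool can_unplug(const CpuSlots *s, size_t cpu_index)
{
    return s->n_present > 0 && cpu_index == s->n_present - 1;
}
```

Because every accepted operation keeps the plugged set contiguous, cpu_index on the destination, which is rebuilt from scratch, matches the source.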
Igor Mammedov [Mon, 18 Jul 2016 08:31:22 +0000 (10:31 +0200)]
pc: Forbid BSP removal
The boot CPU is assumed to always be present in QEMU code, so
until those assumptions are gone, deny removal requests.
In other words, QEMU won't support BSP hot-unplug.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Igor Mammedov [Thu, 14 Jul 2016 16:54:31 +0000 (18:54 +0200)]
pc: Delay setting number of boot CPUs to machine_done time
Currently the present-CPUs counter in CMOS only contains
smp_cpus (i.e. the initial CPUs specified with -smp X) and
doesn't account for CPUs created with -device.
If a VM is started with additional CPUs added with
-device, it will hang in the BIOS waiting for the condition
smp_cpus == counted_cpus
forever, as counted_cpus will include the -device CPUs as well
and be larger than smp_cpus.
Make the present-CPUs counter in CMOS count all CPUs
(initial and coldplugged with -device) by delaying
its setup to machine-done time, when it is possible to count
CPUs added with -device.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Igor Mammedov [Wed, 6 Jul 2016 06:20:52 +0000 (08:20 +0200)]
target-i386: cpu: Do not ignore error and fix apic parent
object_property_add_child() silently fails with an error saying it can't
create the duplicate property 'apic', as we already have an 'apic' property
registered for the 'apic' feature. As a result, generic device_realize puts
the apic into the unattached container.
As it's a programming error, abort if a name collision happens in the future,
and change the property name for apic_state to 'lapic'; this way the apic is
a child of the cpu instance.
Paolo Bonzini [Tue, 12 Jul 2016 09:15:44 +0000 (11:15 +0200)]
target-i386: Add support for UMIP and RDPID CPUID bits
These are both stored in CPUID[EAX=7,ECX=0].ECX. KVM is going to
be able to emulate both (albeit with a performance loss in the case
of RDPID, which therefore will be in KVM_GET_EMULATED_CPUID rather
than KVM_GET_SUPPORTED_CPUID).
It's also possible to implement both in TCG, but that is left for 2.8.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
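A small decoder for the two bits, assuming the bit positions from the Intel SDM (UMIP is bit 2 and RDPID is bit 22 of CPUID leaf 7, subleaf 0, ECX). The macro names follow QEMU's naming style but are defined locally here; `has_umip`/`has_rdpid` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7, ECX=0):ECX. */
#define CPUID_7_0_ECX_UMIP  (1u << 2)
#define CPUID_7_0_ECX_RDPID (1u << 22)

static bool has_umip(uint32_t ecx)
{
    return (ecx & CPUID_7_0_ECX_UMIP) != 0;
}

static bool has_rdpid(uint32_t ecx)
{
    return (ecx & CPUID_7_0_ECX_RDPID) != 0;
}
```

On an x86 host the ECX value would come from executing CPUID with EAX=7 and ECX=0; the functions themselves are pure so they can be exercised with any value.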
Igor Mammedov [Wed, 6 Jul 2016 06:20:41 +0000 (08:20 +0200)]
target-i386: Replace custom apic-id setter/getter with static property
The custom apic-id setter/getter doesn't do any property-specific
checks anymore, so clean it up and use the more compact static
property DEFINE_PROP_UINT32 instead.
Igor Mammedov [Wed, 6 Jul 2016 06:20:40 +0000 (08:20 +0200)]
pc: cpu: Consolidate apic-id validity checks in pc_cpu_pre_plug()
Machine code knows about all possible APIC IDs, so use that
instead of a hack that does O(n^2) duplicate checks by
iterating over the global CPU list.
As a result, the duplicate check is done only once, with O(log n) complexity.
target-i386: Set physical address bits based on host
Add the host-phys-bits boolean property; if true, take phys-bits
from the host's physical bits value, overriding either the default
or the user-specified value.
We can also use the value we read from the host to check the user's
explicitly set value and warn them if it doesn't match.
Note:
a) We only read the host's value in KVM mode (because on non-x86
we get an abort if we try).
b) We don't warn about trying to use host-phys-bits in TCG mode,
we just fall back to the TCG default. This allows the machine
type to set the host-phys-bits flag if it wants and then to
work in both TCG and KVM.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
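A sketch of the selection rules described in this commit, assuming (as the phys-bits commit in this log states) that a user value of 0 means "not set" and that the default is 40. `resolve_phys_bits` is a hypothetical helper, not QEMU code, and its warning text is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DEFAULT_PHYS_BITS 40   /* the default that matches TCG */

/*
 * user_bits == 0 means "not set by the user".
 * host_bits is only meaningful when running under KVM.
 */
static unsigned resolve_phys_bits(bool kvm, bool host_phys_bits,
                                  unsigned user_bits, unsigned host_bits)
{
    if (kvm && host_phys_bits) {
        return host_bits;      /* overrides both default and user value */
    }
    if (kvm && user_bits != 0 && user_bits != host_bits) {
        fprintf(stderr, "warning: phys-bits=%u does not match host (%u)\n",
                user_bits, host_bits);
    }
    /* Under TCG host-phys-bits is silently ignored. */
    return user_bits != 0 ? user_bits : DEFAULT_PHYS_BITS;
}
```

Silently ignoring host-phys-bits under TCG is what lets a machine type set the flag unconditionally and still work in both accelerators.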
Igor Mammedov [Wed, 6 Jul 2016 06:20:38 +0000 (08:20 +0200)]
pc: Add x86_topo_ids_from_apicid()
It's the reverse of apicid_from_topo_ids() and will be used in follow-up
patches to fill in data structures for query-hotpluggable-cpus and
for user-friendly error reporting.
Igor Mammedov [Wed, 6 Jul 2016 06:20:37 +0000 (08:20 +0200)]
target-i386: Use uint32_t for X86CPU.apic_id
Redo 9886e834 (target-i386: Require APIC ID to be explicitly set before
CPU realize) in another way that doesn't use int64_t to detect
whether the apic-id property has been set.
Use the fact that 0xFFFFFFFF is the broadcast
value that a CPU can't have, and set the default
uint32_t apic_id to it instead of using int64_t.
Later, the uint32_t apic_id will be used to drop the custom
property setter/getter in favor of a static property.
Fill the bits between 51 and the number of physical address bits in the
MTRR_PHYSMASKn variable-range MTRR masks so that they're consistent
in the migration stream irrespective of the physical address space
of the source VM.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
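The fill step can be illustrated with plain bit arithmetic, assuming bit 51 is the top architectural physical address bit. `mtrr_mask_fill_high` and `bit_range` are hypothetical helpers, not QEMU's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits [lo, hi] (inclusive) set. */
static uint64_t bit_range(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi <= 63);
    uint64_t upper = (hi == 63) ? UINT64_MAX : ((UINT64_C(1) << (hi + 1)) - 1);
    return upper & ~((UINT64_C(1) << lo) - 1);
}

/* Set bits phys_bits..51 so the saved value no longer depends on phys_bits. */
static uint64_t mtrr_mask_fill_high(uint64_t physmask, unsigned phys_bits)
{
    if (phys_bits >= 52) {
        return physmask;
    }
    return physmask | bit_range(phys_bits, 51);
}
```

For example, the same 64 MiB mask saved on a 36-bit host (0x0000000FFC000800) and on a 46-bit host (0x00003FFFFC000800) both become 0x000FFFFFFC000800 after filling.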
target-i386: Mask mtrr mask based on CPU physical address limits
The CPU raises a GP fault if we try to set a bit in a variable MTRR mask
above the limit of physical address bits on the host. We hit this
when loading a migration from a host with a larger physical
address limit than our destination (e.g. a Xeon->i7 of the same
generation); we used to get away with it
until 48e1a45 started checking that MSR writes actually worked.
In our case the GP most likely comes from KVM emulating
it.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
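The masking step amounts to clearing every bit at or above the destination's physical address width before the MSR write. `mtrr_mask_clamp` is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clear bits at or above phys_bits so the MSR write cannot GP on this CPU. */
static uint64_t mtrr_mask_clamp(uint64_t physmask, unsigned phys_bits)
{
    assert(phys_bits > 0 && phys_bits < 64);
    return physmask & ((UINT64_C(1) << phys_bits) - 1);
}
```

A mask arriving from a wider host, e.g. 0x000FFFFFFC000800, becomes 0x0000007FFC000800 on a 39-bit destination; the valid bit (bit 11) and the low mask bits are preserved.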
target-i386: Allow physical address bits to be set
Currently QEMU sets the x86 number of physical address bits to the
magic number 40. This is only correct on some small AMD systems;
Intel systems tend to have 36, 39 or 46 bits, and large AMD systems
tend to have 48.
Having a value different from your actual hardware is detectable
by the guest and in principle can cause problems.
The current limit of 40 bits (1 TiB) prevents VMs with a terabyte or
more of RAM from being created by those lucky enough to have that much.
This patch lets you set the physical bits with a CPU property but
defaults to the same 40 bits, which matches TCG's setup.
I've removed the ancient warning about the 42-bit limit in exec.c;
I can't find that limit in there and no one else seems to know where
it is.
We use a magic value of 0 as the property default so that we can
later distinguish between the default and a user set value.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
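The address-space sizes behind the bit counts above are simply 2^bits bytes: 36 bits give 64 GiB, 39 bits 512 GiB, 40 bits 1 TiB, 46 bits 64 TiB and 48 bits 256 TiB. A trivial helper (hypothetical name) makes the arithmetic checkable:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of physical address bits. */
static uint64_t phys_addr_space_bytes(unsigned phys_bits)
{
    assert(phys_bits < 64);
    return UINT64_C(1) << phys_bits;
}
```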
* remotes/pmaydell/tags/pull-target-arm-20160719:
arm_gicv3: Add assert()s to tell Coverity that offsets are aligned
target-arm: Fix unreachable code in gicv3_class_name()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/riku/tags/pull-linux-user-20160719-2:
linux-user: AArch64 has sync_file_range, not sync_file_range2
linux-user: Fix type for SIOCATMARK ioctl
linux-user: define missing sparc syscalls
linux-user: Fix terminal control ioctls
linux-user: Add some new blk ioctls
linux-user: Handle short lengths in host_to_target_sockaddr()
linux-user: Forget about synchronous signal once it is delivered
linux-user: Correct type for LOOP_GET_STATUS{,64} ioctls
linux-user: Correct type for BLKSSZGET
linux-user: Add loop control ioctls
linux-user: Check sigsetsize argument to syscalls
linux-user: add nested netlink types
linux-user: convert sockaddr_ll from host to target
linux-user: add fd_trans helper in do_recvfrom()
linux-user: fix netlink memory corruption
linux-user: fd_trans_*_data() returns the length
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 11 Jul 2016 18:22:52 +0000 (19:22 +0100)]
arm_gicv3: Add assert()s to tell Coverity that offsets are aligned
Coverity complains that the GICR_IPRIORITYR case in gicv3_readl()
can overflow an array, because it doesn't know that the offsets
passed to that function must be word aligned. Add some assert()s
which hopefully tell Coverity that this isn't possible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1468261372-17508-1-git-send-email-peter.maydell@linaro.org
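The pattern looks roughly like this: an assert() stating the caller's alignment guarantee right before the array access, so both readers and static analysers can see that `offset + 3` stays in bounds. The `priority` array and `read_priority_word()` are hypothetical stand-ins, not the GICv3 code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQS 32

/* Hypothetical per-IRQ byte priorities, read four at a time as one word. */
static uint8_t priority[NUM_IRQS];

static uint32_t read_priority_word(uint32_t offset)
{
    /* Callers only pass word-aligned offsets; with offset < NUM_IRQS this
     * guarantees offset + 3 < NUM_IRQS. */
    assert((offset & 3) == 0);
    assert(offset < NUM_IRQS);
    uint32_t value = 0;
    for (int i = 3; i >= 0; i--) {
        value = (value << 8) | priority[offset + i];   /* byte 0 is lowest */
    }
    return value;
}
```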
Peter Maydell [Mon, 11 Jul 2016 18:09:12 +0000 (19:09 +0100)]
target-arm: Fix unreachable code in gicv3_class_name()
Coverity complains that the exit() in gicv3_class_name()
can be unreachable, because if TARGET_AARCH64 is defined
then all code paths return before reaching it. Move the
exit() up to the error_report() that it belongs with.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1468260552-8400-1-git-send-email-peter.maydell@linaro.org
Peter Maydell [Tue, 19 Jul 2016 14:04:36 +0000 (15:04 +0100)]
disas: Fix ATTRIBUTE_UNUSED define clash with ALSA headers
disas/bfd.h defines ATTRIBUTE_UNUSED, but unfortunately the
ALSA system headers also define this macro, which means that
you can get a compilation failure if building with ALSA and
any files happen to include the ALSA headers before bfd.h
rather than the other way around.
This is unfortunate namespace pollution by the ALSA headers but
we can work around it. Add an #ifndef guard to bfd.h and remove
the unnecessary extra definition in disas/arm.c to fix this.
Reported-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468937076-21503-1-git-send-email-peter.maydell@linaro.org
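The #ifndef guard pattern, sketched with a simulated earlier definition standing in for a system header that was included first. The exact definitions in bfd.h and in the ALSA headers are assumed here, not quoted; the sketch assumes a GCC/Clang-style `__attribute__((unused))`.

```c
#include <assert.h>

/* Simulate another header (e.g. a system one) that was included first. */
#define ATTRIBUTE_UNUSED __attribute__((__unused__))

/* Only define the macro if nobody else has, so include order no longer matters. */
#ifndef ATTRIBUTE_UNUSED
#define ATTRIBUTE_UNUSED __attribute__((unused))
#endif

static int add_one(int x, int ignored ATTRIBUTE_UNUSED)
{
    return x + 1;
}
```

Without the guard, the second #define would be a redefinition with a different token sequence, which compilers diagnose and -Werror builds reject.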
Peter Maydell [Tue, 19 Jul 2016 14:08:05 +0000 (15:08 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* two old patches from prospective GSoC students
* i386 -kernel device tree support
* Coverity fix
* memory usage improvement from Peter
* checkpatch fix
* g_path_get_dirname cleanup
* caching of block status for iSCSI
* remotes/bonzini/tags/for-upstream:
target-i386: Remove redundant HF_SOFTMMU_MASK
block/iscsi: allow caching of the allocation map
block/iscsi: fix rounding in iscsi_allocationmap_set
Move README to markdown
cpu-exec: Move down some declarations in cpu_exec()
exec: avoid realloc in phys_map_node_reserve
checkpatch: consider git extended headers valid patches
megasas: remove useless check for cmd->frame
compiler: never omit assertions if using a static analysis tool
hw/i386: add device tree support
Changed malloc to g_malloc, free to g_free in bsd-user/qemu.h
use g_path_get_dirname instead of dirname
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 15 Jul 2016 16:28:06 +0000 (17:28 +0100)]
linux-user: AArch64 has sync_file_range, not sync_file_range2
The AArch64 Linux ABI syscall 84 is sync_file_range, not
sync_file_range2 (in the kernel it uses the asm-generic
headers and does not define __ARCH_WANT_SYNC_FILE_RANGE2).
Update our TARGET_NR_* definitions accordingly.
This fixes the sync_file_range syscall which otherwise
gets its arguments in the wrong order.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
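The two syscalls take the same arguments in a different order: sync_file_range(fd, offset, nbytes, flags) versus sync_file_range2(fd, flags, offset, nbytes). A sketch of what goes wrong when the wrong decoder is used, assuming four already-assembled 64-bit guest arguments (real 32-bit ABIs split 64-bit values across registers, which is ignored here). The types and functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int fd;
    int64_t offset;
    int64_t nbytes;
    unsigned flags;
} SyncRangeArgs;

/* sync_file_range(fd, offset, nbytes, flags) */
static SyncRangeArgs decode_sync_file_range(const uint64_t a[4])
{
    return (SyncRangeArgs){ (int)a[0], (int64_t)a[1], (int64_t)a[2], (unsigned)a[3] };
}

/* sync_file_range2(fd, flags, offset, nbytes) */
static SyncRangeArgs decode_sync_file_range2(const uint64_t a[4])
{
    return (SyncRangeArgs){ (int)a[0], (int64_t)a[2], (int64_t)a[3], (unsigned)a[1] };
}
```

Decoding an AArch64 sync_file_range call with the sync_file_range2 layout would treat the offset as the flags and shift the remaining arguments, which matches the "arguments in the wrong order" symptom described above.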
Peter Maydell [Fri, 15 Jul 2016 11:09:31 +0000 (12:09 +0100)]
linux-user: Fix type for SIOCATMARK ioctl
The SIOCATMARK ioctl takes an argument which should be a
pointer to an integer where the kernel will write the result.
We were incorrectly declaring it as TYPE_NULL which would mean
it would always fail (with EFAULT) when it should succeed.
Correct the type.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Timothy Pearson [Sun, 19 Jun 2016 00:15:35 +0000 (19:15 -0500)]
linux-user: Fix terminal control ioctls
TIOCGPTN and related terminal control ioctls were not converted to the guest ioctl format on x86_64 targets. Convert these ioctls to enable terminal functionality on x86_64 guests.
Peter Maydell [Thu, 7 Jul 2016 14:44:43 +0000 (15:44 +0100)]
linux-user: Handle short lengths in host_to_target_sockaddr()
If userspace specifies a short buffer for a target sockaddr,
the kernel will only copy in as much as it has space for
(or none at all if the length is zero) -- see the kernel
move_addr_to_user() function. Mimic this in QEMU's
host_to_target_sockaddr() routine.
In particular, this fixes a segfault when running the LTP
recvfrom01 test, where the guest makes a recvfrom()
call with a bad buffer pointer and other parameters which
cause the kernel to set addrlen to zero; because we
did not skip the attempt to swap the sa_family field, we
segfaulted on the bad address.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
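A sketch of the short-length handling, assuming a simplified host sockaddr (16-bit family plus payload) and an opposite-endian guest. `HostSockaddr` and `copy_sockaddr_to_guest` are hypothetical; the point is that only `min(len, sizeof)` bytes are copied, and the family field is swapped only if it fits entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a host sockaddr: 16-bit family + payload. */
typedef struct {
    uint16_t family;
    uint8_t data[14];
} HostSockaddr;

static uint16_t bswap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Returns the number of bytes written to 'guest'. */
static size_t copy_sockaddr_to_guest(uint8_t *guest, size_t guest_len,
                                     const HostSockaddr *host)
{
    size_t len = guest_len < sizeof(*host) ? guest_len : sizeof(*host);
    if (len == 0) {
        return 0;               /* never touch 'guest', it may be invalid */
    }
    memcpy(guest, host, len);
    if (len >= offsetof(HostSockaddr, family) + sizeof(host->family)) {
        uint16_t fam = bswap16(host->family);   /* assumes cross-endian guest */
        memcpy(guest + offsetof(HostSockaddr, family), &fam, sizeof(fam));
    }
    return len;
}
```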
Peter Maydell [Wed, 6 Jul 2016 14:09:29 +0000 (15:09 +0100)]
linux-user: Forget about synchronous signal once it is delivered
Commit 655ed67c2a248cf, which switched synchronous signals to
being recorded in ts->sync_signal rather than in a queue
with every other signal, had a bug: we failed to clear
the flag indicating that a synchronous signal was pending
when we delivered it. This meant that we would take the signal
again and again every time the guest made a syscall.
(This is a bug introduced in my refactoring of Timothy Baldwin's
original code.)
Fix this by passing in the struct emulated_sigtable* to
handle_pending_signal(), so that we clear the pending flag
in the ts->sync_signal struct when handling a synchronous signal.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
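A minimal model of the fix, assuming a single pending synchronous signal per thread that is checked on every syscall return. The types and function signatures are hypothetical and much simpler than QEMU's linux-user signal code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread signal state. */
typedef struct {
    int signo;
    bool pending;
} EmulatedSig;

typedef struct {
    EmulatedSig sync_signal;
    int delivered;          /* counts deliveries, for the demo */
} TaskState;

static void handle_pending_signal(TaskState *ts, EmulatedSig *k)
{
    k->pending = false;     /* the fix: forget the signal once delivered */
    ts->delivered++;
}

/* Called on every return from a guest syscall. */
static void process_pending_signals(TaskState *ts)
{
    if (ts->sync_signal.pending) {
        handle_pending_signal(ts, &ts->sync_signal);
    }
}
```

Without the `k->pending = false` line, every later syscall would deliver the same signal again, which is the bug the commit describes.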
Peter Maydell [Tue, 5 Jul 2016 15:36:29 +0000 (16:36 +0100)]
linux-user: Correct type for LOOP_GET_STATUS{,64} ioctls
The LOOP_GET_STATUS and LOOP_GET_STATUS64 ioctls were incorrectly
defined as IOC_W rather than IOC_R, which meant we weren't
correctly copying the information back from the kernel to the guest.
The loop_info64 structure definition was also missing a member
and using the wrong type for several 32-bit fields.
In particular, this meant that "kpartx -d image.img" didn't work
and "losetup -a" behaved strangely. Correct the ioctl type definitions.
Reported-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Mon, 4 Jul 2016 16:06:18 +0000 (17:06 +0100)]
linux-user: Correct type for BLKSSZGET
The BLKSSZGET ioctl takes an argument which is a pointer to an int.
We were incorrectly declaring it to take a pointer to a long, which
meant that we would write to memory we should not touch
if the guest is a 64-bit architecture.
In particular, kpartx uses this ioctl to write to an int on the
stack, which tends to result in it crashing immediately.
Reported-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Mon, 4 Jul 2016 16:06:17 +0000 (17:06 +0100)]
linux-user: Add loop control ioctls
Add support for the /dev/loop-control ioctls:
LOOP_CTL_ADD
LOOP_CTL_REMOVE
LOOP_CTL_GET_FREE
[RV: fixed to apply to new header guards]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Peter Maydell [Thu, 30 Jun 2016 13:23:24 +0000 (14:23 +0100)]
linux-user: Check sigsetsize argument to syscalls
Many syscalls which take a sigset_t argument also take an argument
giving the size of the sigset_t. The kernel insists that this
matches its idea of the type size and fails with EINVAL if it does not.
Implement this logic in QEMU. (This mostly just means that some LTP test
cases which check error cases now pass.)
Laurent Vivier [Mon, 27 Jun 2016 16:54:30 +0000 (18:54 +0200)]
linux-user: add nested netlink types
Nested types are used by the kernel to send link information and
protocol properties.
We can see the following errors with "ip link show":
Unimplemented nested type 26
Unimplemented nested type 26
Unimplemented nested type 18
Unimplemented nested type 26
Unimplemented nested type 18
Unimplemented nested type 26
This patch implements nested types 18 (IFLA_LINKINFO) and
26 (IFLA_AF_SPEC).
Laurent Vivier [Tue, 21 Jun 2016 17:51:14 +0000 (19:51 +0200)]
linux-user: fix netlink memory corruption
The netlink code byte-swaps data in place in guest memory, which is wrong.
It's OK when the data come from the host, as they are generated by the
host.
But it doesn't work when the data come from the guest: the guest can
try to reuse the data after they have been byte-swapped.
This is what happens in glibc:
glibc generates a sequence number in nlh.nlmsg_seq and calls
sendto() with this nlh. In sendto(), we byte-swap nlmsg_seq.
Later, after recvmsg(), glibc compares nlh.nlmsg_seq with the
sequence number given in the reply, and of course the comparison fails
(the loop hangs), because nlh.nlmsg_seq is no longer valid.
The involved code in glibc is:
sysdeps/unix/sysv/linux/check_pf.c:make_request()
  ...
  req.nlh.nlmsg_seq = time (NULL);
  ...
  if (TEMP_FAILURE_RETRY (__sendto (fd, (void *) &req, sizeof (req), 0,
                                    (struct sockaddr *) &nladdr,
                                    sizeof (nladdr))) < 0)
  <here req.nlh.nlmsg_seq has been byte-swapped>
  ...
  do
    {
      ...
      ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
      ...
      struct nlmsghdr *nlmh;
      for (nlmh = (struct nlmsghdr *) buf;
           NLMSG_OK (nlmh, (size_t) read_len);
           nlmh = (struct nlmsghdr *) NLMSG_NEXT (nlmh, read_len))
        {
          <we compare nlmh->nlmsg_seq with corrupted req.nlh.nlmsg_seq>
          if (nladdr.nl_pid != 0 || (pid_t) nlmh->nlmsg_pid != pid
              || nlmh->nlmsg_seq != req.nlh.nlmsg_seq)
            continue;
          ...
          else if (nlmh->nlmsg_type == NLMSG_DONE)
            /* We found the end, leave the loop. */
            done = true;
        }
    }
  while (! done);
As we "continue" whenever "nlmh->nlmsg_seq != req.nlh.nlmsg_seq",
"done" can never be set to "true" and we have an infinite loop.
This is why commands like "apt-get update" or "dnf update" hang.
Peter Maydell [Tue, 19 Jul 2016 12:00:35 +0000 (13:00 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Tue 19 Jul 2016 03:33:40 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
e1000e: fix building without CONFIG_VMXNET3_PCI
MAINTAINERS: release Scott from being a rocker maintainer
tap: fix memory leak on failure to create a multiqueue tap device
net: fix incorrect argument to iov_to_buf
net: fix incorrect access to pointer
e1000e: fix incorrect access to pointer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/jnsnow/tags/ide-pull-request:
block: ignore flush requests when storage is clean
tests: in IDE and AHCI tests perform DMA write before flushing
ide: set retry_unit for PIO and FLUSH requests
ide: refactor retry_unit set and clear into separate function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/stefanha/tags/tracing-pull-request:
trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state
trace: Allow event name pattern in "info trace-events"
trace: Conditionally trace events based on their per-vCPU state
trace: Add per-vCPU tracing states for events with the 'vcpu' property
trace: Cosmetic changes on fast-path tracing
disas: Remove unused macro '_'
trace: Identify events with the 'vcpu' property
trace: [bsd-user] Commandline arguments to control tracing
trace: [linux-user] Commandline arguments to control tracing
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 19 Jul 2016 08:02:05 +0000 (09:02 +0100)]
Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160718.0' into staging
VFIO update 2016-07-18
One fix for 2.7-rc0 which hides the ARI extended capability, fixing
multifunction support in PCIe configurations where the assigned device
function topology does not match the host (Alex Williamson)
Peter Lieven [Mon, 18 Jul 2016 08:52:20 +0000 (10:52 +0200)]
block/iscsi: allow caching of the allocation map
Until now the allocation map was used only as a hint as to whether a cluster
is allocated or not. If a block was not allocated (or QEMU had
no info about the allocation status), a get_block_status call was
issued to check the allocation status and possibly avoid
a subsequent read of unallocated sectors. If a block was known to be
allocated, the get_block_status call was omitted. In the other case,
a get_block_status call was issued before every read to avoid
the need for a consistent allocation map. To avoid the
potential overhead of calling get_block_status for each and
every read request, this only took place for larger requests.
This patch enhances this mechanism to cache the allocation
status and avoid calling get_block_status for blocks whose
allocation status has been queried before. This allows
bypassing the read request even for smaller requests, and
additionally omits calling get_block_status for blocks known
to be unallocated.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
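A sketch of the caching idea, assuming two bitmaps: one for the allocation status and one marking which entries are known. Everything here (`AllocCache`, `query_target`, `note_write`) is hypothetical; `query_target` merely stands in for a get_block_status round-trip, and updating the cache on writes is an assumption about how such a cache stays consistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CLUSTERS 64

typedef struct {
    uint64_t allocated;     /* bit set: cluster is allocated */
    uint64_t valid;         /* bit set: status above is known */
    int status_queries;     /* how often the target had to be asked */
} AllocCache;

/* Stand-in for get_block_status: pretend even clusters are allocated. */
static bool query_target(AllocCache *c, unsigned cluster)
{
    c->status_queries++;
    return cluster % 2 == 0;
}

static bool cluster_is_allocated(AllocCache *c, unsigned cluster)
{
    assert(cluster < NUM_CLUSTERS);
    uint64_t bit = UINT64_C(1) << cluster;
    if (!(c->valid & bit)) {
        if (query_target(c, cluster)) {
            c->allocated |= bit;
        } else {
            c->allocated &= ~bit;
        }
        c->valid |= bit;
    }
    return (c->allocated & bit) != 0;
}

/* A write makes the cluster allocated, so the cached answer stays correct. */
static void note_write(AllocCache *c, unsigned cluster)
{
    assert(cluster < NUM_CLUSTERS);
    uint64_t bit = UINT64_C(1) << cluster;
    c->allocated |= bit;
    c->valid |= bit;
}
```

A read that hits a cluster cached as unallocated can then be answered with zeroes without any request to the target.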
Move the README file to Markdown so that it makes the GitHub page look
prettier. I know that the GitHub repo is a mirror and not the official
repo, but I think it doesn't hurt to have it in Markdown format.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160715043111.29007-1-bobby.prani@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
block: ignore flush requests when storage is clean
Some guests (Windows Server 2008, for example) do a lot of unnecessary
flushing when the underlying media has not changed. This adds additional
overhead on the host when calling fsync/fdatasync.
This change introduces a write generation scheme in BlockDriverState.
The current write generation is checked against the last flushed generation to
avoid unnecessary flushes.
The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.
This affected some blkdebug test cases that were expecting error logs from
failure-injected flushes, which are now skipped entirely
(tests 026, 071, 089).
This also affects the performance of block jobs, and thus BLOCK_JOB_READY
events for drive-mirror and active block-commit commands now arrive
faster, before the QMP send returns successfully to the caller (tests 141, 144).
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
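A minimal sketch of the write-generation idea, assuming one counter pair per disk. The `Disk` type and function names are hypothetical, not QEMU's BlockDriverState API, and the `fsync_calls` counter stands in for a real fdatasync().

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned write_gen;     /* bumped on every completed write */
    unsigned flushed_gen;   /* write_gen value covered by the last flush */
    int fsync_calls;        /* stand-in for real fdatasync() calls */
} Disk;

static void disk_write(Disk *d)
{
    d->write_gen++;
}

/* Returns true if a real flush was issued. */
static bool disk_flush(Disk *d)
{
    unsigned gen = d->write_gen;    /* snapshot before flushing */
    if (gen == d->flushed_gen) {
        return false;               /* nothing written since the last flush */
    }
    d->fsync_calls++;
    d->flushed_gen = gen;
    return true;
}
```

Snapshotting the generation before the flush means a write that completes while the flush is in progress is not wrongly marked as flushed.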
tests: in IDE and AHCI tests perform DMA write before flushing
Due to changes in flush behaviour, clean disks stopped generating
flush_to_disk events, and IDE and AHCI tests that test flush commands
started to fail.
This change adds additional DMA writes to affected tests before sending
flush commands, so that bdrv_flush actually generates a flush_to_disk event.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-4-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
The following sequence of tests discovered a problem in IDE emulation:
1. Send a DMA write to IDE device 0.
2. Send CMD_FLUSH_CACHE to the same IDE device, which will be failed by the
block layer using the blkdebug script in tests/ide-test:test_retry_flush.
When doing a DMA request, ide/core.c sets s->retry_unit to s->unit in
ide_start_dma. When DMA completes, ide_set_inactive sets retry_unit to -1.
After that, ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error, which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after the previous DMA completion,
and flush does not use anything related to retry.
This patch restricts the retry unit assertion to ops that actually use
retry logic.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-3-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
ide: refactor retry_unit set and clear into separate function
Code to set and clear state associated with retry is moved into
ide_set_retry and ide_clear_retry to make adding retry setups easier.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-2-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
trace: Conditionally trace events based on their per-vCPU state
Events with the 'vcpu' property are conditionally emitted according to
their per-vCPU state. Other events are emitted normally based on their
global tracing state.
Note that the per-vCPU condition check applies to all tracing backends.
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
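A sketch of the emission rule, assuming each vCPU keeps a bitmap of the 'vcpu' events enabled for it, and assuming a 'vcpu' event must also be enabled globally (the commit text does not spell this out). `TraceEvent`, `VCPU` and `should_emit` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool has_vcpu_property;
    bool global_enabled;
    unsigned id;            /* index into the per-vCPU bitmap, < 64 */
} TraceEvent;

typedef struct {
    uint64_t trace_dstate;  /* bit i set: event i enabled on this vCPU */
} VCPU;

static bool should_emit(const TraceEvent *ev, const VCPU *cpu)
{
    if (!ev->has_vcpu_property) {
        return ev->global_enabled;          /* normal global check */
    }
    assert(ev->id < 64);
    return ev->global_enabled && ((cpu->trace_dstate >> ev->id) & 1);
}
```

Because the check runs before the event reaches any backend, it applies uniformly to all tracing backends.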
Eliminates a future compilation error when UI code includes the tracing
headers (indirectly pulling "disas/bfd.h" through "qom/cpu.h") and
GLib's i18n '_' macro.
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
trace: [bsd-user] Commandline arguments to control tracing
[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed. This type is also
returned by trace_opt_parse() and used in vl.c.
Also fixed coding style on for(;;) and else statement as suggested by
Eric Blake <eblake@redhat.com> since the patch modifies these lines or
close enough.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860252322.30668.18276041739086338328.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
trace: [linux-user] Commandline arguments to control tracing
[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed. This type is also
returned by trace_opt_parse() and used in vl.c.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860251784.30668.17339867835129075077.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Alex Williamson [Mon, 18 Jul 2016 16:55:17 +0000 (10:55 -0600)]
vfio/pci: Hide ARI capability
QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities. The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function; however, this next function is relative to the
host, not the guest. This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest. For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero. When we combine multiple VFs together as a multi-function
device in the guest, the guest OS sees that ARI is enabled, relies on
this next-function field, and stops looking for additional functions
after the first is found.
Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices. In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Pranith Kumar [Mon, 27 Jun 2016 18:13:22 +0000 (14:13 -0400)]
.travis.yml: Disable IRC build status updates from forks
We want the travis build bot to post notifications on IRC only for the
master qemu repository and not the various forks/branches of
others. Currently there is no direct option to restrict the updates to
one repository. This is being worked on by the developers and
tracked in https://github.com/travis-ci/travis-ci/issues/1094.
Until then, we can use the workaround posted in
https://github.com/facebook/flow/pull/1822.
This creates an encrypted string which decrypts to the QEMU IRC
channel only on the "qemu/qemu" repo and not on forks. This enables
the build bot to notify IRC only for the main repo.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cao jin [Mon, 18 Jul 2016 04:05:49 +0000 (12:05 +0800)]
virtio-blk: dataplane cleanup
There is no need to duplicate the check; one already exists at the function entry.
Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468814749-14510-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Renames look like this with git-diff(1) when diff.renames = true is set:
diff --git a/a b/b
similarity index 100%
rename from a
rename to b
This raises the "Does not appear to be a unified-diff format patch"
error because checkpatch.pl only considers a diff valid if it contains
at least one "@@" hunk.
This patch accepts renames and copies too so that checkpatch.pl exits
successfully when a diff only renames/copies files. The git diff
extended header format is described on the git-diff(1) man page.
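checkpatch.pl itself is Perl. The C sketch below only illustrates the acceptance rule described above: a diff counts as valid if it has at least one "@@" hunk header or a rename/copy extended header line. The function names are hypothetical. The exact set of accepted prefixes is an assumption taken from the extended header lines listed in git-diff(1), not a copy of the patch's regular expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool hyp_starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Accept a diff if it has at least one "@@" hunk header, or a git extended
 * header line recording a rename or a copy (assumed prefixes below). */
static bool hyp_diff_looks_valid(const char *const *lines, size_t n)
{
    static const char *const ext_headers[] = {
        "rename from ", "rename to ", "copy from ", "copy to ",
    };
    const size_t n_ext = sizeof(ext_headers) / sizeof(ext_headers[0]);

    for (size_t i = 0; i < n; i++) {
        if (hyp_starts_with(lines[i], "@@")) {
            return true;
        }
        for (size_t j = 0; j < n_ext; j++) {
            if (hyp_starts_with(lines[i], ext_headers[j])) {
                return true;
            }
        }
    }
    return false;
}
```

With this rule, the pure-rename diff shown above is accepted. A header-only diff with neither hunks nor rename/copy lines is still rejected.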
Reported-by: Colin Lord <clord@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468576014-28788-1-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cao jin [Fri, 15 Jul 2016 10:28:44 +0000 (18:28 +0800)]
aio-posix: remove useless parameter
The errp parameter of aio_context_setup() is unused; remove it
and clean up the related code.
Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Fam Zheng <famz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468578524-23433-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Roman Pen [Wed, 13 Jul 2016 13:03:24 +0000 (15:03 +0200)]
linux-aio: prevent submitting more than MAX_EVENTS
When invoking io_setup(MAX_EVENTS), we ask the kernel to create a ring buffer
for us with the specified number of events. But the kernel's ring buffer
allocation logic is a bit tricky (the ring buffer is page-size aligned and
some per-CPU allocations are required), so eventually more events than
requested are allocated.
From the userspace side, we have to follow the convention and not
io_submit() more than requested; otherwise the logic that consumes
completed events has to be changed accordingly. The pitfall is in the
following sequence:
MAX_EVENTS = 128
io_setup(MAX_EVENTS)
io_submit(MAX_EVENTS)
io_submit(MAX_EVENTS)
/* now 256 events are in-flight */
io_getevents(MAX_EVENTS) = 128
/* we can handle only 128 events at once; to be sure
 * that nothing is pending, io_getevents(MAX_EVENTS)
 * must be called once more or a hang will happen. */
To prevent the hang or a repeated io_getevents() call, this patch
limits the number of in-flight requests to MAX_EVENTS.
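The in-flight limit can be sketched as simple bookkeeping. The sketch assumes MAX_EVENTS = 128 as in the sequence above. The HypAioState type and the hyp_* helpers are hypothetical and are not QEMU's LinuxAioState code. The real io_submit() and io_getevents() calls are only indicated in comments, so the logic can be followed without libaio.

```c
#include <assert.h>

#define MAX_EVENTS 128

/* Hypothetical bookkeeping, not QEMU's LinuxAioState. */
typedef struct {
    unsigned int in_flight; /* submitted with io_submit(), not yet reaped */
    unsigned int pending;   /* queued, waiting for a free slot */
} HypAioState;

/* How many queued requests may be submitted without exceeding MAX_EVENTS. */
static unsigned int hyp_submit_budget(const HypAioState *s)
{
    unsigned int room = MAX_EVENTS - s->in_flight;
    return s->pending < room ? s->pending : room;
}

/* Move up to the budget from pending to in-flight. A real implementation
 * would call io_submit() here and handle partial submission. */
static unsigned int hyp_submit(HypAioState *s)
{
    unsigned int n = hyp_submit_budget(s);
    s->pending -= n;
    s->in_flight += n;
    assert(s->in_flight <= MAX_EVENTS);
    return n;
}

/* Account for n completions reaped with io_getevents(). */
static void hyp_complete(HypAioState *s, unsigned int n)
{
    assert(n <= s->in_flight);
    s->in_flight -= n;
}
```

Replaying the pitfall with 256 queued requests, only 128 are submitted. The rest wait until completions free slots. Because no more than MAX_EVENTS requests are ever in flight, one io_getevents() call with room for MAX_EVENTS events can reap every outstanding completion.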
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cao jin [Thu, 14 Jul 2016 13:10:43 +0000 (21:10 +0800)]
aio_ctx_check: follow CODING_STYLE
Replace tabs with spaces.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468501843-14927-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini [Mon, 4 Jul 2016 16:33:20 +0000 (18:33 +0200)]
linux-aio: share one LinuxAioState within an AioContext
This has better performance because it executes fewer system calls
and does not use a bottom half per disk.
Originally proposed by Ming Lei.
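The sharing idea amounts to a lazily created, per-context singleton. The sketch below is an illustration under that assumption. Every type and function name in it is hypothetical and is not QEMU's API. The setup counter stands in for the expensive per-state work, such as io_setup() and notifier registration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; QEMU's real types carry far more state. */
typedef struct {
    int setups; /* how many times the expensive setup ran */
} HypLinuxAioState;

typedef struct {
    HypLinuxAioState *linux_aio; /* created on first use, shared by all disks */
} HypAioContext;

/* Return the context's shared state, creating it on first use. */
static HypLinuxAioState *hyp_get_linux_aio(HypAioContext *ctx)
{
    if (!ctx->linux_aio) {
        ctx->linux_aio = calloc(1, sizeof(*ctx->linux_aio));
        if (!ctx->linux_aio) {
            abort();
        }
        ctx->linux_aio->setups++; /* stands in for io_setup() etc. */
    }
    return ctx->linux_aio;
}

/* Free the shared state when the context goes away. */
static void hyp_context_free(HypAioContext *ctx)
{
    free(ctx->linux_aio);
    ctx->linux_aio = NULL;
}
```

Disks attached to the same context get the same state, so the setup and its completion handling happen once per context instead of once per disk.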
[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]
Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
squash! linux-aio: share one LinuxAioState within an AioContext
We have only one flag for now: the Empty Image flag. This patch fixes the
unused-bits specification and marks bit 1 as unused.
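The corrected layout is easy to express as masks. The macro and function names below are hypothetical. The 32-bit width of the flags field is an assumption. The bit assignment follows the commit text: bit 0 is the Empty Image flag, and every other bit, including bit 1, is unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; 32-bit field width assumed. */
#define HYP_FLAG_EMPTY_IMAGE  ((uint32_t)1 << 0)
#define HYP_FLAGS_UNUSED_MASK ((uint32_t)~HYP_FLAG_EMPTY_IMAGE) /* bits 1-31 */

static bool hyp_image_is_empty(uint32_t flags)
{
    return (flags & HYP_FLAG_EMPTY_IMAGE) != 0;
}

/* True if any bit outside the single defined flag is set. */
static bool hyp_flags_use_unused_bits(uint32_t flags)
{
    return (flags & HYP_FLAGS_UNUSED_MASK) != 0;
}
```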
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Mon, 18 Jul 2016 10:24:15 +0000 (11:24 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160718' into staging
ppc patch queue 2016-07-18
Here's what ought to be the final ppc pull request before the 2.7 hard
freeze. This set contains a rework of the DBDMA device for Mac
platforms, and some assorted cleanups and bugfixes.
# gpg: Signature made Mon 18 Jul 2016 05:35:27 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160718:
ppc: Yet another fix for the huge page support detection mechanism
target-ppc: fix left shift overflow in hpte_page_shift
ppc/mmu-hash64: Remove duplicated #include statement
ppc: abort if compat property contains an unknown value
spapr: Ensure CPU cores are added contiguously and removed in LIFO order
vfio/spapr: Remove stale ioctl() call
ppc: Fix support for odd MSR combinations
dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
dbdma: fix load_word/store_word value endianness
dbdma: fix endian of DBDMA_CMDPTR_LO during branch
dbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK
dbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA
spapr: fix core unplug crash
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 12 Jul 2016 08:28:23 +0000 (16:28 +0800)]
e1000e: fix building without CONFIG_VMXNET3_PCI
e1000e needs net_tx_pkt.o and net_rx_pkt.o too.
Cc: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Cc: Leonid Bloch <leonid.bloch@ravellosystems.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Thomas Huth [Fri, 15 Jul 2016 08:10:25 +0000 (10:10 +0200)]
ppc: Yet another fix for the huge page support detection mechanism
Commit 86b50f2e1bef ("Disable huge page support if it is not available
for main RAM") already made sure that huge page support is not announced
to the guest if the normal RAM of non-NUMA configurations is not backed
by a huge page filesystem. However, there is one more case that can go
wrong: NUMA is enabled, but the RAM of the NUMA nodes is not configured
with huge page support (only the memory of a DIMM is configured with
it). For example, when QEMU is started with the following command line,
the Linux guest currently crashes because it tries to use huge pages
on a memory region that does not support huge pages:
To fix this issue, we must also disable huge page support when any
NUMA node is not using a memory backend with huge page support.
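The rule can be sketched as a minimum over all NUMA nodes. It assumes that a node without a memory backend is backed by ordinary host pages. The types and helpers are hypothetical and are not QEMU's code. The 64 KiB default page size and 16 MiB huge page size are example values only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a NUMA node's RAM. */
typedef struct {
    bool has_memdev;        /* RAM comes from an explicit memory backend */
    long backend_page_size; /* that backend's page size, in bytes */
} HypNumaNode;

/* Smallest page size backing any node; a node without a backend is
 * assumed to use the default host page size. */
static long hyp_min_node_page_size(const HypNumaNode *nodes, size_t n,
                                   long default_page_size)
{
    long min = LONG_MAX;
    for (size_t i = 0; i < n; i++) {
        long sz = nodes[i].has_memdev ? nodes[i].backend_page_size
                                      : default_page_size;
        if (sz < min) {
            min = sz;
        }
    }
    return n ? min : default_page_size;
}

/* Huge pages may only be advertised if every node provides them. */
static bool hyp_can_advertise_huge_pages(long min_page_size, long huge_page_size)
{
    return min_page_size >= huge_page_size;
}
```

A single node without a huge-page backend pulls the minimum down to the default page size. That is exactly the case in which huge page support must not be announced to the guest.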
Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Greg Kurz [Wed, 13 Jul 2016 10:00:17 +0000 (12:00 +0200)]
ppc: abort if compat property contains an unknown value
It is not possible to set the compat property to an unknown value with
powerpc_set_compat(), so if we detect an "Internal error" in
powerpc_get_compat(), something must have gone terribly wrong in QEMU.
Let's abort in that case.
This patch also drops the "max_compat ? *max_compat : -1" construct. It is
useless since max_compat is dereferenced a few lines above.
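Both points can be illustrated with a small getter. This is a hypothetical sketch, not powerpc_get_compat(). The enum and function names are invented. The ISA-level-to-name mapping (2.05/power6, 2.06/power7, 2.07/power8) is shown only for illustration. An unknown value is treated as an impossible internal state and aborts. Because the pointer is dereferenced unconditionally, a later NULL fallback would serve no purpose.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compat levels, for illustration only. */
typedef enum {
    HYP_COMPAT_2_05, /* power6 */
    HYP_COMPAT_2_06, /* power7 */
    HYP_COMPAT_2_07, /* power8 */
} HypCompat;

static const char *hyp_get_compat_name(const HypCompat *max_compat)
{
    /* *max_compat is read right here, so a later
     * "max_compat ? *max_compat : -1" fallback would be useless. */
    switch (*max_compat) {
    case HYP_COMPAT_2_05:
        return "power6";
    case HYP_COMPAT_2_06:
        return "power7";
    case HYP_COMPAT_2_07:
        return "power8";
    }
    /* The setter only stores known values: this is an internal error. */
    fprintf(stderr, "Internal error: compat is set to %d\n", (int)*max_compat);
    abort();
}
```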
Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr: Ensure CPU cores are added contiguously and removed in LIFO order
If CPU cores can be added or removed in arbitrary order, holes can appear
in the core id range (and hence in the cpu_index range). Migration can
then fail, because holes in the cpu_index range aren't yet handled
correctly.
Prevent this situation by enforcing addition in contiguous order
and removal in LIFO order, so that we never end up with holes in the
cpu_index range.
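The two ordering rules reduce to neighbour checks. A core may be added only if the core just below it is present. A core may be removed only if the core just above it is absent. The HypCoreSet model and the hyp_* functions are hypothetical and are not the spapr code.

```c
#include <assert.h>
#include <stdbool.h>

#define HYP_MAX_CORES 8

/* Hypothetical model: present[i] is true when core i is plugged in. */
typedef struct {
    bool present[HYP_MAX_CORES];
} HypCoreSet;

/* Contiguous addition: core id needs core id-1 to be present already. */
static bool hyp_core_add(HypCoreSet *s, int id)
{
    if (id < 0 || id >= HYP_MAX_CORES || s->present[id]) {
        return false;
    }
    if (id > 0 && !s->present[id - 1]) {
        return false; /* would leave a hole below this core */
    }
    s->present[id] = true;
    return true;
}

/* LIFO removal: only a core with no present core above it may go. */
static bool hyp_core_remove(HypCoreSet *s, int id)
{
    if (id < 0 || id >= HYP_MAX_CORES || !s->present[id]) {
        return false;
    }
    if (id + 1 < HYP_MAX_CORES && s->present[id + 1]) {
        return false; /* would leave a hole above this core */
    }
    s->present[id] = false;
    return true;
}
```

Starting from a contiguous set, these checks keep the present cores as one unbroken range beginning at 0. This is the property that migration relies on.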
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
David Gibson [Tue, 12 Jul 2016 06:54:03 +0000 (16:54 +1000)]
vfio/spapr: Remove stale ioctl() call
This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an
earlier version of the code and has since been folded into
vfio_spapr_remove_window().
It wasn't caught because, although the argument structure had been removed,
&remove still compiled: it then referred to the libc function remove().
The ioctl() was also almost certain to fail silently and harmlessly with
the bogus argument, so this wasn't caught in testing.
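The name resolution that hid the bug can be reproduced with plain C scoping rules. This sketch assumes the stale call passed &remove. It does not open a VFIO container or call ioctl(). The struct and function names are hypothetical. While a local variable named remove exists, it shadows the libc function. Once the local is gone, &remove silently names the function declared in <stdio.h>. Because ioctl() takes its argument through a variadic parameter, no type error is reported.

```c
#include <assert.h>
#include <stdio.h> /* declares int remove(const char *pathname) */

typedef int (*HypRemoveFn)(const char *);

struct hypothetical_remove_arg {
    unsigned int argsz;
};

/* Before the cleanup: a local named `remove` shadows libc's remove(). */
static unsigned int hyp_with_local_struct(void)
{
    struct hypothetical_remove_arg remove = { .argsz = sizeof(remove) };
    return remove.argsz;
}

/* After the struct was dropped: `&remove` now names the libc function,
 * so a leftover ioctl(fd, REQ, &remove) would still compile. */
static HypRemoveFn hyp_after_struct_removed(void)
{
    return &remove;
}
```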
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
MacOS uses an architecturally illegal MSR combination that
nonetheless seems to be supported by 32-bit processors:
MSR[PR]=1 with one or more of MSR[DR/IR/EE]=0.
This adds support for it. To work properly, the MMU index used by the
QEMU TLB must also account for PR=1 with {I,D}R=0.
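The MMU index must keep problem state and translation state independent. The sketch below shows what that means. The MSR bit positions are the usual LSB-0 numbers (PR = 14, IR = 5, DR = 4). The index layout itself is a hypothetical illustration and not QEMU's actual mapping. Each combination of PR and the relevant IR/DR bit gets its own index. As a result, PR=1 with translation off cannot share TLB entries with supervisor real mode or user virtual mode.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC MSR bit positions, LSB-0 numbering. */
#define MSR_PR 14
#define MSR_IR 5
#define MSR_DR 4

/* Hypothetical MMU index: bit 1 = problem state, bit 0 = translation on
 * for this kind of access (IR for instruction fetch, DR for data). */
static int hyp_mmu_idx(uint64_t msr, int is_code)
{
    int pr = (int)((msr >> MSR_PR) & 1);
    int relocate = is_code ? (int)((msr >> MSR_IR) & 1)
                           : (int)((msr >> MSR_DR) & 1);
    return (pr << 1) | relocate;
}
```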
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:58 +0000 (19:08 +0100)]
dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
Otherwise MacOS 9 hangs upon shutdown.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mark Cave-Ayland [Sun, 10 Jul 2016 18:08:57 +0000 (19:08 +0100)]
dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
This fixes MacOS 9, which continually issues flushes and polls the status
bits until they indicate a successful flush.
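The fix can be pictured as an unassigned channel acknowledging a flush request immediately. This sketch rests on two assumptions that are not taken from QEMU's source. First, a control register write carries a 16-bit mask in the upper half and the 16-bit value in the lower half. Second, the FLUSH status bit is 0x2000. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_DBDMA_FLUSH 0x2000u /* assumed FLUSH bit in channel status */

/* Control write to a channel with no device behind it: only FLUSH is
 * handled, by reporting it as done at once, so a guest polling for the
 * FLUSH status bit stops spinning. */
static uint32_t hyp_unassigned_control_write(uint32_t status, uint32_t value)
{
    uint16_t mask = (uint16_t)(value >> 16);
    uint16_t bits = (uint16_t)(value & 0xffffu);

    if ((mask & HYP_DBDMA_FLUSH) && (bits & HYP_DBDMA_FLUSH)) {
        status |= HYP_DBDMA_FLUSH;
    }
    return status;
}
```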
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>