Wei Yang [Mon, 15 Jul 2019 02:05:49 +0000 (10:05 +0800)]
migration/postcopy: fix document of postcopy_send_discard_bm_ram()
Commit 6b6712efccd3 ('ram: Split dirty bitmap by RAMBlock') changed the
parameter of postcopy_send_discard_bm_ram() but left the documentation
untouched.
This patch corrects the documentation and fixes two typos by hand.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190715020549.15018-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peng Tao [Fri, 14 Jun 2019 06:35:13 +0000 (14:35 +0800)]
migration: allow private destination ram with x-ignore-shared
By removing the shared RAM check, QEMU is able to migrate
to private destination RAM when the x-ignore-shared capability
is on. Then we can create multiple destination VMs based
on the same source VM.
This changes the x-ignore-shared migration capability to
work similarly to Lai's original bypass-shared-memory
work (https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00003.html),
which enables Kata Containers (https://katacontainers.io)
to implement the VM templating feature.
An example usage in Kata Containers (https://katacontainers.io):
1. Start the source VM:
qemu-system-x86 -m 2G \
-object memory-backend-file,id=mem0,size=2G,share=on,mem-path=/tmpfs/template-memory \
-numa node,memdev=mem0
2. Stop the template VM, set the x-ignore-shared migration capability,
migrate "exec:cat>/tmpfs/state", then quit it
3. Start target VM:
qemu-system-x86 -m 2G \
-object memory-backend-file,id=mem0,size=2G,share=off,mem-path=/tmpfs/template-memory \
-numa node,memdev=mem0 \
-incoming defer
4. Connect to the target VM's QMP, set the x-ignore-shared migration capability,
migrate_incoming "exec:cat /tmpfs/state"
5. Create more target VMs by repeating steps 3 and 4
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Yury Kotov <yury-kotov@yandex-team.ru> Cc: Jiangshan Lai <laijs@hyper.sh> Cc: Xu Wang <xu@hyper.sh> Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1560494113-1141-1-git-send-email-tao.peng@linux.alibaba.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:56 +0000 (14:50 +0800)]
migration: Split log_clear() into smaller chunks
Currently we are doing log_clear() right after log_sync() which mostly
keeps the old behavior when log_clear() was still part of log_sync().
This patch tries to further optimize the migration log_clear() code
path to split huge log_clear()s into smaller chunks.
We do this by splitting the whole guest memory region into memory
chunks, whose size is decided by MigrationState.clear_bitmap_shift (an
example is given below). With that, we don't clear the dirty bitmap
on the remote node (e.g., KVM) when we fetch the dirty bitmap; instead,
we explicitly clear the dirty bitmap for a memory chunk the first time
we send a page in that chunk.
Here is an example.
Assuming the guest has 64G of memory, before this patch the KVM
ioctl KVM_CLEAR_DIRTY_LOG is a single call covering all 64G.
After the patch, assuming the clear bitmap shift is 18, the memory
chunk size on x86_64 is (1UL << 18) * 4K = 1GB. Then
instead of sending one big 64G ioctl, we send 64 small ioctls, each
covering 1G of guest memory. Each of the 64 small ioctls is only
sent when a page in its chunk is about to be sent.
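The arithmetic in the example can be checked with a short sketch. The helper
names below are hypothetical and are not QEMU functions; the second helper
only models the "clear a chunk the first time one of its pages is sent" idea,
assuming every chunk still needs clearing after the last sync.

```python
# Illustrative arithmetic only; helper names are assumptions, not QEMU code.
PAGE_SIZE = 4 * 1024   # 4K page size, as in the x86_64 example
GIB = 1 << 30


def clear_chunk_bytes(clear_bitmap_shift, page_size=PAGE_SIZE):
    """Bytes of guest memory covered by one clear chunk."""
    return (1 << clear_bitmap_shift) * page_size


def chunks_cleared_in_order(pages_sent, clear_bitmap_shift):
    """Chunk indexes in the order they would be cleared as pages are sent."""
    seen = set()
    order = []
    for page in pages_sent:
        chunk = page >> clear_bitmap_shift
        if chunk not in seen:       # first page sent from this chunk
            seen.add(chunk)
            order.append(chunk)
    return order


chunk_size = clear_chunk_bytes(18)
num_chunks = (64 * GIB) // chunk_size
print(chunk_size // GIB, num_chunks)   # prints "1 64"
```

Sending pages 0, 1 and then a page in the second chunk would trigger two
small clears in this model rather than one clear covering all 64G.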
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:55 +0000 (14:50 +0800)]
kvm: Support KVM_CLEAR_DIRTY_LOG
First, detect the interface using KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
and mark it. If enabling the new feature fails, we fall back to
the old sync.
Provide the log_clear() hook for the memory listeners for both address
spaces of KVM (normal system memory, and SMM) and deliver the clear
message to the kernel.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-11-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:54 +0000 (14:50 +0800)]
kvm: Introduce slots lock for memory listener
Introduce KVMMemoryListener.slots_lock to protect the slots inside the
kvm memory listener. Currently it is close to useless because all
KVM code paths are always protected by the BQL. But it will start to
make sense in follow-up patches, where we might do a remote dirty bitmap
clear and also update the per-slot cached dirty bitmap even
without the BQL. So let's prepare for it.
We could also use a per-slot lock for the above reason, but that seems
to be overkill. Let's just use this bigger one (which covers all the
slots of a single address space); this lock is still much smaller
than the BQL.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-10-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:53 +0000 (14:50 +0800)]
kvm: Persistent per kvmslot dirty bitmap
When synchronizing the dirty bitmap from kernel KVM we do it in a
per-kvmslot fashion, and we allocate the userspace bitmap for each
ioctl. This patch instead makes the bitmap cache persistent,
so we don't need to g_malloc0() every time.
More importantly, the cached per-kvmslot dirty bitmap will be further
used when we add support for KVM_CLEAR_DIRTY_LOG: this
cached bitmap will be used to guarantee we won't clear any unknown
dirty bits, which could otherwise be a severe data loss issue for
migration code.
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-9-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:52 +0000 (14:50 +0800)]
kvm: Update comments for sync_dirty_bitmap
The comments are obviously obsolete. Update them.
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-8-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Introduce a new memory region listener hook log_clear() to allow the
listeners to hook onto the points where the dirty bitmap is cleared by
the bitmap users.
Let's take KVM as an example: log_sync() for KVM will first copy the
kernel dirty bitmap to userspace, and at the same time we'll clear the
dirty bitmap there along with re-protecting all the guest pages again.
We add this new log_clear() interface only to split the old log_sync()
into two separate procedures:
- use log_sync() to collect the dirty bitmap only, and,
- use log_clear() to clear the remote dirty bitmap.
With the new interface, the memory listener users will still be able
to decide how to implement the log synchronization procedure, e.g.,
they can still provide only the log_sync() method and put both
procedures within log_sync() (that's how the old KVM worked before
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was introduced). However, with this
new interface the memory listener users get a chance to
postpone the log clear operation explicitly if the module supports it.
That can really benefit users like KVM, at least for host kernels that
support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2.
There are three places that can clear dirty bits in any one of the
dirty bitmap in the ram_list.dirty_memory[3] array:
Currently we hook directly into each of the functions to notify about
the log_clear().
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:50 +0000 (14:50 +0800)]
memory: Pass mr into snapshot_and_clear_dirty
Also change its 2nd parameter to be the relative offset
within the memory region. This is to be used in follow-up patches.
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:49 +0000 (14:50 +0800)]
bitmap: Add bitmap_copy_with_{src|dst}_offset()
These helpers copy the source bitmap to destination bitmap with a
shift either on the src or dst bitmap.
Meanwhile, we have never had bitmap tests, but we should.
This patch also introduces the initial test cases for utils/bitmap.c,
but it only tests the newly introduced functions.
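A minimal sketch of what "copy with a shift on the src or dst bitmap" means,
using Python integers as bitmaps (bit 0 is the least significant bit). The
function names and argument order are illustrative assumptions and do not
mirror the C helpers' signatures; how the C helpers treat bits outside the
copied range is not described here, so this sketch simply returns a fresh
value.

```python
# Model bitmaps as Python ints; names are hypothetical, not the C API.

def copy_with_src_offset(src, offset, nbits):
    """Take nbits from src starting at bit `offset`; result starts at bit 0."""
    mask = (1 << nbits) - 1
    return (src >> offset) & mask


def copy_with_dst_offset(src, shift, nbits):
    """Take the low nbits of src; result starts at bit `shift`."""
    mask = (1 << nbits) - 1
    return (src & mask) << shift


src = 0b101100
low = copy_with_src_offset(src, 2, 4)      # 0b1011
back = copy_with_dst_offset(low, 2, 4)     # 0b101100
print(bin(low), bin(back))
```

The two operations are inverses of each other over the copied range, which
is one simple property a test of such helpers can check.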
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-5-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Bitmap test used sizeof(unsigned long) instead of BITS_PER_LONG.
Peter Xu [Mon, 3 Jun 2019 06:50:48 +0000 (14:50 +0800)]
memory: Don't set migration bitmap when without migration
Similar to 9460dee4b2 ("memory: do not touch code dirty bitmap unless
TCG is enabled", 2015-06-05) but for the migration bitmap - we can
skip the MIGRATION bitmap update if migration is not enabled.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-4-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter Xu [Mon, 3 Jun 2019 06:50:46 +0000 (14:50 +0800)]
migration: No need to take rcu during sync_dirty_bitmap
cpu_physical_memory_sync_dirty_bitmap() takes a RAMBlock* as a
parameter, which means that the RCU read lock must already be held.
Taking it again inside seems redundant, so remove it.
Instead, comment on the functions about the RCU read lock.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-2-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Wei Yang [Wed, 5 Jun 2019 01:08:28 +0000 (09:08 +0800)]
migration/ram.c: reset complete_round when we gets a queued page
When we get a queued page, the order of blocks is interrupted. We can
no longer rely on the complete_round flag to say we have already
searched all the blocks on the list.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190605010828.6969-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Wei Yang [Tue, 4 Jun 2019 02:35:40 +0000 (10:35 +0800)]
migration/multifd: sync packet_num after all thread are done
Notifications from recv threads are not ordered, which means we may be
notified by one MultiFDRecvParams but adjust packet_num for another.
Move the adjustment to after we are sure every recv thread is synced.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190604023540.26532-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Wei Yang [Mon, 10 Jun 2019 03:08:51 +0000 (11:08 +0800)]
cutils: remove one unnecessary pointer operation
Since we will not operate on the next address pointed by out, it is not
necessary to do addition on it.
After removing the operation, the function size is reduced by 16/18 bytes.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190610030852.16039-2-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Wei Yang [Mon, 10 Jun 2019 00:41:59 +0000 (08:41 +0800)]
migration/xbzrle: update cache and current_data in one place
When we are not in the last_stage, we need to update the cache if the
page is not the same.
Currently this procedure is scattered across two places and mixed with
the encoding status check.
This patch extracts this general step to make the code a little
easier to read.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190610004159.20966-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Wei Yang [Wed, 12 Jun 2019 01:43:37 +0000 (09:43 +0800)]
migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS
On receiving RAM_SAVE_FLAG_EOS, multifd_recv_sync_main() is called to
synchronize receive threads. The current synchronization mechanism is to
wait for each channel's sem_sync semaphore. This semaphore is triggered by
a packet with the MULTIFD_FLAG_SYNC flag. However, in the current
implementation we don't call multifd_send_sync_main() to send such a
packet when blk_mig_bulk_active() is true.
As a result, the receive threads never notify
multifd_recv_sync_main() via sem_sync, and multifd_recv_sync_main()
waits there forever.
[Note]: the normal migration test works, but the
blk_mig_bulk_active() case was not tested, since it is not clear how to
produce this situation.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190612014337.11255-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Juan Quintela [Wed, 3 Apr 2019 08:54:31 +0000 (10:54 +0200)]
migration-test: rename parameter to parameter_int
We will need _str ones in the next patch.
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
Juan Quintela [Wed, 3 Apr 2019 10:14:31 +0000 (12:14 +0200)]
migration: fix multifd_recv event typo
It uses num in multifd_send(). Make it coherent.
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
In hmp_change(), the variable hmp_mon is only used
by code under #ifdef CONFIG_VNC. This results in a build
error when VNC is configured out with the default of
treating warnings as errors:
monitor/hmp-cmds.c: In function ‘hmp_change’:
monitor/hmp-cmds.c:1946:17: error: unused variable ‘hmp_mon’ [-Werror=unused-variable]
1946 | MonitorHMP *hmp_mon = container_of(mon, MonitorHMP, common);
| ^~~~~~~
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Message-Id: <20190625123905.25434-1-dinechin@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Turn helper_retaddr into a multi-state flag that may now also
indicate when we're performing a read on behalf of the translator.
In this case, release the mmap_lock before the longjmp back to
the main cpu loop, and thereby avoid a failing assert therein.
Fixes: https://bugs.launchpad.net/qemu/+bug/1832353 Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
At present we have a potential error in that helper_retaddr contains
data for handle_cpu_signal, but we have not ensured that those stores
will be scheduled properly before the operation that may fault.
It might be that these races are not in practice observable, due to
our use of -fno-strict-aliasing, but better safe than sorry.
Adjust all of the setters of helper_retaddr.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This patch fixes two problems:
(1) The inputs to the EXTR insn were reversed,
(2) The input constraints use rZ, which means that we need to use
the REG0 macro in order to supply XZR for a constant 0 input.
Fixes: 464c2969d5d Reported-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 12 Jul 2019 16:34:13 +0000 (17:34 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pc, pci: fixes, cleanups, tests
A bunch of fixes all over the place.
ACPI tests will now run on more systems: might
introduce new failure reports but that's for
the best, isn't it?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 12 Jul 2019 15:57:40 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio pmem: remove transitional names
virtio pmem: remove memdev null check
virtio pmem: fix wrong mem region condition
tests: acpi: do not skip tests when IASL is not installed
tests: acpi: do not require IASL for dumping AML blobs
virtio-balloon: fix QEMU 4.0 config size migration incompatibility
pcie: consistent names for function args
xio3130_downstream: typo fix
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Remove transitional & non transitional names for virtio pmem.
Only virtio 1.0 and up is supported.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <20190712073554.21918-4-pagupta@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Coverity reports that when we're assigning vi->size we handle the
"pmem->memdev is NULL" case; but we then pass it into
object_get_canonical_path(), which unconditionally dereferences it
and will crash if it is NULL. If this pointer can be NULL then we
need to do something else here.
We are removing 'pmem->memdev' null check here as memdev will never
be null in this function.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <20190712073554.21918-3-pagupta@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Igor Mammedov [Mon, 8 Jul 2019 09:24:10 +0000 (05:24 -0400)]
tests: acpi: do not skip tests when IASL is not installed
Tests do binary comparison, so we can check tables without
IASL. Move the IASL condition right before the decompilation step
and skip that step if IASL is not installed.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190708092410.11167-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Igor Mammedov [Mon, 8 Jul 2019 09:24:09 +0000 (05:24 -0400)]
tests: acpi: do not require IASL for dumping AML blobs
IASL isn't needed when dumping ACPI tables from guest for
rebuild purposes. So move this part out from IASL branch.
Makes rebuild-expected-aml.sh work without IASL installed
on host.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20190708092410.11167-2-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The virtio-balloon config size changed in QEMU 4.0 even for existing
machine types. Migration from QEMU 3.1 to 4.0 can fail in some
circumstances with the following error:
This happens because the virtio-balloon config size affects the VIRTIO
Legacy I/O Memory PCI BAR size.
Introduce a qdev property called "qemu-4-0-config-size" and enable it
only for the QEMU 4.0 machine types. This way <4.0 machine types use
the old size, 4.0 uses the larger size, and >4.0 machine types use the
appropriate size depending on enabled virtio-balloon features.
Live migration to and from old QEMUs to QEMU 4.1 works again as long as
a versioned machine type is specified (do not use just "pc"!).
Originally-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190710141440.27635-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The function declarations for pci_cap_slot_get and
pci_cap_slot_write_config call the argument "slot_ctl", but the function
definitions and all the call sites drop the 'o' and call it "slt_ctl".
Let's be consistent.
Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
slt ctl/status are passed in incorrect order.
Fix this up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
file-posix: Use max transfer length/segment count only for SCSI passthrough
Regular kernel block devices (/dev/sda*, /dev/nvme*, etc) don't have
max segment size/max segment count hardware requirements exposed
to the userspace, but rather the kernel block layer
takes care to split the incoming requests that
violate these requirements.
Letting the kernel do the splitting allows qemu to avoid
the various overheads that otherwise arise.
This is especially visible in the nbd server when
exposing a mostly empty qcow2 image over the net as a raw file.
In this case most of the reads by the remote user
won't even hit the underlying kernel block device,
and therefore most of the overhead will be in the
nbd traffic, which increases significantly with a lower max transfer size.
In addition, even for local block device
access the performance improves a bit due to less
traffic between qemu and the kernel when large
transfer sizes are used (e.g. for image conversion).
More info can be found at:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Mon, 8 Jul 2019 18:47:03 +0000 (13:47 -0500)]
iotests: Update 082 expected output
A recent tweak to the '-o help' output for qemu-img needs to be
reflected into the iotests expected outputs.
Fixes: f7077c98 Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Maydell [Fri, 12 Jul 2019 10:06:48 +0000 (11:06 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190712' into staging
ppc patch queue for 2019-07-12
First 4.1 hard freeze pull request. Not much here, just a bug fix for
the XICS interrupt controller and a SLOF firmware update to fix a bug
with IP discovery when there are multiple NICs.
Greg Kurz [Wed, 3 Jul 2019 17:22:20 +0000 (19:22 +0200)]
xics/kvm: Always set the MASKED bit if interrupt is masked
The ics_set_kvm_state_one() function is called either to restore the
state of an interrupt source during migration or to set the interrupt
source to a default state during reset.
From the beginning, i.e. 2013, the code has only set the MASKED bit if the 'current
priority' and the 'saved priority' are different. This is likely true
when restoring an interrupt that had been previously masked with the
ibm,int-off RTAS call. However this is always false in the case of
reset since both 'current priority' and 'saved priority' are equal to
0xff, and the MASKED bit is never set.
The legacy KVM XICS device gets away with that because it ends up updating
its internal structure the same way, whether the MASKED bit is set or
the priority is 0xff.
The XICS-on-XIVE device for POWER9 is different. It sticks to the KVM
documentation [1] and _really_ relies on the MASKED bit to be correctly
set. If not, it will configure the interrupt source in the XIVE HW, even
though the guest hasn't configured the interrupt yet. This disturbs the
complex logic implemented in XICS-on-XIVE and may result in the loss of
subsequent queued events.
Always set the MASKED bit if interrupt is masked as expected by the KVM
XICS-on-XIVE device. This has no impact on the legacy KVM XICS.
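The difference between the two conditions can be sketched as follows. This is
an illustration of the logic described above, not the QEMU code: the function
names are hypothetical, the MASKED bit position is an arbitrary placeholder,
and "masked" is taken to mean a current priority of 0xff, as the description
suggests.

```python
# Hypothetical model of the state computation; not the QEMU source.
MASKED = 1 << 0          # placeholder bit position, for illustration only
MASKED_PRIORITY = 0xFF   # priority value treated as "masked"


def state_before_fix(priority, saved_priority):
    """Set MASKED only when current and saved priority differ."""
    state = 0
    if priority != saved_priority:
        state |= MASKED
    return state


def state_after_fix(priority, saved_priority):
    """Set MASKED whenever the interrupt is masked."""
    state = 0
    if priority == MASKED_PRIORITY:
        state |= MASKED
    return state


# Reset case: both priorities are 0xff.
print(state_before_fix(0xFF, 0xFF), state_after_fix(0xFF, 0xFF))  # prints "0 1"
```

In the migration case of a source masked via ibm,int-off (current priority
0xff, saved priority something else), both versions set the bit in this
model; only the reset case changes.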
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156217454083.559957.7359208229523652842.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Peter Maydell [Thu, 11 Jul 2019 09:03:42 +0000 (10:03 +0100)]
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-and-gdbstub-100719-1' into staging
Testing and gdbstub fixes:
- fix diff-out pass in check-tcg
- ensure generation of fprem reference
- fix gdb set_reg fallback
# gpg: Signature made Wed 10 Jul 2019 11:24:28 BST
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-testing-and-gdbstub-100719-1:
gdbstub: revert to previous set_reg behaviour
gdbstub: add some notes to the header comment
tests/tcg: fix diff-out pass to properly report failure
tests/tcg: fix up test-i386-fprem.ref generation
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
John Snow [Wed, 10 Jul 2019 19:08:07 +0000 (15:08 -0400)]
docs/bitmaps: use QMP lexer instead of json
The annotated style json we use in QMP documentation is not strict json
and depending on the version of Sphinx (2.0+) or Pygments installed,
might cause the build to fail.
Use the new QMP lexer.
Further, some versions of Sphinx can not apply custom lexers to "code"
directives and require the use of "code-block" directives instead, so
make that change at this time as well.
Tested under:
- Sphinx 1.3.6 and Pygments 2.4
- Sphinx 1.7.6 and Pygments 2.2 (Fedora 29 packages)
- Sphinx 2.0.1 and Pygments 2.4
- Sphinx 3.0.0+/f396b3a783 and Pygments 2.4 (From Sphinx git c4f44bdd)
Reported-by: Aarushi Mehta <mehta.aaru20@gmail.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 20190603214653.29369-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
John Snow [Wed, 10 Jul 2019 19:08:06 +0000 (15:08 -0400)]
sphinx: add qmp_lexer
Sphinx, through Pygments, does not like annotated json examples very
much. In some versions of Sphinx (1.7), it will render the non-json
portions of code blocks in red, but in newer versions (2.0) it will
throw an exception and not highlight the block at all. Though we can
suppress this warning, it doesn't bring back highlighting on non-strict
json blocks.
We can alleviate this by creating a custom lexer for QMP examples that
allows us to properly highlight these examples in a robust way, keeping
our directionality and elision notations.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reported-by: Aarushi Mehta <mehta.aaru20@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190603214653.29369-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
Alex Bennée [Fri, 5 Jul 2019 13:23:07 +0000 (14:23 +0100)]
gdbstub: revert to previous set_reg behaviour
The refactoring of handle_set_reg missed the fact we previously had
responded with an empty packet when we were not using XML based
protocols. This broke the fallback behaviour for architectures that
don't have registers defined in QEMU's gdb-xml directory.
Revert to the previous behaviour and clean up the commentary for what
is going on.
Fixes: 62b3320bddd Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Cc: Jon Doron <arilou@gmail.com>
Alex Bennée [Fri, 5 Jul 2019 12:28:19 +0000 (13:28 +0100)]
gdbstub: add some notes to the header comment
Add a link to the remote protocol spec and an SPDX tag.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Alex Bennée [Fri, 5 Jul 2019 11:56:35 +0000 (12:56 +0100)]
tests/tcg: fix diff-out pass to properly report failure
A side effect of piping the output to head is that it squashes the exit status
of the diff command. Fix this by only doing the pipe if the diff
failed and then ensuring the status is non-zero.
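The effect can be reproduced with a small sketch that runs two shell
pipelines. It assumes a POSIX sh with head on the PATH and uses false as a
stand-in for a failing diff; the command strings are illustrative and are not
the ones used in the tests/tcg makefiles.

```python
import subprocess


def sh_status(cmd):
    """Run cmd under POSIX sh and return its exit status."""
    result = subprocess.run(["sh", "-c", cmd], stdout=subprocess.DEVNULL)
    return result.returncode


# A pipeline reports the status of its last command, so the failure is lost.
piped = sh_status("false | head -n 1")

# Only pipe when the first command failed, then force a non-zero status.
guarded = sh_status("false || (echo differs | head -n 1; exit 1)")

print(piped, guarded)   # prints "0 1"
```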
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Alex Bennée [Fri, 5 Jul 2019 10:48:02 +0000 (11:48 +0100)]
tests/tcg: fix up test-i386-fprem.ref generation
We never shipped the reference data in the source tree because it's
quite big (64M). As a result the only option is to generate it
locally. Although we have a rule to generate the reference file we
missed the dependency and location changes, probably because it's only
run for SLOW test runs.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Makefile: Fix "make clean" in "unconfigured" source directory
Recent commit "Makefile: Reuse all's recursion machinery for clean and
install" broke targets clean and distclean in the source directory
before running configure:
$ make clean
LD recurse-clean.mo
cc: fatal error: no input files
compilation terminated.
make: *** [rules.mak:118: recurse-clean.mo] Error 1
Root cause is missing .PHONY. Fix that.
Fixes: 1338a4b72659ce08eacb9de0205fe16202a22d9c Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stephen Checkoway noticed that commit 3ae0343db69 is incorrect.
That commit states that all parallel flashes are limited to 16-bit
accesses; however, the x32 configuration exists in some models,
such as the Cypress S29CL032J, whose CFI Device Geometry Definition
announces:
CFI ADDR DATA
0x28,0x29 = 0x0003 (x32-only asynchronous interface)
Guests should not be affected by the previous change, because
QEMU does not announce itself as x32 capable:
Commit 3ae0343db69 does not restrict the bus to 16-bit accesses,
but restricts the implementation to 16-bit accesses at most, so a guest
32-bit access will result in 2x 16-bit calls.
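What "a 32-bit access results in 2x 16-bit calls" means can be sketched as
below. The names are hypothetical and little-endian ordering is an
assumption made for the illustration; this is not the memory core's actual
splitting code.

```python
# Illustrative only: split one 32-bit read into two 16-bit reads.

def read32_via_16(read16, addr):
    """Combine two 16-bit reads into one 32-bit value (little-endian)."""
    low = read16(addr)
    high = read16(addr + 2)
    return low | (high << 16)


flash = bytes([0x78, 0x56, 0x34, 0x12])   # stand-in for flash contents
calls = []


def read16(addr):
    """16-bit device read that records each call it receives."""
    calls.append(addr)
    return int.from_bytes(flash[addr:addr + 2], "little")


value = read32_via_16(read16, 0)
print(hex(value), calls)   # prints "0x12345678 [0, 2]"
```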
Now, we have 2 boards that register the flash device in 32-bit
access:
- PPC: taihu_405ep
The CFI id matches the S29AL008J, which is a 1MB flash in x16, while
the QEMU code forces it to be 2MB and, checking Linux, it expects
a 4MB flash.
- ARM: Digic4
While the comment says "Samsung K8P3215UQB 64M Bit (4Mx16)",
this flash is 32Mb (2MB). Also note the CFI id does not match
the comment.
To avoid unexpected side effects, we revert commit 3ae0343db69,
and will clean the board code later.
Reported-by: Stephen Checkoway <stephen.checkoway@oberlin.edu> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* remotes/cohuck/tags/s390x-20190709:
s390x/tcg: move fallthrough annotation
s390: cpumodel: fix description for the new vector facility
s390x/cpumodel: Set up CPU model for AQIC interception
vfio-ccw: Test vfio_set_irq_signaling() return value
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Stefan Weil <sw@weilnetz.de> Fixes: f180da83c039 ("s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO")
Message-Id: <20190708125433.16927-3-cohuck@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
s390: cpumodel: fix description for the new vector facility
The new facility is called "Vector-Packed-Decimal-Enhancement Facility"
and not "Vector BCD enhancements facility 1". As the shortname might
have already found its way into some backports, let's keep vxbeh.
Alistair Francis [Thu, 20 Jun 2019 14:04:18 +0000 (07:04 -0700)]
tcg/riscv: Fix RISC-V host build failure
Commit 269bd5d8 "cpu: Move the softmmu tlb to CPUNegativeOffsetState"
broke the RISC-V host build as there are two variables that are used but
not defined.
This patch renames the undefined variables mask_off and table_off to the
existing (but unused) mask_ofs and table_ofs variables.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <79729cc88ca509e08b5c4aa0aa8a52847af70c0f.1561039316.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Mon, 8 Jul 2019 16:40:05 +0000 (17:40 +0100)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2019-07-08-1' into staging
Merge tpm 2019/07/08 v1
# gpg: Signature made Mon 08 Jul 2019 15:04:46 BST
# gpg: using RSA key 75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2019-07-08-1:
hw/tpm: Only build tpm_ppi.o if any of TPM_TIS/TPM_CRB is built
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 8 Jul 2019 14:21:20 +0000 (15:21 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- virtio-scsi: Fix request resubmission after I/O error with iothreads
- qcow2: Fix missing v2/v3 subformat aliases for amend
- qcow(1): More specific error message for wrong format version
- MAINTAINERS: update RBD block maintainer
# gpg: Signature made Mon 08 Jul 2019 15:17:27 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
qcow2: Allow -o compat=v3 during qemu-img amend
MAINTAINERS: update RBD block maintainer
block/qcow: Improve error when opening qcow2 files as qcow
virtio-scsi: restart DMA after iothread
qdev: add qdev_add_vm_change_state_handler()
vl: add qemu_add_vm_change_state_handler_prio()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Eric Blake [Fri, 5 Jul 2019 15:28:12 +0000 (10:28 -0500)]
qcow2: Allow -o compat=v3 during qemu-img amend
Commit b76b4f60 allowed '-o compat=v3' as an alias for the
less-appealing '-o compat=1.1' for 'qemu-img create' since we want to
use the QMP form as much as possible, but forgot to do likewise for
qemu-img amend. Also, it doesn't help that '-o help' doesn't list our
new preferred spellings.
Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
John Snow [Wed, 26 Jun 2019 21:53:01 +0000 (17:53 -0400)]
block/qcow: Improve error when opening qcow2 files as qcow
Reported-by: radmehrsaeed7@gmail.com Fixes: https://bugs.launchpad.net/bugs/1832914 Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 20 Jun 2019 17:37:09 +0000 (18:37 +0100)]
virtio-scsi: restart DMA after iothread
When the 'cont' command resumes guest execution the vm change state
handlers are invoked. Unfortunately there is no explicit ordering
between classic qemu_add_vm_change_state_handler() callbacks. When two
layers of code both use vm change state handlers, we don't control which
handler runs first.
virtio-scsi with iothreads hits a deadlock when a failed SCSI command is
restarted and completes before the iothread is re-initialized.
This patch uses the new qdev_add_vm_change_state_handler() API to
guarantee that virtio-scsi's virtio change state handler executes before
the SCSI bus children. This way DMA is restarted after the iothread has
re-initialized.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 20 Jun 2019 17:37:08 +0000 (18:37 +0100)]
qdev: add qdev_add_vm_change_state_handler()
Children sometimes depend on their parent's vm change state handler
having completed. Add a vm change state handler API for devices that
guarantees tree depth ordering.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
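A minimal sketch of what "tree depth ordering" can mean for change state handlers follows. It is not the qdev implementation: struct change_handler, run_change_handlers() and record_call() are hypothetical, and it assumes only that each handler is tagged with its device's depth and that shallower (parent) handlers run first when the VM resumes.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*change_cb)(int running, void *opaque);

/* Hypothetical handler record: depth 0 is the root of the device tree. */
struct change_handler {
    int depth;
    change_cb cb;
    void *opaque;
};

/* Order handlers so that parents (smaller depth) come before children. */
int by_depth(const void *a, const void *b)
{
    const struct change_handler *x = a;
    const struct change_handler *y = b;
    return (x->depth > y->depth) - (x->depth < y->depth);
}

/* Invoke every handler in depth order. */
void run_change_handlers(struct change_handler *h, size_t n, int running)
{
    qsort(h, n, sizeof(*h), by_depth);
    for (size_t i = 0; i < n; i++) {
        h[i].cb(running, h[i].opaque);
    }
}

/* Demo callback that records the order in which handlers ran. */
char call_order[8];
size_t call_count;

void record_call(int running, void *opaque)
{
    (void)running;
    if (call_count < sizeof(call_order)) {
        call_order[call_count++] = *(const char *)opaque;
    }
}
```

Registration order no longer matters in this sketch: a child registered before its parent still runs after it.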
* remotes/pmaydell/tags/pull-target-arm-20190708:
target/arm/vfp_helper: Call set_fpscr_to_host before updating to FPSCR
hw/arm/sbsa-ref: Remove unnecessary check for secure_sysmem == NULL
tests/migration-test: Fix read off end of aarch64_kernel array
target/arm: Fix sve_zcr_len_for_el
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target/arm/vfp_helper: Call set_fpscr_to_host before updating to FPSCR
In commit e9d652824b0 we extracted the vfp_set_fpscr_to_host()
function but failed to call it in the correct place: we call
it after xregs[ARM_VFP_FPSCR] is modified.
Fix by calling this function before we update FPSCR.
Peter Maydell [Mon, 8 Jul 2019 13:11:31 +0000 (14:11 +0100)]
hw/arm/sbsa-ref: Remove unnecessary check for secure_sysmem == NULL
In the virt machine, we support TrustZone being either present or
absent, and so the code must deal with the secure_sysmem pointer
possibly being NULL. In the sbsa-ref machine, TrustZone is always
present, but some code and comments copied from virt still treat
it as possibly not being present.
This causes Coverity to complain (CID 1407287) that we check
secure_sysmem for being NULL after an unconditional dereference.
Simplify the code so that instead of initializing the variable
to NULL, unconditionally assigning it, and then testing it for NULL,
we just initialize it correctly in the variable declaration and
then assume it to be non-NULL. We also delete a comment which
only applied to the non-TrustZone config.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190704142004.7150-1-peter.maydell@linaro.org Tested-by: Radosław Biernacki <radoslaw.biernacki@linaro.org> Reviewed-by: Radosław Biernacki <radoslaw.biernacki@linaro.org>
Peter Maydell [Mon, 8 Jul 2019 13:11:31 +0000 (14:11 +0100)]
tests/migration-test: Fix read off end of aarch64_kernel array
The test aarch64 kernel is in an array defined with
unsigned char aarch64_kernel[] = { [...] }
which means it could be any size; currently it's quite small.
However we write it to a file using init_bootfile(), which
writes exactly 512 bytes to the file. This will break if
we ever end up with a kernel larger than that, and will
read garbage off the end of the array in the current setup
where the kernel is smaller.
Make init_bootfile() take an argument giving the length of
the data to write. This allows us to use it for all architectures
(previously s390 had a special-purpose init_bootfile_s390x
which hardcoded the file to write so it could write the
correct length). We assert that the x86 bootfile really is
exactly 512 bytes as it should be (and as we were previously
just assuming it was).
This was detected by the clang-7 asan:
==15607==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a796f51d20 at pc 0x55a796b89c2f bp 0x7ffc58e89160 sp 0x7ffc58e88908
READ of size 512 at 0x55a796f51d20 thread T0
#0 0x55a796b89c2e in fwrite (/home/petmay01/linaro/qemu-from-laptop/qemu/build/sanitizers/tests/migration-test+0xb0c2e)
#1 0x55a796c46492 in init_bootfile /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:99:5
#2 0x55a796c46492 in test_migrate_start /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:593
#3 0x55a796c44101 in test_baddest /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:854:9
#4 0x7f906ffd3cc9 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72cc9)
#5 0x7f906ffd3bfa (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72bfa)
#6 0x7f906ffd3bfa (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72bfa)
#7 0x7f906ffd3ea1 in g_test_run_suite (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72ea1)
#8 0x7f906ffd3ec0 in g_test_run (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72ec0)
#9 0x55a796c43707 in main /home/petmay01/linaro/qemu-from-laptop/qemu/tests/migration-test.c:1187:11
#10 0x7f906e9abb96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#11 0x55a796b6c2d9 in _start (/home/petmay01/linaro/qemu-from-laptop/qemu/build/sanitizers/tests/migration-test+0x932d9)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190702150311.20467-1-peter.maydell@linaro.org
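The idea behind the init_bootfile() change above can be sketched as follows. This is not the test's actual code: write_bootfile() and tiny_kernel are hypothetical, and the sketch writes to an already-open FILE for simplicity. The point is that the caller passes the real length of the array instead of the writer assuming 512 bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical boot blob that is much smaller than 512 bytes. */
const unsigned char tiny_kernel[] = { 0x14, 0x00, 0x00, 0x00 };

/*
 * Write exactly `len` bytes of `content`. A hardcoded length of 512
 * here would read past the end of a smaller array such as tiny_kernel.
 * Returns the number of bytes written.
 */
size_t write_bootfile(FILE *f, const unsigned char *content, size_t len)
{
    return fwrite(content, 1, len, f);
}
```

A caller would pass sizeof(tiny_kernel), so the length always follows the array definition.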
Off by one error in the EL2 and EL3 tests. Remove the test
against EL3 entirely, since it must always be true.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190702104732.31154-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Pierre Morel [Fri, 5 Jul 2019 15:32:49 +0000 (17:32 +0200)]
s390x/cpumodel: Set up CPU model for AQIC interception
Let's add support for the AP-Queue interruption facility to the CPU
model.
The S390_FEAT_AP_QUEUE_INTERRUPT_CONTROL CPU facility indicates
whether the PQAP instruction with the AQIC command is available
to the guest.
This feature will be enabled only if the AP instructions are
available on the Linux host and the AQIC facility is installed on
the host.
This feature must be turned on from userspace to intercept AP
instructions on the KVM guest. The QEMU command line to turn
this feature on looks something like this:
qemu-system-s390x ... -cpu xxx,apqi=on ...
or
... -cpu host
Right now AP pass-through devices do not support migration,
which means that we do not have to take care of migrating
the interrupt data:
virsh migrate apguest --live qemu+ssh://root@target.lan/system
error: Requested operation is not valid: domain has assigned non-USB host devices
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[rebase to newest qemu and fixup description]
Message-Id: <20190705153249.12525-1-borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Alex Williamson [Tue, 2 Jul 2019 19:41:34 +0000 (13:41 -0600)]
vfio-ccw: Test vfio_set_irq_signaling() return value
Coverity doesn't like that most callers of vfio_set_irq_signaling() check
the return value and doesn't understand the equivalence of testing the
error pointer instead. Test the return value consistently.
Reported-by: Coverity (CID 1402783) Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <156209642116.14915.9598593247782519613.stgit@gimli.home> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
* remotes/bonzini/tags/for-upstream:
ioapic: use irq number instead of vector in ioapic_eoi_broadcast
hw/i386: Fix linker error when ISAPC is disabled
Makefile: generate header file with the list of devices enabled
target/i386: kvm: Fix when nested state is needed for migration
minikconf: do not include variables from MINIKCONF_ARGS in config-all-devices.mak
target/i386: fix feature check in hyperv-stub.c
ioapic: clear irq_eoi when updating the ioapic redirect table entry
intel_iommu: Fix unexpected unmaps during global unmap
intel_iommu: Fix incorrect "end" for vtd_address_space_unmap
i386/kvm: Fix build with -m32
checkpatch: do not warn for multiline parenthesized returned value
pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 8 Jul 2019 08:46:19 +0000 (09:46 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging
Machine and x86 queue, 2019-07-05
* CPU die topology support (Like Xu)
* Deprecation of features (Igor Mammedov):
* 'mem' parameter of '-numa node' option
* implicit memory distribution between NUMA nodes
* deprecate -mem-path fallback to anonymous RAM
* x86 versioned CPU models (Eduardo Habkost)
* SnowRidge CPU model (Paul Lai)
* Add deprecation information to query-machines (Eduardo Habkost)
* Other i386 fixes
* remotes/ehabkost/tags/machine-next-pull-request: (42 commits)
tests: use -numa memdev option in tests instead of legacy 'mem' option
numa: allow memory-less nodes when using memdev as backend
numa: Make deprecation warnings conditional on !qtest_enabled()
i386: Add Cascadelake-Server-v2 CPU model
docs: Deprecate CPU model runnability guarantees
i386: Make unversioned CPU models be aliases
i386: Replace -noTSX, -IBRS, -IBPB CPU models with aliases
i386: Define -IBRS, -noTSX, -IBRS versions of CPU models
i386: Register versioned CPU models
i386: Get model-id from CPU object on "-cpu help"
i386: Add x-force-features option for testing
qmp: Add "alias-of" field to query-cpu-definitions
i386: Introduce SnowRidge CPU model
qmp: Add deprecation information to query-machines
vl.c: Add -smp, dies=* command line support and update doc
machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()
target/i386: Add CPUID.1F generation support for multi-dies PCMachine
i386: Remove unused host_cpudef variable
x86/cpu: use FeatureWordArray to define filtered_features
i386: make 'hv-spinlocks' a regular uint32 property
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(qemu) /home/test/qemu5/qemu/hw/intc/ioapic.c:266:27: runtime error: index 35 out of bounds for type 'int [24]'
=================================================================
==113504==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61b000003114 at pc 0x5579e3c7a80f bp 0x7fd004bf8c10 sp 0x7fd004bf8c00
WRITE of size 4 at 0x61b000003114 thread T4
#0 0x5579e3c7a80e in ioapic_eoi_broadcast /home/test/qemu5/qemu/hw/intc/ioapic.c:266
#1 0x5579e3c6f480 in apic_eoi /home/test/qemu5/qemu/hw/intc/apic.c:428
#2 0x5579e3c720a7 in apic_mem_write /home/test/qemu5/qemu/hw/intc/apic.c:802
#3 0x5579e3b1e31a in memory_region_write_accessor /home/test/qemu5/qemu/memory.c:503
#4 0x5579e3b1e6a2 in access_with_adjusted_size /home/test/qemu5/qemu/memory.c:569
#5 0x5579e3b28d77 in memory_region_dispatch_write /home/test/qemu5/qemu/memory.c:1497
#6 0x5579e3a1b36b in flatview_write_continue /home/test/qemu5/qemu/exec.c:3323
#7 0x5579e3a1b633 in flatview_write /home/test/qemu5/qemu/exec.c:3362
#8 0x5579e3a1bcb1 in address_space_write /home/test/qemu5/qemu/exec.c:3452
#9 0x5579e3a1bd03 in address_space_rw /home/test/qemu5/qemu/exec.c:3463
#10 0x5579e3b8b979 in kvm_cpu_exec /home/test/qemu5/qemu/accel/kvm/kvm-all.c:2045
#11 0x5579e3ae4499 in qemu_kvm_cpu_thread_fn /home/test/qemu5/qemu/cpus.c:1287
#12 0x5579e4cbdb9f in qemu_thread_start util/qemu-thread-posix.c:502
#13 0x7fd0146376da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#14 0x7fd01436088e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e
This is because in the ioapic_eoi_broadcast function, we use 'vector' to
index 's->irq_eoi'. To fix this, we should use the irq number.
Signed-off-by: Li Qiang <liq3ea@163.com> Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190622002119.126834-1-liq3ea@163.com>
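The out-of-bounds write reported above can be illustrated with a minimal sketch. It is not the QEMU source: struct ioapic_sketch and eoi_broadcast() are hypothetical. The array size of 24 is taken from the sanitizer report, and the vector is assumed to sit in the low 8 bits of each redirection table entry.

```c
#include <stdint.h>

#define SKETCH_NUM_PINS 24

/* Hypothetical, reduced IOAPIC state. */
struct ioapic_sketch {
    uint64_t ioredtbl[SKETCH_NUM_PINS]; /* low 8 bits hold the vector */
    int irq_eoi[SKETCH_NUM_PINS];       /* per-pin EOI counter */
};

/*
 * Handle an EOI for `vector`. The counter is indexed by the pin
 * number n, which is always below SKETCH_NUM_PINS. Indexing it by
 * `vector` (0..255) would write outside the array, e.g. for vector 35.
 * Returns the number of pins that matched.
 */
int eoi_broadcast(struct ioapic_sketch *s, uint8_t vector)
{
    int matched = 0;

    for (int n = 0; n < SKETCH_NUM_PINS; n++) {
        if ((s->ioredtbl[n] & 0xff) != vector) {
            continue;
        }
        s->irq_eoi[n]++;
        matched++;
    }
    return matched;
}
```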
Julio Montes [Fri, 5 Jul 2019 14:35:54 +0000 (14:35 +0000)]
hw/i386: Fix linker error when ISAPC is disabled
v2: include config-devices.h to use CONFIG_IDE_ISA
Message-Id: <20190705143554.10295-2-julio.montes@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Julio Montes <julio.montes@intel.com>
Julio Montes [Fri, 5 Jul 2019 14:35:53 +0000 (14:35 +0000)]
Makefile: generate header file with the list of devices enabled
v2: generate config-devices.h which contains the list of devices enabled
Message-Id: <20190705143554.10295-1-julio.montes@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Julio Montes <julio.montes@intel.com>
Liran Alon [Mon, 24 Jun 2019 23:05:14 +0000 (02:05 +0300)]
target/i386: kvm: Fix when nested state is needed for migration
When vCPU is in VMX operation and enters SMM mode,
it temporarily exits VMX operation but KVM maintained nested-state
still stores the VMXON region physical address, i.e. even when the
vCPU is in SMM mode then (nested_state->hdr.vmx.vmxon_pa != -1ull).
Therefore, there is no need to explicitly check for
KVM_STATE_NESTED_SMM_VMXON to determine if it is necessary
to save nested-state as part of migration stream.
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com>
Message-Id: <20190624230514.53326-1-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 24 Jun 2019 18:18:46 +0000 (20:18 +0200)]
minikconf: do not include variables from MINIKCONF_ARGS in config-all-devices.mak
When minikconf writes config-devices.mak, it includes all variables including
those from MINIKCONF_ARGS. This causes values from config-host.mak to "stick" to
the ones used in generating config-devices.mak, because config-devices.mak is
included after config-host.mak. Avoid this by omitting assignments coming
from the command line in the output of minikconf.
Reported-by: Christophe de Dinechin <dinechin@redhat.com> Reviewed-by: Christophe de Dinechin <dinechin@redhat.com> Tested-by: Christophe de Dinechin <dinechin@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The reason was the conversion of cpu->hyperv_synic to
cpu->hyperv_synic_kvm_only although the rest of the patch introduces a
feature checking mechanism. So I've fixed the KVM_EXIT_HYPERV_SYNIC in
hyperv-stub to do the same feature check as in the real hyperv.c
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20190624123835.28869-1-alex.bennee@linaro.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Li Qiang [Mon, 24 Jun 2019 15:16:35 +0000 (08:16 -0700)]
ioapic: clear irq_eoi when updating the ioapic redirect table entry
irq_eoi is used to count the number of IRQs injected during EOI
broadcast. It should be set to 0 when updating the ioapic's redirect
table entry.
Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Li Qiang <liq3ea@163.com> Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190624151635.22494-1-liq3ea@163.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vtd_address_space_unmap() will do proper page mask alignment to make
sure each IOTLB message will have correct masks for notification
messages (2^N-1), but sometimes the range can be expanded to even
supersede the registered range. That could lead to unexpected UNMAP of
already mapped regions in some other notifiers.
Instead of doing mindless expansion of the start address and address
mask, we split the range into smaller ones and guarantee that each
small range will have correct masks (2^N-1), and at the same time we
should also try our best to generate as few IOTLB messages as
possible.
Reported-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Message-Id: <20190624091811.30412-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
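The range splitting described above can be sketched as follows. This is an illustration, not the patch itself: struct chunk, aligned_pow2_size() and split_range() are hypothetical names. Each step takes the largest power-of-two chunk that is naturally aligned at the current start address and still fits in the remaining size, so every chunk has a 2^N-1 mask and none extends past the requested range.

```c
#include <stddef.h>
#include <stdint.h>

/* One notification: naturally aligned start plus a 2^N-1 mask. */
struct chunk {
    uint64_t start;
    uint64_t mask;
};

/*
 * Largest power of two that keeps `start` naturally aligned and does
 * not exceed `remain`. `remain` must be at least 1.
 */
uint64_t aligned_pow2_size(uint64_t start, uint64_t remain)
{
    uint64_t size = 1;

    while ((size << 1) != 0 &&
           (size << 1) <= remain &&
           (start & ((size << 1) - 1)) == 0) {
        size <<= 1;
    }
    return size;
}

/*
 * Split [start, start + size) into aligned power-of-two chunks.
 * Writes at most `max` chunks to `out` and returns how many were
 * written; if `max` is too small the tail of the range is not covered.
 */
size_t split_range(uint64_t start, uint64_t size,
                   struct chunk *out, size_t max)
{
    size_t n = 0;

    while (size > 0 && n < max) {
        uint64_t len = aligned_pow2_size(start, size);

        out[n].start = start;
        out[n].mask = len - 1;
        n++;
        start += len;
        size -= len;
    }
    return n;
}
```

For example, the range starting at 0x3000 with size 0x5000 becomes two chunks, (0x3000, mask 0xfff) and (0x4000, mask 0x3fff), rather than one expanded 0x8000-aligned region that would cover addresses outside the request.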
Yan Zhao [Mon, 24 Jun 2019 09:18:10 +0000 (17:18 +0800)]
intel_iommu: Fix incorrect "end" for vtd_address_space_unmap
IOMMUNotifier uses inclusive ranges, so we should check
against (VTD_ADDRESS_SIZE(s->aw_bits) - 1).
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[peterx: split from another bigger patch] Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190624091811.30412-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
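A small sketch of the inclusive-range point made above: with an inclusive end, the last valid address is the address space size minus one. address_size() and clamp_inclusive_end() are hypothetical helpers, and the sketch assumes the address space size is 2^aw_bits with aw_bits below 64.

```c
#include <stdint.h>

/* Size of an address space that is aw_bits wide (aw_bits < 64). */
uint64_t address_size(unsigned aw_bits)
{
    return 1ULL << aw_bits;
}

/*
 * Clamp an inclusive end address to the last valid address, which is
 * address_size(aw_bits) - 1 and not address_size(aw_bits) itself.
 */
uint64_t clamp_inclusive_end(uint64_t end, unsigned aw_bits)
{
    uint64_t last = address_size(aw_bits) - 1;

    return end > last ? last : end;
}
```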
Max Reitz [Mon, 24 Jun 2019 19:39:13 +0000 (21:39 +0200)]
i386/kvm: Fix build with -m32
find_next_bit() takes a pointer of type "const unsigned long *", but the
first argument passed here is a "uint64_t *". These types are
incompatible when compiling qemu with -m32.
Just use ctz64() instead.
Fixes: c686193072a47032d83cb4e131dc49ae30f9e5d Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190624193913.28343-1-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
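The following sketch shows why counting trailing zeros on a 64-bit value avoids the type problem described above: it works on a uint64_t directly and never reinterprets it as an array of unsigned long, which is typically 32 bits wide in an -m32 build. ctz64_sketch() is a portable stand-in written for this example, not QEMU's ctz64(), and next_set_bit64() is a hypothetical helper.

```c
#include <stdint.h>

/* Number of trailing zero bits in v; returns 64 when v is 0. */
int ctz64_sketch(uint64_t v)
{
    int n = 0;

    if (v == 0) {
        return 64;
    }
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Index of the lowest set bit at or above position `from`,
 * or 64 if there is none.
 */
int next_set_bit64(uint64_t mask, unsigned from)
{
    if (from >= 64) {
        return 64;
    }
    return ctz64_sketch(mask & (~0ULL << from));
}
```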
Paolo Bonzini [Fri, 21 Jun 2019 11:28:54 +0000 (13:28 +0200)]
checkpatch: do not warn for multiline parenthesized returned value
While indeed we do not want to have
return (a);
it is less clear that this applies to
return (a &&
b);
Some editors indent more nicely if you have parentheses, and some people's
eyes may appreciate that as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1561116534-21814-1-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Igor Mammedov [Mon, 10 Jun 2019 13:50:35 +0000 (15:50 +0200)]
pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size()
QEMU will crash when the device-memory-region-size property is read if
ms->device_memory hasn't been initialized yet.
Crash can be reproduced with:
$QEMU -preconfig -qmp unix:qmp_socket,server,nowait &
./scripts/qmp/qom-get -s qmp_socket /machine.device-memory-region-size
Instead of crashing, return 0 if ms->device_memory hasn't been initialized.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1560174635-22602-1-git-send-email-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
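The shape of the fix above can be sketched with simplified, hypothetical types (struct device_memory_sketch and struct machine_sketch are not the QEMU structures): the getter reports 0 while the device memory pointer is still NULL instead of dereferencing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the machine state. */
struct device_memory_sketch {
    uint64_t region_size;
};

struct machine_sketch {
    struct device_memory_sketch *device_memory; /* NULL until set up */
};

/* Return the region size, or 0 if device memory is not set up yet. */
uint64_t get_device_memory_region_size(const struct machine_sketch *ms)
{
    if (ms->device_memory == NULL) {
        return 0;
    }
    return ms->device_memory->region_size;
}
```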
Igor Mammedov [Tue, 2 Jul 2019 14:07:44 +0000 (10:07 -0400)]
numa: allow memory-less nodes when using memdev as backend
QEMU fails to start if a memory-less node is present when memdev
is used:
qemu-system-x86_64 -object memory-backend-ram,id=ram0,size=128M \
-numa node -numa node,memdev=ram0
with error:
"memdev option must be specified for either all or no nodes"
which works as expected if legacy 'mem' is used.
Fix check to make memory-less nodes valid when memdev option is used
but still disallow mix of mem and memdev options.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190702140745.27767-2-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
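The relaxed check described above can be sketched as follows. enum node_backing and numa_config_valid() are hypothetical and do not mirror the QEMU data structures. Nodes without memory are accepted next to memdev-backed nodes, and only a mix of legacy 'mem' and 'memdev' is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* How a NUMA node gets its memory in this sketch. */
enum node_backing {
    NODE_NONE,   /* memory-less node */
    NODE_MEM,    /* legacy 'mem' option */
    NODE_MEMDEV  /* 'memdev' option */
};

/* Valid unless both 'mem' and 'memdev' appear in the same config. */
bool numa_config_valid(const enum node_backing *nodes, size_t n)
{
    bool have_mem = false;
    bool have_memdev = false;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i] == NODE_MEM) {
            have_mem = true;
        } else if (nodes[i] == NODE_MEMDEV) {
            have_memdev = true;
        }
    }
    return !(have_mem && have_memdev);
}
```

With this rule, the "-numa node -numa node,memdev=ram0" layout from the commit message corresponds to { NODE_NONE, NODE_MEMDEV } and is accepted.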
numa: Make deprecation warnings conditional on !qtest_enabled()
This will help us avoid spurious warnings during "make check".
Note that this will silence the warnings generated by
tests/numa-test, but not the ones generated by
tests/bios-tables-test. We still need to change
tests/bios-tables-test to use "-numa ...,memdev=" to silence
these warnings.
Eduardo Habkost [Fri, 28 Jun 2019 00:28:44 +0000 (21:28 -0300)]
i386: Add Cascadelake-Server-v2 CPU model
Add new version of Cascadelake-Server CPU model, setting
stepping=5 and enabling the IA32_ARCH_CAPABILITIES MSR
with some flags.
The new feature will introduce a new host software requirement,
breaking our CPU model runnability promises. This means we can't
enable the new CPU model version by default in QEMU 4.1, because
management software isn't ready yet to resolve CPU model aliases.
This is why "pc-*-4.1" will keep returning Cascadelake-Server-v1
if "-cpu Cascadelake-Server" is specified.
Includes a test case to ensure the right combinations of
machine-type + CPU model + command-line feature flags will work
as expected.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190628002844.24894-10-ehabkost@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20190703221723.8161-1-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Eduardo Habkost [Fri, 28 Jun 2019 00:28:42 +0000 (21:28 -0300)]
i386: Make unversioned CPU models be aliases
This will make the behavior of unversioned CPU models depend on
the machine type:
* "pc-*-4.0" and older will not report them as aliases.
This is done to keep compatibility with older QEMU versions
after management software starts translating aliases.
* "pc-*-4.1" will translate unversioned CPU models to -v1.
This is done to keep compatibility with existing management
software, that still relies on CPU model runnability promises.
* "none" will translate unversioned CPU models to their latest
version. This is planned to become the default in future machine
types (probably in pc-*-4.3).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190628002844.24894-8-ehabkost@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
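The three behaviours listed above can be sketched as a small resolver. The names are hypothetical (enum alias_policy, resolve_cpu_alias()) and the real code is organised differently; the sketch only encodes the rules stated in the commit message: older machine types report no alias, "pc-*-4.1" resolves to -v1, and "none" resolves to the latest version.

```c
#include <stddef.h>
#include <stdio.h>

/* Alias policy selected by the machine type in this sketch. */
enum alias_policy {
    ALIAS_LEGACY, /* "pc-*-4.0" and older: not reported as an alias */
    ALIAS_V1,     /* "pc-*-4.1": unversioned name means -v1 */
    ALIAS_LATEST  /* "none": unversioned name means the latest version */
};

/*
 * Write the versioned model name for `model` into `out`.
 * Returns 1 if an alias target was produced, 0 otherwise
 * (no alias for this policy, or the buffer is too small).
 */
int resolve_cpu_alias(const char *model, enum alias_policy policy,
                      int latest_version, char *out, size_t out_len)
{
    int version;
    int n;

    switch (policy) {
    case ALIAS_V1:
        version = 1;
        break;
    case ALIAS_LATEST:
        version = latest_version;
        break;
    case ALIAS_LEGACY:
    default:
        return 0;
    }

    n = snprintf(out, out_len, "%s-v%d", model, version);
    return n > 0 && (size_t)n < out_len;
}
```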