Klaus Jensen [Wed, 19 Jul 2023 18:21:58 +0000 (20:21 +0200)]
hw/nvme: fix compliance issue wrt. iosqes/iocqes
As of prior to this patch, the controller checks the value of CC.IOCQES
and CC.IOSQES prior to enabling the controller. As reported by Ben in
GitLab issue #1691, this is not spec compliant. The controller should
only check these values when queues are created.
This patch moves these checks to nvme_create_cq(). We do not need to
check it in nvme_create_sq() since that will error out if the completion
queue is not already created.
Also, since the controller exclusively supports SQEs of size 64 bytes
and CQEs of size 16 bytes, hard code that.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1691 Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 6a33f2e920ec0b489a77200888e3692664077f2d) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Klaus Jensen [Thu, 3 Aug 2023 18:44:23 +0000 (20:44 +0200)]
hw/nvme: fix oob memory read in fdp events log
As reported by Trend Micro's Zero Day Initiative, an oob memory read
vulnerability exists in nvme_fdp_events(). The host-provided offset is
not verified.
Fix this.
This is only exploitable when Flexible Data Placement mode (fdp=on) is
enabled.
Fixes: CVE-2023-4135 Fixes: 73064edfb864 ("hw/nvme: flexible data placement emulation") Reported-by: Trend Micro's Zero Day Initiative Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit ecb1b7b082d3b7dceff0e486a114502fc52c0fdf) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
dump: kdump-zlib data pages not dumped with pvtime/aarch64
The kdump-zlib data pages are not dumped from aarch64 host when the
'pvtime' is involved, that is, when the block->target_end is not aligned to
page_size. In the below example, it is expected to dump two blocks.
However, there is an issue with get_next_page() so that the pages for
"mach-virt.ram" will not be dumped.
At line 1296, although we have reached at the end of the 'pvtime' block,
since it is not aligned to the page_size (e.g., 0x10000), it will not break
at line 1298.
As a result, get_next_page() will continue to the next
block ("mach-virt.ram"). Finally, when get_next_page() returns to the
caller:
- 'pfnptr' is referring to the 'pvtime'
- but 'blockptr' is referring to the "mach-virt.ram"
When get_next_page() is called the next time, "*pfnptr += 1" still refers
to the prior 'pvtime'. It will exit immediately because it is out of the
range of the current "mach-virt.ram".
The fix is to break when it is time to come to the next block, so that both
'pfnptr' and 'blockptr' refer to the same block.
Fixes: 94d788408d2d ("dump: fix kdump to work over non-aligned blocks") Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230713055819.30497-1-dongli.zhang@oracle.com>
(cherry picked from commit 8a64609eea8cb2bac015968c4b62da5bce266e22) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Zhao Liu [Wed, 28 Jun 2023 13:54:37 +0000 (21:54 +0800)]
hw/smbios: Fix core count in type4
>From SMBIOS 3.0 specification, core count field means:
Core Count is the number of cores detected by the BIOS for this
processor socket. [1]
Before 003f230e37d7 ("machine: Tweak the order of topology members in
struct CpuTopology"), MachineState.smp.cores means "the number of cores
in one package", and it's correct to use smp.cores for core count.
But 003f230e37d7 changes the smp.cores' meaning to "the number of cores
in one die" and doesn't change the original smp.cores' use in smbios as
well, which makes core count in type4 go wrong.
Fix this issue with the correct "cores per socket" caculation.
[1] SMBIOS 3.0.0, section 7.5.6, Processor Information - Core Count
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology") Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-5-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 196ea60a734c346d7d75f1d89aa37703d4d854e7) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Zhao Liu [Wed, 28 Jun 2023 13:54:36 +0000 (21:54 +0800)]
hw/smbios: Fix thread count in type4
>From SMBIOS 3.0 specification, thread count field means:
Thread Count is the total number of threads detected by the BIOS for
this processor socket. It is a processor-wide count, not a
thread-per-core count. [1]
So here we should use threads per socket other than threads per core.
[1] SMBIOS 3.0.0, section 7.5.8, Processor Information - Thread Count
Fixes: c97294ec1b9e ("SMBIOS: Build aggregate smbios tables and entry point") Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-4-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 7298fd7de5551c4501f54381228458e3c21cab4b) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Zhao Liu [Wed, 28 Jun 2023 13:54:35 +0000 (21:54 +0800)]
hw/smbios: Fix smbios_smp_sockets caculation
smp.sockets is the number of sockets which is configured by "-smp" (
otherwise, the default is 1). Trying to recalculate it here with another
rules leads to errors, such as:
1. 003f230e37d7 ("machine: Tweak the order of topology members in struct
CpuTopology") changes the meaning of smp.cores but doesn't fix
original smp.cores uses.
With the introduction of cluster, now smp.cores means the number of
cores in one cluster. So smp.cores * smp.threads just means the
threads in a cluster not in a socket.
2. On the other hand, we shouldn't use smp.cpus here because it
indicates the initial number of online CPUs at the boot time, and is
not mathematically related to smp.sockets.
So stop reinventing the another wheel and use the topo values that
has been calculated.
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology") Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-3-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d79a284a44bb7d88b233fb6bb12ea3723f43469d) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Zhao Liu [Wed, 28 Jun 2023 13:54:34 +0000 (21:54 +0800)]
machine: Add helpers to get cores/threads per socket
The number of cores/threads per socket are needed for smbios, and are
also useful for other modules.
Provide the helpers to wrap the calculation of cores/threads per socket
so that we can avoid calculation errors caused by other modules miss
topology changes.
Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20230628135437.1145805-2-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit a1d027be95bc375238e5b9292c6aa661a8ddef4c) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
As lpc-hc is designed for re-entrant calls from xscom, mark it
re-entrancy safe.
Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
[clg: mark opb_master_regs as re-entrancy safe also ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Tested-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230526073850.2772197-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 76f9ebffcd41b62ae9ec26a1c25676f2ae1d9cc3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
As the code is designed for re-entrant calls to apic-msi, mark apic-msi
as reentrancy-safe.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-Id: <20230427211013.2994127-9-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 50795ee051a342c681a9b45671c552fbd6274db8) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
As the code is designed for re-entrant calls from raven_io_ops to
pci-conf, mark raven_io_ops as reentrancy-safe.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20230427211013.2994127-8-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 6dad5a6810d9c60ca320d01276f6133bbcfa1fc7) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
bcm2835_property: disable reentrancy detection for iomem
As the code is designed for re-entrant calls from bcm2835_property to
bcm2835_mbox and back into bcm2835_property, mark iomem as
reentrancy-safe.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-7-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 985c4a4e547afb9573b6bd6843d20eb2c3d1d1cd) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Where somedisk.qcow2 is an image that contains already some partitions
and file systems.
In the boot menu of Fedora, go to
"Troubleshooting" -> "Rescue a Fedora system" -> "3) Skip to shell"
Then check "dmesg | grep -i 53c" for failure messages, and try to mount
a partition from somedisk.qcow2.
Message-Id: <20230516090556.553813-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit d139fe9ad8a27bcc50b4ead77d2f97d191a0e95e) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
hw: replace most qemu_bh_new calls with qemu_bh_new_guarded
This protects devices from bh->mmio reentrancy issues.
Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure. Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-5-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit f63192b0544af5d3e4d5edfd85ab520fcf671377) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Advise authors to use the _guarded versions of the APIs, instead.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-Id: <20230427211013.2994127-4-alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ef56ffbdd6b0605dc1e305611287b948c970e236) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
async: Add an optional reentrancy guard to the BH API
Devices can pass their MemoryReentrancyGuard (from their DeviceState),
when creating new BHes. Then, the async API will toggle the guard
before/after calling the BH call-back. This prevents bh->mmio reentrancy
issues.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-Id: <20230427211013.2994127-3-alxndr@bu.edu>
[thuth: Fix "line over 90 characters" checkpatch.pl error] Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 9c86c97f12c060bf7484dd931f38634e166a81f0) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA. The purpose of this
flag is to prevent two types of DMA-based reentrancy issues:
1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case
These issues have led to problems such as stack-exhaustion and
use-after-frees.
Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-2-alxndr@bu.edu>
[thuth: Replace warn_report() with warn_report_once()] Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit a2e1753b8054344f32cf94f31c6399a58794a380) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Matt Borgerson [Thu, 13 Jul 2023 07:29:01 +0000 (00:29 -0700)]
target/i386: Check CR0.TS before enter_mmx
When CR0.TS=1, execution of x87 FPU, MMX, and some SSE instructions will
cause a Device Not Available (DNA) exception (#NM). System software uses
this exception event to lazily context switch FPU state.
Before this patch, enter_mmx helpers may be generated just before #NM
generation, prematurely resetting FPU state before the guest has a
chance to save it.
Signed-off-by: Matt Borgerson <contact@mborgerson.com>
Message-ID: <CADc=-s5F10muEhLs4f3mxqsEPAHWj0XFfOC2sfFMVHrk9fcpMg@mail.gmail.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b2ea6450d8e1336a33eb958ccc64604bc35a43dd) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Nicholas Piggin [Sun, 30 Jul 2023 11:18:42 +0000 (21:18 +1000)]
target/ppc: Fix VRMA page size for ISA v3.0
Until v2.07s, the VRMA page size (L||LP) was encoded in LPCR[VRMASD].
In v3.0 that moved to the partition table PS field.
The powernv machine can now run KVM HPT guests on POWER9/10 CPUs with
this fix and the patch to add ASDR.
Fixes: 3367c62f522b ("target/ppc: Support for POWER9 native hash") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230730111842.39292-1-npiggin@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 0e2a3ec36885f6d79a96230f582d4455878c6373) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Nicholas Piggin [Wed, 26 Jul 2023 18:22:27 +0000 (04:22 +1000)]
target/ppc: Fix pending HDEC when entering PM state
HDEC is defined to not wake from PM state. There is a check in the HDEC
timer to avoid setting the interrupt if we are in a PM state, but no
check on PM entry to lower HDEC if it already fired. This can cause a
HDECR wake up and QEMU abort with unsupported exception in Power Save
mode.
Fixes: 4b236b621bf ("ppc: Initial HDEC support") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230726182230.433945-4-npiggin@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 9915dac4847f3cc5ffd36e4c374a4eec83fe09b5) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Nicholas Piggin [Wed, 26 Jul 2023 18:22:25 +0000 (04:22 +1000)]
target/ppc: Implement ASDR register for ISA v3.0 for HPT
The ASDR register was introduced in ISA v3.0. It has not been
implemented for HPT. With HPT, ASDR is the format of the slbmte RS
operand (containing VSID), which matches the ppc_slb_t field.
Fixes: 3367c62f522b ("target/ppc: Support for POWER9 native hash") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230726182230.433945-2-npiggin@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 9201af096962a1967ce5d0b270ed16ae4edd3db6) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq()
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."
Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.
Yet the problem is that, vhost_vdpa_net_load_mq() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.
This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.
Fixes: f64c7cda69 ("vdpa: Add vhost_vdpa_net_load_mq") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <ec515ebb0b4f56368751b9e318e245a5d994fa72.1688438055.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f45fd95ec9e8104f6af801c734375029dda0f542) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac()
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."
Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.
Yet the problem is that, vhost_vdpa_net_load_mac() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.
This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.
Fixes: f73c0c43ac ("vdpa: extract vhost_vdpa_net_load_mac from vhost_vdpa_net_load") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <a21731518644abbd0c495c5b7960527c5911f80d.1688438055.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b479bc3c9d5e473553137641fd31069c251f0d6e) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
vdpa: Fix possible use-after-free for VirtQueueElement
QEMU uses vhost_handle_guest_kick() to forward guest's available
buffers to the vdpa device in SVQ avail ring.
In vhost_handle_guest_kick(), a `g_autofree` `elem` is used to
iterate through the available VirtQueueElements. This `elem` is
then passed to `svq->ops->avail_handler`, specifically to the
vhost_vdpa_net_handle_ctrl_avail(). If this handler fails to
process the CVQ command, vhost_handle_guest_kick() regains
ownership of the `elem`, and either frees it or requeues it.
Yet the problem is that, vhost_vdpa_net_handle_ctrl_avail()
mistakenly frees the `elem`, even if it fails to forward the
CVQ command to vdpa device. This can result in a use-after-free
for the `elem` in vhost_handle_guest_kick().
This patch solves this problem by refactoring
vhost_vdpa_net_handle_ctrl_avail() to only freeing the `elem` if
it owns it.
Fixes: bd907ae4b0 ("vdpa: manual forward CVQ buffers") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <e3f2d7db477734afe5c6a5ab3fa8b8317514ea34.1688746840.git.yin31149@gmail.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 031b1abacbdb3f4e016b6b926f7e7876c05339bb) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Wed, 2 Aug 2023 13:57:23 +0000 (15:57 +0200)]
include/hw/i386/x86-iommu: Fix struct X86IOMMU_MSIMessage for big endian hosts
The first bitfield here is supposed to be used as a 64-bit equivalent
to the "uint64_t msi_addr" in the union. To make this work correctly
on big endian hosts, too, the __addr_hi field has to be part of the
bitfield, and the the bitfield members must be declared with "uint64_t"
instead of "uint32_t" - otherwise the values are placed in the wrong
bytes on big endian hosts.
Same applies to the 32-bit "msi_data" field: __resved1 must be part
of the bitfield, and the members must be declared with "uint32_t"
instead of "uint16_t".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-7-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit e1e56c07d1fa24aa37a7e89e6633768fc8ea8705) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Wed, 2 Aug 2023 13:57:22 +0000 (15:57 +0200)]
hw/i386/x86-iommu: Fix endianness issue in x86_iommu_irq_to_msi_message()
The values in "msg" are assembled in host endian byte order (the other
field are also not swapped), so we must not swap the __addr_head here.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-6-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 37cf5cecb039a063c0abe3b51ae30f969e73aa84) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Wed, 2 Aug 2023 13:57:21 +0000 (15:57 +0200)]
hw/i386/intel_iommu: Fix index calculation in vtd_interrupt_remap_msi()
The values in "addr" are populated locally in this function in host
endian byte order, so we must not swap the index_l field here.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-5-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit fcd8027423300b201b37842b88393dc5c6c8ee9e) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Wed, 2 Aug 2023 13:57:20 +0000 (15:57 +0200)]
hw/i386/intel_iommu: Fix struct VTDInvDescIEC on big endian hosts
On big endian hosts, we need to reverse the bitfield order in the
struct VTDInvDescIEC, just like it is already done for the other
bitfields in the various structs of the intel-iommu device.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-4-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 4572b22cf9ba432fa3955686853c706a1821bbc7) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Wed, 2 Aug 2023 13:57:19 +0000 (15:57 +0200)]
hw/i386/intel_iommu: Fix endianness problems related to VTD_IR_TableEntry
The code already tries to do some endianness handling here, but
currently fails badly:
- While it already swaps the data when logging errors / tracing, it fails
to byteswap the value before e.g. accessing entry->irte.present
- entry->irte.source_id is swapped with le32_to_cpu(), though this is
a 16-bit value
- The whole union is apparently supposed to be swapped via the 64-bit
data[2] array, but the struct is a mixture between 32 bit values
(the first 8 bytes) and 64 bit values (the second 8 bytes), so this
cannot work as expected.
Fix it by converting the struct to two proper 64-bit bitfields, and
by swapping the values only once for everybody right after reading
the data from memory.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-3-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 642ba89672279fbdd14016a90da239c85e845d18) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
After reading the guest memory with dma_memory_read(), we have
to make sure that we byteswap the little endian data to the host's
byte order.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-2-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit cc2a08480e19007c05be8fe5b6893e20448954dc) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
pci: do not respond config requests after PCI device eject
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2224964
In migration with VF failover, Windows guest and ACPI hot
unplug we do not need to satisfy config requests, otherwise
the guest immediately detects the device and brings up its
driver. Many network VF's are stuck on the guest PCI bus after
the migration.
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Message-Id: <20230728084049.191454-1-yuri.benditovich@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 348e354417b64c484877354ee7cc66f29fa6c7df) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
target/hppa: Move iaoq registers and thus reduce generated code size
On hppa the Instruction Address Offset Queue (IAOQ) registers specifies
the next to-be-executed instructions addresses. Each generated TB writes those
registers at least once, so those registers are used heavily in generated
code.
Looking at the generated assembly, for a x86-64 host this code
to write the address $0x7ffe826f into iaoq_f is generated:
0x7f73e8000184: c7 85 d4 01 00 00 6f 82 movl $0x7ffe826f, 0x1d4(%rbp)
0x7f73e800018c: fe 7f
0x7f73e800018e: c7 85 d8 01 00 00 73 82 movl $0x7ffe8273, 0x1d8(%rbp)
0x7f73e8000196: fe 7f
With the trivial change, by moving the variables iaoq_f and iaoq_b to
the top of struct CPUArchState, the offset to %rbp is reduced (from
0x1d4 to 0), which allows the x86-64 tcg to generate 3 bytes less of
generated code per move instruction:
0x7fc1e800018c: c7 45 00 6f 82 fe 7f movl $0x7ffe826f, (%rbp)
0x7fc1e8000193: c7 45 04 73 82 fe 7f movl $0x7ffe8273, 4(%rbp)
Overall this is a reduction of generated code (not a reduction of
number of instructions).
A test run with checks the generated code size by running "/bin/ls"
with qemu-user shows that the code size shrinks from 1616767 to 1569273
bytes, which is ~97% of the former size.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Helge Deller <deller@gmx.de> Cc: qemu-stable@nongnu.org
(cherry picked from commit f8c0fd9804f435a20c3baa4c0c77ba9a02af24ef) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
zhenwei pi [Thu, 3 Aug 2023 02:43:13 +0000 (10:43 +0800)]
virtio-crypto: verify src&dst buffer length for sym request
For symmetric algorithms, the length of ciphertext must be as same
as the plaintext.
The missing verification of the src_len and the dst_len in
virtio_crypto_sym_op_helper() may lead buffer overflow/divulged.
This patch is originally written by Yiming Tao for QEMU-SECURITY,
resend it(a few changes of error message) in qemu-devel.
Fixes: CVE-2023-3180 Fixes: 04b9b37edda("virtio-crypto: add data queue processing handler") Cc: Gonglei <arei.gonglei@huawei.com> Cc: Mauro Matteo Cascella <mcascell@redhat.com> Cc: Yiming Tao <taoym@zju.edu.cn> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230803024314.29962-2-pizhenwei@bytedance.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9d38a8434721a6479fe03fb5afb150ca793d3980) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Li Feng [Mon, 31 Jul 2023 12:10:06 +0000 (20:10 +0800)]
vhost: fix the fd leak
When the vhost-user reconnect to the backend, the notifer should be
cleanup. Otherwise, the fd resource will be exhausted.
Fixes: f9a09ca3ea ("vhost: add support for configure interrupt") Signed-off-by: Li Feng <fengli@smartx.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20230731121018.2856310-2-fengli@smartx.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
(cherry picked from commit 18f2971ce403008d5e1c2875b483c9d1778143dc) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Eric Auger [Mon, 17 Jul 2023 16:21:26 +0000 (18:21 +0200)]
hw/virtio-iommu: Fix potential OOB access in virtio_iommu_handle_command()
In the virtio_iommu_handle_command() when a PROBE request is handled,
output_size takes a value greater than the tail size and on a subsequent
iteration we can get a stack out-of-band access. Initialize the
output_size on each iteration.
The issue was found with ASAN. Credits to:
Yiming Tao(Zhejiang University)
Gaoning Pan(Zhejiang University)
Fixes: 1733eebb9e7 ("virtio-iommu: Implement RESV_MEM probe request") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Mauro Matteo Cascella <mcascell@redhat.com> Cc: qemu-stable@nongnu.org
Message-Id: <20230717162126.11693-1-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit cf2f89edf36a59183166ae8721a8d7ab5cd286bd) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The arguments for deposit64 are (value, start, length, fieldval); this
appears to have thought they were (value, fieldval, start,
length). Reorder the parameters to match the actual function.
Cc: qemu-stable@nongnu.org Fixes: 950272506d ("target/m68k: Use semihosting/syscalls.h") Reported-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230801154519.3505531-1-peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 8caaae7319a5f7ca449900c0e6bfcaed78fa3ae2) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The arguments for deposit64 are (value, start, length, fieldval); this
appears to have thought they were (value, fieldval, start,
length). Reorder the parameters to match the actual function.
Signed-off-by: Keith Packard <keithp@keithp.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Fixes: d1e23cbaa403b2d ("target/nios2: Use semihosting/syscalls.h") Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230731235245.295513-1-keithp@keithp.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 71e2dd6aa1bdbac19c661638a4ae91816002ac9e) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Keith Packard [Tue, 1 Aug 2023 15:22:45 +0000 (08:22 -0700)]
target/nios2: Pass semihosting arg to exit
Instead of using R_ARG0 (the semihost function number), use R_ARG1
(the provided exit status).
Signed-off-by: Keith Packard <keithp@keithp.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230801152245.332749-1-keithp@keithp.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit c11d5bdae79a8edaf00dfcb2e49c064a50c67671) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
David Woodhouse [Fri, 7 Apr 2023 15:12:00 +0000 (17:12 +0200)]
hw/xen: fix off-by-one in xen_evtchn_set_gsi()
Coverity points out (CID 1508128) a bounds checking error. We need to check
for gsi >= IOAPIC_NUM_PINS, not just greater-than.
Also fix up an assert() that has the same problem, that Coverity didn't see.
Fixes: 4f81baa33ed6 ("hw/xen: Support GSI mapping to PIRQ") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230801175747.145906-2-dwmw2@infradead.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit cf885b19579646d6a085470658bc83432d6786d2) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
io: remove io watch if TLS channel is closed during handshake
The TLS handshake make take some time to complete, during which time an
I/O watch might be registered with the main loop. If the owner of the
I/O channel invokes qio_channel_close() while the handshake is waiting
to continue the I/O watch must be removed. Failing to remove it will
later trigger the completion callback which the owner is not expecting
to receive. In the case of the VNC server, this results in a SEGV as
vnc_disconnect_start() tries to shutdown a client connection that is
already gone / NULL.
CVE-2023-3354 Reported-by: jiangyegen <jiangyegen@huawei.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 10be627d2b5ec2d6b3dce045144aa739eef678b4) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Anthony PERARD [Tue, 4 Jul 2023 17:18:19 +0000 (18:18 +0100)]
xen-block: Avoid leaks on new error path
Commit 189829399070 ("xen-block: Use specific blockdev driver")
introduced a new error path, without taking care of allocated
resources.
So only allocate the qdicts after the error check, and free both
`filename` and `driver` when we are about to return and thus taking
care of both success and error path.
Coverity only spotted the leak of qdicts (*_layer variables).
Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: Coverity CID 1508722, 1398649 Fixes: 189829399070 ("xen-block: Use specific blockdev driver") Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230704171819.42564-1-anthony.perard@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit aa36243514a777f76c8b8a19b1f8a71f27ec6c78) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Anthony PERARD [Fri, 14 Jul 2023 15:27:20 +0000 (16:27 +0100)]
thread-pool: signal "request_cond" while locked
thread_pool_free() might have been called on the `pool`, which would
be a reason for worker_thread() to quit. In this case,
`pool->request_cond` is been destroyed.
If worker_thread() didn't managed to signal `request_cond` before it
been destroyed by thread_pool_free(), we got:
util/qemu-thread-posix.c:198: qemu_cond_signal: Assertion `cond->initialized' failed.
One backtrace:
__GI___assert_fail (assertion=0x55555614abcb "cond->initialized", file=0x55555614ab88 "util/qemu-thread-posix.c", line=198,
function=0x55555614ad80 <__PRETTY_FUNCTION__.17104> "qemu_cond_signal") at assert.c:101
qemu_cond_signal (cond=0x7fffb800db30) at util/qemu-thread-posix.c:198
worker_thread (opaque=0x7fffb800dab0) at util/thread-pool.c:129
qemu_thread_start (args=0x7fffb8000b20) at util/qemu-thread-posix.c:505
start_thread (arg=<optimized out>) at pthread_create.c:486
linux-user/armeb: Fix __kernel_cmpxchg() for armeb
Commit 7f4f0d9ea870 ("linux-user/arm: Implement __kernel_cmpxchg with host
atomics") switched to use qatomic_cmpxchg() to swap a word with the memory
content, but missed to endianess-swap the oldval and newval values when
emulating an armeb CPU, which expects words to be stored in big endian in
the guest memory.
The bug can be verified with qemu >= v7.0 on any little-endian host, when
starting the armeb binary of the upx program, which just hangs without
this patch.
Cc: qemu-stable@nongnu.org Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com> Reported-by: John Reiser <jreiser@BitWagon.com> Closes: https://github.com/upx/upx/issues/687
Message-Id: <ZMQVnqY+F+5sTNFd@p100> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 38dd78c41eaf08b490c9e7ec68fc508bbaa5cb1d) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
target/ppc: Disable goto_tb with architectural singlestep
The change to use translator_use_goto_tb went too far, as the
CF_SINGLE_STEP flag managed by the translator only handles
gdb single stepping and not the architectural single stepping
modeled in DisasContext.singlestep_enabled.
Fixes: 6e9cc373ec5 ("target/ppc: Use translator_use_goto_tb")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1795 Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 2e718e665706d5fcc3e3501bda26f277f055ed85) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
util/interval-tree: Use qatomic_set_mb in rb_link_node
Ensure that the stores to rb_left and rb_right are complete before
inserting the new node into the tree. Otherwise a concurrent reader
could see garbage in the new leaf.
Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 4c8baa02d36379507afd17bdea87aabe0aa32ed3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: s/qatomic_set_mb/qatomic_mb_set/ for 8.0 - it was renamed later)
util/interval-tree: Use qatomic_read for left/right while searching
Fixes a race condition (generally without optimization) in which
the subtree is re-read after the protecting if condition.
Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 055b86e0f0b4325117055d8d31c49011258f4af3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Maydell [Thu, 27 Jul 2023 10:39:06 +0000 (11:39 +0100)]
target/arm: Avoid writing to constant TCGv in trans_CSEL()
In commit 0b188ea05acb5 we changed the implementation of
trans_CSEL() to use tcg_constant_i32(). However, this change
was incorrect, because the implementation of the function
sets up the TCGv_i32 rn and rm to be either zero or else
a TCG temp created in load_reg(), and these TCG temps are
then in both cases written to by the emitted TCG ops.
The result is that we hit a TCG assertion:
(or on a non-debug build, just produce a garbage result)
Adjust the code so that rn and rm are always writeable
temporaries whether the instruction is using the special
case "0" or a normal register as input.
Cc: qemu-stable@nongnu.org Fixes: 0b188ea05acb5 ("target/arm: Use tcg_constant in trans_CSEL") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230727103906.2641264-1-peter.maydell@linaro.org
(cherry picked from commit 2b0d656ab6484cae7f174e194215a6d50343ecd2) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Maydell [Tue, 25 Jul 2023 09:56:51 +0000 (10:56 +0100)]
target/arm: Special case M-profile in debug_helper.c code
A lot of the code called from helper_exception_bkpt_insn() is written
assuming A-profile, but we will also call this helper on M-profile
CPUs when they execute a BKPT insn. This used to work by accident,
but recent changes mean that we will hit an assert when some of this
code calls down into lower level functions that end up calling
arm_security_space_below_el3(), arm_el_is_aa64(), and other functions
that now explicitly assert that the guest CPU is not M-profile.
Handle M-profile directly to avoid the assertions:
* in arm_debug_target_el(), M-profile debug exceptions always
go to EL1
* in arm_debug_exception_fsr(), M-profile always uses the short
format FSR (compare commit d7fe699be54b2, though in this case
the code in arm_v7m_cpu_do_interrupt() does not need to
look at the FSR value at all)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1775 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230721143239.1753066-1-peter.maydell@linaro.org
(cherry picked from commit 5d78893f39caf94c8587141e2219b57a7d63dd5c) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Maydell [Tue, 25 Jul 2023 09:56:51 +0000 (10:56 +0100)]
hw/arm/smmu: Handle big-endian hosts correctly
The implementation of the SMMUv3 has multiple places where it reads a
data structure from the guest and directly operates on it without
doing a guest-to-host endianness conversion. Since all SMMU data
structures are little-endian, this means that the SMMU doesn't work
on a big-endian host. In particular, this causes the Avocado test
machine_aarch64_virt.py:Aarch64VirtMachine.test_alpine_virt_tcg_gic_max
to fail on an s390x host.
Add appropriate byte-swapping on reads and writes of guest in-memory
data structures so that the device works correctly on big-endian
hosts.
As part of this we constrain queue_read() to operate only on Cmd
structs and queue_write() on Evt structs, because in practice these
are the only data structures the two functions are used with, and we
need to know what the data structure is to be able to byte-swap its
parts correctly.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230717132641.764660-1-peter.maydell@linaro.org Cc: qemu-stable@nongnu.org
(cherry picked from commit c6445544d4cea2628fbad3bad09f3d3a03c749d3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Viktor Prutyanov [Mon, 26 Jun 2023 09:12:58 +0000 (12:12 +0300)]
virtio-net: pass Device-TLB enable/disable events to vhost
If vhost is enabled for virtio-net, Device-TLB enable/disable events
must be passed to vhost for proper IOMMU unmap flag selection.
Signed-off-by: Viktor Prutyanov <viktor@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230626091258.24453-3-viktor@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit cd9b8346884353ba9ae6560b44b7cccdf00a6633) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Viktor Prutyanov [Mon, 26 Jun 2023 09:12:57 +0000 (12:12 +0300)]
vhost: register and change IOMMU flag depending on Device-TLB state
The guest can disable or never enable Device-TLB. In these cases,
it can't be used even if enabled in QEMU. So, check Device-TLB state
before registering IOMMU notifier and select unmap flag depending on
that. Also, implement a way to change IOMMU notifier flag if Device-TLB
state is changed.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001312 Signed-off-by: Viktor Prutyanov <viktor@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230626091258.24453-2-viktor@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ee071f67f7a103c66f85f68ffe083712929122e3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Viktor Prutyanov [Fri, 12 May 2023 13:51:20 +0000 (16:51 +0300)]
virtio-pci: add handling of PCI ATS and Device-TLB enable/disable
According to PCIe Address Translation Services specification 5.1.3.,
ATS Control Register has Enable bit to enable/disable ATS. Guest may
enable/disable PCI ATS and, accordingly, Device-TLB for the VirtIO PCI
device. So, raise/lower a flag and call a trigger function to pass this
event to a device implementation.
Signed-off-by: Viktor Prutyanov <viktor@daynix.com>
Message-Id: <20230512135122.70403-2-viktor@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 206e91d143301414df2deb48a411e402414ba6db) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Thu, 20 Jul 2023 17:53:07 +0000 (19:53 +0200)]
target/loongarch: Fix the CSRRD CPUID instruction on big endian hosts
The test in tests/avocado/machine_loongarch.py is currently failing
on big endian hosts like s390x. By comparing the traces between running
the QEMU_EFI.fd bios on a s390x and on a x86 host, it's quickly obvious
that the CSRRD instruction for the CPUID is behaving differently. And
indeed: The code currently does a long read (i.e. 64 bit) from the
address that points to the CPUState->cpu_index field (with tcg_gen_ld_tl()
in the trans_csrrd() function). But this cpu_index field is only an "int"
(i.e. 32 bit). While this dirty pointer magic works on little endian hosts,
it of course fails on big endian hosts. Fix it by using a proper helper
function instead.
Message-Id: <20230720175307.854460-1-thuth@redhat.com> Reviewed-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit c34ad459926f6c600a55fe6782a27edfa405d60b) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
target/s390x: Fix assertion failure in VFMIN/VFMAX with type 13
Type 13 is reserved, so using it should result in specification
exception. Due to an off-by-1 error the code triggers an assertion at a
later point in time instead.
Cc: qemu-stable@nongnu.org Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)") Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230724082032.66864-8-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ff537b0370ab5918052b8d8a798e803c47272406) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
target/s390x: Fix CONVERT TO LOGICAL/FIXED with out-of-range inputs
CONVERT TO LOGICAL/FIXED deviate from IEEE 754 in that they raise an
inexact exception on out-of-range inputs. float_flag_invalid_cvti
aligns nicely with that behavior, so convert it to
S390_IEEE_MASK_INEXACT.
Cc: qemu-stable@nongnu.org Fixes: defb0e3157af ("s390x: Implement opcode helpers") Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230724082032.66864-4-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 53684e344a27da770acc9012740334154ddea24f) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output
i386 and s390x implementations of op_add2 require an earlyclobber,
which is currently missing. This breaks VCKSM in s390x guests. E.g., on
x86_64 the following op:
Jordan Niethe [Mon, 17 Jul 2023 09:30:01 +0000 (19:30 +1000)]
tcg/ppc: Fix race in goto_tb implementation
Commit 20b6643324 ("tcg/ppc: Reorg goto_tb implementation") modified
goto_tb to ensure only a single instruction was patched to prevent
incorrect behavior if a thread was in the middle of multiple
instructions when they were replaced. However this introduced a race
between loading the jmp target into TCG_REG_TB and patching and
executing the direct branch.
tb_target_set_jmp_target() will replace 'patch_location' with a direct
branch if the target is in range. The direct branch now relies on
TCG_REG_TB being set up correctly by the ld. Prior to this commit
multiple instructions were patched in for the direct branch case; these
instructions would initialize TCG_REG_TB to the same value as the branch
target.
Imagine the following sequence:
1) Thread A is executing the goto_tb sequence and loads the jmp
target into TCG_REG_TB.
2) Thread B updates the jmp target address and calls
tb_target_set_jmp_target(). This patches a new direct branch into the
goto_tb sequence.
3) Thread A executes the newly patched direct branch. The value in
TCG_REG_TB still contains the old jmp target.
TCG_REG_TB MUST contain the translation block's tc.ptr. Execution will
eventually crash after performing memory accesses generated from a
faulty value in TCG_REG_TB.
This presents as segfaults or illegal instruction exceptions.
Do not revert commit 20b6643324 as it did fix a different race
condition. Instead remove the direct branch optimization and always use
indirect branches.
The direct branch optimization can be re-added later with a race free
sequence.
Fixes: 20b6643324 ("tcg/ppc: Reorg goto_tb implementation")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1726 Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com> Tested-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Co-developed-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Message-Id: <20230717093001.13167-1-jniethe5@gmail.com>
(cherry picked from commit 736a1588c104e9995c1831df33554df1f1def8b8) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
qemu-nbd: regression with arguments passing into nbd_client_thread()
Unfortunately
commit 03b67621445d601c9cdc7dfe25812e9f19b81488
(8.0: feb0814b3b48e75b336ad72eb303f9d579c94083)
Author: Denis V. Lunev <den@openvz.org>
Date: Mon Jul 17 16:55:40 2023 +0200
qemu-nbd: pass structure into nbd_client_thread instead of plain char*
has introduced a regression. struct NbdClientOpts resides on stack inside
'if' block. This specifically means that this stack space could be reused
once the execution will leave that block of the code.
This means that parameters passed into nbd_client_thread could be
overwritten at any moment.
The patch moves the data to the namespace of main() function effectively
preserving it for the whole process lifetime.
Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: <qemu-stable@nongnu.org> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230727105828.324314-1-den@openvz.org> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit e5b815b0defcc3617f473ba70c3e675ef0ee69c2) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: add reference to feb0814b3b48e75b336ad72eb303f9d579c94083 for 8.0 branch)
has introduced an interesting regression. Original behavior of
ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
was the following:
* qemu-nbd was started as a daemon
* the command execution is done and ssh exited with success
The patch has changed this behavior and 'ssh' command now hangs forever.
According to the normal specification of the daemon() call, we should
endup with STDERR pointing to /dev/null. That should be done at the
very end of the successful startup sequence when the pipe to the
bootstrap process (used for diagnostics) is no longer needed.
This could be achived in the same way as done for 'qemu-nbd -c' case.
That was commit 0eaf453e, also fixing up e6df58a5. STDOUT copying to
STDERR does the trick.
This also leads to proper 'ssh' connection closing which fixes my
original problem.
Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: Hanna Reitz <hreitz@redhat.com> CC: <qemu-stable@nongnu.org>
Message-ID: <20230717145544.194786-3-den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 5c56dd27a2c905c9cf2472d2fd057621ce5fd00d) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
qemu-nbd: pass structure into nbd_client_thread instead of plain char*
We are going to pass additional flag inside next patch.
Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: <qemu-stable@nongnu.org>
Message-ID: <20230717145544.194786-2-den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 03b67621445d601c9cdc7dfe25812e9f19b81488) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
linux-user: Fix signed math overflow in brk() syscall
Fix the math overflow when calculating the new_malloc_size.
new_host_brk_page and brk_page are unsigned integers. If userspace
reduces the heap, new_host_brk_page is lower than brk_page which results
in a huge positive number (but should actually be negative).
Fix it by adding a proper check and as such make the code more readable.
Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages") Cc: qemu-stable@nongnu.org Buglink: https://github.com/upx/upx/issues/683
(cherry picked from commit eac78a4b0b7da4de2c0a297f4d528ca9cc6256a3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
linux-user: Prohibit brk() to to shrink below initial heap address
Since commit 86f04735ac ("linux-user: Fix brk() to release pages") it's
possible for userspace applications to reduce their memory footprint by
calling brk() with a lower address and free up memory. Before that commit
guest heap memory was never unmapped.
But the Linux kernel prohibits to reduce brk() below the initial memory
address which is set at startup by the set_brk() function in binfmt_elf.c.
Such a range check was missed in commit 86f04735ac.
This patch adds the missing check by storing the initial brk value in
initial_target_brk and verify any new brk addresses against that value.
Tested with the i386 upx binary from
https://github.com/upx/upx/releases/download/v4.0.2/upx-4.0.2-i386_linux.tar.xz
Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages") Cc: qemu-stable@nongnu.org Buglink: https://github.com/upx/upx/issues/683
(cherry picked from commit dfe49864afb06e7e452a4366051697bc4fcfc1a5) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
linux-user: Fix qemu brk() to not zero bytes on current page
The qemu brk() implementation is too aggressive and cleans remaining bytes
on the current page above the last brk address.
But some existing applications are buggy and read/write bytes above their
current heap address. On a phyiscal machine this does not trigger a
runtime error as long as the access happens on the same page. Additionally
the Linux kernel allocates only full pages and does no zeroing on already
allocated pages, even if the brk address is lowered.
Fix qemu to behave the same way as the kernel does. Do not touch already
allocated pages, and - when running with different page sizes of guest and
host - zero out only those memory areas where the host page size is bigger
than the guest page size.
Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com> Fixes: 86f04735ac ("linux-user: Fix brk() to release pages") Cc: qemu-stable@nongnu.org Buglink: https://github.com/upx/upx/issues/683
(cherry picked from commit 15ad98536ad9410fb32ddf1ff09389b677643faa) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Klaus Jensen [Tue, 18 Jul 2023 09:44:17 +0000 (11:44 +0200)]
hw/nvme: fix endianness issue for shadow doorbells
In commit 2fda0726e514 ("hw/nvme: fix missing endian conversions for
doorbell buffers"), we fixed shadow doorbells for big-endian guests
running on little endian hosts. But I did not fix little-endian guests
on big-endian hosts. Fix this.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765 Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-stable@nongnu.org Reported-by: Thomas Huth <thuth@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit ea3c76f1494d0c75873c3b470e6e048202661ad8) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255)
A wrong exit condition may lead to an infinite loop when inflating a
valid zlib buffer containing some extra bytes in the `inflate_buffer`
function. The bug only occurs post-authentication. Return the buffer
immediately if the end of the compressed data has been reached
(Z_STREAM_END).
Commit 4f5c67f8df ("linux-user/arm: Take more care allocating
commpage") already took care of not allocating a commpage for
M-profile CPUs, however it had to be reverted as commit 6cda41daa2.
Re-introduce the M-profile fix from commit 4f5c67f8df.
Fixes: fbd3c4cff6 ("linux-user/arm: Mark the commpage executable")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1755 Reported-by: Christophe Lyon <christophe.lyon@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Anton Johansson <anjo@rev.ng> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230711153408.68389-1-philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit d713cf4d6c71076513a10528303b3e337b4d5998) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
tcg: Fix info_in_idx increment in layout_arg_by_ref
Off by one error, failing to take into account that layout_arg_1
already incremented info_in_idx for the first piece. We only
need care for the n-1 TCG_CALL_ARG_BY_REF_N pieces here.
Cc: qemu-stable@nongnu.org Fixes: 313bdea84d2 ("tcg: Add TCG_CALL_{RET,ARG}_BY_REF")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1751 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit e18ed26ce785f74a17e6f3a095647e08ba6fc669) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
linux-user/syscall: Implement execve without execveat
Support for execveat syscall was implemented in 55bbe4 and is available
since QEMU 8.0.0. It relies on host execveat, which is widely available
on most of Linux kernels today.
However, this change breaks qemu-user self emulation, if "host" qemu
version is less than 8.0.0. Indeed, it does not implement yet execveat.
This strange use case happens with most of distribution today having
binfmt support.
With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS
qemu-user-static 7.2 and 8.0 can be conveniently grabbed from debian
packages qemu-user-static* [1].
One usage of this is running wine-arm64 from linux-x64 (details [2]).
This is by updating qemu embedded in docker image that we ran into this
issue.
The solution to update host qemu is not always possible. Either it's
complicated or ask you to recompile it, or simply is not accessible
(GitLab CI, GitHub Actions). Thus, it could be worth to implement execve
without relying on execveat, which is the goal of this patch.
This patch was tested with example presented in this commit message.
Olaf Hering [Wed, 12 Jul 2023 07:47:22 +0000 (09:47 +0200)]
hw/ide/piix: properly initialize the BMIBA register
According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA-BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled.
The initial change wrote just the lower 8 bit, leaving parts of the "Bus
Master Interface Base Address" address at bit 15:4 unchanged.
Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 82371SB (Function 1)") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230712074721.14728-1-olaf@aepfle.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 230dfd9257e92259876c113e58b5f0d22b056d2e) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
target/mips: enable GINVx support for I6400 and I6500
GINVI and GINVT operations are supported on MIPS I6400 and I6500 cores,
so indicate that properly in CP0.Config5 register bits [16:15].
Cc: qemu-stable@nongnu.org Signed-off-by: Marcin Nowakowski <marcin.nowakowski@fungible.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230630072806.3093704-1-marcin.nowakowski@fungible.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit baf21eebc3e1026d21d94fdf8ca470050e49968f) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
vfio: Fix null pointer dereference bug in vfio_bars_finalize()
vfio_realize() has the following flow:
1. vfio_bars_prepare() -- sets VFIOBAR->size.
2. msix_early_setup().
3. vfio_bars_register() -- allocates VFIOBAR->mr.
After vfio_bars_prepare() is called msix_early_setup() can fail. If it
does fail, vfio_bars_register() is never called and VFIOBAR->mr is not
allocated.
In this case, vfio_bars_finalize() is called as part of the error flow
to free the bars' resources. However, vfio_bars_finalize() calls
object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and
thus we get a null pointer dereference.
Fix it by checking VFIOBAR->mr in vfio_bars_finalize().
Fixes: 89d5202edc50 ("vfio/pci: Allow relocating MSI-X MMIO") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 8af87a3ec7e42ff1b9cf75ceee0451c31e34d153) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and
SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set.
Change the qemu implementation accordingly, which means we can not use
the fcntl_flags_tbl[] translation table which allows too many other
values.
Beside the correction in behaviour, this actually fixes the accept4()
emulation for hppa, mips and alpha targets for which SOCK_NONBLOCK is
different than TARGET_SOCK_NONBLOCK (aka O_NONBLOCK).
The fix can be verified with the testcase of the debian lwt package,
which hangs forever in a read() syscall without this patch.
Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit dca4c8384d68bbf5d67f50a5446865d92d61f032) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for 32-bit targets
When running a 32-bit guest on a 64-bit host, fcntl[64](F_GETFL) should
return with the TARGET_O_LARGEFILE flag set, because all 64-bit hosts
support large files unconditionally.
But on 64-bit hosts, O_LARGEFILE has the value 0, so the flag
translation can't be done with the fcntl_flags_tbl[]. Instead add the
TARGET_O_LARGEFILE flag afterwards.
Note that for 64-bit guests the compiler will optimize away this code,
since TARGET_O_LARGEFILE is zero.
Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit e0ddf8eac9f83c0bc5a3d39605d873ee0fe53421) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Nicholas Piggin [Thu, 29 Jun 2023 02:07:13 +0000 (12:07 +1000)]
hw/ppc: Fix clock update drift
The clock update logic reads the clock twice to compute the new clock
value, with a value derived from the later time subtracted from a value
derived from the earlier time. The delta causes time to be lost.
This can ultimately result in time becoming unsynchronized between CPUs
and that can cause OS lockups, timeouts, watchdogs, etc. This can be
seen running a KVM guest (that causes lots of TB updates) on a powernv
SMP machine.
Fix this by reading the clock once.
Cc: qemu-stable@nongnu.org Fixes: dbdd25065e90 ("Implement time-base start/stop helpers.") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-ID: <20230629020713.327745-1-npiggin@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 2ad2e113deb5663e69a05dd6922cbfc6d7ea34d3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
qemu_cleanup: begin drained section after vm_shutdown()
in order to avoid requests being stuck in a BlockBackend's request
queue during cleanup. Having such requests can lead to a deadlock [0]
with a virtio-scsi-pci device using iothread that's busy with IO when
initiating a shutdown with QMP 'quit'.
There is a race where such a queued request can continue sometime
(maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1].
The completion will hold the AioContext lock and wait for the BQL
during SCSI completion, but the main thread will hold the BQL and
wait for the AioContext as part of bdrv_root_unref_child(), leading to
the deadlock [0].
[0]:
> Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"):
> #0 __lll_lock_wait (futex=futex@entry=0x564183365f00 <qemu_global_mutex>, private=0) at lowlevellock.c:52
> #1 0x00007f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2 0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 <qemu_global_mutex>, file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../util/qemu-thread-posix.c:94
> #3 0x000056418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504
> #4 0x00005641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at ../softmmu/physmem.c:2593
> #5 0x00005641826d6fe7 in address_space_stl_internal (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318
> #6 0x00005641826d7154 in address_space_stl_le (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0) at /home/febner/repos/qemu/memory_ldst.c.inc:357
> #7 0x0000564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at ../hw/pci/pci.c:359
> #8 0x000056418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) at ../hw/pci/msi.c:379
> #9 0x0000564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at ../hw/pci/msix.c:542
> #10 0x000056418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at ../hw/virtio/virtio-pci.c:77
> #11 0x00005641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, vector=8) at ../hw/virtio/virtio.c:1985
> #12 0x00005641826948d6 in virtio_irq (vq=0x5641867ac078) at ../hw/virtio/virtio.c:2461
> #13 0x0000564182694978 in virtio_notify (vdev=0x5641867a34a0, vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473
> #14 0x0000564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:115
> #15 0x00005641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:641
> #16 0x000056418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, resid=0) at ../hw/scsi/virtio-scsi.c:712
> #17 0x000056418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at ../hw/scsi/scsi-bus.c:1526
> #18 0x000056418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:242
> #19 0x000056418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265
> #20 0x000056418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:340
> #21 0x000056418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:371
> #22 0x00005641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:107
> #23 0x0000564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:127
> #24 0x00005641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at ../block/block-backend.c:1563
> #25 0x00005641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at ../block/block-backend.c:1630
> #26 0x000056418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at ../util/coroutine-ucontext.c:177
> #27 0x00007f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007f3bbd8757f0 in ?? ()
> #29 0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"):
> #0 __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at lowlevellock.c:52
> #1 0x00007f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at ../nptl/pthread_mutex_lock.c:115
> #2 0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #3 0x000056418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #4 0x00005641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at ../util/async.c:728
> #5 0x000056418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) at ../block.c:7493
> #6 0x000056418294e288 in tran_commit (tran=0x56418630bfe0) at ../util/transactions.c:87
> #7 0x000056418279d880 in bdrv_try_change_aio_context (bs=0x5641856f7130, ctx=0x56418548f810, ignore_child=0x0, errp=0x0) at ../block.c:7626
> #8 0x0000564182793f39 in bdrv_root_unref_child (child=0x5641856f47d0) at ../block.c:3242
> #9 0x00005641827be137 in blk_remove_bs (blk=0x564185709880) at ../block/block-backend.c:914
> #10 0x00005641827bd689 in blk_remove_all_bs () at ../block/block-backend.c:583
> #11 0x0000564182798699 in bdrv_close_all () at ../block.c:5117
> #12 0x000056418248a5b2 in qemu_cleanup () at ../softmmu/runstate.c:821
> #13 0x0000564182738603 in qemu_default_main () at ../softmmu/main.c:38
> #14 0x0000564182738631 in main (argc=30, argv=0x7ffd675a8a48) at ../softmmu/main.c:48
>
> (gdb) p *((QemuMutex*)0x5641856f2a00)
> $1 = {lock = {__data = {__lock = 2, __count = 2, __owner = 135952, ...
> (gdb) p *((QemuMutex*)0x564183365f00)
> $2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 135944, ...
[1]:
> Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_drain_all_end () at ../block/io.c:551
> #0 bdrv_drain_all_end () at ../block/io.c:551
> #1 0x00005569810f0376 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:156
> #2 0x00005569810bd3e0 in bdrv_replace_child_noperm (child=0x556982e2d7d0, new_bs=0x0) at ../block.c:2897
> #3 0x00005569810bdef2 in bdrv_root_unref_child (child=0x556982e2d7d0) at ../block.c:3227
> #4 0x00005569810e8137 in blk_remove_bs (blk=0x556982e42880) at ../block/block-backend.c:914
> #5 0x00005569810e7689 in blk_remove_all_bs () at ../block/block-backend.c:583
> #6 0x00005569810c2699 in bdrv_close_all () at ../block.c:5117
> #7 0x0000556980db45b2 in qemu_cleanup () at ../softmmu/runstate.c:821
> #8 0x0000556981062603 in qemu_default_main () at ../softmmu/main.c:38
> #9 0x0000556981062631 in main (argc=30, argv=0x7ffd7a82a418) at ../softmmu/main.c:48
> [Switching to Thread 0x7fe76dab2700 (LWP 103649)]
>
> Thread 3 "qemu-system-x86" hit Breakpoint 4, blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
> #0 blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
> #1 0x00005569810e8f36 in blk_wait_while_drained (blk=0x556982e42880) at ../block/block-backend.c:1312
> #2 0x00005569810e9231 in blk_co_do_pwritev_part (blk=0x556982e42880, offset=3422961664, bytes=4096, qiov=0x556983028060, qiov_offset=0, flags=0) at ../block/block-backend.c:1402
> #3 0x00005569810e9a4b in blk_aio_write_entry (opaque=0x556982e2cfa0) at ../block/block-backend.c:1628
> #4 0x000055698128038a in coroutine_trampoline (i0=-2090057872, i1=21865) at ../util/coroutine-ucontext.c:177
> #5 0x00007fe770f50d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #6 0x00007ffd7a829570 in ?? ()
> #7 0x0000000000000000 in ?? ()
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20230706131418.423713-1-f.ebner@proxmox.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ca2a5e630dc1f569266fb663bf0b65e4eb433fb2) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
For the outer product set of insns, which take an entire matrix
tile as output, the argument is not a combined tile+column.
Therefore using get_tile_rowcol was incorrect, as we extracted
the tile number from itself.
The test case relies only on assembler support for SME, since
no release of GCC recognizes -march=armv9-a+sme yet.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1620 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230622151201.1578522-5-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: dropped now-unneeded changes to sysregs CFLAGS] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 1f51573f7925b80e79a29f87c7d9d6ead60960c0) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: fixup context in tests/tcg/aarch64/Makefile.target)
Mark Cave-Ayland [Thu, 29 Jun 2023 08:25:22 +0000 (09:25 +0100)]
accel/tcg: Assert one page in tb_invalidate_phys_page_range__locked
Ensure that that both the start and last addresses are within
the same guest page.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230629082522.606219-3-mark.cave-ayland@ilande.co.uk>
[rth: Use tcg_debug_assert, simplify the expression] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit e665cf72fe6357945fdbecf747dac58c0c7c7c66) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Mark Cave-Ayland [Thu, 29 Jun 2023 08:25:21 +0000 (09:25 +0100)]
accel/tcg: Fix start page passed to tb_invalidate_phys_page_range__locked
Due to a copy-paste error in tb_invalidate_phys_range, the wrong
start address was passed to tb_invalidate_phys_page_range__locked.
Correct is to use the start of each page in turn.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Fixes: e506ad6a05 ("accel/tcg: Pass last not end to tb_invalidate_phys_range")
Message-Id: <20230629082522.606219-2-mark.cave-ayland@ilande.co.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 3307e08c6f142bb3d2406cfbc0ee19359748b51a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
linux-user: Avoid mmap of the last byte of the reserved_va
There is an overflow problem in mmap_find_vma_reserved:
when reserved_va == UINT32_MAX, end may overflow to 0.
Rather than a larger rewrite at this time, simply avoid
the final byte of the VA, which avoids searching the
final page, which avoids the overflow.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1741 Fixes: 95059f9c ("include/exec: Change reserved_va semantics to last byte") Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20230629080835.71371-1-richard.henderson@linaro.org>
(cherry picked from commit 605a8b5491a119a2a6efbf61e5a38f9374645990) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Shameer Kolothum [Tue, 13 Jun 2023 14:09:43 +0000 (15:09 +0100)]
vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path
When vfio_enable_vectors() returns with less than requested nr_vectors
we retry with what kernel reported back. But the retry path doesn't
call vfio_prepare_kvm_msi_virq_batch() and this results in,
qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1
qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed
Fixes: dc580d51f7dd ("vfio: defer to commit kvm irq routing when enable msi/msix") Reviewed-by: Longpeng <longpeng2@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit c17408892319712c12357e5d1c6b305499c58c2a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>