Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <820210309145653.743937-11-f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210214175912.732946-16-f4bug@amsat.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Rather than using the magic 0x80000000 number for the PCI I/O BAR
physical address on the main system bus, use a definition.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210417103028.601124-6-f4bug@amsat.org>
hw/pci-host: Rename Raven ASIC PCI bridge as raven.c
The ASIC PCI bridge chipset from Motorola is named 'Raven'.
This chipset is used in the PowerPC Reference Platform (PReP),
but not restricted to it. Rename it accordingly.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210417103028.601124-5-f4bug@amsat.org>
Peter Maydell [Sun, 11 Jul 2021 13:32:49 +0000 (14:32 +0100)]
Merge remote-tracking branch 'remotes/cminyard/tags/for-qemu-6.1-2' into staging
Some qemu updates for IPMI and I2C
Move some ADC file to where they belong and move some sensors to a
sensor directory, since with new BMCs coming in lots of different
sensors should be coming in. Keep from cluttering things up.
Add support for I2C PMBus devices.
Replace the confusing and error-prone i2c_send_recv and i2c_transfer with
specific send and receive functions. Several errors have already been
made with these, avoid any new errors.
Fix the watchdog_expired field in the IPMI watchdog, it's not a bool,
it's a u8. After a vmstate transfer, the new value could be wrong.
# gpg: Signature made Fri 09 Jul 2021 17:25:04 BST
# gpg: using RSA key FD0D5CE67CE0F59A6688268661F38C90919BFF81
# gpg: Good signature from "Corey Minyard <cminyard@mvista.com>" [unknown]
# gpg: aka "Corey Minyard <minyard@acm.org>" [unknown]
# gpg: aka "Corey Minyard <corey@minyard.net>" [unknown]
# gpg: aka "Corey Minyard <minyard@mvista.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: FD0D 5CE6 7CE0 F59A 6688 2686 61F3 8C90 919B FF81
* remotes/cminyard/tags/for-qemu-6.1-2: (24 commits)
tests/qtest: add tests for MAX34451 device model
hw/misc: add MAX34451 device
tests/qtest: add tests for ADM1272 device model
hw/misc: add ADM1272 device
hw/i2c: add support for PMBus
ipmi/sim: fix watchdog_expired data type error in IPMIBmcSim struct
hw/i2c: Introduce i2c_start_recv() and i2c_start_send()
hw/i2c: Extract i2c_do_start_transfer() from i2c_start_transfer()
hw/i2c: Make i2c_start_transfer() direction argument a boolean
hw/i2c: Rename i2c_set_slave_address() -> i2c_slave_set_address()
hw/i2c: Remove confusing i2c_send_recv()
hw/misc/auxbus: Replace i2c_send_recv() by i2c_recv() & i2c_send()
hw/misc/auxbus: Replace 'is_write' boolean by its value
hw/misc/auxbus: Explode READ_I2C / WRITE_I2C_MOT cases
hw/misc/auxbus: Fix MOT/classic I2C mode
hw/i2c/ppc4xx_i2c: Replace i2c_send_recv() by i2c_recv() & i2c_send()
hw/i2c/ppc4xx_i2c: Add reference to datasheet
hw/display/sm501: Replace i2c_send_recv() by i2c_recv() & i2c_send()
hw/display/sm501: Simplify sm501_i2c_write() logic
hw/input/lm832x: Define TYPE_LM8323 in public header
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/pmaydell/tags/pull-target-arm-20210709:
hw/intc: Improve formatting of MEMTX_ERROR guest error message
target/arm: Correct the encoding of MDCCSR_EL0 and DBGDSCRint
hw/arm/stellaris: Expand comment about handling of OLED chipselect
hw/gpio/pl061: Document a shortcoming in our implementation
hw/gpio/pl061: Convert to 3-phase reset and assert GPIO lines correctly on reset
hw/arm/virt: Make PL061 GPIO lines pulled low, not high
hw/gpio/pl061: Make pullup/pulldown of outputs configurable
hw/gpio/pl061: Honour Luminary PL061 PUR and PDR registers
hw/gpio/pl061: Document the interface of this device
hw/gpio/pl061: Add tracepoints for register read and write
hw/gpio/pl061: Clean up read/write offset handling logic
hw/gpio/pl061: Convert DPRINTF to tracepoints
hw/intc/arm_gicv3_cpuif: Fix virtual irq number check in icv_[dir|eoir]_write
tests/boot-serial-test: Add STM32VLDISCOVERY board testcase
docs/system: arm: Add stm32 boards description
stm32vldiscovery: Add the STM32VLDISCOVERY Machine
stm32f100: Add the stm32f100 SoC
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Sat, 10 Jul 2021 18:55:20 +0000 (19:55 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
- Make blockdev-reopen stable
- Remove deprecated qemu-img backing file without format
- rbd: Convert to coroutines and add write zeroes support
- rbd: Updated MAINTAINERS
- export/fuse: Allow other users access to the export
- vhost-user: Fix backends without multiqueue support
- Fix drive-backup transaction endless drained section
# gpg: Signature made Fri 09 Jul 2021 13:49:22 BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (28 commits)
block: Make blockdev-reopen stable API
iotests: Test reopening multiple devices at the same time
block: Support multiple reopening with x-blockdev-reopen
block: Acquire AioContexts during bdrv_reopen_multiple()
block: Add bdrv_reopen_queue_free()
qcow2: Fix dangling pointer after reopen for 'file'
qemu-img: Improve error for rebase without backing format
qemu-img: Require -F with -b backing image
qcow2: Prohibit backing file changes in 'qemu-img amend'
blockdev: fix drive-backup transaction endless drained section
vhost-user: Fix backends without multiqueue support
MAINTAINERS: add block/rbd.c reviewer
block/rbd: fix type of task->complete
iotests/fuse-allow-other: Test allow-other
iotests/308: Test +w on read-only FUSE exports
export/fuse: Let permissions be adjustable
export/fuse: Give SET_ATTR_SIZE its own branch
export/fuse: Add allow-other option
export/fuse: Pass default_permissions for mount
util/uri: do not check argument of uri_free()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Sat, 10 Jul 2021 15:06:24 +0000 (16:06 +0100)]
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210709' into staging
ppc patch queue 2021-07-09
Here's a (probably) final pull request before the qemu-6.1 soft
freeze. Includes:
* Implementation of the new H_RPT_INVALIDATE hypercall
* Virtual Open Firmware for pSeries and pegasos2 machine types.
This is an experimental minimal Open Firmware implementation which
works by delegating nearly everything to qemu itself via a special
hypercall.
* A number of cleanups to the ppc soft MMU code
* Fix to handling of two-level radix mode translations for the
powernv machine type
* Update the H_GET_CPU_CHARACTERISTICS call with newly defined bits.
This will allow more flexible handling of possible future CPU
Spectre-like flaws
* Correctly treat mtmsrd as an illegal instruction on BookE cpus
* Firmware update for the ppce500 machine type
* remotes/dg-gitlab/tags/ppc-for-6.1-20210709: (33 commits)
target/ppc: Support for H_RPT_INVALIDATE hcall
linux-headers: Update
spapr: Fix implementation of Open Firmware client interface
target/ppc: Don't compile ppc_tlb_invalid_all without TCG
ppc/pegasos2: Implement some RTAS functions with VOF
ppc/pegasos2: Fix use of && instead of &
ppc/pegasos2: Use Virtual Open Firmware as firmware replacement
target/ppc/spapr: Update H_GET_CPU_CHARACTERISTICS L1D cache flush bits
target/ppc: Allow virtual hypervisor on CPU without HV
ppc/pegasos2: Introduce Pegasos2MachineState structure
target/ppc: mtmsrd is an illegal instruction on BookE
spapr: Implement Open Firmware client interface
docs/system: ppc: Update ppce500 documentation with eTSEC support
roms/u-boot: Bump ppce500 u-boot to v2021.07 to add eTSEC support
target/ppc: change ppc_hash32_xlate to use mmu_idx
target/ppc: introduce mmu-books.h
target/ppc: changed ppc_hash64_xlate to use mmu_idx
target/ppc: fix address translation bug for radix mmus
target/ppc: Fix compilation with DEBUG_BATS debug option
target/ppc: Fix compilation with FLUSH_ALL_TLBS debug option
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/ehabkost-gl/tags/machine-next-pull-request:
vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus
virtio-mem: Require only coordinated discards
softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types
softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
vfio: Support for RamDiscardManager in the vIOMMU case
vfio: Sanity check maximum number of DMA mappings with RamDiscardManager
vfio: Query and store the maximum number of possible DMA mappings
vfio: Support for RamDiscardManager in the !vIOMMU case
virtio-mem: Implement RamDiscardManager interface
virtio-mem: Don't report errors when ram_block_discard_range() fails
virtio-mem: Factor out traversing unplugged ranges
memory: Helpers to copy/free a MemoryRegionSection
memory: Introduce RamDiscardManager for RAM memory regions
Deprecate pmem=on with non-DAX capable backend file
vmbus: Don't make QOM property registration conditional
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target/arm: Correct the encoding of MDCCSR_EL0 and DBGDSCRint
Signed-off-by: Nick Hudson <hnick@vmware.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:18 +0000 (11:40 +0100)]
hw/arm/stellaris: Expand comment about handling of OLED chipselect
The stellaris board doesn't emulate the handling of the OLED
chipselect line correctly. Expand the comment describing this,
including a sketch of the theoretical correct way to do it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:17 +0000 (11:40 +0100)]
hw/gpio/pl061: Document a shortcoming in our implementation
The Luminary PL061s in the Stellaris LM3S9695 don't all have the same
reset value for GPIOPUR. We can get away with not letting the board
configure the PUR reset value because we don't actually wire anything
up to the lines which should reset to pull-up. Add a comment noting
this omission.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:16 +0000 (11:40 +0100)]
hw/gpio/pl061: Convert to 3-phase reset and assert GPIO lines correctly on reset
The PL061 comes out of reset with all its lines configured as input,
which means they might need to be pulled to 0 or 1 depending on the
'pullups' and 'pulldowns' properties. Currently we do not assert
these lines on reset; they will only be set whenever the guest first
touches a register that triggers a call to pl061_update().
Convert the device to three-phase reset so we have a place where we
can safely call qemu_set_irq() to set the floating lines to their
correct values.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:15 +0000 (11:40 +0100)]
hw/arm/virt: Make PL061 GPIO lines pulled low, not high
For the virt board we have two PL061 devices -- one for NonSecure which
is inputs only, and one for Secure which is outputs only. For the former,
we don't care whether its outputs are pulled low or high when the line is
configured as an input, because we don't connect them. For the latter,
we do care, because we wire the lines up to the gpio-pwr device, which
assumes that level 1 means "do the action" and 1 means "do nothing".
For consistency in case we add more outputs in future, configure both
PL061s to pull GPIO lines down to 0.
Reported-by: Maxim Uvarov <maxim.uvarov@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:14 +0000 (11:40 +0100)]
hw/gpio/pl061: Make pullup/pulldown of outputs configurable
The PL061 GPIO does not itself include pullup or pulldown resistors
to set the value of a GPIO line treated as an output when it is
configured as an input (ie when the PL061 itself is not driving it).
In real hardware it is up to the board to add suitable pullups or
pulldowns. Currently our implementation hardwires this to "outputs
pulled high", which is correct for some boards (eg the realview ones:
see figure 3-29 in the "RealView Platform Baseboard for ARM926EJ-S
User Guide" DUI0224I), but wrong for others.
In particular, the wiring in the 'virt' board and the gpio-pwr device
assumes that wires should be pulled low, because otherwise the
pull-to-high will trigger a shutdown or reset action. (The only
reason this doesn't happen immediately on startup is due to another
bug in the PL061, where we don't assert the GPIOs to the correct
value on reset, but will do so as soon as the guest touches a
register and pl061_update() gets called.)
Add properties to the pl061 so the board can configure whether it
wants GPIO lines to have pullup, pulldown, or neither.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:13 +0000 (11:40 +0100)]
hw/gpio/pl061: Honour Luminary PL061 PUR and PDR registers
The Luminary variant of the PL061 has registers GPIOPUR and GPIOPDR
which lets the guest configure whether the GPIO lines are pull-up,
pull-down, or truly floating. Instead of assuming all lines are pulled
high, honour the PUR and PDR registers.
For the plain PL061, continue to assume that lines have an external
pull-up resistor, as we did before.
The stellaris board actually relies on this behaviour -- the CD line
of the ssd0323 display device is connected to GPIO output C7, and it
is only because of a different bug which we're about to fix that we
weren't incorrectly driving this line high on reset and putting the
ssd0323 into data mode.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:11 +0000 (11:40 +0100)]
hw/gpio/pl061: Add tracepoints for register read and write
Add tracepoints for reads and writes to the PL061 registers. This requires
restructuring pl061_read() to only return after the tracepoint, rather
than having lots of early-returns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:10 +0000 (11:40 +0100)]
hw/gpio/pl061: Clean up read/write offset handling logic
Currently the pl061_read() and pl061_write() functions handle offsets
using a combination of three if() statements and a switch(). Clean
this up to use just a switch, using case ranges.
This requires that instead of catching accesses to the luminary-only
registers on a stock PL061 via a check on s->rsvd_start we use
an "is this luminary?" check in the cases for each luminary-only
register.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Fri, 2 Jul 2021 10:40:09 +0000 (11:40 +0100)]
hw/gpio/pl061: Convert DPRINTF to tracepoints
Convert the use of the DPRINTF debug macro in the PL061 model to
use tracepoints.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
hw/intc/arm_gicv3_cpuif: Fix virtual irq number check in icv_[dir|eoir]_write
icv_eoir_write() and icv_dir_write() ignore invalid virtual IRQ numbers
(like LPIs). The issue is that these functions check against the number
of implemented IRQs (QEMU's default is num_irq=288) which can be lower
than the maximum virtual IRQ number (1020 - 1). The consequence is that
if a hypervisor creates an LR for an IRQ between 288 and 1020, then the
guest is unable to deactivate the resulting IRQ. Note that other
functions that deal with large IRQ numbers, like icv_iar_read, check
against 1020 and not against num_irq.
Fix the checks by using GICV3_MAXIRQ (1020) instead of the number of
implemented IRQs.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Message-id: 20210702233701.3369-1-ricarkol@google.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Alberto Garcia [Thu, 8 Jul 2021 11:47:09 +0000 (13:47 +0200)]
block: Make blockdev-reopen stable API
This patch drops the 'x-' prefix from x-blockdev-reopen.
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-7-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alberto Garcia [Thu, 8 Jul 2021 11:47:08 +0000 (13:47 +0200)]
iotests: Test reopening multiple devices at the same time
This test swaps the images used by two active block devices.
This is now possible thanks to the new ability to run
x-blockdev-reopen on multiple devices at the same time.
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-6-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alberto Garcia [Thu, 8 Jul 2021 11:47:07 +0000 (13:47 +0200)]
block: Support multiple reopening with x-blockdev-reopen
[ kwolf: Fixed AioContext locking ]
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-5-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Thu, 8 Jul 2021 11:47:06 +0000 (13:47 +0200)]
block: Acquire AioContexts during bdrv_reopen_multiple()
As the BlockReopenQueue can contain nodes in multiple AioContexts, only
one of which may be locked when AIO_WAIT_WHILE() can be called, we can't
let the caller lock the right contexts. Instead, individually lock the
AioContext of a single node when iterating the queue.
Reintroduce bdrv_reopen() as a wrapper for reopening a single node that
drains the node and temporarily drops the AioContext lock for
bdrv_reopen_multiple().
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Alberto Garcia [Thu, 8 Jul 2021 11:47:05 +0000 (13:47 +0200)]
block: Add bdrv_reopen_queue_free()
Move the code to free a BlockReopenQueue to a separate function.
It will be used in a subsequent patch.
[ kwolf: Also free explicit_options and options, and explicitly
qobject_ref() the value when it continues to be used. This makes
future memory leaks less likely. ]
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Thu, 8 Jul 2021 11:47:04 +0000 (13:47 +0200)]
qcow2: Fix dangling pointer after reopen for 'file'
Without an external data file, s->data_file is a second pointer with the
same value as bs->file. When changing bs->file to a different BdrvChild
and freeing the old BdrvChild, s->data_file must also be updated,
otherwise it points to freed memory and causes crashes.
This problem was caught by iotests case 245.
Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210708114709.206487-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Thu, 8 Jul 2021 15:52:28 +0000 (10:52 -0500)]
qemu-img: Improve error for rebase without backing format
When removeing support for qemu-img being able to create backing
chains without embedded backing formats, we caused a poor error
message as caught by iotest 114. Improve the situation to inform the
user what went wrong.
Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210708155228.2666172-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Mon, 3 May 2021 21:36:00 +0000 (14:36 -0700)]
qemu-img: Require -F with -b backing image
Back in commit d9f059aa6c (qemu-img: Deprecate use of -b without -F),
we deprecated the ability to create a file with a backing image that
requires qemu to perform format probing. Qemu can still probe older
files for backwards compatibility, but it is time to finish off the
ability to create such images, due to the potential security risk they
present. Update a couple of iotests affected by the change.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210503213600.569128-3-eblake@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eric Blake [Mon, 3 May 2021 21:35:59 +0000 (14:35 -0700)]
qcow2: Prohibit backing file changes in 'qemu-img amend'
This was deprecated back in bc5ee6da7 (qcow2: Deprecate use of
qemu-img amend to change backing file), and no one in the meantime has
given any reasons why it should be supported. Time to make change
attempts a hard error (but for convenience, specifying the _same_
backing chain is not forbidden). Update a couple of iotests to match.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210503213600.569128-2-eblake@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
drive_backup_prepare() does bdrv_drained_begin() in hope that
bdrv_drained_end() will be called in drive_backup_clean(). Still we
need to set state->bs for this to work. That's done too late: a lot of
failure paths in drive_backup_prepare() miss setting state->bs. Fix
that.
Fixes: 2288ccfac96281c316db942d10e3f921c1373064 Fixes: https://gitlab.com/qemu-project/qemu/-/issues/399 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210608171852.250775-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Mon, 5 Jul 2021 17:14:29 +0000 (19:14 +0200)]
vhost-user: Fix backends without multiqueue support
dev->max_queues was never initialised for backends that don't support
VHOST_USER_PROTOCOL_F_MQ, so it would use 0 as the maximum number of
queues to check against and consequently fail for any such backend.
Set it to 1 if the backend doesn't have multiqueue support.
Fixes: c90bd505a3e8210c23d69fecab9ee6f56ec4a161 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210705171429.29286-1-kwolf@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Wed, 7 Jul 2021 18:04:49 +0000 (20:04 +0200)]
MAINTAINERS: add block/rbd.c reviewer
adding myself as a designated reviewer.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20210707180449.32665-2-pl@kamp.de> Acked-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Wed, 7 Jul 2021 18:04:48 +0000 (20:04 +0200)]
block/rbd: fix type of task->complete
task->complete is a bool not an integer.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20210707180449.32665-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Fri, 25 Jun 2021 14:23:16 +0000 (16:23 +0200)]
iotests/308: Test +w on read-only FUSE exports
Test that +w on read-only FUSE exports returns an EROFS error. u+x on
the other hand should work. (There is no special reason to choose u+x
here, it simply is like +w another flag that is not set by default.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Fri, 25 Jun 2021 14:23:15 +0000 (16:23 +0200)]
export/fuse: Let permissions be adjustable
Allow changing the file mode, UID, and GID through SETATTR.
Without allow_other, UID and GID are not allowed to be changed, because
it would not make sense. Also, changing group or others' permissions
is not allowed either.
For read-only exports, +w cannot be set.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Fri, 25 Jun 2021 14:23:14 +0000 (16:23 +0200)]
export/fuse: Give SET_ATTR_SIZE its own branch
In order to support changing other attributes than the file size in
fuse_setattr(), we have to give each its own independent branch. This
also applies to the only attribute we do support right now.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210625142317.271673-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Fri, 25 Jun 2021 14:23:13 +0000 (16:23 +0200)]
export/fuse: Add allow-other option
Without the allow_other mount option, no user (not even root) but the
one who started qemu/the storage daemon can access the export. Allow
users to configure the export such that such accesses are possible.
While allow_other is probably what users want, we cannot make it an
unconditional default, because passing it is only possible (for non-root
users) if the global fuse.conf configuration file allows it. Thus, the
default is an 'auto' mode, in which we first try with allow_other, and
then fall back to without.
FuseExport.allow_other reports whether allow_other was actually used as
a mount option or not. Currently, this information is not used, but a
future patch will let this field decide whether e.g. an export's UID and
GID can be changed through chmod.
One notable thing about 'auto' mode is that libfuse may print error
messages directly to stderr, and so may fusermount (which it executes).
Our export code cannot really filter or hide them. Therefore, if 'auto'
fails its first attempt and has to fall back, fusermount will print an
error message that mounting with allow_other failed.
This behavior necessitates a change to iotest 308, namely we need to
filter out this error message (because if the first attempt at mounting
with allow_other succeeds, there will be no such message).
Furthermore, common.rc's _make_test_img should use allow-other=off for
FUSE exports, because iotests generally do not need to access images
from other users, so allow-other=on or allow-other=auto have no
advantage. OTOH, allow-other=on will not work on systems where
user_allow_other is disabled, and with allow-other=auto, we get said
error message that we would need to filter out again. Just disabling
allow-other is simplest.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Fri, 25 Jun 2021 14:23:12 +0000 (16:23 +0200)]
export/fuse: Pass default_permissions for mount
We do not do any permission checks in fuse_open(), so let the kernel do
them. We already let fuse_getattr() report the proper UNIX permissions,
so this should work the way we want.
This causes a change in 308's reference output, because now opening a
non-writable export with O_RDWR fails already, instead of only actually
attempting to write to it. (That is an improvement.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
uri_free() checks if its argument is NULL in uri_clean() and g_free().
There is no need to check the argument before the call.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Message-Id: <20210629063602.4239-1-xypron.glpk@gmx.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 2 Jul 2021 17:23:56 +0000 (19:23 +0200)]
block/rbd: drop qemu_rbd_refresh_limits
librbd supports 1 byte alignment for all aio operations.
Currently, there is no API call to query limits from the Ceph
ObjectStore backend. So drop the bdrv_refresh_limits completely
until there is such an API call.
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-7-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 2 Jul 2021 17:23:55 +0000 (19:23 +0200)]
block/rbd: add write zeroes support
This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores
BDRV_REQ_MAY_UNMAP for older librbd versions.
The rationale for this is as follows (citing Ilya Dryomov current RBD
maintainer):
---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
and as a consequence always unmap if librbd is too old
It's not clear what qemu's expectation is but in general Write
Zeroes is allowed to unmap. The only guarantee is that subsequent
reads return zeroes, everything else is a hint. This is how it is
specified in the kernel and in the NVMe spec.
In particular, block/nvme.c implements it as follows:
This sets the Deallocate bit. But if it's not set, the device may
still deallocate:
"""
If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
command, and the namespace supports clearing all bytes to 0h in the
values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
from a deallocated logical block and its metadata (excluding
protection information), then for each specified logical block, the
controller:
- should deallocate that logical block;
...
If the Deallocate bit is cleared to '0' in a Write Zeroes command,
and the namespace supports clearing all bytes to 0h in the values
read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
a deallocated logical block and its metadata (excluding protection
information), then, for each specified logical block, the
controller:
- may deallocate that logical block;
"""
b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
Again, it's not clear what qemu expects here, but without it we end
up in a ridiculous situation where specifying the "don't allow slow
fallback" switch immediately fails all efficient zeroing requests on
a device where Write Zeroes is always efficient:
$ qemu-io -c 'help write' | grep -- '-[zun]'
-n, -- with -z, don't allow slow fallback
-u, -- with -z, allow unmapping
-z, -- write zeroes using blk_co_pwrite_zeroes
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-6-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 2 Jul 2021 17:23:54 +0000 (19:23 +0200)]
block/rbd: migrate from aio to coroutines
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-5-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 2 Jul 2021 17:23:53 +0000 (19:23 +0200)]
block/rbd: update s->image_size in qemu_rbd_getlength
While at it just call rbd_get_size and avoid rbd_image_info_t.
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-4-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 2 Jul 2021 17:23:52 +0000 (19:23 +0200)]
block/rbd: store object_size in BDRVRBDState
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-3-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 2 Jul 2021 17:23:51 +0000 (19:23 +0200)]
block/rbd: bump librbd requirement to luminous release
Ceph Luminous (version 12.2.z) is almost 4 years old at this point.
Bump the requirement to get rid of the ifdef'ry in the code.
Qemu 6.1 dropped the support for RHEL-7 which was the last supported
OS that required an older librbd.
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-2-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Or Ozeri [Sun, 27 Jun 2021 11:46:35 +0000 (14:46 +0300)]
block/rbd: Add support for rbd image encryption
Starting from ceph Pacific, RBD has built-in support for image-level encryption.
Currently supported formats are LUKS version 1 and 2.
There are 2 new relevant librbd APIs for controlling encryption, both expect an
open image context:
rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack
This commit extends the qemu rbd driver API to support the above.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Message-Id: <20210627114635.39326-1-oro@il.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
spapr: Fix implementation of Open Firmware client interface
This addresses the comments from v22.
The functional changes are (the VOF ones need retesting with Pegasos2):
(VOF) setprop will start failing if the machine class callback
did not handle it;
(VOF) unit addresses are lowered in path_offset();
(SPAPR) /chosen/bootargs is initialized from kernel_cmdline if
the client did not change it.
ppc/pegasos2: Implement some RTAS functions with VOF
Linux uses RTAS functions to access PCI devices so we need to provide
these with VOF. Implement some of the most important functions to
allow booting Linux with VOF. With this the board is now usable
without a binary ROM image and we can enable it by default as other
boards.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20210708215113.B3F747456E3@zero.eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BALATON Zoltan [Sun, 27 Jun 2021 16:27:13 +0000 (18:27 +0200)]
ppc/pegasos2: Use Virtual Open Firmware as firmware replacement
The pegasos2 board comes with an Open Firmware compliant ROM based on
SmartFirmware but it has some changes that are not open source
therefore the ROM binary cannot be included in QEMU. Guests running on
the board however depend on services provided by the firmware. The
Virtual Open Firmware recently added to QEMU implements a minimal set
of these services to allow some guests to boot without the original
firmware. This patch adds VOF as the default firmware for pegasos2
which allows booting Linux and MorphOS via -kernel option while a ROM
image can still be used with -bios for guests that don't run with VOF.
There are several new L1D cache flush bits added to the hcall which reflect
hardware security features for speculative cache access issues.
These behaviours are now being specified as negative in order to simplify
patched kernel compatibility with older firmware (a new problem found in
existing systems would automatically be vulnerable).
[dwg: Technically this changes behaviour for existing machine types.
After discussion with Nick, we've determined this is safe, because
the worst that will happen if a guest gets the wrong information due
to a migration is that it will perform some unnecessary workarounds,
but will remain correct and secure (well, as secure as it was going
to be anyway). In addition the change only affects cap-cfpc=safe
which is not enabled by default, and in fact is not possible to set
on any current hardware (though it's expected it will be possible on
POWER10)]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210615044107.1481608-1-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BALATON Zoltan [Sun, 27 Jun 2021 16:27:13 +0000 (18:27 +0200)]
target/ppc: Allow virtual hypervisor on CPU without HV
Change the assert in ppc_store_sdr1() to allow vhyp to be set on CPUs
without HV bit. This allows using the vhyp interface for firmware
emulation on pegasos2.
Nicholas Piggin [Tue, 6 Jul 2021 05:13:21 +0000 (15:13 +1000)]
target/ppc: mtmsrd is an illegal instruction on BookE
MSR is a 32-bit register in BookE and there is no mtmsrd instruction.
Cc: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210706051321.609046-1-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.
Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.
This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.
The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.
This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.
This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.
In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.
When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
This adds basic instances support which are managed by a hash map
ihandle -> [phandle].
Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..10000 - stack
400000.. - kernel 3ea0000.. - initramdisk
This OF CI does not implement "interpret".
Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.
With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
The immediate benefit is much faster booting time which especially
crucial with fully emulated early CPU bring up environments. Also this
may come handy when/if GRUB-in-the-userspace sees light of the day.
This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.
This assumes potential support for booting from QEMU backends
such as blockdev or netdev without devices/drivers used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
[dwg: Adjusted some includes which broke compile in some more obscure
compilation setups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Bin Meng [Tue, 6 Jul 2021 02:46:41 +0000 (10:46 +0800)]
roms/u-boot: Bump ppce500 u-boot to v2021.07 to add eTSEC support
Update the QEMU shipped u-boot.e500 image built from U-Boot mainline
v2021.07 release, which added eTSEC support to the QEMU ppce500 target,
via the following U-Boot series:
The cross-compilation toolchain used to build the U-Boot image is:
https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/10.1.0/x86_64-gcc-10.1.0-nolibc-powerpc-linux.tar.xz
Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: change ppc_hash32_xlate to use mmu_idx
Changed hash32 address translation to use the supplied mmu_idx, instead
of using what was stored in the msr, for parity purposes (radix64
already uses that) and for conceptual correctness, all the relevant
functions should always use the supplied mmu_idx, as there are no
guarantees that the mmu_idx stored in the CPU variable will not desync.
Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210706150316.21005-3-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Intrudoce a header common to all BookS MMUs, that can hold code that is
common to hash32 and book3s-v3 MMUs.
Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br>
Message-Id: <20210706150316.21005-2-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: changed ppc_hash64_xlate to use mmu_idx
Changed hash64 address translation to use the supplied mmu_idx instead
of using the one stored in the msr, for parity purposes (other book3s
MMUs already use it).
Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210628133610.1143-4-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: fix address translation bug for radix mmus
This commit attempts to fix a technical hiccup first mentioned by Richard
Henderson in
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06247.html
To sumarize the hiccup here, when radix-style mmus are translating an
address, they might need to call a second level of translation, with
hypervisor privileges. However, the way it was being done up until
this point meant that the second level translation had the same
privileges as the first level. It could lead to a bug in address
translation when running KVM inside a TCG guest, but this bug was never
experienced by users, so this isn't as much a bug fix as it is a
correctness cleanup.
This patch attempts that cleanup by making radix64_*_xlate functions
receive the mmu_idx, and passing one with the correct permission for the
second level translation.
The mmuidx macros added by this patch are only correct for non-bookE
mmus, because BookE style set the IS and DS bits inverted and there
might be other subtle differences. However, there doesn't seem to be
BookE cpus that have radix-style mmus, so we left a comment there to
document the issue, in case a machine does have that and was missed.
As part of this cleanup, we now need to send the correct mmmu_idx
when calling get_phys_page_debug, otherwise we might not be able to see the
memory that the CPU could
Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210628133610.1143-2-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: Fix compilation with DUMP_PAGE_TABLES debug option
../target/ppc/mmu_helper.c: In function 'get_segment_6xx_tlb':
../target/ppc/mmu_helper.c:514:46: error: passing argument 1 of
'ppc_hash32_hpt_mask' from incompatible pointer type [-Werror=incompatible-pointer-types]
This function is used by TCGCPUOps, and is thus TCG specific.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-10-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Create one common dispatch for all of the ppc_*_xlate functions.
Use ppc64_v3_radix to directly dispatch between ppc_radix64_xlate
and ppc_hash64_xlate.
Remove the separate *_handle_mmu_fault and *_get_phys_page_debug
functions, using common code for ppc_cpu_tlb_fill and
ppc_cpu_get_phys_page_debug.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-9-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mirror the interface of ppc_radix64_xlate (mostly), putting all
of the logic for older mmu translation into a single entry point.
For booke, we need to add mmu_idx to the xlate-style interface.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-8-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mirror the interface of ppc_radix64_xlate, putting all of
the logic for hash32 translation into a single entry point.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-7-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Mirror the interface of ppc_radix64_xlate, putting all of
the logic for hash64 translation into a single function.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-6-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: Use bool success for ppc_radix64_xlate
Instead of returning non-zero for failure, return true for success.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-5-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: Push real-mode handling into ppc_radix64_xlate
This removes some incomplete duplication between
ppc_radix64_handle_mmu_fault and ppc_radix64_get_phys_page_debug.
The former was correct wrt SPR_HRMOR and the latter was not.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-4-bruno.larsen@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
target/ppc: Use MMUAccessType with *_handle_mmu_fault
These changes were waiting until we didn't need to match
the function type of PowerPCCPUClass.handle_mmu_fault.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-3-bruno.larsen@eldorado.org.br> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead, use a switch on env->mmu_model. This avoids some
replicated information in cpu setup.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210621125115.67717-2-bruno.larsen@eldorado.org.br> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU reserves space for RTAS via /rtas/rtas-size which tells the client
how much space the RTAS requires to work which includes the RTAS binary
blob implementing RTAS runtime. Because pseries supports FWNMI which
requires plenty of space, QEMU reserves more than 2KB which is
enough for the RTAS blob as it is just 20 bytes (under QEMU).
Since FWNMI reset delivery was added, RTAS_SIZE macro is not used anymore.
This replaces RTAS_SIZE with RTAS_MIN_SIZE and uses it in
the /rtas/rtas-size calculation to account for the RTAS blob.
Fixes: 0e236d347790 ("ppc/spapr: Implement FWNMI System Reset delivery") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210622070336.1463250-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PowerPC CPUs use big endian by default but starting with POWER7,
server grade CPUs use the ILE bit of the LPCR special purpose
register to decide on the endianness to use when handling
interrupts. This gives a clue to QEMU on the endianness the
guest kernel is running, which is needed when generating an
ELF dump of the guest or when delivering an FWNMI machine
check interrupt.
Commit 382d2db62bcb ("target-ppc: Introduce callback for interrupt
endianness") added a class method to PowerPCCPUClass to modelize
this : default implementation returns a fixed "big endian" value,
while POWER7 and newer do the LPCR_ILE check. This is suboptimal
as it forces to implement the method for every new CPU family, and
it is very unlikely that this will ever be different than what we
have today.
We basically only have three cases to consider:
a) CPU doesn't have an LPCR => big endian
b) CPU has an LPCR but doesn't support the ILE bit => big endian
c) CPU has an LPCR and supports the ILE bit => little or big endian
Instead of class methods, introduce an inline helper that checks the
ILE bit in the LPCR_MASK to decide on the outcome. The new helper
words little endian instead of big endian. This allows to drop a !
operator in ppc_cpu_do_fwnmi_machine_check().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210622140926.677618-2-groug@kaod.org> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* remotes/stefanha-gitlab/tags/block-pull-request:
block/io: Merge discard request alignments
block: Add backend_defaults property
block/file-posix: Optimize for macOS
util/async: print leaked BH name when AioContext finalizes
util/async: add a human-readable name to BHs for debugging
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus
We support coordinated discarding of RAM using the RamDiscardManager for
the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards,
keeping uncoordinated discards (e.g., via virtio-balloon) disabled if
possible.
This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://"
by the block layer has to be implemented/unlocked separately. For now,
virtio-mem only supports x86-64; we don't restrict RamDiscardManager to
x86-64, though: arm64 and s390x are supposed to work as well, and we'll
test once unlocking virtio-mem support. The spapr IOMMUs will need special
care, to be tackled later, e.g.., once supporting virtio-mem.
Note: The block size of a virtio-mem device has to be set to sane sizes,
depending on the maximum hotplug size - to not run out of vfio mappings.
The default virtio-mem block size is usually in the range of a couple of
MBs. The maximum number of mapping is 64k, shared with other users.
Assume you want to hotplug 256GB using virtio-mem - the block size would
have to be set to at least 8 MiB (resulting in 32768 separate mappings).
Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-14-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We implement the RamDiscardManager interface and only require coordinated
discarding of RAM to work.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-13-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types
We want to separate the two cases whereby we discard ram
- uncoordinated: e.g., virito-balloon
- coordinated: e.g., virtio-mem coordinated via the RamDiscardManager
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-12-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
We have users in migration context that don't hold the BQL (when
finishing migration). To prepare for further changes, use a dedicated mutex
instead of atomic operations. Keep using qatomic_read ("READ_ONCE") for the
functions that only extract the current state (e.g., used by
virtio-balloon), locking isn't necessary.
While at it, split up the counter into two variables to make it easier
to understand.
Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-11-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
vfio: Support for RamDiscardManager in the vIOMMU case
vIOMMU support works already with RamDiscardManager as long as guests only
map populated memory. Both, populated and discarded memory is mapped
into &address_space_memory, where vfio_get_xlat_addr() will find that
memory, to create the vfio mapping.
Sane guests will never map discarded memory (e.g., unplugged memory
blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while
memory is getting discarded. However, there are two cases where a malicious
guests could trigger pinning of more memory than intended.
One case is easy to handle: the guest trying to map discarded memory
into an IOMMU.
The other case is harder to handle: the guest keeping memory mapped in
the IOMMU while it is getting discarded. We would have to walk over all
mappings when discarding memory and identify if any mapping would be a
violation. Let's keep it simple for now and print a warning, indicating
that setting RLIMIT_MEMLOCK can mitigate such attacks.
We have to take care of incoming migration: at the point the
IOMMUs get restored and start creating mappings in vfio, RamDiscardManager
implementations might not be back up and running yet: let's add runstate
priorities to enforce the order when restoring.
Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-10-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
vfio: Sanity check maximum number of DMA mappings with RamDiscardManager
Although RamDiscardManager can handle running into the maximum number of
DMA mappings by propagating errors when creating a DMA mapping, we want
to sanity check and warn the user early that there is a theoretical setup
issue and that virtio-mem might not be able to provide as much memory
towards a VM as desired.
As suggested by Alex, let's use the number of KVM memory slots to guess
how many other mappings we might see over time.
Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-9-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
vfio: Query and store the maximum number of possible DMA mappings
Let's query the maximum number of possible DMA mappings by querying the
available mappings when creating the container (before any mappings are
created). We'll use this informaton soon to perform some sanity checks
and warn the user.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-8-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
vfio: Support for RamDiscardManager in the !vIOMMU case
Implement support for RamDiscardManager, to prepare for virtio-mem
support. Instead of mapping the whole memory section, we only map
"populated" parts and update the mapping when notified about
discarding/population of memory via the RamDiscardListener. Similarly, when
syncing the dirty bitmaps, sync only the actually mapped (populated) parts
by replaying via the notifier.
Using virtio-mem with vfio is still blocked via
ram_block_discard_disable()/ram_block_discard_require() after this patch.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-7-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Let's properly notify when (un)plugging blocks, after discarding memory
and before allowing the guest to consume memory. Handle errors from
notifiers gracefully (e.g., no remaining VFIO mappings) when plugging,
rolling back the change and telling the guest that the VM is busy.
One special case to take care of is replaying all notifications after
restoring the vmstate. The device starts out with all memory discarded,
so after loading the vmstate, we have to notify about all plugged
blocks.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-6-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
virtio-mem: Don't report errors when ram_block_discard_range() fails
Any errors are unexpected and ram_block_discard_range() already properly
prints errors. Let's stop manually reporting errors.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-5-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
virtio-mem: Factor out traversing unplugged ranges
Let's factor out the core logic, no need to replicate.
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-4-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>