Linus Torvalds [Tue, 19 Mar 2024 18:38:27 +0000 (11:38 -0700)]
Merge tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Various virtual vs physical address usage fixes
- Add new bitwise types and helper functions and use them in s390
specific drivers and code to make it easier to find virtual vs
physical address usage bugs.
Right now virtual and physical addresses are identical for s390,
except for module, vmalloc, and similar areas. This will be changed,
hopefully with the next merge window, so that e.g. the kernel image
and modules will be located close to each other, allowing for direct
branches and also for some other simplifications.
As a prerequisite, all misuses of virtual and physical addresses
have to be fixed. As it turned out, people are so used to the
concept that virtual and physical addresses are the same that new
bugs got added to code which was already fixed. To keep even more
such bugs from being merged, add and use new bitwise types, so that
sparse can be used to find such usage bugs.
Most likely the new types can go away again after some time.
- Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation
- Fix kprobe branch handling: if an out-of-line single stepped relative
branch instruction has a target address within a certain address area
in the entry code, the program check handler may incorrectly execute
cleanup code as if KVM code was executed, leading to crashes
- Fix reference counting of zcrypt card objects
- Various other small fixes and cleanups
* tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390/entry: compare gmap asce to determine guest/host fault
s390/entry: remove OUTSIDE macro
s390/entry: add CIF_SIE flag and remove sie64a() address check
s390/cio: use while (i--) pattern to clean up
s390/raw3270: make class3270 constant
s390/raw3270: improve raw3270_init() readability
s390/tape: make tape_class constant
s390/vmlogrdr: make vmlogrdr_class constant
s390/vmur: make vmur_class constant
s390/zcrypt: make zcrypt_class constant
s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support
s390/vfio_ccw_cp: use new address translation helpers
s390/iucv: use new address translation helpers
s390/ctcm: use new address translation helpers
s390/lcs: use new address translation helpers
s390/qeth: use new address translation helpers
s390/zfcp: use new address translation helpers
s390/tape: fix virtual vs physical address confusion
s390/3270: use new address translation helpers
s390/3215: use new address translation helpers
...
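The idea behind the bitwise types in this pull can be sketched in user space as follows. This is only an illustration under stated assumptions: every name (demo_phys_t, demo_virt_to_phys(), DEMO_VIRT_OFFSET) is hypothetical and not the s390 kernel API, and the constant offset is made up. The attributes only have an effect when the file is checked with sparse, which defines __CHECKER__.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names throughout; not the s390 kernel API.
 * When checked with sparse (__CHECKER__ defined), the bitwise attribute
 * makes demo_phys_t incompatible with plain integers, so mixing it up
 * with a virtual address is reported. A normal compiler sees uint64_t.
 */
#ifdef __CHECKER__
#define demo_bitwise __attribute__((bitwise))
#define demo_force   __attribute__((force))
#else
#define demo_bitwise
#define demo_force
#endif

typedef uint64_t demo_bitwise demo_phys_t;

/* Assumed constant offset between virtual and physical addresses. */
#define DEMO_VIRT_OFFSET 0x100000ULL

/* The only sanctioned way to turn a virtual address into a physical one. */
static inline demo_phys_t demo_virt_to_phys(uint64_t virt)
{
	assert(virt >= DEMO_VIRT_OFFSET);
	return (demo_force demo_phys_t)(virt - DEMO_VIRT_OFFSET);
}

/* The only sanctioned way back. */
static inline uint64_t demo_phys_to_virt(demo_phys_t phys)
{
	return (demo_force uint64_t)phys + DEMO_VIRT_OFFSET;
}
```

With helpers like these, passing a plain integer where demo_phys_t is expected becomes a sparse warning instead of a silent bug.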
tracing: Just use strcmp() for testing __string() and __assign_str() match
As __assign_str() no longer uses its "src" parameter, there's a check to
make sure nothing depends on it being different than what was passed to
__string(). It originally just compared the pointer passed to __string()
with the pointer passed into __assign_str() via the "src" parameter. But
there's a couple of outliers that just pass in a quoted string constant,
where comparing the pointers is UB to the compiler, as the compiler is
free to create multiple copies of the same string constant.
Instead, just use strcmp(). It may slow down the trace event, but this
will eventually be removed.
Also, fix the issue of passing NULL to strcmp() by adding a WARN_ON() to
make sure that both "src" and the pointer saved in __string() are either
both NULL or have content, and then checking if "src" is not NULL before
performing the strcmp().
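A minimal user-space sketch of the check described above, assuming a hypothetical helper name (demo_str_matches) and returning false where the kernel code would trigger WARN_ON():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: "saved" stands for the pointer recorded by
 * __string(), "src" for the argument later passed to __assign_str().
 * Returns true when the two are consistent.
 */
static bool demo_str_matches(const char *saved, const char *src)
{
	/* One NULL and one non-NULL is a mismatch (the kernel would warn). */
	if ((saved == NULL) != (src == NULL))
		return false;

	/* Both NULL: consistent, and strcmp() is never handed NULL. */
	if (src == NULL)
		return true;

	/* Compare contents, not pointers: equal literals may be distinct copies. */
	return strcmp(saved, src) == 0;
}
```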
Linus Torvalds [Tue, 19 Mar 2024 18:19:36 +0000 (11:19 -0700)]
Merge tag 'pm-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update the Energy Model to make it prevent errors due to power
unit mismatches, fix a typo in power management documentation, convert
one driver to using a platform remove callback returning void, address
two cpufreq issues (one in the core and one in the DT driver), and
enable boost support in the SCMI cpufreq driver.
Specifics:
- Modify the Energy Model code to bail out and complain if the unit
of power is not uW to prevent errors due to unit mismatches (Lukasz
Luba)
- Make the intel_rapl platform driver use a remove callback returning
void (Uwe Kleine-König)
- Fix typo in the suspend and interrupts document (Saravana Kannan)
- Make per-policy boost flags actually take effect on platforms using
cpufreq_boost_set_sw() (Sibi Sankar)
- Enable boost support in the SCMI cpufreq driver (Sibi Sankar)
- Make the DT cpufreq driver use zalloc_cpumask_var() for allocating
cpumasks to avoid using uninitialized memory (Marek Szyprowski)"
* tag 'pm-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: scmi: Enable boost support
firmware: arm_scmi: Add support for marking certain frequencies as turbo
cpufreq: dt: always allocate zeroed cpumask
cpufreq: Fix per-policy boost behavior on SoCs using cpufreq_boost_set_sw()
Documentation: power: Fix typo in suspend and interrupts doc
PM: EM: Force device drivers to provide power in uW
powercap: intel_rapl: Convert to platform remove callback returning void
Linus Torvalds [Tue, 19 Mar 2024 18:15:14 +0000 (11:15 -0700)]
Merge tag 'acpi-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These update ACPI documentation and kerneldoc comments.
Specifics:
- Add markup to generate links from footnotes in the ACPI enumeration
document (Chris Packham)
- Update the handle_eject_request() kerneldoc comment to document the
arguments of the function and improve kerneldoc comments for ACPI
suspend and hibernation functions (Yang Li)"
* tag 'acpi-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Improve kerneldoc comments for suspend and hibernation functions
ACPI: docs: enumeration: Make footnotes links
ACPI: Document handle_eject_request() arguments
Linus Torvalds [Tue, 19 Mar 2024 18:11:01 +0000 (11:11 -0700)]
Merge tag 'thermal-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These update thermal drivers for ARM platforms by adding new hardware
support (r8a779h0, H616 THS), addressing issues (Mediatek LVTS,
Mediatek MT7896, thermal-of) and cleaning up code.
Specifics:
- Fix memory leak in the error path at probe time in the Mediatek
LVTS driver (Christophe Jaillet)
- Fix control buffer enablement regression on Mediatek MT7896 (Frank
Wunderlich)
- Drop spaces before TABs in different places: thermal-of, ST drivers
and Makefile (Geert Uytterhoeven)
- Adjust DT binding for NXP as fsl,tmu-range min/maxItems can vary
among several SoC versions (Fabio Estevam)
- Add support for the H616 THS controller on Sun8i platforms (Martin
Botka)
- Don't fail probe due to zone registration failure because there are
no trip points defined in the DT (Mark Brown)
- Support variable TMU array size for new platforms (Peng Fan)
- Adjust the DT binding for thermal-of and make the polling time not
required and assume it is zero when not found in the DT (Konrad
Dybcio)
- Add r8a779h0 support in both the DT and the rcar_gen3 driver (Geert
Uytterhoeven)"
* tag 'thermal-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/drivers/rcar_gen3: Add support for R-Car V4M
dt-bindings: thermal: rcar-gen3-thermal: Add r8a779h0 support
thermal/of: Assume polling-delay(-passive) 0 when absent
dt-bindings: thermal-zones: Don't require polling-delay(-passive)
thermal/drivers/qoriq: Fix getting tmu range
thermal/drivers/sun8i: Don't fail probe due to zone registration failure
thermal/drivers/sun8i: Add support for H616 THS controller
thermal/drivers/sun8i: Add SRAM register access code
thermal/drivers/sun8i: Extend H6 calibration to support 4 sensors
thermal/drivers/sun8i: Explain unknown H6 register value
dt-bindings: thermal: sun8i: Add H616 THS controller
soc: sunxi: sram: export register 0 for THS on H616
dt-bindings: thermal: qoriq-thermal: Adjust fsl,tmu-range min/maxItems
thermal: Drop spaces before TABs
thermal/drivers/mediatek: Fix control buffer enablement on MT7896
thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error handling path
Linus Torvalds [Tue, 19 Mar 2024 18:05:34 +0000 (11:05 -0700)]
Merge tag 'ata-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
"A single fix for ASMedia HBAs.
These HBAs do not indicate that they support SATA Port Multipliers:
CAP.SPM (Supports Port Multiplier) is not set.
Likewise, they do not allow you to probe the devices behind an
attached PMP, as defined according to the SATA-IO PMP specification.
Instead, they have decided to implement their own version of PMP,
and because of this, plugging in a PMP actually works, even if the
HBA claims that it does not support PMP.
Revert a recent quirk for these HBAs, as that breaks ASMedia's own
implementation of PMP.
Unfortunately, this will once again give some users of these HBAs
significantly increased boot time. However, a longer boot time for
some, is the lesser evil compared to some other users not being able
to detect their drives at all"
* tag 'ata-6.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ahci: asm1064: asm1166: don't limit reported ports
Linus Torvalds [Tue, 19 Mar 2024 15:57:39 +0000 (08:57 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- Per vq sizes in vdpa
- Info query for block devices support in vdpa
- DMA sync callbacks in vduse
- Fixes, cleanups
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (35 commits)
virtio_net: rename free_old_xmit_skbs to free_old_xmit
virtio_net: unify the code for recycling the xmit ptr
virtio-net: add cond_resched() to the command waiting loop
virtio-net: convert rx mode setting to use workqueue
virtio: packed: fix unmap leak for indirect desc table
vDPA: report virtio-blk flush info to user space
vDPA: report virtio-block read-only info to user space
vDPA: report virtio-block write zeroes configuration to user space
vDPA: report virtio-block discarding configuration to user space
vDPA: report virtio-block topology info to user space
vDPA: report virtio-block MQ info to user space
vDPA: report virtio-block max segments in a request to user space
vDPA: report virtio-block block-size to user space
vDPA: report virtio-block max segment size to user space
vDPA: report virtio-block capacity to user space
virtio: make virtio_bus const
vdpa: make vdpa_bus const
vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min
vDPA/ifcvf: get_max_vq_size to return max size
virtio_vdpa: create vqs with the actual size
...
Linus Torvalds [Tue, 19 Mar 2024 15:48:09 +0000 (08:48 -0700)]
Merge tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- Xen event channel handling fix for a regression with a rare kernel
config and some added hardening
- better support of running Xen dom0 in PVH mode
- a cleanup for the xen grant-dma-iommu driver
* tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: increment refcnt only if event channel is refcounted
xen/evtchn: avoid WARN() when unbinding an event channel
x86/xen: attempt to inflate the memory balloon on PVH
xen/grant-dma-iommu: Convert to platform remove callback returning void
Previously, patches have been added to limit the reported count of SATA
ports for asm1064 and asm1166 SATA controllers, as those controllers
report more ports than they physically have.
While it is allowed to report more ports than physically present in CAP.NP,
it is not allowed to report more ports than physically present in the PI
(Ports Implemented) register, which is what these HBAs do.
(This is an AHCI spec violation.)
Unfortunately, it seems that the PMP implementation in these ASMedia HBAs
is also violating the AHCI and SATA-IO PMP specification.
What these HBAs do is that they do not report that they support PMP
(CAP.SPM (Supports Port Multiplier) is not set).
Instead, they have decided to add extra "virtual" ports in the PI register
that are used if a port multiplier is connected to any of the physical
ports of the HBA.
Enumerating the devices behind the PMP as specified in the AHCI and
SATA-IO specifications, by using PMP READ and PMP WRITE commands to the
physical ports of the HBA, is not possible; you have to use the "virtual"
ports.
This is of course bad, because this gives us no way to detect the device
and vendor ID of the PMP actually connected to the HBA, which means that
we can not apply the proper PMP quirks for the PMP that is connected to
the HBA.
Limiting the port map will thus stop these controllers from working with
SATA Port Multipliers.
This patch reverts both patches for asm1064 and asm1166, so old behavior
is restored and SATA PMP will work again, but it will also reintroduce the
(minutes long) extra boot time for the ASMedia controllers that do not
have a PMP connected (either on the PCIe card itself, or an external PMP).
However, a longer boot time for some, is the lesser evil compared to some
other users not being able to detect their drives at all.
Fixes: 0077a504e1a4 ("ahci: asm1166: correct count of reported ports")
Fixes: 9815e3961754 ("ahci: asm1064: correct count of reported ports")
Cc: stable@vger.kernel.org
Reported-by: Matt <cryptearth@googlemail.com>
Signed-off-by: Conrad Kostecki <conikost@gentoo.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
[cassel: rewrote commit message]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
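To make the registers mentioned above concrete, here is a small sketch of how the port count and PMP capability can be read out of raw CAP and PI values. The bit positions follow the AHCI specification (CAP.NP in bits 4:0 as a zero-based count, CAP.SPM in bit 17, PI as a 32-bit bitmap); the macro and function names are hypothetical and are not the libata ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CAP_NP_MASK 0x1fu      /* CAP bits 4:0: number of ports, zero-based */
#define DEMO_CAP_SPM     (1u << 17) /* CAP bit 17: Supports Port Multiplier */

/* Number of ports the HBA silicon supports according to CAP.NP. */
static unsigned int demo_cap_num_ports(uint32_t cap)
{
	return (cap & DEMO_CAP_NP_MASK) + 1;
}

/* Number of ports marked as implemented in the PI bitmap. */
static unsigned int demo_pi_num_ports(uint32_t pi)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none are left. */
	for (; pi != 0; pi &= pi - 1)
		n++;
	return n;
}

/* True if the HBA advertises standard port multiplier support. */
static bool demo_cap_supports_pmp(uint32_t cap)
{
	return (cap & DEMO_CAP_SPM) != 0;
}
```

On the controllers discussed here, demo_cap_supports_pmp() would return false while the PI bitmap carries more bits than there are physical ports, which is why masking PI down to the physical ports hides devices behind a PMP.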
Xuan Zhuo [Thu, 29 Feb 2024 07:20:43 +0000 (15:20 +0800)]
virtio_net: rename free_old_xmit_skbs to free_old_xmit
Since free_old_xmit_skbs deals not only with skbs but also with xdp
frames and the subsequently added xsk, rename the function to
free_old_xmit.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Thu, 29 Feb 2024 07:20:42 +0000 (15:20 +0800)]
virtio_net: unify the code for recycling the xmit ptr
There are two independent implementations of the same logic. This
is inconvenient for the subsequent addition of new types. So extract a
function from this code and call it from both places to
recover old xmit ptrs.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jason Wang [Thu, 20 Jul 2023 08:38:39 +0000 (04:38 -0400)]
virtio-net: add cond_resched() to the command waiting loop
Add cond_resched() to the command waiting loop for better
cooperation with the scheduler. This gives the CPU a chance to
run other tasks (such as workqueue items) instead of busy looping when
preemption is not allowed on a device whose CVQ might be slow.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
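A rough user-space analogue of the pattern, for illustration only: sched_yield() stands in for cond_resched(), and the function name, the atomic flag, and the max_spins bound are assumptions that do not come from the driver.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: poll "done" up to max_spins times and give the
 * scheduler a chance to run something else between polls, instead of
 * spinning flat out. Returns the final state of the flag.
 */
static bool demo_wait_for_completion(atomic_bool *done, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		if (atomic_load(done))
			return true;
		sched_yield(); /* user-space stand-in for cond_resched() */
	}
	return atomic_load(done);
}
```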
Jason Wang [Thu, 20 Jul 2023 08:38:38 +0000 (04:38 -0400)]
virtio-net: convert rx mode setting to use workqueue
This patch converts rx mode setting to be done in a workqueue. This is
required to allow sleeping while waiting for the response to a cvq
command, since the current code is executed under the addr spin lock.
Note that we need to disable and flush the workqueue during freeze,
which means the rx mode setting is lost after resuming. This is not a
bug of this patch, as we never tried to restore the rx mode setting during
resume.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Xuan Zhuo [Fri, 23 Feb 2024 07:18:33 +0000 (15:18 +0800)]
virtio: packed: fix unmap leak for indirect desc table
When use_dma_api and premapped are true, do_unmap is false.
Because do_unmap is false, vring_unmap_extra_packed is not called by
detach_buf_packed.

    if (unlikely(vq->do_unmap)) {
        curr = id;
        for (i = 0; i < state->num; i++) {
            vring_unmap_extra_packed(vq,
                                     &vq->packed.desc_extra[curr]);
            curr = vq->packed.desc_extra[curr].next;
        }
    }

So the indirect desc table is not unmapped, which leaks the mapping.
So here, we check vq->use_dma_api instead. At the same time, the dma info
is updated based on use_dma_api.
This bug does not occur in practice, because no driver uses premapped
with indirect.
Fixes: b319940f83c2 ("virtio_ring: skip unmap for premapped")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20240223071833.26095-1-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
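The condition change can be modelled with a few booleans. This is a simplified sketch, not the virtio_ring code: the struct and function names are hypothetical, and the relation do_unmap = use_dma_api && !premapped is an assumption derived from the first sentence of the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the virtqueue flags involved. */
struct demo_vq {
	bool use_dma_api; /* the core maps buffers through the DMA API */
	bool premapped;   /* the driver maps its own data buffers */
};

/* Assumed relation: false when use_dma_api and premapped are both true. */
static bool demo_do_unmap(const struct demo_vq *vq)
{
	return vq->use_dma_api && !vq->premapped;
}

/* Before the fix: the indirect table unmap was keyed on do_unmap. */
static bool demo_indirect_unmapped_before(const struct demo_vq *vq)
{
	return demo_do_unmap(vq);
}

/* After the fix: the indirect table unmap is keyed on use_dma_api. */
static bool demo_indirect_unmapped_after(const struct demo_vq *vq)
{
	return vq->use_dma_api;
}
```

The two functions differ only for use_dma_api = true with premapped = true, which is the leaking case the commit describes.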
Zhu Lingshan [Sun, 18 Feb 2024 18:56:04 +0000 (02:56 +0800)]
vDPA: report virtio-block write zeroes configuration to user space
This commit reports the write zeroes configuration of
virtio-block devices to user space, including:
1) the maximum number of write zeroes sectors
2) the maximum number of write zeroes segments
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Zhu Lingshan [Sun, 18 Feb 2024 18:56:03 +0000 (02:56 +0800)]
vDPA: report virtio-block discarding configuration to user space
This commit reports the virtio-blk discard configuration
to user space, including:
1) the maximum discard sectors
2) the maximum number of discard segments for the block driver to use
3) the alignment for splitting a discard request
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Zhu Lingshan [Sun, 18 Feb 2024 18:56:02 +0000 (02:56 +0800)]
vDPA: report virtio-block topology info to user space
This commit allows vDPA to report topology information of
virtio-blk devices to user space, including:
1) the number of logical blocks per physical block
2) the offset of the first aligned logical block
3) the suggested minimum I/O size in blocks
4) the optimal (suggested maximum) I/O size in blocks
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240218185606.13509-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that the driver core can properly handle constant struct bus_type,
move the virtio_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Message-Id: <20240204-bus_cleanup-virtio-v1-1-3bcb2212aaa0@marliere.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Now that the driver core can properly handle constant struct bus_type,
move the vdpa_bus variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Message-Id: <20240204-bus_cleanup-vdpa-v1-1-1745eccb0a5c@marliere.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IFCVF HW supports operation with vq size less than the max size,
as the spec required.
This commit implements vdpa_config_ops.get_vq_num_min to report
the minimal size of the virtqueues, which gives vDPA framework
a chance to reduce the vring size.
We need at least one descriptor to be functional, but it is better
to have no fewer than 64 to meet certain performance requirements.
Actually the framework would allocate at least a page for the vq.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
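One way a caller could use a reported minimum alongside the maximum is to clamp a requested ring size into the supported range. The sketch below is an assumption about usage, not the vDPA framework code, and demo_pick_vq_size is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a requested virtqueue size into the
 * [min, max] range reported by the device. At least one descriptor is
 * needed for the queue to be functional.
 */
static uint16_t demo_pick_vq_size(uint16_t requested, uint16_t min, uint16_t max)
{
	assert(min >= 1 && min <= max);

	if (requested < min)
		return min;
	if (requested > max)
		return max;
	return requested;
}
```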
Zhu Lingshan [Fri, 2 Feb 2024 16:39:04 +0000 (00:39 +0800)]
vDPA/ifcvf: get_max_vq_size to return max size
Since we already implemented vdpa_config_ops.get_vq_size,
get_max_vq_size can return the actual max size of the
virtqueues rather than the max allowed safe size.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Zhu Lingshan [Fri, 2 Feb 2024 16:39:03 +0000 (00:39 +0800)]
virtio_vdpa: create vqs with the actual size
The size of a virtqueue is a per-vq configuration.
This commit allows virtio_vdpa to create each
virtqueue with the actual size
supported by the backend device for that vq.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20240202163905.8834-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Shannon Nelson [Tue, 20 Feb 2024 01:10:50 +0000 (17:10 -0800)]
vdpa/pds: fixes for VF vdpa flr-aer handling
This addresses a couple of things found while testing the FLR and AER
handling with the VFs.
- release irqs before calling vp_modern_remove()
- make sure we have a valid struct pointer before using it to release irqs
- make sure the FW is alive before trying to add a new device
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20240220011050.30913-1-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Maxime Coquelin [Mon, 19 Feb 2024 17:06:06 +0000 (18:06 +0100)]
vduse: implement DMA sync callbacks
Since commit 295525e29a5b ("virtio_net: merge dma
operations when filling mergeable buffers"), VDUSE devices
require support for DMA's .sync_single_for_cpu() operation
as the memory is non-coherent between the device and CPU
because of the use of a bounce buffer.
This patch implements both .sync_single_for_cpu() and
.sync_single_for_device() callbacks, and also skips bounce
buffer copies during DMA map and unmap operations if the
DMA_ATTR_SKIP_CPU_SYNC attribute is set to avoid extra
copies of the same buffer.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20240219170606.587290-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
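The role of the sync callbacks with a bounce buffer can be illustrated with a small user-space model. All names are hypothetical, the fixed-size buffer is an assumption, and DMA directions are ignored for brevity, so this is a sketch of the idea rather than the VDUSE implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 64

/* Hypothetical mapping: the device only ever touches "bounce". */
struct demo_bounce {
	unsigned char *orig;                 /* driver-owned memory */
	unsigned char bounce[DEMO_BUF_SIZE]; /* memory the device reads/writes */
	size_t len;
};

/* Make CPU writes visible to the device: copy orig -> bounce. */
static void demo_sync_for_device(struct demo_bounce *b)
{
	memcpy(b->bounce, b->orig, b->len);
}

/* Make device writes visible to the CPU: copy bounce -> orig. */
static void demo_sync_for_cpu(struct demo_bounce *b)
{
	memcpy(b->orig, b->bounce, b->len);
}

/* Map: skip the copy when the caller promises to sync explicitly. */
static void demo_map(struct demo_bounce *b, unsigned char *orig, size_t len,
		     bool skip_cpu_sync)
{
	assert(len <= DEMO_BUF_SIZE);
	b->orig = orig;
	b->len = len;
	memset(b->bounce, 0, sizeof(b->bounce));
	if (!skip_cpu_sync)
		demo_sync_for_device(b);
}

/* Unmap: likewise skip the copy back when explicit syncs were used. */
static void demo_unmap(struct demo_bounce *b, bool skip_cpu_sync)
{
	if (!skip_cpu_sync)
		demo_sync_for_cpu(b);
	b->orig = NULL;
	b->len = 0;
}
```

A caller that maps once with the skip flag and then syncs per use avoids copying the same buffer twice, which is the saving the commit message refers to.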
Jonah Palmer [Fri, 16 Feb 2024 14:25:02 +0000 (09:25 -0500)]
vdpa/mlx5: Allow CVQ size changes
The MLX driver was not updating its control virtqueue size at set_vq_num
and instead always initialized to MLX5_CVQ_MAX_ENT (16) at
setup_cvq_vring.
Qemu would try to set the size to 64 by default, however, because the
CVQ size always was initialized to 16, an error would be thrown when
sending >16 control messages (as used-ring entry 17 is initialized to 0).
For example, starting a guest with x-svq=on and then executing the
following command would produce the error below:
# for i in {1..20}; do ifconfig eth0 hw ether XX:xx:XX:xx:XX:XX; done
qemu-system-x86_64: Insufficient written data (0)
[ 435.331223] virtio_net virtio0: Failed to set mac address by vq command.
SIOCSIFHWADDR: Invalid argument
Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240216142502.78095-1-jonah.palmer@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting")
Steve Sistare [Tue, 13 Feb 2024 14:25:58 +0000 (06:25 -0800)]
vdpa: skip suspend/resume ops if not DRIVER_OK
If a vdpa device is not in state DRIVER_OK, then there is no driver state
to preserve, so no need to call the suspend and resume driver ops.
Suggested-by: Eugenio Perez Martin <eperezma@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Message-Id: <1707834358-165470-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
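The guard described above amounts to a status-bit test before calling into the driver. In this sketch the struct, the call counter, and the function name are hypothetical; only the value 0x04 is taken from the virtio definition of VIRTIO_CONFIG_S_DRIVER_OK.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as VIRTIO_CONFIG_S_DRIVER_OK in the virtio status field. */
#define DEMO_S_DRIVER_OK 0x04

/* Hypothetical device model that counts driver suspend calls. */
struct demo_dev {
	uint8_t status;
	unsigned int suspend_calls;
};

/* Only call the (modelled) driver suspend op when DRIVER_OK is set. */
static int demo_suspend(struct demo_dev *dev)
{
	if (!(dev->status & DEMO_S_DRIVER_OK))
		return 0; /* no driver state to preserve */

	dev->suspend_calls++;
	return 0;
}
```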
Currently, we don't reenable the config if freezing the device failed.
For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.
Let's fix this by re-enabling the config if freezing fails.
Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: <stable@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240213135425.795001-1-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
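The error path being fixed can be sketched as follows. The struct layout and all names are hypothetical stand-ins for the virtio core, and the failing callback simply returns -EBUSY for demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model with an optional driver freeze callback. */
struct demo_vdev {
	bool config_enabled;
	int (*freeze)(struct demo_vdev *dev);
};

/* Disable config notifications, then freeze; undo the disable on failure. */
static int demo_device_freeze(struct demo_vdev *dev)
{
	dev->config_enabled = false;

	if (dev->freeze != NULL) {
		int ret = dev->freeze(dev);

		if (ret != 0) {
			/* The fix: re-enable config so the device keeps working. */
			dev->config_enabled = true;
			return ret;
		}
	}
	return 0;
}

/* Stand-in for a driver that does not support suspend. */
static int demo_freeze_fails(struct demo_vdev *dev)
{
	(void)dev;
	return -EBUSY;
}

/* Stand-in for a driver whose freeze succeeds. */
static int demo_freeze_ok(struct demo_vdev *dev)
{
	(void)dev;
	return 0;
}
```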
Steve Sistare [Fri, 9 Feb 2024 22:30:07 +0000 (14:30 -0800)]
vdpa_sim: reset must not run
vdpasim_do_reset sets running to true, which is wrong, as it allows
vdpasim_kick_vq to post work requests before the device has been
configured. To fix, do not set running until VIRTIO_CONFIG_S_DRIVER_OK
is set.
Fixes: 0c89e2a3a9d0 ("vdpa_sim: Implement suspend vdpa op")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1707517807-137331-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Suzuki K Poulose [Thu, 25 Jan 2024 23:20:39 +0000 (23:20 +0000)]
virtio: uapi: Drop __packed attribute in linux/virtio_pci.h
Commit 92792ac752aa ("virtio-pci: Introduce admin command sending function")
added "__packed" structures to UAPI header linux/virtio_pci.h. This triggers
build failures in the consumer userspace applications without proper "definition"
of __packed (e.g., kvmtool build fails).
Moreover, the structures are already packed well and don't need explicit
packing, similar to the rest of the structures in all virtio_* headers. Remove
the __packed attribute.
Fixes: 92792ac752aa ("virtio-pci: Introduce admin command sending function")
Cc: Feng Liu <feliu@nvidia.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Message-Id: <20240125232039.913606-1-suzuki.poulose@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
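Whether a structure is "already packed well" can be checked at compile time without any packing attribute. The struct below is a hypothetical layout chosen for illustration and is not copied from linux/virtio_pci.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header layout: fields are ordered so that each one is
 * naturally aligned and the compiler has no reason to insert padding.
 */
struct demo_admin_hdr {
	uint16_t opcode;
	uint16_t group_type;
	uint8_t  reserved[12];
	uint64_t group_member_id;
};

/* Fails the build if padding ever sneaks in. */
_Static_assert(sizeof(struct demo_admin_hdr) == 24,
	       "demo_admin_hdr must not contain implicit padding");
_Static_assert(offsetof(struct demo_admin_hdr, group_member_id) == 16,
	       "group_member_id must directly follow the reserved bytes");
```

A userspace consumer can compile such a header as-is, since nothing depends on a definition of __packed.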
vhost: Added pad cleanup if vnet_hdr is not present.
When QEMU is launched with vhost but without tap vnet_hdr,
vhost tries to copy vnet_hdr from the socket iter with size 0
to a page that may contain some trash.
That trash can be interpreted as unpredictable values for
vnet_hdr.
That leads to dropping some packets and in some cases to
stalling the vhost routine when vhost_net tries to process
packets and fails in a loop.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20240115194840.1183077-1-andrew@daynix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Mon, 18 Mar 2024 22:39:48 +0000 (15:39 -0700)]
Merge tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- Fix mistaken variable assignment that caused a refcounting problem
- Revert a recent change that began using atomic counters where they
were not needed (for lkb wait_count)
- Add comments around forced state reset for waiting lock operations
during recovery
* tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: add comments about forced waiters reset
dlm: revert atomic_t lkb_wait_count
dlm: fix user space lkb refcounting
Linus Torvalds [Mon, 18 Mar 2024 22:34:03 +0000 (15:34 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Very small update this cycle:
- Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5,
irdma, rtrs
- Simplify the hns hem mechanism
- Fix EFA's MSI-X allocation in resource constrained configurations
- Fix a KASAN splat in srpt
- Narrow hns's congestion control selection to QPs granularity and
allow userspace to select it
- Solve a parallel module loading race between the CM module and a
driver module
- Flexible array cleanup
- Dump hns's SCC Context to 'rdma res' for debugging
- Make mana build page lists for HW objects that require a 0 offset
correctly
- Stuck CM ID debugging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (29 commits)
RDMA/cm: add timeout to cm_destroy_id wait
RDMA/mana_ib: Use virtual address in dma regions for MRs
RDMA/mana_ib: Fix bug in creation of dma regions
RDMA/hns: Append SCC context to the raw dump of QPC
RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings
RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity
RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store()
RDMA/uverbs: Remove flexible arrays from struct *_filter
RDMA/device: Fix a race between mad_client and cm_client init
RDMA/hns: Fix mis-modifying default congestion control algorithm
RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user
RDMA/srpt: Do not register event handler until srpt device is fully setup
RDMA/irdma: Remove duplicate assignment
RDMA/efa: Limit EQs to available MSI-X vectors
RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype
RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype
RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function
RDMA/mana_ib: Introduce mana_ib_get_netdev helper function
RDMA/mana_ib: Introduce mdev_to_gc helper function
RDMA/hns: Simplify 'struct hns_roce_hem' allocation
...
Linus Torvalds [Mon, 18 Mar 2024 22:27:03 +0000 (15:27 -0700)]
Merge tag 'ktest-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
- Allow variables to contain variables. This makes the shell commands
have a bit more flexibility to reuse existing variables.
- Have make_warnings_file in build-only mode require limited variables
The make_warnings_file test will create a file with all existing
warnings (which can be used to compare against in builds with new
commits). Add it to the build-only list that doesn't require other
variables (like how to reset a machine), as the make_warnings_file
makes the most sense on build only tests.
* tag 'ktest-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: force $buildonly = 1 for 'make_warnings_file' test type
ktest.pl: Process variables within variables
Linus Torvalds [Mon, 18 Mar 2024 22:11:44 +0000 (15:11 -0700)]
Merge tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"Main user visible change:
- User events can now have "multi formats"
The current user events have a single format. If another event is
created with a different format, it will fail to be created. That
is, once an event name is used, it cannot be used again with a
different format. This can cause issues if a library is using an
event and updates its format. An application using the older format
will prevent an application using the new library from registering
its event.
A task could also DoS another application if it knows the event
names, and it creates events with different formats.
The multi-format event is in a different name space from the single
format. Both the event name and its format are the unique
identifier. This will allow two different applications to use the
same user event name but with different payloads.
- Added support to have ftrace_dump_on_oops dump out instances and
not just the main top level tracing buffer.
Other changes:
- Add eventfs_root_inode
Only the root inode has a dentry that is static (never goes away)
and stores it upon creation. There's no reason that the thousands
of other eventfs inodes should have a pointer that never gets set
in its descriptor. Create an eventfs_root_inode descriptor that has an
eventfs_inode descriptor and a dentry pointer, and only the root
inode will use this.
- Added WARN_ON()s in eventfs
There's some conditionals remaining in eventfs that should never be
hit, but instead of removing them, add WARN_ON() around them to
make sure that they are never hit.
- Have saved_cmdlines allocation also include the map_cmdline_to_pid
array
The saved_cmdlines structure allocates a large amount of data to
hold its mappings. Within it, it has three arrays. Two are already
a part of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory
can be saved by also including the map_cmdline_to_pid[] array as
well.
- Restructure __string() and __assign_str() macros used in
TRACE_EVENT()
Dynamic strings in TRACE_EVENT() are declared with:
__string(name, source)
And assigned with:
__assign_str(name, source)
In the tracepoint callback of the event, the __string() is used to
get the size needed to allocate on the ring buffer and
__assign_str() is used to copy the string into the ring buffer.
There's a helper structure that is created in the TRACE_EVENT()
macro logic that will hold the string length and its position in
the ring buffer which is created by __string().
There are several trace events that have a function to create the
string to save. This function is executed twice. Once for
__string() and again for __assign_str(). There's no reason for
this. The helper structure could also save the string it used in
__string() and simply copy that into __assign_str() (it also
already has its length).
By using the structure to store the source string for the
assignment, it means that the second argument to __assign_str() is
no longer needed.
It will be removed in the next merge window, but for now add a
warning if the source string given to __string() is different than
the source string given to __assign_str(), as the source to
__assign_str() isn't even used and will be going away.
- Added checks to make sure that the source of __string() is also the
source of __assign_str() so that it can be safely removed in the
next merge window.
Included fixes that the above check found.
- Other minor clean ups and fixes"
* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
tracing: Add __string_src() helper to help compilers not to get confused
tracing: Use strcmp() in __assign_str() WARN_ON() check
tracepoints: Use WARN() and not WARN_ON() for warnings
tracing: Use div64_u64() instead of do_div()
tracing: Support to dump instance traces by ftrace_dump_on_oops
tracing: Remove second parameter to __assign_rel_str()
tracing: Add warning if string in __assign_str() does not match __string()
tracing: Add __string_len() example
tracing: Remove __assign_str_len()
ftrace: Fix most kernel-doc warnings
tracing: Decrement the snapshot if the snapshot trigger fails to register
tracing: Fix snapshot counter going between two tracers that use it
tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
tracing: Use ? : shortcut in trace macros
tracing: Do not calculate strlen() twice for __string() fields
tracing: Rework __assign_str() and __string() to not duplicate getting the string
cxl/trace: Properly initialize cxl_poison region name
net: hns3: tracing: fix hclgevf trace event strings
drm/i915: Add missing ; to __assign_str() macros in tracepoint code
NFSD: Fix nfsd_clid_class use of __string_len() macro
...
Linus Torvalds [Mon, 18 Mar 2024 21:59:13 +0000 (14:59 -0700)]
Merge tag 'sysctl-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
"No functional changes - additional testing is required for the rest of
the pending changes.
- New shared repo for sysctl maintenance
- check-sysctl-docs adjustment for API changes by Thomas Weißschuh"
* tag 'sysctl-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
scripts: check-sysctl-docs: handle per-namespace sysctls
ipc: remove linebreaks from arguments of __register_sysctl_table
scripts: check-sysctl-docs: adapt to new API
MAINTAINERS: Update sysctl tree location
Linus Torvalds [Mon, 18 Mar 2024 19:15:19 +0000 (12:15 -0700)]
Merge tag 'for-linus-6.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"One fix, one cleanup...
Fix: Julia Lawall pointed out a null pointer dereference.
Cleanup: Vlastimil Babka sent me a patch to remove some SLAB related
code"
* tag 'for-linus-6.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
Julia Lawall reported this null pointer dereference, this should fix it.
fs/orangefs: remove ORANGEFS_CACHE_CREATE_FLAGS
Linus Torvalds [Mon, 18 Mar 2024 18:26:00 +0000 (11:26 -0700)]
Merge tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this round, there are a number of updates on mainly two areas:
Zoned block device support and Per-file compression. For example,
we've found several issues in supporting Zoned block devices, especially
those having large sections, with regard to GC and file pinning used for
Android devices. On the compression side, we've fixed many corner-case
race conditions that had broken the design assumption.
Enhancements:
- Support file pinning for Zoned block device having large section
- Enhance the data recovery after sudden power cut on Zoned block
device
- Add more error injection cases to easily detect the kernel panics
- add a proc entry to show the entire disk layout
- Improve various error paths panicked by BUG_ON in block allocation
and GC
- support SEEK_DATA and SEEK_HOLE for compression files
Bug fixes:
- avoid use-after-free issue in f2fs_filemap_fault
- fix some race conditions to break the atomic write design
assumption
- fix to truncate meta inode pages forcibly
- resolve various per-file compression issues wrt the space
management and compression policies
- fix some swap-related bugs
In addition, we removed deprecated codes such as io_bits and
heap_allocation, and also fixed minor error handling routines with
neat debugging messages"
* tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault
f2fs: truncate page cache before clearing flags when aborting atomic write
f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag
f2fs: prevent atomic write on pinned file
f2fs: fix to handle error paths of {new,change}_curseg()
f2fs: unify the error handling of f2fs_is_valid_blkaddr
f2fs: zone: fix to remove pow2 check condition for zoned block device
f2fs: fix to truncate meta inode pages forcely
f2fs: compress: fix reserve_cblocks counting error when out of space
f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks
f2fs: add a proc entry show disk layout
f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup
f2fs: fix to check return value of f2fs_gc_range
f2fs: fix to check return value __allocate_new_segment
f2fs: fix to do sanity check in update_sit_entry
f2fs: fix to reset fields for unloaded curseg
f2fs: clean up new_curseg()
f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate()
f2fs: fix blkofs_end correctly in f2fs_migrate_blocks()
f2fs: ro: don't start discard thread for readonly image
...
Linus Torvalds [Mon, 18 Mar 2024 18:15:58 +0000 (11:15 -0700)]
Merge tag 'ovl-fixes-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
"Only minor fixes:
- Fix uncalled for WARN_ON from v6.8-rc1
- Fix the overlayfs MAINTAINERS entry"
* tag 'ovl-fixes-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: relax WARN_ON in ovl_verify_area()
MAINTAINERS: update overlayfs git tree
Linus Torvalds [Mon, 18 Mar 2024 16:15:50 +0000 (09:15 -0700)]
Merge tag 'vfs-6.9-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a few small fixes for this merge window:
- Undo the hiding of silly-rename files in afs. If they're hidden
they can't be deleted by rm manually anymore causing regressions
- Avoid caching the preferred address for an afs server to avoid
accidentally overriding an explicitly specified preferred server
address
- Fix bad stat() and rmdir() interaction in afs
- Take a passive reference on the superblock when opening a block
device so the holder is available to concurrent callers from the
block layer
- Clear private data pointer in fscache_begin_operation() to avoid it
being falsely treated as valid"
* tag 'vfs-6.9-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fscache: Fix error handling in fscache_begin_operation()
fs,block: get holder during claim
afs: Fix occasional rmdir-then-VNOVNODE with generic/011
afs: Don't cache preferred address
afs: Revert "afs: Hide silly-rename files from userspace"
Linus Torvalds [Mon, 18 Mar 2024 16:05:37 +0000 (09:05 -0700)]
Merge tag 'sound-fix-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Two regression fixes that had been introduced in this merge window,
additional HD-audio quirks, and a further enhancement for the new
kunit"
* tag 'sound-fix-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: core: add kunitconfig
ALSA: hda/realtek: add in quirk for Acer Swift Go 16 - SFG16-71
Revert "ALSA: usb-audio: Name feature ctl using output if input is PCM"
ALSA: timer: Fix missing irq-disable at closing
ALSA: hda/realtek: Add quirk for Lenovo Yoga 9 14IMH9
tracing: Add __string_src() helper to help compilers not to get confused
The __string() helper macro of the TRACE_EVENT() macro is used to
determine how much of the ring buffer needs to be allocated to fit the
given source string. Some trace events have a string that is dependent on
another variable that could be NULL, and in those cases the string is
passed in to be NULL.
The __string() macro can handle being passed in a NULL pointer for which
it will turn it into "(null)". It does that with:
Instead, create a static inline function that takes the src string and
will return the string if it is not NULL and will return "(null)" if it
is. This will then make the strlen() line:
strlen(__string_src(src)) + 1
Where the compiler can see that strlen() will not end up with NULL and
does not warn about it.
Note that this depends on commit 51270d573a8d ("tracing/net_sched: Fix
tracepoints that save qdisc_dev() as a string") being applied, as passing
the qdisc_dev() into __string_src() will give an error.
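A rough userspace sketch of the helper described above, not the kernel's actual code. The names string_src_sketch() and string_reserve_len() are made up for illustration, and EVENT_NULL_STR is defined locally here as the "(null)" placeholder:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder stored when the source pointer is NULL (local definition). */
#define EVENT_NULL_STR "(null)"

/* Return the string itself, or the placeholder when the pointer is NULL. */
static inline const char *string_src_sketch(const char *str)
{
	return str ? str : EVENT_NULL_STR;
}

/* Bytes to reserve for the string, including the terminating NUL. */
static inline size_t string_reserve_len(const char *src)
{
	const char *s = string_src_sketch(src);

	assert(s != NULL);	/* strlen() never sees a NULL pointer here */
	return strlen(s) + 1;
}
```

Because the NULL handling sits in a function rather than in a conditional expression expanded at the call site, strlen() only ever receives the function's return value.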
tracing: Use strcmp() in __assign_str() WARN_ON() check
The WARN_ON() check in __assign_str() to catch where the source variable
to the macro doesn't match the source variable to __string() gives an
error in clang:
>> include/trace/events/sunrpc.h:703:4: warning: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Wstring-compare]
670 | __assign_str(progname, "unknown");
That's because the __assign_str() macro has:
WARN_ON_ONCE((src) != __data_offsets.dst##_ptr_);
Where "src" is a string literal. Clang warns when a string literal is
compared directly, as the address of the literal is unspecified.
Since the check still has to make sure that the string passed to
__string() is the same as the one passed to __assign_str(), test whether
the source is a string literal and use strcmp() in those cases.
Note that this depends on commit 51270d573a8d ("tracing/net_sched: Fix
tracepoints that save qdisc_dev() as a string") being applied, as this was
what found that bug.
Link: https://lore.kernel.org/linux-trace-kernel/20240312113002.00031668@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nathan Chancellor <nathan@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402292111.KIdExylU-lkp@intel.com/ Fixes: 433e1d88a3be ("tracing: Add warning if string in __assign_str() does not match __string()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracepoints: Use WARN() and not WARN_ON() for warnings
There are two WARN_ON*() warnings in tracepoint.h that deal with RCU
usage. But when they trigger, especially from using a TRACE_EVENT()
macro, the information is not very helpful and is confusing:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at include/trace/events/lock.h:24 lock_acquire+0x2b2/0x2d0
Where the above warning takes you to:
TRACE_EVENT(lock_acquire, <<<--- line 24 in lock.h
TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
int trylock, int read, int check,
struct lockdep_map *next_lock, unsigned long ip),
[..]
Change the WARN_ON_ONCE() to WARN_ONCE() and add a string that allows
someone to search for exactly where the bug happened.
Link: https://lore.kernel.org/linux-trace-kernel/20240228133112.0d64fb1b@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Huang Yiwei [Fri, 23 Feb 2024 08:31:26 +0000 (16:31 +0800)]
tracing: Support to dump instance traces by ftrace_dump_on_oops
Currently ftrace only dumps the global trace buffer on an OOPs. For
debugging a production usecase, instance trace will be helpful to
check specific problems since global trace buffer may be used for
other purposes.
This patch extends the ftrace_dump_on_oops parameter to dump a specific
or multiple trace instances:
- ftrace_dump_on_oops=0: as before -- don't dump
- ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer
on all CPUs
- ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global
trace buffer on CPU that triggered the oops
- ftrace_dump_on_oops=<instance_name>: new behavior -- dump the
tracing instance matching <instance_name>
- ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu],
<instance2_name>[=2/orig_cpu]: new behavior -- dump the global trace
buffer and multiple instance buffers on all CPUs, or only dump on CPU
that triggered the oops if =2 or =orig_cpu is given
Also, the sysctl node can handle the input accordingly.
tracing: Remove second parameter to __assign_rel_str()
The second parameter of __assign_rel_str() is no longer used. It can be removed.
Note, the only real user of rel_string is user events. This code is just
in the sample code for testing purposes.
This makes __assign_rel_str() different than __assign_str() but that's
fine. __assign_str() is used in over 700 places and has a larger impact. That
change will come later.
Now that __assign_str() gets the length from the __string() (and
__string_len()) macros, there's no reason to have a separate
__assign_str_len() macro as __assign_str() can get the length of the
string needed.
Also remove __assign_rel_str() although it had no users anyway.
Randy Dunlap [Fri, 23 Feb 2024 05:48:33 +0000 (21:48 -0800)]
ftrace: Fix most kernel-doc warnings
Reduce the number of kernel-doc warnings from 52 down to 10, i.e.,
fix 42 kernel-doc warnings by (a) using the Returns: format for
function return values or (b) using "@var:" instead of "@var -"
for function parameter descriptions.
Fix one return values list so that it is formatted correctly when
rendered for output.
tracing: Decrement the snapshot if the snapshot trigger fails to register
Running the ftrace selftests caused the ring buffer mapping test to fail.
Investigating, I found that the snapshot counter would be incremented
every time a snapshot trigger was added, even if that snapshot trigger
failed.
tracing: Fix snapshot counter going between two tracers that use it
Running the ftrace selftests caused the ring buffer mapping test to fail.
Investigating, I found that the snapshot counter would be incremented
every time a tracer that uses the snapshot is enabled even if the snapshot
was used by the previous tracer.
would leave the snapshot counter at 1 and not zero. That's because the
enabling of wakeup_dl would increment the counter again but the setting
the tracer to nop would only decrement it once.
Do not arm the snapshot for a tracer if the previous tracer already had it
armed.
Link: https://lore.kernel.org/linux-trace-kernel/20240223013344.570525723@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Fixes: 16f7e48ffc53a ("tracing: Add snapshot refcount") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
The TRACE_EVENT macros have a dependency for when a __string() field is NULL,
where it will save "(null)" as the string. This string is also used by
__assign_str(). It's better to create a single macro instead of having
something that will not be caught by the compiler if there is an
unfortunate typo.
Link: https://lore.kernel.org/linux-trace-kernel/20240222211443.106216915@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chuck Lever <chuck.lever@oracle.com> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There's even some code that may call a function helper to find the
s->string value. The problem with the above is that the work to get the
s->string is done twice. Once at the __string() and again in the
__assign_str().
The length of the string is calculated via a strlen(), not once, but
twice. Once during the __string() macro and again in __assign_str(). But
the length is actually already recorded in the data location and there's no
reason to call strlen() again.
Just use the saved length that was saved in the __string() code for the
__assign_str() code.
Link: https://lore.kernel.org/linux-trace-kernel/20240222211442.793074999@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There's even some code that may call a function helper to find the
s->string value. The problem with the above is that the work to get the
s->string is done twice. Once at the __string() and again in the
__assign_str().
But the __string() uses dynamic_array() which has a helper structure that
is created holding the offsets and length of the string fields. Instead of
finding the string twice, just save it off in another field from that
helper structure, and have __assign_str() use that instead.
Note, this also means that the second parameter of __assign_str() isn't
even used anymore, and may be removed in the future.
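A minimal userspace sketch of the idea, assuming a simplified helper structure. The names struct str_field, str_field_size() and str_field_assign() are hypothetical stand-ins for the roles of the helper structure, __string() and __assign_str(), and are not the kernel's macros:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EVENT_NULL_STR "(null)"	/* local definition for this sketch */

/* Hypothetical stand-in for the per-event helper structure. */
struct str_field {
	const char *ptr;	/* source string saved by the sizing step */
	size_t len;		/* bytes to reserve, including the NUL */
};

/* Sizing step, in the role of __string(): resolve the source once. */
static size_t str_field_size(struct str_field *f, const char *src)
{
	assert(f != NULL);
	f->ptr = src ? src : EVENT_NULL_STR;
	f->len = strlen(f->ptr) + 1;
	return f->len;
}

/* Copy step, in the role of __assign_str(): reuse the saved values. */
static void str_field_assign(const struct str_field *f, char *dst)
{
	assert(f != NULL && dst != NULL);
	memcpy(dst, f->ptr, f->len);	/* no second lookup, no second strlen() */
}
```

The copy step takes no source argument at all, which mirrors why the second parameter of __assign_str() becomes unnecessary.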
Link: https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Alison Schofield [Thu, 14 Mar 2024 20:12:17 +0000 (13:12 -0700)]
cxl/trace: Properly initialize cxl_poison region name
The TP_STRUCT__entry that gets assigned the region name, or an
empty string if no region is present, is erroneously initialized
to the cxl_region pointer. It needs to be properly initialized
otherwise its length is wrong and garbage chars can appear in
the kernel trace output: /sys/kernel/tracing/trace
The bad initialization was due in part to a naming conflict with
the parameter: struct cxl_region *region. The field 'region' is
already exposed externally as the region name, so changing that
to something logical, like 'region_name' is not an option. Instead
rename the internal only struct cxl_region to the commonly used
'cxlr'.
Impact is that tooling depending on that trace data can miss
picking up a valid event when searching by region name. The
TP_printk() output, if enabled, does emit the correct region
names in the dmesg log.
This was found during testing of the cxl-list option to report
media-errors for a region.
Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: stable@vger.kernel.org Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The __string() and __assign_str() helper macros of the TRACE_EVENT() macro
are going through some optimizations where only the source string of
__string() will be used and the __assign_str() source will be ignored and
later removed.
To make sure that there are no issues, a new check is added between the
__string() src argument and the __assign_str() src argument that does a
strcmp() to make sure they are the same string.
drm/i915: Add missing ; to __assign_str() macros in tracepoint code
I'm working on improving the __assign_str() and __string() macros to be
more efficient, and removed some unneeded semicolons. This triggered a bug
in the build as some of the __assign_str() macros in intel_display_trace
were missing a terminating semicolon.
Link: https://lore.kernel.org/linux-trace-kernel/20240222133057.2af72a19@gandalf.local.home Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: stable@vger.kernel.org Fixes: 2ceea5d88048b ("drm/i915: Print plane name in fbc tracepoints") Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
NFSD: Fix nfsd_clid_class use of __string_len() macro
I'm working on restructuring the __string* macros so that they don't need
to calculate the string twice. That is, the string will be saved off when
processing __string() and the __assign_str() will not need to do the work
again as it currently does.
Currently __string_len(item, src, len) doesn't actually use "src", but my
changes will require src to be correct as that is where the __assign_str()
will get its value from.
The event class nfsd_clid_class has:
__string_len(name, name, clp->cl_name.len)
But the second "name" does not exist and causes my changes to fail to
build. That second parameter should be: clp->cl_name.data.
Link: https://lore.kernel.org/linux-trace-kernel/20240222122828.3d8d213c@gandalf.local.home Cc: Neil Brown <neilb@suse.de> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Tom Talpey <tom@talpey.com> Cc: stable@vger.kernel.org Fixes: d27b74a8675ca ("NFSD: Use new __string_len C macros for nfsd_clid_class") Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Thu, 22 Feb 2024 00:18:07 +0000 (00:18 +0000)]
tracing/user_events: Document multi-format flag
User programs can now ask user_events to handle the synchronization of
multiple different formats for an event with the same name via the new
USER_EVENT_REG_MULTI_FORMAT flag.
Add a section for USER_EVENT_REG_MULTI_FORMAT that explains the intended
purpose and caveats of using it. Explain how deletion works in these
cases and how to use /sys/kernel/tracing/dynamic_events for per-version
deletion.
Beau Belgrave [Thu, 22 Feb 2024 00:18:06 +0000 (00:18 +0000)]
selftests/user_events: Test multi-format events
User_events now has multi-format events which allow for the same
register name, but with different formats. When this occurs, different
tracepoints are created with unique names.
Add a new test that ensures the same name can be used for two different
formats. Ensure they are isolated from each other and that name and arg
matching still works if yet another register comes in with the same
format as one of the two.
Currently user_events supports 1 event with the same name and must have
the exact same format when referenced by multiple programs. This opens
an opportunity for malicious or poorly thought through programs to
create events that others use with different formats. Another scenario
is user programs wishing to use the same event name but add more fields
later when the software updates. Various versions of a program may be
running side-by-side, which is prevented by the current single format
requirement.
Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
the user program wishes to use the same user_event name, but may have
several different formats of the event. When this flag is used, create
the underlying tracepoint backing the user_event with a unique name
per-version of the format. It's important that existing ABI users do
not get this logic automatically, even if one of the multi format
events matches the format. This ensures existing programs that create
events and assume the tracepoint name will match exactly continue to
work as expected. Add logic to only check multi-format events with
other multi-format events and single-format events to only check
single-format events during find.
Change system name of the multi-format event tracepoint to ensure that
multi-format events are isolated completely from single-format events.
This prevents single-format names from conflicting with multi-format
events if they end with the same suffix as the multi-format events.
Add a register_name (reg_name) to the user_event struct which allows for
split naming of events. We now have the name that was used to register
within user_events as well as the unique name for the tracepoint. Upon
registering events ensure matches based on first the reg_name, followed
by the fields and format of the event. This allows for multiple events
with the same registered name to have different formats. The underlying
tracepoint will have a unique name in the format of {reg_name}.{unique_id}.
For example, if both "test u32 value" and "test u64 value" are used with
the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
tracepoints. The dynamic_events file would then show the following:
u:test u64 count
u:test u32 count
The actual tracepoint names look like this:
test.0
test.1
Both would be under the new user_events_multi system name to prevent the
older ABI from being used to squat on multi-formatted events and block
their use.
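The {reg_name}.{unique_id} naming can be sketched in userspace as below. The function multi_format_name() is hypothetical, and rendering the id in decimal is an assumption; the text above only shows the ids 0 and 1:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<reg_name>.<unique_id>" into buf.
 * Returns 0 on success, -1 if the name does not fit.
 */
static int multi_format_name(char *buf, size_t size,
			     const char *reg_name, unsigned int id)
{
	int n;

	assert(buf != NULL && reg_name != NULL);
	/* Assumption: the id is printed in decimal. */
	n = snprintf(buf, size, "%s.%u", reg_name, id);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Registering "test" twice with different formats would then map to two distinct names, while both registrations keep "test" as their register name.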
Deleting events via "!u:test u64 count" would only delete the first
tracepoint that matched that format. When the delete ABI is used all
events with the same name will be attempted to be deleted. If
per-version deletion is required, user programs should either not use
persistent events or delete them via dynamic_events.
Beau Belgrave [Thu, 22 Feb 2024 00:18:04 +0000 (00:18 +0000)]
tracing/user_events: Prepare find/delete for same name events
The current code for finding and deleting events assumes that there will
never be cases when user_events are registered with the same name, but
different formats. Scenarios exist where programs want to use the same
name but have different formats. An example is multiple versions of a
program running side-by-side using the same event name, but with updated
formats in each version.
This change does not yet allow for multi-format events. If user_events
are registered with the same name but different arguments the programs
see the same return values as before. This change simply makes it
possible to accommodate this easily.
Update find_user_event() to take in argument parameters and register
flags to accommodate future multi-format event scenarios. Have find
validate argument matching and return error pointers to cover when
an existing event has the same name but different format. Update
callers to handle error pointer logic.
Move delete_user_event() to use hash walking directly now that
find_user_event() has changed. Delete all events found that match the
register name, stop if an error occurs and report back to the user.
Update user_fields_match() to cover list_empty() scenarios now that
find_user_event() uses it directly. This makes the logic consistent
across several callsites.
When a ring-buffer is memory mapped by user-space, no trace or
ring-buffer swap is possible. This means the snapshot feature is
mutually exclusive with the memory mapping. Having a refcount on
snapshot users will help to know if a mapping is possible or not.
Instead of relying on the global trace_types_lock, a new spinlock is
introduced to serialize accesses to trace_array->snapshot. This intends
to allow access to that variable in a context where the mmap lock is
already held.
ring-buffer: Make wake once of ring_buffer_wait() more robust
The default behavior of ring_buffer_wait() when passed a NULL "cond"
parameter is to exit the function the first time it is woken up. The
current implementation uses a counter that starts at zero and when it is
greater than one it exits the wait_event_interruptible().
But this relies on the internal working of wait_event_interruptible() as
that code basically has:
if (cond)
return;
prepare_to_wait();
if (!cond)
schedule();
finish_wait();
That is, cond is called twice before it sleeps. The default cond of
ring_buffer_wait() needs to account for that and wait for its counter to
increment twice before exiting.
Instead, use the seq/atomic_inc logic that is used by the tracing code
that calls this function. Add an atomic_t seq to rb_irq_work and when cond
is NULL, have the default callback take a descriptor as its data that
holds the rbwork and the value of the seq when it started.
The wakeups will now increment the rbwork->seq and the cond callback will
simply check if that number is different, and no longer have to rely on
the implementation of wait_event_interruptible().
Link: https://lore.kernel.org/linux-trace-kernel/20240315063115.6cb5d205@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 7af9ded0c2ca ("ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Now that we open block devices as files we need to deal with the
reality that closing is a deferred operation. An operation on the
block device such as e.g., freeze, thaw, or removal that runs
concurrently with umount, tries to acquire a stable reference on the
holder. The holder might already be gone though. Make that reliable by
grabbing a passive reference to the holder during bdev_open() and
releasing it during bdev_release().
Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only Reported-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/r/ZfEQQ9jZZVes0WCZ@infradead.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@infradead.org> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: https://lore.kernel.org/r/CAHj4cs8tbDwKRwfS1=DmooP73ysM__xAb2PQc6XsAmWR+VuYmg@mail.gmail.com Link: https://lore.kernel.org/r/20240315-freibad-annehmbar-ca68c375af91@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
Linus Torvalds [Sun, 17 Mar 2024 23:59:33 +0000 (16:59 -0700)]
Merge tag 'i3c/for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Not much this cycle with only three patches.
Core:
- i3c_bus_type is now const
Drivers:
- dw: disabling IBI is only allowed when hot join and SIR are disabled"
* tag 'i3c/for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: Make i3c_bus_type const
i3c: dw: Disable IBI IRQ depends on hot-join and SIR enabling
dt-bindings: i3c: drop "master" node name suffix
Linus Torvalds [Sun, 17 Mar 2024 19:26:04 +0000 (12:26 -0700)]
Merge tag 'efi-fixes-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
"This fixes an oversight on my part in the recent EFI stub rework for
x86, which is needed to get Linux/x86 distro builds signed again for
secure boot by Microsoft. For this reason, most of this work is being
backported to v6.1, which is therefore also affected by this
regression.
- Explicitly wipe BSS in the native EFI entrypoint, so that globals
shared with the legacy decompressor are zero-initialized correctly"
* tag 'efi-fixes-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Clear decompressor BSS in native EFI entrypoint
Linus Torvalds [Sun, 17 Mar 2024 19:12:55 +0000 (12:12 -0700)]
Merge tag 'perf-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fixes from Ingo Molnar:
- Work around AMD erratum to filter out bogus LBR stack entries
- Fix incorrect PMU reset that can result in warnings (or worse)
during suspend/hibernation
* tag 'perf-urgent-2024-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/core: Avoid register reset when CPU is dead
perf/x86/amd/lbr: Discard erroneous branch entries
Linus Torvalds [Sun, 17 Mar 2024 19:06:10 +0000 (12:06 -0700)]
Merge tag 'linux-watchdog-6.9-rc1' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- Remove usage of the deprecated ida_simple_xx() API
- Add kernel-doc for wdt_set_timeout()
- Add support for R-Car V4M, StarFive's JH8100 and sam9x7-wdt
- Fixes and small improvements
* tag 'linux-watchdog-6.9-rc1' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: intel-mid_wdt: Get platform data via dev_get_platdata()
watchdog: intel-mid_wdt: Don't use "proxy" headers
watchdog: intel-mid_wdt: Remove unused intel-mid.h
dt-bindings: watchdog: sama5d4-wdt: add compatible for sam9x7-wdt
dt-bindings: watchdog: sprd,sp9860-wdt: convert to YAML
dt-bindings: watchdog: starfive,jh7100-wdt: Add compatible for JH8100
watchdog: stm32_iwdg: initialize default timeout
dt-bindings: watchdog: arm,sp805: document the reset signal
watchdog: sp805_wdt: deassert the reset if available
watchdog/hpwdt: Support Suspend and Resume
dt-bindings: watchdog: renesas-wdt: Add support for R-Car V4M
watchdog: starfive: check watchdog status before enabling in system resume
watchdog: starfive: Check pm_runtime_enabled() before decrementing usage counter
watchdog: qcom: fine tune the max timeout value calculation
watchdog: Add kernel-doc for wdt_set_timeout()
watchdog: core: Remove usage of the deprecated ida_simple_xx() API
Linus Torvalds [Sun, 17 Mar 2024 19:02:21 +0000 (12:02 -0700)]
Merge tag 'pcmcia-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull PCMCIA updates from Dominik Brodowski:
"Mark some structs 'const' now that the driver core supports it
(Ricardo B Marliere)"
* tag 'pcmcia-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
pcmcia: cs: make pcmcia_socket_class constant
pcmcia: ds: make pcmcia_bus_type const
Linus Torvalds [Sun, 17 Mar 2024 18:50:54 +0000 (11:50 -0700)]
Merge tag 'input-for-v6.9-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new driver for Goodix Berlin I2C and SPI touch controllers
- support for IQS7222D v1.1 and v1.2 in iqs7222 driver
- support for IST3032C and IST3038B parts in Imagis touchscreen driver
- support for touch keys for Imagis touchscreen controllers
- support for Snakebyte GAMEPADs in xpad driver
- various cleanups and conversions to yaml for device tree bindings
- assorted fixes and cleanups
- old Synaptics navpoint driver has been removed since the only board
that used it (HP iPAQ hx4700) was removed a while ago.
* tag 'input-for-v6.9-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (37 commits)
Input: xpad - add support for Snakebyte GAMEPADs
dt-bindings: input: samsung,s3c6410-keypad: convert to DT Schema
Input: imagis - add touch key support
dt-bindings: input: imagis: Document touch keys
Input: imagis - use FIELD_GET where applicable
Input: make input_class constant
dt-bindings: input: atmel,captouch: convert bindings to YAML
Input: iqs7222 - add support for IQS7222D v1.1 and v1.2
dt-bindings: input: allwinner,sun4i-a10-lrad: drop redundant type from label
Input: serio - make serio_bus const
Input: synaptics-rmi4 - make rmi_bus_type const
Input: xilinx_ps2 - fix kernel-doc for xps2_of_probe function
input/touchscreen: imagis: add support for IST3032C
dt-bindings: input/touchscreen: imagis: add compatible for IST3032C
input/touchscreen: imagis: Add support for Imagis IST3038B
dt-bindings: input/touchscreen: Add compatible for IST3038B
input/touchscreen: imagis: Correct the maximum touch area value
Input: leds - change config symbol dependency for audio mute trigger
Input: ti_am335x_tsc - remove redundant assignment to variable config
Input: xpad - sort xpad_device by vendor and product ID
...
Sven Schnelle [Wed, 13 Mar 2024 08:51:22 +0000 (09:51 +0100)]
s390/entry: compare gmap asce to determine guest/host fault
With the current implementation, there are some corner cases where
a host fault would be treated as a guest fault, for example
when the SIE instruction causes a program check. Therefore store
the gmap asce in ptregs, and use that to compare the primary asce
from the fault instead of matching instruction addresses.
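A rough userspace sketch of the comparison described above follows. The type and function names are hypothetical, and treating a stored value of zero as "no guest was running" is an assumption made for this illustration, not something the commit states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved register state. */
struct pt_regs_sketch {
    uint64_t gmap_asce; /* assumption: 0 means no guest was running */
};

/* Entry path: remember which guest address space was active, if any. */
static void record_gmap_asce(struct pt_regs_sketch *regs, uint64_t asce)
{
    regs->gmap_asce = asce;
}

/* Fault path: a guest fault is one whose primary ASCE matches the stored one. */
static bool fault_is_guest(const struct pt_regs_sketch *regs,
                           uint64_t fault_asce)
{
    return regs->gmap_asce != 0 && regs->gmap_asce == fault_asce;
}
```

The decision depends only on which address space the fault refers to, so the address of the faulting instruction no longer matters.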
Sven Schnelle [Tue, 20 Feb 2024 14:18:56 +0000 (15:18 +0100)]
s390/entry: remove OUTSIDE macro
With only one OUTSIDE user left, remove the macro and move the code
directly to the machine check handler. This has the advantage that
it is much easier to determine which registers are used.
Sven Schnelle [Tue, 20 Feb 2024 13:21:14 +0000 (14:21 +0100)]
s390/entry: add CIF_SIE flag and remove sie64a() address check
When a program check, interrupt or machine check is triggered, the
PSW address is compared to a certain range of the sie64a() function
to figure out whether SIE was interrupted and a cleanup of SIE is
needed.
This doesn't work with kprobes: If kprobes probes an instruction, it
copies the instruction to the kprobes instruction page and overwrites the
original instruction with an undefined instruction (Opcode 00). When this
instruction is hit later, kprobes single-steps the instruction on the
kprobes instruction page.
However, if this instruction is a relative branch instruction it will now
point to a different location in memory due to being moved to the kprobes
instruction page. If the new branch target points into sie64a() the kernel
assumes it interrupted SIE when processing the breakpoint and will crash
trying to access the SIE control block.
Instead of comparing the address, introduce a new CIF_SIE flag which
indicates whether SIE was interrupted.
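The difference between the two detection strategies can be illustrated with a small userspace sketch. All names, the bit position of the flag, and the address values used with it are hypothetical; the real entry code is assembly and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CIF_SIE (1UL << 0) /* hypothetical bit position */

/* Hypothetical per-CPU state; only the flag word is modeled. */
struct cpu_sketch {
    unsigned long flags;
};

/* Old approach: infer "SIE was interrupted" from the PSW address alone. */
static bool sie_interrupted_by_address(uint64_t psw_addr,
                                       uint64_t sie_start, uint64_t sie_end)
{
    return psw_addr >= sie_start && psw_addr < sie_end;
}

/* New approach: the SIE entry and exit paths maintain an explicit flag. */
static void sie_enter(struct cpu_sketch *cpu)
{
    cpu->flags |= SKETCH_CIF_SIE;
}

static void sie_exit(struct cpu_sketch *cpu)
{
    cpu->flags &= ~SKETCH_CIF_SIE;
}

static bool sie_interrupted_by_flag(const struct cpu_sketch *cpu)
{
    return (cpu->flags & SKETCH_CIF_SIE) != 0;
}
```

An address that merely happens to fall inside the checked range, such as the target of a relocated relative branch, satisfies the address test even though SIE was never entered. The flag test is unaffected by that.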
Juergen Gross [Wed, 13 Mar 2024 07:14:09 +0000 (08:14 +0100)]
xen/events: increment refcnt only if event channel is refcounted
In bind_evtchn_to_irq_chip() don't increment the refcnt of the event
channel blindly. In case the event channel is NOT refcounted, issue a
warning instead.
Add an additional safety net by doing the refcnt increment only if the
caller has specified IRQF_SHARED in the irqflags parameter.
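As a minimal sketch of the guarded increment described above, assuming that a negative refcnt marks a channel that is not refcounted (an assumption made for this illustration): the struct, the flag constant, and the function name are hypothetical, and a counter stands in for the kernel's WARN machinery.

```c
#include <stdbool.h>

#define SKETCH_IRQF_SHARED 0x80UL /* stand-in for the kernel's IRQF_SHARED */

/* Hypothetical model of the per-channel bookkeeping. */
struct evtchn_info_sketch {
    int refcnt;   /* assumption: negative means "not refcounted" */
    int warnings; /* counts what would be a warning in the kernel */
};

/*
 * Rebinding an already bound channel: take a reference only when the
 * caller asked for a shared IRQ, and only when the channel is refcounted.
 * Returns true if a reference was taken.
 */
static bool evtchn_get_ref_sketch(struct evtchn_info_sketch *info,
                                  unsigned long irqflags)
{
    if (!(irqflags & SKETCH_IRQF_SHARED))
        return false;

    if (info->refcnt < 0) {
        info->warnings++; /* not refcounted: warn instead of incrementing */
        return false;
    }

    info->refcnt++;
    return true;
}
```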
Juergen Gross [Wed, 13 Mar 2024 07:14:08 +0000 (08:14 +0100)]
xen/evtchn: avoid WARN() when unbinding an event channel
When unbinding a user event channel, the related handler might be
called one last time if the kernel was built with
CONFIG_DEBUG_SHIRQ. This might cause a WARN() in the handler.
Avoid that by adding an "unbinding" flag to struct user_event which
will short circuit the handler.
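The short circuit can be sketched as follows. This is a userspace model with hypothetical names; the "enabled" field and the warning counter are assumptions standing in for whatever state the real handler checks before it warns.

```c
#include <stdbool.h>

/* Hypothetical model of a user event channel. */
struct user_evtchn_sketch {
    bool enabled;   /* assumption: handler warns if called while disabled */
    bool unbinding; /* set before the IRQ is torn down */
    int warnings;   /* counts what would be a WARN() in the kernel */
    int delivered;  /* events passed on to the user */
};

static void evtchn_handler_sketch(struct user_evtchn_sketch *evtchn)
{
    /* A spurious final call during unbind is ignored silently. */
    if (evtchn->unbinding)
        return;

    if (!evtchn->enabled) {
        evtchn->warnings++;
        return;
    }

    evtchn->enabled = false;
    evtchn->delivered++;
}
```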
Amir Goldstein [Sun, 17 Mar 2024 11:59:43 +0000 (13:59 +0200)]
ovl: relax WARN_ON in ovl_verify_area()
syzbot hit an assertion in copy up data loop which looks like it is
the result of a lower file whose size is being changed underneath
overlayfs.
This type of use case is documented to cause undefined behavior, so
returning EIO error for the copy up makes sense, but it should not be
causing a WARN_ON assertion.
Reported-and-tested-by: syzbot+3abd99031b42acf367ef@syzkaller.appspotmail.com
Fixes: ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
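The behavior described in the overlayfs change above, failing the copy up with EIO without raising an assertion, might look roughly like this in userspace C. The function name and the exact set of checks are illustrative assumptions, and __builtin_add_overflow is a GCC/Clang builtin used here in place of the kernel's overflow helpers.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate the range about to be copied. A mismatch between the two
 * positions can be caused by the lower file changing underneath, so it
 * is reported as a plain -EIO and not treated as an assertion failure.
 */
static int verify_area_sketch(int64_t pos, int64_t pos2,
                              int64_t len, int64_t totlen)
{
    int64_t end;

    if (pos != pos2)
        return -EIO;

    if (pos < 0 || len < 0 || totlen < 0)
        return -EIO;

    /* Reject ranges whose end would overflow a signed 64-bit offset. */
    if (__builtin_add_overflow(pos, len, &end))
        return -EIO;

    return 0;
}
```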