Mateusz Guzik [Tue, 22 Feb 2022 15:54:17 +0000 (16:54 +0100)]
fd: rename fget*_locked to fget*_noref
This gets rid of the error prone naming where fget_unlocked returns with
a ref held, while fget_locked requires a lock but provides nothing in
terms of making sure the file lives past unlock.
Warner Losh [Tue, 22 Feb 2022 17:34:36 +0000 (10:34 -0700)]
camcontrol: Force a rescan of the lun after firmware download.
After downloading the firmware to a device, it's inquiry data likely
will change. Force a rescan of the target with the CAM_EXPECT_INQ_CHANGE
flag to get it to record the new inqury data as being expected. This
avoids the need for a 'camcontrol rescan' on the device which detaches
and re-attaches the disk (da, ada) device. This brings fwdownload up to
nvmecontrol's ability to do the same thing w/o changing the exposed
nvme/nvd/nda device. We scan the target and not the LUN because dual
actuator drives have multiple LUNs, but the firmware is global across
many vendors' drives (and the so far theoretical ones that aren't won't
be harmed by the rescan).
Since the underlying struct disk is now preserved accross this
operation, it's now possible to upgrade firmware of a root device w/o
crashing the system. On systems that are quite busy, the worst that
happens is that certain operaions are reported cancelled when the new
firmware is activated. These operations are retried with the normal CAM
recovery mechanisms and will work on the retry. The only visible hiccup
is the time that new firmware is flashing / initializing. One should not
consider this operation completely risk free, however, since not all
drives are well behaved after a firmware download.
Andrew Turner [Mon, 30 Aug 2021 16:43:22 +0000 (17:43 +0100)]
Add NT_ARM_ADDR_MASK
This can be used by debuggers to find which bits in a virtual address
should be masked off to get a canonical address. This is currently used
by the Pointer Authentication Code support to get its mask. It could also
be used if we support Top Byte Ignore for the same purpose.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34302
Mark Johnston [Tue, 22 Feb 2022 14:26:33 +0000 (09:26 -0500)]
riscv: Fix another race in pmap_pinit()
Commit c862d5f2a789 ("riscv: Fix a race in pmap_pinit()") did not really
fix the race. Alan writes,
Suppose that N entries in the L1 tables are in use, and we are in the
middle of the memcpy(). Specifically, we have read the zero-filled
(N+1)st entry from the kernel L1 table. Then, we are preempted. Now,
another core/thread does pmap_growkernel(), which fills the (N+1)st
entry. Finally, we return to the original core/thread, and overwrite
the valid entry with the zero that we earlier read.
Try to fix the race properly, by copying kernel L1 entries while holding
the allpmaps lock. To avoid doing unnecessary work while holding this
global lock, copy only the entries that we expect to be valid.
Fixes: c862d5f2a789 ("riscv: Fix a race in pmap_pinit()")
Reported by: alc, jrtc27
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34267
Ed Maste [Mon, 21 Feb 2022 04:09:36 +0000 (23:09 -0500)]
vt: fix double-click word selection for first word on line
Previously when double-clicking on the first word on a line we would
select from the cursor position to the end of the word, not from the
beginning of the line. This is because we searched backward for a
space to mark the beginning of a word.
Now, use the beginning of the line if we do not find a space.
PR: 261553
Reported by: Stefan B.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Emmanuel Vadot [Tue, 22 Feb 2022 08:58:36 +0000 (09:58 +0100)]
dwc: Support phy mode MII
Some board use dwc phy in MII mode, so do not fail to attach if this is
the case.
Only rockchip code uses the phy mode to program some custom syscon register.
Hubert Mazur [Thu, 20 Jan 2022 09:56:10 +0000 (10:56 +0100)]
sdhci_fsl_fdt: Add voltage switching through syscon
Some SoCs does not have a fixed regulator to handle voltage
switching automatically. Add support for voltage switching
through syscon register when necessary. Add new errata flag
indicating missing regulator. Apply errata to SoCs, which are
known to be affected, i.e. LS1046 and LS1012.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D34029
Hubert Mazur [Tue, 21 Dec 2021 14:52:56 +0000 (15:52 +0100)]
sdhci_fsl_fdt: Apply errata for LX2160A
LX2160A is affected by two erratum regarding SDHCI. However this board
has generic compat string in DTS for SDHCI which means erratum cannot
be simply applied. Compare compat string from "/" path with LX2160A
compat string when attaching device and apply erratum.
Kornel Duleba [Wed, 22 Dec 2021 08:17:28 +0000 (09:17 +0100)]
sdhci_fsl_fdt: Fix tuning code
- Some of the register writes were already done in the generic tuning code.
Remove them.
- Increase the polling timeout. The previous value is probably fine, but since
timeouts are treated as fatal errors increasing it to 200ms won't hurt.
- Rework the HS400 switching code. Make sure that the switch happens at the
right time. Reset the DLL0 block. We need to do that if u-boot has previously
configured the controller in HS400 mode.
- Check current timing before tuning. The tuning devmethod is always called,
even for timings that don't require the tuning procedure.
- Rework software tuning routine code. Use inner formula for clock
divider calculation, as previous one was incorrect.
- Implement custom re-tune procedure.
Mitchell Horne [Fri, 5 Feb 2021 20:07:39 +0000 (16:07 -0400)]
boottrace(8): small wrapper utility
This is a small program that when invoked will create start and stop
boottrace entries via sysctl, and execute the desired command. Having
this as an executable -- as opposed to some shell script invoking
sysctl(8) -- allows the total resource usage recorded by the trace
entries to include the child process.
Reviewed by: 0mp, trasz (older version)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31929
Mitchell Horne [Tue, 22 Feb 2022 00:15:45 +0000 (20:15 -0400)]
boottrace: a simple boot and shutdown-time tracing facility
Boottrace is a facility for capturing trace events during boot and
shutdown. This includes kernel initialization, as well as rc. It has
been used by NetApp internally for several years, for catching and
diagnosing slow devices or subsystems. It is driven from userspace by
sysctl interface, and the output is a human-readable log of events
(kern.boottrace.log).
This commit adds the core boottrace functionality implementing these
interfaces. Adding the trace annotations themselves to kernel and
userland will happen in follow-up commits. A future commit will also add
a boottrace(4) man page.
For now, boottrace is unconditionally compiled into the kernel but
disabled by default. It can be enabled by setting the
kern.boottrace.enabled tunable to 1 in loader.conf(5).
There is an existing boot-time event tracing facility, which can be
compiled into the kernel with 'options TSLOG'. While there is some
functional overlap between this and boottrace, they are distinct. TSLOG
is suitable for generating detailed timing information and flamegraphs,
and has been used to great success by cperciva@ to diagnose and reduce
the overall system boot time. Boottrace aims to more quickly provide an
overview of timing and resource usage of the boot (and shutdown) process
to a sysadmin who requires this knowledge.
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
X-NetApp-PR: #23
Differential Revision: https://reviews.freebsd.org/D30184
Kristof Provost [Sat, 19 Feb 2022 15:34:31 +0000 (16:34 +0100)]
bridge: Don't share broadcast packets
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.
Chuck Tuffli [Mon, 21 Feb 2022 18:34:14 +0000 (10:34 -0800)]
nvme: Add OAES bit-field definitions
Create definitions for the Optional Asynchronous Events Supported (OAES)
values. Also adds a helper macro for the common use case of "mask and
shift". E.g.
value = NVME_CTRLR_DATA_OAES_NS_ATTR_MASK << NVME_CTRLR_DATA_OAES_NS_ATTR_SHIFT;
becomes
value = NVMEB(NVME_CTRLR_DATA_OAES_NS_ATTR);
Emmanuel Vadot [Mon, 21 Feb 2022 17:31:00 +0000 (18:31 +0100)]
files: Make mmc_helper depend on gpio
mmc_helper have an hard dependency on gpio_if.h
gpio(4) isn't in the default x86 kernel and none of the x86
sd/mmc drivers uses mmc_helper so just add a dependency on gpio.
Fixes: 85b3794ceef ("files: Make ext_resources non-optional")
Arka Sharma [Fri, 18 Feb 2022 15:34:15 +0000 (09:34 -0600)]
mmap map_at_zero test: handle W^X
Use kern.elfXX.allow_wx to decide whether to map W+X or W-only memory.
Future work could expand this test to add an "allow_wx" axis to the
test matrix, but I would argue that a separate test should be written,
since that's orthogonal to map_at_zero.
Randall Stewart [Mon, 21 Feb 2022 11:30:17 +0000 (06:30 -0500)]
tcp: Congestion control move to using reference counting.
In the transport call on 12/3 Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would do a user space utility like tcp_drop
that could be used to move all connections that are using a specific CC
to some other CC.
This is the half I committed to doing, making it so that we maintain a refcount
on a cc module every time a pcb refers to it and decrementing that every
time a pcb no longer uses a cc module. This also helps us simplify the
whole unloading process by getting rid of tcp_ccunload() which munged
through all the tcb's. Instead we mark a module as being removed and
prevent further references to it. We also make sure that if a module is
marked as being removed it cannot be made as the default and also
the opposite of that, if its a default it fails and does not mark it as being
removed.
Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249
The IBTA specification has new speed - NDR. That speed supports signaling
rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX
device to IB speed and width. Added translation of new 100Gb, 200Gb and
400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively.
Stefan Eßer [Sun, 20 Feb 2022 21:07:35 +0000 (22:07 +0100)]
dev/pci: fix potential panic due to bogus VPD data
A panic has been observed on a system with a Intel X520 dual LAN
device. The panic is caused by a KASSERT() noticing that the amount
of VPD data copied out to the pciconf command does not match the
amount of data read from the device.
The cause of the size mismatch was VPD data that started with 0x82,
the VPD tag that indicates that a VPD ident follows, but with a length
of more than 255 characters, which happens to be the maximum ident
size supported by the API between kernel and the pciconf program.
The data provided did not resemble an actual VPD identifier, and it
can be assumed that the initial tag value 0x82 happens to be there
by accident.
An ident size of 255 far exceeds the sensible length of that data
element, which is in the order of at most 30 to 40 bytes.
This patch adds several consitstency checks to the VPD parser, the
most critical being that ident lengths of more than 255 bytes are
rejected. Other checks reject VPD with more than one ident tag or
with an empty (zero length) ident string.
This patch prevents the panic that occured when "pciconf -lV" was
executed on the affected system.
During the anaylsis of the issue and the VPD code it has been
found that the VPD parser uses a state machine that accepts tags
in any order and combination. This is a bad match for the actual
VPD data, which has a very simple structure that can be parsed
with a non-recursive direct descent parser (which always knows
exactly which token to expect next).
A review fpr a much simpler VPD parser that performs many more
consistency checks and rejects invalid VPD has been proposed in
review https://reviews.freebsd.org/D34268.
Reported by: mikej at paymentallianceintl.com (Michael Jung)
Approved by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34255
Kirk McKusick [Sun, 20 Feb 2022 21:18:05 +0000 (13:18 -0800)]
Avoid unaligned writes by fsck_ffs(8).
Normally fsck_ffs never does reads or writes that are not aligned
to the size of one of the checked filesystems fragments. The one
exception is when it finds that it needs to write the superblock
recovery information. Here it will write with the alignment reported
by the underlying disk as its sector size as reported by an
ioctl(diskfd, DIOCGSECTORSIZE, &secsize).
Modern disks have a sector size of 4096, but for backward compatibility
with older disks will report that they have a sector size of 512.
When presented with a 512 byte write, they have to read the associated
4096 byte sector, replace the 512 bytes to be written, and write
the updated 4096 byte sector back to the disk. Unfortunately, some
disks report that they have 512 sectors, but fail writes that are not
aligned to 4096 boundaries and are a multiple of 4096 bytes in size.
This commit updates fsck_ffs(8) so that it uses the filesystem fragment
size as the smallest size and alignment for doing writes rather than
the disk's reported sector size.
Stefan Eßer [Sun, 20 Feb 2022 14:24:43 +0000 (15:24 +0100)]
fetch: make -S argument accept values > 2GB
Use strtoll() to parse the argument of the -S option.
FreeBSD has supported 64 bit file offsets for more than 25 years on
all architectures and off_t is a 64 bit integer type for that reason.
While strtol() returns a 64 bit value on 64 LP64 architectures, it
is limit to 32 bit on e.g. i386. The strtoll() function returns a 64
but result on all supported architectures and therefore supports the
possible file lengths and file offsets on 32 bit archtectures.
Michal Meloun [Sun, 20 Feb 2022 09:45:14 +0000 (10:45 +0100)]
ofw_iicbus: Add method for manual setting of basic OFW parameters.
Some IIC multifunction devices may have multiple I2C addresses per chip, but
only the primary address is listed in the DT (e.g. MAX776200). In this case,
the sub-devices for the secondary addresses must be created manually with
fixed OFW parameters (node, name, compatibility string, IIC address).
Add a bus method to the ofw_iicbus interface that does this.
Kirk McKusick [Sun, 20 Feb 2022 05:30:37 +0000 (21:30 -0800)]
Provide an interface that allows GEOM modules to return multiple messages.
The gctl_error() function provides GEOM modules with the ability
to report only a single message. When running with the verbose
flag, commands that handle multiple devices may want to report a
message for each of the devices on which it operates. This commit
adds the gctl_msg() function that can be called multiple times
to post messages. When finished issuing messages, the application
must either call gctl_post_messages() or call gctl_error() to cause
the messages to be reported to the calling process.
John Baldwin [Fri, 18 Feb 2022 23:20:14 +0000 (15:20 -0800)]
ctl ramdisk: Free compare buffer after a compare I/O request.
For a compare request, the ramdisk backend allocates a temporary
buffer to hold the I/O data and then compares it against the LUN's
pages in ctl_backend_ramdisk_cmp after the data has been filled.
However, the tempory buffer was leaked when after the comparison was
complete. Fix this by freeing the buffer after the comparison.
Navdeep Parhar [Fri, 4 Feb 2022 21:16:35 +0000 (13:16 -0800)]
cxgbe(4): Changes to the fatal error handler.
* New error_flags that can be used from the error ithread and elsewhere
without a synch_op.
* Stop the adapter immediately in t4_fatal_err but defer most of the
rest of the handling to a task. The task is allowed to sleep, unlike
the ithread. Remove async_event_task as it is no longer needed.
* Dump the devlog, CIMLA, and PCIE_FW exactly once on any fatal error
involving the firmware or the CIM block. While here, dump some
additional info (see dump_cim_regs) for these errors.
* If both reset_on_fatal_err and panic_on_fatal_err are set then attempt
a reset first and do not panic the system if it is successful.