cem [Thu, 5 Sep 2019 21:43:33 +0000 (21:43 +0000)]
Remove long-dead BUF_ASSERT_{,UN}HELD assertions
These were fully neutered in r177676 (2008), but not removed at the time for
unclear reasons. They're totally dead code, so go ahead and yank them now.
cem [Thu, 5 Sep 2019 21:30:52 +0000 (21:30 +0000)]
msdosfs: Drop an unneeded brelse in bread error condition
After r294954, it is an invariant that bread returns non-NULL bp if and only
if the routine succeeded. On error, it handles any buffer cleanup
internally. So the brelse(NULL) here was just redundant.
No functional change.
Discussed with: kib (extracted from a larger differential)
Bounds check again after advancing cp, otherwise we have a possible
heap buffer overflow. This was discovered by a Google fuzzer test.
This can lead to remote denial of service. User interaction and
execution privileges are not a prerequisite for exploitation.
Reported by: enh at Google, to FreeBSD by maya@NetBSD.org
Obtained from: enh at Google
See also: NetBSD ns_name.c r1.12
Reviewed by: delphij, ume
MFC after: 3 days
https://android-review.googlesource.com/c/platform/bionic/+/1093130
Differential Revision: https://reviews.freebsd.org/D21523
ian [Thu, 5 Sep 2019 19:17:53 +0000 (19:17 +0000)]
Use a single write of 3 bytes instead of iicdev_writeto() in ads111x.
The iicdev_writeto() function basically does scatter-gather IO by filling
in a pair of iic_msg structs to write the register address then the data
from different locations but with a single bus START/xfer/STOP sequence.
It turns out several low-level i2c controller drivers do not honor the
IIC_NOSTART flag, so the second piece of the write gets a new START on
the bus, and that confuses the ads111x chips which expect a continuous
write of 3 bytes to set a register.
A proper fix for this is to track down all the misbehaving controllers
drivers and fix them. For now this change makes this driver work again.
ian [Thu, 5 Sep 2019 19:07:48 +0000 (19:07 +0000)]
Ensure a measurement is complete before reading the result in ads111x.
Also, disable the comparator by default; it's not used for anything.
The previous logic would start a measurement, and then pause_sbt() for the
averaging time currently configured in the chip. After waiting that long,
the code would blindly read the measurement register and return its value.
The problem is that the chip's idea of averaging time is based on its
internal free-running 1MHz oscillator, which may be running at a wildly
different rate than the kernel clock. If the chip's internal timer was
running slower than the kernel clock, we'd end up grabbing a stale result
from an old measurement.
The driver now still uses pause_sbt() to yield the cpu while waiting for
the measurement to complete, but after sleeping it checks the chip's status
register to ensure the measurement engine is idle. If it's not, the driver
uses a retry loop to wait a bit (5% of the original wait time) then check
again for completion.
r339782 re-enabled acl test 00 and 02, which were disabled in r336617
due to PR 229930.
When the tests were disabled the code to set their required programs was
disabled as well, but this was not reinstated when r339782 re-enabled
them.
Do so now.
adrian [Thu, 5 Sep 2019 15:55:24 +0000 (15:55 +0000)]
[lib80211] add initial VHT (11ac) channel ranges for FCC.
This is a simple set of VHT channels and flags for the FCC (US) regulatory
domain. This needs to be researched and done for the rest of the
regulatory domains, but this should at least unblock some more ath10k
testing.
patch(1): fix the file removal test, strengthen it a bit
To remain compatible with GNU patch, we should ensure that once we're
removing empty files after a reversed /dev/null patch we don't remove files
that have been modified. GNU patch leaves these intact and just reverses the
hunk that created the file, effectively implying --remove-empty-files for
reversed /dev/null patches.
rc: Honor ${name}_env when a custom *_cmd is defined (e.g., start_cmd)
A user may set ${name}_env variable in rc.conf(5) in order to set additional
environment variables for a service command. Unfortunately, at the moment
this variable is only honored when the command is specified via the command
variable. Those additional environment variables coming from ${name}_env
are never set if the service is started via the ${rc_arg}_cmd variable (for
example start_cmd).
manu [Thu, 5 Sep 2019 14:15:47 +0000 (14:15 +0000)]
pkgbase: Create a FreeBSD-utilities package and make it the default one
The default package use to be FreeBSD-runtime but it should only contain
binaries and libs enough to boot to single user and repair the system, it
is also very handy to have a package that can be tranform to a small mfsroot.
So create a new package named FreeBSD-utilities and make it the default one.
Also move a few binaries and lib into this package when it make sense.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21506
manu [Thu, 5 Sep 2019 14:13:08 +0000 (14:13 +0000)]
pkgbase: Put a lot of binaries and lib in FreeBSD-runtime
All of them are needed to be able to boot to single user and be able
to repair a existing FreeBSD installation so put them directly into
FreeBSD-runtime.
manu [Thu, 5 Sep 2019 14:11:16 +0000 (14:11 +0000)]
pkgbase: Put libbluetooth in the bluetooth package
It make sense to have everything bluetooth related in the same package.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21502
Summary:
- basic: test application of patches created by diff -u at the
beginning/middle/end of file, which have differing amounts of context
before and after chunks being added
- limited_ctx: stems from PR 74127 in which a rogue line was getting added
when the patch should have been rejected. Similar behavior was
reproducible with larger contexts near the beginning/end of a file. See
r326084 for details
- file_creation: patch sourced from /dev/null should create the file
- file_nodupe: said patch sourced from /dev/null shouldn't dupe the contents
when re-applied (personal vendetta, WIP, see comment)
- file_removal: this follows from nodupe; the reverse of a patch sourced
from /dev/null is most naturally deleting the file, as is expected based
on GNU patch behavior (WIP)
Delete the unused "nd" argument for nfsrv_checkdsattr().
The "nd" argument for nfsrv_checkdsattr() is no longer used by the function.
This patch deletes it. This allows subsequent patches to delete the "nd"
argument from nfsrv_proxyds(), since it's only use of "nd" was to pass it
to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(),
which passes "nd" to nfsrv_proxyds().
Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion
over why it might need "nd".
This patch is trivial and does not have any semantic effect.
Found by inspection while working on the NFSv4.2 server.
Don't free pages in a shadowing object. While this degrades MADV_FREE
to a no-op (and we could, instead, choose to fall back to
MADV_DONTNEED, at the cost of changing pmap_madvise), this is
presently considered a temporary fix. We may prefer to risk a little
fragmentation of the map by creating a zero/OBJT_DEFAULT entry over
top of the existing object and, simultaneously, revert to the existing
marking any pages in the former shadowing object in the advised region
as reclaimable. At least one consumer of MADV_FREE (snmalloc) may use
mmap() to construct zeroed pages "eventually" here anyway, so the
fragmentation may be coming anyway.
>>> 8.6 Doorbell Stride for Software Emulation
>>> The doorbell stride,...is useful in software emulation of an NVM
>>> Express controller. ... For hardware implementations of the NVM
>>> Express interface, the expected doorbell stride value is 0h.
However, hardware in the wild exists with a doorbell stride of 1
(meaning 8 byte separation). This change supports that hardware, as
well as software emulators as envisioned in Section 8.6. Since this is
the fast path, care has been taken to make this computation
efficient. The bit of math to compute an offset for each is replaced
by a memory load from cache of a pre-computed value.
MFC After: 3 days
Reviewed by: scottl@
Differential Revision: https://reviews.freebsd.org/D21514
Currently the code only bumps holdcnt and clears the VI_FREE flag, not
performing actual vhold. Since the vnode is still visible elsewhere, a
potential new user can find it and incorrectly assume it is properly held.
Use vholdl instead to correctly hold the vnode. Another place recycling
(vlrureclaim) does this already.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21522
Report the Host Buffer Memory minimum and preferred sizes.
The Host Buffer feature (NVMe 1.4 section 89) allows for the NVMe card
request the host provide it buffer for lookaside tables and maybe
other things. Report the card's minimum and preferred sizes with
nvmecontrol/camcontrol identify.
procstat/tests: Fix flakiness by waiting for program to start
Some of the procstat tests start a program "while1" and examine the process
using procstat, but did not wait properly for it to start (kill -0 will
succeed immediately after the child process has been created).
Instead, have "while1" write something when it starts, and use a fifo to
wait for that.
r351741 reworked readdir slightly to avoid pfs_node/pidhash LOR, but
inadvertently regressed pid == NO_PID; new pfs_lookup_proc() fails for the
obvious reasons, and later pfs_visible_proc doesn't capture the
pid == NO_PID -> return 1 aspect of pfs_visible. We can infact skip this
whole block if we're operating on a directory w/ NO_PID, as it's always
visible.
bectl(8): implement sorting for 'bectl list' output
Allow 'bectl list' to sort output by a given property name. The property
name is passed in using a command-line flag, '-c' for ascending order and
'-C' for descending order. The properties allowed to sort by are:
- name (the default output, even if '-c' or '-C' are not used)
- creation
- origin
- used
- usedds
- usedsnap
- usedrefreserv
The default output for 'bectl list' is now ascending alphabetical order of
BE name.
To sort by creation time from earliest to latest, the command would be
'bectl list -c creation'
This code has been written as a proof of concept, but I think that it
can be useful in general. It allows to set the status of an enclosure
slot. Practically, this means controlling whatever slot status LEDs the
enclosure provides. At present, the new command does not have sanity
checks or any conveniences. That means that it is possible to issue the
command for an invalid slot and an enclosure. But the worst I have seen
happening is either the command failing or simply being ignored. Also,
at the moment, the status has to be specified as a numeric bit mask.
The bit definitions can be found in sys/dev/mps/mpi/mpi2_init.h, they
are prefixed with MPI2_SEP_REQ_SLOTSTATUS_. The only way to address a
slot is by the enclosure handle and the slot number. Both are readily
available from mpsutil show commands.
So, future enhancements could include alternative ways to address a slot
(e.g., by a disk handle or a disk device name) and human friendly names
for slot statuses.
The new command is useful alternative to 'sas2ircu locate' command.
First, sas2ircu is a proprietary blob. Second, it supports setting only
locate / identify status bit.
ZFS: Always refuse receving non-resume stream when resume state exists
This fixes a hole in the situation where the resume state is left from
receiving a new dataset and, so, the state is set on the dataset itself
(as opposed to %recv child).
Additionally, distinguish incremental and resume streams in error
messages.
- Use ptoa() instead of the archaic ctob().
- Use pagezero() to zero a PDP page.
- Remove PA_MIN_ADDRESS, orphaned by r351742.
- Remove unneeded parens and an unnecessary control flow statement.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21495
LOCAL_MODULES: Allow LOCAL_MODULES="" in src.conf to work
Currently LOCAL_MODULES= works, but LOCAL_MODULES="" causes build errors as
.for still has the empty string to loop over. An .if empty prior to the loop
was considered, but LOCAL_MODULES has empty quotes at that point and thus,
isn't empty. A better solution likely exists, but this floats us by for
now...
GEOM is supposed to be topology-agnostic, but the GPT and BSD partition code
has arbitrary restrictions on nesting that are annoying in cases such as
running VMs on raw partitions (since the VM's partitioning scheme is not
visible to the host).
This patch adds sysctls to disable the restrictions except in the case of
BSD label (and similar) partitions with offset 0 (where we need to avoid
recursively recognizing the label).
r351650 switched posixshm to using OBJT_SWAP for shm_object
r351795 added support to the swap_pager for tracking writeable mappings
Take advantage of this and start tracking writeable mappings; fd sealing
will use this to reject a seal on writing with EBUSY if any such mapping
exist.
Currently writemapping accounting is only done for vnode_pager which does
some accounting on the underlying vnode.
Extend this to allow accounting to be possible for any of the pager types.
New pageops are added to update/release writecount that need to be
implemented for any pager wishing to do said accounting, and we implement
these methods now for both vnode_pager (unchanged) and swap_pager.
The primary motivation for this is to allow other systems with OBJT_SWAP
objects to check if their objects have any write mappings and reject
operations with EBUSY if so. posixshm will be the first to do so in order to
reject adding write seals to the shmfd if any writable mappings exist.
Fix two TCP RACK issues:
* Convert the TCP delayed ACK timer from ms to ticks as required.
This fixes the timer on platforms with hz != 1000.
* Don't delay acknowledgements which report duplicate data using
DSACKs.
It allows a process to request that stack gap was not applied to its
stacks, retroactively. Also it is possible to control the gaps in the
process after exec.
PR: 239894
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21352
With this last piece in place, make -C /usr/src/release release.iso is
finally able to run in a jail. This was not possible before because
msdosfs cannot be mounted inside a jail.
bsdgrep(1): add some basic tests for some GNU Extension support
These will be expanded later as I come up with good test cases; for now,
these seem to be enough to trigger bugs in base gnugrep and expose missing
features in bsdgrep.
vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting
freed and usecount to denote it is actively used.
Previously all operations bumping usecount would also bump holdcnt, which is
not necessary. We can detect if usecount is already > 1 (in which case holdcnt
is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves
on atomic ops.
Reviewed by: kib
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21471
Implement nvme suspend / resume for pci attachment
When we suspend, we need to properly shutdown the NVME controller. The
controller may go into D3 state (or may have the power removed), and
to properly flush the metadata to non-volatile RAM, we must complete a
normal shutdown. This consists of deleting the I/O queues and setting
the shutodown bit. We have to do some extra stuff to make sure we
reset the software state of the queues as well.
On resume, we have to reset the card twice, for reasons described in
the attach funcion. Once we've done that, we can restart the card. If
any of this fails, we'll fail the NVMe card, just like we do when a
reset fails.
Set is_resetting for the duration of the suspend / resume. This keeps
the reset taskqueue from running a concurrent reset, and also is
needed to prevent any hw completions from queueing more I/O to the
card. Pass resetting flag to nvme_ctrlr_start. It doesn't need to get
that from the global state of the ctrlr. Wait for any pending reset to
finish. All queued I/O will get sent to the hardware as part of
nvme_ctrlr_start(), though the upper layers shouldn't send any
down. Disabling the qpairs is the other failsafe to ensure all I/O is
queued.
Rename nvme_ctrlr_destory_qpairs to nvme_ctrlr_delete_qpairs to avoid
confusion with all the other destroy functions. It just removes the
queues in hardware, while the other _destroy_ functions tear down
driver data structures.
Split parts of the hardware reset function up so that I can
do part of the reset in suspsend. Split out the software disabling
of the qpairs into nvme_ctrlr_disable_qpairs.
Finally, fix a couple of spelling errors in comments related to
this.
Add preliminary support for atomic updates of per-page queue state.
Queue operations on a page use the page lock when updating the page to
reflect the desired queue state, and the page queue lock when physically
enqueuing or dequeuing a page. Multiple pages share a given page lock,
but queue state is per-page; this false sharing results in heavy lock
contention.
Take a small step towards the use of atomic_cmpset to synchronize
updates to per-page queue state by introducing vm_page_pqstate_cmpset()
and using it in the page daemon. In the longer term the plan is to stop
using the page lock to protect page identity and rely only on the object
and page busy locks. However, since the page daemon avoids acquiring
the object lock except when necessary, some synchronization with a
concurrent free of the page is required. vm_page_pqstate_cmpset() can
be used to ensure that queue state updates are successful only if the
page is not scheduled for a dequeue, which is sufficient for the page
daemon.
Add vm_page_swapqueue(), which moves a page from one queue to another
using vm_page_pqstate_cmpset(). Use it in the active queue scan, which
does not use the object lock. Modify vm_page_dequeue_deferred() to
use vm_page_pqstate_cmpset() as well.
r351198 allows the kernel to use domain-local memory to back the vm_page
array (up to 2MB boundaries) and reserves a separate PML4 entry for that
purpose. One consequence of that change is that the vm_page array is no
longer present in minidumps, which only adds pages mapped above
VM_MIN_KERNEL_ADDRESS.
To avoid the friction caused by having kernel data structures mapped
below VM_MIN_KERNEL_ADDRESS, map the vm_page array starting at
VM_MIN_KERNEL_ADDRESS instead of using a dedicated PML4 entry.
Reviewed by: kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21491
Add a sysctl to dump kernel mappings and their properties on amd64.
The sysctl is called vm.pmap.kernel_maps. It dumps address ranges
and their corresponding protection and mapping mode, as well as
counts of 2MB and 1GB pages in the range.
This patch improves the DSACK handling to conform with RFC 2883.
The lowest SACK block is used when multiple Blocks would be elegible as
DSACK blocks ACK blocks get reordered - while maintaining the ordering of
SACK blocks not relevant in the DSACK context is maintained.
In nvme_completion_poll, add a sanity check to make sure that we complete the
polling within a second. Panic if we don't. All the commands that use this
interface should typically complete within a few tens to hundreds of
microseconds. Panic rather than return ETIMEDOUT because if the command somehow
does later complete, it will randomly corrupt memory. Also, it helps to get a
traceback from where the unexpected failure happens, rather than an infinite
loop.
In all the places that we use the polled for completion interface, except crash
dump support code, move the while loop into an inline function. These aren't
done in the fast path, so if the compiler choses to not inline, any performance
hit is tiny.
Add a brief comment explaining why we can return ETIMEDOUT from the call to the
polled interface. Normally this would have the potential to corrupt stack memory
because the completion routines would run after we return. In this case,
however, we're doing a dump so it's safe for reasons explained in the comment.
proc: clear pid bitmap entry after dropping proctree lock
There is no correctness change here, but the procid lock is contended in
the fork path and taking it while holding proctree avoidably extends its
hold time.
Note that there are other ids which can end up getting cleared with the
lock.