adrian [Thu, 17 Sep 2015 04:45:29 +0000 (04:45 +0000)]
Bring over the QoS logic from the Linux r92su driver.
* the tx descriptor TID is priority, not TID.
* the tx descriptor queue id mapping is separate from the
TID/priority; rather than just "BE".
TODO:
* go and re-re-re-verify the queue mappings; the linux and openbsd
mappings aren't exactly the same. I need to verify all of this
before I try to flip on 11n RX.
META_MODE: Default OBJROOT to the traditional /usr/src/SRCTOP/.
This avoids easily colliding multiple src trees with the same objects. Having
multiple checkouts in dir/ dir2/ dir3/ would all use obj/ without any unique
identifier inside of obj/. This pattern is more likely to be used due
to the non-META_MODE behavior working with it fine.
In environments where ../obj/ is wanted as the obj directory the value of
OBJROOT can be set to ${SRCTOP:H}/obj/ instead via src-env.conf (set by
SRC_ENV_CONF) or environment. For environment it must be single quoted or
escaped. This will be more likely for vendors who are building images or using
NFS for builds. In those cases MAKEOBJDIRPREFIX may already be utilized and
is supported.
META_MODE: Allow MAKEOBJDIRPREFIX to work more closely to its traditional behavior.
The preferred way to modify the object directory root is to use OBJROOT.
However, setting OBJROOT to ${MAKEOBJDIRPREFIX}/${SRCTOP}/ effectively behaves
as expected.
The problem with this before was that setting OBJROOT to contain SRCTOP
resulted in a recursive replacement (/usr/obj/usr/obj/usr/src/). Anchoring to
the start of the path for replacing SRCTOP in CURDIR resolves this by
avoiding replacing SRCTOP when CURDIR is within the OBJDIR.
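
The anchoring described above can be illustrated with a minimal sketch. This
is not the actual make(1) logic from the commit; objdir_for() and its
parameters are hypothetical names used only for illustration, and trailing
slashes are omitted for simplicity.

```python
def objdir_for(curdir, srctop, objroot, anchored=True):
    """Map a directory to its object directory (illustrative model only)."""
    if anchored:
        # Replace SRCTOP only when it is a leading prefix of CURDIR.
        if curdir == srctop or curdir.startswith(srctop + "/"):
            return objroot + curdir[len(srctop):]
        # CURDIR is outside SRCTOP (e.g. already inside the OBJDIR).
        return curdir
    # Unanchored: replace the first occurrence anywhere in the path.
    return curdir.replace(srctop, objroot, 1)


SRCTOP = "/usr/src"
OBJROOT = "/usr/obj/usr/src"  # i.e. ${MAKEOBJDIRPREFIX}/${SRCTOP}

print(objdir_for("/usr/src/bin/ls", SRCTOP, OBJROOT))
print(objdir_for("/usr/obj/usr/src/bin/ls", SRCTOP, OBJROOT))
print(objdir_for("/usr/obj/usr/src/bin/ls", SRCTOP, OBJROOT, anchored=False))
```

With the unanchored variant, a CURDIR that is already inside the object
directory picks up the prefix a second time, which is the recursive
replacement mentioned in the commit message.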
adrian [Thu, 17 Sep 2015 03:42:18 +0000 (03:42 +0000)]
Program the firmware setup stuff with the current hardware setup:
* Do 1T1R for now, until we read the config out of ROM and use it.
* Disable turbo mode, I dunno what this is, but the linux drivers
have this disabled.
* Set the firmware endpoints to what we read from USB.
adrian [Thu, 17 Sep 2015 03:01:19 +0000 (03:01 +0000)]
Use DELAY() rather than usb_pause_mtx() - the latter releases the lock
before waiting, which prevents the lock from really acting like
a hardware serialiser. Sigh.
Block secondary ITS instances from attaching on ARM64
Currently FreeBSD supports only a single PIC. Some systems
that have more than one (like the dual-socket ThunderX) fail to boot.
Disable the other PICs until proper handling is implemented in the
generic interrupt code.
Reviewed by: imp
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3682
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
des [Wed, 16 Sep 2015 23:09:31 +0000 (23:09 +0000)]
If forwarders were specified on the command line, create an empty
resolvconf.conf so that resolvconf won't replace the manually configured
forwarders with dynamically configured ones the next time the lease is
renewed.
In tcp_ctlinput() separate the (ip == NULL) block from the rest of the
function to reduce the number of indentation levels. Restyle the lines
whose indentation was reduced. No functional change.
META_MODE: Fix OBJROOT ending in two // when it does not yet exist.
This would lead to the 2nd build (after the first with a missing OBJROOT) to
always rebuild everything as the 'command' would have changed due to the path
changing from having // to only /.
Always clear TDB_USERWR before fetching system call arguments. The
TDB_USERWR flag may still be set after a debugger detaches from a
process via PT_DETACH. Previously the flag would never be cleared
forcing a double fetch of the system call arguments for each system
call. Note that the flag cannot be cleared at PT_DETACH time in case
one of the threads in the process is currently stopped in
syscallenter() and the debugger has modified the arguments for that
pending system call before detaching.
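
A toy model of the flag handling described above, assuming a simplified
system call entry path. The Thread class, syscall_enter() and the bit value
chosen for TDB_USERWR are illustrative assumptions, not the kernel's
definitions.

```python
TDB_USERWR = 0x1  # illustrative bit value, not taken from the kernel headers


class Thread:
    def __init__(self):
        self.dbgflags = 0
        self.fetches = 0  # how many times the arguments were fetched


def syscall_enter(td, clear_first, debugger_writes=False):
    """Model one system call entry."""
    if clear_first:
        # New behavior: drop any stale flag before the first fetch.
        td.dbgflags &= ~TDB_USERWR
    td.fetches += 1  # initial fetch of the arguments
    if debugger_writes:
        # A debugger modified the arguments while the thread was stopped.
        td.dbgflags |= TDB_USERWR
    if td.dbgflags & TDB_USERWR:
        td.fetches += 1  # re-fetch the (possibly modified) arguments


# Flag left set after a detach, old behavior: every call fetches twice.
stale = Thread()
stale.dbgflags |= TDB_USERWR
for _ in range(3):
    syscall_enter(stale, clear_first=False)
print(stale.fetches)

# Same starting state, new behavior: one fetch per call.
fixed = Thread()
fixed.dbgflags |= TDB_USERWR
for _ in range(3):
    syscall_enter(fixed, clear_first=True)
print(fixed.fetches)
```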
When a process group leader exits, all of the processes in the group are
sent SIGHUP and SIGCONT if any of the processes are stopped. Currently this
behavior is triggered for any type of process stop including ptrace() stops
and transient stops for single threading during exit() and execve().
Thus, if a debugger is attached to a process in a group when the leader
exits, the entire group can be HUPed. Instead, only send the signals if a
process in the group is stopped due to SIGSTOP.
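
The change in the signalling condition can be sketched as follows. The Proc
fields and should_hup_group() are hypothetical names for illustration; the
kernel tracks stop reasons differently.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    stopped_by_sigstop: bool = False      # stopped due to SIGSTOP
    ptrace_stopped: bool = False          # stopped by a debugger
    single_thread_stopped: bool = False   # transient stop in exit()/execve()


def should_hup_group(procs, old_behavior=False):
    """Decide whether the group gets SIGHUP + SIGCONT when the leader exits."""
    if old_behavior:
        # Any kind of stop triggered the signals.
        return any(
            p.stopped_by_sigstop or p.ptrace_stopped or p.single_thread_stopped
            for p in procs
        )
    # Only a stop due to SIGSTOP counts.
    return any(p.stopped_by_sigstop for p in procs)


group = [Proc(100), Proc(101, ptrace_stopped=True)]
print(should_hup_group(group, old_behavior=True))
print(should_hup_group(group))
```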
Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:
For IPv4 we have the following chain:
ip_output() -> (ether|atm|whatever)_output() -> arpresolve()
Lookup is done in proper place (link-layer output routine) and it is possible
to provide cached lle data.
For IPv6 situation is more complex:
ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
nd6_storelladdr()
We have ip6_output() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
* checks if lle exists, creates it if needed (similar to arpresolve())
* performs lle state transitions (similar to arpresolve())
* calls nd6_output_ifp() which pushes packets to link output routine along
with running SeND/MAC hooks regardless of lle state
(e.g. works as run-hooks placeholder).
After that, iface output routine like ether_output() calls nd6_storelladdr()
which performs lle lookup once again.
As a result, we perform lookup twice for each outgoing packet for most types
of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
interfaces (see nd6_need_cache()).
Fix this behavior by eliminating first ND lookup. To be more specific:
* make all nd6_output() consumers use nd6_output_ifp() instead
* rename nd6_output[_slow]() to nd6_resolve[_slow]()
* convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
e.g. copy L2 address to buffer instead of pushing packet towards lower
layers
* Make all nd6_storelladdr() users use nd6_resolve()
* eliminate nd6_storelladdr()
The resulting callchain is the following:
ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()
Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
-> nd6_output() -> nd6_output_lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
nd6_resolve() which will return EWOULDBLOCK, and that result
will be converted to 0.
(And EWOULDBLOCK is actually used by IB/TOE code).
Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if the old ifp fib differs from the new one, then the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
Ensure that the MAD agent's delayed taskqueue is completely stopped
before proceeding. Otherwise, nothing prevents it from running after the
MAD agent struct has been freed, and this results in a use-after-free
when the task's ta_pending count is incremented in the callout handler.
Threads holding a read lock of a sleepable rm lock are not permitted
to sleep. The rmlock implementation enforces this by disabling
sleeping when a read lock is acquired. To simplify the implementation,
sleeping is disabled for most of the duration of rm_rlock. However,
it doesn't need to be disabled until the lock is acquired. If a
sleepable rm lock is contested, then rm_rlock may need to acquire the
backing sx lock. This tripped the overly-broad assertion. Fix by
relaxing the assertion around the call to sx_xlock().
cem [Tue, 15 Sep 2015 20:22:30 +0000 (20:22 +0000)]
kevent(2): Note DOOMED vnodes with NOTE_REVOKE
In poll mode, check for and wake VBAD vnodes. (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)
Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation. (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)
Simplify nd6_cache_lladdr:
* Move isRouter calculation code to separate nd6_is_router() function.
* Make nd6_cache_lladdr() return void: its return value hasn't been used
since r53541 KAME import in 1999.
Perform I2C transmission in a single burst when mode is "none" or not set
Some more automated I2C controllers cannot explicitly create
START/STOP/etc. conditions on the bus.
Instead, the correct condition is set automatically according
to the pending transfer status.
This particular behavior can cause trouble if some I2C slave
requires sending address offset within the chip followed by
the actual data or command. In that case we cannot assume that
the driver will not STOP immediately after sending
offset.
To avoid that, do not split the offset transfer from the data transfer
for default transmission modes; split them only if requested
on the command line (stop-start and repeated-start modes).
This more generic approach should cover special cases like
the one described.
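
A minimal sketch of the transfer splitting described above, for the write
case only. build_transfers() is a hypothetical helper, not the utility's
code; only the mode names come from the commit message.

```python
def build_transfers(offset, data, mode="none"):
    """Return the list of byte strings handed to the controller, in order."""
    if mode is None or mode == "none":
        # Default: offset and data go out in a single burst, so an
        # automated controller cannot insert a STOP between them.
        return [bytes(offset) + bytes(data)]
    if mode in ("stop-start", "repeated-start"):
        # Explicitly requested: offset and data are separate transfers.
        return [bytes(offset), bytes(data)]
    raise ValueError("unknown mode: %r" % (mode,))


print(build_transfers(b"\x10", b"\xde\xad"))
print(build_transfers(b"\x10", b"\xde\xad", mode="repeated-start"))
```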
* Require explicit lle unlink prior to calling llentry_delete().
This one slightly decreases time of holding afdata wlock.
* While here, make nd6_free() return void. No one has used its return value
since r186119.
Remove an unneeded typedef of ip6_t from the DTrace ip provider library.
It causes an error when ipfilter is enabled, since ipl.ko contains an
identical typedef.
Preserve the device queue status before retrying a sense request in
chdone(). Previously, the retry could clear the CAM_DEV_QFRZN bit in the
CCB status, leaving the queue frozen.
Submitted by: Jeff Miller <Jeff.Miller@isilon.com>
Reviewed by: ken
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
adrian [Tue, 15 Sep 2015 03:01:40 +0000 (03:01 +0000)]
Replace the scan event input path hack with the new rx-stats based method.
This allows for arbitrary channel info to be placed in the input call rather
than the totally gross hack of overriding ic_curchan.
Without this I'm sure ic_curchan setting was racing with the scan code
setting the channel itself.
On receipt of a redirect message, install an interface route for the
redirected destination. On removal of the corresponding Neighbor Cache
entry, remove the interface route.
This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.
This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.
Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.
When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi. (Thanks!)
These other test cases also passed in 2012:
* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
RTF_HOST|RTF_GATEWAY route for the destination)
* the redirected-to-self case, with IPv4 and IPv6
* a valid IPv4 redirect
All testing in 2012 was done with WITNESS and INVARIANTS.
Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
Mark Kelley <mark_kelley@dell.com> in 2012,
TC Telkamp <terence_telkamp@dell.com> in 2012
PR: 152791
Reviewed by: melifaro (current rev), bz (earlier rev)
Approved by: kib (mentor)
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
* Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures
Implement callout_drain_async(), inspired by the projects/hps_head
branch.
This function is used to drain a callout via a callback instead of
blocking the caller until the drain is complete. Refer to the
callout_drain_async() manual page for a detailed description.
Limitation: If a lock is used with the callout, the callout can only
be drained asynchronously one time unless the callout_init_mtx()
function is called again. This limitation is not present in
projects/hps_head and will require more invasive changes to the
timeout code, which was not in the scope of this patch.
To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits.
Implementation notes:
If a network adapter driver needs to fixup the first mbuf in order to
support VLAN tag insertion, the size of the VLAN tag should be
subtracted from the TSO limit. Else not.
Network adapters which typically inline the complete header mbuf could
technically transmit one more segment. This patch does not implement a
mechanism to recover the last segment for data transmission. It is
believed when sufficiently large mbuf clusters are used, the segment
limit will not be reached and recovering the last segment will not
have any effect.
The current TSO algorithm tries to send MTU-sized packets, where the
MTU typically is 1500 bytes, which gives 1448 bytes of TCP data
payload per packet for IPv4. That means if the TSO length limitation
is set to 65536 bytes, there will be a data payload remainder of
(65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to
recover total TSO length due to inlining mbuf header data will not
have any effect, because adding or removing the ETH/IP/TCP headers
to or from 324 bytes will not cause more or less TCP payload to be
TSO'ed.
Existing network adapter limits will be updated separately.
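
The remainder arithmetic above can be checked with a short computation. The
header breakdown used to arrive at 1448 bytes (20-byte IPv4 header, 20-byte
TCP header, 12 bytes of TCP timestamp option) is an assumption; the commit
message only states the resulting payload size.

```python
IP_HDR = 20       # IPv4 header without options (assumed)
TCP_HDR = 20      # TCP header without options (assumed)
TCP_TS_OPT = 12   # TCP timestamp option with padding (assumed)


def payload_per_packet(mtu=1500):
    """TCP data bytes carried by one MTU-sized IPv4 packet."""
    return mtu - IP_HDR - TCP_HDR - TCP_TS_OPT


def tso_remainder(tso_limit=65536, mtu=1500):
    """Leftover payload bytes, using the formula from the commit message."""
    return (tso_limit - mtu) % payload_per_packet(mtu)


print(payload_per_packet())
print(tso_remainder())
```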