melifaro [Tue, 11 Nov 2014 02:52:40 +0000 (02:52 +0000)]
Kill custom in_matroute() radix matching function removing one rte mutex lock.
Initially in_matroute() and in_clsroute() in their current state were introduced by
r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them
in route table, setting RTPRF_OURS flag and some expire time. After that, either
GC came or RTPRF_OURS got removed on first-packet. It was a good solution
in those days (and probably another decade after that) to keep TCP metrics.
However, after moving metrics to TCP hostcache in r122922, most of in_rmx
functionality became unused. It might have been used for flushing icmp-originated
routes before rte mutexes/refcounting, but I'm not sure about that.
So it looks like it is nearly impossible to make GC do its work nowadays:
in_rtkill() ignores non-RTPRF_OURS routes.
route can only become RTPRF_OURS after dropping last reference via rtfree()
which calls in_clsroute(), which, in turn, ignores UP and non-RTF_DYNAMIC routes.
Dynamic routes can still be installed via received redirect, but they
have default lifetime (no specific rt_expire) and no one has another trie walker
to call RTFREE() on them.
So, the changelist:
* remove custom rnh_match / rnh_close matching function.
* remove all GC functions
* partially revert r256695 (proto3 is no longer used inside kernel,
it is not possible to use rt_expire from user point of view, proto3 support
is not complete)
* Finish r241884 (similar to this commit) and remove remaining IPv6 parts
luigi [Tue, 11 Nov 2014 00:13:28 +0000 (00:13 +0000)]
in the Linux section, properly define the NMG_LOCK type.
Also import WITH_GENERIC in preparation to adding fine-grained
options to disable specific netmap components.
emaste [Mon, 10 Nov 2014 18:20:46 +0000 (18:20 +0000)]
Add /usr/lib/debug directory to hier(7)
The canonical standalone debug directory established by the GNU
toolchain is /usr/lib/debug, and we use it when WITH_DEBUG_FILES is set.
Mention it in the file system hierarchy page.
Reviewed by: bcr
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1134
ae [Mon, 10 Nov 2014 16:12:51 +0000 (16:12 +0000)]
Add sa6_checkzone_ifp() function. It checks correctness of struct
sockaddr_in6, usually obtained from the user level through ioctl.
It initializes sin6_scope_id using given interface.
des [Mon, 10 Nov 2014 09:55:35 +0000 (09:55 +0000)]
I just realized that the previous commit message makes no sense: the
first sentence should have read "Constify the AES and SHA-256 code and
wrappers". This allows us to feed zero_region (which is const) to the
hash function during reseeding and thereby implement the FS&K version of
SHAd-256 instead of the older F&S version.
delphij [Mon, 10 Nov 2014 08:20:21 +0000 (08:20 +0000)]
MFV r274273:
ZFS large block support.
Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it's Okay to enable the feature on
the pool). This *may* remain unchanged because of memory constraint.
Limited safety belt is provided for mounted root filesystem but
caution is advised.
mav [Sun, 9 Nov 2014 22:43:29 +0000 (22:43 +0000)]
Handle PREEMPT AND ABORT service action equal to PREEMPT.
With command serialization used in CTL, there are no other commands to abort
when PREEMPT AND ABORT gets to run, so it is practically equal to PREEMPT.
melifaro [Sun, 9 Nov 2014 21:33:01 +0000 (21:33 +0000)]
Remove faith(4) and faithd(8) from base. It looks like industry
has chosen different (and more traditional) stateless/stateful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.
bryanv [Sun, 9 Nov 2014 20:04:12 +0000 (20:04 +0000)]
Enable LRO by default when available on vtnet interfaces
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic are not the
typical use case for VirtIO, and due to other issues, this often
requires checksum offloading to be disabled anyways.
Initialize tqent_flags in the userland taskq implementation. Without
this the assertion of tq->tq_freelist != NULL may fail in taskq_destroy.
The problem is that tqent_flags is never initialized in the userland
implementation while the kernel one does initialize it. Without proper
initialization, the flag may have its lowest bit set, making it treated
as TQENT_FLAG_PREALLOC and never removing taskq_ent_t from tq_freelist.
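The tqent_flags problem described above can be illustrated with a minimal sketch. The struct, the helper names and the flag value below are hypothetical stand-ins, not the actual taskq code; the 0xff fill only imitates stale heap contents in a well-defined way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in; the name and value are chosen for illustration. */
#define ENT_FLAG_PREALLOC 0x1u

struct ent {
	unsigned int flags;
	struct ent *next;
};

/*
 * Allocate an entry. The 0xff fill imitates stale heap contents;
 * init_flags selects whether the flags field is cleared afterwards.
 */
static struct ent *
ent_alloc(int init_flags)
{
	struct ent *e = malloc(sizeof(*e));

	assert(e != NULL);
	memset(e, 0xff, sizeof(*e));
	e->next = NULL;
	if (init_flags)
		e->flags = 0;
	return (e);
}

/* Nonzero when the entry would be treated as preallocated. */
static int
ent_is_prealloc(const struct ent *e)
{
	return ((e->flags & ENT_FLAG_PREALLOC) != 0);
}
```

With init_flags set to 0 the entry looks preallocated purely by accident of its memory contents; clearing the field at allocation time removes that dependency.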
Remove ip6_getdstifaddr() and all functions to work with auxiliary data.
It isn't safe to keep unreferenced ifaddrs. Use in6ifa_ifwithaddr() to
determine ifaddr corresponding to destination address. Since currently
we keep addresses with embedded scope zone, in6ifa_ifwithaddr is called
with zero zoneid and marked with XXX.
Also remove route and lle lookups from ip6_input. Use in6ifa_ifwithaddr()
instead.
kib [Sat, 8 Nov 2014 11:56:26 +0000 (11:56 +0000)]
MFi386 r253328:
Create a proper stack frame for amd64 version of bcopy(). Note that
this also makes the stack properly aligned in the function, although
this is not strictly needed.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
delphij [Sat, 8 Nov 2014 07:30:40 +0000 (07:30 +0000)]
MFV r274271:
Improve zdb -b performance:
- Reduce gethrtime() call to 1/100th of blkptr's;
- Skip manipulating the size-ordered tree;
- Issue more (10, previously 3) async reads;
- Use lighter weight testing in traverse_visitbp();
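The first item (reading the clock for only one in a hundred block pointers) can be sketched as follows. This is a minimal illustration, not the zdb code: struct progress, progress_visit() and fake_clock_ns() are hypothetical names, and the fake clock merely stands in for gethrtime().

```c
#include <stdint.h>

#define SAMPLE_INTERVAL 100	/* read the clock once per this many items */

struct progress {
	uint64_t visited;	/* items seen so far */
	uint64_t clock_reads;	/* how many times the clock was read */
	uint64_t last_ns;	/* most recent clock sample */
};

/* Fake monotonic clock standing in for gethrtime(); advances by 1 per call. */
static uint64_t
fake_clock_ns(void)
{
	static uint64_t ticks;

	return (++ticks);
}

/* Count one visited item and sample the clock on every 100th visit only. */
static void
progress_visit(struct progress *p, uint64_t (*now_ns)(void))
{
	p->visited++;
	if (p->visited % SAMPLE_INTERVAL != 0)
		return;
	p->last_ns = now_ns();
	p->clock_reads++;
}
```

The trade-off is that progress timestamps are only as fresh as the last sampled item, which is acceptable when the per-item cost is small compared with the cost of the clock read.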
ngie [Sat, 8 Nov 2014 03:20:56 +0000 (03:20 +0000)]
Use PROGS instead of PROG and remove unnecessary SRCS?= assignment
With high -j and -DNO_ROOT, using PROG instead of PROGS can cause the PROG
to show up more than once, because the SCRIPTS install case is handled
in a recursive manner, separate from the non-recursive case.
After the recent batch of commits to bsd.progs.mk fixing how variables are
defaulted, explicitly setting SRCS for a PROG is no longer
required.
jilles [Fri, 7 Nov 2014 21:30:16 +0000 (21:30 +0000)]
sh(1): Mention portability issue with shifting zero positional parameters.
Per Austin Group issue #459, shifting zero positional parameters may or may
not be considered an operand error (which causes the shell to exit in most
cases).
kib [Fri, 7 Nov 2014 20:23:43 +0000 (20:23 +0000)]
Fix random.ko module.
- Remove duplicated sources between standard part of the kernel and
module. In particular, it caused duplicated lock initialization and
sysctl registration, both having bad consequences.
- Add missed source files to module.
- Static part of the kernel provides randomdev module, not
random_adaptors. Correct dependencies.
- Use cdev modules declaration macros.
kib [Fri, 7 Nov 2014 20:10:09 +0000 (20:10 +0000)]
Simplify assembler in ivy.c. Move the copying of the random bits into
buffer from asm to C, which reduces amount of arguments for inline asm
and simplifies constraints. Use unsigned types consistently.
zbb [Fri, 7 Nov 2014 19:34:10 +0000 (19:34 +0000)]
Avoid panic in ofwbus caused by not released resource list entry
After resource allocation and release, resource list entry
stays non-NULL. This causes panic in ofwbus_alloc_resource()
on subsequent resource allocation.
Clean appropriate list entry on release to avoid this.
Obtained from: Semihalf
Reviewed by: ian
Sponsored by: The FreeBSD Foundation
Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.
gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagrams (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagrams and check
for incoming datagrams;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes the problem of gre(4) not
working at system startup, but it also removes support for having tunnels with
the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls that are not used in FreeBSD.
Use our standard ioctls for tunnels.
me(4):
* implementation conforms to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);
glebius [Fri, 7 Nov 2014 15:14:10 +0000 (15:14 +0000)]
Remove struct arpcom. It is unused by most interface types that allocate
it, except Ethernet, where it carried the ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.
bryanv [Fri, 7 Nov 2014 03:36:28 +0000 (03:36 +0000)]
Several minor changes to hopefully complete the VirtIO console driver
- Support the KDB alt break sequence to enter the debugger,
panic, reboot, etc. [1]
- Provide emergency write feature description. Note that QEMU
does not implement this feature.
- Make the VTCON_FLAG_* defines sequential once again.
- When the multiple port feature is not negotiated, query the
rows and columns of the one console during the device attach
when the size feature is negotiated.
- Report failure to the device if hot plugging a port fails.
- Acknowledge the console port event with an open event. This
is required by the spec, but QEMU doesn't seem to care.
dteske [Fri, 7 Nov 2014 00:59:40 +0000 (00:59 +0000)]
For really fast machines, an edge-case may exist where dpv(3) may be
built before contrib dependency, dialog(3). Add dialog(3) to the list
of _prebuild_libs to ensure that this does not happen.
Tested on: 11.0-CURRENT amd64 @ r274205
Thanks to: kargl, Larry Rosenman <ler@lerctr.org>, ngie, markj
Recommended by: ngie
Reviewed by: ngie, markj
MFC after: 21 days
X-MFC-to: stable/10 stable/9
X-MFC-with: 274116 274120 274121 274123 274144 274146 274192 274203
jfv [Thu, 6 Nov 2014 23:45:05 +0000 (23:45 +0000)]
Update the Intel i40e drivers, ixl version 1.2.8, ixlv version 1.1.18
- Improved VF stability, thanks to changes from Ryan Stone
and Juniper.
- RSS fixes in the ixlv driver
- link detection in the ixlv driver
- New sysctl's added in ixl and ixlv
- reset timeout increased for ixlv
- stability fixes in detach
- correct media reporting
- Coverity warnings fixed
- Many small bug fixes
- VF Makefile modified - nvm shared code needed
- remove unused sleep channels in ixlv_sc struct
Submitted by: Eric Joyner (committed by jfv)
MFC after: 1 week
dteske [Thu, 6 Nov 2014 22:53:50 +0000 (22:53 +0000)]
SUBDIR_DEPENDS__ in lib/Makefile is not working out so well for me.
Add to using _prebuild_libs in (top-level) Makefile.inc1.
NB: Unbreak build yet again (we'll get this right eventually)
Reviewed by: markj, ngie
Thanks to: ian, markj, ngie, Nikolai Lifanov <lifanov@mail.lifanov.com>
MFC after: 21 days
X-MFC-to: stable/10 stable/9
X-MFC-with: 274116 274120 274121 274123 274144 274146 274192
markj [Thu, 6 Nov 2014 22:46:40 +0000 (22:46 +0000)]
Automatically build with debug symbols when building with WITH_CTF.
Otherwise there's nothing for ctfconvert to do, and it ends up emitting an
error for each object file. Also remove some redundant checks from
bsd.prog.mk and bsd.lib.mk.
dteske [Thu, 6 Nov 2014 19:28:01 +0000 (19:28 +0000)]
Re-enable dpv(1,3): Introduced via r274116; temporarily disabled
shortly thereafter via r274124 until I could get the right recipe
down w/respect to SUBDIR_DEPEND.
Thanks to: ngie, ian
Reviewed by: ian
MFC after: 21 days
X-MFC-to: stable/10 stable/9
X-MFC-with: 274116 274120 274121 274123 274144 274146
ian [Thu, 6 Nov 2014 19:14:58 +0000 (19:14 +0000)]
Strengthen the sanity checking of busdma tag parameters.
It turns out an alignment of zero can lead to an endless loop in the
vm reservations code, so specifically disallow that. The manpage says
hardware which can do dma at any address should use a value of one, which
hints at the forbiddenness of zero without exactly saying it. Several
other conditions which could lead to insanity in working with the tag are
also checked now.
Every existing call to bus_dma_tag_create() (about 680 of them) was
eyeballed for violations of these things, and two alignment=0 glitches
were fixed. It's possible something was missed, but overall this
shouldn't lead to any arm users suddenly experiencing failures.
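A parameter check of this kind might look like the sketch below. Only the alignment-of-zero rejection comes from the description above; struct tag_params, tag_params_sane() and the remaining checks are hypothetical illustrations and are not taken from the busdma code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of tag parameters, for illustration only. */
struct tag_params {
	uint64_t alignment;
	uint64_t maxsize;
	int nsegments;
};

/* Return true when the parameters pass the sketched sanity checks. */
static bool
tag_params_sane(const struct tag_params *p)
{
	/* From the commit message: an alignment of zero is disallowed. */
	if (p->alignment == 0)
		return (false);
	/* Assumption: alignment is expected to be a power of two. */
	if ((p->alignment & (p->alignment - 1)) != 0)
		return (false);
	/* Assumption: a tag needs a nonzero size and at least one segment. */
	if (p->maxsize == 0 || p->nsegments <= 0)
		return (false);
	return (true);
}
```

Hardware that can do DMA at any address would pass an alignment of 1 here, matching the manpage wording quoted above.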
imp [Thu, 6 Nov 2014 17:19:41 +0000 (17:19 +0000)]
Ignore errors from rm -rf to support high -j builds. This is, at best,
a kludge. However, it also effectively works around the issues for
high -j builds on systems that do not have the rm fixes.
A better fix would be to rmdir here, and fix the places where we're
sloppy and do not list all the files we create in CLEANFILES, should
anybody have the time to chase them all to ground.
imp [Thu, 6 Nov 2014 16:48:37 +0000 (16:48 +0000)]
Retire the '@' symlink. It isn't really needed and causes more
problems than it solves. SYSDIR is already defined almost always and
can be used instead. Working around the one case where it isn't is
much easier than working around the fact that @ may not exist in 18
other places.
melifaro [Thu, 6 Nov 2014 13:13:09 +0000 (13:13 +0000)]
Make checks for rt_mtu generic:
Some virtual if drivers have (ab)used the ifa_rtrequest hook to enforce
route MTU to be not bigger than interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.
We currently have 3 places where MTU is altered:
1) route addition.
In this case domain overrides radix _addroute callback (in[6]_addroute)
and all necessary checks/fixes are/can be done there.
2) route change (especially, GW change).
In this case, there are no explicit per-domain calls, but one can
override rte by setting ifa_rtrequest hook to domain handler
(inet6 does this).
3) ifconfig ifaceX mtu YYYY
In this case, we have no callbacks, but ip[6]_output performs runtime
checks and decreases rt_mtu if necessary.
Generally, the goals are to be able to handle all MTU changes in
control plane, not in runtime part, and properly deal with increased
interface MTU.
This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-domain MTU checks for case 1)
* adds generic MTU check for case 2)
* The latter is done by using new dom_ifmtu callback since
if_mtu denotes L3 interface MTU, e.g. maximum transmitted _packet_ size.
However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
for some cases, so we need an abstract way to know maximum MTU size
for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
use these functions on new non-inserted rte.
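The control-plane MTU handling described in this commit can be sketched roughly as follows. Both helpers are hypothetical and are not the committed code; in particular, treating a route MTU of 0 as "unset" and letting a route follow an interface MTU increase only when it previously equalled the old interface MTU are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Cases 1) and 2): pick the MTU to store in a route.
 * Assumption: 0 means "not set by the user", so the route inherits the
 * per-domain interface MTU; larger requests are clamped to it.
 */
static uint32_t
route_mtu_check(uint32_t requested, uint32_t dom_ifmtu)
{
	if (requested == 0 || requested > dom_ifmtu)
		return (dom_ifmtu);
	return (requested);
}

/*
 * Case 3): the interface MTU changes from old_ifmtu to new_ifmtu.
 * Assumption: a route that was tracking the interface MTU follows it
 * upwards; an explicitly smaller route MTU is left alone.
 */
static uint32_t
route_mtu_on_ifmtu_change(uint32_t rt_mtu, uint32_t old_ifmtu,
    uint32_t new_ifmtu)
{
	if (rt_mtu > new_ifmtu)
		return (new_ifmtu);	/* interface MTU shrank below the route MTU */
	if (rt_mtu == old_ifmtu)
		return (new_ifmtu);	/* route was tracking the interface MTU */
	return (rt_mtu);
}
```

Doing both adjustments when the route or the interface is changed keeps the comparison out of the per-packet output path.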