cem [Mon, 26 Oct 2015 03:30:38 +0000 (03:30 +0000)]
ioat: Allocate memory for ring resize sanely
Add a new flag for DMA operations, DMA_NO_WAIT. It behaves much like
other NOWAIT flags -- if queueing an operation would sleep, abort and
return NULL instead.
When growing the internal descriptor ring, the memory allocation is
performed outside of all locks. A lock-protected flag is used to avoid
duplicated work. Threads that cannot sleep and attempt to queue
operations when the descriptor ring is full allocate a larger ring with
M_NOWAIT, or bail if that fails.
ioat_reserve_space() could become an external API if is important to
callers that they have room for a sequence of operations, or that those
operations succeed each other directly in the hardware ring.
This patch splits the internal head index (->head) from the hardware's
head-of-chain (DMACOUNT) register (->hw_head). In the future, for
simplicity's sake, we could drop the 'ring' array entirely and just use
a linked list (with head and tail pointers rather than indices).
zbb [Sun, 25 Oct 2015 23:22:40 +0000 (23:22 +0000)]
Add support for unspecified ranges on ThunderX system
When one tries to allocate a resource with unspecified range,
read already configured BAR values (by UEFI or whatever).
This is necessary to make VNIC VFs working and to allow them to be
properly allocated.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D3752
zbb [Sun, 25 Oct 2015 22:14:04 +0000 (22:14 +0000)]
Introduce e6000sw etherswitch support
Add e6000sw driver supporting Marvell 88E6352, 88E6172, 88E6176 switches.
It needs to be attached to mdio interface, exporting SMI access
functionality. e6000sw supports port-based VLAN configuration, per-port
media changing, accessing PHY and switch registers.
e6000sw attaches miibuses and PHY drivers as children. Instead of typical
tick as callout, kthread-based tick is used. This combined with SX locks
allows MDIO read/write calls to sleep. It is expected, because this
hardware requires long delays in SMI read/write procedures, which can not
be handled by busy-waiting.
zbb [Sun, 25 Oct 2015 22:00:56 +0000 (22:00 +0000)]
Add etherswitch support to mge
This commit introduces support for etherswitch devices that utilize SMI as
a way of accessing its registers. SMI register is located in address space
of mge -- access to it was exported through MDIO interface.
Attachment functions were enhanced so as to ensure proper initialisation
in both cases: 1) PHYs attached directly to mge, 2) PHYs attached to
switch device and switch attached to mge. Attachment of etherswitch device
depends on dts entry with compatible="mrvl,sw" property. If none is found,
typical PHY attachment procedure follows.
In case of switch attached, PHYs' status and configuration is accessible
via etherswitchcfg, and ifconfig shows always-up, non-configurable mge
interfaces.
Due to the fact that there may be simultaneous accessess to SMI
registers (e.g. from PHY attached to one of mge instances and switch
to the other), SMI access interlock was added. It is SX lock,
because sleep ability is necessary -- busy-waiting would result
in poor performance due to long delays required by hardware.
Underlying switch driver is obliged to use sleepable locks as well.
pjd [Sun, 25 Oct 2015 18:48:09 +0000 (18:48 +0000)]
The aio_waitcomplete(2) syscall should not sleep when the given timeout
is 0. Without this change it was sleeping for one tick. Maybe not a big
deal, but it makes share/dtrace/blocking script to report that.
theraven [Sun, 25 Oct 2015 14:52:16 +0000 (14:52 +0000)]
Lots of improvements to the BSD-licensed dtc
- Various fixes to includes (including recursive includes)
- Lots of testing that the output exactly matches GPL'd dtc
- Lots of bug fixes to merging
- Fix incorrect mmap usage
- Ad-hoc memory management replaced with C++11 unique_ptr and similar
Patrick Wildt has successfully run many (all?) of the GPL dtc test suite.
kp [Sun, 25 Oct 2015 13:14:53 +0000 (13:14 +0000)]
PF_ANEQ() macro will in most situations returns TRUE comparing two identical
IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't
stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32
bits (like for IPv6 packet). Those bits containt some garbage and in result
PF_ANEQ() wrongly returns TRUE.
Fix: Check if packet is of AF_INET type and if it is then compare only first 32
bits of data.
ngie [Sun, 25 Oct 2015 07:42:56 +0000 (07:42 +0000)]
Fix compiling with gcc [4.2.1] after r287797 when MK_HESOID == no and
MK_NIS == no by converting `i` back to an int, and instead cast the loop
comparison to `int`
The loop comparison is iterating the len(ns_dtab)-1, because
the last element is the sentinel tuple { NULL, NULL, NULL, }, so when
both HESOID and NIS are off, len(ns_dtab)-1 == 1 - 1 == 0, and the loop
is skipped because the expression is tautologically false
While here, convert `(sizeof(x) / sizeof(x[0]))` to `nitems(x)`
Tested with: clang 3.7.0, gcc 4.2.1, and gcc 4.9.4 [*] with MK_NIS={no,yes}
and by running bash -lc 'id -u && id -g && id'
* gcc 4.9.4 needs another patch in order for the compile to succeed
with -Werror with lib/libc/gen/getgrent.c
ngie [Sun, 25 Oct 2015 04:37:00 +0000 (04:37 +0000)]
Limit RESOLUTION_MAX to INT_MAX, not UINT_MAX (all spelled out) so the
mode value isn't always clipped to -1 when (resolution * size) == 32, which
would have been the case with values => {4i,32b,32t}.
ngie [Sun, 25 Oct 2015 04:04:25 +0000 (04:04 +0000)]
Use 't' (bit-field) not 'b' (bit-sized integral type) for describing MRIE (aka
"Method of Reporting Informational Exceptions") in the SCSI mode database.
T10/04-371 revision 2 (revision 4; page 2, table 1) describes it as a
bit-field of 4 bits wide.
This a recommit of head@r289913 to fix the original commit message, in
particular:
- I incorrectly claimed that unit change was 'i' -> 't'.
- The spec I reference in this commit is 2 decades newer than the one noted in
r289913. The fields in the SCSI mode database are more complete in the newer
spec, so it'll be easier for someone to decipher this commit if need be
later.
- I screwed up the bug entry in the previous commit message
Pointyhat to: ngie (for botching up r289913)
PR: 200619
Reported by: Michael Baptist
Submitted by: Lars Skodje
Sponsored by: EMC / Isilon Storage Divisionf
ngie [Sun, 25 Oct 2015 03:16:08 +0000 (03:16 +0000)]
Use 't' (bits) not 'i' (bytes) for describing MRIE (aka
"Method of Reporting Informational Exceptions") in the SCSI mode database as
the field described in X3T10/94-190 (revision 4; page 2, table 1) [1.] is
4 bits wide, not 4 bytes wide
cem [Sat, 24 Oct 2015 23:44:58 +0000 (23:44 +0000)]
ioat: Pull out timer callout delay into a constant
Pull out the timer callout delay into IOAT_INTR_TIMO and shorten it
considerably (5s -> 100ms). Single operations do not take 5-10 seconds
and when interrupts aren't working, waiting 100ms sucks a lot less than
5s.
adrian [Sat, 24 Oct 2015 22:37:59 +0000 (22:37 +0000)]
arge(4): flip this on for AR9344 SoCs.
I couldn't test arge0->arge1 bridging, only arge0 VLAN bridging.
The DIR-825C1 only hooks up arge0 to the switch GMAC0 and so
you need to abuse VLANs to test.
ngie [Sat, 24 Oct 2015 22:12:23 +0000 (22:12 +0000)]
Add more cd9660/FFS makefs testcases
General changes:
- Parameterize out the mount command.
- Use mtree to verify the contents of an image (check_image_contents) instead
of using diff (diff verifies content, but not file metadata).
- Move common logic out to functions (common_cleanup, mount_image,
check_image_contents)
- Add stub testcases for makefs -D (crashes with SIGBUS, similar to bug # 192839)
- Add a note about the ISO-9660 and rockridge specs
- Add testcases that exercise:
-- Creating disk images from an mtree and multiple directories.
-- -F flag use (not really an extensive testcase right now)
cd9660-specific test changes:
- Remove an XXX comment about symlinks; I forgot that non-rockridge images turn
symlinks into hardlinks.
- Add testcases that exercise:
-- -o allow-deep-trees
-- -o allow-max-name stub testcase (doesn't seem to be implemented in makefs)
-- -o preparer (existence in image; not conformance to spec)
-- -o publisher (existence in image; not conformance to spec)
-- -o rockridge (basic)
ngie [Sat, 24 Oct 2015 21:59:58 +0000 (21:59 +0000)]
Make vers.c creation atomic by using a temporary file, then moving
the temporary file to vers.c at the end of the script
The previous logic wrote out to vers.c multiple times, so the file
could be incorrectly interpreted as being completely written out
after one of the echo calls with recursive make, when in reality it
was only partially written.
Also, in the event the build was interrupted when creating vers.c
(small race window), it would have a leftover file that needed to
be cleaned up before resuming the build.
kib [Sat, 24 Oct 2015 21:59:22 +0000 (21:59 +0000)]
Reduce the amount of calls to VOP_BMAP() made from the local vnode
pager. It is enough to execute VOP_BMAP() once to obtain both the
disk block address for the requested page, and the before/after limits
for the contiguous run. The clipping of the vm_page_t array passed to
the vnode_pager_generic_getpages() and the disk address for the first
page in the clipped array can be deduced from the call results.
While there, remove some noise (like if (1) {...}) and adjust nearby
code.
Reviewed by: alc
Discussed with: glebius
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
kib [Sat, 24 Oct 2015 21:37:47 +0000 (21:37 +0000)]
Intel SDM before revision 56 described the CLFLUSH instruction as only
ordered with the MFENCE instruction. Similar weak guarantees are also
specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods
pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
CLFLUSH loop with MFENCE both before and after the loop.
In the revision 56 of SDM, Intel stated that all existing
implementations of CLFLUSH are strict, CLFLUSH instructions execution
is ordered WRT other CLFLUSH and writes. Also, the strict behaviour
is made architectural.
A new instruction CLFLUSHOPT (which was documented for some time in
the Instruction Set Extensions Programming Reference) provides the
weak behaviour which was previously attributed to CLFLUSH.
Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do
not execute MFENCE before and after the flushing loop.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
ian [Sat, 24 Oct 2015 21:25:53 +0000 (21:25 +0000)]
Provide armv4/v5 implementations of several of the armv6 cache maintenance
functions. This will make it possible to use the same busdma code for all
arm platforms v4 thru v7.
ian [Sat, 24 Oct 2015 19:39:41 +0000 (19:39 +0000)]
Rename dcache_dma_preread() to dcache_inv_poc_dma() to make it clear that it
is a dcache invalidate to point of coherency just like dcache_inv_poc(), but
a slightly different version specific to dma operations. Elaborate the
comment about how and why it's different.
mav [Sat, 24 Oct 2015 17:34:40 +0000 (17:34 +0000)]
Add PIM_EXTLUNS support to isp(4) driver.
Now 24xx and above chips support full 8-byte LUN address space.
Older FC chips may support up to 16K LUNs when firmware allows.
Tested in both initiator and target modes for 23xx, 24xx and 25xx.
bdrewery [Sat, 24 Oct 2015 04:03:32 +0000 (04:03 +0000)]
Slightly rework the comments and logic for default Clang/GCC.
This is because the previous version was very obscure about the fact
that despite having Clang "on by default" for architectures such as powerpc, it
does not actually build due to the GCC it uses not having C++11 support.
Using an external compiler that supports C++11 does allow this to work.
This whole block should be rethought more given "on by default" is not
really default without extra work which could actually be surprising for
why Clang is showing up when using a newer GCC.
bdrewery [Sat, 24 Oct 2015 04:03:29 +0000 (04:03 +0000)]
Configs should not be under MK_INCLUDES control.
'buildconfig' is connected to 'all', but 'installconfig' is only called
manually. There is not much need to conditionalize this file right
now due to how it is hooked up and its impact on various build phases.
markj [Sat, 24 Oct 2015 03:14:36 +0000 (03:14 +0000)]
DWARF emitted by clang 3.7 encodes array sizes using the DW_AT_count
attribute rather than DW_AT_upper_bound. Teach ctfconvert about this so that
array type sizes are encoded correctly.
ache [Sat, 24 Oct 2015 02:23:15 +0000 (02:23 +0000)]
Since no room left in the _flags, reuse __SALC for O_APPEND.
It helps to remove _fcntl() call from _ftello() and optimize seek position
calculation in _swrite().
ian [Sat, 24 Oct 2015 02:18:14 +0000 (02:18 +0000)]
Change the preallocation of a busdma segment mapping array from per-tag to
per-map. The per-tag scheme is not safe, and a mutex can't be used to
protect it because the mapping routines can't sleep. Code brought in
from armv6 implementation.
ian [Fri, 23 Oct 2015 22:52:00 +0000 (22:52 +0000)]
Instead of all memory allocations using M_DEVBUF, use new categories
M_BUSDMA for allocations of metadata (tags, maps, segment tracking lists),
and M_BOUNCE for bounce pages.
ian [Fri, 23 Oct 2015 22:51:48 +0000 (22:51 +0000)]
Instead of all memory allocations using M_DEVBUF, use new categories
M_BUSDMA for allocations of metadata (tags, maps, segment tracking lists),
and M_BOUNCE for bounce pages.
bdrewery [Fri, 23 Oct 2015 21:30:27 +0000 (21:30 +0000)]
Rework r289778 to always parallelize known targets, without ordering.
- Rather than allow 'make clean*' to ignore dependencies, make a static
list of targets in STANDALONE_SUBDIR_TARGETS that are known to be safe.
This allows a user to override them if needed and avoids adding this feature
to user-defined targets that are in ${SUBDIR_TARGETS}. [1]
- This now also allows to force SUBDIR_PARALLEL when calling these
targets, since no dependencies are needed.
ian [Fri, 23 Oct 2015 21:29:37 +0000 (21:29 +0000)]
Catch up to r232356: change the boundary constraint type to bus_addr_t.
This code lived in the projects/armv6 branch when that change got applied
to all the other arches.
ian [Fri, 23 Oct 2015 20:49:34 +0000 (20:49 +0000)]
Whitespace and style nits, no functional changes.
The goal is to make these two files cosmetically alike so that the actual
implementation differences are visible. The only changes which aren't
spaces<->tabs and rewrapping and reindenting lines are a couple fields
shuffled around in the tag and map structs so that everything is in the same
order in both versions (which should amount to no functional change).
bdrewery [Fri, 23 Oct 2015 19:41:58 +0000 (19:41 +0000)]
Fix regression from r289734 that caused crunchgen "subdirs" to not be
properly recursed.
The .for loop was defining a ${__dir} variable that was being set at a
different evaluation time than the target itself, so every 'cd ${__dir}'
became the last value that was in ${__dir}. This resulted in 'make obj'
not properly being ran in the tree that would leave .depend files
scattered around when 'make all' was ran in rescue/.
To fix this, define a CRUNCH_SRCDIR_* for every prog if it does not
already have one and then use that variable in every relevant place.
This allows simplifying some logic as well.
asomers [Fri, 23 Oct 2015 19:28:24 +0000 (19:28 +0000)]
Fix various Coverity issues in sbin/savecore/savecore.c:
CID1009429: Fix unchecked return value from lseek while clearing dump
CID1007781: Fix file descriptor leak in DoFile
CID1007261: Don't send potentially unterminated string to syslog(3)
mav [Fri, 23 Oct 2015 18:34:18 +0000 (18:34 +0000)]
Add partial support for QUERY TMF to CAM and isp(4).
This change allows to decode respective functions in isp(4) in target mode
and pass them through CAM to CTL. Unfortunately neither CAM nor isp(4)
support returning response info for those task management functions now.
On the other side I just have no initiator to test this functionality.
vangyzen [Fri, 23 Oct 2015 15:56:17 +0000 (15:56 +0000)]
resolver: abuse _res a little less
In the past, _res was a global variable. Now, it's multiple function calls.
Several functions in the resolver use _res multiple times and therefore
call the function(s) far more than necessary.
Fix those callers to store the result of _res in a local variable.
Add __noinline to the definition of res_init() to avoid the code bloat
that these changes would have otherwise incurred. Thanks to jilles
for noticing this.
royger [Fri, 23 Oct 2015 15:46:42 +0000 (15:46 +0000)]
blkfront: add support for unmapped IO
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.
This patch adds unmapped IO support to blkfront. The following tests
results have been obtained when running on a Xen host without HAP:
PVHVM
3165.84 real 6354.17 user 4483.32 sys
PVHVM with unmapped IO
2099.46 real 4624.52 user 2967.38 sys
This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.
The implementation of bus_dmamap_load_ma_triv currently calls
_bus_dmamap_load_phys on each page that is part of the passed in buffer.
Since each page is treated as an individual buffer, the resulting behaviour
is different from the behaviour of _bus_dmamap_load_buffer. This breaks
certain drivers, like Xen blkfront.
If an unmapped buffer of size 4096 that starts at offset 13 into the first
page is passed to the current _bus_dmamap_load_ma implementation (so the ma
array contains two pages), the result is that two segments are created, one
with a size of 4083 and the other with size 13 (because two independant
calls to _bus_dmamap_load_phys are performed, one for each physical page).
If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer
the result is that only one segment is created, with a size of 4096.
This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce
buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements
_bus_dmamap_load_ma so that it's behaviour is the same as the mapped version
(_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer
code, other arches are left untouched.