adrian [Thu, 15 Oct 2015 04:22:56 +0000 (04:22 +0000)]
Add support for the BCM57765 card reader.
This patch adds support for the BCM57765[2] card reader function included in
Broadcom's BCM57766 ethernet/sd3.0 controller. This controller is commonly
found in laptops and Apple hardware (MBP, iMac, etc).
The BCM57765 chipset is almost fully compatible with the SD3.0 spec, but
does not support deriving a frequency below 781KHz from its default base
clock via the standard SD3.0-configured 10-bit clock divisor.
If such a divisor is set, card identification (which requires a 400KHz
clock frequency) will time out[1].
As a work-around, I've made use of an undocumented device-specific clock
control register to switch the controller to a 63MHz clock source when
targeting clock speeds below 781KHz; the clock source is likewise switched
back to the 200MHz clock when targeting speeds greater than 781KHz.
Additionally, this patch fixes a small sdhci_pci bug; the
sdhci_pci_softc->quirks flag was not copied to the sdhci_slot, resulting in
`quirk` behavior not being applied by sdhci.c.
[1] A number of Linux/FreeBSD users have noted that bringing up the chipset's
associated ethernet interface will allow SD cards to enumerate (slowly).
This is a controller implementation side-effect triggered by the ethernet
driver's reading of the hardware statistics registers.
[2] This may also fix card detection when using the BCM57785 chipset, but I
don't have access to the BCM57785 chipset and can't verify.
I actually snagged some BCM57785 hardware recently (2012 Retina MacBook Pro)
and can confirm that this also fixes card enumeration with the BCM57785
chipset; with the patch, I can boot off of the internal sdcard reader.
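The clock-source switch described above can be sketched roughly as follows. This is a Python illustration only, not the driver code (which is C). Assumptions: the 781KHz threshold is taken to be 200 MHz / 256, which matches the arithmetic but is not stated in the commit message; the divisor follows the SD 3.0 divided-clock mode (output = source / (2 * N), with N up to 1023); the function and constant names are hypothetical, and the undocumented register access itself is omitted.

```python
FAST_SOURCE_HZ = 200_000_000   # default base clock named in the commit message
SLOW_SOURCE_HZ = 63_000_000    # alternate clock source named in the commit message
THRESHOLD_HZ = FAST_SOURCE_HZ // 256   # 781,250 Hz (assumed origin of "781KHz")
MAX_DIVISOR = 1023             # largest value of a 10-bit divisor field


def pick_clock(target_hz):
    """Return (source_hz, divisor, actual_hz) for a requested SD clock.

    A divisor of 0 means the source clock is used undivided.
    """
    # Below the threshold, switch to the slower source so a usable
    # divisor exists; otherwise stay on the default 200 MHz source.
    if target_hz < THRESHOLD_HZ:
        source_hz = SLOW_SOURCE_HZ
    else:
        source_hz = FAST_SOURCE_HZ
    if target_hz >= source_hz:
        return source_hz, 0, source_hz
    # Smallest N such that source / (2 * N) does not exceed the target.
    divisor = -(-source_hz // (2 * target_hz))  # ceiling division
    divisor = min(divisor, MAX_DIVISOR)
    return source_hz, divisor, source_hz // (2 * divisor)
```

For the 400KHz identification clock this picks the 63MHz source with N = 79, giving roughly 398.7KHz.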
bz [Thu, 15 Oct 2015 01:51:10 +0000 (01:51 +0000)]
2nd try, after r289319:
HWPMC depends on pmu.c even if device pmu is not specified.
Would be great if we could just automatically enable "device pmu"
if we try to compile in HWPMC.
Also several old kernel configurations seem to have HWPMC enabled but are
pre-FDT and thus fail. So make pmu.c depend on fdt in case of hwpmc as
well.
cem [Wed, 14 Oct 2015 23:48:16 +0000 (23:48 +0000)]
NTB: MFV 1db97f25: Pull out platform detection logic
Pull out read of PPD and platform detection logic to new functions,
ntb_detect_xeon(), ntb_detect_soc(). No functional change -- mostly
this is just shuffling the code to more closely match the Linux driver.
Linux commit log:
To simplify some of the platform detection code. Move the platform
detection to a function to be called earlier.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
cem [Wed, 14 Oct 2015 23:48:03 +0000 (23:48 +0000)]
NTB: Abstract doorbell register access
The doorbell registers (and associated mask) are 16-bit on Xeon but
64-bit on SoC. Abstract IO access to doorbell registers with
'db_ioread' and 'db_iowrite' (names and idea borrowed from the dual
BSD/GPL Linux driver).
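A minimal sketch of the idea, in Python rather than the driver's C: one pair of accessors hides the register width, so callers need not care whether the doorbell is 16 or 64 bits wide. The register file is modeled as a bytearray, and the device-type constants, offsets and little-endian layout are assumptions made for illustration; only the names db_ioread and db_iowrite come from the commit message.

```python
import struct

XEON = "xeon"
SOC = "soc"

# Doorbell width per device type: 16-bit on Xeon, 64-bit on SoC.
DB_FORMAT = {XEON: "<H", SOC: "<Q"}


def db_ioread(regs, dev_type, offset):
    """Read a doorbell register of the width appropriate for dev_type."""
    return struct.unpack_from(DB_FORMAT[dev_type], regs, offset)[0]


def db_iowrite(regs, dev_type, offset, value):
    """Write a doorbell register of the width appropriate for dev_type."""
    struct.pack_into(DB_FORMAT[dev_type], regs, offset, value)
```

A write through the Xeon accessor touches two bytes only, while the SoC accessor touches eight, which is the distinction the abstraction is meant to hide.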
cem [Wed, 14 Oct 2015 23:47:52 +0000 (23:47 +0000)]
if_ntb: MFV 3cc5ba19: Add alignment check to meet hardware requirement
Original Linux commit log:
The NTB translate register must have the value to be BAR size aligned.
This alignment check makes sure that the DMA memory allocated has the
proper alignment. Another requirement for NTB to function properly with
memory window BAR size greater or equal to 4M is to use the CMA feature
in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
CONFIG_CMA_SIZE_MBYTES set.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
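The alignment requirement amounts to a mask test. The following Python sketch shows the check under the assumption that BAR sizes are powers of two; the function name is hypothetical and the real driver performs this test in C on the allocated DMA address.

```python
def is_bar_aligned(dma_addr, bar_size):
    """Return True if dma_addr is aligned to bar_size (a power of two)."""
    if bar_size <= 0 or (bar_size & (bar_size - 1)) != 0:
        raise ValueError("bar_size must be a positive power of two")
    # An aligned address has all bits below the BAR size cleared.
    return (dma_addr & (bar_size - 1)) == 0
```

For a 4M window, an address of 0x400000 passes and 0x401000 does not.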
cem [Wed, 14 Oct 2015 23:47:35 +0000 (23:47 +0000)]
NTB: MFV a1413cfb: correct the spread of queues over mw's
The detection of an uneven number of queues on the given memory windows
was not correct. The mw_num is zero based and the mod should be
division to spread them evenly over the mw's.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
cem [Wed, 14 Oct 2015 23:47:23 +0000 (23:47 +0000)]
NTB: Remap MSI-X messages over available slots
Remap MSI-X messages over available slots rather than falling back to
legacy INTx when fewer MSI-X slots are available than were requested.
N.B. the Linux driver does *not* do this.
To aid in testing, a tunable 'hw.ntb.force_remap_mode' has been added.
It defaults to off (0). When the tunable is enabled and sufficient
slots were available, the driver restricts the number of slots by one
and remaps the MSI-X messages over the remaining slots.
In case this is actually not okay (as I don't yet have access to this
hardware to test), a tunable 'hw.ntb.prefer_intx_to_remap' has been
added. It defaults to off (0). When the tunable is enabled and fewer
slots are available than requested, fall back to legacy INTx mode rather
than attempting to remap MSI-X messages.
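The decision logic described above might be sketched like this in Python. The round-robin (modulo) assignment of messages to slots is an assumption made for illustration; the commit message does not say how messages are distributed. The function names are hypothetical, and the two keyword arguments stand in for the hw.ntb.force_remap_mode and hw.ntb.prefer_intx_to_remap tunables.

```python
def remap_msix(requested, available):
    """Spread `requested` MSI-X messages over `available` slots (round-robin)."""
    if available < 1:
        raise ValueError("need at least one slot to remap onto")
    slots = {slot: [] for slot in range(available)}
    for msg in range(requested):
        slots[msg % available].append(msg)
    return slots


def plan_interrupts(requested, available, force_remap=False, prefer_intx=False):
    """Return (mode, mapping) where mode is 'msix', 'msix-remap' or 'intx'."""
    # Testing aid: pretend one slot fewer was granted than requested.
    if force_remap and available >= requested:
        available = requested - 1
    if available >= requested:
        return "msix", None
    # Fewer slots than requested: either fall back or remap.
    if prefer_intx or available < 1:
        return "intx", None
    return "msix-remap", remap_msix(requested, available)
```

With four messages and two slots, this yields slot 0 serving messages 0 and 2 and slot 1 serving messages 1 and 3.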
cem [Wed, 14 Oct 2015 23:46:15 +0000 (23:46 +0000)]
NTB: MFV 53a788a7: Split ntb_setup_interrupts() into SOC, Xeon, and legacy routines
The names don't line up 100% with Linux. Our routines are named
ntb_setup_interrupts, ntb_setup_xeon_msix, ntb_setup_soc_msix, and
ntb_setup_legacy_interrupt. Linux SNB = FreeBSD Xeon; Linux BWD =
FreeBSD SOC. Original Linux commit log:
This is a cleanup effort to make ntb_setup_msix() more readable - use
ntb_setup_bwd_msix() to init MSI-Xs on BWD hardware and
ntb_setup_snb_msix() - on SNB hardware.
Function ntb_setup_snb_msix() also initializes MSI-Xs the way it should
have been done - looping pci_enable_msix() until success or failure.
Authored by: Alexander Gordeev
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
gjb [Wed, 14 Oct 2015 22:33:11 +0000 (22:33 +0000)]
Deprecate MD5 checksum generation in favor of SHA512.
This was discussed during the 10.2-RELEASE cycle, however
since we were nearing the end of the cycle, we decided to
defer this change until after 10.2-RELEASE.
Reminded by: so (delphij), jmg
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
bdrewery [Wed, 14 Oct 2015 20:30:32 +0000 (20:30 +0000)]
Recurse on 'buildconfig' and 'installconfig'. Remove the 'config' pseudo target.
The 'config' target isn't really needed right now so just remove it to avoid
any clashes with config(8) building. It's also likely misspelled and should
be 'configs' if we decide to add it back. This was just a convenience
target recently added.
ngie [Wed, 14 Oct 2015 20:22:12 +0000 (20:22 +0000)]
Fix test-fenv:test_dfl_env when run on some amd64 CPUs
Compare the fields that the AMD [1] and Intel [2] specs say will be
set once fnstenv returns.
Not all amd64 capable processors zero out the env.__x87.__other field
(example: AMD Opteron 6308). The AMD64/x64 specs aren't explicit on what the
env.__x87.__other field will contain after fnstenv is executed, so the values
in env.__x87.__other could be filled with arbitrary data depending on the
CPU-specific implementation of fnstenv.
bz [Wed, 14 Oct 2015 18:53:34 +0000 (18:53 +0000)]
Revert r289319 as it seems some ARM kernels include HWPMC but no FDT.
To me that seems broken as certain interrupts will never be handled
properly. I'll re-open D3877 and we can seek a better solution and
try again. For now go back to that state and avoid compile time errors.
bz [Wed, 14 Oct 2015 18:30:04 +0000 (18:30 +0000)]
Properly define functions without arguments and wrap for { for style purposes
as followed in the rest of the file. This will hopefully make gcc more happy.
kib [Wed, 14 Oct 2015 18:27:35 +0000 (18:27 +0000)]
Allow PT_INTERP and PT_NOTES segments to be located anywhere in the
executable image. Keep one page (arbitrary) limit on the max allowed
size of the PT_NOTES.
The ELF image activators still require that program headers of the
executable are fully contained in the first page of the image file.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D3871
bz [Wed, 14 Oct 2015 17:20:19 +0000 (17:20 +0000)]
Now that we can detect the Cortex-A8 properly, fix the event list
according to the Cortex-A8 TRM r3p2 section 3.2.49.
The A8 list differs from the "ARM-v7 common" list, given the A8
was an earlier model.
There is still more work to be done for other Cortex-Ax versions as
andrew points out, but I am just trying to fix A8 for now for teaching.
bz [Wed, 14 Oct 2015 17:07:24 +0000 (17:07 +0000)]
HWPMC depends on pmu.c even if device pmu is not specified.
Would be great if we could just automatically enable "device pmu"
if we try to compile in HWPMC.
kp [Wed, 14 Oct 2015 16:21:41 +0000 (16:21 +0000)]
pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.
The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.
To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.
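For readers unfamiliar with the distinction, the pseudo-header checksum covers only the addresses, protocol and TCP length, as noted above. The Python sketch below computes that partial sum for IPv4 and expresses the update rule from the last paragraph. It is an illustration, not pf code: the function names are hypothetical, and the rule is a literal reading of the commit message rather than the actual mbuf-flag test.

```python
import ipaddress
import struct


def fold(total):
    """Fold carries back into 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src, dst, proto, tcp_length):
    """Uncomplemented ones' complement sum over the IPv4 pseudo-header."""
    data = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!BBH", 0, proto, tcp_length)
    )
    return fold(sum(struct.unpack("!6H", data)))


def must_update_checksum(tso_enabled, touches_pseudo_header):
    """Update only if TSO is off or the change affects the pseudo-header."""
    return (not tso_enabled) or touches_pseudo_header
```

Rewriting a port, for example, does not touch the pseudo-header, so under this rule the checksum is left alone when TSO is in use; rewriting an address does touch it.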
vangyzen [Wed, 14 Oct 2015 14:26:44 +0000 (14:26 +0000)]
resolver: automatically reload /etc/resolv.conf
On each resolver query, use stat(2) to see if the modification time
of /etc/resolv.conf has changed. If so, reload the file and reinitialize
the resolver library. However, only call stat(2) if at least two seconds
have passed since the last call to stat(2), since calling it on every
query could kill performance.
This new behavior is enabled by default. Add a "reload-period" option
to disable it or change the period of the test.
Document this behavior and option in resolv.conf(5).
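The rate-limited check can be pictured with the following Python sketch. It is not the libc implementation: the class and attribute names are hypothetical, the clock is injectable so the behavior can be exercised without sleeping, and treating a period of 0 as "disabled" is an assumption.

```python
import os
import time


class ReloadingConfig:
    """Re-read a file when its mtime changes, stat()ing at most once per period."""

    def __init__(self, path, reload_period=2.0, clock=time.monotonic):
        self.path = path
        self.reload_period = reload_period
        self.clock = clock
        self.last_check = clock()
        self.mtime = os.stat(path).st_mtime_ns
        self.reloads = 0
        self._load()

    def _load(self):
        with open(self.path) as f:
            self.text = f.read()

    def query(self):
        """Return the current contents, reloading first if the file changed."""
        now = self.clock()
        # Only pay for stat(2) once the period has elapsed.
        if self.reload_period > 0 and now - self.last_check >= self.reload_period:
            self.last_check = now
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self.mtime:
                self.mtime = mtime
                self._load()
                self.reloads += 1
        return self.text
```

A change to the file is therefore picked up on the first query made at least two seconds after the previous check, not immediately.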
mav [Wed, 14 Oct 2015 10:38:05 +0000 (10:38 +0000)]
MFV r289308: 6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Justin T. Gibbs <gibbs@FreeBSD.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Wed, 14 Oct 2015 07:50:08 +0000 (07:50 +0000)]
MFV r289298: 6286 ZFS internal error when set large block on bootfs
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Wed, 14 Oct 2015 07:45:44 +0000 (07:45 +0000)]
MFV r289296: 6288 dmu_buf_will_dirty could be faster
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
hiren [Wed, 14 Oct 2015 06:57:28 +0000 (06:57 +0000)]
Fix an unnecessarily aggressive behavior where mtu clamping begins on first
retransmission timeout (rto) when blackhole detection is enabled. Make
sure it only happens when the second attempt to send the same segment also fails
with rto.
Also make sure that each mtu probing stage (usually 1448 -> 1188 -> 524) follows
the same pattern and gets 2 chances (rto) before further clamping down.
Note: RFC4821 doesn't specify implementation details on how this situation
should be handled.
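The "two chances per stage" pattern can be illustrated with a small Python sketch. The stage values are the usual ones quoted above; the function name is hypothetical, and the real TCP code tracks this through its retransmit state rather than a simple counter of consecutive timeouts.

```python
PROBE_STAGES = [1448, 1188, 524]   # usual stages quoted in the commit message
RTOS_PER_STAGE = 2                 # each stage gets two chances before clamping


def probe_size_after_rtos(consecutive_rtos):
    """Return the segment size in effect after N consecutive RTOs."""
    stage = min(consecutive_rtos // RTOS_PER_STAGE, len(PROBE_STAGES) - 1)
    return PROBE_STAGES[stage]
```

A single timeout leaves the size at 1448; the second consecutive timeout clamps to 1188, and the fourth clamps to 524.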
bdrewery [Wed, 14 Oct 2015 05:50:16 +0000 (05:50 +0000)]
Fix support for building a PROG_CXX, and PROG, directly.
For example in lib/atf/libatf-c++/tests/detail it is now possible to
run 'make application_test'. This was intended to work for PROGS,
but lacked support for PROGS_CXX.
Also fix redefining the main PROG target to recurse. This isn't needed
since the main process is setting PROG/PROG_CXX to handle it directly
via bsd.prog.mk.
bdrewery [Wed, 14 Oct 2015 04:42:05 +0000 (04:42 +0000)]
Follow-up r288218 by ensuring common objects are built before recursing.
Some examples where this is a problem:
lib/atf/libatf-c++/tests/Makefile:SRCS.${_T}= ${_T}.cpp test_helpers.cpp
lib/atf/libatf-c++/tests/detail/Makefile:SRCS.${_T}= ${_T}.cpp test_helpers.cpp
lib/atf/libatf-c/tests/Makefile:SRCS.${_T}= ${_T}.c test_helpers.c
lib/atf/libatf-c/tests/detail/Makefile:SRCS.${_T}= ${_T}.c test_helpers.c
lib/libpam/libpam/tests/Makefile:SRCS.${test} = ${test}.c ${COMMONSRC}
A similar change may be needed for FILES, SCRIPTS, or INCS, but for now stay
with just SRCS.
bdrewery [Wed, 14 Oct 2015 02:37:30 +0000 (02:37 +0000)]
Replace the out-of-place includes/files/config handling in bsd.subdir.mk with
more typical ALL_SUBDIR_TARGETS entries and target hooks in bsd.incs.mk,
bsd.files.mk and bsd.confs.mk.
This allows the targets to be NOPs if unneeded and still work with the
shortcut 'make includes' to build and then install in a parallel-safe manner.
Sort and re-indent the ALL_SUBDIR_TARGETS with the new entries.
Enable Snoop from Primary to Secondary side on BAR23 and BAR45 on all
TLPs. Previously, Snoop was only enabled from Secondary to Primary
side. This can have a performance improvement on some workloads.
Also, make the code more obvious about how the link is being enabled.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
jeff [Wed, 14 Oct 2015 02:10:07 +0000 (02:10 +0000)]
Parallelize the buffer cache and rewrite getnewbuf(). This results in an
8x performance improvement in a micro benchmark on a 4 socket machine.
- Get buffer headers from a per-cpu uma cache that sits in front of the
free queue.
- Use a per-cpu quantum cache in vmem to eliminate contention for kva.
- Use multiple clean queues according to buffer cache size to eliminate
clean queue lock contention.
- Introduce a bufspace daemon that attempts to prevent getnewbuf() callers
from blocking or doing direct recycling.
- Close some bufspace allocation races that could lead to endless
recycling.
- Further the transition to a more modern style of small functions grouped
by prefix in order to manage growing complexity.
hiren [Wed, 14 Oct 2015 00:35:37 +0000 (00:35 +0000)]
There are times when it would be really nice to have a record of the last few
packets and/or state transitions from each TCP socket. That would help with
narrowing down certain problems we see in the field that are hard to reproduce
without understanding the history of how we got into a certain state. This
change provides just that.
It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is
destroyed, the list is freed. I thought this was likely to be more
performance-friendly than saving copies of the tcpcb. Plus, with the packets,
you should be able to reverse-engineer what happened to the tcpcb.
To enable the feature, you will need to compile a kernel with the TCPPCAP
option. Even then, the feature defaults to being deactivated. You can activate
it by setting a positive value for the number of captured packets. You can do
that on either a global basis or on a per-socket basis (via a setsockopt call).
There is no way to get the packets out of the kernel other than using kmem or
getting a coredump. I thought that would help some of the legal/privacy concerns
regarding such a feature. However, it should be possible to add a future effort
to export them in PCAP format.
I tested this at low scale, and found that there were no mbuf leaks and the peak
mbuf usage appeared to be unchanged with and without the feature.
The main performance concern I can envision is the number of mbufs that would be
used on systems with a large number of sockets. If you save five packets per
direction per socket and have 3,000 sockets, that will consume at least 30,000
mbufs just to keep these packets. I tried to reduce the concerns associated with
this by limiting the number of clusters (not mbufs) that could be used for this
feature. Again, in my testing, that appears to work correctly.
Differential Revision: D3100
Submitted by: Jonathan Looney <jlooney at juniper dot net>
Reviewed by: gnn, hiren
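The "last N packets" bookkeeping and the mbuf estimate above can be sketched in Python as follows. This is an illustration only: the kernel keeps mbuf chains in the tcpcb, not Python deques, and the class and function names are hypothetical.

```python
from collections import deque


class PacketHistory:
    """Keep copies of the last `limit` packets per direction; 0 disables it."""

    def __init__(self, limit):
        self.queues = {
            "in": deque(maxlen=limit),
            "out": deque(maxlen=limit),
        }

    def record(self, direction, packet):
        # Appending to a full deque silently drops the oldest entry.
        self.queues[direction].append(bytes(packet))


def min_mbufs(packets_per_direction, sockets, directions=2):
    """Lower bound on mbufs used, assuming one mbuf per saved packet."""
    return packets_per_direction * directions * sockets
```

With five packets per direction and 3,000 sockets, min_mbufs(5, 3000) gives the 30,000 figure quoted in the commit message.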
cem [Tue, 13 Oct 2015 23:42:13 +0000 (23:42 +0000)]
NTB: MFV fca4d518: Fix ntb_transport link down race
A WARN_ON is being hit in ntb_qp_link_work due to the NTB transport link
being down while the ntb qp link is still active. This is caused by the
transport link being brought down prior to the qp link worker thread
being terminated. To correct this, shutdown the qp's prior to bringing
the transport link down. Also, only call the qp worker thread if it is
in interrupt context, otherwise call the function directly.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
In the Xeon NTB-RP setup, the transparent side does not get a link up/down
interrupt. Since the presence of a NTB device on the transparent side
means that we have a NTB link up, we can work around the lack of an
interrupt by simply calling the link up function to notify the upper
layers.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Modifications to the 14th bit of the B2BDOORBELL register will not be
mirrored to the remote system due to a hardware issue. To get around
the issue, shrink the number of available doorbell bits by 1. The max
number of doorbells was being used as a way to reference the Link
Doorbell bit. Since this would no longer work, the driver must now
explicitly reference that bit.
This does not affect the xeon_errata_workaround case, as it is not using
the b2bdoorbell register.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
delphij [Tue, 13 Oct 2015 22:55:17 +0000 (22:55 +0000)]
Use chroot(2) instead of using prefixes for files.
Previously, the code prefixed the chroot path to actual file paths to
simulate the effect. This, however, will not work for tzset(3), which
expects the current system to have a working set of timezone data files,
and that is not always the case.
This changeset simplifies the handling of paths and uses an actual
chroot(2) call to implement the effect.
This commit does not actually add NTB-RP support. Mostly it serves to
shuffle code around to match the Linux driver. Original Linux commit
log follows:
Add support for Non-Transparent Bridge connected to a PCI-E Root Port on
the remote system (also known as NTB-RP mode). This allows for a NTB
enabled system to be connected to a non-NTB enabled system/slot.
Modifications to the registers and BARs/MWs on the Secondary side by the
remote system are reflected into registers on the Primary side for the
local system. Similarly, modifications of registers and BARs/MWs on
Primary side by the local system are reflected into registers on the
Secondary side for the Remote System. This allows communication between
the 2 sides via these registers and BARs/MWs.
Note: there is not a fix for the Xeon Errata (that was already worked
around in NTB-B2B mode) for NTB-RP mode. Due to this limitation, NTB-RP
will not work on the Secondary side with the Xeon Errata workaround
enabled. To get around this, disable the workaround via the
xeon_errata_workaround=0 modparm. However, this can cause the hang
described in the errata.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
Many variable names in the NTB driver refer to the primary or secondary
side. However, these variables will be used to access the reverse case
when in NTB-RP mode. Make these names more generic in anticipation of
NTB-RP support.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
bdrewery [Tue, 13 Oct 2015 19:42:57 +0000 (19:42 +0000)]
bsd.subdir.mk: Move all of the targets into ALL_SUBDIR_TARGETS.
Also improve documentation.
The SUBDIR_TARGETS variable should really be named LOCAL_SUBDIR_TARGETS, but
renaming it may be a surprise for downstream vendors who use this variable.
bdrewery [Tue, 13 Oct 2015 19:11:22 +0000 (19:11 +0000)]
bsd.subdir.mk: Handle cleanobj.
Before this, the target was unknown. Now it will recurse on subdirs and run
the target in the current directory. It is required to recurse as there
may be subdirs that have objs in their directory or in the object directory,
so it is not enough to just delete the objdir of the subdir parent.
bdrewery [Tue, 13 Oct 2015 18:52:56 +0000 (18:52 +0000)]
Partially revert r288266: Remove SUBDIR_PARALLEL from kerberos5/lib.
I intended to remove this before committing r288266. It works but is clearly
wrong and working by accident due to the dependencies listed in the root
Makefile.inc1 file.
bdrewery [Tue, 13 Oct 2015 18:23:51 +0000 (18:23 +0000)]
Simplify syscall generation and ABI source file handling for the build.
This is to make the Makefile more easily extendable for new ABIs.
This also makes several other subtle changes:
- The build now is given a list of ABIs to use based on the MACHINE_ARCH or
MACHINE_CPUARCH. These ABIs have a related path in sys/ that is used
to generate their syscalls. For each ABI to build, check for an
ABI.c, MACHINE_ARCH-ABI.c, or a MACHINE_CPUARCH-ABI.c. This matches
the old behavior needed for archs such as powerpc* and mips*.
- The ABI source file selection allows for simpler assignment of common
ABIs such as "fbsd32" from sys/compat/freebsd32, or cloudabi64.
- Expand 'fbsd' to 'freebsd' everywhere for consistency.
- Split out the powerpc-fbsd.c file into a powerpc64-freebsd32.c to be more
like the amd64-freebsd32.c file and to more easily allow the auto-generation
of ABI handling to work.
- Rename 'syscalls.h' to 'fbsd_syscalls.h' to lessen the ambiguity and
avoid confusion with syscall.h (such as in r288997).
- Non-native syscall header files are now renamed to
ABI_syscalls.h, where ABI is the ABI the Makefile is building.
- Remove all of the makesyscalls config files. The "native" one being
named i386.conf was a long outstanding bug. They were all the same
except for the data they generated, so now it is just auto-generated
as a build artifact.
- The syscalls array is now fixed to be static in the syscalls header to
remove the compiler warning about non-extern. This was worked around
in the aarch64-fbsd.c file but not the others.
- All syscall table names are now just 'syscallnames' since they don't
need to be different as they are all static in their own ABI files. The
alternative is to name them ABI_syscallnames which does not seem
necessary.
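The source-file lookup in the first bullet can be pictured with this Python sketch. The real logic lives in the Makefile; the function names here are hypothetical, and the precedence among the three candidate names (taken in the order the commit message lists them) is an assumption.

```python
def abi_source_candidates(abi, machine_arch, machine_cpuarch):
    """Candidate file names for an ABI, in the order the commit lists them."""
    return [
        f"{abi}.c",
        f"{machine_arch}-{abi}.c",
        f"{machine_cpuarch}-{abi}.c",
    ]


def pick_abi_source(abi, machine_arch, machine_cpuarch, existing):
    """Return the first candidate present in `existing`, or None."""
    for name in abi_source_candidates(abi, machine_arch, machine_cpuarch):
        if name in existing:
            return name
    return None
```

For example, with MACHINE_ARCH powerpc64 and ABI freebsd32, the lookup would settle on powerpc64-freebsd32.c if that file exists.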
The BWD NTB device will drop the link if an error is encountered on the
point-to-point PCI bridge. The link will stay down until all errors are
cleared and the link is re-established. On link down, check to see if
the error is detected, if so do the necessary housekeeping to try and
recover from the error and reestablish the link.
There is a potential race between the 2 NTB devices recovering at the
same time. If the times are synchronized, the link will not recover and
the driver will be stuck in this loop forever. Add a random interval to
the recovery time to prevent this race.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
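The random interval works like jitter in any retry scheme: if both ends wait exactly the same time they keep colliding, so each adds a random amount. A minimal Python sketch, with hypothetical names and units (the commit message does not give the actual base delay or jitter range):

```python
import random


def recovery_delay_ms(base_ms, max_jitter_ms, rng=random):
    """Base recovery delay plus a random extra in [0, max_jitter_ms]."""
    return base_ms + rng.randrange(max_jitter_ms + 1)
```

Two devices calling this independently will usually pick different delays, so one of them finishes recovery first and the link can re-establish.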
sbruno [Tue, 13 Oct 2015 17:00:14 +0000 (17:00 +0000)]
makefs(8) leaves sblock.fs_providersize uninitialized (zero), which can be easily
checked with dumpfs(8). This may lead to other problems, e.g. the geom_label kernel
module's sanity checks do not like a zero fs_old_size value and skip such UFS1
file systems while tasting (fs_old_size derives from sblock.fs_providersize).
dim [Tue, 13 Oct 2015 16:24:22 +0000 (16:24 +0000)]
Pull in r250085 from upstream llvm trunk (by Andrea Di Biagio):
[x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.
This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.
In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.
Example (-mattr=+avx):
%0 = trunc <8 x i32> %a to <8 x i23>
%1 = icmp eq <8 x i23> %0, zeroinitializer
The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
v8i16 = setcc t32, t25, seteq:ch
where t32 and t25 are now values of type v8i32.
Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
%vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.
This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.
This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.
This should fix the "Cannot emit physreg copy instruction" errors when
compiling contrib/wpa/src/common/ieee802_11_common.c, and CPUTYPE is set
to a CPU supporting AVX (e.g. sandybridge, ivybridge).
There is a Xeon hardware erratum related to writes to SDOORBELL or B2BDOORBELL
in conjunction with inbound access to NTB MMIO Space, which may hang the
system. To work around this issue, use one of the memory windows to access the
interrupt and scratch pad registers on the remote system. This bypasses the
issue, but removes one of the memory windows from use by the transport. This
reduction of MWs necessitates adding some logic to determine the number of
available MWs.
Since some NTB usage methodologies may have unidirectional traffic, the ability
to disable the workaround via modparm has been added.
See BF113 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-c5500-c3500-spec-update.pdf
See BT119 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
Due to ambiguous documentation, the USD/DSD identification is backward
when compared to the setting in BIOS. Correct the bits to match the
BIOS setting.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
cem [Tue, 13 Oct 2015 03:10:04 +0000 (03:10 +0000)]
NTB: MFV 87034511: Correct Number of Scratch Pad Registers
The NTB Xeon hardware has 16 scratch pad registers and 16 back-to-back
scratch pad registers. Correct the #define to represent this and update
the variable names to reflect their usage.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division