Use the glibc cortex-strings work authored by Linaro as the base for
new copy to/from user kernel routines.
Iperf performance increase:
             -l (size)    1 core result
Optimized    64B          44-51Mb/s
             1500B        4.9Gb/s
             30000B       16.2Gb/s
Original     64B          34-50.7Mb/s
             1500B        4.7Gb/s
             30000B       14.5Gb/s
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1400349
Note: I made one change, moving the tst so that it sits directly before
the conditional branch, for better optimization on ThunderX.
Signed-off-by: Craig Magina <craig.magina@canonical.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
USER(12f, stp B_l, B_h, [dst, #16])
USER(12f, stp C_l, C_h, [dst, #32])
USER(12f, stp D_l, D_h, [dst, #48])
- tst count, #0x3f
add src, src, #64
add dst, dst, #64
+ tst count, #0x3f
b.ne .Ltail63
b .Lsuccess