From: Andrew Cooper
Date: Thu, 14 Mar 2024 23:31:11 +0000 (+0000)
Subject: x86/bitops: Improve arch_ffs() in the general case
X-Git-Tag: 4.19.0-rc1~96
X-Git-Url: http://xenbits.xensource.com/gitweb?a=commitdiff_plain;h=989e5f08d33e2642d25fd60d7b373d3d8ff39990;p=xen.git

x86/bitops: Improve arch_ffs() in the general case

The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a
zero input.  It is safe to pre-load the destination register with -1 and
drop the conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x is non-zero even if the value isn't known precisely.  In this case,
it's safe to drop the preload of -1 too.

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-92 (-92)
  Function                                     old     new   delta
  mask_write                                   121     113      -8
  xmem_pool_alloc                             1076    1056     -20
  test_bitops                                  390     358     -32
  pt_update_contig_markers                    1236    1204     -32

Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
Release-acked-by: Oleksii Kurochko
---

diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 13e3730138..a8e70d7ed7 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -420,12 +420,39 @@ static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    unsigned int r;
+
+    if ( __builtin_constant_p(x > 0) && x > 0 )
+    {
+        /*
+         * A common code pattern is:
+         *
+         *     while ( bits )
+         *     {
+         *         bit = ffs(bits);
+         *         ...
+         *
+         * and the optimiser really can work with the knowledge of x being
+         * non-zero without knowing its exact value, in which case we don't
+         * need to compensate for BSF's corner cases.  Otherwise...
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x) );
+    }
+    else
+    {
+        /*
+         * ... the AMD manual states that BSF won't modify the destination
+         * register if x=0.  The Intel manual states that the result is
+         * undefined, but the architects have said that the register is
+         * written back with its old value (zero extended as normal).
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x), "[res]" (-1) );
+    }
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
 #define arch_ffs arch_ffs