The asm in arch_ffs() is safe but inefficient. CMOV would be an improvement over a conditional branch, but for 64bit CPUs both Intel and AMD have provided enough details about the behaviour for a zero input. It is safe to pre-load the destination register with -1 and drop the conditional logic.
However, it is common to find ffs() in a context where the optimiser knows that x in nonzero even if it the value isn't known precisely, and in that case it's safe to drop the preload of -1 too. There are only a handful of uses of ffs() in the x86 build, and all of them improve as a result of this: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31) Function old new delta mask_write 114 107 -7 xmem_pool_alloc 1063 1039 -24 Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> --- CC: Jan Beulich <jbeul...@suse.com> CC: Roger Pau Monné <roger....@citrix.com> CC: Wei Liu <w...@xen.org> CC: Stefano Stabellini <sstabell...@kernel.org> CC: Julien Grall <jul...@xen.org> CC: Volodymyr Babchuk <volodymyr_babc...@epam.com> CC: Bertrand Marquis <bertrand.marq...@arm.com> CC: Michal Orzel <michal.or...@amd.com> CC: Oleksii Kurochko <oleksii.kuroc...@gmail.com> CC: Shawn Anastasio <sanasta...@raptorengineering.com> CC: consult...@bugseng.com <consult...@bugseng.com> CC: Simone Ballarin <simone.balla...@bugseng.com> CC: Federico Serafini <federico.seraf...@bugseng.com> CC: Nicola Vetrini <nicola.vetr...@bugseng.com> v2: * New. * Use __builtin_constant_p(x > 0) to optimise better. --- xen/arch/x86/include/asm/bitops.h | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h index 122767fc0d10..1d7aea6065ef 100644 --- a/xen/arch/x86/include/asm/bitops.h +++ b/xen/arch/x86/include/asm/bitops.h @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) static always_inline unsigned int arch_ffs(unsigned int x) { - int r; + unsigned int r; + + if ( __builtin_constant_p(x > 0) && x > 0 ) + { + /* Safe, when the compiler knows that x is nonzero. */ + asm ( "bsf %[val], %[res]" + : [res] "=r" (r) + : [val] "rm" (x) ); + } + else + { + /* + * The AMD manual states that BSF won't modify the destination + * register if x=0. The Intel manual states that the result is + * undefined, but the architects have said that the register is + * written back with it's old value (zero extended as normal). + */ + asm ( "bsf %[val], %[res]" + : [res] "=r" (r) + : [val] "rm" (x), "[res]" (-1) ); + } - asm ( "bsf %1,%0\n\t" - "jnz 1f\n\t" - "mov $-1,%0\n" - "1:" : "=r" (r) : "rm" (x)); return r + 1; } #define arch_ffs arch_ffs -- 2.30.2