The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input.  It is safe to pre-load the destination register with -1 and drop the
conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x in nonzero even if it the value isn't known precisely, and in that case
it's safe to drop the preload of -1 too.

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
  Function                                     old     new   delta
  mask_write                                   114     107      -7
  xmem_pool_alloc                             1063    1039     -24

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
---
CC: Jan Beulich <jbeul...@suse.com>
CC: Roger Pau Monné <roger....@citrix.com>
CC: Wei Liu <w...@xen.org>
CC: Stefano Stabellini <sstabell...@kernel.org>
CC: Julien Grall <jul...@xen.org>
CC: Volodymyr Babchuk <volodymyr_babc...@epam.com>
CC: Bertrand Marquis <bertrand.marq...@arm.com>
CC: Michal Orzel <michal.or...@amd.com>
CC: Oleksii Kurochko <oleksii.kuroc...@gmail.com>
CC: Shawn Anastasio <sanasta...@raptorengineering.com>
CC: consult...@bugseng.com <consult...@bugseng.com>
CC: Simone Ballarin <simone.balla...@bugseng.com>
CC: Federico Serafini <federico.seraf...@bugseng.com>
CC: Nicola Vetrini <nicola.vetr...@bugseng.com>

v2:
 * New.
 * Use __builtin_constant_p(x > 0) to optimise better.
---
 xen/arch/x86/include/asm/bitops.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 122767fc0d10..1d7aea6065ef 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    unsigned int r;
+
+    if ( __builtin_constant_p(x > 0) && x > 0 )
+    {
+        /* Safe, when the compiler knows that x is nonzero. */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x) );
+    }
+    else
+    {
+        /*
+         * The AMD manual states that BSF won't modify the destination
+         * register if x=0.  The Intel manual states that the result is
+         * undefined, but the architects have said that the register is
+         * written back with it's old value (zero extended as normal).
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x), "[res]" (-1) );
+    }
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
 #define arch_ffs arch_ffs
-- 
2.30.2


Reply via email to