[PATCH v5] lib: optimize cpumask_next_and()

2017-11-30 Thread Clement Courbet
> So I think it is really worth being a separate patch. Really, it's
> completely nontrivial why adding a new function in lib/find_bit.c
> requires including asm-generic/bitops/find.h in the arm and unicore32
> asm/bitops.h headers (bug?). And why doing that makes you guard
> find_first_bit and find_first_zer
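For context, the guarding being questioned follows the usual lib/find_bit.c pattern: a generic C helper is compiled only when the architecture has not already provided its own definition as a macro. A minimal sketch of that pattern, assuming the conventional #ifndef convention (the function body is only a placeholder, not the kernel's actual code):

/* Sketch of the guard pattern discussed above: if an arch header
 * (pulled in e.g. via asm-generic/bitops/find.h) already defines
 * find_first_bit as a macro, the generic version is compiled out. */
#ifndef find_first_bit
unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
{
	unsigned long idx;

	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
		if (addr[idx])
			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
	}
	return size;
}
#endif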

Re: [PATCH v5] lib: optimize cpumask_next_and()

2017-11-29 Thread Yury Norov
On Wed, Nov 29, 2017 at 10:35:55AM +0100, Clement Courbet wrote:
> > > Note that on Arm (), the new c implementation still outperforms the
> > > old one that uses c+ the asm implementation of `find_next_bit` [3].
> > What is 'c+'? Is it a typo?
> > I meant "a mix of C and asm" ~ (C + asm). Rephrased.

[PATCH v5] lib: optimize cpumask_next_and()

2017-11-29 Thread Clement Courbet
> > Note that on Arm (), the new c implementation still outperforms the
> > old one that uses c+ the asm implementation of `find_next_bit` [3].
> What is 'c+'? Is it a typo?

I meant "a mix of C and asm" ~ (C + asm). Rephrased.

> If you find generic find_bit() on arm faster than the asm one, we'd
> defin

Re: [PATCH v5] lib: optimize cpumask_next_and()

2017-11-28 Thread Yury Norov
NACK. I'm sorry, but it seems you have to send v6. See comments inline.

On Tue, Nov 28, 2017 at 02:13:34PM +0100, Clement Courbet wrote:
> We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
> It's essentially a joined iteration in search for a non-zero bit, which
> is curr
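For reference, the "lookup join" quoted here corresponds to the pre-patch shape of cpumask_next_and(). Roughly (a simplified sketch, not a verbatim copy of the old source):

/* Old "lookup join", sketched: walk the set bits of src1p and probe
 * src2p for each candidate until one is set in both masks. */
int cpumask_next_and(int n, const struct cpumask *src1p,
		     const struct cpumask *src2p)
{
	while ((n = cpumask_next(n, src1p)) < nr_cpu_ids)
		if (cpumask_test_cpu(n, src2p))
			break;
	return n;
}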

[PATCH v5] lib: optimize cpumask_next_and()

2017-11-28 Thread Clement Courbet
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and(). It's essentially a joined iteration in search for a non-zero bit, which is currently implemented as a lookup join (find a nonzero bit on the lhs, lookup the rhs to see if it's set there). Implement a direct join (find a nonz
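The direct join boils down to ANDing corresponding words of the two bitmaps and taking the first set bit of the result, so both masks are scanned in a single pass. A minimal sketch of that idea (the helper name next_and_bit is illustrative; the actual patch adds a new helper to lib/find_bit.c and covers more cases):

/* Sketch of the "direct join": AND corresponding words of the two
 * bitmaps and return the position of the first set bit of the result.
 * Simplified illustration, not the patch itself. */
static unsigned long next_and_bit(const unsigned long *addr1,
				  const unsigned long *addr2,
				  unsigned long nbits, unsigned long start)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	tmp = addr1[start / BITS_PER_LONG] & addr2[start / BITS_PER_LONG];
	tmp &= BITMAP_FIRST_WORD_MASK(start);	/* drop bits below 'start' */
	start = round_down(start, BITS_PER_LONG);

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr1[start / BITS_PER_LONG] & addr2[start / BITS_PER_LONG];
	}

	return min(start + __ffs(tmp), nbits);
}

cpumask_next_and() can then make a single call to such a helper instead of bouncing between the two masks bit by bit.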