Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-31 Thread linux
> static inline int new_find_first_bit(const unsigned long *b, unsigned size) > { > int x = 0; > do { > unsigned long v = *b++; > if (v) > return __ffs(v) + x; > if (x >= size) > break; >

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-31 Thread Richard Kennedy
Hi, FWIW the following routine is consistently slightly faster using Steven's test harness, with a big win when no bit is set. static inline int new_find_first_bit(const unsigned long *b, unsigned size) { int x = 0; do { unsigned long v = *b++; if (v)
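The routine quoted above is cut off by the archive, so the following is only a minimal sketch of the same word-at-a-time idea, not the code that was posted. The names `sketch_find_first_bit`, `sketch_ffs_word` and `SKETCH_BITS_PER_LONG` are hypothetical, a portable shift loop stands in for the kernel's `__ffs()`, and returning `size` when no bit is set is an assumption made for the sketch.

```c
#include <limits.h>

/* Hypothetical user-space stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Zero-based index of the lowest set bit in v.
 * Portable stand-in for the kernel's __ffs(); v must be non-zero.
 */
static inline unsigned sketch_ffs_word(unsigned long v)
{
    unsigned n = 0;

    while ((v & 1UL) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Scan the bitmap one word at a time and return the index of the first
 * set bit. Assumption for this sketch: return `size` if no bit below
 * `size` is set.
 */
static inline unsigned sketch_find_first_bit(const unsigned long *b,
                                             unsigned size)
{
    unsigned x;

    for (x = 0; x < size; x += SKETCH_BITS_PER_LONG) {
        unsigned long v = *b++;

        if (v) {
            unsigned bit = x + sketch_ffs_word(v);

            /* Ignore stray bits past the end of the bitmap. */
            return bit < size ? bit : size;
        }
    }
    return size;
}
```

Because the all-zero case does nothing per word beyond a load and a compare, a loop of this shape would plausibly account for the "big win" reported when no bit is set.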

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Fri, 29 Jul 2005, Linus Torvalds wrote: > It has another downside too: it's extra complexity and potential for bugs > in the compiler. And if you tell me gcc people never have bugs, I will > laugh in your general direction. You mean these that have been sitting in their Bugzilla for some th

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Linus Torvalds
On Fri, 29 Jul 2005, David Woodhouse wrote: > > On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote: > > Basic rule: inline assembly is _better_ than random compiler extensions. > > It's better to have _one_ well-documented extension that is very generic > > than it is to have a thousand sp

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Linus Torvalds
On Fri, 29 Jul 2005, Maciej W. Rozycki wrote: > > Hmm, that's what's in the GCC info pages for the relevant functions > (I've omitted the "l" and "ll" variants): > > "-- Built-in Function: int __builtin_ffs (unsigned int x) > Returns one plus the index of the least significant 1-bit of X
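To make the indexing convention in the quoted GCC documentation concrete, here is a small sketch contrasting the one-based result of `__builtin_ffs()` with a zero-based count such as `__builtin_ctz()`. It assumes a GCC-compatible compiler; the wrapper names `one_based_ffs` and `zero_based_ffs` are hypothetical and exist only for illustration.

```c
/*
 * One-based index of the least significant set bit.
 * GCC documents __builtin_ffs() as returning 0 when the argument is 0.
 */
static inline int one_based_ffs(unsigned int x)
{
    return __builtin_ffs((int)x);
}

/*
 * Zero-based index of the least significant set bit.
 * __builtin_ctz() is undefined for 0, so callers must check x first.
 */
static inline int zero_based_ffs(unsigned int x)
{
    return __builtin_ctz(x);
}
```

For any non-zero `x`, `one_based_ffs(x) - 1 == zero_based_ffs(x)`, so code written against a zero-based helper needs that adjustment, plus an explicit zero check, before it can be switched to the one-based builtin.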

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Thu, 28 Jul 2005, Linus Torvalds wrote: > > Since you're considering GCC-generated code for ffs(), ffz() and friends, > > how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as > > appropriate? > > Please don't. Try again in three years when everybody has them. Well, __bu

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread linux-os (Dick Johnson)
On Fri, 29 Jul 2005 [EMAIL PROTECTED] wrote: >> OK, I guess when I get some time, I'll start testing all the i386 bitop >> functions, comparing the asm with the gcc versions. Now could someone >> explain to me what's wrong with testing hot cache code. Can one >> instruction retrieve from memory

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Fri, 29 Jul 2005, David Woodhouse wrote: > Builtins are more portable and their implementation will improve to > match developments in the target CPU. Inline assembly, as we have seen, > remains the same for years while the technology moves on. > > Although it's often the case that inline asse

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Thu, 28 Jul 2005, Linus Torvalds wrote: > There may be more upsides on other architectures (*cough*ia64*cough*) that > have strange scheduling issues and other complexities, but on x86 in > particular, the __builtin_xxx() functions tend to be a lot more pain than > they are worth. Not only d

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread linux
> OK, I guess when I get some time, I'll start testing all the i386 bitop > functions, comparing the asm with the gcc versions. Now could someone > explain to me what's wrong with testing hot cache code. Can one > instruction retrieve from memory better than others? To add one to Linus' list, not

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread David Woodhouse
On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote: > Basic rule: inline assembly is _better_ than random compiler extensions. > It's better to have _one_ well-documented extension that is very generic > than it is to have a thousand specialized extensions. Counterexample: FR-V and its __bu

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote: > > OK, I guess when I get some time, I'll start testing all the i386 bitop > functions, comparing the asm with the gcc versions. Now could someone > explain to me what's wrong with testing hot cache code. Can one > instruction retrieve from memory bet

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote: > On Thu, 28 Jul 2005, Steven Rostedt wrote: > > > I've been playing with different approaches, (still all hot cache > > though), and inspecting the generated code. It's not that the gcc > > generated code is always better for the normal

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Mitchell Blank Jr
Steven Rostedt wrote: > In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO > configurable" I discovered that a C version of find_first_bit is faster > than the asm version There are probably other cases of this in asm-i386/bitops.h. For instance I think the "btl" instruction is pr

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote: > > I can change the find_first_bit to use __builtin_ffs, but how would you > implement the ffz? The thing is, there are basically _zero_ upsides to using the __builtin_xx functions on x86. There may be more upsides on other architectures (*cough*ia6

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Maciej W. Rozycki wrote: > > Since you're considering GCC-generated code for ffs(), ffz() and friends, > how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as > appropriate? Please don't. Try again in three years when everybody has them.

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Maciej W. Rozycki
On Thu, 28 Jul 2005, Steven Rostedt wrote: > I've been playing with different approaches, (still all hot cache > though), and inspecting the generated code. It's not that the gcc > generated code is always better for the normal case. But since it sees > more and everything is not hidden in asm, it

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote: > Since you're considering GCC-generated code for ffs(), ffz() and friends, > how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as > appropriate? Reasonably recent GCC may actually be good enough to use the > faste

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
On Thu, 2005-07-28 at 08:30 -0700, Linus Torvalds wrote: > > I suspect the old "rep scas" has always been slower than > compiler-generated code, at least under your test conditions. Many of the > old asm's are actually _very_ old, and some of them come from pre-0.01 > days and are more about me

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote: > > The 32 looks like it may be problematic. Are there any i386 64-bit > machines? Or is hard coding 32 OK? We have BITS_PER_LONG exactly for this usage, but the sizeof also works. Linus
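As a rough illustration of the point about not hard-coding 32, the sketch below derives the word width from `sizeof` and uses it for bitmap arithmetic. `DEMO_BITS_PER_LONG`, `demo_bitmap_words` and `demo_test_bit` are hypothetical user-space names; kernel code would use `BITS_PER_LONG` directly.

```c
#include <limits.h>

/* Hypothetical user-space equivalent of BITS_PER_LONG, built from sizeof. */
#define DEMO_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Number of unsigned long words needed to hold nbits bits. */
static inline unsigned demo_bitmap_words(unsigned nbits)
{
    return (nbits + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
}

/* Return 1 if bit nr is set in the bitmap, 0 otherwise. */
static inline int demo_test_bit(const unsigned long *map, unsigned nr)
{
    return (int)((map[nr / DEMO_BITS_PER_LONG] >>
                  (nr % DEMO_BITS_PER_LONG)) & 1UL);
}
```

The same source then produces correct word and bit offsets whether `unsigned long` is 32 or 64 bits wide, which a literal 32 would not.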

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote: > > In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO > configurable" I discovered that a C version of find_first_bit is faster > than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1 > (both from versions of Debian unsta

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
[snip] > static inline int find_first_bit(const unsigned long *addr, unsigned size) > { [snip] > + int x = 0; > + do { > + if (*addr) > + return __ffs(*addr) + x; > + addr++; > + if (x >= size) > + break; > +

[PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable" I discovered that a C version of find_first_bit is faster than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1 (both from versions of Debian unstable). I wrote a benchmark (attached) that runs the cod