Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-31 Thread linux
> static inline int new_find_first_bit(const unsigned long *b, unsigned size)
> {
>     int x = 0;
>     do {
>         unsigned long v = *b++;
>         if (v)
>             return __ffs(v) + x;
>         if (x >= size)
>             break;
>
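The quoted routine is cut off before its loop condition and exit path. As a hedged sketch only, here is one way the same word-at-a-time idea could be completed in plain C: each word is loaded once, as in the quoted version, but the loop is bounded by size and steps by BITS_PER_LONG, the replacement for a hard-coded 32 that Linus suggests elsewhere in the thread. sketch_ffs0() is a hypothetical stand-in for the kernel's __ffs(), built on GCC's __builtin_ctzl(); the names and the "return size when nothing is set" convention are assumptions, not the code posted to the list.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: local stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for the kernel's __ffs(): the 0-based index of
 * the lowest set bit. Like __ffs(), it has no meaningful result for 0. */
static inline unsigned sketch_ffs0(unsigned long v)
{
    assert(v != 0);
    return (unsigned)__builtin_ctzl(v);
}

/* Index of the first set bit in bitmap b[] within [0, size), or size if
 * none is set. Each word is loaded exactly once, and no word beyond the
 * one holding bit size - 1 is read. */
static inline unsigned sketch_find_first_bit(const unsigned long *b,
                                             unsigned size)
{
    unsigned x;

    for (x = 0; x < size; x += BITS_PER_LONG) {
        unsigned long v = *b++;
        if (v) {
            unsigned bit = x + sketch_ffs0(v);
            return bit < size ? bit : size; /* ignore bits at or past size */
        }
    }
    return size;
}
```

For example, with only bit 5 of map[2] set, sketch_find_first_bit(map, 4 * BITS_PER_LONG) returns 2 * BITS_PER_LONG + 5, and an all-zero bitmap returns size.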

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-31 Thread Richard Kennedy
Hi,
FWIW the following routine is consistently slightly faster using Steven's test harness, with a big win when no bit is set.
static inline int new_find_first_bit(const unsigned long *b, unsigned size)
{
    int x = 0;
    do {
        unsigned long v = *b++;
        if (v)

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Fri, 29 Jul 2005, Linus Torvalds wrote:
> It has another downside too: it's extra complexity and potential for bugs
> in the compiler. And if you tell me gcc people never have bugs, I will
> laugh in your general direction.
You mean these that have been sitting in their Bugzilla for some

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Linus Torvalds
On Fri, 29 Jul 2005, David Woodhouse wrote:
>
> On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> > Basic rule: inline assembly is _better_ than random compiler extensions.
> > It's better to have _one_ well-documented extension that is very generic
> > than it is to have a thousand
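For concreteness, a hedged sketch of the two styles being argued about, applied to the same job (the 0-based index of the lowest set bit, which is what the kernel's __ffs() computes): GCC extended inline assembly around the x86 bsf instruction, and GCC's generic __builtin_ctzl(). The name sketch_ffs is hypothetical; the asm path is compiled only on x86, and other targets fall back to the builtin. It is an illustration, not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical: 0-based index of the lowest set bit of word.
 * word must be nonzero (bsf leaves its destination undefined for 0). */
static inline unsigned long sketch_ffs(unsigned long word)
{
    assert(word != 0);
#if defined(__i386__) || defined(__x86_64__)
    /* Inline asm: one generic extension wrapping a single instruction.
     * AT&T order is "bsf source, destination". */
    __asm__("bsf %1, %0" : "=r"(word) : "rm"(word));
    return word;
#else
    /* Builtin: a compiler-provided, operation-specific extension. */
    return (unsigned long)__builtin_ctzl(word);
#endif
}
```

Either path returns the same value for a nonzero word; the disagreement in the thread is about which extension the kernel should depend on, not about the result.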

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Linus Torvalds
On Fri, 29 Jul 2005, Maciej W. Rozycki wrote:
>
> Hmm, that's what's in the GCC info pages for the relevant functions
> (I've omitted the "l" and "ll" variants):
>
> "-- Built-in Function: int __builtin_ffs (unsigned int x)
> Returns one plus the index of the least significant 1-bit of
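GCC documents __builtin_ffs() as 1-based and returning 0 for a zero argument, while the kernel's __ffs() is 0-based and undefined for zero, so one cannot simply be swapped in for the other. A minimal sketch of the relationship, assuming GCC or Clang; ffs_1based and kernel_style_ffs are hypothetical names.

```c
#include <assert.h>

/* GCC: one plus the index of the least significant 1-bit of x,
 * or zero if x is zero. */
static inline int ffs_1based(int x)
{
    return __builtin_ffs(x);
}

/* Hypothetical kernel-style __ffs(): 0-based, caller must pass x != 0.
 * For nonzero x this equals __builtin_ffs(x) - 1 and __builtin_ctz(x). */
static inline unsigned kernel_style_ffs(unsigned x)
{
    assert(x != 0);
    return (unsigned)__builtin_ctz(x);
}
```

For nonzero x the two agree up to the off-by-one; the real difference is entirely in how zero is handled.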

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> > Since you're considering GCC-generated code for ffs(), ffz() and friends,
> > how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> > appropriate?
>
> Please don't. Try again in three years when everybody has them.
Well,

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread linux-os (Dick Johnson)
On Fri, 29 Jul 2005 [EMAIL PROTECTED] wrote:
>> OK, I guess when I get some time, I'll start testing all the i386 bitop
>> functions, comparing the asm with the gcc versions. Now could someone
>> explain to me what's wrong with testing hot cache code. Can one
>> instruction retrieve from memory

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Fri, 29 Jul 2005, David Woodhouse wrote:
> Builtins are more portable and their implementation will improve to
> match developments in the target CPU. Inline assembly, as we have seen,
> remains the same for years while the technology moves on.
>
> Although it's often the case that inline

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread Maciej W. Rozycki
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> There may be more upsides on other architectures (*cough*ia64*cough*) that
> have strange scheduling issues and other complexities, but on x86 in
> particular, the __builtin_xxx() functions tend to be a lot more pain than
> they are worth. Not only

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread linux
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory better than others?
To add one to Linus' list,

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-29 Thread David Woodhouse
On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> Basic rule: inline assembly is _better_ than random compiler extensions.
> It's better to have _one_ well-documented extension that is very generic
> than it is to have a thousand specialized extensions.
Counterexample: FR-V and its

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> > I've been playing with different approaches, (still all hot cache
> > though), and inspecting the generated code. It's not that the gcc
> > generated code is always better for the normal

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Mitchell Blank Jr
Steven Rostedt wrote:
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version
There are probably other cases of this in asm-i386/bitops.h. For instance I think the "btl" instruction is
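The message is truncated before it says what is wrong with btl, so the following is only a hedged illustration of the kind of comparison being suggested: a plain-C bit test built from a shift and a mask, which the compiler can schedule and combine with surrounding code, in place of an asm btl. sketch_test_bit and the local BITS_PER_LONG definition are assumptions made for a self-contained example, not the contents of asm-i386/bitops.h.

```c
#include <limits.h>

/* Assumption: local stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical plain-C test_bit(): returns 1 if bit nr of the bitmap at
 * addr is set, 0 otherwise. The word index and bit offset are computed
 * in C, so nothing is hidden from the compiler in an asm block. */
static inline int sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
    return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}
```

Whether this actually beats btl on a given CPU is exactly what would need measuring; the sketch only shows the C side of such a comparison.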

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> I can change the find_first_bit to use __builtin_ffs, but how would you
> implement the ffz?
The thing is, there are basically _zero_ upsides to using the __builtin_xx functions on x86. There may be more upsides on other architectures
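One common answer to Steven's question, offered here only as a hedged sketch: "find first zero" is "find first set" applied to the complement, so ffz(x) can be written as the 0-based first-set index of ~x. As with the kernel's ffz(), the result is undefined when no bit is zero, which the sketch turns into an assertion. sketch_ffz is a hypothetical name, and __builtin_ctzl() assumes GCC or Clang.

```c
#include <assert.h>

/* Hypothetical ffz(): 0-based index of the lowest clear bit in word.
 * Undefined (here: asserted) when every bit is set. */
static inline unsigned long sketch_ffz(unsigned long word)
{
    assert(word != ~0UL);
    return (unsigned long)__builtin_ctzl(~word);
}
```

Callers that may see an all-ones word would check against ~0UL first, which is the usual contract for ffz().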

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Maciej W. Rozycki wrote:
>
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate?
Please don't. Try again in three years when everybody has them.

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Maciej W. Rozycki
On Thu, 28 Jul 2005, Steven Rostedt wrote:
> I've been playing with different approaches, (still all hot cache
> though), and inspecting the generated code. It's not that the gcc
> generated code is always better for the normal case. But since it sees
> more and everything is not hidden in asm,

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate? Reasonably recent GCC may actually be good enough to use the
>

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
On Thu, 2005-07-28 at 08:30 -0700, Linus Torvalds wrote:
>
> I suspect the old "rep scas" has always been slower than
> compiler-generated code, at least under your test conditions. Many of the
> old asm's are actually _very_ old, and some of them come from pre-0.01
> days and are more about

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> The 32 looks like it may be problematic. Are there any i386 64-bit
> machines? Or is hard coding 32 OK?
We have BITS_PER_LONG exactly for this usage, but the sizeof also works.
Linus

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Linus Torvalds
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
> (both from versions of Debian

Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
[snip]
> static inline int find_first_bit(const unsigned long *addr, unsigned size)
> {
[snip]
> +    int x = 0;
> +    do {
> +        if (*addr)
> +            return __ffs(*addr) + x;
> +        addr++;
> +        if (x >= size)
> +            break;
> +

[PATCH] speed up on find_first_bit for i386 (let compiler do the work)

2005-07-28 Thread Steven Rostedt
In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable" I discovered that a C version of find_first_bit is faster than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1 (both from versions of Debian unstable). I wrote a benchmark (attached) that runs the
