> static inline int new_find_first_bit(const unsigned long *b, unsigned size)
> {
> int x = 0;
> do {
> unsigned long v = *b++;
> if (v)
> return __ffs(v) + x;
> if (x >= size)
> break;
>
Hi,
FWIW the following routine is consistently slightly faster using
Steven's test harness, with a big win when no bit is set.
static inline int new_find_first_bit(const unsigned long *b, unsigned size)
{
int x = 0;
do {
unsigned long v = *b++;
if (v)
> static inline int new_find_first_bit(const unsigned long *b, unsigned size)
> {
> int x = 0;
> do {
> unsigned long v = *b++;
> if (v)
> return __ffs(v) + x;
> if (x >= size)
> break;
> x += 32;
On Fri, 29 Jul 2005, Linus Torvalds wrote:
> It has another downside too: it's extra complexity and potential for bugs
> in the compiler. And if you tell me gcc people never have bugs, I will
> laugh in your general direction.
You mean these that have been sitting in their Bugzilla for some
On Fri, 29 Jul 2005, David Woodhouse wrote:
>
> On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> > Basic rule: inline assembly is _better_ than random compiler extensions.
> > It's better to have _one_ well-documented extension that is very generic
> > than it is to have a thousand
On Fri, 29 Jul 2005, Maciej W. Rozycki wrote:
>
> Hmm, that's what's in the GCC info pages for the relevant functions
> (I've omitted the "l" and "ll" variants):
>
> "-- Built-in Function: int __builtin_ffs (unsigned int x)
> Returns one plus the index of the least significant 1-bit of
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> > Since you're considering GCC-generated code for ffs(), ffz() and friends,
> > how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> > appropriate?
>
> Please don't. Try again in three years when everybody has them.
Well,
On Fri, 29 Jul 2005 [EMAIL PROTECTED] wrote:
>> OK, I guess when I get some time, I'll start testing all the i386 bitop
>> functions, comparing the asm with the gcc versions. Now could someone
>> explain to me what's wrong with testing hot cache code. Can one
>> instruction retrieve from memory
On Fri, 29 Jul 2005, David Woodhouse wrote:
> Builtins are more portable and their implementation will improve to
> match developments in the target CPU. Inline assembly, as we have seen,
> remains the same for years while the technology moves on.
>
> Although it's often the case that inline
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> There may be more upsides on other architectures (*cough*ia64*cough*) that
> have strange scheduling issues and other complexities, but on x86 in
> particular, the __builtin_xxx() functions tend to be a lot more pain than
> they are worth. Not only
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory better than others?
To add one to Linus' list,
On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> Basic rule: inline assembly is _better_ than random compiler extensions.
> It's better to have _one_ well-documented extension that is very generic
> than it is to have a thousand specialized extensions.
Counterexample: FR-V and its
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory better than others?
To add one to Linus' list, note
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> There may be more upsides on other architectures (*cough*ia64*cough*) that
> have strange scheduling issues and other complexities, but on x86 in
> particular, the __builtin_xxx() functions tend to be a lot more pain than
> they are worth. Not only do
On Fri, 29 Jul 2005, David Woodhouse wrote:
> Builtins are more portable and their implementation will improve to
> match developments in the target CPU. Inline assembly, as we have seen,
> remains the same for years while the technology moves on.
>
> Although it's often the case that inline assembly
On Fri, 29 Jul 2005 [EMAIL PROTECTED] wrote:
>> OK, I guess when I get some time, I'll start testing all the i386 bitop
>> functions, comparing the asm with the gcc versions. Now could someone
>> explain to me what's wrong with testing hot cache code. Can one
>> instruction retrieve from memory better
On Fri, 29 Jul 2005, Maciej W. Rozycki wrote:
>
> Hmm, that's what's in the GCC info pages for the relevant functions
> (I've omitted the "l" and "ll" variants):
>
> "-- Built-in Function: int __builtin_ffs (unsigned int x)
> Returns one plus the index of the least significant 1-bit of X, or
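For reference, a small sketch of the 1-based convention that the quoted GCC documentation describes. As GCC documents these builtins, __builtin_ffs() returns zero for a zero argument, while __builtin_ctz() is undefined there. The wrapper name sketch_ffs_via_ctz is hypothetical, and the code assumes GCC or a compatible compiler.

```c
/* Sketch only: assumes GCC or a compatible compiler that provides the
 * __builtin_ffs() and __builtin_ctz() builtins. */

/* Hypothetical wrapper: rebuilds the 1-based ffs() result from the
 * 0-based __builtin_ctz(), guarding the zero case that ctz leaves
 * undefined. */
static inline int sketch_ffs_via_ctz(unsigned int x)
{
    return x ? __builtin_ctz(x) + 1 : 0;
}
```

Note the off-by-one against the kernel's __ffs(), which is 0-based and has no defined result for a zero word.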
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> > I've been playing with different approaches, (still all hot cache
> > though), and inspecting the generated code. It's not that the gcc
> > generated code is always better for the normal
Steven Rostedt wrote:
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version
There are probably other cases of this in asm-i386/bitops.h. For instance
I think the "btl" instruction is
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> I can change the find_first_bit to use __builtin_ffs, but how would you
> implement the ffz?
The thing is, there are basically _zero_ upsides to using the __builtin_xx
functions on x86.
There may be more upsides on other architectures
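On the ffz question quoted above, one common way to express "find first zero" is as "find first set" on the complemented word. The sketch below shows that identity with a plain C loop rather than any builtin or asm; the names sketch_lowest_set_bit and sketch_ffz are hypothetical and are not the kernel's implementation.

```c
/* Hypothetical stand-in for the kernel's __ffs(): 0-based index of the
 * lowest set bit. The result is undefined for word == 0, so this
 * sketch requires a nonzero argument. */
static inline unsigned long sketch_lowest_set_bit(unsigned long word)
{
    unsigned long i = 0;
    while (!(word & 1UL)) {
        word >>= 1;
        i++;
    }
    return i;
}

/* First zero bit of word == first set bit of its complement.
 * Requires at least one zero bit in word (word != ~0UL). */
static inline unsigned long sketch_ffz(unsigned long word)
{
    return sketch_lowest_set_bit(~word);
}
```

The caller has to rule out an all-ones word first, just as a caller of __ffs() has to rule out zero.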
On Thu, 28 Jul 2005, Maciej W. Rozycki wrote:
>
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate?
Please don't. Try again in three years when everybody has them.
On Thu, 28 Jul 2005, Steven Rostedt wrote:
> I've been playing with different approaches, (still all hot cache
> though), and inspecting the generated code. It's not that the gcc
> generated code is always better for the normal case. But since it sees
> more and everything is not hidden in asm,
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate? Reasonably recent GCC may actually be good enough to use the
>
On Thu, 2005-07-28 at 08:30 -0700, Linus Torvalds wrote:
>
> I suspect the old "rep scas" has always been slower than
> compiler-generated code, at least under your test conditions. Many of the
> old asm's are actually _very_ old, and some of them come from pre-0.01
> days and are more about
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> The 32 looks like it may be problematic. Are there any i386 64-bit
> machines? Or is hard coding 32 OK?
We have BITS_PER_LONG exactly for this usage, but the sizeof also works.
Linus
-
To unsubscribe from this list: send the line
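A minimal sketch of the point about not hard-coding 32: the word width is derived from sizeof, as the reply suggests is also acceptable. SKETCH_BITS_PER_LONG is a local stand-in for the kernel's BITS_PER_LONG, and sketch_word_ffs and sketch_find_first_bit are hypothetical names. The loop bound is also tightened relative to the routine posted in the thread, so treat this as an illustration, not the patch under discussion.

```c
#include <limits.h>   /* CHAR_BIT */

/* Assumption: local stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for __ffs(): 0-based index of the lowest set
 * bit. Undefined for v == 0, so callers must check first. */
static inline unsigned sketch_word_ffs(unsigned long v)
{
    unsigned i = 0;
    while (!(v & 1UL)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Returns the index of the first set bit among the first `size` bits
 * of the bitmap at `b`, or `size` if none of them is set. */
static inline unsigned sketch_find_first_bit(const unsigned long *b,
                                             unsigned size)
{
    unsigned x;

    for (x = 0; x < size; x += SKETCH_BITS_PER_LONG) {
        unsigned long v = *b++;
        if (v) {
            unsigned bit = sketch_word_ffs(v) + x;
            return bit < size ? bit : size;
        }
    }
    return size;
}
```

Because the step is computed from the type, the same source should behave the same way whether unsigned long is 32 or 64 bits wide.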
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
> (both from versions of Debian
[snip]
> static inline int find_first_bit(const unsigned long *addr, unsigned size)
> {
[snip]
> + int x = 0;
> + do {
> + if (*addr)
> + return __ffs(*addr) + x;
> + addr++;
> + if (x >= size)
> + break;
> +
In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
configurable" I discovered that a C version of find_first_bit is faster
than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
(both from versions of Debian unstable). I wrote a benchmark (attached)
that runs the code
[snip]
> static inline int find_first_bit(const unsigned long *addr, unsigned size)
> {
[snip]
> + int x = 0;
> + do {
> + if (*addr)
> + return __ffs(*addr) + x;
> + addr++;
> + if (x >= size)
> + break;
> + x
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
> (both from versions of Debian unstable).
On Thu, 2005-07-28 at 08:30 -0700, Linus Torvalds wrote:
>
> I suspect the old "rep scas" has always been slower than
> compiler-generated code, at least under your test conditions. Many of the
> old asm's are actually _very_ old, and some of them come from pre-0.01
> days and are more about me
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate? Reasonably recent GCC may actually be good enough to use the
> fastest
On Thu, 28 Jul 2005, Steven Rostedt wrote:
> I've been playing with different approaches, (still all hot cache
> though), and inspecting the generated code. It's not that the gcc
> generated code is always better for the normal case. But since it sees
> more and everything is not hidden in asm, it
Steven Rostedt wrote:
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version
There are probably other cases of this in asm-i386/bitops.h. For instance
I think the "btl" instruction is pretty
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> > I've been playing with different approaches, (still all hot cache
> > though), and inspecting the generated code. It's not that the gcc
> > generated code is always better for the normal case.
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory better
46 matches