> static inline int new_find_first_bit(const unsigned long *b, unsigned size)
> {
> int x = 0;
> do {
> unsigned long v = *b++;
> if (v)
> return __ffs(v) + x;
> if (x >= size)
> break;
>
Hi,
FWIW the following routine is consistently slightly faster using
Steven's test harness, with a big win when no bit is set.
static inline int new_find_first_bit(const unsigned long *b, unsigned size)
{
int x = 0;
do {
unsigned long v = *b++;
if (v)
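The snippet above is cut off, so here is a rough, self-contained user-space sketch of the word-at-a-time scan being discussed. It is not the routine posted in the thread: the name find_first_bit_sketch is hypothetical, GCC's __builtin_ctzl stands in for the kernel's __ffs(), and both the loop bound and the "return size when nothing is set" convention are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* User-space stand-in for the kernel's BITS_PER_LONG (an assumption). */
#define SKETCH_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: index of the first set bit in the bitmap b,
 * which holds `size` bits, or `size` if no bit is set.
 */
static inline unsigned find_first_bit_sketch(const unsigned long *b,
                                             unsigned size)
{
    unsigned x;

    assert(b != NULL);

    /* Walk one word at a time; x is the bit index of the word's bit 0. */
    for (x = 0; x < size; x += SKETCH_BITS_PER_LONG) {
        unsigned long v = *b++;

        if (v) {
            /* __builtin_ctzl is only defined for nonzero input. */
            unsigned bit = x + (unsigned)__builtin_ctzl(v);

            /* Ignore stray bits beyond the end of the bitmap. */
            return bit < size ? bit : size;
        }
    }
    return size;
}
```

With an all-zero bitmap the loop only loads and tests each word, which is the "no bit set" case mentioned above.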
On Fri, 29 Jul 2005, Linus Torvalds wrote:
> It has another downside too: it's extra complexity and potential for bugs
> in the compiler. And if you tell me gcc people never have bugs, I will
> laugh in your general direction.
You mean these that have been sitting in their Bugzilla for some th
On Fri, 29 Jul 2005, David Woodhouse wrote:
>
> On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> > Basic rule: inline assembly is _better_ than random compiler extensions.
> > It's better to have _one_ well-documented extension that is very generic
> > than it is to have a thousand sp
On Fri, 29 Jul 2005, Maciej W. Rozycki wrote:
>
> Hmm, that's what's in the GCC info pages for the relevant functions
> (I've omitted the "l" and "ll" variants):
>
> "-- Built-in Function: int __builtin_ffs (unsigned int x)
> Returns one plus the index of the least significant 1-bit of X
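For comparison, a small sketch of how the two conventions differ, assuming a GCC-compatible compiler. As documented by GCC, __builtin_ffs is one-based and returns 0 for a zero argument, while __builtin_ctz is zero-based and undefined for zero. The wrapper names below are hypothetical.

```c
#include <assert.h>

/* One-based index of the lowest set bit; 0 when v == 0. */
static inline int ffs_one_based(unsigned int v)
{
    return __builtin_ffs((int)v);
}

/*
 * Zero-based index of the lowest set bit. __builtin_ctz is undefined
 * for 0, so the caller must rule that case out first.
 */
static inline int lowest_bit_zero_based(unsigned int v)
{
    assert(v != 0);
    return __builtin_ctz(v);
}
```

For any nonzero v, ffs_one_based(v) equals lowest_bit_zero_based(v) + 1.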
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> > Since you're considering GCC-generated code for ffs(), ffz() and friends,
> > how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> > appropriate?
>
> Please don't. Try again in three years when everybody has them.
Well, __bu
On Fri, 29 Jul 2005 [EMAIL PROTECTED] wrote:
>> OK, I guess when I get some time, I'll start testing all the i386 bitop
>> functions, comparing the asm with the gcc versions. Now could someone
>> explain to me what's wrong with testing hot cache code. Can one
>> instruction retrieve from memory
On Fri, 29 Jul 2005, David Woodhouse wrote:
> Builtins are more portable and their implementation will improve to
> match developments in the target CPU. Inline assembly, as we have seen,
> remains the same for years while the technology moves on.
>
> Although it's often the case that inline asse
On Thu, 28 Jul 2005, Linus Torvalds wrote:
> There may be more upsides on other architectures (*cough*ia64*cough*) that
> have strange scheduling issues and other complexities, but on x86 in
> particular, the __builtin_xxx() functions tend to be a lot more pain than
> they are worth. Not only d
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory better than others?
To add one to Linus' list, not
On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> Basic rule: inline assembly is _better_ than random compiler extensions.
> It's better to have _one_ well-documented extension that is very generic
> than it is to have a thousand specialized extensions.
Counterexample: FR-V and its __bu
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> OK, I guess when I get some time, I'll start testing all the i386 bitop
> functions, comparing the asm with the gcc versions. Now could someone
> explain to me what's wrong with testing hot cache code. Can one
> instruction retrieve from memory bet
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> > I've been playing with different approaches, (still all hot cache
> > though), and inspecting the generated code. It's not that the gcc
> > generated code is always better for the normal
Steven Rostedt wrote:
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version
There are probably other cases of this in asm-i386/bitops.h. For instance
I think the "btl" instruction is pr
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> I can change the find_first_bit to use __builtin_ffs, but how would you
> implement the ffz?
The thing is, there are basically _zero_ upsides to using the __builtin_xx
functions on x86.
There may be more upsides on other architectures (*cough*ia6
On Thu, 28 Jul 2005, Maciej W. Rozycki wrote:
>
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate?
Please don't. Try again in three years when everybody has them.
On Thu, 28 Jul 2005, Steven Rostedt wrote:
> I've been playing with different approaches, (still all hot cache
> though), and inspecting the generated code. It's not that the gcc
> generated code is always better for the normal case. But since it sees
> more and everything is not hidden in asm, it
On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate? Reasonably recent GCC may actually be good enough to use the
> faste
On Thu, 2005-07-28 at 08:30 -0700, Linus Torvalds wrote:
>
> I suspect the old "rep scas" has always been slower than
> compiler-generated code, at least under your test conditions. Many of the
> old asm's are actually _very_ old, and some of them come from pre-0.01
> days and are more about me
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> The 32 looks like it may be problematic. Are there any 64-bit i386
> machines? Or is hard-coding 32 OK?
We have BITS_PER_LONG exactly for this usage, but the sizeof also works.
Linus
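A minimal user-space sketch of the point about not hard-coding 32, assuming nothing beyond standard C. SKETCH_BITS_PER_LONG is a stand-in derived from sizeof, not the kernel's BITS_PER_LONG, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* User-space stand-in for the kernel's BITS_PER_LONG (an assumption). */
#define SKETCH_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Which word of the bitmap holds bit number nr. */
static inline unsigned sketch_word_index(unsigned nr)
{
    return nr / SKETCH_BITS_PER_LONG;
}

/* Mask selecting bit number nr within its word. */
static inline unsigned long sketch_bit_mask(unsigned nr)
{
    return 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Nonzero if bit nr is set in the bitmap. */
static inline int sketch_test_bit(const unsigned long *bits, unsigned nr)
{
    assert(bits != NULL);
    return (bits[sketch_word_index(nr)] & sketch_bit_mask(nr)) != 0;
}
```

Because the word width comes from sizeof, the same code gives correct indices whether unsigned long is 32 or 64 bits wide.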
On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
> configurable" I discovered that a C version of find_first_bit is faster
> than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
> (both from versions of Debian unsta
[snip]
> static inline int find_first_bit(const unsigned long *addr, unsigned size)
> {
[snip]
> + int x = 0;
> + do {
> + if (*addr)
> + return __ffs(*addr) + x;
> + addr++;
> + if (x >= size)
> + break;
> +
In the thread "[RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO
configurable" I discovered that a C version of find_first_bit is faster
than the asm version now when compiled against gcc 3.3.6 and gcc 4.0.1
(both from versions of Debian unstable). I wrote a benchmark (attached)
that runs the cod