Re: [RFC] div64_64 support

2007-03-07 Thread Sami Farin
On Wed, Mar 07, 2007 at 11:11:49 -0500, Chuck Ebbert wrote: > Sami Farin wrote: > > On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote: > > ... > >> And I found bug in gcc-4.1.2, it gave 0 for ncubic results > >> when doing 1000 loops test... gcc-4.0.3 works. > > > > Found it. > > > > --- c

Re: [RFC] div64_64 support

2007-03-07 Thread Chuck Ebbert
Sami Farin wrote: > On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote: > ... >> And I found bug in gcc-4.1.2, it gave 0 for ncubic results >> when doing 1000 loops test... gcc-4.0.3 works. > > Found it. > > --- cbrt-test.c~ 2007-03-07 00:20:54.735248105 +0200 > +++ cbrt-test.c 2

Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote: ... > And I found bug in gcc-4.1.2, it gave 0 for ncubic results > when doing 1000 loops test... gcc-4.0.3 works. Found it. --- cbrt-test.c~2007-03-07 00:20:54.735248105 +0200 +++ cbrt-test.c 2007-03-07 00:21:03.964864343 +0200 @@

Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote: > Don't count the existing Newton-Raphson out. It turns out that to get enough > precision for 32 bits, only 4 iterations are needed. By unrolling those, it > gets much better timing. > > Slightly gross test program (with original cu

Re: [RFC] div64_64 support

2007-03-06 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 6 Mar 2007 10:29:41 -0800 > /* calculate the cubic root of x using Newton-Raphson */ > static uint32_t ncubic(uint64_t a) > { > uint64_t x; > > /* Initial estimate is based on: >* cbrt(x) = exp(log(x) / 3) >*/ >

Re: [RFC] div64_64 support

2007-03-06 Thread Stephen Hemminger
On Tue, 6 Mar 2007 20:48:41 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote: > > Don't count the existing Newton-Raphson out. It turns out that to get enough > > precision for 32 bits, only 4 iterations are needed. By unrolling those,

Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote: > Don't count the existing Newton-Raphson out. It turns out that to get enough > precision for 32 bits, only 4 iterations are needed. By unrolling those, it > gets much better timing. But did you fix the >2^43 bug too? SGI has alr

Re: [RFC] div64_64 support

2007-03-06 Thread H. Peter Anvin
Andi Kleen wrote: Let me see... You throw code like that and expect someone to actually understand it in one year, and be able to correct a bug ? To be honest I don't expect any bugs in this function. Please add something, an URL or even better a nice explanation, per favor... It's stra

Re: [RFC] div64_64 support II

2007-03-06 Thread H. Peter Anvin
Andi Kleen wrote: The problem with these algorithms that tradoff one or more multiplies in order to avoid a divide is that they don't give anything and often lose when both multiplies and divides are emulated in software. Actually on rereading this: is there really any Linux port that emulates

Re: [RFC] div64_64 support

2007-03-06 Thread Stephen Hemminger
Don't count the existing Newton-Raphson out. It turns out that to get enough precision for 32 bits, only 4 iterations are needed. By unrolling those, it gets much better timing. Slightly gross test program (with original cubic wraparound bug fixed). --- /* Test and measure perf of cube root algor

Re: [RFC] div64_64 support II

2007-03-06 Thread David Miller
From: [EMAIL PROTECTED] (Dagfinn Ilmari Mannsåker) Date: Tue, 06 Mar 2007 18:43:14 +0100 > Andi Kleen <[EMAIL PROTECTED]> writes: > > > Actually on rereading this: is there really any Linux port > > that emulates multiplies in software? I thought that was only > > done on really small microcontro

Re: [RFC] div64_64 support II

2007-03-06 Thread Dagfinn Ilmari Mannsåker
Andi Kleen <[EMAIL PROTECTED]> writes: > Actually on rereading this: is there really any Linux port > that emulates multiplies in software? I thought that was only > done on really small microcontrollers or smart cards; but anything > 32bit+ that runs Linux should have hardware multiply, shouldn't

Re: [RFC] div64_64 support

2007-03-06 Thread Roland Kuhn
Hi Andi! On 6 Mar 2007, at 15:45, Andi Kleen wrote: Let me see... You throw code like that and expect someone to actually understand it in one year, and be able to correct a bug ? To be honest I don't expect any bugs in this function. Please add something, an URL or even better a nice ex

Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
> > Let me see... You throw code like that and expect someone to actually > understand it in one year, and be able to correct a bug ? To be honest I don't expect any bugs in this function. > > > Please add something, an URL or even better a nice explanation, per favor... It's straight out of

Re: [RFC] div64_64 support

2007-03-06 Thread Eric Dumazet
On Tuesday 06 March 2007 14:34, Andi Kleen wrote: > - return x; > + int s; > + u32 y; > + u64 b; > + u64 bs; > + > + y = 0; > + for (s = 63; s >= 0; s -= 3) { > + y = 2 * y; > + b = 3 * y * (y+1) + 1; > + bs = b << s; > +

Re: [RFC] div64_64 support II

2007-03-06 Thread Andi Kleen
> The problem with these algorithms that tradoff one or more > multiplies in order to avoid a divide is that they don't > give anything and often lose when both multiplies and > divides are emulated in software. Actually on rereading this: is there really any Linux port that emulates multiplies in

Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
On Mon, Mar 05, 2007 at 04:25:51PM -0800, David Miller wrote: > Another thing is that the non-Hacker's Delight version iterates > differently for different input values, so the input value space is > very important to consider when comparing these two pieces of code. I did some stochastic testing

Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
On Mon, Mar 05, 2007 at 03:57:14PM -0800, Stephen Hemminger wrote: > On 03 Mar 2007 03:31:52 +0100 > Andi Kleen <[EMAIL PROTECTED]> wrote: > > > Stephen Hemminger <[EMAIL PROTECTED]> writes: > > > > > Here is another way to handle the 64 bit divide case. > > > It allows full 64 bit divide by addi

Re: [RFC] div64_64 support

2007-03-05 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Mon, 5 Mar 2007 15:57:14 -0800 > I tried the code from Hacker's Delight. > It is cool, but performance is CPU (and data) dependent: > > Average # of usecs per operation: Interesting results. The problem with these algorithms that tradoff one or

Re: [RFC] div64_64 support

2007-03-05 Thread Stephen Hemminger
On 03 Mar 2007 03:31:52 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > Stephen Hemminger <[EMAIL PROTECTED]> writes: > > > Here is another way to handle the 64 bit divide case. > > It allows full 64 bit divide by adding the support routine > > GCC needs. > > Not supplying that was intentional by

Re: [RFC] div64_64 support

2007-03-02 Thread Andi Kleen
Stephen Hemminger <[EMAIL PROTECTED]> writes: > Here is another way to handle the 64 bit divide case. > It allows full 64 bit divide by adding the support routine > GCC needs. Not supplying that was intentional by Linus so that people think twice (or more often) before they using such expensive o

Re: [RFC] div64_64 support

2007-02-26 Thread Dan Williams
On 2/26/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote: Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. I know ARM already went through the process of removing __udivdi3 support: http://www.arm.linux.org.uk/developer

Re: [RFC] div64_64 support

2007-02-26 Thread Segher Boessenkool
I thought the motivation for div64() was that a 64:32->32 divide could be done a lot faster on a number of platforms (including the important x86) than a generic 64:64->64 divide, but gcc doesn't handle the devolution automatically -- there is no such libgcc function. That there's no such func

Re: [RFC] div64_64 support

2007-02-26 Thread H. Peter Anvin
Stephen Hemminger wrote: Hmm. Those are the GCC internal versions, that are picked up but doing divide in place. Do we want to allow general 64 bit in kernel to be easily used? It could cause sloppy slow code, but it would look cleaner. ... and it would handle datatypes which may be architect

Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt
On Feb 26 2007 16:07, Stephen Hemminger wrote: >> On Feb 26 2007 15:44, Stephen Hemminger wrote: >> >> >-x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; >> >> >+x = (2 * x + (u32) (a / x*x)) / 3; >> >> >> >> Previously there was div64_64(a, x*x) which is equivalent

Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
On Tue, 27 Feb 2007 01:05:26 +0100 (MET) Jan Engelhardt <[EMAIL PROTECTED]> wrote: > > On Feb 26 2007 15:44, Stephen Hemminger wrote: > >> >- x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; > >> >+ x = (2 * x + (u32) (a / x*x)) / 3; > >> > >> Previously there was div64_64(a, x*x)

Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt
On Feb 26 2007 15:44, Stephen Hemminger wrote: >> >- x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; >> >+ x = (2 * x + (u32) (a / x*x)) / 3; >> >> Previously there was div64_64(a, x*x) which is equivalent to >> (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is >> equ

Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
On Tue, 27 Feb 2007 00:02:50 +0100 (MET) Jan Engelhardt <[EMAIL PROTECTED]> wrote: > > On Feb 26 2007 13:28, Stephen Hemminger wrote: > >> > >> ./arch/arm26/lib/udivdi3.c > >> ./arch/sh/lib/udivdi3.c > >> ./arch/sparc/lib/udivdi3.S > >> > >> should not this be consolidated too? > > > >Hmm. Thos

Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt
On Feb 26 2007 13:28, Stephen Hemminger wrote: >> >> ./arch/arm26/lib/udivdi3.c >> ./arch/sh/lib/udivdi3.c >> ./arch/sparc/lib/udivdi3.S >> >> should not this be consolidated too? > >Hmm. Those are the GCC internal versions, that are picked up but >doing divide in place. Do we want to allow gene

Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. --- arch/alpha/Kconfig |4 arch/arm/Kconfig |4 arch/arm26/Kconfig |4 arch/avr32/Kconfig |4 a

Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
On Mon, 26 Feb 2007 21:09:26 +0100 (MET) Jan Engelhardt <[EMAIL PROTECTED]> wrote: > > On Feb 23 2007 17:05, Stephen Hemminger wrote: > > > >Since there already two users of full 64 bit division in the kernel, > >and other places maybe hiding out as well. Add a full 64/64 bit divide. > > > >Yes t

Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt
On Feb 23 2007 17:05, Stephen Hemminger wrote: > >Since there already two users of full 64 bit division in the kernel, >and other places maybe hiding out as well. Add a full 64/64 bit divide. > >Yes this expensive, but there are places where it is necessary. >It is not clear if doing the scaling b

Re: [RFC] div64_64 support

2007-02-24 Thread Sami Farin
On Fri, Feb 23, 2007 at 17:05:27 -0800, Stephen Hemminger wrote: > Since there already two users of full 64 bit division in the kernel, > and other places maybe hiding out as well. Add a full 64/64 bit divide. > > Yes this expensive, but there are places where it is necessary. > It is not clear if