Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
I also timed the basecase and divide and conquer division in GMP. The basecase seems to be the same speed in both (our code is derived from some version of theirs, so no big surprise there). Their divide and conquer division is about 10% faster for large sizes. This is likely to do with the very fa

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
On 29 March 2013 21:37, Brian Gladman wrote: > On 29/03/2013 21:15, Bill Hart wrote: >> Someone has replaced longlong.h in MPIR with a really, really slow >> version! Why on earth did we do that!? > > IIRC Jason decided to split longlong.h into constant and processor > dependent sections so that i

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
I see that longlong.h now uses generic C code when you do a generic build, whereas before it used inline assembly. So fortunately the changes to longlong.h only affect generic C builds. The new code is still promising. The ratio of divrem_basecase to divrem_divconquer times is 1.45 for MPIR and 2.

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Brian Gladman
On 29/03/2013 21:15, Bill Hart wrote: > Someone has replaced longlong.h in MPIR with a really, really slow > version! Why on earth did we do that!? IIRC Jason decided to split longlong.h into constant and processor dependent sections so that it could be more conveniently generated and maintained.

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
Someone has replaced longlong.h in MPIR with a really, really slow version! Why on earth did we do that!? When I replace it with the former fast version, mul_basecase is about the same speed as BSDNT and unfortunately BSDNT divide and conquer division does not catch MPIR. However, there is still

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
Actually, I think this comparison is not as fair as I thought. The mul_basecase code in MPIR really sucks. It's horribly slow compared to the code in BSDNT. I thought compilers were able to optimise this sort of stuff these days, but apparently not. mul_basecase is not actually called that much be

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
Here are the times for nxn multiplication in bsdnt: size = 10: classical = 0s, kara = 0s, toom33 = 0s size = 11: classical = 2e-06s, kara = 0s, toom33 = 2e-06s size = 13: classical = 0s, kara = 0s, toom33 = 2e-06s size = 15: classical = 0s, kara = 2e-06s, toom33 = 2e-06s size = 17: classical = 0s,

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Fredrik Johansson
On Fri, Mar 29, 2013 at 7:38 PM, Bill Hart wrote: > So the conclusion is very clear. Even though my division basecase is > slightly slower in BSDNT, the BSDNT divconquer code ends up being > nearly twice as fast as MPIR! > Could you show the ratios for 2n by n division versus n by n multiplicatio

Re: [mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Fredrik Johansson
On Fri, Mar 29, 2013 at 7:38 PM, Bill Hart wrote: > So the conclusion is very clear. Even though my division basecase is > slightly slower in BSDNT, the BSDNT divconquer code ends up being > nearly twice as fast as MPIR! > That's quite amazing! If it turns out to be nearly as much an improvement

[mpir-devel] Re: Divide and conquer division code

2013-03-29 Thread Bill Hart
Whoops, sorry, I see what I did wrong. I accidentally had the schoolbook and divconquer algorithms transposed in my MPIR-2.6.0 timing code. Things make a lot more sense now. Here are the actual timings: MPIR-2.6.0: size = 10: classical = 1.5e-06s, divconquer = 1e-06s size = 11: classical = 1.5e-

[mpir-devel] Divide and conquer division code

2013-03-29 Thread Bill Hart
Hi all, I've been doing some timings of the new divide and conquer style division with remainder code I wrote. I did some timings of a generic build (C only) of MPIR-2.6.0 with the new C code I wrote (BSDNT). Remarkably, the new code I have written displays quite a few interesting features in com