[mpir-devel] FFT timing problem

2012-10-17 Thread Bill Hart
Hi all, While I was discussing another issue with Paul Zimmermann, he noted the following ugly jump in performance in the flint/mpir FFT for multiplying together two integers of the given number of limbs: > mpn_mul_fft_main > limbs time > 1046780 0.556109766 > 1046781 0.55

Re: [mpir-devel] FFT use of addsub_n

2012-01-10 Thread Jason
On Tuesday 10 January 2012 19:06:57 Bill Hart wrote: > In my code I don't alias operands with sumdiff. However I do require > it to deal with the case n = 0. > > So do you know what the difference in timings between sumdiff and an > add + sub is on K10 when everything is in cache? > > Bill. > >

Re: [mpir-devel] FFT use of addsub_n

2012-01-10 Thread Jason
On Tuesday 10 January 2012 19:06:57 Bill Hart wrote: > In my code I don't alias operands with sumdiff. However I do require > it to deal with the case n = 0. > The asm version does handle n=0 purely by chance; a fair number of mpn functions don't > So do you know what the difference in timings

Re: [mpir-devel] FFT use of addsub_n

2012-01-10 Thread Bill Hart
In my code I don't alias operands with sumdiff. However I do require it to deal with the case n = 0. So do you know what the difference in timings between sumdiff and an add + sub is on K10 when everything is in cache? Bill. On 10 January 2012 18:52, Jason wrote: > On Tuesday 10 January 2012 14

Re: [mpir-devel] FFT use of addsub_n

2012-01-10 Thread Jason
On Tuesday 10 January 2012 14:23:35 Bill Hart wrote: > I just tried removing mpn_sumdiff_n references from my code, and this > slowed it down substantially. So this function is really important for > the speed of the FFT. Given that the speedup of sumdiff is pretty small I would guess that the me

Re: [mpir-devel] FFT use of addsub_n

2012-01-10 Thread Bill Hart
I just tried removing mpn_sumdiff_n references from my code, and this slowed it down substantially. So this function is really important for the speed of the FFT. Unfortunately it is not exported by MPIR and even though it is defined for all processors, it is mpn_sumdiff_n in some libraries and _g
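Since mpn_sumdiff_n is not exported and its name differs between libraries, the following is a minimal portable C sketch of what such a function computes: the sum and the difference of two n-limb numbers in a single pass. The names limb_t and sumdiff_n are hypothetical, and the return convention (2*carry + borrow) is an assumption modeled on GMP's mpn_add_n_sub_n, not a statement about MPIR's assembly version. It handles n = 0 explicitly, which is the case the thread says the FFT code relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for mp_limb_t */

/* Hypothetical portable sumdiff: s = a + b and d = a - b over n limbs.
   Returns 2*carry + borrow (assumed convention). For n == 0 nothing is
   read or written and 0 is returned. s and d must be distinct buffers. */
static unsigned sumdiff_n(limb_t *s, limb_t *d,
                          const limb_t *a, const limb_t *b, size_t n)
{
    assert(n == 0 || s != d);
    limb_t carry = 0, borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = a[i], y = b[i];

        /* sum limb with carry propagation */
        limb_t t = x + y;
        limb_t c1 = t < x;
        limb_t sum = t + carry;
        limb_t c2 = sum < t;
        s[i] = sum;
        carry = c1 | c2;

        /* difference limb with borrow propagation */
        limb_t u = x - y;
        limb_t b1 = x < y;
        limb_t diff = u - borrow;
        limb_t b2 = u < borrow;
        d[i] = diff;
        borrow = b1 | b2;
    }
    return (unsigned)(2 * carry + borrow);
}
```

Fusing the two loops means each input limb is loaded once instead of twice, which is the usual argument for why a combined sum/difference routine can beat a separate add and sub in FFT butterflies.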

Re: [mpir-devel] FFT use of addsub_n

2012-01-09 Thread Bill Hart
I wouldn't worry about it. It is possible I overwrote my timings file and that the times are not affected after all. On 9 January 2012 17:56, Jason wrote: > On Sunday 08 January 2012 11:30:05 Bill Hart wrote: >> I decided to try the FFT without addsub_n and it seems to actually go >> consistently

Re: [mpir-devel] FFT use of addsub_n

2012-01-09 Thread Jason
On Sunday 08 January 2012 11:30:05 Bill Hart wrote: > I decided to try the FFT without addsub_n and it seems to actually go > consistently about 3% faster, which is totally mysterious. So I have > removed it from the two files it is defined in: That's very strange. I assume this is on a K10; wha

[mpir-devel] FFT use of addsub_n

2012-01-08 Thread Bill Hart
I decided to try the FFT without addsub_n and it seems to actually go consistently about 3% faster, which is totally mysterious. So I have removed it from the two files it is defined in: ifft_mfa_truncate_sqrt2.c ifft_truncate_sqrt2.c As sumdiff_n seems to be defined for all platforms as far back

[mpir-devel] FFT squaring modifications done

2012-01-06 Thread Bill Hart
I have now completed the modifications to the FFT to handle squaring. This wasn't hard at all. Attached to this email is a diff of the changes for Brian. Of course it won't apply automatically to mpir, but it should be possible to go through and apply the same changes by hand. Beware that in one
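The reason squaring is an easy and worthwhile special case is that only one forward transform is needed instead of two. The sketch below illustrates this with a swapped-in technique: a number-theoretic transform modulo the prime 998244353, not the Fermat-ring (2^N + 1) transforms used in flint/MPIR. The names ntt and ntt_square are hypothetical; the input is a coefficient vector zero-padded to a power-of-two length at least as long as the square.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P 998244353u /* NTT-friendly prime 119 * 2^23 + 1 */
#define G 3u         /* primitive root modulo P */

static uint32_t mulmod(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b % P);
}

static uint32_t powmod(uint32_t b, uint64_t e)
{
    uint32_t r = 1;
    while (e) {
        if (e & 1) r = mulmod(r, b);
        b = mulmod(b, b);
        e >>= 1;
    }
    return r;
}

/* In-place iterative radix-2 NTT of length n (power of two, n <= 2^23).
   invert != 0 computes the inverse transform including the 1/n scaling. */
static void ntt(uint32_t *a, size_t n, int invert)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t i = 1, j = 0; i < n; i++) { /* bit-reversal permutation */
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) { uint32_t t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    for (size_t len = 2; len <= n; len <<= 1) {
        uint32_t w = powmod(G, (P - 1) / len); /* primitive len-th root */
        if (invert) w = powmod(w, P - 2);      /* its inverse */
        for (size_t i = 0; i < n; i += len) {
            uint32_t wk = 1;
            for (size_t k = 0; k < len / 2; k++) {
                uint32_t u = a[i + k];
                uint32_t v = mulmod(a[i + k + len / 2], wk);
                a[i + k] = (u + v) % P;               /* butterfly: sum */
                a[i + k + len / 2] = (u + P - v) % P; /* butterfly: difference */
                wk = mulmod(wk, w);
            }
        }
    }
    if (invert) {
        uint32_t n_inv = powmod((uint32_t)(n % P), P - 2);
        for (size_t i = 0; i < n; i++) a[i] = mulmod(a[i], n_inv);
    }
}

/* Squaring: one forward transform (instead of two), pointwise square,
   then one inverse transform. */
static void ntt_square(uint32_t *a, size_t n)
{
    ntt(a, n, 0);
    for (size_t i = 0; i < n; i++) a[i] = mulmod(a[i], a[i]);
    ntt(a, n, 1);
}
```

For example, squaring {1, 2, 3} padded to length 8 yields the coefficients 1, 4, 10, 12, 9, which after carrying in base 10 give 321^2 = 103041.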

[mpir-devel] FFT wrapper done

2012-01-05 Thread Bill Hart
I have now written a wrapper function in mul_fft_full.c which multiplies two integers. It will fail if the product is too small to use the FFT (there is an assert for this which hopefully will be on during testing). If I have done the calculations correctly it won't work if the product of the two

[mpir-devel] FFT progress

2011-12-24 Thread Bill Hart
I have finally managed to implement the truncated sqrt2 transforms and the matrix Fourier sqrt2 transforms. This enables me to get actual timings vs mpir. These are times for multiplying two integers of the given numbers of bits. On my machine the mpir FFT starts beating toom algorithms at about 16000
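I assume "sqrt2" here refers to the standard trick for FFTs over Z/(2^N + 1)Z: since 2^N is congruent to -1, the element 2 is a 2N-th root of unity, and when 4 divides N the element 2^(3N/4) - 2^(N/4) is a square root of 2, giving a 4N-th root of unity and doubling the usable transform length. With x = 2^(N/4), x^4 is congruent to -1, so (x^3 - x)^2 = x^6 - 2x^4 + x^2 is congruent to -x^2 + 2 + x^2 = 2. The small C check below verifies this numerically; the function names are hypothetical and N is capped at 28 so that all products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns (2^(3N/4) - 2^(N/4)) mod (2^N + 1). When 4 divides N this
   element squares to 2 modulo 2^N + 1. N <= 28 keeps products < 2^64. */
static uint64_t sqrt2_mod_fermat(unsigned N)
{
    assert(N % 4 == 0 && N >= 4 && N <= 28);
    uint64_t p = (UINT64_C(1) << N) + 1;
    uint64_t hi = UINT64_C(1) << (3 * N / 4);
    uint64_t lo = UINT64_C(1) << (N / 4);
    return (hi - lo) % p;
}

/* x^2 mod (2^N + 1), for x < 2^N + 1 and N <= 28. */
static uint64_t square_mod_fermat(uint64_t x, unsigned N)
{
    uint64_t p = (UINT64_C(1) << N) + 1;
    return (x * x) % p;
}
```

For N = 16 the modulus is 65537 and the square root of 2 is 4096 - 16 = 4080; indeed 4080^2 = 16646400 = 254 * 65537 + 2.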

[mpir-devel] FFT

2011-12-18 Thread Bill Hart
I've been doing some more work on the FFT. I found a way to get around writing more new butterflies for the FFT. So now I have the FFT working for power of two lengths. The times are quite good but can be improved yet. Firstly MPIR just uses mpn_mul_n to do pointwise mults in the FFT range. I'm ju

[mpir-devel] FFT progress

2011-11-13 Thread Bill Hart
Hi all, I've been working away at this new FFT I've been writing (for 2 years). There's a multitude of functions: FFT, IFFT, truncated FFT and IFFT, matrix Fourier algorithm FFT and IFFT, and truncated matrix Fourier algorithm FFT and IFFT. Each of these needs a new version which handles the sqrt

[mpir-devel] FFT Tuning on Windows for MPIR-1.2

2009-05-27 Thread Cactus
Hi All, We are in the final stages of preparing for the MPIR 1.2 release and we need a volunteer to help in producing FFT tuning values for the Core2 processor. I was doing this but I was using a mobile Core2 processor with power saving features that seemed to interfere with the tuning process.

[mpir-devel] FFT Tune Question & App Benchmark

2009-02-26 Thread Jeff Gilchrist
First let me say that it has been very interesting following this devel list with all the talk about optimizing MPIR on AMD and Core2 architectures. I can't wait for release 1.0 so I can start using that with nice Windows support for some Windows factoring binaries I maintain (http://gilchrist.ca