[mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
exe!fft_truncate1_twiddle(ii=649f90, is=8, n=0, w=000100, t1=2ef878, t2=2ef898, ws=1, r=0, c=0, rs=20, trunc=0) Line 113 exe!fft_truncate1_twiddle(ii=649f90, is=8, n=1, w=80, t1=2ef878, t2=2ef898, ws=1, r=0, c=0, rs=10, trunc=0) Line 123 exe!fft_truncate1_twiddle(ii=649f90, is=8, n=

[mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
mul_mfa_truncate_sqrt2(r1=648810, i1=648090, n1=78, i2=648450, n2=78, depth=7, w=1) fft_mfa_truncate_sqrt2_outer(ii=649f90, n=80, w=1, t1=2ef878, t2=2ef898, temp=2ef8b8, n1=8, trunc=100) fft_truncate1_twiddle(ii=649f90, is=8, n=10, w= 8, t1=2ef878, t2=2ef898, ws=1, r=0, c=0, rs=1, trunc=0) fft_t

[mpir-devel] Re: FFT progress

2012-01-03 Thread Bill Hart
Oh sorry, I'm confused. Obviously trunc <= n is valid. What we need though is the entire call trace that leads to that set of params so we can see where the bug is. I am sure the bug is in my code, I'm just not sure where. Bill. On Tuesday, 3 January 2012, Bill Hart wrote: > Hi Brian, > > yes n

[mpir-devel] Re: FFT progress

2012-01-03 Thread Bill Hart
Hi Brian, yes n=1, trunc=0 is an invalid set of params. We must have trunc > n. So we need to track down where it is called from to see how it's happening. Bill. On Tuesday, 3 January 2012, Cactus wrote: > The twiddle code is entered with n = 0 from the above sequence both when n = 1 and also

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
The twiddle code is entered with n = 0 from the above sequence both when n = 1 and also when trunk = 0 and n = 0. Brian -- You received this message because you are subscribed to the Google Groups "mpir-devel" group. To view this discussion on the web visit https://groups.google.com/d

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
This is one sequence IN fft_mfa_truncate_sqrt2.c that ends with entry to the failing code with n = 0. void fft_truncate1_twiddle(mp_limb_t ** ii, mp_size_t is, mp_size_t n, mp_bitcnt_t w, mp_limb_t ** t1, mp_limb_t ** t2, mp_size_t ws, mp_size_t r, mp_size_t c, mp_size_t rs, mp_size_t

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Bill Hart
One possibility is that in t-mul_mfa_truncate_sqrt2 the top limbs of the integers being multiplied are zero, so that ultimately trunc <= 2*n which would cause the problem. Bill. On 3 January 2012 15:39, Bill Hart wrote: > Hi Brian, > > I am not sure how n ever gets to be zero. It should start as

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Bill Hart
Hi Brian, I am not sure how n ever gets to be zero. It should start as a power of 2 and always remain thus. Probably the bug is elsewhere. Bill. On 3 January 2012 13:42, Cactus wrote: > I think I have found the problem. > > Both  ifft_radix2_twiddle and ifft_radix2_twiddle don't guard against e

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
I think I have found the problem. Both fft_radix2_twiddle and ifft_radix2_twiddle don't guard against entry with n = 0. I changed the first part of both routines according to the template: if (n < 2) { mp_size_t tw1, tw2; tw1 = r*c; tw2 = tw1 + rs*c; if(n)

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
I think I have found the problem. Both ifft_radix2_twiddle and ifft_radix2_twiddle don't guard against entry with n = 0. I changed the first part of both routines according to the template: if (n < 2) { mp_size_t tw1, tw2; tw1 = r*c; tw2 = tw1 + rs*c; if(n)

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Bill Hart
On 3 January 2012 10:00, Cactus wrote: > Thanks Bill, > > So the test failure matters - I was hoping that it didn't :-) > > My laptop, which I am using right now, has 4GB of RAM - I am not clear > whether it is the stack or the heap that overflows but I suspect the former. This shouldn't be a pro

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
Thanks Bill, So the test failure matters - I was hoping that it didn't :-) My laptop, which I am using right now, has 4GB of RAM - I am not clear whether it is the stack or the heap that overflows but I suspect the former. On another issue, how do I actually integrate the new FFT with MPIR?

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Bill Hart
Hi Brian, you are absolutely right. I had completely forgotten about the function. I renamed a whole pile of others that were called twiddle, but I forgot that these still existed. It's ok though, that is what I intended. Bill. On 3 January 2012 08:16, Cactus wrote: > I am using the version you

Re: [mpir-devel] Re: FFT progress

2012-01-03 Thread Cactus
I am using the version you refer to above. As far as I can see, the test t-mul_mfa_truncate_sqrt2 does end up calling fft_radix2_twiddle via this call sequence: mul_mfa_truncate_sqrt2(r1, i1, int_limbs, i2, int_limbs, depth, w); fft_mfa_truncate_sqrt2_outer(ii, n, w, &t1, &t2, &s1, sqrt,

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
Oh dear, which implementation are you using? The new code doesn't call fft_radix2_twiddle. I hope you are using the implementation here: https://github.com/wbhart/flint2/tree/trunk/fft The other implementation is extremely messy in comparison. Bill. On 3 January 2012 01:15, Cactus wrote: > O

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
That's interesting. How much memory does your machine have? Bill. On 3 January 2012 01:15, Cactus wrote: > Only one test now fails: > >    mul_mfa_truncate_sqrt2 > > It runs out of memory after a huge number of recursive calls > to fft_radix2_twiddle > >    Brian > > -- > You received this messa

[mpir-devel] Re: FFT progress

2012-01-02 Thread Cactus
Only one test now fails: mul_mfa_truncate_sqrt2 It runs out of memory after a huge number of recursive calls to fft_radix2_twiddle Brian -- You received this message because you are subscribed to the Google Groups "mpir-devel" group. To view this discussion on the web visit https://gr

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
On 2 January 2012 23:06, James Cloos wrote: >> "BH" == Bill Hart writes: > > BH> I do make the assumption that sizeof(mp_limb_t) = sizeof(ptr) in the > BH> memory allocation. But I think this is true on 64 bit Windows, is it > BH> not? > > doze 64 has 32bit long; if mp_limb_t is based on long

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread James Cloos
> "BH" == Bill Hart writes: BH> I do make the assumption that sizeof(mp_limb_t) = sizeof(ptr) in the BH> memory allocation. But I think this is true on 64 bit Windows, is it BH> not? doze 64 has 32bit long; if mp_limb_t is based on long it will not be the same size as pointer. -JimC -- Jam

[mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
Hi Brian, On Monday, 2 January 2012, Cactus wrote: > I think the FLINT64 define is a potential issue. It's the size of an mp_limb_t in Flint, not an unsigned long, which I barely use (if I do it's probably a bug). > Should the FFT work with MPIR's longlong.h Yes. > What should I do about fli

[mpir-devel] Re: FFT progress

2012-01-02 Thread Cactus
I think the FLINT64 define is a potential issue. Should the FFT work with MPIR's longlong.h What should I do about flint_clz_tab? I have removed all references to the include file 'ulong_extras.h' - maybe this is needed somewhere (although everything builds without it).. Brian -- You

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
I do make the assumption that sizeof(mp_limb_t) = sizeof(ptr) in the memory allocation. But I think this is true on 64 bit Windows, is it not? Bill. On 2 January 2012 21:21, Cactus wrote: > The split/combine works in win32 so this is, as you suggest, the old 32/64 > bit integer problem.  I could

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Cactus
The split/combine works in win32 so this is, as you suggest, the old 32/64 bit integer problem. I couldn't test the other tow failures in win32 as they require gmpn_addsub_n which ends up as an undefined symbol in the 32-bit builds that I can use. I have not yet done the malloc conversions so

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
Hi Brian, you made quick work of that! All three of those tests would fail if there was a problem in fft_split_bits or fft_combine_bits. Those are relatively straightforward functions. They break a long integer up into coefficients. They are documented pretty well in the doc directory and should

[mpir-devel] Re: FFT progress

2012-01-02 Thread Cactus
I have got the FFT building on Windows but three of the tests fail in x64: mul_mfa_truncate_sqrt2error in limb 0, ea987380 != aa987380 mul_truncate_sqrt2error in limb 0, ea987380 != aa987380 split/combine_bitsFAIL: Error in limb 2, 1002153456 != 1520 I'm not sure how to debug these as

Re: [mpir-devel] Re: FFT progress

2012-01-02 Thread Jason
On Monday 02 January 2012 07:30:03 Bill Hart wrote: > I've now added the negacyclic transforms to the flint repo and cleaned > it all up ready for some tuning code to be written. > > At the moment some hard coded tuning values are present in > mulmod_2expp1.c. There are no tuning values for the ma

[mpir-devel] Re: FFT progress

2012-01-02 Thread Cactus
Fantastic work Bill - it offers a really substantial speed up and is hence something we should add to MPIR as soon as you feel it is ready. >From your list of changes needed for moving it into MPIR, it seems that almost all are search/replace operations - is it always obvious where TMP_MARK sho

[mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
I found another speedup: combine the inner transforms and pointwise mults and combine the ifft and normalisation in a cache friendly way. This results in about a 5% speedup over much of the range. Here are the times now. bits iters mpir this gmp n w 1958401000 1.149s 1.

[mpir-devel] Re: FFT progress

2012-01-02 Thread Bill Hart
The code can be more easily browsed here: https://github.com/wbhart/flint2/tree/trunk/fft Bill. On 2 January 2012 07:30, Bill Hart wrote: > I've now added the negacyclic transforms to the flint repo and cleaned > it all up ready for some tuning code to be written. > > At the moment some hard co

[mpir-devel] Re: FFT progress

2012-01-01 Thread Bill Hart
I've now added the negacyclic transforms to the flint repo and cleaned it all up ready for some tuning code to be written. At the moment some hard coded tuning values are present in mulmod_2expp1.c. There are no tuning values for the main multiplication routine because there is no wrapper function

[mpir-devel] Re: FFT progress

2011-12-31 Thread Bill Hart
I have finished implementing and debugging a new Nussbaumer convolution. It isn't the absolutely fastest possible thing one can do, but it is still slightly better than what is in mpir. The fourth column in the following table shows the times for the new code. It is only used from a certain point

[mpir-devel] Re: FFT progress

2011-12-29 Thread Bill Hart
I have cleaned up my FFT massively and it now takes just over 2000 lines of code including test code. You can view the code here: http://selmer.warwick.ac.uk/gitweb/flint2.git?a=tree;f=fft;h=1fcb640bdbf46137b01611a7ff7e1cc90fd2e4c0;hb=HEAD Just a few minor test functions to go, some tuning code a

[mpir-devel] Re: FFT progress

2011-12-27 Thread Bill Hart
David clarified the timing in that last slide of his. Apparently it was just a misconfiguration external to MPIR. He reports that the correct time for MPIR is 224s, which is in line with what we might expect from the figures I gave for AMD chips. Bill. On 26 December 2011 14:44, Bill Hart wrote:

[mpir-devel] Re: FFT progress

2011-12-26 Thread Bill Hart
A few days ago, David Harvey gave a really nice talk at Warwick about some new number theoretic transform code he has been working on. It uses some nice tricks to get *really* fast butterflies. Even a C implementation turned out to be quite fast, although he is also working on some assembly optimis

[mpir-devel] Re: FFT progress

2011-12-25 Thread Bill Hart
I finished implementing the truncated matrix fourier square root two transforms and I can finally multiply integers of all sizes. (Note that I'm still using the mpir Nussbaumer convolution for pointwise multiplications when the products get huge. Times will drop further when at some point in the fu