exe!fft_truncate1_twiddle(ii=649f90, is=8, n=0, w=000100, t1=2ef878,
t2=2ef898, ws=1, r=0, c=0, rs=20, trunc=0) Line 113
exe!fft_truncate1_twiddle(ii=649f90, is=8, n=1, w=80, t1=2ef878,
t2=2ef898, ws=1, r=0, c=0, rs=10, trunc=0) Line 123
exe!fft_truncate1_twiddle(ii=649f90, is=8, n=
mul_mfa_truncate_sqrt2(r1=648810, i1=648090, n1=78, i2=648450, n2=78,
depth=7, w=1)
fft_mfa_truncate_sqrt2_outer(ii=649f90, n=80, w=1, t1=2ef878, t2=2ef898,
temp=2ef8b8, n1=8, trunc=100)
fft_truncate1_twiddle(ii=649f90, is=8, n=10, w= 8, t1=2ef878, t2=2ef898,
ws=1, r=0, c=0, rs=1, trunc=0)
fft_t
Oh sorry, I'm confused. Obviously trunc <= n is valid.
What we need though is the entire call trace that leads to that set of
params so we can see where the bug is. I am sure the bug is in my code, I'm
just not sure where.
Bill.
On Tuesday, 3 January 2012, Bill Hart wrote:
> Hi Brian,
>
> yes n
Hi Brian,
yes n=1, trunc=0 is an invalid set of params.
We must have trunc > n. So we need to track down where it is called from
to see how it's happening.
Bill.
On Tuesday, 3 January 2012, Cactus wrote:
> The twiddle code is entered with n = 0 from the above sequence both when
n = 1 and also
The twiddle code is entered with n = 0 from the above sequence both when n
= 1 and also when trunk = 0 and n = 0.
Brian
--
You received this message because you are subscribed to the Google Groups
"mpir-devel" group.
To view this discussion on the web visit
https://groups.google.com/d
This is one sequence IN fft_mfa_truncate_sqrt2.c that ends with entry to
the failing code with n = 0.
void fft_truncate1_twiddle(mp_limb_t ** ii, mp_size_t is,
mp_size_t n, mp_bitcnt_t w, mp_limb_t ** t1, mp_limb_t ** t2,
mp_size_t ws, mp_size_t r, mp_size_t c, mp_size_t rs, mp_size_t
One possibility is that in t-mul_mfa_truncate_sqrt2 the top limbs of
the integers being multiplied are zero, so that ultimately trunc <=
2*n which would cause the problem.
Bill.
On 3 January 2012 15:39, Bill Hart wrote:
> Hi Brian,
>
> I am not sure how n ever gets to be zero. It should start as
Hi Brian,
I am not sure how n ever gets to be zero. It should start as a power
of 2 and always remain thus. Probably the bug is elsewhere.
Bill.
On 3 January 2012 13:42, Cactus wrote:
> I think I have found the problem.
>
> Both ifft_radix2_twiddle and ifft_radix2_twiddle don't guard against e
I think I have found the problem.
Both fft_radix2_twiddle and ifft_radix2_twiddle don't guard against entry
with n = 0. I changed the first part of both routines according to the
template:
if (n < 2)
{
mp_size_t tw1, tw2;
tw1 = r*c;
tw2 = tw1 + rs*c;
if(n)
I think I have found the problem.
Both ifft_radix2_twiddle and ifft_radix2_twiddle don't guard against entry
with n = 0. I changed the first part of both routines according to the
template:
if (n < 2)
{
mp_size_t tw1, tw2;
tw1 = r*c;
tw2 = tw1 + rs*c;
if(n)
On 3 January 2012 10:00, Cactus wrote:
> Thanks Bill,
>
> So the test failure matters - I was hoping that it didn't :-)
>
> My laptop, which I am using right now, has 4GB of RAM - I am not clear
> whether it is the stack or the heap that overflows but I suspect the former.
This shouldn't be a pro
Thanks Bill,
So the test failure matters - I was hoping that it didn't :-)
My laptop, which I am using right now, has 4GB of RAM - I am not clear
whether it is the stack or the heap that overflows but I suspect the former.
On another issue, how do I actually integrate the new FFT with MPIR?
Hi Brian,
you are absolutely right. I had completely forgotten about the
function. I renamed a whole pile of others that were called twiddle,
but I forgot that these still existed. It's ok though, that is what I
intended.
Bill.
On 3 January 2012 08:16, Cactus wrote:
> I am using the version you
I am using the version you refer to above.
As far as I can see, the test t-mul_mfa_truncate_sqrt2 does end up
calling fft_radix2_twiddle via this call sequence:
mul_mfa_truncate_sqrt2(r1, i1, int_limbs, i2, int_limbs, depth, w);
fft_mfa_truncate_sqrt2_outer(ii, n, w, &t1, &t2, &s1, sqrt,
Oh dear, which implementation are you using?
The new code doesn't call fft_radix2_twiddle.
I hope you are using the implementation here:
https://github.com/wbhart/flint2/tree/trunk/fft
The other implementation is extremely messy in comparison.
Bill.
On 3 January 2012 01:15, Cactus wrote:
> O
That's interesting. How much memory does your machine have?
Bill.
On 3 January 2012 01:15, Cactus wrote:
> Only one test now fails:
>
> mul_mfa_truncate_sqrt2
>
> It runs out of memory after a huge number of recursive calls
> to fft_radix2_twiddle
>
> Brian
>
> --
> You received this messa
Only one test now fails:
mul_mfa_truncate_sqrt2
It runs out of memory after a huge number of recursive calls
to fft_radix2_twiddle
Brian
--
You received this message because you are subscribed to the Google Groups
"mpir-devel" group.
To view this discussion on the web visit
https://gr
On 2 January 2012 23:06, James Cloos wrote:
>> "BH" == Bill Hart writes:
>
> BH> I do make the assumption that sizeof(mp_limb_t) = sizeof(ptr) in the
> BH> memory allocation. But I think this is true on 64 bit Windows, is it
> BH> not?
>
> doze 64 has 32bit long; if mp_limb_t is based on long
> "BH" == Bill Hart writes:
BH> I do make the assumption that sizeof(mp_limb_t) = sizeof(ptr) in the
BH> memory allocation. But I think this is true on 64 bit Windows, is it
BH> not?
doze 64 has 32bit long; if mp_limb_t is based on long it will not be the
same size as pointer.
-JimC
--
Jam
Hi Brian,
On Monday, 2 January 2012, Cactus wrote:
> I think the FLINT64 define is a potential issue.
It's the size of an mp_limb_t in Flint, not an unsigned long, which I
barely use (if I do it's probably a bug).
> Should the FFT work with MPIR's longlong.h
Yes.
> What should I do about fli
I think the FLINT64 define is a potential issue.
Should the FFT work with MPIR's longlong.h
What should I do about flint_clz_tab?
I have removed all references to the include file 'ulong_extras.h' - maybe
this is needed somewhere (although everything builds without it)..
Brian
--
You
I do make the assumption that sizeof(mp_limb_t) = sizeof(ptr) in the
memory allocation. But I think this is true on 64 bit Windows, is it
not?
Bill.
On 2 January 2012 21:21, Cactus wrote:
> The split/combine works in win32 so this is, as you suggest, the old 32/64
> bit integer problem. I could
The split/combine works in win32 so this is, as you suggest, the old 32/64
bit integer problem. I couldn't test the other tow failures in win32 as
they require gmpn_addsub_n which ends up as an undefined symbol in the
32-bit builds that I can use.
I have not yet done the malloc conversions so
Hi Brian,
you made quick work of that!
All three of those tests would fail if there was a problem in
fft_split_bits or fft_combine_bits. Those are relatively
straightforward functions. They break a long integer up into
coefficients. They are documented pretty well in the doc directory and
should
I have got the FFT building on Windows but three of the tests fail in x64:
mul_mfa_truncate_sqrt2error in limb 0, ea987380 != aa987380
mul_truncate_sqrt2error in limb 0, ea987380 != aa987380
split/combine_bitsFAIL: Error in limb 2, 1002153456 != 1520
I'm not sure how to debug these as
On Monday 02 January 2012 07:30:03 Bill Hart wrote:
> I've now added the negacyclic transforms to the flint repo and cleaned
> it all up ready for some tuning code to be written.
>
> At the moment some hard coded tuning values are present in
> mulmod_2expp1.c. There are no tuning values for the ma
Fantastic work Bill - it offers a really substantial speed up and is hence
something we should add to MPIR as soon as you feel it is ready.
>From your list of changes needed for moving it into MPIR, it seems that
almost all are search/replace operations - is it always obvious where
TMP_MARK sho
I found another speedup: combine the inner transforms and pointwise
mults and combine the ifft and normalisation in a cache friendly way.
This results in about a 5% speedup over much of the range. Here are
the times now.
bits iters mpir this gmp n w
1958401000 1.149s 1.
The code can be more easily browsed here:
https://github.com/wbhart/flint2/tree/trunk/fft
Bill.
On 2 January 2012 07:30, Bill Hart wrote:
> I've now added the negacyclic transforms to the flint repo and cleaned
> it all up ready for some tuning code to be written.
>
> At the moment some hard co
I've now added the negacyclic transforms to the flint repo and cleaned
it all up ready for some tuning code to be written.
At the moment some hard coded tuning values are present in
mulmod_2expp1.c. There are no tuning values for the main
multiplication routine because there is no wrapper function
I have finished implementing and debugging a new Nussbaumer
convolution. It isn't the absolutely fastest possible thing one can
do, but it is still slightly better than what is in mpir.
The fourth column in the following table shows the times for the new
code. It is only used from a certain point
I have cleaned up my FFT massively and it now takes just over 2000
lines of code including test code. You can view the code here:
http://selmer.warwick.ac.uk/gitweb/flint2.git?a=tree;f=fft;h=1fcb640bdbf46137b01611a7ff7e1cc90fd2e4c0;hb=HEAD
Just a few minor test functions to go, some tuning code a
David clarified the timing in that last slide of his. Apparently it
was just a misconfiguration external to MPIR. He reports that the
correct time for MPIR is 224s, which is in line with what we might
expect from the figures I gave for AMD chips.
Bill.
On 26 December 2011 14:44, Bill Hart wrote:
A few days ago, David Harvey gave a really nice talk at Warwick about
some new number theoretic transform code he has been working on. It
uses some nice tricks to get *really* fast butterflies. Even a C
implementation turned out to be quite fast, although he is also
working on some assembly optimis
I finished implementing the truncated matrix fourier square root two
transforms and I can finally multiply integers of all sizes. (Note
that I'm still using the mpir Nussbaumer convolution for pointwise
multiplications when the products get huge. Times will drop further
when at some point in the fu
35 matches
Mail list logo