Hi all,
In discussing another issue with Paul Zimmermann, he noted the
following ugly jump in performance in the flint/mpir fft for
multiplying two integers of the given number of limbs together:
> mpn_mul_fft_main
> limbs    time
> 1046780  0.556109766
> 1046781  0.55
On Tuesday 10 January 2012 19:06:57 Bill Hart wrote:
> In my code I don't alias operands with sumdiff. However I do require
> it to deal with the case n = 0.
>
> So do you know what the difference in timings between sumdiff and an
> add + sub is on K10 when everything is in cache?
>
> Bill.
On Tuesday 10 January 2012 19:06:57 Bill Hart wrote:
> In my code I don't alias operands with sumdiff. However I do require
> it to deal with the case n = 0.
>
The asm version does handle n=0, purely by chance; a fair number of mpn
functions don't.
> So do you know what the difference in timings
In my code I don't alias operands with sumdiff. However I do require
it to deal with the case n = 0.
So do you know what the difference in timings between sumdiff and an
add + sub is on K10 when everything is in cache?
Bill.
On 10 January 2012 18:52, Jason wrote:
> On Tuesday 10 January 2012 14
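For readers following the sumdiff discussion, here is a minimal sketch of what a fused sum-and-difference pass computes, compared with a separate add and sub. It is written in Python for illustration only: the name `sumdiff_n`, the 64-bit limb size, and the `(sum, diff, carry, borrow)` return shape are assumptions made for this example and are not MPIR's actual `mpn_sumdiff_n` interface. The point is that both results come out of a single pass over the operands, and that n = 0 falls out naturally as an empty loop.

```python
# Illustrative model only: names, limb width and return shape are assumptions,
# not MPIR's real mpn_sumdiff_n signature or carry convention.
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def sumdiff_n(x, y):
    """Compute x + y and x - y limb by limb in one pass.

    x and y are equal-length lists of limbs, least significant limb first.
    Returns (sum_limbs, diff_limbs, carry_out, borrow_out).
    """
    assert len(x) == len(y)
    s, d = [], []
    carry = 0
    borrow = 0
    for xi, yi in zip(x, y):  # runs zero times when n = 0
        t = xi + yi + carry
        s.append(t & LIMB_MASK)
        carry = t >> LIMB_BITS
        u = xi - yi - borrow
        d.append(u & LIMB_MASK)  # wraps modulo 2**LIMB_BITS on underflow
        borrow = 1 if u < 0 else 0
    return s, d, carry, borrow


def limbs_to_int(limbs):
    """Helper for checking results: rebuild the integer from its limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))
```

A separate add followed by a sub would read each operand twice; the fused loop reads each limb of `x` and `y` once, which is the usual motivation for such a routine when memory traffic matters. Whether that wins in practice when everything is already in cache is exactly the question raised above, and this sketch does not answer it.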
On Tuesday 10 January 2012 14:23:35 Bill Hart wrote:
> I just tried removing mpn_sumdiff_n references from my code, and this
> slowed it down substantially. So this function is really important for
> the speed of the FFT.
Given that the speedup of sumdiff is pretty small I would guess that the
me
I just tried removing mpn_sumdiff_n references from my code, and this
slowed it down substantially. So this function is really important for
the speed of the FFT.
Unfortunately it is not exported by MPIR and even though it is defined
for all processors, it is mpn_sumdiff_n in some libraries and
_g
I wouldn't worry about it. It is possible I overwrote my timings file
and that the times are not affected after all.
On 9 January 2012 17:56, Jason wrote:
> On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
>> I decided to try the FFT without addsub_n and it seems to actually go
>> consistently
On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
> I decided to try the FFT without addsub_n and it seems to actually go
> consistently about 3% faster, which is totally mysterious. So I have
> removed it from the two files it is defined in:
That's very strange, I assume this is on a K10, wha
I decided to try the FFT without addsub_n and it seems to actually go
consistently about 3% faster, which is totally mysterious. So I have
removed it from the two files it is defined in:
ifft_mfa_truncate_sqrt2.c
ifft_truncate_sqrt2.c
As sumdiff_n seems to be defined for all platforms as far back
I have now completed the modifications to the FFT to handle squaring.
This wasn't hard at all.
Attached to this email is a diff of the changes for Brian. Of course
it won't apply automatically to mpir, but it should be possible to go
through and apply the same changes by hand.
Beware that in one
I have now written a wrapper function in mul_fft_full.c which
multiplies two integers. It will fail if the product is too small to
use the FFT (there is an assert for this which hopefully will be on
during testing).
If I have done the calculations correctly it won't work if the product
of the two
I have finally managed to implement the truncated sqrt2 transforms and
the matrix fourier sqrt2 transforms. This enables me to get actual
timings vs mpir. These times are for multiplying two integers of the given
number of bits.
On my machine the mpir FFT starts beating toom algorithms at about
16000
I've been doing some more work on the FFT. I found a way to get around
writing more new butterflies for the FFT. So now I have the FFT
working for power of two lengths.
The times are quite good but can be improved yet. Firstly MPIR just
uses mpn_mul_n to do pointwise mults in the FFT range. I'm ju
Hi all,
I've been working away at this new FFT I've been writing (for 2 years).
There's a multitude of functions, FFT, IFFT, truncated FFT and IFFT,
matrix fourier algorithm FFT and IFFT and truncated matrix fourier
algorithm FFT and IFFT.
Each of these needs a new version which handles the sqrt
Hi All,
We are in the final stages of preparing for the MPIR 1.2 release and
we need a volunteer to help in producing FFT tuning values for the
Core2 processor.
I was doing this but I was using a mobile Core2 processor with power
saving features that seemed to interfere with the tuning process.
First let me say that it has been very interesting following this
devel list with all the talk about optimizing MPIR on AMD and Core2
architectures. I can't wait for release 1.0 so I can start using that
with nice Windows support for some Windows factoring binaries I
maintain (http://gilchrist.ca