On Tuesday 10 January 2012 19:06:57 Bill Hart wrote:
> In my code I don't alias operands with sumdiff. However I do require
> it to deal with the case n = 0.
>
The asm version does handle n=0, purely by chance; a fair number of mpn
functions don't.
> So do you know what the difference in timings
In my code I don't alias operands with sumdiff. However I do require
it to deal with the case n = 0.
So do you know what the difference in timings between sumdiff and an
add + sub is on K10 when everything is in cache?
Bill.
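
For anyone following the thread who has not met the routine, here is a minimal portable sketch of what a combined sum/difference pass computes. It is not MPIR's code: the name ref_sumdiff_n, the fixed 64-bit limb type, and the 2*carry + borrow return encoding are assumptions made for illustration. It assumes the operands do not overlap, and it returns 0 for n = 0 because the loop body never runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not MPIR's implementation).
   Computes s = x + y and d = x - y over n limbs in one pass, so each
   source limb is loaded once. Assumes s, d, x, y do not overlap.
   Return value (assumed encoding): 2*carry_out + borrow_out.
   For n == 0 the loop does not execute and the result is 0. */
static uint64_t ref_sumdiff_n(uint64_t *s, uint64_t *d,
                              const uint64_t *x, const uint64_t *y,
                              size_t n)
{
    uint64_t carry = 0, borrow = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t xi = x[i], yi = y[i];

        /* Sum limb with incoming carry. */
        uint64_t sum = xi + yi;
        uint64_t c1 = sum < xi;          /* overflow from xi + yi */
        uint64_t sum2 = sum + carry;
        uint64_t c2 = sum2 < sum;        /* overflow from adding the carry */
        s[i] = sum2;
        carry = c1 | c2;                 /* at most one of c1, c2 is set */

        /* Difference limb with incoming borrow. */
        uint64_t diff = xi - yi;
        uint64_t b1 = xi < yi;           /* underflow from xi - yi */
        uint64_t b2 = diff < borrow;     /* underflow from subtracting the borrow */
        d[i] = diff - borrow;
        borrow = b1 | b2;                /* at most one of b1, b2 is set */
    }
    return 2 * carry + borrow;
}
```

The possible advantage over a separate add followed by a sub is that x and y are read once rather than twice; whether that shows up in timings when everything is already in cache is exactly the question being asked here.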
On 10 January 2012 18:52, Jason wrote:
> On Tuesday 10 January 2012 14
On Tuesday 10 January 2012 14:23:35 Bill Hart wrote:
> I just tried removing mpn_sumdiff_n references from my code, and this
> slowed it down substantially. So this function is really important for
> the speed of the FFT.
Given that the speedup of sumdiff is pretty small I would guess that the
me
I just tried removing mpn_sumdiff_n references from my code, and this
slowed it down substantially. So this function is really important for
the speed of the FFT.
Unfortunately it is not exported by MPIR and even though it is defined
for all processors, it is mpn_sumdiff_n in some libraries and
_g
I wouldn't worry about it. It is possible I overwrote my timings file
and that the times are not affected after all.
On 9 January 2012 17:56, Jason wrote:
> On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
>> I decided to try the FFT without addsub_n and it seems to actually go
>> consistently
On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
> I decided to try the FFT without addsub_n and it seems to actually go
> consistently about 3% faster, which is totally mysterious. So I have
> removed it from the two files it is defined in:
That's very strange. I assume this is on a K10, wha
I decided to try the FFT without addsub_n and it seems to actually go
consistently about 3% faster, which is totally mysterious. So I have
removed it from the two files it is defined in:
ifft_mfa_truncate_sqrt2.c
ifft_truncate_sqrt2.c
As sumdiff_n seems to be defined for all platforms as far back