If I delete case1, case2 and case3, so that we do the main loop and just fall
through straight into case0, then the time is back to 3393. There are now no
branches to get in the way,
i.e.
add $4,%r8
mov %rcx,-16(%rdi,%r8,8)
jnc lp # this is end of main loop
ALIGN(32)
skiplp:
#cmp $2,%r8
#j
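A rough C model of the loop control in the fragment above may help when reading the epilogue dispatch. It rests on assumptions not stated in the thread: that %r8 starts at 3 - n, that the body runs once before each `add $4,%r8` (do-while style), and that n >= 4. `loop_ctl` and `simulate_counter` are hypothetical names. Because `mov` does not touch the flags, the `jnc` still sees the carry from the `add`, and that carry is set only when the negative counter crosses zero, leaving a small value for the epilogue to test.

```c
#include <stdint.h>
#include <stddef.h>

/* Models "add $4,%r8 / mov ... / jnc lp".
   ASSUMPTION (hypothetical, not from the thread): %r8 starts at 3 - n as a
   64-bit two's-complement value, the body runs before the add, and n >= 4. */
typedef struct {
    size_t iterations;   /* passes through the 4-limb main loop */
    uint64_t leftover;   /* counter value left for the epilogue dispatch */
} loop_ctl;

loop_ctl simulate_counter(size_t n)
{
    uint64_t r8 = (uint64_t)3 - (uint64_t)n;   /* wraps to 2^64 + 3 - n */
    loop_ctl out = {0, 0};
    int carry;
    do {
        out.iterations++;                      /* main-loop body would run here */
        uint64_t sum = r8 + 4;                 /* add $4,%r8 */
        carry = sum < r8;                      /* CF: unsigned wrap past 2^64 */
        r8 = sum;                              /* mov leaves the flags alone */
    } while (!carry);                          /* jnc lp */
    out.leftover = r8;                         /* 0..3, selects the epilogue */
    return out;
}
```

Under these assumptions n = 12, 13, 14 and 15 all make 3 passes and leave 3, 2, 1 and 0 respectively, so the leftover counter alone decides which epilogue case runs.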
On Tuesday 05 May 2009 03:09:54 Gonzalo Tornaria wrote:
> On Mon, May 4, 2009 at 11:27 AM, Jason Moxham wrote:
> > Hi
> >
> > I've been playing with some assembler for the Intel Core2 chips and have
> > come across this timing oddity which I can't explain. Any ideas?
>
> Maybe it's to do with th
On Monday 04 May 2009 19:23:43 David Harvey wrote:
> Does it make a difference if you permute the case0 block with any of
> the others?
>
No difference
> Does it make a difference if you insert a dummy read/write instruction
> into the case0 block?
>
If I put a
mov %r15,%r9
at the start of cas
On Mon, May 4, 2009 at 11:27 AM, Jason Moxham wrote:
>
> Hi
>
> I've been playing with some assembler for the Intel Core2 chips and have come
> across this timing oddity which I can't explain. Any ideas?
Maybe it's to do with the branch predictor? Remarks:
1. It seems to me that this starts hap
Does it make a difference if you permute the case0 block with any of
the others?
Does it make a difference if you insert a dummy read/write instruction
into the case0 block?
david
On May 4, 1:39 pm, Jason Moxham wrote:
> Making all cases the same, i.e. using jmp case0, then all the times are fast
Making all cases the same, i.e. using jmp case0, makes all the times fast, and
using jmp case1 makes all the times slow. So it looks as if only the case0
epilogue is fast, while the case1, case2 and case3 epilogues are taking 500 cycles.
The L1 cache is 32KB and our 2 sources and 1 destination are 24KB overall, so all data sh
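The 24K figure can be checked with a little arithmetic. This sketch assumes 64-bit (8-byte) limbs and n = 1000, the size used in the `speed -s 1000` run elsewhere in this thread; `working_set_bytes` and `fits_in_l1` are hypothetical helpers, not MPIR functions.

```c
#include <stddef.h>

#define LIMB_BYTES 8                     /* one 64-bit limb on x86-64 */
#define L1D_BYTES  ((size_t)32 * 1024)   /* 32KB L1 data cache, as stated above */

/* Bytes touched by a routine with two n-limb sources and one n-limb destination. */
size_t working_set_bytes(size_t n)
{
    return 3 * n * LIMB_BYTES;
}

/* Nonzero if all three operands could sit in L1 at once.
   This simple count ignores set-associativity conflicts and any other data. */
int fits_in_l1(size_t n)
{
    return working_set_bytes(n) <= L1D_BYTES;
}
```

For n = 1000 this gives 3 * 1000 * 8 = 24000 bytes, below the 32768 bytes of a 32KB L1; on this simple count the operands would fit up to n = 1365.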
What happens if you remove the epilogue, i.e. make it run the main
loop exactly floor(n/4) times, so that it performs exactly the same
sequence of instructions for e.g. n = 12, 13, 14, 15?
david
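To make the suggested experiment concrete, here is a sketch of the loop shape under discussion: a main loop that handles 4 limbs per pass, floor(n/4) times, followed by an epilogue for the n % 4 leftover limbs. The body of mpn_test_pppn is not shown in the thread, so this assumes an mpn_add_n-style operation with two sources and one destination; `add_n_unrolled`, `add_limb` and the `run_epilogue` switch are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t limb_t;

/* Add one limb pair plus the incoming carry; returns the sum, updates *cy. */
static limb_t add_limb(limb_t a, limb_t b, limb_t *cy)
{
    limb_t s = a + b;
    limb_t c1 = s < a;          /* carry out of a + b */
    limb_t t = s + *cy;
    limb_t c2 = t < s;          /* carry out of adding the incoming carry */
    *cy = c1 | c2;
    return t;
}

/* Hypothetical mpn_add_n-style routine: rp = up + vp over n limbs.
   The main loop runs floor(n/4) passes of 4 limbs; the epilogue handles
   the remaining n % 4 limbs.  With run_epilogue == 0 the leftover limbs are
   skipped, so n = 12..15 execute exactly the same sequence of operations. */
limb_t add_n_unrolled(limb_t *rp, const limb_t *up, const limb_t *vp,
                      size_t n, int run_epilogue)
{
    limb_t cy = 0;
    size_t i = 0;
    for (size_t k = n / 4; k > 0; k--, i += 4) {
        rp[i]     = add_limb(up[i],     vp[i],     &cy);
        rp[i + 1] = add_limb(up[i + 1], vp[i + 1], &cy);
        rp[i + 2] = add_limb(up[i + 2], vp[i + 2], &cy);
        rp[i + 3] = add_limb(up[i + 3], vp[i + 3], &cy);
    }
    if (!run_epilogue)
        return cy;
    switch (n % 4) {            /* remainder cases, falling through */
    case 3: rp[i] = add_limb(up[i], vp[i], &cy); i++; /* fall through */
    case 2: rp[i] = add_limb(up[i], vp[i], &cy); i++; /* fall through */
    case 1: rp[i] = add_limb(up[i], vp[i], &cy); i++; /* fall through */
    case 0: break;
    }
    return cy;
}
```

Being C, the compiler's branch layout will not match the hand-written asm; the sketch only shows the control structure that separates main-loop cost from epilogue cost.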
On May 4, 11:44 am, Jason Moxham wrote:
> Yeah, the numbers are consistent, nice surprise for core
Yeah, the numbers are consistent, a nice surprise for core2 :)
And running the tests on their own gives us the same numbers.
tune$ ./speed -c -s 1000 mpn_test_pppn
overhead 7.00 cycles, precision 100 units of 5.37e-10 secs, CPU freq 1861.91 MHz
mpn_test_pppn
1000 2809.93
tune$
Do you get consistent numbers if you run only for a single value of n?
i.e. it's not an artifact of the way the buffers are allocated or
something?
david
On May 4, 10:27 am, Jason Moxham wrote:
> Hi
>
> I've been playing with some assembler for the Intel Core2 chips and have come
> across this
Done
On Monday 23 February 2009 17:21:25 Bill Hart wrote:
> Yes we can.
>
> 2009/2/23 Cactus :
> > On Feb 23, 3:22 pm, ja...@njkfrudils.plus.com wrote:
> > >> I've finished with the core2 and K8 branches now, so I can delete them.
> > >> I assume the svn log will be transferred/kept in trunk?
> >>
Yes we can.
2009/2/23 Cactus :
>
>
>
> On Feb 23, 3:22 pm, ja...@njkfrudils.plus.com wrote:
>> I've finished with the core2 and K8 branches now, so I can delete them. I
>> assume the svn log will be transferred/kept in trunk?
>>
>> Is this OK, Brian?
>>
>> Jason
>
> Yes, that is fine by me. I
On Feb 23, 3:22 pm, ja...@njkfrudils.plus.com wrote:
> I've finished with the core2 and K8 branches now, so I can delete them. I
> assume the svn log will be transferred/kept in trunk?
>
> Is this OK, Brian?
>
> Jason
Yes, that is fine by me. I think we can also delete the toom3 branch
as w
I've finished with the core2 and K8 branches now, so I can delete them. I
assume the svn log will be transferred/kept in trunk?
Is this OK, Brian?
Jason
If the sage.math machine is overloaded you could try ssh mod
That machine is identical. It is officially for modular forms
development, but if you have access and it is not loaded I am sure no
one will mind you running some short timings.
Bill.
2009/2/23 :
>
> On Monday 23 February 2009 12:12:
On Monday 23 February 2009 12:12:59 Bill Hart wrote:
> I think SkyNet/eno is a Core 2. It would be interesting to see what
> score one obtains on a slower Core 2 machine to test the idea that
> these scores scale linearly with clock speed.
>
> At the moment it looks like we'd be about 10% behind G
I think SkyNet/eno is a Core 2. It would be interesting to see what
score one obtains on a slower Core 2 machine to test the idea that
these scores scale linearly with clock speed.
At the moment it looks like we'd be about 10% behind GMP 4.3 if the
benchmarks do scale linearly.
Bill.
2009/2/23
I've done the merge. It is only marginally faster than before, probably
because there were not many inc/decs, and there is probably still a fair
amount of slack in the functions.
I got a benchmark score of 10364 on sage.math
On Sunday 22 February 2009 22:34:11 Jason Martin wrote:
> Sounds fine.
>
Sounds fine.
Jason Worth Martin
Asst. Professor of Mathematics
http://www.math.jmu.edu/~martin
On Sun, Feb 22, 2009 at 2:49 PM, wrote:
>
>
> If there are no objections, I will merge the core-2 branch into trunk
> tomorrow. All I have changed is inc/dec to add/sub.
> I didn't do it for mpn
On Feb 22, 11:49 am, ja...@njkfrudils.plus.com wrote:
> If there are no objections, I will merge the core-2 branch into trunk
> tomorrow. All I have changed is inc/dec to add/sub.
> I didn't do it for mpn-divexact_byff or the sub,add part of redc_basecase
> because it was not trivial.
>
> Whe
If there are no objections, I will merge the core-2 branch into trunk
tomorrow. All I have changed is inc/dec to add/sub.
I didn't do it for mpn-divexact_byff or the sub,add part of redc_basecase
because it was not trivial.
When I say a merge, I will just copy the new files across, as the
On Fri, Feb 20, 2009 at 3:18 PM, wrote:
>
> On Friday 20 February 2009 16:42:19 Jason Martin wrote:
>> > On Friday 20 February 2009 14:12:19 ja...@njkfrudils.plus.com wrote:
>> >
>> > What happened to core-2 mul_basecase and sqr_basecase? No wonder
>> > core-2 benchmarks are crap
>>
>> There
On Friday 20 February 2009 20:37:50 Cactus wrote:
> On Feb 20, 8:18 pm, ja...@njkfrudils.plus.com wrote:
> > On Friday 20 February 2009 16:42:19 Jason Martin wrote:
> > > > On Friday 20 February 2009 14:12:19 ja...@njkfrudils.plus.com wrote:
> > > >
> > > > What happened to core-2 mul_basecase and
On Feb 20, 8:18 pm, ja...@njkfrudils.plus.com wrote:
> On Friday 20 February 2009 16:42:19 Jason Martin wrote:
>
> > > On Friday 20 February 2009 14:12:19 ja...@njkfrudils.plus.com wrote:
>
> > > What happened to core-2 mul_basecase and sqr_basecase? No wonder
> > > core-2 benchmarks are crap
On Friday 20 February 2009 16:42:19 Jason Martin wrote:
> > On Friday 20 February 2009 14:12:19 ja...@njkfrudils.plus.com wrote:
> >
> > What happened to core-2 mul_basecase and sqr_basecase? No wonder
> > core-2 benchmarks are crap
>
> There aren't any :-) I was just using Gaudry's code for t
On Fri, Feb 20, 2009 at 8:45 AM, mabshoff wrote:
>
>
>
> On Feb 20, 8:45 am, ja...@njkfrudils.plus.com wrote:
>
> Jason: those numbers already *rock*
>
>> Some benchmarks on core2-unknown-linux-gnu (sage.math), Intel(R) Xeon(R) CPU
>> X7460 @ 2.66GHz
>>
>> 8307 on trunk r1623, should be same sco
On Feb 20, 8:45 am, ja...@njkfrudils.plus.com wrote:
Jason: those numbers already *rock*
> Some benchmarks on core2-unknown-linux-gnu (sage.math), Intel(R) Xeon(R) CPU
> X7460 @ 2.66GHz
>
> 8307 on trunk r1623, should be same score as mpir-0.9.0
> 10252 on core-2 branch r1623
>
>
> On Friday 20 February 2009 14:12:19 ja...@njkfrudils.plus.com wrote:
>
> What happened to core-2 mul_basecase and sqr_basecase? No wonder core-2
> benchmarks are crap
There aren't any :-) I was just using Gaudry's code for those routines.
Should be able to use your amd64 code for those eve
Some benchmarks on core2-unknown-linux-gnu (sage.math), Intel(R) Xeon(R) CPU
X7460 @ 2.66GHz:
8307 on trunk r1623, which should be the same score as mpir-0.9.0
10252 on core-2 branch r1623
That is a 23.4% speedup.
I keep getting
make[5]: warning: Clock skew detected. Your build may be incomplete
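The quoted speedup is just the relative change in benchmark score. A trivial helper (`speedup_percent` is a hypothetical name) reproduces it:

```c
/* Relative change from one benchmark score to another, in percent. */
double speedup_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

(10252 - 8307) / 8307 comes to about 23.4%. Applying the same formula to the post-merge score of 10364 on sage.math reported in a later message gives roughly 1.1% over 10252, which fits the description "only marginally faster".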
On Friday 20 February 2009 14:12:19 ja...@njkfrudils.plus.com wrote:
What happened to core-2 mul_basecase and sqr_basecase? No wonder core-2
benchmarks are crap
> Running the k8/k10 asm code with no changes on the core2 machine sage, we
> get this
>
> popcount,hamdist no popcount instructi