[mpir-devel] Re: Core2

2009-05-04 Thread Gonzalo Tornaria
On Mon, May 4, 2009 at 11:27 AM, Jason Moxham wrote: > > Hi > > I've been playing with some assembler for the Intel Core2 chips and have come > across this timing oddity which I can't explain. Any ideas? Maybe it's to do with the branch predictor? Remarks: 1. It seems to me that this starts hap

[mpir-devel] Re: Core2

2009-05-04 Thread David Harvey
Does it make a difference if you permute the case0 block with any of the others? Does it make a difference if you insert a dummy read/write instruction into the case0 block? david On May 4, 1:39 pm, Jason Moxham wrote: > Making all cases the same, i.e. using jmp case0, all the times are fast

[mpir-devel] Re: Core2

2009-05-04 Thread Jason Moxham
Making all cases the same, i.e. using jmp case0, all the times are fast, and using jmp case1, all the times are slow. It looks like only the case0 epilogue is fast, and the case1, case2 and case3 epilogues are taking 500 cycles. L1 cache is 32 KB and our 2 srcs and 1 dst are 24K overall, so all data sh

[mpir-devel] Re: Core2

2009-05-04 Thread David Harvey
What happens if you remove the epilogue, i.e. make it run the main loop exactly floor(n/4) times, so that it performs exactly the same sequence of instructions for e.g. n = 12, 13, 14, 15? david On May 4, 11:44 am, Jason Moxham wrote: > Yeah, the numbers are consistent, nice surprise for core
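For readers following the thread, here is a rough C sketch of the loop shape under discussion: a main loop unrolled four ways that runs floor(n/4) times, followed by an epilogue selected by n mod 4. It is not the assembler attached to the original post. The names limb_t, step and addlsh1_n_sketch are hypothetical, 64-bit limbs are assumed, and the case labels only loosely mirror the case0..case3 blocks mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs, as on a 64-bit Core2 build */

/* One limb of r = u + 2*v.
 * *sh holds the bit shifted out of the previous v limb, *cy the add carry. */
static inline limb_t step(limb_t u, limb_t v, limb_t *sh, limb_t *cy)
{
    limb_t shifted = (v << 1) | *sh;   /* 2*v plus the bit from the limb below */
    limb_t sum = u + shifted;
    limb_t c1 = sum < u;               /* carry out of u + shifted */
    limb_t res = sum + *cy;
    limb_t c2 = res < sum;             /* carry out of adding the incoming carry */
    *sh = v >> 63;
    *cy = c1 | c2;                     /* c1 and c2 cannot both be set */
    return res;
}

/* Hypothetical reference version: rp = up + 2*vp over n limbs.
 * Returns the carry out, a value in 0..2. */
limb_t addlsh1_n_sketch(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t sh = 0, cy = 0;
    size_t i = 0;

    /* Main loop: exactly floor(n/4) iterations, four limbs each. */
    for (size_t k = n / 4; k > 0; k--, i += 4) {
        rp[i]     = step(up[i],     vp[i],     &sh, &cy);
        rp[i + 1] = step(up[i + 1], vp[i + 1], &sh, &cy);
        rp[i + 2] = step(up[i + 2], vp[i + 2], &sh, &cy);
        rp[i + 3] = step(up[i + 3], vp[i + 3], &sh, &cy);
    }

    /* Epilogue: handle the remaining n mod 4 limbs. */
    switch (n % 4) {
    case 3: rp[i] = step(up[i], vp[i], &sh, &cy); i++; /* fall through */
    case 2: rp[i] = step(up[i], vp[i], &sh, &cy); i++; /* fall through */
    case 1: rp[i] = step(up[i], vp[i], &sh, &cy); i++; /* fall through */
    case 0: break;
    }

    return cy + sh;   /* add carry plus the bit shifted out of the top v limb */
}
```

In this shape, n = 12, 13, 14 and 15 all run the main loop three times and differ only in the epilogue, which is why removing the epilogue isolates its cost.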

[mpir-devel] Re: Core2

2009-05-04 Thread Jason Moxham
Yeah, the numbers are consistent, nice surprise for core2 :) And running the tests on their own gives us the same numbers. tune$ ./speed -c -s 1000 mpn_test_pppn overhead 7.00 cycles, precision 100 units of 5.37e-10 secs, CPU freq 1861.91 MHz mpn_test_pppn 1000 2809.93 tune$

[mpir-devel] Re: Core2

2009-05-04 Thread David Harvey
Do you get consistent numbers if you run only for a single value of n? i.e. it's not an artifact of the way the buffers are allocated or something? david On May 4, 10:27 am, Jason Moxham wrote: > Hi > > I've been playing with some assembler for the Intel Core2 chips and have come > across this

[mpir-devel] Core2

2009-05-04 Thread Jason Moxham
Hi I've been playing with some assembler for the Intel Core2 chips and have come across this timing oddity which I can't explain. Any ideas? Attached is an attempt at mpn_addlsh1_n. Running timings for a few sizes (limbs, time in cycles): 990 3358.04; 991 3323.79; 992

[mpir-devel] Re: MPIR on GPU

2009-05-04 Thread Bill Hart
Hi Paul, Great to hear you have some serious hardware hooked up! Some of us who will be in Seattle for a Sage Days conference in two weeks (I am already here visiting Seattle) are planning a GPU party to take a first step towards doing some GPU computations for MPIR. We'll just be writing some CUDA at fi

[mpir-devel] MPIR on GPU

2009-05-04 Thread Paul Leyland
(Changing the thread title to be a little more relevant than "Fast computation of binomial coefficients".) I now have a Tesla C1060 plugged into a Dell T7400 box running RHEL5 and am learning how to use CUDA for non-trivial computations. I'