[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Jeff Gilchrist
On Wed, Mar 4, 2009 at 8:34 PM, wrote: > http://www.bluewhite64.com/bluewhite64-linux-news.html > > is 64bit slackware , if you not used slackware before , prepare to be > transported to the late 80's. > or perhaps  early 90's Thanks for the link. I have used slackware before, way back, p

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread jason
On Thursday 05 March 2009 01:25:51 Jeff Gilchrist wrote: > This is really starting to frustrate me. I tried Fedora 10 on my > Core2 system and it will not boot either so there is something going > on that both Fedora and Ubuntu don't like. I can get SLAX to boot but > it is only 32bit so that do

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Jeff Gilchrist
This is really starting to frustrate me. I tried Fedora 10 on my Core2 system and it will not boot either so there is something going on that both Fedora and Ubuntu don't like. I can get SLAX to boot but it is only 32bit so that doesn't really help us with timing. Tomorrow I will at least be ab

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread jason
On Wednesday 04 March 2009 23:56:48 Bill Hart wrote: > I've had a think, especially considering the 10's of thousands of > people who will be using MPIR in Sage, not to mention the sponsor, and > I think we need to write try tests for the mpn functions we use. > > We could divide the work in half

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread Bill Hart
I mean reference implementation, not reference tests. Bill. 2009/3/4 Bill Hart : > I've had a think, especially considering the 10's of thousands of > people who will be using MPIR in Sage, not to mention the sponsor, and > I think we need to write try tests for the mpn functions we use. > > We

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread Bill Hart
I've had a think, especially considering the 10's of thousands of people who will be using MPIR in Sage, not to mention the sponsor, and I think we need to write try tests for the mpn functions we use. We could divide the work in half by one person writing the reference tests and the other writin

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread Bill Hart
Do you think the tests in make check are sufficient to test every branch of each of those functions? Bill. 2009/3/4 : > > On Wednesday 04 March 2009 23:24:59 Bill Hart wrote: >> Is there a test for lshift1, rshift1, addlsh1, addrsh1, addadd, >> addsub, sumdiff, divebyff or redc_basecase? >> >>

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread jason
On Wednesday 04 March 2009 23:24:59 Bill Hart wrote: > Is there a test for lshift1, rshift1, addlsh1, addrsh1, addadd, > addsub, sumdiff, divebyff or redc_basecase? > > Do we need tests for these? > > I know we use addadd and addsub. Do we use any of the others yet? > we use lshift1 rshift1 addls

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Bill Hart
Yeah bring on 96 bit multiplies!! 2009/3/4 : > > On Wednesday 04 March 2009 23:04:13 Bill Hart wrote: >> Yasm does seem to preserve what you code pretty well. I don't know of >> any weird optimisation it does. >> > > Yes , so far it's been no worse than gas > >> The other advantage is we ship a

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread jason
On Wednesday 04 March 2009 23:04:13 Bill Hart wrote: > Yasm does seem to preserve what you code pretty well. I don't know of > any weird optimisation it does. > Yes , so far it's been no worse than gas > The other advantage is we ship a fixed version of yasm with MPIR and > at least while we are

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread Bill Hart
Is there a test for lshift1, rshift1, addlsh1, addrsh1, addadd, addsub, sumdiff, divebyff or redc_basecase? Do we need tests for these? I know we use addadd and addsub. Do we use any of the others yet? Bill. 2009/3/4 Bill Hart : > 2009/3/4  : >> >> On Wednesday 04 March 2009 22:40:18 Bill Hart
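For the less common routines in this list, one way to get a reference version is to compose simpler routines whose correctness is easy to check. The C sketch below assumes 64-bit limbs, and it assumes addlsh1 computes {up,n} + 2*{vp,n} and returns a carry of 0, 1 or 2, as GMP's internal mpn_addlsh1_n does. The names ref_lshift1, ref_add_n and ref_addlsh1_n are hypothetical and are not MPIR's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t limb_t;             /* assumption: 64-bit limbs (x86_64) */

/* {rp,n} = {up,n} << 1; returns the bit shifted out of the top (0 or 1).
   Works low to high, so rp == up is allowed. */
static limb_t ref_lshift1(limb_t *rp, const limb_t *up, size_t n)
{
    limb_t out = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi = up[i] >> 63;
        rp[i] = (up[i] << 1) | out;
        out = hi;
    }
    return out;
}

/* {rp,n} = {up,n} + {vp,n}; returns the carry out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        limb_t c1 = s < cy;           /* wrapped while adding the carry-in */
        limb_t r = s + vp[i];
        rp[i] = r;
        cy = c1 | (r < s);            /* wrapped while adding vp[i] */
    }
    return cy;
}

/* Hypothetical reference for an addlsh1-style routine:
   {rp,n} = {up,n} + 2*{vp,n}; returns the carry out (0, 1 or 2).
   Uses scratch space, which is acceptable in a reference version. */
limb_t ref_addlsh1_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t *tmp = malloc(n * sizeof *tmp);
    assert(tmp != NULL);
    limb_t c = ref_lshift1(tmp, vp, n);   /* tmp = 2*v mod B^n, c = top bit */
    c += ref_add_n(rp, up, tmp, n);       /* rp = u + tmp, plus its carry */
    free(tmp);
    return c;
}
```

Because the reference is built only from a shift and an add, a fast assembly addlsh1 can be checked against it without trusting any other optimised code.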

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Cactus
On Mar 4, 8:52 pm, Bill Hart wrote: > Hi Brian, > > I will finish off the K8 code as is, and have already done the K10 > code. But if you could email the output of applying the new version to > the Core 2 code, that would probably save me lots of time. Bill In the Core 2 translation I convert

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Cactus
On Mar 4, 10:53 pm, ja...@njkfrudils.plus.com wrote: > On Wednesday 04 March 2009 20:52:20 Bill Hart wrote: > > > > > Hi Brian, > > > I will finish off the K8 code as is, and have already done the K10 > > code. But if you could email the output of applying the new version to > > the Core 2 code,

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread Bill Hart
2009/3/4 : > > On Wednesday 04 March 2009 22:40:18 Bill Hart wrote: >> I'd like to propose a code freeze on all K8/K10 assembly code, which I >> have now converted to yasm format, unless serious bugs are uncovered. >> >> If we freeze the code then we can begin testing. I propose we wear out >> ea

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Bill Hart
Yasm does seem to preserve what you code pretty well. I don't know of any weird optimisation it does. The other advantage is we ship a fixed version of yasm with MPIR and at least while we are doing that you can guarantee that you aren't going to hit some weird gas bugs that only occur on system

[mpir-devel] Re: Code freeze on K8/K10 assembler code

2009-03-04 Thread jason
On Wednesday 04 March 2009 22:40:18 Bill Hart wrote: > I'd like to propose a code freeze on all K8/K10 assembly code, which I > have now converted to yasm format, unless serious bugs are uncovered. > > If we freeze the code then we can begin testing. I propose we wear out > each and every file wit

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread jason
On Wednesday 04 March 2009 20:52:20 Bill Hart wrote: > Hi Brian, > > I will finish off the K8 code as is, and have already done the K10 > code. But if you could email the output of applying the new version to > the Core 2 code, that would probably save me lots of time. > > Bill. > > 2009/3/4 Cactu

[mpir-devel] Code freeze on K8/K10 assembler code

2009-03-04 Thread Bill Hart
I'd like to propose a code freeze on all K8/K10 assembly code, which I have now converted to yasm format, unless serious bugs are uncovered. If we freeze the code then we can begin testing. I propose we wear out each and every file with /tests/devel/try including many small operands and as many d
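The testing proposed here can be sketched as a differential test. It runs an optimised routine and a simple reference side by side on random operands of every small size and counts mismatches. This is a minimal sketch and is not /tests/devel/try itself. Note the following assumptions:

- fast_add_n is a hypothetical stand-in for an assembly routine.
- The operand generator deliberately favours 0 and all-ones limbs so that carries propagate across limbs more often.
- unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;    /* GCC/Clang extension */

/* Reference: one limb per iteration, carry tracked explicitly. */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        limb_t c1 = s < cy;
        limb_t r = s + vp[i];
        rp[i] = r;
        cy = c1 | (r < s);
    }
    return cy;
}

/* Hypothetical "optimised" version: two limbs per iteration. */
static limb_t fast_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        dlimb_t t0 = (dlimb_t)up[i] + vp[i] + cy;
        dlimb_t t1 = (dlimb_t)up[i + 1] + vp[i + 1] + (limb_t)(t0 >> 64);
        rp[i] = (limb_t)t0;
        rp[i + 1] = (limb_t)t1;
        cy = (limb_t)(t1 >> 64);
    }
    if (i < n) {                      /* odd n: one limb left over */
        dlimb_t t = (dlimb_t)up[i] + vp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

static uint64_t rng_state = UINT64_C(0x9E3779B97F4A7C15);

/* xorshift64; about 1 limb in 4 is forced to 0 or all-ones. */
static limb_t next_limb(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    switch (rng_state & 7) {
    case 0:  return 0;
    case 1:  return UINT64_MAX;
    default: return rng_state;
    }
}

/* Compare both versions for every size 1..max_n, reps times each.
   Returns the number of mismatching runs. */
int differential_test(size_t max_n, int reps)
{
    enum { MAXN = 64 };
    limb_t u[MAXN], v[MAXN], r1[MAXN], r2[MAXN];
    int bad = 0;
    assert(max_n <= MAXN);
    for (size_t n = 1; n <= max_n; n++)
        for (int k = 0; k < reps; k++) {
            for (size_t i = 0; i < n; i++) {
                u[i] = next_limb();
                v[i] = next_limb();
            }
            limb_t c1 = ref_add_n(r1, u, v, n);
            limb_t c2 = fast_add_n(r2, u, v, n);
            if (c1 != c2 || memcmp(r1, r2, n * sizeof(limb_t)) != 0)
                bad++;
        }
    return bad;
}
```

A fixed seed keeps failures reproducible. Concentrating on many small sizes exercises the loop entry, tail and odd/even paths, which is where unrolled assembly tends to go wrong.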

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Bill Hart
Hi Brian, I will finish off the K8 code as is, and have already done the K10 code. But if you could email the output of applying the new version to the Core 2 code, that would probably save me lots of time. Bill. 2009/3/4 Cactus : > > > > On Mar 4, 8:25 pm, Bill Hart wrote: >> Hmm, so far no a

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Cactus
On Mar 4, 8:25 pm, Bill Hart wrote: > Hmm, so far no alignment issues appear to have slowed things down. I > didn't know about the alignb thing. > > What may be useful is a version of the converter in the x86_64 > directory which converts any .asm files to .as files in yasm format. > > The issu

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Bill Hart
Hmm, so far no alignment issues appear to have slowed things down. I didn't know about the alignb thing. What may be useful is a version of the converter in the x86_64 directory which converts any .asm files to .as files in yasm format. The issues with the $ were in sqr_basecase in the amd64 dir

[mpir-devel] Re: Notes on conversion to yasm format

2009-03-04 Thread Cactus
On Mar 4, 7:12 pm, Bill Hart wrote: > This thread will be for notes on converting Jason's code to yasm > format. It might help Jason to avoid a couple of minor things which > cause crunching in the gears and may help Brian improve the python > script. > > * Yasm doesn't like loop labels to be c

[mpir-devel] Notes on conversion to yasm format

2009-03-04 Thread Bill Hart
This thread will be for notes on converting Jason's code to yasm format. It might help Jason to avoid a couple of minor things which cause crunching in the gears and may help Brian improve the python script. * Yasm doesn't like loop labels to be called loop: Anything else is fine, e.g. loop1: *

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Cactus
On Mar 4, 5:58 pm, Cactus wrote: > On Mar 4, 4:19 pm, ja...@njkfrudils.plus.com wrote: > > > On Wednesday 04 March 2009 15:57:25 Cactus wrote: > > > > On Mar 4, 3:49 pm, ja...@njkfrudils.plus.com wrote: > > > > On Wednesday 04 March 2009 15:40:04 Bill Hart wrote: > > > > > This is on a K10. The

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Cactus
On Mar 4, 4:19 pm, ja...@njkfrudils.plus.com wrote: > On Wednesday 04 March 2009 15:57:25 Cactus wrote: > > > > > On Mar 4, 3:49 pm, ja...@njkfrudils.plus.com wrote: > > > On Wednesday 04 March 2009 15:40:04 Bill Hart wrote: > > > > This is on a K10. The runs differ significantly. Interesting. >

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
yeah it is not clear whether it is just athlon 64's. i.e. those machines formerly called K9. My Turion would probably also be just fine. Bill. 2009/3/4 : > > On Wednesday 04 March 2009 16:29:30 Bill Hart wrote: >> My guess would be stalls due to unavailability of data from the cache >> in time

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread jason
On Wednesday 04 March 2009 16:29:30 Bill Hart wrote: > My guess would be stalls due to unavailability of data from the cache > in time for the pipeline. Some insertions of nop's would probably make > this go away. That may or may not affect the timings on Opteron. > > I suspect it is only an issue

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
My guess would be stalls due to unavailability of data from the cache in time for the pipeline. Inserting some nops would probably make this go away. That may or may not affect the timings on Opteron. I suspect it is only an issue on cheaper processors like the Athlon 64 x2 and Turion 64 x2.

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread jason
On Wednesday 04 March 2009 15:57:25 Cactus wrote: > On Mar 4, 3:49 pm, ja...@njkfrudils.plus.com wrote: > > On Wednesday 04 March 2009 15:40:04 Bill Hart wrote: > > > This is on a K10. The runs differ significantly. Interesting. > > > > > > It might have to do with the almost completely unpredicta

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Cactus
On Mar 4, 3:49 pm, ja...@njkfrudils.plus.com wrote: > On Wednesday 04 March 2009 15:40:04 Bill Hart wrote: > > > This is on a K10. The runs differ significantly. Interesting. > > > It might have to do with the almost completely unpredictable > > scheduling on the K10. Certainly the differences a

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread jason
On Wednesday 04 March 2009 15:40:04 Bill Hart wrote: > This is on a K10. The runs differ significantly. Interesting. > > It might have to do with the almost completely unpredictable > scheduling on the K10. Certainly the differences at limb n are almost > always made up again, or nearly so, at lim

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
This is on a K10. The runs differ significantly. Interesting. It might have to do with the almost completely unpredictable scheduling on the K10. Certainly the differences at limb n are almost always made up again, or nearly so, at limb n+1, and by the end of 40 limbs the times are usually within

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Jeff Gilchrist
On Wed, Mar 4, 2009 at 10:27 AM, Bill Hart wrote: > I've just noticed that the times from speed are not consistent between > runs!! I thought that was almost impossible. I did several runs on my systems and they all agreed for me, even when the system was loaded or not. Jeff.

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
I've just noticed that the times from speed are not consistent between runs!! I thought that was almost impossible. Bill. 2009/3/4 Bill Hart : > Brian, > > I am just impressed you have done it at all. Your patience with > spreadsheet programs far exceeds mine! > > Bill. > > 2009/3/4 Cactus : >> -

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
Brian, I am just impressed you have done it at all. Your patience with spreadsheet programs far exceeds mine! Bill. 2009/3/4 Cactus : > > On Mar 4, 2:24 pm, Bill Hart wrote: >> Here are the figures for the 2.66 GHz Xeon Core 2 (Dunnington 6 core >> 16 MB cache) sage.math:

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Cactus
On Mar 4, 2:24 pm, Bill Hart wrote: > Here are the figures for the 2.66 GHz Xeon Core 2 (Dunnington 6 core > 16 MB cache) sage.math: > > wbh...@sage:~/mpir-core2/tune$ ./speed -c -s 1-40 mpn_add_n > overhead 6.00 cycles, precision 100 units of 3.75e-10 secs, CPU > freq 2666.76 MHz >        

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
Here are the figures for the 2.66 GHz Xeon Core 2 (Dunnington 6 core 16 MB cache) sage.math: wbh...@sage:~/mpir-core2/tune$ ./speed -c -s 1-40 mpn_add_n overhead 6.00 cycles, precision 100 units of 3.75e-10 secs, CPU freq 2666.76 MHz mpn_add_n 1 17.00 2
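The header of the speed output can be sanity-checked with a little arithmetic. 1/(2666.76 MHz) is about 3.75e-10 s, which suggests the timing unit is one clock cycle. The K10 run's 3.83e-10 s matches its 2611.82 MHz in the same way. The C sketch below shows the conversions; the function names are hypothetical.

```c
#include <assert.h>

/* Length of one clock cycle in seconds, given the frequency in MHz
   as printed in speed's header line. */
double cycle_time_s(double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return 1.0 / (freq_mhz * 1e6);
}

/* Convert a per-call cycle count (speed -c) into nanoseconds. */
double cycles_to_ns(double cycles, double freq_mhz)
{
    return cycles * cycle_time_s(freq_mhz) * 1e9;
}

/* Cycles per limb: the figure usually compared between mpn loops. */
double cycles_per_limb(double cycles, unsigned n)
{
    assert(n > 0);
    return cycles / n;
}
```

If the first row is read as 1 limb at 17.00 cycles, that is roughly 6.4 ns per call on sage.math. Dividing by n puts the 1 to 40 limb rows on a common per-limb scale.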

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Jeff Gilchrist
On Wed, Mar 4, 2009 at 9:16 AM, Bill Hart wrote: > Anyhow, the Apple is certainly 65 nm. Don't know about the other. I > think it is Jeff's machine. But the clock speed would seem to indicate > that. The Xeon listed in the graph is mine. It is a 45nm chip, here are the details: vendor_id

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
The Xeon 2.66 GHz figures (from sage.math) seem to be missing. Did I give those? Anyhow, the Apple is certainly 65 nm. Don't know about the other. I think it is Jeff's machine. But the clock speed would seem to indicate that. Bill. 2009/3/4 : > > > Looking at addmul_1 for the core2 we have gli

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread jason
Looking at addmul_1 for the core2, we have a glitch at 11-14 limbs. This is caused by a branch mispredict on a threshold changeover, so it may be worth removing it or changing the threshold. Notice that it only seems to affect the Apple core2 2.66 and the Xeon E5405 2.00 but not the other two core2
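To illustrate where a threshold changeover can come from, here is a hedged C sketch of an addmul_1 ({rp,n} += {up,n} * v, returning the carry limb). It dispatches on size between a simple loop and a loop that works in blocks of four. Note the following limitations:

- The cutoff ADDMUL_1_THRESHOLD = 12 is invented for illustration, placed inside the 11-14 limb range.
- This is not how the Core 2 assembly is organised.
- The two paths compute the same result and differ only in loop structure.

The sketch only shows the kind of size test whose branch can be mispredicted near the cutoff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;    /* GCC/Clang extension */

#define ADDMUL_1_THRESHOLD 12         /* hypothetical cutoff */

/* Small sizes: one limb per iteration. */
static limb_t addmul_1_simple(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* up*v + rp + cy <= 2^128 - 1, so this never overflows */
        dlimb_t t = (dlimb_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

/* Larger sizes: blocks of four limbs, then a tail of 0..3 limbs.
   Hand-written assembly would unroll and schedule the block by hand. */
static limb_t addmul_1_unrolled(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t j = i; j < i + 4; j++) {
            dlimb_t t = (dlimb_t)up[j] * v + rp[j] + cy;
            rp[j] = (limb_t)t;
            cy = (limb_t)(t >> 64);
        }
    for (; i < n; i++) {
        dlimb_t t = (dlimb_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

/* The size test below is the threshold branch. */
limb_t addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);
    if (n < ADDMUL_1_THRESHOLD)
        return addmul_1_simple(rp, up, n, v);
    return addmul_1_unrolled(rp, up, n, v);
}
```

Removing the branch (one code path for all sizes) or moving the threshold, as suggested above, changes where any mispredict cost lands. Either way, both paths must agree, which is easy to check by calling each directly.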

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
Here are the K10 figures (AMD Phenom Quad Core 9950 2.6GHz, 512KB L2 cache, 2MB L3 cache) wbh...@cuda1:~/mpir-trunk/tune$ ./speed -c -s 1-40 mpn_add_n overhead 6.00 cycles, precision 100 units of 3.83e-10 secs, CPU freq 2611.82 MHz mpn_add_n 19.00 2

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Jeff Gilchrist
On Wed, Mar 4, 2009 at 8:39 AM, Cactus wrote: > If any of you think that there are six processors listed and only five > graphs, it turns out that Jeff's Core 2 Vista figures and my Core2 > Vista figures are identical on most graphs and overlay each other. I haven't given up on trying the benchm

[mpir-devel] Re: Assembler Code Performance

2009-03-04 Thread Bill Hart
Hi Brian, Nice graphs!! Nothing there surprises me at all. I think it is clear that your mobile Core 2 has some architectural differences which we are consistently seeing. I would not be concerned about it. The results from Jeff seem ultra consistent with the other Core 2 results. It does surpr

[mpir-devel] Assembler Code Performance

2009-03-04 Thread Cactus
Hi All, Jeff kindly sent me figures on the performance of the assembler code on two systems, both Core 2, one Linux and one Vista. So I have uploaded a number of graphs giving the results for the six architectures we now have reports on. I would be most grateful if you folk who under

[mpir-devel] Re: static linking on OSX

2009-03-04 Thread Bill Hart
Yeouch, that is total craziness. But thanks very much for looking into it. Bill. 2009/3/4 Jason Martin : > > On Wed, Mar 4, 2009 at 7:19 AM, Jason Martin > wrote: >> Well, the long answer seems to be: Apple doesn't support linking >> against static versions of Apple supplied libraries.  In othe

[mpir-devel] Re: static linking on OSX

2009-03-04 Thread Jason Martin
On Wed, Mar 4, 2009 at 7:19 AM, Jason Martin wrote: > Well, the long answer seems to be: Apple doesn't support linking > against static versions of Apple supplied libraries.  In other words, > there is no static version of crt0.o > > I'm coming to this conclusion from reading through Apple respon

[mpir-devel] Re: static linking on OSX

2009-03-04 Thread Jason Martin
Well, the long answer seems to be: Apple doesn't support linking against static versions of Apple supplied libraries. In other words, there is no static version of crt0.o I'm coming to this conclusion from reading through Apple responses to users on the developer forums. I haven't found it spel

[mpir-devel] Re: Binary and Include Naming

2009-03-04 Thread Jeff Gilchrist
On Tue, Mar 3, 2009 at 4:34 PM, Cactus wrote: > It is the results of running speed (which now works under Windows) as > follows: > > speed -c -s 1-40 mpn_add_n > speed -c -s 1-40 mpn_addmul_1.333 > speed -c -s 1-40 mpn_mul_1.333 > speed -c -s 1-40 mpn_lshift.23 > speed -c -s 1-40 mpn_rshift.23 >
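The .23 suffix on mpn_lshift and mpn_rshift appears to set speed's extra parameter, here the shift count. A reference to compare such routines against might look like the following minimal C sketch. It assumes GMP's documented mpn_lshift semantics: 1 <= cnt < 64, with the bits shifted out returned in the low cnt bits of the result. ref_lshift is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;             /* assumption: 64-bit limbs */

/* {rp,n} = {up,n} << cnt. Returns the bits shifted out of the top limb.
   Works from the top limb down, so rp may equal up. */
limb_t ref_lshift(limb_t *rp, const limb_t *up, size_t n, unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt < 64);
    unsigned tnc = 64 - cnt;
    limb_t high = up[n - 1];
    limb_t retval = high >> tnc;      /* bits pushed out of the top */
    for (size_t i = n - 1; i > 0; i--) {
        limb_t low = up[i - 1];
        rp[i] = (high << cnt) | (low >> tnc);
        high = low;
    }
    rp[0] = high << cnt;
    return retval;
}
```

The in-place case (rp == up) matters because the reference must match the assembly's permitted overlaps, not just disjoint operands.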