Re: N-body bench

2014-01-30 Thread Stanislav Blinov
Gah! G'Kar moment... http://dpaste.dzfl.pl/203d237d7413

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 22:45:45 UTC, bearophile wrote: Since my post someone has added a Fortran version based on the algorithm used in the C++11 code. It's a little faster than the C++11 code and it's much nicer looking: Yup, I saw it. They're cheating, they almost don't have to exp

Re: N-body bench

2014-01-30 Thread bearophile
Since my post someone has added a Fortran version based on the algorithm used in the C++11 code. It's a little faster than the C++11 code and it's much nicer looking: http://benchmarksgame.alioth.debian.org/u32/program.php?test=nbody&lang=ifc&id=5 pure subroutine advance(tstep, x, v, mass) r

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: An aging i3? My CPU is older, it doesn't support AVX2 and AVX. This is getting a bit silly now. I must have some compile switches for g++ wrong: g++ -Ofast -fkeep-inline-functions -fomit-frame-pointer -march=native -mfpmath=sse -mavx -mssse3 -flto --std=c++11 -fopenmp

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 21:54:17 UTC, bearophile wrote: You seem to have a quite recent CPU, An aging i3? as the G++ code contains instructions like vmovsd. So you can try to do the same with ldc2, and use AVX or AVX2. Hmm... This is getting a bit silly now. I must have some comp

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: G++: http://codepad.org/oOZQw1VQ LDC: http://codepad.org/5nHoZL1k You seem to have a quite recent CPU, as the G++ code contains instructions like vmovsd. So you can try to do the same with ldc2, and use AVX or AVX2. There are the switches: -march=- Architect

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 21:33:38 UTC, bearophile wrote: If a function takes no time to run, and you tweak it, your program is not supposed to go faster. Right. I was going to compare the asm listings, but C++ seems to have unrolled and inlined the outer loop right inside main(), and

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: I meant that if I unroll it, it's not irrelevant anymore :) If a function takes no time to run, and you tweak it, your program is not supposed to go faster. I was going to compare the asm listings, but C++ seems to have unrolled and inlined the outer loop right inside ma

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 21:04:06 UTC, bearophile wrote: Stanislav Blinov: Unrolling everything except the loop in energy() seems to have squeezed the bits needed to outperform C++, at least on my machine :) That should be impossible, as I remember from my old profilings that energy()

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: ldc2 -release -O3 -disable-boundscheck -vectorize -vectorize-loops All my versions of ldc2 don't even accept -vectorize :-) ldc2: Unknown command line argument '-vectorize'. Try: 'ldc2 -help' ldc2: Did you mean '-vectorize-slp'? And -vectorize-loops should be active on d

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: Unrolling everything except the loop in energy() seems to have squeezed the bits needed to outperform C++, at least on my machine :) That should be impossible, as I remember from my old profilings that energy() should use only an irrelevant amount of run time. http://dpa
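For readers who do not have the benchmark source at hand, here is a minimal C++ sketch of what an energy() routine of this kind computes: kinetic energy per body minus the pairwise gravitational potential. It is only an illustration, not the code from the pastes in this thread; the `Planet` struct, its field names and the template signature are assumptions made for the example. Since such a routine is normally called only before and after the main simulation loop, its share of the run time would be expected to be small, which is the point made above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical body layout for illustration; the real programs use their own.
struct Planet {
    std::array<double, 3> pos{};
    std::array<double, 3> vel{};
    double mass = 0.0;
};

// Total energy: sum of 0.5*m*v^2 minus m_i*m_j/r over every unordered pair.
template <std::size_t N>
double energy(const std::array<Planet, N>& planets) {
    double e = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        const Planet& a = planets[i];
        double v2 = 0.0;
        for (std::size_t k = 0; k < 3; ++k)
            v2 += a.vel[k] * a.vel[k];
        e += 0.5 * a.mass * v2;

        for (std::size_t j = i + 1; j < N; ++j) {
            const Planet& b = planets[j];
            double dist2 = 0.0;
            for (std::size_t k = 0; k < 3; ++k) {
                const double d = a.pos[k] - b.pos[k];
                dist2 += d * d;
            }
            assert(dist2 > 0.0);  // coincident bodies would divide by zero
            e -= a.mass * b.mass / std::sqrt(dist2);
        }
    }
    return e;
}
```

Two unit masses at rest, two length units apart, give an energy of -0.5 with this sketch; giving one of them unit speed adds 0.5 of kinetic energy.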

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 18:43:02 UTC, bearophile wrote: It's a very silly problem for a statically typed language. The D type system knows the static length of those arrays, but it doesn't use such information. I agree. Unrolling everything except the loop in energy() seems to have

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: Looks like both dmd and ldc don't optimize slice operations yet, had to revert to loops It's a very silly problem for a statically typed language. The D type system knows the static length of those arrays, but it doesn't use such information. (Similarly several algorithms i
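As a rough C++ analogue of the point about static lengths (an illustration only, not code from this thread): when the array length is part of the type, an element-wise update written as a plain loop has a compile-time trip count, which gives the optimizer the information it needs to unroll or vectorize it. The function name `scaled_add` is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Element-wise a[k] += b[k] * s over a fixed-size array.
// N is a compile-time constant, so the loop bound is known statically.
template <std::size_t N>
void scaled_add(std::array<double, N>& a,
                const std::array<double, N>& b,
                double s) {
    for (std::size_t k = 0; k < N; ++k)
        a[k] += b[k] * s;
}
```

Whether a given compiler actually unrolls such a loop depends on the optimization level and target, so the generated asm is still worth checking.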

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 18:29:42 UTC, bearophile wrote: I see you're compiling with ldmd2 -wi -O -release -inline -noboundscheck nbody.d Try ldc2 -release -O3 -disable-boundscheck -vectorize -vectorize-loops

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: That won't compile with dmd (at least, with 2.064.2): it expects constants as initializers for vectors. :( That's why I rolled up that toDouble2() function. A few more changes, but this version still lacks the toDouble2: http://codepad.org/SpMprWym Bye, bearophile

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: That won't compile with dmd (at least, with 2.064.2): it expects constants as initializers for vectors. :( That's why I rolled up that toDouble2() function. I see. Then probably I will have to put it back... With N = 5_000_000 my timings on an old CPU are 2.23 seconds for

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 16:53:22 UTC, bearophile wrote: Yes. The older version of LDC2 doesn't even compile the code. I need to use 0.13.0-alpha1. Hmm. Your D code with small changes: http://codepad.org/xqqScd42 That won't compile with dmd (at least, with 2.064.2): it expects cons

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: You mean with your current version of ldc? Yes. The older version of LDC2 doesn't even compile the code. I need to use 0.13.0-alpha1. Your D code with small changes: http://codepad.org/xqqScd42 Asm generated by G++ for the advance function (that is the one that uses most
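For context on why the advance function dominates the run time, here is a minimal scalar C++ sketch of an n-body step of this kind: every pair of bodies exchanges a velocity update, then every position is moved. It is not the SIMD code from the pastes above; the `Body` struct, the field names and the `dt` parameter are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical body layout for illustration; the real programs use their own.
struct Body {
    std::array<double, 3> pos{};
    std::array<double, 3> vel{};
    double mass = 0.0;
};

// One time step: pairwise gravitational velocity updates, then position updates.
template <std::size_t N>
void advance(std::array<Body, N>& bodies, double dt) {
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = i + 1; j < N; ++j) {
            std::array<double, 3> d{};
            double dist2 = 0.0;
            for (std::size_t k = 0; k < 3; ++k) {
                d[k] = bodies[i].pos[k] - bodies[j].pos[k];
                dist2 += d[k] * d[k];
            }
            assert(dist2 > 0.0);  // coincident bodies would divide by zero
            // dt / r^3: the sqrt and the division are the expensive part.
            const double mag = dt / (dist2 * std::sqrt(dist2));
            for (std::size_t k = 0; k < 3; ++k) {
                bodies[i].vel[k] -= d[k] * bodies[j].mass * mag;
                bodies[j].vel[k] += d[k] * bodies[i].mass * mag;
            }
        }
    }
    for (Body& b : bodies)
        for (std::size_t k = 0; k < 3; ++k)
            b.pos[k] += dt * b.vel[k];
}
```

The pair loop runs N*(N-1)/2 times per step, and the step itself is repeated for every iteration of the simulation, which is why the asm generated for this function is the interesting part to compare.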

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 15:37:24 UTC, bearophile wrote: Stanislav Blinov: Forgot one slice assignment in toDouble2(). Now the results are more interesting: Is the latest link shown the last version? No. In toDouble2() on line 13: replace result.array = args[0] with result.array[0]

Re: N-body bench

2014-01-30 Thread bearophile
Stanislav Blinov: Forgot one slice assignment in toDouble2(). Now the results are more interesting: Is the latest link shown the last version? I need the 0.13.0-alpha1 to compile the code. I am seeing a significant performance difference between C++ and D-ldc2. Bye, bearophile

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Thursday, 30 January 2014 at 14:17:16 UTC, Stanislav Blinov wrote: Forgot one slice assignment in toDouble2(). Now the results are more interesting: time ./nbody-cpp 5000: -0.169075164 -0.169059907 0:05.20 real, 5.18 user, 0.00 sys, 532 kb, 99% cpu time ./nbody-ldc 5000: -0.169075

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
Ok, didn't need to wait for the weekend :) Looks like both dmd and ldc don't optimize slice operations yet, had to revert to loops (shaved off ~1.5 seconds for ldc, ~9 seconds for dmd). Also, my local pull of ldc had some issues with to!int(string), reverted that to atoi :) Here's the code:

Re: N-body bench

2014-01-30 Thread Stanislav Blinov
On Wednesday, 29 January 2014 at 18:05:41 UTC, Stanislav Blinov wrote: Yep, doesn't seem to be simd-related: struct S(T) { T v1, v2; } void main() { alias T = double; // integrals and float are ok :\ version (workaround) { S!T[1] p = void; } else {

Re: N-body bench

2014-01-29 Thread Stanislav Blinov
Regarding dmd it looks awfully similar to this: http://d.puremagic.com/issues/show_bug.cgi?id=9449 I'd need to do some more runs though.

Re: N-body bench

2014-01-29 Thread Stanislav Blinov
On Wednesday, 29 January 2014 at 16:54:54 UTC, bearophile wrote: Stanislav Blinov: I meant how to make it compile with ldc2? I've translated the code, it compiles and works with dmd (although segfaults in -release mode for some reason, probably a bug somewhere). But with ldc2: nbody.d(68):

Re: N-body bench

2014-01-29 Thread bearophile
Stanislav Blinov: I meant how to make it compile with ldc2? I've translated the code, it compiles and works with dmd (although segfaults in -release mode for some reason, probably a bug somewhere). But with ldc2: nbody.d(68): Error: undefined identifier __simd nbody.d(68): Error: undefined i

Re: N-body bench

2014-01-29 Thread Stanislav Blinov
On Wednesday, 29 January 2014 at 16:43:35 UTC, bearophile wrote: Stanislav Blinov: Hmm.. How would one use core.simd with LDC2? It doesn't seem to define D_SIMD. Or should I go for builtins? I don't know if this is useful for you, but here I wrote a basic usage example of SIMD in ldc2 (seco

Re: N-body bench

2014-01-29 Thread bearophile
Stanislav Blinov: Hmm.. How would one use core.simd with LDC2? It doesn't seem to define D_SIMD. Or should I go for builtins? I don't know if this is useful for you, but here I wrote a basic usage example of SIMD in ldc2 (second D entry): http://rosettacode.org/wiki/Four_bits_adder#D Bye,

Re: N-body bench

2014-01-29 Thread Stanislav Blinov
On Friday, 24 January 2014 at 15:56:26 UTC, bearophile wrote: If someone is willing to test LDC2 with a known benchmark, there's this one: http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody A reformatted C++11 version good as a starting point for a D translation: http://codepa

Re: N-body bench

2014-01-28 Thread Jerry
"bearophile" writes: > If someone is willing to test LDC2 with a known benchmark, there's this one: > > http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody > > A reformatted C++11 version good as a starting point for a D translation: > http://codepad.org/4mOHW0fz Just playing with

N-body bench

2014-01-24 Thread bearophile
If someone is willing to test LDC2 with a known benchmark, there's this one: http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody A reformatted C++11 version good as a starting point for a D translation: http://codepad.org/4mOHW0fz Bye, bearophile