Gah! G'Kar moment...
http://dpaste.dzfl.pl/203d237d7413
On Thursday, 30 January 2014 at 22:45:45 UTC, bearophile wrote:
Since my post someone has added a Fortran version based on the
algorithm used in the C++11 code. It's a little faster than the
C++11 code and it's much nicer looking:
Yup, I saw it. They're cheating, they almost don't have to
exp
Since my post someone has added a Fortran version based on the
algorithm used in the C++11 code. It's a little faster than the
C++11 code and it's much nicer looking:
http://benchmarksgame.alioth.debian.org/u32/program.php?test=nbody&lang=ifc&id=5
pure subroutine advance(tstep, x, v, mass)
r
Stanislav Blinov:
An aging i3?
My CPU is older; it doesn't support AVX or AVX2.
This is getting a bit silly now. I must have some compile
switches for g++ wrong:
g++ -Ofast -fkeep-inline-functions -fomit-frame-pointer -march=native -mfpmath=sse -mavx -mssse3 -flto --std=c++11 -fopenmp
On Thursday, 30 January 2014 at 21:54:17 UTC, bearophile wrote:
You seem to have a quite recent CPU,
An aging i3?
as the G++ code contains instructions like vmovsd. So you can
try to do the same with ldc2, and use AVX or AVX2.
Hmm...
This is getting a bit silly now. I must have some comp
Stanislav Blinov:
G++:
http://codepad.org/oOZQw1VQ
LDC:
http://codepad.org/5nHoZL1k
You seem to have a quite recent CPU, as the G++ code contains
instructions like vmovsd. So you can try to do the same with
ldc2, and use AVX or AVX2.
There are the switches:
-march=- Architect
On Thursday, 30 January 2014 at 21:33:38 UTC, bearophile wrote:
If a function takes no time to run, and you tweak it, your
program is not supposed to go faster.
Right.
I was going to compare the asm listings, but C++ seems to have
unrolled and inlined the outer loop right inside main(), and
Stanislav Blinov:
I meant that if I unroll it, it's not irrelevant anymore :)
If a function takes no time to run, and you tweak it, your
program is not supposed to go faster.
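This is Amdahl's law. If a fraction p of the total run time is spent in the function being tuned and that function is made s times faster, the overall speedup is:

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\lim_{s \to \infty} S_{\text{overall}} = \frac{1}{1 - p}
```

With an illustrative (not measured) p = 0.001, i.e. a function such as `energy()` taking 0.1% of the run time, even an infinitely fast version gives at most 1/0.999 ≈ 1.001, about a 0.1% gain.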
I was going to compare the asm listings, but C++ seems to have
unrolled and inlined the outer loop right inside ma
On Thursday, 30 January 2014 at 21:04:06 UTC, bearophile wrote:
Stanislav Blinov:
Unrolling everything except the loop in energy() seems to have
squeezed the bits needed to outperform c++, at least on my
machine :)
That should be impossible, as I remember from my old profilings
that energy()
Stanislav Blinov:
ldc2 -release -O3 -disable-boundscheck -vectorize -vectorize-loops
All my versions of ldc2 don't even accept -vectorize :-)
ldc2: Unknown command line argument '-vectorize'. Try: 'ldc2
-help'
ldc2: Did you mean '-vectorize-slp'?
And -vectorize-loops should be active on d
Stanislav Blinov:
Unrolling everything except the loop in energy() seems to have
squeezed the bits needed to outperform c++, at least on my
machine :)
That should be impossible, as I remember from my old profilings
that energy() should use only an irrelevant amount of run time.
http://dpa
On Thursday, 30 January 2014 at 18:43:02 UTC, bearophile wrote:
It's a very silly problem for a statically typed language. The
D type system knows the static length of those arrays, but it
doesn't use such information.
I agree.
Unrolling everything except the loop in energy() seems to have
Stanislav Blinov:
Looks like both dmd and ldc don't optimize slice operations
yet, had to revert to loops
It's a very silly problem for a statically typed language. The D
type system knows the static length of those arrays, but it
doesn't use such information.
(Similarly several algorithms i
On Thursday, 30 January 2014 at 18:29:42 UTC, bearophile wrote:
I see you're compiling with
ldmd2 -wi -O -release -inline -noboundscheck nbody.d
Try
ldc2 -release -O3 -disable-boundscheck -vectorize -vectorize-loops
Stanislav Blinov:
That won't compile with dmd (at least, with 2.064.2): it
expects constants as initializers for vectors. :( That's why I
rolled up that toDouble2() function.
Few more changes, but this version still lacks the toDouble2:
http://codepad.org/SpMprWym
Bye,
bearophile
Stanislav Blinov:
That won't compile with dmd (at least, with 2.064.2): it
expects constants as initializers for vectors. :( That's why I
rolled up that toDouble2() function.
I see. Then probably I will have to put it back...
With N = 5_000_000 my timings on an old CPU are 2.23 seconds
for
On Thursday, 30 January 2014 at 16:53:22 UTC, bearophile wrote:
Yes. The older version of LDC2 doesn't even compile the code. I
need to use 0.13.0-alpha1.
Hmm.
Your D code with small changes:
http://codepad.org/xqqScd42
That won't compile with dmd (at least, with 2.064.2): it expects
cons
Stanislav Blinov:
You mean with your current version of ldc?
Yes. The older version of LDC2 doesn't even compile the code. I
need to use 0.13.0-alpha1.
Your D code with small changes:
http://codepad.org/xqqScd42
Asm generated by G++ for the advance function (that is the one
that uses most
On Thursday, 30 January 2014 at 15:37:24 UTC, bearophile wrote:
Stanislav Blinov:
Forgot one slice assignment in toDouble2(). Now the results
are more interesting:
Is the latest link shown the last version?
No. In toDouble2() on line 13:
replace result.array = args[0]
with result.array[0]
Stanislav Blinov:
Forgot one slice assignment in toDouble2(). Now the results are
more interesting:
Is the latest link shown the last version?
I need the 0.13.0-alpha1 to compile the code.
I am seeing a significant performance difference between C++ and
D-ldc2.
Bye,
bearophile
On Thursday, 30 January 2014 at 14:17:16 UTC, Stanislav Blinov
wrote:
Forgot one slice assignment in toDouble2(). Now the results are
more interesting:
time ./nbody-cpp 5000:
-0.169075164
-0.169059907
0:05.20 real, 5.18 user, 0.00 sys, 532 kb, 99% cpu
time ./nbody-ldc 5000:
-0.169075
Ok, didn't need to wait for the weekend :)
Looks like both dmd and ldc don't optimize slice operations yet,
had to revert to loops (shaved off ~1.5 seconds for ldc, ~9
seconds for dmd). Also, my local pull of ldc had some issues with
to!int(string), reverted that to atoi :)
Here's the code:
On Wednesday, 29 January 2014 at 18:05:41 UTC, Stanislav Blinov
wrote:
Yep, doesn't seem to be simd-related:
struct S(T) { T v1, v2; }
void main() {
    alias T = double; // integrals and float are ok :\
    version (workaround) {
        S!T[1] p = void;
    } else {
Regarding dmd it looks awfully similar to this:
http://d.puremagic.com/issues/show_bug.cgi?id=9449
I'd need to do some more runs though.
On Wednesday, 29 January 2014 at 16:54:54 UTC, bearophile wrote:
Stanislav Blinov:
I meant how to make it compile with ldc2? I've translated the
code, it compiles and works with dmd (although segfaults in
-release mode for some reason, probably a bug somewhere).
But with ldc2:
nbody.d(68):
Stanislav Blinov:
I meant how to make it compile with ldc2? I've translated the
code, it compiles and works with dmd (although segfaults in
-release mode for some reason, probably a bug somewhere).
But with ldc2:
nbody.d(68): Error: undefined identifier __simd
nbody.d(68): Error: undefined i
On Wednesday, 29 January 2014 at 16:43:35 UTC, bearophile wrote:
Stanislav Blinov:
Hmm.. How would one use core.simd with LDC2? It doesn't seem
to define D_SIMD.
Or should I go for builtins?
I don't know if this is useful for you, but here I wrote a
basic usage example of SIMD in ldc2 (seco
Stanislav Blinov:
Hmm.. How would one use core.simd with LDC2? It doesn't seem to
define D_SIMD.
Or should I go for builtins?
I don't know if this is useful for you, but here I wrote a basic
usage example of SIMD in ldc2 (second D entry):
http://rosettacode.org/wiki/Four_bits_adder#D
Bye,
On Friday, 24 January 2014 at 15:56:26 UTC, bearophile wrote:
If someone is willing to test LDC2 with a known benchmark,
there's this one:
http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody
A reformatted C++11 version, good as a starting point for a D
translation:
http://codepa
"bearophile" writes:
> If someone is willing to test LDC2 with a known benchmark, there's this one:
>
> http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody
>
> A reformatted C++11 version, good as a starting point for a D translation:
> http://codepad.org/4mOHW0fz
Just playing with
If someone is willing to test LDC2 with a known benchmark,
there's this one:
http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody
A reformatted C++11 version, good as a starting point for a D
translation:
http://codepad.org/4mOHW0fz
Bye,
bearophile