On 8/8/2011 3:02 PM, bearophile wrote:
Eric Poggel (JoeCoder):
determinism can be very important when it comes to
reducing network traffic. If you can achieve it, then you can make sure
all players have the same game state and then only send user input
commands over the network.
It seems a h
Eric Poggel (JoeCoder):
> determinism can be very important when it comes to
> reducing network traffic. If you can achieve it, then you can make sure
> all players have the same game state and then only send user input
> commands over the network.
It seems a hard thing to obtain, but I agree
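To make the lockstep idea above concrete, here is a minimal sketch, assuming a hypothetical `Command`/`GameState` pair and an integer-only simulation (which sidesteps the floating-point question discussed elsewhere in this thread). None of these names come from the benchmark; they are for illustration only.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical input command: one player's action for one simulation tick.
struct Command {
    std::uint32_t tick;
    std::int32_t dx;  // movement along x, in integer units
    std::int32_t dy;  // movement along y, in integer units
};

// Hypothetical game state. Integers only, so every peer computes
// bit-identical results regardless of floating-point settings.
struct GameState {
    std::int32_t x = 0;
    std::int32_t y = 0;
    std::uint32_t tick = 0;
};

// Pure step function: the same state and command always give the same result.
GameState step(GameState s, const Command& c) {
    s.x += c.dx;
    s.y += c.dy;
    s.tick = c.tick;
    return s;
}

// Each peer replays the same command stream locally; only the commands
// would need to travel over the network, not the state.
GameState replay(const std::vector<Command>& commands) {
    GameState s;
    for (const Command& c : commands) s = step(s, c);
    return s;
}

// Small FNV-1a-style checksum that peers could exchange to detect divergence.
std::uint64_t checksum(const GameState& s) {
    std::uint64_t h = 1469598103934665603ull;
    for (std::uint32_t v : {static_cast<std::uint32_t>(s.x),
                            static_cast<std::uint32_t>(s.y), s.tick}) {
        h = (h ^ v) * 1099511628211ull;
    }
    return h;
}
```

Two peers that call `replay` on the same commands end with equal checksums. Once `step` uses floating point, that guarantee depends on every peer producing bit-identical FP results, which is the hard part.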
On 8/6/2011 8:34 PM, bearophile wrote:
Walter:
On 8/6/2011 4:46 PM, bearophile wrote:
Walter is not a lover of that -ffast-math switch.
No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking
IEEE conformance is something very, very few should even consider.
I have rea
Anyways, I've tweaked the GDC codegen, and program speed meets that of
C++ now (on my system).
Implementation: http://ideone.com/0j0L1
Command-line:
gdc -O3 -mfpmath=sse -ffast-math -march=native -frelease
g++ bench.cc -O3 -mfpmath=sse -ffast-math -march=native
Best times:
G++-32bit: 114
Walter:
> On 8/6/2011 4:46 PM, bearophile wrote:
> > Walter is not a lover of that -ffast-math switch.
>
> No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking
> IEEE conformance is something very, very few should even consider.
I have read several papers about FP arithm
On 8/6/2011 4:46 PM, bearophile wrote:
Walter is not a lover of that -ffast-math switch.
No, I am not. Few understand the subtleties of IEEE arithmetic, and breaking
IEEE conformance is something very, very few should even consider.
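One of those subtleties, as a small sketch: IEEE addition is not associative, and -ffast-math allows the compiler to reassociate as if it were. The function names below are made up for illustration, and the behaviour described assumes the code is built without -ffast-math.

```cpp
// Sums three doubles left to right: (a + b) + c.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// Sums the same three doubles right to left: a + (b + c).
double sum_right(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, c = 1.0, `sum_left` gives 1.0 because the two large values cancel first, while `sum_right` gives 0.0 because the 1.0 is absorbed by -1e20 before the cancellation. A compiler that is free to pick either order can therefore change the result.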
Iain Buclaw:
> Anyways, I've tweaked the GDC codegen, and program speed meets that of C++
> now (on my system).
Are you willing to explain your changes (and maybe give a link to the changes)?
Maybe Walter is interested for DMD too.
> Command-line:
> gdc -O3 -mfpmath=sse -ffast-math -march=n
Walter:
> A dynamic array is two values being passed, a pointer is one.
I know, but I think there are many optimization opportunities. An example:
private void foo(int[] a2) {}
void main() {
int[100] a1;
foo(a1);
}
In code like that I think a D compiler is free to compile like this, b
== Quote from bearophile (bearophileh...@lycos.com)'s article
> Iain Buclaw:
> > 1) using pointers over dynamic arrays. (5% speedup)
> > 2) removing the calls to CalVector4's constructor (5.7% speedup)
> With DMD I have seen 180k -> 190k vertices/sec replacing this:
> struct CalVector4 {
> floa
On 8/6/2011 3:19 PM, bearophile wrote:
I don't know why passing pointers gives some more performance here, compared
to passing dynamic arrays (but I have seen the same behaviour in other D
programs of mine).
A dynamic array is two values being passed, a pointer is one.
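A rough picture of that difference, written in C++ because the layout is easy to show there. `IntSlice` is a hypothetical stand-in for a D dynamic array (a length plus a pointer), not the actual D ABI definition.

```cpp
#include <cstddef>

// Hypothetical stand-in for a D dynamic array: a length and a pointer.
struct IntSlice {
    std::size_t length;
    int* ptr;
};

// Passing the slice by value hands the callee two machine words.
int first_via_slice(IntSlice s) {
    return s.ptr[0];
}

// Passing a raw pointer hands the callee one machine word.
int first_via_ptr(const int* p) {
    return p[0];
}

// Size in bytes of each kind of argument on the current platform.
constexpr std::size_t slice_bytes = sizeof(IntSlice);
constexpr std::size_t pointer_bytes = sizeof(int*);
```

On mainstream 32-bit and 64-bit targets `slice_bytes` is twice `pointer_bytes`, so the slice needs an extra register or stack slot per call. Whether that alone explains the 5% speedup reported in this thread is not established here.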
Iain Buclaw:
> 1) using pointers over dynamic arrays. (5% speedup)
> 2) removing the calls to CalVector4's constructor (5.7% speedup)
With DMD I have seen 180k -> 190k vertices/sec replacing this:
struct CalVector4 {
float X, Y, Z, W;
this(float x, float y, float z, float w = 0.0f) {
== Quote from bearophile (bearophileh...@lycos.com)'s article
> Iain Buclaw:
> Are you using GDC2-64 bit on Linux?
GDC2-32 bit on Linux.
> > Three things that helped improve performance in a minor way for me:
> > 1) using pointers over dynamic arrays. (5% speedup)
> > 2) removing the calls to Ca
Iain Buclaw:
Are you using GDC2-64 bit on Linux?
> Three things that helped improve performance in a minor way for me:
> 1) using pointers over dynamic arrays. (5% speedup)
> 2) removing the calls to CalVector4's constructor (5.7% speedup)
> 3) using core.stdc.time over std.datetime. (1.6% speedu
== Quote from bearophile (bearophileh...@lycos.com)'s article
> Iain Buclaw:
> > I will look into this later from my workstation.
> The remaining thing to look at is just the small performance difference between
> the D-GDC version and the C++-G++ version.
> Bye,
> bearophile
Three things that he
Iain Buclaw:
> I will look into this later from my workstation.
The remaining thing to look at is just the small performance difference between
the D-GDC version and the C++-G++ version.
Bye,
bearophile
== Quote from bearophile (bearophileh...@lycos.com)'s article
> Trass3r:
> > C++ no SIMD:
> > Skinned vertices per second: 4242
> >
> ...
> > D gdc:
> > Skinned vertices per second: 2345
> Are you able and willing to show me the asm produced by gdc? There's a problem
> there.
> Bye,
> bearoph
I'd like to know why the GCC back-end is able to produce a more
efficient binary from the C++ code (compared to the D code), but now the
problem is not large, as before.
I attached both asm versions ;)
Attachments: cppver.s, dver.s
Trass3r:
> >> C++ no SIMD:
> >> Skinned vertices per second: 4242
>...
> D gdc with added -frelease -fno-bounds-check:
> Skinned vertices per second: 3771
I'd like to know why the GCC back-end is able to produce a more efficient
binary from the C++ code (compared to the D code), but now
On 04.08.2011 at 04:07, Trass3r wrote:
C++:
Skinned vertices per second: 4866
C++ no SIMD:
Skinned vertices per second: 4242
D dmd:
Skinned vertices per second: 159046
D gdc:
Skinned vertices per second: 2345
D ldc:
Skinned vertices per second: 3791
ldc2 -O3 -release
If you want to go on with this exploration, then I suggest you find a
way to disable bounds checks.
Ok, now I get up to 3293 skinned vertices per second.
Still a bit worse than LDC.
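For readers unfamiliar with what bounds checks cost, here is a C++ analogy (an assumption for illustration, not the benchmark code): `std::vector::at` checks every index the way D array indexing does by default, while `operator[]` does not, which is roughly what -fno-bounds-check gives the D build.

```cpp
#include <cstddef>
#include <vector>

// Checked access: at() tests the index on every iteration and throws
// std::out_of_range on failure.
float sum_checked(const std::vector<float>& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) total += v.at(i);
    return total;
}

// Unchecked access: operator[] performs no test, so the loop body is
// just a load and an add.
float sum_unchecked(const std::vector<float>& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}
```

Both functions return the same sum; only the per-element test differs. An optimizer can sometimes remove the checks when it can prove the index is in range, so the measured gap varies by compiler.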
Trass3r:
> > are you willing and able to show me the asm before it gets assembled?
> > (with gcc you do it with the -S switch). (I also suggest to use only the
> > C standard library, with time() and printf() to produce a smaller asm
> > output: http://codepad.org/12EUo16J ).
You are a pers
Adam Ruppe wrote:
But what's the purpose of those callq? They seem to call the
successive asm instruct
I find AT&T syntax to be almost impossible to read, but it looks
like they are comparing the instruction pointer for some reason.
call works by pushing the instruction pointer on the stack, t
> But what's the purpose of those callq? They seem to call the
> successive asm instruct
I find AT&T syntax to be almost impossible to read, but it looks
like they are comparing the instruction pointer for some reason.
call works by pushing the instruction pointer on the stack, then
jumping to th
> Trass3r:
>> are you able and willing to show me the asm produced by gdc? There's a
>> problem there.
> [attach bla.rar]
In the bla.rar attachment there's the unstripped Linux binary, so to read the asm I
have used the objdump disassembler. But are you willing and able to show me the
asm before it
Are you able and willing to show me the asm produced by gdc? There's a
problem there.
Attachment: bla.rar (application/rar-compressed)
Marco Leise wrote:
> I thought he was referring to the processor being able to handle
> 64-bit ints more efficiently in 64-bit operation mode on a 64-bit OS
> with 64-bit executables.
I was thinking a little of both but this is the main thing. My
suspicion was that Java might have been using a 64
On 03.08.2011 at 21:52, David Nadlinger wrote:
On 8/3/11 9:48 PM, Adam D. Ruppe wrote:
System: Windows XP, Core 2 Duo E6850
Is this Windows XP 32 bit or 64 bit? That will probably make
a difference on the longs I'd expect.
It doesn't, long is 32-bit wide on Windows x86_64 too (LLP64).
Trass3r:
> C++ no SIMD:
> Skinned vertices per second: 4242
>
...
> D gdc:
> Skinned vertices per second: 2345
Are you able and willing to show me the asm produced by gdc? There's a problem
there.
Bye,
bearophile
C++:
Skinned vertices per second: 4866
C++ no SIMD:
Skinned vertices per second: 4242
D dmd:
Skinned vertices per second: 159046
D gdc:
Skinned vertices per second: 2345
D ldc:
Skinned vertices per second: 3791
ldc2 -O3 -release -enable-inlining dver.d
C++:
Skinned vertices per second: 4866
C++ no SIMD:
Skinned vertices per second: 4242
D dmd:
Skinned vertices per second: 159046
D gdc:
Skinned vertices per second: 2345
Compilers:
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4)
g++ -s -O3 -mfpmath=sse -ffast-math -march=nativ
Trass3r:
> I'm afraid not. dmd's backend isn't good at floating point calculations.
Studying the asm a bit, it's not hard to find the cause, because this benchmark
is quite pure (synthetic, although I think it comes from real-world code).
This is what G++ generates from the C++ code without intri
Looks like a spiteful joke... In other words: WTF?! JavaScript is about
10 times faster than D in floating point calculations!? Please, tell me
that I'm mistaken.
I'm afraid not. dmd's backend isn't good at floating point calculations.
Denis Shelomovskij:
> (tests from bearophile's message, C++ test is "skinning_test_no_simd.cpp").
For a more realistic test I suggest you also time the C++ version that uses the
intrinsics (only for float).
> Looks like a spiteful joke... In other words: WTF?! JavaScript is about
> 10 times
03.08.2011 22:48, Adam D. Ruppe wrote:
System: Windows XP, Core 2 Duo E6850
Is this Windows XP 32 bit or 64 bit? That will probably make
a difference on the longs I'd expect.
I meant Windows XP 32 bit (5.1 (Build 2600: Service Pack 3)), which is
what Wikipedia describes as plain "Windows XP".
> System: Windows XP, Core 2 Duo E6850
Is this Windows XP 32 bit or 64 bit? That will probably make
a difference on the longs I'd expect.
On 8/3/11 9:48 PM, Adam D. Ruppe wrote:
System: Windows XP, Core 2 Duo E6850
Is this Windows XP 32 bit or 64 bit? That will probably make
a difference on the longs I'd expect.
It doesn't, long is 32-bit wide on Windows x86_64 too (LLP64).
David
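A quick way to see this for a given compiler, as a sketch: the constant and function names are made up for illustration. The widths reported depend on the platform, which is the point of the LLP64 remark above.

```cpp
#include <climits>
#include <cstdint>

// Width in bits of each type on the platform this is compiled for.
// Under LLP64 (Windows x86_64) long is 32 bits; under LP64 (Linux x86_64)
// it is 64 bits. long long is at least 64 bits everywhere.
constexpr int bits_long = static_cast<int>(sizeof(long) * CHAR_BIT);
constexpr int bits_long_long = static_cast<int>(sizeof(long long) * CHAR_BIT);
constexpr int bits_int64 = static_cast<int>(sizeof(std::int64_t) * CHAR_BIT);

// Sums 1..n using a fixed-width 64-bit type, so the benchmark measures
// the same integer size on every platform.
std::int64_t sum_to(std::int64_t n) {
    std::int64_t total = 0;
    for (std::int64_t i = 1; i <= n; ++i) total += i;
    return total;
}
```

Using `std::int64_t` (or `long long`) in the C++ version makes it comparable with `long` in D, Java and C#, which is always 64 bits.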
03.08.2011 22:15, Ziad Hatahet:
I believe that "long" in this case is 32 bits in C++, and 64-bits in the
remaining languages, hence the same result for int and long in C++. Try
with "long long" maybe? :)
--
Ziad
2011/8/3 Denis Shelomovskij <verylonglogin@gmail.com>
03.08.2011
I believe that "long" in this case is 32 bits in C++, and 64-bits in the
remaining languages, hence the same result for int and long in C++. Try with
"long long" maybe? :)
--
Ziad
2011/8/3 Denis Shelomovskij
> 03.08.2011 18:20, bearophile:
>
> The benchmark info:
>> http://chadaustin.me/2011
03.08.2011 18:20, bearophile:
The benchmark info:
http://chadaustin.me/2011/01/digging-into-javascript-performance/
The code, in C++, JS, Java, C#:
https://github.com/chadaustin/Web-Benchmarks/
The C++/JS/Java code runs on a single core.
D2 version translated from the C# version (the C++ versio
The benchmark info:
http://chadaustin.me/2011/01/digging-into-javascript-performance/
The code, in C++, JS, Java, C#:
https://github.com/chadaustin/Web-Benchmarks/
The C++/JS/Java code runs on a single core.
D2 version translated from the C# version (the C++ version uses struct
inheritance!):
ht