memcpy vs slice copy

2009-03-14 Thread bearophile
While doing some string processing I've seen some unusual timings compared to the C code, so I have written this to see the situation better. When USE_MEMCPY is false this little benchmark runs about 3+ times slower:

import std.c.stdlib: malloc;
import std.c.string: memcpy;
import std.stdio: writ

Re: memcpy vs slice copy

2009-03-15 Thread Moritz Warning
On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote: > While doing some string processing I've seen some unusual timings > compared to the C code, so I have written this to see the situation > better. When USE_MEMCPY is false this little benchmark runs about 3+ > times slower: I did a little ben

Re: memcpy vs slice copy

2009-03-15 Thread bearophile
Moritz Warning: > I don't see a very big difference between slice copying and memcpy (but > between compilers). I have timed it again. My timings, best of 4: USE_MEMCPY true: 1.33 s; false: 4.28 s. I used dmd 1.041 with Phobos, on 32-bit WinXP, 2 GB RAM, Core 2 CPU at 2 GHz. This may be anothe
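Working the ratio from the two best-of-4 timings reported above (USE_MEMCPY true vs. false):

$$\frac{4.28\ \text{s}}{1.33\ \text{s}} \approx 3.2$$

which agrees with the "about 3+ times slower" figure from the first post.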

Re: memcpy vs slice copy

2009-03-15 Thread Moritz Warning
On Sun, 15 Mar 2009 13:17:50 +0000, Moritz Warning wrote: > On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote: > >> While doing some string processing I've seen some unusual timings >> compared to the C code, so I have written this to see the situation >> better. When USE_MEMCPY is false this

Re: memcpy vs slice copy

2009-03-15 Thread Sergey Gromov
Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote: > On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote: > >> While doing some string processing I've seen some unusual timings >> compared to the C code, so I have written this to see the situation >> better. When USE_MEMCPY is false th

Re: memcpy vs slice copy

2009-03-15 Thread bearophile
For reference, here's a simple C version:

#include "stdlib.h"
#include "string.h"
#include "stdio.h"
#define N 1
#define L 6
char h[L] = "hello\n";
int main() {
    char *ptr;
    if (N <= 100)
        ptr = malloc(N * L + 1); // the +1 is for the final printing
    else
        ptr = ma
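The program above is cut off, so what follows is not a reconstruction of it. It is a minimal C sketch of the same kind of benchmark, under stated assumptions: it fills a buffer with back-to-back copies of the 6-byte string "hello\n", once with memcpy and once with a plain byte loop. The helper names (fill_memcpy, fill_loop, time_fill, PIECE_LEN) are hypothetical, and the byte loop is only an assumed stand-in for a copy the compiler does not turn into a memcpy intrinsic; it is not what DMD's slice copy compiles to. Recent optimizing compilers may recognize the byte loop as well, so the two timings can end up close.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical piece: the 6-byte "hello\n" from the thread, no terminator. */
#define PIECE_LEN 6
static const char piece[PIECE_LEN] = {'h', 'e', 'l', 'l', 'o', '\n'};

/* Fill buf with n back-to-back copies of piece, one memcpy per copy. */
static void fill_memcpy(char *buf, size_t n) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        memcpy(buf + i * PIECE_LEN, piece, PIECE_LEN);
}

/* Same result with an explicit byte loop (assumed stand-in for a
   non-intrinsic copy path; an optimizer may still rewrite it). */
static void fill_loop(char *buf, size_t n) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < PIECE_LEN; j++)
            buf[i * PIECE_LEN + j] = piece[j];
}

/* Return the CPU seconds spent running fill(buf, n). */
static double time_fill(void (*fill)(char *, size_t), char *buf, size_t n) {
    clock_t start = clock();
    fill(buf, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To time it, allocate N * PIECE_LEN bytes with malloc and compare time_fill(fill_memcpy, buf, N) against time_fill(fill_loop, buf, N), choosing N large enough that the resolution of clock() does not dominate.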

Re: memcpy vs slice copy

2009-03-15 Thread Sergey Gromov
Sun, 15 Mar 2009 10:31:10 -0400, bearophile wrote:
> The ASM of the inner loop:
>
> L:  movl    _h, %eax
>     movl    %eax, (%edx)
>     movzwl  _h+4, %eax
>     movw    %ax, 4(%edx)
>     addl    $6, %edx
>     cmpl    %ecx, %edx
>     jne     L

Obviously, a memcpy intrinsic is at work here. DMD
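The quoted loop copies the 6 bytes of h as one 32-bit move (movl) followed by one 16-bit move (movzwl/movw), with no call into the library. A hedged C sketch of the same decomposition, where copy6 is a hypothetical helper name: fixed-size memcpy calls into uint32_t/uint16_t temporaries are the usual portable way to express unaligned loads and stores, and optimizing compilers generally lower them to single moves, though that is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy exactly 6 bytes as one 4-byte and one 2-byte load/store,
   mirroring the movl + movzwl/movw pair in the quoted inner loop. */
static void copy6(void *dst, const void *src) {
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint32_t lo; /* bytes 0..3 */
    uint16_t hi; /* bytes 4..5 */
    assert(dst != NULL && src != NULL);
    memcpy(&lo, s, sizeof lo);
    memcpy(&hi, s + 4, sizeof hi);
    memcpy(d, &lo, sizeof lo);
    memcpy(d + 4, &hi, sizeof hi);
}
```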

Re: memcpy vs slice copy

2009-03-15 Thread bearophile
Sergey Gromov: > Obviously, a memcpy intrinsic is at work here.< Yes, gcc is able to recognize some calls to C library functions and replace them with intrinsics. I think LDC also uses an intrinsic to copy the memory of a slice. This isn't a very interesting benchmark; there's nothing much intere

Re: memcpy vs slice copy

2009-03-16 Thread Don
Sergey Gromov wrote: Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote: On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote: While doing some string processing I've seen some unusual timings compared to the C code, so I have written this to see the situation better. When USE_MEMCPY

Re: memcpy vs slice copy

2009-03-16 Thread bearophile
Don: >Which means that memcpy probably isn't anywhere near optimal, either.< Time ago I have read an article written by AMD that shows that indeed with modern CPUs there are ways to go much faster, using vector asm instructions, loop unrolling and explicit cache prefetching (but it's useful with
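Two of the techniques bearophile lists, loop unrolling and explicit cache prefetching, can be sketched in fairly portable C as below; vector instructions are left out because they are ISA-specific. This is an illustration under assumptions, not the method from the AMD article: copy_unrolled and PREFETCH_AHEAD are hypothetical names, the 256-byte prefetch distance is an arbitrary guess, and __builtin_prefetch is a GCC/Clang extension that is simply skipped on other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prefetch distance in bytes; the best value is machine-specific. */
#define PREFETCH_AHEAD 256

/* Copy n bytes between non-overlapping buffers, moving 32 bytes per
   iteration as four 8-byte words (loop unrolling) and hinting the cache
   about source data further ahead (prefetching). */
static void copy_unrolled(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n == 0 || (d != NULL && s != NULL));
    while (n >= 32) {
#if defined(__GNUC__)
        /* Only prefetch addresses that are still inside the source buffer. */
        if (n > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);
#endif
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, s, 8);
        memcpy(&w1, s + 8, 8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d, &w0, 8);
        memcpy(d + 8, &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);
        s += 32;
        d += 32;
        n -= 32;
    }
    /* Tail: fewer than 32 bytes remain, copy them one at a time. */
    while (n--)
        *d++ = *s++;
}
```

Whether this beats the C library's memcpy depends on the machine; a tuned library memcpy often already combines these tricks with SIMD.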

Re: memcpy vs slice copy

2009-03-16 Thread Jarrett Billingsley
On Mon, Mar 16, 2009 at 8:43 AM, bearophile wrote: > Don: >>Which means that memcpy probably isn't anywhere near optimal, either.< > > Time ago I have read an article written by AMD that shows that indeed with > modern CPUs there are ways to go much faster, using vector asm instructions, > loop

Re: memcpy vs slice copy

2009-03-16 Thread Sergey Gromov
Mon, 16 Mar 2009 10:34:33 +0100, Don wrote: > Sergey Gromov wrote: >> Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote: >> >>> On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote: >>> While doing some string processing I've seen some unusual timings compared to the C code, s

Re: memcpy vs slice copy

2009-03-16 Thread Don
Sergey Gromov wrote: Mon, 16 Mar 2009 10:34:33 +0100, Don wrote: Sergey Gromov wrote: Sun, 15 Mar 2009 13:17:50 +0000 (UTC), Moritz Warning wrote: On Sat, 14 Mar 2009 23:50:58 -0400, bearophile wrote: While doing some string processing I've seen some unusual timings compared to the C code,

Re: memcpy vs slice copy

2009-03-16 Thread Walter Bright
Don wrote: Oh. I didn't see it was only 6 bytes. And the compiler even KNOWS it's six bytes -- it's in the asm. Blimey. It should just be doing that as a direct sequence of loads and stores, for anything up to at least 8 bytes. The compiler will replace it with a simple mov if it is 1, 2, 4 or
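Walter notes that the compiler emits a single mov for 1, 2, 4 or 8 bytes, while Don argues that any size up to at least 8 bytes should become direct loads and stores. A minimal C sketch of what that could look like, assuming non-overlapping buffers: copy_small is a hypothetical helper, and the overlapping-move trick for sizes 3, 5, 6 and 7 is one common technique chosen for illustration, not something taken from DMD's code generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n <= 8 bytes with at most two load/store pairs. Sizes 1, 2, 4
   and 8 use one fixed-size move; other sizes use two overlapping moves
   (e.g. n == 6 copies bytes 0..3 and 2..5). */
static void copy_small(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n <= 8);
    switch (n) {
    case 0:
        return;
    case 1:
        *d = *s;
        return;
    case 2: {
        uint16_t v;
        memcpy(&v, s, 2);
        memcpy(d, &v, 2);
        return;
    }
    case 4: {
        uint32_t v;
        memcpy(&v, s, 4);
        memcpy(d, &v, 4);
        return;
    }
    case 8: {
        uint64_t v;
        memcpy(&v, s, 8);
        memcpy(d, &v, 8);
        return;
    }
    default:
        break;
    }
    /* n is 3, 5, 6 or 7: load the first and last chunk (they overlap),
       then store both. */
    if (n > 4) {
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else {
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    }
}
```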

Re: memcpy vs slice copy

2009-03-16 Thread BCS
Hello Jarrett, I'm actually kind of shocked that given the prevalence of memory block copy operations that more CPUs haven't implemented it as a basic instruction. Yes, RISC is nice, but geez, this seems like a no-brainer. How about memory to memory DMA, Why even make the CPU wait for it to
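On the question of a block-copy instruction: x86 does have one, the string move "rep movsb" (plus word and doubleword forms), which copies ECX/RCX bytes from [ESI/RSI] to [EDI/RDI]. A hedged sketch using GCC/Clang extended inline asm, where copy_rep_movsb is a hypothetical name and non-x86 targets fall back to memcpy; whether it beats a tuned memcpy depends on the processor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes with the x86 "rep movsb" string instruction and return dst.
   The common x86 calling conventions require the direction flag to be
   clear on function entry, so the copy runs forward. Other compilers or
   targets fall back to memcpy. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n) {
    void *ret = dst;
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
    return ret;
}
```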

Re: memcpy vs slice copy

2009-03-16 Thread Don
Walter Bright wrote: Don wrote: Oh. I didn't see it was only 6 bytes. And the compiler even KNOWS it's six bytes -- it's in the asm. Blimey. It should just be doing that as a direct sequence of loads and stores, for anything up to at least 8 bytes. The compiler will replace it with a simple

Re: memcpy vs slice copy

2009-03-16 Thread Jarrett Billingsley
On Mon, Mar 16, 2009 at 3:29 PM, BCS wrote: >> I'm actually kind of shocked that given the prevalence of memory block >> copy operations that more CPUs haven't implemented it as a basic >> instruction. Yes, RISC is nice, but geez, this seems like a >> no-brainer. >> > > How about memory to memory

Re: memcpy vs slice copy

2009-03-16 Thread Don
BCS wrote: Hello Jarrett, I'm actually kind of shocked that given the prevalence of memory block copy operations that more CPUs haven't implemented it as a basic instruction. Yes, RISC is nice, but geez, this seems like a no-brainer. How about memory to memory DMA, Why even make the CPU wai

Re: memcpy vs slice copy

2009-03-16 Thread BCS
Hello Don, BCS wrote: Hello Jarrett, I'm actually kind of shocked that given the prevalence of memory block copy operations that more CPUs haven't implemented it as a basic instruction. Yes, RISC is nice, but geez, this seems like a no-brainer. How about memory to memory DMA, Why even mak

Re: memcpy vs slice copy

2009-03-16 Thread BCS
Hello Jarrett, On Mon, Mar 16, 2009 at 3:29 PM, BCS wrote: I'm actually kind of shocked that given the prevalence of memory block copy operations that more CPUs haven't implemented it as a basic instruction. Yes, RISC is nice, but geez, this seems like a no-brainer. How about memory to mem

Re: memcpy vs slice copy

2009-03-16 Thread Christopher Wright
bearophile wrote: Don: Which means that memcpy probably isn't anywhere near optimal, either.< Time ago I have read an article written by AMD that shows that indeed with modern CPUs there are ways to go much faster, using vector asm instructions, loop unrolling and explicit cache prefetching

Re: memcpy vs slice copy

2009-03-17 Thread Don
Christopher Wright wrote: bearophile wrote: Don: Which means that memcpy probably isn't anywhere near optimal, either.< Time ago I have read an article written by AMD that shows that indeed with modern CPUs there are ways to go much faster, using vector asm instructions, loop unrolling and

Re: memcpy vs slice copy

2009-03-17 Thread Lionello Lunesu
This has been discussed before, to no avail. http://d.puremagic.com/issues/show_bug.cgi?id=2313 L.

Re: memcpy vs slice copy

2009-03-19 Thread Sergey Gromov
Mon, 16 Mar 2009 11:36:50 -0700, Walter Bright wrote: > Don wrote: >> Oh. I didn't see it was only 6 bytes. And the compiler even KNOWS it's >> six bytes -- it's in the asm. Blimey. It should just be doing that as a >> direct sequence of loads and stores, for anything up to at least 8 bytes. >