Re: Replacing C's memcpy with a D implementation

2018-06-17 Thread David Nadlinger via Digitalmars-d
On Monday, 11 June 2018 at 03:34:59 UTC, Basile B. wrote: - default linux: https://github.com/gcc-mirror/gcc/blob/master/libgcc/memcpy.c To see what is executed when you call memcpy() on a regular GNU/Linux distro, you'd want to have a look at glibc instead. For example, the AVX2 and AVX512

Re: Replacing C's memcpy with a D implementation

2018-06-17 Thread David Nadlinger via Digitalmars-d
On Monday, 11 June 2018 at 08:02:42 UTC, Walter Bright wrote: On 6/10/2018 9:44 PM, Patrick Schluter wrote: See what Agner Fog has to say about it: Thanks. Agner Fog gets the last word on this topic! Well, Agner is rarely wrong indeed, but there is a limit to how much material a single

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 18:34:58 UTC, Johannes Pfau wrote: I understand that you actually need to reimplement memcpy, as in your microcontroller usecase you don't want to have any C runtime. So you'll basically have to rewrite the C runtime parts D depends on. However, I think for memcpy

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
https://github.com/dlang/druntime/pull/2213

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
On 6/11/2018 11:17 AM, Guillaume Piolat wrote: I don't know if someone really wrote this code, or if it was all from intrinsics. memcpy is so critical to success it is likely written by Intel itself to ensure every drop of perf is wrung out of the CPU. If I was Intel CEO I'd direct the CPU

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Johannes Pfau via Digitalmars-d
On Mon, 11 Jun 2018 10:54:23 +0000, Mike Franklin wrote: > On Monday, 11 June 2018 at 10:38:30 UTC, Mike Franklin wrote: >> On Monday, 11 June 2018 at 10:07:39 UTC, Walter Bright wrote: >> I think there might also be optimization opportunities using templates, metaprogramming, and type

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Guillaume Piolat via Digitalmars-d
BTW the way memcpy is (was?) implemented in the C runtime coming from the Intel C++ compiler was really enlightening on the sheer difficulty of such a task. First of all there isn't one loop but many depending on the source and destination alignment. - If both are aligned on 16-byte

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
On 6/11/2018 6:00 AM, Steven Schveighoffer wrote: No, __doPostblit is necessary -- you are making a copy. example: File[] fs = new File[5]; fs[0] = ...; // initialize fs auto fs2 = fs; fs.length = 100; At this point, fs points at a separate block from fs2. If you did not do postblit on

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Steven Schveighoffer via Digitalmars-d
On 6/11/18 4:00 AM, Walter Bright wrote: (I notice it is doing __doPostblit(). This looks wrong, D allows data to be moved. As far as I can tell with a perfunctory examination, that's the only "can throw" bit.) No, __doPostblit is necessary -- you are making a copy. example: File[] fs =

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 10:38:30 UTC, Mike Franklin wrote: On Monday, 11 June 2018 at 10:07:39 UTC, Walter Bright wrote: I think there might also be optimization opportunities using templates, metaprogramming, and type introspection, that are not currently possible with the current design.

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 10:07:39 UTC, Walter Bright wrote: I think there might also be optimization opportunities using templates, metaprogramming, and type introspection, that are not currently possible with the current design. Just making it a template doesn't automatically enable any

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 10:07:39 UTC, Walter Bright wrote: We have no design for this function that doesn't rely on the GC, and the GC needs TypeInfo. This function is not usable with betterC with or without the TypeInfo argument. I understand that. I was using `_d_arraysetlengthT` as an

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
On 6/11/2018 1:12 AM, Mike Franklin wrote: On Monday, 11 June 2018 at 08:00:10 UTC, Walter Bright wrote: Making it a template is not really necessary. The compiler knows if there is the possibility of it throwing based on the type, it doesn't need to infer it. There are other reasons to make

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 08:05:14 UTC, Walter Bright wrote: On 6/10/2018 8:34 PM, Basile B. wrote: - default win32 OMF: https://github.com/DigitalMars/dmc/blob/master/src/core/MEMCCPY.C I think you mean: https://github.com/DigitalMars/dmc/blob/master/src/CORE32/MEMCPY.ASM Cool! and it's

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 08:00:10 UTC, Walter Bright wrote: Making it a template is not really necessary. The compiler knows if there is the possibility of it throwing based on the type, it doesn't need to infer it. There are other reasons to make it a template, though. For example, if

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
On 6/10/2018 8:34 PM, Basile B. wrote: - default win32 OMF: https://github.com/DigitalMars/dmc/blob/master/src/core/MEMCCPY.C I think you mean: https://github.com/DigitalMars/dmc/blob/master/src/CORE32/MEMCPY.ASM

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
On 6/10/2018 8:43 PM, Mike Franklin wrote: That only addresses the @safe attribute, and that code is much too complex for anyone to audit it and certify it as safe. Exceptions are also not all handled, so there is no way it can pass as nothrow. The runtime call needs to be replaced with a

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Walter Bright via Digitalmars-d
On 6/10/2018 9:44 PM, Patrick Schluter wrote: See what Agner Fog has to say about it: Thanks. Agner Fog gets the last word on this topic!

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Basile B. via Digitalmars-d
On Monday, 11 June 2018 at 03:34:59 UTC, Basile B. wrote: On Monday, 11 June 2018 at 01:03:16 UTC, Mike Franklin wrote: [...] - default win32 OMF: https://github.com/DigitalMars/dmc/blob/master/src/core/MEMCCPY.C - default linux: https://github.com/gcc-mirror/gcc/blob/master/libgcc/memcpy.c

Re: Replacing C's memcpy with a D implementation

2018-06-11 Thread Basile B. via Digitalmars-d
On Monday, 11 June 2018 at 03:34:59 UTC, Basile B. wrote: On Monday, 11 June 2018 at 01:03:16 UTC, Mike Franklin wrote: [...] - default win32 OMF: https://github.com/DigitalMars/dmc/blob/master/src/core/MEMCCPY.C - default linux: https://github.com/gcc-mirror/gcc/blob/master/libgcc/memcpy.c

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Patrick Schluter via Digitalmars-d
On Sunday, 10 June 2018 at 13:45:54 UTC, Mike Franklin wrote: On Sunday, 10 June 2018 at 13:16:21 UTC, Adam D. Ruppe wrote: memcpyD: 1 ms, 725 μs, and 1 hnsec memcpyD2: 587 μs and 5 hnsecs memcpyASM: 119 μs and 5 hnsecs Still, the ASM version is much faster. rep movsd is very CPU

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 03:31:05 UTC, Walter Bright wrote: On 6/10/2018 7:49 PM, Mike Franklin wrote: On Sunday, 10 June 2018 at 15:12:27 UTC, Kagamin wrote: If the compiler can't get it right then who can? The compiler implementation is faulty.  It rewrites the expressions to an

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Basile B. via Digitalmars-d
On Monday, 11 June 2018 at 01:03:16 UTC, Mike Franklin wrote: I've modified the test based on the feedback so far, so here's what it looks like now: import std.datetime.stopwatch; import std.stdio; import core.stdc.string; import std.random; import std.algorithm; enum length = 4096 * 2; void

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Walter Bright via Digitalmars-d
On 6/10/2018 7:49 PM, Mike Franklin wrote: On Sunday, 10 June 2018 at 15:12:27 UTC, Kagamin wrote: If the compiler can't get it right then who can? The compiler implementation is faulty.  It rewrites the expressions to an `extern(C)` runtime implementation that is not @safe, nothrow, or pure:

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Monday, 11 June 2018 at 02:49:00 UTC, Mike Franklin wrote: The compiler implementation is faulty. It rewrites the expressions to an `extern(C)` runtime implementation that is not @safe, nothrow, or pure:

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Sunday, 10 June 2018 at 15:12:27 UTC, Kagamin wrote: On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: There are many reasons to do this, one of which is to leverage information available at compile-time and in D's type system (type sizes, alignment, etc...) in order to optimize

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Nick Sabalausky (Abscissa) via Digitalmars-d
On 06/10/2018 08:01 PM, Walter Bright wrote: On 6/10/2018 4:39 PM, David Nadlinger wrote: That's not entirely true. Intel started optimising some of the REP string instructions again on Ivy Bridge and above. There is a CPUID bit to indicate that (ERMS?); I'm sure the Optimization Manual has

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
I've modified the test based on the feedback so far, so here's what it looks like now: import std.datetime.stopwatch; import std.stdio; import core.stdc.string; import std.random; import std.algorithm; enum length = 4096 * 2; void init(ref ubyte[] a) { a.length = length; for(int i =

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Walter Bright via Digitalmars-d
On 6/10/2018 4:39 PM, David Nadlinger wrote: That's not entirely true. Intel started optimising some of the REP string instructions again on Ivy Bridge and above. There is a CPUID bit to indicate that (ERMS?); I'm sure the Optimization Manual has further details. From what I remember, `rep

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread David Nadlinger via Digitalmars-d
On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote: On 6/10/2018 11:16 AM, David Nadlinger wrote: Because of the large amounts of noise, the only conclusion one can draw from this is that memcpyD is the slowest, Probably because it does a memory allocation. Of course; that was

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Temtaime via Digitalmars-d
On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote: On 6/10/2018 11:16 AM, David Nadlinger wrote: Because of the large amounts of noise, the only conclusion one can draw from this is that memcpyD is the slowest, Probably because it does a memory allocation. followed by the ASM

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread solidstate1991 via Digitalmars-d
On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: void memcpyASM() { auto s = src.ptr; auto d = dst.ptr; size_t len = length; asm pure nothrow @nogc { mov RSI, s; mov RDI, d; cld; mov RCX, len; rep; movsb; } }

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Walter Bright via Digitalmars-d
On 6/10/2018 11:16 AM, David Nadlinger wrote: Because of the large amounts of noise, the only conclusion one can draw from this is that memcpyD is the slowest, Probably because it does a memory allocation. followed by the ASM implementation. The CPU makers abandoned optimizing the REP

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Walter Bright via Digitalmars-d
On 6/10/2018 6:45 AM, Mike Franklin wrote: void memcpyD() {     dst = src.dup; } Note that .dup is doing a GC memory allocation.
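The difference Walter points at can be checked directly. The following is a minimal sketch, not code from the thread: the helper names `sliceCopyReusesStorage` and `dupAllocatesNewBlock` are hypothetical, and the 16-byte buffers are arbitrary. It contrasts `dst[] = src[]`, which copies into storage that `dst` already owns, with `dst = src.dup`, which allocates a new GC block and rebinds `dst` to it.

```d
// Sketch only: the helper names below are hypothetical, not from the thread.

/// True if `dst[] = src[]` copies the bytes while dst keeps its original storage.
bool sliceCopyReusesStorage()
{
    auto src = new ubyte[](16);
    auto dst = new ubyte[](16);
    src[] = 0xAB;                 // give the source recognizable contents
    const before = dst.ptr;       // remember where dst's storage lives
    dst[] = src[];                // element-wise copy, no allocation
    return dst.ptr is before && dst == src;
}

/// True if `dst = src.dup` rebinds dst to a freshly allocated copy.
bool dupAllocatesNewBlock()
{
    auto src = new ubyte[](16);
    auto dst = new ubyte[](16);
    const before = dst.ptr;
    dst = src.dup;                // GC allocation plus copy
    return dst.ptr !is before && dst.ptr !is src.ptr && dst == src;
}

void main()
{
    assert(sliceCopyReusesStorage());
    assert(dupAllocatesNewBlock());
}
```

Timing a `.dup`-based copy therefore measures the allocator as well as the copy itself, so it is not a like-for-like comparison with an in-place copy.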

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Walter Bright via Digitalmars-d
On 6/10/2018 5:49 AM, Mike Franklin wrote: [...] One source of entropy in the results is src and dst being global variables. Global variables in D are in TLS, and TLS access can be complex (many instructions) and is influenced by the -fPIC switch. Worse, global variable access is not
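To illustrate the point about module-level variables being thread-local, here is a small sketch. It is an assumption-laden illustration rather than anything posted in the thread: the names `tlsValue`, `sharedValue`, and `seenByNewThread` are hypothetical. A newly started thread sees its own freshly initialized copy of an ordinary global, while a `__gshared` global is a single shared instance.

```d
import core.thread : Thread;

// Sketch only: these names are hypothetical, not from the thread.
int tlsValue;               // module-level variable: thread-local by default in D
__gshared int sharedValue;  // a single instance visible to every thread

/// Returns what a freshly started thread observes in each variable:
/// element 0 is tlsValue, element 1 is sharedValue.
int[2] seenByNewThread()
{
    int[2] seen;
    auto t = new Thread({
        seen[0] = tlsValue;     // this thread's own copy, still at its initializer
        seen[1] = sharedValue;  // the value written by the main thread
    });
    t.start();
    t.join();
    return seen;
}

void main()
{
    tlsValue = 1;
    sharedValue = 1;
    assert(seenByNewThread() == [0, 1]);
}
```

For a benchmark, one way to keep TLS access out of the timed region is to copy the global slices into local variables before the loop starts; whether that changes the numbers on a given compiler and platform would need to be measured.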

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread I love Ice Cream via Digitalmars-d
Don't C implementations already do 90% of what you want? I thought most compilers know about and optimize these methods based on context. I thought they were *special* in the eyes of the compiler already. I think you are fighting a battle pitting 40 years of tweaking against you...

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread David Nadlinger via Digitalmars-d
On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: I'm not experienced with this kind of programming, so I'm doubting these results. Have I done something wrong? Am I overlooking something? You've just discovered the fact that one can rarely be careful enough with what is

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Kagamin via Digitalmars-d
On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: There are many reasons to do this, one of which is to leverage information available at compile-time and in D's type system (type sizes, alignment, etc...) in order to optimize the implementation of these functions, and allow them

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Seb via Digitalmars-d
On Sunday, 10 June 2018 at 13:45:54 UTC, Mike Franklin wrote: On Sunday, 10 June 2018 at 13:16:21 UTC, Adam D. Ruppe wrote: arr1[] = arr2[]; // the compiler makes this memcpy, the optimizer can further do its magic void memcpyD() { dst = src.dup; } void memcpyD2() { dst[] = src[]; }

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread rikki cattermole via Digitalmars-d
On 11/06/2018 1:45 AM, Mike Franklin wrote: On Sunday, 10 June 2018 at 13:16:21 UTC, Adam D. Ruppe wrote: arr1[] = arr2[]; // the compiler makes this memcpy, the optimizer can further do its magic void memcpyD() { dst = src.dup; malloc (for slice not static array) } void memcpyD2()

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Sunday, 10 June 2018 at 13:16:21 UTC, Adam D. Ruppe wrote: arr1[] = arr2[]; // the compiler makes this memcpy, the optimizer can further do its magic void memcpyD() { dst = src.dup; } void memcpyD2() { dst[] = src[]; } - memcpyD: 1 ms, 725 μs, and 1 hnsec memcpyD2: 587 μs and

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Sunday, 10 June 2018 at 13:17:53 UTC, Guillaume Piolat wrote: Please make one that guarantees the usage of the corresponding backend intrinsic, for example on LLVM. I tested with ldc and got similar results. I thought the implementation in C forwarded to the backend intrinsic. I think

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Sunday, 10 June 2018 at 13:16:21 UTC, Adam D. Ruppe wrote: And D already has it built in as well for @safe etc: arr1[] = arr2[]; // the compiler makes this memcpy, the optimizer can further do its magic so be sure to check against that too. My intent is to use the D implementation in

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
On Sunday, 10 June 2018 at 13:05:33 UTC, Nicholas Wilson wrote: On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: I'm exploring the possibility of implementing some of the basic software building blocks (memcpy, memcmp, memmove, etc...) that D utilizes from the C library with D

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Adam D. Ruppe via Digitalmars-d
On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: D utilizes from the C library with D implementations. There are many reasons to do this, one of which is to leverage information available at compile-time and in D's type system (type sizes, alignment, etc...) in order to optimize

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Guillaume Piolat via Digitalmars-d
On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: I'm not experienced with this kind of programming, so I'm doubting these results. Have I done something wrong? Am I overlooking something? Hi, I've spent a lot of time optimizing memcpy. One of the results was that on Intel ICC

Re: Replacing C's memcpy with a D implementation

2018-06-10 Thread Nicholas Wilson via Digitalmars-d
On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote: I'm exploring the possibility of implementing some of the basic software building blocks (memcpy, memcmp, memmove, etc...) that D utilizes from the C library with D implementations. There are many reasons to do this, one of which

Replacing C's memcpy with a D implementation

2018-06-10 Thread Mike Franklin via Digitalmars-d
I'm exploring the possibility of implementing some of the basic software building blocks (memcpy, memcmp, memmove, etc...) that D utilizes from the C library with D implementations. There are many reasons to do this, one of which is to leverage information available at compile-time and in D's