Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Witek via Digitalmars-d
Hi, I was playing with std.parallelism and implemented parallel NQueens code to count the number of solutions to the classical NQueens problem. I do limited parallel recursion using a taskPool.parallel foreach, and then switch to the serial algorithm at some level. I use return values / atomic varia
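This is a minimal sketch of the structure described above, not Witek's actual code. The first row is split across `taskPool` workers with `parallel`, and each worker runs a serial bitmask backtracker that never allocates. The names `countQueens` and `countSerial` are hypothetical. Parallelising only the first row simplifies his "limited parallel recursion".

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;

// Serial backtracking over bitmasks (valid for n <= 31).
// cols: occupied columns; diagL/diagR: squares in the current row attacked
// along the two diagonals. Nothing here touches the GC heap.
ulong countSerial(uint n, uint cols, uint diagL, uint diagR) @nogc nothrow pure
{
    immutable uint full = (1u << n) - 1;
    if (cols == full)
        return 1; // every row holds a queen
    ulong count = 0;
    uint avail = full & ~(cols | diagL | diagR);
    while (avail != 0)
    {
        immutable uint bit = avail & (~avail + 1); // lowest free column
        avail ^= bit;
        count += countSerial(n, cols | bit, (diagL | bit) << 1, (diagR | bit) >> 1);
    }
    return count;
}

// Parallel driver: one work item per column of the first-row queen.
// Partial counts are combined with an atomic add.
ulong countQueens(uint n)
{
    shared ulong total = 0;
    foreach (col; parallel(iota(n)))
    {
        immutable uint bit = 1u << col;
        atomicOp!"+="(total, countSerial(n, bit, bit << 1, bit >> 1));
    }
    return atomicLoad(total);
}
```

`countSerial` is marked `@nogc`, so the compiler rejects any GC allocation on the hot path. That sidesteps the shared-allocator contention this thread is about.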

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Dicebot via Digitalmars-d
On 02/18/2016 02:00 PM, Witek wrote: > So, the question is, why is D / DMD allocator so slow under heavy multithreading? The working set is pretty small (few megabytes at most), so I do not think this is an issue with GC scanning itself. Can I plug-in tcmalloc / jemalloc, to be used as the u

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Vladimir Panteleev via Digitalmars-d
On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote: So, the question is, why is D / DMD allocator so slow under heavy multithreading? The working set is pretty small (few megabytes at most), so I do not think this is an issue with GC scanning itself. Can I plug-in tcmalloc / jemalloc,

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Witek via Digitalmars-d
On Thursday, 18 February 2016 at 13:49:45 UTC, Vladimir Panteleev wrote: On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote: So, the question is, why is D / DMD allocator so slow under heavy multithreading? The working set is pretty small (few megabytes at most), so I do not think this

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Vladimir Panteleev via Digitalmars-d
On Thursday, 18 February 2016 at 13:55:02 UTC, Witek wrote: It was pretty hard to find, because it was hidden behind "~". Yes, -vgc helped here, but still, I was not expecting such terrible performance. Aside from -vgc, you can also use @nogc to forbid GC use in code annotated with it.
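Here is a small illustration of the `@nogc` suggestion, using a hypothetical function `collectEvens`. Uncommenting the `~=` line makes compilation fail, because appending to a GC array may allocate. The caller supplies the output buffer instead. Separately, compiling with `dmd -vgc` lists the places where the GC may allocate.

```d
// Hypothetical helper: copies the even values of `input` into `output`
// and returns how many were written. `@nogc` turns any GC allocation in
// the body (such as `~` or `~=`) into a compile-time error.
size_t collectEvens(const(int)[] input, int[] output) @nogc nothrow
{
    size_t count = 0;
    foreach (x; input)
    {
        if (x % 2 == 0)
        {
            // output ~= x;  // rejected under @nogc: appending may allocate
            output[count++] = x; // write into caller-provided memory instead
        }
    }
    return count;
}
```

`output` must be large enough for every even value. Otherwise the bounds check throws a `RangeError`, which `nothrow` still permits.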

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Witek via Digitalmars-d
On Thursday, 18 February 2016 at 13:10:28 UTC, Dicebot wrote: On 02/18/2016 02:00 PM, Witek wrote: So, the question is, why is D / DMD allocator so slow under heavy multithreading? The working set is pretty small (few megabytes at most), so I do not think this is an issue with GC scanning itse

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Chris Wright via Digitalmars-d
On Thu, 18 Feb 2016 13:00:12 +, Witek wrote: > So, the question is, why is D / DMD allocator so slow under heavy multithreading? It's a global GC, possibly with a little per-thread pool. As part of the abortive Amber language project, I was looking into ways to craft per-thread GC. You nee
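The collector is shared by every thread, so one workaround within std.parallelism (an assumption here, not something proposed in this post) is to give each worker one scratch buffer and reuse it across iterations. `taskPool.workerLocalStorage` evaluates its lazy initializer once per worker slot, so each thread gets its own array. The workload `sumOfScratchWork` is hypothetical.

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;

// Hypothetical workload: for each i, fill a scratch buffer with the squares
// 0^2 .. (i % 64)^2 and add up their sum. Each worker reuses one buffer,
// so the loop body itself does not allocate.
ulong sumOfScratchWork(size_t iterations)
{
    // The lazy initializer runs once per worker, giving each its own array.
    auto scratch = taskPool.workerLocalStorage(new ulong[](64));
    auto partial = taskPool.workerLocalStorage(0UL);

    foreach (i; parallel(iota(iterations)))
    {
        ulong[] buf = scratch.get; // this worker's private buffer
        immutable size_t len = i % 64 + 1;
        foreach (k; 0 .. len)
            buf[k] = k * k;
        ulong s = 0;
        foreach (v; buf[0 .. len])
            s += v;
        partial.get += s; // per-worker accumulator, no locking needed
    }

    // Combine the per-worker partial sums on the calling thread.
    ulong total = 0;
    foreach (p; partial.toRange)
        total += p;
    return total;
}
```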

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-18 Thread Russel Winder via Digitalmars-d
On Thu, 2016-02-18 at 17:27 +, Chris Wright via Digitalmars-d wrote: > […] > > I would like to look into D's GC and parallelism more. I've started on mark/sweep parallelism but haven't made any worthwhile progress. I'll take this as my next task. It's more awkward because it requires

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Jonathan M Davis via Digitalmars-d
On Thursday, 18 February 2016 at 17:27:13 UTC, Chris Wright wrote: On Thu, 18 Feb 2016 13:00:12 +, Witek wrote: So, the question is, why is D / DMD allocator so slow under heavy multithreading? It's a global GC, possibly with a little per-thread pool. As part of the abortive Amber languag

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Bottled Gin via Digitalmars-d
On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote: Anyhow, everything was good on 1 or 2 threads, or maybe a few more, on my laptop with an old dual-core CPU. I was able to speed it up by exactly a factor of 2. I wanted to try it out on a bigger machine, so I used an Amazon AWS EC2 c4.8xlarge

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Kagamin via Digitalmars-d
On Thursday, 18 February 2016 at 13:55:02 UTC, Witek wrote: I will try using std.experimental.allocator, but this doesn't play well with "~", and I would need to manually do expandArray, and array operations, which is a pain. It would be nice to encode allocator used in the type, potentially by
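This sketch shows the pain Witek describes with `~`. With `std.experimental.allocator`, appending to a malloc-backed array means calling `expandArray` and writing the new slot by hand, and the memory must be released explicitly with `dispose`. `appendMalloc` is a hypothetical helper. It grows by one element per call for brevity, not speed.

```d
import std.experimental.allocator : dispose, expandArray, makeArray;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper: the manual counterpart of `arr ~= value` for an
// array whose memory comes from malloc/realloc via Mallocator.
// Returns false if the allocation fails.
bool appendMalloc(ref int[] arr, int value)
{
    if (arr is null)
    {
        // expandArray is only used on an existing allocation here;
        // the first element is allocated with makeArray.
        arr = Mallocator.instance.makeArray!int(1);
        if (arr is null)
            return false;
    }
    else if (!Mallocator.instance.expandArray(arr, 1))
    {
        return false;
    }
    arr[$ - 1] = value; // fill the newly added slot
    return true;
}
```

Usage: build the array with repeated `appendMalloc(arr, x)` calls, then free it with `Mallocator.instance.dispose(arr)`. The GC never sees this memory.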

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Martin Nowak via Digitalmars-d
On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote: So, the question is, why is D / DMD allocator so slow under heavy multithreading? The working set is pretty small (few megabytes at most), so I do not think this is an issue with GC scanning itself. Can I plug-in tcmalloc / jemalloc,

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Chris Wright via Digitalmars-d
On Fri, 19 Feb 2016 07:01:56 +, Russel Winder via Digitalmars-d wrote: > On Thu, 2016-02-18 at 17:27 +, Chris Wright via Digitalmars-d wrote: >> […] >> >> I would like to look into D's GC and parallelism more. I've started on mark/sweep parallelism but haven't made any worthwhile progr

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Yuxuan Shui via Digitalmars-d
On Friday, 19 February 2016 at 08:29:00 UTC, Jonathan M Davis wrote: On Thursday, 18 February 2016 at 17:27:13 UTC, Chris Wright wrote: [...] Unfortunately, given how easy it is to cast between mutable, const, immutable, shared (and it's quite common to construct something as mutable and the
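Here is a minimal, hypothetical example of the "construct as mutable, then treat as immutable" pattern mentioned above. The array is allocated and filled through a mutable slice, then retyped as `immutable` without copying. `immutable` data may be read from any thread, so memory first written through one thread's mutable reference can become visible everywhere. That is one reason the thread gives for why a strictly thread-local heap is hard to retrofit onto D.

```d
import std.exception : assumeUnique;

// Hypothetical example: a table is built through a mutable slice and then
// retyped as immutable. assumeUnique performs no copy; it casts the slice
// and nulls out the mutable reference so it cannot be used afterwards.
immutable(int)[] buildSquares(size_t n)
{
    int[] table = new int[](n);
    foreach (i, ref x; table)
        x = cast(int)(i * i);
    return assumeUnique(table);
}
```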

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread Chris Wright via Digitalmars-d
On Fri, 19 Feb 2016 21:45:31 +, Yuxuan Shui wrote: > On Friday, 19 February 2016 at 08:29:00 UTC, Jonathan M Davis wrote: >> On Thursday, 18 February 2016 at 17:27:13 UTC, Chris Wright wrote: >>> [...] >> >> Unfortunately, given how easy it is to cast between mutable, const, immutable, shar

Re: Poor memory allocation performance with a lot of threads on 36 core machine

2016-02-19 Thread rsw0x via Digitalmars-d
On Friday, 19 February 2016 at 08:29:00 UTC, Jonathan M Davis wrote: but the fact that we're a system language that allows you ultimately to do most anything really limits what we can do in comparison to a language sitting in a VM. - Jonathan M Davis some small language changes could greatly i