On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
I'm playing around with win32, v2.069.2 dmd and
"dip80-ndslice": "~>0.8.8". If I convert the 2D slice with
.array(), should that first dimension then be compatible with
parallel foreach?
[...]
On Sunday, 10 January 2016 at 00:41:35 UTC, Ilya Yaroshenko wrote:
It is a bug (Slice or Parallel?). Please file this issue.
Slice should work with parallel, and array of slices should
work with parallel.
Ok, thanks, I'll submit it.
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
[...]
I find that, for example,
means[63] through means[251] are consistently all NaN when using
parallel in this test, but are all computed double values when
parallel is not used.
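For reference, a minimal repro sketch of the kind of computation being described, using plain arrays of rows in place of the ndslice (the names, shapes, and values here are stand-ins, not the original code); diffing the serial and parallel results row by row would expose the NaN block:
---
// Hypothetical repro sketch: per-row means over 2D data, computed with
// parallel foreach. Compare against a serial run to spot the NaNs.
import std.algorithm : sum;
import std.array : array;
import std.parallelism : parallel;
import std.range : chunks;
import std.stdio : writeln;

void main()
{
    auto data = new double[300 * 1024];
    foreach (i, ref v; data) v = i;
    auto rows = data.chunks(1024).array; // stand-in for the 2D slice's rows
    auto means = new double[rows.length];
    foreach (i, row; parallel(rows))
        means[i] = row.sum / row.length;
    writeln(means[0], " ", means[$ - 1]);
}
---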
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
[...]
I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice":
"~>0.8.8". If I convert the 2D slice with .array(), should that
first dimension then be compatible with parallel foreach?
I find that without using parallel, all the means get computed,
but with parallel a range of them come back as NaN.
On 12/27/2015 04:17 PM, Jay Norwood wrote:
On Sunday, 27 December 2015 at 23:42:57 UTC, Ali Çehreli wrote:
That does not compile because i is size_t but apply_metrics() takes
an int. One solution is to call to!int:
foreach( i, ref a; parallel(samples[])){
apply_metrics(i.to!int,a);}
On Sunday, 27 December 2015 at 23:42:57 UTC, Ali Çehreli wrote:
[...]
It builds for me still, and executes OK.
[...] you can use
enumerate():
samples[].enumerate.each!(t=>apply_metrics(t[0].to!int,t[1]));
> foreach( i, ref a; parallel(samples[])){ apply_metrics(i,a);}
That does not compile because i is size_t but apply_metrics() takes an
int. One solution is to call to!int:
foreach( i, ref a; parallel(samples[])){
apply_metrics(i.to!int,a);}
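For anyone skimming the archive, here is that fix as a self-contained program (`S` and apply_metrics are placeholder stand-ins for the poster's actual types, not the original code):
---
import std.conv : to;
import std.parallelism : parallel;

struct S { double value = 0; }

// placeholder for the poster's real per-sample work
void apply_metrics(int i, ref S s) { s.value = i * 2.0; }

void main()
{
    auto samples = new S[10_000];
    foreach (i, ref a; parallel(samples))
        apply_metrics(i.to!int, a); // i is size_t; to!int throws on overflow
}
---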
I'm doing some re-writing and measuring. The basic task is to
take 10K samples (in struct S samples below) and calculate some
metrics (just per sample for now). It isn't evident to me how to
write the parallel foreach in the same format as each!, so I just
used the loop form.
On Tuesday, 24 November 2015 at 18:49:25 UTC, Bishop120 wrote:
[...]
On 24.11.2015 19:49, Bishop120 wrote:
I figured this would be a simple parallel foreach function with an iota
range of sizeX and just making int X declared inside the function so
that I didn't have to worry about shared variables, but I can't get around
the alive++ reduction, and I don't understand
Hey everyone. A new D learner here. So far I love D and how
much better it's working than C++. One thing I like doing is
parallel functions, as with OMP in C++. Right now I'm trying to
figure out how to do Conway's Game of Life in D in parallel.
Serially D is much faster than C++, so I feel
On 11/05/2015 12:58 PM, Handyman wrote:
> On Thursday, 5 November 2015 at 20:54:37 UTC, anonymous wrote:
>> There is no attempt to split the `prepare` action up and run parts of
>> it in parallel.
>
> So 1.25 secs is impossible?
For the given example, yes, impossible.
On Thursday, 5 November 2015 at 21:10:16 UTC, anonymous wrote:
parallel(iota(50)))
Wow. I have dealt with ranges and 'iota' (and with parallel), but
I admit I have to think hard about this example. Thanks a bunch
all for your patience.
How can I make the cores prepare a meal of 5 dishes in 1.25 secs? Should I
rewrite, or split, 'prepare'?
You'd have to split `prepare` further into parallelizable parts. In a
real world scenario that may or may not be possible.
When the goal is just sleeping we can do it, of course.
On Thursday, 5 November 2015 at 20:54:37 UTC, anonymous wrote:
There is no attempt to split the `prepare` action up and run
parts of it in parallel.
So 1.25 secs is impossible?
On Thursday, 5 November 2015 at 20:45:25 UTC, Ali Çehreli wrote:
That's still 1 second per task. The function prepare() cannot
be executed by more than one core.
Thanks. OK. So 'prepare' is atomic? Then let's turn it around:
how can I make the cores prepare a meal of 5 dishes in 1.25 secs?
On 05.11.2015 21:43, Handyman wrote:
foreach (i; 0..50)
Thread.sleep(20.msecs);
But then my program still says: '2 secs'. Please enlighten me.
Let's look at the line that does the `parallel` call:
foreach (dish; parallel(dishes, 1)) dish.prepare();
This means the work unit size is 1: each dish is dispatched to the
pool as its own task.
On 11/05/2015 12:43 PM, Handyman wrote:
[...]
On Thursday, 5 November 2015 at 20:40:00 UTC, anonymous wrote:
So one of your four cores has to make two dishes. That takes
two seconds.
So make fine-grained?
foreach (i; 0..50)
Thread.sleep(20.msecs);
But then my program still says: '2 secs'. Please enlighten me.
[...] The first four dishes each sleep for 1
second in parallel, then complete at roughly the same time. One
second has passed.
Now there's one dish left. It gets scheduled, sleeps for 1
second, and finishes (the other threads remain idle). Two seconds
have passed.
On 05.11.2015 21:30, Handyman wrote:
Seems that 4 cores go all out on first 4 dishes, then 1 core deals with
the last dish. With 4 cores I expect diner is ready after 5/4 = 1.25
secs though. What did I do wrong?
You describe the situation correctly. The unit of work is a dish. That
is, the work is divided per dish, never within a dish.
say("[...] the " ~ name ~ ".");
Thread.sleep(1.seconds); // artificially use up time
say("Finished the " ~ name ~ ".");
}
}
void main() {
auto dishes = [ Dish("soup"), Dish("sauce"), Dish("fries"),
Dish("fish"), Dish("ice") ];
This is another attempt at the parallel metric processing. This
one uses the results only to return an int value, which could be
used later as an error return value. The metric value locations are
now allocated as part of the input measurement values tuple.
The Tuple vs struct definitions
I re-submitted this as:
https://issues.dlang.org/show_bug.cgi?id=15135
// Initialize some values for the measured samples
foreach(i, ref m; meas){
m.L1D_MISS= 100+i; m.L1I_MISS=100-i;
m.L1D_READ= 200+i; m.L1D_WRITE=200-i;
m.cycles= 10+i;
}
ref TI getTerm(int i)
{
return meas[i];
}
// compute the metric results for the above measured sample
// values in parallel
auto results = new TO[samples.length];
On Thursday, 1 October 2015 at 18:08:31 UTC, Ali Çehreli wrote:
Makes sense. Please open a bug at least for investigation why
tuples with named members don't work with amap.
ok, thanks. I opened the issue.
https://issues.dlang.org/show_bug.cgi?id=15134
On 10/01/2015 08:56 AM, Jay Norwood wrote:
> Thanks. My particular use case, working with metric expressions, is
> easier to understand if I use the names.
Makes sense. Please open a bug at least for investigation why tuples
with named members don't work with amap.
> I converted the use of T
std.datetime.StopWatch sw;
sw.start();
ref TI getTerm(int i)
{
return meas[i];
}
// compute the metric results for the above measured sample
// values in parallel
taskPool.amap!(Metrics)(std.algorithm.map!getTerm(samples),results);
TR rv1 = met_l1_miss( meas[0]);
TR rv2 = met_l1_hit( meas[0]);
On 09/30/2015 09:15 PM, Jay Norwood wrote:
> alias TO = Tuple!(TR,"L1_MISS", TR, "L1_HIT", TR,"DATA_ACC",
TR,"ALL_ACC");
Looks like a bug. Workaround: Get rid of member names there:
alias TO = Tuple!(TR, TR, TR, TR);
>
//taskPool.amap!(Metrics)(std.algorithm.map!getTerm(samples),results);
[...] { proc_cyc = 1_000_000+i*2; DATA_RD = 1000+i; DATA_WR=
2000+i; INST_FETCH=proc_cyc/2;
L1I_HIT= INST_FETCH-100; L1I_MISS=100;
L1D_HIT= DATA_RD+DATA_WR - 200; L1D_MISS=200;}
}
std.datetime.StopWatch sw;
sw.start();
ref TI getTerm
On Wednesday, 30 September 2015 at 22:24:25 UTC, Jay Norwood
wrote:
// various metric definitions
// the Tuples could also define names for each member and use
the names here in the metrics.
long met1( TI m){ return m[0] + m[1] + m[2]; }
long met2( TI m){ return m[1] + m[2] + m[3]; }
long met3(
// compute the metric results for the above measured sample
// values in parallel
taskPool.amap!(Metrics)(std.algorithm.map!getTerm(samples),results);
// how long did this take
long exec_ms = sw.peek().msecs;
writeln("results:", results);
writeln("time:", exec_ms);
}
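A self-contained version of the pattern under discussion, with the unnamed-Tuple workaround applied (TI, TO, TR, and Metrics here are illustrative stand-ins for the poster's definitions, not the original code):
---
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;
import std.typecons : Tuple;

alias TR = long;
alias TI = Tuple!(long, "L1D_MISS", long, "L1I_MISS", long, "cycles");
alias TO = Tuple!(TR, TR); // unnamed members: works with amap (issue 15134)

TO Metrics(TI m)
{
    return TO(m.L1D_MISS + m.L1I_MISS, m.cycles);
}

void main()
{
    auto meas = new TI[10_000];
    foreach (i, ref m; meas)
    {
        m.L1D_MISS = 100 + cast(long) i;
        m.L1I_MISS = 100 - cast(long) i;
        m.cycles = 10 + cast(long) i;
    }
    auto results = new TO[meas.length];
    ref TI getTerm(int i) { return meas[i]; }
    taskPool.amap!Metrics(map!getTerm(iota(cast(int) meas.length)), results);
    writeln(results[0]);
}
---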
On Saturday, 26 September 2015 at 15:56:54 UTC, Jay Norwood wrote:
This results in a compile error:
auto sum3 = taskPool.reduce!"a + b"(iota(1UL,101UL));
I believe there was discussion of this problem recently ...
https://issues.dlang.org/show_bug.cgi?id=14832
https://issues.dlang.org/sho
On Mon, 2015-09-28 at 12:46 +, John Colvin via Digitalmars-d-learn
wrote:
> […]
>
> Pretty much as expected. Locks are slow, shared accumulators
> suck, much better to write to thread local and then merge.
Quite. Dataflow is where the parallel action is. (Except for t
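For the archive, the "write to thread local and then merge" approach maps directly onto std.parallelism's workerLocalStorage; a minimal sketch of the summing example done that way:
---
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // one private accumulator per worker thread: no locks, no atomics
    auto partial = taskPool.workerLocalStorage!ulong(0);
    foreach (f; parallel(iota(1, 100 + 1)))
        partial.get += f;
    ulong total = 0;
    foreach (p; partial.toRange) // merge the per-worker partials
        total += p;
    writeln(total); // 5050
}
---
Note that toRange should only be read after the parallel loop has finished.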
On Monday, 28 September 2015 at 12:18:28 UTC, Russel Winder wrote:
As a single data point:
[...]
As a single data point:
== anonymous_fix.d ==
5050
real    0m0.168s
user    0m0.200s
sys     0m0.380s
== colvin_fix.d ==
5050
real    0m0.036s
user    0m0.124s
sys     0m0.000s
== norwood_reduce.d
On Mon, 2015-09-28 at 11:38 +, John Colvin via Digitalmars-d-learn
wrote:
> […]
>
> It would be really great if someone knowledgable did a full
> review of std.parallelism to find out the answer, hint, hint...
> :)
Indeed, I would love to be able to do this. However I don't have time
in th
On Monday, 28 September 2015 at 11:31:33 UTC, Russel Winder wrote:
On Sat, 2015-09-26 at 14:33 +0200, anonymous via
Digitalmars-d-learn wrote:
[…]
I'm pretty sure atomicOp is faster, though.
Rough and ready anecdotal evidence would indicate that this is
a reasonable statement, by quite a long way.
On Sat, 2015-09-26 at 17:20 +, Jay Norwood via Digitalmars-d-learn
wrote:
> This is a work-around to get a ulong result without having the
> ulong as the range variable.
>
> ulong getTerm(int i)
> {
> return i;
> }
> auto sum4 = taskPool.reduce!"a +
> b"(std.algorithm.map!getTerm(iota(10
On Sat, 2015-09-26 at 15:56 +, Jay Norwood via Digitalmars-d-learn
wrote:
> std.parallelism.reduce documentation provides an example of a
> parallel sum.
>
> This works:
> auto sum3 = taskPool.reduce!"a + b"(iota(1.0,101.0));
>
> This results in a compile error:
> auto sum3 = taskPool.reduce!"a + b"(iota(1UL,101UL));
On Sat, 2015-09-26 at 14:33 +0200, anonymous via Digitalmars-d-learn
wrote:
> […]
> I'm pretty sure atomicOp is faster, though.
Rough and ready anecdotal evidence would indicate that this is a
reasonable statement, by quite a long way. However a proper benchmark
is needed for statistical significance.
On Sat, 2015-09-26 at 12:32 +, Zoidberg via Digitalmars-d-learn
wrote:
> > Here's a correct version:
> >
> > import std.parallelism, std.range, std.stdio, core.atomic;
> > void main()
> > {
> > shared ulong i = 0;
>
On Saturday, 26 September 2015 at 17:20:34 UTC, Jay Norwood wrote:
[...]
This is a work-around to get a ulong result without having the
ulong as the range variable.
ulong getTerm(int i)
{
return i;
}
auto sum4 = taskPool.reduce!"a +
b"(std.algorithm.map!getTerm(iota(11)));
btw, on my corei5, in debug build,
reduce (using double): 11msec
non_parallel: 37msec
parallel with atomicOp: 123msec
so, that is the reason for using parallel reduce, assuming the
ulong range thing will get fixed.
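Putting the quoted workaround together into a runnable program (the 0..100 range here is illustrative; the point is that the range variable stays int while getTerm widens to ulong):
---
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

ulong getTerm(int i)
{
    return i; // widen to ulong here, not in the range
}

void main()
{
    // iota's element type stays int, so reduce compiles fine
    auto sum4 = taskPool.reduce!"a + b"(map!getTerm(iota(101)));
    writeln(sum4); // 5050
}
---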
On Saturday, 26 September 2015 at 13:09:54 UTC, Meta wrote:
On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
foreach (f; parallel(iota(1, 100+1)))
{
synchronized i += f;
}
Is this valid syntax? I've never seen synchronized used like
this before.
std.parallelism.reduce documentation provides an example of a
parallel sum.
This works:
auto sum3 = taskPool.reduce!"a + b"(iota(1.0,101.0));
This results in a compile error:
auto sum3 = taskPool.reduce!"a + b"(iota(1UL,101UL));
I believe there was discussion of this problem recently ...
On Saturday 26 September 2015 14:18, Zoidberg wrote:
[...]
Here's a correct version:
import std.parallelism, std.range, std.stdio, core.atomic;
void main()
{
    shared ulong i = 0;
    foreach (f; parallel(iota(1, 100+1)))
    {
        i.atomicOp!"+="(f);
    }
    i.writeln;
}
Thanks! Works fine. So "shared" and "atomic" are a must?
On Saturday, 26 September 2015 at 12:18:16 UTC, Zoidberg wrote:
[...]
I've run into an issue, which I guess could be resolved easily,
if I knew how...
[CODE]
ulong i = 0;
foreach (f; parallel(iota(1, 100+1)))
{
    i += f;
}
thread_joinAll();
i.writeln;
[/CODE]
It's basically an example which adds all the numbers from 1 to 100.
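An alternative fix that avoids the shared variable entirely (same numbers as the example above): let taskPool.reduce give each worker a private partial sum and merge them.
---
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // no shared accumulator to race on; a double range sidesteps the
    // ulong issue (14832) discussed elsewhere in this digest
    taskPool.reduce!"a + b"(iota(1.0, 101.0)).writeln; // 5050
}
---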
Maybe the compiler generates wrong code; try to debug at the
instruction level.
On Wednesday, 16 September 2015 at 22:30:26 UTC, Ali Çehreli wrote:
[...]
On 09/16/2015 02:01 PM, BBasile wrote:
> On Wednesday, 16 September 2015 at 18:19:07 UTC, Ali Çehreli wrote:
>> On 09/15/2015 04:49 PM, BBasile wrote:
>>> Under Windows this works fine but under Linux I got a runtime error.
>>
>> Can it be because 'param' is invalid at the time clbck is called?
>
On Wednesday, 16 September 2015 at 18:19:07 UTC, Ali Çehreli wrote:
[...]
No, the callback and its user parameter are set at
On 09/15/2015 04:49 PM, BBasile wrote:
Under Windows this works fine but under Linux I got a runtime error.
Can it be because 'param' is invalid at the time clbck is called? The
following program works under Linux. However, removing thread_joinAll()
is a bug:
import std.parallelism;
import
On Tuesday, 15 September 2015 at 23:49:23 UTC, BBasile wrote:
Under Windows this works fine but under Linux I got a runtime
error.
this could be reduced to :
[...]
If it can help to understand the problem, here is the unreduced
case:
https://github.com/BBasile/Coedit/blob/master/cedast/src/
Under Windows this works fine but under Linux I got a runtime
error.
this could be reduced to :
---
import std.parallelism;
alias CallBack = void function(void*);
class Foo
{
CallBack clbck;
void* param;
void dotask()
{
// some heavy processing
// tells the caller
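The tail of the reduced example was cut off by the archive; a hedged completion of its shape (the bodies of dotask and main are guesses, not BBasile's code), together with the thread_joinAll() that Ali points out is required:
---
import std.parallelism : task;
import core.thread : thread_joinAll;

alias CallBack = void function(void*);

class Foo
{
    CallBack clbck;
    void* param;
    void dotask()
    {
        // some heavy processing ...
        // tells the caller the work is done
        if (clbck !is null)
            clbck(param);
    }
}

void main()
{
    auto foo = new Foo;
    foo.clbck = function(void* p) { /* react to completion */ };
    foo.param = null;
    task(&foo.dotask).executeInNewThread();
    thread_joinAll(); // without this, main can exit while dotask still runs
}
---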
On Thursday, 14 May 2015 at 17:12:07 UTC, John Colvin wrote:
Would it be OK if I showed some parts of this code as examples
in my DConf talk in 2 weeks?
Sure!!!
[...] is a bit different
(see tables below).
In the middle table I have used gnu parallel in combination
with a slightly modified version of the D program which runs a
single trait (specified in argv[1]). This approach runs the
jobs as completely isolated processes, but at the extra cost of
re-reading the common data for each trait. The elapsed time is
be a fair bit of work to provide a working version. Anyway, the
salient bits are like this:
from parallel import Pool

def run_job(args):
    (job, arr1, arr2) = args
    # ... do the work for each dataset

def main():
    # ... read common data and store in numpy arrays ...
    pool = Pool()
    pool.
On Wednesday, 13 May 2015 at 09:01:05 UTC, Gerald Jansen wrote:
[...]
On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
[...]
On 13/05/2015 2:59 a.m., Gerald Jansen wrote:
[...]
On Wednesday, 13 May 2015 at 03:19:17 UTC, thedeemon wrote:
In case of Python's parallel.Pool() separate processes do the
work without any synchronization issues. In case of D's
std.parallelism it's just threads inside one process and they
do fight for some locks, thus this result.
Okay, so t
[...] they do fight for some locks, thus this result.
Right. To do the same in D, one must use fibers.
No, to do the same one must use separate OS processes. Fibers
won't help you against parallel threads fighting for GC & IO
locks.
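What the separate-OS-processes route can look like with std.process (the program name and per-trait argument are placeholders, echoing the gnu parallel setup described elsewhere in this thread):
---
import std.conv : to;
import std.process : Pid, spawnProcess, wait;

void main()
{
    // one fully isolated child process per trait: its own GC,
    // its own I/O locks ("./pedupg_one" is a placeholder name)
    Pid[] kids;
    foreach (trait; 0 .. 4)
        kids ~= spawnProcess(["./pedupg_one", trait.to!string]);
    foreach (k; kids)
        wait(k);
}
---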
On 05/12/2015 08:19 PM, thedeemon wrote:
> In case of Python's parallel.Pool() separate processes do the
> work without any synchronization issues. In case of D's
> std.parallelism it's just threads inside one process and they
> do fight for some locks, thus this result.
Right. To do the same in D, one must use fibers.
On Tuesday, 12 May 2015 at 20:50:45 UTC, Gerald Jansen wrote:
Your advice is appreciated but quite disheartening. I was
hoping for something (nearly) as easy to use as Python's
parallel.Pool() map(), given that this is essentially an
"embarassingly parallel" problem.
[...] that replacement at three points in the
program resulted in roughly a 30% speedup at the cost of about
30% more memory in this specific case. Unfortunately it didn't
help with the performance deterioration problem with parallel
foreach.
On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
[...]
[...] data, doing less f.writef's.
Your advice is appreciated but quite disheartening. I was hoping
for something (nearly) as easy to use as Python's parallel.Pool()
map(), given that this is essentially an "embarrassingly parallel"
problem. Avoidance of GC allocation and
On Tuesday, 12 May 2015 at 19:14:23 UTC, Laeeth Isharc wrote:
But if you disable the logging does that change things?
There is only a tiny bit of logging happening.
And are you using optimization on gdc ?
gdc -Ofast -march=native -frelease
Also try byLineFast, e.g.
http://forum.dlang.org/th
On Tuesday, 12 May 2015 at 19:10:13 UTC, Laeeth Isharc wrote:
[...]
On Tuesday, 12 May 2015 at 18:14:56 UTC, Gerald Jansen wrote:
[...]
On Tuesday, 12 May 2015 at 16:35:23 UTC, Rikki Cattermole wrote:
On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
At the risk of great embarrassment ... here's my program:
http://dekoppel.eu/tmp/pedupg.d
Would it be possible to give us some example data?
I might give it a go to try rewriting it to
On Tuesday, 12 May 2015 at 17:02:19 UTC, Gerald Jansen wrote:
About 3.5 million lines read by main(), 0.5 to 2 million lines
read and 3.5 million lines written by runTraits (aka runJob).
Each GC allocation in D is a locking operation (and disabling GC
doesn't help here at all), probably each
On Tuesday, 12 May 2015 at 16:46:42 UTC, thedeemon wrote:
[...]
On Tuesday, 12 May 2015 at 14:59:38 UTC, Gerald Jansen wrote:
The output of /usr/bin/time is as follows:
Lang  Jobs     User  System  Elapsed  %CPU
Py       2    79.24    2.16  0:48.90   166
D        2    19.41   10.14  0:17.96   164
Py      30  1255.17   58.38  2:39.54   823  * Pool(12)
D       30   421.61
On 13/05/2015 4:20 a.m., Gerald Jansen wrote:
[...]
At the risk of great embarrassment ... here's my program:
http://dekoppel.eu/tmp/pedupg.d
As per Rikki's first suggestion (thanks) I added:
import core.memory : GC;
// at the top of main():
GC.disable;
GC.reserve(1024 * 1024 * 1024);
... to no avail.
thanks for all the help so far.
Gerald
ps. I am using gdc
On 05/12/2015 08:35 AM, Gerald Jansen wrote:
> I could put it somewhere if that would help.
Please do so. We all want to learn to avoid such issues.
Thank you,
Ali
On 13/05/2015 2:59 a.m., Gerald Jansen wrote:
[...]
[...] with the number of jobs per cpu core.
It may be related to GC collections. If it hasn't been changed
recently, every allocation from GC triggers a collection cycle.
D's current GC being a stop-the-world kind, you lose all
benefit of parallel processing when that happens.
[...]
Without seeing runJob, even arr2.dup may be having such an effect on the
performance.
Ali
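The shape of the fix being suggested, sketched with placeholder names (runJob and the buffer sizes are illustrative, not the original program's): hoist allocations out of the parallel loop so the workers never touch the GC while running.
---
import std.parallelism : parallel;

// placeholder for the real per-trait work; uses a preallocated scratch
// buffer instead of calling arr2.dup inside the loop
void runJob(double[] scratch) { scratch[] = 0; }

void main()
{
    auto scratches = new double[][](30);
    foreach (ref s; scratches)
        s = new double[1024]; // all GC allocation happens up front
    foreach (s; parallel(scratches))
        runJob(s); // no GC allocation (and no GC lock) in the hot loop
}
---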
[...] some simple
parallel data processing (using the map function in Python's
multiprocessing module and parallel foreach in D). I was very
happy that my D version ran considerably faster than the Python
version when running a single job, but was soon dismayed to
find that the performance of my D version
On Tuesday, 12 May 2015 at 14:59:38 UTC, Gerald Jansen wrote:
[...]
I am a data analyst trying to learn enough D to decide whether to
use D for a new project rather than Python + Fortran. I have
recoded a non-trivial Python program to do some simple parallel
data processing (using the map function in Python's
multiprocessing module and parallel foreach in D).