Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread Steven Sagaert
I think what is meant is that in HPC this is typically done via MPI, which is just a low-level approach where you have to specify all the data communication explicitly (compared to Hadoop & Spark, where it is implicit). > > > The only codes that really nail it are carefully handcrafted HPC

[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread cheng wang
Thx a lot. You saved my life :) On Wednesday, October 7, 2015 at 3:00:25 PM UTC+2, Jonathan Malmaud wrote: > > Within the next few days, support for native threads will be merged into > the development version of Julia ( > https://github.com/JuliaLang/julia/pull/13410 >

[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread cheng wang
Thanks all for replying. I had read the parallel computing documentation before I posted this. Actually, what I mean is a shared-memory model, not a distributed model. My daily research involves extensive use of BLAS and parallel for-loops. Julia has perfect support for BLAS, as well as parallel

[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread Jonathan Malmaud
Within the next few days, support for native threads will be merged into the development version of Julia (https://github.com/JuliaLang/julia/pull/13410). You can also use the SharedArray type which Julia already has, which lets multiple Julia processes running on the same machine share
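A minimal sketch of both shared-memory options mentioned in this reply, written for current Julia 1.x rather than the 2015 development version: `SharedArray` now lives in the `SharedArrays` standard library, the parallel loop macro is `@distributed` from `Distributed`, and native threads are used through `Threads.@threads`. The worker count (2) and the array length (10) are arbitrary assumptions for illustration.

```julia
# Sketch for Julia 1.x; in 2015 these names lived in Base and the
# loop macro was spelled @parallel.
using Distributed

addprocs(2)                    # two worker processes on this machine
@everywhere using SharedArrays # workers must load SharedArrays too

# Option 1: SharedArray, one memory buffer mapped into every local process.
S = SharedArray{Float64}(10)
@sync @distributed for i in 1:10
    S[i] = i^2   # each worker writes its chunk directly into shared memory
end

# Option 2: native threads in a single process
# (start Julia with e.g. `julia -t 4`; with one thread the loop runs serially).
T = zeros(10)
Threads.@threads for i in 1:10
    T[i] = i^2   # all threads share T; each index is written by exactly one thread
end

println(sum(S), " ", sum(T))  # both sums are 385.0
```

`@sync` matters here: without it `@distributed` returns immediately and `S` could be read before the workers have finished writing.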

[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Andrei Zh
Julia supports multiprocessing pretty well, including map-reduce-like jobs. E.g. in the following example I add 3 processes to a "workgroup", distribute a simulation between them and then reduce the results via the (+) operator:

julia> addprocs(3)
3-element Array{Int64,1}:
 2
 3
 4

julia> nheads =
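The transcript in this message breaks off at `nheads =`. As a hedged sketch (not necessarily the author's exact code), the coin-flip pattern it describes looks like this in Julia 1.x, where the reducing loop macro is `@distributed` from the `Distributed` standard library; releases of that era spelled it `@parallel`. The number of flips `N` is an arbitrary assumption.

```julia
using Distributed

addprocs(3)      # three local worker processes, as in the transcript

N = 1_000_000    # arbitrary number of coin flips

# The range is split across workers; each worker sums its own flips,
# and the per-worker totals are combined with (+) on the master.
nheads = @distributed (+) for i in 1:N
    Int(rand(Bool))   # 1 for heads, 0 for tails
end

println(nheads / N)   # fraction of heads, close to 0.5
```

Because each worker reduces its chunk locally before sending it back, only one integer per worker crosses a process boundary, which is what makes this a (small) map-reduce rather than a stream of individual results.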

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Tim Holy
There's https://github.com/JuliaParallel/DistributedArrays.jl https://github.com/JuliaParallel/HDFS.jl in case they help. (See the other packages in JuliaParallel, in case you have missed that organization.) --Tim On Tuesday, October 06, 2015 12:57:17 PM Andrei Zh wrote: > Yet, calling Julia

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Andrei Zh
Yet, calling Julia processes on other machines via ssh doesn't address data locality. In big data systems (say, > 1 TB) the main performance concern is not the number of CPUs, but I/O operations and data movement across a cluster, so map reduce tries to do as much as possible on local data without any

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread David van Leeuwen
See also an earlier discussion on a similar topic, for an out-of-core approach. ---david On Tuesday, October 6, 2015 at 10:29:52 PM UTC+2, Tim Holy wrote: > > There's > >

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Stefan Karpinski
In my experience, Hadoop is pretty terrible about minimizing data movement; Spark seems to be significantly better. The only codes that really nail it are carefully handcrafted HPC codes. On Tue, Oct 6, 2015 at 12:57 PM, Andrei Zh wrote: > Yet, calling Julia processes

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Andrei Zh
> In my experience, Hadoop is pretty terrible about minimizing data > movement; Spark seems to be significantly better. > > If you mean MapReduce (the framework, version 1 or 2), it doesn't move data anywhere unless you tell it to do so in the reduce phase. You could experience another issue

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Stefan Karpinski
That works fine in a distributed setting if you start Julia workers on other machines, so it is actually a legitimate form of map reduce. It doesn't do anything for handling machine failures, however, which was arguably the major concern of the original MapReduce design. On Tue, Oct 6, 2015 at