Hi, My goal is to have 10 parallel processes read the same file and each process consume 1/10th of that file. They of course all read all lines of the file, but skips over lines not belonging to it. So process #1 would process lines 1, 11, 21 etc. Second one would process lines 2, 12, 22 etc. The issue that I have has nothing to do with efficiency or performance of this, so let us forget the efficiency part of it, for now.
The code is checked in to a repository here *https://github.com/harikb/scratchpad1 *(including some sample data), but also quoted in the email at the end. $ julia --version julia version 0.3.7 *# Input - some sample from nyc public database (see repo link above, but any file might be enough)* $ wc -l nyc311calls.csv 250000 nyc311calls.csv *# Ignore why I am not using a csv reader. this is just test data. there are no multi-line quoted csv data here.* $ julia -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl | wc -l 250001 *# Non-parallel run, everything is fine. One extra line is the initial print statement from _driver.jl* *# Now, let us run with 10 parallel processes* $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl | wc -l 26420 $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl | wc -l 40915 $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl | wc -c 1919321 $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl | wc -c 2172839 *Output seems all over the place. I think the processes stop after reaching certain input.* $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl | tail From worker 8: Process 8 is processing line 46617 From worker 5: Process 5 is processing line 46614 From worker 2: Process 2 is processing line 50751 From worker 4: Process 4 is processing line 45593 From worker 11: Process 11 is processing line 45380 From worker 6: Process 6 is processing line 46685 From worker 7: Process 7 is processing line 50756 From worker 9: Process 9 is processing line 46688 From worker 10: Process 10 is processing line 46699 From worker 3: Process 3 is processing line 46692 Now, I could buy that the STDOUT is getting clobbered by multiple parallel writes to it. I am used to STDOUT getting garbled/mixed data from other environments/languages, but I haven't seen missing data. The characters eventually make it in some form to the output. But if I redirect the output to a file, it is perfectly fine every single time. Why is that STDOUT does not get clobbered in that case? $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx $ wc -l xx 250001 xx $ wc -c xx 12988916 xx $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx; wc -l xx 250001 xx $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx; wc -l xx 250001 xx $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx; wc -l xx 250001 xx $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx; wc -l xx 250001 xx $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx; wc -l xx 250001 xx $ julia -p 10 -L ./julia_test_parallel.jl ./julia_test_parallel_driver.jl > xx; wc -l xx 250001 xx *== Code below same as the quoted github link ==* $ cat ./julia_test_parallel.jl #!/usr/local/julia-cb9bcae93a/bin/julia function processOneFile(filename) np = nprocs() jump = np - 1 jump = jump == 0 ? 1 : jump selfid = myid() # in a single-process setup, this function will be called for parent (id=1) assert(jump == 1 || selfid != 1) f = open(filename); offset = np == 1 ? selfid : selfid - 1 lnum = 0 for l in eachline(f) lnum += 1 if lnum == offset println("Process $(selfid) is processing line $(lnum)") offset += jump end end end $ cat ./julia_test_parallel_driver.jl #!/usr/local/julia-cb9bcae93a/bin/julia filename = "nyc311calls.csv" np = nprocs() println("Started $(np) processes") if (np > 1) if (myid() == 1) # Mulitprocess and I am the parent @sync begin for i = 2:nprocs() @async remotecall_wait(i, processOneFile, filename) end end end else processOneFile(filename) end *Any help is appreciated.* Thanks -- Harry