Hello everybody,

I went through most of the documentation I could find for Giraph and also
most of the messages on this mailing list, but I still have not figured out
precisely what a "worker" really is. I would really appreciate it if you
could help me understand how the framework works.

At first I thought that a worker corresponds one-to-one to a map
task. Apparently this is not exactly the case, since I have noticed that if
I ask for x workers, the job finishes after having used x+1 map tasks. What
is this extra task for?

I have been trying out the example SSSP application on a single node with
12 cores. With an input graph of ~400 MB and 1 worker, around 10 GB of
memory is used during execution. What intrigues me is that if I use 2
workers for the same input (without limiting memory per map task), memory
usage doubles. Furthermore, performance does not improve; if anything, I
notice a slowdown. Are these observations normal?

Could it be that 1 or 2 workers are too few, and that I should go to the
30-100 range recommended for the number of mappers in a conventional
MapReduce job?

Finally, one last observation. Even though I use only 1 worker, I see
significant periods during execution where up to 90% of the computing power
of the 12 cores is consumed, that is, roughly 10 cores are used in
parallel. Does each worker spawn multiple threads and dynamically balance
the load to utilize the available hardware?

Thanks a lot in advance!

Best,
Alexandros
