Thank you Avery, that helped a lot!

Regards,
Alexandros

On 27 November 2012 20:57, Avery Ching <ach...@apache.org> wrote:

> Hi Alexandros,
>
> The extra task is for the master process (a coordination task). In your
> case, since you are using a single machine, you can use a single task.
>
> -Dgiraph.SplitMasterWorker=false
>
> and you can try multithreading instead of multiple workers.
>
> -Dgiraph.numComputeThreads=12
>
> CPU usage increases because of the Netty threads that handle network
> requests. By using multithreading instead, you should bypass this.
>
> Avery
>
>
> On 11/27/12 9:40 AM, Alexandros Daglis wrote:
>
>> Hello everybody,
>>
>> I went through most of the documentation I could find for Giraph and also
>> most of the messages on this email list, but I still have not figured out
>> precisely what a "worker" really is. I would really appreciate it if you
>> could help me understand how the framework works.
>>
>> At first I thought that a worker has a one-to-one correspondence to a map
>> task. Apparently this is not exactly the case, since I have noticed that if
>> I ask for x workers, the job finishes after having used x+1 map tasks. What
>> is this extra task for?
>>
>> I have been trying out the example SSSP application on a single node with
>> 12 cores. Given an input graph of ~400 MB and 1 worker, around 10 GB of
>> memory is used during execution. What intrigues me is that if I use 2
>> workers for the same input (and without limiting memory per map task),
>> double the memory is used. Furthermore, there is no improvement in
>> performance; if anything, I notice a slowdown. Are these observations normal?
>>
>> Might it be the case that 1 and 2 workers are too few and I should go to
>> the 30-100 range that is the proposed number of mappers for a conventional
>> MapReduce job?
>>
>> Finally, a last observation. Even though I use only 1 worker, I see that
>> there are significant periods during execution where up to 90% of the 12
>> cores' computing power is consumed, that is, almost 10 cores are used in
>> parallel. Does each worker spawn multiple threads and dynamically balance
>> the load to utilize the available hardware?
>>
>> Thanks a lot in advance!
>>
>> Best,
>> Alexandros
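
For reference, the task arithmetic and the two options discussed in this thread can be sketched as below. This is only an illustration of what the thread says, not Giraph code: the helper names `expected_map_tasks` and `single_node_flags` are hypothetical, and the "workers plus one master task" rule is a simplification of Avery's explanation above.

```python
def expected_map_tasks(workers, split_master_worker=True):
    """Rough count of map tasks, following the explanation in this thread.

    Assumption: one map task per worker, plus one extra task for the
    master when the master runs separately from the workers.
    """
    return workers + (1 if split_master_worker else 0)


def single_node_flags(compute_threads):
    """Build the two -D options suggested in the thread (hypothetical helper)."""
    return [
        "-Dgiraph.SplitMasterWorker=false",
        "-Dgiraph.numComputeThreads=" + str(compute_threads),
    ]


# x workers with a separate master: x + 1 map tasks, as observed in the question.
print(expected_map_tasks(2))
# One worker sharing its task with the master: a single task.
print(expected_map_tasks(1, split_master_worker=False))
# Options for a 12-core machine, one compute thread per core.
print(" ".join(single_node_flags(12)))
```

The thread count of 12 simply mirrors the 12 cores mentioned in the question; whether one thread per core is the best setting for a given graph is something to measure rather than assume.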