Around the 24th minute of the recent Hadoop Summit video [1], Avery Ching
talks about how Giraph is made scalable. I am interested in Hama, which is
also based on the BSP model, and would like to know more about how
Giraph achieves this scalability.

Basically, at the end of each superstep, the BSP tasks send some metrics
to the master; the master then partitions the data of the most loaded BSP
tasks and uses the free map slots available to process them.
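To make the question concrete, here is a minimal sketch in plain Java of the rebalancing step as I understand it from the talk. The class and method names are my own invention, not actual Giraph code: the master picks the most loaded partition (by reported vertex count) and splits its load across the free slots.

```java
import java.util.*;

public class RebalanceSketch {
    // Hypothetical master-side step: given per-partition vertex counts
    // reported by the workers at the end of a superstep, split the
    // heaviest partition's load across the free slots plus itself.
    static Map<Integer, Long> splitMostLoaded(Map<Integer, Long> vertexCounts,
                                              int freeSlots) {
        // Find the partition with the highest vertex count.
        int heaviest = Collections.max(vertexCounts.entrySet(),
                Map.Entry.comparingByValue()).getKey();
        long load = vertexCounts.get(heaviest);

        // Divide the load evenly; new partition ids go to the free slots.
        Map<Integer, Long> result = new HashMap<>(vertexCounts);
        long share = load / (freeSlots + 1);
        result.put(heaviest, load - share * freeSlots);
        int nextId = Collections.max(vertexCounts.keySet()) + 1;
        for (int i = 0; i < freeSlots; i++) {
            result.put(nextId + i, share);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> counts = new HashMap<>();
        counts.put(0, 100L);
        counts.put(1, 900L);
        // Partition 1 is overloaded; two free slots are available.
        System.out.println(splitMostLoaded(counts, 2));
    }
}
```

I am not claiming this is what Giraph actually does internally; it is just the behavior I am asking about.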

1) Where is the code for the above logic? I am new to Giraph.

2) What is the logic behind the partitioning of the data by the master
after the superstep? Let's say that the data has been partitioned using
hash partitioning.
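For context, by hash partitioning I mean something along these lines (a toy Java helper of my own, not Giraph's actual partitioner class, though I believe Giraph's default hash partitioner works similarly):

```java
public class HashPartitionSketch {
    // Assign a vertex to a partition by hashing its id modulo the number
    // of partitions; Math.abs guards against negative hash codes.
    static int partitionFor(Object vertexId, int numPartitions) {
        return Math.abs(vertexId.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(42L, 4));      // same id always
        System.out.println(partitionFor(42L, 4));      // maps to the same partition
    }
}
```

My question is how such fixed id-to-partition assignments interact with the master moving load between workers after a superstep.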

3) Similarly, will Giraph also scale down? Will the partitions be merged?

Thanks,
Praveen

[1] - http://www.youtube.com/watch?v=b5Qmz4zPj-M
