During the 24th minute of the recent Hadoop Summit video [1], Avery Ching talks about how Giraph is made scalable. I am interested in Hama, which is also based on the BSP model, and would like to know more details about how Giraph achieves this scalability.
Basically, as I understand it, at the end of each superstep the BSP tasks send some metrics to the master; the master then partitions the data of the most loaded BSP tasks and uses the available free map slots to process them.

1) Where is the code for the above logic? I am new to Giraph.
2) What is the logic behind the partitioning of the data in the master after the superstep? Let's say the data was originally partitioned using hash partitioning.
3) Similarly, will Giraph also scale down? Will the partitions be merged?

Thanks,
Praveen

[1] - http://www.youtube.com/watch?v=b5Qmz4zPj-M
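P.S. To make my question concrete, here is a rough sketch of my mental model of the master-side rebalancing step. All names here are hypothetical and this is not the actual Giraph code (finding that code is exactly what I'm asking about): the master collects a load metric per worker after a superstep and, if one worker is far above average, splits half of its load onto a free map slot.

```java
import java.util.*;

// Hypothetical sketch (NOT actual Giraph code): master-side rebalancing
// after a superstep, based on load metrics reported by each BSP task.
public class Main {

    // load: worker id -> load metric (e.g. vertex count) reported at superstep end.
    // freeSlots: idle map slots the master may assign work to.
    static Map<String, Long> rebalance(Map<String, Long> load, List<String> freeSlots) {
        Map<String, Long> next = new HashMap<>(load);
        if (freeSlots.isEmpty()) return next;

        // Find the most loaded worker and the average load.
        String hottest = Collections.max(load.entrySet(),
                Map.Entry.comparingByValue()).getKey();
        long avg = load.values().stream().mapToLong(Long::longValue).sum() / load.size();

        // Split only if the hottest worker is well above average
        // (the 2x threshold is an arbitrary choice for this sketch).
        if (next.get(hottest) > 2 * avg) {
            long half = next.get(hottest) / 2;
            next.put(hottest, next.get(hottest) - half);
            next.put(freeSlots.remove(0), half); // move half the load to a free slot
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> load = new HashMap<>();
        load.put("worker-1", 1000L);
        load.put("worker-2", 100L);
        load.put("worker-3", 100L);
        Map<String, Long> next = rebalance(load, new ArrayList<>(List.of("worker-4")));
        System.out.println(next.get("worker-1"));
        System.out.println(next.get("worker-4"));
    }
}
```

Is this roughly what happens, or does the master repartition at a finer granularity than "half of a worker's data"?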