Hi all, I finished an excursion into Giraph's code and now I kinda know what it takes to port Giraph over to run on top of YARN.
When the base Hadoop clusters are replaced by YARN clusters, Giraph will have two options:

- *Giraph still works over MapReduce APIs*: Even after moving to YARN clusters, Giraph can still run over MapReduceV2+YARN, without any code changes at all.
- *Giraph works natively on YARN*: This can be done in such a way that, in the medium term, Giraph can continue to work on both a Hadoop MapReduce cluster and a YARN cluster. Two visible effects once this effort gets underway, that I can think of:
  -- There will be some moving around of classes/interfaces to separate APIs from implementation details, and a bit of reorganisation of code to help support both GiraphV1 and GiraphV2.
  -- The port will probably also split the community's attention (depending on how many of the community's eyeballs the new world grabs, as opposed to the stabilization/feature work on GiraphV1).

Now here's the thing. Avery indicated on the other thread (about Giraph over HAMA) that most of the users of Giraph need to work on top of a Hadoop MapReduce cluster for quite some time. I completely agree with that, being a long-time maintainer/supporting dev of Hadoop clusters myself.

Given that concern, before embarking on the port, I thought I'd get opinions from the community.

Thanks,
+Vinod