It might be better to run a MapReduce job first. In the map phase, emit each key-value pair of a node as the key and the node itself as the value. Each reduce key then collects all the nodes that share that key-value pair, which gives you the first level of connections. You can then use the output of the reduce phase as the adjacency list for the graph to be processed with Giraph.

Cheers,
Pankaj

On Mar 28, 2014 6:27 PM, "Matthieu Labour" <matthieu.lab...@gmail.com> wrote:
> Hi
>
> I am looking for tips on how to leverage Giraph for the use case below:
>
> I have a list of Nodes.
> A Node is a collection of Key-Value pairs.
> 2 Nodes are related (have an edge) if they share a Key-Value pair.
>
> Until now I have been running a Depth First Search algorithm to cluster
> the Nodes into Connected Components.
>
> However, my data set has grown significantly and I need to scale. This is
> the reason that brought me to Giraph.
>
> I have gone through the Connected Component example in Giraph but need a
> bit of help to get started. Specifically I wonder how I can change it to
> accommodate the use case described above.
>
> I would greatly appreciate any help.
> Thank you in advance.
> -matt
>
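
To make the map/reduce suggestion above concrete, here is a minimal sketch that simulates both phases in memory with plain Python; it is not Hadoop or Giraph code. The function names, the integer node ids, and the sample data are all hypothetical, and it assumes each node can be represented as a dict of key-value pairs. One further assumption that is mine rather than part of the suggestion: within each group of nodes sharing a pair, every node is linked to a single "hub" node instead of to every other node. That should leave the connected components unchanged while avoiding a quadratic number of edges for very common pairs.

```python
from collections import defaultdict


def map_phase(nodes):
    """Emit ((key, value), node_id) for every key-value pair of every node."""
    for node_id, pairs in nodes.items():
        for key, value in pairs.items():
            yield (key, value), node_id


def reduce_phase(emitted):
    """Group node ids by their shared (key, value) pair, as a reducer would."""
    groups = defaultdict(set)
    for pair, node_id in emitted:
        groups[pair].add(node_id)
    return groups


def build_adjacency(nodes, groups):
    """Turn the reducer groups into an undirected adjacency list.

    Assumption: each group is wired as a star around its smallest node id,
    which preserves connectivity without emitting every pairwise edge.
    """
    adjacency = {node_id: set() for node_id in nodes}
    for members in groups.values():
        ordered = sorted(members)
        hub = ordered[0]
        for other in ordered[1:]:
            adjacency[hub].add(other)
            adjacency[other].add(hub)
    return adjacency


# Hypothetical sample data: node id -> key-value pairs.
nodes = {
    1: {"email": "a@example.com", "phone": "555-0100"},
    2: {"email": "a@example.com"},
    3: {"phone": "555-0100", "city": "Paris"},
    4: {"city": "Berlin"},
}

groups = reduce_phase(map_phase(nodes))
adjacency = build_adjacency(nodes, groups)

for node_id in sorted(adjacency):
    print(node_id, sorted(adjacency[node_id]))
```

In a real job the resulting adjacency list would still need to be written out in whatever vertex input format the Giraph connected-components job is configured to read; that part is not shown here.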