I have an application that evaluates a graph using this algorithm:

- Use a parallel for loop to evaluate all nodes in the graph (to evaluate a
  node, an image is read, and then the result for this node is calculated).

- Use a second parallel for loop to evaluate all edges in the graph. The
  edge function takes in the results from both nodes of the edge and then
  calculates the answer for the edge.

The final result consists of the calculated results of each edge. So each
node and each edge is essentially a job; in this case, an edge is more
like a job than a message.
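
To make the two steps concrete, here is a minimal single-machine sketch in
Python. The names evaluate_node, evaluate_edge, and evaluate_graph and the toy
arithmetic are placeholders, not the real image processing, and the thread
pool only stands in for the two parallel for loops.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_node(node_id):
    # Placeholder for "read an image, then calculate the node result".
    return node_id * node_id


def evaluate_edge(result_u, result_v):
    # Placeholder for the per-edge calculation on the two node results.
    return result_u + result_v


def evaluate_graph(nodes, edges, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # First map: one job per node.
        node_results = dict(zip(nodes, pool.map(evaluate_node, nodes)))

        # Second map: one job per edge, reading both endpoint results.
        def edge_job(edge):
            u, v = edge
            return evaluate_edge(node_results[u], node_results[v])

        edge_results = dict(zip(edges, pool.map(edge_job, edges)))
    # There is no reduce step: the answer is the per-edge results.
    return edge_results
```

For example, evaluate_graph([1, 2, 3], [(1, 2), (2, 3)]) returns one value per
edge. In the real application each edge job needs the results of both of its
nodes, which is where the data transfer comes from once the work is spread
across machines.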

As you can see, the algorithm above employs two map functions but no
reduce function. The total data size can be very large (say 100 GB). Also,
the workload of each node and each edge is highly irregular, so load
balancing mechanisms are essential.

In this case, will Giraph suit this application? If so, what would my
program look like? And will Giraph be able to strike a balance between
good load balancing of the second map function and minimizing the
transfer of the results from the first map function?


                                          
