On Sun, Apr 20, 2014 at 6:27 PM, Qi Song <songqi1...@gmail.com> wrote:
> I wander if there exists some > documentation about how to choose partition methods, based on the graph's > structure or some other properties? > The best option is to try all the partition strategies (as well as the default, which is to leave edges in their original partitions) and see which one works best for your particular graph. Here are some ideas about what might work best: - The edge list sometimes comes in a meaningful order by default. For example, a web graph might have edges clustered by domain, and other graphs might have the edge list pre-sorted (i.e., clustered by source vertex ID). In these cases, leaving the edges in the original partitions might provide the best performance. - EdgePartition2D should theoretically be the best strategy in general, but if the number of machines is not a perfect square, it might cause work imbalance (see the Scaladoc for PartitionStrategy.EdgePartition2D). In this case, RandomVertexCut might be better, since it optimizes work balance but completely ignores communication. - For an algorithm where messages are sent backwards along each edge (i.e., to the source of the edge), EdgePartition1D will minimize the communication needed to update each vertex with the aggregated result of mapReduceTriplets. If we had a DestinationEdgePartition1D, a similar statement would be true for algorithms like PageRank where messages are sent forwards along each edge. Ankur <http://www.ankurdave.com/>