Re: Are there any plans to develop Graphx Streaming?

Ankur Dave Mon, 21 Apr 2014 00:36:08 -0700

On Sun, Apr 20, 2014 at 6:27 PM, Qi Song <songqi1...@gmail.com> wrote:


> I wander if there exists some
> documentation about how to choose partition methods, based on the graph's
> structure or some other properties?
>

The best option is to try all the partition strategies (as well as the
default, which is to leave edges in their original partitions) and see
which one works best for your particular graph. Here are some ideas about
what might work best:

- The edge list sometimes comes in a meaningful order by default. For
example, a web graph might have edges clustered by domain, and other graphs
might have the edge list pre-sorted (i.e., clustered by source vertex ID).
In these cases, leaving the edges in the original partitions might provide
the best performance.

- EdgePartition2D should theoretically be the best strategy in general, but
if the number of machines is not a perfect square, it might cause work
imbalance (see the Scaladoc for PartitionStrategy.EdgePartition2D). In this
case, RandomVertexCut might be better, since it optimizes work balance but
completely ignores communication.

- For an algorithm where messages are sent backwards along each edge (i.e.,
to the source of the edge), EdgePartition1D will minimize the communication
needed to update each vertex with the aggregated result of
mapReduceTriplets. If we had a DestinationEdgePartition1D, a similar
statement would be true for algorithms like PageRank where messages are
sent forwards along each edge.

Ankur <http://www.ankurdave.com/>

Re: Are there any plans to develop Graphx Streaming?

Reply via email to