Hi Niko,

The GraphX team recently wrote a longer paper with more benchmarks and
optimizations: http://arxiv.org/abs/1402.2394

Regarding the performance of GraphX vs. GraphLab, I believe GraphX
currently outperforms GraphLab only in end-to-end benchmarks of pipelines
involving both graph-parallel operations (e.g. PageRank) and data-parallel
operations (e.g. ETL and data cleaning). This is due to the overhead of
moving data between GraphLab and a data-parallel system like Spark. There's
an example of a pipeline in Section 5.2 in the linked paper, and the
results are in Figure 10 on page 11.
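
To make the shape of such a pipeline concrete, here is a minimal sketch in
plain Python. It is not GraphX or GraphLab code and is not taken from the
paper; the function names clean_edges and pagerank, the "src dst" input
format, and the sample data are all assumptions for illustration. The first
stage is the data-parallel part (parsing and cleaning raw records), the
second is the graph-parallel part (PageRank by power iteration).

```python
from collections import defaultdict


def clean_edges(lines):
    """Data-parallel stage (hypothetical): parse raw "src dst" lines,
    dropping malformed rows, self-loops, and duplicate edges."""
    edges = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue  # malformed record
        src, dst = int(parts[0]), int(parts[1])
        if src != dst:
            edges.add((src, dst))
    return sorted(edges)


def pagerank(edges, damping=0.85, iterations=20):
    """Graph-parallel stage (hypothetical): PageRank by power iteration."""
    out_links = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_links[src].append(dst)
        vertices.update((src, dst))
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(ranks[v] for v in vertices if v not in out_links)
        incoming = defaultdict(float)
        for src, targets in out_links.items():
            share = ranks[src] / len(targets)
            for dst in targets:
                incoming[dst] += share
        ranks = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in vertices
        }
    return ranks


if __name__ == "__main__":
    raw = ["1 2", "2 3", "3 1", "bad line here", "4 4", "1 2"]
    print(pagerank(clean_edges(raw)))
```

In a setup with two separate systems, the output of the first stage would
have to be exported and reloaded before the second stage could run, which is
the data-movement overhead described above.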

GraphX has a very similar architecture to GraphLab, so I wouldn't expect it
to have better performance on pure graph algorithms. GraphX may actually be
slower when Spark is configured to launch many tasks per machine, because
shuffle communication between Spark tasks on the same machine still occurs
by writing to and reading from disk, while GraphLab uses shared memory for
same-machine communication.

I've CC'd Joey and Reynold as well.

Ankur <http://www.ankurdave.com/>

On Mar 24, 2014 11:00 AM, "Niko Stahl" <r.niko.st...@gmail.com> wrote:

> I'm interested in extending the comparison between GraphX and GraphLab
> presented in Xin et al. (2013). The evaluation presented there is rather
> limited as it only compares the frameworks for one algorithm (PageRank) on
> a cluster with a fixed number of nodes. Are there any graph algorithms
> where one might expect GraphX to perform better than GraphLab? Do you
> expect the scaling properties (i.e. performance as a function of # of
> worker nodes) to differ?
>
