Hi Niko,

The GraphX team recently wrote a longer paper with more benchmarks and optimizations: http://arxiv.org/abs/1402.2394

Regarding the performance of GraphX vs. GraphLab, I believe GraphX currently outperforms GraphLab only in end-to-end benchmarks of pipelines involving both graph-parallel operations (e.g. PageRank) and data-parallel operations (e.g. ETL and data cleaning). This is due to the overhead of moving data between GraphLab and a data-parallel system like Spark. There's an example of such a pipeline in Section 5.2 of the linked paper, and the results are in Figure 10 on page 11.

GraphX has a very similar architecture to GraphLab, so I wouldn't expect it to have better performance on pure graph algorithms. GraphX may actually be slower when Spark is configured to launch many tasks per machine, because shuffle communication between Spark tasks on the same machine still occurs by writing to and reading from disk, while GraphLab uses shared memory for same-machine communication.

I've CC'd Joey and Reynold as well.

Ankur <http://www.ankurdave.com/>

On Mar 24, 2014 11:00 AM, "Niko Stahl" <r.niko.st...@gmail.com> wrote:

> I'm interested in extending the comparison between GraphX and GraphLab
> presented in Xin et al. (2013). The evaluation presented there is rather
> limited as it only compares the frameworks for one algorithm (PageRank) on
> a cluster with a fixed number of nodes. Are there any graph algorithms
> where one might expect GraphX to perform better than GraphLab? Do you
> expect the scaling properties (i.e. performance as a function of # of
> worker nodes) to differ?
>