I am not an expert, but some thoughts inline....

On Jun 16, 2016 6:31 AM, "Maja Kabiljo" <majakabi...@fb.com> wrote:
>
> Hi,
>
> We are running some experiments with GraphX in order to compare it with other systems. There are multiple settings which significantly affect performance, and we experimented a lot in order to tune them well. I'll share the best settings we have found so far and the results we got with them, and would really appreciate it if anyone who has used GraphX before has advice on what else could make it even better, or can confirm that these results are as good as it gets.
>
> The algorithms we used are PageRank and connected components. We used the Twitter and UK graphs from the GraphX paper (https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf), and also generated graphs with properties similar to the Facebook social graph, with varying numbers of edges. Apart from performance, we tried to determine the minimum amount of resources needed to handle a graph of a given size.
>
> We ran the experiments using Spark 1.6.1, on machines which have 20 cores with 2-way SMT, always fixing the number of executors (min = max = initial), giving 40GB or 80GB per executor, and making sure we run only a single executor per machine.
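A minimal sketch of the executor sizing described above, as it might be expressed in a driver program. The property names are standard Spark 1.6 settings; the master URL, app name, and the choice of 4 executors with 80GB are placeholders taken from one of the data points below, and pinning the count via `spark.executor.instances` with dynamic allocation disabled is my reading of "min = max = initial", not something the thread states:

```scala
// Sketch only: assumes a YARN/standalone cluster; values are illustrative.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("graphx-benchmark")
  // Fix the executor count (min = max = initial): disable dynamic
  // allocation and pin the number of instances instead.
  .set("spark.dynamicAllocation.enabled", "false")
  .set("spark.executor.instances", "4")
  .set("spark.executor.memory", "80g")
  // One executor per 20-core machine, using all of its cores.
  .set("spark.executor.cores", "20")

val sc = new SparkContext(conf)
```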
*******Deepak******* I guess you had 16 machines in your test. Is that right? *******Deepak*******

> Additionally we used:
> spark.shuffle.manager=hash, spark.shuffle.service.enabled=false
> Parallel GC
> PartitionStrategy.EdgePartition2D
> 8*numberOfExecutors partitions
>
> Here are some data points which we got:
> Running on a Facebook-like graph with 2 billion edges, using 4 executors with 80GB each, it took 451 seconds to do 20 iterations of PageRank and 236 seconds to find connected components. It failed when we tried to use 2 executors, or 4 executors with 40GB each.
> For a graph with 10 billion edges we needed 16 executors with 80GB each (it failed with 8): 1041 seconds for 20 iterations of PageRank and 716 seconds for connected components.

*******Deepak******* The executors are not scaling linearly; you should need at most 10 executors. Also, what error does it show with 8 executors? *******Deepak*******

> Twitter-2010 graph (1.5 billion edges), 8 executors with 40GB each: PageRank 473s, connected components 264s. With 4 executors with 80GB each it worked but was struggling (PR 2475s, CC 4499s); with 8 executors with 80GB each, PR 362s, CC 255s.

*******Deepak******* For 4 executors, can you try with 160GB? Also, if you could share the system statistics during the test, that would be great. My guess is that with 4 executors a lot of spilling is happening. *******Deepak*******

> One more thing: we were not able to reproduce what's mentioned in the paper about fault tolerance (section 5.2). If we kill an executor during the first few iterations it recovers successfully, but if one is killed in later iterations, reconstruction of each iteration starts taking exponentially longer and doesn't finish even after letting it run for a few hours. Are there some additional parameters which we need to set in order for this to work?
>
> Any feedback would be highly appreciated!
>
> Thank you,
> Maja
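For reference, the settings quoted above (hash shuffle, no external shuffle service, EdgePartition2D, 8 partitions per executor, 20 PageRank iterations) can be sketched as a single GraphX run. This is a hedged sketch, not the benchmark code itself: the input path and executor count are placeholders, and mapping "Parallel GC" to the JVM's `-XX:+UseParallelGC` flag is my assumption:

```scala
// Sketch of the benchmark configuration described in the thread.
// Requires a running Spark 1.6 cluster; paths and counts are placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

val conf = new SparkConf()
  .setAppName("graphx-pagerank-cc")
  .set("spark.shuffle.manager", "hash")
  .set("spark.shuffle.service.enabled", "false")
  // "Parallel GC" presumably means the JVM parallel collector (assumption):
  .set("spark.executor.extraJavaOptions", "-XX:+UseParallelGC")
val sc = new SparkContext(conf)

val numExecutors = 4                     // placeholder
val numPartitions = 8 * numExecutors     // the 8*numberOfExecutors rule above

val graph = GraphLoader
  .edgeListFile(sc, "hdfs:///path/to/edges", numEdgePartitions = numPartitions)
  .partitionBy(PartitionStrategy.EdgePartition2D)

// 20 fixed PageRank iterations, as in the timings quoted above.
val ranks = graph.staticPageRank(numIter = 20).vertices
val components = graph.connectedComponents().vertices
```

`staticPageRank` runs a fixed number of iterations (matching "20 iterations of PageRank"), as opposed to `pageRank(tol)`, which runs until convergence.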