Hi, I am trying to compare GraphX with other distributed graph processing systems (GraphLab) on my cluster of 64 nodes, each node having 32 cores and connected with InfiniBand.
I looked at http://arxiv.org/pdf/1402.2394.pdf and the stats provided there. I have a few questions about configuration and achieving the best performance:

* Should I use the PageRank application already available in GraphX for this purpose, or do I need to modify it or write my own?
  - If I shouldn't use the built-in PageRank, can you share your PageRank application?
* What should executor_memory be set to: the maximum available, or sized according to the graph?
* Other than the number of cores, executor_memory, and the partition strategy, is there any other configuration I should tune for the best performance?

I am using the following script:

    import org.apache.spark._
    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    val startgraphloading = System.currentTimeMillis
    val graph = GraphLoader.edgeListFile(sc, "filepath", true, 32)
    val endgraphloading = System.currentTimeMillis

Thanks in advance :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-Perfomance-comparison-over-cluster-tp10222.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
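For what it's worth, here is a minimal sketch of how the built-in GraphX PageRank could be timed on the loaded graph, assuming a spark-shell session where sc is already defined; "filepath" and the iteration count (10) are placeholders, not values from the benchmark:

```scala
import org.apache.spark.graphx._

// Load the edge list with canonical orientation and 32 edge partitions,
// then cache it so load time and compute time are measured separately.
val graph = GraphLoader.edgeListFile(sc, "filepath", true, 32).cache()
graph.edges.count()  // force materialization before starting the timer

val start = System.currentTimeMillis
// staticPageRank runs the built-in PageRank for a fixed number of iterations
val ranks = graph.staticPageRank(10).vertices
ranks.count()  // force evaluation so the timer covers the actual computation
val end = System.currentTimeMillis
println(s"PageRank took ${end - start} ms")
```

Forcing an action (count) before and after the timed region matters because Spark is lazy; without it the timer would only measure the time to build the lineage, not to run the job.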