Hi,

I am trying to compare GraphX with other distributed graph processing systems
(e.g. GraphLab) on my cluster of 64 nodes, each node having 32 cores and
connected over InfiniBand.

I looked at http://arxiv.org/pdf/1402.2394.pdf and the statistics provided
there. I have a few questions about configuration and achieving the best
performance.

* Should I use the PageRank application already available in GraphX for this
purpose, or do I need to modify it or write my own?
   - If I shouldn't use the built-in PageRank, could you share your PageRank
application?

* What should executor memory (spark.executor.memory) be set to: the maximum
available per node, or sized according to the graph?

* Other than the number of cores, executor memory, and partition strategy, is
there any other configuration I should tune for best performance?

I am using the following script:

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val startGraphLoading = System.currentTimeMillis
val graph = GraphLoader.edgeListFile(sc, "filepath", true, 32)
// edgeListFile is lazy; without an action the timer below would not
// measure the actual load, so force materialization first.
graph.edges.count
val endGraphLoading = System.currentTimeMillis
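To time PageRank itself (rather than only the load), I was planning something
like the following in the same spark-shell session, using the built-in
graph.pageRank operator; the tolerance value 0.0001 is just a guess on my part:

```scala
// Built-in GraphX PageRank (run-until-convergence variant).
// 0.0001 is an arbitrary tolerance I picked, not a recommended value.
val startPageRank = System.currentTimeMillis
val ranks = graph.pageRank(0.0001).vertices
ranks.count  // force full evaluation of the result before stopping the timer
val endPageRank = System.currentTimeMillis

println(s"PageRank took ${endPageRank - startPageRank} ms")
```

Would this be a fair way to measure it, or should the timing be done
differently for comparison with GraphLab?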


Thanks in advance :)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-Perfomance-comparison-over-cluster-tp10222.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
