Hi, I have found a major puzzle with my Spark cluster, but I don't know why it happens. First, I was benchmarking Spark by running small applications. Counting 10 million strings/items (2 GB) took about 20 seconds on a cluster of 8 nodes (8 cores per node). As we know, that is very poor performance for a parallel processing framework. Today I ran the same counting job from spark-shell on a much larger dataset, 100 million strings/items (20 GB), and it completed all tasks in just 10 seconds. That is 10x the data in half the wall-clock time, so roughly 20x the per-item throughput of the earlier run. This performance is very good and promising. Do you think the difference comes from my Spark settings?
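For reference, here is a minimal sketch of the kind of timed count being described, run from spark-shell (where `sc` is the pre-created SparkContext). The input path is a hypothetical placeholder, and the timing approach is an assumption about how the measurement was done:

```scala
// Sketch only: assumes a running cluster and spark-shell session.
// The HDFS path below is hypothetical.
val lines = sc.textFile("hdfs:///data/strings.txt")

// Note: the first action on an RDD also pays job-scheduling and
// input-read costs, so timing a single count() measures the whole
// job, not just the counting work.
val t0 = System.nanoTime
val n  = lines.count()
val elapsedSec = (System.nanoTime - t0) / 1e9
println(s"counted $n items in $elapsedSec s")
```

One thing worth checking with a sketch like this is whether the two runs differed in partitioning: `sc.textFile` splits input by HDFS block, so a 20 GB file produces many more partitions (and thus better use of 64 cores) than a 2 GB file, which could explain much of the throughput gap independent of any configuration setting.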
Joe

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/help-tp4648.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.