Instead of spark-shell, have you tried running it as a job? How many executors and cores are you using? Can you share the RDD graph and the event timeline from the UI? Did you find which tasks are taking more time, and is there any GC (garbage collection) overhead on them?
Please look at the UI if you haven't already; it can provide a lot of information.