This is likely due to data skew. If you are using key-value pairs, one key probably has far more records than the other keys, so the task that processes that key takes much longer than the rest. Do you have any groupBy operations?
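A quick way to confirm this is to count records per key before the groupBy; the dominant key usually stands out immediately. Here is a minimal pure-Python sketch of that check (the data and the 2x-mean threshold are illustrative assumptions; in Spark the per-key count would be something like `rdd.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)`):

```python
from collections import Counter

# Hypothetical key-value records; in Spark this would be an RDD of (key, value) pairs.
records = (
    [("user_1", i) for i in range(1000)]  # one heavily skewed key
    + [("user_2", i) for i in range(10)]
    + [("user_3", i) for i in range(12)]
)

# Count records per key (the reduceByKey equivalent).
counts = Counter(k for k, _ in records)

# Flag keys holding far more than their fair share of records.
# The 2x-mean cutoff is an arbitrary illustrative threshold.
mean = len(records) / len(counts)
skewed = {k: n for k, n in counts.items() if n > 2 * mean}

print(skewed)  # -> {'user_1': 1000}
```

If one key dominates like this, the single long-running task you see is the partition that ends up holding that key.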
David

On Tue, Jul 14, 2015 at 9:43 AM, shahid <sha...@trialx.com> wrote:
> hi
>
> I have a 10 node cluster. I loaded the data onto HDFS, so the number of
> partitions I get is 9. I am running a Spark application, and it gets stuck
> on one of the tasks. Looking at the UI, it seems the application is not
> using all nodes for the calculations. Attached is a screenshot of the
> tasks; it seems tasks are placed on some nodes more than once. Eight of
> the tasks complete within 7-8 minutes, while one task takes around 30
> minutes, causing the delay in the results.
>
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n23824/Screen_Shot_2015-07-13_at_9.png
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/No-of-Task-vs-No-of-Executors-tp23824.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.