Hi,

Try using this parameter: --conf spark.sql.shuffle.partitions=1000
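For example (a rough sketch, not from the original thread; the table and column names are made up), the same setting can also be applied on the SQLContext before running the join, which in Spark 1.6 has the same effect as passing it on the spark-submit command line:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("large-join"))
val sqlContext = new SQLContext(sc)

// spark.sql.shuffle.partitions controls how many reduce-side tasks each
// SQL/DataFrame shuffle (joins, aggregations) produces; the default is 200.
// Same effect as --conf spark.sql.shuffle.partitions=1000 on spark-submit.
sqlContext.setConf("spark.sql.shuffle.partitions", "1000")

// Hypothetical tables; the join below now shuffles into 1000 partitions.
val joined = sqlContext.sql(
  "SELECT a.*, b.* FROM table_a a JOIN table_b b ON a.id = b.id")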
Thanks,
Mohini

On Tue, Mar 14, 2017 at 3:30 PM, kpeng1 <kpe...@gmail.com> wrote:
> Hi All,
>
> I am currently on Spark 1.6 and I was doing a SQL join on two tables that
> are over 100 million rows each, and I noticed that it was spawning 30000+
> tasks (this is the progress meter that we are seeing show up). We tried
> coalesce, repartition, and shuffle partitions to drop the number of tasks,
> because we were getting timeouts due to the number of tasks being spawned,
> but those operations did not seem to reduce the number of tasks. The
> solution we came up with was actually to set the number of executors to 50
> (--num-executors=50), and it looks like it spawned 200 active tasks, but
> the total number of tasks remained the same. Was wondering if anyone knows
> what is going on? Is there an optimal number of executors? I was under the
> impression that the default dynamic allocation would pick the optimal
> number of executors for us and that this situation wouldn't happen. Is
> there something I am missing?

--
Thanks & Regards,
Mohini Kalamkar
M: +1 310 567 9329
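For reference, a rough sketch of the executor settings mentioned in the quoted question, with illustrative values (the 4 cores per executor is an assumption, not a figure from the thread). Executors control how many tasks can run at once (executors x cores), not how many tasks a stage produces; the total task count comes from the number of partitions, which would explain why --num-executors=50 changed the number of active tasks but not the total. Note that on YARN, explicitly setting --num-executors typically disables dynamic allocation, so the two approaches below are alternatives:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("large-join")
  // Fixed sizing: 50 executors * 4 cores = at most 200 tasks running concurrently.
  // Equivalent to spark-submit --num-executors=50 --executor-cores=4.
  .set("spark.executor.instances", "50")
  .set("spark.executor.cores", "4")
  // Alternative: let YARN scale the executor count instead of fixing it.
  // .set("spark.dynamicAllocation.enabled", "true")
  // .set("spark.shuffle.service.enabled", "true")  // required by dynamic allocation
  // .set("spark.dynamicAllocation.minExecutors", "10")
  // .set("spark.dynamicAllocation.maxExecutors", "100")

val sc = new SparkContext(conf)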