Hi,

Try using this parameter: --conf spark.sql.shuffle.partitions=1000
(it controls the number of partitions, and therefore tasks, that Spark SQL
uses when shuffling data for joins and aggregations).
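
For example, a minimal sketch assuming a spark-submit launch on YARN (the
jar name below is just a placeholder for your application):

    spark-submit \
      --conf spark.sql.shuffle.partitions=1000 \
      --num-executors 50 \
      your-app.jar

You can also set it from inside the job before running the join, using the
Spark 1.6 SQLContext API:

    // number of partitions (and therefore tasks) used for SQL shuffles such as joins
    sqlContext.setConf("spark.sql.shuffle.partitions", "1000")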

Thanks,
Mohini

On Tue, Mar 14, 2017 at 3:30 PM, kpeng1 <kpe...@gmail.com> wrote:

> Hi All,
>
> I am currently on Spark 1.6 and I was doing a SQL join on two tables that
> are over 100 million rows each, and I noticed that it was spawning 30,000+
> tasks (this is what the progress meter shows).  We tried coalesce,
> repartition, and shuffle partitions to bring the number of tasks down,
> because we were getting timeouts due to the number of tasks being spawned,
> but those operations did not seem to reduce the number of tasks.  The
> solution we came up with was actually to set the number of executors to 50
> (--num-executors=50), and it looks like that spawned 200 active tasks, but
> the total number of tasks remained the same.  I was wondering if anyone
> knows what is going on.  Is there an optimal number of executors?  I was
> under the impression that the default dynamic allocation would pick the
> optimal number of executors for us and that this situation wouldn't
> happen.  Is there something I am missing?
>
>
>


-- 
Thanks & Regards,
Mohini Kalamkar
M: +1 310 567 9329
