Re: Can reduced parallelism lead to no shuffle spill?

2019-11-07 Thread V0lleyBallJunki3
I am just using the above example to understand how Spark handles partitions.

Re: Can reduced parallelism lead to no shuffle spill?

2019-11-07 Thread Alexander Czech
Why don't you just repartition the dataset? If the partitions are really that unevenly sized, you should probably do that first. That potentially also saves a lot of trouble later on. On Thu, Nov 7, 2019 at 5:14 PM V0lleyBallJunki3 wrote: > Consider an example where I have a cluster with 5 nodes and each
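As a toy illustration of the repartitioning suggestion (plain Python, not Spark; the key names and bucket count are made up for the sketch), round-robin assignment, which is what `DataFrame.repartition(n)` without columns does, spreads a skewed distribution across evenly sized buckets:

```python
from collections import Counter

# Skewed input: one hot key dominates the data set.
records = ["key-%d" % (i % 7) for i in range(1000)] + ["hot-key"] * 5000

# Original partitioning by key: all hot-key records land in one bucket.
skewed = Counter("hot" if r == "hot-key" else "rest" for r in records)

# Simulated repartition(8): round-robin over 8 buckets.
buckets = Counter(i % 8 for i, _ in enumerate(records))

print(max(skewed.values()), max(buckets.values()))  # 5000 750
```

With 6000 records, the largest bucket shrinks from 5000 to 750, so no single task carries most of the data, which is exactly the situation where shuffle spill on one executor would otherwise occur.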

Can reduced parallelism lead to no shuffle spill?

2019-11-07 Thread V0lleyBallJunki3
Consider an example where I have a cluster with 5 nodes, and each node has 64 cores with 244 GB memory. I decide to run 3 executors on each node and set executor-cores to 21 and executor-memory to 80 GB, so that each executor can execute 21 tasks in parallel. Now consider that 315 (63 * 5) partitions
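The sizing arithmetic in that setup can be checked in a few lines of Python (a sketch; the numbers come straight from the message):

```python
# Cluster layout described in the post.
nodes = 5
executors_per_node = 3
cores_per_executor = 21   # tasks each executor can run in parallel

total_executors = nodes * executors_per_node              # 15 executors
concurrent_tasks = total_executors * cores_per_executor   # 315 task slots

partitions = 63 * 5  # the 315 partitions mentioned above

# Ceiling division: how many scheduling "waves" are needed.
waves = -(-partitions // concurrent_tasks)

print(total_executors, concurrent_tasks, waves)  # 15 315 1
```

With 315 partitions and 315 task slots, every partition runs in a single wave, so each task gets roughly 80 GB / 21 of executor memory, which is the figure that matters when asking whether a task's shuffle data fits in memory or spills.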