I am just using the above example to understand how Spark handles partitions.
Why don't you just repartition the dataset? If the partitions are really that
unevenly sized, you should probably do that first. That potentially also
saves a lot of trouble later on.
On Thu, Nov 7, 2019 at 5:14 PM V0lleyBallJunki3 wrote:
> Consider an example where I have a cluster with 5 nodes and each node has 64
> cores with 244 GB memory. I decide to run 3 executors on each node and set
> executor-cores to 21 and executor memory of 80 GB, so that each executor can
> execute 21 tasks in parallel. Now consider that 315 (63 * 5) partitions
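The sizing arithmetic in the quoted setup can be checked with a short script (a sketch; the node, executor, and partition counts are taken from the message above):

```python
# Cluster sizing from the quoted example.
nodes = 5
executors_per_node = 3
cores_per_executor = 21

total_executors = nodes * executors_per_node           # 5 * 3 = 15 executors
parallel_tasks = total_executors * cores_per_executor  # 15 * 21 = 315 task slots

# With 315 partitions, every partition gets a task slot at once,
# so the whole dataset is processed in a single wave of tasks.
partitions = 315
waves = -(-partitions // parallel_tasks)  # ceiling division

print(parallel_tasks, waves)
```

So the chosen numbers line up exactly: 315 partitions against 315 concurrent task slots means one scheduling wave, provided the partitions are evenly sized.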