Hello shreesh,

That is quite a challenging thing to estimate.
A few things that I think should help estimate those numbers:
1) Understanding the cost of the individual transformations in the
application.
E.g. a flatMap can be more expensive in memory than a map, because a single
input record can produce many output records.
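A minimal sketch of that difference, assuming a spark-shell session where
sc is already defined (the data here is made up):

    // map emits exactly one output record per input record.
    val lines = sc.parallelize(Seq("spark is fast", "executors need memory"))
    val lengths = lines.map(_.length)          // 2 records in -> 2 records out

    // flatMap can emit many output records per input record,
    // so the resulting partitions can be much larger in memory.
    val words = lines.flatMap(_.split(" "))    // 2 records in -> 6 records out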

2) The communication patterns can be helpful to understand the cost. The
four types (a short sketch follows the list):

None:
 map, filter
All-to-one:
 reduce
One-to-all:
 broadcast
All-to-all:
 reduceByKey, groupByKey, join
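
The all-to-all and all-to-one patterns are the ones that shuffle data across
executors or pull it back to the driver, which is usually where the memory
and network cost shows up. A minimal sketch of each pattern, again assuming
spark-shell and made-up data:

    // None: no data moves between executors.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val doubled = pairs.mapValues(_ * 2)

    // All-to-all: reduceByKey shuffles records so that all values for the
    // same key land in the same partition.
    val sums = pairs.reduceByKey(_ + _)

    // One-to-all: broadcast ships one read-only value from the driver to
    // every executor.
    val lookup = sc.broadcast(Map("a" -> "alpha", "b" -> "beta"))

    // All-to-one: reduce pulls a single combined result back to the driver.
    val total = pairs.map(_._2).reduce(_ + _)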

3) Understanding the cost is only the beginning. Depending on how much data
you have, the partitions need to be sized accordingly. Many small partitions
improve parallelism, but you will need more executors (or cores) to actually
run the extra tasks in parallel. On the other hand, fewer, larger partitions
can get by with a lower executor count, but each executor will need more
memory.
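
A minimal sketch of how you would control the partition count while you
experiment (the path and the numbers are placeholders, not recommendations):

    // Ask for roughly 200 input partitions when reading.
    val raw = sc.textFile("hdfs:///path/to/input", 200)

    // More, smaller partitions before a memory-heavy stage ...
    val widened = raw.repartition(400)

    // ... or fewer, larger ones (coalesce avoids a full shuffle) before
    // writing a small result.
    val narrowed = widened.coalesce(50)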

To begin with, I would work out a strategy for the partitions, pick a
starting number of partitions, and work from there.
Without seeing or understanding your use case, it is hard for me to give
you specific numbers. It would be better to start with a basic strategy and
optimize from there.
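
For what it is worth, a starting configuration often looks something like
the sketch below; the application name and all the numbers are placeholders
to iterate on, not a recommendation for your particular job:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder numbers: start modestly and adjust based on the Spark UI.
    val conf = new SparkConf()
      .setAppName("MyJob")                     // hypothetical application name
      .set("spark.executor.instances", "10")   // how many executors to ask for
      .set("spark.executor.cores", "4")        // concurrent tasks per executor
      .set("spark.executor.memory", "8g")      // heap per executor; raise it if tasks spill
    val sc = new SparkContext(conf)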

Hope that helps.

Thank you.




