I am running on YARN and have a question about how Spark runs executors on
different data nodes. Is that primarily decided based on the number of
receivers?
What do I need to do to ensure that multiple nodes are being used for data
processing?
Hi Mohit,
It depends on whether dynamic allocation is turned on. If it is not, the
number of executors is fixed for the lifetime of the application and is
specified by the user with the --num-executors option.
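For example, a submission that statically requests four executors might
look like this (illustrative only; the class name, jar, and resource sizes
are placeholders to adapt to your application and cluster):

  spark-submit \
    --master yarn-client \
    --num-executors 4 \
    --executor-cores 2 \
    --executor-memory 4g \
    --class com.example.MyApp \
    myapp.jar

YARN then places those executor containers on NodeManagers across the
cluster, subject to available resources.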
If dynamic allocation is turned on, refer to the doc for details:
https://spark.apache.org/docs/1.4.0/job-scheduling.html#dynamic-resource-allocation
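If you do enable dynamic allocation, a sketch of the relevant settings
(the min/max values below are placeholders; note that dynamic allocation
also requires the external shuffle service to be set up on the YARN
NodeManagers):

  spark-submit \
    --master yarn-client \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=1 \
    --conf spark.dynamicAllocation.maxExecutors=10 \
    --conf spark.shuffle.service.enabled=true \
    --class com.example.MyApp \
    myapp.jar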