> I am running a Spark job with 20 cores, but I do not understand why my application gets 1-2 cores on each of several machines. Why does it not just run on two nodes, e.g. node1 = 16 cores and node2 = 4 cores? Instead, cores are allocated like node1 = 2, node2 = 1, ..., node14 = 1.
I believe that's the intended behavior for Spark. Please refer to
https://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
and the section on 'spark.deploy.spreadOut'. If I understand correctly, you may
want "spark.deploy.spreadOut false". Hope it helps! Happy Spark(ing).

On Thu, Jul 25, 2019 at 7:22 PM Srikanth Sriram <sriramsrikanth1...@gmail.com> wrote:

> Hello,
>
> Below is my understanding.
>
> These default configuration parameters are used by a Spark job when they
> are not set to the required values at the time the job is submitted:
>
> # - SPARK_EXECUTOR_INSTANCES, number of workers to start (default: 2)
> # - SPARK_EXECUTOR_CORES, number of cores for the workers (default: 1)
> # - SPARK_EXECUTOR_MEMORY, memory per worker, e.g. 1000M, 2G (default: 1G)
>
> SPARK_EXECUTOR_INSTANCES indicates the number of workers to be started;
> it is the maximum number of executors a job can ask for / take from the
> cluster resource manager.
>
> SPARK_EXECUTOR_CORES indicates the number of cores in each executor; the
> Spark TaskScheduler will ask for this many cores to be allocated/blocked
> on each executor machine.
>
> SPARK_EXECUTOR_MEMORY indicates the maximum amount of RAM/memory required
> in each executor.
>
> All these details are requested by the TaskScheduler from the cluster
> manager (Spark standalone, YARN, Mesos, or Kubernetes, which is supported
> starting from Spark 2.3) before the job execution actually starts.
>
> Also, please note that the initial number of executor instances depends on
> "--num-executors", but when there is more data to be processed and
> "spark.dynamicAllocation.enabled" is set to true, more executors are added
> dynamically based on "spark.dynamicAllocation.initialExecutors".
>
> Note: "spark.dynamicAllocation.initialExecutors" should always be
> configured greater than "--num-executors".
> spark.dynamicAllocation.initialExecutors  (default: spark.dynamicAllocation.minExecutors)
>     Initial number of executors to run if dynamic allocation is enabled.
>     If `--num-executors` (or `spark.executor.instances`) is set and larger
>     than this value, it will be used as the initial number of executors.
>
> spark.executor.memory  (default: 1g)
>     Amount of memory to use per executor process, in the same format as
>     JVM memory strings with a size unit suffix ("k", "m", "g" or "t")
>     (e.g. 512m, 2g).
>
> spark.executor.cores  (default: 1 in YARN mode; all the available cores on
>     the worker in standalone and Mesos coarse-grained modes)
>     The number of cores to use on each executor. In standalone and Mesos
>     coarse-grained modes, for more detail see
>     <http://spark.apache.org/docs/latest/spark-standalone.html#Executors%20Scheduling>.
>
> On Thu, Jul 25, 2019 at 5:54 PM Amit Sharma <resolve...@gmail.com> wrote:
>
>> I have a cluster with 26 nodes, each having 16 cores. I am running a
>> Spark job with 20 cores, but I do not understand why my application gets
>> 1-2 cores on each of several machines. Why does it not just run on two
>> nodes, e.g. node1 = 16 cores and node2 = 4 cores? Instead, cores are
>> allocated like node1 = 2, node2 = 1, ..., node14 = 1. Is there any conf
>> property I need to change? I know that with dynamic allocation I can use
>> the setting below, but without dynamic allocation is there anything?
>>
>> --conf "spark.dynamicAllocation.maxExecutors=2"
>>
>> Thanks
>> Amit
>
>
> --
> Regards,
> Srikanth Sriram
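Putting the thread's suggestions together, here is a minimal sketch of how the two knobs might be combined on a standalone cluster. This is not from the thread itself: the master hostname, memory size, application class, and jar name are placeholders, and the exact core split (two 10-core executors) assumes 16-core workers like Amit's.

```shell
# 1. On the standalone master, disable spread-out scheduling so the master
#    packs an application's cores onto as few workers as possible.
#    Set this in conf/spark-env.sh, then restart the master to apply it:
export SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"
./sbin/stop-master.sh && ./sbin/start-master.sh

# 2. Without dynamic allocation, cap the application's total cores and set
#    the cores per executor, so 20 cores become two 10-core executors
#    instead of one 1-core executor on each of 14+ workers:
./bin/spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 20 \
  --executor-cores 10 \
  --executor-memory 4g \
  --class com.example.MyApp \
  myapp.jar
```

Without `--executor-cores`, standalone mode gives an executor all available cores on a worker, so setting it explicitly alongside `--total-executor-cores` is what pins down how many executors the master creates.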