Hello,

I'm experiencing an issue where YARN schedules two executors (the default) 
regardless of the value I pass for --num-executors when submitting an application.

Background: I'm running Spark on YARN on Amazon EMR. My cluster has two core 
nodes and three task nodes. All five nodes are visible to YARN (running the 
application repeatedly, I've seen all five nodes used across attempts). The 
applications are written in Python and are run via spark-submit with 
yarn-client as the master.

Example application submission:

  bin/spark-submit --num-executors 1 --conf "spark.scheduler.minRegisteredResourcesRatio=1" myPythonApp.py

(master is set to yarn-client within the application)
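
For reference, the setup at the top of the app is essentially the following 
(simplified, with the actual job logic omitted; the app name is just a 
placeholder):

  from pyspark import SparkConf, SparkContext

  # master is set here in the app rather than on the spark-submit command line
  conf = SparkConf().setAppName("myPythonApp").setMaster("yarn-client")
  sc = SparkContext(conf=conf)

  # ... job logic ...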

My goal is naturally to make use of all of the available nodes, but I've also 
tried decreasing the number of executors to one, and YARN still schedules two. 
That tells me it's not a case of hitting some sort of limit, or of my 
application not needing more than two executors. It therefore appears that 
either the num-executors value is being overridden somewhere, or it simply 
isn't reaching YARN in the first place.
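
In case it helps narrow things down, my next step is to print the 
configuration the SparkContext actually ends up with, to see whether the 
setting is making it through at all. A rough sketch of that check (again the 
app name is a placeholder, and I'm assuming --num-executors surfaces as 
spark.executor.instances in the effective conf):

  from pyspark import SparkConf, SparkContext

  conf = SparkConf().setAppName("confCheck").setMaster("yarn-client")
  sc = SparkContext(conf=conf)

  # if --num-executors was passed through, it should show up here
  print(sc.getConf().get("spark.executor.instances", "not set"))
  sc.stop()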

Is this a known issue or am I screwing up somewhere? Any suggestions would be 
greatly appreciated. Thanks!
