Following is a method that retrieves the list of executors registered to a Spark context. It worked perfectly with spark-submit in standalone mode for my project.
/**
 * A simplified method that just returns the current active/registered executors,
 * excluding the driver.
 * @param sc
 *          The Spark context to retrieve registered executors.
 * @return
 *          A list of executors, each in the form of host:port.
 */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
}
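For reference, a minimal usage sketch (it assumes a SparkContext named sc is already in scope; the println is just illustrative):

// Call the method above and report what it found.
val executors = currentActiveExecutors(sc)
println(s"Registered executors (${executors.size}): " + executors.mkString(", "))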
 


On Friday, August 21, 2015 1:53 PM, Virgil Palanciuc <virg...@gmail.com> wrote:

 Hi Akhil,
I'm using Spark 1.4.1. The number of executors is not in the command line, and not in getExecutorMemoryStatus (I already mentioned that I tried that; it works in spark-shell but not when executed via spark-submit). I tried looking at "defaultParallelism" too: it's 112 (7 executors * 16 cores) when run via spark-shell, but just 2 when run via spark-submit.
But the scheduler obviously knows this information. It *must* know it. How can I access it? Other than parsing the HTML of the WebUI, that is... that's pretty much guaranteed to work, and maybe I'll do that, but it's extremely convoluted.
Regards,
Virgil.
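One alternative to parsing the WebUI HTML is the monitoring REST API that Spark exposes on the driver UI starting with 1.4. A rough sketch follows; it assumes a SparkContext sc in scope, the default localhost:4040 UI address, and leaves JSON parsing to whatever library the project already uses:

import scala.io.Source

// Query the driver's monitoring REST API for the executor list.
// Assumes the driver UI is reachable at the default localhost:4040.
val appId = sc.applicationId
val url = s"http://localhost:4040/api/v1/applications/$appId/executors"
val executorsJson = Source.fromURL(url).mkString   // raw JSON array: one entry per executor, plus the driver
println(executorsJson)                             // parse with the JSON library of your choice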
On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

Which version of Spark are you using? There was a discussion about this here:
http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html
http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccacbyxk+ya1rbbnkwjheekpnbsbh10rykuzt-laqgpdanvhm...@mail.gmail.com%3E

On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vir...@palanciuc.eu> wrote:

Is there any reliable way to find out the number of executors programmatically, regardless of how the job is run? A method that preferably works for spark-standalone, yarn, and mesos, regardless of whether the code runs from the shell or not?
Things that I tried and that don't work:
- sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell, does not work if the task is submitted via spark-submit
- sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't work unless explicitly configured
- call to http://master:8080/json (this used to work, but doesn't anymore?)
I guess I could parse the output HTML from the Spark UI... but that seems dumb. Is there really no better way?
Thanks,
Virgil.
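For completeness, the attempts listed above written out as a small sketch (it assumes a SparkContext named sc; as noted in the thread, none of these behaves consistently across deploy modes):

// Count from the memory-status map, subtracting the driver's own entry.
val fromMemoryStatus = sc.getExecutorMemoryStatus.size - 1            // works in the shell, reportedly not via spark-submit
// Read the configured executor count, if one was set.
val fromConf = sc.getConf.getInt("spark.executor.instances", 1)       // only useful when explicitly configured
// Fetch the standalone master's JSON status page.
val masterJson = scala.io.Source.fromURL("http://master:8080/json").mkString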






  
