I'm assuming you're running Spark 0.9.x, because in the latest version of
Spark you shouldn't have to add HADOOP_CONF_DIR to the Java classpath
manually. I tested this out on my own YARN cluster and was able to confirm this.

In Spark 1.0, SPARK_MEM is deprecated and should not be used. Instead, you
should set the per-executor memory through spark.executor.memory, which has
the same effect but takes higher priority. By YARN_WORKER_MEM, do you mean
SPARK_EXECUTOR_MEMORY? It also does the same thing. In Spark 1.0, the
priority hierarchy is as follows:

spark.executor.memory (set through spark-defaults.conf) >

In Spark 0.9, the hierarchy is very similar:

spark.executor.memory (set through SPARK_JAVA_OPTS in spark-env) > SPARK_MEM
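As a rough sketch of the 0.9 case, this is what the two settings might look
like in conf/spark-env.sh. The 1g and 2g values are placeholders I picked for
illustration, not recommendations; with both set, the hierarchy above means
executors should get the spark.executor.memory value.

```shell
# conf/spark-env.sh fragment for Spark 0.9.x (memory values are placeholders)

# Lower priority: the older SPARK_MEM setting.
export SPARK_MEM=1g

# Higher priority: spark.executor.memory passed as a Java system property.
# Any options already present in SPARK_JAVA_OPTS are preserved.
export SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:-} -Dspark.executor.memory=2g"
```

If you only want one knob, setting just the system property and leaving
SPARK_MEM unset avoids any ambiguity about which value wins.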

For more information:


> I was actually able to get this to work.  I was NOT setting the classpath
> properly originally.
> Simply running
> java -cp /etc/hadoop/conf/:<yarn, hadoop jars> com.domain.JobClass
> and setting yarn-client as the spark master worked for me.  Originally I
> had not put the configuration on the classpath. Also, I used
> $SPARK_HOME/bin/compute_classpath.sh now to get all of the relevant
> jars.  The job properly connects to the AM at the correct port.
> Is there any intuition on how Spark executors map to YARN workers or how
> the different memory settings interplay, SPARK_MEM vs YARN_WORKER_MEM?
>> Your settings seem reasonable; as long as YARN_CONF_DIR or
>> HADOOP_CONF_DIR is properly set, the application should be able to find the
>> correct RM port. Have you tried running the examples in yarn-client mode,
>> and your custom application in yarn-standalone (now yarn-cluster) mode?
>>> A few more details I would like to provide (sorry, I should have provided
>>> them with the previous post):
>>>  - Spark Version = 0.9.1 (using pre-built spark-0.9.1-bin-hadoop2)
>>>  - Hadoop Version = 2.4.0 (Hortonworks)
>>>  - I am trying to execute a Spark Streaming program
>>> Because I am using Hortonworks Hadoop (HDP), YARN is configured with
>>> different port numbers than Apache's default configuration. For example,
>>> *resourcemanager.address* is <IP>:8050 in HDP, whereas it defaults to
>>> <IP>:8032.
>>> When I run the Spark examples using bin/run-example, I can see in the
>>> console logs that it is connecting to the right port configured by HDP,
>>> i.e., 8050. Please refer to the console log below:
>>> But when I run my own custom Spark Streaming code, it tries to connect
>>> to port 8032 instead and hence is unable to connect. Refer to the log
>>> below:
>>> Do I need to specify the YARN ports configured by HDP to Spark somehow?
>>> How can the example jobs detect the correct YARN ports?
