I just checked the YARN config, and it looks like I need to change this value. Should it be raised to 48G (the max memory allocated to YARN) per node?

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6144</value>
  <source>java.io.BufferedInputStream@2e7e1ee</source>
</property>

On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta <[email protected]> wrote:

> Andrew,
>
> Thanks for your response.
>
> When I try to do the following.
>
> ./spark-shell --executor-memory 46g --master yarn
>
> I get the following error.
>
> Exception in thread "main" java.lang.Exception: When running with master
> 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
> environment.
>     at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>     at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> After this I set the following env variable.
>
> export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
>
> The program launches but then halts with the following error.
>
> *14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104 MB),
> is above the max threshold (6144 MB) of this cluster.*
>
> I guess this is some YARN setting that is not set correctly.
>
> Thanks
> -Soumya
>
> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <[email protected]> wrote:
>
>> Hi Soumya,
>>
>> The driver's console output prints out how much memory is actually
>> granted to each executor, so from there you can verify how much memory the
>> executors are actually getting. You should use the '--executor-memory'
>> argument in spark-shell. For instance, assuming each node has 48G of memory,
>>
>> bin/spark-shell --executor-memory 46g --master yarn
>>
>> We leave a small cushion for the OS so we don't take up all of the entire
>> system's memory. This option also applies to the standalone mode you've
>> been using, but if you have been using the ec2 scripts, we set
>> "spark.executor.memory" in conf/spark-defaults.conf for you automatically
>> so you don't have to specify it each time on the command line. Of course,
>> you can also do the same in YARN.
>>
>> -Andrew
>>
>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <[email protected]>:
>>
>>> I've been using the standalone cluster all this time and it worked fine.
>>> Recently I'm using another Spark cluster that is based on YARN and I've
>>> not experience with YARN.
>>>
>>> The YARN cluster has 10 nodes and a total memory of 480G.
>>>
>>> I'm having trouble starting the spark-shell with enough memory.
>>> I'm doing a very simple operation - reading a file 100GB from HDFS and
>>> running a count on it. This fails due to out of memory on the executors.
>>>
>>> Can someone point to the command line parameters that I should use for
>>> spark-shell so that it?
>>>
>>> Thanks
>>> -Soumya
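
For reference, here is a minimal Python sketch of the arithmetic behind the "Required executor memory (47104 MB)" error quoted in this thread. It assumes the YARN client simply compares the requested executor memory, converted to MB, against yarn.scheduler.maximum-allocation-mb. The helper names to_mb and fits are hypothetical (they are not part of Spark or YARN), and any per-container overhead that Spark may request on top of the executor memory is ignored.

```python
import re


def to_mb(size):
    """Convert a size string such as '46g' or '6144m' to megabytes.

    Hypothetical helper for illustration; only 'm' and 'g' suffixes are handled.
    """
    match = re.fullmatch(r"(\d+)([mg])", size.strip().lower())
    if match is None:
        raise ValueError("unsupported size: %r" % size)
    value, unit = int(match.group(1)), match.group(2)
    return value * 1024 if unit == "g" else value


def fits(executor_memory, max_allocation_mb):
    """Return True if the request is within the cluster's max allocation.

    Assumption: a plain comparison, with no extra overhead added.
    """
    return to_mb(executor_memory) <= max_allocation_mb


# '46g' is the value passed to --executor-memory in this thread.
print(to_mb("46g"))            # 46 * 1024 = 47104 MB, matching the error message
print(fits("46g", 6144))       # current yarn.scheduler.maximum-allocation-mb
print(fits("46g", 48 * 1024))  # a 48G limit, i.e. 49152 MB
```

Under these assumptions, a 46g request does not fit the current 6144 MB limit but would fit a limit of 49152 MB (48G).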
