Hey all,

I'm working through the basic SparkPi example on a YARN cluster, and I'm
wondering why my containers don't pick up the Spark assembly classes.

I built the latest Spark code against CDH 5.0.0.

Then I ran the following:
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      ./bin/spark-class org.apache.spark.deploy.yarn.Client \
      --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      --class org.apache.spark.examples.SparkPi \
      --args yarn-standalone \
      --num-workers 3 \
      --master-memory 4g \
      --worker-memory 2g \
      --worker-cores 1
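
Since SPARK_JAR is a relative path, one thing I can check is that it resolves
from the directory I launch from; something along these lines (just a local
shell check, nothing Spark-specific):

      # Confirm the assembly actually exists where SPARK_JAR points
      # before invoking spark-class from this directory.
      SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
      [ -f "$SPARK_JAR" ] && echo "assembly found: $SPARK_JAR" || echo "assembly missing" >&2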

The job dies, and in the stderr from the containers I see:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ApplicationMaster
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ApplicationMaster
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

My yarn-site.xml contains the following classpath:
  <property>
    <name>yarn.application.classpath</name>
    <value>
    /etc/hadoop/conf/,
    /usr/lib/hadoop/*,/usr/lib/hadoop//lib/*,
    /usr/lib/hadoop-hdfs/*,/user/lib/hadoop-hdfs/lib/*,
    /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
    /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
    /usr/lib/avro/*
    </value>
  </property>
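
For what it's worth, a quick check I can run on a NodeManager host is just
listing those directories to make sure they exist (paths taken from the
property above, nothing Spark-specific):

      # Run on a NodeManager host: list the directories named in
      # yarn.application.classpath to confirm they are present.
      ls -d /etc/hadoop/conf /usr/lib/hadoop /usr/lib/hadoop-hdfs \
            /usr/lib/hadoop-mapreduce /usr/lib/hadoop-yarn /usr/lib/avro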

I've confirmed that the spark-assembly JAR contains this class. Does it
actually need to be listed in yarn.application.classpath, or should the
Spark client take care of adding the necessary JARs during job submission?
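
(For reference, the check was roughly along these lines; jar here is just the
standard JDK tool:)

      # List the assembly contents and look for the ApplicationMaster class.
      jar tf assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
        | grep ApplicationMaster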

Any tips would be greatly appreciated!
Cheers,
Jon
