Hey all, I'm working through the basic SparkPi example on a YARN cluster, and I'm wondering why my containers don't pick up the Spark assembly classes.
I built the latest Spark code against CDH5.0.0 and then ran the following:

  SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
  ./bin/spark-class org.apache.spark.deploy.yarn.Client \
    --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
    --class org.apache.spark.examples.SparkPi \
    --args yarn-standalone \
    --num-workers 3 \
    --master-memory 4g \
    --worker-memory 2g \
    --worker-cores 1

The job dies, and in the stderr from the containers I see:

  Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ApplicationMaster
  Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ApplicationMaster
          at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

My yarn-site.xml contains the following classpath:

  <property>
    <name>yarn.application.classpath</name>
    <value>
      /etc/hadoop/conf/,
      /usr/lib/hadoop/*,/usr/lib/hadoop//lib/*,
      /usr/lib/hadoop-hdfs/*,/user/lib/hadoop-hdfs/lib/*,
      /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
      /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
      /usr/lib/avro/*
    </value>
  </property>

I've confirmed that the spark-assembly JAR contains this class. Does the assembly actually need to be defined in yarn.application.classpath, or should the Spark client take care of shipping the necessary JARs during job submission?

Any tips would be greatly appreciated!

Cheers,
Jon
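
P.S. For reference, here's roughly how I confirmed the class is present in the assembly (same path as in SPARK_JAR above):

  # should print org/apache/spark/deploy/yarn/ApplicationMaster.class if the class made it into the jar
  jar tf ./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar | grep ApplicationMaster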
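
And if the assembly does need to be on yarn.application.classpath, I'm guessing the change would look something like the following on every node, with /opt/spark standing in for wherever the assembly gets copied (just my assumption of what's expected, not something I've verified):

  <property>
    <!-- hypothetical: assumes the assembly has been copied to /opt/spark on each NodeManager -->
    <name>yarn.application.classpath</name>
    <value>
      /etc/hadoop/conf/,
      ...(existing entries as above)...,
      /opt/spark/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
    </value>
  </property>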