Adding your application jar to the SparkContext will resolve this issue, e.g.: sparkContext.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")
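A minimal sketch of that fix in a driver program (MyTestApp is a hypothetical example app; the jar path must point at your own build output, here the sbt target from the line above):

import org.apache.spark.{SparkConf, SparkContext}

object MyTestApp {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("MyTestApp"))

    // Ship the application jar to every executor, so that the synthetic
    // classes the Scala compiler generates for closures (e.g.
    // MyTestApp$$anonfun$main$1) can be loaded during task deserialization.
    sc.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")

    val rdd = sc.parallelize(1 to 10, 3)
    rdd.saveAsTextFile("/test/xt/saveRDD") // closure class now resolves on executors
    sc.stop()
  }
}

Note that addJar should be called before the first action whose closure lives in your jar; executors fetch registered jars when they launch tasks.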
Thanks
Best Regards

On Mon, Oct 13, 2014 at 8:42 AM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:

> In the beginning I tried to read HBase and found that an exception was
> thrown, so I started to debug the app. I removed the code that reads HBase
> and tried to save an RDD containing a list, and the exception was still
> thrown. So I'm sure the exception was not caused by reading HBase.
>
> While debugging I did not change the object name or the file name.
>
>
> 2014-10-13 0:00 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Your app is named scala.HBaseApp.
>> Does it read / write to HBase?
>>
>> Just curious.
>>
>> On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao <xiaotao.cs....@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark
>>> Standalone Cluster mode.
>>>
>>> The job is quite simple, as follows:
>>>
>>> object HBaseApp {
>>>   def main(args: Array[String]) {
>>>     testHBase("student", "/test/xt/saveRDD")
>>>   }
>>>
>>>   def testHBase(tableName: String, outFile: String) {
>>>     val sparkConf = new SparkConf()
>>>       .setAppName("-- Test HBase --")
>>>       .set("spark.executor.memory", "2g")
>>>       .set("spark.cores.max", "16")
>>>
>>>     val sparkContext = new SparkContext(sparkConf)
>>>
>>>     val rdd = sparkContext.parallelize(List(1,2,3,4,5,6,7,8,9,10), 3)
>>>
>>>     val c = rdd.count // successful
>>>     println("\n\n\n" + c + "\n\n\n")
>>>
>>>     // This line will throw "java.lang.ClassNotFoundException:
>>>     // com.xt.scala.HBaseApp$$anonfun$testHBase$1"
>>>     rdd.saveAsTextFile(outFile)
>>>
>>>     println("\n down \n")
>>>   }
>>> }
>>>
>>> I submitted this job using the following script:
>>>
>>> #!/bin/bash
>>>
>>> HBASE_CLASSPATH=$(hbase classpath)
>>> APP_JAR=/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar
>>> SPARK_ASSEMBLY_JAR=/usr/games/spark/xt/spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar
>>> SPARK_MASTER=spark://b02.jsepc.com:7077
>>>
>>> CLASSPATH=$CLASSPATH:$APP_JAR:$SPARK_ASSEMBLY_JAR:$HBASE_CLASSPATH
>>> export SPARK_CLASSPATH=/usr/lib/hbase/lib/*
>>>
>>> CONFIG_OPTS="-Dspark.master=$SPARK_MASTER"
>>>
>>> java -cp $CLASSPATH $CONFIG_OPTS com.xt.scala.HBaseApp $@
>>>
>>> After I submitted the job, the count of the RDD was computed
>>> successfully, but the RDD could not be saved into HDFS and the following
>>> exception was thrown:
>>>
>>> 14/10/11 16:09:33 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
>>> java.lang.ClassNotFoundException: com.xt.scala.HBaseApp$$anonfun$testHBase$1
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:270)
>>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>     at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>>     at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>>>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>>>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>>>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
>>>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>>>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>     at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>>
>>> I also noted that if I add "-Dspark.jars=$APP_JAR" to the variable
>>> CONFIG_OPTS, i.e., CONFIG_OPTS="-Dspark.master=$SPARK_MASTER
>>> -Dspark.jars=$APP_JAR", the job finishes successfully and the RDD can be
>>> written into HDFS.
>>>
>>> So, what does "java.lang.ClassNotFoundException:
>>> com.xt.scala.HBaseApp$$anonfun$testHBase$1" mean, and why would it be
>>> thrown?
>>>
>>> Thanks
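To close the loop on that last question: HBaseApp$$anonfun$testHBase$1 is the synthetic class the Scala compiler generates for the anonymous function that saveAsTextFile ships to the executors. The stack trace shows the failure inside ResultTask deserialization: the executor received the serialized closure but could not load its class, because the class exists only in the application jar, which was never distributed to the workers. rdd.count succeeds because, roughly speaking, the function it runs is defined inside the Spark assembly, which is already on every executor's classpath. Any mechanism that puts the application jar on the executors fixes this: sparkContext.addJar (as above), -Dspark.jars, or SparkConf.setJars. A minimal sketch of the setJars variant, reusing the APP_JAR path from the submit script (an assumption; substitute your own jar location):

import org.apache.spark.{SparkConf, SparkContext}

object HBaseApp {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf()
      .setAppName("-- Test HBase --")
      // Equivalent to passing -Dspark.jars=$APP_JAR: Spark serves this jar
      // to every executor, so closure classes such as
      // HBaseApp$$anonfun$testHBase$1 resolve on the worker side.
      .setJars(Seq("/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar"))
      .set("spark.executor.memory", "2g")
      .set("spark.cores.max", "16")

    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List(1,2,3,4,5,6,7,8,9,10), 3)
    rdd.saveAsTextFile("/test/xt/saveRDD") // no longer throws ClassNotFoundException
    sc.stop()
  }
}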