OK, I put lots of logging statements in the ChildExecutorURLClassLoader. This is what I see:
* The URLs for userClassLoader are correct and include only my one jar.
* A class that exists only in my jar gets loaded correctly via userClassLoader.
* For a class that exists in both my jar and the Spark assembly, it tries to use userClassLoader and ends up with a NoClassDefFoundError. The class is org.apache.avro.mapred.AvroInputFormat, and the NoClassDefFoundError is for org.apache.hadoop.mapred.FileInputFormat (which the parentClassLoader is responsible for, since it is not in my jar). I currently catch this NoClassDefFoundError and call parentClassLoader.loadClass, but that's clearly not a solution, since it loads the wrong version.

On Fri, May 16, 2014 at 2:25 PM, Koert Kuipers <ko...@tresata.com> wrote:
> Well, I modified ChildExecutorURLClassLoader to also delegate to
> parentClassLoader if NoClassDefFoundError is thrown... now I get yet
> another error. I am clearly missing something with these classloaders.
> Such nasty stuff... Giving up for now. I am just going to have to not use
> spark.files.userClassPathFirst=true until I have more time to look at this.
>
> 14/05/16 13:58:59 ERROR Executor: Exception in task ID 3
> java.lang.ClassCastException: cannot assign instance of scala.None$ to
> field org.apache.spark.rdd.RDD.checkpointData of type scala.Option in
> instance of MyRDD
>         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1995)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>
> On Fri, May 16, 2014 at 1:46 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> After removing all class parameters of type Path from my code, I tried
>> again. I get a different but related error when I set
>> spark.files.userClassPathFirst=true.
>>
>> Now I don't even use FileInputFormat directly. HadoopRDD does...
>>
>> 14/05/16 12:17:17 ERROR Executor: Exception in task ID 45
>> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileInputFormat
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>         at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         at org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:51)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:270)
>>         at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:57)
>>         at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
>>         at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
>>         at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1481)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1331)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>
>> On Thu, May 15, 2014 at 3:03 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> When I set spark.files.userClassPathFirst=true, I get Java serialization
>>> errors in my tasks, see below. When I set userClassPathFirst back to its
>>> default of false, the serialization errors are gone. My spark.serializer
>>> is KryoSerializer.
>>>
>>> The class org.apache.hadoop.fs.Path is in the spark assembly jar, but
>>> not in my task jars (the ones I added to the SparkConf). So it looks like
>>> the ClosureSerializer is having trouble with this class once the
>>> ChildExecutorURLClassLoader is used? That's just me guessing.
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>> due to stage failure: Task 1.0:5 failed 4 times, most recent failure:
>>> Exception failure in TID 31 on host node05.tresata.com:
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
>>>         java.lang.Class.getDeclaredConstructors0(Native Method)
>>>         java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
>>>         java.lang.Class.getDeclaredConstructors(Class.java:1838)
>>>         java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1697)
>>>         java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:50)
>>>         java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:203)
>>>         java.security.AccessController.doPrivileged(Native Method)
>>>         java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:200)
>>>         java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:556)
>>>         java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1580)
>>>         java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>         scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>         sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         java.lang.reflect.Method.invoke(Method.java:597)
>>>         java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>         org.apache.spark.scheduler.ShuffleMapTask$.deserializeInfo(ShuffleMapTask.scala:66)
>>>         org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:139)
>>>         java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:82)
>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:190)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>         java.lang.Thread.run(Thread.java:662)
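For context on why both failure modes above follow from a child-first loader: the JVM identifies a class by the pair (defining loader, name), and resolves every reference a class makes through that class's defining loader. So when AvroInputFormat is defined from the user jar by the child loader, its superclass FileInputFormat is also looked up through the child, and two copies of the same class loaded by different loaders are distinct, incompatible types, which is presumably what the scala.Option ClassCastException is about. The sketch below is a minimal, hypothetical child-first loader, not Spark's actual ChildExecutorURLClassLoader code, just to illustrate the delegation order being discussed:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a "child-first" (parent-last) classloader.
// User jars are searched before delegating to the parent.
public class ChildFirstClassLoader extends URLClassLoader {
    private final ClassLoader realParent;

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        // Pass null as the JDK-level parent so findClass only sees our URLs;
        // we keep the real parent around for explicit fallback.
        super(urls, null);
        this.realParent = parent;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                // Child first: try to define the class from the user jars.
                // Any classes it references will later resolve through *this*
                // loader too -- the root of the FileInputFormat problem.
                c = findClass(name);
            } catch (ClassNotFoundException e) {
                // Not in the user jars: fall back to the parent
                // (e.g. the Spark assembly and the JDK).
                c = realParent.loadClass(name);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }

    public static void main(String[] args) throws Exception {
        // With no user jars at all, every class falls through to the parent.
        ChildFirstClassLoader cl = new ChildFirstClassLoader(
                new URL[0], ChildFirstClassLoader.class.getClassLoader());
        System.out.println(cl.loadClass("java.lang.String") == String.class);
        // prints "true": the fallback returned the parent's String class
    }
}
```

Note the fallback only fires on ClassNotFoundException, i.e. when the class is absent from the user jars. When a class exists in *both* places (like AvroInputFormat here), the child defines its own copy, and anything that crosses the loader boundary as the "same" class, such as scala.Option during deserialization, becomes a ClassCastException instead.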