Additional info about this problem:

The deserialization code looks like this:

import org.apache.commons.lang3.SerializationException;
import org.apache.commons.lang3.SerializationUtils;

public static Block deserializeFrom(byte[] bytes) {
    try {
        // Restore the Block from the raw bytes read off disk/HDFS.
        Block b = SerializationUtils.deserialize(bytes);
        System.out.println("b=" + b);
        return b;
    } catch (ClassCastException e) {
        System.out.println("ClassCastException");
        e.printStackTrace();
    } catch (IllegalArgumentException e) {
        System.out.println("IllegalArgumentException");
        e.printStackTrace();
    } catch (SerializationException e) {
        System.out.println("SerializationException");
        e.printStackTrace();
    }
    return null;
}
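As an aside: SerializationUtils.deserialize uses a plain ObjectInputStream, whose resolveClass looks classes up via the classloader that loaded commons-lang3; on a YARN executor that is typically the system classloader, not the one holding the application jar. A possible workaround, sketched below and not tested against this exact setup, is to resolve classes through the thread context classloader, which Spark normally points at the user classpath on executors (ClassLoaderAwareObjectInputStream is our own helper name, not part of any library):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// An ObjectInputStream that resolves classes through the thread context
// classloader, which on Spark executors includes the application jar.
public class ClassLoaderAwareObjectInputStream extends ObjectInputStream {
    public ClassLoaderAwareObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return Class.forName(desc.getName(), false,
                    Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc);  // fall back to the default lookup
        }
    }
}

// A drop-in replacement for the deserializeFrom method above:
public static Block deserializeFrom(byte[] bytes) {
    try (ObjectInputStream in =
             new ClassLoaderAwareObjectInputStream(new ByteArrayInputStream(bytes))) {
        return (Block) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
        e.printStackTrace();
        return null;
    }
}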


The Spark code is:

val fis = spark.sparkContext.binaryFiles("/folder/abc*.file")
val rdd = fis.map { x =>
  val content = x._2.toArray()  // x is (path: String, data: PortableDataStream)
  val b = Block.deserializeFrom(content)
  ...
}


All of the code above runs successfully in Spark local mode, but the error
occurs when it runs in YARN cluster mode.
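One thing worth checking is whether the jar containing the Block class is actually shipped to the executors in cluster mode. A hypothetical submit command (the main class, jar names, and paths are placeholders):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.XXX.FFFF \
  --jars /path/to/jar-with-Block.jar \
  /path/to/app.jar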



On 2019/6/26 at 5:52 PM, big data wrote:

I use Apache Commons Lang3's SerializationUtils in the code:
SerializationUtils.serialize() to store a customized class as files on disk,
and SerializationUtils.deserialize(byte[]) to restore them again.
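For reference, a minimal round trip with those two calls might look like the following sketch (the file name is made up, and Block is assumed to implement java.io.Serializable, which SerializationUtils requires):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.lang3.SerializationUtils;

static void roundTrip(Block block) throws IOException {
    // Serialize the Block to bytes and write them to disk.
    byte[] bytes = SerializationUtils.serialize(block);
    Files.write(Paths.get("/folder/abc1.file"), bytes);  // hypothetical file name

    // Read the bytes back and restore the object.
    byte[] read = Files.readAllBytes(Paths.get("/folder/abc1.file"));
    Block restored = SerializationUtils.deserialize(read);
    System.out.println("restored=" + restored);
}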

In Spark local mode, all serialized files can be deserialized normally and no
error occurs. But when I copy these serialized files into HDFS and read them
back from HDFS in YARN mode, a SerializationException occurs.

The stack trace is below:

org.apache.commons.lang3.SerializationException: java.lang.ClassNotFoundException: com.XXXX.XXXX
    at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:227)
    at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:265)
    at com.com.XXXX.XXXX.deserializeFrom(XXX.java:81)
    at com.XXX.FFFF$$anonfun$3.apply(BXXXX.scala:157)
    at com.XXX.FFFF$$anonfun$3.apply(BXXXX.scala:153)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.XXXX.XXXX
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:223)

I've checked the byte[] loaded via binaryFiles: the bytes read locally and the
bytes read from HDFS are identical. So why can't they be deserialized when
read from HDFS?

The class reported in the ClassNotFoundException runs normally elsewhere in
the Spark code.


