Hello - I am looking for insight into an issue we have been having with our Zeppelin cluster for a while. We add a GeoMesa-Accumulo-Spark jar to the Spark interpreter, and notebook paragraphs run fine until we try to access the data, at which point the Spark process throws an "unread block data" error. However, this error only occurs when the interpreter setting "zeppelin.spark.useNew" is set to true; if it is set to false, the paragraph works just fine.
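For reference, the view queried below is registered in an earlier %spark paragraph roughly along these lines. This is only an illustrative sketch: the DataStore parameter keys and values are placeholders, not our actual configuration (the exact keys depend on the GeoMesa version):

%spark
// Illustrative sketch: load the GeoMesa feature type as a DataFrame and
// register it as the temp view that the failing %sql paragraph queries.
val dsParams = Map(
  "instanceId" -> "myInstance",      // placeholder Accumulo instance name
  "zookeepers" -> "zoo1,zoo2,zoo3",  // placeholder ZooKeeper quorum
  "user"       -> "myUser",          // placeholder credentials
  "password"   -> "***",
  "tableName"  -> "myCatalog"        // placeholder GeoMesa catalog table
)

val df = spark.read
  .format("geomesa")
  .options(dsParams)
  .option("geomesa.feature", "linkage") // placeholder feature type name
  .load()

df.createOrReplaceTempView("linkageview")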
Here is a paragraph that fails:

%sql
select linktype, count(linktype) from linkageview group by linktype

The error we get as a result is this:

java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

If I drill down and inspect the Spark job itself, I get an error saying "readObject can't find class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit." The full stack trace is included at the bottom of this message.

We dug into the __spark_conf and __spark_libs archives staged for the Spark job (under /user/root/.sparkStaging/application_<pid>/), and neither contained the jar that provides this class. Notably, the jar was missing in both runs, with spark.useNew set to true and set to false, yet only the true run fails.

Basically, I am just trying to figure out why the spark.useNew option causes this error when everything works fine with it turned off. We can move forward with it turned off for now, but I would like to get to the bottom of this issue in case there is something deeper going wrong.
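In case it helps, one way we can narrow this down further is to ask the executors directly whether they can load the class named in the error. This is just a diagnostic sketch; the only assumption in it is the class name taken from the stack trace:

%spark
// Ask each task to try loading the class the deserializer could not find,
// then tally the results back on the driver.
sc.parallelize(1 to 100, sc.defaultParallelism).map { _ =>
  try {
    Class.forName("org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit")
    "loadable"
  } catch {
    case _: ClassNotFoundException => "missing on executor"
  }
}.countByValue()

Thanks so much,
Chris Krentz

Full stack trace from the failed executor: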
19/05/23 20:45:55 ERROR util.Utils: Exception encountered
java.lang.RuntimeException: readObject can't find class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit
    at org.apache.hadoop.io.ObjectWritable.loadClass(ObjectWritable.java:377)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:228)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:45)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply(SerializableWritable.scala:41)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply(SerializableWritable.scala:41)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:41)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.io.ObjectWritable.loadClass(ObjectWritable.java:373)
    ... 30 more

19/05/23 20:45:55 ERROR executor.Executor: Exception in task 71.2 in stage 2.0 (TID 36)
java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)