spark.jars and spark.jars.packages are the standard way to add third-party
libraries, and they work for all natively supported modes
(standalone/yarn/mesos, etc.). The approach you used only works for the old
Spark interpreter and is not a standard way to add jars for the Spark
engine (e.g. it won't work for yarn-cluster mode). Here's the ticket I
created to update the doc (sorry about that, we didn't update the doc in
time). I'm also attaching some Zeppelin tutorial links; I hope they are
useful to you.

https://jira.apache.org/jira/browse/ZEPPELIN-4130?filter=-2

http://zeppelin.apache.org/docs/0.8.0/interpreter/spark.html#dependency-management
https://medium.com/@zjffdu/list-of-zeppelin-tutorials-efd507248f4d
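
For reference, in the Spark interpreter settings it would look something like
this (the jar path and Maven coordinates below are just placeholders for your
own):

  spark.jars             /path/to/geomesa-accumulo-spark.jar
  spark.jars.packages    <groupId>:<artifactId>:<version>

spark.jars takes a comma-separated list of jar paths to put on the driver and
executor classpaths, while spark.jars.packages takes Maven coordinates that
are resolved when the interpreter starts.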



On Fri, May 24, 2019 at 9:49 AM Krentz <cpkre...@gmail.com> wrote:

> I add the jar by editing the Spark interpreter on the interpreters page
> and adding the path to the jar at the bottom. I am not familiar with the
> spark.jars method. Is there a guide for that somewhere? Could that explain
> the difference in behavior when spark.useNew is set to true versus false?
>
> On Thu, May 23, 2019 at 9:16 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> adding a Geomesa-Accumulo-Spark jar to the Spark interpreter.
>>
>> How do you add the jar to the Spark interpreter? It is encouraged to add
>> jars via spark.jars.
>>
>>
>> On Fri, May 24, 2019 at 4:53 AM Krentz <cpkre...@gmail.com> wrote:
>>
>>> Hello - I am looking for insight into an issue I have been having with
>>> our Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark
>>> jar to the Spark interpreter. The notebook paragraphs run fine until we try
>>> to access the data, at which point we get an "Unread Block Data" error from
>>> the Spark process. However, this error only occurs when the interpreter
>>> setting "zeppelin.spark.useNew" is set to true. If this parameter is set to
>>> false, the paragraph works just fine. Here is a paragraph that fails:
>>>
>>> %sql
>>> select linktype,count(linktype) from linkageview group by linktype
>>>
>>> The error we get as a result is this:
>>> java.lang.IllegalStateException: unread block data
>>>     at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>>
>>>
>>> If I drill down and inspect the Spark job itself, I get an error saying
>>> "readObject can't find class
>>> org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit." The full
>>> stack trace is attached. We dug into and opened up the __spark_conf and
>>> __spark_libs files associated with the Spark job (under
>>> /user/root/.sparkStaging/application_<pid>/), but they did not contain the
>>> jar file with this class. It was missing in both the spark.useNew=true
>>> and spark.useNew=false cases.
>>>
>>> Basically, I am just trying to figure out why the spark.useNew option
>>> would cause this error when everything works fine with it turned off. We can move
>>> forward with it turned off for now, but I would like to get to the bottom
>>> of this issue in case there is something deeper going wrong.
>>>
>>> Thanks so much,
>>> Chris Krentz
>>>
>>>
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>

-- 
Best Regards

Jeff Zhang
