Hi Cristovao,

I have seen a very similar issue that I posted about in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html

I think your main issue here is similar: the Scala MapWrapper class is not registered with Kryo. It gets registered by the AllScalaRegistrar class from Twitter's chill-scala, which you are currently not using.
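For readers hitting the same NotSerializableException, the registration described above typically looks something like the sketch below. This is a configuration-level sketch, not tested against your build: the class name MyRegistrator is illustrative, and the exact chill API may vary across chill-scala versions.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator
import com.twitter.chill.AllScalaRegistrar

// Registers the common Scala classes (including collection wrappers
// such as MapWrapper) with Kryo via chill-scala's AllScalaRegistrar.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    new AllScalaRegistrar().apply(kryo)
  }
}

// Point Spark at Kryo and at the registrator above.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
```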
As far as I understand, in order to use Avro with Spark you also have to use Kryo, which means using the Spark KryoSerializer. This in turn uses Twitter chill. I posted the basic code that I am using here:
http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491

Maybe there is a simpler solution to your problem, but I am not that much of an expert yet. I hope this helps.

Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini

On Tue, Dec 9, 2014 at 8:50 AM, Cristovao Jose Domingues Cordeiro <cristovao.corde...@cern.ch> wrote:

> Hi Simone,
>
> Thanks, but I don't think that's it.
> I've tried several libraries with the --jars argument. Some do give the error you mentioned, but other times (when I put in the right version, I guess) I get the following:
>
> 14/12/09 15:45:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>
> Which is odd, since I am reading an Avro file I wrote... with the same piece of code:
> https://gist.github.com/MLnick/5864741781b9340cb211
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
> IT Department - 28/R-018
> CERN
>
> ------------------------------
> *From:* Simone Franzini [captainfr...@gmail.com]
> *Sent:* 06 December 2014 15:48
> *To:* Cristovao Jose Domingues Cordeiro
> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>
> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>
> That is a sign that you are mixing up versions of Hadoop. This is particularly an issue when dealing with Avro. If you are using Hadoop 2, you will need to get the hadoop2 version of avro-mapred.
> In Maven you would do this with the <classifier>hadoop2</classifier> tag.
>
> Simone Franzini, PhD
> http://www.linkedin.com/in/simonefranzini
>
> On Fri, Dec 5, 2014 at 3:52 AM, cjdc <cristovao.corde...@cern.ch> wrote:
>
>> Hi all,
>>
>> I've tried the above example on Gist, but it doesn't work (at least for me).
>> Did anyone get this:
>>
>> 14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/12/05 10:44:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
>> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/12/05 10:44:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
>>
>> Thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p20456.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
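For anyone landing on this thread later, the overall recipe the messages above point to is roughly the following sketch. It is untested here: the path and field name are placeholders, it relies on a Kryo/chill setup as discussed in the thread (GenericRecord is not Java-serializable), and on Hadoop 2 it requires the hadoop2 build of avro-mapred.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("avro-read"))

// Read an Avro file through the new Hadoop mapreduce API; the keys
// are AvroKey[GenericRecord] wrappers and the values are NullWritable.
val records = sc.newAPIHadoopFile(
  "hdfs:///path/to/file.avro",                 // placeholder path
  classOf[AvroKeyInputFormat[GenericRecord]],  // input format
  classOf[AvroKey[GenericRecord]],             // key class
  classOf[NullWritable])                       // value class

// Pull a field out of each record ("someField" is a placeholder).
val fields = records.map { case (key, _) => key.datum().get("someField") }
```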
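The Maven <classifier> tag mentioned above would sit in the pom.xml dependency roughly like this (the version number is illustrative; use whatever Avro version your project already depends on):

```xml
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
  <!-- selects the Hadoop 2 build of avro-mapred -->
  <classifier>hadoop2</classifier>
</dependency>
```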