Hi Cristovao,

I have seen a very similar issue that I posted about in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html

I think your main issue here is similar: the Scala MapWrapper class is not registered with Kryo. It gets registered by the AllScalaRegistrar class from Twitter's chill-scala, which you are currently not using.
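For readers hitting the same NotSerializableException, the registration described above typically looks something like the sketch below. This is a configuration-level sketch, not tested against your build: the class name MyRegistrator is illustrative, and the exact chill API may vary across chill-scala versions.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator
import com.twitter.chill.AllScalaRegistrar

// Registers the common Scala classes (including collection wrappers
// such as MapWrapper) with Kryo via chill-scala's AllScalaRegistrar.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    new AllScalaRegistrar().apply(kryo)
  }
}

// Point Spark at Kryo and at the registrator above.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
```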
As far as I understand, in order to use Avro with Spark you also have to use Kryo, which means using the Spark KryoSerializer. This in turn uses Twitter chill. I posted the basic code that I am using here:
http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491

Maybe there is a simpler solution to your problem, but I am not that much of an expert yet. I hope this helps.

Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini

On Tue, Dec 9, 2014 at 8:50 AM, Cristovao Jose Domingues Cordeiro <cristovao.corde...@cern.ch> wrote:

> Hi Simone,
>
> Thanks, but I don't think that's it.
> I've tried several libraries with the --jars argument. Some do give the error you mentioned, but other times (when I put in the right version, I guess) I get the following:
>
> 14/12/09 15:45:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>
> Which is odd, since I am reading an Avro file I wrote... with the same piece of code:
> https://gist.github.com/MLnick/5864741781b9340cb211
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
> IT Department - 28/R-018
> CERN
>
> ------------------------------
> *From:* Simone Franzini [captainfr...@gmail.com]
> *Sent:* 06 December 2014 15:48
> *To:* Cristovao Jose Domingues Cordeiro
> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>
> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>
> That is a sign that you are mixing up versions of Hadoop. This is particularly an issue when dealing with Avro. If you are using Hadoop 2, you will need to get the hadoop2 version of avro-mapred.
> In Maven you would do this with the <classifier>hadoop2</classifier> tag.
>
> Simone Franzini, PhD
> http://www.linkedin.com/in/simonefranzini
>
> On Fri, Dec 5, 2014 at 3:52 AM, cjdc <cristovao.corde...@cern.ch> wrote:
>
>> Hi all,
>>
>> I've tried the above example on Gist, but it doesn't work (at least for me).
>> Did anyone get this:
>>
>> 14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/12/05 10:44:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
>> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>         at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> 14/12/05 10:44:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
>>
>> Thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p20456.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
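For anyone landing on this thread later, the overall recipe the messages above point to is roughly the following sketch. It is untested here: the path and field name are placeholders, it relies on a Kryo/chill setup as discussed in the thread (GenericRecord is not Java-serializable), and on Hadoop 2 it requires the hadoop2 build of avro-mapred.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("avro-read"))

// Read an Avro file through the new Hadoop mapreduce API; the keys
// are AvroKey[GenericRecord] wrappers and the values are NullWritable.
val records = sc.newAPIHadoopFile(
  "hdfs:///path/to/file.avro",                 // placeholder path
  classOf[AvroKeyInputFormat[GenericRecord]],  // input format
  classOf[AvroKey[GenericRecord]],             // key class
  classOf[NullWritable])                       // value class

// Pull a field out of each record ("someField" is a placeholder).
val fields = records.map { case (key, _) => key.datum().get("someField") }
```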
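The Maven <classifier> tag mentioned above would sit in the pom.xml dependency roughly like this (the version number is illustrative; use whatever Avro version your project already depends on):

```xml
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
  <!-- selects the Hadoop 2 build of avro-mapred -->
  <classifier>hadoop2</classifier>
</dependency>
```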