I'm trying to read an Avro sequence file using the sequenceFile method on
the SparkContext object, and I get a NullPointerException.  If I read the
file outside of Spark using AvroSequenceFile.Reader, I don't have any
problems.
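For reference, the non-Spark read works roughly along these lines (a sketch only, with the path elided as above; this assumes the AvroSequenceFile.Reader.Options builder API from avro-mapred):

```scala
import org.apache.avro.hadoop.io.AvroSequenceFile
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: open the same HDFS file directly with the Avro reader.
val conf = new Configuration()
val path = new Path("hdfs://<my url is here>")
val options = new AvroSequenceFile.Reader.Options()
  .withFileSystem(FileSystem.get(conf))
  .withInputPath(path)
  .withConfiguration(conf)
val reader = new AvroSequenceFile.Reader(options)
// AvroSequenceFile.Reader extends SequenceFile.Reader, so records can be
// iterated with next(key, value) as with any sequence file.
```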

Has anyone had success in doing this?

Below is what I typed and the output I saw in the Spark shell:

scala> var myAvroSequenceFile = sc.sequenceFile("hdfs://<my url is here>",
classOf[AvroKey[GenericRecord]], classOf[AvroValue[GenericRecord]])

scala> myAvroSequenceFile.first
14/07/18 16:31:31 INFO FileInputFormat: Total input paths to process : 1
14/07/18 16:31:31 INFO SparkContext: Starting job: first at <console>:18
14/07/18 16:31:31 INFO DAGScheduler: Got job 2 (first at <console>:18) with
1 output partitions (allowLocal=true)
14/07/18 16:31:31 INFO DAGScheduler: Final stage: Stage 2(first at
<console>:18)
14/07/18 16:31:31 INFO DAGScheduler: Parents of final stage: List()
14/07/18 16:31:31 INFO DAGScheduler: Missing parents: List()
14/07/18 16:31:31 INFO DAGScheduler: Computing the requested partition
locally
14/07/18 16:31:31 INFO HadoopRDD: Input split: hdfs://<my url>
14/07/18 16:31:31 INFO DAGScheduler: Failed to run first at <console>:18
java.lang.NullPointerException
        at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1902)
        at
org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
        at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
        at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
        at
org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
        at
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:190)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at
org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:574)
        at
org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:559)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Avro-Sequence-Files-tp10201.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
