I'm trying to read an Avro Sequence File using the sequenceFile method on the Spark context object, and I get a NullPointerException. If I read the same file outside of Spark using AvroSequenceFile.Reader, I don't have any problems.
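For reference, the direct read that works for me looks roughly along these lines (a sketch, not copied verbatim from my code; the HDFS path is the same placeholder as below):

```scala
import org.apache.avro.hadoop.io.AvroSequenceFile
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val path = new Path("hdfs://<my url is here>") // placeholder path
val fs   = FileSystem.get(path.toUri, conf)

// AvroSequenceFile.Reader extends SequenceFile.Reader and wires up
// AvroSerialization so keys/values come back as AvroKey/AvroValue.
val reader = new AvroSequenceFile.Reader(
  new AvroSequenceFile.Reader.Options()
    .withFileSystem(fs)
    .withInputPath(path)
    .withConfiguration(conf))

// Iterate with the Object-based next(), since the records are not Writables.
var key = reader.next(null)
while (key != null) {
  val value = reader.getCurrentValue(null)
  println(s"$key -> $value")
  key = reader.next(key)
}
reader.close()
```

This reads the file fine, which is why I suspect the problem is on the Spark side rather than in the file itself.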
Has anyone had success doing this? Below is what I typed and saw at the spark shell:

scala> var myAvroSequenceFile = sc.sequenceFile("hdfs://<my url is here>", classOf[AvroKey[GenericRecord]], classOf[AvroValue[GenericRecord]])
scala> myAvroSequenceFile.first
14/07/18 16:31:31 INFO FileInputFormat: Total input paths to process : 1
14/07/18 16:31:31 INFO SparkContext: Starting job: first at <console>:18
14/07/18 16:31:31 INFO DAGScheduler: Got job 2 (first at <console>:18) with 1 output partitions (allowLocal=true)
14/07/18 16:31:31 INFO DAGScheduler: Final stage: Stage 2(first at <console>:18)
14/07/18 16:31:31 INFO DAGScheduler: Parents of final stage: List()
14/07/18 16:31:31 INFO DAGScheduler: Missing parents: List()
14/07/18 16:31:31 INFO DAGScheduler: Computing the requested partition locally
14/07/18 16:31:31 INFO HadoopRDD: Input split: hdfs://<my url>
14/07/18 16:31:31 INFO DAGScheduler: Failed to run first at <console>:18
java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1902)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:190)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:181)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:574)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:559)

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Avro-Sequence-Files-tp10201.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.