I have a sequence file whose header looks like this:

SEQ org.apache.hadoop.io.Text org.apache.hadoop.io.Text org.apache.hadoop.io.compress.GzipCodec

So Key = Text and Value = Text, and it seems to be using GzipCodec. How should I read it from Spark?

I am using:

val x = sc.sequenceFile(dwTable, classOf[Text], classOf[Text]).partitionBy(new org.apache.spark.HashPartitioner(7919))

When I do x.take(10).foreach(println), every record returned is identical. How is that possible? The records in this sequence file are unique (guaranteed).

-- Deepak