Hi Pratik,

If reopening the file reader solves the problem, can I conclude that the exported Parquet files are valid?

Best regards
--Qian Xu (Stanley)

From: pratik khadloya [mailto:[email protected]]
Sent: Friday, August 29, 2014 3:46 AM
To: [email protected]
Subject: Re: Issue with reading parquet file exported by sqoop

Strangely enough, another version of my reader works:
https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad

The difference is that I have to re-open the file each time I read a new column. The reopening happens through the following line:

    ParquetFileReader fileReader = new ParquetFileReader(conf, filePath, blocks, schema.getColumns());

which I call inside a loop over the column descriptors.

~Pratik

On Thu, Aug 28, 2014 at 11:49 AM, pratik khadloya <[email protected]> wrote:

This issue only occurs for some columns, and only after reading a few thousand records.

~Pratik

On Thu, Aug 28, 2014 at 11:48 AM, pratik khadloya <[email protected]> wrote:

Hello,

I am facing the following exception when reading a parquet file exported by sqoop. My parquet column reader code is at
https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf

Exception in thread "main" parquet.io.ParquetDecodingException: Can't read value in column [description] BINARY at value 44899 out of 57096, 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450)
    at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:398)
    at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.load(RfiParquetFileReader.java:147)
    at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.<init>(RfiParquetFileReader.java:87)
    at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.main(RfiParquetFileReader.java:114)
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at parquet.Preconditions.checkArgument(Preconditions.java:47)
    at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
    at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
    at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:82)
    at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:295)
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:446)
    ... 4 more

Does anyone know what this could be related to? What could I be doing wrong?

Thanks,
~Pratik
