Hi Pratik,

If reopening the file reader solves the problem, can I conclude that the 
exported Parquet files are valid?

Best regards
--Qian Xu (Stanley)




From: pratik khadloya [mailto:[email protected]]
Sent: Friday, August 29, 2014 3:46 AM
To: [email protected]
Subject: Re: Issue with reading parquet file exported by sqoop

Strangely enough, another version of my reader works:
https://gist.github.com/tispratik/f7a66f6a40b7ae3b98ad
The difference is that I have to reopen the file each time I read a new 
column.
The reopening happens through the following line:
ParquetFileReader fileReader = new ParquetFileReader(conf, filePath, blocks, schema.getColumns());

which I call inside a loop over the column descriptors.
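
For illustration, the reopen-per-column loop described above can be modeled 
with a toy reader. This is only a sketch, not the code from the gist: 
ColumnScan and ToyFileReader are hypothetical names and do not use the 
parquet-mr API. It assumes a reader that hands out row groups forward-only and 
returns null once exhausted, so a column-by-column scan needs a fresh reader 
for each column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnScan {

    // Toy stand-in for a file reader whose row-group cursor only moves forward.
    // Hypothetical class: it is NOT part of parquet-mr.
    static final class ToyFileReader {
        private final List<Map<String, List<String>>> rowGroups;
        private int next = 0; // index of the next row group to hand out

        ToyFileReader(List<Map<String, List<String>>> rowGroups) {
            this.rowGroups = rowGroups;
        }

        // Returns the next row group, or null once every group has been read.
        Map<String, List<String>> readNextRowGroup() {
            return next < rowGroups.size() ? rowGroups.get(next++) : null;
        }
    }

    // Reads every requested column in full, opening a fresh reader per column.
    static Map<String, List<String>> readAllColumns(
            List<Map<String, List<String>>> file, List<String> columns) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String column : columns) {
            // "Reopen" the file: the previous reader is already exhausted.
            ToyFileReader reader = new ToyFileReader(file);
            List<String> values = new ArrayList<>();
            Map<String, List<String>> group;
            while ((group = reader.readNextRowGroup()) != null) {
                values.addAll(group.get(column));
            }
            result.put(column, values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two row groups, each holding the same two columns.
        List<Map<String, List<String>>> file = List.of(
            Map.of("id", List.of("1", "2"), "description", List.of("a", "b")),
            Map.of("id", List.of("3"), "description", List.of("c")));
        System.out.println(readAllColumns(file, List.of("id", "description")));
    }
}
```

In this model, sharing a single reader across columns would leave every column 
after the first empty, because the cursor never rewinds. Whether that is what 
distinguishes the two gists is an assumption.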


~Pratik

On Thu, Aug 28, 2014 at 11:49 AM, pratik khadloya 
<[email protected]<mailto:[email protected]>> wrote:
This issue occurs only for some columns, and only after reading a few 
thousand records.

~Pratik

On Thu, Aug 28, 2014 at 11:48 AM, pratik khadloya 
<[email protected]<mailto:[email protected]>> wrote:
Hello,

I am facing the following exception when reading a Parquet file exported by 
Sqoop.
My Parquet column reader code is at 
https://gist.github.com/tispratik/f0044dd84dc8d8c6cbcf

Exception in thread "main" parquet.io.ParquetDecodingException: Can't read value in column [description] BINARY at value 44899 out of 57096, 44899 out of 57096 in currentPage. repetition level: 0, definition level: 1
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450)
    at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:398)
    at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.load(RfiParquetFileReader.java:147)
    at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.<init>(RfiParquetFileReader.java:87)
    at com.rocketfuel.grid.lookup_new.RfiParquetFileReader.main(RfiParquetFileReader.java:114)
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at parquet.Preconditions.checkArgument(Preconditions.java:47)
    at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
    at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
    at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:82)
    at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:295)
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:446)
    ... 4 more


Does anyone know what this could be related to, or what I could be doing wrong?


Thanks,
~Pratik

