[ https://issues.apache.org/jira/browse/PARQUET-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138299#comment-15138299 ]

Deepak Majeti edited comment on PARQUET-505 at 2/9/16 3:30 PM:
---------------------------------------------------------------

[~wesmckinn] I followed the Impala code.  I am adding a unit test as well.




-- 
regards,
Deepak Majeti



was (Author: mdeepak):
[~wesm] I followed the Impala code.  I am adding a unit test as well.




-- 
regards,
Deepak Majeti


> Column reader: automatically handle large data pages
> ----------------------------------------------------
>
>                 Key: PARQUET-505
>                 URL: https://issues.apache.org/jira/browse/PARQUET-505
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Deepak Majeti
>
> Currently, we only support data pages whose headers are 64K or less 
> (see {{parquet/column/serialized-page.cc}}). Since page headers can 
> essentially be arbitrarily large (in pathological cases) because of the page 
> statistics, if deserializing the page header fails, we should attempt to read 
> a progressively larger amount of file data in an effort to find the end of the 
> page header. 
> As part of this (and to make testing easier!), the maximum data page header 
> size should be configurable. We can write test cases by defining appropriate 
> Statistics structs that yield serialized page headers of any desired size.
> On malformed files, we may run past the end of the file; in such cases, we 
> should raise a reasonable exception. 
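
A minimal C++ sketch of the progressive-read strategy described above. The helpers {{TryDeserializePageHeader}} and {{PeekBytes}}, and the constants, are illustrative assumptions for the sketch, not the actual parquet-cpp API:

{code:cpp}
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed interface: attempts to parse a complete Thrift PageHeader from the
// first `size` bytes of `buffer`. Returns true on success and writes the
// number of bytes the header occupied to `header_size`; returns false if the
// buffer ended before the header was complete.
bool TryDeserializePageHeader(const uint8_t* buffer, uint32_t size,
                              uint32_t* header_size);

// Assumed interface: reads up to `size` bytes at the current stream position
// without advancing it; returns the number of bytes actually available.
int64_t PeekBytes(uint8_t* out, int64_t size);

constexpr uint32_t kDefaultPageHeaderSize = 64 * 1024;     // initial window
constexpr uint32_t kMaxPageHeaderSize = 16 * 1024 * 1024;  // configurable cap

// Returns the serialized size of the page header; the caller then advances
// the stream by that many bytes before reading the page body.
uint32_t ReadPageHeader() {
  uint32_t allowed_size = kDefaultPageHeaderSize;
  std::vector<uint8_t> buffer;
  while (true) {
    buffer.resize(allowed_size);
    int64_t bytes_available = PeekBytes(buffer.data(), allowed_size);
    uint32_t header_size = 0;
    if (TryDeserializePageHeader(buffer.data(),
                                 static_cast<uint32_t>(bytes_available),
                                 &header_size)) {
      return header_size;
    }
    // Deserialization failed: the header may simply be larger than the
    // current window, or the file may be malformed.
    if (bytes_available < static_cast<int64_t>(allowed_size)) {
      // The file ended before a complete header was found: malformed file.
      throw std::runtime_error(
          "Deserializing page header failed: unexpected end of file");
    }
    if (allowed_size >= kMaxPageHeaderSize) {
      throw std::runtime_error("Page header exceeds maximum allowed size");
    }
    // Double the window and retry, up to the configured maximum.
    allowed_size = std::min(allowed_size * 2, kMaxPageHeaderSize);
  }
}
{code}

Doubling the window bounds the number of retries logarithmically in the configured maximum, so even a pathologically large header costs only a handful of extra reads.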


