[ https://issues.apache.org/jira/browse/PARQUET-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue resolved PARQUET-922.
-------------------------------
       Resolution: Fixed
    Fix Version/s: format-2.4.0

Merged format PR #72. Thanks for getting this pushed through [~lv]!

> Add index pages to the format to support efficient page skipping
> ----------------------------------------------------------------
>
>                 Key: PARQUET-922
>                 URL: https://issues.apache.org/jira/browse/PARQUET-922
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Julien Le Dem
>            Assignee: Marcel Kornacker
>             Fix For: format-2.4.0
>
>
> When a Parquet file is sorted, we can define an index consisting of the
> boundary values for the pages of the sorted columns, together with the
> offsets and lengths of those pages in the file.
> The goal is to optimize lookup and range-scan queries by using this index
> to read only the pages that contain data matching the filter.
> We'd require the pages to be aligned across columns.
> [~marcelk] will add a link to the Google Doc to discuss the spec.
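
For illustration only, the index described in the issue can be sketched in Python as a list of per-page boundary values with file offsets and lengths. This is not the format adopted in format-2.4.0: the names PageIndexEntry and pages_for_range, the integer keys, and the sample offsets are all hypothetical. The sketch assumes the column is sorted ascending, so page minimums and maximums are non-decreasing and a binary search finds the pages to read.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PageIndexEntry:
    """Hypothetical index entry for one page of a sorted column."""
    min_value: int  # smallest value stored in the page
    max_value: int  # largest value stored in the page
    offset: int     # byte offset of the page in the file
    length: int     # byte length of the page


def pages_for_range(index: List[PageIndexEntry], lo: int, hi: int) -> List[PageIndexEntry]:
    """Return the pages whose [min_value, max_value] overlaps [lo, hi].

    Assumes the entries are in file order and the column is sorted
    ascending, so both boundary lists are non-decreasing.
    """
    maxes = [entry.max_value for entry in index]
    mins = [entry.min_value for entry in index]
    first = bisect_left(maxes, lo)  # first page whose max_value >= lo
    last = bisect_right(mins, hi)   # one past the last page whose min_value <= hi
    return index[first:last]


# Three pages of a sorted column; offsets and lengths are made up.
index = [
    PageIndexEntry(0, 9, 4, 100),
    PageIndexEntry(10, 19, 104, 120),
    PageIndexEntry(20, 29, 224, 90),
]

# A range scan for 12 <= x <= 21 only needs the second and third pages.
to_read = [(e.offset, e.length) for e in pages_for_range(index, 12, 21)]
```

A reader would then seek to each (offset, length) pair and skip every other page. With pages aligned across columns, as the issue requires, the same page positions could be used to fetch the matching rows from the other columns.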