[jira] [Commented] (PARQUET-1801) Add column index support for 'prune' command in Parquet-tools/cli

2020-12-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245625#comment-17245625 ] ASF GitHub Bot commented on PARQUET-1801: - Pavitheran opened a new pull request

[GitHub] [parquet-mr] Pavitheran opened a new pull request #846: [PARQUET-1801] Add parquet-tools 'prune' to parquet-cli

2020-12-07 Thread GitBox
Pavitheran opened a new pull request #846: URL: https://github.com/apache/parquet-mr/pull/846 My PR addresses the following [Parquet-1801](https://issues.apache.org/jira/browse/PARQUET-1801) issues and references th

Re: Parquet File Meta Data & Compatibility

2020-12-07 Thread Tim Armstrong
> Introducing new logical types as "experimental" is a bit tricky. Maybe experimental is a bad term. I think mostly new features in the format do need to be backwards compatible and not buggy because data lasts a long time once it's written. Maybe "incubating" or "preview" is a better term. I guess

Re: Parquet File Meta Data & Compatibility

2020-12-07 Thread Gabor Szadovszky
I agree with separating the not widely used features to make the life of the implementers easier and to improve compatibility between these implementations. Meanwhile, it is not always clear how to define the core features. For example, the encryption feature will be released soon in parquet-mr and I

[jira] [Assigned] (PARQUET-1947) DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data

2020-12-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1947: - Assignee: Daniel Dai > DeprecatedParquetInputFormat in CombineFileInputFormat

[jira] [Resolved] (PARQUET-1947) DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data

2020-12-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-1947. --- Resolution: Fixed > DeprecatedParquetInputFormat in CombineFileInputFormat would pr

[jira] [Commented] (PARQUET-1947) DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data

2020-12-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245136#comment-17245136 ] ASF GitHub Bot commented on PARQUET-1947: - gszadovszky merged pull request #844

[GitHub] [parquet-mr] gszadovszky merged pull request #844: PARQUET-1947: DeprecatedParquetInputFormat in CombineFileInputFormat …

2020-12-07 Thread GitBox
gszadovszky merged pull request #844: URL: https://github.com/apache/parquet-mr/pull/844

Filtering with projection / backward incompatibility

2020-12-07 Thread Gabor Szadovszky
Hi everyone, ParquetFileReader handles both the filtering and the projection. Row group level filtering is done at construction time so the row groups that do not fulfil the filter requirements are dropped at the very beginning. The projected schema can be set by the method setRequestedSchema. In
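The two-stage behavior described above (row groups dropped once, when the reader is constructed; projection set afterwards) can be illustrated with a small, self-contained Java model. This is a toy sketch, not parquet-mr code: `ToyReader`, `RowGroup`, `mightContain`, `setRequestedColumns` and the `id`/`score` columns are hypothetical names chosen for illustration. In parquet-mr itself the projection is passed to `ParquetFileReader.setRequestedSchema` as a `MessageType`, and the min/max check below only stands in for row group statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FilterProjectionSketch {

    // Hypothetical in-memory row group: column name -> column values.
    static final class RowGroup {
        final Map<String, long[]> columns;

        RowGroup(Map<String, long[]> columns) {
            this.columns = columns;
        }

        // Statistics-style check: judging only by min/max, could `column` contain `value`?
        boolean mightContain(String column, long value) {
            long[] values = columns.get(column);
            long min = Arrays.stream(values).min().orElse(Long.MAX_VALUE);
            long max = Arrays.stream(values).max().orElse(Long.MIN_VALUE);
            return min <= value && value <= max;
        }
    }

    // Hypothetical toy reader, NOT parquet-mr's ParquetFileReader API.
    static final class ToyReader {
        private final List<RowGroup> rowGroups = new ArrayList<>();
        private Set<String> requestedColumns; // null means "all columns"

        // Row-group filtering happens once, at construction time.
        ToyReader(List<RowGroup> all, String filterColumn, long filterValue) {
            for (RowGroup g : all) {
                if (g.mightContain(filterColumn, filterValue)) {
                    rowGroups.add(g);
                }
            }
        }

        // Projection is configured separately, after construction.
        void setRequestedColumns(Set<String> columns) {
            this.requestedColumns = columns;
        }

        int rowGroupCount() {
            return rowGroups.size();
        }

        // Read the surviving row groups, keeping only the requested columns.
        List<Map<String, long[]>> readAll() {
            List<Map<String, long[]>> out = new ArrayList<>();
            for (RowGroup g : rowGroups) {
                Map<String, long[]> projected = new LinkedHashMap<>();
                for (Map.Entry<String, long[]> e : g.columns.entrySet()) {
                    if (requestedColumns == null || requestedColumns.contains(e.getKey())) {
                        projected.put(e.getKey(), e.getValue());
                    }
                }
                out.add(projected);
            }
            return out;
        }
    }

    // Builds a row group with hypothetical "id" and "score" columns.
    static RowGroup group(long[] ids, long[] scores) {
        Map<String, long[]> cols = new LinkedHashMap<>();
        cols.put("id", ids);
        cols.put("score", scores);
        return new RowGroup(cols);
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
            group(new long[] {1, 2, 3}, new long[] {10, 20, 30}),
            group(new long[] {7, 8, 9}, new long[] {70, 80, 90}));

        // Filter on id == 8, then project only "score": the filter column is not projected.
        ToyReader reader = new ToyReader(groups, "id", 8);
        reader.setRequestedColumns(Set.of("score"));
        for (Map<String, long[]> rg : reader.readAll()) {
            System.out.println(rg.keySet() + " " + Arrays.toString(rg.get("score")));
        }
    }
}
```

Only the second row group survives the filter on `id == 8`, and only its `score` column is returned, so the program prints `[score] [70, 80, 90]`. Because the pruning works on whole row groups, the scores 70 and 90 come back even though their rows do not match; dropping individual non-matching records would be a separate, record-level step.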

[jira] [Assigned] (PARQUET-1928) Interpret Parquet INT96 type as FIXED[12] AVRO Schema

2020-12-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1928: - Assignee: Anant Damle > Interpret Parquet INT96 type as FIXED[12] AVRO Schema
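As the title suggests, PARQUET-1928 hands INT96 columns to Avro as a plain 12-byte FIXED, so any timestamp meaning has to be recovered by whoever reads those bytes. The Java sketch below shows that decoding step under an assumption not stated in the source: the layout commonly written by Impala, Hive and Spark, namely 8 bytes of little-endian nanoseconds within the day followed by a 4-byte little-endian Julian day number. `Int96Sketch` and `toInstant` are hypothetical names for illustration, not part of parquet-mr or parquet-avro.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

public class Int96Sketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long SECONDS_PER_DAY = 86_400L;

    // Decodes a 12-byte INT96 value (e.g. the contents of an Avro FIXED[12]),
    // ASSUMING the Impala/Hive-style layout: nanos-of-day (8 bytes LE), then Julian day (4 bytes LE).
    static Instant toInstant(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 must be 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        // Instant.ofEpochSecond normalizes a nanosecond adjustment larger than one second.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        byte[] raw = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
            .putLong(3_600_000_000_000L) // one hour into the day, in nanoseconds
            .putInt(2_440_589)           // Julian day of 1970-01-02
            .array();
        System.out.println(toInstant(raw)); // 1970-01-02T01:00:00Z
    }
}
```

Note that this layout carries no time zone; whether the decoded instant should be read as UTC or as local time depends on the writer, which is one reason the raw FIXED[12] form leaves that decision to the consumer.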

[jira] [Commented] (PARQUET-1928) Interpret Parquet INT96 type as FIXED[12] AVRO Schema

2020-12-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245101#comment-17245101 ] ASF GitHub Bot commented on PARQUET-1928: - gszadovszky merged pull request #831

[GitHub] [parquet-mr] gszadovszky merged pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

2020-12-07 Thread GitBox
gszadovszky merged pull request #831: URL: https://github.com/apache/parquet-mr/pull/831

[jira] [Commented] (PARQUET-1928) Interpret Parquet INT96 type as FIXED[12] AVRO Schema

2020-12-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245100#comment-17245100 ] ASF GitHub Bot commented on PARQUET-1928: - gszadovszky commented on pull reques

[GitHub] [parquet-mr] gszadovszky commented on pull request #831: PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema

2020-12-07 Thread GitBox
gszadovszky commented on pull request #831: URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-739802895 @anantdamle, our usual process is to squash all the changes related to one jira before merging. Thanks a lot for your contribution! ---

[jira] [Commented] (PARQUET-1946) Parquet File not readable by Google big query (works with Spark)

2020-12-07 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245061#comment-17245061 ] Yuming Wang commented on PARQUET-1946: -- Could you try to disable {{parquet.filter.