[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415324#comment-17415324 ] ASF GitHub Bot commented on PARQUET-1968: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-09-14 Thread GitBox
huaxingao commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r708850491 ## File path: parquet-generator/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java ## @@ -1,14 +1,14 @@ -/*

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415323#comment-17415323 ] ASF GitHub Bot commented on PARQUET-1968: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-09-14 Thread GitBox
huaxingao commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r708850129 ## File path: parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java ## @@ -293,8 +290,48 @@ boolean

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415321#comment-17415321 ] ASF GitHub Bot commented on PARQUET-1968: - huaxingao commented on a change in pull request

[GitHub] [parquet-mr] huaxingao commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-09-14 Thread GitBox
huaxingao commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r708849684 ## File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java ## @@ -250,27 +250,16 @@ public Eq(Column column, T

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Xinli shang
The vote to release 1.12.1 RC1 as Apache Parquet MR 1.12.1 has PASSED with the required three +1 binding votes and one +1 non-binding vote. (There were no -1 or 0 votes.) Thank you to all who verified and voted! I'm going forward with the release process soon. On Tue, Sep 14, 2021 at 5:23 PM

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Julien Le Dem
+1 (binding) I verified the signature; the build and tests pass (with Java 8) On Tue, Sep 14, 2021 at 4:14 PM Xinli shang wrote: > I also vote +1 (binding). Thanks everybody for verifying! > > On Tue, Sep 14, 2021 at 2:00 PM Chao Sun wrote: > > > +1 (non-binding). > > > > - tested on the Spark

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Xinli shang
I also vote +1 (binding). Thanks everybody for verifying! On Tue, Sep 14, 2021 at 2:00 PM Chao Sun wrote: > +1 (non-binding). > > - tested on the Spark side and all tests passed, including the issue in > SPARK-36696 > - verified signature and checksum of the release > > Thanks Xinli for driving

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Chao Sun
+1 (non-binding). - tested on the Spark side and all tests passed, including the issue in SPARK-36696 - verified signature and checksum of the release Thanks Xinli for driving the release work! Chao On Tue, Sep 14, 2021 at 3:01 AM Gabor Szadovszky wrote: > Thanks for the new RC, Xinli. > >

Re: Concatenation of parquet files

2021-09-14 Thread Weston Pace
A few things that will be expected to change in your experiment (off a cursory scan): RowGroup::Ordinal (https://docs.rs/parquet-format/4.0.0/parquet_format/struct.RowGroup.html#structfield.ordinal) RowGroup::FileOffset

[jira] [Resolved] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Matthew Topol (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Topol resolved PARQUET-2092. Resolution: Invalid Moving to Arrow Project JIRA > [Go] Fix in go implementation >

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Matthew Topol (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415061#comment-17415061 ] Matthew Topol commented on PARQUET-2092: Sure thing, I'll update the PR afterwards too. > [Go]

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Micah Kornfield (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415060#comment-17415060 ] Micah Kornfield commented on PARQUET-2092: -- OK, would you mind closing this and opening up an

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Matthew Topol (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415057#comment-17415057 ] Matthew Topol commented on PARQUET-2092: [~emkornfi...@gmail.com] That's what I assumed and

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Micah Kornfield (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415056#comment-17415056 ] Micah Kornfield commented on PARQUET-2092: -- [~zeroshade] the move option if you are allowed to

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Matthew Topol (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415054#comment-17415054 ] Matthew Topol commented on PARQUET-2092: [~emkornfi...@gmail.com] Maybe it's because it was a

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Micah Kornfield (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415050#comment-17415050 ] Micah Kornfield commented on PARQUET-2092: -- Hmm, it doesn't look like I have permissions to

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Micah Kornfield (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415048#comment-17415048 ] Micah Kornfield commented on PARQUET-2092: -- I'm going to move this to the Arrow tracker. 

[jira] [Commented] (PARQUET-2081) Encryption translation tool - Parquet-hadoop

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415032#comment-17415032 ] ASF GitHub Bot commented on PARQUET-2081: - ggershinsky commented on pull request #928: URL:

[GitHub] [parquet-mr] ggershinsky commented on pull request #928: PARQUET-2081: Encryption translation tool - Parquet-hadoop

2021-09-14 Thread GitBox
ggershinsky commented on pull request #928: URL: https://github.com/apache/parquet-mr/pull/928#issuecomment-919311645 Hi @shangxinli , will be glad to. We got a number of national holidays here, but I'll get to this soon. -- This is an automated message from the Apache Git Service. To

[jira] [Commented] (PARQUET-2084) Upgrade Thrift to 0.14.2

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415008#comment-17415008 ] ASF GitHub Bot commented on PARQUET-2084: - dongjoon-hyun commented on pull request #927: URL:

[GitHub] [parquet-mr] dongjoon-hyun commented on pull request #927: PARQUET-2084: Upgrade Thrift to 0.14.2

2021-09-14 Thread GitBox
dongjoon-hyun commented on pull request #927: URL: https://github.com/apache/parquet-mr/pull/927#issuecomment-919269167 Thank you, @sunchao and @gszadovszky . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[jira] [Commented] (PARQUET-2081) Encryption translation tool - Parquet-hadoop

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414989#comment-17414989 ] ASF GitHub Bot commented on PARQUET-2081: - shangxinli commented on pull request #928: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #928: PARQUET-2081: Encryption translation tool - Parquet-hadoop

2021-09-14 Thread GitBox
shangxinli commented on pull request #928: URL: https://github.com/apache/parquet-mr/pull/928#issuecomment-919239142 @ggershinsky @gszadovszky Can you have a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[jira] [Updated] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-2092: Labels: pull-request-available (was: ) > [Go] Fix in go implementation >

[jira] [Commented] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Matthew Topol (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414972#comment-17414972 ] Matthew Topol commented on PARQUET-2092: [~emkornfield] Can we create a component for

[jira] [Created] (PARQUET-2092) [Go] Fix in go implementation

2021-09-14 Thread Matthew Topol (Jira)
Matthew Topol created PARQUET-2092: -- Summary: [Go] Fix in go implementation Key: PARQUET-2092 URL: https://issues.apache.org/jira/browse/PARQUET-2092 Project: Parquet Issue Type: Sub-task

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414964#comment-17414964 ] ASF GitHub Bot commented on PARQUET-1968: - gszadovszky commented on a change in pull request

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #923: [PARQUET-1968] FilterApi support In predicate

2021-09-14 Thread GitBox
gszadovszky commented on a change in pull request #923: URL: https://github.com/apache/parquet-mr/pull/923#discussion_r708260743 ## File path: parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java ## @@ -293,8 +290,48 @@ boolean

[jira] [Comment Edited] (PARQUET-2088) Different created_by field values for application and library

2021-09-14 Thread Joshua Howard (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414923#comment-17414923 ] Joshua Howard edited comment on PARQUET-2088 at 9/14/21, 2:03 PM: --

[jira] [Commented] (PARQUET-2089) [C++] RowGroupMetaData file_offset set incorrectly

2021-09-14 Thread Matthew Topol (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414926#comment-17414926 ] Matthew Topol commented on PARQUET-2089: [~emkornfield] Should I create a separate card or

[jira] [Commented] (PARQUET-2088) Different created_by field values for application and library

2021-09-14 Thread Joshua Howard (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414923#comment-17414923 ] Joshua Howard commented on PARQUET-2088: That is exactly the pain point. Is it the convention

Re: Concatenation of parquet files

2021-09-14 Thread Pau Tallada
Dear Gabor, Thanks a lot for the clarification! ☺ I understand this is not a common use case; I just hoped it could be done easily :P If you are interested, I attach a Colab notebook that shows this behaviour. The same data written three times produces different binary contents.

[jira] [Commented] (PARQUET-2091) Fix release build error introduced by PARQUET-2043

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414886#comment-17414886 ] Gabor Szadovszky commented on PARQUET-2091: --- [~sha...@uber.com], do you have issues with

[jira] [Resolved] (PARQUET-2084) Upgrade Thrift to 0.14.2

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2084. --- Resolution: Fixed > Upgrade Thrift to 0.14.2 > > >

[jira] [Resolved] (PARQUET-2083) Expose getFieldPath from ColumnIO

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2083. --- Resolution: Fixed > Expose getFieldPath from ColumnIO >

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Gabor Szadovszky
Thanks for the new RC, Xinli. The content seems correct to me. The checksum and signature are correct. Unit tests pass. My vote is +1 (binding) On Mon, Sep 13, 2021 at 8:11 PM Xinli shang wrote: > Hi everyone, > > > I propose the following RC to be released as the official Apache Parquet > 1.12.1

[jira] [Commented] (PARQUET-2084) Upgrade Thrift to 0.14.2

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414844#comment-17414844 ] ASF GitHub Bot commented on PARQUET-2084: - gszadovszky merged pull request #927: URL:

[GitHub] [parquet-mr] gszadovszky merged pull request #927: PARQUET-2084: Upgrade Thrift to 0.14.2

2021-09-14 Thread GitBox
gszadovszky merged pull request #927: URL: https://github.com/apache/parquet-mr/pull/927 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Commented] (PARQUET-2083) Expose getFieldPath from ColumnIO

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414842#comment-17414842 ] ASF GitHub Bot commented on PARQUET-2083: - gszadovszky merged pull request #926: URL:

[GitHub] [parquet-mr] gszadovszky merged pull request #926: PARQUET-2083: Expose getFieldPath from ColumnIO

2021-09-14 Thread GitBox
gszadovszky merged pull request #926: URL: https://github.com/apache/parquet-mr/pull/926 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Commented] (PARQUET-2088) Different created_by field values for application and library

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414829#comment-17414829 ] Gabor Szadovszky commented on PARQUET-2088: --- Ah, I see. So, that code part is not about a

[jira] [Commented] (PARQUET-2085) Formatting is broken for description of BIT_PACKED

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414823#comment-17414823 ] Gabor Szadovszky commented on PARQUET-2085: --- [~alexott], I got it now. You are talking about

Re: Concatenation of parquet files

2021-09-14 Thread Gabor Szadovszky
Hi Pau, I guess attachments are not allowed on the Apache lists, so we cannot see the image. If the two row groups contain the very same data in the same order, encoded with the same encoding and compressed with the same codec, I think they should be binary-identical. I am not sure why you have

Concatenation of parquet files

2021-09-14 Thread Pau Tallada
Hi, I am a developer of cosmohub.pic.es, a science platform that provides interactive analysis and exploration of large scientific datasets. Working with Hive, users are able to generate the subset of data they are interested in, and this result set is stored as a set of files. When users want to

[jira] [Updated] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-09-14 Thread Gidon Gershinsky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gidon Gershinsky updated PARQUET-2080: -- Description: Due to PARQUET-2078 RowGroup.file_offset is not reliable. This field

[jira] [Assigned] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-09-14 Thread Gidon Gershinsky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gidon Gershinsky reassigned PARQUET-2080: - Assignee: Gidon Gershinsky (was: Gabor Szadovszky) > Deprecate

[jira] [Updated] (PARQUET-2089) [C++] RowGroupMetaData file_offset set incorrectly

2021-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-2089: Labels: pull-request-available (was: ) > [C++] RowGroupMetaData file_offset set