[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691413#comment-17691413 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on PR #1023: URL:

[GitHub] [parquet-mr] yabola commented on pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-20 Thread via GitHub
yabola commented on PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1437950981 @gszadovszky @shangxinli If you have time, please also take a look, thanks~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[jira] [Commented] (PARQUET-2164) CapacityByteArrayOutputStream overflow while writing causes negative row group sizes to be written

2023-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691348#comment-17691348 ] ASF GitHub Bot commented on PARQUET-2164: - wgtmac commented on code in PR #1032: URL:

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1032: PARQUET-2164: Check size of buffered data to prevent page data from overflowing

2023-02-20 Thread via GitHub
wgtmac commented on code in PR #1032: URL: https://github.com/apache/parquet-mr/pull/1032#discussion_r1112489810 ## parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java: ## @@ -164,6 +164,12 @@ public CapacityByteArrayOutputStream(int

[jira] [Commented] (PARQUET-2164) CapacityByteArrayOutputStream overflow while writing causes negative row group sizes to be written

2023-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691347#comment-17691347 ] ASF GitHub Bot commented on PARQUET-2164: - cxzl25 commented on PR #1032: URL:

[GitHub] [parquet-mr] cxzl25 commented on pull request #1032: PARQUET-2164: Check size of buffered data to prevent page data from overflowing

2023-02-20 Thread via GitHub
cxzl25 commented on PR #1032: URL: https://github.com/apache/parquet-mr/pull/1032#issuecomment-1437794593 Is it also similar to #1031? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[jira] [Commented] (PARQUET-2164) CapacityByteArrayOutputStream overflow while writing causes negative row group sizes to be written

2023-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691281#comment-17691281 ] ASF GitHub Bot commented on PARQUET-2164: - parthchandra opened a new pull request, #1032: URL:

[GitHub] [parquet-mr] parthchandra opened a new pull request, #1032: PARQUET-2164: Check size of buffered data to prevent page data from overflowing

2023-02-20 Thread via GitHub
parthchandra opened a new pull request, #1032: URL: https://github.com/apache/parquet-mr/pull/1032 This PR addresses the following [PARQUET-2164](https://issues.apache.org/jira/browse/PARQUET-2164) The configuration parameters ``` parquet.page.size.check.estimate=false

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691259#comment-17691259 ] ASF GitHub Bot commented on PARQUET-2228: - shangxinli commented on PR #1026: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-20 Thread via GitHub
shangxinli commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1437355639 Let me know if you have more feedback @ggershinsky @gszadovszky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Xuwei Fu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691257#comment-17691257 ] Xuwei Fu commented on PARQUET-2249: --- I guess maybe we can take a look at: #

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Xuwei Fu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691256#comment-17691256 ] Xuwei Fu commented on PARQUET-2249: --- I guess NaN is not always larger than all values. # Postgres,

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691199#comment-17691199 ] Gang Wu commented on PARQUET-2249: -- As of today, there are many different parquet implementations

[jira] [Comment Edited] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691184#comment-17691184 ] Jan Finis edited comment on PARQUET-2249 at 2/20/23 1:50 PM: - I would be

[jira] [Comment Edited] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691184#comment-17691184 ] Jan Finis edited comment on PARQUET-2249 at 2/20/23 1:48 PM: - I would be

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691184#comment-17691184 ] Jan Finis commented on PARQUET-2249: I would be willing to suggest a fix for this, but I'm not part

[jira] [Comment Edited] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Xuwei Fu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691179#comment-17691179 ] Xuwei Fu edited comment on PARQUET-2249 at 2/20/23 1:30 PM: The problem

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Xuwei Fu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691179#comment-17691179 ] Xuwei Fu commented on PARQUET-2249: --- Seems that iceberg provides NaN counts. And min-max is

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691126#comment-17691126 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on code in PR #1023: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-20 Thread via GitHub
yabola commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1109441802 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Comment Edited] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691108#comment-17691108 ] Jan Finis edited comment on PARQUET-2249 at 2/20/23 9:55 AM: - [~wgtmac]

[jira] [Commented] (PARQUET-2249) Parquet spec (parquet.thrift) is inconsistent w.r.t. ColumnIndex + NaNs

2023-02-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691108#comment-17691108 ] Jan Finis commented on PARQUET-2249: [~wgtmac] True, not writing a column index in this case is