[jira] [Assigned] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2241: Assignee: Gang Wu > ByteStreamSplitDecoder broken in presence of nulls >

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687342#comment-17687342 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on code in PR #1023: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread via GitHub
yabola commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102921044 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java: ## @@ -289,8 +320,14 @@ public > Boolean visit(Lt lt) { T

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687341#comment-17687341 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on code in PR #1023: URL:

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687340#comment-17687340 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on code in PR #1023: URL:

[jira] [Commented] (PARQUET-2229) ParquetRewriter supports masking and encrypting the same column

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687306#comment-17687306 ] ASF GitHub Bot commented on PARQUET-2229: - shangxinli merged PR #1021: URL:

[GitHub] [parquet-mr] shangxinli merged pull request #1021: PARQUET-2229: ParquetRewriter masks and encrypts the same column

2023-02-10 Thread via GitHub
shangxinli merged PR #1021: URL: https://github.com/apache/parquet-mr/pull/1021 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687304#comment-17687304 ] ASF GitHub Bot commented on PARQUET-2241: - shangxinli merged PR #192: URL:

[GitHub] [parquet-format] shangxinli merged pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

2023-02-10 Thread via GitHub
shangxinli merged PR #192: URL: https://github.com/apache/parquet-format/pull/192

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687228#comment-17687228 ] ASF GitHub Bot commented on PARQUET-2241: - emkornfield commented on PR #192: URL:

[GitHub] [parquet-format] emkornfield commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

2023-02-10 Thread via GitHub
emkornfield commented on PR #192: URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1426163629 Seems OK to me.

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687174#comment-17687174 ] ASF GitHub Bot commented on PARQUET-2241: - mapleFU commented on PR #192: URL:

[GitHub] [parquet-format] mapleFU commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

2023-02-10 Thread via GitHub
mapleFU commented on PR #192: URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1426068317 I think we should check that no implementation adds padding. At least C++, Rust, and parquet-mr do not seem to pad the end of the data.

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687131#comment-17687131 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on code in PR #1023: URL:

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687119#comment-17687119 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on code in PR #1023: URL:

[GitHub] [parquet-mr] yabola commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread via GitHub
yabola commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102881433 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687115#comment-17687115 ] ASF GitHub Bot commented on PARQUET-2237: - yabola commented on PR #1023: URL:

[GitHub] [parquet-mr] yabola commented on pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-10 Thread via GitHub
yabola commented on PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1425942666 @wgtmac Sorry, the `Boolean` type has to be used here so that we can distinguish between `BLOCK_MIGHT_MATCH` and `BLOCK_MUST_MATCH`. Here is an example: ``` Boolean b1 = new Boolean(true);

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687113#comment-17687113 ] ASF GitHub Bot commented on PARQUET-2241: - pitrou commented on PR #192: URL:

[GitHub] [parquet-format] pitrou commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

2023-02-10 Thread via GitHub
pitrou commented on PR #192: URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1425935594 cc @wjones127

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687109#comment-17687109 ] ASF GitHub Bot commented on PARQUET-2241: - wgtmac commented on PR #192: URL:

[GitHub] [parquet-format] wgtmac commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

2023-02-10 Thread via GitHub
wgtmac commented on PR #192: URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1425920012 cc @shangxinli @gszadovszky @ggershinsky @pitrou @emkornfield

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687106#comment-17687106 ] ASF GitHub Bot commented on PARQUET-2241: - wgtmac opened a new pull request, #192: URL:

[GitHub] [parquet-format] wgtmac opened a new pull request, #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

2023-02-10 Thread via GitHub
wgtmac opened a new pull request, #192: URL: https://github.com/apache/parquet-format/pull/192 This PR proposes to explicitly state that no padding is allowed within a data page. This makes it easier for a BYTE_STREAM_SPLIT decoder to decode a page with nulls. In this way, it can simply get the number
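
To illustrate why the no-padding wording in the proposal above helps, here is a minimal Java sketch of BYTE_STREAM_SPLIT for 4-byte FLOAT values. It is not the parquet-mr implementation: the class `ByteStreamSplitSketch` and its `encode`/`decode` methods are hypothetical names made up for this example. It assumes that only non-null values are encoded and that the page header's value count may also include nulls, so the decoder derives the number of encoded values from the byte length instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not the parquet-mr reader or writer.
public class ByteStreamSplitSketch {
    static final int WIDTH = 4; // bytes per FLOAT value

    // Byte k of every value goes to stream k; the WIDTH streams are concatenated.
    public static byte[] encode(float[] values) {
        int n = values.length;
        byte[] out = new byte[n * WIDTH];
        ByteBuffer buf = ByteBuffer.allocate(WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        for (int j = 0; j < n; j++) {
            buf.putFloat(0, values[j]);
            for (int k = 0; k < WIDTH; k++) {
                out[k * n + j] = buf.get(k);
            }
        }
        return out;
    }

    // With no padding allowed, the encoded value count is simply length / WIDTH,
    // regardless of how many nulls the page contains.
    public static float[] decode(byte[] page) {
        if (page.length % WIDTH != 0) {
            throw new IllegalArgumentException("length is not a multiple of " + WIDTH);
        }
        int n = page.length / WIDTH;
        float[] values = new float[n];
        ByteBuffer buf = ByteBuffer.allocate(WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < WIDTH; k++) {
                buf.put(k, page[k * n + j]);
            }
            values[j] = buf.getFloat(0);
        }
        return values;
    }

    public static void main(String[] args) {
        // Imagine a page of 5 rows with 2 nulls: only 3 values are encoded.
        float[] nonNull = {1.5f, -2.0f, 3.25f};
        byte[] page = encode(nonNull);
        float[] back = decode(page);
        System.out.println(page.length + " bytes -> " + back.length + " values");
    }
}
```

If the decoder instead used a row count of 5 as the stream length here, each byte stream would be read from the wrong offset, which is one plausible way a reader could break on pages with nulls.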

[jira] [Updated] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2241: - Fix Version/s: (was: format-2.10.0) > ByteStreamSplitDecoder broken in presence of nulls >

[jira] [Updated] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2241: - Component/s: parquet-mr > ByteStreamSplitDecoder broken in presence of nulls >

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686941#comment-17686941 ] Gang Wu commented on PARQUET-2241: -- Have you seen any relevant issue in production? [~gershinsky]

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-10 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686938#comment-17686938 ] Gang Wu commented on PARQUET-2241: -- It seems that the *ByteStreamSplitValuesReader* in the parquet-mr