[jira] [Commented] (PARQUET-2202) Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697670#comment-17697670 ] ASF GitHub Bot commented on PARQUET-2202: wgtmac commented on code in PR #1035

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1035: PARQUET-2202: Review usage and implementation of Preconditions.checkargument method

2023-03-07 Thread via GitHub
wgtmac commented on code in PR #1035: URL: https://github.com/apache/parquet-mr/pull/1035#discussion_r1128836441 ## parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java: ## @@ -477,15 +477,15 @@ public Builder withMaxBloomFilterBytes(int maxBloomFilterB

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-03-07 Thread via GitHub
wgtmac commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1128833713 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +189,69 @@ public ParquetRewriter(TransParquetFileReader reader,

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697669#comment-17697669 ] ASF GitHub Bot commented on PARQUET-2228: wgtmac commented on code in PR #1026

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697647#comment-17697647 ] ASF GitHub Bot commented on PARQUET-2228: vectorijk commented on code in PR #1

[GitHub] [parquet-mr] vectorijk commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-03-07 Thread via GitHub
vectorijk commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1128712362 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +189,69 @@ public ParquetRewriter(TransParquetFileReader read

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697646#comment-17697646 ] ASF GitHub Bot commented on PARQUET-2228: vectorijk commented on code in PR #1

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697515#comment-17697515 ] ASF GitHub Bot commented on PARQUET-2237: yabola commented on PR #1023: URL: h

[GitHub] [parquet-mr] yabola commented on pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-03-07 Thread via GitHub
yabola commented on PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1458436214 > Thanks @yabola for coming up with this idea. Let's continue the discussion about the BloomFilter building idea in the jira. > > Meanwhile, I've been thinking about the actual pr

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697510#comment-17697510 ] Gabor Szadovszky commented on PARQUET-2254: 1) I think, for creating bloom f

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697463#comment-17697463 ] Gang Wu commented on PARQUET-2254: Here are two questions: 1) creating bloom filters

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697455#comment-17697455 ] ASF GitHub Bot commented on PARQUET-2237: wgtmac commented on PR #1023: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-03-07 Thread via GitHub
wgtmac commented on PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1458274839 > Thanks @yabola for coming up with this idea. Let's continue the discussion about the BloomFilter building idea in the jira. > > Meanwhile, I've been thinking about the actual pr

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697301#comment-17697301 ] Gabor Szadovszky commented on PARQUET-2254: I think this is a good idea. Mea

[jira] [Assigned] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2254: Assignee: Mars > Build a BloomFilter with a more precise size

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697285#comment-17697285 ] ASF GitHub Bot commented on PARQUET-2237: gszadovszky commented on PR #1023: U

[GitHub] [parquet-mr] gszadovszky commented on pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-03-07 Thread via GitHub
gszadovszky commented on PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1457722446 Thanks @yabola for coming up with this idea. Let's continue the discussion about the BloomFilter building idea in the jira. Meanwhile, I've been thinking about the actual pro