[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689557#comment-17689557 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028:

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432596607 I didn't think about comparisons with non-null values before submitting this PR. I don't know whether any downstream project relies on Parquet evaluating `value <> null` as TRUE instead
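
The two readings of `value <> null` mentioned in this comment can be illustrated with a small sketch. It contrasts SQL's three-valued logic, where a comparison involving null is UNKNOWN and the row is dropped, with the Java-style semantics that the `FilterApi.notEq` Javadoc describes, where a null value counts as "not equal" to a non-null literal. The class and method names below are hypothetical and are not part of parquet-mr; this only illustrates the two semantics and is not the library's code.

```java
import java.util.Objects;

// Hypothetical illustration only; none of these names exist in parquet-mr.
final class NotEqSemantics {

    // SQL three-valued logic: `value <> literal` is UNKNOWN (modelled as null)
    // whenever either operand is null.
    static Boolean sqlNotEq(String value, String literal) {
        if (value == null || literal == null) {
            return null;
        }
        return !value.equals(literal);
    }

    // A SQL WHERE clause keeps a row only when the predicate is TRUE.
    static boolean sqlKeepsRow(String value, String literal) {
        return Boolean.TRUE.equals(sqlNotEq(value, literal));
    }

    // Java-style semantics: null is an ordinary value that differs
    // from every non-null value.
    static boolean javaStyleKeepsRow(String value, String literal) {
        return !Objects.equals(value, literal);
    }

    public static void main(String[] args) {
        String[] column = {"A", "B", null};
        for (String value : column) {
            System.out.println(value + ": sql=" + sqlKeepsRow(value, "A")
                    + ", javaStyle=" + javaStyleKeepsRow(value, "A"));
        }
    }
}
```

The two interpretations agree on every non-null value and differ only on the null row, which is exactly the case a downstream engine would have to re-check if it expects SQL behaviour.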

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689553#comment-17689553 ] ASF GitHub Bot commented on PARQUET-2237: - wgtmac commented on code in PR #1023

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1108046928 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (A

[jira] [Commented] (PARQUET-2237) Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689552#comment-17689552 ] ASF GitHub Bot commented on PARQUET-2237: - wgtmac commented on code in PR #1023

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1023: PARQUET-2237 Improve performance when filters in RowGroupFilter can match exactly

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102205460 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (AS

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689546#comment-17689546 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028:

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432572977 I haven't encountered any trouble caused by this situation in practice. I found it while reading the code: when evaluating `notIn`, the dictionary filter returns `BLOCK_MIGHT_M
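
To make the issue title concrete, here is a minimal sketch of how a dictionary-based `notIn` check could skip a row group incorrectly. It relies on the fact that Parquet does not store nulls in the dictionary, and it assumes that a null row counts as matching `notIn` (the Java-style semantics). The names `canDropNaive`, `canDropNullAware`, and the `hasNulls` flag are hypothetical and are not taken from `DictionaryFilter`; the actual change in PR #1028 may look different.

```java
import java.util.Set;

// Hypothetical illustration only; not the parquet-mr DictionaryFilter code.
final class NotInDictionarySketch {

    // Naive check: drop the row group when every dictionary value is in the
    // notIn set, i.e. no non-null value can satisfy the predicate.
    static boolean canDropNaive(Set<String> dictionary, Set<String> notInValues) {
        return notInValues.containsAll(dictionary);
    }

    // Null-aware check: nulls are not in the dictionary, so if the column may
    // contain nulls (assumed here to match notIn) the row group must be kept.
    static boolean canDropNullAware(Set<String> dictionary,
                                    Set<String> notInValues,
                                    boolean hasNulls) {
        if (hasNulls) {
            return false;
        }
        return notInValues.containsAll(dictionary);
    }

    public static void main(String[] args) {
        // Mirrors the ("A", "A", null) column used in the Spark test in this thread.
        Set<String> dictionary = Set.of("A");
        Set<String> notInValues = Set.of("A");

        System.out.println("naive drop:      " + canDropNaive(dictionary, notInValues));
        System.out.println("null-aware drop: " + canDropNullAware(dictionary, notInValues, true));
    }
}
```

With the sample column, the naive check drops the row group even though the null row would still match under the assumed semantics; the null-aware variant keeps it.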

[jira] [Commented] (PARQUET-2245) Improve dictionary filter evaluating notEq

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689543#comment-17689543 ] ASF GitHub Bot commented on PARQUET-2245: - wgtmac commented on code in PR #1029

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1029: PARQUET-2245: Improve dictionary filter evaluating notEq

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1029: URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java: ## @@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689522#comment-17689522 ] ASF GitHub Bot commented on PARQUET-2244: - wgtmac commented on PR #1028: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
wgtmac commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432527608 > I did a quick test using Spark > > ``` > Seq("A", "A", null).toDF("column").repartition(1).write.mode("overwrite").parquet("t") > spark.read.parquet("t

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689514#comment-17689514 ] ASF GitHub Bot commented on PARQUET-2243: - wgtmac commented on code in PR #1027

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1027: URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1108004026 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java: ## @@ -66,7 +66,8 @@ import org.slf4j.Logger; import org.slf4j.LoggerF

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689493#comment-17689493 ] ASF GitHub Bot commented on PARQUET-2244: - huaxingao commented on PR #1028: URL

[GitHub] [parquet-mr] huaxingao commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
huaxingao commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432481988 I did a quick test using Spark ``` Seq("A", "A", null).toDF("column").repartition(1).write.mode("overwrite").parquet("t") spark.read.parquet("t").wher

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689138#comment-17689138 ] ASF GitHub Bot commented on PARQUET-2228: - wgtmac commented on PR #1026: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431423625 > > You're right. We might add an option to force rewriting the input files record by record so row groups are regenerated by the writer. Does that sound good? @gszadovszky > > I
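
The trade-off under discussion can be sketched with plain row counts. Copying row groups as they are keeps one output row group per input row group, while rewriting record by record lets the writer cut new row groups at its own size. The method names and the row-count threshold `targetRows` are hypothetical simplifications (a real writer sizes row groups by bytes, not rows) and are not the `ParquetRewriter` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the ParquetRewriter implementation.
final class RowGroupMergeSketch {

    // Each inner list holds the row counts of one input file's row groups.
    // Copying row groups verbatim preserves every one of them in the output.
    static List<Long> mergeByCopying(List<List<Long>> inputFiles) {
        List<Long> output = new ArrayList<>();
        for (List<Long> file : inputFiles) {
            output.addAll(file);
        }
        return output;
    }

    // Rewriting record by record lets the writer form new row groups,
    // modelled here as at most `targetRows` rows per row group.
    static List<Long> mergeByRewritingRecords(List<List<Long>> inputFiles, long targetRows) {
        if (targetRows <= 0) {
            throw new IllegalArgumentException("targetRows must be positive");
        }
        long remaining = 0;
        for (List<Long> file : inputFiles) {
            for (long rows : file) {
                remaining += rows;
            }
        }
        List<Long> output = new ArrayList<>();
        while (remaining > 0) {
            long rows = Math.min(remaining, targetRows);
            output.add(rows);
            remaining -= rows;
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<Long>> files = List.of(List.of(100L), List.of(50L, 50L), List.of(200L));
        System.out.println("copied row groups:    " + mergeByCopying(files).size());
        System.out.println("rewritten row groups: " + mergeByRewritingRecords(files, 1000L).size());
    }
}
```

In this model, merging many small files by copying leaves just as many small row groups, which is why a record-level rewrite option is being proposed.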

[jira] [Commented] (PARQUET-1950) Define core features / compliance level

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689124#comment-17689124 ] ASF GitHub Bot commented on PARQUET-1950: - gszadovszky commented on PR #164: UR

[GitHub] [parquet-format] gszadovszky commented on pull request #164: PARQUET-1950: Define core features

2023-02-15 Thread via GitHub
gszadovszky commented on PR #164: URL: https://github.com/apache/parquet-format/pull/164#issuecomment-1431376526 I don't know of any other implementations either. Since `parquet-format` is managed by this community I would expect the "implementors" to listen to the dev mailing list at least. I beli

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689118#comment-17689118 ] ASF GitHub Bot commented on PARQUET-2228: - gszadovszky commented on PR #1026: U

[GitHub] [parquet-mr] gszadovszky commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
gszadovszky commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431359196 > You're right. We might add an option to force rewriting the input files record by record so row groups are regenerated by the writer. Does that sound good? @gszadovszky It

[jira] [Commented] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689115#comment-17689115 ] ASF GitHub Bot commented on PARQUET-2159: - gszadovszky commented on code in PR

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1011: PARQUET-2159: java17 vector parquet bit-packing decode optimization

2023-02-15 Thread via GitHub
gszadovszky commented on code in PR #1011: URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1107005376 ## parquet-generator/src/main/java/org/apache/parquet/encoding/vectorbitpacking/BitPackingGenerator512Vector.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Ap

[jira] [Commented] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689114#comment-17689114 ] ASF GitHub Bot commented on PARQUET-2246: - zhongyujiang opened a new pull reque

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1030: PARQUET-2246: Add short circuit logic to column index filter

2023-02-15 Thread via GitHub
zhongyujiang opened a new pull request, #1030: URL: https://github.com/apache/parquet-mr/pull/1030 Jira: [PARQUET-2246](https://issues.apache.org/jira/browse/PARQUET-2246) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[jira] [Created] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-15 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2246: -- Summary: Add short circuit logic to column index filter Key: PARQUET-2246 URL: https://issues.apache.org/jira/browse/PARQUET-2246 Project: Parquet Issue

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689103#comment-17689103 ] ASF GitHub Bot commented on PARQUET-2243: - gszadovszky commented on code in PR

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
gszadovszky commented on code in PR #1027: URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1107084792 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java: ## @@ -66,7 +66,8 @@ import org.slf4j.Logger; import org.slf4j.Lo

[jira] [Commented] (PARQUET-2245) Improve dictionary filter evaluating notEq

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689088#comment-17689088 ] ASF GitHub Bot commented on PARQUET-2245: - zhongyujiang opened a new pull reque

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1029: PARQUET-2245: Improve dictionary filter evaluating notEq

2023-02-15 Thread via GitHub
zhongyujiang opened a new pull request, #1029: URL: https://github.com/apache/parquet-mr/pull/1029 JIRA: [PARQUET-2245](https://issues.apache.org/jira/browse/PARQUET-2245) This is a minor improvement for evaluating `notEq`. When evaluating `notEq`, if the column may contain nulls and

[jira] [Created] (PARQUET-2245) Improve dictionary filter evaluating notEq

2023-02-15 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2245: -- Summary: Improve dictionary filter evaluating notEq Key: PARQUET-2245 URL: https://issues.apache.org/jira/browse/PARQUET-2245 Project: Parquet Issue Type

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689062#comment-17689062 ] ASF GitHub Bot commented on PARQUET-2243: - wgtmac commented on code in PR #1027

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
wgtmac commented on code in PR #1027: URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1107006966 ## parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java: ## @@ -66,7 +66,8 @@ import org.slf4j.Logger; import org.slf4j.LoggerF

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689050#comment-17689050 ] ASF GitHub Bot commented on PARQUET-2228: - wgtmac commented on PR #1026: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431185993 > > @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the "sm

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689046#comment-17689046 ] ASF GitHub Bot commented on PARQUET-2228: - wgtmac commented on PR #1026: URL: h

[GitHub] [parquet-mr] wgtmac commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
wgtmac commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431182101 > @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the "smal

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689045#comment-17689045 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028:

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431180145 @gszadovszky Thanks for reviewing and the quick merge!

[jira] [Assigned] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2244: - Assignee: Yujiang Zhong > Dictionary filter may skip row-groups incorrectly wh

[jira] [Resolved] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2244. --- Resolution: Fixed > Dictionary filter may skip row-groups incorrectly when evaluati

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689035#comment-17689035 ] ASF GitHub Bot commented on PARQUET-2244: - gszadovszky merged PR #1028: URL: ht

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689034#comment-17689034 ] ASF GitHub Bot commented on PARQUET-2244: - gszadovszky commented on PR #1028: U

[GitHub] [parquet-mr] gszadovszky merged pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
gszadovszky merged PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028

[GitHub] [parquet-mr] gszadovszky commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
gszadovszky commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431153336 @shangxinli, this might require a backport and releases on the branches where `In` and `NotIn` were released.

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689011#comment-17689011 ] ASF GitHub Bot commented on PARQUET-2228: - gszadovszky commented on PR #1026: U

[GitHub] [parquet-mr] gszadovszky commented on pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
gszadovszky commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431102177 @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the "s

[jira] [Commented] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689007#comment-17689007 ] ASF GitHub Bot commented on PARQUET-2228: - gszadovszky commented on code in PR

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1026: PARQUET-2228: ParquetRewriter supports more than one input file

2023-02-15 Thread via GitHub
gszadovszky commented on code in PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1106931407 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java: ## @@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader re

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689002#comment-17689002 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang commented on PR #1028:

[GitHub] [parquet-mr] zhongyujiang commented on pull request #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang commented on PR #1028: URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431067528 @huaxingao @gszadovszky Can you help review this? Thanks!

[jira] [Commented] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688999#comment-17688999 ] ASF GitHub Bot commented on PARQUET-2244: - zhongyujiang opened a new pull reque

[GitHub] [parquet-mr] zhongyujiang opened a new pull request, #1028: PARQUET-2244: Fix notIn for columns with null values

2023-02-15 Thread via GitHub
zhongyujiang opened a new pull request, #1028: URL: https://github.com/apache/parquet-mr/pull/1028 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references th

[jira] [Updated] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Yujiang Zhong (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yujiang Zhong updated PARQUET-2244: --- Description: Dictionary filter may skip row-groups incorrectly when evaluating `notIn` on

[jira] [Created] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Yujiang Zhong (Jira)
Yujiang Zhong created PARQUET-2244: -- Summary: Dictionary filter may skip row-groups incorrectly when evaluating notIn Key: PARQUET-2244 URL: https://issues.apache.org/jira/browse/PARQUET-2244 Project

[jira] [Commented] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688932#comment-17688932 ] ASF GitHub Bot commented on PARQUET-2243: - gszadovszky opened a new pull reques

[GitHub] [parquet-mr] gszadovszky opened a new pull request, #1027: PARQUET-2243: Support zstd-jni in DirectCodecFactory

2023-02-15 Thread via GitHub
gszadovszky opened a new pull request, #1027: URL: https://github.com/apache/parquet-mr/pull/1027 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references the