[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689557#comment-17689557
]
ASF GitHub Bot commented on PARQUET-2244:
-
zhongyujiang commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432596607
I didn't think about comparisons with non-null values before submitting this
PR. I don't know whether any downstream project relies on Parquet evaluating
`value <> null` as TRUE instead
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689553#comment-17689553
]
ASF GitHub Bot commented on PARQUET-2237:
-
wgtmac commented on code in PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1108046928
##
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (A
[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689552#comment-17689552
]
ASF GitHub Bot commented on PARQUET-2237:
-
wgtmac commented on code in PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102205460
##
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java:
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689546#comment-17689546
]
ASF GitHub Bot commented on PARQUET-2244:
-
zhongyujiang commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432572977
I haven't encountered any trouble caused by this situation in practice. I
found it while reading the code: when evaluating `notIn`, the dictionary
filter returns `BLOCK_MIGHT_M
[
https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689543#comment-17689543
]
ASF GitHub Bot commented on PARQUET-2245:
-
wgtmac commented on code in PR #1029:
URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382
##
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##
@@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689522#comment-17689522
]
ASF GitHub Bot commented on PARQUET-2244:
-
wgtmac commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432527608
> I did a quick test using Spark
>
> ```
> Seq("A", "A",
null).toDF("column").repartition(1).write.mode("overwrite").parquet("t")
> spark.read.parquet("t
[
https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689514#comment-17689514
]
ASF GitHub Bot commented on PARQUET-2243:
-
wgtmac commented on code in PR #1027:
URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1108004026
##
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java:
##
@@ -66,7 +66,8 @@
import org.slf4j.Logger;
import org.slf4j.LoggerF
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689493#comment-17689493
]
ASF GitHub Bot commented on PARQUET-2244:
-
huaxingao commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1432481988
I did a quick test using Spark
```
Seq("A", "A", null).toDF("column").repartition(1).write.mode("overwrite").parquet("t")
spark.read.parquet("t").wher
[
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689138#comment-17689138
]
ASF GitHub Bot commented on PARQUET-2228:
-
wgtmac commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431423625
> > You're right. We might add an option to force rewriting the input files
record by record so row groups are regenerated by the writer. Does that sound
good? @gszadovszky
>
> I
[
https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689124#comment-17689124
]
ASF GitHub Bot commented on PARQUET-1950:
-
gszadovszky commented on PR #164:
URL: https://github.com/apache/parquet-format/pull/164#issuecomment-1431376526
I don't know of other implementations either. Since `parquet-format` is managed
by this community I would expect the "implementors" to listen to the dev
mailing list at least. I beli
[
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689118#comment-17689118
]
ASF GitHub Bot commented on PARQUET-2228:
-
gszadovszky commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431359196
> You're right. We might add an option to force rewriting the input files
record by record so row groups are regenerated by the writer. Does that sound
good? @gszadovszky
It
[
https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689115#comment-17689115
]
ASF GitHub Bot commented on PARQUET-2159:
-
gszadovszky commented on code in PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1107005376
##
parquet-generator/src/main/java/org/apache/parquet/encoding/vectorbitpacking/BitPackingGenerator512Vector.java:
##
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Ap
[
https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689114#comment-17689114
]
ASF GitHub Bot commented on PARQUET-2246:
-
zhongyujiang opened a new pull request, #1030:
URL: https://github.com/apache/parquet-mr/pull/1030
Jira: [PARQUET-2246](https://issues.apache.org/jira/browse/PARQUET-2246)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and us
Yujiang Zhong created PARQUET-2246:
--
Summary: Add short circuit logic to column index filter
Key: PARQUET-2246
URL: https://issues.apache.org/jira/browse/PARQUET-2246
Project: Parquet
Issue
[
https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689103#comment-17689103
]
ASF GitHub Bot commented on PARQUET-2243:
-
gszadovszky commented on code in PR #1027:
URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1107084792
##
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java:
##
@@ -66,7 +66,8 @@
import org.slf4j.Logger;
import org.slf4j.Lo
[
https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689088#comment-17689088
]
ASF GitHub Bot commented on PARQUET-2245:
-
zhongyujiang opened a new pull request, #1029:
URL: https://github.com/apache/parquet-mr/pull/1029
JIRA: [PARQUET-2245](https://issues.apache.org/jira/browse/PARQUET-2245)
This is a minor improvement for evaluating `notEq`. When evaluating `notEq`,
if the column may contain nulls and
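For readers following the `notEq` and `notIn` threads, here is a minimal sketch of why nulls matter to a dictionary-based check. It is not the code from PR #1029 or PR #1028: the class `DictionaryNotEqSketch` and the method `canDrop` are hypothetical names, and the sketch assumes that a null row satisfies `notEq`/`notIn` against non-null values and that nulls are never stored in the dictionary.

```java
import java.util.Set;

public class DictionaryNotEqSketch {

    // Hypothetical helper for illustration only; not the parquet-mr DictionaryFilter API.
    // Returns true when a row group could be skipped for `column NOT IN (excluded)`.
    // `notEq(column, v)` is the special case where `excluded` holds the single value v.
    public static boolean canDrop(Set<String> dictionary,
                                  boolean mayContainNulls,
                                  Set<String> excluded) {
        // Assumption: null rows are absent from the dictionary but still match
        // the predicate, so the row group has to be kept whenever nulls are possible.
        if (mayContainNulls) {
            return false;
        }
        // Without nulls, the row group is droppable only if every dictionary
        // value is one of the excluded values.
        return excluded.containsAll(dictionary);
    }

    public static void main(String[] args) {
        // Same shape as the Spark test data quoted in this thread: "A", "A", null.
        System.out.println(canDrop(Set.of("A"), true, Set.of("A")));   // prints false
        // The same dictionary with no nulls in the column.
        System.out.println(canDrop(Set.of("A"), false, Set.of("A")));  // prints true
    }
}
```

Checking for nulls first keeps the sketch conservative: it can only fail to skip a row group, never skip one that holds matching rows.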
Yujiang Zhong created PARQUET-2245:
--
Summary: Improve dictionary filter evaluating notEq
Key: PARQUET-2245
URL: https://issues.apache.org/jira/browse/PARQUET-2245
Project: Parquet
Issue Type
[
https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689062#comment-17689062
]
ASF GitHub Bot commented on PARQUET-2243:
-
wgtmac commented on code in PR #1027:
URL: https://github.com/apache/parquet-mr/pull/1027#discussion_r1107006966
##
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java:
##
@@ -66,7 +66,8 @@
import org.slf4j.Logger;
import org.slf4j.LoggerF
[
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689050#comment-17689050
]
ASF GitHub Bot commented on PARQUET-2228:
-
wgtmac commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431185993
> > @wgtmac, by supporting multiple files to rewrite them into one we will
end up with the same number of row-groups, right? Therefore, this tool is not
meant to be used to solve the "sm
[
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689046#comment-17689046
]
ASF GitHub Bot commented on PARQUET-2228:
-
wgtmac commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431182101
> @wgtmac, by supporting multiple files to rewrite them into one we will end
up with the same number of row-groups, right? Therefore, this tool is not meant
to be used to solve the "smal
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689045#comment-17689045
]
ASF GitHub Bot commented on PARQUET-2244:
-
zhongyujiang commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431180145
@gszadovszky Thanks for reviewing and the quick merge!
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Szadovszky reassigned PARQUET-2244:
-
Assignee: Yujiang Zhong
> Dictionary filter may skip row-groups incorrectly wh
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Szadovszky resolved PARQUET-2244.
---
Resolution: Fixed
> Dictionary filter may skip row-groups incorrectly when evaluati
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689035#comment-17689035
]
ASF GitHub Bot commented on PARQUET-2244:
-
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689034#comment-17689034
]
ASF GitHub Bot commented on PARQUET-2244:
-
gszadovszky merged PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028
gszadovszky commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431153336
@shangxinli, it might require a backport and releases on the branches where
`In` and `NotIn` were released.
[
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689011#comment-17689011
]
ASF GitHub Bot commented on PARQUET-2228:
-
gszadovszky commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431102177
@wgtmac, by supporting multiple files to rewrite them into one we will end
up with the same number of row-groups, right? Therefore, this tool is not meant
to be used to solve the "s
[
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689007#comment-17689007
]
ASF GitHub Bot commented on PARQUET-2228:
-
gszadovszky commented on code in PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1106931407
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##
@@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader re
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689002#comment-17689002
]
ASF GitHub Bot commented on PARQUET-2244:
-
zhongyujiang commented on PR #1028:
URL: https://github.com/apache/parquet-mr/pull/1028#issuecomment-1431067528
@huaxingao @gszadovszky Can you help review this? Thanks!
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688999#comment-17688999
]
ASF GitHub Bot commented on PARQUET-2244:
-
zhongyujiang opened a new pull request, #1028:
URL: https://github.com/apache/parquet-mr/pull/1028
Make sure you have checked _all_ steps below.
### Jira
- [ ] My PR addresses the following [Parquet
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references
th
[
https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yujiang Zhong updated PARQUET-2244:
---
Description:
Dictionary filter may skip row-groups incorrectly when evaluating `notIn` on
Yujiang Zhong created PARQUET-2244:
--
Summary: Dictionary filter may skip row-groups incorrectly when
evaluating notIn
Key: PARQUET-2244
URL: https://issues.apache.org/jira/browse/PARQUET-2244
Project
[
https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688932#comment-17688932
]
ASF GitHub Bot commented on PARQUET-2243:
-
gszadovszky opened a new pull request, #1027:
URL: https://github.com/apache/parquet-mr/pull/1027
Make sure you have checked _all_ steps below.
### Jira
- [x] My PR addresses the following [Parquet
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references
the