[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324404#comment-17324404
]
ASF GitHub Bot commented on PARQUET-41:
---
jbapple commented on pull request #757:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324126#comment-17324126
]
ASF GitHub Bot commented on PARQUET-41:
---
shannonwells edited a comment on pull request #757:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324125#comment-17324125
]
ASF GitHub Bot commented on PARQUET-41:
---
shannonwells commented on pull request #757:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276862#comment-17276862
]
Nicholas Chammas commented on PARQUET-41:
-
Thanks for the link [~yumwang]. That
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276854#comment-17276854
]
Yuming Wang commented on PARQUET-41:
[~nchammas] You can check the related configuration parameters:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276842#comment-17276842
]
Nicholas Chammas commented on PARQUET-41:
-
Where is the user documentation for all the bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061558#comment-17061558
]
Gabor Szadovszky commented on PARQUET-41:
-
[~junma], the target release for this feature is
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060490#comment-17060490
]
Jun Ma commented on PARQUET-41:
---
[~gszadovszky]/[~junjie], what's the release timeline for this feature?
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045530#comment-17045530
]
Gabor Szadovszky commented on PARQUET-41:
-
[~junjie], feature branch for parquet-mr has been
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045527#comment-17045527
]
ASF GitHub Bot commented on PARQUET-41:
---
gszadovszky commented on pull request #757: PARQUET-41:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647258#comment-16647258
]
ASF GitHub Bot commented on PARQUET-41:
---
majetideepak closed pull request #113: PARQUET-41: Add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647248#comment-16647248
]
ASF GitHub Bot commented on PARQUET-41:
---
majetideepak opened a new pull request #113: PARQUET-41:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647243#comment-16647243
]
ASF GitHub Bot commented on PARQUET-41:
---
majetideepak closed pull request #112: PARQUET-41: Add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636560#comment-16636560
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust closed pull request #99: PARQUET-41: add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636508#comment-16636508
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust opened a new pull request #112: PARQUET-41: Add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627622#comment-16627622
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust closed pull request #62: PARQUET-41: Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550201#comment-16550201
]
Junjie Chen commented on PARQUET-41:
[~aniket486], Thanks for watching this.
Yes, I'm still
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550166#comment-16550166
]
Aniket Mokashi commented on PARQUET-41:
---
[~junjie] - thanks for driving this project! Are you
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517638#comment-16517638
]
Junjie Chen commented on PARQUET-41:
[~jbapple], I just created a new parquet-format PR since
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517637#comment-16517637
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust opened a new pull request #99: PARQUET-41: add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517298#comment-16517298
]
Jim Apple commented on PARQUET-41:
--
Is there an updated PR for parquet-format that matches the open PRs
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513908#comment-16513908
]
Junjie Chen commented on PARQUET-41:
Thanks [~jbapple]
Since the JIRA may contain several
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509737#comment-16509737
]
Junjie Chen commented on PARQUET-41:
Hi
Here is the benchmark link:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486936#comment-16486936
]
Junjie Chen commented on PARQUET-41:
[~jbapple], I understand your point; I will do a benchmark to
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486884#comment-16486884
]
Jim Apple commented on PARQUET-41:
--
In response to [~junjie]'s question above, "Sure, it is feasible,
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486862#comment-16486862
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on issue #432: PARQUET-41: Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486844#comment-16486844
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on issue #425: PARQUET-41:Add Bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486836#comment-16486836
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust closed pull request #484: PARQUET-41: rebase to
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486833#comment-16486833
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust opened a new pull request #484: PARQUET-41: rebase
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483791#comment-16483791
]
Junping Du commented on PARQUET-41:
---
Thanks [~Ferd] for quick update. The plan sounds good.
> Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481309#comment-16481309
]
Ferdinand Xu commented on PARQUET-41:
-
[~djp] I haven't had cycles to move it forward recently.
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480460#comment-16480460
]
Junping Du commented on PARQUET-41:
---
This is a very critical feature from performance perspective.
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462365#comment-16462365
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on issue #425: PARQUET-41:Add Bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462359#comment-16462359
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on issue #432: PARQUET-41: Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462316#comment-16462316
]
ASF GitHub Bot commented on PARQUET-41:
---
BenoitHanotte commented on issue #425: PARQUET-41:Add Bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462312#comment-16462312
]
ASF GitHub Bot commented on PARQUET-41:
---
BenoitHanotte commented on issue #425: PARQUET-41:Add Bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343563#comment-16343563
]
ASF GitHub Bot commented on PARQUET-41:
---
daedric commented on issue #432: PARQUET-41: Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343497#comment-16343497
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on issue #432: PARQUET-41: Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343376#comment-16343376
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on a change in pull request #432:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343306#comment-16343306
]
ASF GitHub Bot commented on PARQUET-41:
---
daedric commented on a change in pull request #432:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343305#comment-16343305
]
ASF GitHub Bot commented on PARQUET-41:
---
daedric commented on a change in pull request #432:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338722#comment-16338722
]
Junjie Chen commented on PARQUET-41:
Sure, it is feasible, then we are comparing bloom filter vs
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338709#comment-16338709
]
Jim Apple commented on PARQUET-41:
--
Why not tweak that logic in parquet-mr to allow dictionary encoding
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338706#comment-16338706
]
Junjie Chen commented on PARQUET-41:
In Parquet-mr, when we set dictionary encoding to true, the
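The dictionary-versus-Bloom-filter trade-off raised in this exchange can be sized with the textbook Bloom filter formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The sketch below is a back-of-envelope comparison under stated assumptions (a hypothetical 1,000,000 distinct 8-byte values, a 1% target false-positive rate, dictionary overhead ignored); `BloomSizing` and its methods are hypothetical names, not taken from the thread or from parquet-mr.

```java
/** Back-of-envelope comparison: Bloom filter bytes vs. a plain dictionary of 8-byte values. */
final class BloomSizing {
    /** Optimal bit count m = -n ln(p) / (ln 2)^2 for n distinct values at false-positive rate p. */
    static long bloomBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Optimal number of hash functions k = (m / n) ln 2, rounded, at least 1. */
    static int hashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;              // hypothetical distinct values in one column chunk
        long bits = bloomBits(n, 0.01);   // 1% target false-positive rate (assumption)
        long bloomBytes = (bits + 7) / 8;
        long dictBytes = 8L * n;          // assumes fixed-width 8-byte values, no overhead
        System.out.printf("bloom: %d bytes, k=%d; dictionary: %d bytes%n",
                bloomBytes, hashCount(n, bits), dictBytes);
    }
}
```

At a 1% false-positive rate this works out to roughly 9.6 bits (about 1.2 bytes) per distinct value regardless of value width, versus 8 bytes per entry for the dictionary in this example.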
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338680#comment-16338680
]
Jim Apple commented on PARQUET-41:
--
Could you elaborate on "A column with large cardinality can not even
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338670#comment-16338670
]
Junjie Chen commented on PARQUET-41:
Hi [~jbapple], AFAIK, we don't have benchmark progress to compare
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338631#comment-16338631
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust commented on a change in pull request #432:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338244#comment-16338244
]
Jim Apple commented on PARQUET-41:
--
IIRC, there was a plan to create an end-to-end benchmark of an MR or
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331898#comment-16331898
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust opened a new pull request #432: PARQUET-41: Add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331602#comment-16331602
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust closed pull request #431: PARQUET-41: Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331598#comment-16331598
]
ASF GitHub Bot commented on PARQUET-41:
---
cjjnjust opened a new pull request #431: PARQUET-41: Add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143463#comment-16143463
]
Junjie Chen commented on PARQUET-41:
Please see the initial PR:
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143288#comment-16143288
]
Junjie Chen commented on PARQUET-41:
Add related [design
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095685#comment-16095685
]
Jim Apple commented on PARQUET-41:
--
In response to your request for a benchmark, see
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095614#comment-16095614
]
Ferdinand Xu commented on PARQUET-41:
-
Thanks [~jbapple] for the suggestions.
{noformat}
As a result,
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094006#comment-16094006
]
Junjie Chen commented on PARQUET-41:
Thanks Jim
Very useful links and example code!
> Add bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093432#comment-16093432
]
Jim Apple commented on PARQUET-41:
--
We might want to consider "Cache-, Hash- and Space-Efficient Bloom
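The paper Jim mentions describes, among other variants, blocked Bloom filters: every bit for a key is confined to one cache-line-sized block, so a probe costs roughly one cache miss instead of k scattered ones. The sketch below illustrates only that idea. The 512-bit block size, the SplitMix64-style mixer `mix`, and the odd-step double-hashing probe sequence are assumptions chosen for a self-contained example; this is not the Parquet on-disk format or its hash function.

```java
/**
 * Minimal blocked Bloom filter: all k bits for a key live in one 512-bit block
 * (8 longs, a typical cache line). Illustrative sketch, not Parquet's format.
 */
final class BlockedBloomFilter {
    private static final int WORDS_PER_BLOCK = 8;                  // 8 x 64 = 512 bits
    private static final int BITS_PER_BLOCK = WORDS_PER_BLOCK * 64;

    private final long[] words;
    private final int numBlocks;
    private final int k;                                            // bits set per key

    BlockedBloomFilter(int numBlocks, int k) {
        if (numBlocks <= 0 || k <= 0 || k > BITS_PER_BLOCK) {
            throw new IllegalArgumentException("bad numBlocks or k");
        }
        this.numBlocks = numBlocks;
        this.k = k;
        this.words = new long[numBlocks * WORDS_PER_BLOCK];
    }

    /** SplitMix64 finalizer, used here as a stand-in 64-bit hash for long keys. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Sets (set == true) or tests (set == false) the k bits of a key inside its block. */
    private boolean probe(long key, boolean set) {
        long h = mix(key);
        int base = (int) ((h >>> 32) % numBlocks) * WORDS_PER_BLOCK; // upper bits pick the block
        int g1 = (int) h;                                            // lower bits: first position
        int g2 = (int) mix(h) | 1;                                   // odd step => k distinct bits
        for (int i = 0; i < k; i++) {
            int bit = (g1 + i * g2) & (BITS_PER_BLOCK - 1);          // position inside the block
            int w = base + (bit >>> 6);
            long mask = 1L << (bit & 63);
            if (set) {
                words[w] |= mask;
            } else if ((words[w] & mask) == 0) {
                return false;                                        // definitely absent
            }
        }
        return true;
    }

    void add(long key) { probe(key, true); }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(long key) { return probe(key, false); }

    public static void main(String[] args) {
        int n = 10_000;
        BlockedBloomFilter f = new BlockedBloomFilter(196, 7);      // about 10 bits per key
        for (long x = 0; x < n; x++) f.add(x);
        int fp = 0;
        for (long x = n; x < 11L * n; x++) if (f.mightContain(x)) fp++;
        System.out.printf("false-positive rate ~ %.4f%n", fp / (10.0 * n));
    }
}
```

Running `main` prints the measured false-positive rate for 10,000 keys at roughly 10 bits per key. Confining probes to one block costs a little accuracy compared with a classic Bloom filter of the same size, in exchange for fewer cache misses.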
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019813#comment-16019813
]
Ryan Blue commented on PARQUET-41:
--
Yeah, I'll help review it.
> Add bloom filters to parquet statistics
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019812#comment-16019812
]
Ferdinand Xu commented on PARQUET-41:
-
The pull request for PARQUET-319 is out of date which requires
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019802#comment-16019802
]
Ryan Blue commented on PARQUET-41:
--
[~Ferd], it shouldn't matter that the bloom filter is stored at the
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019792#comment-16019792
]
Ferdinand Xu commented on PARQUET-41:
-
Thanks [~rdblue] for your comments.
bq. For example, if you
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018218#comment-16018218
]
Junjie Chen commented on PARQUET-41:
Hi [~rdblue]
In the telecom example, the query column is not unique if
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016790#comment-16016790
]
Junjie Chen commented on PARQUET-41:
Hi [~rdblue]
The number of distinct values in each column is increasing
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016458#comment-16016458
]
Ryan Blue commented on PARQUET-41:
--
[~junjie] & [~Ferd], it would be great to get a bit more data on
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015449#comment-16015449
]
Junjie Chen commented on PARQUET-41:
Hi [~rdblue]
We have a real use case from a Telecom company which
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007446#comment-16007446
]
Ferdinand Xu commented on PARQUET-41:
-
bq. This is only applicable for columns that aren't
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005755#comment-16005755
]
Ferdinand Xu commented on PARQUET-41:
-
It's very useful when filtering on a non-partitioning column.
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005480#comment-16005480
]
Ryan Blue commented on PARQUET-41:
--
[~costimuraru], dictionary-based filters were added that satisfy much
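Ryan's point is that dictionary-based filters already cover much of this use case. The toy model below is a hedged sketch of how such a filter can rule out a row group for an equality predicate. `ColumnChunk` and `canDrop` are hypothetical names, not the parquet-mr API, and the key assumption is that the dictionary only proves a value is absent when every data page in the chunk was dictionary-encoded.

```java
import java.util.Set;

/** Toy model of dictionary-based row-group filtering for "column == value". */
final class DictionaryFilterSketch {
    /** Hypothetical view of one column chunk: its dictionary and whether every data page used it. */
    static final class ColumnChunk {
        final Set<Long> dictionary;
        final boolean allPagesDictionaryEncoded;

        ColumnChunk(Set<Long> dictionary, boolean allPagesDictionaryEncoded) {
            this.dictionary = dictionary;
            this.allPagesDictionaryEncoded = allPagesDictionaryEncoded;
        }
    }

    /** True when no row in the chunk can satisfy "column == value", so the row group can be skipped. */
    static boolean canDrop(ColumnChunk chunk, long value) {
        // If any page fell back to a non-dictionary encoding (e.g. because the dictionary
        // grew too large), the dictionary is incomplete and a missing value proves nothing.
        if (!chunk.allPagesDictionaryEncoded) {
            return false;
        }
        return !chunk.dictionary.contains(value);
    }

    public static void main(String[] args) {
        ColumnChunk complete = new ColumnChunk(Set.of(1L, 2L, 3L), true);
        ColumnChunk fellBack = new ColumnChunk(Set.of(1L, 2L, 3L), false);
        System.out.println(canDrop(complete, 42L)); // true: 42 is not in a complete dictionary
        System.out.println(canDrop(fellBack, 42L)); // false: non-dictionary pages might hold 42
    }
}
```

The fallback check is one reason dictionary filtering can fall short on high-cardinality columns, which is the gap a Bloom filter is meant to fill.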
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005452#comment-16005452
]
Constantin Muraru commented on PARQUET-41:
--
Any news on this one? This would be great.
> Add
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148130#comment-15148130
]
Ferdinand Xu commented on PARQUET-41:
-
Hi [~rdblue],
I have a basic idea about how to estimate the
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009099#comment-15009099
]
Ryan Blue commented on PARQUET-41:
--
[~Ferd], I think we need a design doc for this feature and some data
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616251#comment-14616251
]
Ferdinand Xu commented on PARQUET-41:
-
Hi [~rdblue], I have some thoughts for the
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605183#comment-14605183
]
Ferdinand Xu commented on PARQUET-41:
-
Hi [~rdblue], I really appreciate your long
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605258#comment-14605258
]
Ferdinand Xu commented on PARQUET-41:
-
I checked some entries in the first page
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606674#comment-14606674
]
Ryan Blue commented on PARQUET-41:
--
I should also point out there's a table on the first
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602532#comment-14602532
]
Ryan Blue commented on PARQUET-41:
--
Thanks for working on this, [~Ferd], it's great to be
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599093#comment-14599093
]
Ferdinand Xu commented on PARQUET-41:
-
Hi,
I have updated the PR for multiple bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600367#comment-14600367
]
Ryan Blue commented on PARQUET-41:
--
I don't think the counting bloom filter idea is worth
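Counting Bloom filters come up in this part of the thread. For context, a counting Bloom filter replaces each bit with a small counter so that keys can be removed as well as added, at the price of more space per slot. The sketch below is a minimal illustration; the `int` counters, the SplitMix64-style mixer `mix`, and the double-hashing slot formula are assumptions made for a self-contained example, not anything proposed in the PRs.

```java
/** Minimal counting Bloom filter: counters instead of bits, so removal is possible. */
final class CountingBloomFilter {
    private final int[] counters;
    private final int k;                       // slots touched per key

    CountingBloomFilter(int m, int k) {
        if (m <= 0 || k <= 0) throw new IllegalArgumentException("m and k must be positive");
        this.counters = new int[m];
        this.k = k;
    }

    /** SplitMix64 finalizer, used here as a stand-in 64-bit hash for long keys. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long key) { update(key, +1); }

    /** Only valid for keys that were previously added; otherwise counters drift below truth. */
    void remove(long key) { update(key, -1); }

    /** Adjusts the k slots (h1 + i * h2) mod m by delta. */
    private void update(long key, int delta) {
        long h1 = mix(key), h2 = mix(h1) | 1;
        for (int i = 0; i < k; i++) {
            counters[(int) Math.floorMod(h1 + i * h2, (long) counters.length)] += delta;
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(long key) {
        long h1 = mix(key), h2 = mix(h1) | 1;
        for (int i = 0; i < k; i++) {
            if (counters[(int) Math.floorMod(h1 + i * h2, (long) counters.length)] == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        CountingBloomFilter f = new CountingBloomFilter(10_000, 7);
        for (long x = 0; x < 1_000; x++) f.add(x);
        System.out.println(f.mightContain(42L)); // true: 42 was added
        for (long x = 0; x < 1_000; x++) f.remove(x);
        System.out.println(f.mightContain(42L)); // false: every counter is back to zero
    }
}
```

Because each slot holds a counter rather than a single bit, the variant only pays off when removals are actually needed.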
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597767#comment-14597767
]
Prateek Rungta commented on PARQUET-41:
---
Hey [~Ferd],
I took a quick glance through
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597859#comment-14597859
]
Jason Altekruse commented on PARQUET-41:
I did not get a chance to look through
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597906#comment-14597906
]
Ryan Blue commented on PARQUET-41:
--
Interesting, I hadn't heard about the counting bloom
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597901#comment-14597901
]
Jason Altekruse commented on PARQUET-41:
This might have been a little confusing,
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598800#comment-14598800
]
Ferdinand Xu commented on PARQUET-41:
-
Hi [~nezihyigitbasi] [~jaltekruse], I don’t
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590073#comment-14590073
]
Ryan Blue commented on PARQUET-41:
--
Great, thanks [~Ferd]! Could you also tell us a bit
[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589347#comment-14589347
]
Ferdinand Xu commented on PARQUET-41:
-
Hi guys,
The pull request for parquet-format-mr