[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851965#comment-15851965
]
Cheng Lian commented on SPARK-18539:
SPARK-19409 upgraded parquet-mr to 1.8.2 and fixed this issue.
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840857#comment-15840857
]
Liang-Chi Hsieh commented on SPARK-18539:
-
[~lian cheng] Yea, I see. The term {{optional}} is
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840186#comment-15840186
]
Cheng Lian commented on SPARK-18539:
[~viirya], sorry for the (super) late reply. What I mentioned
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727542#comment-15727542
]
Liang-Chi Hsieh commented on SPARK-18539:
-
[~lian cheng], in Parquet's code, looks like a null
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724110#comment-15724110
]
Liang-Chi Hsieh commented on SPARK-18539:
-
That's cool.
> Cannot filter by nonexisting column in
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724013#comment-15724013
]
Cheng Lian commented on SPARK-18539:
[~xwu0226], thanks for the new use case!
[~viirya], I do think
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723960#comment-15723960
]
Liang-Chi Hsieh commented on SPARK-18539:
-
Actually I am not sure if this is a valid usage. I
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723949#comment-15723949
]
Liang-Chi Hsieh commented on SPARK-18539:
-
Because we respect user-specified schema, we won't
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723888#comment-15723888
]
Xin Wu commented on SPARK-18539:
I think we will hit the issue if we use user-specified schema. Here is
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723781#comment-15723781
]
Cheng Lian commented on SPARK-18539:
Please remind me if I missed anything important, otherwise, we
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723747#comment-15723747
]
Cheng Lian commented on SPARK-18539:
[~v-gerasimov], [~smilegator], and [~xwu0226], after some
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723718#comment-15723718
]
Cheng Lian commented on SPARK-18539:
As commented on GitHub, there're two issues right now:
# This
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723473#comment-15723473
]
Apache Spark commented on SPARK-18539:
--
User 'xwu0226' has created a pull request for this issue:
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723296#comment-15723296
]
Xin Wu commented on SPARK-18539:
Yes. I have the fix and will submit PR and cc everyone for review.
>
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723259#comment-15723259
]
Xiao Li commented on SPARK-18539:
-
[~lian cheng] [~rxin] We might be able to capture and process the
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722891#comment-15722891
]
Cheng Lian commented on SPARK-18539:
Haven't looked deeply into this issue, but my hunch is that this
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721507#comment-15721507
]
Xiao Li commented on SPARK-18539:
-
Below is the test case you can try.
{noformat}
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721499#comment-15721499
]
Xiao Li commented on SPARK-18539:
-
The error is from Parquet.
{noformat}
16/11/22 17:43:47 ERROR
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721488#comment-15721488
]
Reynold Xin commented on SPARK-18539:
-
Why don't we fix the parquet reader so it can tolerate
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721481#comment-15721481
]
Xiao Li commented on SPARK-18539:
-
The default of `spark.sql.parquet.mergeSchema` is false. To figure out
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721430#comment-15721430
]
Vitaly Gerasimov commented on SPARK-18539:
--
I think this is another reason why we should make
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721423#comment-15721423
]
Vitaly Gerasimov commented on SPARK-18539:
--
If we can neglect the performance when we use schema
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721414#comment-15721414
]
Xiao Li commented on SPARK-18539:
-
FYI, I checked the other formats: CSV and JSON work as expected. No
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721392#comment-15721392
]
Xiao Li commented on SPARK-18539:
-
Yeah. It is very slow when you have many many small parquet files.
>
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721385#comment-15721385
]
Vitaly Gerasimov commented on SPARK-18539:
--
Hmm... How does it work when we use schema merging? Does
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721376#comment-15721376
]
Xiao Li commented on SPARK-18539:
-
Basically, we have to know whether a column exists or not before
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721372#comment-15721372
]
Vitaly Gerasimov commented on SPARK-18539:
--
If I turn off `spark.sql.parquet.filterPushdown` it
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721316#comment-15721316
]
Xiao Li commented on SPARK-18539:
-
Could you turn off `spark.sql.parquet.filterPushdown`? Is that
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721310#comment-15721310
]
Dongjoon Hyun commented on SPARK-18539:
---
The use case makes sense. I see now!
> Cannot filter by
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721300#comment-15721300
]
Vitaly Gerasimov commented on SPARK-18539:
--
Thank you for your reply. I think we need to make
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721065#comment-15721065
]
Dongjoon Hyun commented on SPARK-18539:
---
Thank you so much!
> Cannot filter by nonexisting column
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720964#comment-15720964
]
Xiao Li commented on SPARK-18539:
-
The parquet filter push-down of Spark 2.x is different from the Spark
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720726#comment-15720726
]
Dongjoon Hyun commented on SPARK-18539:
---
It looks like the predicates are pushed down to Parquet
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699367#comment-15699367
]
Vitaly Gerasimov commented on SPARK-18539:
--
[~dongjoon] I don't know. However, it works fine in
[
https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697793#comment-15697793
]
Dongjoon Hyun commented on SPARK-18539:
---
Interesting. Is it valid?
{code}
sc.read