[jira] [Commented] (SPARK-18539) Cannot filter by nonexisting column in parquet file

2017-02-03 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851965#comment-15851965 ] Cheng Lian commented on SPARK-18539: SPARK-19409 upgrades parquet-mr to 1.8.2 and fixed this issue.

2017-01-26 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840857#comment-15840857 ] Liang-Chi Hsieh commented on SPARK-18539: [~lian cheng] Yea, I see. The term {{optional}} is

2017-01-26 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840186#comment-15840186 ] Cheng Lian commented on SPARK-18539: [~viirya], sorry for the (super) late reply. What I mentioned

2016-12-06 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727542#comment-15727542 ] Liang-Chi Hsieh commented on SPARK-18539: [~lian cheng], in Parquet's code, looks like a null

2016-12-05 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724110#comment-15724110 ] Liang-Chi Hsieh commented on SPARK-18539: That's cool.

2016-12-05 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724013#comment-15724013 ] Cheng Lian commented on SPARK-18539: [~xwu0226], thanks for the new use case! [~viirya], I do think

2016-12-05 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723960#comment-15723960 ] Liang-Chi Hsieh commented on SPARK-18539: Actually I am not sure if this is a valid usage. I

2016-12-05 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723949#comment-15723949 ] Liang-Chi Hsieh commented on SPARK-18539: Because we respect user-specified schema, we won't

2016-12-05 Xin Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723888#comment-15723888 ] Xin Wu commented on SPARK-18539: I think we will hit the issue if we use user-specified schema. Here is
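
Xin Wu's point about user-specified schemas can be illustrated with a toy model. The plain-Python sketch below is not Spark or parquet-mr code, and `read_with_schema`, `sql_eq` and `where` are hypothetical names. It assumes the behaviour the reporter expects: a column that is in the user-specified schema but missing from the file reads as all nulls, so `b = 1` matches no rows and `b IS NULL` matches every row, instead of the read failing.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def read_with_schema(file_rows: List[Row], file_columns: List[str],
                     user_schema: List[str]) -> List[Row]:
    """Project stored rows onto a user-specified schema (hypothetical helper).

    Any column in `user_schema` that the file does not contain is filled
    with None, i.e. treated as an all-null column.
    """
    return [
        {col: row[col] if col in file_columns else None for col in user_schema}
        for row in file_rows
    ]


def sql_eq(value: Optional[Any], literal: Any) -> Optional[bool]:
    """SQL-style `=`: comparing NULL with anything yields NULL (unknown)."""
    if value is None:
        return None
    return value == literal


def where(rows: List[Row], predicate: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Keep only rows whose predicate is TRUE; FALSE and NULL are both dropped."""
    return [r for r in rows if predicate(r) is True]


if __name__ == "__main__":
    stored = [{"a": 1}, {"a": 2}]                 # the file only has column `a`
    rows = read_with_schema(stored, ["a"], ["a", "b"])
    print(where(rows, lambda r: sql_eq(r["b"], 1)))  # no rows match b = 1
    print(where(rows, lambda r: r["b"] is None))     # every row has b IS NULL
```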

2016-12-05 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723781#comment-15723781 ] Cheng Lian commented on SPARK-18539: Please remind me if I missed anything important, otherwise, we

2016-12-05 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723747#comment-15723747 ] Cheng Lian commented on SPARK-18539: [~v-gerasimov], [~smilegator], and [~xwu0226], after some

2016-12-05 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723718#comment-15723718 ] Cheng Lian commented on SPARK-18539: As commented on GitHub, there're two issues right now: # This

2016-12-05 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723473#comment-15723473 ] Apache Spark commented on SPARK-18539: User 'xwu0226' has created a pull request for this issue:

2016-12-05 Xin Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723296#comment-15723296 ] Xin Wu commented on SPARK-18539: Yes. I have the fix and will submit PR and cc everyone for review.

2016-12-05 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723259#comment-15723259 ] Xiao Li commented on SPARK-18539: [~lian cheng] [~rxin] We might be able to capture and process the

2016-12-05 Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722891#comment-15722891 ] Cheng Lian commented on SPARK-18539: Haven't looked deeply into this issue, but my hunch is that this

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721507#comment-15721507 ] Xiao Li commented on SPARK-18539: Below is the test case you can try. {noformat}

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721499#comment-15721499 ] Xiao Li commented on SPARK-18539: The error is from Parquet. {noformat} 16/11/22 17:43:47 ERROR

2016-12-04 Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721488#comment-15721488 ] Reynold Xin commented on SPARK-18539: Why don't we fix the parquet reader so it can tolerate

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721481#comment-15721481 ] Xiao Li commented on SPARK-18539: The default of `spark.sql.parquet.mergeSchema` is false. To figure out

2016-12-04 Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721430#comment-15721430 ] Vitaly Gerasimov commented on SPARK-18539: I think this is another reason why we should make

2016-12-04 Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721423#comment-15721423 ] Vitaly Gerasimov commented on SPARK-18539: If we can neglect the performance when we use schema

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721414#comment-15721414 ] Xiao Li commented on SPARK-18539: FYI, I checked the other formats: CSV and JSON work as expected. No

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721392#comment-15721392 ] Xiao Li commented on SPARK-18539: Yeah. It is very slow when you have many, many small parquet files.
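
The cost Xiao Li describes comes from where the schema is read. Roughly, with `spark.sql.parquet.mergeSchema` off, Spark 2.x takes the schema from a summary file or a single part-file footer; with it on, every file's footer has to be read and combined, so the work grows with the number of files. The sketch below models only the combining step in plain Python; `merge_schemas` and the `(name, type)` field tuples are hypothetical, and Spark's real merge also deals with nested types and nullability.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (column name, type name), a simplified footer entry


def merge_schemas(footers: List[List[Field]]) -> List[Field]:
    """Union the columns of every file footer, keeping first-seen order.

    Raises ValueError if one column name appears with two different types.
    """
    merged: Dict[str, str] = {}  # dicts keep insertion order (Python 3.7+)
    for footer in footers:       # one footer per file, so cost grows with file count
        for name, type_name in footer:
            if name in merged and merged[name] != type_name:
                raise ValueError(
                    f"conflicting types for {name!r}: {merged[name]} vs {type_name}"
                )
            merged.setdefault(name, type_name)
    return list(merged.items())


if __name__ == "__main__":
    old_file = [("a", "int")]
    new_file = [("a", "int"), ("b", "string")]
    print(merge_schemas([old_file, new_file]))  # [('a', 'int'), ('b', 'string')]
```

With the merged schema, a filter on `b` refers to a column that exists in the table schema even though it is missing from `old_file`, which is the situation this issue is about.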

2016-12-04 Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721385#comment-15721385 ] Vitaly Gerasimov commented on SPARK-18539: Hmm... How does it work when we use schema merging? Does

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721376#comment-15721376 ] Xiao Li commented on SPARK-18539: Basically, we have to know whether a column exists or not before
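
One way to act on "know whether a column exists" before pushing a filter down is to split the filters per file: predicates on columns the file actually has are handed to the reader, and the rest stay with the engine. This is a hypothetical plain-Python sketch, not Spark's actual pushdown code or the fix that eventually landed; `Eq` and `split_filters` are invented names.

```python
from dataclasses import dataclass
from typing import Any, List, Set, Tuple


@dataclass(frozen=True)
class Eq:
    """Hypothetical equality predicate: `column = value`."""
    column: str
    value: Any


def split_filters(filters: List[Eq], file_columns: Set[str]) -> Tuple[List[Eq], List[Eq]]:
    """Return (pushed, residual).

    `pushed` only references columns present in the file schema, so the
    reader can evaluate them; `residual` is left for the engine to apply
    after the scan.
    """
    pushed = [f for f in filters if f.column in file_columns]
    residual = [f for f in filters if f.column not in file_columns]
    return pushed, residual


if __name__ == "__main__":
    pushed, residual = split_filters([Eq("a", 1), Eq("b", 2)], {"a"})
    print(pushed)    # [Eq(column='a', value=1)]
    print(residual)  # [Eq(column='b', value=2)]
```

Holding a predicate back only costs pruning, not correctness, provided the engine still applies every filter to the rows the reader returns (an assumption of this sketch).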

2016-12-04 Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721372#comment-15721372 ] Vitaly Gerasimov commented on SPARK-18539: If I turn off `spark.sql.parquet.filterPushdown` it

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721316#comment-15721316 ] Xiao Li commented on SPARK-18539: Could you turn off `spark.sql.parquet.filterPushdown`? Is that

2016-12-04 Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721310#comment-15721310 ] Dongjoon Hyun commented on SPARK-18539: The use case makes sense. I see now!

2016-12-04 Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721300#comment-15721300 ] Vitaly Gerasimov commented on SPARK-18539: Thank you for your reply. I think we need to make

2016-12-04 Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721065#comment-15721065 ] Dongjoon Hyun commented on SPARK-18539: Thank you so much!

2016-12-04 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720964#comment-15720964 ] Xiao Li commented on SPARK-18539: The parquet filter push-down of Spark 2.x is different from the Spark

2016-12-04 Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720726#comment-15720726 ] Dongjoon Hyun commented on SPARK-18539: It looks like the predicates are pushed down to Parquet
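
Pushing a predicate down to Parquet lets the reader skip whole row groups using per-column statistics (min, max, null count). The plain-Python sketch below models that decision for an equality predicate with a non-null value. `ColumnStats` and `can_drop_row_group` are hypothetical, not parquet-mr's API, and treating a column that is absent from the file as all nulls is an assumption about the desired behaviour, not a description of the reader Spark used at the time.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ColumnStats:
    """Hypothetical per-row-group statistics for one column."""
    min: Optional[Any]
    max: Optional[Any]
    null_count: int
    num_values: int


def can_drop_row_group(stats: Dict[str, ColumnStats], column: str, value: Any) -> bool:
    """True if no row in the row group can satisfy `column = value`.

    `value` is assumed to be non-null. A column missing from `stats` (i.e.
    from the file) is treated as if every value were null.
    """
    col = stats.get(column)
    if col is None:
        return True  # absent column => all nulls => `= value` is never TRUE
    if col.null_count == col.num_values:
        return True  # every stored value is null
    return value < col.min or value > col.max  # value outside [min, max]


if __name__ == "__main__":
    group = {"a": ColumnStats(min=1, max=10, null_count=0, num_values=5)}
    print(can_drop_row_group(group, "a", 20))  # True: 20 is above max
    print(can_drop_row_group(group, "a", 5))   # False: 5 may be present
    print(can_drop_row_group(group, "b", 5))   # True: `b` is not in the file
```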

2016-11-27 Vitaly Gerasimov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15699367#comment-15699367 ] Vitaly Gerasimov commented on SPARK-18539: [~dongjoon] I don't know. However, it works fine in

2016-11-26 Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697793#comment-15697793 ] Dongjoon Hyun commented on SPARK-18539: Interesting. Is it valid? {code} sc.read