[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-05-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21143 @cloud-fan, here's a commit that demonstrates the idea and implementation: https://github.com/rdblue/spark/commit/b41eb1ef2af38c510f5426a096c586a93e4a5556 That adds `residualFilters` to the `

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-05-08 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21143 I think we would only need `DataSourceReader` to implement `SupportsPushDownFilter` because it is primarily used to push filters to the data source and the query's filters are determined while planni
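
For readers following the thread, here is a minimal sketch of the pushed-versus-residual distinction the comments above refer to. It is not Spark's Data Source V2 API: `Filter`, `ToyReader`, `push_filters`, `pushed_filters`, and `run_query` are hypothetical names invented for illustration, and the rule "a filter is either pushed or residual" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Filter:
    """Hypothetical stand-in for a simple source filter, e.g. id > 5."""
    column: str
    op: str        # one of ">", "=", "contains"
    value: object

    def matches(self, row: Row) -> bool:
        cell = row[self.column]
        if self.op == ">":
            return cell > self.value
        if self.op == "=":
            return cell == self.value
        if self.op == "contains":
            return self.value in cell
        raise ValueError(f"unsupported operator: {self.op}")


class ToyReader:
    """Hypothetical reader that can only evaluate '>' and '=' itself."""
    SUPPORTED_OPS = {">", "="}

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)
        self._pushed: List[Filter] = []

    def push_filters(self, filters: List[Filter]) -> List[Filter]:
        # Keep what the source can evaluate; hand the rest back to the engine.
        self._pushed = [f for f in filters if f.op in self.SUPPORTED_OPS]
        return [f for f in filters if f.op not in self.SUPPORTED_OPS]

    def pushed_filters(self) -> List[Filter]:
        # Report which filters the source accepted.
        return list(self._pushed)

    def scan(self) -> List[Row]:
        # The source applies only the filters it accepted.
        return [r for r in self._rows if all(f.matches(r) for f in self._pushed)]


def run_query(reader: ToyReader, filters: List[Filter]) -> List[Row]:
    # The engine evaluates the residual filters on top of the scan output.
    residual = reader.push_filters(filters)
    return [r for r in reader.scan() if all(f.matches(r) for f in residual)]
```

In this sketch the engine never needs to know how the source evaluates a filter; it only needs the list returned by `push_filters` to decide what it must still check itself.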

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-05-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21143 If we don't care about whole stage codegen, I think it's possible to dynamically change the filter condition at executors and codegen it. Now the problem becomes, how would the data source

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-05-02 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21143 @cloud-fan, that's kind of the point I was trying to make. It is too difficult to do whole-stage codegen, but we could add a codegen filter before whole-stage codegen. Why make the implementation han

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-05-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21143 Hi @rdblue, you can also create a codegen-ed filter in your data source, but the real problem is whole-stage codegen. The code is generated at the driver side, sent to and compiled at the executo

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-05-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21143 @cloud-fan, union doesn't really help. I already have support for mixed-formats working just fine. The format isn't the problem, it is filtering (and a similar problem with projection). Parquet allow

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-30 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21143 LGTM. Thanks! Merged to master.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89919/ Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89919/testReport)** for PR 21143 at commit [`172dca0`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21143 @rdblue After some more thought, I don't think Spark can support per-split physical plan changes in the near future. My best suggestion for a mixed-format data source is to have different implemen

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2713/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89919/testReport)** for PR 21143 at commit [`172dca0`](https://github.com/apache/spark/commit/17

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21143 LGTM except for the comments above. This PR does not change anything else; it just cleans up the implementation of the existing Data Source V2 API.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21143 This is a very good point! Unfortunately, Spark SQL doesn't support changing the physical plan on a per-split basis, and I'd say this feature is non-trivial to implement and needs a design doc.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21143 Thanks for working on this, @cloud-fan! I was thinking about needing it just recently so that data sources can delegate to Spark when needed. I'll have a thorough look at it tomorrow, but one

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89826/ Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89826/testReport)** for PR 21143 at commit [`85a3ac3`](https://github.com/apache/spark/commit/8

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2661/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2659/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89826/testReport)** for PR 21143 at commit [`85a3ac3`](https://github.com/apache/spark/commit/85

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21143 retest this please.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21143 retest this please

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89822/testReport)** for PR 21143 at commit [`85a3ac3`](https://github.com/apache/spark/commit/8

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89822/ Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2655/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89822/testReport)** for PR 21143 at commit [`85a3ac3`](https://github.com/apache/spark/commit/85

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2654/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89821/testReport)** for PR 21143 at commit [`615cdfa`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89821/ Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89821/testReport)** for PR 21143 at commit [`615cdfa`](https://github.com/apache/spark/commit/61

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89817/ Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89817/testReport)** for PR 21143 at commit [`5c5e4ea`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2651/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89817/testReport)** for PR 21143 at commit [`5c5e4ea`](https://github.com/apache/spark/commit/5c

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89795/ Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89795/testReport)** for PR 21143 at commit [`abc6426`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89797/ Test FAILed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89797/testReport)** for PR 21143 at commit [`1fdc9ae`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Merged build finished. Test PASSed.

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2638/ Tes

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21143 **[Test build #89797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89797/testReport)** for PR 21143 at commit [`1fdc9ae`](https://github.com/apache/spark/commit/1f