[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 OK I'll submit a pr to fix this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16401 hm concrete suggestion: 1. pass the conf and cache the computed statistics the first time 2. have a simple invalidateStatsCache method that can be called manually to invalidate.

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 So you mean we still pass the conf and do caching, and also add the invalidation method, although the cache invalidation logic has some problem in current stage? --- If your project is set up for

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16401 But you can't do a lazy val unless you have the conf passed in, since you will be adding knobs to the CBO stats soon wouldn't you? Also there is really nothing basic vs CBO here. Both are

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 If our goal is to fix two sets of stats issue first, is it ok to just use two `lazy val`s now, and leave the configuration and cache invalidation issues in the future? In current implementation, we

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16401 So to be clear, my biggest issue with the current code is that it is confusing to track two sets of stats. The invalidation problem already exists in the current code base. I'm OK for the invalidation

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-08 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 Yes that's a problem... Then how about still using your code and adding a method `refreshStats()` in `SessionCatalog` which calls invalidateStatsCache() for every cached table, and we call

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16401 When would you set statsConfChanged to false? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-06 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 With the above change, in my understanding, we will call `plan.invalidateStatsCache` when the conf is changed? Otherwise the stats cache is invalidated only when it's None. How about adding a

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16401 How about the following? ``` def stats(conf: CatalystConf): Statistics = statsCache.getOrElse { invalidateStatsCache() statsCache = Some(computeStats(conf)) statsCache.get

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-05 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 do you mean no cache for `def statistics(conf)`: ``` def statistics(conf: CatalystConf): Statistics = { if(conf.cboEnabled) newStats else oldStats } lazy val newStats... lazy

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16401 @wzhfy I thought about this more. Why don't we just get rid of the existing "def statistics", and keep only one function ``` def statistics(conf: CatalystConf): Statistics ```

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-02 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 @cloud-fan In the current stage, we have Filter, Agg, Join, Project, etc. Although there are only four plans, the `if` code is still repeated. Moreover, in the future, when we have other kind of

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 > Then we need to modify all the existing implementation of statistics and do if(cboEnabled) test in each of them. That would be tedious. hm? I think we only need to do `if(cboEnabled)`

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-02 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16401 LGTM, except one comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70780/ Test PASSed. ---

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70780/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70780/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16401 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70774/ Test FAILed. ---

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70774/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70774/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2017-01-01 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 > What if we just add the conf parameter to the statistics method and give it a default value? e.g. def statistics(conf: CatalystConf = SimpleCatalystConf) @cloud-fan Then we need to modify

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-30 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16401 Just one minor question about the config. other LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-30 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 LGTM. What if we just add the conf parameter to the `statistics` method and give it a default value? e.g. `def statistics(conf: CatalystConf = SimpleCatalystConf)`. How much mode do we need to

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70591/ Test PASSed. ---

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70591/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70586/ Test PASSed. ---

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70586/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70591/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70586/testReport)** for PR 16401 at commit

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16401 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16401 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70580/ Test FAILed. ---

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 cc @rxin @cloud-fan @viirya --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70580/testReport)** for PR 16401 at commit