Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
OK I'll submit a pr to fix this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16401
hm concrete suggestion:
1. pass the conf and cache the computed statistics the first time
2. have a simple invalidateStatsCache method that can be called manually to
invalidate.
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
So you mean we still pass the conf and do caching, and also add the
invalidation method, although the cache invalidation logic has some problem in
current stage?
---
If your project is set up for
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16401
But you can't do a lazy val unless you have the conf passed in, since you
will be adding knobs to the CBO stats soon wouldn't you?
Also there is really nothing basic vs CBO here. Both are
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
If our goal is to fix two sets of stats issue first, is it ok to just use
two `lazy val`s now, and leave the configuration and cache invalidation issues
in the future? In current implementation, we
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16401
So to be clear, my biggest issue with the current code is that it is
confusing to track two sets of stats. The invalidation problem already exists
in the current code base. I'm OK for the invalidation
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
Yes that's a problem... Then how about still using your code and adding a
method `refreshStats()` in `SessionCatalog` which calls invalidateStatsCache()
for every cached table, and we call
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16401
When would you set statsConfChanged to false?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
With the above change, in my understanding, we will call
`plan.invalidateStatsCache` when the conf is changed? Otherwise the stats cache
is invalidated only when it's None.
How about adding a
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16401
How about the following?
```
def stats(conf: CatalystConf): Statistics = statsCache.getOrElse {
invalidateStatsCache()
statsCache = Some(computeStats(conf))
statsCache.get
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
do you mean no cache for `def statistics(conf)`:
```
def statistics(conf: CatalystConf): Statistics = {
if(conf.cboEnabled) newStats else oldStats
}
lazy val newStats...
lazy
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16401
@wzhfy I thought about this more. Why don't we just get rid of the existing
"def statistics", and keep only one function
```
def statistics(conf: CatalystConf): Statistics
```
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16401
thanks, merging to master!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
@cloud-fan In the current stage, we have Filter, Agg, Join, Project, etc.
Although there are only four plans, the `if` code is still repeated. Moreover,
in the future, when we have other kind of
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16401
> Then we need to modify all the existing implementation of statistics and
do if(cboEnabled) test in each of them. That would be tedious.
hm? I think we only need to do `if(cboEnabled)`
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16401
LGTM, except one comment.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70780/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70780 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70780/testReport)**
for PR 16401 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70780 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70780/testReport)**
for PR 16401 at commit
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16401
retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70774/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70774 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70774/testReport)**
for PR 16401 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70774 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70774/testReport)**
for PR 16401 at commit
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
> What if we just add the conf parameter to the statistics method and give
it a default value? e.g. def statistics(conf: CatalystConf = SimpleCatalystConf)
@cloud-fan Then we need to modify
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16401
Just one minor question about the config. other LGTM.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16401
LGTM. What if we just add the conf parameter to the `statistics` method and
give it a default value? e.g. `def statistics(conf: CatalystConf =
SimpleCatalystConf)`. How much mode do we need to
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70591/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70591 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70591/testReport)**
for PR 16401 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70586/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70586 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70586/testReport)**
for PR 16401 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70591 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70591/testReport)**
for PR 16401 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70586 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70586/testReport)**
for PR 16401 at commit
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/16401
retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16401
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70580/
Test FAILed.
---
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/16401
cc @rxin @cloud-fan @viirya
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16401
**[Test build #70580 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70580/testReport)**
for PR 16401 at commit
41 matches
Mail list logo