[GitHub] spark issue #18105: [SPARK-20881] [SQL] Use Hive's stats in metastore when c...

2017-05-28 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 > I also think we should respect Spark-generated statistics over Hive's when it is available. @gatorsmile OK, then it's consistent with the current implementation. I'll change the…

2017-05-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18105 Now we have two sources of statistics, so we need a mechanism to decide which one to use. At the very least, we may need to update the code comments to document the behavior we choose.

2017-05-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 I don't think the analyze table command is bound to CBO either. I just want to change how we read stats from the metastore, that is, which side's stats (Spark's or Hive's) we respect based on CBO…
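
The CBO-dependent rule proposed here can be sketched as a small selection function. This is a hypothetical model, not Spark's code: `TableStats`, `choose_stats_by_cbo`, and the fallback used when one side has no stats are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableStats:
    # Hypothetical container; field names are illustrative, not Spark's API.
    size_in_bytes: int
    row_count: Optional[int] = None


def choose_stats_by_cbo(
    spark_stats: Optional[TableStats],
    hive_stats: Optional[TableStats],
    cbo_enabled: bool,
) -> Optional[TableStats]:
    """Sketch of the proposal: prefer Spark's stats only when CBO is on."""
    if cbo_enabled:
        preferred, fallback = spark_stats, hive_stats
    else:
        # With CBO off, keep the pre-CBO behavior of reading Hive-side stats
        # (e.g. "totalSize").
        preferred, fallback = hive_stats, spark_stats
    # Assumption: if the preferred side has nothing, use the other side.
    return preferred if preferred is not None else fallback


spark = TableStats(size_in_bytes=1_000, row_count=10)
hive = TableStats(size_in_bytes=5_000)
print(choose_stats_by_cbo(spark, hive, cbo_enabled=True).size_in_bytes)   # 1000
print(choose_stats_by_cbo(spark, hive, cbo_enabled=False).size_in_bytes)  # 5000
```

The drawback raised later in the thread is visible here: the same table reports different sizes depending only on a session flag.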

2017-05-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 @cloud-fan I mean that the behavior when CBO is disabled should be the same as the previous behavior without CBO. Previously, the size was read from "totalSize", and it changed after an update. Now, when…

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18105 If users have not analyzed the table in Spark yet, we should respect the stats from the Hive metastore. But if users have already run the analyze table command in Spark, I think it's fair to ask them…
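
The precedence rule described here (Spark's stats if the table was analyzed in Spark, otherwise Hive's) can be sketched as follows. This is a hypothetical model: `TableStats` and `choose_stats_spark_first` are illustrative names, not Spark's classes or methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableStats:
    # Hypothetical container; not Spark's actual statistics class.
    size_in_bytes: int
    row_count: Optional[int] = None


def choose_stats_spark_first(
    spark_stats: Optional[TableStats],
    hive_stats: Optional[TableStats],
) -> Optional[TableStats]:
    """Sketch of the rule: Spark-generated stats win whenever they exist.

    There is deliberately no CBO parameter: the choice does not depend on it.
    """
    if spark_stats is not None:
        # The user ran ANALYZE TABLE in Spark, so Spark's stats take precedence.
        return spark_stats
    # Never analyzed in Spark: fall back to whatever the Hive metastore has.
    return hive_stats


hive = TableStats(size_in_bytes=5_000, row_count=50)
print(choose_stats_spark_first(None, hive))   # Hive's stats: never analyzed in Spark

spark = TableStats(size_in_bytes=1_000, row_count=10)
print(choose_stats_spark_first(spark, hive))  # Spark's stats, even if Hive's are newer
```

A consequence of this design is that stats updated only on the Hive side are ignored once Spark stats exist, so keeping the two in sync falls to the user.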

2017-05-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 I think we should respect the "totalSize" stats when CBO is disabled; otherwise users have no way to get back to the default behavior unless they re-run the analyze command. I personally think that's unfriendly…

2017-05-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 @cloud-fan > What was the behavior before? Previously, the analyze table command only updated the table size, using the same Hive stats name, "totalSize", and stored it in the metastore…
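
To see why sharing one key makes the size "change after update", here is a toy model in which Spark's old analyze command and Hive both write the same "totalSize" entry in a table's parameter map. The dict-based metastore and all function names are hypothetical; only the key "totalSize" comes from the thread.

```python
from typing import Dict, Optional

# Hypothetical model of a metastore table's parameter map.
# "totalSize" is the key named in the thread; the rest is illustrative.
TOTAL_SIZE = "totalSize"


def spark_analyze_old(params: Dict[str, str], measured_size: int) -> None:
    """Old analyze behavior as described: write only the size, under Hive's key."""
    params[TOTAL_SIZE] = str(measured_size)


def hive_update(params: Dict[str, str], new_size: int) -> None:
    """Hive refreshes the same key after the table changes."""
    params[TOTAL_SIZE] = str(new_size)


def read_size(params: Dict[str, str]) -> Optional[int]:
    """Reader side: with a single shared key, the last writer wins."""
    value = params.get(TOTAL_SIZE)
    return int(value) if value is not None else None


params: Dict[str, str] = {}
spark_analyze_old(params, 1_000)
print(read_size(params))  # 1000
hive_update(params, 5_000)
print(read_size(params))  # 5000: Spark's value is silently overwritten
```

With a single key there is nothing to choose between. Once Spark keeps its own stats separately (the "two sources of statistics" mentioned above), a reader has to pick one, which is the policy question this thread debates.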

2017-05-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 @cloud-fan > What was the behavior before?

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18105 Merged build finished. Test PASSed.

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77365/

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18105 **[Test build #77365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77365/testReport)** for PR 18105 at commit

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18105 I think we should always trust Spark's table stats over Hive's, regardless of whether CBO is on. If users update the stats on the Hive side, it's their own responsibility to update them on the Spark side as well.

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18105 **[Test build #77365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77365/testReport)** for PR 18105 at commit

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18105 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77359/

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18105 Merged build finished. Test FAILed.

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18105 **[Test build #77359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77359/testReport)** for PR 18105 at commit

2017-05-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/18105 cc @cloud-fan @gatorsmile

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18105 **[Test build #77359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77359/testReport)** for PR 18105 at commit