[ https://issues.apache.org/jira/browse/SPARK-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhenhua Wang updated SPARK-20881:
---------------------------------
    Description:
Currently, statistics in Spark are generated by the "analyze command". However, when a user updates the table and collects stats in Hive, "totalSize"/"numRows" are updated in the metastore, so the table stats on the Spark side become stale.
If CBO is enabled, this is OK, because we assume the user will handle it and re-run the command to update the stats.
If CBO is disabled, we should fall back to the original behavior and respect Hive's stats. But in the current implementation, Spark's stats always override Hive's stats, whether CBO is enabled or not.
The right thing to do is to use (not override) Hive's stats when CBO is disabled.

  was:
Spark's statistics are generated by the "analyze command". However, when a user updates the table and collects stats in Hive, "totalSize"/"numRows" are updated in the metastore. The table stats on the Spark side are then stale even if we turn off CBO, because in the current implementation Spark's stats always override Hive's stats, whether CBO is enabled or not.
The right thing to do is to use Hive's stats in this case.

> Use Hive's stats in metastore when cbo is disabled
> --------------------------------------------------
>
>                 Key: SPARK-20881
>                 URL: https://issues.apache.org/jira/browse/SPARK-20881
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>
> Currently, statistics in Spark are generated by the "analyze command".
> However, when a user updates the table and collects stats in Hive,
> "totalSize"/"numRows" are updated in the metastore, so the table stats
> on the Spark side become stale.
> If CBO is enabled, this is OK, because we assume the user will handle it
> and re-run the command to update the stats.
> If CBO is disabled, we should fall back to the original behavior and
> respect Hive's stats. But in the current implementation, Spark's stats
> always override Hive's stats, whether CBO is enabled or not.
> The right thing to do is to use (not override) Hive's stats when CBO is
> disabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
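
The rule proposed in the description can be illustrated with a small, self-contained sketch. This is not Spark's actual code, only a model of the decision under stated assumptions: the table properties are a plain dictionary, `choose_stats` and `TableStats` are hypothetical names, and the `spark.sql.statistics.*` keys are assumed to be where the analyze command stores Spark's own stats. The fallback to the other source when the preferred one is missing is also an assumption, since the issue does not discuss that case.

```python
from typing import Mapping, NamedTuple, Optional


class TableStats(NamedTuple):
    size_in_bytes: int
    row_count: Optional[int]


# Properties mentioned in the issue, written by Hive when it collects stats.
HIVE_TOTAL_SIZE = "totalSize"
HIVE_NUM_ROWS = "numRows"
# Assumed keys for the stats written by Spark's analyze command.
SPARK_TOTAL_SIZE = "spark.sql.statistics.totalSize"
SPARK_NUM_ROWS = "spark.sql.statistics.numRows"


def _read(props: Mapping[str, str], size_key: str, rows_key: str) -> Optional[TableStats]:
    """Parse one set of stats from the table properties, or None if absent."""
    size = props.get(size_key)
    if size is None:
        return None
    rows = props.get(rows_key)
    return TableStats(int(size), int(rows) if rows is not None else None)


def choose_stats(props: Mapping[str, str], cbo_enabled: bool) -> Optional[TableStats]:
    """Pick which stats the planner should see.

    CBO enabled:  prefer Spark's stats (the user is expected to re-run analyze).
    CBO disabled: prefer Hive's stats, so Spark's possibly stale values
                  do not override them.
    """
    hive = _read(props, HIVE_TOTAL_SIZE, HIVE_NUM_ROWS)
    spark = _read(props, SPARK_TOTAL_SIZE, SPARK_NUM_ROWS)
    if cbo_enabled:
        # Assumption: fall back to Hive's stats if analyze was never run.
        return spark if spark is not None else hive
    # Assumption: fall back to Spark's stats if Hive recorded none.
    return hive if hive is not None else spark


# Hive has refreshed its stats after an update; Spark's are stale.
props = {
    HIVE_TOTAL_SIZE: "2048",
    HIVE_NUM_ROWS: "20",
    SPARK_TOTAL_SIZE: "1024",
    SPARK_NUM_ROWS: "10",
}
print(choose_stats(props, cbo_enabled=False))
print(choose_stats(props, cbo_enabled=True))
```

With CBO disabled the sketch returns the values Hive recorded; with CBO enabled it returns Spark's values, which stay stale until the user re-runs the analyze command.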