[ https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-27403:
----------------------------------
    Summary: Fix `updateTableStats` to update table stats always with new stats or None  (was: Failed to update the table size automatically even though spark.sql.statistics.size.autoUpdate.enabled is set as true)

> Fix `updateTableStats` to update table stats always with new stats or None
> --------------------------------------------------------------------------
>
>                 Key: SPARK-27403
>                 URL: https://issues.apache.org/jira/browse/SPARK-27403
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Sujith Chacko
>            Assignee: Sujith Chacko
>            Priority: Major
>             Fix For: 2.4.2, 3.0.0
>
>
> The system should update the table stats automatically when the user sets
> spark.sql.statistics.size.autoUpdate.enabled to true. Currently this property
> has no effect, whether it is enabled or disabled. This feature is similar to
> Hive's auto-gather feature, where statistics are computed automatically by
> default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
>
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as parquet")
> scala> spark.sql("insert into table1 select 'a',29")
> res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>
> +-------------------------------+-----------------------------------------------------------++-------
> |col_name|data_type|comment|
> +-------------------------------+-----------------------------------------------------------++-------
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +-------------------------------+-----------------------------------------------------------++-------
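
For readers following the new summary, the behavior it asks for ("update table stats always with new stats or None") can be sketched in plain Scala. This is only a simplified model and not Spark's code: `TableStats`, `Table`, `StatsModel`, and the `currentSize` parameter are hypothetical names introduced here for illustration. It assumes that the enabled flag should produce freshly computed stats and that the disabled flag should clear any stored stats.

```scala
// Hypothetical, simplified model of the intended behavior.
// These are NOT Spark's real classes or signatures.
final case class TableStats(sizeInBytes: Long)
final case class Table(name: String, stats: Option[TableStats])

object StatsModel {
  // Always returns a table whose stats were either recomputed or cleared,
  // regardless of whether the table had stats before the call.
  // `currentSize` is passed by name, so it is evaluated only when needed.
  def updateTableStats(
      table: Table,
      autoUpdateEnabled: Boolean,
      currentSize: => Long): Table =
    if (autoUpdateEnabled) table.copy(stats = Some(TableStats(currentSize)))
    else table.copy(stats = None)
}
```

In this model a table that has never had stats, such as the freshly created `table1` in the steps above, still receives them when the flag is on, and a table with stale stats has them dropped when the flag is off, so the stored value is never left out of date.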