[jira] [Updated] (SPARK-27403) Fix `updateTableStats` to update table stats always with new stats or None

Dongjoon Hyun (JIRA) Wed, 17 Apr 2019 09:27:09 -0700


     [ https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dongjoon Hyun updated SPARK-27403:
----------------------------------
    Summary: Fix `updateTableStats` to update table stats always with new stats 
or None  (was: Failed to update the table size automatically even though 
spark.sql.statistics.size.autoUpdate.enabled is set as true)

> Fix `updateTableStats` to update table stats always with new stats or None
> --------------------------------------------------------------------------
>
>                 Key: SPARK-27403
>                 URL: https://issues.apache.org/jira/browse/SPARK-27403
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>            Reporter: Sujith Chacko
>            Assignee: Sujith Chacko
>            Priority: Major
>             Fix For: 2.4.2, 3.0.0
>
>
> The system should update the table stats automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true. Currently this property 
> has no effect, whether it is enabled or disabled. This feature is similar to 
> Hive's auto-gather feature, where statistics are computed automatically by 
> default if the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Steps to reproduce:
> scala> spark.sql("create table table1 (name string,age int) stored as parquet")
> scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +-------------------------------+-----------------------------------------------------------++-------
> |col_name|data_type|comment|
> +-------------------------------+-----------------------------------------------------------++-------
> |name|string|null|
> |age|int|null|
> | | | |
> | # Detailed Table Information| | |
> |Database|default| |
> |Table|table1| |
> |Owner|Administrator| |
> |Created Time|Sun Apr 07 23:41:56 IST 2019| |
> |Last Access|Thu Jan 01 05:30:00 IST 1970| |
> |Created By|Spark 2.4.1| |
> |Type|MANAGED| |
> |Provider|hive| |
> |Table Properties|[transient_lastDdlTime=1554660716]| |
> |Location|file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1| |
> |Serde Library|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe| |
> |InputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat| |
> |OutputFormat|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
> |Storage Properties|[serialization.format=1]| |
> |Partition Provider|Catalog| |
> +-------------------------------+-----------------------------------------------------------++-------
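
The behavior named in the new summary ("update table stats always with new stats or None") can be sketched in plain Scala. This is only an illustration of one reading of that summary, not the actual patch: `TableStats`, `UpdateTableStatsSketch`, and `statsAfterWrite` are hypothetical names, and the sketch assumes that the decision depends only on the auto-update flag and not on whether the table already had stats.

```scala
// Hypothetical stand-in for the catalog's table statistics; not Spark's real class.
final case class TableStats(sizeInBytes: BigInt)

object UpdateTableStatsSketch {
  /**
   * Returns the stats to store after a data-changing command.
   * Assumption: the result never depends on previously stored stats.
   */
  def statsAfterWrite(
      autoUpdateEnabled: Boolean,
      computeSize: () => BigInt): Option[TableStats] =
    if (autoUpdateEnabled) Some(TableStats(computeSize())) // store a freshly computed size
    else None                                              // clear stats so stale values are not kept

  def main(args: Array[String]): Unit = {
    // 425 is an arbitrary illustrative size in bytes.
    println(statsAfterWrite(autoUpdateEnabled = true, computeSize = () => BigInt(425)))
    println(statsAfterWrite(autoUpdateEnabled = false, computeSize = () => BigInt(425)))
  }
}
```

Under this reading, the table in the steps above, which has no stats before the insert, would still receive a size once the property is set to true.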



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
