liyunzhang_intel created HIVE-17182:
---------------------------------------

             Summary: Invalid statistics like "RAW DATA SIZE" info for parquet file
                 Key: HIVE-17182
                 URL: https://issues.apache.org/jira/browse/HIVE-17182
             Project: Hive
          Issue Type: Bug
            Reporter: liyunzhang_intel


On the TPC-DS 200g scale store_sales table, use "describe formatted store_sales" to view the statistics:
{code}
hive> describe formatted store_sales;
OK
# col_name              data_type               comment

ss_sold_time_sk         bigint
ss_item_sk              bigint
ss_customer_sk          bigint
ss_cdemo_sk             bigint
ss_hdemo_sk             bigint
ss_addr_sk              bigint
ss_store_sk             bigint
ss_promo_sk             bigint
ss_ticket_number        bigint
ss_quantity             int
ss_wholesale_cost       double
ss_list_price           double
ss_sales_price          double
ss_ext_discount_amt     double
ss_ext_sales_price      double
ss_ext_wholesale_cost   double
ss_ext_list_price       double
ss_ext_tax              double
ss_coupon_amt           double
ss_net_paid             double
ss_net_paid_inc_tax     double
ss_net_profit           double

# Partition Information
# col_name              data_type               comment

ss_sold_date_sk         bigint

# Detailed Table Information
Database:               tpcds_bin_partitioned_parquet_200
Owner:                  root
CreateTime:             Tue Jun 06 11:51:48 CST 2017
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs://bdpe38:9000/user/hive/warehouse/tpcds_bin_partitioned_parquet_200.db/store_sales
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                2023
        numPartitions           1824
        numRows                 575995635
        rawDataSize             12671903970
        totalSize               46465926745
        transient_lastDdlTime   1496721108
{code}
The rawDataSize is nearly 12G, while the totalSize is nearly 46G.
View the original data on HDFS:
{format}
# hadoop fs -du -h /tmp/tpcds-generate/200/
75.8 G  /tmp/tpcds-generate/200/store_sales
{format}
View the parquet files on HDFS:
{format}
# hadoop fs -du -h /user/hive/warehouse/tpcds_bin_partitioned_parquet_200.db
43.3 G  /user/hive/warehouse/tpcds_bin_partitioned_parquet_200.db/store_sales
{format}
It seems that the rawDataSize should be nearly 75G, but the "describe formatted store_sales" command shows only 12G.
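
For reference, the sizes quoted above can be cross-checked with a few lines of arithmetic. The sketch below is not part of Hive; it only converts the byte counts from the "Table Parameters" output into the units printed by "hadoop fs -du -h", assuming those are binary (1024-based) units, which is consistent with the 43.3 G figure. The constant and function names are illustrative. It also divides rawDataSize by numRows; the result happens to equal the number of non-partition columns listed, but whether that reflects how the statistic is computed is not established here.

```python
# Values copied from the "Table Parameters" section of the report (bytes / rows).
RAW_DATA_SIZE = 12671903970
TOTAL_SIZE = 46465926745
NUM_ROWS = 575995635

# Number of non-partition columns listed by "describe formatted store_sales".
NUM_DATA_COLUMNS = 22

GIB = 1024 ** 3  # assumption: "hadoop fs -du -h" reports binary units


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


raw_gib = to_gib(RAW_DATA_SIZE)
total_gib = to_gib(TOTAL_SIZE)
raw_bytes_per_row = RAW_DATA_SIZE / NUM_ROWS

print(f"rawDataSize: {raw_gib:.1f} GiB")
print(f"totalSize:   {total_gib:.1f} GiB")
print(f"rawDataSize / numRows: {raw_bytes_per_row:.1f} bytes per row")
print(f"non-partition columns: {NUM_DATA_COLUMNS}")
```

With these inputs, totalSize comes out at about 43.3 GiB, matching the on-disk parquet size, while rawDataSize is about 11.8 GiB, far below the 75.8 G of original data.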