Well, it's not the proper way, but you can check the statistics in metastore tables such as PART_COL_STATS in the MySQL database, if you are using MySQL as the metastore (stats) database. The other way is to run max, min, and distinct on int columns, largest length on string columns, etc.; if these operations run a whole MapReduce job, then the statistics are not getting created.
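The first check can be scripted. Below is a minimal Python sketch, assuming a MySQL-backed metastore reachable through any DB-API connection, the Hive database `default`, and partition names stored in PART_COL_STATS as `dt=2014-06-01/hour=00` (PART_COL_STATS holds partition-level column stats; unpartitioned tables use TAB_COL_STATS instead). The helper name `columns_missing_stats` is hypothetical, and the demo uses an in-memory SQLite table as a stand-in so it runs without a metastore.

```python
import sqlite3
from typing import Iterable, List

# Assumed metastore query: one PART_COL_STATS row per (partition, column)
# that has column statistics. "?" is sqlite3's placeholder style; MySQL
# drivers such as PyMySQL use "%s" instead.
QUERY = (
    "SELECT COLUMN_NAME FROM PART_COL_STATS "
    "WHERE DB_NAME = ? AND TABLE_NAME = ? AND PARTITION_NAME = ?"
)


def columns_missing_stats(conn, db: str, table: str, partition: str,
                          wanted: Iterable[str]) -> List[str]:
    """Return the columns in `wanted` that have no stats row for the partition."""
    cur = conn.cursor()
    try:
        cur.execute(QUERY, (db, table, partition))
        have = {row[0].lower() for row in cur.fetchall()}
    finally:
        cur.close()
    return [c for c in wanted if c.lower() not in have]


if __name__ == "__main__":
    # Hypothetical stand-in for the metastore: an in-memory SQLite table with
    # only the four columns the query touches (the real table has many more).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PART_COL_STATS (DB_NAME TEXT, TABLE_NAME TEXT, "
        "PARTITION_NAME TEXT, COLUMN_NAME TEXT)"
    )
    part = "dt=2014-06-01/hour=00"
    conn.executemany(
        "INSERT INTO PART_COL_STATS VALUES (?, ?, ?, ?)",
        [("default", "user_table", part, "a"), ("default", "user_table", part, "b")],
    )
    print(columns_missing_stats(conn, "default", "user_table", part, ["a", "b", "c"]))
    # prints ['c']
```

An empty result for the analyzed partition means a row exists for every requested column; a non-empty one names the columns that ANALYZE ... FOR COLUMNS did not cover.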
From: Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com]
Sent: Friday, July 25, 2014 12:43 PM
To: user@hive.apache.org
Subject: Re: Reg:Column Statistics with Parquet

Hi,

I tried the same with compute statistics for columns a, b, c as above and am still seeing the same results in the explain plan. How do I confirm whether it is generating all the column stats for a given column? If that is confirmed, we can debug why Hive is still not using them.

Thanks
Suma

On Thu, Jul 24, 2014 at 11:49 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

You have to explicitly specify the column list in the analyze command to gather column stats. This command will only collect basic stats like number of rows, total file size, raw data size, and number of files:

analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;

To collect column statistics, add the column list like below:

analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics for columns a, b, c;

Thanks
Prasanth Jayachandran

On Jul 24, 2014, at 5:13 AM, Sandeep Samudrala <sandeep.samudr...@inmobi.com> wrote:

I am trying to enable column statistics usage with Parquet tables. This is the query I am executing. However, on explain I see that even though Basic stats shows COMPLETE, Column stats shows NONE. Can someone please explain what else I need to do to debug/fix this?
set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;
analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
explain select min(a), max(b), min(c) from user_table;

hive> explain select min(a), max(b), min(c) from usertable;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: user_table
            Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: double), b (type: double), c (type: int)
              outputColumnNames: a, b, c
              Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: min(a), max(b), min(c)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
      Reduce Operator Tree:
        Group By Operator
          aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Thanks,
-sandeep
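The symptom in Sandeep's plan (every operator annotated "Column stats: NONE" while a min/max query still needs a Map Reduce stage) can be checked mechanically on saved EXPLAIN output. A minimal Python sketch, assuming the Hive 0.13-era text format quoted in this thread and the MapReduce engine; the helper `summarize_plan` is hypothetical, and the "Map Reduce" test is only the heuristic from the first reply (on Tez the stage would read differently).

```python
import re
from collections import Counter
from typing import Dict, Union

# Matches the per-operator annotation, e.g.
# "Basic stats: COMPLETE Column stats: NONE".
STATS_RE = re.compile(r"Basic stats:\s*(\w+)\s+Column stats:\s*(\w+)")


def summarize_plan(plan: str) -> Dict[str, Union[Dict[str, int], bool]]:
    """Count Column stats states in EXPLAIN text and flag a Map Reduce stage."""
    states = Counter(m.group(2) for m in STATS_RE.finditer(plan))
    return {
        "column_stats": dict(states),
        # Heuristic from the thread: if min/max still needs a Map Reduce
        # stage, the answer is not coming from stored statistics.
        "needs_map_reduce": "Map Reduce" in plan,
    }


if __name__ == "__main__":
    excerpt = """
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: user_table
            Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
"""
    print(summarize_plan(excerpt))
    # prints {'column_stats': {'NONE': 1}, 'needs_map_reduce': True}
```

Once column stats are actually gathered and picked up, one would expect the counter to show COMPLETE (or PARTIAL) instead of NONE; whether the Map Reduce stage disappears also depends on hive.compute.query.using.stats.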