Hi, I tried the same with compute statistics for columns a, b, c as above and am still seeing the same results in the explain plan.
How do I confirm whether it is generating all the column stats for a given column? Once that is confirmed, we can debug why Hive is still not using them.

Thanks,
Suma

On Thu, Jul 24, 2014 at 11:49 PM, Prasanth Jayachandran <[email protected]> wrote:

> You have to explicitly specify the column list in the analyze command to
> gather column stats.
>
> This command will only collect basic stats like number of rows, total file
> size, raw data size and number of files:
> analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
>
> To collect column statistics, add the column list like below:
> analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics for columns a, b, c;
>
> Thanks
> Prasanth Jayachandran
>
> On Jul 24, 2014, at 5:13 AM, Sandeep Samudrala <[email protected]> wrote:
>
> I am trying to enable column statistics usage with Parquet tables. This is
> the query I am executing. However, on explain I see that even though *Basic
> stats: COMPLETE* is shown, *Column stats* is shown as *NONE*.
> Can someone please explain what else I need to do to debug/fix this?
>
> set hive.compute.query.using.stats=true;
> set hive.stats.reliable=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> set hive.cbo.enable=true;
>
> analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
>
> explain select min(a), max(b), min(c) from user_table;
>
> hive> explain select min(a), max(b), min(c) from user_table;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: user_table
>             Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: double), b (type: double), c (type: int)
>               outputColumnNames: a, b, c
>               Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 aggregations: min(a), max(b), min(c)
>                 mode: hash
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   sort order:
>                   Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
>           mode: mergepartial
>           outputColumnNames: _col0, _col1, _col2
>           Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>
> Thanks,
> -sandeep
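One way to check whether the column stats were actually stored, sketched under the assumption of Hive 0.14.0 or later (per the Hive DDL manual, that is where DESCRIBE FORMATTED gained the ability to display column statistics; older releases may not accept this form): gather the column stats for the partition as Prasanth describes above, then describe a single column of that partition. If statistics were recorded, the output should include fields such as min, max and num_nulls for that column; empty fields would suggest the analyze step did not store them. The table, partition and column names are the ones used in this thread.

```sql
-- Collect column-level statistics for one partition (the plain form only gathers basic stats).
analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics for columns a, b, c;

-- Show the stored statistics for column a in that partition (syntax assumed to need Hive 0.14.0+).
describe formatted user_table a partition(dt='2014-06-01',hour='00');
```

Repeating the describe for b and c covers all three columns used in the query.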
