Hi,

I tried the same with compute statistics for columns a, b, c as above and
am still seeing the same results in the explain plan.

How do I confirm whether it is generating the column stats for a given column?
Once that is confirmed, we can debug why Hive is still not using them.
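
One possible check, sketched under the assumption that the Hive build supports column-level DESCRIBE FORMATTED (the Hive wiki lists this for 0.14.0 and later, so it may not be available on older releases): describe the column for the analyzed partition and see whether fields such as min, max, num_nulls and distinct_count are populated. The table, column, and partition names are the ones used in this thread. The placement of the PARTITION clause differs between Hive releases, so treat the statement as illustrative rather than exact.

```sql
-- Assumption: Hive 0.14.0 or later; PARTITION clause placement may vary by release.
-- Empty min/max/num_nulls/distinct_count values would suggest that no column
-- statistics were stored for column a in this partition.
DESCRIBE FORMATTED user_table a PARTITION (dt='2014-06-01', hour='00');
```

Repeating the statement for columns b and c would cover every column named in the analyze command.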

Thanks
Suma


On Thu, Jul 24, 2014 at 11:49 PM, Prasanth Jayachandran <
[email protected]> wrote:

> You have to explicitly specify the column list in the analyze command to
> gather column stats.
>
> This command will only collect basic stats like number of rows, total file
> size, raw data size, number of files.
> analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
>
> To collect column statistics, add the column list as shown below:
> analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics for columns a, b, c;
>
> Thanks
> Prasanth Jayachandran
>
> On Jul 24, 2014, at 5:13 AM, Sandeep Samudrala <
> [email protected]> wrote:
>
> I am trying to enable column statistics usage with Parquet tables. These are
> the statements I am executing. However, in the explain output, even though
> *Basic stats: COMPLETE* is shown, *Column stats* is shown as *NONE*.
> Can someone please explain what else I need to do to debug/fix this?
>
> set hive.compute.query.using.stats=true;
> set hive.stats.reliable=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> set hive.cbo.enable=true;
>
> analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
>
> explain select min(a), max(b), min(c) from user_table;
>
> hive> explain select min(a), max(b), min(c) from user_table;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: user_table
>             Statistics: Num rows: 55490383 Data size: 1831182639 *Basic stats: COMPLETE Column stats: NONE*
>             Select Operator
>               expressions: a (type: double), b (type: double), c (type: int)
>               outputColumnNames: a, b, c
>               Statistics: Num rows: 55490383 Data size: 1831182639 *Basic stats: COMPLETE Column stats: NONE*
>               Group By Operator
>                 aggregations: min(a), max(b), min(c)
>                 mode: hash
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 1 Data size: 20 *Basic stats: COMPLETE Column stats: NONE*
>                 Reduce Output Operator
>                   sort order:
>                   Statistics: Num rows: 1 Data size: 20 *Basic stats: COMPLETE Column stats: NONE*
>                   value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
>           mode: mergepartial
>           outputColumnNames: _col0, _col1, _col2
>           Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>
>
> Thanks,
> -sandeep
>
