Reg:Column Statistics with Parquet

Sandeep Samudrala Thu, 24 Jul 2014 05:15:14 -0700

I am trying to enable column statistics usage with Parquet tables. Below are
the settings and statements I am executing. However, in the explain output,
Basic stats shows COMPLETE but Column stats shows NONE.
Can someone please explain what else I need to debug or fix here?


set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;

explain select min(a), max(b), min(c) from user_table;

hive> explain select min(a), max(b), min(c) from user_table;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: user_table
            Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: double), b (type: double), c (type: int)
              outputColumnNames: a, b, c
              Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: min(a), max(b), min(c)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
      Reduce Operator Tree:
        Group By Operator
          aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
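
For comparison: "compute statistics" without a column clause gathers only
basic statistics (row count, data size), while column-level statistics come
from a separate "for columns" form of the same statement. A minimal sketch,
assuming the same table, partition, and column names as above. Whether this
alone moves Column stats off NONE for a Parquet table on this Hive version is
an assumption, not something verified here. The explain is restricted to the
analyzed partition because other partitions would have no column statistics.

```sql
-- Gather column-level statistics for the columns used in the query.
-- Assumes columns a, b, c and the partition from the statements above.
analyze table user_table partition(dt='2014-06-01',hour='00')
  compute statistics for columns a, b, c;

-- Re-check the plan against the analyzed partition only.
explain select min(a), max(b), min(c)
from user_table
where dt='2014-06-01' and hour='00';
```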


Thanks,
-sandeep

