I am trying to enable column statistics usage with Parquet tables. Below are
the commands I am executing. However, in the EXPLAIN output I see that even
though *Basic stats: COMPLETE* is shown, *Column stats* is shown as *NONE*.
Can someone please explain what else I need to do to debug or fix this?
set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;
analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
explain select min(a), max(b), min(c) from user_table;
hive> explain select min(a), max(b), min(c) from user_table;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: user_table
            Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: double), b (type: double), c (type: int)
              outputColumnNames: a, b, c
              Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: min(a), max(b), min(c)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
      Reduce Operator Tree:
        Group By Operator
          aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
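For comparison: the plain "compute statistics" form of ANALYZE used above gathers basic statistics (row count, data size). Hive documents a separate column-level form, COMPUTE STATISTICS FOR COLUMNS, which writes per-column statistics to the metastore. The following is a sketch for the same partition, assuming the three columns used in the query (a, b, c) and a Hive version that accepts an explicit column list for a partitioned table:

```sql
-- Basic stats only (row count, data size); this is what the ANALYZE above gathers:
-- analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;

-- Column-level stats for the columns referenced by the query.
-- Assumption: the Hive version in use supports FOR COLUMNS on a partition.
analyze table user_table partition(dt='2014-06-01',hour='00')
  compute statistics for columns a, b, c;

-- Re-run EXPLAIN and compare the "Column stats:" field in each Statistics line.
explain select min(a), max(b), min(c) from user_table;
```

Because the query has no partition filter and reads every partition of user_table, presumably each of those partitions would need column statistics before the plan could report them as complete.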
Thanks,
-sandeep