Hi,
I am new to Hive (only a few days of learning).
I am an Oracle performance expert and am comparing Hive's EXPLAIN feature
with Oracle's EXPLAIN PLAN command. I have an internal table with around 100
rows in it. However, EXPLAIN in Hive reports Num rows as 33, which looks like
a large discrepancy and could cause performance issues on a larger table. I
would like to know how Hive calculates Num rows internally. The output is
pasted below (the table actually has 100 rows):
hive (vivek)> explain select country, empno from emp;
OK
Explain
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: emp
          Statistics: Num rows: 33 Data size: 3501 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: country (type: string), empno (type: int)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 33 Data size: 3501 Basic stats: COMPLETE Column stats: NONE
            ListSink
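
One observation from the numbers above, in case it helps: dividing the reported
Data size by the reported Num rows gives roughly 106 bytes per row, whereas the
same Data size spread over 100 rows would be about 35 bytes per row. My guess
(an assumption on my part, not something I have confirmed in the Hive source)
is that Num rows is being estimated as data size divided by an assumed row
width rather than taken from an actual count. A quick sketch of that
arithmetic, using only the figures from the output:

```python
# Figures copied from the EXPLAIN output above.
reported_rows = 33
reported_data_size = 3501  # bytes, as shown in "Data size"
actual_rows = 100          # the number of rows I expect in emp

# Bytes per row implied by the optimizer's own numbers.
implied_row_width = reported_data_size / reported_rows

# Bytes per row if the data size is correct and the table has 100 rows.
actual_row_width = reported_data_size / actual_rows

print(f"implied width per row: {implied_row_width:.1f} bytes")
print(f"width at 100 rows:     {actual_row_width:.1f} bytes")
```

If that guess is right, the estimate would be off by about a factor of three
simply because the assumed row width is about three times the real one.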
Regards
Vivek