Congling Xia created KYLIN-4315:
-----------------------------------

             Summary: Use metadata numRows in Beeline client for quick row counting
                 Key: KYLIN-4315
                 URL: https://issues.apache.org/jira/browse/KYLIN-4315
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
            Reporter: Congling Xia
            Assignee: Congling Xia


Hi, I found that in `BeelineHiveClient`, the method `getHiveTableRows` uses
"select count(*) from <tb_name>" to count table rows. This method is invoked in
the flat intermediate table redistribution step of cube building.
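For comparison, the current approach amounts to issuing a plain `count(*)` query over JDBC, which typically makes Hive run a job that reads every row of the table. The sketch below is hypothetical: `FullScanRowCount`, `countSql` and `countRows` are illustrative names, not Kylin's actual `BeelineHiveClient` code.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch of a COUNT(*)-based row count; not Kylin's real code. */
final class FullScanRowCount {

    /** Builds the full-scan counting query for the given table. */
    static String countSql(String table) {
        return "SELECT COUNT(*) FROM " + table;
    }

    /** Runs COUNT(*), which typically triggers a job scanning the whole table. */
    static long countRows(Connection conn, String table) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(countSql(table))) {
            if (!rs.next()) {
                throw new SQLException("COUNT(*) returned no rows for " + table);
            }
            return rs.getLong(1);
        }
    }
}
```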

The row count is also available as a table statistic in the Hive metastore, and
reading it costs much less time than scanning all rows of the Hive table. Since
intermediate tables are created and populated by Kylin, statistics are
automatically calculated and stored in the metastore when
`[hive.stats.autogather|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.autogather]`
is enabled (the default setting in Hive).
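A minimal sketch of the proposed lookup, assuming the Hive JDBC driver returns `DESCRIBE FORMATTED` output as three columns (col_name, data_type, comment), with table parameters such as `numRows` in the second column and their values in the third. `MetastoreRowCount` and `parseNumRows` are hypothetical names, not Kylin's actual code. The sketch further assumes the flat table is unpartitioned (partitioned tables keep `numRows` per partition) and falls back to the existing `count(*)` scan when the statistic is missing or negative.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper; the DESCRIBE FORMATTED layout used here is an assumption. */
final class MetastoreRowCount {

    /** Looks for a row whose second column is "numRows" and parses the third. */
    static OptionalLong parseNumRows(List<String[]> rows) {
        for (String[] row : rows) {
            if (row.length < 3 || row[1] == null || row[2] == null) {
                continue; // header or section rows without a key/value pair
            }
            if ("numRows".equals(row[1].trim())) {
                try {
                    long n = Long.parseLong(row[2].trim());
                    // A negative value is treated as "statistic unavailable".
                    return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    /** Reads numRows from the metastore, falling back to a full COUNT(*) scan. */
    static long getHiveTableRows(Connection conn, String table) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("DESCRIBE FORMATTED " + table)) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                String[] row = new String[cols];
                for (int i = 0; i < cols; i++) {
                    row[i] = rs.getString(i + 1);
                }
                rows.add(row);
            }
        }
        OptionalLong fromStats = parseNumRows(rows);
        if (fromStats.isPresent()) {
            return fromStats.getAsLong();
        }
        // Statistic missing: fall back to scanning the table.
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            if (!rs.next()) {
                throw new SQLException("COUNT(*) returned no rows for " + table);
            }
            return rs.getLong(1);
        }
    }
}
```

This trades exactness for speed: files written to the table's location outside Hive would not update `numRows`, and the sketch would return the stale value rather than falling back.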

See the Hive wiki for more details about the `numRows` statistic:
[https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
