Jinfeng Ni created DRILL-4126:
---------------------------------

             Summary: Adding HiveMetaStore caching when impersonation is 
enabled. 
                 Key: DRILL-4126
                 URL: https://issues.apache.org/jira/browse/DRILL-4126
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni


Currently, Hive metastore caching is used only when impersonation is disabled, 
in which case all Hive metastore calls go through 
NonCloseableHiveClientWithCaching [1]. When impersonation is enabled, 
no caching is used for Hive metastore access.

This can significantly increase planning time when the Hive storage plugin is 
enabled, and execution time when running a query against INFORMATION_SCHEMA. 
Depending on the number of databases and tables in the Hive storage plugin, 
the planning time or the INFORMATION_SCHEMA query time can become 
unacceptable. It gets even worse if the Hive metastore runs on a different 
node from the drillbit, which makes metastore access slower still.

We are seeing planning, or execution of an INFORMATION_SCHEMA query, take 
30 to 60 seconds. Such long planning or execution times prevent Drill from 
behaving "interactively" for these queries.

We should enable caching when impersonation is used. As long as the authorizer 
verifies that the user has access to the databases/tables, we should serve the 
data from the cache. Doing so should reduce the number of API calls to the 
Hive metastore.
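
A minimal sketch of the idea, not Drill's actual code: a single cache shared 
by all users, with an authorization check before every read. The class name 
SharedMetastoreCacheSketch, the authorizer predicate, and the metastoreLoader 
function are all hypothetical stand-ins; the real Hive authorizer and 
metastore client APIs are not modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch: one shared cache, per-user authorization on every read. */
public class SharedMetastoreCacheSketch {

    // Shared across all users: database name -> table names.
    private final Map<String, List<String>> tablesByDb = new ConcurrentHashMap<>();

    // Assumed stand-in for the authorizer: (user, database) -> allowed?
    private final BiPredicate<String, String> authorizer;

    // Assumed stand-in for the (slow, possibly remote) metastore call.
    private final Function<String, List<String>> metastoreLoader;

    public SharedMetastoreCacheSketch(BiPredicate<String, String> authorizer,
                                      Function<String, List<String>> metastoreLoader) {
        this.authorizer = authorizer;
        this.metastoreLoader = metastoreLoader;
    }

    /** Checks access for this user first, then serves from the shared cache. */
    public List<String> getTableNames(String user, String db) {
        if (!authorizer.test(user, db)) {
            throw new SecurityException(user + " may not access " + db);
        }
        // The loader runs only on a cache miss, whichever user triggers it.
        return tablesByDb.computeIfAbsent(db, metastoreLoader);
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        SharedMetastoreCacheSketch cache = new SharedMetastoreCacheSketch(
            (user, db) -> !(user.equals("bob") && db.equals("finance")),
            db -> {
                remoteCalls.incrementAndGet();
                return List.of(db + "_t1", db + "_t2");
            });

        cache.getTableNames("alice", "finance"); // miss: loads from the metastore
        cache.getTableNames("alice", "finance"); // hit
        cache.getTableNames("bob", "sales");     // miss: loads from the metastore
        cache.getTableNames("alice", "sales");   // hit, although bob loaded it
        System.out.println("remote calls: " + remoteCalls.get());

        try {
            cache.getTableNames("bob", "finance"); // cached, but bob is denied
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

Because the check is keyed on the user while the cache is keyed only on the 
database, an entry loaded for one user is reused for another without 
bypassing authorization. A real implementation would also need expiry or 
invalidation, which this sketch leaves out.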


[1] 
https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
