bharath v created IMPALA-8937:
---------------------------------

             Summary: Fine grained table metadata loading on Catalog server
                 Key: IMPALA-8937
                 URL: https://issues.apache.org/jira/browse/IMPALA-8937
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog, Frontend
    Affects Versions: Impala 2.12.0, Impala 3.3.0
            Reporter: bharath v


*Background*:

Currently, a table _on the Catalog server_ is either in a loaded or an unloaded 
state (IncompleteTable). When the Catalog server starts for the first time, it 
first fetches the list of table names for each database, and every table in this 
list starts as an unloaded table. The table lists are propagated to the 
coordinators so that they know whether a table with a given name exists and can 
start analyzing queries. No metadata (schema, ownership, comments, etc.) is 
loaded for incomplete tables.

A table's metadata is loaded lazily (and the table moves into the loaded state) 
when the table is referenced in any query. When a load request comes in, all of 
the table's metadata is loaded, including file block information.

*Problem:* 

Coordinators need some additional information when analyzing unloaded tables. 
IMPALA-8228 is one example: the ownership information is part of the HMS table 
schema, which is not loaded until the table is marked fully loaded. While this 
is not a problem for regular queries (like select * from <tbl>), it is an issue 
for queries like "show tables", which do not trigger a table load. In this 
particular case, because the ownership information is missing, the output of 
the table listing can differ depending on whether the table is loaded. 
Another example is IMPALA-8606, where the GET_TABLES request does not return 
table comments because they are not available for unloaded tables.
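
To make the problem concrete, here is a minimal sketch of the binary state model. It is not Impala's actual code: the class BinaryStateCatalog, the FullMetadata holder and the loader function are hypothetical names used only for illustration. The point is that a listing-style call never triggers a load, so the owner is simply unavailable until some query references the table.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical model of the loaded/unloaded state; not Impala's real classes. */
final class BinaryStateCatalog {
    /** Stand-in for everything a full load brings in (schema, owner, comments, file blocks). */
    static final class FullMetadata {
        final String owner;
        final String comment;

        FullMetadata(String owner, String comment) {
            this.owner = owner;
            this.comment = comment;
        }
    }

    // Names are known from startup; metadata exists only for loaded tables.
    private final Set<String> tableNames = ConcurrentHashMap.newKeySet();
    private final Map<String, FullMetadata> loaded = new ConcurrentHashMap<>();
    // Assumed loader: stands in for the full HMS + file system load.
    private final Function<String, FullMetadata> loader;

    BinaryStateCatalog(Collection<String> names, Function<String, FullMetadata> loader) {
        this.tableNames.addAll(names);
        this.loader = loader;
    }

    boolean exists(String name) {
        return tableNames.contains(name);
    }

    boolean isLoaded(String name) {
        return loaded.containsKey(name);
    }

    /** Listing-style access: never triggers a load, so the owner may be unknown. */
    Optional<String> ownerIfLoaded(String name) {
        FullMetadata m = loaded.get(name);
        return m == null ? Optional.empty() : Optional.of(m.owner);
    }

    /** Query-style access: the first reference performs an all-or-nothing load. */
    FullMetadata reference(String name) {
        if (!tableNames.contains(name)) {
            throw new IllegalArgumentException("no such table: " + name);
        }
        return loaded.computeIfAbsent(name, loader);
    }
}
```

In this model, ownerIfLoaded("db.t1") is empty before any query touches db.t1 and returns the owner afterwards, which mirrors the inconsistent listing output described above.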

*Ask:*

We need to consider finer grained loading on the Catalog server in general. 
Instead of having a binary state (loaded vs unloaded), the table could be in a 
partially loaded state. We could also start by aggressively fetching certain 
pieces of information that we think could aid analysis and lazily load the 
remaining pieces of metadata. Finer grained loading also integrates well with 
the LocalCatalog implementation on the coordinators, where the entire table 
need not be loaded on the Catalog server to serve partial meta information 
(e.g., show partitions <large-table>).
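
One possible shape for this, as a rough sketch only: track load state per metadata piece instead of per table. Everything below is an assumption made for illustration (the PartiallyLoadedTable class, the Piece enum, the choice of OWNER and COMMENT as eagerly fetched pieces, and the fetcher function that stands in for per-piece HMS or file system calls); it is not a proposal for the actual class layout.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical table with per-piece load state; names are illustrative only. */
final class PartiallyLoadedTable {
    /** Metadata pieces that could be loaded independently of each other. */
    enum Piece { OWNER, COMMENT, SCHEMA, PARTITIONS, FILE_BLOCKS }

    /** Assumed set of cheap pieces worth fetching eagerly to aid analysis. */
    static final Set<Piece> EAGER = EnumSet.of(Piece.OWNER, Piece.COMMENT);

    private final String name;
    // Assumed fetcher: stands in for per-piece HMS / file system calls.
    private final Function<Piece, Object> fetcher;
    private final Map<Piece, Object> pieces = new EnumMap<>(Piece.class);

    PartiallyLoadedTable(String name, Function<Piece, Object> fetcher) {
        this.name = name;
        this.fetcher = fetcher;
        // Eager part: fetched when the table is first registered.
        for (Piece p : EAGER) {
            pieces.put(p, fetcher.apply(p));
        }
    }

    String name() {
        return name;
    }

    /** Lazy part: a piece is fetched on first request and cached afterwards. */
    synchronized Object get(Piece p) {
        return pieces.computeIfAbsent(p, fetcher);
    }

    /** Snapshot of the pieces loaded so far. */
    synchronized Set<Piece> loadedPieces() {
        return pieces.isEmpty()
            ? EnumSet.noneOf(Piece.class)
            : EnumSet.copyOf(pieces.keySet());
    }

    synchronized boolean isFullyLoaded() {
        return pieces.size() == Piece.values().length;
    }
}
```

With a layout like this, a "show tables" style request could be answered from the eagerly fetched pieces alone, and a request such as show partitions would only pull in PARTITIONS without forcing FILE_BLOCKS to load.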



