[ https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vihang Karajgaonkar resolved IMPALA-7954.
-----------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> Support automatic invalidates using metastore notification events
> -----------------------------------------------------------------
>
>                 Key: IMPALA-7954
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7954
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.1.0
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>         Attachments: Automatic_invalidate_DesignDoc_v1.pdf,
>                      Impala_Catalogd_Auto_Metadata_Update_v2.pdf
>
>
> Currently, Impala offers multiple ways to invalidate or refresh the table
> metadata stored in the Catalog. Objects in the Catalog can be invalidated
> either with a usage-based approach (invalidate_tables_timeout_s) or when there
> is GC pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448.
> However, most users issue invalidate commands when they want to sync to the
> latest information from HDFS or HMS. Unfortunately, when data is modified or
> new data is added outside Impala (e.g., by Hive) or by a different Impala
> cluster, users often cannot tell whether they need to issue an invalidate.
> To be safe, users issue invalidate commands more often than necessary, which
> causes performance and stability issues.
> Hive Metastore provides a simple API to get incremental updates to the
> metadata stored in its database. Each metastore API call that performs an
> add/alter/drop operation generates one or more events, which can be fetched
> using the {{get_next_notification}} API. Each event has a unique and
> increasing event_id. The current notification event id can be fetched using
> the {{get_current_notificationEventId}} API.
> This JIRA proposes to use these metastore events to proactively either
> invalidate or refresh information in catalogd.
> When configured, catalogd could poll for such events and take action (such as
> add/drop/refresh partition, or add/drop/invalidate tables and databases)
> based on the events. This way the catalogd state can be refreshed
> automatically from events, which would greatly help use cases where users
> want to see the latest information (within a configurable delay) without
> flooding the system with invalidate requests.
> I will attach a design doc to this JIRA and create subtasks for the work.
> Feel free to comment on the JIRA or suggest improvements to the design.
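
The polling idea described above can be illustrated with a minimal sketch. It is not Impala's implementation: `FakeMetastore` is a hypothetical in-memory stand-in for HMS (not the real Thrift client), `EventPoller` and its handler map are invented for illustration, and the event type names are only examples. The sketch relies on one property stated in the description, namely that event ids are unique and increasing, so the poller only needs to remember the last id it processed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NotificationEvent:
    event_id: int      # unique and increasing, as the description states
    event_type: str    # illustrative names, e.g. "ADD_PARTITION"
    db_name: str
    table_name: str = ""


class FakeMetastore:
    """Hypothetical in-memory stand-in for HMS; not the real client API."""

    def __init__(self) -> None:
        self._events: List[NotificationEvent] = []

    def log(self, event_type: str, db_name: str, table_name: str = "") -> None:
        # Every add/alter/drop operation appends an event with the next id.
        next_id = len(self._events) + 1
        self._events.append(
            NotificationEvent(next_id, event_type, db_name, table_name))

    def get_current_notification_event_id(self) -> int:
        return self._events[-1].event_id if self._events else 0

    def get_next_notification(self, last_event_id: int,
                              max_events: int) -> List[NotificationEvent]:
        # Return events strictly newer than last_event_id, oldest first.
        newer = [e for e in self._events if e.event_id > last_event_id]
        return newer[:max_events]


Handler = Callable[[NotificationEvent], None]


class EventPoller:
    """Hypothetical poller: fetches new events and dispatches by type."""

    def __init__(self, metastore: FakeMetastore,
                 handlers: Dict[str, Handler]) -> None:
        self._metastore = metastore
        self._handlers = handlers
        # Assumption: state loaded at startup already reflects older events,
        # so polling starts from the current event id.
        self.last_synced_event_id = (
            metastore.get_current_notification_event_id())

    def poll_once(self, max_events: int = 100) -> int:
        events = self._metastore.get_next_notification(
            self.last_synced_event_id, max_events)
        for event in events:
            handler = self._handlers.get(event.event_type)
            if handler is not None:  # event types without a handler are skipped
                handler(event)
            self.last_synced_event_id = event.event_id
        return len(events)


# Usage: record the action a catalog might take for each event.
actions = []
metastore = FakeMetastore()
metastore.log("CREATE_TABLE", "db1", "t1")  # happened before the poller started
poller = EventPoller(metastore, {
    "ADD_PARTITION": lambda e: actions.append(("refresh", e.db_name, e.table_name)),
    "DROP_TABLE": lambda e: actions.append(("remove", e.db_name, e.table_name)),
})
metastore.log("ADD_PARTITION", "db1", "t1")
metastore.log("ALTER_DATABASE", "db1")
metastore.log("DROP_TABLE", "db1", "t1")
processed = poller.poll_once()
```

In a real deployment `poll_once` would run on a timer, which is where the "configurable delay" mentioned above would come from. The interval and batch size here are placeholders rather than Impala settings.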