Todd Lipcon created IMPALA-7533:
-----------------------------------

             Summary: Optimize fetch-from-catalog by caching partitions across table versions
                 Key: IMPALA-7533
                 URL: https://issues.apache.org/jira/browse/IMPALA-7533
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Todd Lipcon

Currently, the cached partition-level information in CatalogdMetaProvider is tied to a particular version number of its containing table. This means that if the table is modified in any way (e.g. even a comment change), all of its partitions are effectively invalidated and need to be reloaded from catalogd.

We could avoid this invalidation-and-refetch in a couple of ways:

1) Make partitions immutable given an ID. Instead of modifying partitions in place, we could drop the partition and add a new one with a new ID. This is already done in several code paths, but not all. If we did this, then we'd just need to invalidate the partition _list_ for a table, and when we fetched the new list, we'd see which partitions changed and need to be reloaded.

2) Add a partition-level version/sequence number which is modified whenever the partition is mutated in place. If we fetched that as part of the partition list and used it as part of the cache key, we could avoid invalidating partitions when nothing changed. This would cost 4 or 8 bytes per partition (perhaps manageable considering the hundreds of bytes saved by recent patches).
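
To make the cache-key idea concrete, here is a rough sketch of option 2. It is not taken from CatalogdMetaProvider: PartitionCacheSketch, PartitionRef, PartitionLoader, resolve and loadCount are all hypothetical names, the metadata is a plain String, and a HashMap stands in for whatever cache the real provider uses. The only point is that the key is (partition ID, partition version) rather than anything derived from the table version. Under option 1 the version field would never change for a given ID, so the key would collapse to the ID alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; none of these names come from the Impala code base. */
class PartitionCacheSketch {
  /** One entry of a table's partition list: the ID plus a per-partition version. */
  static final class PartitionRef {
    final long id;
    final long version;
    PartitionRef(long id, long version) { this.id = id; this.version = version; }
    @Override public boolean equals(Object o) {
      if (!(o instanceof PartitionRef)) return false;
      PartitionRef other = (PartitionRef) o;
      return id == other.id && version == other.version;
    }
    @Override public int hashCode() { return Objects.hash(id, version); }
  }

  /** Stand-in for the expensive fetch of full partition metadata from catalogd. */
  interface PartitionLoader { String load(PartitionRef ref); }

  // Keyed on (id, version) only, so a new table version does not invalidate entries.
  private final Map<PartitionRef, String> cache = new HashMap<>();
  private int loads = 0;

  /** Returns metadata for every listed partition, loading only the cache misses. */
  List<String> resolve(List<PartitionRef> partitionList, PartitionLoader loader) {
    List<String> result = new ArrayList<>();
    for (PartitionRef ref : partitionList) {
      String meta = cache.get(ref);
      if (meta == null) {
        meta = loader.load(ref);
        cache.put(ref, meta);
        loads++;
      }
      result.add(meta);
    }
    return result;
  }

  int loadCount() { return loads; }

  public static void main(String[] args) {
    PartitionCacheSketch cache = new PartitionCacheSketch();
    PartitionLoader loader = ref -> "meta-" + ref.id + "@v" + ref.version;

    // Initial fetch: all three partitions are loaded.
    cache.resolve(Arrays.asList(
        new PartitionRef(1, 1), new PartitionRef(2, 1), new PartitionRef(3, 1)), loader);
    System.out.println(cache.loadCount()); // prints 3

    // Table comment changed: the list is refetched but is identical, so nothing reloads.
    cache.resolve(Arrays.asList(
        new PartitionRef(1, 1), new PartitionRef(2, 1), new PartitionRef(3, 1)), loader);
    System.out.println(cache.loadCount()); // prints 3

    // Partition 2 mutated in place: only its version changed, so only it reloads.
    cache.resolve(Arrays.asList(
        new PartitionRef(1, 1), new PartitionRef(2, 2), new PartitionRef(3, 1)), loader);
    System.out.println(cache.loadCount()); // prints 4
  }
}
```

Note that the stale (2, 1) entry is never evicted in this sketch; a real cache would need size- or age-based eviction to reclaim superseded versions.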