Todd Lipcon created IMPALA-7533:
-----------------------------------

             Summary: Optimize fetch-from-catalog by caching partitions across 
table versions
                 Key: IMPALA-7533
                 URL: https://issues.apache.org/jira/browse/IMPALA-7533
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Todd Lipcon


Currently, the cached partition-level information in CatalogdMetaProvider is 
tied to a particular version number of its containing table. This means that if 
the table is modified in any way (e.g. even if only a comment changes), all of 
the partitions are effectively invalidated and need to be reloaded from catalogd.

We could avoid this invalidation-and-refetch in a couple of ways:
1) Make partitions immutable given an ID. Instead of modifying partitions in 
place, we could drop the partition and add a new one with a new ID. This is 
already done in several code paths, but not all. If we did this, then we'd just 
need to invalidate the partition _list_ for a table, and when we fetched the 
new list, we'd see which partitions changed and reload only those.
2) Add a partition-level version/sequence number which is modified whenever the 
partition is mutated in place. If we fetched that as part of the partition 
list and used it as part of the cache key, we could avoid invalidating 
partitions when nothing changed. This would cost 4 or 8 bytes per partition 
(perhaps manageable considering the hundreds of bytes saved by recent patches).
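
As a rough illustration of option 2, here is a minimal, self-contained sketch 
of a cache keyed by (partition ID, partition version) rather than by table 
version. Everything in it is hypothetical and for illustration only: 
PartitionCacheSketch, PartitionRef, CacheKey and fetchFromCatalogd are made-up 
names, not the actual CatalogdMetaProvider code, and the "metadata" is just a 
string. Option 1 falls out as the special case where the version never changes 
and a mutation instead shows up as a new ID in the partition list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class PartitionCacheSketch {
  // Hypothetical entry of the lightweight partition list that is refetched
  // whenever the table version changes.
  static final class PartitionRef {
    final long id;
    final long version;  // assumed to be bumped on every in-place mutation
    final String name;
    PartitionRef(long id, long version, String name) {
      this.id = id;
      this.version = version;
      this.name = name;
    }
  }

  // Hypothetical cache key: deliberately independent of the table version.
  static final class CacheKey {
    final long id;
    final long version;
    CacheKey(long id, long version) {
      this.id = id;
      this.version = version;
    }
    @Override public boolean equals(Object o) {
      if (!(o instanceof CacheKey)) return false;
      CacheKey k = (CacheKey) o;
      return id == k.id && version == k.version;
    }
    @Override public int hashCode() { return Objects.hash(id, version); }
  }

  private final Map<CacheKey, String> cache = new HashMap<>();
  private int fetchCount = 0;

  // Stand-in for the expensive round trip to catalogd.
  private String fetchFromCatalogd(PartitionRef ref) {
    fetchCount++;
    return "meta(" + ref.name + "@v" + ref.version + ")";
  }

  // Resolves a freshly fetched partition list, fetching only the entries
  // whose (id, version) pair is not already cached.
  List<String> loadPartitions(List<PartitionRef> partitionList) {
    List<String> result = new ArrayList<>();
    for (PartitionRef ref : partitionList) {
      CacheKey key = new CacheKey(ref.id, ref.version);
      String meta = cache.get(key);
      if (meta == null) {
        meta = fetchFromCatalogd(ref);
        cache.put(key, meta);
      }
      result.add(meta);
    }
    return result;
  }

  int getFetchCount() { return fetchCount; }

  public static void main(String[] args) {
    PartitionCacheSketch provider = new PartitionCacheSketch();

    // Initial load: both partitions must be fetched.
    provider.loadPartitions(Arrays.asList(
        new PartitionRef(1, 0, "day=1"), new PartitionRef(2, 0, "day=2")));
    System.out.println("fetches after initial load: " + provider.getFetchCount());

    // Table comment changes: the list is refetched, but no partition changed.
    provider.loadPartitions(Arrays.asList(
        new PartitionRef(1, 0, "day=1"), new PartitionRef(2, 0, "day=2")));
    System.out.println("fetches after comment change: " + provider.getFetchCount());

    // Partition 2 is mutated in place: only that partition is refetched.
    provider.loadPartitions(Arrays.asList(
        new PartitionRef(1, 0, "day=1"), new PartitionRef(2, 1, "day=2")));
    System.out.println("fetches after one mutation: " + provider.getFetchCount());
  }
}
```

In this sketch, stale (id, version) entries are never looked up again and would 
simply age out of a size-bounded cache; a real implementation would need an 
eviction policy, which is omitted here.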




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
