Maryann Xue created SPARK-24596:
-----------------------------------

             Summary: Non-cascading Cache Invalidation
                 Key: SPARK-24596
                 URL: https://issues.apache.org/jira/browse/SPARK-24596
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Maryann Xue
             Fix For: 2.4.0
When invalidating a cache, we also invalidate other caches that depend on it, to ensure the cached data stays up to date. For example, when the underlying table has been modified, or the table itself has been dropped, all caches that use this table should be invalidated or refreshed. However, in other cases, e.g., when the user simply wants to drop a cache to free up memory, we do not need to invalidate dependent caches since no underlying data has changed. For this reason, we would like to introduce a new cache invalidation mode: non-cascading cache invalidation. We then choose between the existing mode and the new mode for the different cache invalidation scenarios:

# Drop tables and regular (persistent) views: regular mode
# Drop temporary views: non-cascading mode
# Modify table contents (INSERT/UPDATE/MERGE/DELETE): regular mode
# Call Dataset.unpersist(): non-cascading mode

Note that a regular (persistent) view is a database object just like a table, so after dropping a regular view (whether cached or not), any query referring to that view should no longer be valid. Hence, if a cached persistent view is dropped, we need to invalidate all dependent caches so that exceptions will be thrown for any later reference. On the other hand, a temporary view is in fact equivalent to an unnamed Dataset, and dropping a temporary view should have no impact on queries referencing that view. Thus we should do non-cascading uncaching for temporary views, which also guarantees consistent uncaching behavior between temporary views and unnamed Datasets. The sketch below illustrates the non-cascading case.
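As a rough illustration of scenario 4 (not part of the proposal itself), here is a minimal Scala sketch of what the non-cascading mode would mean for Dataset.unpersist(). The names base and dependent are made up for the example, and the comments describe the proposed behavior rather than the current one.

{code:scala}
import org.apache.spark.sql.SparkSession

object NonCascadingUnpersistSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("non-cascading-unpersist-sketch")
      .getOrCreate()
    import spark.implicits._

    // A cached base Dataset and a cached Dataset derived from it.
    val base = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name").cache()
    val dependent = base.filter($"id" > 1).cache()
    dependent.count()   // materialize the dependent cache

    // The user unpersists `base` only to free memory; no underlying data has
    // changed. Under the proposed non-cascading mode the dependent cache is
    // kept, whereas the existing (regular) mode would invalidate it as well.
    base.unpersist()
    dependent.count()   // under the proposal, still answered from the cache

    spark.stop()
  }
}
{code}

Dropping a cached temporary view (scenario 2) would follow the same non-cascading rule, keeping it consistent with the unnamed-Dataset case above, while dropping a table or persistent view, or modifying table contents, would keep the existing cascading behavior.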