Maryann Xue created SPARK-24596:
-----------------------------------

             Summary: Non-cascading Cache Invalidation
                 Key: SPARK-24596
                 URL: https://issues.apache.org/jira/browse/SPARK-24596
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Maryann Xue
             Fix For: 2.4.0


When invalidating a cache, we also invalidate other caches that depend on it
to ensure cached data is up to date. For example, when the underlying table
has been modified or the table itself has been dropped, all caches that use
this table should be invalidated or refreshed.

However, in other cases, such as when a user simply wants to drop a cache to
free up memory, we do not need to invalidate dependent caches since no
underlying data has changed. For this reason, we would like to introduce a new
cache invalidation mode: non-cascading cache invalidation. We would then
choose between the existing mode and the new mode depending on the cache
invalidation scenario:
 # Drop tables and regular (persistent) views: regular mode
 # Drop temporary views: non-cascading mode
 # Modify table contents (INSERT/UPDATE/MERGE/DELETE): regular mode
 # Call Dataset.unpersist(): non-cascading mode
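
To illustrate the difference between the two modes, here is a minimal sketch
in plain Python. It is a toy model, not Spark's cache manager: the names
ToyCacheManager, MODE_BY_OPERATION and build_example are hypothetical, and the
dependency tracking is simplified to a name-based lookup.

```python
# Toy model only: ToyCacheManager, MODE_BY_OPERATION and build_example are
# hypothetical names for illustration and are not part of Spark's API.


class ToyCacheManager:
    def __init__(self):
        # cache name -> set of names (tables, views, other caches) it reads from
        self._deps = {}

    def cache(self, name, depends_on=()):
        self._deps[name] = set(depends_on)

    def is_cached(self, name):
        return name in self._deps

    def uncache(self, name, cascade):
        """Drop the cache for `name`. With cascade=True, also drop every cache
        that depends on `name`, directly or transitively."""
        self._deps.pop(name, None)
        if not cascade:
            return
        # Collect direct dependents first, then let each one drop its own.
        dependents = [n for n, deps in self._deps.items() if name in deps]
        for other in dependents:
            self.uncache(other, cascade=True)


# The scenarios listed above, mapped to the `cascade` flag of the toy model.
MODE_BY_OPERATION = {
    "drop table": True,               # regular mode
    "drop persistent view": True,     # regular mode
    "drop temporary view": False,     # non-cascading mode
    "modify table contents": True,    # regular mode
    "Dataset.unpersist()": False,     # non-cascading mode
}


def build_example():
    """Cache `base` over table `t`, and `derived` over `base`."""
    manager = ToyCacheManager()
    manager.cache("base", depends_on=["t"])
    manager.cache("derived", depends_on=["base"])
    return manager


if __name__ == "__main__":
    m = build_example()
    m.uncache("base", cascade=MODE_BY_OPERATION["Dataset.unpersist()"])
    print(m.is_cached("base"), m.is_cached("derived"))

    m = build_example()
    m.uncache("t", cascade=MODE_BY_OPERATION["modify table contents"])
    print(m.is_cached("base"), m.is_cached("derived"))
```

In this sketch, unpersisting "base" leaves "derived" cached, while modifying
table "t" removes both caches.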

Note that a regular (persistent) view is a database object just like a table,
so after dropping a regular view (whether cached or not), any query referring
to that view should no longer be valid. Hence, if a cached persistent view is
dropped, we need to invalidate all dependent caches so that exceptions will be
thrown for any later reference. On the other hand, a temporary view is in fact
equivalent to an unnamed Dataset, and dropping a temporary view should have no
impact on queries referencing that view. Thus we should do non-cascading
uncaching for temporary views, which also guarantees consistent uncaching
behavior between temporary views and unnamed Datasets.
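
The reasoning above can be sketched with another small toy model in plain
Python. Again, this is only an illustration under simplifying assumptions:
ToyCatalog and demo are hypothetical names, cascading covers direct dependents
only, and recomputing a query is out of scope.

```python
class ToyCatalog:
    """Hypothetical stand-in for a catalog plus cache; not Spark's code."""

    def __init__(self):
        self.views = {}  # view name -> "persistent" or "temporary"
        self.cache = {}  # cached name -> (names it references, cached rows)

    def create_view(self, name, kind):
        self.views[name] = kind

    def cache_result(self, name, refs, rows):
        self.cache[name] = (set(refs), list(rows))

    def drop_view(self, name):
        kind = self.views.pop(name)
        self.cache.pop(name, None)  # the view's own cache is always dropped
        if kind == "persistent":
            # Regular mode: dependents must not keep serving a dropped object.
            # (Direct dependents only, to keep the sketch short.)
            dependents = [n for n, (refs, _) in self.cache.items() if name in refs]
            for dep in dependents:
                del self.cache[dep]
        # Temporary view: non-cascading, so dependent caches keep their data.

    def run(self, name, refs):
        """Serve from cache if present; otherwise resolve the referenced views."""
        if name in self.cache:
            return self.cache[name][1]
        missing = [r for r in refs if r not in self.views]
        if missing:
            raise LookupError(f"view not found: {missing[0]}")
        return []  # recomputation is not modeled here


def demo(kind):
    """Cache view `v` and a query `q` over it, drop `v`, then run `q` again."""
    catalog = ToyCatalog()
    catalog.create_view("v", kind)
    catalog.cache_result("v", refs=[], rows=[1, 2, 3])
    catalog.cache_result("q", refs=["v"], rows=[2, 4, 6])
    catalog.drop_view("v")
    try:
        return catalog.run("q", refs=["v"])
    except LookupError as err:
        return str(err)


if __name__ == "__main__":
    print(demo("temporary"))
    print(demo("persistent"))
```

With a temporary view, the dependent query is still served from its cache
after the drop. With a persistent view, the dependent cache is invalidated, so
the later reference fails to resolve and raises an error.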



