[ https://issues.apache.org/jira/browse/SPARK-30470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-30470.
--------------------------------------
    Resolution: Duplicate

> Uncache table in tempViews if needed on session closed
> ------------------------------------------------------
>
>                 Key: SPARK-30470
>                 URL: https://issues.apache.org/jira/browse/SPARK-30470
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: liupengcheng
>            Priority: Major
>
> Currently, Spark does not clean up cached tables backed by temp views that 
> are produced by SQL such as:
> `CACHE TABLE table1 AS SELECT ...`
> There is a risk that `UNCACHE TABLE` is never called because the session was 
> closed, whether unexpectedly or manually by the user. The temp views are then 
> lost and cannot be referenced from other sessions, yet the cached plans still 
> exist in the `CacheManager`, as sketched below.
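> A minimal sketch of the scenario (the source table name `src` is a 
> placeholder, not from this report):
> {code:scala}
> import org.apache.spark.sql.SparkSession
> 
> val spark = SparkSession.builder().master("local[*]").getOrCreate()
> val session1 = spark.newSession()
> 
> // CACHE TABLE ... AS SELECT registers a temp view in session1 and caches
> // its plan in the CacheManager, which is shared across all sessions.
> session1.sql("CACHE TABLE table1 AS SELECT * FROM src") // `src` is hypothetical
> 
> // If session1 is closed or lost here without running
> //   session1.sql("UNCACHE TABLE table1")
> // the temp view is gone (it is session-scoped), but the cached plan entry
> // stays in the shared CacheManager and can no longer be uncached by name
> // from another session.
> {code}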
> Moreover, the leak may cause subsequent queries to fail. One failure we 
> encountered in our production environment is shown below:
> {code:java}
> Caused by: java.io.FileNotFoundException: File does not exist: /user/xxxx/xx/data__db60e76d_91b8_42f3_909d_5c68692ecdd4
> It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:131)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.scan_nextBatch_0$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.processNext(Unknown Source)
>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> {code}
> The above exception happens when the user updates the data of the table 
> while Spark still uses the old cached plan.
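> A manual workaround until this is addressed (a sketch, not part of this 
> report's proposal) is to refresh the table or clear the shared cache from a 
> live session:
> {code:scala}
> // `src` is the placeholder source-table name from the sketch above.
> spark.sql("REFRESH TABLE src") // re-list the underlying files, as the error suggests
> 
> // Heavy-handed fallback: drop every cached entry, including leaked ones
> // left behind by dead sessions.
> spark.catalog.clearCache()
> {code}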


