[ https://issues.apache.org/jira/browse/SPARK-30470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro resolved SPARK-30470.
--------------------------------------
    Resolution: Duplicate

> Uncache table in tempViews if needed on session closed
> ------------------------------------------------------
>
>                 Key: SPARK-30470
>                 URL: https://issues.apache.org/jira/browse/SPARK-30470
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: liupengcheng
>            Priority: Major
>
> Currently, Spark does not clean up tables cached in temp views produced by SQL such as
> `CACHE TABLE table1 AS SELECT ...`
> There is a risk that `UNCACHE TABLE` is never called because the session closed unexpectedly, or because the user closed it manually. The temp views are then lost and cannot be accessed from other sessions, yet the cached plans still exist in the `CacheManager`.
> Moreover, these leaks may cause subsequent queries to fail. One failure we encountered in our production environment is shown below:
> {code:java}
> Caused by: java.io.FileNotFoundException: File does not exist: /user/xxxx/xx/data__db60e76d_91b8_42f3_909d_5c68692ecdd4
> It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:131)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.scan_nextBatch_0$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> {code}
> The above exception occurs when the user updates the table's data but Spark still uses the stale cached plan.
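
A minimal sketch of the cleanup the issue asks for, under stated assumptions: the object and method names (`TempViewCacheCleanup`, `uncacheTempViews`) are illustrative, not part of Spark, and the hook point at which a session counts as "closed" is hypothetical (in practice it would be wired into whatever session manager owns the SparkSession, e.g. the Thrift server's session-close path). Only public Catalog APIs available in Spark 2.3.x are used:

{code:scala}
import org.apache.spark.sql.SparkSession

object TempViewCacheCleanup {
  // Uncache every cached temporary view still registered in this session,
  // so the cached plans do not linger in the shared CacheManager after
  // the session goes away.
  def uncacheTempViews(spark: SparkSession): Unit = {
    spark.catalog
      .listTables()              // includes temp views; isTemporary marks them
      .collect()
      .filter(_.isTemporary)
      .foreach { t =>
        if (spark.catalog.isCached(t.name)) {
          spark.catalog.uncacheTable(t.name)
        }
      }
  }
}

// Hypothetical usage just before a session is torn down:
//   spark.sql("CACHE TABLE table1 AS SELECT ...")
//   ...
//   TempViewCacheCleanup.uncacheTempViews(spark)  // instead of leaking the plan
{code}

`spark.catalog.clearCache()` would also drop the leaked entries, but it clears cached plans for every session sharing the `CacheManager`, so the per-view `uncacheTable` above is the safer shape for a session-close hook.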