[ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das reopened SPARK-5836:
----------------------------------

> Highlight in Spark documentation that by default Spark does not delete its
> temporary files
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5836
>                 URL: https://issues.apache.org/jira/browse/SPARK-5836
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Tomasz Dudziak
>            Assignee: Ilya Ganelin
>            Priority: Minor
>             Fix For: 1.3.1, 1.4.0
>
>
> We recently learnt the hard way (in a prod system) that Spark by default does
> not delete its temporary files until it is stopped. Within a relatively short
> time span of heavy Spark use, the disk of our prod machine filled up
> completely because of multiple shuffle files written to it. We think there
> should be better documentation around the fact that after a job is finished
> it leaves a lot of rubbish behind, so that this does not come as a surprise.
> Probably a good place to highlight that fact would be the documentation of
> the {{spark.local.dir}} property, which controls where Spark temporary files
> are written.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
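As an illustrative sketch (not part of the original report): the {{spark.local.dir}} property the reporter mentions can be set in {{spark-defaults.conf}} to point Spark's scratch space at volumes with enough capacity for shuffle spill. The directory paths below are hypothetical examples, not recommendations.

```properties
# spark-defaults.conf -- hypothetical paths, for illustration only.
# spark.local.dir controls where Spark writes shuffle and other temporary
# files; it accepts a comma-separated list of directories, and Spark
# spreads scratch data across them.
spark.local.dir  /mnt/disk1/spark-tmp,/mnt/disk2/spark-tmp
```

Note that in some cluster deployments the cluster manager's local-directory settings (e.g. YARN's) take precedence over this property, so where the value is actually honored depends on the deployment mode.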