I have set SPARK_WORKER_OPTS in spark-env.sh for that. For example:

export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.appDataTtl=<seconds>"
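For completeness, a sketch of the related cleanup settings in spark-env.sh (the values shown are Spark's documented defaults, used here as examples; pick a TTL that suits your disk budget):

```shell
# spark-env.sh -- standalone-mode worker cleanup (sketch; values are examples)
# Enables periodic cleanup of finished applications' work directories.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=604800"
# interval:   how often (in seconds) the worker checks for old app dirs; 1800 is the default.
# appDataTtl: how long (in seconds) each application's work dir is kept; 604800 = 7 days, the default.
```

Note that these options apply to the standalone-mode worker's application work directories; only directories of applications that have finished are removed.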

> On 11.04.2015, at 00:01, Wang, Ningjun (LNG-NPV) 
> <ningjun.w...@lexisnexis.com> wrote:
> 
> Does anybody have an answer for this?
>  
> Thanks
> Ningjun
>  
> From: Wang, Ningjun (LNG-NPV) 
> Sent: Thursday, April 02, 2015 12:14 PM
> To: user@spark.apache.org <mailto:user@spark.apache.org>
> Subject: Is the disk space in SPARK_LOCAL_DIRS cleanned up?
>  
> I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, 
> Spark writes to this folder. I found that the disk usage of this folder keeps 
> increasing quickly, and at some point I will run out of disk space. 
>  
> Does Spark clean up the disk space in this folder once the shuffle 
> operation is done? If not, I need to write a job to clean it up myself. But 
> how do I know which subfolders can be removed?
>  
> Ningjun
