Hi Alex,

Yes, you can set `spark.cleaner.ttl` (see
http://spark.apache.org/docs/1.6.0/configuration.html), but I would not
recommend it!
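
For completeness, setting it looks something like this (just a sketch; the
86400-second value and app name are arbitrary examples, not a
recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("ttl-example")          // hypothetical app name
      .set("spark.cleaner.ttl", "86400")  // clean metadata older than 24 hours
    val sc = new SparkContext(conf)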

We are actually removing this property in Spark 2.0 because it has caused
problems for many users in the past. In particular, if you accidentally use
a variable that has already been cleaned, you will run into problems such
as shuffle fetch failures or "broadcast variable not found" errors, which
can fail your job.

Instead, note that Spark already automatically cleans up all variables that
have been garbage collected, including RDDs, shuffle dependencies,
broadcast variables, and accumulators. This context-based cleaning has been
enabled by default for many versions now, so it should be reliable. The
only caveat is that it may not work as well in a shell environment, where
some variables may never go out of scope.
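
If you want the 24-hour behavior explicitly, one option is to unpersist
cached RDDs yourself once you know they are no longer needed, rather than
relying on a TTL. A minimal sketch (the HDFS path is made up):

    val cached = sc.textFile("hdfs:///some/path").cache()  // made-up path
    cached.count()      // materializes the cache
    // ... later, when the data is no longer needed:
    cached.unpersist()  // removes the cached blocks

    // In a compiled application, simply letting `cached` go out of scope
    // also works: once the RDD is garbage collected on the driver, the
    // context cleaner unpersists it for you.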

Please let me know if you have more questions,
-Andrew


2016-01-13 11:36 GMT-08:00 Alexander Pivovarov <apivova...@gmail.com>:

> Is it possible to automatically unpersist RDDs which are not used for 24
> hours?
>
