Yes, even with spark.cleaner.ttl set, the files are not cleaned up. We pass --properties-file spark-dev.conf to spark-submit, where spark-dev.conf contains:
spark.master spark://10.250.241.66:7077
spark.logConf true
spark.cleaner.ttl 1800
spark.executor.memory 10709m
spark.cores.max 4
spark.shuffle.consolidateFiles true

On Thu, Apr 2, 2015 at 7:12 PM, Tathagata Das <[email protected]> wrote:

> Are you saying that even with spark.cleaner.ttl set your files are not
> getting cleaned up?
>
> TD
>
> On Thu, Apr 2, 2015 at 8:23 AM, andrem <[email protected]> wrote:
>
>> Apparently Spark Streaming 1.3.0 is not cleaning up its internal files,
>> and the worker nodes eventually run out of inodes.
>> We see tons of old shuffle_*.data and *.index files that are never
>> deleted. How do we get Spark to remove these files?
>>
>> We have a simple standalone app with one RabbitMQ receiver and a two-node
>> cluster (2 x r3.large AWS instances).
>> The batch interval is 10 minutes, after which we process the data and
>> write the results to a DB. No windowing or state management is used.
>>
>> I've pored over the documentation and tried setting the following
>> properties, but they have not helped.
>> As a workaround we're using a cron script that periodically cleans up
>> old files, but this has a bad smell to it.
>>
>> SPARK_WORKER_OPTS in spark-env.sh on every worker node:
>> spark.worker.cleanup.enabled true
>> spark.worker.cleanup.interval
>> spark.worker.cleanup.appDataTtl
>>
>> Also tried on the driver side:
>> spark.cleaner.ttl
>> spark.shuffle.consolidateFiles true
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Worker-runs-out-of-inodes-tp22355.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
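For reference, a minimal sketch of the setup described above: the properties file is written out with the exact settings from this thread, and spark-submit picks it up via --properties-file. The application class and jar names in the commented invocation are hypothetical placeholders, not from the original thread.

```shell
# Write the driver-side properties file (values taken from the thread above).
cat > spark-dev.conf <<'EOF'
spark.master                   spark://10.250.241.66:7077
spark.logConf                  true
spark.cleaner.ttl              1800
spark.executor.memory          10709m
spark.cores.max                4
spark.shuffle.consolidateFiles true
EOF

# The driver would then be launched along these lines
# (class and jar names are hypothetical placeholders):
#   spark-submit --properties-file spark-dev.conf \
#     --class com.example.StreamingApp \
#     streaming-app.jar
```

Note that spark.cleaner.ttl is a driver-side setting, while the spark.worker.cleanup.* options quoted below it must go into SPARK_WORKER_OPTS in spark-env.sh on each worker, not into this file.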
