Did you try these?

- Disable shuffle spill: spark.shuffle.spill=false (see the note after the snippet below)
- Enable log rotation:

sparkConf.set("spark.executor.logs.rolling.strategy", "size")
.set("spark.executor.logs.rolling.size.maxBytes", "1024")
.set("spark.executor.logs.rolling.maxRetainedFiles", "3")


Thanks
Best Regards

On Fri, Apr 3, 2015 at 9:09 AM, a mesar <[email protected]> wrote:

> Yes, with spark.cleaner.ttl set there is no cleanup. We pass
> --properties-file spark-dev.conf to spark-submit, where spark-dev.conf
> contains:
>
> spark.master spark://10.250.241.66:7077
> spark.logConf true
> spark.cleaner.ttl 1800
> spark.executor.memory 10709m
> spark.cores.max 4
> spark.shuffle.consolidateFiles true
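>
> For reference, the full launch looks roughly like this (the class and
> jar names below are placeholders, not our real ones):
>
> spark-submit --properties-file spark-dev.conf \
>   --class com.example.StreamingApp \
>   /path/to/streaming-app.jar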
>
> On Thu, Apr 2, 2015 at 7:12 PM, Tathagata Das <[email protected]> wrote:
>
>> Are you saying that even with the spark.cleaner.ttl set your files are
>> not getting cleaned up?
>>
>> TD
>>
>> On Thu, Apr 2, 2015 at 8:23 AM, andrem <[email protected]> wrote:
>>
>>> Apparently Spark Streaming 1.3.0 is not cleaning up its internal files,
>>> and the worker nodes eventually run out of inodes. We see tons of old
>>> shuffle_*.data and *.index files that are never deleted. How do we get
>>> Spark to remove these files?
>>>
>>> We have a simple standalone app with one RabbitMQ receiver and a
>>> two-node cluster (2 x r3.large AWS instances). The batch interval is
>>> 10 minutes, after which we process the data and write the results to
>>> the DB. No windowing or state management is used.
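>>>
>>> In case it helps, the app is roughly the following sketch
>>> (socketTextStream stands in for our custom RabbitMQ receiver, and the
>>> count/println stands in for the real processing and DB write):
>>>
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.streaming.{Minutes, StreamingContext}
>>>
>>> val conf = new SparkConf().setAppName("StreamingApp")
>>> val ssc = new StreamingContext(conf, Minutes(10)) // 10-minute batches
>>> // stand-in source; the real app uses a custom RabbitMQ receiver
>>> val lines = ssc.socketTextStream("localhost", 9999)
>>> lines.foreachRDD { rdd =>
>>>   println(s"batch records: ${rdd.count()}") // stand-in for processing + DB write
>>> }
>>> ssc.start()
>>> ssc.awaitTermination()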
>>>
>>> I've pored over the documentation and tried setting the following
>>> properties, but they have not helped. As a workaround we're using a
>>> cron script that periodically cleans up old files, but this has a bad
>>> smell to it.
>>>
>>> SPARK_WORKER_OPTS in spark-env.sh on every worker node (concrete line below):
>>>   spark.worker.cleanup.enabled true
>>>   spark.worker.cleanup.interval
>>>   spark.worker.cleanup.appDataTtl
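>>>
>>> Concretely, the spark-env.sh line takes the form below (the interval
>>> and TTL values are illustrative, in seconds):
>>>
>>> SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
>>>   -Dspark.worker.cleanup.interval=1800 \
>>>   -Dspark.worker.cleanup.appDataTtl=604800"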
>>>
>>> Also tried on the driver side:
>>>   spark.cleaner.ttl
>>>   spark.shuffle.consolidateFiles true
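>>>
>>> i.e. on the driver's SparkConf, e.g. (ttl in seconds):
>>>
>>> sparkConf.set("spark.cleaner.ttl", "1800")
>>>   .set("spark.shuffle.consolidateFiles", "true")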
>>>
>>>
>>>
>>
>
