I somehow missed that parameter when I was reviewing the documentation;
that should do the trick! Thank you!

2014-09-10 2:10 GMT+01:00 Shao, Saisai <saisai.s...@intel.com>:

>  Hi Luis,
>
>
>
> The parameters “spark.cleaner.ttl” and “spark.streaming.unpersist” can both
> be used to remove stale streaming data that has timed out. The difference is
> that “spark.cleaner.ttl” is a time-based cleaner: it cleans not only streaming
> input data but also Spark’s stale metadata. “spark.streaming.unpersist”, on
> the other hand, is a reference-based cleaning mechanism: streaming data is
> removed once it falls outside the slide duration.
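>
> For example, here is a minimal sketch in Scala of setting both options (the
> TTL value, app name and batch interval are only placeholders, not
> recommendations; tune them to your workload):
>
>     import org.apache.spark.SparkConf
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>     val conf = new SparkConf()
>       .setAppName("streaming-cleanup-example")   // placeholder app name
>       .set("spark.cleaner.ttl", "3600")           // time-based: clean data/metadata older than 1 hour
>       .set("spark.streaming.unpersist", "true")   // reference-based: unpersist blocks once out of the slide duration
>
>     val ssc = new StreamingContext(conf, Seconds(10))  // placeholder batch interval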
>
>
>
> Both of these parameters can reduce Spark Streaming’s memory footprint. But
> if data floods into Spark Streaming at startup, as in your situation with
> Kafka, they will not mitigate the problem very well. What you really need is
> to control the input rate so data is not ingested too fast; you can try
> “spark.streaming.receiver.maxRate” to limit the ingestion rate, as in the
> sketch below.
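>
> A minimal sketch of capping the receiver rate (the 10,000 records/sec figure
> and the app name are only example values, not recommendations):
>
>     import org.apache.spark.SparkConf
>
>     // Each receiver will ingest at most this many records per second,
>     // so a large Kafka backlog cannot flood executor memory at startup.
>     val throttledConf = new SparkConf()
>       .setAppName("kafka-ingest-throttled")              // placeholder app name
>       .set("spark.streaming.receiver.maxRate", "10000")  // example cap, records/sec per receiver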
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Luis Ángel Vicente Sánchez [mailto:langel.gro...@gmail.com]
> *Sent:* Wednesday, September 10, 2014 5:21 AM
> *To:* user@spark.apache.org
> *Subject:* spark.cleaner.ttl and spark.streaming.unpersist
>
>
>
> The executors of my Spark Streaming application are being killed due to
> memory issues. The memory consumption is quite high on startup because it is
> the first run and there are quite a few events on the Kafka queues, which are
> consumed at a rate of 100K events per second.
>
> I wonder if it's recommended to use spark.cleaner.ttl and
> spark.streaming.unpersist together to mitigate that problem. I also wonder
> whether new RDDs are being batched while an RDD is being processed.
>
> Regards,
>
> Luis
>
