Is there any news on this issue? I was using a local folder on Linux for checkpointing, "file:///opt/sparkfolders/checkpoints". I think that being able to run the ReliableKafkaReceiver in a 24x7 system without having to worry about the disk filling up is a reasonable expectation.
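For reference, this is roughly how the checkpoint directory is wired up in the streaming job (a minimal sketch; the app name and batch interval are placeholders, not the real values):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal wiring of the checkpoint directory mentioned above.
// App name and batch interval are hypothetical placeholders.
val conf = new SparkConf().setAppName("checkpoint-growth-test")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("file:///opt/sparkfolders/checkpoints")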
Regards,
Luis

2014-11-21 15:17 GMT+00:00 Luis Ángel Vicente Sánchez <langel.gro...@gmail.com>:

> I have seen the same behaviour while testing the latest Spark 1.2.0
> snapshot.
>
> I'm trying the ReliableKafkaReceiver and it works quite well, but the
> checkpoints folder keeps growing in size. The receivedMetaData folder
> stays almost constant in size, but the receivedData folder keeps growing
> even if I set spark.cleaner.ttl to 300 seconds.
>
> Regards,
>
> Luis
>
> 2014-09-23 22:47 GMT+01:00 RodrigoB <rodrigo.boav...@aspect.com>:
>
>> Just a follow-up.
>>
>> To make sure the RDDs were not being cleaned up, I replayed the app both
>> on the remote Windows laptop and then on the Linux machine, and at the
>> same time watched the RDD folders in HDFS.
>>
>> This confirmed the observed behavior: running on the laptop, the RDD
>> folders kept increasing. When I ran on Linux, only two RDD folders were
>> there and they were continuously recycled.
>>
>> Metadata checkpoints were being cleaned up in both scenarios.
>>
>> tnks,
>> Rod
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-data-checkpoint-cleaning-tp14847p14939.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
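For completeness, a minimal sketch of the setup described above, assuming the Spark 1.2-era streaming API: spark.cleaner.ttl at 300 seconds, plus the write-ahead log flag that, as far as I understand, makes KafkaUtils.createStream use the ReliableKafkaReceiver in the 1.2 snapshots. The ZooKeeper quorum, consumer group, and topic map are hypothetical placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// spark.cleaner.ttl as described in the quoted message; the write-ahead log
// flag is what (to my understanding) routes KafkaUtils.createStream to the
// ReliableKafkaReceiver in the 1.2 snapshots.
val conf = new SparkConf()
  .setAppName("reliable-kafka-checkpoint-test")   // hypothetical app name
  .set("spark.cleaner.ttl", "300")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10)) // hypothetical batch interval
ssc.checkpoint("file:///opt/sparkfolders/checkpoints")

// ZooKeeper quorum, consumer group, and topic map are placeholders.
val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "test-group", Map("events" -> 1))
stream.count().print()

ssc.start()
ssc.awaitTermination()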