While checkpointing RDDs as part of an application that doesn't use Spark Streaming, I observed that the checkpointed files are not cleaned up even after the application completes successfully.
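For context, a minimal spark-shell sketch of what I'm doing (the checkpoint path is illustrative):

```scala
// Run in spark-shell; sc is the pre-created SparkContext.
sc.setCheckpointDir("/tmp/checkpoints")

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint()  // marks the RDD for checkpointing
rdd.count()       // the first action triggers the actual checkpoint write

// After the application exits successfully, the files under
// /tmp/checkpoints/<uuid>/rdd-<id>/ are left behind -- nothing cleans them up.
```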
Is it because we assume that checkpointing would be used primarily by Spark Streaming applications, which run continuously? Also, the current mechanism supports recovery only in Spark Streaming, which can survive driver crashes; there is no support for recovering from previously checkpointed RDDs in subsequent application attempts. It would be consistent and nice to have the ability to recover across app attempts in non-streaming jobs as well.

Is there a specific reason for the current behavior of not cleaning up the files, and for the lack of recovery support across app attempts? If not, I can raise a JIRA for this.

Thanks,
Dhruve