Hi all!

We've happily been running a Flink job in production for a year now, with
the RocksDB state backend and incremental retained checkpointing on S3. We
often release new versions of our jobs, which means we cancel the running
job and submit a new one, restoring from the previous jobId's last
retained checkpoint.
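For context, our redeploy step looks roughly like this (bucket, path, and jar names are placeholders, not our real ones):

```shell
# Cancel the running job, keeping its retained checkpoint on S3
flink cancel <old-job-id>

# Submit the new version, restoring from the old job's last retained checkpoint
# (-s / --fromSavepoint also accepts a retained checkpoint path)
flink run -d \
  -s s3://our-bucket/checkpoints/<old-job-id>/chk-1234 \
  our-job.jar
```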

This works fine, but we also need to clean up old checkpoint files on S3,
which are starting to pile up. We are wondering two things:
- once the newer job has restored the older job's checkpoint, is it safe to
delete that old checkpoint? Or, since checkpoints are incremental, can the
newer job's checkpoints still reference files from the older job's, in
which case deleting the old checkpoints might cause errors on the next
restore?
- also, since all our state has a 7-day TTL, is it safe to set a 7- or
8-day retention policy on S3 that would automatically clean up old files,
or could we still need files older than 7 days even with the TTL?
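For reference, our TTL setup is along these lines (a sketch with placeholder names, not our exact descriptors):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// 7-day TTL, refreshed on writes; expired values are never returned
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.days(7))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build();

ValueStateDescriptor<String> descriptor =
        new ValueStateDescriptor<>("my-state", String.class);
descriptor.enableTimeToLive(ttlConfig);
```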

Don't hesitate to ask me if anything is not clear enough!

Thanks,
Robin
