Hi Flink community,

We recently investigated a scenario that led us to revisit Flink’s recovery
semantics around checkpoints and savepoints, and I’d like to ask some
questions. We run Flink 1.20 on Kubernetes with the
flink-kubernetes-operator. Autoscaling is enabled, so restarts due to
rescaling or spec changes can occur around the same time as manual
savepointing. Our sinks use the default Flink FileSink writing Parquet
files to S3 with exactly-once semantics.
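For context, the deployment looks roughly like the following operator spec. This is an illustrative sketch, not our exact manifest; the config keys shown (checkpoint interval, checkpoint/savepoint directories, periodic operator savepoints) are the standard ones, but bucket names and intervals are placeholders:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example-job          # placeholder name
spec:
  flinkVersion: v1_20
  flinkConfiguration:
    execution.checkpointing.interval: "60s"
    state.checkpoints.dir: s3://example-bucket/checkpoints   # placeholder bucket
    state.savepoints.dir: s3://example-bucket/savepoints     # placeholder bucket
    # Operator-triggered periodic savepoints, independent of checkpoints:
    kubernetes.operator.periodic.savepoint.interval: "1h"
  job:
    upgradeMode: savepoint
    state: running
```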

In a reproduced scenario, a checkpoint completed, then a manual savepoint
completed, and before the next periodic checkpoint finished the job
restarted. On restart, Flink restored from the earlier checkpoint rather
than the newer savepoint, leading to reprocessing and duplicate Parquet
files being written. From code inspection and docs, this appears to be
expected behavior: Flink never uses savepoints for automatic recovery and
always restores from the latest completed checkpoint (FLIP-193 /
FLINK-25191).

So my questions are:
- Has there been any discussion about making recovery-from-savepoint
configurable (e.g. restore from a savepoint when it is newer than the last
completed checkpoint and still present)?
- Is the expectation today that 2PC (two-phase-commit) sinks must either
avoid irreversible side effects during savepoint snapshots or encode enough
metadata to deduplicate on recovery?
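
To make the second question concrete, here is a minimal sketch of the
"encode enough metadata" idea: if part-file names are derived
deterministically from the checkpoint id and subtask index, a replay after
restoring from an older checkpoint rewrites the same S3 object keys instead
of producing new duplicates. The class and method names here are
hypothetical, for illustration only, and this is not Flink's actual naming
scheme:

```java
// Hypothetical helper: derive a stable object key for each part file from
// (checkpointId, subtaskIndex), so reprocessing after a restore overwrites
// the same key rather than creating a duplicate file.
public final class DeterministicNaming {

    private DeterministicNaming() {}

    /** Builds a deterministic part-file name for one snapshot interval. */
    public static String partFileName(long checkpointId, int subtaskIndex) {
        return String.format("part-%d-%d.parquet", checkpointId, subtaskIndex);
    }

    public static void main(String[] args) {
        // Re-running the same (checkpointId, subtask) pair yields the same key.
        System.out.println(partFileName(42, 3)); // part-42-3.parquet
    }
}
```

The same reasoning applies if the metadata is carried in committable state
instead of the file name; the point is only that recovery can recognize
work it has already done.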

Thanks for any insight or pointers to prior discussions.

Best regards,

Lucas
