[flink] 05/05: [FLINK-25191][checkpointing] Update documentation for savepoints & failure recovery

dwysakowicz Thu, 16 Dec 2021 04:21:33 -0800

This is an automated email from the ASF dual-hosted git repository.

dwysakowicz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git


commit 5be5e1fbc42bff07f0c9e8353838452b63b14255
Author: Dawid Wysakowicz <dwysakow...@apache.org>
AuthorDate: Mon Dec 13 10:40:28 2021 +0100

    [FLINK-25191][checkpointing] Update documentation for savepoints & failure recovery
---
 docs/content/docs/ops/state/savepoints.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/docs/content/docs/ops/state/savepoints.md b/docs/content/docs/ops/state/savepoints.md
index 0d7a3290..215ec4e 100644
--- a/docs/content/docs/ops/state/savepoints.md
+++ b/docs/content/docs/ops/state/savepoints.md
@@ -130,9 +130,23 @@ Unlike savepoints, checkpoints cannot generally be moved to a different location
 
 If you use `JobManagerCheckpointStorage`, metadata *and* savepoint state will be stored in the `_metadata` file, so don't be confused by the absence of additional data files.
 
-{{< hint warning  >}}
-It is discouraged to move or delete the last savepoint of a running job, because this might interfere with failure-recovery. Savepoints have side-effects on exactly-once sinks, therefore 
-to ensure exactly-once semantics, if there is no checkpoint after the last savepoint, the savepoint will be used for recovery. 
+{{< hint warning  >}} 
+Starting from Flink 1.15 intermediate savepoints (savepoints other than
+created with [stop-with-savepoint](#stopping-a-job-with-savepoint)) are not used for recovery and do
+not commit any side effects.
+
+This has to be taken into consideration, especially when running multiple jobs in the same
+checkpointing timeline. It is possible in that solution that if the original job (after taking a
+savepoint) fails, then it will fall back to a checkpoint prior to the savepoint. However, if we now
+resume a job from the savepoint, then we might commit transactions that might’ve never happened
+because of falling back to a checkpoint before the savepoint (assuming non-determinism).
+
+If one wants to be safe in those scenarios, we advise dropping the state of transactional sinks, by
+changing sinks [uids](#assigning-operator-ids).
+
+It should not require any additional steps if there is just a single job running in the same
+checkpointing timeline, which means that you stop the original job before running a new job from the
+savepoint. 
 {{< /hint >}}
 
 #### Trigger a Savepoint
