[jira] [Commented] (FLINK-26783) Restore from a stop-with-savepoint if failed during committing

Dawid Wysakowicz (Jira) Thu, 24 Mar 2022 01:39:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-26783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511694#comment-17511694
 ]


Dawid Wysakowicz commented on FLINK-26783:
------------------------------------------

After an offline discussion we concluded that simply adding the savepoint to 
the {{CompletedCheckpointStore}} creates a savepoint ownership problem: after 
a restart the savepoint would remain in the {{CompletedCheckpointStore}} and 
Flink would depend on its existence.

Therefore we propose a different approach to the underlying problem, namely 
that falling back to a checkpoint might produce duplicated records. We suggest 
not triggering a global failover when the savepoint completed successfully but 
the job failed while committing side effects. In that case we will complete 
the completable future exceptionally, with an exception explaining that the 
savepoint is consistent but might have uncommitted side effects, and asking 
users to manually restart the job from that savepoint if they want to commit 
those side effects.
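
To make the proposal concrete, here is a minimal sketch of the intended 
behaviour using only the JDK. It is not the actual Flink implementation: the 
class {{StopWithSavepointSketch}}, the method {{finishStopWithSavepoint}}, and 
the exception {{UncommittedSideEffectsException}} are hypothetical names 
chosen for illustration, and the savepoint path is modelled as a plain string.

```java
import java.util.concurrent.CompletableFuture;

/** Illustrative sketch only; none of these names are Flink APIs. */
public class StopWithSavepointSketch {

    /** Hypothetical exception: the savepoint is consistent, side effects may be uncommitted. */
    public static class UncommittedSideEffectsException extends Exception {
        private final String savepointPath;

        public UncommittedSideEffectsException(String savepointPath, Throwable cause) {
            super("Savepoint " + savepointPath + " is consistent, but side effects might be "
                    + "uncommitted. Restart the job from this savepoint to commit them.", cause);
            this.savepointPath = savepointPath;
        }

        public String getSavepointPath() {
            return savepointPath;
        }
    }

    /**
     * Completes the stop-with-savepoint result future.
     *
     * @param savepointPath path of the successfully completed savepoint
     * @param commitFailure failure seen while committing side effects, or null if none
     */
    public static CompletableFuture<String> finishStopWithSavepoint(
            String savepointPath, Throwable commitFailure) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (commitFailure == null) {
            // Savepoint completed and side effects were committed.
            result.complete(savepointPath);
        } else {
            // No global failover: report the failure to the caller instead.
            result.completeExceptionally(
                    new UncommittedSideEffectsException(savepointPath, commitFailure));
        }
        return result;
    }
}
```

A caller that receives the exceptional result can read the savepoint path from 
the exception and decide whether to restart the job from it.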

> Restore from a stop-with-savepoint if failed during committing
> --------------------------------------------------------------
>
>                 Key: FLINK-26783
>                 URL: https://issues.apache.org/jira/browse/FLINK-26783
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> We decided stop-with-savepoint should commit side-effects and thus we should 
> fail over to those savepoints if a failure happens when committing side 
> effects.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
