Hi everyone,

I am currently exploring the fault tolerance and recovery mechanisms of
batch mode in Apache Flink.

If I kill the TaskManager process while the job is running, the job
restarts from the point of failure. However, at some point the job
restarts from the very beginning instead.

The documentation states that checkpointing and state backends are not
used in batch mode.

How does recovery after a failure occur in BATCH mode?

According to the documentation: “In BATCH runtime mode, Flink will attempt
to return to previous processing steps for which intermediate results are
still available. Potentially, only those tasks that fail (or their
predecessors in the graph) will have to be restarted.”

https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/execution_mode/

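To make the question concrete, here is a toy model of how I read the quoted paragraph. It is only a sketch of my understanding, not Flink code: the class `BatchRecoverySketch`, the method `stagesToRestart`, and the idea of a linear chain of numbered stages are all made up for illustration. My assumption is that a restart walks upstream from the failed stage until it reaches a stage whose intermediate result is still available, and that losing all intermediate results (for example, if they lived on the killed TaskManager) would explain a restart from the very beginning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class BatchRecoverySketch {

    /**
     * Toy model (hypothetical, not a Flink API): stages form a linear chain
     * 0..n-1, and stage i reads the intermediate result of stage i-1.
     * Returns the stages that must be rerun when stage `failed` fails,
     * given the set of stages whose intermediate results are still available.
     */
    static List<Integer> stagesToRestart(int failed, Set<Integer> available) {
        List<Integer> restart = new ArrayList<>();
        int stage = failed;
        restart.add(stage);
        // Walk upstream while the input of the current stage is missing.
        while (stage > 0 && !available.contains(stage - 1)) {
            stage--;
            restart.add(0, stage);
        }
        return restart;
    }

    public static void main(String[] args) {
        // All upstream results survive: only the failed stage is rerun.
        System.out.println(stagesToRestart(3, Set.of(0, 1, 2))); // [3]
        // Results of stages 1 and 2 are lost: backtrack to stage 1.
        System.out.println(stagesToRestart(3, Set.of(0)));       // [1, 2, 3]
        // Nothing survives: the job effectively restarts from the beginning.
        System.out.println(stagesToRestart(3, Set.of()));        // [0, 1, 2, 3]
    }
}
```

If this model is roughly right, the two behaviours I observe would correspond to the first and last cases, but I would like confirmation of what actually decides which one happens.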


I would appreciate any information regarding this matter.

Kind regards,

Vladimir
