The fault tolerance and recovery mechanism in batch mode within Apache Flink.

Вова Фролов Fri, 16 Feb 2024 04:29:29 -0800

Hi everyone,

I am currently exploring the fault tolerance and recovery mechanism in
batch mode within Apache Flink.


If I terminate the TaskManager process while the job is running, the job
restarts from the point of failure. In some cases, however, the job
restarts from the very beginning.

The documentation mentions that checkpointing and state backends are not
used in batch mode.

How does recovery after a failure occur in BATCH mode?

According to the documentation: “In BATCH runtime mode, Flink will attempt
to return to previous processing steps for which intermediate results are
still available. Potentially, only those tasks that fail (or their
predecessors in the graph) will have to be restarted.”

https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/execution_mode/
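
The quoted paragraph can be read as a backward walk over the job graph:
restart the failed task, then restart any predecessor whose intermediate
result is no longer available. The toy model below is only a sketch of that
reading in Python, not Flink's scheduler. The function name
tasks_to_restart, the task names, and the "available" set are all
hypothetical, and the model ignores pipelined regions and downstream tasks.

```python
from typing import Dict, List, Set


def tasks_to_restart(failed: str,
                     upstream: Dict[str, List[str]],
                     available: Set[str]) -> Set[str]:
    """Return the failed task plus every upstream task that must be rerun.

    upstream maps each task to the tasks that produce its input.
    available holds the tasks whose intermediate results still exist.
    """
    restart: Set[str] = set()
    stack = [failed]
    while stack:
        task = stack.pop()
        if task in restart:
            continue
        restart.add(task)
        for producer in upstream.get(task, []):
            # A producer is rerun only if its intermediate result is gone.
            if producer not in available:
                stack.append(producer)
    return restart


# Hypothetical three-stage batch pipeline: source -> map -> sink.
upstream = {"sink": ["map"], "map": ["source"], "source": []}

# All intermediate results survived: only the failed task reruns.
only_failed = tasks_to_restart("sink", upstream, {"source", "map"})

# The result of "map" was lost, but "source" output is still there.
one_step_back = tasks_to_restart("sink", upstream, {"source"})

# Every intermediate result was lost: the walk reaches the source.
from_beginning = tasks_to_restart("sink", upstream, set())
```

Under this model, a restart "from the very beginning" corresponds to the
last case, where none of the needed intermediate results survive the
failure. Whether that is what happens when a TaskManager is killed is the
open question here.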



I would appreciate any information regarding this matter.

Kind regards,

Vladimir
