Hi all,

We're running a standalone Flink cluster with 2 Job Managers and 3 Task 
Managers. Whenever a TM crashes, we simply restart that particular TM and 
proceed with the processing.

But reading the comments on 
this<https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash>
question makes it look like we need to restart all 5 nodes that form the
cluster to deal with the failure of a single TM. Am I reading this right? What
would be the consequences if we restart just the crashed TM and let the healthy 
ones run as is?

Thanks,
Harshith
