Hi, I'm trying to understand how the JobManager kills a TaskManager that has not responded to heartbeats within the timeout.
For example, if the network connection between the JobManager and a TaskManager is lost for some reason, the JobManager will bring up another TaskManager after the heartbeat timeout. In that case, how does the JobManager make sure that all connections from the lost TaskManager (to Kafka, for example) are closed, and that the new TaskManager resumes from a consistent point?

I would also like to learn how to debug what caused the timeout. Our job handles roughly 5k records/s, so it is not a heavy-traffic job.

--
A.Narasimha Swamy