How does JobManager terminate dangling task manager

narasimha Wed, 12 May 2021 20:05:57 -0700

Hi,

I am trying to understand how the JobManager kills a TaskManager that has
not responded to heartbeats for a certain time.


For example:

If the network connection between the JobManager and a TaskManager is lost
for some reason, the JobManager will bring up another TaskManager after the
heartbeat timeout.
In such a case, how does the JobManager make sure that all connections from
the lost TaskManager (to Kafka, for example) are cut, and that the new
TaskManager resumes from a consistent point?

I would also like to learn ways to debug what caused the timeout. Our job
handles around 5k records/s, which is not heavy traffic.
-- 
A.Narasimha Swamy
