Hi,

I'm trying to understand how the JobManager kills a TaskManager that hasn't
responded to heartbeats within a certain time.

For example:

If the network connection between the JobManager and a TaskManager is lost
for some reason, the JobManager will bring up another TaskManager after the
heartbeat timeout.
In such a case, how does the JobManager make sure that all connections from
the lost TaskManager, such as those to Kafka, are cut, and that the new one
resumes from a consistent point?

I'd also like to learn ways to debug what caused the timeout. Our job
handles about 5k records/s, so it is not a heavy-traffic job.
-- 
A.Narasimha Swamy
