Anand Mazumdar created MESOS-7564:
-------------------------------------

             Summary: Introduce a heartbeat mechanism for executor <-> agent communication.
                 Key: MESOS-7564
                 URL: https://issues.apache.org/jira/browse/MESOS-7564
             Project: Mesos
          Issue Type: Task
            Reporter: Anand Mazumdar

Currently, we do not have heartbeats for executor <-> agent communication. This is especially problematic when IPFilters are enabled, since the default conntrack keep-alive timeout is 5 days. Once that timeout elapses on an idle connection, the executor is not notified via a socket disconnection when the agent process restarts. The executor would then get killed if it has not re-registered by the time the agent recovery process completes.

Enabling application-level heartbeats or TCP keepalives is a possible way of fixing this issue.
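
To make the two options above concrete, here is a minimal sketch in Python. It is not Mesos code: `enable_keepalive`, `worst_case_detection_secs`, and `HeartbeatMonitor` are hypothetical names, and the 60 s / 10 s / 5-probe values are illustrative assumptions, not proposed defaults. The `TCP_KEEP*` options are platform-specific (available on Linux), so the sketch guards them with `hasattr`.

```python
import socket

# 5 days, the default conntrack timeout cited in the description, in seconds.
CONNTRACK_TIMEOUT_SECS = 5 * 24 * 60 * 60


def enable_keepalive(sock, idle_secs=60, interval_secs=10, probes=5):
    """Option 1: turn on TCP keepalive probes (illustrative values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning knobs are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_secs)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_secs)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def worst_case_detection_secs(idle_secs, interval_secs, probes):
    """Rough upper bound on the time until a dead peer is noticed."""
    return idle_secs + interval_secs * probes


class HeartbeatMonitor:
    """Option 2: application-level heartbeat bookkeeping (hypothetical)."""

    def __init__(self, timeout_secs, now):
        self.timeout_secs = timeout_secs
        self.last_seen = now

    def record(self, now):
        # Call whenever a heartbeat message arrives from the peer.
        self.last_seen = now

    def expired(self, now):
        # True once no heartbeat has been seen for longer than the timeout.
        return now - self.last_seen > self.timeout_secs
```

With these illustrative values a dead peer would be noticed after roughly 110 seconds, and the periodic probes keep the connection from sitting idle for anywhere near the 5-day conntrack timeout. The application-level variant takes the current time as an argument so that the timeout logic can be exercised without a real clock.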