[ https://issues.apache.org/jira/browse/MESOS-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Mazumdar updated MESOS-5295:
----------------------------------
    Summary: The task launched by non-checkpointed HTTP command executor will keep running till executor shutdown grace period (5s) after agent process exits.  (was: The task launched by non-checkpointed HTTP command executor will keep running after agent is done)

> The task launched by non-checkpointed HTTP command executor will keep running
> till executor shutdown grace period (5s) after agent process exits.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-5295
>                 URL: https://issues.apache.org/jira/browse/MESOS-5295
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>
> While testing the HTTP command executor, I found an issue. Here are my steps:
> 1. A framework that has checkpointing disabled launches a long-running task
> (e.g., sleep 1000).
> 2. After the task is running, kill the agent.
>
> Then the HTTP command executor terminates after 5s
> ("DEFAULT_EXECUTOR_SHUTDOWN_GRACE_PERIOD"), but the task keeps running.
> This behavior is not consistent with the driver-based command executor: after
> the agent is killed, that executor kills the task and then terminates itself
> after 1s (there is an "os::sleep(Seconds(1));" in "reaped()").
> The root cause of this difference is that, for the driver-based command
> executor, when the driver finds that the agent is down, it calls
> executor->shutdown()
> (https://github.com/apache/mesos/blob/0.28.1/src/exec/exec.cpp#L487), so the
> executor kills the task and then terminates itself.
> For the HTTP command executor, however, "disconnected()" is called
> (https://github.com/apache/mesos/blob/0.28.1/src/executor/executor.cpp#L388)
> when the agent is down, and currently we do nothing in "disconnected()", so
> the task keeps running and the executor is killed after 5s
> (https://github.com/apache/mesos/blob/0.28.1/src/executor/executor.cpp#L623).
> The behavior of the driver-based command executor is correct; we need to make
> sure the HTTP command executor kills the task when the agent is down if
> checkpointing is not enabled.
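
The behavior requested in the issue can be illustrated with a minimal, self-contained C++ sketch. It does not use any Mesos API: `ExecutorModel`, `killTask()`, and the `checkpoint`, `taskRunning`, and `shuttingDown` fields are hypothetical names invented for this illustration. Only the idea (kill the task in `disconnected()` when checkpointing is not enabled) comes from the issue text, and an actual fix in Mesos may well look different.

```cpp
#include <cassert>

// Hypothetical stand-in for the HTTP command executor. None of these
// names are taken from the Mesos code base.
struct ExecutorModel {
  bool checkpoint;            // did the framework enable checkpointing?
  bool taskRunning = true;    // models the launched task (e.g. `sleep 1000`)
  bool shuttingDown = false;  // set once the executor decides to terminate

  explicit ExecutorModel(bool checkpoint_) : checkpoint(checkpoint_) {}

  // Stand-in for stopping the task process.
  void killTask() { taskRunning = false; }

  // Models the callback invoked when the connection to the agent is lost.
  void disconnected() {
    if (!checkpoint) {
      // Behavior the issue asks for: with checkpointing disabled, kill the
      // task and then let the executor terminate, so the task is not left
      // running after the executor is gone.
      killTask();
      shuttingDown = true;
    }
    // Assumption of this sketch: with checkpointing enabled, nothing is
    // done here and the task is left running.
  }
};
```

In this model, calling `disconnected()` on `ExecutorModel(false)` leaves `taskRunning == false` and `shuttingDown == true`, while on `ExecutorModel(true)` the task stays running.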