Jie Yu created MESOS-5380:
-----------------------------

Summary: Killing a queued task can cause the corresponding command executor never terminates.
Key: MESOS-5380
URL: https://issues.apache.org/jira/browse/MESOS-5380
Project: Mesos
Issue Type: Bug
Affects Versions: 0.28.1, 0.28.0
Reporter: Jie Yu
Assignee: Vinod Kone
Priority: Blocker
Fix For: 0.29.0, 0.28.2

We observed this in our testing environment. Here is the sequence of events:

1) A command task is queued; the executor has not registered yet.
2) The framework issues a killTask.
3) Since the executor is in the REGISTERING state, the agent calls `statusUpdate(TASK_KILLED, UPID())`.
4) `statusUpdate` now calls `containerizer->status()` before calling `executor->terminateTask(status.task_id(), status);`, which is the call that removes the queued task (this ordering was introduced in https://reviews.apache.org/r/43258).
5) Since the above is asynchronous, the task may still be among the queued tasks when `killTask` checks whether the unregistered executor needs to be killed:

```
// TODO(jieyu): Here, we kill the executor if it no longer has
// any task to run and has not yet registered. This is a
// workaround for those single task executors that do not have a
// proper self terminating logic when they haven't received the
// task within a timeout.
if (executor->queuedTasks.empty()) {
  CHECK(executor->launchedTasks.empty())
    << " Unregistered executor '" << executor->id
    << "' has launched tasks";

  LOG(WARNING) << "Killing the unregistered executor " << *executor
               << " because it has no tasks";

  executor->state = Executor::TERMINATING;

  containerizer->destroy(executor->containerId);
}
```

6) Because `queuedTasks` is not yet empty at that point, the block above is skipped, and the executor will never be terminated by Mesos after that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
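
The race in steps 3 to 5 can be modeled with a small, single-threaded sketch. Everything here is hypothetical and is not Mesos code: `Executor` is a stripped-down stand-in, `pending` imitates the deferred continuation that runs after `containerizer->status()` completes, and `killTaskDeferredCheck` shows only one possible reordering, not necessarily the fix adopted for this issue.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the agent's executor bookkeeping.
struct Executor {
  std::set<std::string> queuedTasks;
  std::set<std::string> launchedTasks;
  bool destroyed = false;
};

// Imitates work that runs later, like the continuation that follows
// the asynchronous `containerizer->status()` call.
std::queue<std::function<void()>> pending;

// Runs every deferred callback in FIFO order.
void drain() {
  while (!pending.empty()) {
    std::function<void()> f = std::move(pending.front());
    pending.pop();
    f();
  }
}

// Models `statusUpdate`: the queued task is only removed later,
// when the deferred continuation runs.
void statusUpdate(Executor& executor, const std::string& taskId) {
  pending.push([&executor, taskId]() { executor.queuedTasks.erase(taskId); });
}

// Models the check quoted in the report: destroy the unregistered
// executor only if it has no queued tasks left.
void killIfIdle(Executor& executor) {
  if (executor.queuedTasks.empty()) {
    assert(executor.launchedTasks.empty());
    executor.destroyed = true;
  }
}

// Ordering from the report: the check runs before the deferred removal,
// so the task is still queued and the executor is left alive.
bool killTaskRacy(Executor& executor, const std::string& taskId) {
  statusUpdate(executor, taskId);
  killIfIdle(executor);
  drain();
  return executor.destroyed;
}

// One possible reordering (an assumption for illustration): schedule the
// check behind the removal so it observes the emptied queue.
bool killTaskDeferredCheck(Executor& executor, const std::string& taskId) {
  statusUpdate(executor, taskId);
  pending.push([&executor]() { killIfIdle(executor); });
  drain();
  return executor.destroyed;
}
```

Calling `killTaskRacy` on an executor with one queued task returns `false` even though the queue ends up empty, which matches step 6. Calling `killTaskDeferredCheck` on the same setup returns `true`.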