[ https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-1070:
------------------------------

    Attachment: YARN-1070.4.patch

Updated the patch against the latest trunk.

bq. Taking a step back, this approach will work, though the code is hard to 
read for me. A very simple state machine should make this code a lot cleaner.

IMHO, a state machine will not help much here, because the Callable runs on a
separate thread and proceeds asynchronously with respect to ContainerImpl. The
container state can change to KILLING at any time: before the Callable starts,
while it is running, or after it has finished. We could check the state in
many places, but the important one is the beginning of the Callable: if the
container is already KILLING, there is no need to go through the rest of its
logic. In effect, this behaves like canceling the Callable.
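
A minimal Java sketch of the early-exit check described above. All names here
(SketchState, SketchContainer, SketchLaunch) are hypothetical simplifications,
not the real ContainerImpl or ContainerLaunch classes, and the sketch assumes
that the Callable itself emits CONTAINER_KILLED_ON_REQUEST when it bails out.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified container states; the real ContainerImpl has many more.
enum SketchState { LOCALIZED, RUNNING, KILLING, DONE }

// Hypothetical stand-in for ContainerImpl: holds a state that other threads
// may change concurrently, plus a log of the events that were emitted.
class SketchContainer {
  final AtomicReference<SketchState> state =
      new AtomicReference<>(SketchState.LOCALIZED);
  final List<String> events = new CopyOnWriteArrayList<>();
}

// Hypothetical stand-in for the launch Callable.
class SketchLaunch implements Callable<Integer> {
  private final SketchContainer container;

  SketchLaunch(SketchContainer container) {
    this.container = container;
  }

  @Override
  public Integer call() {
    // The important check: if the container was already moved to KILLING
    // before this thread got to run, skip the whole launch path. This has
    // the same effect as canceling the Callable before it started.
    if (container.state.get() == SketchState.KILLING) {
      container.events.add("CONTAINER_KILLED_ON_REQUEST");
      return -1; // hypothetical exit code meaning "never launched"
    }
    // ... the real launch logic would go here ...
    container.events.add("CONTAINER_LAUNCHED");
    return 0;
  }
}
```

Only the check at the start of call() is shown; a transition to KILLING that
happens after that check is assumed to be handled by the cleanup path, which is
outside this sketch.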

bq. Also, as part of ContainerLaunch.cleanupContainer(), we should try to 
cancel the Callable.

Canceling is not necessary if we can terminate the Callable early, and it
would cause the bug in YARN-906. When cleanupContainer() is invoked, the
container state is already KILLING, so canceling would only affect a Callable
that has not started yet. On the other hand, if the Callable has not started
while the container state is already KILLING, the Callable will terminate at
its very beginning, and a CONTAINER_KILLED_ON_REQUEST will be emitted. If we
did cancel the Callable, we would still need to check the container state in
cleanupContainer() and decide whether to emit a CONTAINER_KILLED_ON_REQUEST
there as well, which brings us back to the original problem of this ticket.
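
The argument against canceling can be sketched with a plain
java.util.concurrent executor. CancelDemo, its State enum and the event strings
are hypothetical stand-ins, not YARN's launcher code, and the sketch again
assumes the Callable itself emits CONTAINER_KILLED_ON_REQUEST. A task canceled
before it starts never runs its call(), so whoever cancels it would have to
decide whether to emit the event; if the Callable is left alone, its early
state check emits the event exactly once.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class CancelDemo {
  enum State { LOCALIZED, KILLING }

  // Runs one scenario and returns the events emitted by the launch task.
  static List<String> run(boolean cancelInCleanup) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);
    AtomicReference<State> state = new AtomicReference<>(State.LOCALIZED);
    List<String> events = new CopyOnWriteArrayList<>();

    // Occupy the only worker thread so the launch task stays queued (not started).
    pool.submit(() -> { release.await(); return null; });

    Future<Integer> launch = pool.submit(() -> {
      // Early check at the beginning of the Callable.
      if (state.get() == State.KILLING) {
        events.add("CONTAINER_KILLED_ON_REQUEST");
        return -1;
      }
      events.add("CONTAINER_LAUNCHED");
      return 0;
    });

    // The kill arrives before the launch task has started.
    state.set(State.KILLING);
    if (cancelInCleanup) {
      launch.cancel(false); // the queued task will never execute its body
    }
    release.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return events;
  }
}
```

CancelDemo.run(true) returns an empty list, so the cleanup side would have to
emit the event itself, while CancelDemo.run(false) returns a single
CONTAINER_KILLED_ON_REQUEST.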

> ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
> CONTAINER_CLEANEDUP_AFTER_KILL
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1070
>                 URL: https://issues.apache.org/jira/browse/YARN-1070
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Zhijie Shen
>         Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch, 
> YARN-1070.4.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira