[ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500195#comment-13500195
 ] 

Tom White commented on YARN-72:
-------------------------------

Sandy, this looks like a good start at hooking in the code for container 
cleanup. I would focus this patch on cleaning up at shutdown, and tackle 
cleanup at startup in YARN-73.

As Bikas mentioned, there needs to be a timeout on waiting for the containers 
to shut down. The shutdown process waits for up to 
yarn.nodemanager.process-kill-wait.ms for the PID to appear, then 
yarn.nodemanager.sleep-delay-before-sigkill.ms before sending a SIGKILL signal 
(after a SIGTERM) if the process hasn't died - see 
ContainerLaunch#cleanupContainer. Waiting a little longer than the sum of 
these two durations would be sufficient.
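
To make the timeout suggestion concrete, here is a rough sketch of how the 
bound could be computed and applied. It is not the NodeManager's actual code: 
the class, the plain Map standing in for the configuration, the fallback 
values, the slack and the polling loop are all assumptions for illustration. 
Only the two property names come from the comment above.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Illustrative sketch only; every name here except the two property keys is hypothetical. */
class ShutdownWaitSketch {
    // Property names as quoted in the comment above.
    static final String PROCESS_KILL_WAIT_MS = "yarn.nodemanager.process-kill-wait.ms";
    static final String SIGKILL_DELAY_MS = "yarn.nodemanager.sleep-delay-before-sigkill.ms";

    // Assumed fallback values and slack, chosen for illustration only.
    static final long FALLBACK_PROCESS_KILL_WAIT_MS = 2000L;
    static final long FALLBACK_SIGKILL_DELAY_MS = 250L;
    static final long SLACK_MS = 1000L;

    /** Upper bound on the shutdown wait: sum of both delays plus a little slack. */
    static long timeoutMs(Map<String, Long> conf) {
        long pidWait = conf.getOrDefault(PROCESS_KILL_WAIT_MS, FALLBACK_PROCESS_KILL_WAIT_MS);
        long sigkillDelay = conf.getOrDefault(SIGKILL_DELAY_MS, FALLBACK_SIGKILL_DELAY_MS);
        return pidWait + sigkillDelay + SLACK_MS;
    }

    /**
     * Polls until the supplied check reports that all containers are done,
     * or until the timeout expires. Returns true only if cleanup finished in time.
     */
    static boolean awaitCleanup(BooleanSupplier allContainersDone, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!allContainersDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: containers still running after the bound
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long timeout = timeoutMs(Map.of());
        System.out.println("shutdown wait bound (ms): " + timeout);
        System.out.println("cleaned up in time: " + awaitCleanup(() -> true, timeout, 100L));
    }
}
```

In the real patch the check passed to awaitCleanup would look at the NM's 
set of live containers, and the values would be read from the NM's 
configuration rather than a Map.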

Regarding testing, you could add a test modeled on 
TestContainerLaunch#testDelayedKill to verify that containers are correctly 
cleaned up after an NM is stopped.
                
> NM should handle cleaning up containers when it shuts down ( and kill 
> containers from an earlier instance when it comes back up after an unclean 
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Sandy Ryza
>         Attachments: YARN-72.patch
>
>
> Ideally, the NM should wait for a limited amount of time when it gets a 
> shutdown signal for existing containers to complete and kill the containers ( 
> if we pick an aggressive approach ) after this time interval. 
> For NMs which come up after an unclean shutdown, the NM should look through 
> its directories for existing container.pids and try to kill any existing 
> containers matching the pids found. 

