[ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500195#comment-13500195 ]
Tom White commented on YARN-72:
-------------------------------

Sandy, this looks like a good start, hooking in the code for container cleanup. I would focus on cleaning up on shutdown in this patch, and tackle cleanup on startup in YARN-73.

As Bikas mentioned, there needs to be a timeout on waiting for the containers to shut down. The shutdown process waits for up to yarn.nodemanager.process-kill-wait.ms for the PID to appear, then waits yarn.nodemanager.sleep-delay-before-sigkill.ms before sending a SIGKILL signal (after a SIGTERM) if the process hasn't died; see ContainerLaunch#cleanupContainer. Waiting for a little longer than the sum of these two durations would be sufficient.

Regarding testing, you could write a test like the one in TestContainerLaunch#testDelayedKill to check that containers are correctly cleaned up after stopping an NM.

> NM should handle cleaning up containers when it shuts down ( and kill
> containers from an earlier instance when it comes back up after an unclean
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-72
> URL: https://issues.apache.org/jira/browse/YARN-72
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Hitesh Shah
> Assignee: Sandy Ryza
> Attachments: YARN-72.patch
>
>
> Ideally, the NM should wait for a limited amount of time when it gets a
> shutdown signal for existing containers to complete and kill the containers (
> if we pick an aggressive approach ) after this time interval.
> For NMs which come up after an unclean shutdown, the NM should look through
> its directories for existing container.pids and try and kill any existing
> containers matching the pids found.
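
As a rough illustration of the timeout suggested in the comment above (a little longer than the sum of the two configured durations), here is a minimal sketch. It is not taken from the YARN-72 patch: the class name ShutdownWaitSketch, the method shutdownWaitMs, the 1000 ms slack, and the sample values in main are all assumptions, and java.util.Properties stands in for the real Hadoop configuration object so the example stays self-contained. Only the two property names come from the comment.

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of the YARN-72 patch. */
final class ShutdownWaitSketch {
    // Property names as given in the comment.
    static final String PROCESS_KILL_WAIT_MS =
        "yarn.nodemanager.process-kill-wait.ms";
    static final String SLEEP_DELAY_BEFORE_SIGKILL_MS =
        "yarn.nodemanager.sleep-delay-before-sigkill.ms";

    // Assumed slack so the wait is "a little longer" than the sum; the value is illustrative.
    static final long SLACK_MS = 1000L;

    /** Returns kill-wait + sigkill-delay + slack, using the fallbacks when a key is unset. */
    static long shutdownWaitMs(Properties conf,
                               long fallbackKillWaitMs,
                               long fallbackSigkillDelayMs) {
        long killWait = readMs(conf, PROCESS_KILL_WAIT_MS, fallbackKillWaitMs);
        long sigkillDelay =
            readMs(conf, SLEEP_DELAY_BEFORE_SIGKILL_MS, fallbackSigkillDelayMs);
        return killWait + sigkillDelay + SLACK_MS;
    }

    // Reads a millisecond value from the properties, or returns the fallback if absent.
    private static long readMs(Properties conf, String key, long fallback) {
        String raw = conf.getProperty(key);
        return raw == null ? fallback : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Sample values chosen for the example, not read from a real cluster.
        conf.setProperty(PROCESS_KILL_WAIT_MS, "2000");
        conf.setProperty(SLEEP_DELAY_BEFORE_SIGKILL_MS, "250");
        System.out.println(shutdownWaitMs(conf, 2000L, 250L)); // prints 3250
    }
}
```

In the NM itself the same sum would presumably be read from the node manager's configuration and used to bound how long shutdown waits for containers to exit; the slack value is a tuning choice rather than anything the comment prescribes.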