Sandy Ryza created YARN-1110:
--------------------------------

             Summary: NodeManager doesn't complete container after transition from LOCALIZED to KILLING
                 Key: YARN-1110
                 URL: https://issues.apache.org/jira/browse/YARN-1110
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.1.0-beta
            Reporter: Sandy Ryza

Multiple containers are sticking around on an NM, taking up resources, after they have been killed.

{code}
2013-08-27 15:56:36,597 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1377559361179_0018_01_001337 by user llama
2013-08-27 15:56:36,597 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=llama IP=10.20.191.233 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1377559361179_0018 CONTAINERID=container_1377559361179_0018_01_001337
2013-08-27 15:56:36,598 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1377559361179_0018_01_001337 to application application_1377559361179_0018
2013-08-27 15:56:36,598 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1377559361179_0018_01_001337 transitioned from NEW to LOCALIZED
2013-08-27 15:56:36,613 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1377559361179_0018_01_001337
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=llama IP=10.20.191.233 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1377559361179_0018 CONTAINERID=container_1377559361179_0018_01_001337
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1377559361179_0018_01_001337 transitioned from LOCALIZED to KILLING
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1377559361179_0018_01_001337
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1377559361179_0018_01_001337 not launched. No cleanup needed to be done
2013-08-27 15:56:36,617 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1377559361179, }, attemptId: 1, }, id: 402, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
{code}

This is the last time the container is mentioned in the logs. We never get a

{code}
2013-08-27 15:56:38,832 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container <containerID>
{code}

like we do for other completed containers.
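
To check how many containers on a node are affected, the pattern described above can be searched for in the NM log. The following is a minimal Python sketch, not part of the original report: it assumes the log message formats match the excerpts quoted here, and that the "Removed completed container" line carries an ID of the form container_.... The function name find_stuck_containers and the SAMPLE data are hypothetical, introduced only for illustration.

```python
import re

# Patterns are assumptions based on the log excerpts quoted in this report.
TRANSITION = re.compile(
    r"Container (container_\S+) transitioned from (\w+) to (\w+)"
)
REMOVED = re.compile(r"Removed completed container (container_\S+)")


def find_stuck_containers(log_lines):
    """Return {container_id: previous_state} for containers that entered
    KILLING but never got a 'Removed completed container' line."""
    killing = {}
    removed = set()
    for line in log_lines:
        match = TRANSITION.search(line)
        if match and match.group(3) == "KILLING":
            # Remember the state the container was in before KILLING.
            killing.setdefault(match.group(1), match.group(2))
        match = REMOVED.search(line)
        if match:
            removed.add(match.group(1))
    return {cid: prev for cid, prev in killing.items() if cid not in removed}


# Two lines taken from the log excerpt in this report.
SAMPLE = [
    "2013-08-27 15:56:36,598 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.container.Container: Container "
    "container_1377559361179_0018_01_001337 transitioned from NEW to LOCALIZED",
    "2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.container.Container: Container "
    "container_1377559361179_0018_01_001337 transitioned from LOCALIZED to KILLING",
]

if __name__ == "__main__":
    for cid, prev in find_stuck_containers(SAMPLE).items():
        print(f"{cid}: entered KILLING from {prev}, never removed")
```

In practice the function would be fed the lines of an NM log file (for example, `find_stuck_containers(open(path))` with a path of your choosing). It only reports candidates; a container killed shortly before the log ends would also show up.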