[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482853#comment-13482853 ]
Vinod Kumar Vavilapalli commented on YARN-167:
----------------------------------------------

bq. There isn't anything like a missed state that is causing this issue if I understand Ravi's issue description correctly. Obviously, this could be wrong. Ravi, if you have one of these stuck AMs lying around, can you take a thread dump please?

> AM stuck in KILL_WAIT for days
> ------------------------------
>
>                 Key: YARN-167
>                 URL: https://issues.apache.org/jira/browse/YARN-167
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 0.23.3
>            Reporter: Ravi Prakash
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: TaskAttemptStateGraph.jpg
>
>
> We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them
> as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a
> few maps running. All these maps were scheduled on nodes which are now in the
> RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state