[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725518#comment-13725518 ]
Vinod Kumar Vavilapalli commented on YARN-966:
----------------------------------------------

bq. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded.
This is the most important point.

bq. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously.
bq. which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources.
I think we are beating this to death. Like I said, this error SHOULD NOT happen in practice. I don't know why the assert was originally put in place. That said, I didn't want to blindly remove it without knowing why it was there to begin with. If we ever run into this in real life, we can fix the message.

> The thread of ContainerLaunch#call will fail without any signal if
> getLocalizedResources() is called when the container is not at LOCALIZED
> ------------------------------------------------------------------------
>
>                 Key: YARN-966
>                 URL: https://issues.apache.org/jira/browse/YARN-966
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-966.1.patch
>
>
> In ContainerImpl.getLocalizedResources(), there's:
> {code}
> assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!!
> {code}
> ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(),
> which is scheduled on a separate thread. If the container is not at LOCALIZED
> (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown and
> fails the thread without notifying NM. Therefore, the container cannot
> receive more events, which are supposed to be sent from
> ContainerLaunch.call(), and move towards completion.
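
For readers following the thread, the failure mode and the NULL-check approach under discussion can be sketched in plain Java. This is a minimal illustration only, not the attached YARN-966.1.patch: the names LaunchSketch, Container, Launch, run and the event strings are hypothetical stand-ins and are not part of the NodeManager code. The assumption is that getLocalizedResources() returns null instead of asserting when the container is not at LOCALIZED, so that the launch thread can report a failure rather than die silently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LaunchSketch {
    // Hypothetical subset of container states, for illustration only.
    enum State { LOCALIZING, LOCALIZED, KILLING }

    // Hypothetical stand-in for ContainerImpl.
    static class Container {
        private final State state;

        Container(State state) { this.state = state; }

        // Returns null rather than asserting when the state is not LOCALIZED.
        List<String> getLocalizedResources() {
            if (state != State.LOCALIZED) {
                return null;
            }
            return Collections.singletonList("job.jar");
        }
    }

    // Hypothetical stand-in for ContainerLaunch, run on a separate thread.
    static class Launch implements Callable<Integer> {
        private final Container container;
        private final List<String> events;

        Launch(Container container, List<String> events) {
            this.container = container;
            this.events = events;
        }

        @Override
        public Integer call() {
            List<String> resources = container.getLocalizedResources();
            if (resources == null) {
                // Record a failure event so the container can still move on.
                events.add("CONTAINER_EXITED_WITH_FAILURE");
                return -1;
            }
            events.add("CONTAINER_LAUNCHED");
            return 0;
        }
    }

    // Runs one launch on a worker thread and returns the events it emitted.
    static List<String> run(State state) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(new Launch(new Container(state), events)).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run(State.LOCALIZED));
        System.out.println(run(State.KILLING));
    }
}
```

In this sketch a launch against a KILLING container still emits an event, whereas an assert that throws inside call() would end the worker thread with nothing sent. Whether the failure message should mention localization or the kill is the point being debated above; the sketch takes no position on that.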