[ 
https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725483#comment-13725483
 ] 

Omkar Vinit Joshi commented on YARN-966:
----------------------------------------

bq. One more consideration. An empty map can mean the case where the container 
is at LOCALIZED but there are actually no localized resources. Returning null is 
meant to distinguish this case from the case of fetching the localized resources 
when the container is not at LOCALIZED.
This assumption is wrong. The state of the container has nothing to do with the 
localized resources map. We can call getState and learn its state irrespective 
of this null check. I don't see when we would in fact start 
ContainerLaunch#call without all of its resources having been downloaded. It is 
a different issue that the user may kill the container, resulting in a state 
transition. I still think this should not be handled via a NULL check. The 
proper way is to set a boolean flag on ContainerLaunch synchronously in the 
event of KILL. 
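
A minimal sketch of the flag-based approach, to make the suggestion concrete. 
LaunchSketch, markKilled(), and the exit-code constants are hypothetical names 
used only for illustration; this is not the actual ContainerLaunch code or the 
attached patch. It assumes the KILL handler can reach the launch object and 
set the flag before or while call() runs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for ContainerLaunch; not the real NodeManager class.
class LaunchSketch implements Callable<Integer> {
    static final int EXIT_OK = 0;
    static final int EXIT_KILLED_BEFORE_LAUNCH = -1; // illustrative value only

    // Set synchronously by the KILL handler, read by the launch thread.
    private final AtomicBoolean killed = new AtomicBoolean(false);
    private final Map<String, String> localizedResources;

    // Number of resources seen by the last successful call(); -1 if none yet.
    volatile int resourcesSeen = -1;

    LaunchSketch(Map<String, String> localizedResources) {
        this.localizedResources = localizedResources;
    }

    // Called from the event-handling thread when KILL arrives.
    void markKilled() {
        killed.set(true);
    }

    @Override
    public Integer call() {
        // The kill decision comes from the explicit flag, not from a null map.
        if (killed.get()) {
            return EXIT_KILLED_BEFORE_LAUNCH;
        }
        // An empty map just means there was nothing to localize.
        resourcesSeen = localizedResources.size();
        return EXIT_OK;
    }

    public static void main(String[] args) {
        LaunchSketch launch = new LaunchSketch(Collections.emptyMap());
        launch.markKilled();
        System.out.println("exit code after KILL: " + launch.call());
    }
}
```

With this shape, an empty resources map and a killed container are separate 
signals, so the getter never needs to return null to encode state.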
                
> The thread of ContainerLaunch#call will fail without any signal if 
> getLocalizedResources() is called when the container is not at LOCALIZED
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-966
>                 URL: https://issues.apache.org/jira/browse/YARN-966
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-966.1.patch
>
>
> In ContainerImpl.getLocalizedResources(), there's:
> {code}
> assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!!
> {code}
> ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), 
> which is scheduled on a separate thread. If the container is not at LOCALIZED 
> (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown and 
> will fail the thread without notifying the NM. Therefore, the container cannot 
> receive further events, which are supposed to be sent from 
> ContainerLaunch.call(), and cannot move towards completion. 
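
The silent failure described above can be reproduced in isolation with a 
generic executor. The sketch below is not NodeManager code: 
SilentFailureSketch and runAndCapture() are hypothetical names, and the error 
is thrown explicitly rather than via the assert keyword so that it does not 
depend on the JVM running with -ea. It shows that an Error thrown inside a 
submitted Callable is stored in the Future and surfaces only if someone calls 
get().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SilentFailureSketch {
    // Runs a task on a separate thread and returns the Throwable captured by
    // its Future, or null if the task completed normally.
    static Throwable runAndCapture(boolean localized) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> {
                if (!localized) {
                    // Stands in for the failing assert in getLocalizedResources().
                    throw new AssertionError("container is not at LOCALIZED");
                }
                return 0;
            };
            Future<Integer> result = pool.submit(task);
            try {
                result.get(); // without this call the error is never observed
                return null;
            } catch (ExecutionException e) {
                return e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("captured: " + runAndCapture(false));
    }
}
```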

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
