[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725099#comment-13725099 ] Hudson commented on YARN-966: - SUCCESS: Integrated in Hadoop-Yarn-trunk #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/287/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725211#comment-13725211 ] Hudson commented on YARN-966: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1477 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1477/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725275#comment-13725275 ] Hudson commented on YARN-966: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1504 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1504/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725483#comment-13725483 ] Omkar Vinit Joshi commented on YARN-966: bq. One more consideration. Empty map can means the case that the container is at LOCALIZED, but actually there's no localized resources. Returning null is to distinguish this case with the case of fetch the localized resources when the container is not at LOCALIZED. This assumption is wrong. State of container has nothing to do with localized resources map. we can call getState and know its state irrespective of this null check. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded. Its different issue that user may kill the container resulting into a state transition. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725489#comment-13725489 ] Omkar Vinit Joshi commented on YARN-966: So if user say kills the container what error we will see is {code} +RPCUtil.getRemoteException( +Unable to get local resources when Container + containerID + + is at + container.getContainerState()); {code} which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725518#comment-13725518 ] Vinod Kumar Vavilapalli commented on YARN-966: -- bq. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded. This is the most important point. bq. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously. bq. which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources. I think we are beating this down to death. Like I said, this error SHOULD NOT happen in practice. I don't know whey the assert was originally put in place. That said, I didn't want to blindly remove it without knowing why it was there to begin with. If ever we run into this in real life, we can fix the message. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725526#comment-13725526 ] Zhijie Shen commented on YARN-966: -- bq. Potentially I don't see when we will in fact start ContainerLaunch#call without its all resources getting downloaded. YARN-906 is such a corner case. bq. This I still see should not be done via NULL check. Proper way is to set boolean flag of ContainerLaunch in the event of KILL synchronously. The original code checks state == LOCALIZED, and throws AssertError when getting the localized resources. I just modified the way to indicate the error, such that the callers of it can more easily handle the error. If you think calling getLocalizedResources() when the container is not at LOCALIZED is not wrong, I'm afraid we're in the different conversation. bq. which is completely misleading.. Indeed this occurred because user killed container not because it failed to localize resources. I don't think the message is misleading. Again, getLocalizedResources() is not allowed to be called when the container is not at LOCALIZED (at least the original code means it). So the message clearly states problem. Please note that killing signal is not the root problem of the thread failure here. If getLocalizedResources() were not called, the thread would still complete without exception. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724621#comment-13724621 ] Vinod Kumar Vavilapalli commented on YARN-966: -- Looks straight-forward. +1. Checking this in. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724631#comment-13724631 ] Omkar Vinit Joshi commented on YARN-966: few questions.. {code} -assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! -return localizedResources; + if (ContainerState.LOCALIZED == getContainerState()) { +return localizedResources; + } else { +return null; + } {code} after this change are we saying that container's partial localized resources will not be accessible during the state other than LOCALIZED and we will return null?? The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724638#comment-13724638 ] Hudson commented on YARN-966: - SUCCESS: Integrated in Hadoop-trunk-Commit #4190 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4190/]) YARN-966. Fixed ContainerLaunch to not fail quietly when there are no localized resources due to some other failure. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508688) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724642#comment-13724642 ] Omkar Vinit Joshi commented on YARN-966: * Also I think we are mixing YARN-966 with YARN-906. I don't see any point why we should return null...we can return empty map if no resources are localized.. thoughts? {code} private final MapPath,ListString localizedResources = new HashMapPath,ListString(); {code} * I guess we should not fix that TODO. that has nothing to do with getLocalizedResources() call. Probably caller can make getState() call and figure out in what state container isthoughts? The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724694#comment-13724694 ] Vinod Kumar Vavilapalli commented on YARN-966: -- bq. after this change are we saying that container's partial localized resources will not be accessible during the state other than LOCALIZED and we will return null?? There is only one user of the API today. We can change it in future. bq. Also I think we are mixing YARN-966 with YARN-906. I don't see any point why we should return null...we can return empty map if no resources are localized.. thoughts? We aren't mixing YARN-966. We are just fixing an assert to be explicitly checked so that IN case something happens, we can report properly. In all my understanding, this code path should never get triggered. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724704#comment-13724704 ] Zhijie Shen commented on YARN-966: -- bq. Also I think we are mixing YARN-966 with YARN-906. I don't see any point why we should return null...we can return empty map if no resources are localized.. thoughts? One more consideration. Empty map can means the case that the container is at LOCALIZED, but actually there's no localized resources. Returning null is to distinguish this case with the case of fetch the localized resources when the container is not at LOCALIZED. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.1-beta Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718138#comment-13718138 ] Hadoop QA commented on YARN-966: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593887/YARN-966.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1564//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1564//console This message is automatically generated. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-966) The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718194#comment-13718194 ] Hadoop QA commented on YARN-966: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593887/YARN-966.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1572//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1572//console This message is automatically generated. The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED --- Key: YARN-966 URL: https://issues.apache.org/jira/browse/YARN-966 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-966.1.patch In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertError will be thrown and fails the thread without notifying NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira