[ https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641308#comment-13641308 ]
Bikas Saha commented on YARN-562:
---------------------------------

Shouldn't the new exception inherit from YarnException, the common base class?

I actually like NMNotConnectedWithRMException because NotYetReady could be due to various other reasons. No strong opinion. Is there an existing InvalidContainerException for cases when the ContainerToken is invalid? How about InvalidContainerException as a name? If the only thing the client can do is get a new container from the RM, then there may not be any point in differentiating the reasons. If we really want to keep RM in the name, then maybe InvalidContainerFromUnknownRM.

Previous may not be correct. I think the invalidation needs to be done before sending the event, because technically this thread could be suspended immediately after sending the event, so the handler thread could run before the invalidation happens.
{code}
       dispatcher.getEventHandler().handle(
           new NodeManagerEvent(NodeManagerEventType.RESYNC));
+      // Invalidate the RMIdentifier while resync
+      setRMIdentifier(ResourceManagerConstants.RM_INVALID_IDENTIFIER);
       break;
{code}

It reads weird that the container manager is notifying itself.
{code}
+
+    LOG.info("Notifying ContainerManager to block new container-requests as "
+        + "NodeManager is still starting.");
+    this.setBlockNewContainerRequests(true);
{code}

It would be good to continue looping until notified that the ContainerManager is no longer blocked.
{code}
+      try {
// HERE set FLAG to stop thread
+        launchContainersThread.join();
+        super.setBlockNewContainerRequests(blockNewContainerRequests);
....
+      try {
// HERE check FLAG to stop thread
+        while (numContainers++ < 10) {
{code}

> NM should reject containers allocated by previous RM
> ----------------------------------------------------
>
>                 Key: YARN-562
>                 URL: https://issues.apache.org/jira/browse/YARN-562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-562.10.patch, YARN-562.1.patch, YARN-562.2.patch,
> YARN-562.3.patch, YARN-562.4.patch, YARN-562.5.patch, YARN-562.6.patch,
> YARN-562.7.patch, YARN-562.8.patch, YARN-562.9.patch
>
>
> It's possible that after the RM shuts down and before the AM goes down, the AM
> still calls startContainer on the NM with containers allocated by the previous
> RM. When the RM comes back, the NM doesn't know whether a container launch
> request comes from the previous RM or the current RM. We should reject
> containers allocated by the previous RM.
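
To illustrate the ordering concern in the comment above (invalidate the RMIdentifier before sending the RESYNC event), here is a minimal, self-contained sketch. It does not use the YARN classes: `ResyncOrderingSketch`, the single-thread executor standing in for the dispatcher, the starting identifier value, and the `-1` sentinel are all assumptions made for illustration. The point is only that once the write happens before the event is handed off, the handler thread cannot observe the stale identifier.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the NM-side code under review; not the YARN classes.
class ResyncOrderingSketch {
    // Assumed sentinel value; the patch refers to
    // ResourceManagerConstants.RM_INVALID_IDENTIFIER.
    static final long RM_INVALID_IDENTIFIER = -1L;

    // Identifier of the RM this node last registered with (arbitrary start value).
    private final AtomicLong rmIdentifier = new AtomicLong(42L);

    // Stand-in for the async dispatcher: events are handled on another thread.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    /** Invalidates first, then dispatches; returns the identifier the handler saw. */
    long resync() {
        // 1. Invalidate before the event becomes visible to the handler thread.
        rmIdentifier.set(RM_INVALID_IDENTIFIER);

        // 2. Only now hand the "RESYNC" event to the dispatcher thread.
        //    Even if this thread is suspended right after submitting,
        //    the handler can only read the invalidated value.
        CompletableFuture<Long> seenByHandler =
            CompletableFuture.supplyAsync(rmIdentifier::get, dispatcher);
        try {
            return seenByHandler.join();
        } finally {
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) {
        long seen = new ResyncOrderingSketch().resync();
        System.out.println("handler saw identifier " + seen);
    }
}
```

With the two steps swapped, as in the quoted patch hunk, the handler could run between the dispatch and the `set` call and still accept requests carrying the old identifier.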