[ https://issues.apache.org/jira/browse/YARN-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140512#comment-15140512 ]
Rohith Sharma K S commented on YARN-4685:
-----------------------------------------

Currently, RMAppAttemptImpl starts calling the allocate method only when the CONTAINER_ALLOCATED event is triggered. If no container is allocated, RMAppAttemptImpl does not keep calling allocate. So even if we add code in {{RMAppAttemptImpl#AMContainerAllocatedTransition#transition}} to send the updated blacklist additions/removals, it would not help. We need to think of alternatives to handle this scenario.

> AM blacklist addition/removal should get updated for every allocate call from RMAppAttemptImpl.
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-4685
> URL: https://issues.apache.org/jira/browse/YARN-4685
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
>
> AM blacklist addition or removal is updated only when the RMAppAttempt is
> scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}. Once the
> attempt is scheduled, any later removeNode/addNode in the cluster is not
> propagated to {{BlacklistManager#refreshNodeHostCount}}. This leaves the
> BlacklistManager operating on a stale NM count, so the application stays in
> ACCEPTED state and waits forever, even if more nodes are added to the cluster.
> The solution is to update the BlacklistManager on every
> {{RMAppAttemptImpl#AMContainerAllocatedTransition#transition}} call. This
> ensures that any node addition/removal is reflected in the BlacklistManager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
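To make the stale-count problem concrete, here is a minimal Java sketch. It does not reproduce Hadoop's actual classes: `SimpleBlacklist`, `BlacklistUpdate`, `updatesForAllocate` and the 0.8 threshold are hypothetical, and the sketch assumes a manager that lifts the AM blacklist once the blacklisted hosts reach a configured fraction of the NM host count. It shows the same blacklist producing opposite decisions depending on whether the host count was refreshed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BlacklistSketch {

    /** Hypothetical result type: hosts to add to / remove from the AM blacklist. */
    static final class BlacklistUpdate {
        final List<String> additions;
        final List<String> removals;

        BlacklistUpdate(List<String> additions, List<String> removals) {
            this.additions = additions;
            this.removals = removals;
        }
    }

    /** Hypothetical, simplified stand-in for a blacklist manager (not the YARN class). */
    static final class SimpleBlacklist {
        private final Set<String> blacklisted = new LinkedHashSet<>();
        private final double disableThreshold; // fraction of NM hosts
        private int nodeHostCount;             // goes stale unless refreshed

        SimpleBlacklist(int initialNodeHostCount, double disableThreshold) {
            this.nodeHostCount = initialNodeHostCount;
            this.disableThreshold = disableThreshold;
        }

        void addNode(String host) {
            blacklisted.add(host);
        }

        void refreshNodeHostCount(int count) {
            nodeHostCount = count;
        }

        BlacklistUpdate getBlacklistUpdates() {
            List<String> hosts = new ArrayList<>(blacklisted);
            double limit = disableThreshold * nodeHostCount;
            if (hosts.size() < limit) {
                // Blacklist is small relative to the (believed) cluster size: apply it.
                return new BlacklistUpdate(hosts, Collections.<String>emptyList());
            }
            // Too much of the cluster is blacklisted: lift the blacklist.
            return new BlacklistUpdate(Collections.<String>emptyList(), hosts);
        }

        /** Pattern proposed in the issue: refresh the host count before each allocate. */
        BlacklistUpdate updatesForAllocate(int liveNodeHostCount) {
            refreshNodeHostCount(liveNodeHostCount);
            return getBlacklistUpdates();
        }
    }

    public static void main(String[] args) {
        // Attempt scheduled while 5 NMs were registered; the AM then failed on h1.
        SimpleBlacklist stale = new SimpleBlacklist(5, 0.8);
        stale.addNode("h1");
        // 4 NMs are removed later, leaving only h1, but the count is never refreshed:
        // 1 < 0.8 * 5, so h1 stays excluded and the AM has nowhere to run.
        System.out.println("stale additions: " + stale.getBlacklistUpdates().additions);

        SimpleBlacklist fresh = new SimpleBlacklist(5, 0.8);
        fresh.addNode("h1");
        // With the live count: 1 >= 0.8 * 1, so the blacklist is lifted.
        System.out.println("fresh removals: " + fresh.updatesForAllocate(1).removals);
    }
}
```

Running `main` prints `stale additions: [h1]` followed by `fresh removals: [h1]`. Note the caveat from the comment above: a refresh-before-allocate hook only helps if something actually calls it, and while no AM container is allocated, RMAppAttemptImpl does not keep calling allocate.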