[ https://issues.apache.org/jira/browse/YARN-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohith Sharma K S updated YARN-4685:
------------------------------------
    Summary: AM blacklisting result in application to get hanged  (was: AM blacklist addition/removal should get updated for every allocate call from RMAppAttemptImpl.)

> AM blacklisting result in application to get hanged
> ---------------------------------------------------
>
>                 Key: YARN-4685
>                 URL: https://issues.apache.org/jira/browse/YARN-4685
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>
> AM blacklist additions and removals are applied only when the RMAppAttempt
> is scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}.
> Once the attempt is scheduled, any removeNode/addNode in the cluster is not
> propagated to {{BlackListManager#refreshNodeHostCount}}, so
> BlackListManager operates on a stale NM count. The application then stays
> in the ACCEPTED state and waits forever, even if more nodes are added to
> the cluster.
> The solution is to update BlacklistManager on every
> {{RMAppAttemptImpl#AMContainerAllocatedTransition#transition}} call, so
> that any addition or removal of nodes is reflected in BlacklistManager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
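
The stale-count problem described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Hadoop implementation: `ToyBlacklistManager`, its `disableThreshold` parameter, and the rule "ignore the blacklist once it covers too large a fraction of known hosts" are hypothetical stand-ins chosen for illustration. Only the method name `refreshNodeHostCount` is taken from the issue text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleBlacklistSketch {

    /** Hypothetical toy stand-in for a blacklist manager; not the Hadoop class. */
    static final class ToyBlacklistManager {
        private final double disableThreshold;       // assumed policy knob
        private final Set<String> blacklisted = new HashSet<>();
        private int nodeHostCount;                   // the manager's view of cluster size

        ToyBlacklistManager(int initialNodeHostCount, double disableThreshold) {
            this.nodeHostCount = initialNodeHostCount;
            this.disableThreshold = disableThreshold;
        }

        void addNode(String host) {
            blacklisted.add(host);
        }

        /** Name mirrors the method mentioned in the issue. */
        void refreshNodeHostCount(int count) {
            this.nodeHostCount = count;
        }

        /**
         * The blacklist is honoured only while it covers less than
         * disableThreshold of the hosts the manager believes exist.
         */
        List<String> effectiveBlacklist() {
            if (blacklisted.size() < disableThreshold * nodeHostCount) {
                return new ArrayList<>(blacklisted);
            }
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // The manager learns the cluster size once, at scheduling time.
        ToyBlacklistManager mgr = new ToyBlacklistManager(4, 0.5);
        mgr.addNode("host-1");

        // Suppose the cluster has since shrunk to host-1 only, but the manager
        // was never told: it still enforces the blacklist against the only host.
        System.out.println("stale: " + mgr.effectiveBlacklist());

        // Refreshing the count lets the policy see the real cluster size.
        mgr.refreshNodeHostCount(1);
        System.out.println("refreshed: " + mgr.effectiveBlacklist());
    }
}
```

In this toy model the stale count keeps the only remaining host blacklisted, so nothing could ever be placed on it. Refreshing the count at each allocation-related transition, as the issue proposes, lets the policy react to the current cluster size.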