[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995316#comment-13995316 ]
Bikas Saha commented on YARN-2001:
----------------------------------

I think the agreement from the offline discussion was that there would be a threshold for NMs to resync. Once that threshold is reached, the scheduler would be started. After that, the NMs have until the NM heartbeat expiry interval to resync. NMs that have not resynced when the expiry interval ends are considered lost (consistent with current behavior).

> Threshold for RM to accept requests from AM after failover
> ----------------------------------------------------------
>
> Key: YARN-2001
> URL: https://issues.apache.org/jira/browse/YARN-2001
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Jian He
> Assignee: Jian He
>
> After failover, RM may require a certain threshold to determine whether it's
> safe to make scheduling decisions and start accepting new container requests
> from AMs. The threshold could be a certain number of nodes, i.e. RM waits
> until a certain number of nodes have joined before accepting new container
> requests. Or it could simply be a timeout: only after the timeout does RM
> accept new requests.
> NMs that join after the threshold can be treated as new NMs and instructed to
> kill all their containers.
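The resync timeline described in the comment can be sketched as a small state model. This is a hypothetical Python illustration, not YARN code: `FailoverResyncTracker`, `NodeState`, and every parameter name are invented for this sketch. It assumes the scheduler starts when either condition from the issue description is met (a timeout or a minimum number of resynced NMs), and that the NM expiry interval is counted from the moment the new RM becomes active. The comment does not pin down either detail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class NodeState(Enum):
    RESYNCED = "resynced"  # NM re-registered with the new RM in time
    PENDING = "pending"    # not yet resynced, still inside the expiry interval
    LOST = "lost"          # missed the NM expiry interval


@dataclass
class FailoverResyncTracker:
    # Hypothetical model, not a YARN class. Times are seconds since the new
    # RM became active; the expiry interval is assumed to count from then too.
    known_nodes: Set[str]
    threshold_timeout: float      # start scheduling after this long...
    threshold_min_nodes: int      # ...or once this many NMs have resynced
    nm_expiry_interval: float     # NM heartbeat expiry interval
    resynced_at: Dict[str, float] = field(default_factory=dict)
    scheduler_started_at: Optional[float] = None

    def _maybe_start_scheduler(self, now: float) -> None:
        # Assumed: the threshold is met by the timeout OR by enough resynced NMs.
        if self.scheduler_started_at is None and (
            now >= self.threshold_timeout
            or len(self.resynced_at) >= self.threshold_min_nodes
        ):
            self.scheduler_started_at = now

    def on_nm_resync(self, node: str, now: float) -> NodeState:
        # A resync is recorded only inside the expiry interval; a later one
        # leaves the node lost.
        if node not in self.resynced_at and now < self.nm_expiry_interval:
            self.resynced_at[node] = now
        self._maybe_start_scheduler(now)
        return self.node_state(node, now)

    def tick(self, now: float) -> None:
        # Called periodically so the timeout threshold fires without NM traffic.
        self._maybe_start_scheduler(now)

    def accepts_am_requests(self) -> bool:
        # New container requests from AMs are accepted only once scheduling starts.
        return self.scheduler_started_at is not None

    def node_state(self, node: str, now: float) -> NodeState:
        if node in self.resynced_at:
            return NodeState.RESYNCED
        if now < self.nm_expiry_interval:
            return NodeState.PENDING
        return NodeState.LOST

    def lost_nodes(self, now: float) -> List[str]:
        # Known NMs that never resynced before the expiry interval ended.
        return sorted(n for n in self.known_nodes
                      if self.node_state(n, now) is NodeState.LOST)
```

For example, with `threshold_min_nodes=2` the scheduler starts as soon as the second NM resyncs, even before `threshold_timeout` elapses, while an NM that first reports after `nm_expiry_interval` stays `LOST`.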