[ 
https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985709#comment-13985709
 ] 

Bikas Saha commented on YARN-2001:
----------------------------------

Requiring all NMs to re-register might be too constraining because after a 
full code rollout, it may be common for some NMs to not come back. If the RM 
gets stuck waiting for a minority of NMs that never re-register, that would 
effectively be a loss of HA.
I like the idea of waiting for a time period before considering the cluster 
fully up. However, this timeout has to be small or else we will have a lot of 
downtime. Can this timeout be less than the AM liveness period? If not, then 
how do we treat AMs that are running on NMs that have not re-registered within 
the NM timeout?
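
To make the time-bounded wait concrete, here is a minimal sketch in Java. It 
is not taken from the YARN code base: the class NmRecoveryGate and all of its 
methods and parameters are hypothetical names chosen for illustration. The 
assumption is that the RM has persisted the set of previously known NMs and, 
after restart, holds back allocate requests until either every one of them has 
re-registered or a fixed wait period has elapsed, whichever comes first.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a time-bounded wait for NM re-registration. */
class NmRecoveryGate {
    // NMs known before the restart that have not re-registered yet.
    private final Set<String> pending;
    // Point in time (ms) after which the RM stops waiting for stragglers.
    private final long deadlineMs;

    NmRecoveryGate(Set<String> previousNodes, long restartTimeMs, long waitMs) {
        this.pending = new HashSet<>(previousNodes);
        this.deadlineMs = restartTimeMs + waitMs;
    }

    /** Called when an NM re-registers; unknown node ids are ignored. */
    void onRegister(String nodeId) {
        pending.remove(nodeId);
    }

    /** Allocate requests are accepted once all NMs are back or the wait expired. */
    boolean acceptAllocate(long nowMs) {
        return pending.isEmpty() || nowMs >= deadlineMs;
    }

    /** NMs still missing; after the deadline these would be treated as lost. */
    Set<String> missingNodes() {
        return new HashSet<>(pending);
    }
}
```

With this shape, a minority of NMs that never come back delays the RM by at 
most waitMs instead of blocking it indefinitely. What to do with AMs on the 
nodes returned by missingNodes() once the deadline passes is the open question 
above and is deliberately left out of the sketch.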


> Persist NMs info for RM restart
> -------------------------------
>
>                 Key: YARN-2001
>                 URL: https://issues.apache.org/jira/browse/YARN-2001
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>
> RM should not accept allocate requests from AMs until all the NMs have 
> registered with RM. For that, RM needs to remember the previous NMs and wait 
> for all the NMs to register.
> This is also useful for remembering decommissioned nodes across restarts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
