[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630672#comment-13630672 ]

Xuan Gong commented on YARN-513:
--------------------------------

From the ApplicationMaster perspective:
1. The very first communication the AM has with the RM is registering itself,
via AMRMClientImpl::registerApplicationMaster(), so we can add the wait logic
there: retry several times until the registration is accepted, and throw the
exception once the retries are exhausted.
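A minimal sketch of that AM-side wait loop, assuming a plain Callable stands in
for the registerApplicationMaster() call. RegisterRetry and retryUntilAccepted
are hypothetical names, not Hadoop APIs; the interval and total wait would come
from the proposed AM_RM_* settings:

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Hadoop): retries an operation until it
 * succeeds or a total wait budget is used up, then rethrows the last failure.
 */
public final class RegisterRetry {

    private RegisterRetry() {}

    public static <T> T retryUntilAccepted(Callable<T> op,
                                           long retryIntervalMs,
                                           long maxWaitMs) throws Exception {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (true) {
            try {
                // In the AM this would wrap the registerApplicationMaster() call.
                return op.call();
            } catch (Exception e) {
                // Give up once the next sleep would overshoot the wait budget.
                if (System.currentTimeMillis() + retryIntervalMs > deadline) {
                    throw e;
                }
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated RM that rejects the first two registration attempts.
        int[] attempts = {0};
        String result = retryUntilAccepted(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new java.net.ConnectException("RM not up yet");
            }
            return "registered";
        }, 10, 1_000);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

Rethrowing the last exception keeps the current failure behaviour once the
wait budget has been spent.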

From the Client perspective:
1. The very first communication the client has with the RM is
getNewApplication(), in YarnClientImpl::getNewApplication(request), so we
can add the same wait logic there.
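The same idea on the client side, sketched with hypothetical stand-ins:
ClientRmProtocol plays the role of the client-to-RM RPC and FlakyRm simulates
an RM that fails while restarting; neither exists in YARN. Unlike a
wall-clock deadline, this version bounds the number of attempts by
wait / interval:

```java
import java.io.IOException;

public final class ClientWaitSketch {

    private ClientWaitSketch() {}

    /** Hypothetical stand-in for the client's first RPC to the RM. */
    public interface ClientRmProtocol {
        String getNewApplication() throws IOException;
    }

    /** Simulated RM that fails the first downCalls requests while "restarting". */
    public static final class FlakyRm implements ClientRmProtocol {
        private int remainingFailures;
        private int nextId = 1;

        public FlakyRm(int downCalls) {
            this.remainingFailures = downCalls;
        }

        @Override
        public String getNewApplication() throws IOException {
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new IOException("RM is restarting");
            }
            return "application_" + (nextId++);
        }
    }

    /**
     * Calls getNewApplication(), retrying every intervalMs for roughly waitMs in
     * total (waitMs / intervalMs + 1 attempts), then rethrows the last failure.
     * Values read from the proposed *_SECS settings would be multiplied by 1000.
     */
    public static String getNewApplicationWithWait(ClientRmProtocol rm,
                                                   long intervalMs,
                                                   long waitMs)
            throws IOException, InterruptedException {
        if (intervalMs <= 0 || waitMs < 0) {
            throw new IllegalArgumentException("intervalMs must be > 0 and waitMs >= 0");
        }
        long maxAttempts = waitMs / intervalMs + 1;
        IOException last = null;
        for (long attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rm.getNewApplication();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(intervalMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // The RM is down for two calls, so the third attempt succeeds.
        System.out.println(getNewApplicationWithWait(new FlakyRm(2), 10, 50));
    }
}
```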

To do that, we need to add several constants (property keys and their
defaults) to YarnConfiguration, such as AM_RM_CONNECTION_RETRY_INTERVAL_SECS,
AM_RM_CONNECT_WAIT_SECS, CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS and
CLIENT_RM_CONNECTION_WAIT_SECS.
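One possible shape for the new settings. The four constant names come from this
comment; the property-key strings, the PREFIX and the default values are
hypothetical placeholders. A Map<String, String> stands in for Hadoop's
Configuration, where conf.getLong(key, default) would do the lookup:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: keys and defaults below are hypothetical, not YARN's. */
public final class RmConnectConfigSketch {

    private static final String PREFIX = "yarn.example."; // hypothetical prefix

    public static final String AM_RM_CONNECTION_RETRY_INTERVAL_SECS =
        PREFIX + "am.rm.connection.retry-interval.secs";
    public static final long DEFAULT_AM_RM_CONNECTION_RETRY_INTERVAL_SECS = 10;

    public static final String AM_RM_CONNECT_WAIT_SECS =
        PREFIX + "am.rm.connect.wait.secs";
    public static final long DEFAULT_AM_RM_CONNECT_WAIT_SECS = 300;

    public static final String CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS =
        PREFIX + "client.rm.connection.retry-interval.secs";
    public static final long DEFAULT_CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS = 10;

    public static final String CLIENT_RM_CONNECTION_WAIT_SECS =
        PREFIX + "client.rm.connection.wait.secs";
    public static final long DEFAULT_CLIENT_RM_CONNECTION_WAIT_SECS = 300;

    private RmConnectConfigSketch() {}

    /** Reads a value in seconds, falling back to the default when the key is unset. */
    public static long getSecs(Map<String, String> conf, String key, long defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CLIENT_RM_CONNECTION_WAIT_SECS, "60"); // override only the wait
        long wait = getSecs(conf, CLIENT_RM_CONNECTION_WAIT_SECS,
            DEFAULT_CLIENT_RM_CONNECTION_WAIT_SECS);
        long interval = getSecs(conf, CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS,
            DEFAULT_CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS);
        System.out.println("wait=" + wait + "s, interval=" + interval + "s");
    }
}
```

Keeping a separate interval and total wait lets the AM and the client tune how
patient they are independently of how often they poll.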
                
> Verify all clients will wait for RM to restart
> ----------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>
> When the RM is restarting, the NM, AM and Clients should wait for some time 
> for the RM to come back up.

