[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630672#comment-13630672 ]
Xuan Gong commented on YARN-513:
--------------------------------

From the ApplicationMaster perspective:
1. The very first communication it has with the RM is registering itself, which happens in AMRMClientImpl::registerApplicationMaster(). We can add waiting logic there, retrying several times until the registration is accepted and throwing the exception otherwise.

From the Client perspective:
1. The very first communication it has with the RM is getNewApplication(), which is in YarnClientImpl::getNewApplication(request). We can add waiting logic there as well.

To do that, we need to add several constants and variables to YarnConfiguration, such as AM_RM_CONNECTION_RETRY_INTERVAL_SECS, AM_RM_CONNECT_WAIT_SECS, CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS and CLIENT_RM_CONNECTION_WAIT_SECS.

> Verify all clients will wait for RM to restart
> ----------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>
> When the RM is restarting, the NM, AM and Clients should wait for some time
> for the RM to come back up.
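
For illustration only, the waiting logic proposed in the comment could look roughly like the sketch below: retry the first RM call at a fixed interval until it succeeds or the total wait time runs out. The class RmConnectRetry, the method callWithRetry, and the default values are hypothetical and are not part of YARN. The two constants only borrow the names proposed for YarnConfiguration, and the sketch assumes that a failed connection surfaces as an IOException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not an existing YARN class.
final class RmConnectRetry {
    // Assumed defaults; the comment proposes the names, not the values.
    static final long AM_RM_CONNECT_WAIT_SECS = 900;
    static final long AM_RM_CONNECTION_RETRY_INTERVAL_SECS = 30;

    /**
     * Invokes the call until it succeeds, sleeping retryIntervalMs between
     * attempts. Rethrows the last IOException once another retry would run
     * past the total wait budget of waitMs.
     */
    static <T> T callWithRetry(Callable<T> call, long waitMs, long retryIntervalMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + waitMs;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) {
                // Assumption: IOException means the RM is not reachable yet.
                if (System.currentTimeMillis() + retryIntervalMs > deadline) {
                    throw e;
                }
                Thread.sleep(retryIntervalMs);
            }
        }
    }
}
```

Under these assumptions, registerApplicationMaster() or getNewApplication(request) would wrap its RPC in callWithRetry, passing the configured wait and interval converted from seconds to milliseconds. Exceptions other than IOException propagate immediately, so a rejected request is not retried.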