[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049563#comment-14049563 ]
Rohith commented on YARN-1366:
------------------------------

bq. These two synchronized block can be merged into one ?

I separated these intentionally to handle a rare corner case. After the AM gets a resync, it goes on to re-register with the RM. In the worst case, if the RM goes down again during this window, registerApplicationMaster will start retrying against both RMs. I thought it better not to block AMRMClient operations such as updateBlacklist, addContainerRequest, and so on while that happens. Do you think the time taken to retry is short enough that these operations can be blocked?

> AM should implement Resync with the ApplicationMasterService instead of shutting down
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1366
>                 URL: https://issues.apache.org/jira/browse/YARN-1366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Rohith
>         Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch,
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch,
> YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch,
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response to which the
> AM responds by shutting down. The AM behavior is expected to change to
> resyncing with the RM. Resync means resetting the allocate RPC
> sequence number to 0 and the AM should send its entire outstanding request to
> the RM. Note that if the AM is making its first allocate call to the RM then
> things should proceed like normal without needing a resync. The RM will
> return all containers that have completed since the RM last synced with the
> AM. Some container completions may be reported more than once.

--
This message was sent by Atlassian JIRA
(v6.2#6252)