[ https://issues.apache.org/jira/browse/YARN-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-8673:
-------------------------------
    Attachment: YARN-8673.v2.patch

> [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8673
>                 URL: https://issues.apache.org/jira/browse/YARN-8673
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-8673.v1.patch, YARN-8673.v2.patch
>
>
> After a master/slave switch of the YarnRM, the new YarnRM throws an
> _ApplicationNotRegisteredException_. The AM then re-registers and resets
> the responseId to zero. _AMRMClientRelayer_ inside _FederationInterceptor_
> follows the same protocol and performs the automatic re-register and
> responseId resync. However, when an exception or a temporary network issue
> occurs in an allocate call after the re-register, this resync logic can
> break. This patch makes the process more robust by parsing the expected
> responseId from the YarnRM exception message, so that whenever the
> responseId is out of sync, for whatever reason, it is automatically
> resynced and processing continues.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
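The resync idea in the description can be sketched as below. This is a minimal illustration under stated assumptions, not the code in YARN-8673.v2.patch. The message wording `expect responseId to be <N>`, the `Rm` interface, `allocateWithResync`, and the use of `IllegalStateException` are all hypothetical stand-ins for the real YarnRM exception and the _AMRMClientRelayer_ internals.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of responseId resync by parsing the RM's rejection message.
 * ASSUMPTION: the RM rejects an out-of-sync allocate call with a message
 * containing "expect responseId to be <N>". Real wording may differ.
 */
class ResponseIdResync {

  // Hypothetical pattern for the expected id embedded in the exception message.
  private static final Pattern EXPECTED_ID =
      Pattern.compile("expect responseId to be (\\d+)");

  /** Returns the expected responseId if the message contains one. */
  static OptionalInt parseExpectedResponseId(String message) {
    if (message == null) {
      return OptionalInt.empty();
    }
    Matcher m = EXPECTED_ID.matcher(message);
    return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1)))
                    : OptionalInt.empty();
  }

  /** Hypothetical stand-in for the RM side of the allocate protocol. */
  interface Rm {
    /** Returns the next responseId, or throws if the id is out of sync. */
    int allocate(int responseId);
  }

  /**
   * Calls allocate; on a responseId mismatch, adopts the id the RM says it
   * expects and retries once. Any other failure propagates to the caller.
   */
  static int allocateWithResync(Rm rm, int localResponseId) {
    try {
      return rm.allocate(localResponseId);
    } catch (IllegalStateException e) {
      OptionalInt expected = parseExpectedResponseId(e.getMessage());
      if (!expected.isPresent()) {
        throw e; // not a responseId mismatch we know how to recover from
      }
      return rm.allocate(expected.getAsInt());
    }
  }

  public static void main(String[] args) {
    // Simulated RM sitting at responseId 7, e.g. after a re-register followed
    // by allocate calls whose responses were lost on the network.
    int[] rmState = {7};
    Rm rm = id -> {
      if (id != rmState[0]) {
        throw new IllegalStateException("Invalid responseId in AllocateRequest,"
            + " expect responseId to be " + rmState[0] + ", but get " + id);
      }
      return ++rmState[0];
    };
    // The client believes it is at 0, resyncs to 7, and moves on.
    System.out.println(allocateWithResync(rm, 0)); // prints 8
  }
}
```

Retrying with the id the RM reports, rather than assuming a reset to zero, is what lets the client recover regardless of how far the two sides drifted apart.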