[ https://issues.apache.org/jira/browse/YARN-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Subru Krishnan reassigned YARN-5711:
------------------------------------

    Assignee: Subru Krishnan

> AM cannot reconnect to RM after failover when using RequestHedgingRMFailoverProxyProvider
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-5711
>                 URL: https://issues.apache.org/jira/browse/YARN-5711
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications, resourcemanager
>    Affects Versions: 2.9.0, 3.0.0-alpha1
>            Reporter: Subru Krishnan
>            Assignee: Subru Krishnan
>            Priority: Critical
>
> When the RM fails over, it does _not_ automatically re-register running apps, so they
> need to re-register when reconnecting to the new primary. This is done by
> catching {{ApplicationMasterNotRegisteredException}} in *allocate* calls and
> re-registering. However, *RequestHedgingRMFailoverProxyProvider* does _not_
> propagate {{YarnException}} because the actual invocation is done asynchronously
> on separate threads, so AMs cannot reconnect to the RM after failover.
> This JIRA proposes that the *RequestHedgingRMFailoverProxyProvider* propagate
> any {{YarnException}} that it encounters.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
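The failure mode above comes down to how exceptions cross thread boundaries: when a call runs on an executor thread and throws, Future.get() wraps the exception in an ExecutionException, and a hedging layer that treats every failure alike can hide an application-level error from the AM. The following is a minimal, self-contained Java sketch of the proposed behavior under stated assumptions; it is not the actual YARN-5711 patch and does not use Hadoop's API. AppLevelException and NotRegisteredException are hypothetical stand-ins for {{YarnException}} and {{ApplicationMasterNotRegisteredException}}, hedgedInvoke is a hypothetical simplification of request hedging (send the same call to every RM proxy, take the first success), and allocateWithReRegister models the AM's catch-and-re-register handling around *allocate*. A standby RM is simulated by a proxy that throws an IOException.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class HedgedFailoverSketch {
    /** Hypothetical stand-in for YarnException (an error raised by an RM that handled the call). */
    static class AppLevelException extends Exception {
        AppLevelException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for ApplicationMasterNotRegisteredException. */
    static class NotRegisteredException extends AppLevelException {
        NotRegisteredException(String msg) { super(msg); }
    }

    /**
     * Sends the same call to every RM proxy in parallel and returns the first success.
     * An AppLevelException from any proxy is unwrapped and rethrown immediately instead
     * of being swallowed; other failures are surfaced only if no proxy succeeds.
     */
    static <T> T hedgedInvoke(List<Callable<T>> proxies) throws Exception {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("no RM proxies");
        }
        ExecutorService pool = Executors.newFixedThreadPool(proxies.size());
        try {
            CompletionService<T> cs = new ExecutorCompletionService<>(pool);
            for (Callable<T> p : proxies) {
                cs.submit(p);
            }
            Exception lastFailure = null;
            for (int i = 0; i < proxies.size(); i++) {
                try {
                    return cs.take().get(); // first proxy to succeed wins
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof AppLevelException) {
                        throw (AppLevelException) cause; // propagate, do not hide it
                    }
                    lastFailure = (cause instanceof Exception) ? (Exception) cause : e;
                }
            }
            throw lastFailure; // every proxy failed with a non-app-level error
        } finally {
            pool.shutdownNow();
        }
    }

    /** AM-side pattern: if allocate says the AM is not registered, re-register once and retry. */
    static <T> T allocateWithReRegister(Callable<T> allocate, Runnable reRegister) throws Exception {
        try {
            return allocate.call();
        } catch (NotRegisteredException e) {
            reRegister.run();
            return allocate.call();
        }
    }

    /** Simulates one standby RM and one newly active RM that has not seen this AM yet. */
    static String demo() throws Exception {
        AtomicBoolean registered = new AtomicBoolean(false);
        Callable<String> standby = () -> {
            throw new IOException("standby RM");
        };
        Callable<String> active = () -> {
            if (!registered.get()) {
                throw new NotRegisteredException("AM not registered with new RM");
            }
            return "allocated";
        };
        List<Callable<String>> proxies = List.of(standby, active);
        return allocateWithReRegister(() -> hedgedInvoke(proxies), () -> registered.set(true));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "allocated"
    }
}
```

Rethrowing the app-level exception as soon as it arrives, rather than waiting for the remaining proxies, is one possible design choice. It rests on the sketch's assumption that only an RM which actually processed the call raises such an exception, so there is no better answer left to wait for. If the hedging layer instead kept only the last failure (here, the standby's IOException), the AM's catch block would never see the "not registered" signal, which is the bug this issue describes.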