[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995743#comment-13995743 ]
Karthik Kambatla commented on YARN-1861:
----------------------------------------

bq. Also, we need to make sure that when automatic failover is enabled, all external interventions like a fence like this bug (and forced-manual failover from CLI?) do a similar reset into the leader election. There may not be cases like this today though.

One way to future-proof this is to call resetLeaderElection in ResourceManager#transitionToStandby itself. That looks hacky, but doesn't require new external interventions to explicitly handle it.

[~vinodkv] - do you think that would be a better approach?

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch,
>                      YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RM's got
> into standby state and no one became active.

--
This message was sent by Atlassian JIRA (v6.2#6252)
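
The future-proofing idea in the comment above (calling resetLeaderElection from ResourceManager#transitionToStandby itself) could look roughly like the toy model below. It is only a sketch of the control flow, not the actual patch: ToyElector, ToyResourceManager, fence, and the autoFailoverEnabled flag are hypothetical stand-ins, not Hadoop classes, and the real elector talks to ZooKeeper rather than flipping a boolean.

```java
public class StandbyResetSketch {

    enum HAState { ACTIVE, STANDBY }

    /** Hypothetical stand-in for the embedded leader elector; not a Hadoop class. */
    static class ToyElector {
        private boolean candidate = false; // true once the RM is contending for leadership again
        private int resetCount = 0;

        /** Rejoin the leader election as a fresh candidate. */
        void resetLeaderElection() {
            candidate = true;
            resetCount++;
        }

        boolean isCandidate() { return candidate; }
        int getResetCount() { return resetCount; }
    }

    /** Hypothetical stand-in for the ResourceManager's HA state handling. */
    static class ToyResourceManager {
        private final ToyElector elector;
        private final boolean autoFailoverEnabled; // assumed flag, for illustration only
        private HAState state = HAState.ACTIVE;

        ToyResourceManager(ToyElector elector, boolean autoFailoverEnabled) {
            this.elector = elector;
            this.autoFailoverEnabled = autoFailoverEnabled;
        }

        /**
         * The reset lives inside the transition, so every caller
         * (a fence, a forced manual failover, ...) gets it for free.
         */
        void transitionToStandby() {
            state = HAState.STANDBY;
            if (autoFailoverEnabled) {
                elector.resetLeaderElection();
            }
        }

        HAState getState() { return state; }
    }

    /** Hypothetical external intervention; note that it never touches the elector. */
    static void fence(ToyResourceManager rm) {
        rm.transitionToStandby();
    }

    public static void main(String[] args) {
        ToyElector elector = new ToyElector();
        ToyResourceManager rm = new ToyResourceManager(elector, true);
        fence(rm);
        // prints "STANDBY candidate=true"
        System.out.println(rm.getState() + " candidate=" + elector.isCandidate());
    }
}
```

The trade-off is the one raised in the comment: coupling the state transition to the elector is what makes it look hacky, but a new external intervention then cannot forget the reset.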