[ https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szilard Nemeth updated YARN-10789: ---------------------------------- Fix Version/s: 3.3.2 3.2.3 > RM HA startup can fail due to race conditions in ZKConfigurationStore > --------------------------------------------------------------------- > > Key: YARN-10789 > URL: https://issues.apache.org/jira/browse/YARN-10789 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 3.0.0 > Reporter: Tarun Parimi > Assignee: Tarun Parimi > Priority: Major > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: YARN-10789.001.patch, YARN-10789.002.patch, > YARN-10789.branch-3.2.001.patch, YARN-10789.branch-3.3.001.patch > > > We are observing below error randomly during hadoop install and RM initial > startup when HA is enabled and yarn.scheduler.configuration.store.class=zk is > configured. This causes one of the RMs to not startup. > {code:java} > 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state INITED > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = > NodeExists for /confstore/CONF_STORE > {code} > We are trying to create the znode /confstore/CONF_STORE when we initialize > the ZKConfigurationStore. But the problem is that the ZKConfigurationStore is > initialized when CapacityScheduler does a serviceInit. This serviceInit is > done by both Active and Standby RM. So we can run into a race condition when > both Active and Standby try to create the same znode when both RM are started > at same time. > ZKRMStateStore on the other hand avoids such race conditions, by creating the > znodes only after serviceStart. serviceStart only happens for the active RM > which won the leader election, unlike serviceInit which happens irrespective > of leader election. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org