[ https://issues.apache.org/jira/browse/YARN-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335205#comment-17335205 ]
Qi Zhu commented on YARN-9443:
------------------------------

[~prabhujoseph] [~ztang] [~ebadger] [~epayne]

Is this still in progress? The RM state store currently uses ZK, which does not perform well on large clusters. YARN-5123 uses a SQL-based store for the state, but that still does not give hot HA like the NameNode in HDFS.

If we want hot HA for the ResourceManager, Ratis (Raft) is a very good choice for keeping the state consistent in HA mode: the active RM's state is kept consistent with the standby RM's state through Raft log commits. On failover we would not need to fence and load the large state from ZK, so we can achieve hot HA.

Thanks.

> Fast RM Failover using Ratis (Raft protocol)
> --------------------------------------------
>
>                 Key: YARN-9443
>                 URL: https://issues.apache.org/jira/browse/YARN-9443
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>   Affects Versions: 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> During Failover, the RM Standby will have a lag as it has to recover from
> Zookeeper / FileSystem StateStore. RM HA using Ratis (Raft Protocol) can
> achieve Fast failover as all RMs are in sync already. This is used by Ozone -
> HDDS-505.
>
> cc [~nandakumar131]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org