[ https://issues.apache.org/jira/browse/YARN-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335205#comment-17335205 ]

Qi Zhu commented on YARN-9443:
------------------------------

[~prabhujoseph] [~ztang] [~ebadger] [~epayne]

Is this still being worked on? The state store is currently kept in ZooKeeper 
(ZK), which does not perform well in large clusters.

YARN-5123 uses a SQL-based store for the state, but that still does not give 
hot HA like the NameNode has in HDFS.

If we want hot HA for the ResourceManager, Ratis (Raft) is a very good choice 
for keeping the state consistent in HA mode: the active RM's state stays 
consistent with the standby RM's state through Raft log commits. Then, on 
failover, the standby does not need to be fenced and reload the large state 
from ZK, so we get hot HA.
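To make the idea concrete, here is a toy sketch of majority-commit log replication. It is written in Python rather than Java, does not use the Apache Ratis API, and leaves out terms, real elections, and network failures; the class and method names (`Replica`, `ReplicatedRM`, `replicate`, `fail_over`) and the app-state strings are hypothetical. Each write is committed only once a majority of RMs hold it, and every RM applies committed entries to its in-memory state immediately, so the RM picked on failover already has the state instead of reloading it from a store.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One RM replica: a log plus the state built from committed entries."""
    name: str
    log: list = field(default_factory=list)   # (app_id, app_state) entries
    commit_index: int = 0                     # entries [0, commit_index) are committed
    apps: dict = field(default_factory=dict)  # hypothetical in-memory RM state

    def commit(self, index: int) -> None:
        # Apply newly committed entries, in log order, to the in-memory state.
        for app_id, app_state in self.log[self.commit_index:index]:
            self.apps[app_id] = app_state
        self.commit_index = index


class ReplicatedRM:
    """Toy majority-commit replication; NOT the Ratis API or a full Raft."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.leader = self.replicas[names[0]]  # hypothetical initial active RM

    def _has_majority(self, live):
        return 2 * len(live) > len(self.replicas)

    def replicate(self, app_id, app_state, up):
        live = [r for r in self.replicas.values() if r.name in up]
        if self.leader.name not in up or not self._has_majority(live):
            raise RuntimeError("no live majority; entry not committed")
        self.leader.log.append((app_id, app_state))
        for r in live:
            # Each live replica copies the suffix of the leader's log it is
            # missing (catching up if it was down), then commits and applies it.
            r.log.extend(self.leader.log[len(r.log):])
            r.commit(len(self.leader.log))

    def fail_over(self, up):
        live = [r for r in self.replicas.values() if r.name in up]
        if not self._has_majority(live):
            raise RuntimeError("no live majority; cannot elect")
        # Like Raft's election restriction: pick the most up-to-date log. Any
        # live majority overlaps the majority that committed each entry.
        self.leader = max(live, key=lambda r: len(r.log))
        return self.leader


rm = ReplicatedRM(["rm1", "rm2", "rm3"])
rm.replicate("app_1", "RUNNING", up={"rm1", "rm2", "rm3"})
rm.replicate("app_2", "ACCEPTED", up={"rm1", "rm2"})  # rm3 is down
new_active = rm.fail_over(up={"rm2", "rm3"})          # rm1 crashes
print(new_active.name, new_active.apps)
# prints: rm2 {'app_1': 'RUNNING', 'app_2': 'ACCEPTED'}
```

The new active RM (rm2) serves requests right away from `apps`. A lagging replica such as rm3 catches up the next time the leader replicates, not by re-reading a ZK or FileSystem store.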

Thanks.

> Fast RM Failover using Ratis (Raft protocol)
> --------------------------------------------
>
>                 Key: YARN-9443
>                 URL: https://issues.apache.org/jira/browse/YARN-9443
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> During failover, the standby RM lags because it has to recover from the 
> ZooKeeper / FileSystem StateStore. RM HA using Ratis (Raft protocol) can 
> achieve fast failover because all RMs are already in sync. This approach is 
> used by Ozone in HDDS-505.
>  
> cc [~nandakumar131]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
