[ https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-8010:
-------------------------------
    Attachment: YARN-8010.v1.patch

> add config in FederationRMFailoverProxy to not bypass facade cache when failing over
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-8010
>                 URL: https://issues.apache.org/jira/browse/YARN-8010
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>         Attachments: YARN-8010.v1.patch
>
>
> Today, when a YarnRM is failing over, the FederationRMFailoverProxy running
> in AMRMProxy performs failover: it tries to get the latest subcluster info
> from FederationStateStore and then retries connecting to the latest YarnRM
> master. When it calls getSubCluster() on FederationStateStoreFacade, it
> bypasses the cache by passing a flush flag. During a YarnRM failover, every
> AM heartbeat thread creates a different thread inside FederationInterceptor,
> and each of these threads keeps performing failover several times. This
> leads to a large spike of getSubCluster() calls to FederationStateStore.
>
> Depending on the cluster setup (e.g. putting a VIP in front of all YarnRMs),
> a YarnRM master/slave change might not result in an RM address change. In
> other cases, a small delay in getting the latest subcluster information may
> be acceptable. This patch therefore adds a config option that makes it
> possible to ask the FederationRMFailoverProxy not to flush the cache when
> calling getSubCluster().
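
The effect described in the issue can be illustrated with a small, self-contained sketch. It does not reproduce the patch or the real Hadoop classes: CountingStore, CachingFacade, FailoverProxy and the flushCacheOnFailover flag are hypothetical stand-ins, assumed here only to show how a boolean option can decide whether each failover attempt reaches the state store or is served from the facade cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverCacheSketch {

  /** Hypothetical stand-in for the state store; counts lookups that reach it. */
  static class CountingStore {
    final AtomicInteger calls = new AtomicInteger();

    String lookup(String subClusterId) {
      calls.incrementAndGet();
      return "rm.example.com:8030"; // placeholder RM address
    }
  }

  /** Hypothetical stand-in for the facade: serves from cache unless told to flush. */
  static class CachingFacade {
    private final CountingStore store;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingFacade(CountingStore store) {
      this.store = store;
    }

    String getSubCluster(String subClusterId, boolean flushCache) {
      if (flushCache) {
        // Bypass the cache: always query the store and refresh the entry.
        String fresh = store.lookup(subClusterId);
        cache.put(subClusterId, fresh);
        return fresh;
      }
      // Query the store only when the entry is not cached yet.
      return cache.computeIfAbsent(subClusterId, store::lookup);
    }
  }

  /** Hypothetical stand-in for the failover proxy. */
  static class FailoverProxy {
    private final CachingFacade facade;
    private final boolean flushCacheOnFailover; // assumed config option

    FailoverProxy(CachingFacade facade, boolean flushCacheOnFailover) {
      this.facade = facade;
      this.flushCacheOnFailover = flushCacheOnFailover;
    }

    String performFailover(String subClusterId) {
      return facade.getSubCluster(subClusterId, flushCacheOnFailover);
    }
  }

  /** Runs a number of failover attempts and reports how many reached the store. */
  static int storeCallsAfterFailovers(boolean flush, int attempts) {
    CountingStore store = new CountingStore();
    FailoverProxy proxy = new FailoverProxy(new CachingFacade(store), flush);
    for (int i = 0; i < attempts; i++) {
      proxy.performFailover("sc-1");
    }
    return store.calls.get();
  }

  public static void main(String[] args) {
    System.out.println("flush=true:  " + storeCallsAfterFailovers(true, 100));
    System.out.println("flush=false: " + storeCallsAfterFailovers(false, 100));
  }
}
```

With flushing enabled, all 100 simulated failover attempts hit the store; with flushing disabled, only the first one does. The trade-off is the one the issue names: a cached entry may briefly lag behind the store, which is acceptable when the RM address does not change on failover or when a small delay is tolerable.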