[ https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated YARN-6667:
-----------------------------
         Component/s: federation, router
    Target Version/s: 3.4.0
   Affects Version/s: 3.4.0

> Handle containerId duplicate without failing the heartbeat in Federation
> Interceptor
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-6667
>                 URL: https://issues.apache.org/jira/browse/YARN-6667
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: federation, router
>    Affects Versions: 3.4.0
>            Reporter: Botong Huang
>            Assignee: Shilun Fan
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> In practice, the probability of this happening is very low.
> It can only be caused by a YARN master-slave failover and an incorrect
> Epoch parameter configuration.
> We will try to tolerate this situation and keep the Application running
> as far as possible, using the following measures:
> 1. Select a node whose heartbeat has not timed out for allocation, and
> also require that node to be in the RUNNING state.
> 2. If neither RM's heartbeat has timed out and both are in the RUNNING
> state, select the previously allocated RM for Container processing.
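
The two measures above can be sketched as a small selection routine. This is only an illustration under stated assumptions, not the code from the actual patch: `DuplicateContainerResolver`, `RmView`, `State` and `choose` are hypothetical names, the "node" in measure 1 is read as one of the two RMs reporting the same containerId, and returning an empty result when neither RM qualifies is an assumption, since the description does not cover that case.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these types belong to the YARN API.
final class DuplicateContainerResolver {

  /** Simplified stand-in for the state of an RM. */
  enum State { RUNNING, OTHER }

  /** Minimal view of one RM that reported the duplicate containerId. */
  static final class RmView {
    final String id;
    final State state;
    final long lastHeartbeatMillis;

    RmView(String id, State state, long lastHeartbeatMillis) {
      this.id = id;
      this.state = state;
      this.lastHeartbeatMillis = lastHeartbeatMillis;
    }

    /** Measure 1: heartbeat has not timed out and the state is RUNNING. */
    boolean isEligible(long nowMillis, long timeoutMillis) {
      return state == State.RUNNING
          && nowMillis - lastHeartbeatMillis <= timeoutMillis;
    }
  }

  /**
   * Picks the RM that should keep handling the container.
   *
   * @param previous the RM the container was previously allocated from
   * @param incoming the RM that now reports the same containerId
   */
  static Optional<RmView> choose(RmView previous, RmView incoming,
                                 long nowMillis, long timeoutMillis) {
    // Measure 2: if both qualify, the previously allocated RM wins.
    // The same branch also covers the case where only the previous RM qualifies.
    if (previous.isEligible(nowMillis, timeoutMillis)) {
      return Optional.of(previous);
    }
    // Measure 1: otherwise fall back to the incoming RM if it qualifies.
    if (incoming.isEligible(nowMillis, timeoutMillis)) {
      return Optional.of(incoming);
    }
    // Assumption: the description does not say what to do when neither qualifies.
    return Optional.empty();
  }
}
```

Checking the previous RM first keeps the tie-break from measure 2 in a single place, so the caller never needs to compare the two candidates itself.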