Marouane RAJI created YARN-9506:
-----------------------------------

             Summary: Node Managers fail to update cached IP entries of Resource Managers
                 Key: YARN-9506
                 URL: https://issues.apache.org/jira/browse/YARN-9506
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.7.1
            Reporter: Marouane RAJI
         Attachments: NM_logs.txt

Hi,

We are running a YARN cluster (for Samza jobs) on AWS. It runs in HA mode, with yarn.resourcemanager.ha.automatic-failover.enabled=true.

To reproduce the issue:
# Have a running cluster with 2 NodeManagers and 2 ResourceManagers in HA mode, with failover enabled.
** These ResourceManagers need to have DNS entries defined and set in the config:
*** ex: yarnrm1.me.local and yarnrm2.me.local
# Stop the active ResourceManager (yarnrm1.me.local) and retire its instance. (The NodeManagers will fall back to the standby, yarnrm2.me.local.)
# Provision a new ResourceManager with a new IP. Make sure the DNS entry yarnrm1.me.local is assigned to it.
# Stop the now-active ResourceManager (yarnrm2.me.local).
# Check the NodeManager logs: the NodeManagers fail to reach the newly provisioned ResourceManager and keep trying to reach it through the old IP.

I can provide config files (yarn-site and core-site) if needed.
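
To illustrate the kind of caching the summary refers to, here is a minimal standalone Java sketch. It is not NodeManager code, and the root cause has not been confirmed to be this exact mechanism. It only relies on a documented property of java.net.InetSocketAddress: the hostname is resolved once, when the object is constructed, and the resulting IP is kept for the lifetime of the object. The class name RmAddressRefresh, the helpers reResolve and isStale, and the port 8031 are assumptions made for the example.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration only; none of these names come from Hadoop.
public class RmAddressRefresh {

    // Build a fresh InetSocketAddress from the original host string.
    // The constructor performs a new lookup (still subject to the JVM's own
    // DNS cache, controlled by the networkaddress.cache.ttl security property).
    static InetSocketAddress reResolve(InetSocketAddress cached) {
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    // Report whether the IP held by the cached object differs from what the
    // hostname resolves to now. If the fresh lookup fails, nothing can be
    // concluded, so the cached entry is not reported as stale.
    static boolean isStale(InetSocketAddress cached) {
        InetSocketAddress fresh = reResolve(cached);
        if (fresh.isUnresolved()) {
            return false;
        }
        return cached.isUnresolved()
                || !fresh.getAddress().equals(cached.getAddress());
    }

    public static void main(String[] args) {
        // Hostname taken from the reproduction steps; the port is an assumption.
        String host = args.length > 0 ? args[0] : "yarnrm1.me.local";
        int port = 8031;

        // Resolved here, once. A long-lived process that keeps this object
        // keeps this IP, even if the DNS entry is later pointed elsewhere.
        InetSocketAddress cached = new InetSocketAddress(host, port);
        System.out.println("cached: " + cached);
        System.out.println("fresh : " + reResolve(cached));
        System.out.println("stale : " + isStale(cached));
    }
}
```

Running it on a NodeManager host with the ResourceManager hostname as the argument shows what the name resolves to at that moment, which can be compared with the IP that appears in the NodeManager logs.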