[ https://issues.apache.org/jira/browse/YARN-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953202#comment-14953202 ]

Jason Lowe commented on YARN-4254:
----------------------------------

True, registering could take significantly longer if DNS is slow. However, IIRC the NameNode also resolves datanodes when they register and rejects datanodes that cannot be resolved, so I believe there is precedent for it. Curious, was this new node added as a datanode as well, and if so what did the NameNode do?

Anyway, we don't have to do registration rejection as part of this JIRA, and even with that fix it wouldn't solve the problem if the node was resolvable when it joined but not when the AM launched. The real issue for this JIRA is why it tried forever on a bad nodename resolution. Did it really try forever, or was it a case of something like YARN-3208 where it would eventually complete but just not for a really long time due to retries at multiple levels?

> ApplicationAttempt stuck for ever due to UnknowHostexception
> ------------------------------------------------------------
>
>                 Key: YARN-4254
>                 URL: https://issues.apache.org/jira/browse/YARN-4254
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: 0001-YARN-4254.patch
>
>
> Scenario
> =======
> 1. RM HA and 5 NMs are available in the cluster and are working fine.
> 2. Add one more NM to the same cluster, but RM /etc/hosts is not updated.
> 3. Submit an application to the same cluster.
> If the AM gets allocated to the newly added NM, the *application attempt will get
> stuck for ever*. The user will not get to know why this happened.
> Impact
> 1. RM logs get overloaded with exceptions.
> 2. Application gets stuck for ever.
> Handling suggestion
> YARN-261 allows failing an application attempt.
> If we fail the attempt, the next attempt could get assigned to another NM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)