Jim Brennan created YARN-10479:
----------------------------------

             Summary: RMProxy should retry on SocketTimeout Exceptions
                 Key: YARN-10479
                 URL: https://issues.apache.org/jira/browse/YARN-10479
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: yarn
    Affects Versions: 2.10.1, 3.4.1
            Reporter: Jim Brennan
            Assignee: Jim Brennan


During an incident involving a DNS outage, a large number of NodeManagers 
failed to come back into service because they hit a socket timeout when trying 
to re-register with the ResourceManager (RM).

SocketTimeoutException is not currently one of the exceptions that RMProxy 
retries.  Based on this incident, we think it should be.  We made this change 
internally about a year ago, and it has been running in production since.
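
To illustrate the idea, here is a minimal, self-contained sketch of retrying 
by exception type.  It does not use Hadoop's classes and is not the actual 
patch: RetrySketch, RETRIABLE, shouldRetry and callWithRetries are hypothetical 
names, and the exact-class matching is an assumption made for the example.  
Note that java.net.SocketTimeoutException extends InterruptedIOException, not 
SocketException, so under exact-class matching an entry for SocketException 
would not cover it; it has to be listed explicitly.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.concurrent.Callable;

public class RetrySketch {
    // Hypothetical set of exception types worth retrying (illustration only,
    // not the list RMProxy uses). SocketTimeoutException is listed explicitly.
    static final Set<Class<? extends Exception>> RETRIABLE = Set.of(
            ConnectException.class,
            UnknownHostException.class,
            SocketTimeoutException.class);

    // Assumption: match on the exact exception class, not on superclasses.
    static boolean shouldRetry(Exception e) {
        return RETRIABLE.contains(e.getClass());
    }

    // Invoke the call up to maxAttempts times, sleeping between attempts.
    // Non-retriable exceptions, and the last failure, are rethrown.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts,
                                 long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!shouldRetry(e) || attempt == maxAttempts) {
                    throw e;
                }
                Thread.sleep(sleepMillis);
            }
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated registration: times out twice, then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "registered";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "registered after 3 attempts"
    }
}
```

Without SocketTimeoutException.class in the set, the first simulated timeout 
would propagate immediately, which mirrors the failure mode described above.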




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
