[ https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
SammiChen updated YARN-7450:
----------------------------
    Target Version/s: 2.9.2  (was: 2.9.1)

> ATS Client should retry on intermittent Kerberos issues.
> --------------------------------------------------------
>
>                 Key: YARN-7450
>                 URL: https://issues.apache.org/jira/browse/YARN-7450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: ATSv2
>    Affects Versions: 2.7.3
>         Environment: Hadoop-2.7.3
>            Reporter: Ravi Prakash
>            Priority: Major
>
> We saw a stack trace (posted in the first comment) in the ResourceManager
> logs showing that the TimelineClientImpl was unable to relogin from keytab.
> I'm guessing there was an intermittent issue that failed the Kerberos relogin
> from keytab. However, I'm assuming this was *not* retried because I only saw
> one instance of this stack trace. I propose that this operation should have
> been retried.
> It seems this caused events at the ResourceManager to queue up, and the
> ResourceManager eventually stopped responding to even basic
> {{yarn application -list}} commands.
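
The retry proposed in the description could look roughly like the sketch below. It is only an illustration of bounded retry with exponential backoff around an operation that may fail with an IOException; it is not the fix that was committed for this issue. The class name RetryingRelogin, the method callWithRetry, and the parameters maxAttempts and initialBackoffMs are all hypothetical and do not exist in Hadoop.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class RetryingRelogin {

    private RetryingRelogin() {}

    /**
     * Runs op up to maxAttempts times. After each failed attempt except the
     * last, sleeps for the current backoff and then doubles it. If every
     * attempt fails with an IOException, the last one is rethrown.
     */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Assumption: an IOException here may be transient
                // (for example, a failed Kerberos relogin), so retry.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

In the situation described above, the operation passed as op would be whatever call performs the relogin from keytab. Bounding the number of attempts keeps a persistent Kerberos failure from blocking the caller indefinitely, which matters if events are queuing up behind it.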