[ 
https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240899#comment-16240899
 ] 

Ravi Prakash commented on YARN-7450:
------------------------------------

{code}
2017-10-29 02:30:30,260 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: 
Error when publishing entity [YARN_APPLICATION,application_1507181091525_3046]
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Login 
failure for <SOME_PRINCIPAL>@<SOME_REALM> from keytab <SOME_KEYTAB_FILE>
        at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:235)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:184)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:246)
        at com.sun.jersey.api.client.Client.handle(Client.java:648)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at 
com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:483)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:332)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:329)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:329)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:314)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:452)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationCreatedEvent(SystemMetricsPublisher.java:265)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:220)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:469)
        at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:464)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Login failure for <SOME_PRINCIPAL>@<SOME_REALM> 
from keytab <SOME_KEYTAB_FILE>
        at 
org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1109)
        at 
org.apache.hadoop.security.UserGroupInformation.checkTGTAndReloginFromKeytab(UserGroupInformation.java:1042)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:500)
        at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
        at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
        ... 23 more
Caused by: javax.security.auth.login.LoginException: Generic error (description 
in e-text) (60) - LOOKING_UP_CLIENT
        at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
        at 
com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
        at 
javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
        at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
        at 
org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1101)
        ... 27 more
Caused by: KrbException: Generic error (description in e-text) (60) - 
LOOKING_UP_CLIENT
        at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:82)
        at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
        at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
        at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
        ... 40 more
Caused by: KrbException: Identifier doesn't match expected value (906)
        at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
        at sun.security.krb5.internal.ASRep.init(ASRep.java:64)
        at sun.security.krb5.internal.ASRep.<init>(ASRep.java:59)
        at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:60)
        ... 43 more
{code}

> ATS Client should retry on intermittent Kerberos issues.
> --------------------------------------------------------
>
>                 Key: YARN-7450
>                 URL: https://issues.apache.org/jira/browse/YARN-7450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: ATSv2
>    Affects Versions: 2.7.3
>         Environment: Hadoop-2.7.3
>            Reporter: Ravi Prakash
>
> We saw a stack track (posted in the first comment) in the ResourceManager 
> logs for the TimelineClientImpl not being able to relogin from keytab.
> I'm guessing there was an intermittent network issue that failed the kerberos 
> relogin from keytab. However, I'm assuming this was *not* retried because I 
> only saw one instance of this stack trace.  I propose that this operation 
> should have been retried.
> It seems, this caused events at the ResourceManager to queue up and 
> eventually stop responding to even basic {{yarn application -list}} commands.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to