[jira] [Commented] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.

2018-03-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383031#comment-16383031
 ] 

SammiChen commented on YARN-7450:
-

Hi [~raviprak],  does this still target for 2.9.1?  If not, can we push this 
out to next 2.9.2 release? 

> ATS Client should retry on intermittent Kerberos issues.
> 
>
> Key: YARN-7450
> URL: https://issues.apache.org/jira/browse/YARN-7450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 2.7.3
> Environment: Hadoop-2.7.3
>Reporter: Ravi Prakash
>Priority: Major
>
> We saw a stack trace (posted in the first comment) in the ResourceManager 
> logs for the TimelineClientImpl not being able to relogin from keytab.
> I'm guessing there was an intermittent issue that failed the kerberos relogin 
> from keytab. However, I'm assuming this was *not* retried because I only saw 
> one instance of this stack trace.  I propose that this operation should have 
> been retried.
> It seems, this caused events at the ResourceManager to queue up and 
> eventually stop responding to even basic {{yarn application -list}} commands.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.

2017-11-13 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249816#comment-16249816
 ] 

Ravi Prakash commented on YARN-7450:


This is more complicated than I thought. 
[UserGroupInformation.reloginFromKeyTab()|https://github.com/apache/hadoop/blob/975a57a6886e81e412bea35bf597beccc807a66f/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1321]
 *clears* away existing credentials, so a temporary Kerberos failure which may 
have been resolved by trying to retry the login in some time, will be exhibited 
immediately.

Hi [~daryn] ! Am I reading this right? Is it practical to fix this behavior?



> ATS Client should retry on intermittent Kerberos issues.
> 
>
> Key: YARN-7450
> URL: https://issues.apache.org/jira/browse/YARN-7450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 2.7.3
> Environment: Hadoop-2.7.3
>Reporter: Ravi Prakash
>
> We saw a stack trace (posted in the first comment) in the ResourceManager 
> logs for the TimelineClientImpl not being able to relogin from keytab.
> I'm guessing there was an intermittent network issue that failed the kerberos 
> relogin from keytab. However, I'm assuming this was *not* retried because I 
> only saw one instance of this stack trace.  I propose that this operation 
> should have been retried.
> It seems, this caused events at the ResourceManager to queue up and 
> eventually stop responding to even basic {{yarn application -list}} commands.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.

2017-11-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240899#comment-16240899
 ] 

Ravi Prakash commented on YARN-7450:


{code}
2017-10-29 02:30:30,260 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: 
Error when publishing entity [YARN_APPLICATION,application_1507181091525_3046]
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Login 
failure for @ from keytab 
at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:235)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:184)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:246)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at 
com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:483)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:332)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:329)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:329)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:314)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:452)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationCreatedEvent(SystemMetricsPublisher.java:265)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:220)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:469)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:464)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Login failure for @ 
from keytab 
at 
org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1109)
at 
org.apache.hadoop.security.UserGroupInformation.checkTGTAndReloginFromKeytab(UserGroupInformation.java:1042)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:500)
at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
... 23 more
Caused by: javax.security.auth.login.LoginException: Generic error (description 
in e-text) (60) - LOOKING_UP_CLIENT
at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
at 
com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at 
javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)