[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276956#comment-14276956 ]
Steve Loughran commented on YARN-3061:
--------------------------------------

In the source,

{{RMAppAttemptMetrics attemptMetrics = rmApp.getCurrentAppAttempt().getRMAppAttemptMetrics();}}

clearly the app failed *before any app attempt was created*.

The root cause looks like a token renewal failure, probably caused by the VM save/resume and, by the look of things, related to Kerberos renewal.

{code}
org.apache.slider.funtest.lifecycle.AgentWebPagesIT
testAgentWeb(org.apache.slider.funtest.lifecycle.AgentWebPagesIT)  Time elapsed: 194.768 sec  <<< FAILURE!
java.lang.AssertionError: Application Launch Failure, exit code 65 Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.134:8188, Ident: (owner=stevel, renewer=yarn, realUser=, issueDate=1421245210012, maxDate=1421850010012, sequenceNumber=11, masterKeyId=6)
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.slider.funtest.framework.CommandTestBase.createTemplatedSliderApplication(CommandTestBase.groovy:691)
	at org.apache.slider.funtest.lifecycle.AgentWebPagesIT.testAgentWeb(AgentWebPagesIT.groovy:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

Server side:

{code}
2015-01-14 14:20:16,993 ERROR metrics.SystemMetricsPublisher (SystemMetricsPublisher.java:putEntity(427)) - Error when publishing entity [YARN_APPLICATION,application_1420734007650_0010]
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:425)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationCreatedEvent(SystemMetricsPublisher.java:258)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:213)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:442)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:437)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
2015-01-14 14:20:35,026 INFO impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://devix.cotham.uk:8188/ws/v1/timeline/
2015-01-14 14:20:35,766 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(785)) - Unable to add the application to the delegation token renewer.
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.134:8188, Ident: (owner=stevel, renewer=yarn, realUser=, issueDate=1421245210012, maxDate=1421850010012, sequenceNumber=11, masterKeyId=6)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: HTTP status [401], message [Unauthorized]
	at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:394)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:380)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:449)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:464)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:398)
	at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
	at org.apache.hadoop.security.token.Token.renew(Token.java:377)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
	... 6 more
2015-01-14 14:20:36,169 INFO rmapp.RMAppImpl (RMAppImpl.java:rememberTargetTransitionsAndStoreState(992)) - Updating application application_1420734007650_0010 with final state: FAILED
2015-01-14 14:20:36,185 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(718)) - application_1420734007650_0010 State change from NEW to FINAL_SAVING
2015-01-14 14:20:36,490 INFO recovery.RMStateStore (RMStateStore.java:transition(161)) - Updating info for app: application_1420734007650_0010
2015-01-14 14:20:37,274 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(718)) - application_1420734007650_0010 State change from FINAL_SAVING to FAILED
{code}

I plan to fix all that by restarting the VM, but the NPE in the web view is something that could recur in similar circumstances.

> NPE in RM AppBlock render
> -------------------------
>
>                 Key: YARN-3061
>                 URL: https://issues.apache.org/jira/browse/YARN-3061
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Varun Saxena
>            Priority: Minor
>
> An RM (running in a VM which did a sleep/resume) overnight no longer launches
> apps, and when you try to look at the logs, Web UI says "500 look at the
> logs", which show a stack trace

--
This message was sent by Atlassian JIRA (v6.3.4#6332)