[ https://issues.apache.org/jira/browse/YARN-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved YARN-8358.
------------------------------
    Resolution: Duplicate

> ResourceManager restart fail to recover due to TimelineServiceV1Publisher NPE
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8358
>                 URL: https://issues.apache.org/jira/browse/YARN-8358
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.9.1
>         Environment: Ubuntu 16.04
> java version "1.8.0_91"
>            Reporter: Che Yufei
>            Priority: Major
>
> I'm upgrading from Hadoop 2.7.3 to 2.9.1. ResourceManager restart works fine 
> on 2.7.3 but fails on 2.9.1.
> I'm using LevelDB as the RM state store. The problem seems related to 
> TimelineServiceV1Publisher: if I set 
> yarn.resourcemanager.system-metrics-publisher.enabled to false, recovery 
> works fine, but if the option is set to true, the RM fails to start with the 
> following log:
>  
> {{2018-05-24 23:11:54,597 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery started}}
> {{2018-05-24 23:11:54,673 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded RM state version info 1.1}}
> {{2018-05-24 23:11:54,688 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore: Recovered 12 RM delegation token master keys}}
> {{2018-05-24 23:11:54,688 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore: Recovered 0 RM delegation tokens}}
> {{2018-05-24 23:11:54,990 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore: Recovered 2099 applications and 2100 application attempts}}
> {{2018-05-24 23:11:54,998 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore: Recovered 0 reservations}}
> {{2018-05-24 23:11:54,998 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: recovering RMDelegationTokenSecretManager.}}
> {{2018-05-24 23:11:55,003 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Recovering 2099 applications}}
> {{2018-05-24 23:11:55,107 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Successfully recovered 0 out of 2099 applications}}
> {{2018-05-24 23:11:55,108 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state}}
> {{java.lang.NullPointerException}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.appCreated(TimelineServiceV1Publisher.java:90)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.sendATSCreateEvent(RMAppImpl.java:1954)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:931)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1061)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1054)}}
> {{ at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)}}
> {{ at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)}}
> {{ at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)}}
> {{ at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:878)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:339)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:533)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1394)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:758)}}
> {{ at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1147)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1187)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)}}
> {{ at java.security.AccessController.doPrivileged(Native Method)}}
> {{ at javax.security.auth.Subject.doAs(Subject.java:422)}}
> {{ at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1183)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1223)}}
> {{ at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)}}
> {{ at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1422)}}
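
For reference, the workaround described in the report (disabling the system metrics publisher so that recovery no longer reaches TimelineServiceV1Publisher) would look roughly like the yarn-site.xml entry below. This is a sketch of that single setting only, not a fix for the underlying NPE. Note that it stops the RM from publishing application and attempt events to the timeline server, so those events will be missing from timeline data while it is in effect.

```xml
<!-- yarn-site.xml: workaround from the report. Disables the RM system
     metrics publisher, so no application/attempt events are sent to the
     timeline server. -->
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>false</value>
</property>
```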



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
