[ https://issues.apache.org/jira/browse/YARN-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862405#comment-15862405 ]
Bibin A Chundatt commented on YARN-6178:
----------------------------------------

[~varun_saxena]
I looked into the code and did not find a case where this could happen. Even if the label list is empty, the file size should not be zero.
However, the RM failing to start in my laptop setup is caused by the node label mirror image size. I will try to reproduce it.
We could also handle the zero file size case so that RM startup does not fail.
Thoughts?

> RM recovery failure on node label mirror load
> ---------------------------------------------
>
>                 Key: YARN-6178
>                 URL: https://issues.apache.org/jira/browse/YARN-6178
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>
> Node label feature is enabled. With the file state store, the mirror file size is zero.
> {noformat}
> secureuser@vm2:/tmp/hadoop-yarn-yarn/node-labels> l
> total 8
> drwxr-xr-x 2 secureuser hadoop 4096 Feb 6 18:56 ./
> drwxr-xr-x 3 secureuser hadoop 4096 Jan 22 22:07 ../
> -rw-r--r-- 1 secureuser hadoop 0 Feb 6 18:56 nodelabel.editlog
> -rw-r--r-- 1 secureuser hadoop 0 Feb 6 18:56 nodelabel.mirror
> {noformat}
> {noformat}
> 2017-02-11 10:04:59,034 INFO org.apache.hadoop.conf.Configuration: dynamic-resources.xml not found
> 2017-02-11 10:04:59,042 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive TARGET=RM RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=
> 2017-02-11 10:04:59,042 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>       at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>       at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
>       at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
>       at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
>       at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
>       ... 4 more
> Caused by: java.lang.NullPointerException
>       at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.initLocalNodeLabels(AddToClusterNodeLabelsRequestPBImpl.java:117)
>       at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.getNodeLabels(AddToClusterNodeLabelsRequestPBImpl.java:129)
>       at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:169)
>       at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:205)
>       at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
>       at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
>       at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>       at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:761)
>       at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1139)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1175)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1175)
>       at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:316)
>       ... 5 more
> 2017-02-11 10:04:59,042 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> {noformat}
> The load should be skipped if the size is zero.
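
For illustration only, the zero-size guard proposed in this issue could look roughly like the sketch below. It is not the actual Hadoop change: the class MirrorLoadGuard and the method hasContent are hypothetical names, and the check uses plain java.nio.file on a local path instead of the Hadoop FileSystem API that FileSystemNodeLabelsStore works with. The idea is simply to test the mirror file's length before trying to parse it, so that an empty nodelabel.mirror is treated as "nothing to recover" instead of causing a failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop: decides whether a mirror
// file is worth parsing during recovery.
final class MirrorLoadGuard {
    private MirrorLoadGuard() {}

    /**
     * Returns true only if the path is an existing regular file that
     * holds at least one byte. A missing or zero-byte mirror yields
     * false, so the caller can skip the load.
     */
    static boolean hasContent(Path mirror) throws IOException {
        return Files.isRegularFile(mirror) && Files.size(mirror) > 0;
    }

    public static void main(String[] args) throws IOException {
        // Recreate the reported situation: a zero-byte nodelabel.mirror.
        Path dir = Files.createTempDirectory("node-labels");
        Path mirror = Files.createFile(dir.resolve("nodelabel.mirror"));

        if (hasContent(mirror)) {
            System.out.println("loading mirror");
        } else {
            System.out.println("mirror missing or empty, skipping load");
        }
    }
}
```

In a real store the same check would sit in front of the mirror parsing step, and the edit log replay would continue as usual. Whether an empty mirror should also be logged as a warning is a separate decision.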