[ https://issues.apache.org/jira/browse/YARN-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862405#comment-15862405 ]

Bibin A Chundatt commented on YARN-6178:
----------------------------------------

[~varun_saxena]

I looked into the code and did not find a case where this could happen. Even if the label list is empty, the mirror file size should not be zero.
However, the RM failing to start in my laptop setup is caused by the node label mirror file size. I will try to reproduce it.
We could also handle the zero-size file case so that RM startup does not fail.
Thoughts?
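A minimal sketch of the "skip the load when the mirror is zero bytes" idea, to make the proposal concrete. It is not the YARN code: the class and method names (MirrorLoadGuard, shouldLoadMirror, recoverLabels) are hypothetical, local java.nio files stand in for the Hadoop FileSystem used by the store, and a one-label-per-line text format stands in for the protobuf mirror format. In YARN the equivalent check would presumably go before parsing in FileSystemNodeLabelsStore.loadFromMirror (the frame in the stack trace), but that placement is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the FileSystemNodeLabelsStore implementation.
class MirrorLoadGuard {

    // Returns true only when the mirror file exists and has at least one byte.
    static boolean shouldLoadMirror(Path mirror) throws IOException {
        return Files.exists(mirror) && Files.size(mirror) > 0;
    }

    // Skips a missing or zero-length mirror instead of trying to parse it,
    // treating it as "no labels recorded" so startup can continue.
    static List<String> recoverLabels(Path mirror) throws IOException {
        if (!shouldLoadMirror(mirror)) {
            return Collections.emptyList();
        }
        // Stand-in for protobuf parsing (assumption): one label per line.
        return Files.readAllLines(mirror);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("node-labels");
        Path mirror = dir.resolve("nodelabel.mirror");
        Files.createFile(mirror); // zero-length, like the listing in the issue
        System.out.println(recoverLabels(mirror)); // prints []
        Files.write(mirror, Arrays.asList("gpu", "ssd"));
        System.out.println(recoverLabels(mirror)); // prints [gpu, ssd]
    }
}
```

Returning an empty label set keeps the RM transition to Active going; whether an empty mirror should instead be logged as a warning or trigger replay of the edit log alone is a separate design decision.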


> RM recovery failure on node label mirror load
> ---------------------------------------------
>
>                 Key: YARN-6178
>                 URL: https://issues.apache.org/jira/browse/YARN-6178
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>
> With the node label feature enabled and the file system node label store in use, the mirror file size is zero.
> {noformat}
> secureuser@vm2:/tmp/hadoop-yarn-yarn/node-labels> l
> total 8
> drwxr-xr-x 2 secureuser hadoop 4096 Feb  6 18:56 ./
> drwxr-xr-x 3 secureuser hadoop 4096 Jan 22 22:07 ../
> -rw-r--r-- 1 secureuser hadoop    0 Feb  6 18:56 nodelabel.editlog
> -rw-r--r-- 1 secureuser hadoop    0 Feb  6 18:56 nodelabel.mirror
> {noformat}
> {noformat}
> 2017-02-11 10:04:59,034 INFO org.apache.hadoop.conf.Configuration: dynamic-resources.xml not found
> 2017-02-11 10:04:59,042 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn     OPERATION=transitionToActive    TARGET=RM       RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2017-02-11 10:04:59,042 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
>         ... 4 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.initLocalNodeLabels(AddToClusterNodeLabelsRequestPBImpl.java:117)
>         at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.getNodeLabels(AddToClusterNodeLabelsRequestPBImpl.java:129)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:169)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:205)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:761)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1139)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1175)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1175)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:316)
>         ... 5 more
> 2017-02-11 10:04:59,042 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> {noformat}
> Loading should be skipped if the mirror file size is zero.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)