[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242081#comment-15242081 ]

Wangda Tan commented on YARN-3971:
----------------------------------

Thanks [~bibinchundatt]/[~Naganarasimha] for looking at this issue.

I think we need to fix the AbstractService state transition so that the service only reports STARTED after serviceStart() returns.

Do you know why we need to change the service state to STARTED before invoking serviceStart()? It doesn't make sense to me, actually...

+ [~vinodkv], who made this change in YARN-530.
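
A minimal sketch of the ordering proposed above, assuming a simplified, hypothetical {{SketchService}} class with its own {{LifecycleState}} enum (neither is Hadoop's {{AbstractService}} or {{Service.STATE}}). The state is published as STARTED only after {{serviceStart()}} returns; in this sketch a failed start moves the service to STOPPED instead, so nothing ever observes STARTED for a service whose start-up threw.

```java
// Hypothetical, simplified lifecycle states for illustration only;
// this is not org.apache.hadoop.service.Service.STATE.
enum LifecycleState { INITED, STARTED, STOPPED }

/**
 * Sketch (not the Hadoop AbstractService code): STARTED is entered only
 * after the subclass hook serviceStart() has returned successfully.
 */
abstract class SketchService {
  private final Object stateLock = new Object();
  private volatile LifecycleState state = LifecycleState.INITED;

  public LifecycleState getServiceState() {
    return state;
  }

  /** Subclass start-up logic, analogous in spirit to AbstractService#serviceStart(). */
  protected abstract void serviceStart() throws Exception;

  public void start() {
    synchronized (stateLock) {
      if (state == LifecycleState.STARTED) {
        return; // already started: start() is idempotent in this sketch
      }
      if (state != LifecycleState.INITED) {
        throw new IllegalStateException("Cannot start from state " + state);
      }
      try {
        serviceStart();                 // run the start-up logic first...
        state = LifecycleState.STARTED; // ...and only then publish STARTED
      } catch (Exception e) {
        state = LifecycleState.STOPPED; // failed start never reports STARTED
        throw new RuntimeException("Service failed to start", e);
      }
    }
  }
}
```

With this ordering, code that inspects the state from inside a failing {{serviceStart()}} sees INITED, and after the failure it sees STOPPED.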

> Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3971
>                 URL: https://issues.apache.org/jira/browse/YARN-3971
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
> 0003-YARN-3971.patch, 0004-YARN-3971.patch, 
> 0005-YARN-3971.001.addendum.patch, 0005-YARN-3971.addendum.patch, 
> 0005-YARN-3971.patch
>
>
> Steps to reproduce:
> # Create labels x,y
> # Delete labels x,y
> # Create labels x,y again, and add capacity-scheduler XML configuration for labels x and y
> # Restart the RM
> 
> Both RMs will become Standby, because the exception below is thrown in {{FileSystemNodeLabelsStore#recover}}:
> {code}
> 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
> java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
>         at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
>         at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
>         at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}
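
The issue title proposes skipping the queue-usage check while the node label store is being recovered. A minimal sketch of that idea, assuming hypothetical {{LabelManagerSketch}}, {{LabelOp}} and {{OpKind}} types that only borrow method names from the stack trace (they are not the YARN classes or the committed patch): a flag is set for the duration of replay, so the historical "remove x,y" entry is applied without consulting the current queue configuration, while a normal removal of a label still in use by a queue is rejected.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration only; not the YARN node-label classes.
enum OpKind { ADD, REMOVE }

/** One entry of a (hypothetical) label-store edit log. */
final class LabelOp {
  final OpKind kind;
  final List<String> labels;

  LabelOp(OpKind kind, String... labels) {
    this.kind = kind;
    this.labels = Arrays.asList(labels);
  }
}

/** Sketch of a label manager that skips the queue check while replaying the store. */
class LabelManagerSketch {
  private final Set<String> clusterLabels = new HashSet<>();
  // label -> queues that use it in the *current* scheduler configuration
  private final Map<String, Set<String>> queuesUsingLabel;
  // true only while recover() is replaying historical operations
  private boolean recovering = false;

  LabelManagerSketch(Map<String, Set<String>> queuesUsingLabel) {
    this.queuesUsingLabel = queuesUsingLabel;
  }

  void addToClusterNodeLabels(Collection<String> labels) {
    clusterLabels.addAll(labels);
  }

  void removeFromClusterNodeLabels(Collection<String> labels) throws IOException {
    // Historical removals are replayed unchecked: the current queue config may
    // legitimately use a label that was removed and later re-added.
    if (!recovering) {
      checkRemoveFromClusterNodeLabelsOfQueue(labels);
    }
    clusterLabels.removeAll(labels);
  }

  private void checkRemoveFromClusterNodeLabelsOfQueue(Collection<String> labels)
      throws IOException {
    for (String label : labels) {
      Set<String> queues = queuesUsingLabel.getOrDefault(label, Collections.emptySet());
      if (!queues.isEmpty()) {
        throw new IOException("Cannot remove label=" + label + ", because queue="
            + queues.iterator().next() + " is using this label");
      }
    }
  }

  /** Replays the edit log in order, with the queue check disabled. */
  void recover(List<LabelOp> editLog) throws IOException {
    recovering = true;
    try {
      for (LabelOp op : editLog) {
        if (op.kind == OpKind.ADD) {
          addToClusterNodeLabels(op.labels);
        } else {
          removeFromClusterNodeLabels(op.labels);
        }
      }
    } finally {
      recovering = false; // normal operations are checked again
    }
  }

  Set<String> getClusterNodeLabels() {
    return Collections.unmodifiableSet(clusterLabels);
  }
}
```

Replaying the add/remove/add sequence from the reproduction steps, with queue a1 using label x, ends with labels {x, y} instead of failing; the flag is reset in a {{finally}} block so later admin operations are validated as before.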



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
