[ https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555388#comment-16555388 ]
Sunil Govindan commented on YARN-8541:
--------------------------------------

Thanks [~bibinchundatt]. TestPlacementManager is not in 3.1, so it makes sense to remove it for 3.1. +1

> RM startup failure on recovery after user deletion
> --------------------------------------------------
>
>                 Key: YARN-8541
>                 URL: https://issues.apache.org/jira/browse/YARN-8541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: yimeng
>            Assignee: Bibin A Chundatt
>            Priority: Blocker
>         Attachments: YARN-8541-branch-3.1.003.patch, YARN-8541.001.patch,
> YARN-8541.002.patch, YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found that the RM fails to start up during recovery after the following test steps:
> 1. Create a user "user1" with permission to submit apps.
> 2. Submit a job as user1 and wait for it to finish.
> 3. Delete the user "user1".
> 4. Restart YARN.
> 5. The RM fails to restart.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, minimumAllocation=<<memory:512, vCores:1>>, maximumAllocation=<<memory:65536, vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS Processing chain. Root Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor]. | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement handler will be used, all scheduling requests will be rejected. | ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor] tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application application_1531624956005_0001 submitted by user super reason: No groups found for user super
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application application_1531624956005_0001 submitted by user super reason: No groups found for user super
>     at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:206)
>     at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:68)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:798)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:369)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:568)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1455)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:828)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     ... 13 more
> 2018-07-16 16:24:59,713 | INFO | main-EventThread | Trying to re-establish ZK session | ActiveStandbyElector.java:746
> 2018-07-16 16:24:59,715 | INFO | main-EventThread | Session: 0x1100001cdf8c2ea7 closed | ZooKeeper.java:1325
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | Initiating client connection, connectString=187-4-64-187:24002,187-4-64-119:24002,187-4-64-248:24002 sessionTimeout=45000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@62f6291c | ZooKeeper.java:861
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.request.timeout configured value is 120000. | ClientCnxn.java:141
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.client.bind.port.range is not configured.
> | ClientCnxn.java:177

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
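The stack trace in the report shows the failure chain: on recovery, RMAppManager.recoverApplication calls placeApplication, UserGroupMappingPlacementRule.getPlacementForApp throws a YarnException because the deleted user has no groups, and that exception propagates through startActiveServices, so the RM cannot transition to Active. The Java program below is a minimal, hypothetical model of that chain. It is not Hadoop code and not the YARN-8541 patch; every class, method, user and queue name in it is an assumption made for illustration. It contrasts a strict recovery loop, which aborts on the first placement failure, with a tolerant variant that, assuming the queue recorded at submission time is acceptable, falls back to that queue during recovery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of the recovery failure; none of these names are Hadoop APIs.
public class RecoveryPlacementSketch {

  // Stand-in for the YarnException thrown by the placement rule.
  static class PlacementException extends Exception {
    PlacementException(String message) {
      super(message);
    }
  }

  // A finished application read back from the state store on restart.
  static final class StoredApp {
    final String appId;
    final String user;
    final String storedQueue; // queue recorded when the app was submitted (assumed)

    StoredApp(String appId, String user, String storedQueue) {
      this.appId = appId;
      this.user = user;
      this.storedQueue = storedQueue;
    }
  }

  // Assumed mapping rule: place the app in a queue named after the user's primary group.
  static String placeByPrimaryGroup(String user, Map<String, List<String>> groups)
      throws PlacementException {
    List<String> userGroups = groups.getOrDefault(user, Collections.emptyList());
    if (userGroups.isEmpty()) {
      throw new PlacementException("No groups found for user " + user);
    }
    return userGroups.get(0);
  }

  // Re-places every stored app. Strict mode (tolerant == false) rethrows the first
  // placement failure, aborting recovery as in the report; tolerant mode keeps the
  // stored queue for apps whose placement fails.
  static List<String> recover(List<StoredApp> apps, Map<String, List<String>> groups,
      boolean tolerant) throws PlacementException {
    List<String> placements = new ArrayList<>();
    for (StoredApp app : apps) {
      String queue;
      try {
        queue = placeByPrimaryGroup(app.user, groups);
      } catch (PlacementException e) {
        if (!tolerant) {
          throw new PlacementException(
              "Failed to recover " + app.appId + ": " + e.getMessage());
        }
        queue = app.storedQueue;
      }
      placements.add(app.appId + "->" + queue);
    }
    return placements;
  }

  public static void main(String[] args) throws PlacementException {
    // user1 was deleted after its job finished, so it resolves to no groups.
    Map<String, List<String>> groups = Map.of("alice", List.of("analytics"));
    List<StoredApp> apps = List.of(
        new StoredApp("app_0001", "user1", "default"),
        new StoredApp("app_0002", "alice", "analytics"));

    try {
      recover(apps, groups, false);
    } catch (PlacementException e) {
      System.out.println("strict recovery aborted: " + e.getMessage());
    }
    System.out.println("tolerant recovery: " + recover(apps, groups, true));
  }
}
```

Running main prints the strict run's abort message for app_0001 and then the tolerant run's placements for both apps. Falling back to the stored queue is only one assumed policy; the sketch ignores queue ACLs and whether that queue still exists after the restart.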