[ https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengchenyu updated YARN-7560:
------------------------------
    Fix Version/s:     (was: 2.7.5)
                   3.0.0

> Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-7560
>                 URL: https://issues.apache.org/jira/browse/YARN-7560
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 3.0.0
>            Reporter: zhengchenyu
>             Fix For: 3.0.0
>
>
> In our cluster, we changed the configuration and then ran refreshQueues, after
> which the ResourceManager hung. The ResourceManager also could not restart
> successfully. We captured the following jstack output:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007f98e8017000 nid=0x2f5 runnable [0x00007f98eed9a000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
>         - locked <0x00007f8c4a8177a0> (a java.util.HashMap)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
>         - locked <0x00007f8c4a7eb2e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         - locked <0x00007f8c4a76ac48> (a java.lang.Object)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         - locked <0x00007f8c49254268> (a java.lang.Object)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         - locked <0x00007f8c467495e0> (a java.lang.Object)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
> {code}
> When we debugged the cluster, we found that resourceUsedWithWeightToResourceRatio
> returns a negative value, so the loop never exits. We found that in our cluster,
> all minRes is over int.max.
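
The failure mode described in the report can be reproduced in isolation. The following is a minimal sketch, not the Hadoop source: the class `FairShareOverflowSketch` and all of its methods are hypothetical, and it assumes the shares are accumulated in an `int` and that a caller keeps doubling a ratio until the sum reaches the cluster total. Under those assumptions, minimum shares that add up to more than `Integer.MAX_VALUE` make the sum wrap to a negative number, so the exit condition is never met. The loop here is capped so the sketch terminates.

```java
/** Hypothetical illustration of the overflow described in the report; not Hadoop code. */
class FairShareOverflowSketch {

    /** Accumulates shares in an int, so the total can wrap around to a negative value. */
    static int sumSharesUnsafe(int[] shares) {
        int total = 0;
        for (int share : shares) {
            total += share; // wraps silently once the sum passes Integer.MAX_VALUE
        }
        return total;
    }

    /** Accumulates in a long and clamps, so the result is never negative for non-negative input. */
    static int sumSharesClamped(int[] shares) {
        long total = 0L;
        for (int share : shares) {
            total += share;
        }
        return (int) Math.min(total, (long) Integer.MAX_VALUE);
    }

    /**
     * Doubles a ratio until the summed shares reach totalResource.
     * Returns the number of doublings needed, or -1 if maxSteps is exhausted
     * (the stand-in for "the loop never exits").
     */
    static int doublingSteps(int[] minShares, int totalResource, int maxSteps, boolean clamp) {
        double ratio = 1.0;
        for (int step = 0; step < maxSteps; step++) {
            int[] shares = new int[minShares.length];
            for (int i = 0; i < shares.length; i++) {
                // Assumed share rule: weight 1.0, never below the configured minimum.
                shares[i] = (int) Math.max(ratio, minShares[i]);
            }
            int used = clamp ? sumSharesClamped(shares) : sumSharesUnsafe(shares);
            if (used >= totalResource) {
                return step;
            }
            ratio *= 2.0;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] minShares = {Integer.MAX_VALUE, Integer.MAX_VALUE};
        System.out.println(sumSharesUnsafe(minShares));                // -2
        System.out.println(sumSharesClamped(minShares));               // 2147483647
        System.out.println(doublingSteps(minShares, 1000, 64, false)); // -1
        System.out.println(doublingSteps(minShares, 1000, 64, true));  // 0
    }
}
```

Clamping the sum is only one possible guard; whether the actual fix for YARN-7560 takes this form is not stated in this message.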