[ https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-7636:
---------------------------
    Description:
Exception stack:
{noformat}
java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count of 2147483647
        at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
        at com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
        at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
{noformat}
Following the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, we can ignore this exception to avoid the problem.
The same problem may happen in SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity; fix it in the same way.
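
The "ignore the exception" approach described above could look roughly like the following. This is a minimal sketch, not the attached patch: ReReservationCounterSketch, OverflowCheckedMultiset and setCount are hypothetical names, and the stand-in multiset only imitates the overflow behaviour visible in the stack trace (an IllegalArgumentException once a count is at Integer.MAX_VALUE) so that the example runs on the JDK alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: every name here is hypothetical, not a YARN or Guava class. */
class ReReservationCounterSketch {

  /** Stand-in for a multiset whose add() throws once a count is at Integer.MAX_VALUE. */
  static final class OverflowCheckedMultiset<K> {
    private final ConcurrentHashMap<K, AtomicInteger> counts = new ConcurrentHashMap<>();

    void add(K key) {
      AtomicInteger counter = counts.computeIfAbsent(key, k -> new AtomicInteger());
      while (true) {
        int current = counter.get();
        if (current == Integer.MAX_VALUE) {
          throw new IllegalArgumentException(
              "Overflow adding 1 occurrences to a count of " + current);
        }
        // Retry if another thread changed the counter in the meantime.
        if (counter.compareAndSet(current, current + 1)) {
          return;
        }
      }
    }

    int count(K key) {
      AtomicInteger counter = counts.get(key);
      return counter == null ? 0 : counter.get();
    }

    /** Demo helper so the example does not need 2^31 increments. */
    void setCount(K key, int value) {
      counts.computeIfAbsent(key, k -> new AtomicInteger()).set(value);
    }
  }

  final OverflowCheckedMultiset<String> reReservations = new OverflowCheckedMultiset<>();

  /** Increment the counter; once it is saturated, swallow the overflow exception. */
  void addReReservation(String schedulerKey) {
    try {
      reReservations.add(schedulerKey);
    } catch (IllegalArgumentException e) {
      // The count is already Integer.MAX_VALUE; leave it saturated.
    }
  }

  int getReReservations(String schedulerKey) {
    return reReservations.count(schedulerKey);
  }

  public static void main(String[] args) {
    ReReservationCounterSketch app = new ReReservationCounterSketch();
    app.reReservations.setCount("key", Integer.MAX_VALUE - 1);
    app.addReReservation("key"); // reaches Integer.MAX_VALUE
    app.addReReservation("key"); // would overflow; the exception is ignored
    System.out.println(app.getReReservations("key")); // prints 2147483647
  }
}
```

One trade-off of this style is that the catch block also hides any other IllegalArgumentException raised by the add call, so it is only reasonable where overflow is the sole expected cause.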
  was:
Exception stack:
{noformat}
java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count of 2147483647
        at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
        at com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
        at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
{noformat}
We can add the check {{getReReservations(schedulerKey) < Integer.MAX_VALUE}} before calling addReReservation to avoid this problem.
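
For comparison, the earlier proposal (guarding the call with {{getReReservations(schedulerKey) < Integer.MAX_VALUE}}) might be sketched as follows. This is an illustration under stated assumptions only: GuardedReReservationSketch, reserve and seed are hypothetical, String stands in for the scheduler key type, and a plain HashMap replaces the real multiset so the example is single-threaded and self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical, single-threaded stand-in for the scheduler attempt. */
class GuardedReReservationSketch {
  private final Map<String, Integer> reReservations = new HashMap<>();

  int getReReservations(String schedulerKey) {
    return reReservations.getOrDefault(schedulerKey, 0);
  }

  /** Unguarded increment: throws on overflow, imitating the reported exception. */
  void addReReservation(String schedulerKey) {
    int current = getReReservations(schedulerKey);
    if (current == Integer.MAX_VALUE) {
      throw new IllegalArgumentException(
          "Overflow adding 1 occurrences to a count of " + current);
    }
    reReservations.put(schedulerKey, current + 1);
  }

  /** Caller-side guard: only increment while the count is below Integer.MAX_VALUE. */
  void reserve(String schedulerKey) {
    if (getReReservations(schedulerKey) < Integer.MAX_VALUE) {
      addReReservation(schedulerKey);
    }
  }

  /** Demo helper so the example does not need 2^31 increments. */
  void seed(String schedulerKey, int value) {
    reReservations.put(schedulerKey, value);
  }

  public static void main(String[] args) {
    GuardedReReservationSketch app = new GuardedReReservationSketch();
    app.seed("key", Integer.MAX_VALUE);
    app.reserve("key"); // guard skips the increment, so nothing is thrown
    System.out.println(app.getReReservations("key")); // prints 2147483647
  }
}
```

With a concurrent counter, a check followed by a separate add is not atomic: two threads can both pass the guard at Integer.MAX_VALUE - 1, and the second add then still overflows.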
> Re-reservation count may overflow when cluster resource exhausted for a long time
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-7636
>                 URL: https://issues.apache.org/jira/browse/YARN-7636
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-7636.001.patch
>
>
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count of 2147483647
>         at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Following the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, we can ignore this exception to avoid the problem.
> The same problem may happen in SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity; fix it in the same way.