[ https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084280#comment-16084280 ]
Sunil G commented on YARN-6714: ------------------------------- [~Tao Yang] could you please help to check javac warning as per above jenkins run. > IllegalStateException while handling APP_ATTEMPT_REMOVED event when > async-scheduling enabled in CapacityScheduler > ----------------------------------------------------------------------------------------------------------------- > > Key: YARN-6714 > URL: https://issues.apache.org/jira/browse/YARN-6714 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 2.9.0, 3.0.0-alpha3 > Reporter: Tao Yang > Assignee: Tao Yang > Attachments: YARN-6714.001.patch, YARN-6714.002.patch, > YARN-6714.003.patch, YARN-6714.branch-2.003.patch > > > Currently in async-scheduling mode of CapacityScheduler, after AM failover > and unreserve all reserved containers, it still have chance to get and commit > the outdated reserve proposal of the failed app attempt. This problem > happened on an app in our cluster, when this app stopped, it unreserved all > reserved containers and compared these appAttemptId with current > appAttemptId, if not match it will throw IllegalStateException and make RM > crashed. > Error log: > {noformat} > 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.IllegalStateException: Trying to unreserve for application > appattempt_1495188831758_0121_000002 when currently reserved for application > application_1495188831758_0121 on node host: node1:45454 #containers=2 > available=... used=... > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822) > at java.lang.Thread.run(Thread.java:834) > {noformat} > When async-scheduling enabled, CapacityScheduler#doneApplicationAttempt and > CapacityScheduler#tryCommit both need to get write_lock before executing, so > we can check the app attempt state in commit process to avoid committing > outdated proposals. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org