[ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238838#comment-14238838 ]
Karthik Kambatla commented on YARN-2910:
----------------------------------------

Here is the deadlock Wilfred was mentioning:
{noformat}
"FairSchedulerContinuousScheduling":
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:553)
	- waiting to lock <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:769)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:228)
	- locked <0x00000007f6b5ec00> (a java.util.Collections$SynchronizedRandomAccessList)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1072)
	- locked <0x00000007f68f25e8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:280)
"Thread-434":
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:152)
	- waiting to lock <0x00000007f6b5ec00> (a java.util.Collections$SynchronizedRandomAccessList)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
	- locked <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:939)
	- locked <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3509)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

> FSLeafQueue can throw ConcurrentModificationException
> -----------------------------------------------------
>
>                 Key: YARN-2910
>                 URL: https://issues.apache.org/jira/browse/YARN-2910
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>         Attachments: FSLeafQueue_concurrent_exception.txt,
> YARN-2910.004.patch, YARN-2910.1.patch, YARN-2910.2.patch, YARN-2910.3.patch,
> YARN-2910.4.patch, YARN-2910.patch
>
>
> The lists that maintain the runnable and non-runnable apps are standard
> ArrayLists, but there is no guarantee that they will only be manipulated by
> one thread in the system. This can lead to the following exception:
> {noformat}
> 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
> java.util.ConcurrentModificationException: java.util.ConcurrentModificationException
> 	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> 	at java.util.ArrayList$Itr.next(ArrayList.java:831)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
> {noformat}
> The full stack trace is in the attached file.
> We should guard against this by using a thread-safe list implementation,
> java.util.concurrent.CopyOnWriteArrayList.
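The exception in the description comes from ArrayList's fail-fast iterator: `ArrayList$Itr.next` calls `checkForComodification` and throws once the list has been structurally modified since the iterator was created. A `CopyOnWriteArrayList` iterator instead walks the array snapshot taken when iteration started, so concurrent adds are neither seen nor reported. The minimal sketch below is single-threaded so the failure is deterministic; in the scheduler the modification would come from another thread, which makes the real failure timing-dependent. `ComodificationSketch` and `sumWhileAdding` are hypothetical names, and the list of integers only stands in for the queue's app list; this is not the FSLeafQueue code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the leaf queue's app list: each Integer plays the
// role of one app's resource usage.
class ComodificationSketch {

    // Sums the list while adding to it mid-iteration (standing in for another
    // code path adding an app). Returns -1 if the iterator fails fast.
    static int sumWhileAdding(List<Integer> usage) {
        int total = 0;
        try {
            for (int u : usage) {
                total += u;
                if (usage.size() < 5) {
                    usage.add(1);
                }
            }
        } catch (ConcurrentModificationException e) {
            return -1; // thrown by the same check as ArrayList$Itr.checkForComodification
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(10, 20, 30));
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(10, 20, 30));
        System.out.println(sumWhileAdding(plain)); // -1
        System.out.println(sumWhileAdding(cow));   // 60: the iterator sees a snapshot
    }
}
```

The trade-off is that every write to a `CopyOnWriteArrayList` copies the backing array, so it suits lists that are iterated far more often than they are modified. Fail-fast detection is also best-effort: unsynchronized access to an ArrayList from several threads is a data race even when no exception is thrown.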
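The thread dump in the comment is a lock-ordering cycle. "FairSchedulerContinuousScheduling" holds the `Collections$SynchronizedRandomAccessList` monitor (<0x00000007f6b5ec00>, taken in `FSLeafQueue.assignContainer`) and waits for the `FSAppAttempt` monitor (<0x00000007f6bc8f58>), while "Thread-434" holds that `FSAppAttempt` monitor (from `FairScheduler.allocate` and `getHeadroom`) and waits for the list monitor in `FSLeafQueue.getResourceUsage`. The hypothetical sketch below models only those two lock orders with plain objects; `LockOrderSketch`, `deadlocks` and `rendezvous` are illustrative names, not scheduler APIs. A `CountDownLatch` forces both threads to hold their first lock before requesting the second, and `ThreadMXBean.findMonitorDeadlockedThreads` reports the cycle. Assuming the list were a `CopyOnWriteArrayList` as the description proposes, the allocate path could iterate without the list monitor, which removes one edge of the cycle in this model.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the two lock orders in the dump: "app" stands in for the
// FSAppAttempt monitor and "apps" for the leaf queue's list of apps.
class LockOrderSketch {

    // Called while holding the first lock; nobody requests a second lock
    // until both threads hold their first one, so the interleaving is fixed.
    private static void rendezvous(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Iterates the list, like FSLeafQueue.getResourceUsage summing usage.
    private static int sum(List<Integer> apps) {
        int total = 0;
        for (int a : apps) total += a;
        return total;
    }

    // True if the JVM reports thread t as part of a monitor deadlock cycle.
    private static boolean inDeadlock(ThreadMXBean mx, Thread t) {
        long[] ids = mx.findMonitorDeadlockedThreads();
        return ids != null && Arrays.stream(ids).anyMatch(id -> id == t.getId());
    }

    // lockToIterate = true models Collections.synchronizedList, which must be
    // iterated inside synchronized (apps). Returns true if the threads deadlock.
    static boolean deadlocks(List<Integer> apps, boolean lockToIterate) {
        Object app = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        // Scheduling path (FSLeafQueue.assignContainer): list, then app attempt.
        Thread scheduler = new Thread(() -> {
            synchronized (apps) {
                rendezvous(bothHoldFirstLock);
                synchronized (app) {
                    sum(apps);
                }
            }
        });
        // Allocate path (allocate -> getHeadroom -> getResourceUsage):
        // app attempt, then iterate the list.
        Thread allocator = new Thread(() -> {
            synchronized (app) {
                rendezvous(bothHoldFirstLock);
                if (lockToIterate) {
                    synchronized (apps) {
                        sum(apps);
                    }
                } else {
                    sum(apps); // snapshot iteration, no list monitor taken
                }
            }
        });
        scheduler.setDaemon(true); // a deadlocked pair must not keep the JVM alive
        allocator.setDaemon(true);
        scheduler.start();
        allocator.start();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 s
        while (System.nanoTime() < deadline) {
            if (inDeadlock(mx, allocator)) {
                return true;
            }
            if (!scheduler.isAlive() && !allocator.isAlive()) {
                return false; // both paths completed
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return inDeadlock(mx, allocator);
    }

    public static void main(String[] args) {
        List<Integer> synced = Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        System.out.println(deadlocks(synced, true)); // true
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(deadlocks(cow, false));   // false
    }
}
```

Breaking either half of the cycle is enough; the sketch changes only the allocate path and leaves the scheduling path's locking as it is. Taking the two locks in one consistent order on every path would be another way to avoid the cycle.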