[ https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469504#comment-16469504 ]
Jim Brennan commented on YARN-8244:
-----------------------------------

Looking into why the other tests are not failing for me. The concurrent modification exception is happening during simultaneous container launches. Most of these tests only actually launch one container at a time - they are queuing additional requests. This is true for the following tests in TestContainerSchedulerQueuing.java:
* testQueueMultipleContainers
* testStartAndQueueMultipleContainers
* testStartOpportunistcsWhenOppQueueIsFull
* testKillOpportunisticForGuaranteedContainer
* testPauseOpportunisticForGuaranteedContainer
* testQueueShedding
* testContainerDeQueuedAfterAMKill
* testStopQueuedContainer
* testPromotionOfOpportunisticContainers

These tests are getting the concurrent modification exception, but it is not causing the test to fail:
* testKillMultipleOpportunisticContainers
* testKillOnlyRequiredOpportunisticContainers

This one is using startContainers, but it's only starting one container:
* testContainerUpdateExecTypeGuaranteedToOpportunistic

I checked all of the other tests I could find that use ContainerManager.startContainers(). Most of them are using it with a single request (a list with one element). The ones that are actually launching multiple containers are in TestContainerManager.java. These also do not fail, but they do throw the concurrent modification exception.
* testMultipleContainersLaunch
* testMultipleContainersStopAndGetStatus
* testIncreaseContainerResourceWithInvalidRequests

[~jlowe], I will work on a patch to fix all of these tests. Let me know if you feel like we should use a different approach.

In code, I don't think we are re-using the full ContainerLaunchContext anywhere. For example, in TaskAttemptImpl.createContainerLaunchContext(), we create a new context which is mostly copied from a common one, but with private copies of the environment, commands, service buffers, and ACLs.
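To make the private-copy idea concrete, here is a minimal sketch of why sharing one environment map between launches can trigger the exception, and how giving each context its own copy avoids it. Everything in it is an assumption for illustration: LaunchContext, sharedContext, privateContext and launchInterferes are hypothetical names, not YARN's ContainerLaunchContext API, and the race is simulated on a single thread so the outcome is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class PrivateEnvCopySketch {

  /** Hypothetical stand-in for a launch context; it only holds an environment map. */
  static final class LaunchContext {
    final Map<String, String> environment;

    LaunchContext(Map<String, String> environment) {
      this.environment = environment;
    }
  }

  /** Reuses the caller's map, so every context built this way shares one HashMap. */
  static LaunchContext sharedContext(Map<String, String> commonEnv) {
    return new LaunchContext(commonEnv);
  }

  /** Copies the map, so each context owns its environment. */
  static LaunchContext privateContext(Map<String, String> commonEnv) {
    return new LaunchContext(new HashMap<>(commonEnv));
  }

  /**
   * Simulates launcher A iterating over its environment while launcher B
   * adds a variable to its own. Returns true if A's iteration failed with
   * a ConcurrentModificationException.
   */
  static boolean launchInterferes(LaunchContext a, LaunchContext b) {
    try {
      for (Map.Entry<String, String> entry : a.environment.entrySet()) {
        // Adding a new key is a structural change to b's map. If a and b
        // share the map, A's iterator notices it on its next step.
        b.environment.put("CONTAINER_ID", "container_b");
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, String> common = new HashMap<>();
    common.put("JAVA_HOME", "/usr/lib/jvm");
    common.put("PATH", "/usr/bin");
    common.put("USER", "yarn");

    System.out.println("shared map interferes:   "
        + launchInterferes(sharedContext(common), sharedContext(common)));
    System.out.println("private copy interferes: "
        + launchInterferes(privateContext(common), privateContext(common)));
  }
}
```

With the shared map the iteration fails, and with private copies it completes. In the real NodeManager the two launches run on separate threads, so the failure is timing dependent rather than guaranteed.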

> TestContainerSchedulerQueuing.testStartMultipleContainers failed
> -----------------------------------------------------------------
>
>                 Key: YARN-8244
>                 URL: https://issues.apache.org/jira/browse/YARN-8244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: YARN-8244.001.patch
>
>
> {code:java}
> testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing)
>  Time elapsed: 22.198 s <<< FAILURE!
> java.lang.AssertionError: ContainerState is not correct (timedout)
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>     at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code}
> {code:java}
> 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(329)) - Failed to launch container.
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
>     at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
>     at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311)
>     at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org