[ 
https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469504#comment-16469504
 ] 

Jim Brennan commented on YARN-8244:
-----------------------------------

I looked into why the other tests are not failing for me.

The concurrent modification exception happens during simultaneous 
container launches.  Most of these tests actually launch only one container at 
a time; the additional requests are queued.  This is true for the following 
tests in TestContainerSchedulerQueuing.java:
 * testQueueMultipleContainers
 * testStartAndQueueMultipleContainers
 * testStartOpportunistcsWhenOppQueueIsFull
 * testKillOpportunisticForGuaranteedContainer
 * testPauseOpportunisticForGuaranteedContainer
 * testQueueShedding
 * testContainerDeQueuedAfterAMKill
 * testStopQueuedContainer
 * testPromotionOfOpportunisticContainers

These tests do hit the concurrent modification exception, but it does not 
cause them to fail:
 * testKillMultipleOpportunisticContainers
 * testKillOnlyRequiredOpportunisticContainers

This one is using startContainers, but it's only starting one container:
 * testContainerUpdateExecTypeGuaranteedToOpportunistic

I checked all of the other tests I could find that use 
ContainerManager.startContainers().  Most of them are using it with a single 
request (a list with one element).

The ones that are actually launching multiple containers are in 
TestContainerManager.java.  These also do not fail, but they do throw the 
concurrent modification exception.
 * testMultipleContainersLaunch
 * testMultipleContainersStopAndGetStatus
 * testIncreaseContainerResourceWithInvalidRequests

[~jlowe], I will work on a patch to fix all of these tests.  Let me know if you 
feel like we should use a different approach.

In the non-test code, I don't think we are re-using the full 
ContainerLaunchContext anywhere.  For example, in 
TaskAttemptImpl.createContainerLaunchContext(), we create a new context that 
is mostly copied from a common one, but with private copies of the 
environment, commands, service buffers, and ACLs.
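
To illustrate why sharing one context across launches can matter, here is a 
minimal sketch in plain Java.  It does not use any YARN classes: 
SharedEnvSketch, walkEnv, baseEnv, and the environment keys are made up for 
illustration, and it assumes the exception comes from several launches 
touching a single shared HashMap of environment variables.  The second launch 
is simulated by a callback on the same thread so the outcome is deterministic; 
in the real tests the interleaving depends on thread timing.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class SharedEnvSketch {

  // Hypothetical common environment; two entries so the walk has a second step.
  public static Map<String, String> baseEnv() {
    Map<String, String> env = new HashMap<>();
    env.put("PATH", "/usr/bin");
    env.put("HOME", "/home/test");
    return env;
  }

  // Stand-in for one launch walking its environment map.  After the first
  // entry is visited, otherLaunch runs once to model a second launch
  // becoming active in the middle of the walk.
  public static int walkEnv(Map<String, String> env, Runnable otherLaunch) {
    int seen = 0;
    boolean interleaved = false;
    for (Map.Entry<String, String> entry : env.entrySet()) {
      seen++;
      if (!interleaved) {
        otherLaunch.run();
        interleaved = true;
      }
    }
    return seen;
  }

  // Both launches use the same map: the second launch's put() invalidates
  // the first launch's iterator.
  public static boolean sharedEnvThrows() {
    Map<String, String> shared = baseEnv();
    try {
      walkEnv(shared, () -> shared.put("CONTAINER_ID", "container_2"));
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Each launch gets a private copy of the common map, so a change made by
  // one launch is invisible to the other's iterator.
  public static boolean privateCopiesThrow() {
    Map<String, String> common = baseEnv();
    Map<String, String> first = new HashMap<>(common);
    Map<String, String> second = new HashMap<>(common);
    try {
      walkEnv(first, () -> second.put("CONTAINER_ID", "container_2"));
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("shared map throws: " + sharedEnvThrows());
    System.out.println("private copies throw: " + privateCopiesThrow());
  }
}
```

The same idea would apply to a test that builds several start requests: giving 
each request its own copy of the environment map, rather than one shared 
instance, keeps the launches from interfering with each other.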

>  TestContainerSchedulerQueuing.testStartMultipleContainers failed
> -----------------------------------------------------------------
>
>                 Key: YARN-8244
>                 URL: https://issues.apache.org/jira/browse/YARN-8244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: YARN-8244.001.patch
>
>
> {code:java}
> testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing)
>   Time elapsed: 22.198 s  <<< FAILURE!
> java.lang.AssertionError: ContainerState is not correct (timedout)
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.assertTrue(Assert.java:41)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>         at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code}
> {code:java}
> 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(329)) - Failed to launch container.
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311)
>         at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
