[ 
https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469262#comment-16469262
 ] 

Jason Lowe commented on YARN-8244:
----------------------------------

Thanks for the patch, Jim!

Many other tests also reuse container launch contexts, e.g. 
testQueueMultipleContainers and testStartAndQueueMultipleContainers.  If fixing 
the test is indeed the correct fix, then those tests need to be fixed as well.

Interestingly, I don't see anything in ContainerLaunchContext that is 
necessarily specific to a particular container (e.g. a container ID) and that 
would force a 1-to-1 mapping between a container launch context and a 
container instance.  So in theory a ContainerLaunchContext could be reused 
across containers.  In practice, however, these objects are deserialized from 
protocol buffers, and I don't see how the startContainers method could be 
invoked with a single ContainerLaunchContext object shared by all container 
start requests.

So we have two options.  Either we fix every test that shares a container 
launch context object across container start requests, or, if we ever support 
calling startContainers in a way that lets a common launch context be reused 
across container start requests, the NM code needs to stop assuming that 
ContainerLaunchContext objects are safe to scribble on.
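
To make the second option concrete, here is a minimal sketch of the difference 
between scribbling on a shared context and scribbling on a copy.  It does not 
use the real YARN classes: LaunchContext, scribbleInPlace and scribbleOnCopy 
are hypothetical stand-ins, and the CONTAINER_ID key is only illustrative.  It 
runs single-threaded so the result is deterministic; with launcher threads 
doing the in-place writes concurrently, the same sharing is presumably what 
lets one thread modify the map while another iterates it.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedLaunchContextSketch {

    // Hypothetical stand-in for ContainerLaunchContext: only a mutable env map.
    public static final class LaunchContext {
        private final Map<String, String> environment = new HashMap<>();

        public Map<String, String> getEnvironment() {
            return environment;
        }
    }

    // Writes container-specific values straight into the context's own map.
    public static Map<String, String> scribbleInPlace(LaunchContext ctx,
                                                      String containerId) {
        Map<String, String> env = ctx.getEnvironment();
        env.put("CONTAINER_ID", containerId);
        return env;
    }

    // Copies the map first, so the caller's context is never modified.
    public static Map<String, String> scribbleOnCopy(LaunchContext ctx,
                                                     String containerId) {
        Map<String, String> env = new HashMap<>(ctx.getEnvironment());
        env.put("CONTAINER_ID", containerId);
        return env;
    }

    public static void main(String[] args) {
        // One context reused for two containers, written in place.
        LaunchContext shared = new LaunchContext();
        shared.getEnvironment().put("JAVA_HOME", "/opt/java");
        Map<String, String> first = scribbleInPlace(shared, "container_1");
        Map<String, String> second = scribbleInPlace(shared, "container_2");
        // Both references point at the same map, so the first value is lost.
        System.out.println("in place, same map: " + (first == second));
        System.out.println("in place, first sees: " + first.get("CONTAINER_ID"));

        // One context reused for two containers, written on private copies.
        LaunchContext reused = new LaunchContext();
        reused.getEnvironment().put("JAVA_HOME", "/opt/java");
        Map<String, String> copyOne = scribbleOnCopy(reused, "container_1");
        Map<String, String> copyTwo = scribbleOnCopy(reused, "container_2");
        System.out.println("on copy, first sees: " + copyOne.get("CONTAINER_ID"));
        System.out.println("on copy, second sees: " + copyTwo.get("CONTAINER_ID"));
        System.out.println("context untouched: "
            + !reused.getEnvironment().containsKey("CONTAINER_ID"));
    }
}
```

The copy costs an extra map per launch, but it removes the hidden requirement 
that every caller hand in a fresh context.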


>  TestContainerSchedulerQueuing.testStartMultipleContainers failed
> -----------------------------------------------------------------
>
>                 Key: YARN-8244
>                 URL: https://issues.apache.org/jira/browse/YARN-8244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: YARN-8244.001.patch
>
>
> {code:java}
> testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing)  Time elapsed: 22.198 s  <<< FAILURE!
> java.lang.AssertionError: ContainerState is not correct (timedout)
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.assertTrue(Assert.java:41)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>         at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code}
> {code:java}
> 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(329)) - Failed to launch container.
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311)
> at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



