[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755851#comment-13755851
 ] 

Vinod Kumar Vavilapalli commented on YARN-292:
----------------------------------------------

bq. 3. The application is in FifoScheduler#applications, but RMAppAttemptImpl 
doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is 
not thread safe (FairScheduler#applications is a HashMap while 
CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods 
that access the map are not consistently synchronized, so reads and writes on 
the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher 
thread, will eventually call FifoScheduler#applications#get in 
AMContainerAllocatedTransition, while FifoScheduler, on the 
SchedulerEventDispatcher thread, will use FifoScheduler#applications#add|remove. 
Therefore, getting null when the application actually exists can happen under 
a large number of concurrent operations.
This doesn't sound right to me. The scheduler is told to remove an app only by 
the RMAppAttempt. If the RMAppAttempt is going into 
AMContainerAllocatedTransition, it cannot have told the scheduler to remove 
the app. While the theory about unsafe data structures seems right, I still 
can't see how the original exception can happen. Clearly the app was removed; 
in that case the RMAppAttempt would have gone into the KILLING state, right? 
If so, why is it now trying to get the AM container?
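
For reference, the data-structure concern in the quoted paragraph can be sketched in isolation. This is only a minimal illustration, not the FifoScheduler code: the class GuardedAppRegistry, its methods, and the key names are hypothetical, and it assumes the fix under discussion amounts to guarding every access to the TreeMap with the same lock. It says nothing about whether that race explains the exception reported here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a scheduler's application map; NOT the FifoScheduler code.
class GuardedAppRegistry {
    // TreeMap itself is not thread safe, so every access goes through this object's monitor.
    private final Map<String, String> applications = new TreeMap<>();

    synchronized void add(String attemptId, String app) {
        applications.put(attemptId, app);
    }

    synchronized String remove(String attemptId) {
        return applications.remove(attemptId);
    }

    synchronized String get(String attemptId) {
        return applications.get(attemptId);
    }

    // Runs a writer thread that adds and removes unrelated keys while a reader
    // thread repeatedly looks up a key that is never removed. Returns how often
    // the reader saw null for that key.
    static int countMisses(int iterations) throws InterruptedException {
        GuardedAppRegistry registry = new GuardedAppRegistry();
        registry.add("pinned", "app");
        AtomicInteger misses = new AtomicInteger();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                registry.add("attempt_" + i, "app");
                registry.remove("attempt_" + i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (registry.get("pinned") == null) {
                    misses.incrementAndGet();
                }
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
        return misses.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("misses = " + countMisses(100_000));
    }
}
```

Because get, add and remove all take the same lock, the reader never observes the map in the middle of a structural change, so the pinned key is always found. Dropping synchronized from any one of the three methods removes that guarantee.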
                
> ResourceManager throws ArrayIndexOutOfBoundsException while handling 
> CONTAINER_ALLOCATED for application attempt
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-292
>                 URL: https://issues.apache.org/jira/browse/YARN-292
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Zhijie Shen
>         Attachments: YARN-292.1.patch, YARN-292.2.patch, YARN-292.3.patch
>
>
> {code:xml}
> 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
> 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
> java.lang.ArrayIndexOutOfBoundsException: 0
>       at java.util.Arrays$ArrayList.get(Arrays.java:3381)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>       at java.lang.Thread.run(Thread.java:662)
>  {code}

