[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679982#comment-13679982
 ] 

Mayank Bansal commented on YARN-502:
------------------------------------

From looking at the code, it looks like there is a race condition between 
ReconnectNodeTransition and UnhealthyTransition in the event dispatcher.

This condition may arise when a NodeManager tries to register itself: 
ResourceTrackerService puts the node in the nodes list and schedules a 
reconnect event, but in the meantime an unhealthy event reaches the RM first 
and deletes the node from the nodes map.
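
A minimal sketch of the suspected failure mode, not the actual FairScheduler 
code. It assumes the NPE comes from dereferencing a node entry that an earlier 
event already removed from the map. NodeRemovalSketch, SchedulerNode and both 
removeNode variants are hypothetical names used only for illustration, and the 
interleaving is replayed sequentially rather than with real threads.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the scheduler's node map; this is not
// the actual FairScheduler code.
class NodeRemovalSketch {
    // Hypothetical stand-in for a scheduler-side node record.
    static final class SchedulerNode {
        final int memoryMb;
        SchedulerNode(int memoryMb) { this.memoryMb = memoryMb; }
    }

    private final Map<String, SchedulerNode> nodes = new HashMap<>();
    private int clusterMemoryMb = 0;

    void addNode(String nodeId, int memoryMb) {
        // Only count capacity for a node that was not already tracked.
        if (nodes.putIfAbsent(nodeId, new SchedulerNode(memoryMb)) == null) {
            clusterMemoryMb += memoryMb;
        }
    }

    // Unguarded removal: dereferences the lookup result directly, so a
    // removal event for a node that is already gone throws
    // NullPointerException.
    void removeNodeUnguarded(String nodeId) {
        SchedulerNode node = nodes.remove(nodeId);
        clusterMemoryMb -= node.memoryMb;
    }

    // Guarded removal: treats an already-removed node as a no-op.
    boolean removeNodeGuarded(String nodeId) {
        SchedulerNode node = nodes.remove(nodeId);
        if (node == null) {
            return false;
        }
        clusterMemoryMb -= node.memoryMb;
        return true;
    }

    int clusterMemoryMb() { return clusterMemoryMb; }

    public static void main(String[] args) {
        NodeRemovalSketch scheduler = new NodeRemovalSketch();
        scheduler.addNode("YYYY:55680", 8192);
        // First removal finds the node; the second finds nothing and is skipped.
        System.out.println(scheduler.removeNodeGuarded("YYYY:55680"));
        System.out.println(scheduler.removeNodeGuarded("YYYY:55680"));
    }
}
```

A null check like this only hides the symptom in the scheduler. Whether the 
real fix belongs there or in the RMNodeImpl transitions is a separate question 
that this sketch does not answer.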

Thanks,
Mayank
                
> RM crash with NPE on NODE_REMOVED event
> ---------------------------------------
>
>                 Key: YARN-502
>                 URL: https://issues.apache.org/jira/browse/YARN-502
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.3-alpha
>            Reporter: Lohit Vijayarenu
>            Assignee: Mayank Bansal
>
> While running some tests and adding/removing nodes, we saw the RM crash with 
> the exception below. We are testing with the fair scheduler on 
> hadoop-2.0.3-alpha.
> {noformat}
> 2013-03-22 18:54:27,015 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node YYYY:55680 as it is now LOST
> 2013-03-22 18:54:27,015 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: YYYY:55680 
> Node Transitioned from UNHEALTHY to LOST
> 2013-03-22 18:54:27,015 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_REMOVED to the scheduler
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-03-22 18:54:27,016 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
> SelectChannelConnector@XXXX:50030
> {noformat}
