[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739178#comment-13739178
 ] 

Rohith Sharma K S commented on YARN-1061:
-----------------------------------------

I hit the actual issue in a 5-node cluster (1 RM and 5 NMs). It is hard to 
reproduce a hung ResourceManager in a real cluster. 

The same scenario can be simulated manually by putting the ResourceManager into 
a hung state with the Linux command "kill -STOP <RM_PID>". All NM->RM calls 
then wait indefinitely. Another case where the indefinite wait can be observed 
is adding a new NodeManager while the ResourceManager is in a hung state.
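
The effect of "kill -STOP" on a caller can be sketched outside Hadoop. The 
snippet below is only an illustration under assumptions, not the YARN RPC 
code: SERVER_SRC, call() and demo() are hypothetical names, and a tiny TCP 
echo server stands in for the ResourceManager. It shows that once the server 
process is stopped, the connection is still accepted by the kernel, but the 
reply never arrives, so a caller without a timeout would block forever.

```python
import os
import signal
import socket
import subprocess
import sys

# Hypothetical stand-in for the ResourceManager: a tiny TCP echo server
# that prints its port and then answers one request per connection.
SERVER_SRC = r"""
import socket
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(5)
print(srv.getsockname()[1], flush=True)
while True:
    conn, _ = srv.accept()
    conn.sendall(conn.recv(1024))
    conn.close()
"""


def call(port, timeout):
    """Send one request and wait at most `timeout` seconds for the reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        s.sendall(b"heartbeat")
        return s.recv(1024)


def demo():
    server = subprocess.Popen([sys.executable, "-c", SERVER_SRC],
                              stdout=subprocess.PIPE, text=True)
    try:
        port = int(server.stdout.readline())
        before = call(port, timeout=2.0)     # server running: reply arrives
        os.kill(server.pid, signal.SIGSTOP)  # same signal as `kill -STOP <pid>`
        try:
            call(port, timeout=1.0)
            hung = False
        except socket.timeout:
            # The connect and send succeeded, but no reply came back.
            hung = True
        return before, hung
    finally:
        server.kill()  # SIGKILL also terminates a stopped process
        server.wait()


if __name__ == "__main__":
    print(demo())
```

With the timeout removed from call(), the second request would wait 
indefinitely, which matches what the NM shows against the stopped RM.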

    
                
> NodeManager is indefinitely waiting for nodeHeartBeat() response from 
> ResourceManager.
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1061
>                 URL: https://issues.apache.org/jira/browse/YARN-1061
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: Rohith Sharma K S
>
> It is observed that in one scenario the NodeManager waits indefinitely 
> for the nodeHeartbeat response from the ResourceManager while the 
> ResourceManager is in a hung state.
> The NodeManager should get a timeout exception instead of waiting indefinitely.

