[ 
https://issues.apache.org/jira/browse/YARN-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212899#comment-15212899
 ] 

Sunil G commented on YARN-4881:
-------------------------------

One more thing to note: for NodeLabel, we explicitly set 
{{dfs.client.retry.policy.enabled}} to true. For single-RM clusters, I think 
this will be fatal, as the RM will go down. [~rohithsharma], is that correct?

> RM continuously switch if HDFS is too busy when NodeLabel is configured
> -----------------------------------------------------------------------
>
>                 Key: YARN-4881
>                 URL: https://issues.apache.org/jira/browse/YARN-4881
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Priority: Critical
>
> It is observed in a production cluster that the RM fails to become active and 
> keeps switching continuously when HDFS is too busy and node labels are 
> configured. This causes very high RM downtime. 
> Exception from the RM logs:
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/mapred/node-labels/nodelabel.mirror.writing could only be replicated to 
> 0 nodes instead of minReplication (=1). There are 7 datanode(s) running and 
> no node(s) are excluded in this operation.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
