[ https://issues.apache.org/jira/browse/YARN-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212899#comment-15212899 ]
Sunil G commented on YARN-4881:
-------------------------------

One more thing to note is that for NodeLabel, we explicitly set {{dfs.client.retry.policy.enabled}} to true. For single-RM cases, I think this will be fatal, as the RM will go down. [~rohithsharma], is that correct?

> RM continuously switch if HDFS is too busy when NodeLabel is configured
> -----------------------------------------------------------------------
>
>                 Key: YARN-4881
>                 URL: https://issues.apache.org/jira/browse/YARN-4881
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Priority: Critical
>
> It was observed in a production cluster that the RM fails to become active and
> keeps switching continuously if HDFS is too busy and node labels are
> configured. This causes very high RM downtime.
> Exception from RM logs
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/mapred/node-labels/nodelabel.mirror.writing could only be replicated to
> 0 nodes instead of minReplication (=1). There are 7 datanode(s) running and
> no node(s) are excluded in this operation.
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)