[ https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991287#comment-12991287 ]
Matthias Friedrich commented on HDFS-1125:
------------------------------------------

We also got complaints from our admins about this because it makes it really hard to set up professional monitoring. My company operates close to 100,000 machines (only a handful of Hadoop nodes, though), so it's a big concern that our infrastructure behaves well. Also, node decommissioning is one of the things QA departments typically test during product evaluation, so this could hamper Hadoop adoption in some organizations.

> Removing a datanode (failed or decommissioned) should not require a namenode restart
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-1125
>                 URL: https://issues.apache.org/jira/browse/HDFS-1125
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: Alex Loddengaard
>            Priority: Critical
>
> I've heard of several Hadoop users using dfsadmin -report to monitor the number of dead nodes, and alert if that number is not 0. This mechanism tends to work pretty well, except when a node is decommissioned or fails, because then the namenode requires a restart for said node to be entirely removed from HDFS. More details here:
> http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results
> Removal from the exclude file and a refresh should get rid of the dead node.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
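For context, the monitoring pattern the issue describes (alert whenever dfsadmin -report shows a nonzero dead-node count) might be sketched roughly as below. The sample report text and the parsing regex here are assumptions for illustration; the exact output format of dfsadmin -report varies across Hadoop versions, and in practice the text would come from actually running the command.

```python
import re

# Hypothetical excerpt of `hadoop dfsadmin -report` output; in a real check
# this would be captured from the command, e.g. via subprocess.
sample_report = """\
Configured Capacity: 1000000 (976.56 KB)
Datanodes available: 3 (4 total, 1 dead)
"""

def count_dead_nodes(report_text):
    """Extract the dead-node count from a dfsadmin-style report, or None."""
    m = re.search(r"\((\d+) total, (\d+) dead\)", report_text)
    return int(m.group(2)) if m else None

dead = count_dead_nodes(sample_report)
if dead:  # alert if the number of dead nodes is not 0
    print(f"ALERT: {dead} dead datanode(s)")
```

As the issue notes, the weakness of this scheme is that a decommissioned or removed node keeps showing up as dead until the namenode is restarted, so the alert never clears.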