Namenode returning the same Datanode to client, due to infrequent heartbeat
---------------------------------------------------------------------------

                 Key: HDFS-1235
                 URL: https://issues.apache.org/jira/browse/HDFS-1235
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Thanh Do


This bug has been reported.
Basically, since a datanode's heartbeat messages are infrequent (~ every 10 minutes),
the NameNode keeps giving the client the same datanode even if that datanode is dead.
 
We want to point out that the client waits 6 seconds before retrying.
In this scenario the retries are both long and useless,
because after 6 seconds the namenode still has not declared the datanode dead.

Overall, this happens when a datanode dies during the first phase of the
pipeline (file setup).
If a datanode dies during the second phase (byte transfer), the DFSClient can
still proceed with the surviving datanodes (which is consistent with what
Hadoop books always say: the write should proceed as long as at least one good
datanode remains). Unfortunately, this guarantee does not hold during the first
phase of the pipeline. We suggest that the namenode take into consideration the
client's view of unreachable datanodes. That is, if a client says that it cannot
reach DN-X, then the namenode might give the client a node other than DN-X (but
the namenode does not have to declare DN-X dead).
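
The suggestion above could be sketched roughly as follows. This is only an
illustration under assumptions: the class ExcludeAwarePlacement and its methods
chooseTargets and liveCount are hypothetical names, not existing HDFS APIs, and
real block placement considers far more than a simple list scan. The point it
shows is that nodes reported unreachable by the client are skipped for that
request only, while the namenode's own liveness view is left unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not part of HDFS.
class ExcludeAwarePlacement {
    // Datanodes the namenode still believes are alive (this view may be stale).
    private final List<String> liveNodes;

    ExcludeAwarePlacement(List<String> liveNodes) {
        this.liveNodes = new ArrayList<>(liveNodes);
    }

    // Pick up to `replicas` targets, skipping nodes the client reported
    // as unreachable. Excluded nodes are NOT declared dead; they are only
    // skipped for this one request.
    List<String> chooseTargets(int replicas, Set<String> excludedByClient) {
        List<String> targets = new ArrayList<>();
        for (String dn : liveNodes) {
            if (targets.size() == replicas) {
                break;
            }
            if (!excludedByClient.contains(dn)) {
                targets.add(dn);
            }
        }
        return targets;
    }

    // Number of nodes the namenode considers alive, unaffected by exclusions.
    int liveCount() {
        return liveNodes.size();
    }
}
```

With an empty exclusion set, a retry would be handed the same first node again,
which is the behavior described in this report. Once the client adds DN-X to the
set it sends along with the retry, the next candidate is returned instead.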

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
