Enhance FSDataOutputStream to allow retrieving the current number of replicas 
of current block
----------------------------------------------------------------------------------------------

                 Key: HADOOP-6450
                 URL: https://issues.apache.org/jira/browse/HADOOP-6450
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The current HDFS implementation has the limitation that it does not replicate 
the last partial block of a file that is being written to until the file is 
closed. Some long-running applications (e.g. HBase) write transaction logs 
into HDFS. If datanodes in the write pipeline die, the application has no 
knowledge of it until all of the datanodes have failed and the application 
gets an IO error.

These applications would benefit greatly if they could determine the number of 
live replicas of the block they are currently writing to. For example, an 
application could decide that when one of the datanodes in the write pipeline 
fails, it will close the file and start writing to a new file.
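
The close-and-roll behaviour described above could look roughly like the 
sketch below. It is only an illustration and does not use the Hadoop API: 
ReplicaAwareStream, getNumCurrentReplicas(), StreamFactory, RollingLogWriter 
and FakeStream are all hypothetical names standing in for whatever 
FSDataOutputStream might eventually expose, and the "log.N" file naming is an 
assumption made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RollingLogSketch {

    // Hypothetical stand-in for an enhanced FSDataOutputStream that can
    // report the live replica count of the block currently being written.
    interface ReplicaAwareStream {
        void write(byte[] data) throws IOException;
        int getNumCurrentReplicas() throws IOException; // assumed method name
        void close() throws IOException;
    }

    // Hypothetical factory that opens a new log file at the given path.
    interface StreamFactory {
        ReplicaAwareStream open(String path) throws IOException;
    }

    // Appends records and rolls to a new file once the replica count
    // of the current block drops below the configured minimum.
    static class RollingLogWriter {
        private final StreamFactory factory;
        private final int minReplicas;
        private final List<String> openedPaths = new ArrayList<>();
        private ReplicaAwareStream current;

        RollingLogWriter(StreamFactory factory, int minReplicas) throws IOException {
            this.factory = factory;
            this.minReplicas = minReplicas;
            roll(); // open the first log file
        }

        void append(byte[] record) throws IOException {
            current.write(record);
            // A datanode in the pipeline has failed: stop using this file.
            if (current.getNumCurrentReplicas() < minReplicas) {
                roll();
            }
        }

        private void roll() throws IOException {
            if (current != null) {
                current.close();
            }
            String path = "log." + openedPaths.size(); // assumed naming scheme
            current = factory.open(path);
            openedPaths.add(path);
        }

        void close() throws IOException {
            current.close();
        }

        List<String> openedPaths() {
            return openedPaths;
        }
    }

    // In-memory fake used only to exercise the sketch; no HDFS involved.
    static class FakeStream implements ReplicaAwareStream {
        int liveReplicas = 3;
        public void write(byte[] data) { /* discard */ }
        public int getNumCurrentReplicas() { return liveReplicas; }
        public void close() { /* nothing to release */ }
    }

    static List<String> runDemo() {
        try {
            List<FakeStream> streams = new ArrayList<>();
            StreamFactory factory = path -> {
                FakeStream s = new FakeStream();
                streams.add(s);
                return s;
            };
            RollingLogWriter writer = new RollingLogWriter(factory, 3);
            writer.append("txn-1".getBytes());   // 3 replicas, no roll
            streams.get(0).liveReplicas = 2;     // simulate one datanode dying
            writer.append("txn-2".getBytes());   // 2 < 3, roll to a new file
            writer.append("txn-3".getBytes());   // new file has 3 replicas
            writer.close();
            return writer.openedPaths();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In this sketch the replica count is checked after every append, so a single 
datanode failure triggers a roll on the next write. A real application would 
probably want to check less often, depending on how expensive the call turns 
out to be.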

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.