Kihwal Lee created HDFS-11817:
---------------------------------

             Summary: A faulty node can cause a lease leak and NPE on accessing data
                 Key: HDFS-11817
                 URL: https://issues.apache.org/jira/browse/HDFS-11817
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.8.0
            Reporter: Kihwal Lee
            Priority: Critical


When the namenode performs lease recovery for a failed write, 
{{commitBlockSynchronization()}} will fail if none of the new targets has sent 
a received IBR. At this point the data is inaccessible, because the namenode 
throws a {{NullPointerException}} on {{getBlockLocations()}}.

The namenode retries lease recovery about an hour later. If the nodes are 
faulty (usually when there is only one new target), they may still not have 
sent a block report by then. In that case lease recovery throws an 
{{AlreadyBeingCreatedException}}, which causes the LeaseManager to simply 
remove the lease without finalizing the inode.

This results in an inconsistent lease state: the inode stays under 
construction, but no further lease recovery is attempted. Manual lease 
recovery is not allowed either.
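A toy Java model of the sequence above may make the inconsistent end state easier to see. It is a sketch under stated assumptions, not HDFS code: the class `LeaseLeakModel`, its methods, the `newTargetSentReceivedIbr` flag and `ToyAlreadyBeingCreatedException` are all hypothetical, and the recovery is heavily simplified (in HDFS the block recovery is asynchronous and driven by a datanode).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the failure described in this issue.
// NOT the real HDFS LeaseManager/FSNamesystem code; all names are illustrative.
public class LeaseLeakModel {

    /** Stand-in for the inode of a file being written. */
    static final class ToyInode {
        boolean underConstruction = true;
        boolean recoveryStarted = false;
    }

    /** Stand-in for HDFS's AlreadyBeingCreatedException. */
    static final class ToyAlreadyBeingCreatedException extends Exception {
        ToyAlreadyBeingCreatedException(String msg) { super(msg); }
    }

    final Map<String, String> leases = new HashMap<>();   // path -> lease holder
    final Map<String, ToyInode> inodes = new HashMap<>();
    // Assumption: a single flag for "some new target sent a received IBR".
    boolean newTargetSentReceivedIbr = false;

    void create(String path, String holder) {
        inodes.put(path, new ToyInode());
        leases.put(path, holder);
    }

    /** Simplified recovery: refuses to retry while the targets are still silent. */
    void recoverLease(String path) throws ToyAlreadyBeingCreatedException {
        ToyInode inode = inodes.get(path);
        if (inode.recoveryStarted && !newTargetSentReceivedIbr) {
            throw new ToyAlreadyBeingCreatedException("recovery in progress: " + path);
        }
        inode.recoveryStarted = true;
        commitBlockSynchronization(path);
    }

    /** Finalizes the file only if a new target has reported the block. */
    boolean commitBlockSynchronization(String path) {
        if (!newTargetSentReceivedIbr) {
            return false; // commit fails; the file stays under construction
        }
        inodes.get(path).underConstruction = false;
        leases.remove(path);
        return true;
    }

    /** Models the periodic lease check that drops the lease on the exception. */
    void checkLeases() {
        for (String path : new ArrayList<>(leases.keySet())) {
            try {
                recoverLease(path);
            } catch (ToyAlreadyBeingCreatedException e) {
                leases.remove(path); // the bug: lease removed, inode NOT finalized
            }
        }
    }

    public static void main(String[] args) {
        LeaseLeakModel nn = new LeaseLeakModel();
        nn.create("/tmp/f", "client-1");
        nn.checkLeases(); // first recovery: commitBlockSynchronization fails
        nn.checkLeases(); // retry ~1h later: exception, lease dropped
        System.out.println("lease held: " + nn.leases.containsKey("/tmp/f")
                + ", under construction: " + nn.inodes.get("/tmp/f").underConstruction);
    }
}
```

Running `main` should print `lease held: false, under construction: true`: the lease is gone while the inode is still under construction. Setting `newTargetSentReceivedIbr = true` before the first `checkLeases()` call lets the model finalize the file normally.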

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
