DataNode fail-stops due to a bad disk (or storage directory)
------------------------------------------------------------

                 Key: HDFS-1223
                 URL: https://issues.apache.org/jira/browse/HDFS-1223
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.20.1
            Reporter: Thanh Do


A datanode can store block files on multiple volumes.
If a datanode encounters a bad volume during startup (i.e., it gets an
exception when accessing that volume), it simply fail-stops, making all block
files stored on the other, healthy volumes inaccessible. Consequently, the
lost replicas must later be regenerated on other datanodes.
If the datanode could instead mark the bad disk and continue working with
the healthy ones, availability would increase and unnecessary regeneration
would be avoided. As an extreme example, consider a datanode with
2 volumes, V1 and V2, each containing about 10000 64MB block files.
During startup, the datanode gets an exception when accessing V1 and
fail-stops, so all 20000 block replicas must be regenerated later.
If the datanode marks V1 as bad and continues working with V2, the number
of replicas that need to be regenerated is cut in half.
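
The proposed behavior can be illustrated with a minimal Java sketch. It is
not the DataNode code: the class VolumeStartupSketch, its methods, and the
health check (a readable, writable directory) are hypothetical names and
assumptions made for illustration only. The sketch skips bad volumes at
startup, stops only when no volume is usable, and reproduces the arithmetic
of the V1/V2 example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Hadoop.
final class VolumeStartupSketch {

    // Assumed health check: a volume is usable if it is a readable, writable directory.
    static boolean isHealthy(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite();
    }

    // Keep the healthy volumes and skip the bad ones; stop only if none is usable.
    static List<File> selectHealthyVolumes(List<File> configured) {
        List<File> healthy = new ArrayList<>();
        for (File dir : configured) {
            if (isHealthy(dir)) {
                healthy.add(dir);
            } else {
                System.err.println("Marking volume as bad and skipping it: " + dir);
            }
        }
        if (healthy.isEmpty()) {
            throw new IllegalStateException("No healthy volume found; cannot start");
        }
        return healthy;
    }

    // Replicas that other datanodes must regenerate after a startup with bad volumes.
    // With fail-stop, every volume on the node is lost; otherwise only the bad ones.
    static long replicasToRegenerate(int totalVolumes, int badVolumes,
                                     long blocksPerVolume, boolean tolerateBadVolumes) {
        if (badVolumes == 0) {
            return 0;
        }
        int lostVolumes = tolerateBadVolumes ? badVolumes : totalVolumes;
        return lostVolumes * blocksPerVolume;
    }

    public static void main(String[] args) {
        // The example from the report: 2 volumes, 10000 blocks each, V1 is bad.
        System.out.println(replicasToRegenerate(2, 1, 10000, false)); // 20000
        System.out.println(replicasToRegenerate(2, 1, 10000, true));  // 10000
    }
}
```

Throwing only when the list of healthy volumes is empty keeps the fail-stop
behavior for the case where the node truly has nothing left to serve.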

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)
