Bad retry logic at DFSClient
----------------------------

                 Key: HDFS-1233
                 URL: https://issues.apache.org/jira/browse/HDFS-1233
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 0.20.1
            Reporter: Thanh Do
- Summary: failover bug; bad retry logic at DFSClient, so the client cannot fail over to the 2nd disk

- Setups:
+ # available datanodes = 1
+ # disks / datanode = 2
+ # failures = 1
+ failure type = bad disk
+ when/where failure happens = (see below)

- Details:

The setup is 1 datanode, 1 replica, and 2 disks per datanode (Disk1 and Disk2). We injected a single disk failure to see whether the write can fail over to the second disk.

If a persistent disk failure happens during createBlockOutputStream (the first phase of pipeline creation), say DN1-Disk1 is bad, then createBlockOutputStream (cbos) gets an exception and retries. On the retry it gets the same DN1 from the namenode; DN1 then calls DN.writeBlock(), FSVolume.createTmpFile, and finally getNextVolume(), which advances a moving volume index. Thus, on the second try, the write successfully goes to the second disk. Essentially, createBlockOutputStream is wrapped in a do/while(retry && --count >= 0): in this particular scenario, the first cbos fails and the second succeeds.

NOW, suppose cbos is successful but the failure is persistent. Then the "retry" happens in a different while loop. First, hasError is set to true in RP.run() (the response processor). DataStreamer.run() therefore goes back to its loop: while(!closed && clientRunning && !lastPacketInBlock). On this second iteration, the loop calls processDatanodeError because hasError has been set to true. In processDatanodeError (pde), the client sees that this is the only datanode in the pipeline, and hence it considers the node bad, although actually only one disk is bad. So pde throws an IOException reporting that all the datanodes in the pipeline (here, only DN1) are bad, and this exception is propagated to the client. If, instead, the exception were caught by the outermost do/while(retry && --count >= 0) loop, that outer retry would succeed (as described in the previous paragraph).
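To make the two retry paths concrete, here is a simplified, self-contained sketch (NOT the actual HDFS source). Names such as RetrySketch, DISK_OK, and tryWriteWithRetry are hypothetical illustrations; only the do/while shape of the outer retry and pde's "single node in pipeline => whole pipeline bad" decision mirror the behavior described above.

```java
import java.io.IOException;

public class RetrySketch {
    // Two "disks" on the single datanode: Disk1 (index 0) is bad.
    static final boolean[] DISK_OK = { false, true };
    static int curVolume = 0; // moving volume index, as in getNextVolume()

    // Stand-in for createBlockOutputStream -> DN.writeBlock ->
    // FSVolume.createTmpFile: each attempt takes the next volume
    // round-robin, so a retry lands on the second disk.
    static void createBlockOutputStream() throws IOException {
        int vol = curVolume;
        curVolume = (curVolume + 1) % DISK_OK.length;
        if (!DISK_OK[vol]) {
            throw new IOException("bad disk: volume " + vol);
        }
    }

    // Outer retry wrapper, shaped like do/while(retry && --count >= 0).
    static boolean tryWriteWithRetry(int count) {
        boolean retry = true;
        do {
            try {
                createBlockOutputStream();
                return true; // a later attempt hits the good disk
            } catch (IOException e) {
                System.out.println("attempt failed: " + e.getMessage());
            }
        } while (retry && --count >= 0);
        return false;
    }

    // Stand-in for processDatanodeError: with a one-node pipeline, the
    // client declares every datanode bad, even though only one disk failed.
    static void processDatanodeError(int pipelineSize) throws IOException {
        if (pipelineSize <= 1) {
            throw new IOException("All datanodes in the pipeline are bad.");
        }
        // A longer pipeline would instead be rebuilt without the bad node.
    }

    public static void main(String[] args) {
        // Path 1: failure during cbos is masked by the outer retry.
        System.out.println("failover worked: " + tryWriteWithRetry(3));
        // Path 2: failure after cbos with a one-node pipeline surfaces
        // to the client as "all datanodes bad".
        try {
            processDatanodeError(1);
        } catch (IOException e) {
            System.out.println("client sees: " + e.getMessage());
        }
    }
}
```

The sketch shows why the outcomes diverge: the cbos path re-enters getNextVolume() and so reaches the second disk, while the pde path never gets a chance to try another volume before giving up on the whole (one-node) pipeline.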
In summary, if a deployment has only one datanode with multiple disks and one disk goes bad, the current retry logic on the DFSClient side is not robust enough to mask the failure from the client.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.