[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe reassigned HDFS-5809:
------------------------------------------

    Assignee: Colin Patrick McCabe

Hi ikweesung,

Thanks for finding this. Do you mind if I take this one?

> BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5809
>                 URL: https://issues.apache.org/jira/browse/HDFS-5809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>         Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
>            Reporter: ikweesung
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>              Labels: blockpoolslicescanner, datanode, infinite-loop
>
> Hello, everyone.
> When the hadoop cluster starts, BlockPoolSliceScanner starts scanning the blocks in my cluster.
> Then, at random, one datanode drops into an infinite loop, as the log shows, and finally all datanodes drop into an infinite loop.
> Each datanode fails verification on just one block.
> When I check the failing block like this: hadoop fsck / -files -blocks | grep blk_1223474551535936089_4702249, no hdfs file contains the block.
> It seems that the while loop in BlockPoolSliceScanner's scan method drops into an infinite loop.
> BlockPoolSliceScanner: 650
> while (datanode.shouldRun
>     && !datanode.blockScanner.blockScannerThread.isInterrupted()
>     && datanode.isBPServiceAlive(blockPoolId)) { ....
> The log message is ultimately printed in method verifyBlock (BlockPoolSliceScanner:453).
> Please excuse my poor English.
> -------------------------------------------------------------------------------------------------------------------------------------------------
> LOG:
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write

--
This message was sent by Atlassian JIRA
(v6.2#6252)
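
For readers following the report, the symptom it describes (the same "Verification failed" line logged over and over for a single block) is what a scan loop produces when a block that fails verification is never removed from the head of its work queue. The Java below is a minimal, self-contained sketch of that general pattern only. It is not the BlockPoolSliceScanner source and not a confirmed root cause for HDFS-5809. The class ScanLoopSketch, the method scan, and all of its parameters are hypothetical names introduced for illustration, and the attempt budget exists only so the demonstration terminates.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical illustration; none of these names come from Hadoop.
class ScanLoopSketch {

    /**
     * Processes blocks from the head of the queue and returns the number of
     * verification attempts made.
     *
     * badBlock           the one block whose verification always fails
     * maxRetriesPerBlock failed attempts tolerated before the block is skipped
     * attemptBudget      hard stop so the demonstration cannot run forever
     */
    static int scan(Deque<String> queue, String badBlock,
                    int maxRetriesPerBlock, int attemptBudget) {
        int attempts = 0;
        int retries = 0;
        while (!queue.isEmpty() && attempts < attemptBudget) {
            String block = queue.peekFirst();
            attempts++;
            boolean verified = !block.equals(badBlock);
            if (verified || retries >= maxRetriesPerBlock) {
                // Progress: drop the block and move on to the next one.
                queue.pollFirst();
                retries = 0;
            } else {
                // No progress: the same block is picked again next iteration.
                retries++;
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        Deque<String> unbounded =
            new ArrayDeque<>(Arrays.asList("blk_a", "blk_bad", "blk_c"));
        // Without a retry cap the loop spins on blk_bad until the budget runs out.
        System.out.println(scan(unbounded, "blk_bad", Integer.MAX_VALUE, 1000));

        Deque<String> capped =
            new ArrayDeque<>(Arrays.asList("blk_a", "blk_bad", "blk_c"));
        // With a cap of two retries the bad block is skipped and the scan finishes.
        System.out.println(scan(capped, "blk_bad", 2, 1000));
    }
}
```

In the sketch, the only thing that separates the spinning run from the terminating one is whether a failing block is eventually taken off the queue. Whether the real scanner needs a retry cap, a delay, or a different fix is a question for the issue itself.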