Hi, mailing list: I use Scribe to receive data from our app and write it to Hadoop HDFS. When the system is under high connection concurrency, it causes HDFS errors like the following; incoming connections get blocked, and Tomcat dies.
In the directory /user/hive/warehouse/dsp.db/request, the file data_00000 is rotated each hour. Our modified Scribe switches to the same file when the rotation happens, so data_00000 is closed and then reopened for append. When the load is high, I can observe corrupt replicas of data_00000. How can I handle this? Thanks.

[Thu Feb 13 23:59:59 2014] "[hdfs] disconnected fileSys for /user/hive/warehouse/dsp.db/request"
[Thu Feb 13 23:59:59 2014] "[hdfs] closing /user/hive/warehouse/dsp.db/request/2014-02-13/data_00000"
[Thu Feb 13 23:59:59 2014] "[hdfs] disconnecting fileSys for /user/hive/warehouse/dsp.db/request/2014-02-13/data_00000"
[Thu Feb 13 23:59:59 2014] "[hdfs] disconnected fileSys for /user/hive/warehouse/dsp.db/request/2014-02-13/data_00000"
[Thu Feb 13 23:59:59 2014] "[hdfs] Connecting to HDFS for /user/hive/warehouse/dsp.db/request/2014-02-13/data_00000"
[Thu Feb 13 23:59:59 2014] "[hdfs] opened for append /user/hive/warehouse/dsp.db/request/2014-02-13/data_00000"
[Thu Feb 13 23:59:59 2014] "[dsp_request] Opened file </user/hive/warehouse/dsp.db/request/2014-02-13/data_00000> for writing"
[Thu Feb 13 23:59:59 2014] "[dsp_request] 23:59 rotating file <2014-02-13/data> old size <10027577955> max size <10000000000>"
[Thu Feb 13 23:59:59 2014] "[hdfs] Connecting to HDFS for /user/hive/warehouse/dsp.db/request"
[Thu Feb 13 23:59:59 2014] "[hdfs] disconnecting fileSys for /user/hive/warehouse/dsp.db/request"

14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.13:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.13:50010, 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.13:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.10:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.10:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.15:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.15:50010: bad datanode 192.168.11.15:50010

/user/hive/warehouse/dsp.db/request/2014-02-13/data_00000: blk_433572108425800355_3411509 (replicas: l: 1 d: 0 c: 4 e: 0)
    192.168.11.12:50010 :
    192.168.11.13:50010(corrupt) :
    192.168.11.14:50010(corrupt) :
    192.168.11.10:50010(corrupt) :
    192.168.11.15:50010(corrupt) :
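To clarify what our modified rotation does, here is a minimal local-file sketch of the pattern: on rotation the writer closes the file and reopens the *same* path in append mode, so in the real deployment every hourly rotation re-enters the HDFS append/pipeline-setup path seen in the log above. The class name `RotatingWriter` is hypothetical; the actual code goes through libhdfs against HDFS, not local files.

```python
import os
import tempfile

class RotatingWriter:
    """Hypothetical sketch of the close/reopen-for-append rotation
    described above (the real writer is Scribe writing via libhdfs)."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "a")  # corresponds to "opened for append"

    def write(self, record):
        self.f.write(record + "\n")

    def rotate(self):
        # "[hdfs] closing ..." then reopen the same file for append,
        # rather than switching to a new file name.
        self.f.close()
        self.f = open(self.path, "a")  # append, not truncate

    def close(self):
        self.f.close()

# Demonstrate that records written before and after a rotation
# land in the same file.
path = os.path.join(tempfile.mkdtemp(), "data_00000")
w = RotatingWriter(path)
w.write("before rotate")
w.rotate()
w.write("after rotate")
w.close()
```

After this runs, data_00000 contains both records; in our setup it is this repeated close/append cycle, under heavy load, that coincides with the corrupt-replica reports.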