Hi,

We are facing problems with really slow HBase region server recoveries, on
the order of 20 minutes. Version is HBase 0.94.3 compiled with
hadoop.profile=2.0.

Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched in and the
stale node timeouts configured correctly. Time for dead node detection is
still 10 minutes.
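
For context, the 10-minute figure is consistent with how the NameNode
derives its dead-node timeout: twice the heartbeat recheck interval plus ten
heartbeat intervals. A rough sketch of that arithmetic, assuming the stock
defaults (dfs.namenode.heartbeat.recheck-interval = 300000 ms and
dfs.heartbeat.interval = 3 s); the function name is only for illustration:

```python
def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Approximate NameNode dead-node timeout in milliseconds.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    The default arguments are assumed stock HDFS values, not values
    read from our cluster.
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


timeout_ms = dead_node_timeout_ms()
print(timeout_ms / 60_000)  # timeout in minutes; 10.5 with the defaults
```

Stale-node detection is a separate, much shorter timer, which is why it
should help here without waiting for the full dead-node timeout.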

We see that our region server is trying to read an HLog and is stuck there
for a long time. Logs here:

2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /10.156.194.251:50010 for file
/hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599
for block
BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException:
15000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818remote=/
10.156.194.251:50010]

I would think that HDFS-3703 would make the client fail fast and move on to
the third datanode. Currently, the recovery seems way too slow for
production usage...

Varun
