Hey Varun,

Could you please share the logs and the configuration (hdfs / hbase
settings + cluster description)? What's the failure scenario?
From an HDFS pov, HDFS-3703 does not change the dead node status, but these
nodes will be given the lowest priority when reading.
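
For reference, these are the stale-node settings I'd expect to see in
hdfs-site.xml (assuming the standard Hadoop 2.x property names; the values
below are illustrative, not a recommendation):

```xml
<!-- hdfs-site.xml: stale datanode handling -->
<property>
  <!-- deprioritize stale nodes when choosing read replicas (HDFS-3703) -->
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <!-- avoid stale nodes when allocating write pipelines (HDFS-3912) -->
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <!-- mark a datanode stale after 30s without a heartbeat (milliseconds) -->
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>
```

Note that "stale" only affects replica ordering and placement; a node is not
declared dead until the much longer dead-node interval expires.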


Cheers,

Nicolas


On Fri, Apr 19, 2013 at 3:01 AM, Varun Sharma <va...@pinterest.com> wrote:

> Hi,
>
> We are facing problems with really slow HBase region server recoveries
> (~20 minutes). Version is HBase 0.94.3 compiled with hadoop.profile=2.0.
>
> Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched and stale
> node timeouts configured correctly. Time for dead node detection is still
> 10 minutes.
>
> We see that our region server, while trying to read an HLog, is stuck
> there for a long time. Logs here:
>
> 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> connect to /10.156.194.251:50010 for file
>
> /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599
> for block
>
> BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException:
> 15000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818
> remote=/
> 10.156.194.251:50010]
>
> I would think that HDFS-3703 would make the server fail fast and go to the
> third datanode. Currently, the recovery seems way too slow for production
> usage...
>
> Varun
>
