I ran across a mailing list posting from January 4, 2009 that seemed to
indicate increasing dfs.datanode.handler.count could help improve DFS
stability
(http://mail-archives.apache.org/mod_mbox/hbase-user/200901.mbox/%[email protected]%3E
). The posting seems to indicate the wiki was updated, but I don't see
anything in the wiki about increasing dfs.datanode.handler.count. I have seen
a few other notes with examples that raise dfs.datanode.handler.count,
including an Intel article
(http://software.intel.com/en-us/articles/hadoop-and-hbase-optimization-for-read-intensive-search-applications/
) and the Pro Hadoop book. Beyond those, the only other mention I see is a
Cloudera blog post that seems lukewarm on increasing
dfs.datanode.handler.count
(http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
).
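For reference, here is roughly what I assume the change would look like in
hdfs-site.xml on each datanode (the value of 10 below is just an example, not
a recommendation; I believe the default is 3):

<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>

I assume the datanodes would need to be restarted to pick this up.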
Given the post is from 2009, I thought I'd ask whether anyone has had success
improving HBase/DFS stability by increasing dfs.datanode.handler.count.
The specific error we are seeing somewhat frequently (a few hundred times per
day) in the datanode logs is as follows:
2011-04-09 00:12:48,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.18.0.33:50010,
storageID=DS-1501576934-10.18.0.33-50010-1296248656454, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-163126943925471435_28809750 is not valid.
The above seems to correspond to ClosedChannelExceptions in the HBase region
server logs, as well as some warnings about long writes to the HLog (some
taking 50+ seconds).
The biggest end-user-facing issue we are seeing is that TaskTrackers keep
getting blacklisted. It's quite possible our problem is unrelated to HBase,
but I thought it was worth asking given what we've been seeing.
We are currently running HBase 0.91 on an 18-node cluster with ~3k total
regions, and each region server is running with 2G of memory.
Any insight would be appreciated.
Thanks
Andy