Can you try starting any of the regionservers that are not connecting at
all? Maybe start 2 of them.
Observe the master logs. See whether they say
'Waiting for RegionServers to checkin'.

Also, just to confirm: are your ZK IP and port correct throughout the
cluster? If this is a multitenant cluster, maybe the other regionservers
are connecting to some other ZK cluster?
Wild guess :)
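
If it helps, here is a rough sketch of how the ZK settings could be compared
across nodes. It assumes you have already copied each node's hbase-site.xml
text into a dict keyed by hostname. The function names are made up for
illustration. The two property names are the standard HBase ones, and the
fallbacks (localhost, 2181) are the HBase defaults when a property is unset.

```python
import xml.etree.ElementTree as ET
from collections import Counter

QUORUM_KEY = "hbase.zookeeper.quorum"
PORT_KEY = "hbase.zookeeper.property.clientPort"


def zk_settings(xml_text):
    """Return (sorted quorum hosts, client port) parsed from hbase-site.xml text."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    # Fall back to the HBase defaults when a property is not set.
    raw_quorum = props.get(QUORUM_KEY, "localhost")
    quorum = tuple(sorted(h.strip() for h in raw_quorum.split(",") if h.strip()))
    port = props.get(PORT_KEY, "2181")
    return quorum, port


def find_mismatches(configs):
    """configs maps hostname -> hbase-site.xml text (hypothetical input).

    Returns the hosts whose ZK settings differ from the most common one.
    """
    settings = {host: zk_settings(text) for host, text in configs.items()}
    expected, _ = Counter(settings.values()).most_common(1)[0]
    return {host: s for host, s in settings.items() if s != expected}
```

Any host returned by find_mismatches() is pointing at a different quorum or
port than the majority of the cluster, which would be the first thing to fix.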

Regards
Ram
> -----Original Message-----
> From: Dan Brodsky [mailto:danbrod...@gmail.com]
> Sent: Wednesday, October 17, 2012 6:31 PM
> To: user@hbase.apache.org
> Subject: Regionservers not connecting to master
> 
> Good morning,
> 
> I have a 10 node Hadoop/Hbase cluster, plus a namenode VM, plus three
> Zookeeper quorum peers (one on the namenode, one on a dedicated ZK
> peer VM, and one on a third box). All 10 HDFS datanodes are also Hbase
> regionservers.
> 
> Several weeks ago, we had six HDFS datanodes go offline suddenly (with
> no meaningful error messages), and since then, I have been unable to
> get all 10 regionservers to connect to the Hbase master. I've tried
> bringing the cluster down and rebooting all the boxes, but no joy. The
> machines are all running, and hbase-regionserver appears to start
> normally on each one.
> 
> Right now, my master status page (http://namenode:60010) shows 3
> regionservers online. There are also dozens of regions in transition
> listed on the status page (in the PENDING_OPEN state), but each of
> those are on one of the regionservers already online.
> 
> The 7 other regionservers' log files show a successful connection to
> one ZK peer, followed by a regular trail of these messages:
> 
> 2012-10-17 12:36:08,394 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=8.17
> MB, free=987.67 MB, max=995.84 MB, blocks=0, accesses=0, hits=0,
> hitRatio=0cachingAccesses=0, cachingHits=0,
> cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
> 
> If I had to wager a guess, it seems like the 7 offline regionservers
> are not connecting to other ZK peers, but there isn't anything in the
> ZK logs to indicate why.
> 
> Thoughts?
> 
> Dan
