Just check your /etc/hosts files on each node. I have not worked with VMs, so I can't pinpoint the problem more precisely.
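In case it helps, here is a minimal sketch of that check in Python. The peer hostnames and port 2181 (ZooKeeper's usual client port) are placeholders I am assuming, not values from your configuration, so substitute whatever is in your hbase.zookeeper.quorum setting. Run it on a regionserver that will not come online: it resolves each ZK peer the way the local resolver (including /etc/hosts) does, flags names that map to a loopback address, and then tries a plain TCP connection.

```python
import socket

# Hypothetical hostnames; replace with the entries from hbase.zookeeper.quorum.
ZK_PEERS = ["namenode", "zk-peer", "third-box"]
ZK_CLIENT_PORT = 2181  # assumed default ZooKeeper client port


def check_peer(host, port=ZK_CLIENT_PORT, timeout=3.0):
    """Resolve host, then attempt a TCP connection; return (address, status)."""
    try:
        addr = socket.gethostbyname(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return None, f"cannot resolve: {exc}"
    if addr.startswith("127."):
        # A peer name mapped to loopback usually points at a bad /etc/hosts entry.
        return addr, "resolves to loopback; check /etc/hosts"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return addr, "ok"
    except OSError as exc:
        return addr, f"resolved but cannot connect: {exc}"


if __name__ == "__main__":
    for peer in ZK_PEERS:
        addr, status = check_peer(peer)
        print(f"{peer:15} {addr or '-':15} {status}")
```

If a peer resolves to different addresses on different nodes, or connects from the master's VM but not from a regionserver, that would point to name resolution or network connectivity rather than HBase itself.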

Regards
Ram

> -----Original Message-----
> From: Dan Brodsky [mailto:danbrod...@gmail.com]
> Sent: Wednesday, October 17, 2012 11:05 PM
> To: user@hbase.apache.org
> Subject: Re: Regionservers not connecting to master
>
> Well, slight change: only 1 of the ZK peers happens to work. When a RS
> connects to the other 2, it doesn't go further than that. The 1 ZK
> node that happens to work is the one that runs on the same VM as the
> master.
>
> Sounds like it could be network connectivity issues, so I'm going to
> investigate that a bit further, but other suggestions are welcome.
>
>
> On Wed, Oct 17, 2012 at 1:29 PM, Dan Brodsky <danbrod...@gmail.com> wrote:
> > Ram,
> >
> > Thanks for your suggestions.
> >
> > The datanodes are all built using the same image, so I know they're
> > all pointed to the same ZK nodes.
> >
> > I monitored all three ZK logs, the master log, and the regionserver
> > log for each RS I was trying to bring back online. I'm glad I have a
> > big screen. :-) Here is what I found:
> >
> > Whenever a regionserver connects to one particular ZK peer *first*,
> > it never goes online. The ZK log shows a successful connection
> > negotiating a timeout value, and the RS's log shows a successful ZK
> > connection, but then it just sits there.
> >
> > When a regionserver starts up and connects to one of the other two ZK
> > peers first, it connects to a second one successfully, then contacts
> > the master, and it comes up and all is happy.
> >
> > So the problem of regionservers not connecting to master only happens
> > when the RS tries one particular ZK node as its first ZK connection.
> > But the logs aren't helpful for diagnosing further than that.
> >
> > Additional thoughts?
> >
> >
> > On Wed, Oct 17, 2012 at 9:12 AM, Ramkrishna.S.Vasudevan
> > <ramkrishna.vasude...@huawei.com> wrote:
> >> Can you try starting any of the regionservers that are not
> >> connecting at all? Maybe start 2 of them.
> >> Observe the master logs. See whether they say
> >> 'Waiting for RegionServers to checkin'.
> >>
> >> Just to confirm: are your ZK IP and port correct throughout the
> >> cluster? If it is a multitenant cluster, maybe the other
> >> regionservers are connecting to some other ZK cluster?
> >> Wild guess :)
> >>
> >> Regards
> >> Ram
> >>> -----Original Message-----
> >>> From: Dan Brodsky [mailto:danbrod...@gmail.com]
> >>> Sent: Wednesday, October 17, 2012 6:31 PM
> >>> To: user@hbase.apache.org
> >>> Subject: Regionservers not connecting to master
> >>>
> >>> Good morning,
> >>>
> >>> I have a 10 node Hadoop/HBase cluster, plus a namenode VM, plus
> >>> three ZooKeeper quorum peers (one on the namenode, one on a
> >>> dedicated ZK peer VM, and one on a third box). All 10 HDFS
> >>> datanodes are also HBase regionservers.
> >>>
> >>> Several weeks ago, we had six HDFS datanodes go offline suddenly
> >>> (with no meaningful error messages), and since then, I have been
> >>> unable to get all 10 regionservers to connect to the HBase master.
> >>> I've tried bringing the cluster down and rebooting all the boxes,
> >>> but no joy. The machines are all running, and hbase-regionserver
> >>> appears to start normally on each one.
> >>>
> >>> Right now, my master status page (http://namenode:60010) shows 3
> >>> regionservers online. There are also dozens of regions in
> >>> transition listed on the status page (in the PENDING_OPEN state),
> >>> but each of those is on one of the regionservers already online.
> >>>
> >>> The 7 other regionservers' log files show a successful connection
> >>> to one ZK peer, followed by a regular trail of these messages:
> >>>
> >>> 2012-10-17 12:36:08,394 DEBUG
> >>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats:
> >>> total=8.17 MB, free=987.67 MB, max=995.84 MB, blocks=0,
> >>> accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0,
> >>> cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
> >>>
> >>> If I had to wager a guess, it seems like the 7 offline
> >>> regionservers are not connecting to other ZK peers, but there
> >>> isn't anything in the ZK logs to indicate why.
> >>>
> >>> Thoughts?
> >>>
> >>> Dan