Re: Regionservers not connecting to master

Kevin O'dell Fri, 02 Nov 2012 11:23:24 -0700

Do you use Puppet?

On Fri, Nov 2, 2012 at 1:13 PM, Dan Brodsky <[email protected]> wrote:


> Ram,
>
> I wanted to follow up with you, since your comment below helped me.
>
> It turns out that the ZK configuration files somehow got changed (reverted
> to their default values?), and I'm not sure who/when/how. The zoo.cfg files
> didn't have the list of quorum peers, and the myid files that told each ZK
> peer their ordinal value had been deleted. So, effectively, I had three ZK
> standalone servers, instead of one quorum.
>
> Problem fixed; HBase is happy again.
>
> Cheers,
>
> Dan
>
>
>
> On Wed, Oct 17, 2012 at 9:12 AM, Ramkrishna.S.Vasudevan <
> [email protected]> wrote:
>
> > Can you try starting any of the regionservers that are not connecting
> > at all? Maybe start 2 of them, then observe the master logs and see
> > whether they say 'Waiting for RegionServers to checkin'.
> >
> > Also, just to confirm: are your ZK IP and port correct throughout the
> > cluster? If this is a multitenant cluster, maybe the other regionservers
> > are connecting to some other ZK cluster?
> > Wild guess :)
> >
> > Regards
> > Ram
> > > -----Original Message-----
> > > From: Dan Brodsky [mailto:[email protected]]
> > > Sent: Wednesday, October 17, 2012 6:31 PM
> > > To: [email protected]
> > > Subject: Regionservers not connecting to master
> > >
> > > Good morning,
> > >
> > > I have a 10-node Hadoop/HBase cluster, plus a namenode VM, plus three
> > > ZooKeeper quorum peers (one on the namenode, one on a dedicated ZK
> > > peer VM, and one on a third box). All 10 HDFS datanodes are also HBase
> > > regionservers.
> > >
> > > Several weeks ago, six HDFS datanodes suddenly went offline (with
> > > no meaningful error messages), and since then I have been unable to
> > > get all 10 regionservers to connect to the HBase master. I've tried
> > > bringing the cluster down and rebooting all the boxes, but no joy. The
> > > machines are all running, and hbase-regionserver appears to start
> > > normally on each one.
> > >
> > > Right now, my master status page (http://namenode:60010) shows 3
> > > regionservers online. There are also dozens of regions in transition
> > > listed on the status page (in the PENDING_OPEN state), but each of
> > > those is on one of the regionservers already online.
> > >
> > > The 7 other regionservers' log files show a successful connection to
> > > one ZK peer, followed by a regular trail of these messages:
> > >
> > > 2012-10-17 12:36:08,394 DEBUG
> > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=8.17
> > > MB, free=987.67 MB, max=995.84 MB, blocks=0, accesses=0, hits=0,
> > > hitRatio=0cachingAccesses=0, cachingHits=0,
> > > cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
> > >
> > > If I had to wager a guess, it seems like the 7 offline regionservers
> > > are not connecting to other ZK peers, but there isn't anything in the
> > > ZK logs to indicate why.
> > >
> > > Thoughts?
> > >
> > > Dan
> >
> >
>
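The failure Dan found (zoo.cfg files missing their server.N quorum lines and the myid files deleted, so each peer ran as a standalone server) can be caught with a quick per-node check. What follows is a minimal, hypothetical sketch, not part of ZooKeeper or HBase: it assumes you have already copied each peer's zoo.cfg and myid contents into strings (how you fetch them, by ssh or Puppet or otherwise, is left out), and the hostnames zk1-zk3 and the dataDir path are placeholders. A second helper covers Ram's suggestion by comparing the hbase.zookeeper.quorum value collected from each regionserver.

```python
import re
from collections import Counter

# Matches quorum lines such as "server.1=zk1:2888:3888" (peer port, election port).
SERVER_LINE = re.compile(r"^server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)")


def parse_zoo_cfg(text):
    """Split zoo.cfg contents into plain settings and {server_id: host}."""
    settings, servers = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SERVER_LINE.match(line)
        if match:
            servers[int(match.group(1))] = match.group(2)
        elif "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings, servers


def check_node(zoo_cfg_text, myid_text):
    """List problems that would keep this peer out of a multi-node quorum.

    myid_text is the contents of <dataDir>/myid, or None if the file is gone.
    """
    problems = []
    settings, servers = parse_zoo_cfg(zoo_cfg_text)
    if not servers:
        problems.append("no server.N entries: node will run standalone")
    if "dataDir" not in settings:
        problems.append("dataDir is not set")
    if myid_text is None:
        problems.append("myid file is missing")
    else:
        try:
            myid = int(myid_text.strip())
        except ValueError:
            problems.append("myid is not an integer: %r" % myid_text)
        else:
            if servers and myid not in servers:
                problems.append("myid %d has no matching server.%d line" % (myid, myid))
    return problems


def quorum_mismatches(quorum_by_host):
    """Given {host: hbase.zookeeper.quorum value}, return the hosts whose
    (order-insensitive) quorum list differs from the most common one."""
    if not quorum_by_host:
        return []
    normalized = {
        host: ",".join(sorted(p.strip() for p in value.split(",") if p.strip()))
        for host, value in quorum_by_host.items()
    }
    majority, _ = Counter(normalized.values()).most_common(1)[0]
    return sorted(host for host, q in normalized.items() if q != majority)


# Hypothetical sample data; hostnames and paths are placeholders.
good_cfg = """
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
"""
default_cfg = "tickTime=2000\ndataDir=/var/lib/zookeeper\nclientPort=2181\n"

print(check_node(good_cfg, "2"))       # []
print(check_node(default_cfg, None))   # reports standalone mode and missing myid
print(quorum_mismatches({"rs1": "zk1,zk2,zk3", "rs2": "zk3,zk2,zk1", "rs3": "zk9"}))  # ['rs3']
```

A peer with a default zoo.cfg still starts and accepts client connections, which may be why nothing obviously failed in the ZK logs; a check like this only helps if it is run against every peer and every regionserver.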



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
