Nope. I'm honestly not sure how the files changed, but I will keep an eye on it.
On Fri, Nov 2, 2012 at 2:22 PM, Kevin O'dell <[email protected]> wrote:

> Do you use Puppet?
>
> On Fri, Nov 2, 2012 at 1:13 PM, Dan Brodsky <[email protected]> wrote:
>
> > Ram,
> >
> > I wanted to follow up with you since you helped me with your below
> > comment.
> >
> > It turns out that the ZK configuration files somehow got changed
> > (reverted to their default values?), and I'm not sure who/when/how.
> > The zoo.cfg files didn't have the list of quorum peers, and the myid
> > files that told each ZK peer their ordinal value had been deleted.
> > So, effectively, I had three ZK standalone servers, instead of one
> > quorum.
> >
> > Problem fixed, Hbase is happy again.
> >
> > Cheers,
> >
> > Dan
> >
> > On Wed, Oct 17, 2012 at 9:12 AM, Ramkrishna.S.Vasudevan
> > <[email protected]> wrote:
> >
> > > Can you try like start any of the regionservers that are not
> > > connecting at all. May be start 2 of them.
> > > Observer master logs. See whether it says
> > > 'Waiting for RegionServers to checkin'?.
> > >
> > > Just to confirm your ZK ip and port is correct thro out the
> > > cluster? If multitenant cluster then you may be the other
> > > regionservers are connecting to someother ZK cluster?
> > > Wild guess :)
> > >
> > > Regards
> > > Ram
> > >
> > > > -----Original Message-----
> > > > From: Dan Brodsky [mailto:[email protected]]
> > > > Sent: Wednesday, October 17, 2012 6:31 PM
> > > > To: [email protected]
> > > > Subject: Regionservers not connecting to master
> > > >
> > > > Good morning,
> > > >
> > > > I have a 10 node Hadoop/Hbase cluster, plus a namenode VM, plus
> > > > three Zookeeper quorum peers (one on the namenode, one on a
> > > > dedicated ZK peer VM, and one on a third box). All 10 HDFS
> > > > datanodes are also Hbase regionservers.
> > > >
> > > > Several weeks ago, we had six HDFS datanodes go offline suddenly
> > > > (with no meaningful error messages), and since then, I have been
> > > > unable to get all 10 regionservers to connect to the Hbase
> > > > master. I've tried bringing the cluster down and rebooting all
> > > > the boxes, but no joy. The machines are all running, and
> > > > hbase-regionserver appears to start normally on each one.
> > > >
> > > > Right now, my master status page (http://namenode:60010) shows 3
> > > > regionservers online. There are also dozens of regions in
> > > > transition listed on the status page (in the PENDING_OPEN
> > > > state), but each of those are on one of the regionservers
> > > > already online.
> > > >
> > > > The 7 other regionservers' log files show a successful
> > > > connection to one ZK peer, followed by a regular trail of these
> > > > messages:
> > > >
> > > > 2012-10-17 12:36:08,394 DEBUG
> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats:
> > > > total=8.17 MB, free=987.67 MB, max=995.84 MB, blocks=0,
> > > > accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0,
> > > > cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
> > > >
> > > > If I had to wager a guess, it seems like the 7 offline
> > > > regionservers are not connecting to other ZK peers, but there
> > > > isn't anything in the ZK logs to indicate why.
> > > >
> > > > Thoughts?
> > > >
> > > > Dan
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
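
For anyone who lands on this thread with the same symptom, the failure described above (zoo.cfg without its server.N lines, myid files gone) can be checked for with a few lines of Python. This is only a rough sketch and not what was actually run on this cluster: the function names parse_quorum_peers and check_peer are made up for illustration, and the host names in the usage note are placeholders. It assumes the usual ZooKeeper layout, where each quorum peer is listed as server.N=host:port:port in zoo.cfg and each peer's dataDir holds a myid file containing its own N.

```python
import re

# Matches quorum entries such as "server.1=zk1:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)\s*=\s*(\S+)$")


def parse_quorum_peers(zoo_cfg_text):
    """Return {server_id: address} for every server.N line in a zoo.cfg."""
    peers = {}
    for raw in zoo_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SERVER_LINE.match(line)
        if match:
            peers[int(match.group(1))] = match.group(2)
    return peers


def check_peer(zoo_cfg_text, myid_text):
    """Return a list of problems for one peer; an empty list means it looks sane.

    myid_text is the content of the peer's myid file, or None if it is missing.
    """
    problems = []
    peers = parse_quorum_peers(zoo_cfg_text)
    if not peers:
        problems.append("no server.N entries: peer would run standalone")
    if myid_text is None:
        problems.append("myid file missing")
    else:
        try:
            my_id = int(myid_text.strip())
        except ValueError:
            problems.append("myid is not an integer")
        else:
            if peers and my_id not in peers:
                problems.append(
                    f"myid {my_id} has no matching server.{my_id} entry"
                )
    return problems
```

Usage would be something like check_peer(open("zoo.cfg").read(), open("myid").read()) on each ZK box, with the real paths filled in. A peer in the state described above (default zoo.cfg, no myid) would report both the missing server.N entries and the missing myid file.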
