Re: cold restart/region servers issue

2010-10-22 Thread Jack Levin
lso I expect.
>
> You said your ZooKeeper ensemble peer was unhappy? Can we see the logs? Did
> you report this to the ZK guys?
>
> Best regards,
>
>    - Andy
>
>
> --- On Fri, 10/22/10, Jack Levin wrote:
>
>> From: Jack Levin
>> Subject: R

Re: cold restart/region servers issue

2010-10-22 Thread Andrew Purtell
: cold restart/region servers issue
> To: user@hbase.apache.org
> Date: Friday, October 22, 2010, 1:31 PM
> one of my zookeepers was unhappy, and
> did not report /hbase directory,
> I shut it down, and things started to work much better.
>
> -Jack
>
> On Fri, Oct 22, 2010 at

Re: cold restart/region servers issue

2010-10-22 Thread Jack Levin
One of my zookeepers was unhappy and did not report the /hbase directory. I shut it down, and things started to work much better.

-Jack

On Fri, Oct 22, 2010 at 10:56 AM, Stack wrote:
> Hmm... does it emit that message once or continuously.  In log we emit
> the ensemble we're trying to contact.  Do
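
A quick way to spot an unresponsive ensemble peer like the one described above is to send each ZooKeeper server the four-letter command "ruok" on its client port and compare the replies. The following is only a minimal sketch: the host names and port 2181 are placeholders, not values from this thread, and newer ZooKeeper releases may require the command to be whitelisted in the server config. A reply of "imok" only shows that the server process is answering; it does not show that the peer is in quorum or that it can see the /hbase znode.

```python
import socket


def four_letter_word(host, port, cmd="ruok", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def check_ensemble(peers, probe=four_letter_word):
    """Return {"host:port": reply or error string} for each (host, port) peer."""
    results = {}
    for host, port in peers:
        key = f"{host}:{port}"
        try:
            results[key] = probe(host, port, "ruok")
        except OSError as exc:  # refused connections and timeouts land here
            results[key] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical peer list; replace with the hosts in hbase.zookeeper.quorum.
    peers = [("zk1.example.com", 2181), ("zk2.example.com", 2181)]
    for peer, reply in check_ensemble(peers).items():
        print(peer, "->", reply)
```

A peer that answers with anything other than "imok", or that shows up as an error, is the one to look at first. Sending "stat" instead of "ruok" through four_letter_word returns more detail, such as whether the peer is a leader or follower.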

Re: cold restart/region servers issue

2010-10-22 Thread Stack
Hmm... does it emit that message once or continuously? In the log we emit the ensemble we're trying to contact. Does it look correct? When the machine is having this issue next time, try running the zk cmdline client and see if you can see a znode at /hbase/master:

$ ./bin/hbase org.apache.zookeepe

Re: cold restart/region servers issue

2010-10-22 Thread Jack Levin
Same ZK all the time; a restart of the regionserver clears the issue. I even see them talking to ZK via tcpdump. Is there a way to enable debug log output on ZK to see what might be going on?

-Jack

On Fri, Oct 22, 2010 at 10:28 AM, Stack wrote:
> Are they pointed to the same zk ensemble as the other

Re: cold restart/region servers issue

2010-10-22 Thread Stack
Are they pointed to the same zk ensemble as the other 22 servers? That is, are they running with the same config? The below complaint is that the regionserver is not seeing master register, perhaps because they are homed at the wrong location in zk or because they are going to a different zk? St.A

cold restart/region servers issue

2010-10-22 Thread Jack Levin
I have 30 region servers. After a cold restart (master, zookeepers, and all regionservers), 22 regionservers start, but the other 8 have the following errors. Any idea how to debug this? Is zookeeper giving the RS the wrong msg? Can I log it via tcpdump maybe?

2010-10-22 08:32:42,035 WARN org.apache.hadoop