Trying to decode what the exceptions mean without any context is extremely hard. Your configuration looks good except for:

<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>192.168.1.122</value>
</property>

It expects an interface name (like eth0), not an IP address. And setting this alone:

<property>
  <name>hbase.zookeeper.dns.nameserver</name>
  <value>192.168.1.122</value>
</property>

won't work either; you also need to set the interface.
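A minimal sketch of the corrected properties for hbase-site.xml. "eth0" is an assumed interface name (check the real one with ifconfig on each node), and keeping 192.168.1.122 as the nameserver only makes sense if that host really runs the DNS mentioned later in this thread:

<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth0</value> <!-- an interface name, not an IP; "eth0" is an assumption -->
</property>
<property>
  <name>hbase.zookeeper.dns.interface</name>
  <value>eth0</value> <!-- the interface setting that must accompany the nameserver -->
</property>
<property>
  <name>hbase.zookeeper.dns.nameserver</name>
  <value>192.168.1.122</value> <!-- host of the DNS server to consult -->
</property>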
Let's try something: stop all the processes (kill -9 if needed), then wipe out the logs. Start anew, zip all the logs, and send them to me directly.

J-D
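A sketch of that clean-restart procedure, assuming the ~/hbase install directory shown later in the thread and the default log location ~/hbase/logs (both paths are assumptions; adjust to the actual layout):

cd ~/hbase
bin/stop-hbase.sh               # attempt a clean stop first
jps                             # look for leftover HMaster, HRegionServer, HQuorumPeer JVMs
kill -9 <pid>                   # force-kill each leftover HBase process
rm logs/*                       # wipe the old logs
bin/start-hbase.sh              # start fresh
zip -r hbase-logs.zip logs/     # bundle the new logs to send

Run the jps/kill step on every node (test1, s1.idfs.cn, s2.idfs.cn), not just the master.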
On Thu, Jun 24, 2010 at 1:42 AM, 梁景明 <[email protected]> wrote:
> I don't know how to describe my situation any better. I just want to
> restart successfully and get my data back.
>
> 1. bin/start-hbase.sh shows everything running.
> 2. bin/stop-hbase.sh can't stop it normally.
> 3. Sometimes the regionserver can't be seen. After I kill the master
> process and rerun bin/start-hbase.sh, it shows OK, but the master
> doesn't work.
> 4. Hadoop HDFS runs fine, and on port 50070 I can read the /hbase folders.
> 5. Here is my hbase-site.xml. test1 and s1.idfs.cn are the same IP,
> 192.168.1.122. I first set s1.idfs.cn in hbase.zookeeper.quorum, but it
> only knows the hostname test1; s1.idfs.cn comes from my DNS.
>
> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://s1.idfs.cn:9000/hbase</value>
>     <description>The directory shared by region servers.</description>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://s1.idfs.cn:9000</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.dns.nameserver</name>
>     <value>192.168.1.122</value>
>   </property>
>   <property>
>     <name>hbase.regionserver.dns.interface</name>
>     <value>192.168.1.122</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>2222</value>
>     <description>Property from ZooKeeper's config zoo.cfg.
>     The port at which the clients will connect.</description>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>test1</value>
>   </property>
> </configuration>
>
> The regionservers file is:
> s1.idfs.cn
> s2.idfs.cn
>
> HBase ran OK the first time, and I created tables and inserted data.
>
> 6. I tried bin/zkCli.sh -server 192.168.1.122:2222 to look at /hbase in
> ZooKeeper; maybe there is some useful info for you. Thanks.
>
> [zk: 192.168.1.122:2222(CONNECTED) 0] ls /
> [hbase, zookeeper]
> [zk: 192.168.1.122:2222(CONNECTED) 16] ls /hbase
> [safe-mode, root-region-server, rs, master, shutdown]
>
> See hbase in /.
>
> [zk: 192.168.1.122:2222(CONNECTED) 10] get /hbase/master
> 192.168.1.122:60000
> cZxid = 0x1c
> ctime = Thu Jun 24 14:39:21 CST 2010
> mZxid = 0x1c
> mtime = Thu Jun 24 14:39:21 CST 2010
> pZxid = 0x1c
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x12968ae99ca0000
> dataLength = 19
> numChildren = 0
>
> That's my master, 192.168.1.122.
>
> [zk: 192.168.1.122:2222(CONNECTED) 14] get /hbase/root-region-server
> 192.168.1.123:60020
> cZxid = 0xa
> ctime = Thu Jun 24 10:38:00 CST 2010
> mZxid = 0x25
> mtime = Thu Jun 24 14:39:31 CST 2010
> pZxid = 0xa
> cversion = 0
> dataVersion = 1
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 19
> numChildren = 0
>
> I set two region servers, but here there is just one.
>
> [zk: 192.168.1.122:2222(CONNECTED) 11] get /hbase/shutdown
> up
> cZxid = 0x1d
> ctime = Thu Jun 24 14:39:21 CST 2010
> mZxid = 0x1d
> mtime = Thu Jun 24 14:39:21 CST 2010
> pZxid = 0x1d
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 2
> numChildren = 0
>
> [zk: 192.168.1.122:2222(CONNECTED) 12] get /hbase/rs
>
> cZxid = 0x6
> ctime = Thu Jun 24 10:37:28 CST 2010
> mZxid = 0x6
> mtime = Thu Jun 24 10:37:28 CST 2010
> pZxid = 0x21
> cversion = 6
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 2
>
> [zk: 192.168.1.122:2222(CONNECTED) 19] ls /hbase/safe-mode
> []
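A side observation on that output: /hbase/rs shows numChildren = 2, so both regionservers did register at some point, while /hbase/root-region-server points at 192.168.1.123:60020, the exact address the master fails to reach in the error further down. Listing the children of /hbase/rs shows which regionserver sessions are still live; a minimal sketch, assuming the same zkCli.sh session as above:

[zk: 192.168.1.122:2222(CONNECTED) 20] ls /hbase/rs

Each entry under /hbase/rs is an ephemeral znode owned by a running regionserver, so an entry vanishing means that server's ZooKeeper session has expired.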
> 2010/6/24 梁景明 <[email protected]>
>
>> And more detail: when I kill the HBase processes and restart, the
>> regionserver on 60030 can be seen; it started OK. But the master on
>> 60010 shows the error below, and the /hbase data is still in Hadoop
>> HDFS. That's what I want to say: the /hbase data stays, but I can't
>> find any way to start HBase again.
>>
>> HTTP ERROR: 500
>>
>> Trying to contact region server null for region , row '', but failed
>> after 3 attempts.
>> Exceptions:
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> trying to locate root region because: Failed setting up proxy to
>> /192.168.1.123:60020 after attempts=1
>> [the same exception is printed once per attempt, three times in total]
>>
>> RequestURI=/master.jsp
>> Caused by:
>>
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact region server null for region , row '', but failed after 3
>> attempts.
>> Exceptions: [the same three NoServerForRegionExceptions as above]
>>
>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1055)
>> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:75)
>> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:48)
>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:454)
>> at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
>> at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:132)
>> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
>> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> at org.mortbay.jetty.Server.handle(Server.java:324)
>> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>> at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>> at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>> at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>
>> 2010/6/24 梁景明 <[email protected]>
>>
>>> Exactly like this. It's some problem with ZooKeeper; I am not sure
>>> what happened to ZooKeeper. Everything is started, but ports 60030
>>> and 60010 are not OK.
>>>
>>> ---------------------------------------------------------------------------
>>> futur...@test1:~/hbase$ bin/start-hbase.sh
>>> test1: zookeeper running as process 18596. Stop it first.
>>> master running as process 20047. Stop it first.
>>> s1.idfs.cn: regionserver running as process 18829. Stop it first.
>>> s2.idfs.cn: regionserver running as process 18763. Stop it first.
>>> ---------------------------------------------------------------------------
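Those "Stop it first" messages mean the start scripts found existing pid files and refused to launch new daemons. A quick way to check whether those processes are really alive is sketched below; it assumes the default pid directory /tmp (configurable via HBASE_PID_DIR in hbase-env.sh), and <user> stands for the account running HBase:

ls /tmp/hbase-*.pid                  # pid files the start scripts consult
kill -0 18596 && echo "still alive"  # exit status 0 means the process exists
rm /tmp/hbase-<user>-master.pid      # remove only if the process is truly gone

If the JVMs are alive but wedged, this is where J-D's kill -9 advice applies, before deleting the stale pid files.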
>>>
>>> And the HBase logs give me the following; I don't know how to deal
>>> with it. If ZooKeeper is dead or has some problem, what do I do?
>>> stop-hbase.sh and start-hbase.sh don't work at all.
>>>
>>> ---------------------------------------------------------------------------
>>> 2010-06-24 11:33:29,713 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase -- check quorum servers, currently=test1:2222
>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
>>> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:808)
>>> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
>>> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:432)
>>> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeMasterAddress(ZooKeeperWrapper.java:520)
>>> at org.apache.hadoop.hbase.master.HMaster.writeAddressToZooKeeper(HMaster.java:260)
>>> at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:242)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>> at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1230)
>>> at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1271)
>>> 2010-06-24 11:33:31,202 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server test1/192.168.1.122:2222
>>> 2010-06-24 11:33:31,203 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.1.122:52706 remote=test1/192.168.1.122:2222]
>>> 2010-06-24 11:33:31,203 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>>> 2010-06-24 11:33:31,204 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@163f7a1
>>> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
>>> 2010-06-24 11:33:31,204 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>> java.net.SocketException: Transport endpoint is not connected
>>> at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>> at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2010-06-24 11:33:31,204 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>> java.net.SocketException: Transport endpoint is not connected
>>> at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>> at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> ---------------------------------------------------------------------------
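Every call above dies with ConnectionLoss even though the TCP connect to test1:2222 succeeds, which suggests the ZooKeeper process is up but not actually serving. ZooKeeper answers four-letter admin commands on its client port, so a quick health probe, assuming nc (netcat) is installed, might look like:

echo ruok | nc test1 2222   # a healthy server replies "imok"
echo stat | nc test1 2222   # dumps server state and connected clients

No reply to ruok would point at a broken ZooKeeper rather than an HBase misconfiguration.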
>>>
>>> 2010/6/22 Jean-Daniel Cryans <[email protected]>
>>>
>>>> I'm not sure I understand what you're describing, and since you didn't
>>>> post any output from your logs, it's really hard to help you debug.
>>>>
>>>> What's the problem exactly, and do you see any exceptions in the logs?
>>>>
>>>> J-D
>>>>
>>>> On Mon, Jun 21, 2010 at 2:48 AM, 梁景明 <[email protected]> wrote:
>>>> > After reading "Description of how HBase uses ZooKeeper", I see that
>>>> > my problem may be that the regionserver's session in ZK is lost!
>>>> >
>>>> > And bin/start-hbase.sh can't start HBase successfully,
>>>> > because something they connect to in ZooKeeper is lost?
>>>> >
>>>> > To start it, one way I can think of is: start ZooKeeper alone, delete
>>>> > "/hbase" in it, and run the start-hbase.sh script again.
>>>> >
>>>> > Will that be OK?
>>>> >
>>>> > 2010/6/19 Jean-Daniel Cryans <[email protected]>
>>>> >
>>>> >> > Do you mean that if ZooKeeper is dead, the data will be lost?
>>>> >>
>>>> >> If your ZooKeeper ensemble is dead, then HBase will be unavailable,
>>>> >> but you won't lose any data. And even if your ZooKeeper data is
>>>> >> wiped out, like I said, it's only runtime data, so it doesn't matter.
>>>> >>
>>>> >> > In that case, when ZooKeeper loses .META. or -ROOT-, the data in
>>>> >> > Hadoop will never be recovered, though there are some table
>>>> >> > folders in Hadoop.
>>>> >>
>>>> >> HBase stores the location of -ROOT- in ZooKeeper, and that's changed
>>>> >> every time the region moves. Losing that won't make -ROOT- disappear
>>>> >> forever; it's still in HDFS.
>>>> >>
>>>> >> Does that answer the question? (I'm not sure I fully understand you.)
>>>> >>
>>>> >> J-D
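On the wipe-/hbase idea a few messages up: since those znodes hold only runtime data, clearing them is a plausible last resort, but only with every HBase daemon fully stopped first. A sketch, assuming the bundled zkCli.sh is new enough to support recursive delete via rmr (older builds may require deleting the child znodes one by one with delete):

bin/zkCli.sh -server 192.168.1.122:2222
[zk: 192.168.1.122:2222(CONNECTED) 0] rmr /hbase

HBase recreates /hbase and its children the next time the master starts, and as J-D notes, -ROOT-, .META., and the table data themselves live in HDFS, not in ZooKeeper.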
