A clean log of a full master startup would be really useful; I can't tell much more from the info you've provided so far.
J-D

On Fri, Aug 13, 2010 at 4:50 PM, Marchwiak, Patrick D. <[email protected]> wrote:
> I am having issues performing any operations (list/create/put) on my hbase
> instance once it starts up.
>
> The environment:
> Red Hat 5.5
> Hadoop 0.20.2
> HBase 0.20.4
> java 1.6.0_20
> 1 running master
> 23 running regionserver + 3 also running zookeeper
>
> When attempting to do a list from the hbase shell it returns this error:
> NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
>
> When attempting to perform inserts from a hadoop job I see the following
> error in my application:
>
> 2010-08-13 14:03:22.207 INFO [main] JobClient:1317 Task Id : attempt_201006091333_0031_m_000000_0, Status : FAILED
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:930)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
>     ...
>
> Now contrary to what the shell is reporting, the HMaster process is
> definitely running (along with HRegionServer and HQuorumPeer on the
> appropriate other nodes in the cluster). I do not see any errors in the
> master log, though interestingly I noticed a log message mentioning only 7
> region servers - in fact there are more than twice that many in the cluster.
>
> 2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: 7 region servers, 0 dead, average load 3.142857142857143
>
> The last clue I have is some exceptions in the zookeeper logs:
>
> 2010-08-13 13:34:16,041 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x12a6d2847e40000 type:create cxid:0x28 zxid:0xfffffffffffffffe txntype:unknown n/a
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>     at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>     at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /128.115.210.161:35883 lastZxid 0
> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12a6d2847e40001
> 2010-08-13 14:05:08,800 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12a6d2847e40001 valid:true
> 2010-08-13 14:05:08,802 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x12a6d2847e40001 type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown n/a
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>     at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>     at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
> 2010-08-13 14:05:09,762 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x12a6d2847e40001 due to java.io.IOException: Read error
> 2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12a6d2847e40001 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/128.115.210.149:2181 remote=/128.115.210.161:35883]
>
> HBase was running on this cluster a few months ago so I doubt it is a
> blatant misconfiguration at fault. I've tried restarting everything hbase or
> hadoop related as well as wiping out the hbase data directory on hdfs to
> start fresh with no result. Any hints or suggestions as to what the problem
> might be are greatly appreciated. Thanks!
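
The mismatch noted in the quoted message (7 region servers reported against 23 started) can be tracked across a master log rather than read off a single line. A minimal Python sketch, assuming the ServerManager summary lines always follow the format of the one line quoted above; the regular expression is inferred from that single sample, and the helper names `region_server_counts` and `missing_servers` are hypothetical, not part of HBase.

```python
import re

# Pattern inferred from the single ServerManager line in the quoted master log.
# Other HBase versions may word this line differently (assumption).
SUMMARY = re.compile(
    r"ServerManager: (\d+) region servers, (\d+) dead, average load ([\d.]+)"
)


def region_server_counts(lines):
    """Yield (live, dead, avg_load) for every ServerManager summary line."""
    for line in lines:
        m = SUMMARY.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2)), float(m.group(3))


def missing_servers(lines, expected):
    """Return expected minus the most recently reported live count, or None
    if the log contains no summary line at all."""
    latest = None
    for live, _dead, _load in region_server_counts(lines):
        latest = live
    return None if latest is None else expected - latest


# The one summary line from the quoted message, used here as sample input.
sample = [
    "2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: "
    "7 region servers, 0 dead, average load 3.142857142857143",
]
print(missing_servers(sample, expected=23))  # 23 - 7 = 16
```

Run against a full master log from a clean startup, this would show whether the live count ever climbs toward the expected 23 or stalls at 7, which narrows down where to look next.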
