Unable to perform list/create after startup

Marchwiak, Patrick D. Fri, 13 Aug 2010 16:50:58 -0700

I am unable to perform any operations (list/create/put) on my HBase
instance once it starts up.


The environment:
Red Hat 5.5
Hadoop 0.20.2
HBase 0.20.4
Java 1.6.0_20
1 running master
23 running region servers, 3 of which also run ZooKeeper

When I run 'list' from the HBase shell, it returns this error:
NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
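Since the shell cannot find the master, one quick sanity check is whether each ZooKeeper server answers at all. The sketch below (Java, standard library only) sends ZooKeeper's "ruok" four-letter command to port 2181, the client port that appears in the ZooKeeper log further down, and prints the reply. A healthy server answers "imok". The class name ZkProbe, the 5-second timeout, and the host arguments are illustrative assumptions. The sketch also assumes the server closes the connection after replying.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Sends a ZooKeeper four-letter command such as "ruok" and returns the raw reply. */
public class ZkProbe {

    public static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        if (cmd.length() != 4) {
            throw new IllegalArgumentException("four-letter commands must be 4 characters: " + cmd);
        }
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            sock.setSoTimeout(timeoutMs); // also bound each read, not just the connect
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Assumption: the server writes its reply and then closes the connection,
            // so we read until end-of-stream.
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return reply.toString(StandardCharsets.US_ASCII.name());
        }
    }

    public static void main(String[] args) {
        // Usage: java ZkProbe zk-host-1 zk-host-2 zk-host-3   (host names are placeholders)
        for (String host : args) {
            String reply;
            try {
                reply = fourLetterWord(host, 2181, "ruok", 5000).trim();
            } catch (IOException e) {
                reply = "no answer (" + e + ")";
            }
            System.out.println(host + ":2181 -> " + reply);
        }
    }
}
```

If nc is installed, "echo ruok | nc <host> 2181" performs the same check from a shell.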

When attempting to perform inserts from a Hadoop job, I see the following
error in my application:

2010-08-13 14:03:22.207 INFO  [main] JobClient:1317 Task Id : attempt_201006091333_0031_m_000000_0, Status : FAILED
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:930)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
...
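Read from the bottom up, the trace shows HTable.<init> asking for a region location. That request passes through locateRegionInMeta and relocateRegion twice before the client reaches locateRootRegion and gives up. In other words, it never learned where the root region lives. The following is a simplified, hypothetical model of that last step, not HBase's actual code. It polls a location source a bounded number of times, pauses between attempts, and then fails with the same message seen above. RootLocatorModel, NoServerForRegion, maxRetries, and pauseMs are illustrative names only.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical model (not HBase code) of a client that must learn the root
 * region's location before it can find .META. and then the user table.
 */
public class RootLocatorModel {

    /** Stand-in for HBase's NoServerForRegionException in this model. */
    public static class NoServerForRegion extends RuntimeException {
        public NoServerForRegion(String msg) {
            super(msg);
        }
    }

    /** Asks rootLocation up to maxRetries times, sleeping pauseMs between empty answers. */
    public static String locateRoot(Supplier<Optional<String>> rootLocation,
                                    int maxRetries, long pauseMs) throws InterruptedException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            Optional<String> loc = rootLocation.get();
            if (loc.isPresent()) {
                return loc.get();
            }
            if (attempt < maxRetries) {
                Thread.sleep(pauseMs); // back off before asking again
            }
        }
        throw new NoServerForRegion("Timed out trying to locate root region");
    }

    public static void main(String[] args) throws InterruptedException {
        // Nobody ever publishes a root location, so every attempt comes back empty.
        try {
            locateRoot(Optional::empty, 3, 100);
        } catch (NoServerForRegion e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the failure depends only on the location source staying empty. It does not depend on whether the master or region server processes happen to be running.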

Contrary to what the shell reports, the HMaster process is definitely
running (along with HRegionServer and HQuorumPeer on the appropriate other
nodes in the cluster). I do not see any errors in the master log, though I
did notice a log message reporting only 7 region servers; in fact, there are
more than twice that many in the cluster.

2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: 7 region servers, 0 dead, average load 3.142857142857143
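To quantify the gap, the sketch below parses that ServerManager line and compares it with the 23 region servers in the environment list. It assumes that "average load" is the number of regions per live server. Under that assumption, 7 × 3.142857142857143 implies 22 regions in total. ServerCountCheck and the hard-coded expected count are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses a ServerManager status line and compares it with the expected cluster size. */
public class ServerCountCheck {

    // Matches e.g. "7 region servers, 0 dead, average load 3.142857142857143"
    private static final Pattern STATUS = Pattern.compile(
            "(\\d+) region servers, (\\d+) dead, average load ([0-9.]+)");

    public static final class Status {
        public final int live;
        public final int dead;
        public final double averageLoad;

        Status(int live, int dead, double averageLoad) {
            this.live = live;
            this.dead = dead;
            this.averageLoad = averageLoad;
        }

        /** Assumes averageLoad = regions / live servers, so this recovers the region count. */
        public long impliedRegions() {
            return Math.round(live * averageLoad);
        }
    }

    public static Status parse(String line) {
        Matcher m = STATUS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no ServerManager status in: " + line);
        }
        return new Status(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        String line = "2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: "
                + "7 region servers, 0 dead, average load 3.142857142857143";
        int expected = 23; // region servers in this cluster, per the environment list
        Status s = parse(line);
        System.out.printf("live=%d dead=%d missing=%d impliedRegions=%d%n",
                s.live, s.dead, expected - s.live - s.dead, s.impliedRegions());
    }
}
```

Running main prints "live=7 dead=0 missing=16 impliedRegions=22". So 16 of the 23 region servers never registered with the master, and none were reported dead.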

The last clue I have is a set of exceptions in the ZooKeeper logs:

2010-08-13 13:34:16,041 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x12a6d2847e40000 type:create cxid:0x28 zxid:0xfffffffffffffffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
        at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /128.115.210.161:35883 lastZxid 0
2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12a6d2847e40001
2010-08-13 14:05:08,800 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12a6d2847e40001 valid:true
2010-08-13 14:05:08,802 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x12a6d2847e40001 type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
        at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2010-08-13 14:05:09,762 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x12a6d2847e40001 due to java.io.IOException: Read error
2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12a6d2847e40001 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/128.115.210.149:2181 remote=/128.115.210.161:35883]
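Pairing the "Creating new session" and "closing session:" lines by session ID shows how long the connection from 128.115.210.161 actually lasted. The sketch assumes every relevant line starts with a "yyyy-MM-dd HH:mm:ss,SSS" timestamp, as the lines above do. It skips anything else, such as stack frames and exception lines. ZkSessionLifetimes is an illustrative name.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Computes open-to-close durations of ZooKeeper sessions from server log lines. */
public class ZkSessionLifetimes {

    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final Pattern OPEN = Pattern.compile("Creating new session (0x[0-9a-f]+)");
    private static final Pattern CLOSE = Pattern.compile("closing session:(0x[0-9a-f]+)");

    /** Returns a duration per session ID, for sessions seen both opening and closing. */
    public static Map<String, Duration> lifetimes(List<String> lines) {
        Map<String, LocalDateTime> opened = new HashMap<>();
        Map<String, Duration> result = new LinkedHashMap<>();
        for (String line : lines) {
            LocalDateTime ts;
            try {
                ts = LocalDateTime.parse(line.substring(0, Math.min(23, line.length())), TS);
            } catch (DateTimeParseException e) {
                continue; // stack frames and exception lines carry no timestamp
            }
            Matcher open = OPEN.matcher(line);
            if (open.find()) {
                opened.put(open.group(1), ts);
                continue;
            }
            Matcher close = CLOSE.matcher(line);
            if (close.find()) {
                LocalDateTime start = opened.remove(close.group(1));
                if (start != null) {
                    result.put(close.group(1), Duration.between(start, ts));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Lines copied from the log above, trailing text trimmed.
        List<String> log = Arrays.asList(
            "2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12a6d2847e40001",
            "        at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)",
            "2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12a6d2847e40001");
        lifetimes(log).forEach((id, d) -> System.out.println(id + " lived " + d.toMillis() + " ms"));
    }
}
```

For the lines above this prints "0x12a6d2847e40001 lived 981 ms". During that time the session hit one NodeExists on create, and the server then closed it after a read error.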

HBase was running on this cluster a few months ago, so I doubt a blatant
misconfiguration is at fault. I've tried restarting everything HBase- and
Hadoop-related, and I've also wiped out the HBase data directory on HDFS to
start fresh, with no change. Any hints or suggestions as to what the problem
might be are greatly appreciated. Thanks!
