Try cleaning up your ZooKeeper data. I have had similar issues before that were caused by corrupt ZooKeeper data or a bad ZooKeeper state.
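
In case it helps, here is a rough sketch of what I mean by cleaning up, not a definitive procedure. It moves the ZooKeeper data directory aside rather than deleting it, so you can put it back if this turns out not to be the problem. The function name `backup_zk_data` is my own, and the path in the example is the `hbase.zookeeper.property.dataDir` value from your hbase-site.xml. I am assuming HBase and the ZooKeeper it manages are already stopped (for example with `bin/stop-hbase.sh`) and that you run this on each quorum node.

```shell
#!/bin/sh
# Sketch only: move a ZooKeeper data directory aside instead of deleting it.
# Assumption: HBase and its managed ZooKeeper are stopped before this runs.

# backup_zk_data DIR
# Renames DIR to DIR.bak.<timestamp>, recreates DIR empty,
# and prints the backup path on stdout.
backup_zk_data() {
    zk_dir="$1"
    if [ ! -d "$zk_dir" ]; then
        echo "no such directory: $zk_dir" >&2
        return 1
    fi
    backup="${zk_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$zk_dir" "$backup" || return 1
    mkdir -p "$zk_dir" || return 1
    echo "$backup"
}

# Example, using the dataDir from the quoted hbase-site.xml:
# backup_zk_data /var/hbase-hadoop/zookeeper
```

After doing this on devrackA-00, devrackA-01, and devrackA-25, start HBase again and watch whether the sessions still drop. If they do, the old directory is still there under the `.bak.` name.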
------------------------------
On Sat 30 Jun, 2012 4:12 AM IST Jay Wilson wrote:

>I "somewhat" have HBase up and running in a distributed mode. It starts
>fine, I can use "hbase shell" to create, disable, and drop tables;
>however, after a short period of time HMaster and the HRegionalservers
>terminate. Decoding the error messages is a bit bewildering and the
>O'Reilly HBase book hasn't helped much with message decoding.
>
>Here is a snippet of the messages from a regionalserver log:
>
>~~~
>U Stats: total=6.68 MB, free=807.12 MB, max=813.8 MB, blocks=2, accesses=19, hits=17, hitRatio=89.47%%, cachingAccesses=17, cachingHits=15, cachingHitsRatio=88.23%%, evictions=0, evicted=0, evictedPerRun=NaN
>2012-06-27 12:36:47,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.68 MB, free=807.12 MB, max=813.8 MB, blocks=2, accesses=19, hits=17, hitRatio=89.47%%, cachingAccesses=17, cachingHits=15, cachingHitsRatio=88.23%%, evictions=0, evicted=0, evictedPerRun=NaN
>2012-06-27 12:40:02,106 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x382f6861690003, likely server has closed socket, closing socket connection and attempting reconnect
>2012-06-27 12:40:02,112 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x382f6861690004, likely server has closed socket, closing socket connection and attempting reconnect
>2012-06-27 12:40:02,245 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server devrackA-01/172.18.0.2:2181
>2012-06-27 12:40:02,247 WARN org.apache.zookeeper.ClientCnxn: Session 0x382f6861690003 for server null, unexpected error, closing socket connection and attempting reconnect
>java.net.NoRouteToHostException: No route to host
>~~~
>
>No route to host would imply it can't reach one of my HQuorumpeers, but
>it talks to them when I first run start-hase.sh. Also there is no DNS
>involved, the /etc/hosts files are identical on all nodes, and it's
>currently a closed cluster. All nodes are on the same subnet 172.18/16
>
>Do I have something wrong in one of my xml files:
>
>Core-site.xml:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/var/hbase-hadoop/tmp</value>
>   </property>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://devrackA-00:8020</value>
>     <final>true</final>
>   </property>
> </configuration>
>
>Hdfs-site.xml:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/var/hbase-hadoop/name</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/var/hbase-hadoop/data</value>
>   </property>
>   <property>
>     <name>fs.checkpoint.dir</name>
>     <value>/var/hbase-hadoop/namesecondary</value>
>   </property>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>4096</value>
>   </property>
> </configuration>
>
>Hbase-site.xml:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!--
> /**
>  * Copyright 2010 The Apache Software Foundation
>  *
>  * Licensed to the Apache Software Foundation (ASF) under one
>  * or more contributor license agreements. See the NOTICE file
>  * distributed with this work for additional information
>  * regarding copyright ownership. The ASF licenses this file
>  * to you under the Apache License, Version 2.0 (the
>  * "License"); you may not use this file except in compliance
>  * with the License. You may obtain a copy of the License at
>  *
>  * http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> -->
> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://devrackA-00:8020/var/hbase-hadoop/hbase</value>
>   </property>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>4096</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.regionserver.handler.count</name>
>     <value>20</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>devrackA-00,devrackA-01,devrackA-25</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.dataDir</name>
>     <value>/var/hbase-hadoop/zookeeper</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.maxClientCnxns</name>
>     <value>500</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>2181</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.initLimit</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.syncLimit</name>
>     <value>5</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.tickTime</name>
>     <value>2000000</value>
>   </property>
> </configuration>
>
>Hbase-env.sh:
>
> #
> #/**
> # * Copyright 2007 The Apache Software Foundation
> # *
> # * Licensed to the Apache Software Foundation (ASF) under one
> # * or more contributor license agreements. See the NOTICE file
> # * distributed with this work for additional information
> # * regarding copyright ownership. The ASF licenses this file
> # * to you under the Apache License, Version 2.0 (the
> # * "License"); you may not use this file except in compliance
> # * with the License. You may obtain a copy of the License at
> # *
> # * http://www.apache.org/licenses/LICENSE-2.0
> # *
> # * Unless required by applicable law or agreed to in writing, software
> # * distributed under the License is distributed on an "AS IS" BASIS,
> # * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # * See the License for the specific language governing permissions and
> # * limitations under the License.
> # */
> export HBASE_HEAPSIZE=4096
> export HBASE_MANAGES_ZK=true
>
>regionalservers:
>
> devrackA-03
> devrackB-31
> devrackA-25
>
>master:
>
> devrackA-01
>
>slaves:
>
> devrackA-01
> devrackA-03
> devrackA-04
> devrackA-05
> devrackA-06
> devrackA-07
> devrackA-08
> devrackA-09
> devrackA-10
> devrackA-11
> devrackA-12
> devrackA-17
> devrackA-18
> devrackA-19
> devrackA-20
> devrackA-21
> devrackA-22
> devrackA-23
> devrackA-24
> devrackA-25
> devrackB-07
> devrackB-08
> devrackB-09
> devrackB-10
> devrackB-11
> devrackB-12
> devrackB-13
> devrackB-14
> devrackB-15
> devrackB-16
> devrackB-17
> devrackB-18
> devrackB-19
> devrackB-20
> devrackB-21
> devrackB-22
> devrackB-23
> devrackB-24
> devrackB-25
> devrackB-26
> devrackB-27
> devrackB-28
> devrackB-29
> devrackB-30
> devrackB-31
>
>Thanks for the help
>
>Jay Wilson