I cannot find any useful information in the pasted logs.
On Tue, Apr 16, 2013 at 11:22 AM, dylan <dwld0...@gmail.com> wrote:
> Yes, I have just discovered it.
>
> I found zookeeper.out for Slave01 and Slave03 under zookeeper_home/bin/.
> But for Slave02 (which was rebooted earlier), it is under the / directory
> after the reboot.
>
> *Slave02 zookeeper.out shows:*
>
> WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
> 2013-04-15 16:38:31,987 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094)
>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
> [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving thread
> [myid:2] - INFO [Slave02/192.168.75.243:3888:QuorumCnxManager$Listener@493] - Received connection request /192.168.75.242:51136
> [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x50000037d (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)
> [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x50000037d (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)
>
> *Slave01 zookeeper.out shows:*
>
> [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890005 type:create cxid:0x1e zxid:0xb0000003c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
> 2013-04-16 10:58:26,415 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890006 type:create cxid:0x7 zxid:0xb0000003d txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
> 2013-04-16 10:58:26,431 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890007 type:create cxid:0x7 zxid:0xb0000003e txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
> 2013-04-16 10:58:26,489 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x23e0dc5a333000a type:create cxid:0x7 zxid:0xb0000003f txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired
> 2013-04-16 10:58:36,001 [myid:1] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x33e0dc5b4de0003, timeout of 40000ms exceeded
> 2013-04-16 10:58:36,001 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x33e0dc5b4de0003
> 2013-04-16 11:03:44,000 [myid:1] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x23e0dc5a333000b, timeout of 40000ms exceeded
> 2013-04-16 11:03:44,001 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x23e0dc5a333000b
>
> *From:* Azuryy Yu [mailto:azury...@gmail.com]
> *Sent:* April 16, 2013 11:13
> *To:* user@hadoop.apache.org
> *Subject:* Re: RE: RE: RE: RE: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
>
> Then can you find the ZooKeeper log at zookeeper_home/zookeeper.out?
>
> On Tue, Apr 16, 2013 at 11:04 AM, dylan <dwld0...@gmail.com> wrote:
>
> I use the hbase shell.
>
> It always shows:
>
> ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
>
> *From:* Azuryy Yu [mailto:azury...@gmail.com]
> *Sent:* April 16, 2013 10:59
> *To:* user@hadoop.apache.org
> *Subject:* Re: RE: RE: RE: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
>
> Is your ZooKeeper managed by HBase, or did you set export
> HBASE_MANAGES_ZK=false in hbase-env.sh?
>
> If not, then it is a ZooKeeper port conflict.
>
> On Tue, Apr 16, 2013 at 10:55 AM, dylan <dwld0...@gmail.com> wrote:
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=/usr/cdh4/zookeeper/data
> # the port at which the clients will connect
> clientPort=2181
>
> server.1=Slave01:2888:3888
> server.2=Slave02:2888:3888
> server.3=Slave03:2888:3888
>
> *From:* Azuryy Yu [mailto:azury...@gmail.com]
> *Sent:* April 16, 2013 10:45
> *To:* user@hadoop.apache.org
> *Subject:* Re: RE: RE: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
>
> And paste the ZK configuration from zookeeper_home/conf/zoo.cfg.
>
> On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu <azury...@gmail.com> wrote:
>
> It is located under hbase-home/logs/ if your ZooKeeper is managed by HBase.
>
> But I noticed you configured QJM; do your QJM and HBase share the same
> ZK cluster? If so, just paste the QJM ZK configuration from hdfs-site.xml
> and the HBase ZK configuration from hbase-site.xml.
>
> On Tue, Apr 16, 2013 at 10:37 AM, dylan <dwld0...@gmail.com> wrote:
>
> How do I check the ZooKeeper log? It is a binary file; how do I convert it
> to a readable log?
> I found “org.apache.zookeeper.server.LogFormatter”; how do I run it?
>
> *From:* Azuryy Yu [mailto:azury...@gmail.com]
> *Sent:* April 16, 2013 10:01
> *To:* user@hadoop.apache.org
> *Subject:* Re: RE: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
>
> This is a ZooKeeper issue.
>
> Please paste the ZooKeeper log here. Thanks.
>
> On Tue, Apr 16, 2013 at 9:58 AM, dylan <dwld0...@gmail.com> wrote:
>
> It is hbase-0.94.2-cdh4.2.0.
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* April 16, 2013 9:55
> *To:* u...@hbase.apache.org
> *Subject:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
>
> I think this question would be more appropriate for the HBase user mailing
> list.
>
> Moving hadoop user to bcc.
>
> Please tell us the HBase version you are using.
>
> Thanks
>
> On Mon, Apr 15, 2013 at 6:51 PM, dylan <dwld0...@gmail.com> wrote:
>
> Hi,
>
> I am new to Hadoop and set it up from the tarball. The cluster has 5 nodes:
> 2 NN nodes with QJM (3 Journal Nodes, one of them on a DN node) and 3 DN
> nodes that also run ZooKeeper. It worked fine. Then I rebooted one data node
> machine that also runs ZooKeeper, and afterwards restarted all processes.
> Hadoop works fine, but HBase does not.
> I cannot disable or drop tables.
>
> The logs are as follows:
>
> The HBase HMaster log:
>
> DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere
> ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,60000,1366001238313
> ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
> 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining)
>
> The HBase HRegionServer log:
>
> DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN
>
> The HBase web UI shows:
>
> Region State
> 70236052 -ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,60000,1366001238313
>
> How can I fix it?
>
> Thanks.
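
Since the pasted logs alone were inconclusive, one way to see whether every server in the quoted zoo.cfg has rejoined the quorum is to ask each one for its status. The following is only a rough sketch, not something taken from the thread: it assumes the ZooKeeper four-letter commands (here `stat`) are reachable on the client port, and the helper names `parse_zoo_cfg`, `four_letter_word`, `server_mode`, and `report` are made up for illustration.

```python
import socket


def parse_zoo_cfg(cfg_text):
    """Return (client_port, {server_id: host}) parsed from zoo.cfg text."""
    client_port = 2181  # assumption: fall back to 2181 if clientPort is absent
    servers = {}
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "clientPort":
            client_port = int(value)
        elif key.startswith("server."):
            # server.N=host:peerPort:electionPort -> keep N and host only.
            servers[int(key.split(".", 1)[1])] = value.split(":")[0]
    return client_port, servers


def four_letter_word(host, port, word="stat", timeout=5.0):
    """Send a four-letter command and return the server's text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(stat_output):
    """Extract the value of the 'Mode:' line from stat output, or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def report(cfg_text):
    """Print one line per configured server with its reported mode."""
    client_port, servers = parse_zoo_cfg(cfg_text)
    for sid, host in sorted(servers.items()):
        try:
            mode = server_mode(four_letter_word(host, client_port))
        except OSError as exc:  # refused, timed out, or name not resolved
            mode = "unreachable ({})".format(exc)
        print("server.{} {}:{} -> {}".format(sid, host, client_port, mode))
```

Calling `report(open("zoo.cfg").read())` (the path is a placeholder) from a machine that can resolve Slave01 to Slave03 would print one line per server. A healthy three-node ensemble would normally show one leader and two followers; a server that prints no mode or is unreachable would be the one to look at first.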