Thanks Samir,

I tried the same operation again, and this time the cluster recovered correctly, so I compared the logs.
Here is the host7517 regionserver log from the cluster-not-responding case. The "NB dead servers" and "Log hdfs://... still exists at ..." lines appear again and again.

----
2016-02-17 15:41:49,160 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving host7516.mydomain,16020,1455618299902's wals to my queue
2016-02-17 15:41:49,171 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moved the dead regionserver logs.
2016-02-17 15:41:49,268 INFO [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x1cfa9c9f connecting to ZooKeeper ensemble=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181
2016-02-17 15:41:49,268 INFO [ReplicationExecutor-0] zookeeper.ZooKeeper: Initiating client connection, connectString=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181 sessionTimeout=90000 watcher=hconnection-0x1cfa9c9f0x0, quorum=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181, baseZNode=/hbase-unsecure
2016-02-17 15:41:49,276 INFO [ReplicationExecutor-0-SendThread(192.168.167.71:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.167.71/192.168.167.71:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-17 15:41:49,276 INFO [ReplicationExecutor-0-SendThread(192.168.167.71:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.167.71/192.168.167.71:2181, initiating session
2016-02-17 15:41:49,279 INFO [ReplicationExecutor-0-SendThread(192.168.167.71:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.167.71/192.168.167.71:2181, sessionid = 0x25210f2711800fb, negotiated timeout = 40000
2016-02-17 15:41:49,281 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] zookeeper.RecoverableZooKeeper: Process identifier=connection to cluster: 1-host7516.mydomain,16020,1455618299902 connecting to ZooKeeper ensemble=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181
2016-02-17 15:41:49,281 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] zookeeper.ZooKeeper: Initiating client connection, connectString=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181 sessionTimeout=90000 watcher=connection to cluster: 1-host7516.mydomain,16020,14556182999020x0, quorum=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181, baseZNode=/hbase-unsecure
2016-02-17 15:41:49,283 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902-SendThread(192.168.189.222:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.189.222/192.168.189.222:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-17 15:41:49,283 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902-SendThread(192.168.189.222:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.189.222/192.168.189.222:2181, initiating session
2016-02-17 15:41:49,286 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902-SendThread(192.168.189.222:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.189.222/192.168.189.222:2181, sessionid = 0x15210f2693c00d1, negotiated timeout = 40000
2016-02-17 15:41:49,286 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] regionserver.ReplicationSource: Replicating 5d5e12bc-25ce-411f-8393-2b0f3ab0a779 -> 0cd74950-c586-4fcb-aca5-1db49ce76d14
2016-02-17 15:41:49,290 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] regionserver.ReplicationSource: NB dead servers : 1
2016-02-17 15:41:49,291 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] regionserver.ReplicationSource: Possible location hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902/host7516.mydomain%252C16020%252C1455618299902.default.1455690304230
2016-02-17 15:41:49,291 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] regionserver.ReplicationSource: Possible location hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%252C16020%252C1455618299902.default.1455690304230
2016-02-17 15:41:49,292 INFO [ReplicationExecutor-0.replicationSource,1-host7516.mydomain,16020,1455618299902] regionserver.ReplicationSource: Log hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 still exists at hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
----
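(Side note: while the source is looping like this, the recovered queue should still be visible under the replication znodes, so it can be watched directly. A minimal sketch with zkCli.sh from the ZooKeeper client package, assuming our base znode /hbase-unsecure and the HBase 1.1 znode layout; zk7501.mydomain is just a placeholder for one of the source cluster's ZooKeeper hosts:)

----
# List the replication queues owned by host7517. A recovered queue shows up
# as "<peerId>-<dead server name>", e.g. 1-host7516.mydomain,16020,1455618299902
zkCli.sh -server zk7501.mydomain:2181 \
  ls /hbase-unsecure/replication/rs/host7517.mydomain,16020,1455618296942

# Each child of that queue is a WAL still waiting to be shipped; the znode
# data holds the byte offset the source has replicated up to.
zkCli.sh -server zk7501.mydomain:2181 \
  ls /hbase-unsecure/replication/rs/host7517.mydomain,16020,1455618296942/1-host7516.mydomain,16020,1455618299902
----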
Finally, at 15:56:55, the master server started splitting the logs of host7516.mydomain,16020,1455618299902. After that, the cluster recovered.

---
2016-02-17 15:56:55,625 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] handler.ServerShutdownHandler: Splitting logs for host7516.mydomain,16020,1455618299902 before assignment; region count=73
2016-02-17 15:56:55,627 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: dead splitlog workers [host7516.mydomain,16020,1455618299902]
2016-02-17 15:56:55,628 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: started splitting 1 logs in [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] for [host7516.mydomain,16020,1455618299902]
---
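(That 15-minute window before this split is what I would like to shrink. For reference, the MTTR section of the reference guide mostly points at HDFS-side failure detection; the snippet below is only an illustrative sketch with example values, not our exact configuration:)

----
<!-- hdfs-site.xml: stop scheduling reads/writes to a DataNode that has not
     heartbeated for ~30s, instead of waiting for the full dead-node timeout -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>

<!-- hbase-site.xml: how long the WAL splitter waits on a recoverLease
     invocation before issuing another one (64s is the default) -->
<property>
  <name>hbase.lease.recovery.dfs.timeout</name>
  <value>64000</value>
</property>
----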
Here is the host7517 regionserver log from the correctly-recovered case. (This time I shut down host7515, which was hosting the hbase:meta region.) I can see the "Done with the recovered queue" log line.

----
2016-02-23 16:52:24,632 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving host7515.mydomain,16020,1455786190819's wals to my queue
2016-02-23 16:52:24,637 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moved the dead regionserver logs.
2016-02-23 16:52:24,705 INFO [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4610fe09 connecting to ZooKeeper ensemble=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181
2016-02-23 16:52:24,705 INFO [ReplicationExecutor-0] zookeeper.ZooKeeper: Initiating client connection, connectString=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181 sessionTimeout=90000 watcher=hconnection-0x4610fe090x0, quorum=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181, baseZNode=/hbase-unsecure
2016-02-23 16:52:24,714 INFO [ReplicationExecutor-0-SendThread(192.168.189.222:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.189.222/192.168.189.222:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-23 16:52:24,715 INFO [ReplicationExecutor-0-SendThread(192.168.189.222:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.189.222/192.168.189.222:2181, initiating session
2016-02-23 16:52:24,718 INFO [ReplicationExecutor-0-SendThread(192.168.189.222:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.189.222/192.168.189.222:2181, sessionid = 0x15210f2693c00e2, negotiated timeout = 40000
2016-02-23 16:52:24,719 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819] zookeeper.RecoverableZooKeeper: Process identifier=connection to cluster: 1-host7515.mydomain,16020,1455786190819 connecting to ZooKeeper ensemble=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181
2016-02-23 16:52:24,720 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819] zookeeper.ZooKeeper: Initiating client connection, connectString=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181 sessionTimeout=90000 watcher=connection to cluster: 1-host7515.mydomain,16020,14557861908190x0, quorum=repl7503.mydomain:2181,repl7504.mydomain:2181,repl7502.mydomain:2181, baseZNode=/hbase-unsecure
2016-02-23 16:52:24,722 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819-SendThread(192.168.167.71:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.167.71/192.168.167.71:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-23 16:52:24,722 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819-SendThread(192.168.167.71:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.167.71/192.168.167.71:2181, initiating session
2016-02-23 16:52:24,724 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819-SendThread(192.168.167.71:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.167.71/192.168.167.71:2181, sessionid = 0x25210f271180110, negotiated timeout = 40000
2016-02-23 16:52:24,725 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819] regionserver.ReplicationSource: Replicating 5d5e12bc-25ce-411f-8393-2b0f3ab0a779 -> 0cd74950-c586-4fcb-aca5-1db49ce76d14
2016-02-23 16:52:25,747 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819] regionserver.ReplicationSourceManager: Done with the recovered queue 1-host7515.mydomain,16020,1455786190819
2016-02-23 16:52:25,750 INFO [ReplicationExecutor-0.replicationSource,1-host7515.mydomain,16020,1455786190819] regionserver.ReplicationSource: Finished recovering the queue with the following stats Total replicated edits: 0, currently replicating from: hdfs://hdpts/apps/hbase/data/oldWALs/host7515.mydomain%2C16020%2C1455786190819.default.1456212089608 at position: 81519135
----

Actually, there was a replication setting on the 'ns2:usertbl' table, which has just one region, and that region was on host7516. I have not found the root cause, but this is what I noticed.
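(For completeness, the peer and the table's replication scope can be confirmed from the shell; a quick sketch using the standard commands:)

----
# Show the configured replication peers (peer 1 here points at the repl750x ensemble)
echo "list_peers" | hbase shell

# Column families with REPLICATION_SCOPE => '1' are the ones that feed the
# replication queues for peer 1
echo "describe 'ns2:usertbl'" | hbase shell
----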
Regards,
Hironori

2016-02-19 20:51 GMT+09:00 Samir Ahmic <ahmic.sa...@gmail.com>:
> I will search for related jiras; maybe you have hit some known bug. If you
> find something interesting in logs, please post it here so we can open a
> ticket if needed.
>
> Regards
> Samir
>
> On Fri, Feb 19, 2016 at 12:30 PM, おぎばやしひろのり <ogibaya...@gmail.com> wrote:
>
>> Thanks Samir,
>>
>> I checked that blog; it helped me to understand what is going on when a
>> datanode dies. But I am still not sure why it took 15 minutes.
>> I will check the server logs again, carefully.
>>
>> Regards,
>> Hironori Ogibayashi
>>
>> 2016-02-18 22:56 GMT+09:00 Samir Ahmic <ahmic.sa...@gmail.com>:
>> > Thanks for explaining, Hironori.
>> > By shutting down the DataNode process on host7516 we also have to add
>> > Hadoop recovery time to the total recovery time.
>> > Here is a great blog explaining how the Hadoop recovery process works:
>> >
>> > http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/
>> >
>> > Regards
>> > Samir
>> >
>> > On Thu, Feb 18, 2016 at 11:04 AM, おぎばやしひろのり <ogibaya...@gmail.com> wrote:
>> >
>> >> Samir,
>> >>
>> >> Thank you for your advice.
>> >> The actual operation I used to shut down host7516 was just clicking the
>> >> "stop" button on our VM console. I don't know the internals, but I can
>> >> see some process termination messages in /var/log/messages, so it looks
>> >> like a kind of graceful shutdown.
>> >> About reproducing: yes, I can, but not every time; maybe once in three
>> >> or four attempts. I am not sure what makes the difference.
>> >>
>> >> I also checked the regionserver logs. There were warnings related to
>> >> DataNode access; this is because there was a DataNode on the shut-down
>> >> host7516, too.
>> >> Regarding log split, I couldn't find any errors. Both hosts finished
>> >> the task within a few seconds.
>> >>
>> >> --- host7517
>> >> 2016-02-17 15:39:59,002 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=420.92 KB, freeSize=399.19 MB, max=399.60 MB, blockCount=0, accesses=24227, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=7289, evicted=0, evictedPerRun=0.0
>> >> 2016-02-17 15:39:59,869 INFO [HOST7517:16020Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 0, currently replicating from: hdfs://hdpts/apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451 at position: 96609590
>> >>
>> >> 2016-02-17 15:40:50,953 WARN [ResponseProcessor for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125
>> >> java.io.IOException: Bad response ERROR for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 from datanode DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
>> >>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:786)
>> >> 2016-02-17 15:40:50,954 WARN [DataStreamer for file /apps/hbase/data/WALs/host7517.mydomain,16020,1455618296942/host7517.mydomain%2C16020%2C1455618296942.default.1455690301451 block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125] hdfs.DFSClient: Error Recovery for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753847_13125 in pipeline DatanodeInfoWithStorage[192.168.185.249:50010,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK], DatanodeInfoWithStorage[192.168.184.85:50010,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]: bad datanode DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
>> >> ...
>> >> 2016-02-17 15:56:55,633 INFO [SplitLogWorker-HOST7517:16020] coordination.ZkSplitLogWorkerCoordination: worker host7517.mydomain,16020,1455618296942 acquired task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230
>> >> 2016-02-17 15:56:55,677 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] wal.WALSplitter: Splitting wal: hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230, length=83
>> >> 2016-02-17 15:56:55,677 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] wal.WALSplitter: DistributedLogReplay = false
>> >> 2016-02-17 15:56:55,685 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] util.FSHDFSUtils: Recovering lease on dfs file hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
>> >> 2016-02-17 15:56:55,696 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] util.FSHDFSUtils: recoverLease=false, attempt=0 on file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 after 11ms
>> >> 2016-02-17 15:56:59,698 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] util.FSHDFSUtils: recoverLease=true, attempt=1 on file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020
>> >> 2016-02-17 15:56:59,771 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0-Writer-1] wal.WALSplitter: Creating writer path=hdfs://hdpts/apps/hbase/data/data/default/usertable/64199c31957c01b5bd9ee50b02e1f7fd/recovered.edits/0000000000000542072.temp region=64199c31957c01b5bd9ee50b02e1f7fd
>> >> ...(similar lines)
>> >> 2016-02-17 15:57:01,001 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] wal.WALSplitter: Split writers finished
>> >> 2016-02-17 15:57:01,034 INFO [split-log-closeStream-3] wal.WALSplitter: Rename hdfs://hdpts/apps/hbase/data/data/default/usertable/06ed7277b7b9539a3ba597e0041acb12/recovered.edits/0000000000000054100.temp to hdfs://hdpts/apps/hbase/data/data/default/usertable/06ed7277b7b9539a3ba597e0041acb12/recovered.edits/0000000000000055498
>> >> ...(similar lines)
>> >> 2016-02-17 15:57:01,332 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] wal.WALSplitter: Processed 73326 edits across 53 regions; edits skipped=0; log file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230, length=83, corrupted=false, progress failed=false
>> >> 2016-02-17 15:57:01,336 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230 to final state DONE host7517.mydomain,16020,1455618296942
>> >> 2016-02-17 15:57:01,336 INFO [RS_LOG_REPLAY_OPS-HOST7517:16020-0] handler.WALSplitterHandler: worker host7517.mydomain,16020,1455618296942 done with task org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@7b0d8dc7 in 5691ms
>> >>
>> >> --- host7518
>> >>
>> >> 2016-02-17 15:40:55,957 WARN [ResponseProcessor for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177
>> >> java.io.IOException: Bad response ERROR for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177 from datanode DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
>> >>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:786)
>> >> 2016-02-17 15:40:55,959 WARN [DataStreamer for file /apps/hbase/data/WALs/host7518.mydomain,16020,1455618298475/host7518.mydomain%2C16020%2C1455618298475.default.1455691227152 block BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177] hdfs.DFSClient: Error Recovery for block BP-1843495860-192.168.189.219-1453778090403:blk_1073753896_13177 in pipeline DatanodeInfoWithStorage[192.168.184.85:50010,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK], DatanodeInfoWithStorage[192.168.185.249:50010,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK]: bad datanode DatanodeInfoWithStorage[192.168.184.73:50010,DS-9787b201-fc64-450e-a20f-dcc79fb94b6f,DISK]
>> >> 2016-02-17 15:40:56,043 INFO [sync.4] wal.FSHLog: Slow sync cost: 3774 ms, current pipeline: [DatanodeInfoWithStorage[192.168.184.85:50010,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], DatanodeInfoWithStorage[192.168.185.249:50010,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK], DatanodeInfoWithStorage[192.168.189.208:50010,DS-882dca14-8651-4ffe-a102-8beebc774575,DISK]]
>> >> 2016-02-17 15:40:56,044 INFO [sync.3] wal.FSHLog: Slow sync cost: 20103 ms, current pipeline: [DatanodeInfoWithStorage[192.168.184.85:50010,DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28,DISK], DatanodeInfoWithStorage[192.168.185.249:50010,DS-52f00167-e863-48d8-8f98-3b613f774b0c,DISK], DatanodeInfoWithStorage[192.168.189.208:50010,DS-882dca14-8651-4ffe-a102-8beebc774575,DISK]]
>> >> ...
>> >> 2016-02-17 15:41:16,016 INFO [SplitLogWorker-HOST7518:16020] coordination.ZkSplitLogWorkerCoordination: worker host7518.mydomain,16020,1455618298475 acquired task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta
>> >> 2016-02-17 15:41:16,194 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] wal.WALSplitter: Splitting wal: hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta, length=83
>> >> 2016-02-17 15:41:16,194 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] wal.WALSplitter: DistributedLogReplay = false
>> >> 2016-02-17 15:41:16,252 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] util.FSHDFSUtils: Recovering lease on dfs file hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
>> >> 2016-02-17 15:41:16,396 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] util.FSHDFSUtils: recoverLease=false, attempt=0 on file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta after 144ms
>> >> 2016-02-17 15:41:20,398 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] util.FSHDFSUtils: recoverLease=true, attempt=1 on file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta after 4146ms
>> >> 2016-02-17 15:41:20,456 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0-Writer-2] wal.WALSplitter: Creating writer path=hdfs://hdpts/apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp region=1588230740
>> >> 2016-02-17 15:41:20,457 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] wal.WALSplitter: Split writers finished
>> >> 2016-02-17 15:41:20,479 INFO [split-log-closeStream-1] wal.WALSplitter: Rename hdfs://hdpts/apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp to hdfs://hdpts/apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009661
>> >> 2016-02-17 15:41:20,479 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] wal.WALSplitter: Processed 3 edits across 1 regions; edits skipped=148; log file=hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta, length=83, corrupted=false, progress failed=false
>> >> 2016-02-17 15:41:20,482 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta to final state DONE host7518.mydomain,16020,1455618298475
>> >> 2016-02-17 15:41:20,482 INFO [RS_LOG_REPLAY_OPS-HOST7518:16020-0] handler.WALSplitterHandler: worker host7518.mydomain,16020,1455618298475 done with task org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@42ed43fe in 4359ms
>> >>
>> >> Regards,
>> >> Hironori Ogibayashi
>> >>
>> >> 2016-02-18 5:12 GMT+09:00 Samir Ahmic <ahmic.sa...@gmail.com>:
>> >> > Hi Hironori,
>> >> >
>> >> > Based on your logs, you have shut down the regionserver that was hosting
>> >> > the system "hbase:meta" table. This is a special case, since user regions
>> >> > cannot be reassigned to other servers until the "hbase:meta" table region
>> >> > is reassigned. This can partially explain the delay in recovery time you
>> >> > have experienced. Also, there is the WAL splitting process, which must be
>> >> > completed before regions are reassigned. Still, 15 minutes is too long a
>> >> > period for recovery.
>> >> > How did you shut down host7516? Also, you should check the logs from
>> >> > host7517 and host7518 for errors around 15:41, because they acquired the
>> >> > WAL splitting tasks.
>> >> > Also, can you reproduce this every time in your tests?
>> >> >
>> >> > Regards
>> >> > Samir
>> >> >
>> >> > On Wed, Feb 17, 2016 at 8:49 AM, おぎばやしひろのり <ogibaya...@gmail.com> wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> I am testing how HBase works in case of failure and experienced
>> >> >> unexpected behavior. If someone could tell me how to shorten the
>> >> >> recovery time, that would be appreciated.
>> >> >>
>> >> >> What I tried was: first, a client starts writing to HBase; then I shut
>> >> >> down one of the slave node machines. I expected the write operations to
>> >> >> stop for a while (one minute or so), and that after HBase automatically
>> >> >> moved the failed regions to another slave node, the client would be able
>> >> >> to write again. But actually, it took about 15 minutes for the cluster
>> >> >> to recover. What was going on? I have already applied the configurations
>> >> >> presented at https://hbase.apache.org/book.html#mttr
>> >> >>
>> >> >> My environment is:
>> >> >> - 3 master (NN, HMaster, ZK, etc.) nodes and 4 slave nodes (DN, RS);
>> >> >>   all of them are virtual machines.
>> >> >> - HDP 2.4.3 (HBase 1.1.1)
>> >> >> - CentOS 6.7
>> >> >>
>> >> >> Here are the HMaster and NameNode logs. The HMaster detected the failure
>> >> >> of a regionserver at 15:41:16 and tried to start recovery, but threw
>> >> >> RetriesExhaustedException again and again, then finally finished
>> >> >> recovering at 15:57:11. I noticed that there are some logs about lease
>> >> >> recovery in the NameNode log, but I am not sure whether this is related.
>> >> >>
>> >> >> ---- HMaster log ---
>> >> >> 2016-02-17 15:41:16,003 INFO [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [host7516.mydomain,16020,1455618299902]
>> >> >> 2016-02-17 15:41:16,005 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for host7516.mydomain,16020,1455618299902
>> >> >> 2016-02-17 15:41:16,010 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.SplitLogManager: dead splitlog workers [host7516.mydomain,16020,1455618299902]
>> >> >> 2016-02-17 15:41:16,011 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.SplitLogManager: started splitting 1 logs in [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] for [host7516.mydomain,16020,1455618299902]
>> >> >> 2016-02-17 15:41:16,016 INFO [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta acquired by host7518.mydomain,16020,1455618298475
>> >> >> 2016-02-17 15:41:16,266 INFO [host7512.mydomain,16000,1455618257882_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta=last_update = 1455691276196 last_version = 2 cur_worker_name = host7518.mydomain,16020,1455618298475 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
>> >> >> 2016-02-17 15:41:20,481 INFO [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta entered state: DONE host7518.mydomain,16020,1455618298475
>> >> >> 2016-02-17 15:41:20,487 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta to hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
>> >> >> 2016-02-17 15:41:20,487 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta
>> >> >> 2016-02-17 15:41:20,492 WARN [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting
>> >> >> 2016-02-17 15:41:20,492 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.SplitLogManager: finished splitting (more than or equal to) 83 bytes in 1 log files in [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] in 4481ms
>> >> >> 2016-02-17 15:41:20,492 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] handler.MetaServerShutdownHandler: Server host7516.mydomain,16020,1455618299902 was carrying META. Trying to assign.
>> >> >> 2016-02-17 15:41:20,492 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates: Transition {1588230740 state=OPEN, ts=1455690873330, server=host7516.mydomain,16020,1455618299902} to {1588230740 state=OFFLINE, ts=1455691280492, server=host7516.mydomain,16020,1455618299902}
>> >> >> 2016-02-17 15:41:20,493 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates: Offlined 1588230740 from host7516.mydomain,16020,1455618299902
>> >> >> 2016-02-17 15:41:27,495 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=host7516.mydomain,16020,1455618299902, exception=Connection timed out
>> >> >> 2016-02-17 15:41:27,495 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] zookeeper.MetaTableLocator: Deleting hbase:meta region location in ZooKeeper
>> >> >> 2016-02-17 15:41:27,500 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
>> >> >> 2016-02-17 15:41:27,502 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to host7517.mydomain,16020,1455618296942
>> >> >> 2016-02-17 15:41:27,502 INFO [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1455691287500, server=host7516.mydomain,16020,1455618299902} to {1588230740 state=PENDING_OPEN, ts=1455691287502, server=host7517.mydomain,16020,1455618296942}
>> >> >> 2016-02-17 15:41:27,512 INFO [AM.ZK.Worker-pool2-t62] master.RegionStates: Transition {1588230740 state=PENDING_OPEN, ts=1455691287502, server=host7517.mydomain,16020,1455618296942} to {1588230740 state=OPENING, ts=1455691287512, server=host7517.mydomain,16020,1455618296942}
>> >> >> 2016-02-17 15:41:27,598 INFO [AM.ZK.Worker-pool2-t63] master.RegionStates: Transition {1588230740 state=OPENING, ts=1455691287512, server=host7517.mydomain,16020,1455618296942} to {1588230740 state=OPEN, ts=1455691287598, server=host7517.mydomain,16020,1455618296942}
>> >> >> 2016-02-17 15:41:27,598 INFO [AM.ZK.Worker-pool2-t63] coordination.ZkOpenRegionCoordination: Handling OPENED of 1588230740 from host7512.mydomain,16000,1455618257882; deleting unassigned node
>> >> >> 2016-02-17 15:42:57,601 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] handler.ServerShutdownHandler: Received exception accessing hbase:meta during server shutdown of host7516.mydomain,16020,1455618299902, retrying hbase:meta read
>> >> >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=351, exceptions:
>> >> >> Wed Feb 17 15:42:57 KST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=90304: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=host7516.mydomain,16020,1455618299902, seqNum=0
>> >> >>     at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>> >> >>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>> >> >>     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>> >> >>     at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
>> >> >>     at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
>> >> >>     at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
>> >> >>     at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
>> >> >>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
>> >> >>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:156)
>> >> >>     at org.apache.hadoop.hbase.MetaTableAccessor.getServerUserRegions(MetaTableAccessor.java:555)
>> >> >>     at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:177)
>> >> >>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> >> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >> >>     at java.lang.Thread.run(Thread.java:745)
>> >> >> Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=90304: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=host7516.mydomain,16020,1455618299902, seqNum=0
>> >> >>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
>> >> >>     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
>> >> >>     ... 3 more
>> >> >> Caused by: java.io.IOException: Call to HOST7516.mydomain/192.168.184.73:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1388, waitTime=90001, operationTimeout=90000 expired.
>> >> >>     at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
>> >> >>     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
>> >> >>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>> >> >>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>> >> >>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>> >> >>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
>> >> >>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
>> >> >>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>> >> >>     ... 4 more
>> >> >> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1388, waitTime=90001, operationTimeout=90000 expired.
>> >> >>     at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
>> >> >>     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
>> >> >>     ... 14 more
>> >> >> ...(Same error happens again and again)
>> >> >> 2016-02-17 15:56:55,625 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] handler.ServerShutdownHandler: Splitting logs for host7516.mydomain,16020,1455618299902 before assignment; region count=73
>> >> >> 2016-02-17 15:56:55,627 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: dead splitlog workers [host7516.mydomain,16020,1455618299902]
>> >> >> 2016-02-17 15:56:55,628 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: started splitting 1 logs in [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] for [host7516.mydomain,16020,1455618299902]
>> >> >> 2016-02-17 15:56:55,634 INFO [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230 acquired by host7517.mydomain,16020,1455618296942
>> >> >> 2016-02-17 15:56:56,266 INFO [host7512.mydomain,16000,1455618257882_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230=last_update = 1455692215679 last_version = 2 cur_worker_name = host7517.mydomain,16020,1455618296942 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
>> >> >> 2016-02-17 15:57:01,336 INFO [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230 entered state: DONE host7517.mydomain,16020,1455618296942
>> >> >> 2016-02-17 15:57:01,343 INFO [main-EventThread] wal.WALSplitter: Archived processed log hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 to hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
>> >> >> 2016-02-17 15:57:01,343 INFO [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230
>> >> >> 2016-02-17 15:57:01,346 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager: finished splitting (more than or equal to) 83 bytes in 1 log files in [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting] in 5718ms
>> >> >> ...
>> >> >> 2016-02-17 15:57:01,353 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.AssignmentManager: Bulk assigning 73 region(s) across 3 server(s), round-robin=true
>> >> >> 2016-02-17 15:57:01,353 INFO [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-0] master.AssignmentManager: Assigning 24 region(s) to host7515.mydomain,16020,1455691124033
>> >> >> 2016-02-17 15:57:01,353 INFO [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-1] master.AssignmentManager: Assigning 24 region(s) to host7517.mydomain,16020,1455618296942
>> >> >> 2016-02-17 15:57:01,353 INFO [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-2] master.AssignmentManager: Assigning 25 region(s) to host7518.mydomain,16020,1455618298475
>> >> >> ...
>> >> >> 2016-02-17 15:57:11,005 INFO [AM.ZK.Worker-pool2-t71] master.RegionStates: Offlined 50cc594cd9e74287ddd5b960bfa2d8e9 from host7516.mydomain,16020,1455618299902
>> >> >> 2016-02-17 15:57:11,005 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.AssignmentManager: Bulk assigning done
>> >> >> 2016-02-17 15:57:11,005 INFO [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] handler.ServerShutdownHandler: Finished processing of shutdown of host7516.mydomain,16020,1455618299902
>> >> >> ----
>> >> >>
>> >> >> ---- NameNode log ---
>> >> >> 2016-02-17 15:41:16,008 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 45 Total time for transactions(ms): 4 Number of transactions batched in Syncs: 0 Number of syncs: 42 SyncTimes(ms): 35 11
>> >> >> 2016-02-17 15:41:16,383 INFO namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2907)) - recoverLease: [Lease. Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 2], src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta from client DFSClient_NONMAPREDUCE_1038296970_1
>> >> >> 2016-02-17 15:41:16,383 INFO namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(4036)) - Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 2], src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
>> >> >> 2016-02-17 15:41:16,383 WARN hdfs.StateChange (FSNamesystem.java:internalReleaseLease(4157)) - DIR* NameSystem.internalReleaseLease: File /apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta has not been closed. Lease recovery is in progress. RecoveryId = 13180 for block blk_1073753851_13131{UCState=UNDER_RECOVERY, truncateBlock=null, primaryNodeIndex=2, replicas=[ReplicaUC[[DISK]DS-9787b201-fc64-450e-a20f-dcc79fb94b6f:NORMAL:192.168.184.73:50010|RBW], ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL:192.168.185.249:50010|RBW], ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010|RBW]]}
>> >> >> 2016-02-17 15:41:18,588 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected 8 blocks at priority level 2; Total=8 Reset bookmarks? false
>> >> >> 2016-02-17 15:41:18,588 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 75, pendingReplications = 0.
>> >> >> 2016-02-17 15:41:18,588 INFO blockmanagement.BlockManager (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks chosen but could not be replicated = 8; of which 8 have no target, 0 have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.
>> >> >> 2016-02-17 15:41:18,960 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,961 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,962 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,963 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,964 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,966 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,967 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,967 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:18,969 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:19,096 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:19,098 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:19,099 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
>> >> >> 2016-02-17 15:41:19,307 INFO blockmanagement.DatanodeManager (DatanodeManager.java:handleHeartbeat(1426)) - Skipped stale nodes for recovery : 1
>> >> >> 2016-02-17 15:41:19,694 INFO namenode.FSNamesystem (FSNamesystem.java:commitBlockSynchronization(4269)) - commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753851_13131, newgenerationstamp=13180, newlength=60238, newtargets=[192.168.185.249:50010, 192.168.184.85:50010], closeFile=true, deleteBlock=false)
>> >> >> 2016-02-17 15:41:19,696 INFO namenode.FSNamesystem (FSNamesystem.java:commitBlockSynchronization(4434)) - commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753851_13131, file=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta, newgenerationstamp=13180, newlength=60238, newtargets=[192.168.185.249:50010, 192.168.184.85:50010]) successful
>> >> >> 2016-02-17 15:41:20,459 INFO hdfs.StateChange (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate blk_1073753897_13181{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010|RBW], ReplicaUC[[DISK]DS-882dca14-8651-4ffe-a102-8beebc774575:NORMAL:192.168.189.208:50010|RBW], ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL:192.168.185.249:50010|RBW]]} for /apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp
>> >> >> 2016-02-17 15:41:20,474 INFO hdfs.StateChange (FSNamesystem.java:completeFile(3503)) - DIR* completeFile: /apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp is closed by DFSClient_NONMAPREDUCE_443617401_1
>> >> >> 2016-02-17 15:41:20,488 INFO ipc.Server (Server.java:run(2172)) - IPC Server handler 8 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from 192.168.189.219:51393 Call#8040 Retry#0
>> >> >> org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: `/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting is non empty': Directory is not empty
>> >> >>     at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:89)
>> >> >>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3723)
>> >> >>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1017)
>> >> >>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:611)
>> >> >>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> >> >>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> >> >>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> >> >>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>> >> >>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>> >> >>     at java.security.AccessController.doPrivileged(Native Method)
>> >> >>     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> >> >>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
>> >> >> 2016-02-17 15:41:21,588 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected 8 blocks at priority level 2; Total=8 Reset bookmarks? false
>> >> >> 2016-02-17 15:41:21,588 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 76, pendingReplications = 0.
>> >> >> ...
>> >> >> 2016-02-17 15:56:15,384 INFO namenode.FSNamesystem (FSNamesystem.java:rollEditLog(5937)) - Roll Edit Log from 192.168.184.72
>> >> >> 2016-02-17 15:56:15,384 INFO namenode.FSEditLog (FSEditLog.java:rollEditLog(1202)) - Rolling edit logs
>> >> >> 2016-02-17 15:56:15,384 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1258)) - Ending log segment 369158
>> >> >> 2016-02-17 15:56:15,395 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(699)) - Number of transactions: 14 Total time for transactions(ms): 5 Number of transactions batched in Syncs: 0 Number of syncs: 14 SyncTimes(ms): 26 6
>> >> >> 2016-02-17 15:56:15,421 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/namenode/current/edits_inprogress_0000000000000369158 -> /hadoop/hdfs/namenode/current/edits_0000000000000369158-0000000000000369171
>> >> >> 2016-02-17 15:56:15,421 INFO namenode.FSEditLog (FSEditLog.java:startLogSegment(1218)) - Starting log segment at 369172
>> >> >> 2016-02-17 15:56:15,764 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected 3 blocks at priority level 1; 3 blocks at priority level 2; Total=6 Reset bookmarks? false
>> >> >> 2016-02-17 15:56:15,764 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 91, pendingReplications = 4.
>> >> >> 2016-02-17 15:56:15,764 INFO blockmanagement.BlockManager (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks chosen but could not be replicated = 3; of which 3 have no target, 0 have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.
>> >> >> 2016-02-17 15:56:18,764 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) - chooseUnderReplicatedBlocks selected 6 blocks at priority level 2; Total=6 Reset bookmarks? false
>> >> >> 2016-02-17 15:56:18,765 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK* neededReplications = 87, pendingReplications = 5.
>> >> >> 2016-02-17 15:56:18,765 INFO blockmanagement.BlockManager (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks chosen but could not be replicated = 2; of which 0 have no target, 2 have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.
>> >> >> ...
>> >> >> 2016-02-17 15:56:55,688 INFO namenode.FSNamesystem (FSNamesystem.java:recoverLeaseInternal(2907)) - recoverLease: [Lease. Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 1], src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 from client DFSClient_NONMAPREDUCE_1038296970_1
>> >> >> 2016-02-17 15:56:55,688 INFO namenode.FSNamesystem (FSNamesystem.java:internalReleaseLease(4036)) - Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 1], src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
>> >> >> 2016-02-17 15:56:55,688 WARN hdfs.StateChange (FSNamesystem.java:internalReleaseLease(4157)) - DIR* NameSystem.internalReleaseLease: File /apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230 has not been closed. Lease recovery is in progress. RecoveryId = 13183 for block blk_1073753849_13127{UCState=UNDER_RECOVERY, truncateBlock=null, primaryNodeIndex=2, replicas=[ReplicaUC[[DISK]DS-9787b201-fc64-450e-a20f-dcc79fb94b6f:NORMAL:192.168.184.73:50010|RBW], ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010|RBW], ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL:192.168.185.249:50010|RBW]]}
>> >> >> 2016-02-17 15:56:58,687 INFO namenode.FSNamesystem (FSNamesystem.java:commitBlockSynchronization(4434)) - commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753849_13127, file=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230, newgenerationstamp=13183, newlength=118884436, newtargets=[192.168.184.85:50010, 192.168.185.249:50010]) successful
>> >> >> ----
>> >> >>
>> >> >> Thanks,
>> >> >> Hironori Ogibayashi