> Do you see in the master log something similar to the following ?
> 
> master.HMaster: Not running balancer because 1 region(s) in transition
Yes, I have several of them, but all of them are from 3 days ago.
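
A minimal sketch of how the master log could be searched for balancer-related entries. It assumes each entry sits on one line in the layout seen in the excerpts quoted below (timestamp, level, thread, class, message). The function name `scan_master_log` and the first sample line's timestamp and thread name are hypothetical. The sketch also collects the `set balanceSwitch=...` entries, which, as far as I understand, record the value the balancer switch was set to.

```python
import re
from datetime import datetime

# Assumed entry layout, based on the log excerpts quoted in this thread:
# "2015-07-06 15:15:47,324 INFO  [thread] master.HMaster: message"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s")
RIT = re.compile(r"Not running balancer because (\d+) region\(s\) in transition")
SWITCH = re.compile(r"set balanceSwitch=(true|false)")


def scan_master_log(lines):
    """Collect balancer-related entries from master log lines.

    Returns two lists:
      rit_events    -> (timestamp, number of regions in transition)
      switch_events -> (timestamp, value the balance switch was set to)
    """
    rit_events, switch_events = [], []
    current_ts = None
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts:
            current_ts = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
        if current_ts is None:
            continue  # skip anything before the first timestamped entry
        rit = RIT.search(line)
        if rit:
            rit_events.append((current_ts, int(rit.group(1))))
        switch = SWITCH.search(line)
        if switch:
            switch_events.append((current_ts, switch.group(1) == "true"))
    return rit_events, switch_events


# The first line uses a hypothetical timestamp and thread name; the second
# is taken from the log excerpt quoted in this thread.
SAMPLE_LINES = [
    "2015-07-03 10:00:00,000 INFO  [hypothetical-thread] master.HMaster: "
    "Not running balancer because 1 region(s) in transition",
    "2015-07-06 15:15:47,324 INFO  [FifoRpcScheduler.handler1-thread-18] "
    "master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false",
]

if __name__ == "__main__":
    rit_events, switch_events = scan_master_log(SAMPLE_LINES)
    for when, count in rit_events:
        print(f"{when}: balancer skipped, {count} region(s) in transition")
    for when, value in switch_events:
        print(f"{when}: balanceSwitch set to {value}")
```

To run it against a real log, pass an open file object instead of `SAMPLE_LINES`. The latest entry in each list shows when the balancer last refused to run and what the switch was last set to.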

I checked the ‘ritCount’ metric, and it is 0. I also checked the 
/hbase/region-in-transition znode, which is empty as well. 
But I still can’t start the balancer manually.

I take a snapshot of the tables every hour. 
I’ve checked the path /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot 
in ZooKeeper, and there are ~4000 applications under it. 
It looks like all of them are create-snapshot operations. 
I’ve also observed that the CPU usage of the master is much higher than 
it was in the past. 
Is it possible that all of these applications are causing the problem?

Can I delete all of these applications?


> On 06 Jul 2015, at 18:45, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> Do you see in the master log something similar to the following ?
> 
> master.HMaster: Not running balancer because 1 region(s) in transition
> 
> You can search backwards for balancer / assignment related logs.
> 
> Cheers
> 
> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <akmal.abba...@icloud.com>
> wrote:
> 
>>> What error(s) did you get when trying to restart the region server ? Have
>>> you checked its log files ?
>> It was a VM, and I was not able to access it any more; I couldn’t log in to
>> it. Restarting several times didn’t help.
>> 
>> 
>>> Can you check master log around this time ? If there was region in
>>> transition, balancer wouldn't balance.
>> I have a lot of these:
>> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>> 2015-07-06 15:15:39,921 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>> 2015-07-06 15:15:39,925 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>> 2015-07-06 15:15:39,926 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>> 2015-07-06 15:15:39,927 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>> 2015-07-06 15:15:39,928 INFO  [snapshot-log-cleaner-cache-refresher]
>> util.FSVisitor: No logs under
>> directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>> 2015-07-06 15:15:47,324 INFO  [FifoRpcScheduler.handler1-thread-18]
>> master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner]
>> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
>> hbase-rs1%2C60020%2C1436189457794.1436190023718
>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner]
>> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
>> hbase-rs1%2C60020%2C1436189457794.1436193624562
>> 2015-07-06 15:32:49,382 INFO  [FifoRpcScheduler.handler1-thread-14]
>> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>> 2015-07-06 15:32:56,936 INFO  [FifoRpcScheduler.handler1-thread-1]
>> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>> 
>> Thank you.
>> 
>>> On 06 Jul 2015, at 17:37, Ted Yu <yuzhih...@gmail.com> wrote:
>>> 
>>> bq. I had to delete and recreate it
>>> 
>>> What error(s) did you get when trying to restart the region server ? Have
>>> you checked its log files ?
>>> 
>>> bq. start balancer manually, but it returned false
>>> 
>>> Can you check master log around this time ? If there was region in
>>> transition, balancer wouldn't balance.
>>> 
>>> Cheers
>>> 
>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <akmal.abba...@icloud.com>
>>> wrote:
>>> 
>>>> Hi all,
>>>> I have a strange behaviour in my HBase cluster. I have 5 rs and 2 masters.
>>>> One of the rs stopped working, a restart didn’t work, and I had to delete
>>>> and recreate it.
>>>> But when this rs stopped, the cluster also stopped functioning.
>>>> There were a lot of inconsistencies. When I recreated the rs with the disks
>>>> of the previous one, the cluster started working.
>>>> But now only 3 rs host regions; the other 2 have 0 regions.
>>>> I’ve tried to start balancer manually, but it returned false.
>>>> Any idea?
>>>> 
>>>> I am using hbase hbase-0.98.7-hadoop2.
>>>> Thank you.
>>>> 
>>>> Kind regards,
>>>> Akmal Abbasov
>>>> 
>>>> 
>> 
>> 
