Also, what version of HBase are you running?
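
In case it helps with reading the znode path in the quoted exception below: the queue znode name looks like a peer id followed by the names of the region servers the queue has passed through. That layout is my assumption from the path itself, not something I have checked against the HBase source, and parse_queue_znode is just a hypothetical helper name. A minimal Python sketch for pulling such a name apart:

```python
import re


def parse_queue_znode(name):
    """Split a replication queue znode name into (peer_id, [server names]).

    Assumption: the name is "<peer id>" optionally followed by "-" and one
    or more "host,port,startcode" server names joined by "-". Hostnames may
    themselves contain "-", so we anchor on the ",port,startcode" suffix.
    """
    peer_id, _, rest = name.partition("-")
    servers = re.findall(r"(.+?,\d+,\d+)(?:-|$)", rest)
    return peer_id, servers


# Queue name copied from the NoNode path in the quoted exception.
queue = ("1-va-p-hbase-01-c,60020,1369042378287"
         "-va-p-hbase-02-c,60020,1369042377731")
peer_id, servers = parse_queue_znode(queue)
print("peer:", peer_id)
for server in servers:
    print("server:", server)
```

If the layout assumption holds, both the 01 and 02 server names embedded in that queue name carry start codes older than the 01 instance that logged the exception.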

On Wed, May 22, 2013 at 1:38 PM, Varun Sharma <va...@pinterest.com> wrote:

> Basically,
>
> va-p-hbase-02 crashed. That caused all of its replication-related data
> in ZooKeeper to be moved to va-p-hbase-01, which took over replicating
> 02's logs. Each region server also maintains an in-memory copy of what's
> in ZK. It seems that when you start up 01, it tries to replicate the 02
> logs underneath, but it fails because that data is not in ZK. This is
> somewhat weird...
>
> Can you open the ZooKeeper shell and run
>
> ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379
>
> and post the output?
>
>
> On Wed, May 22, 2013 at 1:27 PM, amit.mor.m...@gmail.com <
> amit.mor.m...@gmail.com> wrote:
>
>> Hi,
>>
>> This is bad, and it has happened twice: I took my replication-slave
>> cluster offline and performed quite a massive Merge operation on it. After
>> a couple of hours it finished and I brought the cluster back online. At
>> the same time, the replication-master RS machines crashed (see first crash
>> http://pastebin.com/1msNZ2tH) with the first exception being:
>>
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/va-p-hbase-01-c,60020,1369233253404/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
>>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:354)
>>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:846)
>>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:898)
>>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:892)
>>         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
>>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
>>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:638)
>>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:387)
>>
>> Before restarting the crashed RSs, I ran the 'stop_replication' command,
>> then fired up the RSs again. They started OK, but once I ran
>> 'start_replication' they crashed again. The second crash log
>> (http://pastebin.com/8Nb5epJJ) has the same initial exception
>> (org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode). I've started the crashed region servers again
>> without replication and currently all is well, but I need to start
>> replication ASAP.
>>
>> Does anyone have an idea what's going on and how I can solve it?
>>
>> Thanks,
>> Amit
>>
>
>
