Thanks for looking into this with me. OK, so on the master region
servers I am getting both statements, 'Replicating x' and 'Replicated
in total: y'.
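
In case it helps, this is roughly how I'm grepping each region
server's log for those lines (the log path assumes the default CDH3
package layout, so adjust as needed):

# on each master cluster region server (ds1..ds4)
grep -E 'Replicating|Replicated in total' /var/log/hbase/*regionserver*.log

# on each slave cluster region server (bk1..bk4)
grep 'Total replicated' /var/log/hbase/*regionserver*.log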
Nothing on the slave cluster.

On Mon, Dec 13, 2010 at 12:28 PM, Jean-Daniel Cryans
<[email protected]> wrote:
> Hi Nathaniel,
>
> Thanks for trying out replication, let's make it work for you.
>
> So on the master side there are two lines that are important for
> making sure that replication works. First it has to say:
>
> Replicating x
>
> where x is the number of edits it's going to ship, and then:
>
> Replicated in total: y
>
> where y is the total number it replicated. Seeing the second line
> means that replication was successful, at least from the master's
> point of view.
>
> On the slave, one node should have:
>
> Total replicated: z
>
> where z is the number of edits that region server applied on its
> cluster. It could be on any region server, since the sink for
> replication is chosen at random.
>
> Do you see those? Any exceptions around those logs, apart from EOFs?
>
> Thx,
>
> J-D
>
> On Mon, Dec 13, 2010 at 10:52 AM, Nathaniel Cook
> <[email protected]> wrote:
>> Hi,
>>
>> I am trying to set up replication for my HBase clusters. I have two
>> small clusters for testing, each with 4 machines, and the setup of
>> the two clusters is identical. Each machine runs a DataNode and an
>> HRegionServer; three of the machines run a ZK peer, and one machine
>> runs the HMaster and NameNode. The master cluster machines have
>> hostnames (ds1, ds2, ...) and the slave cluster's are (bk1, bk2,
>> ...). I set the replication scope to 1 for my test table's column
>> families and set the hbase.replication property to true on both
>> clusters. Next I ran the add_peer.rb script with the following
>> command on the ds1 machine:
>>
>> hbase org.jruby.Main /usr/lib/hbase/bin/replication/add_peer.rb
>> ds1:2181:/hbase bk1:2181:/hbase
>>
>> After the script finished, ZK for the master cluster had the
>> replication znode with children peers, master, and state, but the
>> slave ZK didn't have a replication znode. I worked around that by
>> rerunning the script on the bk1 machine with the code that writes
>> to the master ZK commented out. Now the slave ZK has the
>> /hbase/replication/master znode with data (ds1:2181:/hbase).
>> Everything looked to be configured correctly, so I restarted the
>> clusters. The logs of the master region servers stated:
>>
>> This cluster (ds1:2181:/hbase) is a master for replication, compared
>> with (ds1:2181:/hbase)
>>
>> The logs on the slave cluster stated:
>>
>> This cluster (bk1:2181:/hbase) is a slave for replication, compared
>> with (ds1:2181:/hbase)
>>
>> Using the hbase shell, I put a row into the test table. The region
>> server for that table had a log statement like:
>>
>> Going to report log #192.168.1.166%3A60020.1291757445179 for position
>> 15828 in
>> hdfs://ds1:9000/hbase/.logs/ds1.internal,60020,1291757445059/192.168.1.166%3A60020.1291757445179
>>
>> (192.168.1.166 is ds1.)
>>
>> Even after waiting several minutes, the row still does not appear
>> in the slave cluster's table.
>>
>> Any help with what the problem might be is greatly appreciated.
>>
>> Both clusters are running CDH3b3; the HBase version is exactly
>> 0.89.20100924+28.
>>
>> -Nathaniel Cook
>>

--
-Nathaniel Cook
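
P.S. In case it's useful, here is a rough sketch of how I'm
inspecting the replication znodes from the command line (the zkCli.sh
path assumes the CDH3 zookeeper package; adjust for your install):

# master cluster ZK: should list the peers, master, and state children
/usr/lib/zookeeper/bin/zkCli.sh -server ds1:2181 ls /hbase/replication

# slave cluster ZK: should print ds1:2181:/hbase as the data
/usr/lib/zookeeper/bin/zkCli.sh -server bk1:2181 get /hbase/replication/master

Both of those come back as described above, which is why I think the
znodes themselves are set up correctly.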
