Thanks for looking into this with me. OK, so on the master region
servers I am getting both statements, 'Replicating x' and 'Replicated
in total: y'.
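
In case it helps, this is roughly how I'm grepping each region
server's log for those lines (the log path assumes the default CDH3
package layout, so adjust as needed):

# on each master cluster region server (ds1..ds4)
grep -E 'Replicating|Replicated in total' /var/log/hbase/*regionserver*.log

# on each slave cluster region server (bk1..bk4)
grep 'Total replicated' /var/log/hbase/*regionserver*.log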
Nothing on the slave cluster.

On Mon, Dec 13, 2010 at 12:28 PM, Jean-Daniel Cryans
<[email protected]> wrote:
> Hi Nathaniel,
>
> Thanks for trying out replication, let's make it work for you.
>
> So on the master side there are two lines that are important for
> making sure that replication works. First it has to say:
>
> Replicating x
>
> where x is the number of edits it's going to ship, and then:
>
> Replicated in total: y
>
> where y is the total number it replicated. Seeing the second line
> means that replication was successful, at least from the master's
> point of view.
>
> On the slave, one node should have:
>
> Total replicated: z
>
> where z is the number of edits that region server applied on its
> cluster. It could be on any region server, since the sink for
> replication is chosen at random.
>
> Do you see those? Any exceptions around those logs, apart from EOFs?
>
> Thx,
>
> J-D
>
> On Mon, Dec 13, 2010 at 10:52 AM, Nathaniel Cook
> <[email protected]> wrote:
>> Hi,
>>
>> I am trying to set up replication for my HBase clusters. I have two
>> small clusters for testing, each with 4 machines, and the setup of
>> the two clusters is identical. Each machine runs a DataNode and an
>> HRegionServer; three of the machines run a ZK peer, and one machine
>> runs the HMaster and NameNode. The master cluster machines have
>> hostnames (ds1, ds2, ...) and the slave cluster's are (bk1, bk2,
>> ...). I set the replication scope to 1 for my test table's column
>> families and set the hbase.replication property to true on both
>> clusters. Next I ran the add_peer.rb script with the following
>> command on the ds1 machine:
>>
>> hbase org.jruby.Main /usr/lib/hbase/bin/replication/add_peer.rb
>> ds1:2181:/hbase bk1:2181:/hbase
>>
>> After the script finished, ZK for the master cluster had the
>> replication znode with children peers, master, and state, but the
>> slave ZK didn't have a replication znode. I worked around that by
>> rerunning the script on the bk1 machine with the code that writes
>> to the master ZK commented out. Now the slave ZK has the
>> /hbase/replication/master znode with data (ds1:2181:/hbase).
>> Everything looked to be configured correctly, so I restarted the
>> clusters. The logs of the master region servers stated:
>>
>> This cluster (ds1:2181:/hbase) is a master for replication, compared
>> with (ds1:2181:/hbase)
>>
>> The logs on the slave cluster stated:
>>
>> This cluster (bk1:2181:/hbase) is a slave for replication, compared
>> with (ds1:2181:/hbase)
>>
>> Using the hbase shell, I put a row into the test table. The region
>> server for that table had a log statement like:
>>
>> Going to report log #192.168.1.166%3A60020.1291757445179 for position
>> 15828 in
>> hdfs://ds1:9000/hbase/.logs/ds1.internal,60020,1291757445059/192.168.1.166%3A60020.1291757445179
>>
>> (192.168.1.166 is ds1.)
>>
>> Even after waiting several minutes, the row still does not appear
>> in the slave cluster's table.
>>
>> Any help with what the problem might be is greatly appreciated.
>>
>> Both clusters are running CDH3b3; the HBase version is exactly
>> 0.89.20100924+28.
>>
>> -Nathaniel Cook
>>

--
-Nathaniel Cook
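
P.S. In case it's useful, here is a rough sketch of how I'm
inspecting the replication znodes from the command line (the zkCli.sh
path assumes the CDH3 zookeeper package; adjust for your install):

# master cluster ZK: should list the peers, master, and state children
/usr/lib/zookeeper/bin/zkCli.sh -server ds1:2181 ls /hbase/replication

# slave cluster ZK: should print ds1:2181:/hbase as the data
/usr/lib/zookeeper/bin/zkCli.sh -server bk1:2181 get /hbase/replication/master

Both of those come back as described above, which is why I think the
znodes themselves are set up correctly.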
