Hi Lars:

I'm using HBase 0.92.1 and Hadoop 1.0.1.

Yes, you are right. I'm replicating only from cluster A to cluster B, with cyclic replication configured. Eventually I will test replicating from cluster A to cluster B and vice versa under a write-intensive workload, but if replication doesn't work in one direction, we will need to think about other solutions.

No data loss in cluster A for sure.

Best Regards,

Jerry

Sent from my iPad

On 2012-04-20, at 15:34, lars hofhansl <lhofha...@yahoo.com> wrote:

> Hi Jerry,
>
> which version of HBase are you using?
>
> You are not using cyclic backup, that needs >2 clusters. I assume you're just
> replicating from one cluster to another, right?
>
> There is never data loss in Cluster A?
>
> -- Lars
>
>
> ----- Original Message -----
> From: Jerry Lam <chiling...@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Friday, April 20, 2012 5:38 AM
> Subject: HBase Cyclic Replication Issue: some data are missing in the
> replication for intensive write
>
> Hi HBase community:
>
> We have been testing cyclic replication for 1 week. The basic functionality
> seems to work as described in the document; however, when we started to
> increase the write workload, replication started to miss data (i.e. some
> data are not replicated to the other cluster). We have narrowed it down to a
> scenario in which we can reproduce the problem quite consistently, and here
> it is:
>
> -----------------------------
> Setup:
> - We have set up 2 clusters (cluster A and cluster B) with identical size in
>   terms of number of nodes and configuration; 3 regionservers sit on top of 3
>   datanodes.
> - Cyclic replication is enabled.
>
> - We use YCSB to generate load on HBase; the workload is very similar to
>   workloada:
>
> recordcount=200000
> operationcount=200000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> fieldcount=1
> fieldlength=25000
>
> readallfields=true
> writeallfields=true
>
> readproportion=0
> updateproportion=1
> scanproportion=0
> insertproportion=0
>
> requestdistribution=uniform
>
> - Records are inserted into Cluster A. After the benchmark is done, we wait
>   until all data are replicated to Cluster B and then use the verifyrep
>   mapreduce job for validation.
> - Data are deleted from both tables (truncate 'tablename') before a new
>   experiment is started.
>
> Scenario:
> When we increase the number of threads until it maxes out the throughput of
> the cluster, we see that some data are missing in Cluster B (total count !=
> 200000) although cluster A clearly has them all. This happens even when we
> disable region splitting in both clusters (it happens more often when region
> splits occur). To gain more control over what is happening, we then decided
> to disable the load balancer so that the region (which is responsible for the
> replicating data) would not relocate to another regionserver during the
> benchmark. The situation improved a lot: we didn't see any missing data in 5
> consecutive runs. Finally, we decided to move the region from one
> regionserver to another during the benchmark to see if the problem would
> reappear, and it did.
>
> We believe that the issue could be related to region splitting and load
> balancing during intensive writes; the HBase replication strategy doesn't yet
> cover those corner cases.
>
> Can someone take a look at it and suggest some ways to work around this?
>
> Thanks~
>
> Jerry
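
For pinning down which rows are missing on Cluster B, rather than only seeing that the total count differs from 200000, a comparison along the following lines may help. This is a minimal sketch in Python, assuming the row keys and values of both tables have already been exported into dictionaries (for example from a scan on each cluster). The function name diff_replicated_rows and the sample data are hypothetical; they are not part of HBase or of the verifyrep job.

```python
from typing import Dict, List, Tuple


def diff_replicated_rows(
    source_rows: Dict[str, bytes],
    sink_rows: Dict[str, bytes],
) -> Tuple[List[str], List[str]]:
    """Compare exported rows of the source (cluster A) and sink (cluster B).

    Returns two sorted lists of row keys:
      - keys present on the source but absent on the sink
      - keys present on both sides whose values differ
    """
    missing = sorted(key for key in source_rows if key not in sink_rows)
    mismatched = sorted(
        key
        for key, value in source_rows.items()
        if key in sink_rows and sink_rows[key] != value
    )
    return missing, mismatched


if __name__ == "__main__":
    # Hypothetical sample data standing in for exported scan results.
    cluster_a = {"user1": b"aaa", "user2": b"bbb", "user3": b"ccc"}
    cluster_b = {"user1": b"aaa", "user3": b"xxx"}

    missing, mismatched = diff_replicated_rows(cluster_a, cluster_b)
    print("missing on B:", missing)
    print("value differs on B:", mismatched)
```

Knowing the exact missing keys would make it possible to check whether they all belong to the region that was moved or split during the run, which is what the scenario above suggests.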