Replication from the master cluster will retry the failed batch.

-Anoop-

On Sat, Dec 12, 2015 at 6:05 AM, Abraham Tom <[email protected]> wrote:
> I have 2 clusters ( 1 master and 1 slave) on CDH 5.4 hbase 1.0
> replication is working 95% of the time
> but I do get the following WARN which I consider an error
>
> Can't replicate because of an error on the remote cluster:
>
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException):
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 11 actions: NotServingRegionException: 11 times,
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
>     at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1003)
>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1017)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:236)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:160)
>     at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1584)
>     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20880)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
>
> I consider this an error because my slave is missing data that I have in
> the master.
> Is there a setting in hbase to keep trying to send ?
> Cloudera management does try to restart and alerts me if the region for
> some reason dies. As to why it dies, I am looking and that is a different
> problem. but when the slave returns, I have an expectation that the
> unconfirmed records would be resent.
>
> Best practices would be helpful as well
> All zookeepers in the slave are listed as peers
>
> --
> Abraham Tom
> Email: [email protected]
> Phone: 415-515-3621
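
To illustrate what "retry" means in the reply above, here is a minimal Python sketch of a sender that holds on to a failed batch and resends it after a pause, instead of dropping it. This is only a toy model: `ship_with_retry`, `FlakySink`, the parameter names, and the backoff shape are all hypothetical and are not taken from the HBase source.

```python
import time


def ship_with_retry(batch, send, base_sleep=1.0, max_multiplier=10,
                    sleep=time.sleep):
    """Call send(batch) until it succeeds; return the number of attempts.

    Hypothetical model only: names, defaults, and backoff shape are
    assumptions for illustration, not HBase's actual implementation.
    """
    multiplier = 1
    attempts = 0
    while True:
        attempts += 1
        try:
            send(batch)
            return attempts
        except Exception:
            # The sink rejected the batch (for example, a region was not
            # being served). Keep the same batch, wait, and try again.
            sleep(base_sleep * multiplier)
            if multiplier < max_multiplier:
                multiplier += 1


class FlakySink:
    """Hypothetical sink that fails a fixed number of times, then accepts."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def replicate(self, batch):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("NotServingRegionException (simulated)")
        self.received.extend(batch)
```

In this model the batch is never discarded on failure, so a sink that comes back after a few rejected attempts still receives every entry. Passing a custom `sleep` callable makes the loop easy to exercise without real delays.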
