Below is the exception from the RS log related to replication, taken from a cluster where active writes are taking place:

2021-04-26 13:12:53,917 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
2021-04-26 13:12:53,941 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10011, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:54,608 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.212.XXX/10.XX.212.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:54,631 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
2021-04-26 13:12:54,655 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10012, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:54,848 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.232.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.232.XXX/10.XX.232.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:54,869 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.232.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.232.XXX/10.XX.232.XXX:2171, initiating session
2021-04-26 13:12:55,018 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.23.XXX/10.XX.23.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:55,041 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.23.XXX/10.XX.23.XXX:2171, initiating session
2021-04-26 13:12:55,064 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10011, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:55,112 WARN [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.232.XXX:2171)] zookeeper.ClientCnxn: Session 0x1573dd4199c10012 for server 10.XX.232.XXX/10.XX.232.XXX:2171, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2021-04-26 13:12:55,918 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.23.XXX/10.XX.23.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:55,941 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.23.XXX/10.XX.23.XXX:2171, initiating session
2021-04-26 13:12:55,964 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10012, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:56,047 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.232.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.232.XXX/10.XX.232.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:56,068 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.232.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.232.XXX/10.XX.232.XXX:2171, initiating session
2021-04-26 13:12:56,090 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.232.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10011, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:57,250 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.212.XXX/10.XX.212.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:57,273 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
2021-04-26 13:12:57,297 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10011, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:57,853 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.212.XXX/10.XX.212.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-26 13:12:57,877 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
2021-04-26 13:12:57,900 INFO [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1573dd4199c10012, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-26 13:12:57,933 INFO [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.23.XXX/10.XX.23.XXX:2171. Will not attempt to authenticate using SASL (unknown error)

Below is the exception from another cluster, where there are currently no active writes to HBase but replication is enabled as well:

2021-04-28 10:30:06,723 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.239.XXX/10.XX.239.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:06,724 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.239.XXX/10.XX.239.XXX:2171, initiating session
2021-04-28 10:30:06,725 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.239.XXX/10.XX.239.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:07,944 WARN [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Session 0x15791182d1b10003 for server 10.XX.239.XXX/10.XX.239.XXX:2171, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2021-04-28 10:30:08,185 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.29.XXX/10.XX.29.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:08,186 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.29.XXX/10.XX.29.XXX:2171, initiating session
2021-04-28 10:30:08,187 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.29.XXX/10.XX.29.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:12,074 WARN [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Session 0x15791182d1b10003 for server 10.XX.29.XXX/10.XX.29.XXX:2171, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2021-04-28 10:30:12,751 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.242.XXX/10.XX.242.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:12,752 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.242.XXX/10.XX.242.XXX:2171, initiating session
2021-04-28 10:30:12,754 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.242.XXX/10.XX.242.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:20,202 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15791182d1b10003, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-28 10:30:20,303 ERROR [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: ZooKeeper multi failed after 4 attempts
2021-04-28 10:30:20,303 WARN [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Got exception in copyQueuesFromRSUsingMulti:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:992)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:672)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1685)
    at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.moveQueueUsingMulti(ReplicationQueuesZKImpl.java:437)
    at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.claimQueue(ReplicationQueuesZKImpl.java:257)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:697)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2021-04-28 10:30:21,295 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.239.XXX/10.XX.239.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:21,295 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.239.XXX/10.XX.239.XXX:2171, initiating session
2021-04-28 10:30:21,296 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.239.XXX/10.XX.239.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:35,305 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Atomically moving 10.XX.22.XXX,16020,1597146947282/1's WALs to my queue
2021-04-28 10:30:37,260 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15791182d1b10003, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-28 10:30:37,805 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.29.XXX/10.XX.29.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:37,806 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.29.XXX/10.XX.29.XXX:2171, initiating session
2021-04-28 10:30:37,808 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.29.XXX/10.XX.29.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:38,390 INFO [main-SendThread(10.XX.29.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15791182d1b10003, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-28 10:30:38,731 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.242.XXX/10.XX.242.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:38,732 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.242.XXX/10.XX.242.XXX:2171, initiating session
2021-04-28 10:30:38,734 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.242.XXX/10.XX.242.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:40,522 INFO [main-SendThread(10.XX.242.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15791182d1b10003, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-28 10:30:40,765 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Opening socket connection to server 10.XX.239.XXX/10.XX.239.XXX:2171. Will not attempt to authenticate using SASL (unknown error)
2021-04-28 10:30:40,766 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Socket connection established to 10.XX.239.XXX/10.XX.239.XXX:2171, initiating session
2021-04-28 10:30:40,767 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Session establishment complete on server 10.XX.239.XXX/10.XX.239.XXX:2171, sessionid = 0x15791182d1b10003, negotiated timeout = 40000
2021-04-28 10:30:44,651 INFO [main-SendThread(10.XX.239.XXX:2171)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15791182d1b10003, likely server has closed socket, closing socket connection and attempting reconnect

Regards,
Roshan

On Tue, 27 Apr 2021 at 23:48, Rushabh Shah <rushabh.s...@salesforce.com.invalid> wrote:

> Hi Roshan,
> Are you seeing any replication-related exceptions in your RS logs?
>
> On Tue, Apr 27, 2021 at 1:59 PM Roshan <jlks...@gmail.com> wrote:
>
> > Hi,
> >
> > In hbase-1.4.10, I have enabled replication for all tables and
> > configured the peer_id.
> > The list_peers command gives the result below:
> >
> > > hbase(main):001:0> list_peers
> > > PEER_ID CLUSTER_KEY ENDPOINT_CLASSNAME STATE TABLE_CFS BANDWIDTH
> > > 1 10.XX.221.XX,10.XX.234.XX,10.XX.212.XX:2171:/hbase nil ENABLED nil 0
> > > 1 row(s) in 0.1430 seconds
> >
> > But status 'replication' shows a replication lag:
> >
> > > hbase(main):002:0> status 'replication'
> > > version 1.4.10
> > > 3 live servers
> > >     10.XX.232.XX:
> > >        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=*1619545264329*
> > >        SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27 23:09:23 IST 2021
> > >     10.XX.118.XX:
> > >        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=*1619545264663*
> > >        SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27 18:53:23 IST 2021
> > >     10.XX.138.XX:
> > >        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=*1619545263509*
> > >        SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27 10:31:05 IST 2021
> >
> > But all the data is replicated properly to the peer cluster. I have
> > checked the table in both clusters.
> >
> > I have also run the VerifyReplication MapReduce job to check for
> > unreplicated rows. It reports no unreplicated rows; all rows are good rows.
> >
> > > ./hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 1 tablename
> > >
> > > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
> > >     GOODROWS=45
> > > File Input Format Counters
> > >     Bytes Read=0
> > > File Output Format Counters
> > >     Bytes Written=0
> >
> > Due to this issue, the znodes under replication are growing
> > exponentially, which causes problems for the running ZK cluster and
> > eventually affects the HBase connection too. The exception below occurs in ZK:
> >
> > *ERROR java.io.IOException: Len error*
> >
> > Increasing jute.maxbuffer in ZK will not solve the problem, as the
> > replication znodes keep increasing even though the data is replicated
> > properly to the cluster given by the peer_id.
> >
> > I have enabled two-way replication between the clusters. It happens in
> > both clusters.
> >
> > hbase version - 1.4.10
> > ZK Version - 3.4.10
> > Hadoop version - 2.7.3
> >
> > Please help to fix this.
> >
> > Regards,
> > Roshan
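
A note on the Replication Lag figures in the quoted status 'replication' output: they can be sanity-checked with a little arithmetic. The sketch below is only an illustration and rests on an assumption that is not confirmed anywhere in this thread, namely that the shell reports the lag as roughly "current time minus TimeStampsOfLastShippedOp". Since that timestamp is shown as the epoch (Thu Jan 01 05:30:00 IST 1970), the lag would then simply be the wall-clock time in milliseconds. The helper name lag_as_timestamp is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Replication Lag values in milliseconds, copied from the quoted
# status 'replication' output.
REPORTED_LAG_MS = {
    "10.XX.232.XX": 1619545264329,
    "10.XX.118.XX": 1619545264663,
    "10.XX.138.XX": 1619545263509,
}

# IST is UTC+05:30, the time zone used in the quoted output.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def lag_as_timestamp(lag_ms: int) -> datetime:
    """Read a lag value as if it were an epoch timestamp in milliseconds.

    Assumption (not confirmed in the thread): the reported lag is
    "now minus last shipped timestamp", and that timestamp is 0.
    """
    return datetime.fromtimestamp(lag_ms // 1000, tz=IST)


if __name__ == "__main__":
    for server, lag_ms in REPORTED_LAG_MS.items():
        decoded = lag_as_timestamp(lag_ms)
        print(server, decoded.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

All three values decode to about 23:11 IST on 27 April 2021, which is close to the latest SINK timestamp in the same output. Under the stated assumption, that suggests the large lag figure reflects a last-shipped timestamp of 0 rather than a real backlog of that size; it does not by itself explain why the replication znodes keep growing.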