Below are the replication-related exceptions from the RS logs.

Exceptions from the cluster where active writes take place:

2021-04-26 13:12:53,917 INFO
[regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
> 2021-04-26 13:12:53,941 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10011, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:54,608 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.212.XXX/10.XX.212.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-26 13:12:54,631 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
> 2021-04-26 13:12:54,655 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10012, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:54,848 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.232.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.232.XXX/10.XX.232.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-26 13:12:54,869 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.232.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.232.XXX/10.XX.232.XXX:2171, initiating session
> 2021-04-26 13:12:55,018 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.23.XXX/10.XX.23.XXX:2171. Will not attempt to authenticate using SASL
> (unknown error)
> 2021-04-26 13:12:55,041 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.23.XXX/10.XX.23.XXX:2171, initiating session
> 2021-04-26 13:12:55,064 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10011, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:55,112 WARN  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.232.XXX:2171)]
> zookeeper.ClientCnxn: Session 0x1573dd4199c10012 for server
> 10.XX.232.XXX/10.XX.232.XXX:2171, unexpected error, closing socket
> connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>         at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>         at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
> 2021-04-26 13:12:55,918 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.23.XXX/10.XX.23.XXX:2171. Will not attempt to authenticate using SASL
> (unknown error)
> 2021-04-26 13:12:55,941 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.23.XXX/10.XX.23.XXX:2171, initiating session
> 2021-04-26 13:12:55,964 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10012, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:56,047 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.232.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.232.XXX/10.XX.232.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-26 13:12:56,068 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.232.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.232.XXX/10.XX.232.XXX:2171, initiating session
> 2021-04-26 13:12:56,090 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.232.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10011, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:57,250 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.212.XXX/10.XX.212.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-26 13:12:57,273 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
> 2021-04-26 13:12:57,297 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10011, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:57,853 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.212.XXX/10.XX.212.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-26 13:12:57,877 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.212.XXX/10.XX.212.XXX:2171, initiating session
> 2021-04-26 13:12:57,900 INFO  
> [regionserver//10.XX.235.XX:16020.replicationSource,1-SendThread(10.XX.212.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10012, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-26 13:12:57,933 INFO  
> [regionserver//10.XX.235.XX:16020-SendThread(10.XX.23.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.23.XXX/10.XX.23.XXX:2171. Will not attempt to authenticate using SASL
> (unknown error)
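
To summarize a log like the one above, here is a minimal Python sketch that counts how often each ZooKeeper session was dropped by the server. The function name count_session_drops and the idea of matching on the "Unable to read additional data" message are illustrative assumptions, not part of any HBase or ZooKeeper tooling.

```python
import re
from collections import Counter

# Matches the ClientCnxn message logged when the server closes the socket,
# capturing the session id that follows it.
DROP_PATTERN = re.compile(
    r"Unable to read additional data from server sessionid\s+(0x[0-9a-f]+)"
)

def count_session_drops(log_text: str) -> Counter:
    """Count server-side socket closes per ZooKeeper session id."""
    # Strip email quote markers ("> ") and re-join wrapped lines so the
    # pattern can match across the original line breaks.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return Counter(DROP_PATTERN.findall(flat))

if __name__ == "__main__":
    sample = """\
> 2021-04-26 13:12:53,941 INFO
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10011, likely server has closed socket, closing socket
> 2021-04-26 13:12:54,655 INFO
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x1573dd4199c10012, likely server has closed socket, closing socket
"""
    for session_id, drops in count_session_drops(sample).items():
        print(session_id, drops)
```

Run against the full RS log, this gives a quick view of whether the regionserver session, the replicationSource session, or both are being dropped.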



Below are the exceptions from the other cluster, which currently has no active
writes in HBase but also has replication enabled:

2021-04-28 10:30:06,723 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.239.XXX/10.XX.239.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-28 10:30:06,724 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.239.XXX/10.XX.239.XXX:2171, initiating session
> 2021-04-28 10:30:06,725 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.239.XXX/10.XX.239.XXX:2171, sessionid = 0x15791182d1b10003,
> negotiated timeout = 40000
> 2021-04-28 10:30:07,944 WARN  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Session 0x15791182d1b10003 for server
> 10.XX.239.XXX/10.XX.239.XXX:2171, unexpected error, closing socket
> connection and attempting reconnect
> java.io.IOException: Broken pipe
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>         at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
>         at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>         at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
> 2021-04-28 10:30:08,185 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.29.XXX/10.XX.29.XXX:2171. Will not attempt to authenticate using SASL
> (unknown error)
> 2021-04-28 10:30:08,186 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.29.XXX/10.XX.29.XXX:2171, initiating session
> 2021-04-28 10:30:08,187 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.29.XXX/10.XX.29.XXX:2171, sessionid = 0x15791182d1b10003, negotiated
> timeout = 40000
> 2021-04-28 10:30:12,074 WARN  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Session 0x15791182d1b10003 for server
> 10.XX.29.XXX/10.XX.29.XXX:2171, unexpected error, closing socket connection
> and attempting reconnect
> java.io.IOException: Broken pipe
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>         at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
>         at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>         at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
> 2021-04-28 10:30:12,751 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.242.XXX/10.XX.242.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-28 10:30:12,752 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.242.XXX/10.XX.242.XXX:2171, initiating session
> 2021-04-28 10:30:12,754 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.242.XXX/10.XX.242.XXX:2171, sessionid = 0x15791182d1b10003,
> negotiated timeout = 40000
> 2021-04-28 10:30:20,202 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x15791182d1b10003, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-28 10:30:20,303 ERROR [ReplicationExecutor-0]
> zookeeper.RecoverableZooKeeper: ZooKeeper multi failed after 4 attempts
> 2021-04-28 10:30:20,303 WARN  [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl: Got exception in
> copyQueuesFromRSUsingMulti:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at
> org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:992)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>         at
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:672)
>         at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1685)
>         at
> org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.moveQueueUsingMulti(ReplicationQueuesZKImpl.java:437)
>         at
> org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.claimQueue(ReplicationQueuesZKImpl.java:257)
>         at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:697)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2021-04-28 10:30:21,295 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.239.XXX/10.XX.239.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-28 10:30:21,295 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.239.XXX/10.XX.239.XXX:2171, initiating session
> 2021-04-28 10:30:21,296 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.239.XXX/10.XX.239.XXX:2171, sessionid = 0x15791182d1b10003,
> negotiated timeout = 40000
> 2021-04-28 10:30:35,305 INFO  [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl: Atomically moving
> 10.XX.22.XXX,16020,1597146947282/1's WALs to my queue
> 2021-04-28 10:30:37,260 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x15791182d1b10003, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-28 10:30:37,805 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.29.XXX/10.XX.29.XXX:2171. Will not attempt to authenticate using SASL
> (unknown error)
> 2021-04-28 10:30:37,806 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.29.XXX/10.XX.29.XXX:2171, initiating session
> 2021-04-28 10:30:37,808 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.29.XXX/10.XX.29.XXX:2171, sessionid = 0x15791182d1b10003, negotiated
> timeout = 40000
> 2021-04-28 10:30:38,390 INFO  [main-SendThread(10.XX.29.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x15791182d1b10003, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-28 10:30:38,731 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.242.XXX/10.XX.242.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-28 10:30:38,732 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.242.XXX/10.XX.242.XXX:2171, initiating session
> 2021-04-28 10:30:38,734 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.242.XXX/10.XX.242.XXX:2171, sessionid = 0x15791182d1b10003,
> negotiated timeout = 40000
> 2021-04-28 10:30:40,522 INFO  [main-SendThread(10.XX.242.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x15791182d1b10003, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2021-04-28 10:30:40,765 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Opening socket connection to server
> 10.XX.239.XXX/10.XX.239.XXX:2171. Will not attempt to authenticate using
> SASL (unknown error)
> 2021-04-28 10:30:40,766 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Socket connection established to
> 10.XX.239.XXX/10.XX.239.XXX:2171, initiating session
> 2021-04-28 10:30:40,767 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Session establishment complete on server
> 10.XX.239.XXX/10.XX.239.XXX:2171, sessionid = 0x15791182d1b10003,
> negotiated timeout = 40000
> 2021-04-28 10:30:44,651 INFO  [main-SendThread(10.XX.239.XXX:2171)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x15791182d1b10003, likely server has closed socket, closing socket
> connection and attempting reconnect
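
The Replication Lag values in the status output quoted below look like wall-clock timestamps rather than durations. TimeStampsOfLastShippedOp is shown as the epoch (Thu Jan 01 05:30:00 IST 1970), so a lag of the form "current time minus last shipped timestamp" would simply equal the current time in milliseconds. That formula is an assumption about how the number is derived, not something confirmed from the HBase source. A minimal Python sketch to check it by decoding the three values as epoch milliseconds in IST (decode_ist_ms is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone

# IST is UTC+05:30, matching the timestamps in the status output.
IST = timezone(timedelta(hours=5, minutes=30))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_ist_ms(ms: int) -> datetime:
    """Interpret a value as milliseconds since the Unix epoch, shown in IST."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(IST)

# "Replication Lag" values copied from the status 'replication' output.
lags_ms = [1619545264329, 1619545264663, 1619545263509]

for lag in lags_ms:
    print(lag, "->", decode_ist_ms(lag).isoformat())
```

All three values decode to about 23:11 IST on 27 Apr 2021, close to the SINK timestamp of 23:09:23 IST in the same output, which is consistent with nothing having been recorded as shipped rather than with a real backlog of that size.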


Regards,
Roshan

On Tue, 27 Apr 2021 at 23:48, Rushabh Shah
<rushabh.s...@salesforce.com.invalid> wrote:

> Hi Roshan,
> Are you seeing any replication-related exceptions in your RS logs?
>
>
>
>
> On Tue, Apr 27, 2021 at 1:59 PM Roshan <jlks...@gmail.com> wrote:
>
> > Hi,
> >
> > In hbase-1.4.10, I have enabled replication for all tables and
> > configured the peer_id. list_peers gives the following result:
> >
> > hbase(main):001:0> list_peers
> > >  PEER_ID CLUSTER_KEY ENDPOINT_CLASSNAME STATE TABLE_CFS BANDWIDTH
> > >  1 10.XX.221.XX,10.XX.234.XX,10.XX.212.XX:2171:/hbase nil ENABLED nil 0
> > > 1 row(s) in 0.1430 seconds
> >
> >
> > But status 'replication' shows replication lag:
> >
> > hbase(main):002:0> status 'replication'
> > > version 1.4.10
> > > 3 live servers
> > >     10.XX.232.XX:
> > >        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1,
> > > TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=
> > > *1619545264329*
> > >        SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27
> > > 23:09:23 IST 2021
> > >     10.XX.118.XX:
> > >        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1,
> > > TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=
> > > *1619545264663*
> > >        SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27
> > > 18:53:23 IST 2021
> > >     10.XX.138.XX:
> > >        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1,
> > > TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=
> > > *1619545263509*
> > >        SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27
> > > 10:31:05 IST 2021
> >
> >
> >
> > But all the data is replicated properly to the peer cluster. I have
> > checked the table in both clusters.
> >
> > I have also run the VerifyReplication MapReduce job to check for unreplicated
> > rows. It found no unreplicated rows; all rows are counted as good rows.
> >
> > ./hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 1
> > > tablename
> >
> >
> >
> >
> > > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
> > > GOODROWS=45
> > > File Input Format Counters
> > > Bytes Read=0
> > > File Output Format Counters
> > > Bytes Written=0
> >
> >
> > Due to this issue, the znodes under replication keep growing
> > exponentially, which causes problems in the running ZK cluster and eventually
> > affects the HBase connection too. The exception below occurs in ZK:
> >
> > *ERROR java.io.IOException: Len error*
> >
> > Increasing jute.maxbuffer in ZK will not solve the problem, as the replication
> > znode keeps growing even though the data is replicated properly to the
> > cluster given by the peer_id.
> >
> > I have enabled two-way replication between the clusters. It happens in
> > both clusters.
> >
> > HBase version - 1.4.10
> > ZK version - 3.4.10
> > Hadoop version - 2.7.3
> >
> > Please help to fix this.
> >
> > Regards,
> > Roshan
> >
>
