Hi guys,
In our client-server OSGi application we use the ECF ZooKeeper-based discovery provider for remote services discovery (based on ZooKeeper 3.3.6). In standalone mode, the plugin opens a dedicated ZooKeeper connection from the client to each of the servers. While testing the application's resilience, we noticed that after we restart the server, the connection is never re-established. In the server logs I found the following:

2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /10.36.64.250:53022
2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG org.apac.zook.serv.NIOServerCnxn - Session establishment request from client /10.36.64.250:53022 client's lastZxid is 0x8
2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client must try another server
2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /10.36.64.250:53022 (no session established for client)

As far as I understand, this is expected behaviour: because of the restart, the server cleaned up its DB and reset its transaction ID, so the client has already seen a newer zxid (0x8) than the server's last one (0x7).
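To make the refusal rule in the log above concrete, here is a minimal model of the check as I read it from the log message. The class and method names are made up for illustration; this is not ZooKeeper's actual source. The assumption is simply that the server refuses a session whenever the client's last seen zxid is greater than the server's own last processed zxid, and that a brand-new client handle reports a lastZxid of 0.

```java
// ILLUSTRATIVE model only (hypothetical names, not ZooKeeper source) of the check behind
// "Refusing session request ... client must try another server".
final class ZxidCheck {
    private ZxidCheck() {}

    /** True if a server at serverLastProcessedZxid would refuse this client's session. */
    static boolean wouldRefuse(long clientLastZxidSeen, long serverLastProcessedZxid) {
        // A client must never go "back in time": if it has already seen a transaction
        // newer than anything this server knows about, the server closes the socket.
        return clientLastZxidSeen > serverLastProcessedZxid;
    }

    public static void main(String[] args) {
        // Values from the server log: the client has seen 0x8, the restarted server is at 0x7.
        System.out.println(wouldRefuse(0x8L, 0x7L)); // true
        // Assumption: a fresh client handle starts with lastZxid 0, so it would be accepted.
        System.out.println(wouldRefuse(0x0L, 0x7L)); // false
    }
}
```

Under this model, the old session can never succeed against the restarted server, while a fresh session would.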
The problem is that the client keeps trying to reconnect to this single server, which results in an infinite loop:

2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Opening socket connection to server ca-rd-mbernard.miranda.com/10.36.64.250:2001
2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Socket connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001, initiating session
2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn - Session establishment request sent on ca-rd-mbernard.miranda.com/10.36.64.250:2001
2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Unable to read additional data from server sessionid 0x14ce32e178c0002, likely server has closed socket, closing socket connection and attempting reconnect

Again, I think this is correct behaviour when there are several servers, but in our case there is always only one. So I wanted to ask for your suggestions: what can we do in this case to get an automatic reconnect? I thought that, when there is only one server, maybe we could close the connection on such an exception instead of retrying. Or perhaps this enhancement has already been made in more recent versions and could be back-ported?

Thanks,
Yuriy
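For illustration, the workaround proposed above (give up on the old session when only one server is configured) could be sketched like this. It assumes the application, or the ECF provider, watches connection state through a ZooKeeper Watcher and is able to close and recreate its ZooKeeper handle. SingleServerReconnectPolicy, its methods and the 10-second threshold are hypothetical, not an existing ZooKeeper or ECF API. Since a new handle means a new session, any ephemeral nodes and watches from the old session would have to be re-registered.

```java
import java.util.Arrays;

// HYPOTHETICAL helper: not part of ZooKeeper or ECF. It only decides *when* a client
// stuck reconnecting to a single server should close its handle and open a new one.
final class SingleServerReconnectPolicy {
    private final int serverCount;
    private final long maxDisconnectedMillis;
    private boolean disconnected = false;
    private long disconnectedSince = 0L;

    SingleServerReconnectPolicy(String connectString, long maxDisconnectedMillis) {
        this.serverCount = countServers(connectString);
        this.maxDisconnectedMillis = maxDisconnectedMillis;
    }

    /** Counts "host:port" entries in a connect string, ignoring an optional "/chroot" suffix. */
    static int countServers(String connectString) {
        int slash = connectString.indexOf('/');
        String hosts = slash >= 0 ? connectString.substring(0, slash) : connectString;
        return (int) Arrays.stream(hosts.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .count();
    }

    /** Call when the Watcher receives KeeperState.Disconnected; only the first call counts. */
    void onDisconnected(long nowMillis) {
        if (!disconnected) {
            disconnected = true;
            disconnectedSince = nowMillis;
        }
    }

    /** Call when the Watcher receives KeeperState.SyncConnected. */
    void onConnected() {
        disconnected = false;
    }

    /** Poll periodically; true means "close the old handle and create a fresh one". */
    boolean shouldRecreate(long nowMillis) {
        // With several servers the client can legitimately "try another server",
        // so the policy never fires. With only one server, retrying the old session
        // can loop forever, so give up on it after maxDisconnectedMillis.
        return serverCount == 1
                && disconnected
                && nowMillis - disconnectedSince >= maxDisconnectedMillis;
    }

    public static void main(String[] args) {
        SingleServerReconnectPolicy policy =
                new SingleServerReconnectPolicy("ca-rd-mbernard.miranda.com:2001", 10_000L);
        policy.onDisconnected(0L);
        System.out.println(policy.shouldRecreate(5_000L));  // false: not disconnected long enough
        System.out.println(policy.shouldRecreate(12_000L)); // true: single server, 12 s disconnected
    }
}
```

In practice, shouldRecreate(System.currentTimeMillis()) would be polled from a scheduled task, and when it returns true the old handle would be closed with close() before constructing a new one. With a multi-server connect string the policy never fires, so the normal "try another server" behaviour is preserved.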