I think there needs to be some ZK-active methods to get the reconnect. I just 
tried again and didn't always get the reconnect. However, if I add this line 
after the second server start() it always works:

        zkClient.create().forPath("/hey");

-JZ

On Jun 14, 2013, at 12:47 PM, Evaristo José Camarero 
<[email protected]> wrote:

> 
> I have tested with Curator 2.0.1 , curator 2.0.2 branch, master. Same result. 
> Tested different timeouts and the result is the same. Let me know the branch 
> you are using and I can try again.
> 
> It looks that the issue is different in my opinion:
> I have checked Zookeeper server logs, and this is what I have found after 
> starting again the server (Notice that is a server fresh start).
> 
> Looks that zk client is trying to reconnect but ZK client last seen TX is 
> zxid = 2, and last server zxid =0 because server has been restarted (and TX 
> log is lost).
> 
> 2013-06-14 21:37:46 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:46 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50436
> 2013-06-14 21:37:46 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50436 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:46 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50436 (no session established for client)
> 2013-06-14 21:37:46 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:47 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:47 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:47 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50437
> 2013-06-14 21:37:47 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50437 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:47 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50437 (no session established for client)
> 2013-06-14 21:37:47 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:49 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:49 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:49 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50438
> 2013-06-14 21:37:49 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50438 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:49 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50438 (no session established for client)
> 2013-06-14 21:37:49 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:50 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:50 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:50 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50439
> 2013-06-14 21:37:50 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50439 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:50 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50439 (no session established for client)
> 2013-06-14 21:37:50 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:52 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:52 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50440
> 2013-06-14 21:37:52 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:52 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50440 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:52 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50440 (no session established for client)
> 2013-06-14 21:37:52 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:54 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:54 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:54 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50441
> 2013-06-14 21:37:54 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50441 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:54 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50441 (no session established for client)
> 2013-06-14 21:37:54 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:55 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:55 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50442
> 2013-06-14 21:37:55 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:55 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50442 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:55 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50442 (no session established for client)
> 2013-06-14 21:37:55 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:56 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:56 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50443
> 2013-06-14 21:37:56 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:56 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50443 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:56 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50443 (no session established for client)
> 2013-06-14 21:37:56 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:37:58 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:37:58 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50444
> 2013-06-14 21:37:58 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:37:58 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50444 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:37:58 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50444 (no session established for client)
> 2013-06-14 21:37:58 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:38:00 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:38:00 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50445
> 2013-06-14 21:38:00 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:38:00 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50445 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:38:00 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50445 (no session established for client)
> 2013-06-14 21:38:00 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:38:02 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:38:02 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50446
> 2013-06-14 21:38:02 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:38:02 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50446 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:38:02 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50446 (no session established for client)
> 2013-06-14 21:38:02 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:38:04 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:38:04 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50447
> 2013-06-14 21:38:04 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:38:04 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50447 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:38:04 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50447 (no session established for client)
> 2013-06-14 21:38:04 INFO  ClientCnxn:1085 - Unable to read additional data 
> from server sessionid 0x13f4430e2930000, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 2013-06-14 21:38:05 INFO  ClientCnxn:966 - Opening socket connection to 
> server localhost.localdomain/127.0.0.1:40000. Will not attempt to 
> authenticate using SASL (unknown error)
> 2013-06-14 21:38:05 INFO  ClientCnxn:849 - Socket connection established to 
> localhost.localdomain/127.0.0.1:40000, initiating session
> 2013-06-14 21:38:05 INFO  NIOServerCnxnFactory:197 - Accepted socket 
> connection from /127.0.0.1:50448
> 2013-06-14 21:38:05 INFO  ZooKeeperServer:812 - Refusing session request for 
> client /127.0.0.1:50448 as it has seen zxid 0x2 our last zxid is 0x0 client 
> must try another server
> 2013-06-14 21:38:05 INFO  NIOServerCnxn:1001 - Closed socket connection for 
> client /127.0.0.1:50448 (no session established for client)
> 2013-06
> 
> The same result
> 
> Timing are not relevant in the tests. The ZK server is providing this logs
> 
> 
> 
> De: Jordan Zimmerman <[email protected]>
> Para: Evaristo José Camarero <[email protected]> 
> CC: "[email protected]" <[email protected]> 
> Enviado: Viernes 14 de junio de 2013 21:13
> Asunto: Re: CuratorFramework is not recovering connecion with ZK
> 
> Play with the timings a bit. I consistently see RECONNECTED. Also, make sure 
> you're using the latest Curator. There's a bug fix in 2.0.1 that can affect 
> this.
> 
> -JZ
> 
> On Jun 14, 2013, at 12:09 PM, Evaristo José Camarero 
> <[email protected]> wrote:
> 
>> Hi there,
>> 
>> Thanks for the fast response.
>> 
>> This is the output of the program with your modifications:
>> 
>> 2013-06-14 21:05:01 INFO  CuratorFrameworkImpl:221 - Starting
>> 2013-06-14 21:05:01 INFO  ConnectionStateManager:151 - State change: 
>> CONNECTED
>> NEw state CONNECTED
>> 2013-06-14 21:05:13 INFO  ConnectionStateManager:151 - State change: 
>> SUSPENDED
>> NEw state SUSPENDED
>> 2013-06-14 21:05:16 ERROR CuratorFrameworkImpl:530 - Background operation 
>> retry gave up
>> org.apache.zookeeper.KeeperException$ConnectionLossException: 
>> KeeperErrorCode = ConnectionLoss
>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>      at 
>> org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:487)
>>      at 
>> org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
>>      at 
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:606)
>>      at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-06-14 21:05:16 INFO  ConnectionStateManager:151 - State change: LOST
>> NEw state LOST
>> 
>> 
>> You can notice that:
>> "NEW state RECONNECTED never appears"
>> 
>> what basically means that CuratorFramework listener is not called, and 
>> therefore I assume that Curator does not reconnect.
>> 
>> 
>> 
>> Something very interesting is the fact that if you remove 
>> "zkClient.create().forPath("/pedro", "juan".getBytes());" from the program 
>> (basically you are not doing any TX, Curator is able to recover.
>> 
>> Regards,
>> 
>> Evaristo
>> 
>> 
>> 
>> De: Jordan Zimmerman <[email protected]>
>> Para: [email protected]; Evaristo José Camarero 
>> <[email protected]> 
>> Enviado: Viernes 14 de junio de 2013 19:06
>> Asunto: Re: CuratorFramework is not recovering connecion with ZK
>> 
>> Curator uses the timeouts given when the CuratorFramework object is created 
>> to manage the connection. In this instance, after LOST has been received 
>> Curator may wait until session timeout before trying to re-create the 
>> ZooKeeper object. I'm enclosing a modified version of your test that sets 
>> the Curator timeouts to half the time that the test's sleeps wait and you 
>> can see that Curator recovers correctly.
>> 
>> -Jordan
>> 
>> 
>> 
>> On Jun 14, 2013, at 7:22 AM, Evaristo José Camarero 
>> <[email protected]> wrote:
>> 
>>> 
>>> Hi there,
>>> 
>>> I have found a case in which CuratorFramework is not recovering connection 
>>> with ZK servers.
>>> 
>>> The use case is the following:
>>> - Start ZK servers
>>> - Start application with CuratorFramework
>>> - ZK servers goes down.
>>> - ZK servers start again.
>>> - CuratorFramework app it is not notified that connection is reconnected , 
>>> but neither notifies that can not recover connection in any way, so 
>>> application can not recover. Only option is to restart the app.
>>> Notice that if CuratorFRamework client is not making any transaction (just 
>>> comment create() TX), the client is able to reconnect.
>>> 
>>> I attach a program able to reproduce the problem
>>> 
>>> Regards,
>>> 
>>> Evaristo
>>> 
>>> 
>>> 
>>> <CuratorFails.java>
>> 
>> 
>> 
> 
> 
> 

Reply via email to