I have tested with Curator 2.0.1 , curator 2.0.2 branch, master. Same result. 
Tested different timeouts and the result is the same. Let me know the branch 
you are using and I can try again.

It looks that the issue is different in my opinion:
I have checked Zookeeper server logs, and this is what I have found after 
starting again the server (Notice that is a server fresh start).


Looks that zk client is trying to reconnect but ZK client last seen TX is zxid 
= 2, and last server zxid =0 because server has been restarted (and TX log is 
lost).

2013-06-14 21:37:46 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:46 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50436
2013-06-14 21:37:46 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50436 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:46 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50436 (no session established for client)
2013-06-14 21:37:46 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:47 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:47 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:47 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50437
2013-06-14 21:37:47 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50437 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:47 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50437 (no session established for client)
2013-06-14 21:37:47 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:49 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:49 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:49 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50438
2013-06-14 21:37:49 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50438 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:49 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50438 (no session established for client)
2013-06-14 21:37:49 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:50 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:50 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:50 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50439
2013-06-14 21:37:50 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50439 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:50 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50439 (no session established for client)
2013-06-14 21:37:50 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:52 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:52 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50440
2013-06-14 21:37:52 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:52 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50440 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:52 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50440 (no session established for client)
2013-06-14 21:37:52 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:54 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:54 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:54 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50441
2013-06-14 21:37:54 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50441 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:54 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50441 (no session established for client)
2013-06-14 21:37:54 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:55 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:55 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50442
2013-06-14 21:37:55 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:55 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50442 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:55 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50442 (no session established for client)
2013-06-14 21:37:55 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:56 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:56 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50443
2013-06-14 21:37:56 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:56 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50443 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:56 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50443 (no session established for client)
2013-06-14 21:37:56 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:37:58 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:37:58 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50444
2013-06-14 21:37:58 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:37:58 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50444 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:37:58 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50444 (no session established for client)
2013-06-14 21:37:58 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:38:00 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:38:00 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50445
2013-06-14 21:38:00 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:38:00 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50445 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:38:00 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50445 (no session established for client)
2013-06-14 21:38:00 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:38:02 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:38:02 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50446
2013-06-14 21:38:02 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:38:02 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50446 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:38:02 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50446 (no session established for client)
2013-06-14 21:38:02 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:38:04 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:38:04 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50447
2013-06-14 21:38:04 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:38:04 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50447 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:38:04 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50447 (no session established for client)
2013-06-14 21:38:04 INFO  ClientCnxn:1085 - Unable to read additional data from 
server sessionid 0x13f4430e2930000, likely server has closed socket, closing 
socket connection and attempting reconnect
2013-06-14 21:38:05 INFO  ClientCnxn:966 - Opening socket connection to server 
localhost.localdomain/127.0.0.1:40000. Will not attempt to authenticate using 
SASL (unknown error)
2013-06-14 21:38:05 INFO  ClientCnxn:849 - Socket connection established to 
localhost.localdomain/127.0.0.1:40000, initiating session
2013-06-14 21:38:05 INFO  NIOServerCnxnFactory:197 - Accepted socket connection 
from /127.0.0.1:50448
2013-06-14 21:38:05 INFO  ZooKeeperServer:812 - Refusing session request for 
client /127.0.0.1:50448 as it has seen zxid 0x2 our last zxid is 0x0 client 
must try another server
2013-06-14 21:38:05 INFO  NIOServerCnxn:1001 - Closed socket connection for 
client /127.0.0.1:50448 (no session established for client)
2013-06

The same result

Timing are not relevant in the tests. The ZK server is providing this logs




________________________________
 De: Jordan Zimmerman <[email protected]>
Para: Evaristo José Camarero <[email protected]> 
CC: "[email protected]" <[email protected]> 
Enviado: Viernes 14 de junio de 2013 21:13
Asunto: Re: CuratorFramework is not recovering connecion with ZK
 


Play with the timings a bit. I consistently see RECONNECTED. Also, make sure 
you're using the latest Curator. There's a bug fix in 2.0.1 that can affect 
this.

-JZ


On Jun 14, 2013, at 12:09 PM, Evaristo José Camarero 
<[email protected]> wrote:

Hi there,
>
>
>Thanks for the fast response.
>
>
>This is the output of the program with your modifications:
>
>
>2013-06-14 21:05:01 INFO  CuratorFrameworkImpl:221 - Starting
>2013-06-14 21:05:01 INFO  ConnectionStateManager:151 - State change: CONNECTED
>NEw state CONNECTED
>2013-06-14 21:05:13 INFO  ConnectionStateManager:151 - State change: SUSPENDED
>NEw state SUSPENDED
>2013-06-14 21:05:16 ERROR CuratorFrameworkImpl:530 - Background operation 
>retry gave up
>org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
>= ConnectionLoss
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>at 
>org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:487)
>at 
>org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
>at 
>org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:606)
>at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>2013-06-14 21:05:16 INFO  ConnectionStateManager:151 - State change: LOST
>NEw state LOST
>
>
>
>
>You can notice that:
>"NEW state RECONNECTED never appears"
>
>
>
>what basically means that CuratorFramework listener is not called, and 
>therefore I assume that Curator does not reconnect.
>
>
>
>
>
>
>Something very interesting is the fact that if you remove 
>"zkClient.create().forPath("/pedro", "juan".getBytes());" from the program 
>(basically you are not doing any TX, Curator is able to recover.
>
>
>Regards,
>
>
>Evaristo
>
>
>
>
>
>
>
>________________________________
> De: Jordan Zimmerman <[email protected]>
>Para: [email protected]; Evaristo José Camarero 
><[email protected]> 
>Enviado: Viernes 14 de junio de 2013 19:06
>Asunto: Re: CuratorFramework is not recovering connecion with ZK
> 
>
>
>Curator uses the timeouts given when the CuratorFramework object is created to 
>manage the connection. In this instance, after LOST has been received Curator 
>may wait until session timeout before trying to re-create the ZooKeeper 
>object. I'm enclosing a modified version of your test that sets the Curator 
>timeouts to half the time that the test's sleeps wait and you can see that 
>Curator recovers correctly.
>
>
>-Jordan
>
>
>
>
>
>On Jun 14, 2013, at 7:22 AM, Evaristo José Camarero 
><[email protected]> wrote:
>
>
>>
>>Hi there,
>>
>>
>>I have found a case in which CuratorFramework is not recovering connection 
>>with ZK servers.
>>
>>
>>The use case is the following:
>>- Start ZK servers
>>- Start application with CuratorFramework
>>- ZK servers goes down.
>>- ZK servers start again.
>>- CuratorFramework app it is not notified that connection is reconnected , 
>>but neither notifies that can not recover connection in any way, so 
>>application can not recover. Only option is to restart the app.
>>Notice that if CuratorFRamework client is not making any transaction (just 
>>comment create() TX), the client is able to reconnect.
>>
>>
>>
>>I attach a program able to reproduce the problem
>>
>>
>>Regards,
>>
>>
>>Evaristo
>>
>>
>>
>>
>>
>><CuratorFails.java>
>
>
>

Reply via email to