Hi everyone! I am writing to this group because we have recently been getting some strange errors with our production ZooKeeper setup.

From time to time we observe that our client application (C++ based) disconnects from ZooKeeper (session state changes to 1) and reconnects (state changes to 3). This in itself is not a problem: usually the application continues to run normally after the reconnect. But occasionally, after this happens, all subsequent operations start to return the ZSESSIONMOVED error. To make it work again we have to restart the application (which creates a new ZooKeeper session).

I noticed that a bug was introduced in 3.2.0, http://issues.apache.org/jira/browse/ZOOKEEPER-449, but we are using ZooKeeper v. 3.2.2. I just noticed that the app was compiled against the 3.2.0 library, but the patches fixing bug 449 did not touch the C client lib, so I believe our problems are not related to that.

In the ZooKeeper logs, at the moment which initiated the problem with the client application, I have:

node1:
2010-03-16 14:21:43,510 - INFO [NIOServerCxn.Factory:2181:nioserverc...@607] - Connected to /10.1.112.61:37197 lastZxid 42992576502
2010-03-16 14:21:43,510 - INFO [NIOServerCxn.Factory:2181:nioserverc...@636] - Renewing session 0x324dcc1ba580085
2010-03-16 14:21:49,443 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@992] - Finished init of 0x324dcc1ba580085 valid:true
2010-03-16 14:21:49,443 - WARN [NIOServerCxn.Factory:2181:nioserverc...@518] - Exception causing close of session 0x324dcc1ba580085 due to java.io.IOException: Read error
2010-03-16 14:21:49,444 - INFO [NIOServerCxn.Factory:2181:nioserverc...@857] - closing session:0x324dcc1ba580085 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.1.112.62:2181 remote=/10.1.112.61:37197]

node2:
2010-03-16 14:21:40,580 - WARN [NIOServerCxn.Factory:2181:nioserverc...@494] - Exception causing close of session 0x324dcc1ba580085 due to java.io.IOException: Read error
2010-03-16 14:21:40,581 - INFO [NIOServerCxn.Factory:2181:nioserverc...@833] - closing session:0x324dcc1ba580085 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.1.112.63:2181 remote=/10.1.112.61:60693]
2010-03-16 14:21:46,839 - INFO [NIOServerCxn.Factory:2181:nioserverc...@583] - Connected to /10.1.112.61:48336 lastZxid 42992576502
2010-03-16 14:21:46,839 - INFO [NIOServerCxn.Factory:2181:nioserverc...@612] - Renewing session 0x324dcc1ba580085
2010-03-16 14:21:49,439 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@964] - Finished init of 0x324dcc1ba580085 valid:true

node3:
2010-03-16 02:14:48,961 - WARN [NIOServerCxn.Factory:2181:nioserverc...@494] - Exception causing close of session 0x324dcc1ba580085 due to java.io.IOException: Read error
2010-03-16 02:14:48,962 - INFO [NIOServerCxn.Factory:2181:nioserverc...@833] - closing session:0x324dcc1ba580085 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.1.112.64:2181 remote=/10.1.112.61:57309]

and then lots of entries like this:

2010-03-16 02:14:54,696 - WARN [ProcessThread:-1:preprequestproces...@402] - Got exception when processing sessionid:0x324dcc1ba580085 type:create cxid:0x4b9e9e49 zxid:0xfffffffffffffffe txntype:unknown /locks/9871253/lock-8589943989-
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = Session moved
    at org.apache.zookeeper.server.SessionTrackerImpl.checkSession(SessionTrackerImpl.java:231)
    at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:211)
    at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2010-03-16 14:22:06,428 - WARN [ProcessThread:-1:preprequestproces...@402] - Got exception when processing sessionid:0x324dcc1ba580085 type:create cxid:0x4b9f6603 zxid:0xfffffffffffffffe txntype:unknown /locks/1665960/lock-8589961006-
org.apache.zookeeper.KeeperException$SessionMovedException: KeeperErrorCode = Session moved
    at org.apache.zookeeper.server.SessionTrackerImpl.checkSession(SessionTrackerImpl.java:231)
    at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:211)
    at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)

To work around the disconnections I am going to increase the session timeout from 5 to 15 seconds, but even if it helps at all, it is just a workaround.

Do you have any idea what the source of my problem is?

Regards, Łukasz Osipiuk

--
Łukasz Osipiuk
mailto:luk...@osipiuk.net