fwiw: It appears that this happens when the leader happens to be the node shutdown (node 3 below). If the leader is chosen to be nodes 1 or 2 everything works fine. Also, this only happens if node 3 is brought down cleanly. If the node is killed (kill -9) everything works as expected.
- will On Tue, Jun 14, 2011 at 11:20 AM, Will Johnson <[email protected]>wrote: > I’m trying to embed zookeeper inside my application and having trouble > testing a simple failure scenario. My test goes as follows: > > > > Start up a 3 node cluster > > Create a client against all 3 nodes > > Add data A > > Stop node 3 > > Add data B > > Start the previously stopped node 3 > > Create a new client against node 3 only > > Make sure data A + B are present. > > > > Most of the time this works as expected but every once in a while it hangs > on multiple platforms (win64 and lin64) on the call to ‘add data B’. My > client automatically handles connection loss events and will try to > automatically replay the message until it succeeds however it is never able > to. From reading the doc in the ops guide it seems to say that 3 nodes will > work “Thus, a deployment that consists of three machines can handle one > failure.” Am I missing something? Do I need more than 3 nodes? Logs and > stack traces are the logic for create: > > > > 11/06/14 10:52:52 INFO ZooKeeper.DEBUGLOG: shutdown 2 > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 1, > my id = 2, error = java.nio.channels.AsynchronousCloseException > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 3, > my id = 2, error = java.nio.channels.AsynchronousCloseException > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2, > my id = 3, error = java.io.IOException: Channel eof > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2, > my id = 1, error = java.io.IOException: Channel eof > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting > for message on queue > > java.lang.InterruptedException > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) > > at > java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) > > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622) > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting > for message on queue > > java.lang.InterruptedException > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) > > at > java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) > > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622) > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread > > 11/06/14 10:52:52 INFO quorum.Learner: shutdown called > > java.lang.Exception: shutdown Follower > > at > org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165) > > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649) > > 11/06/14 10:52:52 INFO server.FinalRequestProcessor: shutdown of request > processor complete > > 11/06/14 10:52:52 WARN quorum.QuorumPeer: QuorumPeer main thread exited > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting > for message on queue > > java.lang.InterruptedException > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) > > at > java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) > > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622) > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting > for message on queue > > java.lang.InterruptedException > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) > > at > java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) > > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622) > > 11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread > > 11/06/14 10:52:52 ERROR quorum.QuorumCnxManager: Exception while listening > > java.nio.channels.AsynchronousCloseException > > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) > > at > sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152) > > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:478) > > 11/06/14 10:52:52 INFO quorum.QuorumCnxManager: Leaving listener > > 11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerSender is down > > 11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerReceiver is down > > 11/06/14 10:52:52 INFO zookeeper.LoggingWatcher: Saw event: WatchedEvent > state:Disconnected type:None path:null > > 11/06/14 10:52:52 ERROR zookeeper.ZooUtils$AieZooKeeper: > ATTIVIO-PLATFORM-85 : Failed trying to talk to zookeeper, retrying... > > org.apache.zookeeper.KeeperException$ConnectionLossException - > KeeperErrorCode = ConnectionLoss for /mugatu > > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for /mugatu > > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:90) > > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > > at > org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) > > at > com.attivio.app.config.zookeeper.ZooUtils$AieZooKeeper.create(ZooUtils.java:277) > > at > com.attivio.app.config.zoo.ZooServerTest.check(ZooServerTest.java:116) > > at > com.attivio.app.config.zoo.ZooServerTest.helloQuroumZoo(ZooServerTest.java:78) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > > at > com.attivio.junit.internal.runners.statements.FailAndStackDumpOnTimeout$2.call(FailAndStackDumpOnTimeout.java:63) > > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > > at java.lang.Thread.run(Thread.java:619) > > 11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Opening socket connection to > server localhost/127.0.0.1:16055 > > 11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Socket connection established > to localhost/127.0.0.1:16055, initiating session > > 11/06/14 10:52:53 INFO server.NIOServerCnxn: Accepted socket connection > from /127.0.0.1:44515 > > 11/06/14 10:52:53 INFO server.NIOServerCnxn: Client attempting to renew > session 0x2308ea3fab10000 at /127.0.0.1:44515 > > 11/06/14 10:52:53 ERROR server.NIOServerCnxn: Thread > Thread[NIOServerCxn.Factory:localhost/127.0.0.1:16055,5,main] died > > java.lang.AssertionError > > at > org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:66) > > at > org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:1552) > > at > org.apache.zookeeper.server.ZooKeeperServer.revalidateSession(ZooKeeperServer.java:529) > > at > org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.revalidateSession(LeaderZooKeeperServer.java:167) > > at > org.apache.zookeeper.server.ZooKeeperServer.reopenSession(ZooKeeperServer.java:537) > > at > org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:775) > > at > org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:485) > > at > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:521) > > at > org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) > > 11/06/14 10:52:56 ERROR quorum.LearnerHandler: Unexpected exception causing > shutdown while sock still open > > java.net.SocketTimeoutException: Read timed out > > at java.net.SocketInputStream.socketRead0(Native Method) > > at > java.net.SocketInputStream.read(SocketInputStream.java:129) > > at > java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > > at > java.io.BufferedInputStream.read(BufferedInputStream.java:237) > > at > java.io.DataInputStream.readInt(DataInputStream.java:370) > > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > > at > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) > > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) > > at > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375) > > 11/06/14 10:52:56 WARN quorum.LearnerHandler: ******* GOODBYE / > 127.0.0.1:34413 ******** > > 11/06/14 10:53:06 INFO zookeeper.ClientCnxn: Client session timed out, have > not heard from server in 13333ms for sessionid 0x2308ea3fab10000, closing > socket connection and attempting reconnect >
