I'm trying to embed ZooKeeper inside my application and am having trouble
testing a simple failure scenario. My test goes as follows:
1. Start up a 3 node cluster
2. Create a client against all 3 nodes
3. Add data A
4. Stop node 3
5. Add data B
6. Start the previously stopped node 3
7. Create a new client against node 3 only
8. Make sure data A + B are present
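
For reference, the quorum arithmetic this scenario relies on can be
sketched in plain Java. This is only an illustration with no ZooKeeper
dependency; the class QuorumMath and its methods are hypothetical names,
not part of the ZooKeeper API, and it assumes the default majority quorum.

```java
// Hypothetical helper for illustration only; not part of the ZooKeeper API.
final class QuorumMath {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can be down while a majority is still up.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    // True when the servers still running are enough to form a quorum.
    static boolean hasQuorum(int ensembleSize, int serversUp) {
        return serversUp >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        // After "Stop node 3", 2 of 3 servers remain, which is still a majority.
        System.out.println(hasQuorum(3, 2));      // true
        System.out.println(toleratedFailures(3)); // 1
        // Losing a second server would leave 1 of 3, which is not a majority.
        System.out.println(hasQuorum(3, 1));      // false
    }
}
```

By this arithmetic, stopping one node out of three should still leave
enough servers for 'add data B' to go through.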
Most of the time this works as expected, but every once in a while it hangs
on the call to 'add data B'. This happens on multiple platforms (win64 and
lin64). My client automatically handles connection loss events and replays
the message until it succeeds; however, in the failing runs it never does.
The ops guide seems to say that 3 nodes will work: "Thus, a deployment that
consists of three machines can handle one failure." Am I missing something?
Do I need more than 3 nodes? Logs and stack traces from the create call
follow:
11/06/14 10:52:52 INFO ZooKeeper.DEBUGLOG: shutdown 2
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 1, my id = 2, error = java.nio.channels.AsynchronousCloseException
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 3, my id = 2, error = java.nio.channels.AsynchronousCloseException
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2, my id = 3, error = java.io.IOException: Channel eof
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Connection broken for id 2, my id = 1, error = java.io.IOException: Channel eof
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupting SendWorker
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
11/06/14 10:52:52 INFO quorum.Learner: shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
11/06/14 10:52:52 INFO server.FinalRequestProcessor: shutdown of request processor complete
11/06/14 10:52:52 WARN quorum.QuorumPeer: QuorumPeer main thread exited
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
11/06/14 10:52:52 WARN quorum.QuorumCnxManager: Send worker leaving thread
11/06/14 10:52:52 ERROR quorum.QuorumCnxManager: Exception while listening
java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:478)
11/06/14 10:52:52 INFO quorum.QuorumCnxManager: Leaving listener
11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerSender is down
11/06/14 10:52:52 INFO quorum.FastLeaderElection: WorkerReceiver is down
11/06/14 10:52:52 INFO zookeeper.LoggingWatcher: Saw event: WatchedEvent state:Disconnected type:None path:null
11/06/14 10:52:52 ERROR zookeeper.ZooUtils$AieZooKeeper: ATTIVIO-PLATFORM-85 : Failed trying to talk to zookeeper, retrying...
org.apache.zookeeper.KeeperException$ConnectionLossException - KeeperErrorCode = ConnectionLoss for /mugatu
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /mugatu
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at com.attivio.app.config.zookeeper.ZooUtils$AieZooKeeper.create(ZooUtils.java:277)
    at com.attivio.app.config.zoo.ZooServerTest.check(ZooServerTest.java:116)
    at com.attivio.app.config.zoo.ZooServerTest.helloQuroumZoo(ZooServerTest.java:78)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at com.attivio.junit.internal.runners.statements.FailAndStackDumpOnTimeout$2.call(FailAndStackDumpOnTimeout.java:63)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:16055
11/06/14 10:52:53 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:16055, initiating session
11/06/14 10:52:53 INFO server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:44515
11/06/14 10:52:53 INFO server.NIOServerCnxn: Client attempting to renew session 0x2308ea3fab10000 at /127.0.0.1:44515
11/06/14 10:52:53 ERROR server.NIOServerCnxn: Thread Thread[NIOServerCxn.Factory:localhost/127.0.0.1:16055,5,main] died
java.lang.AssertionError
    at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:66)
    at org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:1552)
    at org.apache.zookeeper.server.ZooKeeperServer.revalidateSession(ZooKeeperServer.java:529)
    at org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.revalidateSession(LeaderZooKeeperServer.java:167)
    at org.apache.zookeeper.server.ZooKeeperServer.reopenSession(ZooKeeperServer.java:537)
    at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:775)
    at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:485)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:521)
    at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
11/06/14 10:52:56 ERROR quorum.LearnerHandler: Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375)
11/06/14 10:52:56 WARN quorum.LearnerHandler: ******* GOODBYE /127.0.0.1:34413 ********
11/06/14 10:53:06 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 13333ms for sessionid 0x2308ea3fab10000, closing socket connection and attempting reconnect
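
The client described above replays the request on connection loss until it
succeeds, so a server that never answers shows up as a hang rather than a
test failure. One way to make that surface as an error is to cap the number
of attempts. The following is only a sketch in plain Java: BoundedRetry and
its parameters are hypothetical names, it has no ZooKeeper dependency, and
it retries on any exception, whereas a real client would presumably retry
only on KeeperException.ConnectionLossException.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not taken from the client above.
final class BoundedRetry {
    // Runs op, retrying after a fixed pause, and rethrows the last failure
    // once maxAttempts attempts have been made.
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread is being shut down.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

In the test, the create call would be wrapped as
BoundedRetry.call(() -> ..., attempts, pause), so that a run in which no
server responds ends with the last ConnectionLoss exception instead of
blocking until the test timeout.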