Are you actually remaking the connection?  It should happen
automatically for you.
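
To make that concrete, here is a minimal sketch of how a client might react
to each connection state. It assumes you pass every quorum member in the
connect string, so the client library has somewhere else to go. The enum
below is only a stand-in that mirrors the state names the client reports
(SyncConnected, Disconnected, Expired); it is not the real ZooKeeper class.
The class name, the Action values, and the ports 2181-2183 are hypothetical,
since your actual client ports were not given.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: models the decision a watcher could make per connection state.
// State is a local stand-in for the client's keeper states, not the real API.
class ReconnectPolicySketch {

    enum State { SyncConnected, Disconnected, Expired }

    // Hypothetical actions an application such as a cluster manager might take.
    enum Action { RESUME, PAUSE_AND_WAIT, NEW_SESSION }

    static Action actionFor(State state) {
        switch (state) {
            case SyncConnected:
                // Connected (or reconnected) with the same session.
                return Action.RESUME;
            case Disconnected:
                // The client library retries the other servers on its own.
                // Do not create a new handle here; just pause and wait.
                return Action.PAUSE_AND_WAIT;
            case Expired:
                // The session is gone and its ephemeral nodes with it.
                // Only now is a new client handle needed.
                return Action.NEW_SESSION;
            default:
                throw new IllegalArgumentException("Unknown state: " + state);
        }
    }

    // Builds the comma-separated host:port list used as the connect string.
    static String connectString(List<String> hostPorts) {
        return String.join(",", hostPorts);
    }

    public static void main(String[] args) {
        // Assumed ports for a 3-node quorum running on localhost.
        List<String> quorum = Arrays.asList(
                "localhost:2181", "localhost:2182", "localhost:2183");
        System.out.println("connect string: " + connectString(quorum));
        for (State s : State.values()) {
            System.out.println(s + " -> " + actionFor(s));
        }
    }
}
```

The point of the sketch is that Disconnected is not a terminal state: as long
as the client reconnects to another server before the session timeout, the
session and its ephemeral nodes survive. Creating a fresh handle on every
Disconnected event is what tends to pile up sockets.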

On Wed, Jun 8, 2011 at 11:35 AM, Sampath Perera <[email protected]> wrote:
> Hi,
>
> First of all, I must say I appreciate ZooKeeper: I was able to get going with
> it pretty fast and implemented clustering (coordination of nodes in the
> cluster) for our product (UltraESB) just by going through the documentation
> and a few searches of the mailing list.
>
> Now I am trying to run a sample setup with a ZooKeeper quorum of 3 nodes.
> I have set up the quorum locally on localhost, giving each node different
> election and client ports, and the quorum seems to be working fine. I then
> started 3 UltraESB server nodes pointing to the quorum and noticed that each
> UltraESB node connected to a particular ZooKeeper node. To test
> reliability, I stopped one ZooKeeper instance so that 2 of the 3 ZK nodes
> were still alive and the quorum should keep working.
>
> What I have noticed is that whenever I stop a ZK node, the ESB server attached
> to that node gets a Disconnected keeper state watched event. (I have
> registered a handler that stops the ESB cluster manager upon receiving this
> event, as it means the ZK connection was lost.) After that, I do not see the
> ZK client trying to re-create the session with another node in the quorum.
>
> Could it be due to some problem in the way I have implemented the watched
> event processing? Or do we need to manually re-connect to the quorum once we
> receive a Disconnected event?
>
> Further, I am using ephemeral nodes and want to keep the same session, so I
> tried to re-create the ZK session by creating a new ZK instance from the ESB
> (client) side, passing the previous session id and session password. This
> caused the other 2 ESB servers to receive Disconnected events too, although
> the ZK quorum was still running fine with the 2 nodes it had up. Those 2
> nodes then got into an infinite loop of disconnecting and my code trying to
> re-create the ZK session, and soon the system hit a "Too many open files"
> error, probably from running out of file descriptors for open sockets (I am
> on Unix).
>
> Any help in understanding this quorum re-connection would be really
> appreciated. Is there any documentation for this? If there is, please
> bear with me and point me to it.
>
> --
> Thanks,
> Sampath
> http://adroitlogic.org
>
