Ted,

Once I modified the code so that it no longer treats disconnects as session
expirations, my issue was resolved.  But it did bring up a new question.  The
original reason the code was there was to handle the case where a client is 
mainly used for listening to remote events.  So once it starts, it sets up a 
few watches and really doesn't interact with the server after that.  The 
thought was that if such a client was disconnected and did not handle that 
case, we'd never know about it and it would seem like no remote events 
occurred.  I have since changed this code so that, upon receipt of a
disconnect, it loops checking for the existence of some znode.  If a session
expiration occurs in this loop, I trigger the reconnect logic.  Otherwise,
once we reconnect, the check succeeds and the loop exits.  Does this sound
like a reasonable way to handle the issue?
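
A minimal sketch of the loop described above, in Java.  It is only an
illustration, not the actual application code: ExistsCheck,
ConnectionLostException, SessionExpiredException and DisconnectProbe are
hypothetical stand-ins so that the sketch compiles without the ZooKeeper
jar.  In a real client the check would presumably be an exists() call on a
known znode, and the two exceptions would correspond to
KeeperException.ConnectionLossException and
KeeperException.SessionExpiredException.  The retry delay and the attempt
limit are assumptions added for the sketch; the loop described above has no
stated bound.

```java
// Hypothetical stand-ins for the client's connection-loss and
// session-expired errors (not the real ZooKeeper classes).
class ConnectionLostException extends Exception {}
class SessionExpiredException extends Exception {}

// Hypothetical wrapper around an existence check on some known znode.
@FunctionalInterface
interface ExistsCheck {
    boolean exists() throws ConnectionLostException, SessionExpiredException;
}

final class DisconnectProbe {
    enum Outcome { RECONNECTED, SESSION_EXPIRED, GAVE_UP }

    /**
     * Called after a Disconnected event.  Repeats the existence check until
     * the server answers (we are reconnected), the session is reported
     * expired, or maxAttempts checks have failed with a connection loss.
     */
    static Outcome awaitReconnect(ExistsCheck check,
                                  long retryDelayMillis,
                                  int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                // The boolean result does not matter here: any answer at all
                // means the server replied, so the connection is back.
                check.exists();
                return Outcome.RECONNECTED;
            } catch (SessionExpiredException e) {
                // Only now should the caller build a new session and
                // re-create its watches and ephemeral nodes.
                return Outcome.SESSION_EXPIRED;
            } catch (ConnectionLostException e) {
                // Still disconnected: wait a little and try again.
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return Outcome.GAVE_UP;
                }
            }
        }
        return Outcome.GAVE_UP;
    }
}
```

The caller would run the reconnect logic only on SESSION_EXPIRED; on
RECONNECTED the existing session and its watches are left alone.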

Thanks,
Martin

> 
> Ted,
> 
> Sorry to trouble you on this one.  I do understand the difference, but at
> some point I did not.  :)
> 
> Your question inspired me to look deeper at our code (to see if we were
> confused) and I found one case that was triggering our reconnect response
> from a Disconnected event.  Everywhere else we only do this in response to a
> SessionExpiredException.
> 
> Thanks for the quick response and your work on ZooKeeper in general!  I
> have also run into the "can't create ephemeral yet" case and our code
> generally loops until successful.
> 
> -Martin
 
> > -----Original Message-----
> > From: Ted Dunning [mailto:[email protected]]
> >
> > Martin,
> >
> > From your email, it sounds like there might be a bit of confusion
> > between disconnection and session expiration.  Are you sure you are
> > clear on the difference between these?
> >
> > Also, I have seen cases in my own code where I confused myself by
> > trying to re-create ephemeral files after a client program crashed.  I
> > knew that the client had crashed as soon as it happened, but the
> > Zookeeper servers could only determine this after a bit of time.  My
> > new program tried to recreate the ephemerals to indicate that it was
> > back but since the old ephemerals were still there, that failed.  Then
> > a short time later when the ZK cluster understood that the old client
> > was gone, the ephemerals disappeared even though the new client was
> > humming along nicely.  My solution was to delete the ephemerals when
> > creating them.
> >
> > Is it possible you have a similar confusion?
> >
> > On Tue, Sep 13, 2011 at 11:25 AM, Martin Serrano <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We have added code to our application to reconnect and re-establish
> > > watches when we receive a Disconnected event.  I am running tests on
> > > a heavily loaded system where the zookeeper server and clients are
> > > all impacted.  On this test system we regularly experience session
> > > timeouts and appropriately react to reconnect and set up our watches.
> > > There is an uncommon case that I am having trouble puzzling out.
> > > When running one of our tests in a loop, about 1% of the time we hit
> > > a case where on the client side we think the session has expired
> > > but on the server side it has been renewed.  We will then fail to
> > > be able to create an ephemeral node because it already exists and
> > > does not ever get cleaned up (since the previous session is still
> > > valid).  I'm trying to figure out if we are misusing the API or if
> > > we have encountered a bug.  I'm happy to provide more details.  One
> > > thing I am wondering is if it is inappropriate to create a new
> > > session within the event thread of another session which has
> > > received the disconnected event.
> > >
> > > Thanks,
> > > Martin Serrano
> > > ...
