> Just wondering how the LOST connection state is determined? I would have
> thought that it would be safe to be in a SUSPENDED connection state until
> somewhere close to the session timeout was reached. From my experimentation
> though it seems that the LOST state isn't related to either the session
> timeout or the connection timeout.

When Curator gets a disconnection event, it sets the state to SUSPENDED and executes a sync() in the background (using retries, etc.). If that sync() fails, it sets the state to LOST. However, if Curator sees an expired session event, it goes straight to LOST.

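
To make those transitions concrete, here is a rough sketch in plain Java. It is a toy model, not Curator's actual implementation: the class ConnectionStateModel, its methods, and the maxRetries parameter are all made up for illustration, and the background sync() is stood in for by a BooleanSupplier. It also assumes that a successful probe simply leaves the state alone, since reconnect handling is out of scope here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical model of the SUSPENDED/LOST transitions; not Curator code. */
final class ConnectionStateModel {
    enum State { CONNECTED, SUSPENDED, LOST }

    private final int maxRetries;               // assumed retry budget for the probe
    private final List<State> history = new ArrayList<>();
    private State state = State.CONNECTED;

    ConnectionStateModel(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Disconnect event: enter SUSPENDED, then probe the connection with retries. */
    void onDisconnected(BooleanSupplier syncProbe) {
        transition(State.SUSPENDED);
        // One initial attempt plus up to maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (syncProbe.getAsBoolean()) {
                return; // probe succeeded: this model leaves the state unchanged
            }
        }
        transition(State.LOST); // every attempt failed
    }

    /** Session-expired event: go straight to LOST, no probe. */
    void onSessionExpired() {
        transition(State.LOST);
    }

    State state() {
        return state;
    }

    List<State> history() {
        return new ArrayList<>(history);
    }

    private void transition(State next) {
        state = next;
        history.add(next);
    }
}

final class LostStateSketch {
    public static void main(String[] args) {
        ConnectionStateModel afterDisconnect = new ConnectionStateModel(2);
        afterDisconnect.onDisconnected(() -> false); // the probe never succeeds
        System.out.println("disconnect path: " + afterDisconnect.history());

        ConnectionStateModel afterExpiry = new ConnectionStateModel(2);
        afterExpiry.onSessionExpired();
        System.out.println("expiry path: " + afterExpiry.history());
    }
}
```

In a model like this, the delay before LOST depends on how many probe attempts are made and how long each one takes, not on the session timeout itself, which may be why the gap you measured does not line up with the timeout.
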
> From my experimentation though it seems that the LOST state isn't related to
> either the session timeout or the connection timeout.

There is a relationship, but it isn't 1-to-1.

> If I have a 5 second session timeout configured for the Curator connection,
> it takes (in my case) 9 seconds between the SUSPENDED state and the LOST
> state. Given that the session is expired on the server side well before the
> LOST state is received, this seems incorrect.

The timeouts are not necessarily related to connection state. I don't think this is implied in the docs. If it is, the docs should be updated.

-JZ

On Oct 7, 2013, at 6:37 PM, Cameron McKenzie <[email protected]> wrote:

> Looking further into this, I think that it could be considered a bug.
>
> If I have a 5 second session timeout configured for the Curator connection,
> it takes (in my case) 9 seconds between the SUSPENDED state and the LOST
> state. Given that the session is expired on the server side well before the
> LOST state is received, this seems incorrect.
>
> Any thoughts?
>
>
> On Wed, Oct 2, 2013 at 4:17 PM, Cameron McKenzie <[email protected]>
> wrote:
> Hi,
> Just wondering how the LOST connection state is determined? I would have
> thought that it would be safe to be in a SUSPENDED connection state until
> somewhere close to the session timeout was reached. From my experimentation
> though it seems that the LOST state isn't related to either the session
> timeout or the connection timeout.
>
> Is there some rationale behind this?
>
> My thinking was that for leader election, locks etc. that rely on ephemeral
> nodes, we can be sure that these nodes are going to exist for as long as the
> session timeout, and thus we can be disconnected from ZooKeeper for up to the
> session timeout (with a bit of leeway for safety) and still assume that our
> ephemeral nodes are present. For leader election or locks where it is not
> possible for another client to come and 'steal' this function from the client
> isn't this a safe assumption?
>
> Or am I missing something?
> cheers
> Cam
