Hi Kevin,

Thanks for the interest.

Helix can automatically detect flapping and disable the node if the number of disconnects exceeds a threshold within a time period. Will this be sufficient?

Distinguishing a temporary disconnection from a permanent one is interesting. A temporary one can be caused by a GC pause or a network partition, and detecting this reliably is challenging. We have a JIRA to address this (https://issues.apache.org/jira/browse/HELIX-26). It will help if you add your requirements to it.

Even though it's possible to expose ZkStateChangeListener, we want to avoid exposing a dependency on the underlying ZooKeeper client library. We can definitely add a NodeStateChangeListener and provide a callback when the state of the node changes. We don't have this feature at this time; please file a JIRA and we should be able to add it easily.

I am interested in what you plan to do when you detect the state changes. I ask because it is really tricky to handle ZooKeeper callbacks in a reliable way, and if possible Helix should deal with them properly and invoke the appropriate set of transitions.

thanks,
Kishore G

On Thu, Aug 8, 2013 at 7:26 AM, Kevin Gao <[email protected]> wrote:

> Hi everyone,
>
> First off, I'm new to the mailing list. My name is Kevin, and I'm working
> over at Box. I'm excited to get to work with all of you!
>
> In the mean time, I do have one question regarding detection of ZooKeeper
> connection state changes. For our purposes, we need to know the state of
> the connection between Helix and ZooKeeper at all times to ensure proper
> handling of some failure scenarios. Currently, the model we use for
> detecting connectivity is a long poll to the HelixManager.isConnected()
> method; however, we would like to ideally be notified of any connection
> state changes.
>
> Essentially, I would like to be able to add a ZkStateChangeListener to the
> underlying ZkClient, or at least be able to subscribe to such event changes
> at the HelixManager level. A mechanism similar to
> HelixManager.addCurrentStateChangeListener() would work for example. Am I
> missing something obvious, or is this capability just not exposed?
>
> Reasons why we would like to do this include, for example:
>
> * Be able to externally know if a disconnect is due to a flapping issue
> * Be able to differentiate between a temporary disconnection vs. a
> permanent one, or one of longer duration
>
> Any suggestions?
>
> Thanks a lot,
> Kevin
>
> --
> Kevin Gao
> [email protected]
>
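For illustration only, the flapping rule mentioned in the reply (treat a node as flapping once the number of disconnects exceeds a threshold within a time period) could be approximated on the application side with a sliding-window counter. This is a rough sketch, not Helix's actual implementation: the class name FlapDetector, its constructor parameters, and the recordDisconnect method are all hypothetical, and how the disconnect events are obtained (for example by polling HelixManager.isConnected()) is left to the caller.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration; not part of the Helix API.
final class FlapDetector {
    private final int maxDisconnects;  // threshold: more than this many disconnects counts as flapping
    private final long windowMillis;   // length of the sliding time window
    private final Deque<Long> disconnectTimes = new ArrayDeque<>();

    FlapDetector(int maxDisconnects, long windowMillis) {
        this.maxDisconnects = maxDisconnects;
        this.windowMillis = windowMillis;
    }

    /**
     * Records a disconnect observed at nowMillis and returns true when the
     * number of disconnects inside the window exceeds the threshold.
     */
    synchronized boolean recordDisconnect(long nowMillis) {
        disconnectTimes.addLast(nowMillis);
        // Drop disconnects that have fallen out of the sliding window.
        while (!disconnectTimes.isEmpty()
                && nowMillis - disconnectTimes.peekFirst() > windowMillis) {
            disconnectTimes.removeFirst();
        }
        return disconnectTimes.size() > maxDisconnects;
    }
}
```

A caller would create one detector per connection, for instance new FlapDetector(2, 1000L), and call recordDisconnect(System.currentTimeMillis()) each time it observes the connection drop. A true result suggests flapping rather than a single temporary disconnection.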
