Thanks Camille.

Now I see that it is the Watcher.Event.KeeperState.Disconnected event that gets generated: ClientCnxn.SenderThread.run() .... queueEvent queues it, and EventThread.run() .... watcher.process() then processes it.

It seems that the same scenario I described earlier could still happen: the ClientCnxn.SenderThread or the EventThread could be stopped while the main application thread keeps going happily along. The possibility is very slight, but in theory it is still there. Or am I missing something?
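One way to narrow the window described above is to make the main loop stop on its own when it has not heard a recent confirmation, instead of relying only on an event being delivered. The sketch below is a time-based lease guard. It is an illustration under assumptions, not ZooKeeper's API: the LeaderLease class and its confirm(), revoke() and mayActAsLeader() methods are hypothetical names, and it assumes some thread calls confirm() each time it has verified that the session and the master node are still healthy, with a lease shorter than the session timeout.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of ZooKeeper): leadership is only valid for
 * a bounded time after the last confirmation, so a stalled event thread
 * causes the lease to lapse instead of leaving a stale flag set forever.
 */
final class LeaderLease {
    private static final long NEVER = Long.MIN_VALUE;

    private final long leaseMillis;
    private final LongSupplier clock; // monotonic clock in milliseconds
    private final AtomicLong lastConfirmed = new AtomicLong(NEVER);

    LeaderLease(long leaseMillis, LongSupplier clock) {
        this.leaseMillis = leaseMillis;
        this.clock = clock;
    }

    /** Call after each successful check that we still own the master node. */
    void confirm() {
        lastConfirmed.set(clock.getAsLong());
    }

    /** Call from the watcher on a disconnected, expired or node-deleted event. */
    void revoke() {
        lastConfirmed.set(NEVER);
    }

    /** Call from the application thread before every leader-only action. */
    boolean mayActAsLeader() {
        long last = lastConfirmed.get();
        return last != NEVER && clock.getAsLong() - last < leaseMillis;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock so the demo is deterministic
        LeaderLease lease = new LeaderLease(2000, () -> now[0]);
        lease.confirm();
        now[0] = 1500;
        System.out.println(lease.mayActAsLeader()); // true: still inside the lease
        now[0] = 2500;
        System.out.println(lease.mayActAsLeader()); // false: lapsed with no event at all
    }
}
```

In a real application the clock would be something like () -> System.nanoTime() / 1_000_000, and the send loop would call mayActAsLeader() before each message. This only shrinks the window: the thread can still pause between the check and the send, so a hard "single master" guarantee also needs the downstream servers to reject requests from a stale master.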
Thanks
Yang

On Mon, Jul 18, 2011 at 6:51 AM, Fournier, Camille F. <[email protected]> wrote:
> If the zk cluster doesn't get pings from your existing master, the zk client
> on that master should see a disconnected state event, not a node deletion
> event. Upon seeing that event, it should stop acting as master until such
> time as it can determine whether it has reconnected and is still master, or
> it reconnects and sees that its original session has failed or the master
> node is deleted.
>
> C
>
> ----- Original Message -----
> From: Yang <[email protected]>
> To: [email protected] <[email protected]>
> Sent: Mon Jul 18 04:00:04 2011
> Subject: Re: help on Zookeeper code walk through?
>
> Thanks Camille and Ben.
>
> I get the basic picture now.
>
> I have another question: in a leader election scenario (for example
> HBase Master election), I want to make sure that at any time , there
> is only at most one node running as master, and there is indeed one
> running as master all the time except for very short failover time
> period.
>
> then if only the connection between current master and ZK is down,
> ZK senses the lack of pings, and kills the session and ephemeral child
> node owned by the leader, and the next client node kicks in as leader.
> at this time, if the current leader machine is still working fine, its
> traffic going out to its application servers as normal, would it
> be blissfully still acting as a leader, and violate our "single
> master" goal? for example if the Watcher.process() catches the
> nodeDelete event, and tries to set some var to stop the application
> server, but if this thread is stopped before the var is set, and is
> never invoked again, then the application server could just keep
> happily going along...?
>
> for example, the following dummy code
>
> class MyApplication {
>     volatile boolean should_stop = false;
>     class MyZKWatcher implements Zookeeper.Watcher {
>         public void process(Event e) {
>             if ( e is nodeDelete of my owner node ) {
>                 should_stop = true ; //*************
>             }
>         }
>     }
>
>     public void runApp() {
>         zk = new ZooKeeper(hostPort, 3000, this);
>         while ( ! should_stop ) {
>             send_out_some_messages to my application servers
>             assuming I'm leader
>         }
>     }
>
>     public static void main(String args[]) {
>         new MyApplication().runApp();
>     }
> }
>
>
> basically if the nodeDelete event is caught but the Watcher stops
> right at "//*****" line , then the
> application main loop could still be going on?? otherwise I have to
> put a node exists() check before I send out every application message?
>
>
> Thanks a lot
> Yang
>
> 7 PM, Benjamin Reed <[email protected]> wrote:
>> if you are running with multiple servers, it is the leader that
>> declares sessions dead, so the leader will call killSession(). the
>> followers track the liveness of the clients with pings and will
>> periodically send liveness summaries to the leader.
>>
>> see camille's email for the specific classes to look at.
>>
>> ben
>>
>> On Sat, Jul 16, 2011 at 1:44 AM, Yang <[email protected]> wrote:
>>> I'm wondering if a client loses session to its ephemeral znode, under
>>> the hood, how
>>> is the watcher triggered?
>>>
>>> went through the code , and found something that looks related:
>>> ZKDataBase.killSession()-->DataTree.killSession()--->DataTree.deleteNode()--->WatchManager.triggerWatch()--->Watcher.process()
>>>
>>> but how is ZKDataBase.killSession() called? from the info given in
>>> http://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkSessions
>>> I can see the ZooKeeper client code does periodically ping the server
>>> to maintain liveness. but how the server checks for this liveness and
>>> trigger killSession(), here I'm having difficulty connecting the dots.
>>>
>>> could you please give me some help walking through this piece of code?
>>>
>>> Thanks
>>> Yang
>>>
>>
>
