2011/4/14 Chang Song <[email protected]>

> You need to understand that most app can tolerate delay in connect/close,
> but we cannot tolerate ping delay since we are using ZK heartbeat TO
> for sole failure detection.
>
What about using multiple ZK clusters for this, then?

But it really sounds like your ZK machines are misconfigured somehow. Session start/stop isn't any more expensive than znode updates and a small ZK cluster can handle tens of thousands of those per second if set up correctly.

Have you tested a cluster where the machines are set up correctly with separate snapshot and log disks? Are your ZK machines doing any other tasks?

> We use 15 seconds (5 sec for each ensemble)
> for session timeout, important server will drop out of the clusters even
> if the server is not malfunctioning, in some cases, it wreaks havoc on
> certain services.
>
