Hello,
I am having some issues with ZooKeeper connection loss. This affects various parts of my application, namely watchers, which fail with errors like the one below:

    23:13:01,593 ERROR [org.apache.zookeeper.ClientCnxn] (pool-5-thread-1-EventThread) Error while calling watcher : org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /controller/resync
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) [zookeeper-3.3.4.jar:3.3.3-1203054]
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) [zookeeper-3.3.4.jar:3.3.3-1203054]
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249) [zookeeper-3.3.4.jar:3.3.3-1203054]
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source) [:1.7.0_51]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) [clojure-1.5.1.jar:]
        at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) [clojure-1.5.1.jar:]
        at zookeeper$children.doInvoke(zookeeper.clj:230)
        at clojure.lang.RestFn.invoke(RestFn.java:464) [clojure-1.5.1.jar:]
        at resync$resync_group_watcher.invoke(resync.clj:26)
        at zookeeper.internal$make_watcher$reify__10446.process(internal.clj:56)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [zookeeper-3.3.4.jar:3.3.3-1203054]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507) [zookeeper-3.3.4.jar:3.3.3-1203054]

I have a few questions that might help me mitigate this issue. First, I could try to fix whatever is causing the session expiration. It occurs when there is a lot of activity on the machine, which leads me to believe that it might be caused by GC activity (based on the ZK guide).
This might work, but it seems to me that we would just be masking the issue, and eventually it might happen again. The other issue is that our client never recovers; it is completely dead. Is there a way to make it reconnect automatically after it dies? Does ZooKeeper support such functionality? Are there any other things I should be aware of, or any recommendations you have for setting up a ZooKeeper environment? For the record, we are running version 3.4.5 in a single-node setup.
Thanks
