Sounds like a deadlock in the client library. One idea is to instrument your client code and dump the thread stacks when the wait times out. The stacks will hopefully show the state of the various threads and give some insight into what to look for next.
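Here is a rough sketch of the instrumentation I mean. It uses only the JDK's ThreadMXBean. LatchDiagnostics and awaitOrDumpThreads are made-up names for illustration and are not ZooKeeper API. How long to wait before dumping is your call:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of ZooKeeper: a bounded latch wait that
// captures a thread dump when the wait times out.
final class LatchDiagnostics {
    private LatchDiagnostics() {}

    /**
     * Waits up to {@code timeout} for {@code latch} to reach zero.
     * Returns true if it did; otherwise appends a dump of all live threads
     * to {@code out} and returns false.
     */
    static boolean awaitOrDumpThreads(CountDownLatch latch, long timeout, TimeUnit unit,
                                      StringBuilder out) throws InterruptedException {
        if (latch.await(timeout, unit)) {
            return true;
        }
        out.append(dumpAllThreads());
        return false;
    }

    /** Builds a jstack-like dump: name, state, lock waited on, and full stack per thread. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();

        // Reports only true lock cycles (monitors / ownable synchronizers);
        // returns null when there are none.
        long[] deadlocked = mx.findDeadlockedThreads();
        if (deadlocked != null) {
            sb.append("Deadlocked thread ids: ").append(Arrays.toString(deadlocked)).append('\n');
        }

        // (true, true) = also collect locked monitors and locked synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info == null) {
                continue; // defensive: skip any thread that vanished mid-dump
            }
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

In your connect() below, connectedSignal.await() would become something like awaitOrDumpThreads(connectedSignal, sessionTimeout, TimeUnit.MILLISECONDS, dump). When it returns false, log the dump, close zk, and retry as you propose. In the output, the ZooKeeper client's own threads are typically named ...-SendThread(...) and ...-EventThread, so those are the ones to look at first.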
On Tue, Jun 20, 2017 at 3:14 PM, John Lindwall <[email protected]> wrote:

> We are seeing some occasional incidents where a zookeeper java client will
> hang in CountDownLatch.await() while waiting for a connection to be
> established. Our connect() code is pretty standard I think and is similar
> to this:
>
>     private ZooKeeper connect(String hosts, int sessionTimeout) throws
>             IOException, InterruptedException {
>         final CountDownLatch connectedSignal = new CountDownLatch(1);
>
>         ZooKeeper zk = new ZooKeeper(hosts, sessionTimeout, new Watcher() {
>             @Override
>             public void process(WatchedEvent event) {
>                 if (event.getState() == Event.KeeperState.SyncConnected) {
>                     connectedSignal.countDown();
>                 }
>             }
>         });
>
>         connectedSignal.await();
>         return zk;
>     }
>
> Has anyone else had an issue with the await() blocking forever like this?
> Any advice?
>
> As a "fix" I am considering adding a timeout to the CountDownLatch await()
> call; if we fail to connect within that timeout, then retry the connection
> attempt. After, say, 3 retries, give up entirely.
>
> Thanks!
> --
> John Lindwall

--
Cheers
Michael.
