We are seeing occasional incidents where a ZooKeeper Java client hangs in CountDownLatch.await() while waiting for a connection to be established. Our connect() code is pretty standard, I think, and is similar to this:

    private ZooKeeper connect(String hosts, int sessionTimeout) throws IOException, InterruptedException {
        final CountDownLatch connectedSignal = new CountDownLatch(1);

        ZooKeeper zk = new ZooKeeper(hosts, sessionTimeout, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connectedSignal.countDown();
                }
            }
        });

        connectedSignal.await();
        return zk;
    }

Has anyone else had an issue with the await() blocking forever like this? Any advice?

As a "fix" I am considering adding a timeout to the CountDownLatch await() call: if we fail to connect within that timeout, we retry the connection attempt. After, say, 3 retries, we give up entirely.
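
For reference, here is a rough sketch of the timeout-and-retry idea, not tested against a real ensemble. To keep it self-contained it uses only the JDK: ConnectRetry, Attempt, connectWithRetry, maxAttempts and timeoutMs are all hypothetical names I made up for illustration. The ZooKeeper wiring is shown only in a comment and assumes the same constructor and watcher as in the connect() method above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ConnectRetry {
    /** Hypothetical abstraction over a single connection attempt. */
    interface Attempt<T> {
        // Start connecting; the implementation must call connected.countDown()
        // once the session is established.
        T open(CountDownLatch connected) throws Exception;

        // Release a handle whose attempt timed out, so it does not leak.
        void close(T handle) throws Exception;
    }

    /**
     * Tries up to maxAttempts times, waiting at most timeoutMs per attempt.
     * Throws TimeoutException if no attempt connects in time.
     */
    static <T> T connectWithRetry(Attempt<T> attempt, int maxAttempts, long timeoutMs)
            throws Exception {
        for (int i = 1; i <= maxAttempts; i++) {
            // A fresh latch per attempt, so a late event from an abandoned
            // attempt cannot release the wait for a newer one.
            CountDownLatch connected = new CountDownLatch(1);
            T handle = attempt.open(connected);
            // await(timeout, unit) returns false if the timeout elapsed first.
            if (connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return handle;
            }
            attempt.close(handle);
        }
        throw new TimeoutException("not connected after " + maxAttempts + " attempts");
    }

    // Assumed ZooKeeper wiring (not compiled here):
    //   open(connected):  return new ZooKeeper(hosts, sessionTimeout, event -> {
    //                         if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
    //                             connected.countDown();
    //                         }
    //                     });
    //   close(zk):        zk.close();
}
```

Closing the timed-out handle before retrying matters here, since otherwise each failed attempt would leave a client and its threads behind. Note that maxAttempts counts total attempts, so "3 retries" would mean passing 4.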

Thanks!
--
John Lindwall