Hi ZK Folks,

Some prior information, including the discussion on SOLR-13396 <https://issues.apache.org/jira/browse/SOLR-13396?focusedCommentId=16822748&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822748>, had led me to believe that the ZooKeeper client establishes connections to all members of the cluster. This then seemed to be the rationale for saying that putting a load balancer in front of ZooKeeper is dangerous: it allows the possibility that the client might end up talking to a ZK server that has not yet synced up. In the Solr case this could lead to data loss (see the discussion in the above ticket).

However, I've now been reading the code while pursuing an issue for a client, and unless the multiple connections are hidden deep inside the handling of channels in the ClientCnxnSocketNIO class (or its close relatives), it looks a lot to me like an instance of ZooKeeper.java holds only one actual connection at a time.

If that's true, then while the ZooKeeper codebase certainly has logic to reconnect, to balance across the cluster, etc., it's becoming murky to me how listing all ZK servers directly, rather than going through a load balancer, would protect against connecting to an as-yet unsynced ZooKeeper server if one existed in the configured server list.

Does such a protection exist? Or is it the user's responsibility not to add a server to the list (or load balancer) until it's clear that it has successfully joined the cluster and synced its data?

-Gus

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
