Thanks for the response. This is helpful, and jibes with what I am seeing in the code. WRT Solr, currently there are a couple of isolated places where Curator is used, and there is a desire to move to using it extensively, but that has not happened yet: <https://issues.apache.org/jira/browse/SOLR-16116> (PR <https://github.com/apache/solr/pull/760>).
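As an aside, the kind of Curator-managed connection handling that ticket is after would look roughly like the sketch below. This is just an illustration, not actual Solr code; the hostnames, retry values, and znode path are placeholders:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class CuratorSketch {
        public static void main(String[] args) throws Exception {
            // All ensemble members go into one connect string; Curator owns the
            // underlying ZooKeeper handle and retries operations on transient
            // disconnects according to the retry policy.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
                    new ExponentialBackoffRetry(1000, 3));
            client.start();
            try {
                byte[] data = client.getData().forPath("/some/znode");
                System.out.println(new String(data));
            } finally {
                client.close();
            }
        }
    }

The appeal is that reconnect and retry behavior lives in one place (the retry policy and Curator's connection state handling) instead of being re-implemented at each call site.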
To date, Solr manages its own clients directly. Exactly what it does varies a bit depending on who wrote the code and when, and I think part of the impetus for Curator is to achieve consistency.

In the ticket I linked, the person who experienced a failure and loss of a collection evidently had a "not synced" ZK server node that also didn't know it wasn't synced. Based on what you say above, I guess this means that node wasn't configured properly and maybe didn't have a list of the other servers? I'm assuming nodes start up as not synced... Obviously, if true, that's a serious user error in configuration, but it is conceivable if deployment infrastructure is meant to generate the config and succeeds incorrectly (imagine something doing the conceptual equivalent of $OLDSERVERS + $NEW where $OLDSERVERS came out blank).

IIUC, you rely on the target server *knowing* that it is out of sync. Adding a server that has a false sense of confidence to the client's connection string doesn't (yet) seem any different from adding it to the load balancer here (a rough client-side sketch of what I mean by "connection string" is in the P.S. at the bottom of this mail). In either case the client might select the "rogue" server that thinks it's synced but isn't, with identical results. Based on your description, a load balancer would still potentially cause interference if (for example) it didn't allow long-running connections, but that would just mean spurious disconnect/reconnect cycles and inconsistent load on individual ZK servers: suboptimal, but non-catastrophic unless the machines are already near capacity.

-Gus

On Fri, Jun 10, 2022 at 12:35 PM Szalay-Bekő Máté <[email protected]> wrote:

> Hello Gus,
>
> I think you shouldn't use a load-balancer for ZooKeeper. Clients do the load balancing, and they won't connect to any 'out-of-sync' servers. The way it works normally:
>
> - You have ZK servers A, B and C. You list all these servers in all your ZooKeeper client configs, and in all server configs.
> - ZK servers form a quorum when the majority of the servers have joined. There is always one quorum member selected as leader (this is the only one that can change the state stored in ZK, and it will only commit a change if the majority of the servers approved it).
> - Each client is connected to a single ZK server at a time.
> - If any ZK server goes out of sync (e.g. losing the connection to the leader, etc.) then it will stop serving requests. So even if your client would connect to such a server, the client will lose the connection immediately if the server left the quorum, and no client will be able to connect to such an "out of sync" server.
> - The ZK client first connects to a random server from the server list. If the connection fails (the server is unreachable or not serving client requests), the client will move to the next server in the list in a round-robin fashion, until it finds a working / in-sync server.
> - All ZK clients talk with the server in "sessions". Each client session has an ID, unique in the cluster. If the client loses the connection to a server, it will automatically try to connect to a different server using ('renewing') the same session id. "A client will see the same view of the service regardless of the server that it connects to. i.e., a client will never see an older view of the system even if the client fails over to a different server with the same session." This is (among other things) guaranteed by ZooKeeper.
> - Of course, sessions can terminate. There is a pre-configured and negotiated session timeout.
> The session will be deleted from the ZK cluster if the connection between any server and the given client breaks for more than a pre-configured time (e.g. no heartbeat for 30 seconds). After this time, the session cannot be renewed. In this case the application needs to decide what to do. It can start a new session, but then it is possible that e.g. it will miss some watched events and also lose its ephemeral ZNodes. (E.g. I know HBase servers abort themselves when a ZK session timeout happens, as they cannot guarantee consistency anymore.) Now, as far as I remember, Solr is using Curator to handle ZooKeeper connections. I'm not entirely sure how Solr is using ZooKeeper through Curator. Maybe Solr reconnects automatically with a new session if the old one terminates. Maybe Solr handles this case in a different way.
>
> See our overview docs:
> https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html
>
> By default the list of the servers is a static config. If you want to add a new server (or remove an old one), then you need to rolling-restart all the ZK servers and also restart the clients with the new config. The dynamic reconfig feature (if enabled) allows you to do this in a more clever way, storing the list of the ZK servers inside a system znode, which can be changed dynamically: https://zookeeper.apache.org/doc/r3.8.0/zookeeperReconfig.html (this is available since ZooKeeper 3.5).
>
> Best regards,
> Mate
>
> On Thu, Jun 9, 2022 at 10:37 PM Gus Heck <[email protected]> wrote:
>
> > Hi ZK Folks,
> >
> > Some prior information, including discussion on SOLR-13396 <https://issues.apache.org/jira/browse/SOLR-13396?focusedCommentId=16822748&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822748>, had led me to believe that the ZooKeeper client established connections to all members of the cluster. This then seemed to be the logic for saying that having a load balancer in front of ZooKeeper was dangerous, allowing the possibility that the client might decide to talk to a ZK that had not synced up. In the Solr case this could lead to data loss (see discussion in the above ticket).
> >
> > However, I've now been reading code pursuing an issue for a client, and unless the multiple connections are hidden deep inside the handling of channels in the ClientCnxnSocketNIO class (or its close relatives), it looks a lot to me like only one actual connection is held at one time by an instance of ZooKeeper.java.
> >
> > If that's true, then while the ZooKeeper codebase certainly has logic to reconnect and to balance across the cluster etc., it's becoming murky to me how listing all ZK servers directly vs. through a load balancer would be protection against connecting to an as-yet unsynced ZooKeeper if it existed in the configured server list.
> >
> > Does such a protection exist? Or is it the user's responsibility not to add the server to the list (or load balancer) until it's clear that it has successfully joined the cluster and synced its data?
> >
> > -Gus
> >
> > --
> > http://www.needhamsoftware.com (work)
> > http://www.the111shift.com (play)

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
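P.S. For concreteness, the client-side arrangement Máté describes (every ensemble member listed in one static connect string, a single active connection at a time, and session expiry left to the application) looks roughly like the sketch below. The hostnames and timeout are placeholders, and this is an illustration rather than Solr's actual usage:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkClientSketch {
        public static void main(String[] args) throws Exception {
            // The whole ensemble is named in the connect string; the client picks
            // one server and fails over to the others on its own.
            String connectString =
                    "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181";

            ZooKeeper zk = new ZooKeeper(connectString, 30000, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    switch (event.getState()) {
                        case Disconnected:
                            // Transient: the client retries other servers and renews
                            // the same session if it reconnects within the timeout.
                            System.out.println("Disconnected; client will fail over");
                            break;
                        case Expired:
                            // Session gone: watches and ephemeral znodes are lost;
                            // the application must create a new ZooKeeper instance.
                            System.out.println("Session expired; start a new session");
                            break;
                        default:
                            System.out.println("Connection state: " + event.getState());
                    }
                }
            });

            // ... use zk here ...
            zk.close();
        }
    }

Note that nothing here knows about a load balancer: the connect string is the complete list of servers the client will ever try, which is exactly why the question about a misconfigured-but-confident server matters.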
