Hello Gus,
I don't think you should use a load balancer for ZooKeeper. The clients do
the load balancing themselves, and they won't connect to any 'out-of-sync'
servers.
This is how it normally works:
- You have ZK servers A, B and C. You list all of these servers in every
ZooKeeper client config and in every server config.
- The ZK servers form a quorum once a majority of them have joined. One
quorum member is always elected as leader; it is the only one that can
change the state stored in ZK, and it only commits a change once a majority
of the servers have acknowledged it.
- Each client is connected to a single ZK server at a time.
- If a ZK server goes out of sync (e.g. it loses its connection to the
leader), it stops serving requests. So even if your client were connected
to such a server, it would lose the connection as soon as the server left
the quorum, and no client can connect to an "out-of-sync" server.
- The ZK client first connects to a random server from the server list. If
the connection fails (the server is unreachable or not serving client
requests), the client moves on to the next server in the list, in a
round-robin fashion, until it finds a working / in-sync server. (See the
small Java sketch below this list.)
- All ZK clients talk to the servers in "sessions". Each client session has
an ID that is unique in the cluster. If a client loses its connection to a
server, it automatically tries to connect to a different server, reusing
('renewing') the same session ID. "A client will see the same view of the
service regardless of the server that it connects to. i.e., a client will
never see an older view of the system even if the client fails over to a
different server with the same session." This is (among other things)
guaranteed by ZooKeeper.
- Of course, sessions can terminate. There is a pre-configured, negotiated
session timeout. The session is deleted from the ZK cluster if the cluster
does not hear from the given client for longer than this timeout (e.g. no
heartbeat for 30 seconds). After that, the session cannot be renewed and
the application needs to decide what to do. It can start a new session, but
then it may, for example, miss some watched events, and it will lose its
ephemeral znodes. (E.g. I know HBase servers abort themselves when a ZK
session timeout happens, as they cannot guarantee consistency anymore.) See
the watcher sketch further below for how an application is notified of
these state changes.

Now, as far as I remember, Solr uses Curator to handle its ZooKeeper
connections. I'm not entirely sure how Solr uses ZooKeeper through Curator.
Maybe Solr reconnects automatically with a new session if the old one
terminates, or maybe it handles this case in a different way.
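Going back to the client connection handling above, here is a minimal Java
sketch (the host names and ports are just placeholders): you hand the full
server list to the ZooKeeper handle, and it picks a server and handles the
failover on its own:

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkClientSketch {
    public static void main(String[] args) throws Exception {
        // All three servers go into the connect string; the client picks one
        // of them and fails over to the others on its own.
        String connectString = "zkA:2181,zkB:2181,zkC:2181";
        int sessionTimeoutMs = 30000;

        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Exactly one TCP connection is open at this point, but the session
        // survives a failover to any of the other listed servers.
        System.out.println("session id: 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}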
See our overview docs:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html
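To illustrate the session states as well (this is not how Solr does it, just
a bare-bones sketch against the plain client API): the application sees these
state changes through its default watcher, and only the Expired case needs a
real decision from the application:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class SessionStateWatcher implements Watcher {
    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:
                // Lost the connection to the current server; the client
                // library is already retrying the other servers with the
                // same session id.
                break;
            case SyncConnected:
                // (Re)connected, possibly to a different server, same session.
                break;
            case Expired:
                // The session timed out on the cluster side: watches and
                // ephemeral znodes are gone. The application has to decide:
                // open a brand new session (a new ZooKeeper handle) and
                // rebuild its state, or shut down (like HBase does).
                break;
            default:
                break;
        }
    }
}

The point is that Disconnected is handled for you by the client library,
while Expired means the old session is gone for good.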
By default, the list of servers is a static config. If you want to add a
new server (or remove an old one), you need to rolling-restart all the ZK
servers and also restart the clients with the new config. The dynamic
reconfig feature (if enabled) lets you do this in a smarter way: the list
of ZK servers is stored in a system znode, which can be changed dynamically:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperReconfig.html (available
since ZooKeeper 3.5)
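The reconfig can also be triggered from the Java admin API. A rough sketch
(placeholder host names and server id; it assumes reconfigEnabled=true on
the servers and that the client is authorized to modify the config):

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.admin.ZooKeeperAdmin;
import org.apache.zookeeper.data.Stat;

public class ReconfigSketch {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeperAdmin admin =
            new ZooKeeperAdmin("zkA:2181,zkB:2181,zkC:2181", 30000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
        connected.await();

        // Add a fourth server; the new membership is written to the
        // /zookeeper/config znode and picked up without a rolling restart.
        String joining = "server.4=zkD:2888:3888;2181";
        byte[] newConfig = admin.reconfigure(joining, null, null, -1, new Stat());
        System.out.println(new String(newConfig));
        admin.close();
    }
}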
Best regards,
Mate
On Thu, Jun 9, 2022 at 10:37 PM Gus Heck <[email protected]> wrote:
> Hi ZK Folks,
>
> Some prior information including discussion on SOLR-13396
> <
> https://issues.apache.org/jira/browse/SOLR-13396?focusedCommentId=16822748&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822748
> >
> had
> led me to believe that the zookeeper client established connections to all
> members of the cluster. This then seemed to be the logic for saying that
> having a load balancer in front of zookeeper was dangerous, allowing the
> possibility that the client might decide to talk to a zk that had not
> synced up. In the solr case this could lead to data loss. (see discussion
> in the above ticket).
>
> However, I've now been reading code pursuing an issue for a client and
> unless the multiple connections are hidden deep inside the handling of
> channels in the ClientCnxnSocketNIO class (or its close relatives) it
> looks a lot to me like only one actual connection is held at one time by an
> instance of ZooKeeper.java.
>
> If that's true, then while the ZooKeeper codebase certainly has logic to
> reconnect and to balance across the cluster etc, it's becoming murky to me
> how listing all zk servers directly vs through a load balancer would be
> protection against connecting to an as-yet unsynced zookeeper if it existed
> in the configured server list.
>
> Does such a protection exist? or is it the user's responsibility not to add
> the server to the list (or load balancer) until it's clear that it has
> successfully joined the cluster and synced its data?
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>