> the person who experienced a failure and loss of a collection evidently
> had a "not synced" zk server node that also didn't know it wasn't synced

This indeed should be a configuration issue or a bug. ZK servers sync
their state when joining a quorum, and afterwards only the leader changes
the state. That said, I know of a few bugs in older ZK versions that can
lead to out-of-sync states in some rare cases.

In ZooKeeper, each committed state change made by the leader is marked by
a 'zxid', an atomically / monotonically increasing 'logical timestamp'.
Two different changes should never be marked by the same zxid. Also, if a
client got a response that a change was successful, then that change has
already been persisted, and even in the case of a server failure it will
be restored. But the theoretical configuration problem you described
(multiple ZK quorums running, with clients handed servers from different
quorums) can lead to all sorts of problems: the states will diverge and
even the session ids can be duplicated.
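
Just to make the zxid visible: every znode's Stat exposes the zxids of
the changes that affected it. A rough sketch with the plain ZooKeeper
Java client (the connect string and the znode path below are made up):

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZxidInspection {
    public static void main(String[] args) throws Exception {
        // Hypothetical connect string; replace with your own ensemble.
        ZooKeeper zk = new ZooKeeper("zkA:2181,zkB:2181,zkC:2181", 30000,
                event -> { /* connection state events ignored in this sketch */ });

        // Every committed change carries a zxid; Stat exposes the ones
        // relevant to a given znode.
        Stat stat = zk.exists("/some/znode", false);
        if (stat != null) {
            System.out.println("czxid (create): " + stat.getCzxid());
            System.out.println("mzxid (last data change): " + stat.getMzxid());
            System.out.println("pzxid (last child change): " + stat.getPzxid());
        }
        zk.close();
    }
}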

However, you mentioned that ZooKeeper is used differently in multiple
places in Solr. If Solr uses multiple ZK client sessions in a single JVM,
then it is possible that these sessions are handled by different ZK
servers and server A is a bit behind server B (the last known zxid is
smaller on server A than on server B). The state observed by the client
in session A will eventually catch up (within a given time bound), and
the changes in session A will follow exactly the changes observed in
session B. But ZK doesn't guarantee that different client sessions
(whether running on different hosts or in the same host / JVM) will
always see the same state. For a single session (even if it moves to a
different ZK server) this is guaranteed: the client tells the server the
last zxid it has seen, so the server knows if it is not up-to-date and
can handle the situation.
https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html#Guarantees
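
If a single client really needs to see the latest committed state before
a read, it can issue a sync() on its session first. This is just a rough
sketch of the plain ZooKeeper API (the path is made up, error handling is
left out), not a statement about how Solr does or should do it:

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;

public class SyncBeforeRead {
    public static byte[] readLatest(ZooKeeper zk, String path) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        // sync() asks the server this session is connected to to catch up
        // with the leader before the reads that follow on this session.
        zk.sync(path, (rc, p, ctx) -> latch.countDown(), null);
        latch.await();
        // This read now reflects at least the state the leader had
        // committed at the time of the sync.
        return zk.getData(path, false, null);
    }
}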

When you design any distributed synchronization on top of ZooKeeper, it
is important to take these limitations into account. Relying on Curator
can help a lot here, as it provides a higher level of abstraction.
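
For example (only a rough sketch, the connect string and the lock path
are made up), a Curator recipe like a distributed lock hides most of the
retry / session handling from the application:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLockSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connect string; Curator retries and reconnects
        // according to the supplied retry policy.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zkA:2181,zkB:2181,zkC:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Distributed lock recipe; Curator does the znode bookkeeping
        // and deals with the session-related edge cases.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/example");
        lock.acquire();
        try {
            // critical section
        } finally {
            lock.release();
        }
        client.close();
    }
}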

> Based on your description A load balancer would still potentially cause
> interference if (for example) it didn't allow long running connections

Yes, definitely. ZooKeeper client connections will be broken frequently
if the load balancer doesn't support sticky TCP sessions. And
(re-)connection can be an expensive operation, especially if e.g.
Kerberos is used.
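
To illustrate why the session expiry case is the expensive one (this is
only a generic sketch, not how Solr handles it): a plain disconnect is
retried by the client library itself with the same session, while an
expired session forces the application to build a completely new one.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class ConnectionStateWatcher implements Watcher {
    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case SyncConnected:
                // Connected (or reconnected) with the same session:
                // watches and ephemeral znodes are preserved.
                break;
            case Disconnected:
                // The client library tries the other servers from the
                // connect string on its own; usually nothing to do here.
                break;
            case Expired:
                // The session is gone: ephemeral znodes are deleted and
                // watches are lost. The application has to create a new
                // ZooKeeper handle and re-authenticate (e.g. Kerberos),
                // which is the expensive path mentioned above.
                break;
            default:
                break;
        }
    }
}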

Best regards,
Mate


On Fri, Jun 10, 2022, 7:53 PM Gus Heck <[email protected]> wrote:

> Thanks for the response. This is helpful, and jibes with what I am seeing
> in the code. WRT solr, currently there are a couple isolated places where
> curator is used, and there is a desire to move to using it extensively but
> that has not happened yet <https://issues.apache.org/jira/browse/SOLR-16116>.
> (PR <https://github.com/apache/solr/pull/760>)
>
> To date, Solr manages its own clients directly. Exactly what it does varies
> a bit depending on who wrote the code and when, and I think part of the
> impetus for Curator is to achieve consistency.
>
> In the ticket I linked, the person who experienced a failure and loss of a
> collection evidently had a "not synced" zk server node that also didn't
> know it wasn't synced. Based on what you say above I guess this means that
> node wasn't configured properly and maybe didn't have a list of the other
> servers? I'm assuming nodes start up as not synced... Obviously if true,
> that's a serious user error in configuration, but conceivable if deployment
> infrastructure is meant to generate the config, and succeeds incorrectly
> (imagining something doing something conceptually like $OLDSERVERS + $NEW
> where $OLDSERVERS came out blank).
>
> IIUC you rely on the target server *knowing* that it is out of sync. Adding
> a server that has a false sense of confidence to the client's connection
> string doesn't (yet) seem any different than adding it to the load balancer
> here. In either case the client might select the "rogue" server that thinks
> it's synced but isn't with identical results. Based on your description A
> load balancer would still potentially cause interference if (for example)
> it didn't allow long running connections, but this would just be spurious
> disconnect/reconnect cycles and inconsistent load on individual zk servers,
> suboptimal but non-catastrophic unless near machine capacity already.
>
> -Gus
>
>
> On Fri, Jun 10, 2022 at 12:35 PM Szalay-Bekő Máté <
> [email protected]> wrote:
>
> > Hello Gus,
> >
> > I think you shouldn't use a load-balancer for ZooKeeper. Clients do the
> > load balancing and also they won't connect to any 'out-of-sync' servers.
> > The way it works normally:
> >
> > - You have ZK servers A, B and C. You list all these servers in all your
> > ZooKeeper client configs. And in all server configs.
> > - ZK servers form a quorum, when the majority of the servers are joined.
> > There is always one quorum member selected as leader (this is the only
> > one that can change the state stored in ZK and it will only commit a
> > change if the majority of the servers approved it).
> > - each client is connected to a single ZK server at a time.
> > - If any ZK server goes out of sync (e.g. losing the connection to the
> > leader, etc) then it will stop serving requests. So even if your client
> > would connect to such a server, the client will lose the connection
> > immediately if the server left the quorum and no client will be able to
> > connect to such an "out of sync" server.
> > - The ZK client first connects to a random server from the server list.
> > If the connection fails (server is unreachable or not serving client
> > request), the client will move to the next server in the list in a
> > round-robin fashion, until it finds a working / in-sync server.
> > - All ZK clients talk with the server in "sessions". Each client
> > session has an ID, unique in the cluster. If the client loses the
> > connection to a server, it will automatically try to connect to a
> > different server using ('renewing') the same session id. "A client will
> > see the same view of the service regardless of the server that it
> > connects to. i.e., a client will never see an older view of the system
> > even if the client fails over to a different server with the same
> > session." This is (among other things) guaranteed by ZooKeeper.
> > - Of course, sessions can terminate. There is a pre-configured and
> > negotiated session timeout. The session will be deleted from the ZK
> > cluster if the connection between any server and the given client
> > breaks for more than a pre-configured time (e.g. no heartbeat for 30
> > seconds). After this time, the session can not be renewed. In this case
> > the application needs to decide what to do. It can start a new session,
> > but then it is possible that e.g. it will miss some watched events and
> > also it will lose its ephemeral ZNodes. (e.g. I know HBase servers
> > aborts themselves when ZK session timeout happens, as they can not
> > guarantee consistency anymore) Now as far as I remember, Solr is using
> > Curator to handle ZooKeeper connections. I'm not entirely sure how Solr
> > is using ZooKeeper through Curator. Maybe Solr reconnects automatically
> > with a new session if the old one terminates. Maybe Solr handles this
> > case in a different way.
> >
> > see our overview docs:
> > https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html
> >
> > By default the list of the servers is a static config. If you want to
> > add a new server (or remove an old one), then you need to
> > rolling-restart all the ZK servers and also restart the clients with
> > the new config. The dynamic reconfig feature (if enabled) allows you to
> > do this in a more clever way, storing the list of the ZK servers inside
> > a system znode, which can be changed dynamically:
> > https://zookeeper.apache.org/doc/r3.8.0/zookeeperReconfig.html
> > (this is available since ZooKeeper 3.5)
> >
> > Best regards,
> > Mate
> >
> >
> > On Thu, Jun 9, 2022 at 10:37 PM Gus Heck <[email protected]> wrote:
> >
> > > Hi ZK Folks,
> > >
> > > Some prior information including discussion on SOLR-13396
> > > <https://issues.apache.org/jira/browse/SOLR-13396?focusedCommentId=16822748&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822748>
> > > had led me to believe that the zookeeper client established
> > > connections to all members of the cluster. This then seemed to be the
> > > logic for saying that having a load balancing in front of zookeeper
> > > was dangerous, allowing the possibility that the client might decide
> > > to talk to a zk that had not synced up. In the solr case this could
> > > lead to data loss. (see discussion in the above ticket).
> > >
> > > However, I've now been reading code pursuing an issue for a client and
> > > unless the multiple connections are hidden deep inside the handling of
> > > channels in the ClientCnxnSocketNIO class (or it's close relatives) it
> > > looks a lot to me like only one actual connection is held at one time
> > > by an instance of ZooKeeper.java.
> > >
> > > If that's true, then while the ZooKeeper codebase certainly has logic
> > > to reconnect and to balance across the cluster etc, it's becoming
> > > murky to me how listing all zk servers directly vs through a load
> > > balancer would be protection against connecting to an as-yet unsynced
> > > zookeeper if it existed in the configured server list.
> > >
> > > Does such a protection exist? or is it the user's responsibility not
> > > to add the server to the list (or load balancer) until it's clear
> > > that it has successfully joined the cluster and synced its data?
> > >
> > > -Gus
> > >
> > > --
> > > http://www.needhamsoftware.com (work)
> > > http://www.the111shift.com (play)
> > >
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
