Hello Kaushal,

> 1. What is the algorithm used to elect the new leader between the
> remaining 2 followers?

There is a very high-level description of ZooKeeper's internal leader
election algorithm here:
https://zookeeper.apache.org/doc/current/zookeeperInternals.html#sc_leaderElection
I don't know whether we have more detailed documentation. If you are
interested in the code, the best place to start is:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
We also have many unit tests around leader election that can help you
understand the behaviour.
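
As a rough illustration of the core idea only (a simplified sketch in
Python, not the actual implementation): in FastLeaderElection each server
proposes a vote and adopts a received vote if it ranks higher, comparing
election epoch first, then the last seen zxid, then the server id. The
class and function names below are mine, and the sketch leaves out the
notification exchange, the quorum check and all timeouts.

```python
from typing import NamedTuple, Sequence


class Vote(NamedTuple):
    """A simplified vote; field names are illustrative, not ZooKeeper's."""
    epoch: int  # election epoch of the proposed leader
    zxid: int   # last transaction id seen by the proposed leader
    sid: int    # server id (myid) of the proposed leader


def supersedes(new: Vote, current: Vote) -> bool:
    """Return True if `new` should replace `current` as a server's vote."""
    # Lexicographic comparison: epoch first, then zxid, then server id.
    return (new.epoch, new.zxid, new.sid) > (current.epoch, current.zxid, current.sid)


def pick_leader(votes: Sequence[Vote]) -> Vote:
    """Return the vote that wins the pairwise comparison among `votes`."""
    best = votes[0]
    for vote in votes[1:]:
        if supersedes(vote, best):
            best = vote
    return best
```

With two followers left in a 3-node ensemble, the two servers still form a
quorum (2 of 3), so both can agree on whichever vote wins this comparison.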


> 2. During the leader elections process in place, does the client see
> a 503 service unavailable for all read or write requests?

"503 Service Unavailable" is an HTTP status code. The ZooKeeper client
interface does not use HTTP; it uses a binary protocol (based on jute).
In ZooKeeper, client sessions can be kept alive for some time even if the
client cannot communicate with the server. For example, if you set the
client session timeout to 30 seconds and a leader election takes 10
seconds, then (as far as I remember) the ZooKeeper client library should
keep the session open, so the election should not be visible to the
applications using ZooKeeper. Of course, no change can be submitted (and
no new session can be created) while the quorum has no active leader, so I
assume these operations will be blocked until the leader election
finishes. One can therefore expect temporarily longer response times
during a leader election.


> 3. In an ensemble of 3 nodes with 1 leader and 2 followers. Is there a
> way to see which node is serving read operations and which node is serving
> write operations?

In ZooKeeper, the current leader is responsible for applying all
modifications to the data, and all changes made by the leader are
synchronized to the followers. The four-letter-word diagnostic interface
(https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_4lw) or
the HTTP admin API
(https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_adminserver)
can be used to find the current leader in the cluster. However, clients
can connect to any ZooKeeper server in the quorum (except the leader, if
the leaderServes option is set to "no"), and normally all servers accept
both read and write operations. A client session is handled by one server;
if the client sends a write request, that server forwards it to the
current leader before sending the answer back to the client. The client
does not need to know which server is the current leader; it can talk to
any server. Usually we list all the ZooKeeper servers when we create a new
client session, so the client library can fail over and load-balance.
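
As a rough sketch of how the four-letter-word interface could be queried
from a script: the Python below sends "srvr" to each server and reads the
"Mode:" line of the reply. The host names zk1..zk3, port 2181 and all
function names are placeholders of mine, and it assumes "srvr" is allowed
by 4lw.commands.whitelist on your servers.

```python
import socket


def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(servers):
    """Return the first (host, port) that reports itself as leader, or None."""
    for host, port in servers:
        try:
            if parse_mode(four_letter_word(host, port)) == "leader":
                return (host, port)
        except OSError:
            continue  # unreachable server: try the next one
    return None


def connect_string(servers):
    """Build the comma-separated host:port list a client is started with."""
    return ",".join(f"{host}:{port}" for host, port in servers)
```

For example, find_leader([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)])
would return the leader's address, while the client itself is simply given
connect_string(...) over the same list, i.e. "zk1:2181,zk2:2181,zk3:2181".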

In general, you might find it useful to read our documentation:
https://zookeeper.apache.org/doc/current/zookeeperOver.html


Kind regards,
Máté


On Sat, Sep 17, 2022 at 6:27 PM Steph van Schalkwyk <svanschalk...@gmail.com>
wrote:

> Just google leader election site:zookeeper.apache.org
>
>
> On Fri, Sep 16, 2022 at 7:39 PM Kaushal Shriyan <kaushalshri...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am running Zookeeper version: 3.7.0 ( 3 nodes -> 1 Leader and 2
> > Followers) on CentOS Linux release 7.9.2009 (Core). In an ensemble of 3
> > nodes with 1 leader and 2 followers, if the leader goes down then two
> > servers can elect a leader among themselves. I have the below questions.
> >
> >    1. What is the algorithm used to elect the new leader between the
> >    remaining 2 followers?
> >    2. During the leader elections process in place, does the client see a
> >    503 service unavailable for all read or write requests?
> >    3. In an ensemble of 3 nodes with 1 leader and 2 followers. Is there a
> >    way to see which node is serving read operations and which node is
> >    serving write operations?
> >
> > Please guide me. Any help will be highly appreciable. Thanks in advance.
> >
> > Best Regards,
> >
> > Kaushal
> >
>
