There is a very high-level description of our internal ZooKeeper leader
election algorithm here:
I am not aware of any more detailed documentation. If you are interested
in the code, it is best to start here:
We also have many unit tests around leader election that can help you
understand the behaviour.

"503 Service Unavailable" is an HTTP status code, and the ZooKeeper client
interface does not use HTTP; it uses a (jute based) binary protocol. In
ZooKeeper, we have client sessions, which can be kept alive for some time
even if the client cannot communicate with the server. For example, if you
set the client session timeout to 30 seconds and a leader election in the
ZooKeeper ensemble takes 10 seconds, then (as far as I remember) the
ZooKeeper client library should keep the session open, so the election
should not be visible to the applications using ZooKeeper. Of course, no
change can be submitted and no new session can be created while the quorum
has no active leader, so I assume these operations will be blocked until
the internal leader election finishes in ZooKeeper. So one can expect
temporarily longer response times during a leader election.
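
To illustrate the point about the session timeout, here is a minimal
sketch (plain Python, not ZooKeeper client code) of how an application
might ride out a short election: it retries an operation until it succeeds
or the session timeout has elapsed. The name retry_within_session, the
"operation" callable and the use of ConnectionError to signal "no leader
right now" are all assumptions made for the example; the real client
libraries have their own retry and session handling.

```python
import time


def retry_within_session(operation, session_timeout_s, pause_s=0.5,
                         clock=time.monotonic, sleep=time.sleep):
    """Retry `operation` until it succeeds or the session timeout elapses.

    `operation` is a hypothetical callable that raises ConnectionError
    while the quorum has no active leader (an assumption for illustration).
    `clock` and `sleep` are injectable so the helper can be tested without
    actually waiting.
    """
    deadline = clock() + session_timeout_s
    while True:
        try:
            return operation()
        except ConnectionError:
            # Give up if the next attempt would fall after the deadline,
            # because the session could have expired by then.
            if clock() + pause_s > deadline:
                raise
            sleep(pause_s)
```

With a 30 second timeout and an election that takes about 10 seconds, the
call simply returns later than usual; the caller only sees an error if the
outage lasts longer than the timeout.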

In ZooKeeper, the current leader is responsible for making all
modifications to the data, and all changes made by the leader are
replicated to the followers. The four-letter-word diagnostic interface (
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_4lw) or the
HTTP admin API (
can be used to find the current leader in the cluster. However, in
ZooKeeper the clients can be connected to any ZooKeeper server in the
quorum (except the leader itself, if the leaderServes option is explicitly
disabled), and normally all servers will accept both read and write
operations. A client session is handled by one server, and if we send a
write request, that server will make sure to forward it to the current
leader before sending the answer back to the client. The client doesn't
need to know which server is the current leader; it can talk to any
server. Usually we list all the ZooKeeper servers when we initiate a new
client session, so that the client library can fail over and balance load
between them.
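
As a rough sketch of the two points above (plain Python, standard library
only): the code below builds a comma-separated connect string from a
server list, and asks each server for its role with the "srvr"
four-letter-word command, whose reply contains a "Mode:" line (leader,
follower or standalone). The host names are made up, the helper names are
mine, and the target servers must allow the "srvr" command; treat this as
an illustration under those assumptions rather than as a supported tool.

```python
import socket


def build_connect_string(servers):
    """Join (host, port) pairs into 'host1:2181,host2:2181,...'."""
    return ",".join(f"{host}:{port}" for host, port in servers)


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line in 'srvr' output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def four_letter_word(host, port, command="srvr", timeout_s=2.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection after replying,
            # so read until end of stream.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_leader(servers, query=four_letter_word):
    """Return the first (host, port) reporting 'Mode: leader', else None."""
    for host, port in servers:
        try:
            reply = query(host, port)
        except OSError:
            continue  # server unreachable; try the next one
        if parse_mode(reply) == "leader":
            return (host, port)
    return None
```

For example, with a hypothetical list such as
[("zk1.example.com", 2181), ("zk2.example.com", 2181),
("zk3.example.com", 2181)], build_connect_string gives the string you
would pass to the client library, and find_leader returns whichever server
currently reports itself as leader (or None during an election).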

In general, you might find it useful to read our documentation:

