Hello Ram,

Sorry, I don't fully understand the question. The zxid is a 64-bit long
number. The upper 32 bits encode the epoch (a logical time / counter that is
incremented with each leader election), while the lower 32 bits are an
auto-incremented counter of all the changes made (committed) in ZooKeeper.
As far as I understand, followers forward write requests to the leader. The
leader turns each one into a proposal carrying the next zxid, and once a
proposal is committed the change is applied across the ensemble. Only state
changes (writes such as create, setData and delete, plus session creation
and closing) consume a zxid; reads don't. The counter restarts from 0
whenever a new epoch begins. The "current" / "latest" zxid is the same in the
whole cluster (followers can lag behind a little, but in theory not by much,
as long as they are in sync and part of the quorum).
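Here is a minimal bash sketch of that split. It assumes a bash with 64-bit integer arithmetic, and zxid_decode is only an illustrative helper name, not something shipped with ZooKeeper. It accepts the hex form printed by srvr or used in the log./snapshot. file names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a hex zxid into its epoch (upper 32 bits)
# and counter (lower 32 bits). Requires bash 64-bit arithmetic.
zxid_decode() {
  local zxid=$(( 16#${1#0x} ))          # accepts "0x2000afcbf" or "2000afcbf"
  local epoch=$(( zxid >> 32 ))         # upper 32 bits
  local counter=$(( zxid & 0xffffffff ))  # lower 32 bits, rolls over at 0xffffffff
  printf 'epoch=%d counter=%d\n' "$epoch" "$counter"
}

zxid_decode 0x2000afcbf   # zxid taken from the srvr output further down
```

For 0x2000afcbf it prints `epoch=2 counter=720063`. In other words, epoch 2 has used about 720 thousand of its roughly 4.29 billion zxids.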

My understanding is that what you want to catch is the moment when the
lower 32 bits of the zxid approach 0xffffffff. When they reach 0xffffffff,
a new leader election is triggered automatically and ZooKeeper can't serve
requests for a short period of time. I guess you want to anticipate this
event and maybe restart the leader manually at a time that suits you better?
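If that is the goal, one rough way to see it coming is to sample the zxid twice and extrapolate. The bash sketch below is only an illustration under some assumptions: bash 64-bit arithmetic, nc available, and a roughly constant write rate. srvr_zxid and zxid_eta are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Rough estimate of the time left until the lower 32 bits of the zxid reach
# 0xffffffff. Hypothetical helpers; assumes bash 64-bit arithmetic and a
# roughly constant write rate between the two samples.

# Extract the zxid from srvr output read on stdin ("Zxid: 0x2000afcbf").
srvr_zxid() {
  awk '/^Zxid:/ {print $2}'
}

# zxid_eta <first_zxid> <second_zxid> <seconds_between_samples>
zxid_eta() {
  local z1=$(( 16#${1#0x} )) z2=$(( 16#${2#0x} )) interval=$3
  local c1=$(( z1 & 0xffffffff )) c2=$(( z2 & 0xffffffff ))
  local delta=$(( c2 - c1 ))               # transactions during the interval
  local remaining=$(( 0xffffffff - c2 ))   # transactions left in this epoch
  # A different epoch means an election happened between the two samples.
  if (( (z1 >> 32) != (z2 >> 32) || delta <= 0 )); then
    echo "cannot estimate: epoch changed or no writes in the interval"
    return 1
  fi
  # seconds left = remaining transactions / (transactions per second)
  echo "remaining=$remaining eta_seconds=$(( remaining * interval / delta ))"
}

# Usage against a live server (srvr must be allowed by 4lw.commands.whitelist):
#   a=$(echo srvr | nc localhost 2181 | srvr_zxid); sleep 60
#   b=$(echo srvr | nc localhost 2181 | srvr_zxid); zxid_eta "$a" "$b" 60
```

Alerting when `remaining` drops below a threshold is probably more robust than trusting a single ETA, because a burst of writes can shorten the estimate considerably.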

But maybe I misunderstood your question.

Máté

On Tue, Aug 23, 2022 at 11:00 PM rammohan ganapavarapu <
[email protected]> wrote:

> Máté,
>
> Thanks for the quick reply. Yes, I did see that the srvr command gives the
> current zxid. I also see a metric in mntr, "proposal_count", which gives
> the total number of proposals, and when we hit the zxid limit it matched
> 2^32 = 4,294,967,296. So I am trying to understand how the zxid gets
> incremented. I don't see the zxid in the logs for normal events, only at
> leader election time.
>
> Ram
>
>
>
> On Tue, Aug 23, 2022 at 10:10 AM Szalay-Bekő Máté <
> [email protected]> wrote:
>
> > Hello!
> >
> > I think the "srvr" four-letter-word diagnostic command should print the
> > current zxid. The equivalent command also works through the Admin REST
> > API (if it is enabled).
> >
> > See:
> >
> > https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands
> >
> > An example:
> >
> >
> > echo srvr | nc localhost 2181
> >
> > Zookeeper version: 3.5.5-136-69648f116c849ccd757e97c26d3450022d4b1dae,
> > built on 08/08/2022 11:04 GMT
> > Latency min/avg/max: 0/0/1808
> > Received: 9599434
> > Sent: 9673689
> > Connections: 41
> > Outstanding: 0
> > Zxid: 0x2000afcbf                             <------------- this line
> > Mode: leader
> > Node count: 1384
> > Proposal sizes last/min/max: 32/32/4226
> >
> >
> >
> >
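> > The zxid (in hex) is also part of the names of the snapshot and
> > transaction log files flushed to the file system: log.<zxid> carries the
> > zxid of the first transaction in that log, and snapshot.<zxid> the last
> > zxid applied when the snapshot was taken.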
> >
> > e.g.:
> >
> > ls -la -R /var/lib/zookeeper/version-2/
> >
> > /var/lib/zookeeper/version-2/:
> > total 57328
> > drwxr-xr-x 2 zookeeper zookeeper     4096 Aug 23 10:42 .
> > drwxr-x--- 3 zookeeper zookeeper     4096 Aug  9 10:41 ..
> > -rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 acceptedEpoch
> > -rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 currentEpoch
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 17 10:09 log.20004c9fc
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 19 00:37 log.20005a541
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 20 18:43 log.20006fc19
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 21 21:40 log.200087550
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 06:30 log.200096ed6
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 17:05 log.2000a9c57
> > -rw-r--r-- 1 zookeeper zookeeper  1372956 Aug 17 10:09 snapshot.20005a540
> > -rw-r--r-- 1 zookeeper zookeeper  1370403 Aug 19 00:37 snapshot.20006fc18
> > -rw-r--r-- 1 zookeeper zookeeper  1369122 Aug 20 18:43 snapshot.20008754f
> > -rw-r--r-- 1 zookeeper zookeeper  1369034 Aug 21 21:40 snapshot.200096ed4
> > -rw-r--r-- 1 zookeeper zookeeper  1379613 Aug 23 06:30 snapshot.2000a9c56
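The same decoding works on these file names. Below is a small bash sketch that assumes the dataDir layout shown above. The default path and the helper name list_file_zxids are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the epoch and counter encoded in the hex suffix
# of each log.<zxid> / snapshot.<zxid> file. The default directory below is
# an assumption; point it at your own dataDir/version-2.
list_file_zxids() {
  local dir=${1:-/var/lib/zookeeper/version-2}
  local f name hex zxid
  for f in "$dir"/log.* "$dir"/snapshot.*; do
    [ -e "$f" ] || continue   # skip a glob pattern that matched nothing
    name=${f##*/}             # e.g. log.2000a9c57
    hex=${name#*.}            # e.g. 2000a9c57
    zxid=$(( 16#$hex ))
    printf '%-20s epoch=%d counter=%d\n' "$name" \
      $(( zxid >> 32 )) $(( zxid & 0xffffffff ))
  done
}

# Usage: list_file_zxids /var/lib/zookeeper/version-2
```

Comparing the counters of consecutive files with their timestamps gives a rough idea of how many transactions per day the ensemble writes.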
> >
> >
> >
> > Best regards,
> > Máté
> >
> > On Tue, Aug 23, 2022 at 6:55 PM rammohan ganapavarapu <
> > [email protected]> wrote:
> >
> > > Hi,
> > >
> > > We recently had a leader election due to "zxid lower 32 bits have rolled
> > > over, forcing re-election". This is the first time we have seen this, and
> > > we are trying to understand how to tell whether the ensemble is reaching
> > > that limit. Are there any metrics available in ZK to track this? How can
> > > I estimate when my ZK cluster will reach this limit?
> > >
> > > Thanks,
> > > Ram
> > >
> >
>
