With ZooKeeper-based Kafka, the recovery time is proportional to the number
of partitions in the cluster, so the behaviour you are seeing is expected:
with that many partitions it will take time. KRaft-based Kafka clusters
(production-ready since Kafka 3.3) handle a large number of partitions,
such as yours, much better. So one thing you should consider is upgrading
to a newer KRaft-based cluster.
On the current setup, you can try tuning ZooKeeper for better throughput
and latency.
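
To put rough numbers on the proportionality, here is a back-of-the-envelope
sketch. It assumes purely linear scaling, which is a simplification. Only
the 120 seconds comes from your report; the partition counts and the
function name are hypothetical, made up for illustration.

```python
def estimate_election_seconds(measured_seconds, measured_partitions, target_partitions):
    """Scale a measured election time linearly with partition count.

    Assumption: time is strictly proportional to the number of partitions,
    which is only a rough model of the ZooKeeper-based controller.
    """
    if measured_partitions <= 0:
        raise ValueError("measured_partitions must be positive")
    return measured_seconds * target_partitions / measured_partitions


# Hypothetical: 120 s observed with 20000 partitions; what if there were 5000?
print(estimate_election_seconds(120, 20000, 5000))
```

Under that assumption, cutting the partition count by a factor of four
would cut the election time by about the same factor.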

On Fri, Aug 23, 2024 at 12:40 PM Akash Jain <akashjain0...@gmail.com> wrote:

> Hi Atul, you use the word 'leader'. Do you mean the 'controller', or are
> you referring to the leader of each partition?
>
> On Fri, Aug 23, 2024 at 7:44 AM Atul Sharma
> <atul.sharma.ma...@itbhu.ac.in.invalid> wrote:
>
>> Hi,
>> We are currently facing a prolonged leader election time, approx. 2 mins,
>> in a Kafka cluster (version 2.8.2) that is configured with ZooKeeper. This
>> cluster has a large number of topic partitions.
>>
>> The issue arises during the rolling restarts of the servers in the Kafka
>> cluster.
>>
>> This extended leader election time is causing communication issues and
>> unavailability for producers and consumers, as they are unable to connect
>> to Kafka within this timeframe. Any recommendations on reducing the leader
>> election time?
>>
>> Issue is occurring on Kafka 2.8.2 with Zookeeper
>>
>
