I've tried to read up more on this issue and look at my logs. Here is what I think is happening when we restart the controlling broker that also happens to be the leader of the partition in question:
1. Broker 0, the controlling broker that also leads the partition we're looking at, is restarted.
2. Its ZooKeeper session times out and a controller election starts (~4 seconds in the trace below).
3. After a new controller is elected, it moves partition leadership (~3 seconds in the trace below).

What I would expect to minimize the time from (1) to (2) would be to update:

- *zookeeper.session.timeout.ms* on the Kafka broker
- *tickTime* on ZooKeeper (the session timeout must be >= 2 x the tickTime, which defaults to 2 seconds)

But updating those didn't seem to have an effect on our controller election timing. (I've put a rough sketch of the settings I mean at the bottom of this mail, below the quoted message.) In addition, I would hope we could move partition leadership a bit faster than 3 seconds, but I can't see which broker configuration settings would control that.

Here is a sample run where (1) to (2) takes ~4 seconds and step (3) completes in around 3 seconds. From the trace I'm looking at:

1. 2020-03-25 18:44:58,267 INFO [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error sending fetch request (sessionId=1734000127, epoch=14558) to node 0: java.io.IOException: Connection to 0 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler) [ReplicaFetcherThread-0-0]

2. 2020-03-25 18:45:02,103 INFO [Controller id=1] 1 successfully elected as the controller. Epoch incremented to 9 and epoch zk version is now 9 (kafka.controller.KafkaController) [controller-event-thread]

3. 2020-03-25 18:45:02,762 TRACE [Controller id=1 epoch=9] Sending become-leader LeaderAndIsr request PartitionState(controllerEpoch=9, leader=1, leaderEpoch=13, isr=1, zkVersion=19, replicas=1,0, isNew=false) to broker 1 for partition nsm2app-0 (state.change.logger) [controller-event-thread]

...

2020-03-25 18:45:05,629 TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 5 from controller 1 epoch 9 for the become-leader transition for partition nsm2app-0 (state.change.logger) [data-plane-kafka-request-handler-2]

So this run was ~7+ seconds end to end. Some runs are faster - probably the cases where the controller role doesn't have to move - and a few are a bit slower.

On Wed, Mar 25, 2020 at 2:22 PM Larry Hemenway <larry.hemen...@gmail.com> wrote:

> All,
>
> We're experimenting with what happens in the event of a Kafka broker
> failure and we're seeing it take up to ~10 seconds for leadership to
> switch over. We've been unable to figure out if there are some parameters
> to tighten this timing.
>
> Are there broker config parameters that affect this timing?
>
> Alternatively, is there some documentation that would help me understand
> the broker failure and partition election?
>
> Thanks in advance for any help.
>
> Larry
>
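P.S. In case it helps anyone following along, here is roughly what I mean by the two settings above. This is only a sketch - the values are made up for illustration and the defaults are quoted from memory, so please double-check them against your own versions:

    # server.properties on each Kafka broker (illustrative value)
    # The broker *requests* this session timeout from ZooKeeper; ZooKeeper
    # only grants it if it falls inside [minSessionTimeout, maxSessionTimeout].
    zookeeper.session.timeout.ms=4000

    # zoo.cfg on each ZooKeeper server (illustrative values)
    # minSessionTimeout defaults to 2*tickTime and maxSessionTimeout to
    # 20*tickTime, so with the default tickTime of 2000 ms the smallest
    # session timeout ZooKeeper will grant is 4000 ms, regardless of what
    # the broker asks for.
    tickTime=1000
    minSessionTimeout=2000

If only the broker side is lowered while ZooKeeper's negotiated range stays at its defaults, the effective session timeout may not actually change, which I suspect could be one reason our change had no visible effect.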