On Tue, Jun 29, 2021 at 5:45 PM Péter Sinóros-Szabó <peter.sinoros-sz...@wise.com.invalid> wrote:
> Hey,
>
> we had the same issue as you.
>
> I checked the code and it chooses the first live replica from the
> assignment list. So if you describe a topic with kafka-topics, you will see
> the brokers list that has the replica of each partition. For example:
> [1001, 1002, 1003]. If that is the list, Kafka will choose the first
> replica that is available (is online) in that list.
>

That was our understanding of the relevant code as well (unless the
assignment sequence is ordered in a way that most-in-sync replica goes
first, which is doubtful):

  def offlinePartitionLeaderElection(assignment: Seq[Int],
                                     isr: Seq[Int],
                                     liveReplicas: Set[Int],
                                     uncleanLeaderElectionEnabled: Boolean,
                                     controllerContext: ControllerContext): Option[Int] = {
    assignment.find(id => liveReplicas.contains(id) && isr.contains(id)).orElse {
      if (uncleanLeaderElectionEnabled) {
        val leaderOpt = assignment.find(liveReplicas.contains)
        if (leaderOpt.isDefined)
          controllerContext.stats.uncleanLeaderElectionRate.mark()
        leaderOpt
      } else {
        None
      }
    }
  }

https://github.com/apache/kafka/blob/99b9b3e84f4e98c3f07714e1de6a139a004cbc5b/core/src/main/scala/kafka/controller/PartitionStateMachine.scala#L516-L527

> We use "acks=all" and "min.insync.replicas=2", so that should mean that
> even if the leader is down and the rest of the replicas fall out of the
> ISR, one of the follower replicas should have up to date data.

I'm thinking that the problem here is that Kafka allows the ISR list to
shrink to 1 with the above settings in the first place: this way the
information about the most-in-sync replica is effectively lost.

I'm wondering now if there is a chance to adjust this behavior without the
need to change the client-server protocol. The decision to stop publishing
when min.insync.replicas requirement isn't met, is it made on the client or
on the server side?

Regards,
--
Alex
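
P.S. To make the concern concrete, here is a minimal standalone sketch of the same selection order. It is not the Kafka code itself: the name ElectionSketch and the scenario are made up for illustration, and the controllerContext metrics call is dropped so that it runs on its own. The assumed scenario is that 1001 was the last ISR member and is now down, while 1002 and 1003 are online but out of the ISR.

```scala
// Standalone sketch of the election order; not the actual Kafka code.
// ElectionSketch and elect are hypothetical names used only for this example.
object ElectionSketch {
  def elect(assignment: Seq[Int],
            isr: Seq[Int],
            liveReplicas: Set[Int],
            uncleanLeaderElectionEnabled: Boolean): Option[Int] =
    assignment
      // Clean election: first assigned replica that is both live and in the ISR.
      .find(id => liveReplicas.contains(id) && isr.contains(id))
      .orElse {
        // Unclean election: first assigned replica that is merely live.
        if (uncleanLeaderElectionEnabled) assignment.find(liveReplicas.contains)
        else None
      }

  def main(args: Array[String]): Unit = {
    val assignment = Seq(1001, 1002, 1003)
    val isr = Seq(1001)         // ISR has shrunk to the old leader only
    val live = Set(1002, 1003)  // the old leader is offline

    // No live ISR member, so no leader without unclean election.
    println(elect(assignment, isr, live, uncleanLeaderElectionEnabled = false)) // None
    // With unclean election, 1002 wins purely by its position in the list.
    println(elect(assignment, isr, live, uncleanLeaderElectionEnabled = true))  // Some(1002)
  }
}
```

In the second call 1002 is picked only because it precedes 1003 in the assignment list; nothing in the inputs says which of the two is further ahead.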