Hi Gang,

I am testing some of the durability guarantees given by Kafka 0.8.2.1 that
involve min in-sync replicas and disabling unclean leader election.

My question is: *When will a failed replica, after it successfully comes back
up, be included in the ISR again? Is this governed by the
replica.lag.max.messages property, or does it have to completely catch up
with the leader to rejoin the ISR?*

Alternatively, in more detail: will we lose a committed write in the
following theoretical setup?

   - Single topic
   - 3 Kafka Brokers K1, K2, K3
   - Replication : 3
   - Minimum In-Sync Replica : 2
   - Acks : -1
   - Compression : Gzip
   - Producer type : Async
   - Batch size : 16000
   - replica.lag.max.messages : 4000

There are 3 batches of data to be sent. The producer will retry a batch if
its error callback reports a failure.

Batch 1 : Leader : K1 ; ISR : K1, K2, K3            Result: Data committed
Batch 2 : Leader : K1 ; ISR : K1, K2 (K3 crashed)   Result: Data committed
Batch 3 : Leader : K1 ; ISR : K1 (K2 crashed)       Result: Data uncommitted
due to min in-sync replica violation.
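
To make the expected results above concrete, here is a minimal Python sketch of the commit rule as I understand it. It is a toy model, not Kafka code: the function name is_committed and the rule "with acks=-1, a write commits only if the ISR size is at least the minimum in-sync replica count" are my assumptions about the behaviour.

```python
# Toy model of the commit rule described above; NOT actual Kafka code.
MIN_INSYNC_REPLICAS = 2  # "Minimum In-Sync Replica : 2" from the setup


def is_committed(isr, min_isr=MIN_INSYNC_REPLICAS):
    """Assumed rule for acks=-1: commit only if the ISR has >= min_isr members."""
    return len(isr) >= min_isr


batches = [
    ("Batch 1", ["K1", "K2", "K3"]),
    ("Batch 2", ["K1", "K2"]),  # K3 crashed
    ("Batch 3", ["K1"]),        # K2 crashed
]

results = {name: is_committed(isr) for name, isr in batches}
for name, ok in results.items():
    print(name, "committed" if ok else "uncommitted (min in-sync replica violation)")
```

Under this assumed rule, batches 1 and 2 commit and batch 3 does not, which matches the results listed above.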

K3 comes back up and starts catching up with the current leader, K1. It does
not yet have the batch 2 data. At this point, broker K1 crashes while K3 is
still about 2K messages behind K1.

Will K3 be elected leader at this point, since it is within the 4K-message
limit for being in the ISR? If so, this would presumably lead to loss of
committed data despite unclean leader election being disabled, if I am not
mistaken.
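
The two readings of ISR re-entry that I am asking about can be sketched as follows. Both functions are hypothetical interpretations written only to illustrate the question; I do not know which one (if either) matches what the broker actually does.

```python
# Two hypothetical interpretations of ISR re-entry; NOT actual Kafka code.
REPLICA_LAG_MAX_MESSAGES = 4000  # replica.lag.max.messages from the setup


def eligible_by_lag_threshold(lag, max_lag=REPLICA_LAG_MAX_MESSAGES):
    """Interpretation A (assumed): rejoin the ISR once lag is below the threshold."""
    return lag < max_lag


def eligible_by_full_catch_up(lag):
    """Interpretation B (assumed): rejoin the ISR only once fully caught up."""
    return lag == 0


k3_lag = 2000  # K3 is about 2K messages behind K1 when K1 crashes

print("A (lag threshold):", eligible_by_lag_threshold(k3_lag))
print("B (full catch-up):", eligible_by_full_catch_up(k3_lag))
```

If interpretation A holds, K3 would be in the ISR and could become leader without the batch 2 data; if interpretation B holds, K3 would not be eligible while unclean leader election is disabled.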


Thanks,
Puneet Mehta
