Hi, I asked this question on Stack Overflow here: https://stackoverflow.com/questions/45778455/kafka-rack-id-and-min-in-sync-replicas. Basically, I am trying to understand how rack-id helps in disaster recovery (DR) situations.
Kafka has introduced rack-id to provide redundancy if a whole rack fails. There is a min in-sync replicas setting that specifies the minimum number of replicas that must be in-sync before a producer receives an ack (with acks=-1 / all). There is also an unclean leader election setting that specifies whether a replica that is not in-sync can be elected leader.

So, given the following scenario:

- Two racks: rack 1 and rack 2.
- Replication factor is 4.
- Min in-sync replicas = 2.
- Producer acks = -1 (all).
- Unclean leader election = false.

Is it possible that there is a moment when all 4 replicas are available, but the two in-sync replicas both come from rack 1, so the producer receives an ack, and at that point rack 1 crashes (before any replicas from rack 2 are in-sync)? That would mean rack 2 contains only unclean (out-of-sync) replicas. Because those replicas are unclean and unclean leader election is disabled, no new leader could be elected, so no producer would be able to add messages to the partition and it would essentially grind to a halt.

Is my analysis correct, or is there something under the hood that ensures the replicas forming min in-sync replicas come from different racks? Since replicas on the same rack would have lower latency, the above scenario seems reasonably likely.

[The SO post has an image showing the scenario.]

Thanks,
Carl
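To make the scenario concrete, here is a toy Python model of it. It is a simplified sketch under stated assumptions, not Kafka's actual controller or replication code. It assumes, as Kafka's documentation describes for acks=all, that the leader waits for every replica currently in the in-sync replica set (ISR) and rejects the write if the ISR is smaller than min.insync.replicas. Under that rule, "only the two rack 1 replicas are in-sync" means the rack 2 followers have already dropped out of the ISR, which the model simply assumes. It also treats ISR membership as rack-unaware, which is exactly the point the question asks about. The names `Replica`, `write_acked` and `leader_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str  # the rack-id of the broker hosting this replica


def write_acked(isr, min_insync_replicas):
    # Toy acks=all rule: every replica in the current ISR must acknowledge,
    # and the write is rejected if the ISR is smaller than min in-sync replicas.
    return len(isr) >= min_insync_replicas


def leader_candidates(replicas, isr, failed_racks, unclean_election):
    # Brokers on a failed rack are gone.
    alive = [r for r in replicas if r.rack not in failed_racks]
    if unclean_election:
        # Any surviving replica may become leader (possible data loss).
        return alive
    # Clean election only: the new leader must come from the ISR.
    return [r for r in alive if r in isr]


replicas = [
    Replica(1, "rack1"), Replica(2, "rack1"),
    Replica(3, "rack2"), Replica(4, "rack2"),
]

# Assumption: the rack 2 followers have lagged out of the ISR.
isr_same_rack = {replicas[0], replicas[1]}
# For comparison: one ISR member on each rack.
isr_both_racks = {replicas[0], replicas[2]}

for name, isr in [("same rack", isr_same_rack), ("both racks", isr_both_racks)]:
    acked = write_acked(isr, min_insync_replicas=2)
    survivors = leader_candidates(replicas, isr, {"rack1"}, unclean_election=False)
    print(f"{name}: acked={acked}, leader candidates after rack1 fails="
          f"{[r.broker_id for r in survivors]}")
```

In this model both ISR layouts get the write acked, but when both ISR members sit on rack 1 no clean leader candidate survives the rack 1 failure, whereas with one ISR member per rack broker 3 can take over. Whether real Kafka ever enforces the second layout is the open question.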