Hi

I asked this question on Stack Overflow here:
https://stackoverflow.com/questions/45778455/kafka-rack-id-and-min-in-sync-replicas.
Basically, I am trying to understand how rack-id helps in disaster recovery
(DR) situations.

Kafka introduced rack-id (broker.rack) so that replicas can be spread across
racks, providing redundancy if a whole rack fails. The min in-sync replicas
setting (min.insync.replicas) specifies the minimum number of replicas that
must be in sync before a producer using acks=-1 (all) receives an ack. The
unclean leader election setting (unclean.leader.election.enable) specifies
whether a replica that is not in sync can be elected leader.

So, given the following scenario:

   - Two racks: rack 1 and rack 2.
   - Replication factor = 4
   - Min in-sync replicas = 2
   - Producer acks = -1 (all)
   - Unclean leader election = false
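
For reference, the scenario can be written down against the Kafka
configuration names, sketched here as plain Python data. The broker ids,
the rack labels, and the even two-per-rack placement are assumptions made
for illustration; the replication factor is given at topic creation rather
than as a topic-level config key.

```python
# Hypothetical broker ids mapped to their broker.rack value (assumed layout).
broker_racks = {1: "rack1", 2: "rack1", 3: "rack2", 4: "rack2"}

# Chosen when the topic is created; one replica per broker in this sketch.
replication_factor = 4

# Topic-level (or broker-default) settings from the scenario.
topic_settings = {
    "min.insync.replicas": 2,
    "unclean.leader.election.enable": False,
}

# Producer setting; "all" is equivalent to -1.
producer_settings = {"acks": "all"}
```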

Is it possible that there is a moment where all 4 replicas are available,
but the only two in-sync replicas are both on rack 1, so the producer
receives an ack, and at that point rack 1 crashes (before any replica on
rack 2 is in sync)? Rack 2 would then contain only out-of-sync replicas.
Since unclean leader election is disabled, no new leader could be elected,
and no producer would be able to add messages to the partition, which would
essentially grind to a halt.
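
To make the scenario concrete, here is a minimal Python sketch of it. This
is a toy model of the rules as I have described them, not Kafka's actual
controller logic; the Replica class, the function names, and the broker ids
are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str


def can_ack(isr: List[Replica], min_insync_replicas: int) -> bool:
    # With acks=all, a write is accepted only if the ISR is large enough.
    return len(isr) >= min_insync_replicas


def elect_leader(isr: List[Replica], alive: List[Replica],
                 unclean_allowed: bool) -> Optional[Replica]:
    # Prefer a live replica that is still in the ISR.
    for replica in isr:
        if replica in alive:
            return replica
    # Fall back to any live replica only if unclean election is enabled.
    if unclean_allowed and alive:
        return alive[0]
    return None


replicas = [Replica(1, "rack1"), Replica(2, "rack1"),
            Replica(3, "rack2"), Replica(4, "rack2")]
isr = replicas[:2]  # assumed: only the two rack 1 replicas are in sync

acked = can_ack(isr, min_insync_replicas=2)

# Rack 1 fails: only the rack 2 replicas survive.
survivors = [r for r in replicas if r.rack != "rack1"]
leader_clean = elect_leader(isr, survivors, unclean_allowed=False)
leader_unclean = elect_leader(isr, survivors, unclean_allowed=True)

print(acked, leader_clean, leader_unclean)
```

In this model the write is acked, no leader can be elected with unclean
election disabled, and enabling it would promote a rack 2 replica that may
be missing acked messages.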

Is my analysis correct, or is there something under the hood to ensure that
the replicas satisfying min in-sync replicas come from different racks?
Since replicas on the same rack as the leader would have lower latency, it
seems that the above scenario is reasonably likely.

<the SO post has an image showing the scenario>

Thanks,

Carl
