Hi Sanjay,

From Kafka 0.10.0 onwards you can set the optional broker.rack property on each broker to get replicas distributed across racks.
See:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
https://issues.apache.org/jira/browse/KAFKA-1215
https://docs.confluent.io/current/installation/configuration/broker-configs.html#broker-rack

Best Regards
Dan

On Sun, Aug 5, 2018 at 7:21 PM, Sanjay Awatramani <sanjay.awatram...@guavus.com> wrote:

> Hi,
>
> I have done some experiments and gone through the Kafka documentation,
> which makes me conclude that there is a small chance of data loss or
> unavailability in a rack scenario. Can someone please validate my
> understanding?
>
> The minimum configuration for a single rack system against single machine
> failure is Replication Factor = 3, min.insync.replicas=2, acks=all. This
> will ensure that the leader + at least one replica receive the data
> written by a producer, so there will be no data loss and the system
> continues to be available for further writes by the producer when a
> broker goes down.
>
> With rack awareness enabled, Kafka will distribute replicas of a partition
> across racks, giving reliability in case of rack failure. However, rack
> awareness is only concerned with distribution of replicas, not with
> prioritising the order of replication when followers catch up with the
> leader.
>
> Moving to a rack aware setup which has 2 racks, the above configuration
> would create a problem because one of the racks might get 2 replicas and
> if that rack goes down, data will be lost.
>
> Extending the minimum configuration for a 2 rack setup: Replication Factor
> = 4, min.insync.replicas=2, acks=all. This will ensure that when a rack
> goes down, one of the replicas will be available as it would be on a
> different rack than the leader. This was my understanding and I cannot
> find any documentation to back this.
> I studied the mechanism by which a producer writes to the leader: all
> in-sync replicas (ISR) pull the newest data, and if the leader confirms
> that at least min.insync.replicas have got the newest data, it sends an
> ack back to the producer. In a rack aware system, I think Kafka will send
> an ack even if the 2 replicas which are in sync are on the same rack. And
> if that rack goes down at this instant, data is lost.
>
> If we make min.insync.replicas=3, we can guarantee that one of the
> replicas will be on a different rack and data will not be lost. However,
> if any rack goes down, the producer’s writes will start failing as it
> won’t have the requisite replicas available.
>
> Is my understanding correct? Is there a way to configure Kafka in a rack
> scenario to make it tolerant to data loss as well as make it available
> for further writes even when a single node or an entire rack goes down?
>
> Regards,
> Sanjay
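
The scenarios in the quoted message can be checked with a small worst-case model. This is only a sketch, not Kafka code: the function name `analyse` and the rack labels are made up for illustration. It assumes unclean leader election is disabled, that the ISR may have shrunk to exactly min.insync.replicas members at the moment of the ack, and that replicas surviving a rack failure are able to rejoin the ISR. The last layout (3 racks) is a hypothetical addition that does not appear in the thread.

```python
from itertools import combinations


def analyse(replica_racks, min_isr):
    """Worst-case check for one partition under a single rack failure.

    replica_racks: rack label of each replica, e.g. ["A", "A", "B"].
    min_isr: the min.insync.replicas setting (acks=all is assumed).
    Returns (durable, available).
    """
    n = len(replica_racks)

    # Durable: whichever min_isr replicas formed the ISR when the ack was
    # sent, they must span more than one rack, so that losing one rack
    # still leaves a copy of the acknowledged record.
    durable = all(
        len({replica_racks[i] for i in isr}) > 1
        for isr in combinations(range(n), min_isr)
    )

    # Available: after losing any one rack, enough replicas must remain to
    # satisfy min_isr again (assuming the survivors are, or become, in sync).
    available = all(
        sum(1 for rack in replica_racks if rack != failed) >= min_isr
        for failed in set(replica_racks)
    )
    return durable, available


# Layouts discussed in the thread, plus one hypothetical 3-rack layout.
scenarios = [
    ("RF=3, 2 racks, min.isr=2", ["A", "A", "B"], 2),
    ("RF=4, 2 racks, min.isr=2", ["A", "A", "B", "B"], 2),
    ("RF=4, 2 racks, min.isr=3", ["A", "A", "B", "B"], 3),
    ("RF=3, 3 racks, min.isr=2", ["A", "B", "C"], 2),  # hypothetical
]

for label, racks, min_isr in scenarios:
    durable, available = analyse(racks, min_isr)
    print(f"{label}: durable={durable}, available={available}")
```

Under these assumptions, the model agrees with the reasoning in the quoted message: with 2 racks, min.insync.replicas=2 leaves a window for data loss and min.insync.replicas=3 blocks writes after a rack failure. Only the hypothetical 3-rack layout passes both checks, and a real cluster may behave differently depending on the actual ISR state at the time of the failure.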