3 racks, Replication Factor = 3, min.insync.replicas=2, acks=all

2018-08-05 20:21 GMT+02:00 Sanjay Awatramani <sanjay.awatram...@guavus.com>:
> Hi,
>
> I have done some experiments and gone through the Kafka documentation, which leads me to conclude that there is a small chance of data loss or loss of availability in a rack scenario. Can someone please validate my understanding?
>
> The minimum configuration for a single-rack system to tolerate a single machine failure is Replication Factor = 3, min.insync.replicas=2, acks=all. This ensures that the leader plus at least one replica receive the data written by a producer, so there is no data loss, and the system remains available for further writes by the producer when a broker goes down.
>
> With rack awareness enabled, Kafka distributes the replicas of a partition across racks, giving reliability in case of a rack failure. However, rack awareness is only concerned with the placement of replicas; it does not prioritise the order of replication when followers catch up with the leader.
>
> Moving to a rack-aware setup with 2 racks, the above configuration creates a problem: one of the racks necessarily gets 2 of the 3 replicas, and if that rack goes down, data can be lost.
>
> Extending the minimum configuration to a 2-rack setup gives Replication Factor = 4, min.insync.replicas=2, acks=all. This should ensure that when a rack goes down, at least one replica is still available, because it would be on a different rack than the leader. That was my understanding, but I cannot find any documentation to back it. I studied the mechanism by which a producer writes to the leader: all in-sync replicas (ISR) pull the newest data, and once the leader confirms that at least min.insync.replicas of them have the newest data, it sends an ack back to the producer. In a rack-aware system, I think Kafka will send an ack even if the 2 replicas that are in sync are on the same rack. If that rack goes down at that instant, data is lost.
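A minimal Python sketch of the worst case described above, under stated assumptions: replicas are spread round-robin across racks (an approximation of Kafka's rack-aware assignment, not its exact algorithm), and the ISR may shrink to exactly min.insync.replicas members. Strictly, with acks=all the leader waits for every replica currently in the ISR, and min.insync.replicas only sets the minimum ISR size below which produce requests are rejected; the risk arises when the ISR has shrunk to replicas that all sit on one rack. The helper names are hypothetical.

```python
from collections import Counter
from itertools import cycle, islice


def spread_replicas(num_racks: int, replication_factor: int) -> list[str]:
    """Hypothetical placement: assign replicas round-robin across racks.

    This only approximates Kafka's rack-aware assignment (replicas spread
    as evenly as possible over racks); it is not Kafka's actual algorithm.
    """
    racks = [f"rack{i}" for i in range(num_racks)]
    return list(islice(cycle(racks), replication_factor))


def acked_write_can_be_lost(placement: list[str], min_isr: int) -> bool:
    """Worst case for acks=all: the ISR has shrunk to min_isr replicas.

    If some rack hosts at least min_isr replicas, the whole ISR can sit on
    that rack, so losing that rack can lose acknowledged writes.
    """
    return max(Counter(placement).values()) >= min_isr


def writable_after_rack_loss(placement: list[str], min_isr: int) -> bool:
    """True if, whichever rack fails, the surviving replicas (assumed to be
    in sync) still number at least min_isr, so acks=all writes succeed."""
    total = len(placement)
    return all(total - lost >= min_isr for lost in Counter(placement).values())


if __name__ == "__main__":
    for rf, min_isr in [(3, 2), (4, 2), (4, 3)]:
        placement = spread_replicas(num_racks=2, replication_factor=rf)
        print(
            f"RF={rf} min.insync.replicas={min_isr} {placement}: "
            f"loss possible={acked_write_can_be_lost(placement, min_isr)}, "
            f"writable after rack loss={writable_after_rack_loss(placement, min_isr)}"
        )
```

Under these assumptions, RF=4 with min.insync.replicas=2 stays writable after a rack failure but can still lose acknowledged writes, while min.insync.replicas=3 prevents that loss at the cost of rejecting writes once either rack is down, which is the trade-off raised in the question.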
>
> If we set min.insync.replicas=3, we can guarantee that at least one of the replicas holding the data is on a different rack, so data will not be lost. However, if either rack goes down, the producer's writes will start failing because the required number of in-sync replicas is no longer available.
>
> Is my understanding correct? Is there a way to configure Kafka in a rack scenario so that it is protected against data loss and also remains available for further writes even when a single node or an entire rack goes down?
>
> Regards,
> Sanjay
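A rough way to check the reply at the top of this thread (3 racks, RF=3, min.insync.replicas=2, acks=all), as a sketch under stated assumptions: replicas are spread evenly round-robin over racks, unclean leader election is disabled, and the surviving replicas are in sync after a failure. A configuration survives the loss of any one rack without losing acknowledged writes and while staying writable only if the largest per-rack replica count is below min.insync.replicas and the replicas left after losing that rack still reach it. With 2 racks the fuller rack holds at least half of the replicas, so both conditions cannot hold at once; with 3 racks, RF=3 and min.insync.replicas=2 satisfy both. The function names are hypothetical.

```python
def rack_counts(num_racks: int, replication_factor: int) -> list[int]:
    """Replicas per rack under an assumed even, round-robin spread (an
    approximation of Kafka's rack-aware assignment, not its exact algorithm)."""
    base, extra = divmod(replication_factor, num_racks)
    return [base + (1 if i < extra else 0) for i in range(num_racks)]


def survives_rack_loss(num_racks: int, replication_factor: int, min_isr: int) -> bool:
    """Check both goals for acks=all when any single rack fails.

    Assumes unclean leader election is disabled and that the surviving
    replicas are in sync after the failure.
    """
    largest = max(rack_counts(num_racks, replication_factor))
    # No loss: no single rack can hold an entire minimal ISR of min_isr replicas.
    no_loss = largest < min_isr
    # Availability: losing the fullest rack still leaves min_isr replicas.
    still_writable = replication_factor - largest >= min_isr
    return no_loss and still_writable


def viable_configs(num_racks: int, max_rf: int = 6) -> list[tuple[int, int]]:
    """All (replication factor, min.insync.replicas) pairs up to max_rf that
    meet both goals for the given number of racks."""
    return [
        (rf, isr)
        for rf in range(1, max_rf + 1)
        for isr in range(1, rf + 1)
        if survives_rack_loss(num_racks, rf, isr)
    ]


if __name__ == "__main__":
    for racks in (2, 3):
        print(f"{racks} racks: {viable_configs(racks)}")
```

Under these assumptions the sweep finds no viable pair for 2 racks, and (3, 2) is the smallest viable pair for 3 racks.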