Hi Svante,

I just forgot about ZK. Thanks for setting me on the right track :)
 
Regards,
Sanjay

On 06/08/18, 12:14 AM, "Svante Karlsson" <svante.karls...@csi.se> wrote:

    You need 3 racks for your zookeepers anyway: the ensemble needs 2 out of 3
    nodes up to keep a quorum. How have you solved that?
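
    As a concrete sketch (host names are placeholders), a 3-node ensemble with
    one ZooKeeper per rack would look roughly like this in zoo.cfg:

        server.1=zk-rack1:2888:3888
        server.2=zk-rack2:2888:3888
        server.3=zk-rack3:2888:3888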
    
    On Sun, 5 Aug 2018 at 20:31, Sanjay Awatramani <sanjay.awatram...@guavus.com>
    wrote:
    
    > Thanks for the quick response, Svante.
    > I forgot to mention that the deployment I am looking at has 2 racks. We
    > came up with this solution, but for this specific deployment adding a
    > rack is out of the question.
    > Is there a way to resolve this with 2 racks?
    >
    > Regards,
    > Sanjay
    >
    > On 05/08/18, 11:57 PM, "Svante Karlsson" <svante.karls...@csi.se> wrote:
    >
    > >3 racks, Replication Factor = 3, min.insync.replicas=2, acks=all
    > >
    > >2018-08-05 20:21 GMT+02:00 Sanjay Awatramani
    > ><sanjay.awatram...@guavus.com>:
    > >
    > >> Hi,
    > >>
    > >> I have done some experiments and gone through the Kafka documentation,
    > >> which makes me conclude that there is a small chance of data loss or
    > >> loss of availability in a rack scenario. Can someone please validate my
    > >> understanding?
    > >>
    > >> The minimum configuration for a single-rack system to survive a single
    > >> machine failure is Replication Factor = 3, min.insync.replicas=2,
    > >> acks=all. This ensures that the leader plus at least one replica
    > >> receive the data written by a producer, so there is no data loss and
    > >> the system stays available for further writes by the producer when a
    > >> broker goes down.
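    > >>
    > >> For example (host, topic name and partition count are just
    > >> placeholders), this corresponds to something like:
    > >>
    > >>   kafka-topics.sh --create --zookeeper zk1:2181 --topic events \
    > >>     --partitions 6 --replication-factor 3 --config min.insync.replicas=2
    > >>
    > >> with acks=all set in the producer configuration.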
    > >>
    > >> With rack awareness enabled, Kafka will distribute the replicas of a
    > >> partition across racks, giving reliability in case of a rack failure.
    > >> However, rack awareness is only concerned with the placement of
    > >> replicas, not with prioritising which followers catch up with the
    > >> leader first.
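    > >>
    > >> Rack awareness is driven by the broker.rack setting (rack names below
    > >> are placeholders), e.g. in each broker's server.properties:
    > >>
    > >>   broker.rack=rack1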
    > >>
    > >> Moving to a rack-aware setup with 2 racks, the above configuration
    > >> creates a problem: one of the racks will get 2 of the 3 replicas, and
    > >> if that rack goes down, data can be lost.
    > >>
    > >> Extending the minimum configuration to a 2-rack setup gives Replication
    > >> Factor = 4, min.insync.replicas=2, acks=all. My understanding was that
    > >> this ensures that when a rack goes down, one of the replicas is still
    > >> available, as it would be on a different rack than the leader, but I
    > >> cannot find any documentation to back this. I studied the mechanism by
    > >> which the producer writes to the leader: all in-sync replicas (ISR)
    > >> pull the newest data, and once the leader confirms that at least
    > >> min.insync.replicas of them have got the newest data, it sends an ack
    > >> back to the producer. In a rack-aware system, I think Kafka will send
    > >> an ack even if the 2 replicas that are in sync are on the same rack,
    > >> and if that rack goes down at that instant, the data is lost.
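    > >>
    > >> As a hypothetical illustration (broker and rack names made up), a
    > >> partition could end up as:
    > >>
    > >>   replicas: b1 (rackA, leader), b2 (rackA), b3 (rackB), b4 (rackB)
    > >>   ISR:      b1, b2
    > >>
    > >> Both in-sync replicas are on rackA, so acks=all with
    > >> min.insync.replicas=2 still succeeds, and a failure of rackA at that
    > >> moment loses the acked data.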
    > >>
    > >> If we make min.insync.replicas=3, we can guarantee that one of the
    > >> in-sync replicas is on a different rack and data will not be lost.
    > >> However, if any rack goes down, the producer's writes will start
    > >> failing, as there won't be the requisite number of replicas available.
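    > >>
    > >> For reference, that override could be applied with something like the
    > >> following (host and topic name are the same placeholders as above):
    > >>
    > >>   kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
    > >>     --entity-name events --add-config min.insync.replicas=3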
    > >>
    > >> Is my understanding correct? Is there a way to configure Kafka in a
    > >> rack scenario so that it neither loses data nor becomes unavailable for
    > >> further writes when a single node or an entire rack goes down?
    > >>
    > >> Regards,
    > >> Sanjay
    > >>
    > >>
    >
    >
    
