It's quite difficult to infer from the docs the exact techniques required
to ensure consistency and durability in Kafka. I propose that we add a doc
section detailing these techniques. I would be happy to help with this.

The basic question is this: assuming that I can afford to temporarily halt
production to Kafka, how do I ensure that no message written to Kafka is
ever lost under typical failure scenarios (i.e., the loss of a single
broker)?

Here's my understanding of this for Kafka v0.8.1.1:

1. Create a topic with a replication factor of 3.
2. Use a sync producer and set acks (request.required.acks) to 2. (Setting
acks to -1 is not enough on its own: the leader waits only for the replicas
currently in the ISR, and the ISR can shrink to the leader alone, so an
acknowledged write may exist on only a single node.) A sketch of both steps
follows below.
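Concretely, this is roughly what the two steps look like with the 0.8.1.1
tooling and the old producer's Java API. The topic name, broker hosts, and
payload are placeholders I made up for illustration, so treat this as a
sketch rather than tested code:

    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
        --topic events --partitions 1 --replication-factor 3

    import java.util.Properties;
    import kafka.common.FailedToSendMessageException;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class SyncAcks2Producer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "sync");       // block on every send
            props.put("request.required.acks", "2");  // leader plus one follower must ack

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            try {
                producer.send(new KeyedMessage<String, String>("events", "a message"));
            } catch (FailedToSendMessageException e) {
                // Raised after the configured retries are exhausted; this is the
                // point at which production should halt for investigation.
            } finally {
                producer.close();
            }
        }
    }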

Even with these two precautions, there's always the possibility of an
"unclean leader election," i.e., a replica that was not in the ISR becoming
leader and discarding messages the old leader had already acknowledged. Can
data loss still occur in this scenario? Is it possible to achieve this level
of durability on v0.8.1.1?

In Kafka v0.8.2, in addition to the above:

3. Ensure that the triple-replicated topic also disallows unclean leader
election (https://issues.apache.org/jira/browse/KAFKA-1028).

4. Set min.insync.replicas to 2 (note this is a topic- or broker-level
setting, not a producer setting) and set the producer's acks to -1 (
https://issues.apache.org/jira/browse/KAFKA-1555). The producer will then
get a NotEnoughReplicas error if data can't be written to 2 out of 3
replicas. A sketch of both settings follows.
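I believe the 0.8.2 version looks roughly like this, using per-topic configs
and the new Java producer. Again, the topic name and hosts are placeholders,
and the exact config plumbing is my reading of the JIRAs rather than
something I've verified:

    bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic events \
        --config unclean.leader.election.enable=false \
        --config min.insync.replicas=2

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;

    public class MinIsrProducer {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
            props.put("acks", "all");  // equivalent to -1: all in-sync replicas must ack
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer =
                new KafkaProducer<String, String>(props);
            try {
                // Blocking on the returned future surfaces the error on this send.
                producer.send(new ProducerRecord<String, String>("events", "a message")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // Fewer than min.insync.replicas were available: the write was
                    // rejected, so halt production rather than lose the message.
                }
            } finally {
                producer.close();
            }
        }
    }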

In addition to producer configuration and usage, there are also monitoring
and operations considerations for achieving durability and consistency
(e.g., alerting on under-replicated partitions so a shrunken ISR is noticed
before a second failure). As those are rather nuanced, it'd probably be
easiest to just start iterating on a document to flesh them out.

If anyone has any advice on how to better specify this, or how to get
started on improving the docs, I'm happy to help out.
