Hi, the first question to ask would be whether you get duplicate writes at the producer or duplicate reads at the consumer...
For exactly-once: it's work in progress, and we aim for the 0.11 release
(which might still be a beta version). In short, there will be an
idempotent producer that avoids duplicate writes. Furthermore, there will
be "transactions" that allow for exactly-once "read-process-write"
scenarios -- Kafka Streams will leverage this feature. For reads,
exactly-once will allow consuming only committed messages.

But it does not help with duplicate reads. For duplicate reads, you cannot
assume that "Kafka just does the right thing" -- however, you can heavily
influence the potential number of duplicates. For example, you can reduce
the commit interval or even commit manually (in the extreme case, after
each message). But even if you commit after each message, your application
needs to "track" the progress of the currently processed message -- if you
are in the middle of processing and fail, Kafka cannot know what progress
your application made for the current message. Thus, it's up to you to
decide on restart whether you want to receive that message again or not...
Kafka cannot know this.

If you want the full details about exactly-once, have a look at the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

Hope this helps.

-Matthias

On 4/13/17 9:35 AM, Dmitry Goldenberg wrote:
> Thanks, Jayesh and Vincent.
>
> It seems rather extreme that one has to implement a cache of already seen
> messages using Redis, memcached or some such. I would expect Kafka to "do
> the right thing". Data loss is a worse problem, especially for
> mission-critical applications. So what is the current "stance" on the
> exactly-once delivery semantic?
>
> - Dmitry
>
> On Thu, Apr 13, 2017 at 12:07 PM, Thakrar, Jayesh <
> jthak...@conversantmedia.com> wrote:
>
>> Hi Dmitri,
>>
>> This presentation might help you understand and take appropriate actions
>> to deal with data duplication (and data loss):
>>
>> https://www.slideshare.net/JayeshThakrar/kafka-68540012
>>
>> Regards,
>> Jayesh
>>
>> On 4/13/17, 10:05 AM, "Vincent Dautremont"
>> <vincent.dautrem...@olamobile.com.INVALID> wrote:
>>
>> One of the cases where you would get a message more than once is if you
>> get disconnected / kicked off the consumer group / etc. because you
>> failed to commit offsets for messages you have already read.
>>
>> What I do is insert the message into an in-memory Redis cache. If the
>> insert fails because of primary-key duplication, that means I've
>> already received that message in the past.
>>
>> You could even insert just the topic+partition+offset of the message
>> (instead of the full payload) if you know for sure that the message
>> payload would not be duplicated in the Kafka topic.
>>
>> Vincent.
>>
>> On Thu, Apr 13, 2017 at 4:52 PM, Dmitry Goldenberg <
>> dgoldenb...@hexastax.com> wrote:
>>
>> > Hi all,
>> >
>> > I was wondering if someone could list some of the causes which may
>> > lead to Kafka delivering the same messages more than once.
>> >
>> > We've looked around and see no errors of note, yet intermittently we
>> > see messages being delivered more than once.
>> >
>> > The Kafka documentation talks about the following delivery modes:
>> >
>> > - *At most once* -- Messages may be lost but are never redelivered.
>> > - *At least once* -- Messages are never lost but may be redelivered.
>> > - *Exactly once* -- this is what people actually want; each message
>> >   is delivered once and only once.
>> >
>> > So the default is 'at least once', and that is what we're running with
>> > (we don't want "at most once", as that appears to carry some potential
>> > for message loss).
>> >
>> > We had not seen duplicated deliveries for a while previously, but we
>> > just started seeing them quite frequently in our test cluster.
>> >
>> > What are some of the possible causes for this? What are some of the
>> > available tools for troubleshooting this issue? What are some of the
>> > possible fixes folks have developed or instrumented for it?
>> >
>> > Also, is there an effort underway on the Kafka side to provide support
>> > for the "exactly once" semantic? That is exactly the semantic we want,
>> > and we're wondering how it may be achieved.
>> >
>> > Thanks,
>> > - Dmitry
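To make Matthias's point about committing manually concrete, here is a rough consumer sketch that disables auto-commit and commits each offset individually after processing (the extreme per-message case he mentions). It is only an illustration, not an official example; the broker address, group id, topic name, and class name are placeholders.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PerMessageCommitConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-group");                   // placeholder group id
        props.put("enable.auto.commit", "false");            // commit manually instead
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                    // Commit the offset of the *next* message to consume,
                    // one record at a time.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // A crash here, after processing but before commitSync(), means the
        // record is redelivered on restart -- exactly the window Matthias
        // says the application itself has to track.
        System.out.println(record.value());
    }
}
```

Even with this pattern, duplicates are only reduced, not eliminated: a failure between processing and the commit still leads to redelivery, which is why progress tracking (or deduplication) stays with the application.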
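Vincent's Redis-based deduplication can be sketched in the same spirit. The snippet below is an assumption-laden illustration: it uses the Jedis client (Vincent does not name one) and SETNX on a topic+partition+offset key, so a redelivered record is detected because the key already exists. The Redis host, key format, and class name are made up.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;

import redis.clients.jedis.Jedis;

public class OffsetDeduplicator {

    private final Jedis jedis = new Jedis("localhost", 6379); // placeholder Redis host

    /**
     * Returns true if this topic+partition+offset has not been seen before.
     * SETNX only succeeds when the key does not yet exist, so a second
     * delivery of the same record returns 0 and is treated as a duplicate.
     */
    public boolean isFirstDelivery(ConsumerRecord<?, ?> record) {
        String key = record.topic() + ":" + record.partition() + ":" + record.offset();
        return jedis.setnx(key, "1") == 1L;
    }
}
```

A consumer would call isFirstDelivery(record) before processing and skip the record when it returns false. Giving the keys a TTL keeps the cache bounded, and keying on the full payload instead (as Vincent notes) would also catch duplicates written twice by the producer, which get different offsets.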