Hi Dmitry,

This presentation might help you understand the causes of data duplication (and data loss) and take appropriate action to deal with them:

https://www.slideshare.net/JayeshThakrar/kafka-68540012

Regards,
Jayesh

On 4/13/17, 10:05 AM, "Vincent Dautremont" <vincent.dautrem...@olamobile.com.INVALID> wrote:

One case where you would get a message more than once is when you get disconnected, kicked out of the consumer group, etc. and fail to commit the offsets for messages you have already read.

What I do is insert each message into an in-memory cache (a Redis database). If the insert fails because of primary key duplication, that means I have already received that message in the past.

You could even insert just the topic+partition+offset of the message (instead of the full message payload) if you know for sure that your message payload would not be duplicated in the Kafka topic.

Vincent.

On Thu, Apr 13, 2017 at 4:52 PM, Dmitry Goldenberg <dgoldenb...@hexastax.com> wrote:

> Hi all,
>
> I was wondering if someone could list some of the causes which may lead to
> Kafka delivering the same messages more than once.
>
> We've looked around and we see no errors to notice, yet intermittently, we
> see messages being delivered more than once.
>
> Kafka documentation talks about the below delivery modes:
>
> - *At most once*: Messages may be lost but are never redelivered.
> - *At least once*: Messages are never lost but may be redelivered.
> - *Exactly once*: this is what people actually want, each message is
>   delivered once and only once.
>
> So the default is 'at least once' and that is what we're running with (we
> don't want to do "at most once" as that appears to yield some potential for
> message loss).
>
> We had not seen duplicated deliveries for a while previously but just
> started seeing them quite frequently in our test cluster.
>
> What are some of the possible causes for this? What are some of the
> available tools for troubleshooting this issue? What are some of the
> possible fixes folks have developed or instrumented for this issue?
>
> Also, is there an effort underway on Kafka side to provide support for the
> "exactly once" semantic? That is exactly the semantic we want and we're
> wondering how that may be achieved.
>
> Thanks,
> - Dmitry
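
For reference, the de-duplication idea Vincent describes (record a key per message and skip anything whose key is already present) could be sketched roughly as below. This is only an illustration under assumptions, not his actual code: `Record`, `SeenStore`, `dedup_key` and `process_once` are hypothetical names, and `SeenStore` is an in-memory stand-in for the Redis cache so that the example runs without a server. With a real Redis you would presumably rely on an atomic "set only if the key does not exist" operation in place of `add_if_absent`.

```python
from typing import Callable, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a consumed Kafka record."""
    topic: str
    partition: int
    offset: int
    value: bytes


class SeenStore:
    """In-memory stand-in for the Redis cache (assumption, not a real client)."""

    def __init__(self) -> None:
        self._keys = set()

    def add_if_absent(self, key: str) -> bool:
        """Return True if the key was newly added, False if it already existed."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


def dedup_key(record: Record) -> str:
    """Build the topic+partition+offset key that identifies one delivery."""
    return f"{record.topic}:{record.partition}:{record.offset}"


def process_once(record: Record, store: SeenStore,
                 handler: Callable[[Record], None]) -> bool:
    """Call handler only for records not seen before; return True if handled."""
    if not store.add_if_absent(dedup_key(record)):
        return False  # key already present: treat as a redelivery and skip
    handler(record)
    return True


if __name__ == "__main__":
    store = SeenStore()
    handled = []
    first = Record("events", 0, 42, b"payload")
    process_once(first, store, handled.append)
    process_once(first, store, handled.append)  # same offset delivered again
    print(len(handled))
```

Note that this sketch marks a record as seen before the handler runs, so a crash between the two steps would skip that record on redelivery; whether to mark before or after processing depends on which failure you would rather tolerate.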