Hi Dmitri,

This presentation might help you understand data duplication (and data loss)
and take appropriate action to deal with it:

https://www.slideshare.net/JayeshThakrar/kafka-68540012

Regards,
Jayesh

On 4/13/17, 10:05 AM, "Vincent Dautremont" 
<vincent.dautrem...@olamobile.com.INVALID> wrote:

    One of the cases where you would get a message more than once is if you get
    disconnected / kicked out of the consumer group / etc. before committing
    the offsets of messages you have already read.
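
    For illustration, here is a minimal sketch of that consume/commit loop
    (assuming the Java kafka-clients consumer API; the broker address, group id
    and topic name are placeholders). Offsets are committed only after
    processing, so a crash or a rebalance before the commit means the same
    records are delivered again to whichever consumer takes over the partition:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AtLeastOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "my-consumer-group");       // placeholder group
            props.put("enable.auto.commit", "false");         // commit manually, after processing
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record); // your processing logic goes here
                    }
                    // If the consumer dies or is kicked out of the group before this
                    // commit, the records above are redelivered: the at-least-once
                    // duplicate case described above.
                    consumer.commitSync();
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.printf("%s-%d@%d: %s%n",
                    record.topic(), record.partition(), record.offset(), record.value());
        }
    }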
    
    What I do is insert the message into an in-memory Redis cache. If the
    insert fails because of a primary-key duplication, that means I have
    already received that message in the past.
    
    You could even insert just the topic+partition+offset of the message as the
    key (instead of the full payload) if you know for sure that the message
    payload is not itself duplicated in the Kafka topic; see the sketch below.
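
    For example, a minimal sketch of that dedup check (the Jedis client and the
    topic:partition:offset key layout here are just one possible way to do it):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import redis.clients.jedis.Jedis;

    public class RedisDeduplicator {
        private final Jedis jedis;

        public RedisDeduplicator(Jedis jedis) {
            this.jedis = jedis;
        }

        /**
         * Returns true the first time a record is seen, false on a duplicate
         * delivery. The key topic:partition:offset uniquely identifies a record
         * in Kafka, so the full payload does not need to be stored.
         */
        public boolean isFirstDelivery(ConsumerRecord<?, ?> record) {
            String key = record.topic() + ":" + record.partition() + ":" + record.offset();
            // SETNX only succeeds if the key does not exist yet:
            // 1 = new record, 0 = already seen, i.e. a duplicate.
            return jedis.setnx(key, "1") == 1L;
        }
    }

    You would wire it up with something like new RedisDeduplicator(new
    Jedis("localhost", 6379)) and skip any record for which isFirstDelivery()
    returns false. In practice you would probably also put a TTL on those keys
    (e.g. SET with NX and EX instead of SETNX) so the cache does not grow
    without bound.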
    
    Vincent.
    
    On Thu, Apr 13, 2017 at 4:52 PM, Dmitry Goldenberg
    <dgoldenb...@hexastax.com> wrote:
    
    > Hi all,
    >
    > I was wondering if someone could list some of the causes which may lead to
    > Kafka delivering the same messages more than once.
    >
    > We've looked around and see no notable errors, yet intermittently we
    > see messages being delivered more than once.
    >
    > Kafka documentation talks about the below delivery modes:
    >
    >    - *At most once*—Messages may be lost but are never redelivered.
    >    - *At least once*—Messages are never lost but may be redelivered.
    >    - *Exactly once*—this is what people actually want, each message is
    >    delivered once and only once.
    >
    > So the default is 'at least once' and that is what we're running with (we
    > don't want to do "at most once" as that appears to yield some potential
    > for message loss).
    >
    > We had not seen duplicated deliveries for a while previously but just
    > started seeing them quite frequently in our test cluster.
    >
    > What are some of the possible causes for this?  What are some of the
    > available tools for troubleshooting this issue? What are some of the
    > possible fixes folks have developed or instrumented for this issue?
    >
    > Also, is there an effort underway on Kafka side to provide support for the
    > "exactly once" semantic?  That is exactly the semantic we want and we're
    > wondering how that may be achieved.
    >
    > Thanks,
    > - Dmitry
    >
    
    
