It is only available in 0.8.1 (current trunk), which has not been released yet. We plan to release it right after 0.8-final is out. Here are some wiki pages that describe the deduplication feature:
https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal
https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction

Thanks,
Neha

On Tue, Oct 1, 2013 at 11:01 AM, Sybrandy, Casey <
casey.sybra...@six3systems.com> wrote:

> Interesting. I didn't know that Kafka had deduplication capabilities.
> How do you leverage it? Also, is it supported in Kafka 0.7.x?
>
> -----Original Message-----
> From: Guozhang Wang [mailto:wangg...@gmail.com]
> Sent: Tuesday, October 01, 2013 11:33 AM
> To: users@kafka.apache.org
> Subject: Re: use case with high rate of duplicate messages
>
> Batch processing will increase throughput, but it will also increase
> latency; how large a latency can your real-time processing tolerate?
>
> One thing you could try is to use keyed messages, with the key set to the
> MD5 hash of your message. Kafka has a deduplication mechanism on the
> brokers that dedups messages with the same key. All you need to do is set
> the dedup frequency appropriately for your use case.
>
> Guozhang
>
>
> On Tue, Oct 1, 2013 at 8:19 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>
> > I have a use case where thousands of servers send status-type
> > messages, which I am currently handling in real time without any kind
> > of queueing system.
> >
> > Currently, when I receive a message, I compute an MD5 hash of the
> > message and perform a lookup in my database to see if it is a
> > duplicate; if not, I store the message.
> >
> > The message format can be either XML or JSON, and the actual
> > parsing of the message takes time, so I am thinking of storing
> > all the messages in Kafka first and then batch processing them,
> > in the hope that this will be faster.
> >
> > Do you think there would be a faster way of recognizing duplicate
> > messages this way, or is it just the same problem done at the batch
> > level?
> >
> >
>
> --
> -- Guozhang
>
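The keyed-message idea Guozhang describes above can be sketched in a few lines of Python. This is only an illustration of the mechanism, not Kafka code: `message_key` and `compact` are hypothetical helpers, and the dict-based `compact` simulates what the broker's log compaction achieves (retaining one record per key), under the assumption that identical payloads hash to identical keys.

```python
import hashlib


def message_key(payload: bytes) -> str:
    """Derive a deterministic key from the message body.

    Duplicate payloads map to the same key, so compaction
    collapses them into a single retained record.
    """
    return hashlib.md5(payload).hexdigest()


def compact(log):
    """Simulate broker-side log compaction: keep only the latest
    record for each key, in the order keys were first seen."""
    latest = {}
    for key, value in log:
        latest[key] = value  # a later record with the same key overwrites
    return latest


# Two servers report the identical status message:
m1 = b'{"service": "web", "status": "ok"}'
m2 = b'{"service": "db", "status": "degraded"}'
log = [(message_key(m1), m1),
       (message_key(m2), m2),
       (message_key(m1), m1)]  # duplicate of the first message
compacted = compact(log)
assert len(compacted) == 2  # the duplicate collapsed away
```

Note that this only deduplicates after compaction runs; between compaction passes, consumers can still see both copies, so the dedup frequency mentioned above determines how long duplicates may survive.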