Actually, most of the duplicates I was seeing were due to a bug in the old Hive 
version I'm using (0.9).
But I am still seeing some duplicates, although fewer: instead of 3-13%, I'm now 
seeing less than 1%. This appears to hold for each of my consumer's batches, 
which are currently set to 1,000,000 messages each (i.e., under 10,000 
duplicates per batch). Does that seem more reasonable?

-----Original Message-----
From: Joel Koshy [mailto:jjkosh...@gmail.com] 
Sent: Thursday, January 09, 2014 7:07 AM
To: users@kafka.apache.org
Subject: Re: Duplicate records in Kafka 0.7

You mean duplicate records on the consumer side? Duplicates are possible if 
there are consumer failures and another consumer instance resumes from an 
earlier offset. They are also possible if there are producer retries due to 
exceptions while producing. Do you see either of these errors in your logs? 
Outside of these scenarios, though, you shouldn't be seeing duplicates.
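
For what it's worth, a minimal consumer-side guard against the failover/replay 
case looks something like the sketch below. It's plain Java and not tied to the 
Kafka 0.7 API; the partition id and offset are assumed to come from whatever 
record type your consumer hands you. It just tracks the highest offset already 
processed per partition and skips anything at or below it:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: suppresses messages whose offset has already
    // been processed, which catches replays after a consumer failover resumes
    // from an earlier offset. This is not the Kafka 0.7 API; partition ids and
    // offsets are assumed to come from your consumer's record type.
    public class OffsetDeduplicator {

        // Highest offset successfully processed, keyed by partition.
        private final Map<Integer, Long> highWatermarks =
            new HashMap<Integer, Long>();

        // Returns true if the message is new and should be processed.
        public boolean shouldProcess(int partition, long offset) {
            Long seen = highWatermarks.get(partition);
            if (seen != null && offset <= seen) {
                return false; // at or below the watermark: a replayed duplicate
            }
            highWatermarks.put(partition, offset);
            return true;
        }
    }

Note this only catches offset-level replays within a single consumer instance. 
Producer-retry duplicates land at distinct offsets, so those need a key- or 
payload-based dedup (e.g., hashing message contents) further downstream.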

Thanks,

Joel


On Wed, Jan 8, 2014 at 5:21 PM, Xuyen On <x...@ancestry.com> wrote:
> Hi,
>
> I would like to check whether other people are seeing duplicate records 
> with Kafka 0.7. I read the JIRAs, and I believe that duplicates are still 
> possible when using message compression on Kafka 0.7. I'm seeing duplicate 
> records in the range of 6-13%. Is this normal?
>
> If you're using Kafka 0.7 with message compression enabled, can you please 
> let me know whether you see any duplicate records and, if so, at what %?
>
> Also, please let me know what sort of deduplication strategy you're using.
>
> Thanks!
>
>

