In 0.7.x, if messages are compressed, consumers can see duplicated messages during a consumer rebalance. This is because the consumer offset can only be checkpointed at the boundary of a compressed unit, so messages already consumed from a partially read unit are delivered again after the rebalance. You may want to check whether you are having unnecessary rebalances (see https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog%3F). In 0.8, there won't be duplicated messages even when compression is enabled.
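To make the boundary issue concrete, here is a rough toy model in Python. It is not Kafka code: the `batches` list, the message names, and the helper functions are all made up for illustration, with each inner list standing in for one compressed unit. It shows a consumer that stops partway through a unit, resumes from the last checkpointable boundary, and so sees one message twice. The last few lines sketch the kind of consumer-side de-duplication by message id that Casey's database is already doing.

```python
# Toy model only: each inner list stands for one compressed unit in a partition.
batches = [["m0", "m1", "m2"], ["m3", "m4", "m5"], ["m6", "m7"]]

def flatten(units):
    """Return the messages of all units in log order."""
    return [msg for unit in units for msg in unit]

def checkpoint_after(units, n_delivered):
    """Index of the first unit that is not fully consumed after n_delivered messages.

    This models an offset that can only point at a compressed-unit boundary.
    """
    consumed = 0
    for i, unit in enumerate(units):
        if consumed + len(unit) > n_delivered:
            return i
        consumed += len(unit)
    return len(units)

# The first consumer handles 4 messages (m0..m3), then a rebalance takes the partition away.
first_run = flatten(batches)[:4]

# m3 sits in the middle of unit 1, so the checkpoint can only point at the start of unit 1.
resume_at = checkpoint_after(batches, len(first_run))

# The next owner of the partition starts from the checkpoint and re-reads m3.
second_run = flatten(batches[resume_at:])

delivered = first_run + second_run
duplicates = sorted(m for m in set(delivered) if delivered.count(m) > 1)

# Consumer-side de-duplication by message id (an assumption about the application,
# similar in spirit to a unique key in the database).
seen = set()
stored = []
for msg in delivered:
    if msg not in seen:
        seen.add(msg)
        stored.append(msg)

print(duplicates)
print(stored)
```

The more messages a compressed unit holds, the more messages can be replayed after each rebalance, which is why cutting down unnecessary rebalances helps.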
Thanks,

Jun

On Fri, Jul 19, 2013 at 1:16 PM, Sybrandy, Casey <casey.sybra...@six3systems.com> wrote:

> Hello,
>
> No, we couldn't check the broker logs because the data is obfuscated, so
> we can't just look at the files and tell. It looks like our dev system may
> be experiencing the same issue, so I did turn off the obfuscation and we'll
> monitor it. However, our production system, where we were seeing the
> errors more often, appears to have had zookeeper misconfigured, so we're
> thinking that may be the issue.
>
> Casey
>
> -----Original Message-----
> From: Philip O'Toole [mailto:phi...@loggly.com]
> Sent: Thursday, July 18, 2013 3:29 PM
> To: users@kafka.apache.org
> Cc: kafka-us...@incubator.apache.org
> Subject: Re: Duplicate Messages on the Consumer
>
> Have you actually examined the Kafka files on disk, to make sure those
> dupes are really there? Or is this a case of reading the same message more
> than once?
>
> Philip
>
> On Thu, Jul 18, 2013 at 8:55 AM, Sybrandy, Casey <casey.sybra...@six3systems.com> wrote:
> > Hello,
> >
> > We recently started seeing duplicate messages appearing at our
> > consumers. Thankfully, the database is set up so that we don't store the
> > dupes, but it is annoying. It's not every message, only about 1% of them.
> > We are running 0.7.0 for the broker with Zookeeper 3.3.4 from Cloudera and
> > 0.7.0 for the producer and consumer. We tried upgrading the consumer to
> > 0.7.2 to see if that worked, but we're still seeing the dupes. Do we have
> > to upgrade the broker as well to resolve this? Is there something we can
> > check to see what's going on because we're not seeing anything unusual in
> > the logs. I suspected that there may be significant rebalancing, but that
> > does not appear to be the case at all.
> >
> > Casey Sybrandy