Thanks for reply,

On producer side, I have ACK as all, with 3 retries, rest all are mostly
default properties.

With replication factor of 2, I believe the messages from partition of
downed broker will be read by other one but I doubt if that would lead to
duplicate reading to such a high extent which I observed (~200-800). More
over this is not that consistent, sometime the count goes up and some time
down.

I think when re-balance happens, and consumers start reading from committed
offset is when the duplicates get in.

One thing I observed is when I have following properties:

enable.auto.commit=true
auto.commit.interval.ms=10000
session.timeout.ms=30000

I get most optimal performance (with much less number of duplicates). As I
lower value of auto.commit.interval.ms, the performance deteriorates
drastically.

What may be I need to try, please correct me if I have got it wrong
completely, is to try async commit mode and see how it performs.

Also, as I mentioned there was a bug reported of same kind with
kafka-python, can it be same here with kafka-java?

Thanks,

On Tue, Aug 2, 2016 at 3:46 AM, R Krishna <[email protected]> wrote:

> What about failed async commits in this case due to downed broker? Can it
> not cause consumer to read it again as offsets may not be successfully
> updated?
>
> On Mon, Aug 1, 2016 at 11:35 AM, Tauzell, Dave <
> [email protected]
> > wrote:
>
> > If you kill a broker, then any uncommitted messages will be replayed.
> >
> > -Dave
> > ________________________________________
> > From: R Krishna <[email protected]>
> > Sent: Monday, August 1, 2016 1:32 PM
> > To: [email protected]
> > Subject: Re: Kafka java consumer processes duplicate messages
> >
> > Remember reading about these options for higher consumer guarantees:
> > Unclean.leader.election = false
> > Auto.offset.commit = false    consumer side
> > Commit after processing    syncCommit() regularly
> >
> > What about your producer, does it wait until it reaches all replicas in
> > ISR, i.e., ack=all or none? Not sure, if this can cause consumer to read
> > duplicates, I know there can definitely be data loss because of data not
> > being replicated.
> >
> > On Mon, Aug 1, 2016 at 10:11 AM, Amit K <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am kind of new to Kafka. I have set up a 3 node kafka (1 broker per
> > > machine) cluster with 3 node zookeer cluster. I am using Kafka 0.9.0.0
> > > version.
> > >
> > > The set up works fine wherein from my single producer I am pushing a
> JSON
> > > string to Kafka to a topic with 3 partitions and replication factor of
> 2.
> > > At consumer end I have application with 3 consumer threads (I suppose
> > each
> > > consumer thread will read from corresponding dedicated partition). The
> > > consumer reads the JSON and persist the same in DB in a separate
> thread.
> > > Following are consumer properties:
> > >
> > > topic=TestTopic2807
> > > bootstrap.servers=XXX.221:9092,XXX.222:9092,XXX.221:9092
> > > topic.consumer.threads=3
> > > group.id=EOTG
> > > client.id=EOTG
> > > enable.auto.commit=true
> > > auto.commit.interval.ms=10000
> > > session.timeout.ms=30000
> > >
> key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
> > >
> >
> value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
> > >
> > >
> > > The consumer thread routine is as follows: Each consumer runs following
> > in
> > > it's own thread and spawns a new thread for DB operation (I know DB
> > > operation failure can be issue but will fix that sooner)
> > >
> > >  ConsumerRecords<String, String> records = consumer.poll(20);
> > >                if(!records.isEmpty()) {
> > >                    for (ConsumerRecord<String, String> record :
> records)
> > {
> > >
> > >                     String eOCJSONString = record.value();
> > >
> > >                     logger.info("Received the records at consumer
> id:" +
> > > consumerId +
> > >                     ". Record topic:" + record.topic() +
> > >                     ". Record partition:" + record.partition() +
> > >                     ". Record offset id:" + record.offset());
> > >                     logger.info("\n Record:" + eOCJSONString);
> > >
> > >                     if (emailOCJSONString.startsWith("{")) {
> > >                     OCBean ocBean = gson.fromJson(record.value(),
> > > EOCBean.class);
> > >                     executorServiceWorker.submit(new OCWorker(ocBean,
> > > consumerId));
> > >                     :
> > > }
> > >
> > > The problem occurs when I load test the application sending 30k of
> > messages
> > > (JSONS) from single producer and when I tried bringing down one of the
> > > broker while consumer is consuming the messages. I could observe that
> > many
> > > of the messages are processed duplicate (~200-800). I repeated this
> > > experiment a few times and always noticed that there are many messages
> > > which are read duplicate by consumer thread. I tried by bringing one,
> two
> > > brokers down.
> > >
> > > Is it normal to happen?
> > > Should I switch to manual offset commit than enabling auto commit?
> > > Or should I manually assign the partition in program rather than let
> > > brokers manage it?
> > >
> > > Am I missing something very important here?
> > >
> > > Also,
> > > I observed that Kafka-Python had similar bug and has been fixed it in
> > 0.9.2
> > > (https://github.com/dpkp/kafka-python/issues/189), but I believe no
> such
> > > issue reported for Java.
> > >
> > > Thanks,
> > >
> >
> >
> >
> > --
> > Radha Krishna, Proddaturi
> > 253-234-5657
> > This e-mail and any files transmitted with it are confidential, may
> > contain sensitive information, and are intended solely for the use of the
> > individual or entity to whom they are addressed. If you have received
> this
> > e-mail in error, please notify the sender by reply e-mail immediately and
> > destroy all copies of the e-mail and any attachments.
> >
>
>
>
> --
> Radha Krishna, Proddaturi
> 253-234-5657
>

Reply via email to