Hi, I am using Kafka as the broker in my event data pipeline, with Filebeat as the producer and Logstash as the consumer.
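For reference, each Logstash instance runs a kafka input along these lines (a minimal sketch assuming a recent logstash-input-kafka plugin; the bootstrap_servers hosts and the exact client_id values are placeholders, with one distinct client_id per instance):

    # Minimal sketch of the kafka input on one Logstash instance.
    # group_id is identical on all instances; client_id is unique per instance.
    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"   # placeholder broker list
        topics            => ["mytopic"]                 # 3 partitions, replication factor 2
        group_id          => "consumer_mytopic"          # same consumer group on all 3 instances
        client_id         => "logstash-1"                # logstash-2 / logstash-3 on the others
        consumer_threads  => 3                           # intended as one thread per partition
      }
    }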
Filebeat simply pushes to Kafka. Logstash runs as 3 instances, all in the same consumer group, consumer_mytopic, reading from mytopic, which has 3 partitions and a replication factor of 2. As per my understanding, a consumer group can have as many threads as there are partitions, so I set consumer_threads to 3 on each instance. Here I am treating one Logstash instance as one consumer that is part of consumer_mytopic; similar consumers run on two other servers with the same group_id. Note that the 3 servers have different client_ids so that they won't read duplicate data. So in total: 3 Logstash instances running with group_id consumer_mytopic, 3 threads each, and different client_ids, i.e. 9 threads against 3 partitions. My understanding is that each consumer (instance) can read from the 3 partitions with its 3 threads, and the other consumers likewise with their 3 threads.

Is this a good design? Can it create duplicates? Is the thread/partition trade-off tied to the client_id or to the consumer group_id? I hoped that, because of the different client_ids, the 3 instances would not read duplicate data even with the same group_id, but I am getting duplicate data on the consumer side. Please help with this.

Regards, Sunil.