Sunil,

Kafka guarantees that each partition is read by only one thread within a consumer group. Since your topic has three partitions, at most three threads from the consumer group will be served.

However, although your calculation is correct (3 instances with 3 threads each gives 9 threads in total), the design is not. As stated above, only three threads will be served; the remaining six will sit idle, with no partition to read from, as long as they all belong to the same consumer group as the three active threads.
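
To make the arithmetic concrete, here is a small toy model in Python. It is only a sketch: the function `assign_round_robin` and the thread names are made up for illustration, and the real assignment is done by Kafka's group coordinator and the configured assignor, not by code like this. It only shows that 3 partitions spread over 9 threads in one group leave 6 threads without work.

```python
def assign_round_robin(partitions, threads):
    """Toy assignor (hypothetical): each partition goes to exactly one thread."""
    assignment = {t: [] for t in threads}
    for i, p in enumerate(partitions):
        assignment[threads[i % len(threads)]].append(p)
    return assignment


# 3 partitions of mytopic, 3 Logstash instances with 3 threads each (names are illustrative)
partitions = ["mytopic-0", "mytopic-1", "mytopic-2"]
threads = [f"logstash{i}-thread{j}" for i in range(1, 4) for j in range(1, 4)]

assignment = assign_round_robin(partitions, threads)
busy = [t for t, owned in assignment.items() if owned]
idle = [t for t, owned in assignment.items() if not owned]

print(len(busy), "threads own a partition;", len(idle), "threads are idle")
```

Which three threads end up with a partition depends on the assignor; the counts are the point here, not the names.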

Please note that the `client-id` property has nothing to do with how partitions are distributed among the threads of a group. Kafka uses this property to identify requests coming from the same client application, for example in logs, metrics, and quota management. So the only property that matters for partition assignment is `group-id`.
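
The same toy approach can illustrate why only the group ID matters. In the sketch below, `assign_by_group` and `deliveries` are hypothetical helpers, not Kafka APIs, and the client IDs are invented. The model never looks at the client ID: consumers sharing a group ID split the partitions between them, while a consumer in a different group gets its own full copy of every partition.

```python
from collections import defaultdict


def assign_by_group(partitions, consumers):
    """Toy model: consumers is a list of (group_id, client_id) pairs.

    Returns a list, aligned with consumers, of the partitions each one owns.
    The client_id is deliberately ignored.
    """
    owned = [[] for _ in consumers]
    members = defaultdict(list)  # group_id -> indexes of consumers in that group
    for idx, (group_id, _client_id) in enumerate(consumers):
        members[group_id].append(idx)
    for idxs in members.values():
        # Every group receives each partition exactly once.
        for i, p in enumerate(partitions):
            owned[idxs[i % len(idxs)]].append(p)
    return owned


def deliveries(owned):
    """Count how many consumers receive each partition."""
    counts = defaultdict(int)
    for parts in owned:
        for p in parts:
            counts[p] += 1
    return dict(counts)


partitions = ["mytopic-0", "mytopic-1", "mytopic-2"]

same_group = [("consumer_mytopic", "ls-a"),
              ("consumer_mytopic", "ls-b"),
              ("consumer_mytopic", "ls-c")]
same_client = [("consumer_mytopic", "ls-a")] * 3
two_groups = [("consumer_mytopic", "ls-a"), ("other_group", "ls-a")]

print(deliveries(assign_by_group(partitions, same_group)))
print(deliveries(assign_by_group(partitions, same_client)))
print(deliveries(assign_by_group(partitions, two_groups)))
```

In this model, changing the client IDs never changes the delivery counts; only adding a second group ID does.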

Regarding the duplicated data, that is a separate problem. It would require a closer look at your topology, at how Logstash connects to Kafka, and at how the code is implemented.

Thanks,

-- Ricardo

On 6/19/20 7:13 AM, sunil chaudhari wrote:
Hi,
I am using Kafka as a broker in my event data pipeline.
Filebeat as producer
Logstash as consumer.


Filebeat simply pushes to Kafka.
Logstash has 3 instances.
Each instance has a consumer group say consumer_mytopic which reads from
mytopic.

mytopic has 3 partitions and 2 replicas.

As per my understanding, each consumer group can have as many threads as
there are partitions, so I kept 3 threads for each consumer.

Here I am considering one Logstash instance as one consumer which is part
of consumer_mytopic.
Similar consumers run on other servers with the same group_id as
above. Note that the 3 servers have different client IDs so that they won't
read duplicate data.
So 3 instances of Logstash are running with group_id consumer_mytopic, with 3
threads each and different client IDs. That means 9 threads in total.

My understanding is each consumer(instance) can read with 3 threads from 3
partitions. And another consumer with 3 threads.

Is this a good design?
Can it create duplicates?
Is this thread and partition trade-off related to client_id or to the consumer
group ID?
I hope that because of the different client_id values, the 3 instances won't
read duplicate data even if the group_id is the same.
I am getting duplicate data on my consumer side.
Please help with this.

Regards,
Sunil.
