Hi, I am using Kafka as the broker in my event data pipeline, with Filebeat as the producer and Logstash as the consumer.
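For reference, each Logstash instance runs a kafka input along these lines (a minimal sketch assuming a recent logstash-input-kafka plugin; the bootstrap_servers hosts and the exact client_id values are placeholders, with one distinct client_id per instance):

    # Minimal sketch of the kafka input on one Logstash instance.
    # group_id is identical on all instances; client_id is unique per instance.
    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"   # placeholder broker list
        topics            => ["mytopic"]                 # 3 partitions, replication factor 2
        group_id          => "consumer_mytopic"          # same consumer group on all 3 instances
        client_id         => "logstash-1"                # logstash-2 / logstash-3 on the others
        consumer_threads  => 3                           # intended as one thread per partition
      }
    }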
Filebeat simply pushes to Kafka. Logstash runs as 3 instances, all in the same consumer group, consumer_mytopic, reading from mytopic, which has 3 partitions and a replication factor of 2. As per my understanding, a consumer group can have as many threads as there are partitions, so I set consumer_threads to 3 on each instance. Here I am treating one Logstash instance as one consumer that is part of consumer_mytopic; similar consumers run on two other servers with the same group_id. Note that the 3 servers have different client_ids so that they won't read duplicate data. So in total: 3 Logstash instances running with group_id consumer_mytopic, 3 threads each, and different client_ids, i.e. 9 threads against 3 partitions. My understanding is that each consumer (instance) can read from the 3 partitions with its 3 threads, and the other consumers likewise with their 3 threads.

Is this a good design? Can it create duplicates? Is the thread/partition trade-off tied to the client_id or to the consumer group_id? I hoped that, because of the different client_ids, the 3 instances would not read duplicate data even with the same group_id, but I am getting duplicate data on the consumer side. Please help with this.

Regards, Sunil.