Hello kafka-users,

I have 50 topics, each with 32 partitions, into which data is ingested
continuously.

Data is published to these 50 topics by external producers (which I do
not control), and this causes data skew among the partitions of each topic.

For example, in topic-1, partition-1 may contain 100 events while
partition-2 contains 10K events, and the same holds for all 50 topics.

*I am consuming data from all 50 topics using Kafka Streams:*

   - Running 4 consumer instances, all within the same consumer-group.
   - Number of threads per consumer instance: 8
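
A minimal sketch of the setup described above, assuming each instance is
configured through the standard Kafka Streams properties. In Kafka Streams
the consumer group is derived from `application.id`, so all 4 instances
would share that value. The application id and broker address below are
placeholders, not values from this deployment, and the class and method
names are hypothetical.

```java
import java.util.Properties;

public class StreamsSetupSketch {
    // Figures taken from the description above.
    static final int TOPICS = 50;
    static final int PARTITIONS_PER_TOPIC = 32;
    static final int INSTANCES = 4;
    static final int THREADS_PER_INSTANCE = 8;

    // Properties each of the 4 instances would start with.
    static Properties streamsProps() {
        Properties props = new Properties();
        // Placeholder: the shared application.id acts as the consumer group id.
        props.put("application.id", "skew-demo-app");
        // Placeholder broker address.
        props.put("bootstrap.servers", "localhost:9092");
        // 8 stream threads per instance (JVM).
        props.put("num.stream.threads", String.valueOf(THREADS_PER_INSTANCE));
        return props;
    }

    // Total input partitions across all topics.
    static int totalPartitions() {
        return TOPICS * PARTITIONS_PER_TOPIC;
    }

    // Total stream threads across all instances.
    static int totalThreads() {
        return INSTANCES * THREADS_PER_INSTANCE;
    }

    public static void main(String[] args) {
        Properties props = streamsProps();
        System.out.println("num.stream.threads = " + props.getProperty("num.stream.threads"));
        System.out.println("total partitions   = " + totalPartitions());
        System.out.println("total threads      = " + totalThreads());
    }
}
```

With these numbers there are 1600 input partitions shared by 32 stream
threads, so the assignment balances the number of partitions per thread,
not the number of events each partition holds.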


Because data is not evenly distributed among partitions (the partitions are
skewed across topics), I see that 1 or 2 consumer instances (JVMs)
process far fewer records than the other 2 instances. My guess is that
these instances are assigned the partitions with less data.

*Can someone help me balance the consumers here, i.e. distribute the
consumer workload evenly across the 4 consumer instances? The expectation
is that all 4 consumer instances process approximately the same number of
events.*

Looking forward to your input.

Thanks in advance.

*Ankit.*
