Hello Kafka users,

I’m facing an issue with a Kafka cluster, specifically with the __consumer_offsets topic. The commit messages are unevenly distributed across its partitions: most of them are concentrated in a single partition, which causes high CPU usage on the broker that leads that partition.

I have already verified that the partition leaders of the topic are well balanced across the six brokers. However, one specific consumer group (the largest one, based on Spring Kafka, with many members consuming from multiple topics) generates a large number of commit messages, and they all end up in the same partition, #37.

My understanding is that, by default, all commit messages sent by a particular consumer group for a specific topic partition are directed to a single partition of the __consumer_offsets topic, determined by hashing the consumer group id and the topic partition. In our case, this default partitioning strategy seems to be causing the imbalance, although I don’t understand exactly why.

Could you please help me understand why the number of messages is so unbalanced across the __consumer_offsets partitions, and why the many commit messages from the large consumer group are not spread evenly across them? Are there any recommendations or best practices to address this issue?
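To make the mapping concrete, here is a minimal sketch for checking which __consumer_offsets partition a group id lands on. It rests on an assumption worth verifying against the broker version in use: that the partition is derived from the group id alone, as the non-negative hash of the group id modulo offsets.topic.num.partitions (default 50). If that holds, every commit from one group goes to one partition, whatever topic partition it covers. The class name and the group id "my-large-group" are hypothetical placeholders.

```java
public class OffsetsPartition {

    // Assumed mapping: non-negative hash of the group id, modulo the number of
    // __consumer_offsets partitions (offsets.topic.num.partitions, default 50).
    static int partitionFor(String groupId, int numPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so map that case to 0.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numPartitions;
    }

    public static void main(String[] args) {
        // Placeholder group id; pass the real one as the first argument.
        String groupId = args.length > 0 ? args[0] : "my-large-group";
        int numPartitions = 50; // assumed default, check the broker config
        System.out.println(groupId + " -> __consumer_offsets-"
                + partitionFor(groupId, numPartitions));
    }
}
```

Running it with the real group id and comparing the result with partition #37 would confirm or rule out this assumption.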
Any guidance would be greatly appreciated.

Best regards,
Fares