[ https://issues.apache.org/jira/browse/KAFKA-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Lin resolved KAFKA-7126.
-----------------------------
    Resolution: Fixed

> Reduce number of rebalance for large consumer groups after a topic is created
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-7126
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7126
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Jon Lee
>            Priority: Major
>             Fix For: 2.0.0, 2.1.0
>
>         Attachments: 1.diff
>
>
> For a group of 200 MirrorMaker (MM) consumers with pattern-based topic
> subscription, a single topic creation caused 50 rebalances for each of
> these consumers over a 5-minute period. This causes MM to lag
> significantly during those 5 minutes, and the clusters may be
> considerably out of sync during this period.
> Ideally we would like to trigger only 1 rebalance in the MM group after a
> topic is created, and conceptually this should be doable.
>
> Here is the explanation of this repeated consumer rebalance, based on the
> consumer rebalance logic in the latest Kafka code:
> 1) A topic of 10 partitions is created in the cluster and it matches the
> subscription pattern of the MM consumers.
> 2) The leader of the MM consumer group detects the new topic after a
> metadata refresh. It triggers a rebalance.
> 3) At time T0, the first rebalance finishes. 10 consumers are each assigned
> 1 partition of this topic. The other 190 consumers are not assigned any
> partition of this topic. At this moment, the newly created topic appears
> in `ConsumerCoordinator.subscriptions.subscription` for those consumers
> that are assigned a partition of this topic or that refreshed metadata
> before time T0.
> 4) In the common case, half of the consumers have refreshed metadata before
> the leader of the consumer group refreshed metadata. Thus around 100 + 10 =
> 110 consumers have the newly created topic in
> `ConsumerCoordinator.subscriptions.subscription`. The other 90 consumers do
> not have this topic in `ConsumerCoordinator.subscriptions.subscription`.
> 5) For those 90 consumers, if any consumer refreshes metadata, it adds
> this topic to `ConsumerCoordinator.subscriptions.subscription`, which causes
> `ConsumerCoordinator.rejoinNeededOrPending()` to return true and triggers
> another rebalance. If a few consumers refresh metadata at almost the same
> time, they jointly trigger one rebalance. Otherwise, they each trigger a
> separate rebalance.
> 6) The default metadata.max.age.ms is 5 minutes. Thus in the worst case,
> which is probably also the average case if the number of consumers in the
> group is large, the last consumer refreshes its metadata 5 minutes after
> T0, and rebalances keep recurring throughout this 5-minute interval.
>
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
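
The counting argument in steps 5) and 6) of the quoted description can be sketched as a small simulation. This is a simplified model and not Kafka code. It assumes the 90 late consumers refresh metadata at independent, uniformly random times within the 5-minute metadata.max.age.ms window, and that refreshes landing within a fixed merge window of the refresh that started a rebalance are folded into that rebalance. The function names, the 3-second merge window, and the seed are all hypothetical values chosen for illustration.

```python
import random


def count_rebalances(refresh_times_s, merge_window_s):
    """Count rebalances triggered by late metadata refreshes.

    A refresh that lands less than merge_window_s after the refresh that
    started the current rebalance is folded into it (step 5: consumers
    refreshing at almost the same time jointly trigger one rebalance).
    Any other refresh starts a new rebalance.
    """
    rebalances = 0
    window_start = None
    for t in sorted(refresh_times_s):
        if window_start is None or t - window_start >= merge_window_s:
            rebalances += 1
            window_start = t
    return rebalances


def simulate(num_late_consumers=90, max_age_s=300.0, merge_window_s=3.0, seed=0):
    """Draw one random refresh time per late consumer and count rebalances.

    All defaults except 90 consumers and 300 seconds are assumptions.
    """
    rng = random.Random(seed)
    times = [rng.uniform(0.0, max_age_s) for _ in range(num_late_consumers)]
    return count_rebalances(times, merge_window_s)


if __name__ == "__main__":
    print("rebalances after T0:", simulate())
```

Under this model, shrinking the merge window pushes the count toward one rebalance per late consumer, while a window as long as the whole refresh interval collapses everything into a single rebalance, which is the outcome the issue asks for.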