Shawn Wang created KAFKA-13891:
----------------------------------

             Summary: sync group failed with rebalanceInProgress error cause 
rebalance many rounds in cooperative
                 Key: KAFKA-13891
                 URL: https://issues.apache.org/jira/browse/KAFKA-13891
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 3.0.0
            Reporter: Shawn Wang


This issue was first found in 
[KAFKA-13419|https://issues.apache.org/jira/browse/KAFKA-13419]

However, the previous PR did not reset the generation when SyncGroup fails with 
a REBALANCE_IN_PROGRESS error. The original bug therefore still exists, and it 
can cause a consumer to go through many rounds of rebalancing before the group 
finally becomes stable.

Here's the example ({*}bold text is newly added{*}):
 # Consumer A joined and synced the group successfully in generation 1 *(with 
ownedPartitions P1/P2)*.
 # A new rebalance started with generation 2. Consumer A joined successfully 
but, for some reason, did not send its SyncGroup request immediately.
 # The other consumers completed SyncGroup successfully in generation 2; 
consumer A did not.
 # By the time consumer A sent its SyncGroup request, a new rebalance had 
started with generation 3, so consumer A received a REBALANCE_IN_PROGRESS error 
in the SyncGroup response.
 # On receiving REBALANCE_IN_PROGRESS, consumer A rejoined the group in 
generation 3, sending the assignment (ownedPartitions) from generation 1.
 # The ownedPartitions it sent were therefore out of date, which leads to 
unexpected results.
 # *After the generation-3 rebalance, consumer A was assigned P3/P4. Its 
ownedPartitions were ignored because they came from an old generation.*
 # *Consumer A revoked P1/P2 and rejoined, starting a new round of rebalancing.*
 # *If some other consumer C fails its SyncGroup before consumer A's JoinGroup, 
the same issue happens again, resulting in many rounds of rebalancing before the 
group is stable.*
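
The steps above can be traced with a small, self-contained toy model. It is only a sketch and is not Kafka client code: the class `StaleOwnedPartitionsSketch`, the `Member` type, and the helpers `claimIsHonored` and `partitionsToRevoke` are hypothetical names, and the rule used to decide whether an ownership claim is honored is an assumption made for illustration. It shows that after a failed SyncGroup in generation 2, consumer A still carries the generation-1 ownership, so the generation-3 assignment forces it to revoke P1/P2 and rejoin.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class StaleOwnedPartitionsSketch {

    /** Hypothetical stand-in for the local state of one cooperative consumer. */
    static final class Member {
        int ownedGeneration = -1;            // generation whose SyncGroup produced `owned`
        Set<String> owned = new TreeSet<>();

        /** In this model, only a successful SyncGroup refreshes the owned set. */
        void onSyncGroupSuccess(int generation, String... assignment) {
            ownedGeneration = generation;
            owned = new TreeSet<>(Arrays.asList(assignment));
        }
    }

    /** Assumed toy rule: a claim counts only if it comes from the generation just before the new one. */
    static boolean claimIsHonored(int ownedGeneration, int newGeneration) {
        return ownedGeneration == newGeneration - 1;
    }

    /** Owned partitions missing from the new assignment must be revoked, which forces a rejoin. */
    static Set<String> partitionsToRevoke(Set<String> owned, Set<String> assigned) {
        Set<String> revoked = new TreeSet<>(owned);
        revoked.removeAll(assigned);
        return revoked;
    }

    public static void main(String[] args) {
        Member a = new Member();
        a.onSyncGroupSuccess(1, "P1", "P2");              // step 1

        // Steps 2-4: A joins generation 2, but its SyncGroup fails with
        // REBALANCE_IN_PROGRESS, so its local ownership is never refreshed.
        int newGeneration = 3;                            // step 5: A rejoins here

        boolean honored = claimIsHonored(a.ownedGeneration, newGeneration);
        Set<String> assigned = new TreeSet<>(Arrays.asList("P3", "P4")); // step 7
        Set<String> revoked = partitionsToRevoke(a.owned, assigned);     // step 8

        System.out.println("claim honored: " + honored);
        System.out.println("must revoke:   " + revoked);
        System.out.println("extra round:   " + !revoked.isEmpty());
    }
}
```

In this model the extra round comes entirely from the stale claim: had consumer A's ownership been refreshed or cleared after the failed SyncGroup, it would not have sent generation-1 ownership into the generation-3 rebalance.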

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
