David Jacot created KAFKA-12890:
-----------------------------------

             Summary: Consumer group stuck in `CompletingRebalance`
                 Key: KAFKA-12890
                 URL: https://issues.apache.org/jira/browse/KAFKA-12890
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.6.2, 2.7.1, 2.8.0, 2.6.1, 2.7.0
            Reporter: David Jacot
            Assignee: David Jacot


We have recently seen multiple consumer groups stuck in `CompletingRebalance`. 
It appears that those groups never receive the assignment from the leader of 
the group and remain stuck in this state forever.

When a group transitions to the `CompletingRebalance` state, the group 
coordinator sets up a `DelayedHeartbeat` for each member of the group. It does so 
to ensure that the member sends a sync request within the session timeout. If 
it does not, the group coordinator rebalances the group. Note that 
`DelayedHeartbeat` is used here for this purpose. A `DelayedHeartbeat` is also 
completed when a member heartbeats.
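
To make the intended mechanism concrete, here is a minimal sketch in Python. It 
is a toy model under stated assumptions, not Kafka's actual coordinator code: 
`ToyCoordinator`, `complete_join`, `on_sync` and `tick` are hypothetical names, 
time is a plain integer clock, and the transition to a stable group is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCoordinator:
    """Toy model of the per-member timer; hypothetical, not Kafka's API."""

    session_timeout_ms: int
    state: str = "PreparingRebalance"
    # member id -> deadline (ms) of its pending delayed heartbeat
    pending: Dict[str, int] = field(default_factory=dict)

    def complete_join(self, members: List[str], now_ms: int) -> None:
        # Entering CompletingRebalance: arm one timer per member.
        self.state = "CompletingRebalance"
        for member in members:
            self.pending[member] = now_ms + self.session_timeout_ms

    def on_sync(self, member: str) -> None:
        # The sync request arrived, so this member's timer is completed.
        self.pending.pop(member, None)

    def tick(self, now_ms: int) -> List[str]:
        # Return members whose timer expired; any expiry triggers a rebalance.
        expired = sorted(m for m, d in self.pending.items() if d <= now_ms)
        if expired and self.state == "CompletingRebalance":
            self.pending.clear()
            self.state = "PreparingRebalance"
        return expired


coordinator = ToyCoordinator(session_timeout_ms=10_000)
coordinator.complete_join(["a", "b"], now_ms=0)
coordinator.on_sync("a")            # "a" syncs in time
print(coordinator.tick(9_999))      # nothing has expired yet
print(coordinator.tick(10_000))     # "b" never synced, so its timer expires
print(coordinator.state)
```

In this model, a member that never sends its sync request is caught as soon as 
the session timeout elapses, and the group goes back to rebalancing.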

The issue is that https://github.com/apache/kafka/pull/8834 changed the 
heartbeat logic to allow members to heartbeat while the group is in the 
`CompletingRebalance` state. This was not allowed before. Now, if a member 
starts to heartbeat while the group is in the `CompletingRebalance` state, the 
heartbeat request completes the pending `DelayedHeartbeat` that was set up to 
catch a missing sync request. Therefore, if the sync request never comes, the 
group coordinator no longer notices.
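
The effect can be illustrated with a small simulation. This is a sketch under 
stated assumptions, not the coordinator's real logic: `simulate` and its 
parameters are hypothetical, the group has a single member that heartbeats at a 
fixed interval and never sends a sync request, and an accepted heartbeat is 
modelled as completing the pending timer and arming a new one.

```python
def simulate(
    heartbeat_allowed: bool,
    session_timeout_ms: int = 10_000,
    heartbeat_interval_ms: int = 3_000,
    duration_ms: int = 60_000,
) -> str:
    """Return the group state after duration_ms with no sync request sent."""
    # Timer armed at t=0, when the group enters CompletingRebalance.
    deadline = session_timeout_ms
    for now in range(heartbeat_interval_ms, duration_ms + 1, heartbeat_interval_ms):
        if now >= deadline:
            # The timer expired: the coordinator rebalances the group.
            return "PreparingRebalance"
        if heartbeat_allowed:
            # An accepted heartbeat completes the pending timer and re-arms it,
            # which also erases the deadline for the missing sync request.
            deadline = now + session_timeout_ms
        # Otherwise the heartbeat is rejected and the timer is left untouched.
    return "CompletingRebalance"


print(simulate(heartbeat_allowed=False))  # missing sync is detected
print(simulate(heartbeat_allowed=True))   # group stays stuck
```

With heartbeats rejected, the timer expires and the group rebalances. With 
heartbeats accepted, every heartbeat pushes the deadline forward, so the group 
stays in `CompletingRebalance` for as long as the member keeps heartbeating.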

We need to bring back the detection of missing sync requests somehow.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
