Jeff Kim created KAFKA-16106:
--------------------------------

             Summary: group size counters do not reflect the actual sizes when 
operations fail
                 Key: KAFKA-16106
                 URL: https://issues.apache.org/jira/browse/KAFKA-16106
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Jeff Kim
            Assignee: Jeff Kim


An expire-group-metadata operation generates tombstone records, updates the 
`groups` state, decrements the group size counters, and then writes to the 
log. If a __consumer_offsets partition reassignment occurs, the operation 
fails. The `groups` state is reverted to an earlier snapshot, but the classic 
group size counters are not, so the metrics drift away from the actual group 
sizes. The same applies to every unsuccessful write operation that alters the 
`groups` state.

 

The issue is exacerbated because the expire-group-metadata operation is 
retried, possibly indefinitely, and each failed attempt decrements the 
counters again.
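
The drift can be reproduced with a small sketch. This is not the group 
coordinator code: `DriftSketch`, `addGroup`, `expireGroup` and the group ids 
are hypothetical stand-ins. It assumes only that the `groups` state can be 
restored to a snapshot while the counter is a plain long.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the coordinator state; not Kafka's actual classes.
class DriftSketch {
    // Snapshot-able state: can be restored to an earlier copy.
    Map<String, String> groups = new HashMap<>();
    // Plain counter: lives outside the snapshot mechanism.
    long classicGroupCount = 0;

    void addGroup(String id) {
        groups.put(id, "classic");
        classicGroupCount++;
    }

    // One expire attempt. writeSucceeds == false models a failed log write,
    // e.g. during a __consumer_offsets partition reassignment.
    boolean expireGroup(String id, boolean writeSucceeds) {
        Map<String, String> snapshot = new HashMap<>(groups);
        if (groups.remove(id) != null) {
            classicGroupCount--;
        }
        if (!writeSucceeds) {
            groups = snapshot; // the state is reverted, the counter is not
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        DriftSketch s = new DriftSketch();
        s.addGroup("g1");
        s.addGroup("g2");
        // Three failed attempts, as with a retried expiration.
        for (int i = 0; i < 3; i++) {
            s.expireGroup("g1", false);
        }
        // prints "groups=2 counter=-1"
        System.out.println("groups=" + s.groups.size()
                + " counter=" + s.classicGroupCount);
    }
}
```

Both groups are still present after the failed attempts, yet the counter has 
been decremented once per retry.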

 

The fix is to back the counters with a timeline data structure (TimelineLong) 
as well, so that a failed write operation reverts the counters along with the 
`groups` state.
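
A minimal sketch of the idea, assuming a counter that records its value at 
each snapshot and can be rolled back to it. `RevertibleLong` and 
`RevertibleCounterDemo` are hypothetical and much simpler than Kafka's 
TimelineLong; they only illustrate the revert behaviour.

```java
import java.util.TreeMap;

// Hypothetical, simplified counter that can be rolled back to a snapshot.
// It is not Kafka's TimelineLong, only an illustration of the same idea.
class RevertibleLong {
    private long value = 0;
    // Snapshot epoch -> counter value at the time of the snapshot.
    private final TreeMap<Long, Long> snapshots = new TreeMap<>();

    long get() { return value; }
    void increment() { value++; }
    void decrement() { value--; }

    void snapshot(long epoch) {
        snapshots.put(epoch, value);
    }

    void revertTo(long epoch) {
        Long saved = snapshots.get(epoch);
        if (saved == null) {
            throw new IllegalArgumentException("no snapshot for epoch " + epoch);
        }
        value = saved;
        // Snapshots taken after this epoch are no longer valid.
        snapshots.tailMap(epoch, false).clear();
    }
}

class RevertibleCounterDemo {
    // Starts with two groups, then runs the given number of failed
    // expirations, reverting the counter after each one.
    static long runFailedExpirations(int attempts) {
        RevertibleLong counter = new RevertibleLong();
        counter.increment();
        counter.increment();
        for (long epoch = 0; epoch < attempts; epoch++) {
            counter.snapshot(epoch);  // taken before the operation mutates state
            counter.decrement();      // the expiration decrements the counter
            counter.revertTo(epoch);  // the write failed: roll the counter back
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // prints "counter=2"
        System.out.println("counter=" + runFailedExpirations(3));
    }
}
```

With the counter tied to the same snapshot as the state, retries of a failing 
operation no longer accumulate decrements.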



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
