Jeff Kim created KAFKA-16106:
--------------------------------

Summary: group size counters do not reflect the actual sizes when operations fail
Key: KAFKA-16106
URL: https://issues.apache.org/jira/browse/KAFKA-16106
Project: Kafka
Issue Type: Sub-task
Reporter: Jeff Kim
Assignee: Jeff Kim

An expire-group-metadata operation generates tombstone records, updates the `groups` state, and decrements the group size counters, and only then writes to the log. If a __consumer_offsets partition reassignment is in progress, the write fails. The `groups` state is reverted to an earlier snapshot, but the classic group size counters are not, so the metrics no longer match the actual number of groups.

This applies to every unsuccessful write operation that alters the `groups` state. The expire-group-metadata operation makes it worse because it is retried, possibly indefinitely, and each failed attempt decrements the counters again.

The solution is to make the counters a timeline data structure as well (TimelineLong), so that a failed write operation reverts the counters along with the rest of the state.
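
To illustrate the drift described above, here is a minimal, self-contained sketch. It does not use Kafka's coordinator code or its TimelineLong class; `RevertibleLong`, `simulateFailedExpiration`, and the group names are hypothetical stand-ins chosen for illustration, and the write failure is simply hard-coded. The sketch compares a plain counter with one that is snapshotted and reverted together with the `groups` state.

```java
import java.util.HashSet;
import java.util.Set;

class GroupCounterSketch {

    /** Hypothetical stand-in for a timeline counter: it can roll back to its last snapshot. */
    static final class RevertibleLong {
        private long value;
        private long snapshot;

        RevertibleLong(long initial) {
            this.value = initial;
            this.snapshot = initial;
        }

        long get() { return value; }
        void decrement() { value--; }
        void takeSnapshot() { snapshot = value; }
        void revertToSnapshot() { value = snapshot; }
    }

    /**
     * Simulates `retries` failed attempts to expire one group.
     * Returns {actual group count, plain counter, revertible counter}.
     */
    public static long[] simulateFailedExpiration(int retries) {
        Set<String> groups = new HashSet<>(Set.of("group-a", "group-b"));
        long plainCounter = groups.size();
        RevertibleLong revertibleCounter = new RevertibleLong(groups.size());

        for (int attempt = 0; attempt < retries; attempt++) {
            // Snapshot the state before the operation mutates it.
            Set<String> groupsSnapshot = new HashSet<>(groups);
            revertibleCounter.takeSnapshot();

            // The operation updates state and counters before the log write.
            groups.remove("group-a");
            plainCounter--;
            revertibleCounter.decrement();

            // Assumption: the write always fails, as during a partition reassignment.
            boolean writeSucceeded = false;
            if (!writeSucceeded) {
                groups = groupsSnapshot;              // state is reverted
                revertibleCounter.revertToSnapshot(); // counter reverted with it
                // plainCounter is left as-is, which is the bug described above
            }
        }
        return new long[] { groups.size(), plainCounter, revertibleCounter.get() };
    }

    public static void main(String[] args) {
        long[] r = simulateFailedExpiration(3);
        System.out.println("actual=" + r[0] + " plain=" + r[1] + " revertible=" + r[2]);
    }
}
```

With three failed attempts this prints `actual=2 plain=-1 revertible=2`: the plain counter drifts by one per retry, while the revertible counter stays equal to the actual number of groups.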