Raman Gupta created KAFKA-10007:
-----------------------------------

             Summary: Kafka consumer offset reset despite recent group activity
                 Key: KAFKA-10007
                 URL: https://issues.apache.org/jira/browse/KAFKA-10007
             Project: Kafka
          Issue Type: Bug
            Reporter: Raman Gupta


I was running a Kafka 2.3.0 broker with the default values for 
`offset.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a 
2.4.1 broker, along with a change in setting `offset.retention.minutes` to 14 
days, as I have several low-traffic topics in which exactly-once processing is 
desired.

As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and 
KIP-211, offsets should no longer be expired based on the last commit 
timestamp, but instead on the last time the group transitioned into an Empty 
state.

However, the behavior I saw from Kafka upon broker shutdown was that the 
offsets were expired for a group when as far as I can tell, they should not 
have been. See these logs from during the cluster recycle -- during this time 
the consumer, configured with the static group membership protocol, is always 
running:


{code}
<<Running Kafka 2.3.0, 4 brokers, all on 2.3, protocol version 2.3, 
offsets.retention.minutes using default value>>

[2020-05-10 05:37:01,070] <<Shutting down kafka-0>>

<< Starting broker-0 on 2.4.1 with protocol version 2.3, 
offsets.retention.minutes = 10080 >>

kafka-0 [2020-05-10 05:37:39,682] INFO starting (kafka.server.KafkaServer)
kafka-0 [2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading group 
metadata for produs-cis-CisFileEventConsumer with generation 27 
(kafka.coordinator.group.GroupCoordinator)

<< Recycling broker-1 on 2.4.1, protocol version 2.3, offsets.retention.minutes 
= 10080, looks like the consumer fails because of the broker going down, and 
kafka-0 reports: >>

kafka-0 [2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member 
cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group 
produs-cis-CisFileEventConsumer has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing to 
rebalance group produs-cis-CisFileEventConsumer in state PreparingRebalance 
with old generation 27 (__consumer_offsets-17) (reason: removing member 
cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on heartbeat 
expiration) (kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member 
cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group 
produs-cis-CisFileEventConsumer has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group 
produs-cis-CisFileEventConsumer with generation 28 is now empty 
(__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)

<< and now kafka-1 starts up again, the offsets are expired >>

kafka-1 [2020-05-10 05:46:11,229] INFO starting (kafka.server.KafkaServer)
...
kafka-0 [2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing to 
rebalance group produs-cis-CisFileEventConsumer in state PreparingRebalance 
with old generation 28 (__consumer_offsets-17) (reason: Adding new member 
cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be with group instanceid 
Some(cis-9c5d994c5-sknlk)) (kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:47:47,611] INFO [GroupMetadataManager brokerId=0] 
Removed 43 expired offsets in 13 milliseconds. 
(kafka.coordinator.group.GroupMetadataManager)
kafka-0 [2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized group 
produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17) 
(kafka.coordinator.group.GroupCoordinator)
kafka-0 [2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment 
received from leader for group produs-cis-CisFileEventConsumer for generation 
29 (kafka.coordinator.group.GroupCoordinator)
{code}


The group becomes empty at 2020-05-10 05:45:19,482, and then the offsets are 
expired about two minutes later at 05:47:47,611. I can't see any reason based 
on my understanding of how things work for this to have happened, other than it 
being a bug of some type?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to