Rajini Sivaram created KAFKA-9815:
-------------------------------------

             Summary: Consumer may never re-join if inconsistent metadata is received once
                 Key: KAFKA-9815
                 URL: https://issues.apache.org/jira/browse/KAFKA-9815
             Project: Kafka
          Issue Type: Bug
          Components: consumer
            Reporter: Rajini Sivaram
            Assignee: Rajini Sivaram


KAFKA-9797 is the result of an incorrect rolling upgrade test where a new 
listener is added to brokers and set as the inter-broker listener within the 
same rolling upgrade. As a result, metadata is inconsistent across brokers 
until the rolling upgrade completes, since inter-broker communication is broken 
until all brokers have the new listener. The test fails due to consumer 
timeouts, and sometimes this is simply because the upgrade takes longer than 
the consumer timeout. But several logs show an issue in the consumer itself: 
one metadata response received during the upgrade differs from the consumer's 
cached `assignmentSnapshot`, which triggers a rebalance.

In 
https://github.com/apache/kafka/blob/7f640f13b4d486477035c3edb28466734f053beb/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L750
we return true for `rejoinNeededOrPending()` if `assignmentSnapshot` is not 
the same as the current `metadataSnapshot`. In that case we don't set 
`rejoinNeeded` in the instance, but we do revoke partitions and send a 
`JoinGroup` request. If the `JoinGroup` request fails and a subsequent metadata 
response contains the same snapshot value as the previously cached 
`assignmentSnapshot`, we never send `JoinGroup` again, since the snapshots 
match and `rejoinNeeded=false`. No partitions are assigned to the consumer 
after this, and the test fails because messages are not received.
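
The sequence above can be reproduced with a small stand-alone model. This is 
only a sketch under simplifying assumptions, not the actual 
`ConsumerCoordinator` code: the class `RejoinModel`, its `poll` method, the 
`hasPartitions` field and the use of strings as snapshots are all hypothetical 
and exist only to illustrate the state transitions.

```java
import java.util.Objects;

// Hypothetical, simplified model of the rejoin decision described in this
// issue. It is NOT the real ConsumerCoordinator; names other than
// rejoinNeeded, assignmentSnapshot, metadataSnapshot and
// rejoinNeededOrPending are invented for illustration.
class RejoinModel {
    boolean rejoinNeeded = false;      // explicit "must rejoin" flag
    String assignmentSnapshot = "v1";  // metadata seen at the last assignment
    String metadataSnapshot = "v1";    // most recently received metadata
    boolean hasPartitions = true;      // whether partitions are assigned

    boolean rejoinNeededOrPending() {
        // A snapshot mismatch is reported as "rejoin needed" without
        // setting the rejoinNeeded flag itself.
        return rejoinNeeded
            || !Objects.equals(assignmentSnapshot, metadataSnapshot);
    }

    // One poll iteration; joinSucceeds simulates the JoinGroup outcome.
    void poll(String latestMetadata, boolean joinSucceeds) {
        metadataSnapshot = latestMetadata;
        if (!rejoinNeededOrPending()) {
            return; // no JoinGroup is sent
        }
        hasPartitions = false; // partitions are revoked before JoinGroup
        if (joinSucceeds) {
            assignmentSnapshot = metadataSnapshot;
            rejoinNeeded = false;
            hasPartitions = true;
        }
        // On failure, nothing records that a rejoin is still required.
    }

    public static void main(String[] args) {
        RejoinModel c = new RejoinModel();
        c.poll("v2", false); // inconsistent metadata, JoinGroup fails
        c.poll("v1", true);  // snapshots match again, JoinGroup not attempted
        System.out.println("hasPartitions=" + c.hasPartitions
            + " rejoinNeededOrPending=" + c.rejoinNeededOrPending());
    }
}
```

In this model the consumer ends up with no partitions while 
`rejoinNeededOrPending()` returns false, which matches the stuck state 
described above.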

Even though this particular system test isn't a valid upgrade scenario, we 
should fix the consumer, since temporary metadata differences between brokers 
can result in this scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
