Luke Chen created KAFKA-15702:
---------------------------------

             Summary: apiVersion request doesn't send when DNS is not ready at first
                 Key: KAFKA-15702
                 URL: https://issues.apache.org/jira/browse/KAFKA-15702
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.6.0
            Reporter: Luke Chen


During a ZK to KRaft migration, the active controller [relies on 
ApiVersions|https://github.com/apache/kafka/blob/3055cd7c180cac15016169c52383ddc204ca5f16/metadata/src/main/java/org/apache/kafka/controller/QuorumFeatures.java#L140]
 to check whether all controllers have enabled the ZkMigration flag. But if DNS 
has not yet been updated with the latest entries (this happens frequently in k8s 
environments), the active controller never sends the ApiVersions request to the 
affected controller, even after DNS has been updated.

The impact is that the ZK to KRaft migration is blocked, because the active 
controller considers the follower not ready.

*The flow of the issue is like this:*
1. Start up 3 controllers, e.g. c1, c2, c3.
2. DNS does not yet have the host name entry for c3.
3. c1 becomes the leader and sends an ApiVersions request to c2.
4. c1 tries to connect to c3 and gets an unknownHost error.
5. DNS is updated with the c3 entry.
6. c1 successfully connects to c3, but no ApiVersions request is sent.
7. The KRaftMigrationDriver keeps waiting for c3 to become ready for ZK migration.
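
To make the flow above concrete, here is a toy model of the readiness check as seen from c1. It is not Kafka code: the class and method names are made up for illustration, and the only assumption carried over from the description is that a controller with no recorded ApiVersions response is treated as not ready.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model only; all names are hypothetical and do not exist in Kafka. */
public class MigrationReadinessSketch {
    // Controller id -> whether its ApiVersions response advertised the ZkMigration flag.
    private final Map<String, Boolean> apiVersionsSeen = new HashMap<>();

    /** Called only when an ApiVersions response actually arrives from a peer. */
    void onApiVersionsResponse(String controllerId, boolean zkMigrationEnabled) {
        apiVersionsSeen.put(controllerId, zkMigrationEnabled);
    }

    /** Returns the first peer that blocks the migration, or empty if all peers are ready. */
    Optional<String> firstNotReady(List<String> peers) {
        for (String peer : peers) {
            // A peer with no recorded response is treated the same as one without the flag.
            if (!apiVersionsSeen.getOrDefault(peer, false)) {
                return Optional.of(peer);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        MigrationReadinessSketch c1 = new MigrationReadinessSketch();
        List<String> peers = Arrays.asList("c2", "c3");

        // Step 3: c2 answers the ApiVersions request.
        c1.onApiVersionsResponse("c2", true);

        // Steps 4-6: c3 eventually connects, but no request is sent,
        // so no response is ever recorded for it.
        System.out.println("blocked by: " + c1.firstNotReady(peers));
    }
}
```

In this model the migration only unblocks once a response for c3 is recorded, which matches step 7: nothing ever triggers that response, so the driver waits indefinitely.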


I had a look, and it looks like the channel cannot successfully complete finishConnect 
[here|https://github.com/apache/kafka/blob/3.6/clients/src/main/java/org/apache/kafka/common/network/Selector.java#L527],
 so the channel is never considered connected and the ApiVersions request is 
never initiated.
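
For readers less familiar with non-blocking connects, the sketch below shows the general java.nio pattern: a channel only counts as connected once finishConnect() returns true, and any follow-up work is gated on that. This is plain java.nio against a loopback listener, not Kafka's Selector; the class, the method names, and the callback (which merely stands in for "send the ApiVersions request") are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain java.nio sketch; names are hypothetical and this is not Kafka's Selector. */
public class FinishConnectSketch {

    /** Runs onConnected only after finishConnect() has reported success. */
    static boolean connectThenNotify(InetSocketAddress target, long timeoutMs, Runnable onConnected) {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            // In non-blocking mode connect() usually returns false: the connect is pending.
            boolean connected = channel.connect(target);
            if (!connected) {
                channel.register(selector, SelectionKey.OP_CONNECT);
            }
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!connected) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // never connected, so the callback never fires
                }
                selector.select(remaining);
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isConnectable()) {
                        // The channel is only usable once this returns true.
                        connected = channel.finishConnect();
                    }
                }
                selector.selectedKeys().clear();
            }
            onConnected.run(); // stand-in for the post-connect request
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Connects to a throwaway loopback listener and reports whether the callback fired. */
    static boolean demoOnLoopback() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress target = (InetSocketAddress) server.getLocalAddress();
            AtomicBoolean requestSent = new AtomicBoolean(false);
            boolean ok = connectThenNotify(target, 2000, () -> requestSent.set(true));
            return ok && requestSent.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("callback fired: " + demoOnLoopback());
    }
}
```

If finishConnect() never reports success for a channel, code structured this way never reaches the callback, which is consistent with the symptom described above.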




