Luke Chen created KAFKA-15702:
---------------------------------

             Summary: apiVersion request doesn't send when DNS is not ready at first
                 Key: KAFKA-15702
                 URL: https://issues.apache.org/jira/browse/KAFKA-15702
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.6.0
            Reporter: Luke Chen
When migrating from ZK to KRaft, we [rely on apiVersions|https://github.com/apache/kafka/blob/3055cd7c180cac15016169c52383ddc204ca5f16/metadata/src/main/java/org/apache/kafka/controller/QuorumFeatures.java#L140] to check whether all controllers have enabled the ZkMigration flag. But if DNS has not yet been updated with the latest info (which frequently happens in k8s environments), the active controller won't send out the apiVersion request. The impact is that the ZK to KRaft migration is blocked, because the active controller considers the follower not ready.

*The flow of the issue is like this:*
1. Start up 3 controllers, e.g. c1, c2, c3.
2. DNS does not yet have the host name entry for c3.
3. c1 becomes the leader and sends an apiVersion request to c2.
4. c1 tries to connect to c3 and gets an unknownHost error.
5. DNS is updated with the c3 entry.
6. c1 successfully connects to c3, but no apiVersion request is sent.
7. The KRaftMigrationDriver keeps waiting for c3 to become ready for ZK migration.

I had a look, and it appears the channel cannot successfully finishConnect [here|https://github.com/apache/kafka/blob/3.6/clients/src/main/java/org/apache/kafka/common/network/Selector.java#L527], so the channel is never considered connected and no apiVersion request is initiated.
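
To make steps 4 to 6 easier to reason about, here is a minimal sketch of the underlying java.nio behaviour, not Kafka's actual Selector or NetworkClient code. It assumes a local listener standing in for c3. The class ConnectSketch, its methods, and the host name c3.invalid are all hypothetical names chosen for illustration. It shows that connecting to an unresolved address fails up front, and that a later attempt with a freshly resolved address should only be treated as connected once finishConnect() returns true.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical illustration only; not Kafka's Selector or NetworkClient code.
final class ConnectSketch {

    // Returns true only once the non-blocking connect has actually completed.
    static boolean connectNonBlocking(InetSocketAddress address, long timeoutMs) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            // Throws UnresolvedAddressException if the host name was never resolved.
            if (channel.connect(address)) {
                return true; // connected immediately (possible on loopback)
            }
            channel.register(selector, SelectionKey.OP_CONNECT);
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (System.currentTimeMillis() < deadline) {
                if (selector.select(100) == 0) {
                    continue; // nothing ready yet, keep polling until the deadline
                }
                selector.selectedKeys().clear();
                // Only a true result here means the channel is usable; this is the
                // point at which a client would send its first request.
                if (channel.finishConnect()) {
                    return true;
                }
            }
            return false;
        }
    }

    // Maps one connection attempt to a coarse, made-up state label.
    static String attempt(InetSocketAddress address) {
        try {
            return connectNonBlocking(address, 2000) ? "CONNECTED" : "TIMED_OUT";
        } catch (UnresolvedAddressException e) {
            return "UNKNOWN_HOST";
        } catch (IOException e) {
            return "FAILED";
        }
    }

    // Simulates "DNS not ready" followed by "DNS updated" against a local listener.
    static String[] demo() {
        try (ServerSocketChannel c3 = ServerSocketChannel.open()) {
            c3.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int port = ((InetSocketAddress) c3.getLocalAddress()).getPort();
            // Step 4: the address for c3 is still unresolved.
            String before = attempt(InetSocketAddress.createUnresolved("c3.invalid", port));
            // Steps 5 and 6: a new, resolved address object is used for the retry.
            String after = attempt(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return new String[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println("before DNS update: " + results[0]);
        System.out.println("after DNS update:  " + results[1]);
    }
}
```

In this sketch the retry builds a new InetSocketAddress, because an unresolved address object does not become resolved on its own. Whether the reported bug comes from address re-resolution or from the finishConnect path itself is not established here; the sketch only illustrates the two conditions a client has to get through before it can send a first request such as apiVersions.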