Gantigmaa Selenge created KAFKA-16620: -----------------------------------------
Summary: Kraft quorum cannot be formed if all controllers are restarted at the same time Key: KAFKA-16620 URL: https://issues.apache.org/jira/browse/KAFKA-16620 Project: Kafka Issue Type: Bug Reporter: Gantigmaa Selenge Controller quorum cannot seem to form at all after accidentally restarting all controller nodes at the same time in a test environment. This is reproducible, happens almost everytime when restarting all controller nodes of the cluster. Started a cluster with 3 controller nodes and 3 broker nodes. After restarting the controller nodes, one of them becomes the active controller but resigns due to fetch timeout. The quorum leadership bounces off like this between the nodes indefinitely. The controller.quorum.fetch.timeout.ms was set to the default of 2 seconds. Logs from an active controller: ``` 2024-04-17 14:00:48,250 INFO [QuorumController id=0] Becoming the active controller at epoch 34, next write offset 1116. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:00:48,250 WARN [QuorumController id=0] Performing controller activation. Loaded ZK migration state of NONE. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:00:48,701 INFO [RaftManager id=0] Node 1 disconnected. (org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread] 2024-04-17 14:00:48,701 WARN [RaftManager id=0] Connection to node 1 (my-cluster-controller-1.my-cluster-kafka-brokers.roller.svc.cluster.local/10.244.0.68:9090) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient) [kafka-0-raft-outbound-request-thread] 2024-04-17 14:00:48,776 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1117 (exclusive)with recovery point 1117, last flushed: 1713362448239, current time: 1713362448776,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread] 2024-04-17 14:00:49,277 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1118 (exclusive)with recovery point 1118, last flushed: 1713362448777, current time: ... 2024-04-17 14:01:35,934 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log0] Flushing log up to offset 1200 (exclusive)with recovery point 1200, last flushed: 1713362489371, current time: 1713362495934,unflushed: 1 (kafka.log.UnifiedLog) [kafka-0-raft-io-thread] 2024-04-17 14:01:36,121 INFO [RaftManager id=0] Did not receive fetch request from the majority of the voters within 3000ms. Current fetched voters are []. (org.apache.kafka.raft.LeaderState) [kafka-0-raft-io-thread] 2024-04-17 14:01:36,223 WARN [QuorumController id=0] Renouncing the leadership due to a metadata log event. We were the leader at epoch 34, but in the new epoch 35, the leader is (none). Reverting to last stable offset 1198. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] 2024-04-17 14:01:36,223 INFO [QuorumController id=0] failAll(NotControllerException): failing writeNoOpRecord(152156824). (org.apache.kafka.deferred.DeferredEventQueue) [quorum-controller-0-event-handler] 2024-04-17 14:01:36,223 INFO [QuorumController id=0] writeNoOpRecord: event failed with NotControllerException in 6291037 microseconds. (org.apache.kafka.controller.QuorumController) [quorum-controller-0-event-handler] ``` Logs from the follower: ``` 024-04-17 14:00:48,242 INFO [RaftManager id=2] Completed transition to FollowerState(fetchTimeoutMs=2000, epoch=34, leaderId=0, voters=[0, 1, 2], highWatermark=Optional[LogOffsetMetadata(offset=1113, metadata=Optional.empty)], fetchingSnapshot=Optional.empty) from Voted(epoch=34, votedId=0, voters=[0, 1, 2], electionTimeoutMs=1794) (org.apache.kafka.raft.QuorumState) [kafka-2-raft-io-thread] 2024-04-17 14:00:48,242 INFO [QuorumController id=2] In the new epoch 34, the leader is 0. (org.apache.kafka.controller.QuorumController) [quorum-controller-2-event-handler] 2024-04-17 14:00:48,247 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1116 (exclusive)with recovery point 1116, last flushed: 1713362442238, current time: 1713362448247,unflushed: 2 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:00:48,777 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1117 (exclusive)with recovery point 1117, last flushed: 1713362448249, current time: 1713362448777,unflushed: 1 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:00:49,278 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1118 (exclusive)with recovery point 1118, last flushed: 1713362448811, current time: 1713362449278,unflushed ... 2024-04-17 14:01:29,371 DEBUG [UnifiedLog partition=__cluster_metadata-0, dir=/var/lib/kafka/data/kafka-log2] Flushing log up to offset 1199 (exclusive)with recovery point 1199, last flushed: 1713362488919, current time: 1713362489371,unflushed: 1 (kafka.log.UnifiedLog) [kafka-2-raft-io-thread] 2024-04-17 14:01:35,718 INFO [RaftManager id=2] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient) [kafka-2-raft-io-thread] ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)