[ https://issues.apache.org/jira/browse/KAFKA-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845658#comment-17845658 ]
sanghyeok An commented on KAFKA-15648: -------------------------------------- Hey, [~davidarthur] ! How did you test it locally? I execute test code repeatly, bit that error not occurs. this is command lists which i executed. # gradle --parallel test # gradle --parallel metadata:test # gradle metadata:test > QuorumControllerTest#testBootstrapZkMigrationRecord is flaky > ------------------------------------------------------------ > > Key: KAFKA-15648 > URL: https://issues.apache.org/jira/browse/KAFKA-15648 > Project: Kafka > Issue Type: Bug > Components: controller, unit tests > Reporter: David Arthur > Priority: Minor > Labels: flaky-test, good-first-issue > > Noticed that this test failed on Jenkins with > {code} > org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: > exception while completing controller activation: Should not have ZK > migrations enabled on a cluster running metadata.version 3.0-IV1 > at > app//org.apache.kafka.controller.ActivationRecordsGenerator.recordsForNonEmptyLog(ActivationRecordsGenerator.java:154) > at > app//org.apache.kafka.controller.ActivationRecordsGenerator.generate(ActivationRecordsGenerator.java:229) > at > app//org.apache.kafka.controller.QuorumController$CompleteActivationEvent.generateRecordsAndResult(QuorumController.java:1237) > at > app//org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:784) > at > app//org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127) > at > app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) > at > app//org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) > at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.RuntimeException: Should not have ZK migrations enabled > on a cluster running metadata.version 3.0-IV1 > ... 8 more > {code} > I think this exception is a red herring (it's expected from one of the > negative test cases). When trying to reproduce this issue locally, I do > occasionally see the following type of failure > {code} > [2023-10-19 13:42:10,091] INFO Elected new leader: > LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1). > (org.apache.kafka.metalog.LocalLogManager$SharedLogData:300) > [2023-10-19 13:42:10,091] DEBUG > append(batch=LeaderChangeBatch(newLeader=LeaderAndEpoch(leaderId=OptionalInt[0], > epoch=1)), nextEndOffset=0) > (org.apache.kafka.metalog.LocalLogManager$SharedLogData:276) > [2023-10-19 13:42:10,091] DEBUG [LocalLogManager 0] Node 0: running log > check. (org.apache.kafka.metalog.LocalLogManager:536) > [2023-10-19 13:42:10,091] DEBUG [LocalLogManager 0] initialized local log > manager for node 0 (org.apache.kafka.metalog.LocalLogManager:685) > [2023-10-19 13:42:10,091] DEBUG [QuorumController id=0] Creating in-memory > snapshot -1 (org.apache.kafka.timeline.SnapshotRegistry:203) > [2023-10-19 13:42:10,091] INFO [QuorumController id=0] Creating new > QuorumController with clusterId 6xRUXZ_kQ1GfuaHK42AS9Q. ZK migration mode is > enabled. (org.apache.kafka.controller.QuorumController:1912) > [2023-10-19 13:42:10,092] INFO [LocalLogManager 0] Node 0: registered > MetaLogListener 1082153924 (org.apache.kafka.metalog.LocalLogManager:703) > [2023-10-19 13:42:10,092] DEBUG [LocalLogManager 0] Node 0: running log > check. (org.apache.kafka.metalog.LocalLogManager:536) > [2023-10-19 13:42:10,092] DEBUG [LocalLogManager 0] Node 0: Executing > handleLeaderChange LeaderAndEpoch(leaderId=OptionalInt[0], epoch=1) > (org.apache.kafka.metalog.LocalLogManager:578) > [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Executing > handleLeaderChange[1]. (org.apache.kafka.controller.QuorumController:577) > [2023-10-19 13:42:10,092] INFO [QuorumController id=0] In the new epoch 1, > the leader is (none). (org.apache.kafka.controller.QuorumController:1179) > [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Processed > handleLeaderChange[1] in 50 us > (org.apache.kafka.controller.QuorumController:510) > [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Executing > handleLeaderChange[1]. (org.apache.kafka.controller.QuorumController:577) > [2023-10-19 13:42:10,092] INFO [QuorumController id=0] Becoming the active > controller at epoch 1, next write offset 1. > (org.apache.kafka.controller.QuorumController:1175) > [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Processed > handleLeaderChange[1] in 77 us > (org.apache.kafka.controller.QuorumController:510) > [2023-10-19 13:42:10,092] WARN [QuorumController id=0] Performing controller > activation. The metadata log appears to be empty. Appending 1 bootstrap > record(s) at metadata.version 3.4-IV0 from bootstrap source 'test'. Putting > the controller into pre-migration mode. No metadata updates will be allowed > until the ZK metadata has been migrated. > (org.apache.kafka.controller.QuorumController:108) > [2023-10-19 13:42:10,092] INFO [QuorumController id=0] Replayed a > FeatureLevelRecord setting metadata version to 3.4-IV0 > (org.apache.kafka.controller.FeatureControlManager:400) > [2023-10-19 13:42:10,092] INFO [QuorumController id=0] Replayed a > ZkMigrationStateRecord changing the migration state from NONE to > PRE_MIGRATION. (org.apache.kafka.controller.FeatureControlManager:421) > [2023-10-19 13:42:10,092] DEBUG [QuorumController id=0] Got exception while > running completeActivation[1](342832465). Invoking handleException. > (org.apache.kafka.queue.KafkaEventQueue:132) > org.apache.kafka.raft.errors.NotLeaderException: Append failed because the > given epoch 0 is stale. Current leader epoch = 1 > at > org.apache.kafka.metalog.LocalLogManager$SharedLogData.tryAppend(LocalLogManager.java:253) > at > org.apache.kafka.metalog.LocalLogManager$SharedLogData.tryAppend(LocalLogManager.java:233) > at > org.apache.kafka.metalog.LocalLogManager.scheduleAtomicAppend(LocalLogManager.java:795) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent$1.apply(QuorumController.java:839) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent$1.apply(QuorumController.java:816) > at > org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:918) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:815) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) > at java.base/java.lang.Thread.run(Thread.java:833) > [2023-10-19 13:42:10,092] INFO [QuorumController id=0] completeActivation[1]: > event failed with NotLeaderException (treated as NotControllerException) at > epoch 1 in 387 microseconds. Renouncing leadership and reverting to the last > committed offset -1. (org.apache.kafka.controller.QuorumController:542) > [2023-10-19 13:42:10,092] ERROR Encountered fatalFaultHandler fault: > exception while renouncing leadership > (org.apache.kafka.server.fault.MockFaultHandler:44) > java.lang.IllegalArgumentException: Attempt to resign from epoch 1 which is > larger than the current epoch 0 > at > org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808) > at > org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1268) > at > org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:545) > at > org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:177) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:879) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:869) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) > at java.base/java.lang.Thread.run(Thread.java:833) > [2023-10-19 13:42:10,093] DEBUG [QuorumController id=0] Processed read > migration state(176059074) in 4 us > (org.apache.kafka.controller.QuorumController:510) > [2023-10-19 13:42:10,093] INFO [QuorumController id=0] > QuorumController#beginShutdown: shutting down event queue. > (org.apache.kafka.queue.KafkaEventQueue:492) > [2023-10-19 13:42:10,093] DEBUG [QuorumController id=0] > KafkaEventQueue#close: Event queue is already shutting down. > (org.apache.kafka.queue.KafkaEventQueue:489) > [2023-10-19 13:42:10,093] INFO [QuorumController id=0] closed event queue. > (org.apache.kafka.queue.KafkaEventQueue:514) > [2023-10-19 13:42:10,093] INFO [LocalLogManager 0] beginShutdown: shutting > down event queue. (org.apache.kafka.queue.KafkaEventQueue:492) > [2023-10-19 13:42:10,093] DEBUG [LocalLogManager 0] Node 0: closing. > (org.apache.kafka.metalog.LocalLogManager:653) > [2023-10-19 13:42:10,094] DEBUG [LocalLogManager 0] beginShutdown: Event > queue is already shutting down. (org.apache.kafka.queue.KafkaEventQueue:489) > [2023-10-19 13:42:10,094] DEBUG [LocalLogManager 0] KafkaEventQueue#close: > Event queue is already shutting down. > (org.apache.kafka.queue.KafkaEventQueue:489) > [2023-10-19 13:42:10,094] DEBUG [LocalLogManager 0] Node 0: beginning > shutdown. (org.apache.kafka.metalog.LocalLogManager:637) > [2023-10-19 13:42:10,094] DEBUG > append(batch=LeaderChangeBatch(newLeader=LeaderAndEpoch(leaderId=OptionalInt.empty, > epoch=2)), nextEndOffset=1) > (org.apache.kafka.metalog.LocalLogManager$SharedLogData:276) > [2023-10-19 13:42:10,094] INFO Elected new leader: > LeaderAndEpoch(leaderId=OptionalInt[0], epoch=3). > (org.apache.kafka.metalog.LocalLogManager$SharedLogData:300) > [2023-10-19 13:42:10,094] DEBUG > append(batch=LeaderChangeBatch(newLeader=LeaderAndEpoch(leaderId=OptionalInt[0], > epoch=3)), nextEndOffset=2) > (org.apache.kafka.metalog.LocalLogManager$SharedLogData:276) > [2023-10-19 13:42:10,094] DEBUG [QuorumController id=0] > MetaLogManager.Listener: Event queue is already shutting down. > (org.apache.kafka.queue.KafkaEventQueue:489) > [2023-10-19 13:42:10,094] INFO [LocalLogManager 0] closed event queue. > (org.apache.kafka.queue.KafkaEventQueue:514) > org.apache.kafka.server.fault.FaultHandlerException: fatalFaultHandler: > exception while renouncing leadership: Attempt to resign from epoch 1 which > is larger than the current epoch 0 > at > org.apache.kafka.metalog.LocalLogManager.resign(LocalLogManager.java:808) > at > org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1268) > at > org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:545) > at > org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:177) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:879) > at > org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:869) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148) > at > org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) > at > org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) > at java.base/java.lang.Thread.run(Thread.java:833) > Caused by: java.lang.IllegalArgumentException: Attempt to resign from epoch 1 > which is larger than the current epoch 0 > ... 11 more > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)