David Arthur created KAFKA-13050:
------------------------------------

             Summary: Race between controller creating snapshot and snapshot cleaning
                 Key: KAFKA-13050
                 URL: https://issues.apache.org/jira/browse/KAFKA-13050
             Project: Kafka
          Issue Type: Bug
          Components: controller, kraft
    Affects Versions: 3.0.0
            Reporter: David Arthur


If the controller attempts to take a snapshot with its cached OffsetAndEpoch 
while snapshot cleaning is in progress, the cleaner can truncate the log past 
that position. The cached OffsetAndEpoch is then no longer valid, and snapshot 
creation fails with an IllegalArgumentException.

{code}
[2021-07-08 12:12:41,938] WARN [Controller 1] org.apache.kafka.controller.QuorumController@67e0d836: failed with unknown server exception IllegalArgumentException at epoch -1 in 3207460 us.  Reverting to last committed offset 98. (org.apache.kafka.controller.QuorumController)
java.lang.IllegalArgumentException: Snapshot id (OffsetAndEpoch(offset=99, epoch=5)) is not valid according to the log: ValidOffsetAndEpoch(kind=SNAPSHOT, offsetAndEpoch=OffsetAndEpoch(offset=180, epoch=8))
        at kafka.raft.KafkaMetadataLog.createNewSnapshot(KafkaMetadataLog.scala:252)
        at org.apache.kafka.raft.KafkaRaftClient.lambda$createSnapshot$30(KafkaRaftClient.java:2334)
        at org.apache.kafka.snapshot.SnapshotWriter.createWithHeader(SnapshotWriter.java:134)
        at org.apache.kafka.raft.KafkaRaftClient.createSnapshot(KafkaRaftClient.java:2333)
        at org.apache.kafka.controller.QuorumController$SnapshotGeneratorManager.createSnapshotGenerator(QuorumController.java:351)
        at org.apache.kafka.controller.QuorumController.checkSnapshotGeneration(QuorumController.java:904)
        at org.apache.kafka.controller.QuorumController.access$3000(QuorumController.java:121)
        at org.apache.kafka.controller.QuorumController$QuorumMetaLogListener.lambda$handleCommit$0(QuorumController.java:681)
        at org.apache.kafka.controller.QuorumController$ControlEvent.run(QuorumController.java:311)
        at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:121)
        at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:200)
        at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:173)
        at java.lang.Thread.run(Thread.java:748)
[2021-07-08 12:12:41,941] INFO [BrokerMetadataListener id=1] Loading snapshot 180-8. (kafka.server.metadata.BrokerMetadataListener)
{code}
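
For illustration only, the ordering that leads to this exception can be 
modelled with a toy log. This is a rough sketch, not Kafka's code: 
ToyMetadataLog, clean() and staleIdRejected() are hypothetical names, the 
OffsetAndEpoch class here is a simplified stand-in for Kafka's, and the 
validity check is reduced to a plain offset comparison. The real validation 
in KafkaMetadataLog is more involved. The offsets and epochs are the ones 
from the log above.

```java
// Toy model of the race. None of these types are Kafka's real classes.
final class SnapshotRaceSketch {

    /** Simplified stand-in for Kafka's OffsetAndEpoch (assumption, not the real class). */
    static final class OffsetAndEpoch {
        final long offset;
        final int epoch;

        OffsetAndEpoch(long offset, int epoch) {
            this.offset = offset;
            this.epoch = epoch;
        }

        @Override
        public String toString() {
            return "OffsetAndEpoch(offset=" + offset + ", epoch=" + epoch + ")";
        }
    }

    /** Hypothetical log that tracks only the oldest position retained after cleaning. */
    static final class ToyMetadataLog {
        private OffsetAndEpoch earliestRetained = new OffsetAndEpoch(0, 0);

        /** Models cleaning: everything before newStart is deleted. */
        synchronized void clean(OffsetAndEpoch newStart) {
            earliestRetained = newStart;
        }

        /** Rejects snapshot ids that point before the retained part of the log. */
        synchronized void createNewSnapshot(OffsetAndEpoch id) {
            if (id.offset < earliestRetained.offset) {
                throw new IllegalArgumentException(
                    "Snapshot id (" + id + ") is not valid according to the log: "
                        + earliestRetained);
            }
            // A real implementation would open a snapshot writer here.
        }
    }

    /** Replays the problematic ordering; returns true if the stale id was rejected. */
    static boolean staleIdRejected() {
        ToyMetadataLog log = new ToyMetadataLog();
        OffsetAndEpoch cached = new OffsetAndEpoch(99, 5); // id cached by the controller
        log.clean(new OffsetAndEpoch(180, 8));             // cleaning runs in between
        try {
            log.createNewSnapshot(cached);                 // controller uses the stale id
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale id rejected: " + staleIdRejected());
    }
}
```

In this model the id is only safe to use if nothing cleans the log between 
caching it and calling createNewSnapshot(), which is the window the report 
describes.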

This was observed while running a broker in combined mode with artificially low 
values for snapshot generation and cleaning.

{code}
metadata.log.max.record.bytes.between.snapshots=100
metadata.log.segment.bytes=1024
metadata.max.retention.bytes=4096
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
