[ 
https://issues.apache.org/jira/browse/KAFKA-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash reassigned KAFKA-16890:
--------------------------------------------

    Assignee: Kamal Chandraprakash

> Failing to build aux state on broker failover
> ---------------------------------------------
>
>                 Key: KAFKA-16890
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16890
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.7.0, 3.7.1
>            Reporter: Francois Visconte
>            Assignee: Kamal Chandraprakash
>            Priority: Major
>
> We have clusters where we replace machines often, and they keep falling into a 
> state where the follower repeatedly logs "Error building remote log auxiliary 
> state for loadtest_topic-22" and the partition stays under-replicated until the 
> leader is manually restarted. 
> Looking into a specific case, here is what we observed in the 
> __remote_log_metadata topic:
> {code:java}
> partition: 29, offset: 183593, value: 
> RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=GZeRTXLMSNe2BQjRXkg6hQ}, startOffset=10823, endOffset=11536, 
> brokerId=10013, maxTimestampMs=1715774588597, eventTimestampMs=1715781657604, 
> segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, 
> 133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535}, 
> segmentSizeInBytes=704895, customMetadata=Optional.empty, 
> state=COPY_SEGMENT_STARTED}
> partition: 29, offset: 183594, value: 
> RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
> state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715781658183, brokerId=10013}
> partition: 29, offset: 183669, value: 
> RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=L1TYzx0lQkagRIF86Kp0QQ}, startOffset=10823, endOffset=11544, 
> brokerId=10008, maxTimestampMs=1715781445270, eventTimestampMs=1715782717593, 
> segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, 
> 133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535, 140=11537, 
> 142=11543}, segmentSizeInBytes=713088, customMetadata=Optional.empty, 
> state=COPY_SEGMENT_STARTED}
> partition: 29, offset: 183670, value: 
> RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
> state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715782718370, brokerId=10008}
> partition: 29, offset: 186215, value: 
> RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
> state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874617, brokerId=10008}
> partition: 29, offset: 186216, value: 
> RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
> state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874725, brokerId=10008}
> partition: 29, offset: 186217, value: 
> RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
> state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874729, brokerId=10008}
> partition: 29, offset: 186218, value: 
> RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
>  id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
> state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874817, brokerId=10008}
> {code}
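>
> For context, a dump like the one above can be produced with a plain consumer on 
> __remote_log_metadata, decoding each value with the RemoteLogMetadataSerde from 
> the kafka-storage module. The sketch below is illustrative only (the bootstrap 
> address is a placeholder and the serde's deserialize(byte[]) method is assumed 
> from that module), not necessarily the exact tool used here:
> {code:java}
> // Illustrative sketch: consume __remote_log_metadata and print each record.
> // Assumes kafka-clients and the kafka-storage module (which provides
> // RemoteLogMetadataSerde) are on the classpath; method names may differ per version.
> import java.time.Duration;
> import java.util.List;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.clients.consumer.ConsumerRecord;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.serialization.ByteArrayDeserializer;
> import org.apache.kafka.server.log.remote.metadata.storage.serialization.RemoteLogMetadataSerde;
>
> public class RemoteLogMetadataDump {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
>         props.put(ConsumerConfig.GROUP_ID_CONFIG, "rlmm-dump");
>         props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>         props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
>         props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
>         RemoteLogMetadataSerde serde = new RemoteLogMetadataSerde();
>         try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>             consumer.subscribe(List.of("__remote_log_metadata"));
>             while (true) {
>                 for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
>                     // Values are serialized RemoteLogSegmentMetadata / *MetadataUpdate events.
>                     System.out.printf("partition: %d, offset: %d, value: %s%n",
>                             record.partition(), record.offset(), serde.deserialize(record.value()));
>                 }
>             }
>         }
>     }
> }
> {code}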
>  
> It seems that around the time the old leader (10013) was restarted, a second 
> copy of the same segment was tiered by the new leader (10008). Interestingly, 
> the two copies do not have the same end offset (11536 vs. 11544), which is 
> concerning. 
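>
> To make the overlap concrete, here is a tiny, purely illustrative check over 
> the two COPY_SEGMENT_STARTED records above (values copied from the dump):
> {code:java}
> // Illustrative only: both uploads cover offsets from 10823 onwards, but the copy
> // from the new leader (10008) extends past the old leader's copy by 8 offsets.
> public class SegmentOverlapCheck {
>     public static void main(String[] args) {
>         long firstStart = 10823L, firstEnd = 11536L;   // id=GZeRTXLMSNe2BQjRXkg6hQ, brokerId=10013
>         long secondStart = 10823L, secondEnd = 11544L; // id=L1TYzx0lQkagRIF86Kp0QQ, brokerId=10008
>         boolean overlap = secondStart <= firstEnd && firstStart <= secondEnd;
>         System.out.printf("same start: %b, overlapping: %b, extra offsets in second copy: %d%n",
>                 firstStart == secondStart, overlap, secondEnd - firstEnd);
>         // prints: same start: true, overlapping: true, extra offsets in second copy: 8
>     }
> }
> {code}
>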
> Then the follower sees the following error repeatedly until the leader is 
> restarted: 
>  
> {code:java}
> [2024-05-17 20:46:42,133] DEBUG [ReplicaFetcher replicaId=10013, 
> leaderId=10008, fetcherId=0] Handling errors in processFetchRequest for 
> partitions HashSet(loadtest_topic-22) (kafka.server.ReplicaFetcherThread)
> [2024-05-17 20:46:43,174] DEBUG [ReplicaFetcher replicaId=10013, 
> leaderId=10008, fetcherId=0] Received error OFFSET_MOVED_TO_TIERED_STORAGE, 
> at fetch offset: 11537, topic-partition: loadtest_topic-22 
> (kafka.server.ReplicaFetcherThread)
> [2024-05-17 20:46:43,175] ERROR [ReplicaFetcher replicaId=10013, 
> leaderId=10008, fetcherId=0] Error building remote log auxiliary state for 
> loadtest_topic-22 (kafka.server.ReplicaFetcherThread)
> org.apache.kafka.server.log.remote.storage.RemoteStorageException: Couldn't 
> build the state from remote store for partition: loadtest_topic-22, 
> currentLeaderEpoch: 153, leaderLocalLogStartOffset: 11545, 
> leaderLogStartOffset: 11537, epoch: 142as the previous remote log segment 
> metadata was not found
> {code}
> The follower tries to fetch from 11537 and gets OFFSET_MOVED_TO_TIERED_STORAGE. 
> When it retries, it still thinks it needs to fetch from 11537. There is no 
> longer any data in S3, so the correct leaderLogStartOffset should be 11545. I'm 
> not sure yet whether it's intentional that two differing copies of the same 
> segment can be uploaded to S3, or whether the segments were simply deleted in 
> the wrong order, but that is what ultimately caused the leaderLogStartOffset to 
> be set incorrectly.
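>
> A simplified, self-contained model (not Kafka's actual code) of the lookup that 
> keeps failing: rebuilding the aux state needs a remote segment whose 
> leader-epoch range covers the offset just below leaderLocalLogStartOffset, and 
> after both copies were deleted from the metadata topic no such segment exists, 
> so the fetcher never gets past 11537. The offsets and epoch below come from the 
> error above; the types and helper are hypothetical:
> {code:java}
> // Hypothetical model of the failing lookup, using the values from the error log.
> import java.util.List;
> import java.util.Map;
> import java.util.Optional;
>
> public class AuxStateLookupModel {
>
>     // A remote segment as described in __remote_log_metadata: its offset range
>     // plus the leader epochs it covers (epoch -> first offset in that epoch).
>     record RemoteSegment(long startOffset, long endOffset, Map<Integer, Long> leaderEpochs) {}
>
>     // Find a remote segment that contains the given offset under the given epoch.
>     static Optional<RemoteSegment> segmentFor(List<RemoteSegment> segments, int epoch, long offset) {
>         return segments.stream()
>                 .filter(s -> s.leaderEpochs().containsKey(epoch)
>                         && offset >= s.leaderEpochs().get(epoch)
>                         && offset <= s.endOffset())
>                 .findFirst();
>     }
>
>     public static void main(String[] args) {
>         long leaderLogStartOffset = 11537L;       // advertised by the leader
>         long leaderLocalLogStartOffset = 11545L;  // first offset still on the leader's disk
>         int epoch = 142;                          // epoch of offset 11544 in the dump above
>
>         // After offsets 186215-186218 both uploaded copies are DELETE_SEGMENT_FINISHED,
>         // so the follower finds no remote segment metadata at all for this partition.
>         List<RemoteSegment> remoteSegments = List.of();
>
>         Optional<RemoteSegment> previous = segmentFor(remoteSegments, epoch, leaderLocalLogStartOffset - 1);
>         if (previous.isEmpty()) {
>             // This is the state the ReplicaFetcher keeps hitting: it retries from 11537
>             // forever, while (per the report) leaderLogStartOffset should have been 11545.
>             System.out.printf("no remote segment covers epoch %d / offset %d; "
>                     + "fetch offset %d can never be served%n",
>                     epoch, leaderLocalLogStartOffset - 1, leaderLogStartOffset);
>         }
>     }
> }
> {code}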



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
