[ 
https://issues.apache.org/jira/browse/KAFKA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14704.
---------------------------------
    Fix Version/s: 3.5.0
                   3.4.1
                   3.3.3
         Reviewer: Jason Gustafson
       Resolution: Fixed

> Follower should truncate before incrementing high watermark
> -----------------------------------------------------------
>
>                 Key: KAFKA-14704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14704
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 3.5.0, 3.4.1, 3.3.3
>
>
> When a leader becomes a follower, it is likely that it has uncommitted
> records in its log. When it reaches out to the new leader, the leader detects
> that their logs have diverged and returns the diverging epoch and offset.
> The follower truncates its log based on this.
> There is a small caveat in this process. When the leader returns the diverging
> epoch and offset, it also includes its high watermark, low watermark, start
> offset and end offset. The current code in the `AbstractFetcherThread` works
> as follows. First it processes the partition data and then it checks whether
> there is a diverging epoch/offset. The first step may accidentally expose
> uncommitted records because it updates the local high watermark to whatever is
> received from the leader. Because the follower (the former leader) may still
> have uncommitted records, it is able to update the high watermark to a
> larger offset if the leader has a higher watermark than the current local
> one. This results in exposing uncommitted records until the log is finally
> truncated. The time window is short, but a fetch request arriving at the
> follower at the right time could read those records. This is especially true
> for clients that use recent versions of the fetch request without
> implementing KIP-320.
> When this happens, the follower logs the following message: `Non-monotonic 
> update of high watermark from (offset=21437 segment=[20998:98390]) to 
> (offset=21434 segment=[20998:97843])`.
> This patch proposes to mitigate the issue by first checking whether
> a diverging epoch/offset is provided by the leader and skipping the processing
> of the partition data if it is. This means that the first fetch request
> results in truncating the log and a subsequent fetch request updates
> the log/high watermarks.
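
The ordering problem and the proposed mitigation can be illustrated with a small, self-contained sketch. This is not the code in `AbstractFetcherThread` (which is written in Scala); every class, field and method name below is hypothetical, and the log is reduced to two numbers. The leader high watermark (21437) and the diverging offset (21434) are taken from the log message quoted above, while the follower's starting log end offset and high watermark are assumed for illustration.

```java
import java.util.OptionalLong;

// Hypothetical, simplified model; none of these names exist in Kafka.
final class FetcherOrderingSketch {

    /** Stand-in for the follower's local log: only the two offsets matter here. */
    static final class FollowerLog {
        long logEndOffset;
        long highWatermark;

        FollowerLog(long logEndOffset, long highWatermark) {
            this.logEndOffset = logEndOffset;
            this.highWatermark = highWatermark;
        }

        // Assumption: the follower bounds the leader's high watermark by its own log end offset.
        void updateHighWatermark(long leaderHighWatermark) {
            highWatermark = Math.min(leaderHighWatermark, logEndOffset);
        }

        // Truncation drops records at and after `offset` and pulls the high watermark back if needed.
        void truncateTo(long offset) {
            logEndOffset = Math.min(logEndOffset, offset);
            highWatermark = Math.min(highWatermark, logEndOffset);
        }
    }

    /** Stand-in for the parts of a fetch response that matter for this issue. */
    static final class FetchResponse {
        final long leaderHighWatermark;
        final OptionalLong divergingEndOffset;

        FetchResponse(long leaderHighWatermark, OptionalLong divergingEndOffset) {
            this.leaderHighWatermark = leaderHighWatermark;
            this.divergingEndOffset = divergingEndOffset;
        }
    }

    /** Ordering described as the current behaviour: partition data first, divergence check second. */
    static long processBeforeFix(FollowerLog log, FetchResponse response) {
        log.updateHighWatermark(response.leaderHighWatermark);
        long exposedHighWatermark = log.highWatermark; // visible to readers until truncation happens
        if (response.divergingEndOffset.isPresent()) {
            log.truncateTo(response.divergingEndOffset.getAsLong());
        }
        return exposedHighWatermark;
    }

    /** Proposed ordering: if the leader reports divergence, truncate and skip the partition data. */
    static long processAfterFix(FollowerLog log, FetchResponse response) {
        if (response.divergingEndOffset.isPresent()) {
            log.truncateTo(response.divergingEndOffset.getAsLong());
            return log.highWatermark; // high watermark is not advanced on this fetch
        }
        log.updateHighWatermark(response.leaderHighWatermark);
        return log.highWatermark;
    }

    public static void main(String[] args) {
        FetchResponse diverged = new FetchResponse(21437L, OptionalLong.of(21434L));

        FollowerLog before = new FollowerLog(21440L, 21430L);
        System.out.println("exposed before fix: " + processBeforeFix(before, diverged));

        FollowerLog after = new FollowerLog(21440L, 21430L);
        System.out.println("exposed after fix:  " + processAfterFix(after, diverged));
    }
}
```

With the first ordering the high watermark briefly moves past the diverging offset and is then pulled back by the truncation, which corresponds to the non-monotonic update reported in the log message. With the second ordering the high watermark only advances on a later fetch, once the log has been truncated.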



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
