[ https://issues.apache.org/jira/browse/KAFKA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot resolved KAFKA-14704.
---------------------------------
    Fix Version/s: 3.5.0
                   3.4.1
                   3.3.3
         Reviewer: Jason Gustafson
       Resolution: Fixed

> Follower should truncate before incrementing high watermark
> -----------------------------------------------------------
>
>                 Key: KAFKA-14704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14704
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 3.5.0, 3.4.1, 3.3.3
>
>
> When a leader becomes a follower, it is likely that it has uncommitted
> records in its log. When it reaches out to the new leader, the leader will
> detect that they have diverged and it will return the diverging epoch and
> offset. The follower truncates its log based on this.
>
> There is a small caveat in this process. When the leader returns the
> diverging epoch and offset, it also includes its high watermark, low
> watermark, start offset and end offset. The current code in the
> `AbstractFetcherThread` works as follows. First it processes the partition
> data and then it checks whether there is a diverging epoch/offset. The first
> step may accidentally expose uncommitted records because it updates the
> local high watermark to whatever is received from the leader. As the
> follower, i.e. the former leader, may have uncommitted records, it will be
> able to update the high watermark to a larger offset if the leader has a
> higher watermark than the current local one. This results in exposing
> uncommitted records until the log is finally truncated. The time window is
> short, but a fetch request reaching the follower at the right time could
> read those records. This is especially true for clients which use recent
> versions of the fetch request without implementing KIP-320.
>
> When this happens, the follower logs the following message: `Non-monotonic
> update of high watermark from (offset=21437 segment=[20998:98390]) to
> (offset=21434 segment=[20998:97843])`.
>
> This patch proposes to mitigate the issue by first checking whether a
> diverging epoch/offset is provided by the leader and skipping the
> processing of the partition data if it is. This basically means that the
> first fetch request will result in truncating the log and a subsequent
> fetch request will update the log/high watermarks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
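
The ordering problem described in the issue can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the code in `AbstractFetcherThread`: the class `TruncateFirstSketch`, the types `FollowerLog` and `FetchResponse`, and both handler methods are hypothetical names invented for illustration. The offsets 21437 and 21434 are loosely borrowed from the log message in the description; the starting log end offset (21440) and local high watermark (21430) are made up.

```java
import java.util.OptionalLong;

// Illustrative sketch only: none of these types exist in Kafka.
final class TruncateFirstSketch {

    /** Hypothetical stand-in for the follower's local log state. */
    static final class FollowerLog {
        long logEndOffset;
        long highWatermark;
        long peakHighWatermark; // highest watermark ever exposed to readers

        FollowerLog(long logEndOffset, long highWatermark) {
            this.logEndOffset = logEndOffset;
            this.highWatermark = highWatermark;
            this.peakHighWatermark = highWatermark;
        }

        // The follower cannot expose more than it holds locally.
        void updateHighWatermark(long leaderHighWatermark) {
            highWatermark = Math.min(leaderHighWatermark, logEndOffset);
            peakHighWatermark = Math.max(peakHighWatermark, highWatermark);
        }

        // Dropping diverging records also pulls the watermark back if needed.
        void truncateTo(long offset) {
            logEndOffset = Math.min(logEndOffset, offset);
            highWatermark = Math.min(highWatermark, logEndOffset);
        }
    }

    /** Hypothetical, reduced view of one partition in a fetch response. */
    static final class FetchResponse {
        final long leaderHighWatermark;
        final OptionalLong divergingEndOffset;

        FetchResponse(long leaderHighWatermark, OptionalLong divergingEndOffset) {
            this.leaderHighWatermark = leaderHighWatermark;
            this.divergingEndOffset = divergingEndOffset;
        }
    }

    // Order described as the current behaviour: partition data first, divergence second.
    static void handleProcessFirst(FollowerLog log, FetchResponse response) {
        log.updateHighWatermark(response.leaderHighWatermark);
        if (response.divergingEndOffset.isPresent()) {
            log.truncateTo(response.divergingEndOffset.getAsLong());
        }
    }

    // Order proposed by the patch: truncate and skip the partition data.
    static void handleTruncateFirst(FollowerLog log, FetchResponse response) {
        if (response.divergingEndOffset.isPresent()) {
            log.truncateTo(response.divergingEndOffset.getAsLong());
            return; // a subsequent fetch updates the watermarks
        }
        log.updateHighWatermark(response.leaderHighWatermark);
    }

    public static void main(String[] args) {
        FetchResponse diverging = new FetchResponse(21437L, OptionalLong.of(21434L));
        FetchResponse followUp = new FetchResponse(21437L, OptionalLong.empty());

        FollowerLog oldOrder = new FollowerLog(21440L, 21430L);
        handleProcessFirst(oldOrder, diverging);
        System.out.println("process first:  peak HW = " + oldOrder.peakHighWatermark
                + ", final HW = " + oldOrder.highWatermark);

        FollowerLog patchedOrder = new FollowerLog(21440L, 21430L);
        handleTruncateFirst(patchedOrder, diverging);
        handleTruncateFirst(patchedOrder, followUp);
        System.out.println("truncate first: peak HW = " + patchedOrder.peakHighWatermark
                + ", final HW = " + patchedOrder.highWatermark);
    }
}
```

In this toy model, the process-first order briefly raises the high watermark past the truncation point before pulling it back, which corresponds to the non-monotonic update reported in the log message. The truncate-first order never exposes an offset beyond the point of divergence. The follow-up fetch here only updates the watermark; in a real follower it would also append the records returned by the leader.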