hachikuji opened a new pull request, #12506:
URL: https://github.com/apache/kafka/pull/12506

   It is possible for the leader to send an `AlterPartition` request to a 
zombie controller which includes either a partition or leader epoch which is 
larger than what is found in the controller context. Prior to 
https://github.com/apache/kafka/pull/12032, the controller handled this in the 
following way:
   
   1. If the `LeaderAndIsr` state exactly matches the current state on the 
controller excluding the partition epoch, then the `AlterPartition` request is 
considered successful and no error is returned. The risk with this handling is 
that this may cause the leader to incorrectly assume that the state had been 
successfully updated. Since the controller's state is stale, there is no way to 
know what the latest ISR state is.
   2. Otherwise, the controller will attempt to update the state in ZooKeeper 
with the leader/partition epochs from the `AlterPartition` request. This 
operation would fail if the controller's epoch was no longer current in 
ZooKeeper, and the result would be a `NOT_CONTROLLER` error.
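
The two cases above can be sketched roughly as follows. This is only an illustration, not the controller's actual code: the class `LegacyAlterPartitionSketch`, the `State` type, the `Outcome` values, and the `controllerEpochCurrentInZk` flag are all hypothetical names, and the conditional ZooKeeper write is reduced to a boolean.

```java
import java.util.List;

public class LegacyAlterPartitionSketch {
    // Hypothetical outcomes; the real controller returns protocol error codes.
    public enum Outcome { SUCCESS_WITHOUT_WRITE, WRITE_APPLIED, NOT_CONTROLLER }

    /** Hypothetical stand-in for the LeaderAndIsr state of one partition. */
    public static final class State {
        final int leader;
        final int leaderEpoch;
        final List<Integer> isr;
        final int partitionEpoch;

        public State(int leader, int leaderEpoch, List<Integer> isr, int partitionEpoch) {
            this.leader = leader;
            this.leaderEpoch = leaderEpoch;
            this.isr = isr;
            this.partitionEpoch = partitionEpoch;
        }

        // Compares everything except the partition epoch.
        boolean matchesIgnoringPartitionEpoch(State other) {
            return leader == other.leader
                && leaderEpoch == other.leaderEpoch
                && isr.equals(other.isr);
        }
    }

    public static Outcome handle(State controllerView, State requested,
                                 boolean controllerEpochCurrentInZk) {
        // Case 1: the request looks like a no-op, so success is reported
        // even if the controller's view is stale.
        if (controllerView.matchesIgnoringPartitionEpoch(requested)) {
            return Outcome.SUCCESS_WITHOUT_WRITE;
        }
        // Case 2: the write is attempted and only succeeds if this
        // controller's epoch is still current in ZooKeeper (assumed flag).
        return controllerEpochCurrentInZk ? Outcome.WRITE_APPLIED : Outcome.NOT_CONTROLLER;
    }
}
```

In this sketch a zombie controller (`controllerEpochCurrentInZk == false`) still reports success in case 1, which is the risk described above, whereas case 2 surfaces `NOT_CONTROLLER`.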
   
   Following https://github.com/apache/kafka/pull/12032, the controller's 
validation is stricter. If the partition epoch is larger than expected, then 
the controller will return `INVALID_UPDATE_VERSION` without attempting the 
operation. Similarly, if the leader epoch is larger than expected, the 
controller will return `FENCED_LEADER_EPOCH`. The problem with this new 
handling is that the leader treats the errors from the controller as 
authoritative. For example, if it sees the `FENCED_LEADER_EPOCH` error, then it 
will not retry the request and will simply wait until the next leader epoch 
arrives. The ISR state gets stuck in a pending state, which can lead to 
persistent under-replicated partitions (URPs) until the leader epoch gets bumped.
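
As a rough illustration of the stricter validation described above, the sketch below maps a larger-than-expected epoch to the corresponding error. The class and method names are hypothetical, the order of the two checks is an assumption, and all other validation (including smaller-than-expected epochs) is left out.

```java
public class StrictValidationSketch {
    // Error names taken from the description above; NONE means neither check fired.
    public enum Error { NONE, INVALID_UPDATE_VERSION, FENCED_LEADER_EPOCH }

    /**
     * Compares the epochs in the request against the controller context.
     * Only the "larger than expected" cases are modeled here.
     */
    public static Error validate(int contextLeaderEpoch, int contextPartitionEpoch,
                                 int requestLeaderEpoch, int requestPartitionEpoch) {
        // Partition epoch ahead of the controller: rejected without attempting the write.
        if (requestPartitionEpoch > contextPartitionEpoch) {
            return Error.INVALID_UPDATE_VERSION;
        }
        // Leader epoch ahead of the controller: the leader sees this as authoritative
        // and does not retry until a new leader epoch arrives.
        if (requestLeaderEpoch > contextLeaderEpoch) {
            return Error.FENCED_LEADER_EPOCH;
        }
        return Error.NONE;
    }
}
```

Neither error tells the leader that the controller itself may be stale, which is why the leader has no reason to look for a newer controller.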
   
   In this patch, we want to fix the issues with this handling, but we don't 
want to restore the buggy idempotent check. The approach is straightforward. If 
the controller sees a partition/leader epoch which is larger than what it has 
in the controller context, then it assumes that it has become a zombie and returns 
`NOT_CONTROLLER` to the leader. This will cause the leader to attempt to reset 
the controller from its local metadata cache and retry the `AlterPartition` 
request.
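
A minimal sketch of the check proposed in this patch, assuming the epochs are plain integers. The names `ZombieControllerCheckSketch`, `Result`, and `check` are hypothetical and not taken from the patch; `CONTINUE_VALIDATION` stands for the remaining validation, which is not modeled.

```java
public class ZombieControllerCheckSketch {
    // NOT_CONTROLLER mirrors the error named in the description;
    // CONTINUE_VALIDATION is a placeholder for the rest of the handling.
    public enum Result { NOT_CONTROLLER, CONTINUE_VALIDATION }

    public static Result check(int contextLeaderEpoch, int contextPartitionEpoch,
                               int requestLeaderEpoch, int requestPartitionEpoch) {
        // If the leader knows a newer epoch than this controller does, the
        // controller's context must be stale, so it treats itself as a zombie.
        boolean requestIsAhead = requestLeaderEpoch > contextLeaderEpoch
            || requestPartitionEpoch > contextPartitionEpoch;
        return requestIsAhead ? Result.NOT_CONTROLLER : Result.CONTINUE_VALIDATION;
    }
}
```

For example, `check(5, 10, 6, 10)` and `check(5, 10, 5, 11)` both yield `NOT_CONTROLLER`, which prompts the leader to look up the controller again and retry rather than leave the ISR change pending.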
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

