[ 
https://issues.apache.org/jira/browse/KAFKA-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5813:
------------------------------
    Component/s: zkclient
                 replication

> Unexpected unclean leader election due to leader/controller's unusual event 
> handling order 
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5813
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5813
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication, zkclient
>    Affects Versions: 0.10.2.1
>            Reporter: Allen Wang
>            Priority: Minor
>
> We experienced an unexpected unclean leader election after a network glitch 
> on the leader of a partition. The replication factor is 2.
> Here is the sequence of events gathered from various logs:
> 1. The ZK session of the partition leader times out 
> 2. A new ZK session is established for the leader 
> 3. The leader removes the follower from the ISR (possibly because of 
> replication delay caused by the network problem) and updates the ISR in ZK 
> 4. The controller processes the BrokerChangeListener event raised at step 1, 
> in which the leader appears to be offline 
> 5. Because the leader has already updated the ISR in ZK to remove the 
> follower, the controller performs an unclean leader election 
> 6. The controller processes the second BrokerChangeListener event, raised at 
> step 2, and marks the broker online again
> It seems to me that step 4 happens too late. If it happened right after step 
> 1, the election would be clean and hopefully the producer would 
> immediately switch to the new leader, thus avoiding a consumer offset reset. 
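
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's controller code: the names PartitionState, elect_leader and replay, the broker ids 1 (leader) and 2 (follower), and the event labels are all hypothetical, and the rule that an ISR shrink only applies while broker 1 is still leader is a simplification made for the example.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    # Hypothetical, simplified view of the partition state kept in ZK.
    replicas: tuple
    leader: int
    isr: set


def elect_leader(state, live_brokers):
    """Toy election: prefer a live ISR member, else any live replica (unclean)."""
    live_isr = [b for b in state.replicas if b in state.isr and b in live_brokers]
    if live_isr:
        state.leader = live_isr[0]
        state.isr = set(live_isr)
        return "clean"
    live_replicas = [b for b in state.replicas if b in live_brokers]
    if live_replicas:
        state.leader = live_replicas[0]
        state.isr = {live_replicas[0]}
        return "unclean"
    state.leader = -1
    return "offline"


def replay(order):
    """Apply the events in the given order; return (election outcome, final leader)."""
    state = PartitionState(replicas=(1, 2), leader=1, isr={1, 2})
    outcome = None
    for event in order:
        if event == "leader_shrinks_isr":
            # Step 3: broker 1 drops the follower from the ISR.
            # Assumption: the shrink only takes effect while broker 1 is leader.
            if state.leader == 1:
                state.isr = {1}
        elif event == "controller_sees_leader_offline":
            # Step 4: the controller acts on the stale "broker 1 is gone" event.
            outcome = elect_leader(state, live_brokers={2})
        elif event == "controller_sees_leader_online":
            # Step 6: broker 1 is marked online again; no election in this model.
            pass
        else:
            raise ValueError(f"unknown event: {event}")
    return outcome, state.leader


# Order reported in the issue: the ISR shrink lands before the controller
# handles the offline event.
OBSERVED = ["leader_shrinks_isr", "controller_sees_leader_offline",
            "controller_sees_leader_online"]
# Order the reporter would prefer: the offline event is handled first.
PROMPT = ["controller_sees_leader_offline", "leader_shrinks_isr",
          "controller_sees_leader_online"]

print(replay(OBSERVED))  # ('unclean', 2)
print(replay(PROMPT))    # ('clean', 2)
```

In both orderings broker 2 ends up as leader; the only difference is whether broker 2 was still in the ISR when the controller ran the election, which is what decides between a clean and an unclean election in this model.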



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
