Igor Soarez created KAFKA-15650:
-----------------------------------

             Summary: Data-loss on leader shutdown right after partition creation?
                 Key: KAFKA-15650
                 URL: https://issues.apache.org/jira/browse/KAFKA-15650
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Igor Soarez


As per KIP-858, when a replica is created, the broker selects a log directory 
to host the replica and queues the propagation of the directory assignment to 
the controller. The replica becomes active immediately; it is not blocked until 
the controller confirms the metadata change. If the replica is the leader 
replica, it can start accepting writes right away.
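The asynchronous flow above can be pictured with a toy model. This is a minimal sketch, not Kafka code: the `Broker` class, its `outbox` queue, the method names and the `/data/d1` directory names are all hypothetical, and directory selection is reduced to "pick the first directory". It only shows that a write is accepted while the directory assignment is still sitting in the queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical toy model of the KIP-858 flow described above (not Kafka code)."""
    log_dirs: list
    logs: dict = field(default_factory=dict)  # partition -> (log dir, records)
    # Directory assignments queued for the controller but not yet confirmed.
    outbox: deque = field(default_factory=deque)

    def create_leader_replica(self, partition):
        log_dir = self.log_dirs[0]  # placeholder for the real directory selection
        self.logs[partition] = (log_dir, [])
        self.outbox.append((partition, log_dir))  # propagation is asynchronous
        # No waiting here: the replica is usable as soon as this returns.

    def produce(self, partition, record):
        _, records = self.logs[partition]
        records.append(record)  # accepted even if the assignment is still queued


broker = Broker(log_dirs=["/data/d1", "/data/d2"])
broker.create_leader_replica("topic-0")
broker.produce("topic-0", "r1")
print(len(broker.outbox), broker.logs["topic-0"][1])  # 1 ['r1']
```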

Consider the following scenario:
 # A partition is created in some selected log directory, and some produce 
traffic is accepted
 # Before the broker is able to notify the controller of the directory 
assignment, the broker shuts down
 # Upon coming back online, the broker has an offline directory: the same 
directory that was chosen to host the replica
 # The broker assumes leadership for the partition, but cannot find the replica 
in any available directory and has no way of knowing it was already created, 
because the directory assignment is still missing
 # The replica is created again, empty, and the previously produced records are 
lost
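The five steps can be replayed in a small simulation. This is a hypothetical sketch under simplifying assumptions, not how the broker is implemented: a single broker, two log directories named `d1` and `d2`, an in-memory `pending` queue that is simply dropped on shutdown, and a `become_leader` method that recreates the replica in the first online directory when no assignment is recorded. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model (hypothetical names, not Kafka code) of one broker and the controller."""
    # Durable on-disk state: log directory -> {partition: [records]}.
    disks: dict = field(default_factory=lambda: {"d1": {}, "d2": {}})
    online_dirs: set = field(default_factory=lambda: {"d1", "d2"})
    # Directory assignments the controller knows about: partition -> log dir.
    controller_assignments: dict = field(default_factory=dict)
    # In-memory queue of assignments not yet propagated; lost on shutdown.
    pending: list = field(default_factory=list)

    def create_replica(self, partition, log_dir):
        self.disks[log_dir][partition] = []
        self.pending.append((partition, log_dir))  # queued, not yet sent
        return log_dir

    def shutdown(self):
        self.pending.clear()  # the queued assignment never reached the controller

    def restart(self, offline_dirs):
        self.online_dirs = set(self.disks) - set(offline_dirs)

    def become_leader(self, partition):
        # Look for an existing replica in the directories that are online.
        for d in sorted(self.online_dirs):
            if partition in self.disks[d]:
                return self.disks[d][partition]
        # With no assignment recorded, the broker cannot tell that the replica
        # already exists in an offline directory, so it creates a new, empty one.
        if partition not in self.controller_assignments:
            new_dir = sorted(self.online_dirs)[0]
            self.disks[new_dir][partition] = []
            return self.disks[new_dir][partition]
        raise RuntimeError("replica is in an offline directory")


cluster = Cluster()
log_dir = cluster.create_replica("t-0", "d1")        # step 1
cluster.disks[log_dir]["t-0"].extend(["r1", "r2"])   # produce traffic accepted
cluster.shutdown()                                   # step 2
cluster.restart(offline_dirs=["d1"])                 # step 3
log = cluster.become_leader("t-0")                   # steps 4 and 5
print(log)  # [] : r1 and r2 are stranded in offline d1
```

In this model the final `raise` branch is the case the issue implies would be safe: had the assignment reached the controller, the broker would know the replica lives in an offline directory instead of recreating it.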

Step 4 may seem unlikely because ISR membership gates leadership, but even 
assuming acks=all and a replication factor greater than 1, the broker may still 
gain leadership if all other replicas are also offline. Perhaps KIP-966 is 
relevant here.

We may need to delay new replica activation until the assignment is propagated 
successfully.
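One possible shape of that mitigation is sketched below. It is a hypothetical illustration, not a proposed patch: the `Replica` and `Broker` classes, the `active` flag, `on_assignment_acked`, and the choice to reject (rather than buffer) writes while the assignment is pending are all assumptions. The point it shows is that a replica which never accepted a write before the controller recorded its directory has nothing to lose if the broker shuts down in the meantime.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    log_dir: str
    records: list = field(default_factory=list)
    active: bool = False  # becomes True only after the controller acknowledges


@dataclass
class Broker:
    """Hypothetical sketch of gating replica activation on assignment propagation."""
    replicas: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # partition -> log dir awaiting ack

    def create_replica(self, partition, log_dir):
        self.replicas[partition] = Replica(log_dir)
        self.pending[partition] = log_dir  # queued for the controller

    def on_assignment_acked(self, partition):
        # Only once the controller has recorded the directory does the replica go live.
        self.pending.pop(partition)
        self.replicas[partition].active = True

    def produce(self, partition, record):
        replica = self.replicas[partition]
        if not replica.active:
            # Assumed behaviour: reject so the client retries later.
            return False
        replica.records.append(record)
        return True


broker = Broker()
broker.create_replica("topic-0", "/data/d1")
print(broker.produce("topic-0", "r1"))     # False: assignment still pending
broker.on_assignment_acked("topic-0")
print(broker.produce("topic-0", "r2"))     # True
print(broker.replicas["topic-0"].records)  # ['r2']
```

The trade-off in this sketch is extra latency for the first writes to a newly created partition, since they wait on a round trip to the controller; producers would presumably absorb this through their normal retry handling.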



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
