Gaurav Narula created KAFKA-16157:
-------------------------------------

             Summary: Topic recreation with offline disk doesn't update 
leadership/shrink ISR correctly
                 Key: KAFKA-16157
                 URL: https://issues.apache.org/jira/browse/KAFKA-16157
             Project: Kafka
          Issue Type: Bug
          Components: jbod, kraft
    Affects Versions: 3.7.1
            Reporter: Gaurav Narula


In a cluster with 4 brokers, `broker-1..broker-4`, each with 2 disks `d1` and 
`d2`, we perform the following operations:

 
 # Create a topic `foo.test` with 10 partitions and RF 4. Let's assume the topic 
was created with id `rAujIqcjRbu_-E4UxgQT8Q`.
 # Start a producer in the background to produce to `foo.test`.
 # Break disk `d1` in `broker-1`. We simulate this by marking the log dir 
read-only.
 # Delete topic `foo.test`
 # Recreate topic `foo.test`. Let's assume the topic was created with id 
`bgdrsv-1QjCLFEqLOzVCHg`.
 # Wait for 5 minutes
 # Describe the recreated topic `foo.test`.

 

We observe that `broker-1` is the leader and in-sync for a few partitions:

 

 
{code:java}
 
Topic: foo.test TopicId: bgdrsv-1QjCLFEqLOzVCHg PartitionCount: 10      ReplicationFactor: 4    Configs: min.insync.replicas=1,unclean.leader.election.enable=false
        Topic: foo.test Partition: 0    Leader: 101     Replicas: 101,102,103,104       Isr: 101,102,103,104
        Topic: foo.test Partition: 1    Leader: 102     Replicas: 102,103,104,101       Isr: 102,103,104
        Topic: foo.test Partition: 2    Leader: 103     Replicas: 103,104,101,102       Isr: 103,104,102
        Topic: foo.test Partition: 3    Leader: 104     Replicas: 104,101,102,103       Isr: 104,102,103
        Topic: foo.test Partition: 4    Leader: 104     Replicas: 104,102,101,103       Isr: 104,102,103
        Topic: foo.test Partition: 5    Leader: 102     Replicas: 102,101,103,104       Isr: 102,103,104
        Topic: foo.test Partition: 6    Leader: 101     Replicas: 101,103,104,102       Isr: 101,103,104,102
        Topic: foo.test Partition: 7    Leader: 103     Replicas: 103,104,102,101       Isr: 103,104,102
        Topic: foo.test Partition: 8    Leader: 101     Replicas: 101,102,104,103       Isr: 101,102,104,103
        Topic: foo.test Partition: 9    Leader: 102     Replicas: 102,104,103,101       Isr: 102,104,103
{code}
 

 

In this example, it is the leader of partitions `0, 6 and 8`.
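
The affected partitions can also be picked out of the describe output mechanically. The following is a minimal sketch, not part of the original report: it assumes broker id `101` corresponds to `broker-1`, and the helper name `partitions_involving` is hypothetical. The sample text reuses the partition lines shown above.

```python
import re

# Partition lines as printed by the topic describe command in this report.
DESCRIBE = """\
Topic: foo.test Partition: 0 Leader: 101 Replicas: 101,102,103,104 Isr: 101,102,103,104
Topic: foo.test Partition: 1 Leader: 102 Replicas: 102,103,104,101 Isr: 102,103,104
Topic: foo.test Partition: 2 Leader: 103 Replicas: 103,104,101,102 Isr: 103,104,102
Topic: foo.test Partition: 3 Leader: 104 Replicas: 104,101,102,103 Isr: 104,102,103
Topic: foo.test Partition: 4 Leader: 104 Replicas: 104,102,101,103 Isr: 104,102,103
Topic: foo.test Partition: 5 Leader: 102 Replicas: 102,101,103,104 Isr: 102,103,104
Topic: foo.test Partition: 6 Leader: 101 Replicas: 101,103,104,102 Isr: 101,103,104,102
Topic: foo.test Partition: 7 Leader: 103 Replicas: 103,104,102,101 Isr: 103,104,102
Topic: foo.test Partition: 8 Leader: 101 Replicas: 101,102,104,103 Isr: 101,102,104,103
Topic: foo.test Partition: 9 Leader: 102 Replicas: 102,104,103,101 Isr: 102,104,103
"""

# \s also matches newlines, so line-wrapped output is parsed the same way.
LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def partitions_involving(describe_text, broker_id):
    """Return (partitions led by broker_id, partitions with broker_id in the ISR)."""
    leads, in_isr = [], []
    for match in LINE.finditer(describe_text):
        partition = int(match.group(1))
        leader = int(match.group(2))
        isr = [int(b) for b in match.group(4).split(",") if b]
        if leader == broker_id:
            leads.append(partition)
        if broker_id in isr:
            in_isr.append(partition)
    return leads, in_isr


# Assumption: broker id 101 is broker-1.
leads, in_isr = partitions_involving(DESCRIBE, 101)
print("leader of:", leads)
print("in ISR of:", in_isr)
```

For the output above, both lists come out as partitions 0, 6 and 8.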

 

Consider `foo.test-8`. It is present in the following brokers/disks:

 

 
{code:java}
$ fd foo.test-8
broker-1/d1/foo.test-8/
broker-2/d2/foo.test-8/
broker-3/d2/foo.test-8/
broker-4/d1/foo.test-8/{code}
 

 

`broker-1/d1` still refers to the topic id which is pending deletion because 
the log dir is marked offline.

 

 
{code:java}
$ cat broker-1/d1/foo.test-8/partition.metadata
version: 0
topic_id: rAujIqcjRbu_-E4UxgQT8Q{code}
 

 

However, the other brokers have the correct topic id:

 

 
{code:java}
$ cat broker-2/d2/foo.test-8/partition.metadata
version: 0
topic_id: bgdrsv-1QjCLFEqLOzVCHg{code}
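
The comparison of `partition.metadata` files can be scripted. This is a minimal sketch under assumptions: it treats the file as simple `key: value` lines, as in the two files shown above, and the helper names `parse_partition_metadata` and `stale_replicas` are hypothetical. Only the two files quoted in this report are used as input.

```python
def parse_partition_metadata(text):
    """Parse 'key: value' lines, as seen in the partition.metadata files above."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def stale_replicas(metadata_by_dir, expected_topic_id):
    """Return the replica directories whose topic_id differs from the expected one."""
    return sorted(
        directory
        for directory, text in metadata_by_dir.items()
        if parse_partition_metadata(text).get("topic_id") != expected_topic_id
    )


# File contents copied from the report; other replicas are not shown there.
METADATA = {
    "broker-1/d1/foo.test-8": "version: 0\ntopic_id: rAujIqcjRbu_-E4UxgQT8Q",
    "broker-2/d2/foo.test-8": "version: 0\ntopic_id: bgdrsv-1QjCLFEqLOzVCHg",
}

# The id of the recreated topic, as assumed in the steps above.
print(stale_replicas(METADATA, "bgdrsv-1QjCLFEqLOzVCHg"))
```

With the two files above, only `broker-1/d1/foo.test-8` is reported as carrying a stale topic id.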
 

 

Now, let's consider `foo.test-0`. We observe that the replica isn't present in 
`broker-1`:




{code:java}
$ fd foo.test-0
broker-2/d1/foo.test-0/
broker-3/d1/foo.test-0/
broker-4/d2/foo.test-0/{code}



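In both cases, `broker-1` shouldn't be the leader or an in-sync replica for the 
partitions: for `foo.test-8` its copy still belongs to the deleted topic id, and 
for `foo.test-0` it holds no replica at all.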

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
