Regarding (1), I am assuming that it is expected that brokers going down
will be brought back up soon, at which point they will pick up from the
current leader and get back into the ISR. Am I right?

The broker will be added back to the ISR once it is restarted, but it is
never removed from the replica list until the admin explicitly moves it
using the reassign partitions tool.
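
For reference, both lists are visible directly in ZooKeeper. A quick sketch,
assuming topic X and a ZooKeeper ensemble reachable at localhost:2181
(adjust to your setup):

  # Replica assignment: set at creation/reassignment time; never shrinks on
  # its own when a broker dies.
  bin/zkCli.sh -server localhost:2181 get /brokers/topics/X
  # -> {"version":1,"partitions":{"0":[5,3,4], ...}}

  # Leader and ISR: maintained by the controller/leader; shrinks when a
  # replica dies or falls behind.
  bin/zkCli.sh -server localhost:2181 get /brokers/topics/X/partitions/0/state
  # -> {"controller_epoch":...,"leader":3,"leader_epoch":...,"isr":[3,4]}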

Regarding (2), I finally kicked off a reassign_partitions admin task adding
broker 7 to the replicas list for partition 0, which finally fixed the
under-replication issue. Is it therefore expected that the user will fix up
the under-replication situation?

Yes. Currently, partition reassignment is purely an admin initiated task.
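
For completeness, the reassignment is driven by a JSON file handed to the
kafka-reassign-partitions.sh tool. A rough sketch (the flag names below
match the newer 0.8.x tool and the file name is made up; the 0.8.0 variant
takes slightly different options, so treat this as illustrative):

  # expand-partition-0.json
  {"version": 1,
   "partitions": [{"topic": "X", "partition": 0, "replicas": [3, 4, 7]}]}

  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file expand-partition-0.json --execute

  # Later, to check whether the reassignment has completed:
  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file expand-partition-0.json --verify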

Another thing I'd like to clarify is that for another topic Y, broker 5 was
never removed from the ISR array. Note that Y is an unused topic so I am
guessing that technically broker 5 is not out of sync... though it is still
dead. Is this the expected behavior?

Not really. After replica.lag.time.max.ms (which defaults to 10 seconds),
the leader should remove the dead broker from the ISR.
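
That check is controlled by broker-side settings in server.properties. A
sketch of the relevant knobs (the values shown are the 0.8.x defaults as I
recall them; please double-check against your release):

  # Drop a follower from the ISR if it has not caught up for this long.
  replica.lag.time.max.ms=10000
  # 0.8.x also drops followers that fall this many messages behind the leader.
  replica.lag.max.messages=4000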

Thanks,
Neha

On Tue, Oct 14, 2014 at 9:27 AM, Jean-Pascal Billaud <j...@tellapart.com>
wrote:

> hey folks,
>
> I have been testing a kafka cluster of 10 nodes on AWS using version
> 2.8.0-0.8.0
> and see some behavior on failover that I want to make sure I understand.
>
> Initially, I have a topic X with 30 partitions and a replication factor of
> 3. Looking at the partition 0:
> partition: 0 - leader: 5 preferred leader: 5 brokers: [5, 3, 4] in-sync:
> [5, 3, 4]
>
> After killing broker 5, the controller immediately grabs the next replica
> in the ISR and assigns it as the leader:
> partition: 0 - leader: 3 preferred leader: 5 brokers: [5, 3, 4] in-sync:
> [3, 4]
>
> There are a couple of things at this point I would like to clarify:
>
> (1) Why is broker 5 still in the brokers array for partition 0? Note this
> broker array comes from a get of the zookeeper path /brokers/topics/[topic]
> as documented.
> (2) Partition 0 is now under-replicated and the controller does not seem
> to do anything about it. Is this expected?
>
> Regarding (1), I am assuming that it is expected that brokers going down
> will be brought back up soon, at which point they will pick up from the
> current leader and get back into the ISR. Am I right?
>
> Regarding (2), I finally kicked off a reassign_partitions admin task
> adding broker 7 to the replicas list for partition 0, which finally fixed
> the under-replication issue:
>
> partition: 0 - leader: 3  expected_leader: 3  brokers: [3, 4, 7]  in-sync:
> [3, 4, 7]
>
> Is it therefore expected that the user will fix up the under-replication
> situation? Or is it again expected that broker 5 will come back soon,
> making this whole thing a non-issue, given that decommissioning brokers is
> not supported as of the Kafka version I am using?
>
> Another thing I'd like to clarify is that for another topic Y, broker 5 was
> never removed from the ISR array. Note that Y is an unused topic so I am
> guessing that technically broker 5 is not out of sync... though it is still
> dead. Is this the expected behavior?
>
> I'd really appreciate somebody confirming my understanding,
>
> Thanks,
>
