Hi all,

An update about this.

Having a dead broker in the ISR doesn't seem to be by design. I filed an issue
<https://issues.apache.org/jira/browse/KAFKA-9672> about this, because it
is actually causing exceptions:

  Uncaught exception in scheduled task 'isr-expiration' (kafka.utils.KafkaScheduler)
  org.apache.kafka.common.errors.ReplicaNotAvailableException: Replica with id 32 is not available on broker 37

So, effectively, a dead broker in the ISR list breaks at least the scheduled
isr-expiration task.
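
To make the inconsistency concrete, here is a minimal Python sketch that
compares the registered broker ids with a partition's ISR, using the values
from the original report quoted below. It is only an illustration, not a Kafka
tool: the helper name dead_isr_members is hypothetical, and it assumes the two
ZooKeeper values have already been fetched by hand (e.g. with zookeeper-shell).

```python
import json


def dead_isr_members(live_broker_ids, partition_state_json):
    """Return ISR broker ids that are not registered under /brokers/ids.

    Hypothetical helper for illustration; it only parses the JSON stored in
    /brokers/topics/<topic>/partitions/<n>/state and does not talk to ZooKeeper.
    """
    state = json.loads(partition_state_json)
    live = set(live_broker_ids)
    # Preserve the ISR order so the output is easy to compare with the znode.
    return [broker for broker in state["isr"] if broker not in live]


# Values copied from the original report: `ls /brokers/ids` and
# `get /brokers/topics/foo/partitions/0/state`.
live_ids = [1, 2, 3]
state_json = (
    '{"controller_epoch":123,"leader":1,"version":1,'
    '"leader_epoch":42,"isr":[0,2,3,1]}'
)

print(dead_isr_members(live_ids, state_json))
```

For the state above this reports broker 0, the broker that is no longer
registered under /brokers/ids.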

Unfortunately, I haven't been able to reproduce this in isolation yet.

Ivan


On Wed, 19 Feb 2020 at 19:43, Brian Sang <bais...@yelp.com.invalid> wrote:

> You can either use the built-in kafka-reassign-partitions.sh script (
>
> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-4.ReassignPartitionsTool
> )
>
> Or, in industry, others use tooling such as
> https://github.com/Yelp/kafka-utils (easier-to-use scripts) or an automated
> system like https://github.com/linkedin/cruise-control
>
> On Wed, Feb 19, 2020 at 5:53 AM Dinesh Kumar <devdinu...@gmail.com> wrote:
>
> > How did you reassign partitions? If the reassignment JSON still had
> > broker 0 mentioned, it could end up in this state. Could you share the
> > output of describing the topic from the console?
> >
> > On Wed, Feb 19, 2020 at 5:11 AM Bhat, Avinash <asb...@amazon.co.uk.invalid>
> > wrote:
> >
> > > Hi Ivan,
> > >
> > > This is probably by design, since just because the broker 0 has gone down
> > > does not mean that it won't come back up in the future. That is why the
> > > data structures keep track of dead brokers too.
> > >
> > > As far as ISR is concerned, broker 0 is still probably in ISR for "foo"
> > > topic's 0th partition. If you produce a few messages on this topic
> > > partition, I would expect the leader (in this case broker 1) to remove
> > > broker 0 from the ISR for this topic partition.
> > >
> > > Thanks
> > > Avinash Bhat
> > >
> > > P.S: I am fairly new to Kafka; it will be great to get a confirmation on
> > > the above from other folks.
> > >
> > > -----Original Message-----
> > > From: Ivan Yurchenko <ivan0yurche...@gmail.com>
> > > Sent: Thursday, February 13, 2020 5:27 PM
> > > To: users@kafka.apache.org
> > > Subject: Dead broker in ISR
> > >
> > > Hi Kafka community,
> > >
> > > We're running Kafka 2.4 and facing a pretty strange situation.
> > > Let's say there were three brokers in the cluster 0, 1, and 2. Then:
> > >   1. Broker 3 was added.
> > >   2. Partitions were reassigned from broker 0 to broker 3.
> > >   3. Broker 0 was shut down (not gracefully) and removed from the cluster.
> > >   4. We see the following state in ZooKeeper:
> > >
> > > ls /brokers/ids
> > > [1, 2, 3]
> > >
> > > get /brokers/topics/foo
> > >
> > > {"version":2,"partitions":{"0":[2,1,3]},"adding_replicas":{},"removing_replicas":{}}
> > >
> > > get /brokers/topics/foo/partitions/0/state
> > >
> > > {"controller_epoch":123,"leader":1,"version":1,"leader_epoch":42,"isr":[0,2,3,1]}
> > >
> > > This means the dead broker 0 remains in the partition's ISR. A large share
> > > of the partitions in the cluster have this issue.
> > >
> > > What would you recommend to get the cluster out of this situation?
> > > Controller re-election doesn't help.
> > > Should we directly edit the partitions' state to remove the dead broker? Or
> > > is there a better/safer way?
> > >
> > > Thank you.
> > >
> > > Ivan
> > >
> >
>
