Hi Joe,

I'm trying to reproduce it with the Vagrant setup you provided. Thanks for
setting that up! I assume I also need to run the sbt commands from the
README to build Kafka, right?

You included the output from "bin/kafka-list-topic.sh". Based on the
problem I've described, this wouldn't show the issue, would it? If I'm
reading the source right, this command only queries ZooKeeper, while the
problem that I'm seeing is in the metadata reported by the brokers.
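For what it's worth, here's a rough Python sketch (not my actual tool; the
helper name is made up for illustration) of the kind of check I mean:
flagging partitions whose ISR is missing assigned replicas, given output in
the kafka-list-topic.sh format. The point is that this only surfaces the
problem if it's run against broker-reported metadata rather than the
ZooKeeper-backed output below.

```python
import re

# Matches lines like:
#   topic: hubspot_testing partition: 4 leader: 3 replicas: 3,2 isr: 3,2
LINE_RE = re.compile(
    r"topic:\s+(\S+)\s+partition:\s+(\d+)\s+leader:\s+(-?\d+)\s+"
    r"replicas:\s+(\S+)\s+isr:\s*(\S*)"
)

def under_replicated(output):
    """Return (topic, partition, missing) tuples for partitions whose
    ISR lacks one or more assigned replicas."""
    problems = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        topic, partition, _leader, replicas, isr = m.groups()
        assigned = set(replicas.split(","))
        in_sync = set(isr.split(",")) if isr else set()
        missing = sorted(assigned - in_sync)
        if missing:
            problems.append((topic, int(partition), missing))
    return problems
```

So a line like "topic: t partition: 4 leader: 8 replicas: 8,4 isr: 8" would
be reported with replica 4 missing from the ISR.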

I am using the Oracle JDK, version 1.6.0_45.

I'm not sure what you mean by having one topic for the 15 partitions. The
single topic I used as an example has 15 partitions. I have two other
topics with the same number of partitions and replicas, and they exhibit
the same problem.

I'll keep trying to reproduce it with the Vagrant setup.

Thanks!

Ryan


On Tue, Dec 17, 2013 at 9:39 PM, Joe Stein <joe.st...@stealth.ly> wrote:

> Hi Ryan, can you help reproduce the issue on virtual machines? If so, I
> have added two more brokers (five in total now) in a Vagrantfile:
> https://github.com/stealthly/kafka/tree/0.8_hubspot_testing_1
>
> git clone -b 0.8_hubspot_testing_1 https://github.com/stealthly/kafka.git
> cd kafka
> vagrant up
>
> You need Vagrant (http://www.vagrantup.com/downloads.html) and VirtualBox
> (https://www.virtualbox.org/) installed.
>
> I tried to reproduce it, but I'm not sure what steps to take. Or is there
> an issue when it launches?
>
> Joes-MacBook-Air:kafka joestein$ bin/kafka-create-topic.sh --zookeeper
> 192.168.50.5:2181 --replica 2 --partition 15 --topic hubspot_testing
> creation succeeded!
>
> Joes-MacBook-Air:kafka joestein$ bin/kafka-list-topic.sh --zookeeper
> 192.168.50.5:2181
> topic: hubspot_testing partition: 0 leader: 3 replicas: 3,1 isr: 3,1
> topic: hubspot_testing partition: 1 leader: 4 replicas: 4,2 isr: 4,2
> topic: hubspot_testing partition: 2 leader: 1 replicas: 1,3 isr: 1,3
> topic: hubspot_testing partition: 3 leader: 2 replicas: 2,4 isr: 2,4
> topic: hubspot_testing partition: 4 leader: 3 replicas: 3,2 isr: 3,2
> topic: hubspot_testing partition: 5 leader: 4 replicas: 4,3 isr: 4,3
> topic: hubspot_testing partition: 6 leader: 1 replicas: 1,4 isr: 1,4
> topic: hubspot_testing partition: 7 leader: 2 replicas: 2,1 isr: 2,1
> topic: hubspot_testing partition: 8 leader: 3 replicas: 3,4 isr: 3,4
> topic: hubspot_testing partition: 9 leader: 4 replicas: 4,1 isr: 4,1
> topic: hubspot_testing partition: 10 leader: 1 replicas: 1,2 isr: 1,2
> topic: hubspot_testing partition: 11 leader: 2 replicas: 2,3 isr: 2,3
> topic: hubspot_testing partition: 12 leader: 3 replicas: 3,1 isr: 3,1
> topic: hubspot_testing partition: 13 leader: 4 replicas: 4,2 isr: 4,2
> topic: hubspot_testing partition: 14 leader: 1 replicas: 1,3 isr: 1,3
>
> Are you using the Oracle JDK?
>
> Do you have one topic for the 15 partitions?
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Tue, Dec 17, 2013 at 7:09 PM, Ryan Berdeen <rberd...@hubspot.com>
> wrote:
>
> > Sorry it's taken so long to reply; the issue went away after I
> > reassigned partitions, but now it's back.
> >
> > I haven't checked JMX, because the brokers and zookeeper have been
> > reporting the same ISR for several hours.
> >
> > Some more details:
> >
> > The cluster/topic has
> >   5 brokers (1, 4, 5, 7, 8)
> >   15 partitions (0...14)
> >   2 replicas
> >
> > A single broker, 4, is the one missing from the ISR in every case. For
> > partitions where 4 is the leader (1, 6, 11), it is present in the ISR.
> > For partitions where 4 is not the leader (4, 8, 12), it is not present
> > in the ISR. Here's the output of my tool, showing assignment and ISR:
> > https://gist.github.com/also/8012383#file-from-brokers-txt
> >
> > I haven't seen anything interesting in the logs, but I'm not entirely
> > sure what to look for. The cluster is currently in this state, and if
> > it goes like last time, this will persist until I reassign partitions.
> >
> > What can I do in the meantime to track down the issue?
> >
> > Thanks,
> >
> > Ryan
> >
> > On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Do you see any ISR churn on the brokers? You can check the ISR
> > > expand/shrink rate via JMX.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Dec 4, 2013 at 3:53 PM, Ryan Berdeen <rberd...@hubspot.com>
> > > wrote:
> > >
> > > > I'm working on some monitoring tools for Kafka, and I've seen a
> > > > couple of clusters get into a state where
> > > > ClientUtils.fetchTopicMetadata will show that not all replicas are
> > > > in the ISR.
> > > >
> > > > At the same time, ZkUtils.getLeaderIsrAndEpochForPartition will
> > > > show that all partitions are in the ISR, and the
> > > > "kafka.server":name="UnderReplicatedPartitions",type="ReplicaManager"
> > > > MBean will report 0.
> > > >
> > > > What's going on? Is there something wrong with my controller, or
> > > > should I not be paying attention to ClientUtils.fetchTopicMetadata?
> > > >
> > > > Thanks,
> > > >
> > > > Ryan
> > > >
> > >
> >
>