Hi Json.

John might have a point. It is not reasonable to give the JVM that runs Kafka
more than 6-8 GB of heap. One reason is GC time; the other is that Kafka relies
heavily on the OS page cache for disk reads and writes, so memory handed to the
heap is memory taken away from that cache.
Also, there were a few synchronization bugs in 0.9 that caused similar
problems. I would recommend upgrading to 1.0.0 if that is feasible.
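To make the heap vs. page cache trade-off concrete for the 16 GB brokers in the
original question, here is a back-of-the-envelope sketch in Java. `HeapBudget`
and its methods are hypothetical names, and it assumes every byte not given to
the heap is available to the page cache, which ignores the OS itself, off-heap
buffers and any other processes on the box.

```java
// Hypothetical helper: rough page cache budget once the JVM heap size is fixed.
// Assumption: all RAM not used by the heap goes to the page cache (ignores the OS,
// off-heap buffers and other processes), so real numbers will be somewhat lower.
public class HeapBudget {
    // Upper end of the 6-8 GB heap guidance discussed in this thread.
    static final long MAX_RECOMMENDED_HEAP_GB = 8;

    /** RAM left over for the OS page cache, in GB. */
    public static long pageCacheGb(long machineRamGb, long heapGb) {
        if (heapGb > machineRamGb) {
            throw new IllegalArgumentException("heap larger than machine RAM");
        }
        return machineRamGb - heapGb;
    }

    /** True if the heap exceeds the 6-8 GB guidance. */
    public static boolean heapTooLarge(long heapGb) {
        return heapGb > MAX_RECOMMENDED_HEAP_GB;
    }

    public static void main(String[] args) {
        long ramGb = 16; // the brokers in this thread have 16 GB each
        for (long heapGb : new long[] {6, 8, 12}) {
            System.out.printf("heap=%d GB -> ~%d GB left for page cache%s%n",
                    heapGb, pageCacheGb(ramGb, heapGb),
                    heapTooLarge(heapGb) ? " (above the 6-8 GB guidance)" : "");
        }
    }
}
```

With a 12 GB heap only about 4 GB would be left for caching log segments,
which is why a smaller heap usually helps a broker more than a bigger one.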

Viktor


On Thu, Nov 9, 2017 at 2:59 PM, John Yost <hokiege...@gmail.com> wrote:

> I've seen this before, and it was due to long GC pauses caused in large part
> by a heap larger than 8 GB.
>
> --John
>
> On Thu, Nov 9, 2017 at 8:17 AM, Json Tu <kafka...@126.com> wrote:
>
> > Hi,
> >     We have a Kafka cluster of 6 brokers, each on a machine with 8 CPUs
> > and 16 GB of memory. There are about 1600 topics in the cluster, and each
> > broker hosts about 1700 partition leaders and 1600 follower replicas.
> >     When we restart a normal broker, we find that 500+ partitions
> > repeatedly shrink and expand their ISR during the restart, with many log
> > lines like the following:
> >
> >    [2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
> > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > (kafka.cluster.Partition)
> > [2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726:
> > Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> > (kafka.cluster.Partition)
> > [2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726:
> > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > (kafka.cluster.Partition)
> > [2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726:
> > Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> > (kafka.cluster.Partition)
> > [2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726:
> > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > (kafka.cluster.Partition)
> > [2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726:
> > Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> > (kafka.cluster.Partition)
> > [2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726:
> > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > (kafka.cluster.Partition)
> > …
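To see how widespread the churn is across those 500+ partitions, a small Java
sketch can tally "Shrinking" and "Expanding" events per partition from the
broker log. `IsrChurn` is a hypothetical helper; its regex is written against
the `kafka.cluster.Partition` lines quoted above and assumes each entry has
been rejoined onto a single line (the mail client wrapped them). Other Kafka
versions may format these lines differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: count ISR shrink/expand events per topic-partition from broker log lines.
// Assumes the log format shown in this thread, one entry per line.
public class IsrChurn {
    // Captures the action, the topic and the partition number, e.g. "Shrinking", "Yelp", "5".
    private static final Pattern ISR_EVENT = Pattern.compile(
            "(Shrinking|Expanding) ISR for partition \\[([^,\\]]+),(\\d+)\\]");

    /** Maps "topic-partition" to {shrinkCount, expandCount}. */
    public static Map<String, int[]> count(List<String> lines) {
        Map<String, int[]> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ISR_EVENT.matcher(line);
            if (!m.find()) {
                continue; // not an ISR change line
            }
            String key = m.group(2) + "-" + m.group(3);
            int[] c = counts.computeIfAbsent(key, k -> new int[2]);
            if (m.group(1).equals("Shrinking")) {
                c[0]++;
            } else {
                c[1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: "
                + "Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 "
                + "(kafka.cluster.Partition)",
            "[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: "
                + "Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 "
                + "(kafka.cluster.Partition)");
        count(sample).forEach((tp, c) ->
            System.out.println(tp + " shrinks=" + c[0] + " expands=" + c[1]));
    }
}
```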
> >
> >
> >     The shrinking and expanding then repeats after 30 minutes, which
> > corresponds to leader.imbalance.check.interval.seconds; at that time the
> > log shows the controller's automatic leader rebalance, which moves the
> > leadership of some partitions back to the restarted broker.
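The rebalance step can be pictured with a simplified Java model of the
controller's periodic preferred-leader check (the behaviour enabled by
`auto.leader.rebalance.enable`). This is a sketch under assumptions, not
Kafka's code: `LeaderImbalance` and `PartitionState` are hypothetical, and the
real controller also checks that the preferred replica is alive and in the ISR
before moving leadership. For each broker it computes the share of partitions
whose preferred (first-listed) replica is that broker but which some other
broker currently leads; above `leader.imbalance.per.broker.percentage` (10 by
default), leadership of those partitions is moved back, for example onto a
broker that was just restarted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the controller's preferred-leader imbalance check.
// Ignores liveness and ISR checks that the real controller performs.
public class LeaderImbalance {
    /** Replica assignment (first entry = preferred leader) plus the current leader. */
    public static final class PartitionState {
        final List<Integer> replicas;
        final int leader;

        public PartitionState(List<Integer> replicas, int leader) {
            this.replicas = replicas;
            this.leader = leader;
        }
    }

    /** Share of the broker's preferred partitions that it does not currently lead. */
    public static double imbalanceRatio(int brokerId, Map<String, PartitionState> partitions) {
        int preferred = 0;
        int notLeading = 0;
        for (PartitionState p : partitions.values()) {
            if (p.replicas.get(0) == brokerId) { // unboxes the Integer before comparing
                preferred++;
                if (p.leader != brokerId) {
                    notLeading++;
                }
            }
        }
        return preferred == 0 ? 0.0 : (double) notLeading / preferred;
    }

    /** True if the ratio exceeds the configured per-broker percentage. */
    public static boolean wouldRebalance(int brokerId, Map<String, PartitionState> partitions,
                                         int imbalancePercentage) {
        return imbalanceRatio(brokerId, partitions) > imbalancePercentage / 100.0;
    }

    public static void main(String[] args) {
        // Illustrative assignment: broker 1 was just restarted, and broker 2 still
        // leads one of the two partitions that prefer broker 1.
        Map<String, PartitionState> parts = new HashMap<>();
        parts.put("Yelp-5", new PartitionState(Arrays.asList(1, 2), 2));
        parts.put("Yelp-6", new PartitionState(Arrays.asList(1, 2), 1));
        parts.put("Yelp-7", new PartitionState(Arrays.asList(2, 1), 2));
        System.out.println(imbalanceRatio(1, parts));     // 0.5
        System.out.println(wouldRebalance(1, parts, 10)); // true
    }
}
```

Each leadership move of this kind changes which broker the followers fetch
from, which fits the burst of ISR changes reported at each check interval.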
> >     We see no shrinking or expanding while the cluster is running, only
> > when we restart a broker, so we keep num.replica.fetchers at 1, which seems
> > sufficient.
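Whether one fetcher thread is enough comes down to whether followers stay
caught up. Since 0.9 the leader drops a follower from the ISR based on time
rather than message count: if the follower has not caught up to the leader's
log end offset for longer than `replica.lag.time.max.ms`, it is removed. The
Java sketch below models only that rule; `IsrLag` is a hypothetical name, and
the 10000 ms constant is the default in the 0.9/1.0 era, so check the value
actually configured on your brokers.

```java
// Sketch of the time-based ISR membership rule (not Kafka's actual code).
// A follower is considered out of sync once it has gone longer than
// replica.lag.time.max.ms without catching up to the leader's log end offset.
public class IsrLag {
    // Assumed value: the replica.lag.time.max.ms default in Kafka 0.9/1.0.
    static final long REPLICA_LAG_TIME_MAX_MS = 10_000L;

    /** True if the leader would shrink the ISR to exclude this follower. */
    public static boolean isOutOfSync(long nowMs, long lastCaughtUpTimeMs, long lagTimeMaxMs) {
        return nowMs - lastCaughtUpTimeMs > lagTimeMaxMs;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // A follower stalled for 12 s (for example by a long GC pause, or while it
        // catches up after a restart) leaves the ISR; one stalled for 3 s stays in.
        System.out.println(isOutOfSync(now, now - 12_000L, REPLICA_LAG_TIME_MAX_MS)); // true
        System.out.println(isOutOfSync(now, now - 3_000L, REPLICA_LAG_TIME_MAX_MS));  // false
    }
}
```

Under this rule, a long GC pause on either side is enough to push followers
past the threshold, which ties back to the heap-size advice in this thread.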
> >
> >     We can reproduce this on every restart. Can anyone offer some
> > suggestions? Thanks in advance.