I'm happy that it's solved :)

On Thu, Nov 9, 2017 at 3:32 PM, John Yost <hokiege...@gmail.com> wrote:

> Excellent points, Viktor! Also, the reason I mistakenly went to a > 8 GB
> heap was the OOM errors being thrown after I upgraded from 0.9.0.1 to
> 0.10.0.0 and forgot to explicitly set the message format to 0.9.0.1, which
> we needed because we still had to support the older clients and the
> corresponding format. Once I set the message format to 0.9.0.1, the memory
> requirements went WAY down, I reset the heap to 6 GB, and our Kafka
> cluster has been awesome since.
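>
> For anyone hitting the same thing: the broker-side setting involved is
> log.message.format.version in server.properties. A minimal sketch of what
> that looks like (the value here being the oldest client/format we had to
> keep supporting):
>
>     # server.properties on the upgraded 0.10.0.0 brokers
>     log.message.format.version=0.9.0.1
>
> With the on-disk format held at 0.9.0.1, the broker doesn't have to
> down-convert messages in heap for the older consumers, which is most
> likely where the extra memory was going.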
>
> --John
>
> On Thu, Nov 9, 2017 at 9:09 AM, Viktor Somogyi <viktorsomo...@gmail.com>
> wrote:
>
> > Hi Json,
> >
> > John might have a point. It is not reasonable to give the JVM that runs
> > Kafka more than 6-8 GB of heap. One of the reasons is GC time; the other
> > is that Kafka relies heavily on the OS's in-memory page cache for disk
> > reads and writes, so spare RAM is better left to the OS.
> > Also, there were a few synchronization bugs in 0.9 which caused similar
> > problems, so I would recommend upgrading to 1.0.0 if that is feasible.
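> >
> > As a rough sketch only (6 GB here is an example, not a sizing rule), the
> > broker heap is normally capped through the KAFKA_HEAP_OPTS environment
> > variable picked up by kafka-server-start.sh:
> >
> >     # example; size the heap to your own workload
> >     export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
> >     bin/kafka-server-start.sh config/server.properties
> >
> > Whatever RAM is left over then goes to the OS page cache, which is
> > exactly what Kafka's sequential disk access pattern benefits from.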
> >
> > Viktor
> >
> >
> > On Thu, Nov 9, 2017 at 2:59 PM, John Yost <hokiege...@gmail.com> wrote:
> >
> > > I've seen this before, and it was caused by long GC pauses, due in
> > > large part to a memory heap > 8 GB.
> > >
> > > --John
> > >
> > > On Thu, Nov 9, 2017 at 8:17 AM, Json Tu <kafka...@126.com> wrote:
> > >
> > > > Hi,
> > > >     We have a Kafka cluster made up of 6 brokers, each with 8 CPUs
> > > > and 16 GB of memory. There are about 1600 topics in the cluster, and
> > > > each broker is the leader for about 1700 partitions and holds
> > > > follower replicas for about 1600 more.
> > > >     When we restart an otherwise healthy broker, we find that 500+
> > > > partitions shrink and expand their ISRs repeatedly during the
> > > > restart; there are many logs like the ones below.
> > > >
> > > > [2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
> > > > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > > > (kafka.cluster.Partition)
> > > > [2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726:
> > > > Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> > > > (kafka.cluster.Partition)
> > > > [2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726:
> > > > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > > > (kafka.cluster.Partition)
> > > > [2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726:
> > > > Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> > > > (kafka.cluster.Partition)
> > > > [2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726:
> > > > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > > > (kafka.cluster.Partition)
> > > > [2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726:
> > > > Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> > > > (kafka.cluster.Partition)
> > > > [2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726:
> > > > Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> > > > (kafka.cluster.Partition)
> > > > …
> > > >
> > > >
> > > >     The shrinking and expanding repeats after 30 minutes, which is
> > > > the default value of leader.imbalance.check.interval.seconds; at that
> > > > point we can also find the controller's auto-rebalance logs, which
> > > > move some partitions' leadership back to the restarted broker.
> > > >     We never see ISR shrinking or expanding while the cluster is
> > > > running normally, only when we restart a broker, so
> > > > replica.fetch.thread.num is 1 and that seems to be enough.
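> > > >
> > > >     (For context, ISR shrinking itself is driven by
> > > > replica.lag.time.max.ms: a follower that has not caught up to the
> > > > leader within that window is dropped from the ISR. A sketch of the
> > > > relevant server.properties entry, shown with what we believe is the
> > > > default for this era of broker, since we have not overridden it:
> > > >
> > > >     replica.lag.time.max.ms=10000
> > > >
> > > > so a broker that is still catching up after a restart can easily
> > > > flap in and out of the ISR.)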
> > > >
> > > >     We can reproduce this on every restart. Can someone offer some
> > > > suggestions? Thanks in advance.
