I ran into the same issue today. In a production cluster, I noticed the
"Shrinking ISR for partition" log messages for a topic deleted 2 months
ago.
Our staging cluster shows the same messages for all the topics deleted in
that cluster.
Both clusters are on 0.8.2.
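
One way to check whether a deleted topic left any state behind in
ZooKeeper (a quick sketch; zk-host is a placeholder and the paths are the
standard 0.8.x layout):

    # topics the brokers still know about
    bin/zkCli.sh -server zk-host:2181 ls /brokers/topics

    # deletion requests that never completed
    bin/zkCli.sh -server zk-host:2181 ls /admin/delete_topics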

Yifan, Guozhang, did you find a way to get rid of them?

thanks in advance,
alexis


On Tue, Apr 5, 2016 at 4:16 PM Guozhang Wang <wangg...@gmail.com> wrote:

> It is possible; there are some discussions about a similar issue in this KIP:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-53+-+Add+custom+policies+for+reconnect+attempts+to+NetworkdClient
>
> and in the mailing list thread:
>
> https://www.mail-archive.com/dev@kafka.apache.org/msg46868.html
>
>
>
> Guozhang
>
> On Tue, Apr 5, 2016 at 2:34 PM, Yifan Ying <nafan...@gmail.com> wrote:
>
> > Some updates:
> >
> > Yesterday, right after a release (producers and consumers reconnected to
> > Kafka/Zookeeper, but with no code change in our producers and consumers),
> > all under-replication issues resolved themselves and there was no more
> > high latency in either Kafka or Zookeeper. But right after today's
> > release (producers and consumers re-connected again), the
> > under-replication and high-latency issues happened again. So could the
> > all-at-once reconnecting from producers and consumers be causing the
> > problem? All of this has only happened since I deleted a deprecated
> > topic in production.
> >
> > Yifan
> >
> > On Tue, Apr 5, 2016 at 9:04 AM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> >> These configs mainly depend on your publish throughput, since the
> >> replication throughput is upper-bounded by the publish throughput. If
> >> the publish throughput is not high, then setting lower threshold values
> >> in these two configs will cause churn in shrinking / expanding ISRs.
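> >>
> >> (To make that concrete with made-up numbers: if
> >> replica.lag.max.messages=4000 and a partition takes a burst of, say,
> >> 6000 messages in one producer request, a perfectly healthy follower is
> >> momentarily more than 4000 messages behind, gets dropped from the ISR,
> >> catches up, and is added back. That is exactly the shrink/expand churn
> >> described in this thread.)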
> >>
> >> Guozhang
> >>
> >> On Mon, Apr 4, 2016 at 11:55 PM, Yifan Ying <nafan...@gmail.com> wrote:
> >>
> >>> Thanks for replying, Guozhang. We did increase both settings:
> >>>
> >>> replica.lag.max.messages=20000
> >>>
> >>> replica.lag.time.max.ms=20000
> >>>
> >>>
> >>> But I'm not sure if these are good enough. And yes, that's a good
> >>> suggestion to monitor ZK performance.
> >>>
> >>>
> >>> Thanks.
> >>>
> >>> On Mon, Apr 4, 2016 at 8:58 PM, Guozhang Wang <wangg...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hmm, it seems like your broker configs "replica.lag.max.messages" and
> >>>> "replica.lag.time.max.ms" are misconfigured for your replication
> >>>> traffic, and the deletion of the topic actually pushed it below the
> >>>> threshold. What are the config values for these two? And could you
> >>>> try to increase these configs and see if that helps?
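> >>>>
> >>>> Concretely, that would be something along these lines in
> >>>> server.properties on each broker (the values here are illustrative
> >>>> only), followed by a rolling restart:
> >>>>
> >>>> replica.lag.max.messages=20000
> >>>> replica.lag.time.max.ms=20000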
> >>>>
> >>>> In 0.8.2.1, kafka-consumer-offset-checker.sh accesses ZK to query the
> >>>> consumer offsets one by one, and hence if your ZK read latency is high
> >>>> it could take a long time. You may want to monitor your ZK cluster
> >>>> performance to check its read / write latencies.
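> >>>>
> >>>> A quick spot-check without a full monitoring setup is ZooKeeper's
> >>>> four-letter "stat" command (zk-host below is a placeholder); its
> >>>> output includes a "Latency min/avg/max" line:
> >>>>
> >>>> echo stat | nc zk-host 2181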
> >>>>
> >>>>
> >>>> Guozhang
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Apr 4, 2016 at 10:59 AM, Yifan Ying <nafan...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Guozhang,
> >>>>>
> >>>>> It's 0.8.2.1, so it should be fixed? We also tried to start from
> >>>>> scratch by wiping out the data directory on both Kafka and Zookeeper.
> >>>>> And it's odd that the constant shrinking and expanding, and the high
> >>>>> request latency as well, happened right after a fresh restart. The
> >>>>> brokers are using the same config as before the topic deletion.
> >>>>>
> >>>>> Another observation is that running kafka-consumer-offset-checker.sh
> >>>>> is extremely slow. Any suggestion would be appreciated! Thanks.
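> >>>>>
> >>>>> For reference, we're invoking it roughly like this (the group name
> >>>>> and host are placeholders):
> >>>>>
> >>>>> bin/kafka-consumer-offset-checker.sh --zookeeper zk-host:2181 \
> >>>>>     --group our-consumer-group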
> >>>>>
> >>>>> On Sun, Apr 3, 2016 at 2:29 PM, Guozhang Wang <wangg...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Yifan,
> >>>>>>
> >>>>>> Are you on 0.8.0 or 0.8.1/2? There are some issues with zkVersion
> >>>>>> checking
> >>>>>> in 0.8.0 that are fixed in later minor releases of 0.8.
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Fri, Apr 1, 2016 at 7:46 PM, Yifan Ying <nafan...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> > Hi All,
> >>>>>> >
> >>>>>> > We deleted a deprecated topic on our Kafka cluster (0.8) and
> >>>>>> > started observing constant 'Expanding ISR for partition' and
> >>>>>> > 'Shrinking ISR for partition' messages for other topics. As a
> >>>>>> > result we saw a huge number of under-replicated partitions and
> >>>>>> > very high request latency from Kafka. And it doesn't seem able to
> >>>>>> > recover on its own.
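> >>>>>> >
> >>>>>> > One way to see the affected partitions at a glance (zk-host is a
> >>>>>> > placeholder):
> >>>>>> >
> >>>>>> > bin/kafka-topics.sh --zookeeper zk-host:2181 --describe \
> >>>>>> >     --under-replicated-partitions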
> >>>>>> >
> >>>>>> > Anyone knows what caused this issue and how to resolve it?
> >>>>>> >
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Yifan
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> -- Guozhang
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Yifan
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
> >
> > --
> > Yifan
> >
> >
> >
>
>
> --
> -- Guozhang
>
