These configs depend mainly on your publish throughput, since the replication throughput is upper-bounded by the publish throughput. If the publish throughput is not high, then setting lower threshold values in these two configs will cause churn in shrinking / expanding ISRs.
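To make the headroom argument above concrete, here is a hedged back-of-the-envelope sketch (not Kafka code; the function name, parameters, and example figures are all assumptions for illustration) of when a message-count lag threshold like replica.lag.max.messages could be crossed by perfectly healthy followers:

```python
def isr_churn_risk(publish_msgs_per_sec, follower_catchup_ms, lag_max_messages):
    """Return True if a healthy follower could exceed the message-lag
    threshold during normal operation, which would cause ISR churn."""
    # Worst-case backlog a healthy follower accumulates while it runs
    # follower_catchup_ms behind the leader's log end offset.
    worst_case_backlog = publish_msgs_per_sec * (follower_catchup_ms / 1000.0)
    return worst_case_backlog >= lag_max_messages
```

For example, at 10,000 msgs/s a follower that lags 500 ms stays well under a 20,000-message threshold, but at 50,000 msgs/s the same follower would be ejected from the ISR and then re-added once it catches up, producing the shrink/expand cycling described in this thread.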
Guozhang

On Mon, Apr 4, 2016 at 11:55 PM, Yifan Ying <nafan...@gmail.com> wrote:

> Thanks for replying, Guozhang. We did increase both settings:
>
> replica.lag.max.messages=20000
> replica.lag.time.max.ms=20000
>
> But not sure if these are good enough. And yes, that's a good suggestion
> to monitor ZK performance.
>
> Thanks.
>
> On Mon, Apr 4, 2016 at 8:58 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Hmm, it seems like your broker configs "replica.lag.max.messages" and
>> "replica.lag.time.max.ms" are misconfigured relative to your replication
>> traffic, and the deletion of the topic actually pushed it below the
>> threshold. What are the values for these two configs? And could you try
>> increasing them and see if that helps?
>>
>> In 0.8.2.1, kafka-consumer-offset-checker.sh accesses ZK to query the
>> consumer offsets one by one, so if your ZK read latency is high it can
>> take a long time. You may want to monitor your ZK cluster's performance
>> to check its read / write latencies.
>>
>> Guozhang
>>
>> On Mon, Apr 4, 2016 at 10:59 AM, Yifan Ying <nafan...@gmail.com> wrote:
>>
>>> Hi Guozhang,
>>>
>>> It's 0.8.2.1. So it should be fixed? We also tried to start from
>>> scratch by wiping out the data directory on both Kafka and ZooKeeper.
>>> Oddly, the constant shrinking and expanding happened right after the
>>> fresh restart, along with high request latency. The brokers are using
>>> the same config as before the topic deletion.
>>>
>>> Another observation is that running kafka-consumer-offset-checker.sh
>>> is extremely slow. Any suggestion would be appreciated! Thanks.
>>>
>>> On Sun, Apr 3, 2016 at 2:29 PM, Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>>
>>>> Yifan,
>>>>
>>>> Are you on 0.8.0 or 0.8.1/2? There are some issues with zkVersion
>>>> checking in 0.8.0 that are fixed in later minor releases of 0.8.
>>>>
>>>> Guozhang
>>>>
>>>> On Fri, Apr 1, 2016 at 7:46 PM, Yifan Ying <nafan...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > We deleted a deprecated topic on a Kafka cluster (0.8) and started
>>>> > observing constant 'Expanding ISR for partition' and 'Shrinking ISR
>>>> > for partition' messages for other topics. As a result we saw a huge
>>>> > number of under-replicated partitions and very high request latency
>>>> > from Kafka, and it doesn't seem able to recover on its own.
>>>> >
>>>> > Anyone know what caused this issue and how to resolve it?

--
Guozhang
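For reference, the two broker settings discussed in this thread live in each broker's server.properties. The values below are the ones Yifan reported raising them to; the 0.8.x defaults are believed to be 4000 messages and 10000 ms, so this roughly five-fold / two-fold increase is what the thread is evaluating:

```properties
# server.properties (per broker) -- values as reported in this thread
replica.lag.max.messages=20000
replica.lag.time.max.ms=20000
```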