On Thu, Nov 21, 2019 at 4:25 PM Peter Bukowinski <pmb...@gmail.com> wrote:
> How many partitions are on each of your brokers? That’s a key factor
> affecting shutdown and startup time.

The test hosts run about 384 partitions each (7 topics * 128 partitions each * 3x replication / 7 brokers). The largest prod cluster has about 1344 partitions/broker; the smallest and slowest has 2560.

> I’m currently doing a rolling restart of a 150-broker cluster running
> kafka 2.3.1. The cluster is very busy (~500k msg/sec, ~1GB/sec). Each
> broker has about 65 partitions. Each broker restart cycle (stop/start,
> rejoin ISR) takes about 90 seconds.

In our largest prod cluster (16 d2.8xlarge broker cluster, 200k msg/s, 300 MB/s), our restart cycles take about 3 minutes on 1.1.0 (counting ISR-rejoin time) and about 30 minutes on 2.3.1. The only other change we made between versions was increasing heap size from 8G to 16G.

Thanks for the response!

> On Nov 21, 2019, at 3:52 PM, Nicholas Feinberg <nicho...@liftoff.io> wrote:
> >
> > I've been looking at upgrading my cluster from 1.1.0 to 2.3.1. While
> > testing, I've noticed that shutting brokers down seems to take consistently
> > longer on 2.3.1. Specifically, the process of 'creating snapshots' seems to
> > take several times longer than it did on 1.1.0. On a small testing setup,
> > the time needed to create snapshots and shut down goes from ~20s to ~120s;
> > with production-scale data, it goes from ~2min to ~30min.
> >
> > To allow myself to roll back, I'm still using the 1.1 versions of the
> > inter-broker protocol and the message format - is it possible that those
> > could slow things down in 2.3.1? If not, any ideas what else could be at
> > fault, or what I could do to narrow down the issue further?
> >
> > Thanks!
> > -Nicholas
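
For reference, the per-broker partition figure quoted for the test hosts follows directly from the arithmetic in the reply above. Here is a minimal Python sketch of that calculation, assuming replicas are spread evenly across brokers. The function names are hypothetical, and the rolling-restart total assumes brokers are restarted strictly one at a time, which the thread does not state.

```python
def replicas_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Average partition replicas hosted per broker, assuming an even spread."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return total_replicas / brokers


def rolling_restart_minutes(brokers, minutes_per_broker):
    """Total rolling-restart time, assuming brokers restart one at a time."""
    return brokers * minutes_per_broker


if __name__ == "__main__":
    # Test cluster from the thread: 7 topics * 128 partitions * 3x replication / 7 brokers
    print(replicas_per_broker(7, 128, 3, 7))
    # Largest prod cluster: 16 brokers at ~3 min (1.1.0) vs ~30 min (2.3.1) per restart
    print(rolling_restart_minutes(16, 3), rolling_restart_minutes(16, 30))
```

Under those assumptions, a full rolling restart of the 16-broker cluster would go from roughly 48 minutes on 1.1.0 to roughly 8 hours on 2.3.1.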