Re: Broker shutdown slowdown between 1.1.0 and 2.3.1

Nicholas Feinberg Thu, 21 Nov 2019 16:50:36 -0800

On Thu, Nov 21, 2019 at 4:25 PM Peter Bukowinski <pmb...@gmail.com> wrote:


> How many partitions are on each of your brokers? That’s a key factor
> affecting shutdown and startup time.
>

The test hosts run about 384 partitions each (7 topics * 128 partitions
each * 3x replication / 7 brokers). The largest prod cluster has about 1344
partitions/broker; the smallest and slowest has 2560.
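
For anyone checking the test-host figure, here is a quick sketch of the arithmetic. It assumes replicas are spread evenly across brokers, and the function name is made up for illustration:

```python
def replicas_per_broker(topics: int, partitions_per_topic: int,
                        replication_factor: int, brokers: int) -> float:
    """Average partition replicas hosted per broker (assumes even distribution)."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return total_replicas / brokers


# Test cluster from this message: 7 topics, 128 partitions each, 3x replication, 7 brokers.
test_hosts = replicas_per_broker(topics=7, partitions_per_topic=128,
                                 replication_factor=3, brokers=7)
print(test_hosts)  # 384.0
```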


> I’m currently doing a rolling restart of a 150-broker cluster running
> kafka 2.3.1. The cluster is very busy (~500k msg/sec, ~1GB/sec). Each
> broker has about 65 partitions. Each broker restart cycle (stop/start,
> rejoin ISR) takes about 90 seconds.
>

In our largest prod cluster (16 d2.8xlarge brokers, 200k msg/s, 300 MB/s),
restart cycles take about 3 minutes on 1.1.0 (counting ISR-rejoin time) and
about 30 minutes on 2.3.1. The only other change we made between versions
was increasing the heap size from 8G to 16G.

Thanks for the response!


>
> > On Nov 21, 2019, at 3:52 PM, Nicholas Feinberg <nicho...@liftoff.io> wrote:
> >
> > I've been looking at upgrading my cluster from 1.1.0 to 2.3.1. While
> > testing, I've noticed that shutting brokers down seems to take
> > consistently longer on 2.3.1. Specifically, the process of 'creating
> > snapshots' seems to take several times longer than it did on 1.1.0. On a
> > small testing setup, the time needed to create snapshots and shut down
> > goes from ~20s to ~120s; with production-scale data, it goes from ~2min
> > to ~30min.
> >
> > To allow myself to roll back, I'm still using the 1.1 versions of the
> > inter-broker protocol and the message format - is it possible that those
> > could slow things down in 2.3.1? If not, any ideas what else could be at
> > fault, or what I could do to narrow down the issue further?
> >
> > Thanks!
> > -Nicholas
>
>
