Re: Insanely long recovery time with Kafka 0.11.0.2

Vincent Rischmann Sat, 06 Jan 2018 07:19:14 -0800

Here's an excerpt just after the broker started: https://pastebin.com/tZqze4Ya


After more than 8 hours of recovery the broker finally started. I haven't read 
through all 8 hours of log but the parts I looked at are like the pastebin.

I'm not seeing much in the log cleaner logs either; they look normal. We have a 
couple of compacted topics, but it seems only the consumer offsets topic is ever 
compacted (the other topics don't have much traffic).
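
For reference, the partition-size monitoring mentioned in the quoted reply
below could be approximated with a small script. This is only a sketch, not
something taken from the thread: it assumes the usual broker layout where each
partition is a <topic>-<partition> directory under a log.dirs entry, and the
partition_sizes helper and the /var/lib/kafka path are hypothetical names
chosen for illustration.

```python
import os


def partition_sizes(log_dir):
    """Return {partition_directory_name: total_bytes} for one Kafka log dir.

    Assumes each partition lives in its own subdirectory of log_dir
    (for example "__consumer_offsets-12") holding its segment and index files.
    """
    sizes = {}
    for entry in os.scandir(log_dir):
        # Skip top-level files such as checkpoints; only directories are partitions.
        if not entry.is_dir():
            continue
        total = 0
        for f in os.scandir(entry.path):
            if f.is_file():
                total += f.stat().st_size
        sizes[entry.name] = total
    return sizes


def largest(sizes, n=10):
    """Return the n largest partitions as (name, bytes) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling largest(partition_sizes("/var/lib/kafka")) a few minutes apart and
comparing the numbers would show whether the compacted partitions are actually
shrinking while the broker recovers.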

On Sat, Jan 6, 2018, at 12:02 AM, Brett Rann wrote:
> What do the broker logs say it's doing during all that time?
> 
> There are some consumer offset / log cleaner bugs which caused us similarly
> long delays. That was easily visible by watching the log cleaner activity in
> the logs, and in our monitoring of partition sizes, watching them go down,
> along with IO activity on the host for those files.
> 
> On Sat, Jan 6, 2018 at 7:48 AM, Vincent Rischmann <vinc...@rischmann.fr>
> wrote:
> 
> > Hello,
> >
> > so I'm upgrading my brokers from 0.10.1.1 to 0.11.0.2 to fix this bug
> > https://issues.apache.org/jira/browse/KAFKA-4523
> > Unfortunately while stopping one broker, it crashed exactly because of
> > this bug. No big deal usually, except after restarting Kafka in 0.11.0.2
> > the recovery is taking a really long time.
> > I have around 6TB of data on that broker, and before when it crashed it
> > usually took around 30 to 45 minutes to recover, but now I'm at almost
> > 5h since Kafka started and it's still not recovered.
> > I'm wondering what could have changed to have such a dramatic effect on
> > recovery time? Is there maybe something I can tweak to try to reduce
> > the time?
> > Thanks.
> >
