Re: Recovery of Kafka cluster takes very long time

Todd Palino Mon, 10 Aug 2015 09:14:22 -0700

It looks like you did an unclean shutdown of the cluster, in which case
each open log segment in each partition needs to be checked upon startup.
It doesn't really have anything to do with RF=3 specifically, but it does
mean that each of your brokers has 6000 partitions to check.
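
For reference, the 6000 figure works out as follows. This is just a sketch of the arithmetic using the numbers from the original post; with RF=3 on a 3-broker cluster, every broker ends up holding one replica of every partition.

```python
# Numbers taken from the original post: 10 topics, 600 partitions each,
# replication factor 3, on a 3-broker cluster.
topics = 10
partitions_per_topic = 600
replication_factor = 3
brokers = 3

# Total partition replicas that exist across the whole cluster.
total_replicas = topics * partitions_per_topic * replication_factor

# With RF equal to the broker count, each broker hosts one replica of
# every partition, so the replicas divide evenly.
replicas_per_broker = total_replicas // brokers

print(total_replicas)       # 18000
print(replicas_per_broker)  # 6000
```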

What is the setting of num.recovery.threads.per.data.dir in your broker
configuration? The default is 1, which means that on startup and
shutdown the broker uses only one thread per data directory for
checking/closing log segments. If you increase it, both the startup and
shutdown processes are parallelized, which is particularly helpful when
recovering from an unclean shutdown. We generally set it to the number of
CPUs in the system, because we want a fast recovery.
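
As a rough illustration of the "number of CPUs" rule of thumb, the sketch below prints a line you could put in the broker's server.properties. The full property name in the broker configuration is num.recovery.threads.per.data.dir; the helper name recovery_threads_line is made up for this example, and sizing strictly by CPU count is only the heuristic described above, not a tuned recommendation for your hardware.

```python
import os


def recovery_threads_line(cpu_count=None):
    """Build a server.properties line sized to the CPU count.

    recovery_threads_line is a hypothetical helper for illustration only.
    If cpu_count is not given, the local machine's CPU count is used,
    falling back to 1 when it cannot be determined.
    """
    n = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return f"num.recovery.threads.per.data.dir={n}"


# Run this on the broker host to get a value matching its CPU count.
print(recovery_threads_line())
```

Note that the setting applies per data directory, so a broker with several log.dirs entries will run that many threads for each of them.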

-Todd

On Mon, Aug 10, 2015 at 8:57 AM, Alexey Sverdelov <
[email protected]> wrote:

> Hi all,
>
> I have a 3 node Kafka cluster. There are ten topics, every topic has 600
> partitions with RF3.
>
> So, after a cluster restart I can see log messages like "INFO
> Recovering unflushed segment 0 in log..." and the complete recovery of
> the 3 nodes takes 2+ hours.
>
> I don't know why it takes so long. Is it because of RF=3?
>
> Have a nice day,
> Alexey
>
