Vadim,

Controlled shutdown takes two parameters: the number of retries and the
shutdown timeout. On every attempt, controlled shutdown tries to move
leaders off of the broker that is being shut down. If it runs out of
retries, it proceeds to shut down the broker even if the broker still
hosts a few leaders.

At LinkedIn, the script that bounces Kafka brokers waits for the
under-replicated partition count to drop to 0 before invoking controlled
shutdown on the next broker. The aim is to avoid the data loss that can
occur if you shut down a broker that still has some leaders. If the
under-replicated count never drops to 0, that indicates a bug in the Kafka
code, and the script does not go on to bounce any more brokers in the
cluster.

We measure the time it takes to move "n" leaders off of a broker and
configure the shutdown timeout accordingly. We also set the retries to a
small number (2 or 3). If controlled shutdown exhausts its retries, the
broker shuts itself down anyway.

In general, you want to avoid hard killing (kill -9) a broker, since the
broker then has to run a lengthy log recovery process on startup. That
significantly delays the broker rejoining the cluster.
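
As a rough illustration of the bounce procedure described above, here is a
minimal Python sketch. It is not the actual LinkedIn script. All names in it
(rolling_bounce, under_replicated_count, controlled_shutdown_and_restart,
wait_timeout_s, poll_interval_s) are hypothetical, and how you read the
under-replicated partition count or trigger controlled shutdown is left to
the callables you pass in.

```python
import time
from typing import Callable, Iterable, List


def rolling_bounce(
    brokers: Iterable[int],
    under_replicated_count: Callable[[], int],
    controlled_shutdown_and_restart: Callable[[int], None],
    wait_timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Bounce brokers one at a time and return the ids that were bounced.

    Assumption: the caller supplies both callables; nothing here talks to
    Kafka directly. The loop stops early, leaving the remaining brokers
    untouched, if the under-replicated count does not reach 0 in time.
    """
    bounced: List[int] = []
    for broker_id in brokers:
        deadline = clock() + wait_timeout_s
        # Wait until no partition is under-replicated before touching
        # the next broker.
        while under_replicated_count() > 0:
            if clock() >= deadline:
                # The count never dropped to 0: stop and let a human look.
                return bounced
            sleep(poll_interval_s)
        controlled_shutdown_and_restart(broker_id)
        bounced.append(broker_id)
    return bounced
```

If the returned list is shorter than the list of brokers you passed in, the
script gave up waiting, and the cluster should be inspected before anything
else is restarted.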

Thanks,
Neha


On Sun, Aug 18, 2013 at 3:33 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Good afternoon. We are running Kafka on CentOS Linux. I enabled controlled
> shutdown in the properties file. We start and stop Kafka using an init
> script. The init script issues a TERM signal first, followed 3 seconds
> later by a KILL signal. Is that the right process to shut down Kafka? Which
> startup/shutdown/restart scripts do you use? What shutdown process does
> LinkedIn use? What side effects can there be after the Kafka service is
> killed uncleanly with kill -9?
>
> Thanks,
> Vadim
>
