I'm trying to figure out the best way to handle a disk failure in a live
environment.

The obvious (and naive) solution is to decommission the broker and let
other brokers take over and create new followers. Then replace the disk,
clean the remaining log directories, and add the broker again.
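
To make the naive approach concrete, here is a rough sketch of how the
reassignment plan could be generated. It is only an illustration under my
own assumptions: the broker ids, the topic name, and the helper
plan_reassignment are made up, and the round-robin choice of replacement
broker is just one simple option. The output follows the JSON layout that
kafka-reassign-partitions.sh accepts via --reassignment-json-file.

```python
import json


def plan_reassignment(assignment, failed_broker, live_brokers):
    """Build a plan that moves every replica off failed_broker.

    assignment: list of {"topic", "partition", "replicas"} dicts
    (hypothetical input, e.g. collected from the current cluster state).
    """
    partitions = []
    for i, entry in enumerate(assignment):
        replicas = list(entry["replicas"])
        if failed_broker not in replicas:
            continue  # partition is not affected by the failed broker
        # Brokers that are alive and do not already hold this partition.
        candidates = [
            b for b in live_brokers
            if b != failed_broker and b not in replicas
        ]
        if not candidates:
            raise ValueError("no spare broker for %s-%d"
                             % (entry["topic"], entry["partition"]))
        # Spread the replacements over the candidates (simple round robin).
        replacement = candidates[i % len(candidates)]
        partitions.append({
            "topic": entry["topic"],
            "partition": entry["partition"],
            "replicas": [replacement if b == failed_broker else b
                         for b in replicas],
        })
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Made-up example: broker 2 has lost its disk.
    current = [
        {"topic": "events", "partition": 0, "replicas": [1, 2]},
        {"topic": "events", "partition": 1, "replicas": [2, 3]},
    ]
    print(json.dumps(plan_reassignment(current, 2, [1, 3, 4]), indent=2))
```

Every partition listed in such a plan has to be copied in full to its new
broker, which is where the cost comes from.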

The disadvantage with this approach is of course the network overhead and
the time it takes to reassign partitions.

Is there a better way?

As a sub-question, is it possible to keep running a broker with a
failed drive and still serve the partitions on the remaining drives?

thanks,
svante
