Re: how to avoid Half-Dead Kafka Broker Scenarios in Self-Hosted OSS Kafka

Mohamed Saliha A Wed, 25 Jun 2025 17:44:17 -0700

Hi all,

Just following up on this thread.

I would appreciate any insights or guidance the community can share on this
topic whenever you have a chance.

Thank you very much for your time and support.

Best regards,

Saliha


On Mon, 23 Jun 2025 at 12:19 PM, Mohamed Saliha A <
[email protected]> wrote:

> Hello Kafka Community,
>
> I’d like to consult the community on best practices for handling and
> preventing what’s sometimes called a "half-dead" Kafka broker scenario in a
> self-hosted OSS Kafka environment.
>
> Specifically, I’m referring to situations where a broker appears healthy
> from the cluster's perspective (i.e., it is still registered and remains in
> the ISR) but can no longer serve traffic properly, disrupting producers or
> consumers.
>
> I understand that some managed services like AWS MSK implement additional
> mechanisms (e.g., their "healing" state) to detect and handle such brokers,
> but I’d like to know how self-hosted OSS Kafka operators typically manage
> this risk.
>
> Some key questions:
>
>    - Are there recommended monitoring patterns to detect a "half-dead"
>      broker more proactively?
>    - Are there any community-recommended configurations, scripts, or tools
>      to automatically remove or restart such brokers?
>    - Any lessons learned or operational best practices from other
>      self-hosted users?
>
> I would greatly appreciate any guidance. Thank you in advance!
>
> Best regards,
>
> Saliha
>
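To make the "healthy to the cluster, broken for clients" distinction in the question concrete, here is a minimal Python sketch of the detection logic only. It assumes a separate, hypothetical probe job that, for each broker, (a) checks whether the broker is still registered in cluster metadata and (b) times a synthetic produce-and-consume round trip of a canary message. That probe is not shown, and `ProbeResult`, `update_half_dead`, and the thresholds are illustrative names and values, not part of Kafka or any existing tool. A broker is flagged only after several consecutive bad probes, so one slow round trip does not trigger an automatic restart.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProbeResult:
    """One probe round for one broker (hypothetical structure)."""
    broker_id: int
    registered: bool         # broker still listed in cluster metadata / ISR
    probe_ok: bool           # synthetic produce+consume round trip succeeded
    probe_latency_ms: float  # latency of that round trip


def update_half_dead(
    streaks: Dict[int, int],
    results: List[ProbeResult],
    max_latency_ms: float = 5000.0,  # assumed threshold, tune per cluster
    failures_needed: int = 3,        # assumed streak length before flagging
) -> List[int]:
    """Update per-broker failure streaks; return brokers flagged as half-dead.

    A broker is flagged when it is still registered (so the cluster treats
    it as healthy) yet its probe failed or exceeded max_latency_ms on
    failures_needed consecutive rounds.
    """
    flagged = []
    for r in results:
        if not r.registered:
            # Fully dead brokers are caught by ordinary membership alerts.
            streaks.pop(r.broker_id, None)
            continue
        bad = (not r.probe_ok) or r.probe_latency_ms > max_latency_ms
        # Extend the streak on a bad probe, reset it on a good one.
        streaks[r.broker_id] = streaks.get(r.broker_id, 0) + 1 if bad else 0
        if streaks[r.broker_id] >= failures_needed:
            flagged.append(r.broker_id)
    return flagged


if __name__ == "__main__":
    streaks: Dict[int, int] = {}
    round_results = [
        ProbeResult(1, True, True, 40.0),   # registered and serving
        ProbeResult(2, True, False, 0.0),   # registered but probe fails
        ProbeResult(3, False, False, 0.0),  # not registered: plainly dead
    ]
    for _ in range(3):
        flagged = update_half_dead(streaks, round_results)
    print(flagged)  # [2]
```

Only broker 2 is reported: broker 3 is already visible as dead through normal membership monitoring, while broker 2 is the case the question describes. What to do with a flagged broker (alert, controlled shutdown, restart) is left to the operator.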
