Re: Kafka Connect on Kubernetes: Statefulset vs Deployment

Federico Valeri Sat, 14 Jun 2025 07:04:09 -0700

Hi Prateek.

In a Kafka Connect cluster, the advertised address represents the
identity of the worker node. Connectors and tasks are scheduled to the
individual worker nodes based on their identity.

If you use a Kubernetes Deployment, when you roll the cluster, new
Pods with new IPs will be created for each worker node. Kafka
Connect sees this as a new worker node joining the cluster followed by
an existing node leaving the cluster (RollingUpdate strategy), or the
other way around (Recreate strategy).

In this case, Kafka Connect waits some time (5 minutes by default, see
scheduled.rebalance.max.delay.ms) before it treats the old worker node
as gone and starts rescheduling its connectors and tasks on the other
nodes. During this time, those tasks do not run, which impacts service
availability.
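
To illustrate the reasoning above, here is a toy model that treats a
worker's identity as nothing more than its advertised address string.
It is not Kafka Connect's actual assignor, and the addresses and the
helper name identity_churn are made up for the example.

```python
def identity_churn(before, after):
    """Return (departed, joined) worker identities across a roll.

    Toy model only: identity is assumed to be the advertised address.
    """
    before, after = set(before), set(after)
    return sorted(before - after), sorted(after - before)


# Deployment: each replacement Pod comes up with a new IP (hypothetical values).
deployment = identity_churn(
    ["10.0.1.5:8083", "10.0.1.6:8083"],
    ["10.0.2.9:8083", "10.0.2.10:8083"],
)

# StatefulSet + headless Service: Pods keep their names (hypothetical values).
names = ["connect-0.connect:8083", "connect-1.connect:8083"]
statefulset = identity_churn(names, names)

print("Deployment  departed/joined:", deployment)
print("StatefulSet departed/joined:", statefulset)
```

In this model the Deployment roll replaces every identity, while the
StatefulSet roll leaves the set of identities unchanged.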

For this reason, I think it's better to go with a StatefulSet and an
associated headless Service, so that each Kafka Connect Pod also has a
stable DNS record. Alternatively, you can use Strimzi, where this
problem is already solved.
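
As a rough sketch of what "stable DNS records" means in practice, the
snippet below builds the per-Pod DNS name that Kubernetes gives a
StatefulSet Pod behind a headless Service, and plugs it into the
worker's advertised REST settings. The StatefulSet name "connect", the
Service name "connect-headless", the namespace "kafka" and both helper
functions are assumptions for illustration; "cluster.local" is only the
default cluster domain and may differ in your cluster.

```python
def stable_pod_dns(statefulset, ordinal, service, namespace,
                   cluster_domain="cluster.local"):
    """Build the DNS name of one StatefulSet Pod behind a headless Service."""
    pod = f"{statefulset}-{ordinal}"  # StatefulSet Pods are named <name>-<ordinal>
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def worker_overrides(advertised_host, port=8083):
    """Worker properties that advertise a stable address to other workers."""
    return {
        "rest.advertised.host.name": advertised_host,
        "rest.advertised.port": str(port),
    }


# Hypothetical names: StatefulSet "connect", Service "connect-headless",
# namespace "kafka".
host = stable_pod_dns("connect", 0, "connect-headless", "kafka")
for key, value in worker_overrides(host).items():
    print(f"{key}={value}")
```

The printed lines are in worker properties format, so each Pod could
generate them at startup from its own Pod name before launching the
worker.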

Hope it helps.

On Sat, Jun 14, 2025 at 2:28 PM Prateek Kohli
<prateek.ko...@ericsson.com.invalid> wrote:
>
> Hi All,
>
> I'm building a custom Docker image for Kafka Connect and planning to run it
> on Kubernetes. I'm a bit stuck on whether I should use a Deployment or a
> StatefulSet.
>
> From what I understand, the main difference that could affect Kafka Connect 
> is the hostname/IP behaviour. With a Deployment, pod IPs and hostnames can 
> change after restarts. With a StatefulSet, each pod gets a stable hostname 
> (like connect-0, connect-1, etc.)
>
> My question is: Does it really matter for Kafka Connect (task reassignment) if
> the pod IPs/hostnames (this will be the worker_Id as well) change on restarts,
> considering it's a stateless application?
>
> Thanks
