RE: Kafka Connect on Kubernetes: Statefulset vs Deployment

Prateek Kohli Sun, 15 Jun 2025 21:04:30 -0700

Thanks a lot @Vignesh & @Raphael Mazelier for your detailed replies.

I thought the same, but I read the following and now I'm a bit confused.

"In a Kafka Connect cluster, each worker node is identified by its advertised 
address. This identity is crucial because connectors and tasks are assigned to 
specific workers based on it.

When you use a Kubernetes Deployment, rolling updates result in Pods being 
recreated with new IPs and hostnames. Kafka Connect interprets these as 
entirely new worker nodes joining the cluster, while the old ones are seen as 
having left.

As a result, Kafka Connect takes some time (typically around 5 minutes) to 
recognize that the old nodes have departed and to reassign their tasks to the 
remaining active workers. During this delay, some tasks may remain inactive, 
leading to reduced service availability."
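
For reference, the "around 5 minutes" figure may correspond to the worker
setting scheduled.rebalance.max.delay.ms, whose documented default is 300000 ms
(this link is my assumption; the quoted text does not name the setting). A
minimal Python sketch of rendering the two worker properties involved; the host
name "connect-0.connect-headless" is a hypothetical example:

```python
# Sketch only: renders worker properties related to worker identity and to
# how long a departed worker's tasks may stay unassigned before rebalancing.
DEFAULT_REBALANCE_DELAY_MS = 300_000  # documented default (5 minutes)


def worker_overrides(advertised_host: str,
                     rebalance_delay_ms: int = DEFAULT_REBALANCE_DELAY_MS) -> dict:
    """Return the two settings as a dict of property name -> string value."""
    return {
        # Host name this worker hands out to other workers for REST forwarding.
        "rest.advertised.host.name": advertised_host,
        # Maximum wait for a departed worker to return before its connectors
        # and tasks are reassigned to the rest of the group.
        "scheduled.rebalance.max.delay.ms": str(rebalance_delay_ms),
    }


def to_properties(settings: dict) -> str:
    """Format a dict as Java-style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical host name; 60 s instead of the 5 minute default.
    print(to_properties(worker_overrides("connect-0.connect-headless", 60_000)))
```

Lowering the delay trades faster task reassignment for more rebalances when a
pod is only briefly away, so the right value depends on how long restarts take.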

Strimzi also switched to using StrimziPodSet some time ago because of this 
issue.

https://github.com/strimzi/strimzi-kafka-operator/pull/8090

https://github.com/strimzi/strimzi-kafka-operator/issues/4676

Thanks

-----Original Message-----
From: Vignesh <davidviki...@gmail.com>
Sent: 16 June 2025 01:34
To: users@kafka.apache.org
Subject: Re: Kafka Connect on Kubernetes: Statefulset vs Deployment

Kafka Connect is a stateless component by design. It relies on external Kafka 
topics to persist its state, including connector configurations, offsets, and 
status updates. In a distributed Kafka Connect cluster, this state is managed 
through the following configurable topics:

   - config.storage.topic – stores connector configurations
   - offset.storage.topic – stores source connector offsets
   - status.storage.topic – stores the status of connectors and tasks
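
As a rough illustration of the three topics above, here is a minimal Python
sketch that renders the matching worker properties. The cluster name and the
"-configs" / "-offsets" / "-status" suffixes are hypothetical naming choices,
not anything Kafka Connect requires:

```python
# Sketch only: builds the distributed-worker settings that point a worker at
# its shared state topics. Workers with the same group.id and the same three
# topics form one Connect cluster.
def storage_topic_settings(cluster: str) -> dict:
    """Return group and storage-topic settings for a hypothetical cluster name."""
    return {
        "group.id": cluster,
        "config.storage.topic": f"{cluster}-configs",  # connector configurations
        "offset.storage.topic": f"{cluster}-offsets",  # source connector offsets
        "status.storage.topic": f"{cluster}-status",   # connector and task status
    }


def render_properties(settings: dict) -> str:
    """Format a dict as Java-style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(render_properties(storage_topic_settings("connect-demo")))
```

Every pod in the same cluster would get identical values for these four
settings, which is why no per-pod storage is needed.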

Because Kafka Connect does not maintain any state locally, it is not dependent 
on a specific IP address or hostname. As a result, it is best to deploy Kafka 
Connect using a *Kubernetes Deployment* rather than a *StatefulSet*, since 
Deployments are better suited for stateless applications and provide more 
flexibility with scaling and rolling updates.

Additionally, it is common practice to expose the Kafka Connect REST API via an 
*Ingress*, allowing external systems to submit and manage connectors.
We have deployed several instances of this as Deployments for our use case,
based on the repo below (for your reference):
https://github.com/ibm-messaging/kafka-connect-mq-source
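
To illustrate submitting a connector through the REST API mentioned above, here
is a minimal Python sketch that builds a POST /connectors request. The base URL
"http://connect.example.com", the connector name, and the config values are
hypothetical placeholders; the request is only constructed, not sent:

```python
import json
import urllib.request


def build_create_connector_request(base_url: str, name: str,
                                   config: dict) -> urllib.request.Request:
    """Build (but do not send) a request that creates a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_connector_request(
        "http://connect.example.com",  # hypothetical Ingress host
        "demo-file-source",            # hypothetical connector name
        {
            # Illustrative connector; substitute the class you actually deploy.
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": "/tmp/demo.txt",
            "topic": "demo-topic",
        },
    )
    print(request.get_method(), request.full_url)
    # To send it for real: urllib.request.urlopen(request)
```

Because any worker can accept the request, the Ingress can route to whichever
pod is available.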

Thanks,
Vignesh

On Sun, Jun 15, 2025 at 12:12 AM Prateek Kohli <prateekkohli2...@gmail.com>
wrote:

> Hi All,
>
> I'm building a custom Docker image for Kafka Connect and planning to
> run it on Kubernetes. I'm a bit stuck on whether I should use a
> Deployment or a StatefulSet.
>
> From what I understand, the main difference that could affect Kafka
> Connect is the hostname/IP behaviour. With a Deployment, pod IPs and
> hostnames can change after restarts. With a StatefulSet, each pod gets
> a stable hostname (like connect-0, connect-1, etc.)
>
> My question is: Does it really matter for Kafka Connect if the pod
> IPs/hostnames change, considering it's a stateless application?
>
> Thanks
>
