I'm having some trouble figuring out the right way to run Kafka Connect in
production. We'll have multiple sink connectors that need to keep running
indefinitely with at-least-once semantics (and as little duplication as
possible), so it seems clear that we need distributed mode: offsets are
stored durably in Kafka, and we can scale out by adding more Connect worker
instances.
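For reference, here is roughly what I have in mind for the worker config in distributed mode (topic names and replication factors are illustrative, not recommendations):

```properties
# connect-distributed.properties (hypothetical values)
bootstrap.servers=broker1:9092,broker2:9092

# Workers sharing the same group.id form one Connect cluster
group.id=connect-cluster-1

# Internal topics that make offsets, configs, and status durable
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3

# Worker-wide default converters
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```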

What isn't clear is the best way to run multiple, heterogeneous connectors
in distributed mode. It looks like every Connect worker in a cluster reads
the shared config/status topics and takes on some number of tasks (and
tasks can't be pinned to specific workers). It also looks like only one key
converter and one value converter can be configured per worker. So if I
need two different conversion strategies, I'd need to either write a custom
converter that can tell the formats apart, or run multiple Connect
clusters, each with its own set of config, offset, and status topics.
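For concreteness, connectors in distributed mode are submitted as JSON to the workers' REST API; a sketch of a sink connector config (connector name, class, and topic are made up) might be:

```json
{
  "name": "my-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "4",
    "topics": "orders",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter"
  }
}
```

My understanding is that newer Connect versions (0.10.1+, per KIP-75) accept `key.converter`/`value.converter` overrides in this per-connector config, which would sidestep the one-converter-per-worker limit, but I'd like confirmation.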

Is that right? In the worst case, I'd need another set of N distributed
workers per sink/source, which ends up being a lot of topics to manage.
What does a real-world Connect topology look like?
