I am looking into Kafka Connect and the Confluent HDFS Sink Connector.

The goal is to save data from various topics to HDFS.
We have at least two different data formats in Kafka: raw data (JSON) that
we want to save as SequenceFiles, and normalized data (Protobuf) that we
want to save as Parquet.

(I understand that Confluent expects Avro to be used, but I succeeded in
writing custom converters and RecordWriters that work fine without Avro and
Schema Registry.)
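
For what it's worth, the JSON side is basically a schema-less pass-through
converter, roughly like this (simplified sketch; class and package names
are just illustrative):

    import java.nio.charset.StandardCharsets;
    import java.util.Map;

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaAndValue;
    import org.apache.kafka.connect.storage.Converter;

    // Sketch of a schema-less converter: hands the raw JSON string to the
    // connector without touching Avro or Schema Registry.
    public class MyCustomJsonConverter implements Converter {

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            // Nothing to configure in this sketch.
        }

        @Override
        public byte[] fromConnectData(String topic, Schema schema, Object value) {
            return value == null
                    ? null
                    : ((String) value).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public SchemaAndValue toConnectData(String topic, byte[] value) {
            if (value == null) {
                return SchemaAndValue.NULL;
            }
            // No schema attached: the custom RecordWriter downstream decides
            // how to serialize the record (SequenceFile in our case).
            return new SchemaAndValue(Schema.OPTIONAL_STRING_SCHEMA,
                    new String(value, StandardCharsets.UTF_8));
        }
    }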

Question: Is there a specific reason that key.converter and value.converter
are defined per Kafka Connect cluster (in the worker config) and not per
connector?

This means that all the data in Kafka (in all the topics) has to be stored
in the same format, or I will need two different Connect clusters: one with
value.converter = MyCustomJsonConverter and another with
value.converter = MyCustomProtobufConverter.
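
In other words, today the converters sit in the worker properties and apply
to every connector on that cluster, roughly like this (the custom class
names are just placeholders for our converters):

    # connect-distributed.properties (worker-level, shared by all connectors)
    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=com.example.MyCustomJsonConverter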

It gets even worse with Protobuf: every topic has a different Protobuf
schema and therefore needs a different converter, and running a dozen Kafka
Connect clusters sounds like a very bad option.

Wouldn't it make more sense to have key.converter and value.converter
defined at the level of a specific connector?
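
Something like this, purely hypothetical (such an override does not exist
as far as I can tell, and the connector/topic names are made up), is what I
have in mind:

    name=hdfs-sink-protobuf
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    topics=normalized-events
    # hypothetical per-connector override instead of the worker-wide setting
    value.converter=com.example.MyCustomProtobufConverter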

Any other suggestions?
