I’m creating a custom Kafka Connect source connector, and I’m running into a situation for which Kafka Connect doesn’t seem to provide a solution out of the box. I thought I’d first post to the users list in case I’m just missing a feature that’s already there.
My connector’s SourceTask implementation reads a relational database transaction log. That log contains schema changes and row changes, and each row change includes a reference to the table and the row values. As the task processes the log, it therefore has to apply any schema changes it encounters to adjust how it converts subsequent row changes into Kafka source records. Should the task stop and be restarted elsewhere, the new task instance can continue reading the transaction log where the earlier one left off only if it can recover the schema state accumulated by the earlier task.

While I can certainly use a custom solution to store this state somewhere, it seems that other connectors might benefit if Kafka Connect included something out of the box. This accumulated state (along with its history, that is, the source offset at which each state change occurred) seems like a perfect fit for storing in a Kafka topic.

Does Kafka Connect already have a mechanism for tasks to store and recover arbitrary state? If not, is there interest in adding this capability to Kafka Connect? (If there is interest, then perhaps the dev list is a better venue.)

Best regards,
Randall Hauch
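
P.S. To make the idea a bit more concrete, here is a rough sketch of the kind of state I have in mind. Everything in it is hypothetical: SchemaHistory, record, and recover are names I made up for illustration and are not part of Kafka Connect, and the in-memory list merely stands in for what would presumably be an append-only Kafka topic. Each schema change is stored with the transaction-log offset at which it was seen, and a restarted task rebuilds its schema state by replaying the changes up to its last committed offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in Kafka Connect. */
class SchemaHistory {

    /** One schema change and the transaction-log offset at which it was read. */
    private static final class Entry {
        final long offset;
        final String table;
        final String columns;

        Entry(long offset, String table, String columns) {
            this.offset = offset;
            this.table = table;
            this.columns = columns;
        }
    }

    // Assumption: stands in for a durable, append-only store such as a Kafka topic.
    private final List<Entry> log = new ArrayList<>();

    /** Appends a schema change observed at the given transaction-log offset. */
    void record(long offset, String table, String columns) {
        log.add(new Entry(offset, table, columns));
    }

    /**
     * Rebuilds the schema state as of the given offset by replaying, in order,
     * every recorded change whose offset is at or before it.
     */
    Map<String, String> recover(long upToOffset) {
        Map<String, String> schemas = new HashMap<>();
        for (Entry e : log) {
            if (e.offset <= upToOffset) {
                // Later changes to the same table overwrite earlier ones.
                schemas.put(e.table, e.columns);
            }
        }
        return schemas;
    }
}
```

A restarted task would call recover with the last source offset it committed, and then keep calling record as it encounters further schema changes in the log.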