Mark,

There are definitely limitations to using JDBC for change data capture. A database-specific implementation, especially one that can read directly off the database's log, will handle more situations like this. Cases like the one you describe are difficult to address efficiently when working only with simple queries.
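To make the limitation concrete, here's a minimal sketch (not the connector's actual code) of how a poller that only tracks the highest ID it has seen can permanently miss a row when transactions commit out of order. It uses sqlite3 and commits the rows sequentially to stand in for the two concurrent connections in your example:

```python
# Sketch: a poller tracking "max id seen" misses a row committed late
# with a smaller id. Uses only the Python standard library.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

offset = 0      # last id the poller believes it has handled
captured = []   # rows delivered downstream

def poll():
    """Incremental query of the kind an id-based poller would issue."""
    global offset
    rows = db.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (offset,),
    ).fetchall()
    captured.extend(rows)
    if rows:
        offset = rows[-1][0]

# Transaction 1 allocated id 1 but has not committed yet; transaction 2
# allocated id 2 and commits first.
db.execute("INSERT INTO events (id, payload) VALUES (2, 'from txn 2')")
db.commit()

poll()  # poller sees id 2 and advances its offset to 2

# Now transaction 1 finally commits its row with the smaller id.
db.execute("INSERT INTO events (id, payload) VALUES (1, 'from txn 1')")
db.commit()

poll()  # WHERE id > 2 skips id 1 forever

print(captured)  # [(2, 'from txn 2')] -- id 1 was silently missed
print(offset)    # 2
```

The same shape of query with a timestamp column in the WHERE clause misses rows in exactly the same way.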
The JDBC connector offers a few different modes for handling incremental queries. One of them uses both a timestamp and a unique ID, which is more robust to issues like these. However, even with both, you can still construct variants that cause the problem you describe. You also have the option of using a custom query, which might help if you can do something smarter by making assumptions about your table. For now, though, custom queries are of limited use for incremental loading, since the connector doesn't provide a way to track offset columns with them. I'd like to improve the support for this in the future, but at some point it starts making sense to look at database-specific connectors. (By the way, this gets even messier once you start thinking about the variety of isolation levels people may be using...)

-Ewen

P.S. Where to ask these questions is a bit confusing since Connect is part of Kafka. In general, for specific connectors I'd suggest asking on the corresponding mailing list for the project, which in the case of the JDBC connector would be the Confluent Platform mailing list here:
https://groups.google.com/forum/#!forum/confluent-platform

On Wed, Dec 16, 2015 at 5:27 AM, Mark Drago <markdr...@gmail.com> wrote:
> I had asked this in a github issue, but I'm reposting here to try and get
> an answer from a wider audience.
>
> Has any thought gone into how kafka-connect-jdbc will be impacted by SQL
> transactions committing IDs and timestamps out of order? Let me give an
> example with two connections.
>
> 1: begin transaction
> 1: insert (get id 1)
> 2: begin transaction
> 2: insert (get id 2)
> 2: commit (recording id 2)
> kafka-connect-jdbc runs and thinks it has handled everything through id 2
> 1: commit (recording id 1)
>
> This would result in kafka-connect-jdbc missing id 1. The same thing
> could happen with timestamps.
> I've read through some of the kafka-connect-jdbc code and I think it may
> be susceptible to this problem, but I haven't run it or verified that it
> would be an issue. Has this come up before? Are there plans to deal with
> this situation?
>
> Obviously something like bottled-water for postgresql would handle this
> nicely, as it would get the changes once they're committed.
>
> Thanks for any insight,
>
> Mark.
>
> Original github issue:
> https://github.com/confluentinc/kafka-connect-jdbc/issues/27

--
Thanks,
Ewen
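For anyone following along, the timestamp+incrementing mode mentioned above is configured along these lines. This is a sketch: the column names are placeholders for your own schema, and you should double-check the property names against the connector's documentation for the version you're running.

```properties
name=jdbc-source-example
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://localhost:5432/mydb
mode=timestamp+incrementing
# Placeholder column names -- substitute your table's columns.
timestamp.column.name=updated_at
incrementing.column.name=id
topic.prefix=jdbc-
```

Using both columns narrows the window for missed rows compared to either alone, but as noted above it still doesn't fully close the race with out-of-order commits.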