Hi,
We are trying to write a Kafka Connect connector for MongoDB. The issue is
that MongoDB does not provide the entire changed document for update
operations; it provides only the modified fields.
If Kafka allows custom log compaction, then it is possible to eventually
merge an entire document and su
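One way to picture the problem: if the connector keeps a local cache of full documents, each partial oplog update can be merged into the cached copy so a full after-image can be emitted. This is a minimal sketch under that assumption; the `$set`/`$unset` patch shape mirrors MongoDB update operators, but `apply_partial_update` and the dict cache are illustrative, not any real connector's API.

```python
# Sketch (assumption: oplog update entries carry only a {"$set": ...} /
# {"$unset": ...} style patch, and we keep a local cache of full documents
# keyed by _id). Merging the patch into the cached copy reconstructs the
# full document so a complete after-image can be written to Kafka.

def apply_partial_update(cache, doc_id, patch):
    """Merge the modified fields from an oplog update into cache[doc_id]."""
    doc = cache.setdefault(doc_id, {"_id": doc_id})
    for field, value in patch.get("$set", {}).items():
        doc[field] = value                    # apply modified fields
    for field in patch.get("$unset", {}):
        doc.pop(field, None)                  # drop removed fields
    return doc

cache = {1: {"_id": 1, "name": "sunny", "views": 10}}
full = apply_partial_update(cache, 1, {"$set": {"views": 11}})
# full is the reconstructed document: {"_id": 1, "name": "sunny", "views": 11}
```

The obvious catch, discussed below, is that the cache has to hold (or be able to look up) every document the oplog touches.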
Sunny,
As I said on Twitter, I'm stoked to hear you're working on a Mongo
connector! It struck me as a pretty natural source to tackle since it does
such a nice job of cleanly exposing the op log.
Regarding the problem of only getting deltas, unfortunately there is not a
trivial solution here --
Hey Ewen, how come you need to get it all in memory for approach (1)? I
guess the obvious thing to do would just be to query for the record
after-image when you get the diff--e.g. just read a batch of changes and
multi-get the final values. I don't know how bad the overhead of this would
be...batch
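The batch-and-multi-get idea above can be sketched as follows. The `collection` dict is a stand-in for the MongoDB collection (with pymongo this lookup would be a single `find({"_id": {"$in": ids}})`); the function name is hypothetical.

```python
# Sketch of "query for the after-image": read a batch of change events,
# then fetch the current value for every touched key in one round trip.

def after_images_for_batch(collection, changes):
    """Return the final (current) value for each document touched by a batch of deltas."""
    ids = {c["id"] for c in changes}            # dedupe touched keys
    return {i: collection.get(i) for i in ids}  # one "multi-get" instead of N point reads

collection = {1: {"views": 12}, 2: {"views": 3}}
batch = [{"id": 1, "op": "u"}, {"id": 1, "op": "u"}, {"id": 2, "op": "u"}]
finals = after_images_for_batch(collection, batch)
# finals maps each touched _id to its current document
```

Batching also amortizes the per-query overhead Jay is worried about: three deltas here cost one lookup for two distinct keys.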
Jay,
You can query after the fact, but you're not necessarily going to get the
same value back. There could easily be dozens of changes to the document in
the oplog so the delta you see may not even make sense given the current
state of the document. Even if you could apply the delta, you'd still
Ah, agreed. This approach is actually quite common in change capture,
though. For many use cases getting the final value is actually preferable
to getting intermediates. The exception is usually if you want to do
analytics on something like number of changes.
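The trade-off Jay describes is essentially last-write-wins compaction: collapsing the stream keeps only final values, which is fine for materializing state but destroys the intermediates that per-change analytics would need. A tiny illustration (function name is just for this example):

```python
# Sketch: collapsing a change stream to the final value per key, which is
# what log compaction (or querying the after-image) effectively gives you.
# Intermediate values are lost, so counting changes must happen *before*
# collapsing.

def compact(changes):
    """Keep only the last value seen for each key."""
    latest = {}
    for key, value in changes:
        latest[key] = value
    return latest

stream = [("a", 1), ("b", 1), ("a", 2), ("a", 3)]
finals = compact(stream)                              # {"a": 3, "b": 1}
changes_to_a = sum(1 for k, _ in stream if k == "a")  # 3 -- only visible pre-compaction
```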
On Fri, Jan 29, 2016 at 9:35 AM, Ewen
Also, most databases provide a "full logging" option that lets you capture
the whole row in the log (I know Oracle and MySQL have this), but it sounds
like Mongo doesn't yet. That would be the ideal solution.
-Jay
Not sure if this will help anything, but just throwing it out there.
The Maxwell and mypipe projects both do CDC from MySQL and support
bootstrapping. The way they do it is kind of "eventually consistent".
1) At time T1, record coordinates of the end of the binlog as of T1.
2) At time T2, do a f
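The "eventually consistent" bootstrap described above can be sketched like this. The names are illustrative, not Maxwell's or mypipe's actual API; the key property is that a row changed during the snapshot may be seen twice, but replaying the later change as an idempotent upsert converges on the correct final state.

```python
# Sketch of the eventually-consistent bootstrap:
#   1) at T1, record the current end-of-log position,
#   2) do a full scan (snapshot) of the table,
#   3) replay log entries from the recorded position over the snapshot.

def bootstrap(snapshot_rows, log, start_pos):
    state = {}
    for key, value in snapshot_rows:      # step 2: full scan
        state[key] = value
    for pos, key, value in log:           # step 3: replay from the T1 position
        if pos >= start_pos:
            state[key] = value            # idempotent upsert; duplicates are harmless
    return state

# "a" was updated while the snapshot was running, so the snapshot caught a
# stale value -- the replay fixes it.
snapshot = [("a", "old"), ("b", "b1")]
log = [(4, "a", "old"), (5, "a", "new")]
state = bootstrap(snapshot, log, start_pos=5)   # {"a": "new", "b": "b1"}
```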
Hello Everyone,
Thanks a lot for your valuable responses.
We will use an external database to store the key and Kafka offset. We
won't set any preference on which database to use; we will leave that to
the user by using a flexible data-access model like Apache MetaModel.
@James, Even for MongoDB c
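The proposed key-to-offset store could sit behind a minimal pluggable interface, so the backing database is the user's choice (the thread suggests Apache MetaModel for that abstraction layer). This is a sketch with an in-memory dict standing in for the real backend; the class and method names are hypothetical.

```python
# Sketch of the external key -> Kafka offset store. The dict is a stand-in
# for whatever database the user plugs in behind the same interface.

class OffsetStore:
    """Maps a document key to the Kafka offset of its latest record."""

    def __init__(self):
        self._table = {}

    def put(self, key, offset):
        self._table[key] = offset         # later record for the same key wins

    def get(self, key):
        return self._table.get(key)       # None if the key was never seen

store = OffsetStore()
store.put("doc-42", 1001)
store.put("doc-42", 1007)
# store.get("doc-42") now returns the latest offset, 1007
```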