MongoDB Kafka Connect driver

2016-01-26 Thread Sunny Shah
Hi, We are trying to write a Kafka Connect connector for MongoDB. The issue is that MongoDB does not provide the entire changed document for update operations; it provides only the modified fields. If Kafka allows custom log compaction, then it is possible to eventually merge an entire document and su
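
To illustrate the merge step mentioned in this message, here is a minimal sketch of applying a field-level delta to a previously known copy of a document. It assumes, purely for illustration, that a delta arrives with `$set` and `$unset` keys and dotted field paths. The function `apply_delta` and the sample data are hypothetical and are not taken from the thread or from any connector.

```python
import copy


def apply_delta(document, delta):
    """Return a new document with a field-level delta applied.

    Assumption for illustration: the delta uses "$set" and "$unset"
    keys whose field names may be dotted paths such as "address.city".
    """
    merged = copy.deepcopy(document)

    for path, value in delta.get("$set", {}).items():
        *parents, leaf = path.split(".")
        parent = merged
        for part in parents:
            # Create intermediate sub-documents as needed.
            parent = parent.setdefault(part, {})
        parent[leaf] = value

    for path in delta.get("$unset", {}):
        *parents, leaf = path.split(".")
        parent = merged
        for part in parents:
            parent = parent.get(part)
            if not isinstance(parent, dict):
                break  # Path does not exist, so there is nothing to remove.
        else:
            parent.pop(leaf, None)

    return merged


base = {"_id": 1, "name": "Ada", "address": {"city": "Pune", "zip": "411001"}}
delta = {"$set": {"address.city": "Mumbai"}, "$unset": {"address.zip": ""}}
merged = apply_delta(base, delta)
print(merged)
```

The sketch copies the input instead of mutating it, so the earlier version of the document stays available if a later delta has to be re-applied.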

Re: MongoDB Kafka Connect driver

2016-01-29 Thread Ewen Cheslack-Postava
Sunny, As I said on Twitter, I'm stoked to hear you're working on a Mongo connector! It struck me as a pretty natural source to tackle since it does such a nice job of cleanly exposing the op log. Regarding the problem of only getting deltas, unfortunately there is not a trivial solution here --

Re: MongoDB Kafka Connect driver

2016-01-29 Thread Jay Kreps
Hey Ewen, how come you need to get it all in memory for approach (1)? I guess the obvious thing to do would just be to query for the record after-image when you get the diff--e.g. just read a batch of changes and multi-get the final values. I don't know how bad the overhead of this would be...batch
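
A rough sketch of the "read a batch of changes and multi-get the final values" idea follows. A plain dictionary stands in for the database, and `multi_get` and `after_images` are hypothetical names used only for illustration. Against a real database the lookup would presumably be a single multi-key query. Note that this produces the current state of each document at lookup time, not the state right after each individual change.

```python
def multi_get(store, keys):
    """Hypothetical batched lookup; a stand-in for one multi-key query."""
    return {key: store[key] for key in keys if key in store}


def after_images(store, change_batch):
    """Turn a batch of field-level changes into current full documents.

    Each change is a (key, delta) pair. The delta itself is ignored:
    only the key is used, to look up the document as it exists now.
    """
    # Deduplicate keys while keeping first-seen order, so each document
    # is fetched only once per batch.
    keys = list(dict.fromkeys(key for key, _delta in change_batch))
    found = multi_get(store, keys)
    # Keys no longer present in the store map to None.
    return [(key, found.get(key)) for key in keys]


store = {1: {"_id": 1, "qty": 7}, 2: {"_id": 2, "qty": 3}}
batch = [
    (1, {"$set": {"qty": 5}}),
    (2, {"$set": {"qty": 3}}),
    (1, {"$set": {"qty": 7}}),
]
records = after_images(store, batch)
print(records)
```

In this example, two changes to document 1 collapse into a single record carrying its latest value, which is the trade-off of this approach.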

Re: MongoDB Kafka Connect driver

2016-01-29 Thread Ewen Cheslack-Postava
Jay, You can query after the fact, but you're not necessarily going to get the same value back. There could easily be dozens of changes to the document in the oplog so the delta you see may not even make sense given the current state of the document. Even if you can apply the delta, you'd still

Re: MongoDB Kafka Connect driver

2016-01-29 Thread Jay Kreps
Ah, agreed. This approach is actually quite common in change capture, though. For many use cases getting the final value is actually preferable to getting intermediates. The exception is usually if you want to do analytics on something like number of changes. On Fri, Jan 29, 2016 at 9:35 AM, Ewen

Re: MongoDB Kafka Connect driver

2016-01-29 Thread Jay Kreps
Also, most databases provide a "full logging" option that lets you capture the whole row in the log (I know Oracle and MySQL have this) but it sounds like Mongo doesn't yet. That would be the ideal solution. -Jay On Fri, Jan 29, 2016 at 9:38 AM, Jay Kreps wrote: > Ah, agreed. This approach is a

Re: MongoDB Kafka Connect driver

2016-01-29 Thread James Cheng
Not sure if this will help anything, but just throwing it out there. The Maxwell and mypipe projects both do CDC from MySQL and support bootstrapping. The way they do it is kind of "eventually consistent". 1) At time T1, record coordinates of the end of the binlog as of T1. 2) At time T2, do a f

Re: MongoDB Kafka Connect driver

2016-02-06 Thread Sunny Shah
Hello Everyone, Thanks a lot for your valuable responses. We will use an external database to store the key and the Kafka offset. We won't set any preference on which database to use; we will leave it to the driver's user by using a flexible data-access model like Apache MetaModel. @James, Even for MongoDB c
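
As a minimal sketch of what storing a key and a Kafka offset in an external database could look like, the example below keeps the latest offset per document key in a SQL table. SQLite is used only so that the example is self-contained; the message deliberately leaves the choice of database open. The class `OffsetIndex`, the table layout, and the sample keys are hypothetical. Topic and partition are left out as a simplifying assumption, and how the stored offset is used afterwards is not covered here.

```python
import sqlite3


class OffsetIndex:
    """Hypothetical key -> Kafka offset index backed by a SQL table."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS offsets ("
            "doc_key TEXT PRIMARY KEY, kafka_offset INTEGER NOT NULL)"
        )

    def put(self, doc_key, kafka_offset):
        # Keep only the most recent offset for each key.
        self.connection.execute(
            "INSERT OR REPLACE INTO offsets (doc_key, kafka_offset) "
            "VALUES (?, ?)",
            (doc_key, kafka_offset),
        )
        self.connection.commit()

    def get(self, doc_key):
        row = self.connection.execute(
            "SELECT kafka_offset FROM offsets WHERE doc_key = ?",
            (doc_key,),
        ).fetchone()
        return None if row is None else row[0]


index = OffsetIndex(sqlite3.connect(":memory:"))
index.put("user:1", 42)
index.put("user:1", 57)
print(index.get("user:1"), index.get("user:2"))
```

Keeping the storage behind a small interface like this is one way to let the connector's user swap in a different database.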