Hi, all, 

Do you know whether Linkedin plans to open source Lumos in the near future?

I found the answer in Qiao Lin’s post about replication from Oracle/MySQL to 
Hadoop. 

        - https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease

On the source side, ingestion can be DataBus-based or file-based. 

On the target side, Lumos rebuilds the snapshots, since Hadoop does not 
support in-place updates/deletes. 
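Because files in HDFS are immutable, rebuilding a snapshot amounts to merging the previous snapshot with the change deltas so that the latest version of each row wins and deleted rows drop out. A minimal sketch of that merge logic, with hypothetical field names (key/seq/op) rather than Lumos's actual format:

```python
# Sketch: rebuild a full snapshot by merging the previous snapshot with
# change records, since rows in HDFS cannot be updated in place.
# The record layout (key, seq, op, row) is illustrative, not Lumos's.

def rebuild_snapshot(snapshot, deltas):
    """snapshot: {key: row}; deltas: change records, each a dict with
    'key', 'seq' (commit order), 'op', and 'row' (None for deletes)."""
    merged = dict(snapshot)
    # Apply deltas in commit order so the latest change wins.
    for d in sorted(deltas, key=lambda d: d['seq']):
        if d['op'] == 'delete':
            merged.pop(d['key'], None)
        else:  # insert and update both become an upsert
            merged[d['key']] = d['row']
    return merged

base = {1: {'name': 'a'}, 2: {'name': 'b'}}
changes = [
    {'key': 2, 'seq': 10, 'op': 'update', 'row': {'name': 'b2'}},
    {'key': 3, 'seq': 11, 'op': 'insert', 'row': {'name': 'c'}},
    {'key': 1, 'seq': 12, 'op': 'delete', 'row': None},
]
print(rebuild_snapshot(base, changes))
# {2: {'name': 'b2'}, 3: {'name': 'c'}}
```

In practice the snapshot and deltas would be large files processed by a MapReduce-style join on the key, but the merge rule is the same.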

The slides about Lumos:
        http://www.slideshare.net/Hadoop_Summit/th-220p230-cramachandranv1
The talk about Lumos: 
        https://www.youtube.com/watch?v=AGlRjlrNDYk

Event publishing is different from database replication. Kafka is used for 
publishing change events, and perhaps also for shipping changes recorded in 
files. 
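For illustration, a change-publishing pipeline typically wraps each changed row in a small envelope (source table, operation, commit position) before it goes to Kafka. A minimal sketch with an invented envelope format, not any standard LinkedIn schema:

```python
import json

# Sketch of a change-event envelope for publishing row changes to Kafka.
# The field names here are illustrative, not a standard format.
def make_change_event(table, op, key, row, scn):
    return json.dumps({
        'table': table,  # source table name
        'op': op,        # 'insert' | 'update' | 'delete'
        'key': key,      # primary key of the changed row
        'row': row,      # full row image (None for deletes)
        'scn': scn,      # commit position, for ordering and replay
    }, sort_keys=True)

event = make_change_event('orders', 'update', 42, {'status': 'shipped'}, 10017)

# With a client such as kafka-python, this string would then be sent as
# the message value, keyed by the row's primary key so that all changes
# to one row land in the same partition, e.g.:
#   producer.send('orders.changes', key=b'42', value=event.encode())
```

Keying by primary key preserves per-row ordering across partitions, which is usually the minimum guarantee a downstream consumer needs.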

Thanks, 

Xiao Li

On Mar 17, 2015, at 7:26 PM, Arya Ketan <ketan.a...@gmail.com> wrote:

> AFAIK, LinkedIn uses Databus to do the same. Aesop is built on top of
> Databus, extending its beautiful capabilities to MySQL and HBase.
> On Mar 18, 2015 7:37 AM, "Xiao" <lixiao1...@gmail.com> wrote:
> 
>> Hi, all,
>> 
>> Do you know how the LinkedIn team publishes changed rows from Oracle to
>> Kafka? I believe they already know the problem space very well.
>> 
>> Using triggers? Or directly parsing the log? Or using any Oracle
>> GoldenGate interfaces?
>> 
>> Any lessons learned, or a standard message format? Could the LinkedIn
>> people share them with us? I believe it would help us a lot.
>> 
>> Thanks,
>> 
>> Xiao Li
>> 
>> 
>> On Mar 17, 2015, at 12:26 PM, James Cheng <jch...@tivo.com> wrote:
>> 
>>> This is a great set of projects!
>>> 
>>> We should put this list of projects on a site somewhere so people can
>> more easily see and refer to it. These aren't Kafka-specific, but most seem
>> to be "MySQL CDC." Does anyone have a place where they can host a page?
>> Preferably a wiki, so we can keep it up to date easily.
>>> 
>>> -James
>>> 
>>> On Mar 17, 2015, at 8:21 AM, Hisham Mardam-Bey <
>> hisham.mardam...@gmail.com> wrote:
>>> 
>>>> Pretty much a hijack / plug as well (=
>>>> 
>>>> https://github.com/mardambey/mypipe
>>>> 
>>>> "MySQL binary log consumer with the ability to act on changed rows and
>>>> publish changes to different systems with emphasis on Apache Kafka."
>>>> 
>>>> Mypipe currently encodes events using Avro before pushing them into
>>>> Kafka and is Avro schema repository aware. The project is young, and
>>>> patches for improvements are appreciated (=
>>>> 
>>>> On Mon, Mar 16, 2015 at 10:35 PM, Arya Ketan <ketan.a...@gmail.com>
>> wrote:
>>>> 
>>>>> Great work.
>>>>> Sorry for kinda hijacking this thread, but I thought that we had built
>>>>> something on a MySQL bin-log event propagator and wanted to share it.
>>>>> You guys can also look into Aesop ( https://github.com/Flipkart/aesop ).
>>>>> It's a change propagation framework. It has relays which listen to the
>>>>> bin logs of MySQL and keep track of SCNs, and consumers which can then
>>>>> transform/map (or interpret as-is) the bin-log events to a destination.
>>>>> Consumers also keep track of SCNs, and a slow consumer can go back to a
>>>>> previous SCN if it wants to re-listen to events (similar to Kafka's
>>>>> consumer view).
>>>>> 
>>>>> All the producers/consumers are extensible, and you can write your own
>>>>> custom consumer and feed the data to it.
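The SCN tracking and rewind behavior described above is much like seeking a Kafka consumer back to an earlier offset. A minimal stand-alone sketch of the idea (illustrative names only, not Aesop's actual API):

```python
# Sketch: a consumer that checkpoints its own SCN (system change number)
# so a slow consumer can rewind and re-listen to events, similar to
# seeking a Kafka consumer to an earlier offset. Not Aesop's real API.

class RelayLog:
    """In-memory stand-in for a relay's buffered bin-log events."""
    def __init__(self):
        self.events = []  # list of (scn, payload), ascending by scn

    def append(self, scn, payload):
        self.events.append((scn, payload))

    def read_from(self, scn):
        return [e for e in self.events if e[0] >= scn]

class Consumer:
    def __init__(self, relay):
        self.relay = relay
        self.scn = 0   # last processed SCN (the checkpoint)
        self.seen = []

    def poll(self):
        for scn, payload in self.relay.read_from(self.scn + 1):
            self.seen.append(payload)
            self.scn = scn  # checkpoint after processing

    def rewind(self, scn):
        self.scn = scn - 1  # next poll re-reads starting at this SCN

relay = RelayLog()
for i, p in enumerate(['a', 'b', 'c'], start=1):
    relay.append(i, p)

c = Consumer(relay)
c.poll()       # processes 'a', 'b', 'c'; checkpoint is now SCN 3
c.rewind(2)
c.poll()       # re-processes 'b' and 'c' from SCN 2 onward
```

A real relay would stream events rather than hold them in memory, but the checkpoint-and-rewind contract is the same.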
>>>>> 
>>>>> Common use-cases:
>>>>> a) Archive MySQL-based data into, say, HBase
>>>>> b) Move MySQL-based data to, say, a search store for serving reads.
>>>>> 
>>>>> It has a decent (not an awesome :) ) console too, which gives a nice
>>>>> human-readable view of where the producers and consumers are.
>>>>> 
>>>>> Currently supported producers are MySQL bin logs and HBase WAL edits.
>>>>> 
>>>>> 
>>>>> Further insights/reviews/feature reqs/pull reqs/advice are all
>>>>> welcome.
>>>>> 
>>>>> --
>>>>> Arya
>>>>> 
>>>>> On Tue, Mar 17, 2015 at 1:48 AM, Gwen Shapira <gshap...@cloudera.com>
>>>>> wrote:
>>>>> 
>>>>>> Really really nice!
>>>>>> 
>>>>>> Thank you.
>>>>>> 
>>>>>> On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard <
>> p...@spootnik.org
>>>>>> 
>>>>>> wrote:
>>>>>>> Hi kafka,
>>>>>>> 
>>>>>>> I just wanted to mention I published a very simple project which can
>>>>>>> connect as a MySQL replication client and stream replication events to
>>>>>>> Kafka: https://github.com/pyr/sqlstream
>>>>>>> 
>>>>>>> When you don't have control over an application, it can provide a
>>>>>>> simple way of consolidating SQL data in Kafka.
>>>>>>> 
>>>>>>> This is an early release and there are a few caveats (mentioned in
>>>>>>> the README), mostly the poor partitioning, which I'm going to improve
>>>>>>> quickly, and the reconnection strategy, which doesn't try to keep
>>>>>>> track of binlog position. Other than that, it should work as
>>>>>>> advertised.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> - pyr
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Hisham Mardam-Bey
>>>> http://hisham.cc/
>>> 
>> 
>> 
