Hi Tamas,

Thanks a lot for your suggestion! I will also investigate this one later.
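For reference, here is a minimal sketch of what consuming Maxwell's JSON
change stream from Kafka with Spark Structured Streaming might look like.
The broker address, topic name, and schema fields are assumptions, and
only a few fields of Maxwell's output format are modeled:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    object CdcConsumer {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("cdc-consumer").getOrCreate()

        // Maxwell emits one JSON document per row change; only a few of
        // its fields are modeled here.
        val changeSchema = new StructType()
          .add("database", StringType)
          .add("table", StringType)
          .add("type", StringType) // insert / update / delete
          .add("ts", LongType)
          .add("data", MapType(StringType, StringType))

        val changes = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumption
          .option("subscribe", "maxwell")                      // assumption
          .load()
          .select(from_json(col("value").cast("string"), changeSchema).as("change"))
          .select("change.*")

        // From here the stream can be windowed by event time, transformed,
        // and written on to another sink.
        val query = changes.writeStream
          .format("console")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }

Once the change events are in Kafka, the event-time windowing discussed
below can be applied to them as well.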
Best regards,
Yang

2017-01-03 12:38 GMT+01:00 Tamas Szuromi <tamas.szur...@odigeo.com>:

> You can also try https://github.com/zendesk/maxwell
>
> Tamas
>
> On 3 January 2017 at 12:25, Amrit Jangid <amrit.jan...@goibibo.com> wrote:
>
>> You can try out *debezium*: https://github.com/debezium. It reads data
>> from binlogs, provides structure, and streams it into Kafka.
>>
>> Now Kafka can be your new source for streaming.
>>
>> On Tue, Jan 3, 2017 at 4:36 PM, Yuanzhe Yang <yyz1...@gmail.com> wrote:
>>
>>> Hi Hongdi,
>>>
>>> Thanks a lot for your suggestion. The data is truly immutable and the
>>> table is append-only, but there are actually different databases
>>> involved, so the only feature they all have in common, and that I can
>>> depend on, is JDBC...
>>>
>>> Best regards,
>>> Yang
>>>
>>> 2016-12-30 6:45 GMT+01:00 任弘迪 <ryan.hd....@gmail.com>:
>>>
>>>> Why not sync the binlog of MySQL (hopefully the data is immutable and
>>>> the table is append-only), send the log through Kafka, and then
>>>> consume it with Spark Streaming?
>>>>
>>>> On Fri, Dec 30, 2016 at 9:01 AM, Michael Armbrust
>>>> <mich...@databricks.com> wrote:
>>>>
>>>>> We don't support this yet, but I've opened this JIRA as it sounds
>>>>> generally useful: https://issues.apache.org/jira/browse/SPARK-19031
>>>>>
>>>>> In the meantime you could try implementing your own Source, but that
>>>>> is pretty low-level and not yet a stable API.
>>>>>
>>>>> On Thu, Dec 29, 2016 at 4:05 AM, "Yuanzhe Yang (杨远哲)"
>>>>> <yyz1...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Thank you for your contributions in bringing us new technologies.
>>>>>>
>>>>>> I don't want to waste your time, so before writing to you I googled
>>>>>> and checked Stack Overflow and the mailing list archive with the
>>>>>> keywords "streaming" and "jdbc", but I was not able to find a
>>>>>> solution to my use case. I hope I can get some clarification from
>>>>>> you.
>>>>>>
>>>>>> The use case is quite straightforward: I need to harvest a
>>>>>> relational database via JDBC, do something with the data, and store
>>>>>> the result in Kafka. I am stuck at the first step, and the
>>>>>> difficulty is as follows:
>>>>>>
>>>>>> 1. The database is too large to ingest with one thread.
>>>>>> 2. The database is dynamic, and time-series data comes in constantly.
>>>>>>
>>>>>> The ideal workflow would be for multiple workers to process
>>>>>> partitions of the data incrementally according to a time window. For
>>>>>> example, the processing starts from the earliest data, with each
>>>>>> batch containing data for one hour. If the data ingestion speed is
>>>>>> faster than the data production speed, then eventually the entire
>>>>>> database will be harvested, the workers will start to "tail" the
>>>>>> database for new data, and the processing becomes real time.
>>>>>>
>>>>>> With Spark SQL I can ingest data from a JDBC source with partitions
>>>>>> divided by time windows, but how can I dynamically increment the
>>>>>> time windows during execution? Assume two workers are ingesting data
>>>>>> for 2017-01-01 and 2017-01-02; the one that finishes first should
>>>>>> get the next task, for 2017-01-03. But I have not been able to find
>>>>>> out how to increment those values during execution.
>>>>>>
>>>>>> Then I looked into Structured Streaming. It looks much more
>>>>>> promising, because window operations based on event time are
>>>>>> supported in streaming, which could be the solution to my use case.
>>>>>> However, in the documentation and code examples I did not find
>>>>>> anything related to streaming data from a growing database. Is there
>>>>>> anything I can read to achieve my goal?
>>>>>>
>>>>>> Any suggestion is highly appreciated. Thank you very much, and have
>>>>>> a nice day.
>>>>>>
>>>>>> Best regards,
>>>>>> Yang
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>> --
>> Regards,
>> Amrit
>> Data Team
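For completeness, a rough sketch of the "advancing time window" workflow
from the original question: a driver-side loop that repeatedly issues a
bounded, partitioned JDBC read, publishes the batch to Kafka, and then
moves the window forward. The table name, the id and event_time columns,
the credentials, and the Kafka settings are assumptions, and writing a
batch DataFrame to Kafka this way requires a Spark version with the batch
Kafka sink (2.2 or later):

    import java.text.SimpleDateFormat
    import java.util.{Date, Properties}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, struct, to_json}

    object IncrementalJdbcIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("jdbc-ingest").getOrCreate()

        val url = "jdbc:mysql://db-host:3306/mydb" // assumption
        val table = "events"                       // assumption
        val props = new Properties()
        props.setProperty("user", "reader")        // assumption
        props.setProperty("password", "secret")    // assumption

        val hourMs = 3600 * 1000L
        val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
        // Start from the earliest data (assumed date).
        var windowStart = fmt.parse("2017-01-01 00:00:00").getTime

        while (true) {
          // Split the one-hour window into four predicates; each predicate
          // becomes one JDBC partition, so four workers read in parallel.
          val step = hourMs / 4
          val predicates = (0 until 4).map { i =>
            val lo = fmt.format(new Date(windowStart + i * step))
            val hi = fmt.format(new Date(windowStart + (i + 1) * step))
            s"event_time >= '$lo' AND event_time < '$hi'"
          }.toArray

          val batch = spark.read.jdbc(url, table, predicates, props)

          // Assumes an "id" column to use as the Kafka message key.
          batch
            .select(col("id").cast("string").as("key"),
                    to_json(struct(batch.columns.map(col): _*)).as("value"))
            .write
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092") // assumption
            .option("topic", "events-raw")                       // assumption
            .save()

          // Advance the window; once it catches up with the present, the
          // loop effectively "tails" the table for new data.
          windowStart += hourMs
          while (windowStart + hourMs > System.currentTimeMillis()) {
            Thread.sleep(60 * 1000)
          }
        }
      }
    }

This only sketches the workaround discussed in the thread; built-in
support is tracked in SPARK-19031, and a custom Source, as Michael
suggests, would be the more principled route.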