Re: Best practice to process DB stored log (is Flink the right choice?)

Piotr Nowojski Mon, 17 Jun 2019 06:04:22 -0700

Hi,

Those are good questions.


> A datastream to connect to a table is available?

Which table and which database system do you mean? You can check the list of 
connectors that Flink provides in the documentation. For reading from a 
relational DB (for example via JDBC), there is some background here: 
https://stackoverflow.com/questions/48162464/how-to-read-data-from-relational-database-in-apache-flink-streaming
 

> Is Flink an optimal option for that rather simple processing?

It depends on many things. What you have described could easily be done with a 
trivial Python script; however, you would have to answer a couple of questions 
for yourself:

- what is the scale at which you would like to operate? Would your computation 
need to be distributed across multiple machines in the foreseeable future?
- do you care about reliability? What should happen in case of failures? Do you 
need High Availability?
- could you have more use cases/requirements in the future?
- do you care about at-least-once or exactly-once processing guarantees?
- do you care if you lose your computation state in case of a failure?
- how do you want to deploy your job? (Flink provides out-of-the-box integration 
with many systems like Mesos, Yarn etc…)
- will you need to integrate with some other external systems for which Flink 
has built-in support (like S3 file system, Kafka, Kinesis, …)?
- do you care about monitoring your job? (Flink has built-in metrics)
- …

If you do not care about those things and you only need to process a small 
number of records per second, then Flink might be overkill. If you do care, or 
if you are not sure, then I would encourage you to read up on the points 
mentioned above to make up your mind.
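
To give an idea of what such a trivial script could look like, here is a 
minimal sketch of the aggregation you described. It assumes the rows arrive 
ordered by timestamp as (timestamp, status) pairs; the names StatusRun and 
aggregate_runs are made up for illustration, and fetching the rows from your 
database is left out entirely.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class StatusRun(NamedTuple):
    status: int
    start: datetime       # timestamp of the first row with this status
    duration: timedelta   # time until the first row with a different status


def aggregate_runs(rows: Iterable[Tuple[datetime, int]]) -> Iterator[StatusRun]:
    """Collapse consecutive rows with the same status into one record.

    Assumes `rows` is ordered by timestamp. The last run is not emitted,
    because no row with a different status has been seen yet.
    """
    current_status: Optional[int] = None
    run_start: Optional[datetime] = None
    for ts, status in rows:
        if status == current_status:
            continue  # still inside the same run
        if current_status is not None:
            # The status changed: close the previous run.
            yield StatusRun(current_status, run_start, ts - run_start)
        current_status, run_start = status, ts


if __name__ == "__main__":
    t0 = datetime(2019, 6, 15, 12, 0, 0)
    sample = [
        (t0, 0),
        (t0 + timedelta(seconds=10), 0),
        (t0 + timedelta(seconds=30), 1),
        (t0 + timedelta(seconds=45), 2),
    ]
    for run in aggregate_runs(sample):
        print(run.status, run.start.isoformat(), run.duration.total_seconds())
```

Note that this keeps its state (the current run) only in memory, so it is lost 
if the process crashes. That is exactly the kind of concern from the list above 
that would push you towards Flink.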

Piotrek

> On 15 Jun 2019, at 18:27, Stefano Lissa <sato...@gmail.com> wrote:
> 
> Hi,
> surely a real newbie question, but Flink sounds like the best choice. I 
> have a log continuously added to a database table where a machine status is 
> stored with a timestamp (the status is 0, 1, 2).
> 
> What I need is to process this status and produce another stream where a 
> sequence of status X is aggregated producing a new record with the first 
> status X timestamp found in the input stream and the time delta until a new 
> status different from X is seen.
> 
> A datastream to connect to a table is available? I've tried to find something 
> in the documentation, but not sure if I searched in the right place.
> 
> Is Flink an optimal option for that rather simple processing?
> 
> Thank you, Stefano.
