Hi,

it is true that there is no dedicated machine learning library for Flink. Flink is a general data processing framework, and it allows you to embed any available algorithm library within user-defined functions.

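Flink's focus is on stream processing, and there are not many dedicated stream processing algorithms out there. Usually, people run batch jobs to train models and then just evaluate the trained model in independent parallel operator instances. Because each instance scores its records on its own, the evaluation scales with the parallelism of the job.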
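As a rough illustration of that pattern, here is a minimal sketch in plain Java with no Flink dependency. All names in it (ModelScoringSketch, LinearModel, ScoringFunction, scoreAll) and the model weights are made up for this example. The ScoringFunction class only stands in for a Flink user-defined function such as a MapFunction, and the Java parallel stream only stands in for Flink's parallel operator instances, so please read it as a sketch of the idea and not as actual Flink code.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ModelScoringSketch {

    /** Hypothetical pre-trained linear model; in practice the weights would come from an offline batch training job. */
    static final class LinearModel {
        private final double[] weights;
        private final double bias;

        LinearModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        /** Computes bias + sum(weights[i] * features[i]). */
        double predict(double[] features) {
            double sum = bias;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    /** Stand-in for a map-style user-defined function: it wraps the model and scores one record at a time. */
    static final class ScoringFunction implements Function<double[], Double> {
        private final LinearModel model;

        ScoringFunction(LinearModel model) {
            this.model = model;
        }

        @Override
        public Double apply(double[] features) {
            return model.predict(features);
        }
    }

    /** Scores every record independently; the parallel stream mimics independent parallel instances. */
    static List<Double> scoreAll(List<double[]> records, LinearModel model) {
        ScoringFunction scorer = new ScoringFunction(model);
        return records.parallelStream().map(scorer).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up weights and bias, standing in for the result of a batch training job.
        LinearModel model = new LinearModel(new double[] {2.0, -1.0}, 0.5);
        List<double[]> records = List.of(new double[] {1.0, 1.0}, new double[] {3.0, 2.0});
        System.out.println(scoreAll(records, model)); // prints [1.5, 4.5]
    }
}
```

The point of the sketch is that scoring one record never needs to look at another record, so the work is not bound to a single instance. In a Flink job the same model call would sit inside the user-defined function, and every parallel instance of that operator would hold its own copy of the model.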

There are some ML efforts going on (see the two links below), and I'm sure there will be more in the future. But for now, the community focuses on developing a very good streaming runtime core.

https://github.com/alibaba/Alink

https://www.ververica.com/blog/flink-for-online-machine-learning-and-real-time-processing-at-weibo

I hope this helps a bit.

Regards,
Timo


On 31.01.21 06:01, Bilinmek Istemiyor wrote:
Hello

I am a complete newbie and I need help. I am evaluating Flink for my academic studies and have been reading the documentation. I have a bit of experience with Apache Spark, and my question is based on that experience.

In Spark, there is a machine learning library embedded in the framework. To the best of my knowledge, the library is aware of the RDD data structure, and the machine learning algorithms benefit from cluster processing. I have read about Flink's cluster capabilities, but I have not seen a machine learning library for Flink. I have seen some references to a machine learning library for Flink in Google searches, but they are linked to older versions of Flink. It seems the machine learning library has been dropped from Flink in the latest releases.

My questions are:

1. Is it true that there is no dedicated machine learning library for Flink, or am I missing something?
2. If there is no dedicated machine learning library for Flink, what are my options? Can I use any library that has a Scala or Java API?
3. If I use an external machine learning library, how will this impact Flink's cluster processing? Does the processing of the algorithms become bound to one Flink instance? How can the algorithm be scaled across multiple machines?

I appreciate any response. Please answer gently, as if talking to a kid. I am really a newbie.

Thanks in advance.


