Re: Flink and Spark

Márton Balassi Wed, 24 Dec 2014 15:02:25 -0800

Dear Samarth,

Besides the discussions you have mentioned [1], I can recommend one of our
recent presentations [2], especially the section on what distinguishes Flink
(from slide 16 onward).

This is a difficult question to answer in general: both systems are evolving
rapidly, so any answer can become outdated quite fast. However, there are
fundamental design features that are highly unlikely to change. For example,
Spark uses "true" batch processing, meaning that intermediate results are
materialized (mostly in memory) as RDDs. Flink's engine is internally more
like a streaming system: it forwards results to the next operator as soon as
they are produced. The latter can yield performance benefits for more complex
jobs. Flink also gives you a query optimizer, spills gracefully to disk when
the system runs out of memory, and has some cool features around
serialization. For performance numbers and some more insight, please check
out the presentation [2], and do not hesitate to post a follow-up mail here
if you come across something unclear or extraordinary.
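
To make the batch vs. pipelined distinction a bit more concrete, here is a
toy sketch in plain Python. It is only an illustration of the two execution
styles, not how Spark or Flink is actually implemented; the function names
(run_materialized, run_pipelined) and the log format are made up for this
example.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Stage = Callable[[int], int]
Log = List[Tuple[int, int]]  # (stage index, input record) in processing order


def run_materialized(data: Iterable[int], stages: List[Stage], log: Log) -> List[int]:
    """Batch style: each stage consumes its whole input and builds a full
    intermediate result before the next stage starts."""
    current = list(data)
    for i, stage in enumerate(stages):
        out = []
        for x in current:
            log.append((i, x))
            out.append(stage(x))
        current = out  # the complete intermediate result is held here
    return current


def run_pipelined(data: Iterable[int], stages: List[Stage], log: Log) -> List[int]:
    """Pipelined style: each record is forwarded to the next stage as soon
    as it is produced; no full intermediate result is built."""

    def apply(upstream: Iterator[int], i: int, stage: Stage) -> Iterator[int]:
        for x in upstream:
            log.append((i, x))
            yield stage(x)

    stream: Iterator[int] = iter(data)
    for i, stage in enumerate(stages):
        stream = apply(stream, i, stage)
    return list(stream)


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2]
    batch_log: Log = []
    pipe_log: Log = []
    print(run_materialized([1, 2, 3], stages, batch_log), batch_log)
    print(run_pipelined([1, 2, 3], stages, pipe_log), pipe_log)
```

Both functions return the same result; only the order of work differs. In
the materialized run, stage 0 handles every record before stage 1 sees any,
while in the pipelined run the two stages alternate record by record.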

[1]
http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark
[2] http://www.slideshare.net/GyulaFra/flink-apachecon

Best,

Marton

On Tue, Dec 23, 2014 at 6:19 PM, Samarth Mailinglist <
[email protected]> wrote:

> Hey folks, I have a noob question.
>
> I already looked up the archives and saw a couple of discussions
> <http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark>
> about Spark and Flink.
>
> I am familiar with Spark (the Python API, esp. MLlib), and I see many
> similarities between Flink and Spark.
>
> How does Flink distinguish itself from Spark?
>
