Dear Samarth,

Besides the discussions you have mentioned [1], I can recommend one of our recent presentations [2], especially the section distinguishing Flink (from slide 16).

It is generally a difficult question, as both systems are rapidly evolving, so the answer can become outdated quite fast. However, there are fundamental design features that are highly unlikely to change. For example, Spark uses "true" batch processing, meaning that intermediate results are materialized (mostly in memory) as RDDs. Flink's engine is internally more like streaming, forwarding results to the next operator as soon as possible. The latter can yield performance benefits for more complex jobs. Flink also gives you a query optimizer, spills gracefully to disk when the system runs out of memory, and has some cool features around serialization.

For performance numbers and some more insight, please check out the presentation [2], and do not hesitate to post a follow-up mail here if you come across something unclear or extraordinary.

[1] http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark
[2] http://www.slideshare.net/GyulaFra/flink-apachecon

Best,
Marton

On Tue, Dec 23, 2014 at 6:19 PM, Samarth Mailinglist <[email protected]> wrote:

> Hey folks, I have a noob question.
>
> I already looked up the archives and saw a couple of discussions
> <http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1&query=spark>
> about Spark and Flink.
>
> I am familiar with spark (the python API, esp MLLib), and I see many
> similarities between Flink and Spark.
>
> How does Flink distinguish itself from Spark?
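
P.S. To make the materialized-versus-pipelined distinction from the reply above a bit more concrete, here is a toy sketch in plain Python. It is only an analogy and says nothing about how either engine is actually implemented: the names run_materialized and run_pipelined, and the log list that records the order in which records are touched, are made up for illustration. The first function finishes each stage over the whole input before starting the next; the second chains generators, so each record is forwarded to the next stage as soon as it is produced.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Stage = Callable[[int], int]
Log = List[Tuple[int, int]]  # (stage index, input record), in processing order


def run_materialized(data: Iterable[int], stages: List[Stage], log: Log) -> List[int]:
    """Batch-style: fully build each intermediate result before the next stage."""
    current = list(data)
    for i, stage in enumerate(stages):
        out = []
        for x in current:
            log.append((i, x))
            out.append(stage(x))
        current = out  # the whole intermediate result exists at this point
    return current


def run_pipelined(data: Iterable[int], stages: List[Stage], log: Log) -> Iterator[int]:
    """Streaming-style: each record flows through all stages before the next one."""

    def apply(upstream: Iterator[int], i: int, stage: Stage) -> Iterator[int]:
        for x in upstream:
            log.append((i, x))
            yield stage(x)  # hand the result downstream immediately

    stream: Iterator[int] = iter(data)
    for i, stage in enumerate(stages):
        stream = apply(stream, i, stage)
    return stream


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2]

    batch_log: Log = []
    print(run_materialized([1, 2, 3], stages, batch_log), batch_log)

    stream_log: Log = []
    print(list(run_pipelined([1, 2, 3], stages, stream_log)), stream_log)
```

Both variants compute the same result; only the order recorded in the log differs. In the materialized run all records pass stage 0 before any reaches stage 1, while in the pipelined run the stages interleave record by record.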
