While Flink may not be younger than Spark, Spark came to Apache first, which always helps. Plus, there was already a lot of buzz around Spark before it came to Apache. Coming from Berkeley also helps.
That said, Flink seems decently healthy to me:
- http://search-hadoop.com/?fc_project=Flink&fc_type=mail+_hash_+user&q=
- http://search-hadoop.com/?fc_project=Flink&fc_type=mail+_hash_+dev&q=
- http://search-hadoop.com/?fc_project=Flink&fc_type=issue&q=&startDate=1445472000000&endDate=1461024000000

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Sun, Apr 17, 2016 at 5:55 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Assuming that both Spark and Flink are contemporaries what are the reasons
> that Flink has not been adopted widely? (this may sound obvious and or
> prejudged). I mean Spark has surged in popularity in the past year if I am
> correct
>
> Dr Mich Talebzadeh
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
> On 17 April 2016 at 22:49, Michael Malak <michaelma...@yahoo.com> wrote:
>
>> In terms of publication date, a paper on Nephele was published in 2009,
>> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
>> Stratosphere, which became Flink.
>>
>> ------------------------------
>> *From:* Mark Hamstra <m...@clearstorydata.com>
>> *To:* Mich Talebzadeh <mich.talebza...@gmail.com>
>> *Cc:* Corey Nolet <cjno...@gmail.com>; "user @spark" <
>> user@spark.apache.org>
>> *Sent:* Sunday, April 17, 2016 3:30 PM
>> *Subject:* Re: Apache Flink
>>
>> To be fair, the Stratosphere project from which Flink springs was started
>> as a collaborative university research project in Germany about the same
>> time that Spark was first released as Open Source, so they are near
>> contemporaries rather than Flink having been started only well after Spark
>> was an established and widely-used Apache project.
>>
>> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Also, it always amazes me why there are so many tangential projects in the
>> Big Data space. Would it not be easier if efforts were spent on adding to
>> Spark functionality rather than creating a new product like Flink?
>>
>> Dr Mich Talebzadeh
>>
>> On 17 April 2016 at 21:08, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Thanks Corey for the useful info.
>>
>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>> However, there does not seem to be anything close to these products in
>> the Hadoop ecosystem. So I guess there is nothing there?
>>
>> Regards.
>>
>> Dr Mich Talebzadeh
>>
>> On 17 April 2016 at 20:43, Corey Nolet <cjno...@gmail.com> wrote:
>>
>> I have not been intrigued at all by the microbatching concept in Spark. I
>> am used to CEP in real stream processing environments like InfoSphere
>> Streams & Storm where the granularity of processing is at the level of each
>> individual tuple and processing units (workers) can react immediately to
>> events being received and processed. The closest Spark Streaming comes to
>> this concept is the notion of "state" that can be updated via the
>> "updateStateByKey()" functions, which are only able to be run in a
>> microbatch.
>> Looking at the expected design changes to Spark Streaming in
>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>> the radar for Spark, though I have seen articles stating that more effort
>> is going to go into the Spark SQL layer in Spark Streaming, which may make
>> it more reminiscent of Esper.
>>
>> For these reasons, I have not even tried to implement CEP in Spark. I
>> feel it's a waste of time without immediate tuple-at-a-time processing.
>> Without this, they avoid the whole problem of "back pressure" (though keep
>> in mind, it is still very possible to overload the Spark Streaming layer
>> with stages that will continue to pile up and never get worked off) but
>> they lose the granular control that you get in CEP environments by allowing
>> the rules & processors to react with the receipt of each tuple, right away.
>>
>> A while back, I did attempt to implement an InfoSphere Streams-like API
>> [1] on top of Apache Storm as an example of what such a design may look
>> like. It looks like Storm is going to be replaced in the not so distant
>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>> open source implementation as of yet.
>>
>> [1] https://github.com/calrissian/flowmix
>>
>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Hi Corey,
>>
>> Can you please point me to docs on using Spark for CEP? Do we have a set
>> of CEP libraries somewhere?
>> I am keen on getting hold of adaptor libraries
>> for Spark something like below
>>
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>
>> One thing I've noticed about Flink in my following of the project has
>> been that it has established, in a few cases, some novel ideas and
>> improvements over Spark. The problem with it, however, is that both the
>> development team and the community around it are very small and many of
>> those novel improvements have been rolled directly into Spark in subsequent
>> versions. I was considering changing over my architecture to Flink at one
>> point to get better, more real-time CEP streaming support, but in the end I
>> decided to stick with Spark and just watch Flink continue to pressure it
>> into improvement.
>>
>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>> I never found much info that Flink was actually designed to be fault
>> tolerant. If fault tolerance is more bolt-on/add-on/afterthought then that
>> doesn't bode well for large scale data processing. Spark was designed with
>> fault tolerance in mind from the beginning.
>>
>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> I read the benchmark published by Yahoo. Obviously they already use Storm
>> and are inevitably very familiar with that tool. To start with, although
>> these benchmarks were somewhat interesting IMO, they lend themselves to an
>> assurance that the tool chosen for their platform is still the best choice.
>> So inevitably the benchmarks and the tests were done primarily to support
>> their approach.
>>
>> In general, anything which is not done through the TPC Council or a similar
>> body is questionable.
>> Their argument is that because Spark handles data streaming in micro
>> batches, it inevitably introduces this in-built latency by design.
>> In contrast, both Storm and Flink do not (at face value) have this
>> issue.
>>
>> In addition, as we already know, Spark has far more capabilities compared
>> to Flink (I know nothing about Storm). So really it boils down to the
>> business SLA to choose which tool one wants to deploy for a given use case.
>> IMO the Spark micro-batching approach is probably OK for 99% of use cases.
>> If we had built-in libraries for CEP for Spark (I am searching for them), I
>> would not bother with Flink.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
>> ovidiu-cristian.ma...@inria.fr> wrote:
>>
>> You probably read this benchmark at Yahoo, any comments from Spark?
>>
>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>
>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>>
>> Just adding one thing to the mix: `that the latency for streaming data is
>> eliminated` is insane :-D
>>
>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> It seems that Flink argues that the latency for streaming data is
>> eliminated whereas with Spark RDD there is this latency.
>>
>> I noticed that Flink does not support an interactive shell much like the
>> Spark shell, where you can add jars to it to do Kafka testing.
>> The advice was to
>> add the streaming Kafka jar file to CLASSPATH but that does not work.
>>
>> Most Flink documentation is also rather sparse, with the usual example of
>> word count, which is not exactly what you want.
>>
>> Anyway I will have a look at it further. I have a Spark Scala streaming
>> Kafka program that works fine in Spark and I want to recode it using Scala
>> for Flink with Kafka but have difficulty importing and testing libraries.
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
>>
>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>>
>> I compared both last month, seems to me that Flink's MLLib is not yet
>> ready.
>>
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Thanks Ted. I was wondering if someone is using both :)
>>
>> Dr Mich Talebzadeh
>>
>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Looks like this question is more relevant on flink mailing list :-)
>>
>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> Has anyone used Apache Flink instead of Spark by any chance?
>>
>> I am interested in its set of libraries for Complex Event Processing.
>>
>> Frankly I don't know if it offers far more than Spark offers.
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>> --
>> andy