The problem is that the strength and wider acceptance of a typical open source project lie in its sizeable user and development community. When the community is small, as with Flink, it is not a viable solution to adopt.
I am rather disappointed that no big data project can be used for Complex Event Processing, as CEP is widely used in algorithmic trading, among other areas.

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 17 April 2016 at 22:30, Mark Hamstra <m...@clearstorydata.com> wrote:

> To be fair, the Stratosphere project from which Flink springs was started as a collaborative university research project in Germany at about the same time that Spark was first released as open source, so they are near contemporaries, rather than Flink having been started only well after Spark was an established and widely used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Also, it always amazes me why there are so many tangential projects in the Big Data space. Wouldn't it be easier if efforts were spent on adding to Spark functionality rather than creating a new product like Flink?
>>
>> On 17 April 2016 at 21:08, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Thanks Corey for the useful info.
>>>
>>> I have used Sybase Aleri and StreamBase as commercial CEP engines. However, there does not seem to be anything close to these products in the Hadoop ecosystem. So I guess there is nothing there?
>>>
>>> Regards.
>>>
>>> On 17 April 2016 at 20:43, Corey Nolet <cjno...@gmail.com> wrote:
>>>
>>>> I have not been intrigued at all by the micro-batching concept in Spark. I am used to CEP in real stream processing environments like InfoSphere Streams and Storm, where the granularity of processing is at the level of each individual tuple and processing units (workers) can react immediately to events as they are received and processed. The closest Spark Streaming comes to this concept is the notion of "state" that can be updated via the "updateStateByKey()" functions, which can only be run in a micro-batch. Looking at the expected design changes to Spark Streaming in Spark 2.0.0, it also does not look like tuple-at-a-time processing is on the radar for Spark, though I have seen articles stating that more effort is going to go into the Spark SQL layer in Spark Streaming, which may make it more reminiscent of Esper.
>>>>
>>>> For these reasons, I have not even tried to implement CEP in Spark. I feel it's a waste of time without immediate tuple-at-a-time processing. Without it, they avoid the whole problem of "back pressure" (though keep in mind, it is still very possible to overload the Spark Streaming layer with stages that will continue to pile up and never get worked off), but they lose the granular control that you get in CEP environments, where the rules and processors react to the receipt of each tuple right away.
>>>>
>>>> A while back, I did attempt to implement an InfoSphere Streams-like API [1] on top of Apache Storm as an example of what such a design may look like.
>>>> It looks like Storm is going to be replaced in the not-so-distant future by Twitter's new design called Heron. IIRC, Heron does not have an open source implementation as of yet.
>>>>
>>>> [1] https://github.com/calrissian/flowmix
>>>>
>>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Corey,
>>>>>
>>>>> Can you please point me to docs on using Spark for CEP? Do we have a set of CEP libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, something like below
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>>>>
>>>>>> One thing I've noticed about Flink in my following of the project has been that it has established, in a few cases, some novel ideas and improvements over Spark. The problem with it, however, is that both the development team and the community around it are very small, and many of those novel improvements have been rolled directly into Spark in subsequent versions. I was considering changing over my architecture to Flink at one point to get better, more real-time CEP streaming support, but in the end I decided to stick with Spark and just watch Flink continue to pressure it into improvement.
>>>>>>
>>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>
>>>>>>> I never found much info that Flink was actually designed to be fault tolerant.
>>>>>>> If fault tolerance is more of a bolt-on/add-on/afterthought, then that doesn't bode well for large-scale data processing. Spark was designed with fault tolerance in mind from the beginning.
>>>>>>>
>>>>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I read the benchmark published by Yahoo. Obviously they already use Storm and are inevitably very familiar with that tool. To start with, although these benchmarks were somewhat interesting IMO, they lend themselves to reassuring Yahoo that the tool chosen for their platform is still the best choice. So inevitably the benchmarks and the tests were done primarily to support their approach.
>>>>>>>>
>>>>>>>> In general, any benchmark that is not done through the TPC or a similar body is questionable.
>>>>>>>> Their argument is that because Spark handles data streaming in micro-batches, it inevitably introduces built-in latency by design. In contrast, both Storm and Flink do not (at face value) have this issue.
>>>>>>>>
>>>>>>>> In addition, as we already know, Spark has far more capabilities compared to Flink (I know nothing about Storm). So really it boils down to the business SLA when choosing which tool to deploy for one's use case. IMO Spark's micro-batching approach is probably OK for 99% of use cases. If we had built-in CEP libraries for Spark (I am searching for them), I would not bother with Flink.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>>>>
>>>>>>>>> You have probably read this benchmark from Yahoo; any comments from the Spark side?
>>>>>>>>>
>>>>>>>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>>>>>>>
>>>>>>>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Just adding one thing to the mix: `that the latency for streaming data is eliminated` is insane :-D
>>>>>>>>>
>>>>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It seems that Flink argues that the latency for streaming data is eliminated, whereas with Spark RDDs there is this latency.
>>>>>>>>>>
>>>>>>>>>> I noticed that Flink does not support an interactive shell like the Spark shell, where you can add jars to do Kafka testing. The advice was to add the streaming Kafka jar file to the CLASSPATH, but that does not work.
>>>>>>>>>>
>>>>>>>>>> Most Flink documentation is also rather sparse, with the usual word count example, which is not exactly what you want.
>>>>>>>>>>
>>>>>>>>>> Anyway, I will have a look at it further. I have a Spark Scala streaming Kafka program that works fine in Spark, and I want to recode it in Scala for Flink with Kafka, but I am having difficulty importing and testing libraries.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I compared both last month; it seems to me that Flink's ML library (FlinkML) is not yet ready.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Ted. I was wondering if someone is using both :)
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like this question is more relevant on the Flink mailing list :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Has anyone used Apache Flink instead of Spark by any chance?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am interested in its set of libraries for Complex Event Processing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Frankly, I don't know if it offers far more than Spark offers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> andy
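To make the micro-batch versus tuple-at-a-time distinction discussed in this thread a little more concrete, here is a minimal Python sketch. It does not use the Spark, Flink, Storm, or Esper APIs; it only simulates the two processing models. The `Tick` event type, the hypothetical `price_drop` rule (a more-than-5% fall between consecutive ticks for one symbol), and the 1-second batch interval are illustrative assumptions, and the sketch assumes events arrive in timestamp order with arrival time equal to event time. The micro-batch version only advances its per-key state when a batch closes, which is loosely analogous to state updated through `updateStateByKey()` once per batch, not a model of Spark's actual scheduler.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (symbol, event time, time the alert is emitted)
Alert = Tuple[str, float, float]


@dataclass(frozen=True)
class Tick:
    ts: float      # event time in seconds; assumed equal to arrival time
    symbol: str
    price: float


def price_drop(prev: float, cur: float, threshold: float = 0.05) -> bool:
    """Hypothetical CEP rule: the price fell by more than `threshold`
    (a fraction) relative to the previous tick for the same symbol."""
    return prev > 0 and (prev - cur) / prev > threshold


def tuple_at_a_time(ticks: List[Tick]) -> List[Alert]:
    """Evaluate the rule as each tick arrives, so an alert is emitted at
    the tick's own timestamp (processing time is ignored)."""
    last: Dict[str, float] = {}
    alerts: List[Alert] = []
    for t in ticks:
        prev = last.get(t.symbol)
        if prev is not None and price_drop(prev, t.price):
            alerts.append((t.symbol, t.ts, t.ts))
        last[t.symbol] = t.price
    return alerts


def micro_batch(ticks: List[Tick], interval: float) -> List[Alert]:
    """Buffer ticks into fixed intervals and evaluate the rule only when a
    batch closes, so every alert waits for the end of its batch.
    Assumes `ticks` is sorted by `ts`."""
    state: Dict[str, float] = {}
    alerts: List[Alert] = []
    batch: List[Tick] = []

    def flush(end: float) -> None:
        # Per-key state only advances here, once per batch.
        for t in batch:
            prev = state.get(t.symbol)
            if prev is not None and price_drop(prev, t.price):
                alerts.append((t.symbol, t.ts, end))
            state[t.symbol] = t.price
        batch.clear()

    batch_end = interval
    for t in ticks:
        # Close every batch whose interval ended before this tick arrived.
        while t.ts >= batch_end:
            flush(batch_end)
            batch_end += interval
        batch.append(t)
    if batch:
        flush(batch_end)
    return alerts


if __name__ == "__main__":
    ticks = [
        Tick(0.1, "ABC", 100.0),
        Tick(0.4, "ABC", 94.0),
        Tick(1.2, "ABC", 93.5),
        Tick(1.7, "ABC", 88.0),
    ]
    for name, alerts in [("tuple-at-a-time", tuple_at_a_time(ticks)),
                         ("micro-batch (1 s)", micro_batch(ticks, 1.0))]:
        for symbol, ts, emitted in alerts:
            print(f"{name}: {symbol} drop at t={ts}, alert at t={emitted}, "
                  f"delay={emitted - ts:.1f} s")
```

With these sample ticks both models detect the same two drops (at t=0.4 and t=1.7), but the micro-batch alerts are only emitted at the batch boundaries t=1.0 and t=2.0, i.e. 0.6 s and 0.3 s later. Shrinking the interval reduces that delay without removing it, which is the latency argument made above; the sketch deliberately ignores back pressure, fault tolerance, and real processing cost.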