Thanks Corey for the useful info. I have used Sybase Aleri and StreamBase as commercial CEP engines. However, there does not seem to be anything close to these products in the Hadoop ecosystem. So I guess there is nothing there?
Regards,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 17 April 2016 at 20:43, Corey Nolet <cjno...@gmail.com> wrote:

> I have not been intrigued at all by the micro-batching concept in Spark. I
> am used to CEP in real streams processing environments like InfoSphere
> Streams & Storm, where the granularity of processing is at the level of
> each individual tuple, and processing units (workers) can react immediately
> to events being received and processed. The closest Spark Streaming comes
> to this concept is the notion of "state" that can be updated via the
> updateStateByKey() function, which can only be run in a micro-batch.
> Looking at the expected design changes to Spark Streaming in Spark 2.0.0,
> it also does not look like tuple-at-a-time processing is on the radar for
> Spark, though I have seen articles stating that more effort is going to go
> into the Spark SQL layer in Spark Streaming, which may make it more
> reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> this, they avoid the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark Streaming layer with stages
> that will continue to pile up and never get worked off), but they lose the
> granular control that you get in CEP environments by allowing the rules &
> processors to react to the receipt of each tuple, right away.
>
> A while back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not-so-distant future by
> Twitter's new design called Heron.
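[A toy sketch of the distinction being discussed, in plain Python rather than Spark or Storm code; every name in it is invented for illustration. Tuple-at-a-time processing lets a rule fire the moment each event arrives, while a micro-batch model buffers events and only exposes updated state at batch boundaries, roughly the way updateStateByKey() sees state once per micro-batch.]

```python
# Toy illustration (no Spark/Storm): tuple-at-a-time vs micro-batch state updates.

def tuple_at_a_time(events, on_event):
    """React to every event the moment it is 'received'."""
    state = {}
    for key, value in events:
        state[key] = state.get(key, 0) + value
        on_event(key, state[key])          # a CEP rule can fire immediately
    return state

def micro_batch(events, batch_size, on_batch):
    """Buffer events; update and expose state only at batch boundaries."""
    state, buffer = {}, []
    for event in events:
        buffer.append(event)
        if len(buffer) == batch_size:
            for key, value in buffer:
                state[key] = state.get(key, 0) + value
            on_batch(dict(state))          # rules only see batch-level state
            buffer.clear()
    return state

events = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

fired = []
tuple_at_a_time(events, lambda k, v: fired.append((k, v)))
print(fired)    # one reaction per tuple: [('a', 1), ('b', 2), ('a', 4), ('b', 6)]

batches = []
micro_batch(events, batch_size=2, on_batch=batches.append)
print(batches)  # one reaction per batch: [{'a': 1, 'b': 2}, {'a': 4, 'b': 6}]
```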
IIRC, Heron does not have an open-source implementation as of yet.

[1] https://github.com/calrissian/flowmix

On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

>> Hi Corey,
>>
>> Can you please point me to docs on using Spark for CEP? Do we have a set
>> of CEP libraries somewhere? I am keen on getting hold of adaptor libraries
>> for Spark, something like the below.
>>
>> Thanks
>>
>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>
>>> One thing I've noticed about Flink in my following of the project has
>>> been that it has established, in a few cases, some novel ideas and
>>> improvements over Spark. The problem with it, however, is that both the
>>> development team and the community around it are very small, and many of
>>> those novel improvements have been rolled directly into Spark in
>>> subsequent versions. I was considering changing over my architecture to
>>> Flink at one point to get better, more real-time CEP streaming support,
>>> but in the end I decided to stick with Spark and just watch Flink
>>> continue to pressure it into improvement.
>>>
>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> I never found much info that Flink was actually designed to be fault
>>>> tolerant. If fault tolerance is more of a bolt-on/add-on/afterthought,
>>>> then that doesn't bode well for large-scale data processing. Spark was
>>>> designed with fault tolerance in mind from the beginning.
>>>>
>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I read the benchmark published by Yahoo.
>>>>> Obviously they already use
>>>>> Storm and are inevitably very familiar with that tool. To start with,
>>>>> although these benchmarks were somewhat interesting IMO, they lent
>>>>> themselves to an assurance that the tool chosen for their platform is
>>>>> still the best choice. So inevitably the benchmarks and the tests were
>>>>> done primarily to support their approach.
>>>>>
>>>>> In general, anything which is not done through the TPC Council or a
>>>>> similar body is questionable.
>>>>> Their argument is that because Spark handles data streaming in
>>>>> micro-batches, it inevitably introduces this built-in latency by design.
>>>>> In contrast, both Storm and Flink do not (on the face of it) have this
>>>>> issue.
>>>>>
>>>>> In addition, as we already know, Spark has far more capabilities
>>>>> compared to Flink (I know nothing about Storm). So really it boils down
>>>>> to the business SLA to choose which tool one wants to deploy for your
>>>>> use case. IMO Spark's micro-batching approach is probably OK for 99% of
>>>>> use cases. If we had built-in libraries for CEP for Spark (I am
>>>>> searching for them), I would not bother with Flink.
>>>>>
>>>>> HTH
>>>>>
>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>
>>>>>> You probably read this benchmark at Yahoo, any comments from Spark?
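[To put a rough number on the built-in latency point: here is a toy Python model, my own illustration rather than anything from the benchmark, of the extra delay added purely by waiting for the next micro-batch boundary. Processing time is ignored; the 2-second interval is an arbitrary assumption.]

```python
# Toy model: extra latency an event pays just waiting for its micro-batch
# to be triggered, ignoring actual processing time.

def batch_wait(arrival_time, batch_interval):
    """Seconds from an event's arrival until the next batch boundary."""
    return batch_interval - (arrival_time % batch_interval)

interval = 2.0  # hypothetical 2-second micro-batch interval

# Events arriving at different points inside one batch window:
waits = [batch_wait(t, interval) for t in (0.1, 0.5, 1.0, 1.9)]
print(waits)  # e.g. [1.9, 1.5, 1.0, ~0.1] -- later arrivals wait less

# Averaged over arrivals spread uniformly across the window, the added
# latency tends toward half the batch interval:
samples = [batch_wait(i / 1000 * interval, interval) for i in range(1000)]
print(sum(samples) / len(samples))  # roughly interval / 2, i.e. ~1 second
```

A tuple-at-a-time engine has no such boundary to wait for, which is the structural difference the thread keeps circling back to.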
>>>>>>
>>>>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>>>>
>>>>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>>>>>>
>>>>>> Just adding one thing to the mix: `that the latency for streaming
>>>>>> data is eliminated` is insane :-D
>>>>>>
>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems that Flink argues that the latency for streaming data is
>>>>>>> eliminated, whereas with Spark RDDs there is this latency.
>>>>>>>
>>>>>>> I noticed that Flink does not support an interactive shell, much like
>>>>>>> the Spark shell, where you can add jars to it to do Kafka testing. The
>>>>>>> advice was to add the streaming Kafka jar file to the CLASSPATH, but
>>>>>>> that does not work.
>>>>>>>
>>>>>>> Most Flink documentation is also rather sparse, with the usual example
>>>>>>> of word count, which is not exactly what you want.
>>>>>>>
>>>>>>> Anyway, I will have a look at it further. I have a Spark Scala
>>>>>>> streaming Kafka program that works fine in Spark, and I want to recode
>>>>>>> it using Scala for Flink with Kafka, but have difficulty importing and
>>>>>>> testing libraries.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I compared both last month; it seems to me that Flink's ML library is
>>>>>>>> not yet ready.
>>>>>>>>
>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Ted.
>>>>>>>>> I was wondering if someone is using both :)
>>>>>>>>>
>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like this question is more relevant on the Flink mailing
>>>>>>>>>> list :-)
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Has anyone used Apache Flink instead of Spark by any chance?
>>>>>>>>>>>
>>>>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>>>>> Processing.
>>>>>>>>>>>
>>>>>>>>>>> Frankly, I don't know if it offers far more than Spark offers.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> andy