So there is an offering from Stratio: https://github.com/Stratio/Decision

> Decision CEP engine is a Complex Event Processing platform built on Spark
> Streaming.
>
> It is the result of combining the power of Spark Streaming as a continuous
> computing framework and Siddhi CEP engine as a complex event processing
> engine.

https://stratio.atlassian.net/wiki/display/DECISION0x9/Home

I have not used it, only read about it, but it may be of some interest to you.

-Todd

On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian <mohaj...@gmail.com> wrote:

Microbatching is certainly not a waste of time; you are making far too strong a statement. In certain cases processing one tuple at a time makes no sense: it all depends on the use case. In fact, if you look at the history of the Storm project, you will see that microbatching was added to Storm later, as Trident, specifically for microbatching/windowing.

In certain cases you are doing aggregation/windowing, throughput is the dominant design consideration, and you don't care what each individual event/tuple does. For example, if you push different event types to separate Kafka topics and all you need is a count, what is the need for single-event processing?
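For example, a per-event-type count over a sliding window takes only a few lines in Spark Streaming. A rough, untested sketch (the socket source, window durations, and field layout are placeholders; a real job would read from Kafka via KafkaUtils.createDirectStream):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object EventTypeCounts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("EventTypeCounts")
        // 5-second microbatches
        val ssc = new StreamingContext(conf, Seconds(5))

        // Placeholder source standing in for a Kafka direct stream.
        // Each line is assumed to start with an event type, e.g. "click,...".
        val events = ssc.socketTextStream("localhost", 9999)
          .map(line => (line.split(",")(0), 1))

        // Count per event type over a 60s window, recomputed every 10s.
        val counts = events.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

The window and slide durations just have to be multiples of the batch interval; nothing in the job ever looks at an individual tuple.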
On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet <cjno...@gmail.com> wrote:

I have not been intrigued at all by the microbatching concept in Spark. I am used to CEP in real stream-processing environments like InfoSphere Streams & Storm, where the granularity of processing is at the level of each individual tuple and processing units (workers) can react immediately to events as they are received. The closest Spark Streaming comes to this concept is the notion of "state", which can be updated via the updateStateByKey() function, but only once per microbatch. Looking at the expected design changes to Spark Streaming in Spark 2.0.0, tuple-at-a-time processing does not appear to be on the radar for Spark either, though I have seen articles stating that more effort will go into the Spark SQL layer of Spark Streaming, which may make it more reminiscent of Esper.
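To make that concrete, here is roughly what a running count looks like with updateStateByKey() (an untested sketch; the socket source is a stand-in for a real receiver):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("RunningCounts")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/ckpt") // checkpointing is mandatory for updateStateByKey

    val events = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

    // Merge this batch's values into the previous state. This runs once per
    // key per microbatch, not once per tuple.
    val updateFn = (newValues: Seq[Int], state: Option[Int]) =>
      Some(state.getOrElse(0) + newValues.sum)

    val runningCounts = events.updateStateByKey[Int](updateFn)
    runningCounts.print()

    ssc.start()
    ssc.awaitTermination()

Note that updateFn only fires when a batch completes, so the minimum reaction latency is the batch interval; there is no per-tuple hook.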
For these reasons, I have not even tried to implement CEP in Spark; I feel it's a waste of time without immediate tuple-at-a-time processing. Without it, Spark avoids the whole problem of "back pressure" (though keep in mind it is still very possible to overload the Spark Streaming layer with stages that continue to pile up and never get worked off), but it loses the granular control that CEP environments give you by letting the rules & processors react to each tuple the moment it is received.
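For completeness, Spark does expose a couple of rate-control knobs that mitigate the pile-up, though they do nothing about the per-microbatch granularity. A sketch, with property names as documented for Spark 1.5+:

    import org.apache.spark.SparkConf

    // Throttle ingestion so scheduled batches don't accumulate unboundedly.
    // These settings bound the input rate; they do not make processing
    // tuple-at-a-time.
    val conf = new SparkConf()
      .setAppName("ThrottledStream")
      .set("spark.streaming.backpressure.enabled", "true")       // Spark 1.5+
      .set("spark.streaming.kafka.maxRatePerPartition", "10000") // direct Kafka stream cap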
A while back, I did attempt to implement an InfoSphere Streams-like API [1] on top of Apache Storm as an example of what such a design might look like. It also looks like Storm is going to be replaced in the not-so-distant future by Twitter's new design, Heron. IIRC, Heron does not have an open-source implementation as of yet.

[1] https://github.com/calrissian/flowmix

On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Corey,

Can you please point me to docs on using Spark for CEP? Do we have a set of CEP libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, something like the below.

[inline image not included]

Thanks

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:

One thing I've noticed about Flink in my following of the project is that it has established, in a few cases, some novel ideas and improvements over Spark. The problem, however, is that both the development team and the community around it are very small, and many of those novel improvements have been rolled directly into Spark in subsequent versions. I was considering changing my architecture over to Flink at one point to get better, more real-time CEP streaming support, but in the end I decided to stick with Spark and just watch Flink continue to pressure it into improvement.

On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:

I never found much information suggesting that Flink was actually designed to be fault tolerant. If fault tolerance is more of a bolt-on/add-on/afterthought, that doesn't bode well for large-scale data processing. Spark was designed with fault tolerance in mind from the beginning.

On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

I read the benchmark published by Yahoo. Obviously they already use Storm and are inevitably very familiar with that tool. Although these benchmarks were somewhat interesting, IMO they lend themselves to an assurance that the tool already chosen for their platform is still the best choice, so inevitably the benchmarks and tests were done primarily to support their approach. In general, anything not done through the TPC (Transaction Processing Performance Council) or a similar body is questionable.

Their argument is that because Spark handles streaming data in microbatches, it inevitably introduces built-in latency by design. In contrast, both Storm and Flink do not (at face value) have this issue.

In addition, as we already know, Spark has far more capabilities compared to Flink (I know nothing about Storm). So it really boils down to the business SLA when choosing which tool to deploy for your use case. IMO Spark's microbatching approach is probably fine for 99% of use cases. If we had built-in CEP libraries for Spark (I am searching for them), I would not bother with Flink.

HTH

Dr Mich Talebzadeh

On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:

You probably read this benchmark at Yahoo; any comments from Spark?
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at

On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:

Just adding one thing to the mix: `that the latency for streaming data is eliminated` is insane :-D

-- andy

On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

It seems that Flink argues that the latency for streaming data is eliminated, whereas with Spark RDDs there is this latency.

I noticed that Flink does not support an interactive shell much like the Spark shell, where you can add jars to it to do Kafka testing. The advice was to add the streaming Kafka jar file to the CLASSPATH, but that does not work.

Most Flink documentation is also rather sparse, with the usual word-count example, which is not exactly what you want.

Anyway, I will have a look at it further. I have a Spark Scala streaming Kafka program that works fine in Spark, and I want to recode it in Scala for Flink with Kafka, but I am having difficulty importing and testing the libraries.
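The skeleton I would expect to work looks roughly like this (an untested sketch; it assumes Flink 1.0 with the flink-connector-kafka-0.8 artifact on the classpath, and the broker, ZooKeeper, group, and topic settings are placeholders):

    import java.util.Properties

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    object FlinkKafkaTest {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val props = new Properties()
        props.setProperty("bootstrap.servers", "localhost:9092")
        props.setProperty("zookeeper.connect", "localhost:2181") // required by the 0.8 consumer
        props.setProperty("group.id", "flink-test")

        // Read the topic as plain strings and echo it, just to prove the wiring.
        val stream = env.addSource(
          new FlinkKafkaConsumer08[String]("mytopic", new SimpleStringSchema(), props))
        stream.print()

        env.execute("Flink Kafka test")
      }
    }

The import of org.apache.flink.streaming.api.scala._ matters: it brings in the implicit TypeInformation the Scala API needs, and leaving it out is a common source of confusing compile errors.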
Cheers

Dr Mich Talebzadeh

On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:

I compared both last month; it seems to me that Flink's equivalent of MLlib is not yet ready.

On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks Ted. I was wondering if someone is using both :)

Dr Mich Talebzadeh

On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:

Looks like this question is more relevant on the Flink mailing list :-)

On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

Has anyone used Apache Flink instead of Spark, by any chance?

I am interested in its set of libraries for Complex Event Processing.

Frankly, I don't know if it offers far more than Spark does.

Thanks

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com