So there is an offering from Stratio: https://github.com/Stratio/Decision

> Decision CEP engine is a Complex Event Processing platform built on Spark
> Streaming.
>
> It is the result of combining the power of Spark Streaming as a continuous
> computing framework and Siddhi CEP engine as a complex event processing
> engine.

https://stratio.atlassian.net/wiki/display/DECISION0x9/Home

I have not used it, only read about it, but it may be of some interest to you.

-Todd

On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian <mohaj...@gmail.com> wrote:

Microbatching is certainly not a waste of time; you are making far too strong a statement. In certain cases processing one tuple at a time makes no sense: it all depends on the use case. In fact, if you look at the history of the Storm project, you will see that microbatching was added to Storm later, as Trident, specifically for microbatching/windowing.

In certain cases you are doing aggregation/windowing, throughput is the dominant design consideration, and you don't care what each individual event/tuple does. For example, if you push different event types to separate Kafka topics and all you need is a count, what is the need for single-event processing?
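For example, a per-event-type count over a sliding window takes only a few lines in Spark Streaming. A rough, untested sketch (the socket source, window durations, and field layout are placeholders; a real job would read from Kafka via KafkaUtils.createDirectStream):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object EventTypeCounts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("EventTypeCounts")
        // 5-second microbatches
        val ssc = new StreamingContext(conf, Seconds(5))

        // Placeholder source standing in for a Kafka direct stream.
        // Each line is assumed to start with an event type, e.g. "click,...".
        val events = ssc.socketTextStream("localhost", 9999)
          .map(line => (line.split(",")(0), 1))

        // Count per event type over a 60s window, recomputed every 10s.
        val counts = events.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

The window and slide durations just have to be multiples of the batch interval; nothing in the job ever looks at an individual tuple.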
On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet <cjno...@gmail.com> wrote:

I have not been intrigued at all by the microbatching concept in Spark. I am used to CEP in real stream-processing environments like InfoSphere Streams & Storm, where the granularity of processing is at the level of each individual tuple and processing units (workers) can react immediately to events as they are received. The closest Spark Streaming comes to this concept is the notion of "state", which can be updated via the updateStateByKey() function, but only once per microbatch. Looking at the expected design changes to Spark Streaming in Spark 2.0.0, tuple-at-a-time processing does not appear to be on the radar for Spark either, though I have seen articles stating that more effort will go into the Spark SQL layer of Spark Streaming, which may make it more reminiscent of Esper.
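To make that concrete, here is roughly what a running count looks like with updateStateByKey() (an untested sketch; the socket source is a stand-in for a real receiver):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("RunningCounts")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/ckpt") // checkpointing is mandatory for updateStateByKey

    val events = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

    // Merge this batch's values into the previous state. This runs once per
    // key per microbatch, not once per tuple.
    val updateFn = (newValues: Seq[Int], state: Option[Int]) =>
      Some(state.getOrElse(0) + newValues.sum)

    val runningCounts = events.updateStateByKey[Int](updateFn)
    runningCounts.print()

    ssc.start()
    ssc.awaitTermination()

Note that updateFn only fires when a batch completes, so the minimum reaction latency is the batch interval; there is no per-tuple hook.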
For these reasons, I have not even tried to implement CEP in Spark; I feel it's a waste of time without immediate tuple-at-a-time processing. Without it, Spark avoids the whole problem of "back pressure" (though keep in mind it is still very possible to overload the Spark Streaming layer with stages that continue to pile up and never get worked off), but it loses the granular control that CEP environments give you by letting the rules & processors react to each tuple the moment it is received.
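For completeness, Spark does expose a couple of rate-control knobs that mitigate the pile-up, though they do nothing about the per-microbatch granularity. A sketch, with property names as documented for Spark 1.5+:

    import org.apache.spark.SparkConf

    // Throttle ingestion so scheduled batches don't accumulate unboundedly.
    // These settings bound the input rate; they do not make processing
    // tuple-at-a-time.
    val conf = new SparkConf()
      .setAppName("ThrottledStream")
      .set("spark.streaming.backpressure.enabled", "true")       // Spark 1.5+
      .set("spark.streaming.kafka.maxRatePerPartition", "10000") // direct Kafka stream cap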
A while back, I did attempt to implement an InfoSphere Streams-like API [1] on top of Apache Storm as an example of what such a design might look like. It also looks like Storm is going to be replaced in the not-so-distant future by Twitter's new design, Heron. IIRC, Heron does not have an open-source implementation as of yet.

[1] https://github.com/calrissian/flowmix

On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Corey,

Can you please point me to docs on using Spark for CEP? Do we have a set of CEP libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, something like the below.

[inline image not included]

Thanks

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:

One thing I've noticed about Flink in my following of the project is that it has established, in a few cases, some novel ideas and improvements over Spark. The problem, however, is that both the development team and the community around it are very small, and many of those novel improvements have been rolled directly into Spark in subsequent versions. I was considering changing my architecture over to Flink at one point to get better, more real-time CEP streaming support, but in the end I decided to stick with Spark and just watch Flink continue to pressure it into improvement.

On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:

I never found much information suggesting that Flink was actually designed to be fault tolerant. If fault tolerance is more of a bolt-on/add-on/afterthought, that doesn't bode well for large-scale data processing. Spark was designed with fault tolerance in mind from the beginning.

On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

I read the benchmark published by Yahoo. Obviously they already use Storm and are inevitably very familiar with that tool. Although these benchmarks were somewhat interesting, IMO they lend themselves to an assurance that the tool already chosen for their platform is still the best choice, so inevitably the benchmarks and tests were done primarily to support their approach. In general, anything not done through the TPC (Transaction Processing Performance Council) or a similar body is questionable.

Their argument is that because Spark handles streaming data in microbatches, it inevitably introduces built-in latency by design. In contrast, both Storm and Flink do not (at face value) have this issue.

In addition, as we already know, Spark has far more capabilities compared to Flink (I know nothing about Storm). So it really boils down to the business SLA when choosing which tool to deploy for your use case. IMO Spark's microbatching approach is probably fine for 99% of use cases. If we had built-in CEP libraries for Spark (I am searching for them), I would not bother with Flink.

HTH

Dr Mich Talebzadeh

On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:

You probably read this benchmark at Yahoo; any comments from Spark?
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at

On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:

Just adding one thing to the mix: `that the latency for streaming data is eliminated` is insane :-D

-- andy

On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

It seems that Flink argues that the latency for streaming data is eliminated, whereas with Spark RDDs there is this latency.

I noticed that Flink does not support an interactive shell much like the Spark shell, where you can add jars to it to do Kafka testing. The advice was to add the streaming Kafka jar file to the CLASSPATH, but that does not work.

Most Flink documentation is also rather sparse, with the usual word-count example, which is not exactly what you want.

Anyway, I will have a look at it further. I have a Spark Scala streaming Kafka program that works fine in Spark, and I want to recode it in Scala for Flink with Kafka, but I am having difficulty importing and testing the libraries.
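The skeleton I would expect to work looks roughly like this (an untested sketch; it assumes Flink 1.0 with the flink-connector-kafka-0.8 artifact on the classpath, and the broker, ZooKeeper, group, and topic settings are placeholders):

    import java.util.Properties

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    object FlinkKafkaTest {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val props = new Properties()
        props.setProperty("bootstrap.servers", "localhost:9092")
        props.setProperty("zookeeper.connect", "localhost:2181") // required by the 0.8 consumer
        props.setProperty("group.id", "flink-test")

        // Read the topic as plain strings and echo it, just to prove the wiring.
        val stream = env.addSource(
          new FlinkKafkaConsumer08[String]("mytopic", new SimpleStringSchema(), props))
        stream.print()

        env.execute("Flink Kafka test")
      }
    }

The import of org.apache.flink.streaming.api.scala._ matters: it brings in the implicit TypeInformation the Scala API needs, and leaving it out is a common source of confusing compile errors.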
Cheers

Dr Mich Talebzadeh

On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:

I compared both last month; it seems to me that Flink's equivalent of MLlib is not yet ready.

On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thanks Ted. I was wondering if someone is using both :)

Dr Mich Talebzadeh

On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:

Looks like this question is more relevant on the Flink mailing list :-)

On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

Has anyone used Apache Flink instead of Spark, by any chance?

I am interested in its set of libraries for Complex Event Processing.

Frankly, I don't know if it offers far more than Spark does.

Thanks

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com