Thanks Corey for the useful info. I have used Sybase Aleri and StreamBase as commercial CEP engines. However, there does not seem to be anything close to these products in the Hadoop ecosystem. So I guess there is nothing there?
Regards,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 17 April 2016 at 20:43, Corey Nolet <cjno...@gmail.com> wrote:

> I have not been intrigued at all by the micro-batching concept in Spark. I
> am used to CEP in real streams processing environments like InfoSphere
> Streams & Storm, where the granularity of processing is at the level of
> each individual tuple, and processing units (workers) can react immediately
> to events being received and processed. The closest Spark Streaming comes
> to this concept is the notion of "state" that can be updated via the
> updateStateByKey() function, which can only be run in a micro-batch.
> Looking at the expected design changes to Spark Streaming in Spark 2.0.0,
> it also does not look like tuple-at-a-time processing is on the radar for
> Spark, though I have seen articles stating that more effort is going to go
> into the Spark SQL layer in Spark Streaming, which may make it more
> reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> this, they avoid the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark Streaming layer with stages
> that will continue to pile up and never get worked off), but they lose the
> granular control that you get in CEP environments by allowing the rules &
> processors to react to the receipt of each tuple, right away.
>
> A while back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not-so-distant future by
> Twitter's new design called Heron.
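[A toy sketch of the distinction being discussed, in plain Python rather than Spark or Storm code; every name in it is invented for illustration. Tuple-at-a-time processing lets a rule fire the moment each event arrives, while a micro-batch model buffers events and only exposes updated state at batch boundaries, roughly the way updateStateByKey() sees state once per micro-batch.]

```python
# Toy illustration (no Spark/Storm): tuple-at-a-time vs micro-batch state updates.

def tuple_at_a_time(events, on_event):
    """React to every event the moment it is 'received'."""
    state = {}
    for key, value in events:
        state[key] = state.get(key, 0) + value
        on_event(key, state[key])          # a CEP rule can fire immediately
    return state

def micro_batch(events, batch_size, on_batch):
    """Buffer events; update and expose state only at batch boundaries."""
    state, buffer = {}, []
    for event in events:
        buffer.append(event)
        if len(buffer) == batch_size:
            for key, value in buffer:
                state[key] = state.get(key, 0) + value
            on_batch(dict(state))          # rules only see batch-level state
            buffer.clear()
    return state

events = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

fired = []
tuple_at_a_time(events, lambda k, v: fired.append((k, v)))
print(fired)    # one reaction per tuple: [('a', 1), ('b', 2), ('a', 4), ('b', 6)]

batches = []
micro_batch(events, batch_size=2, on_batch=batches.append)
print(batches)  # one reaction per batch: [{'a': 1, 'b': 2}, {'a': 4, 'b': 6}]
```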
IIRC, Heron does not have an open-source implementation as of yet.

[1] https://github.com/calrissian/flowmix

On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

>> Hi Corey,
>>
>> Can you please point me to docs on using Spark for CEP? Do we have a set
>> of CEP libraries somewhere? I am keen on getting hold of adaptor libraries
>> for Spark, something like the below.
>>
>> Thanks
>>
>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>
>>> One thing I've noticed about Flink in my following of the project has
>>> been that it has established, in a few cases, some novel ideas and
>>> improvements over Spark. The problem with it, however, is that both the
>>> development team and the community around it are very small, and many of
>>> those novel improvements have been rolled directly into Spark in
>>> subsequent versions. I was considering changing over my architecture to
>>> Flink at one point to get better, more real-time CEP streaming support,
>>> but in the end I decided to stick with Spark and just watch Flink
>>> continue to pressure it into improvement.
>>>
>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> I never found much info that Flink was actually designed to be fault
>>>> tolerant. If fault tolerance is more of a bolt-on/add-on/afterthought,
>>>> then that doesn't bode well for large-scale data processing. Spark was
>>>> designed with fault tolerance in mind from the beginning.
>>>>
>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I read the benchmark published by Yahoo.
>>>>> Obviously they already use
>>>>> Storm and are inevitably very familiar with that tool. To start with,
>>>>> although these benchmarks were somewhat interesting IMO, they lent
>>>>> themselves to an assurance that the tool chosen for their platform is
>>>>> still the best choice. So inevitably the benchmarks and the tests were
>>>>> done primarily to support their approach.
>>>>>
>>>>> In general, anything which is not done through the TPC Council or a
>>>>> similar body is questionable.
>>>>> Their argument is that because Spark handles data streaming in
>>>>> micro-batches, it inevitably introduces this built-in latency by design.
>>>>> In contrast, both Storm and Flink do not (on the face of it) have this
>>>>> issue.
>>>>>
>>>>> In addition, as we already know, Spark has far more capabilities
>>>>> compared to Flink (I know nothing about Storm). So really it boils down
>>>>> to the business SLA to choose which tool one wants to deploy for your
>>>>> use case. IMO Spark's micro-batching approach is probably OK for 99% of
>>>>> use cases. If we had built-in libraries for CEP for Spark (I am
>>>>> searching for them), I would not bother with Flink.
>>>>>
>>>>> HTH
>>>>>
>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>
>>>>>> You probably read this benchmark at Yahoo, any comments from Spark?
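[To put a rough number on the built-in latency point: here is a toy Python model, my own illustration rather than anything from the benchmark, of the extra delay added purely by waiting for the next micro-batch boundary. Processing time is ignored; the 2-second interval is an arbitrary assumption.]

```python
# Toy model: extra latency an event pays just waiting for its micro-batch
# to be triggered, ignoring actual processing time.

def batch_wait(arrival_time, batch_interval):
    """Seconds from an event's arrival until the next batch boundary."""
    return batch_interval - (arrival_time % batch_interval)

interval = 2.0  # hypothetical 2-second micro-batch interval

# Events arriving at different points inside one batch window:
waits = [batch_wait(t, interval) for t in (0.1, 0.5, 1.0, 1.9)]
print(waits)  # e.g. [1.9, 1.5, 1.0, ~0.1] -- later arrivals wait less

# Averaged over arrivals spread uniformly across the window, the added
# latency tends toward half the batch interval:
samples = [batch_wait(i / 1000 * interval, interval) for i in range(1000)]
print(sum(samples) / len(samples))  # roughly interval / 2, i.e. ~1 second
```

A tuple-at-a-time engine has no such boundary to wait for, which is the structural difference the thread keeps circling back to.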
>>>>>>
>>>>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>>>>
>>>>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>>>>>>
>>>>>> Just adding one thing to the mix: `that the latency for streaming
>>>>>> data is eliminated` is insane :-D
>>>>>>
>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems that Flink argues that the latency for streaming data is
>>>>>>> eliminated, whereas with Spark RDDs there is this latency.
>>>>>>>
>>>>>>> I noticed that Flink does not support an interactive shell, much like
>>>>>>> the Spark shell, where you can add jars to it to do Kafka testing. The
>>>>>>> advice was to add the streaming Kafka jar file to the CLASSPATH, but
>>>>>>> that does not work.
>>>>>>>
>>>>>>> Most Flink documentation is also rather sparse, with the usual example
>>>>>>> of word count, which is not exactly what you want.
>>>>>>>
>>>>>>> Anyway, I will have a look at it further. I have a Spark Scala
>>>>>>> streaming Kafka program that works fine in Spark, and I want to recode
>>>>>>> it using Scala for Flink with Kafka, but have difficulty importing and
>>>>>>> testing libraries.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I compared both last month; it seems to me that Flink's ML library is
>>>>>>>> not yet ready.
>>>>>>>>
>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Ted.
>>>>>>>>> I was wondering if someone is using both :)
>>>>>>>>>
>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like this question is more relevant on the Flink mailing
>>>>>>>>>> list :-)
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Has anyone used Apache Flink instead of Spark by any chance?
>>>>>>>>>>>
>>>>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>>>>> Processing.
>>>>>>>>>>>
>>>>>>>>>>> Frankly, I don't know if it offers far more than Spark offers.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> andy