Re: Apache Flink

Mich Talebzadeh Sun, 17 Apr 2016 14:48:53 -0700

The problem is that the strength and wider acceptance of a typical open
source project come from its sizeable user and development community. When
the community is small, as with Flink, it is not a viable solution to adopt.


I am rather disappointed that no big data project can be used for Complex
Event Processing (CEP), as CEP is widely used in algorithmic trading, among
other areas.


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 17 April 2016 at 22:30, Mark Hamstra <m...@clearstorydata.com> wrote:

> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany about the same
> time that Spark was first released as Open Source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely-used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Also, it always amazes me that there are so many tangential projects in the
>> Big Data space. Would it not be easier if effort were spent on adding
>> functionality to Spark rather than creating a new product like Flink?
>>
>> Dr Mich Talebzadeh
>>
>> On 17 April 2016 at 21:08, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Thanks, Corey, for the useful info.
>>>
>>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>>> However, there does not seem to be anything close to these products in
>>> the Hadoop ecosystem. So I guess there is nothing there?
>>>
>>> Regards.
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 17 April 2016 at 20:43, Corey Nolet <cjno...@gmail.com> wrote:
>>>
>>>> I have not been intrigued at all by the micro-batching concept in Spark.
>>>> I am used to CEP in real stream-processing environments like InfoSphere
>>>> Streams & Storm, where the granularity of processing is at the level of
>>>> each individual tuple and processing units (workers) can react immediately
>>>> to events as they are received and processed. The closest Spark Streaming
>>>> comes to this concept is the notion of "state" that can be updated via the
>>>> "updateStateByKey()" functions, which can only run within a micro-batch.
>>>> Looking at the expected design changes to Spark Streaming in Spark 2.0.0,
>>>> it also does not look like tuple-at-a-time processing is on the radar for
>>>> Spark, though I have seen articles stating that more effort is going to go
>>>> into the Spark SQL layer in Spark Streaming, which may make it more
>>>> reminiscent of Esper.
>>>>
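To make the per-tuple versus micro-batch contrast concrete, here is a minimal Python sketch. It is a toy model, not Spark or Storm code: it assumes a hypothetical CEP rule ("alert when a key has been seen THRESHOLD times") and fixed batch windows. `update_state` only mimics the shape of an `updateStateByKey` update function (new values for a key plus its previous state); `tuple_at_a_time`, `micro_batch` and `THRESHOLD` are illustrative names.

```python
from collections import defaultdict

# Hypothetical CEP-style rule: alert when a key's running count reaches THRESHOLD.
THRESHOLD = 3


def tuple_at_a_time(events, threshold=THRESHOLD):
    """Update state and evaluate the rule on every event as it arrives.

    events: list of (arrival_time, key). Returns a list of (alert_time, key).
    """
    counts = defaultdict(int)
    alerts = []
    for t, key in events:
        counts[key] += 1
        if counts[key] == threshold:
            # Reacts at the arrival time of the triggering event.
            alerts.append((t, key))
    return alerts


def update_state(new_values, old_state):
    """Toy per-key update in the spirit of updateStateByKey:
    (new values for this key in the batch, previous state) -> new state."""
    return (old_state or 0) + len(new_values)


def micro_batch(events, batch_interval, threshold=THRESHOLD):
    """Buffer events into fixed windows; update state and check the rule once per batch."""
    batches = defaultdict(lambda: defaultdict(list))
    for t, key in events:
        # Each event belongs to the batch that closes at the next interval boundary.
        batch_end = (int(t // batch_interval) + 1) * batch_interval
        batches[batch_end][key].append(t)

    state = {}
    alerts = []
    for batch_end in sorted(batches):
        for key, values in batches[batch_end].items():
            old = state.get(key)
            state[key] = update_state(values, old)
            if (old or 0) < threshold <= state[key]:
                # Earliest possible reaction is when the batch closes.
                alerts.append((batch_end, key))
    return alerts
```

With events `[(0.1, "a"), (0.4, "a"), (0.7, "a"), (1.2, "b")]` and a 1-second interval, the per-tuple version fires at t=0.7, while the micro-batch version can only fire when the first batch closes at t=1.0. Both end up with the same counts; what differs is when the rule gets to react.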
>>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>>> Without it, Spark avoids the whole problem of "back pressure" (though keep
>>>> in mind, it is still very possible to overload the Spark Streaming layer
>>>> with stages that will continue to pile up and never get worked off), but
>>>> it loses the granular control you get in CEP environments, where rules &
>>>> processors react immediately to the receipt of each tuple.
>>>>
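The "pile up and never get worked off" effect can be seen with a toy queueing model in Python. It assumes, for illustration only, that a batch is generated every `batch_interval` seconds, each batch takes a fixed `processing_time`, and batches are processed strictly one at a time in order; `scheduling_delays` is a hypothetical name, not a Spark API.

```python
def scheduling_delays(batch_interval, processing_time, num_batches):
    """Toy model: batch i is generated at i * batch_interval and batches run
    one at a time. Returns how long each batch waits before it can start."""
    delays = []
    free_at = 0.0  # time at which the (single) processor becomes idle
    for i in range(num_batches):
        created = i * batch_interval
        start = max(created, free_at)       # wait if the previous batch is still running
        free_at = start + processing_time
        delays.append(start - created)
    return delays
```

When `processing_time` is at most `batch_interval` every delay stays at zero. When it is larger, each batch starts `processing_time - batch_interval` later than the one before, so with a 1.0 s interval and 1.5 s of work the delays come out as 0.0, 0.5, 1.0, 1.5 and keep growing without bound.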
>>>> A while back, I did attempt to implement an InfoSphere Streams-like API
>>>> [1] on top of Apache Storm as an example of what such a design may look
>>>> like. It looks like Storm is going to be replaced in the not-so-distant
>>>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>>>> open source implementation as yet.
>>>>
>>>> [1] https://github.com/calrissian/flowmix
>>>>
>>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Corey,
>>>>>
>>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>>> set of CEP libraries somewhere? I am keen on getting hold of adaptor
>>>>> libraries for Spark, something like below:
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>>>>
>>>>>> One thing I've noticed about Flink in my following of the project has
>>>>>> been that it has established, in a few cases, some novel ideas and
>>>>>> improvements over Spark. The problem with it, however, is that both the
>>>>>> development team and the community around it are very small and many of
>>>>>> those novel improvements have been rolled directly into Spark in
>>>>>> subsequent versions. I was considering changing over my architecture to
>>>>>> Flink at one point to get better, more real-time CEP streaming support,
>>>>>> but in the end I decided to stick with Spark and just watch Flink
>>>>>> continue to pressure it into improvement.
>>>>>>
>>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I never found much info suggesting that Flink was actually designed to
>>>>>>> be fault tolerant. If fault tolerance is more of a
>>>>>>> bolt-on/add-on/afterthought, then that doesn't bode well for
>>>>>>> large-scale data processing. Spark was designed with fault tolerance in
>>>>>>> mind from the beginning.
>>>>>>>
>>>>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I read the benchmark published by Yahoo. Obviously they already use
>>>>>>>> Storm and are inevitably very familiar with that tool. To start with,
>>>>>>>> although these benchmarks were somewhat interesting IMO, they lend
>>>>>>>> themselves to assuring Yahoo that the tool chosen for their platform
>>>>>>>> is still the best choice. So, inevitably, the benchmarks and the tests
>>>>>>>> were done primarily to support their approach.
>>>>>>>>
>>>>>>>> In general, any benchmark that is not done through the TPC
>>>>>>>> (Transaction Processing Performance Council) or a similar body is
>>>>>>>> questionable.
>>>>>>>> Their argument is that because Spark handles streaming data in
>>>>>>>> micro-batches, it inevitably introduces built-in latency by design. In
>>>>>>>> contrast, neither Storm nor Flink (at face value) has this issue.
>>>>>>>>
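The built-in latency argument can be put into rough numbers with a small Python model. It is a simplification, not a description of how Spark schedules work: it assumes each event waits until its batch closes at the next multiple of `batch_interval`, then takes a fixed `processing_time`, with no queueing between batches. `micro_batch_latencies` is a hypothetical name.

```python
def micro_batch_latencies(arrivals, batch_interval, processing_time):
    """Toy per-event latency under micro-batching (no queueing between batches).

    An event arriving at time t is buffered until its batch closes, then the
    batch takes processing_time to produce results.
    """
    latencies = []
    for t in arrivals:
        batch_close = (int(t // batch_interval) + 1) * batch_interval
        wait = batch_close - t  # time spent buffered; never more than batch_interval
        latencies.append(wait + processing_time)
    return latencies
```

For arrivals at 0.0, 0.25, 0.5 and 0.75 s with a 1 s batch and 0.5 s of processing, latencies are 1.5, 1.25, 1.0 and 0.75 s. The buffering term is bounded by the batch interval; a tuple-at-a-time engine avoids that term but still pays its own per-event processing cost, so its latency is lower rather than zero.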
>>>>>>>> In addition, as we already know, Spark has far more capabilities than
>>>>>>>> Flink (I know nothing about Storm). So really it boils down to the
>>>>>>>> business SLA when choosing which tool to deploy for your use case.
>>>>>>>> IMO, Spark's micro-batching approach is probably OK for 99% of use
>>>>>>>> cases. If we had built-in CEP libraries for Spark (I am searching for
>>>>>>>> them), I would not bother with Flink.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
>>>>>>>> ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>>>>
>>>>>>>>> You have probably read this benchmark from Yahoo. Any comments from
>>>>>>>>> the Spark side?
>>>>>>>>>
>>>>>>>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Just adding one thing to the mix: `that the latency for streaming
>>>>>>>>> data is eliminated` is insane :-D
>>>>>>>>>
>>>>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It seems that Flink argues that the latency for streaming data
>>>>>>>>>> is eliminated, whereas with Spark RDDs there is this latency.
>>>>>>>>>>
>>>>>>>>>> I noticed that Flink does not support an interactive shell like the
>>>>>>>>>> Spark shell, where you can add jars to do Kafka testing. The advice
>>>>>>>>>> was to add the streaming Kafka jar file to the CLASSPATH, but that
>>>>>>>>>> does not work.
>>>>>>>>>>
>>>>>>>>>> Most Flink documentation is also rather sparse, with the usual
>>>>>>>>>> word count example, which is not exactly what you want.
>>>>>>>>>>
>>>>>>>>>> Anyway, I will look into it further. I have a Spark Scala streaming
>>>>>>>>>> Kafka program that works fine in Spark, and I want to recode it in
>>>>>>>>>> Scala for Flink with Kafka, but I am having difficulty importing and
>>>>>>>>>> testing libraries.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I compared both last month; it seems to me that Flink's machine
>>>>>>>>>>> learning library (its counterpart to Spark's MLlib) is not yet ready.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Ted. I was wondering if anyone is using both :)
>>>>>>>>>>>>
>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like this question is more relevant to the Flink mailing
>>>>>>>>>>>>> list :-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <
>>>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Has anyone used Apache Flink instead of Spark, by any chance?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>>>>>>>> Processing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Frankly, I don't know whether it offers much more than Spark does.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> andy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
