The problem is that the strength and wider acceptance of a typical open
source project lies in its sizeable user and development community. When the
community is small, as with Flink, it is not a viable solution to adopt.

I am rather disappointed that no big data project can be used for Complex
Event Processing, as it is widely used in algorithmic trading, among others.

Dr Mich Talebzadeh

On 17 April 2016 at 22:30, Mark Hamstra <> wrote:

> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany about the same
> time that Spark was first released as Open Source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely-used Apache project.
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <> wrote:
>> Also, it always amazes me why there are so many tangential projects in the
>> Big Data space. Would it not be easier if effort were spent on adding
>> functionality to Spark rather than creating a new product like Flink?
>> Dr Mich Talebzadeh
>> On 17 April 2016 at 21:08, Mich Talebzadeh <> wrote:
>>> Thanks Corey for the useful info.
>>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>>> However, there does not seem to be anything close to these products in
>>> the Hadoop ecosystem. So I guess there is nothing there?
>>> Regards.
>>> Dr Mich Talebzadeh
>>> On 17 April 2016 at 20:43, Corey Nolet <> wrote:
>>>> I have not been intrigued at all by the microbatching concept in Spark.
>>>> I am used to CEP in real stream-processing environments like InfoSphere
>>>> Streams & Storm, where the granularity of processing is at the level of each
>>>> individual tuple and processing units (workers) can react immediately to
>>>> events being received and processed. The closest Spark Streaming comes to
>>>> this concept is the notion of "state" that can be updated via the
>>>> "updateStateByKey()" functions, which can only be run in a
>>>> microbatch. Looking at the expected design changes to Spark Streaming in
>>>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>>>> the radar for Spark, though I have seen articles stating that more effort
>>>> is going to go into the Spark SQL layer in Spark streaming which may make
>>>> it more reminiscent of Esper.
>>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>>> Without this, they avoid the whole problem of "back pressure" (though keep
>>>> in mind, it is still very possible to overload the Spark streaming layer
>>>> with stages that will continue to pile up and never get worked off) but
>>>> they lose the granular control that you get in CEP environments by allowing
>>>> the rules & processors to react with the receipt of each tuple, right away.
>>>> A while back, I did attempt to implement an InfoSphere Streams-like API
>>>> [1] on top of Apache Storm as an example of what such a design may look
>>>> like. It looks like Storm is going to be replaced in the not so distant
>>>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>>>> open source implementation as of yet.
>>>> [1]
>>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <> wrote:
>>>>> Hi Corey,
>>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>>> set of CEP libraries somewhere? I am keen on getting hold of adaptor
>>>>> libraries for Spark, something like below
>>>>> Thanks
>>>>> Dr Mich Talebzadeh
>>>>> On 17 April 2016 at 16:07, Corey Nolet <> wrote:
>>>>>> One thing I've noticed about Flink in my following of the project has
>>>>>> been that it has established, in a few cases, some novel ideas and
>>>>>> improvements over Spark. The problem with it, however, is that both the
>>>>>> development team and the community around it are very small and many of
>>>>>> those novel improvements have been rolled directly into Spark in
>>>>>> subsequent versions. I was considering changing over my architecture to
>>>>>> Flink at one point to get better, more real-time CEP streaming support,
>>>>>> but in the end I decided to stick with Spark and just watch Flink
>>>>>> continue to pressure it into improvement.
>>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <> wrote:
>>>>>>> i never found much info that flink was actually designed to be fault
>>>>>>> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then
>>>>>>> that doesn't bode well for large scale data processing. spark was
>>>>>>> designed with fault tolerance in mind from the beginning.
>>>>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <> wrote:
>>>>>>>> Hi,
>>>>>>>> I read the benchmark published by Yahoo. Obviously they already use
>>>>>>>> Storm and are inevitably very familiar with that tool. To start with,
>>>>>>>> although these benchmarks were somewhat interesting, IMO they lend
>>>>>>>> themselves to an assurance that the tool chosen for their platform is
>>>>>>>> still the best choice. So inevitably the benchmarks and the tests were
>>>>>>>> done primarily to support their approach.
>>>>>>>> In general, anything which is not done through the TPC or a
>>>>>>>> similar body is questionable.
>>>>>>>> Their argument is that because Spark handles data streaming in
>>>>>>>> micro-batches, it inevitably introduces built-in latency by design.
>>>>>>>> In contrast, neither Storm nor Flink has this issue (at face value).
>>>>>>>> In addition, as we already know, Spark has far more capabilities
>>>>>>>> than Flink (I know nothing about Storm). So really it boils down to
>>>>>>>> the business SLA when choosing which tool to deploy for one's use
>>>>>>>> case. IMO Spark's micro-batching approach is probably OK for 99% of use
>>>>>>>> cases. If we had built-in libraries for CEP for Spark (I am searching
>>>>>>>> for them), I would not bother with Flink.
>>>>>>>> HTH
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <> wrote:
>>>>>>>>> You probably read this benchmark at Yahoo, any comments from Spark?
>>>>>>>>> On 17 Apr 2016, at 12:41, andy petrella <> wrote:
>>>>>>>>> Just adding one thing to the mix: `that the latency for streaming
>>>>>>>>> data is eliminated` is insane :-D
>>>>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <> wrote:
>>>>>>>>>> It seems that Flink argues that the latency for streaming data
>>>>>>>>>> is eliminated, whereas with Spark RDD there is this latency.
>>>>>>>>>> I noticed that Flink does not support an interactive shell like the
>>>>>>>>>> Spark shell, where you can add jars to do Kafka testing. The advice
>>>>>>>>>> was to add the streaming Kafka jar file to CLASSPATH, but that does
>>>>>>>>>> not work.
>>>>>>>>>> Most Flink documentation is also rather sparse, with the usual
>>>>>>>>>> example of word count, which is not exactly what you want.
>>>>>>>>>> Anyway I will have a look at it further. I have a Spark Scala
>>>>>>>>>> streaming Kafka program that works fine in Spark and I want to
>>>>>>>>>> recode it using Scala for Flink with Kafka, but have difficulty
>>>>>>>>>> importing and testing libraries.
>>>>>>>>>> Cheers
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <> wrote:
>>>>>>>>>>> I compared both last month, seems to me that Flink's MLLib is
>>>>>>>>>>> not yet ready.
>>>>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <> wrote:
>>>>>>>>>>>> Thanks Ted. I was wondering if someone is using both :)
>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <> wrote:
>>>>>>>>>>>>> Looks like this question is more relevant on flink mailing
>>>>>>>>>>>>> list :-)
>>>>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> Has anyone used Apache Flink instead of Spark by any chance?
>>>>>>>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>>>>>>>> Processing.
>>>>>>>>>>>>>> Frankly I don't know if it offers far more than Spark offers.
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>> --
>>>>>>>>> andy
