Re: Apache Flink

Mich Talebzadeh Sun, 17 Apr 2016 14:56:16 -0700

Assuming that Spark and Flink are contemporaries, what are the reasons that
Flink has not been adopted widely? (This may sound obvious and/or
prejudged.) I mean, Spark has surged in popularity in the past year, if I am
correct.


Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 17 April 2016 at 22:49, Michael Malak <[email protected]> wrote:

> In terms of publication date, a paper on Nephele was published in 2009,
> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
> Stratosphere, which became Flink.
>
>
> ------------------------------
> *From:* Mark Hamstra <[email protected]>
> *To:* Mich Talebzadeh <[email protected]>
> *Cc:* Corey Nolet <[email protected]>; "user @spark" <
> [email protected]>
> *Sent:* Sunday, April 17, 2016 3:30 PM
> *Subject:* Re: Apache Flink
>
> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany about the same
> time that Spark was first released as Open Source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely-used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
> [email protected]> wrote:
>
> Also, it always amazes me why there are so many tangential projects in the
> Big Data space. Wouldn't it be easier if effort were spent on adding to
> Spark's functionality rather than creating a new product like Flink?
>
>
> On 17 April 2016 at 21:08, Mich Talebzadeh <[email protected]>
> wrote:
>
> Thanks Corey for the useful info.
>
> I have used Sybase Aleri and StreamBase as commercial CEP engines.
> However, there does not seem to be anything close to these products in
> the Hadoop ecosystem. So I guess there is nothing there?
>
> Regards.
>
>
> On 17 April 2016 at 20:43, Corey Nolet <[email protected]> wrote:
>
> I have not been intrigued at all by the micro-batching concept in Spark. I
> am used to CEP in real stream-processing environments like InfoSphere
> Streams and Storm, where the granularity of processing is the individual
> tuple and processing units (workers) can react immediately to events as
> they are received and processed. The closest Spark Streaming comes to
> this concept is the notion of "state" that can be updated via the
> "updateStateByKey()" function, which can only run once per micro-batch.
> Looking at the expected design changes to Spark Streaming in Spark 2.0.0,
> it also does not look like tuple-at-a-time processing is on the radar for
> Spark, though I have seen articles stating that more effort is going to
> go into the Spark SQL layer in Spark Streaming, which may make it more
> reminiscent of Esper.
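As a rough illustration of the micro-batch state model described above: the function handed to `updateStateByKey()` receives, for each key, the values that arrived in the current batch plus the previous state, and returns the new state (returning `None` drops the key). The sketch below only simulates that contract in plain Python, without Spark; `run_microbatches`, `running_count` and the sample batches are hypothetical names chosen for illustration, not Spark APIs. Its point is that state advances once per batch, not once per tuple.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Same shape as the function given to PySpark's DStream.updateStateByKey:
# (values for this key in the current batch, previous state) -> new state.
UpdateFunc = Callable[[List[int], Optional[int]], Optional[int]]


def running_count(new_values: List[int], last_state: Optional[int]) -> Optional[int]:
    """Keep a running total per key across micro-batches."""
    return sum(new_values) + (last_state or 0)


def run_microbatches(
    batches: Iterable[List[Tuple[str, int]]], update: UpdateFunc
) -> List[Dict[str, int]]:
    """Hypothetical simulator: call `update` once per key per micro-batch
    and return a snapshot of the state after each batch."""
    state: Dict[str, int] = {}
    snapshots: List[Dict[str, int]] = []
    for batch in batches:
        # Group the batch's tuples by key before updating any state.
        grouped: Dict[str, List[int]] = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # Every key with new data or existing state is updated, even if it
        # received nothing in this batch.
        for key in sorted(set(grouped) | set(state)):
            new_state = update(grouped.get(key, []), state.get(key))
            if new_state is None:
                state.pop(key, None)  # returning None removes the key
            else:
                state[key] = new_state
        snapshots.append(dict(state))
    return snapshots


if __name__ == "__main__":
    batches = [
        [("ibm", 1), ("ibm", 1), ("msft", 1)],  # micro-batch 1
        [("ibm", 1)],                           # micro-batch 2: no msft tuples
    ]
    for i, snapshot in enumerate(run_microbatches(batches, running_count), start=1):
        print(i, sorted(snapshot.items()))
```

This prints `1 [('ibm', 2), ('msft', 1)]` and then `2 [('ibm', 3), ('msft', 1)]`: the state only ever reflects whole batches, never an individual tuple inside one.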
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> it, Spark avoids the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark Streaming layer with stages
> that continue to pile up and never get worked off), but it loses the
> granular control you get in CEP environments, where rules and processors
> react to the receipt of each tuple right away.
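To make the tuple-at-a-time point concrete, here is a minimal sketch of a CEP-style rule that updates its state, and can fire, on every single tuple. The rule itself (one symbol's price rising `n` times in a row) and the `RisingPriceRule` class are hypothetical examples, not taken from InfoSphere Streams, Storm, or any Flink/Spark library.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class RisingPriceRule:
    """Hypothetical CEP-style rule: alert when a symbol's price rises
    `n` times in a row. State is updated on every tuple, so the rule can
    fire on the very tuple that completes the pattern."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        # n consecutive rises need the last n + 1 prices for each symbol.
        self.recent: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=self.n + 1)
        )

    def on_tuple(self, symbol: str, price: float) -> bool:
        """Process one tuple immediately; True if the pattern just completed.
        Overlapping matches fire again if the rise continues."""
        window = self.recent[symbol]
        window.append(price)
        if len(window) < self.n + 1:
            return False
        prices = list(window)
        return all(a < b for a, b in zip(prices, prices[1:]))


if __name__ == "__main__":
    rule = RisingPriceRule(n=3)
    stream: List[Tuple[str, float]] = [
        ("ibm", 10.0), ("ibm", 10.5), ("ibm", 11.0), ("ibm", 11.2), ("ibm", 11.1),
    ]
    for i, (sym, px) in enumerate(stream):
        if rule.on_tuple(sym, px):
            print(f"alert on tuple {i}: {sym} rose 3 times in a row")
```

Here the alert is raised while handling tuple 3 itself. In a micro-batch engine the same check would only run when that tuple's batch is processed, so the reaction waits for the batch to complete.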
>
> A while back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not-so-distant future by
> Twitter's new design, called Heron. IIRC, Heron does not have an open-source
> implementation as of yet.
>
> [1] https://github.com/calrissian/flowmix
>
> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
> [email protected]> wrote:
>
> Hi Corey,
>
> Can you please point me to docs on using Spark for CEP? Do we have a set
> of CEP libraries somewhere? I am keen on getting hold of adaptor libraries
> for Spark, something like below.
>
> Thanks
>
>
> On 17 April 2016 at 16:07, Corey Nolet <[email protected]> wrote:
>
> One thing I've noticed about Flink while following the project is that it
> has established, in a few cases, some novel ideas and improvements over
> Spark. The problem, however, is that both the development team and the
> community around it are very small, and many of those novel improvements
> have been rolled directly into Spark in subsequent versions. I was
> considering moving my architecture over to Flink at one point to get
> better, more real-time CEP streaming support, but in the end I decided to
> stick with Spark and just watch Flink continue to pressure it into
> improvement.
>
> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <[email protected]> wrote:
>
> I never found much info that Flink was actually designed to be fault
> tolerant. If fault tolerance is more of a bolt-on/add-on/afterthought, then
> that doesn't bode well for large-scale data processing. Spark was designed
> with fault tolerance in mind from the beginning.
>
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
> [email protected]> wrote:
>
> Hi,
>
> I read the benchmark published by Yahoo. Obviously they already use Storm
> and are inevitably very familiar with that tool. Although these benchmarks
> were somewhat interesting IMO, they lend themselves to reassuring Yahoo
> that the tool chosen for their platform is still the best choice. So,
> inevitably, the benchmarks and tests were done primarily to support
> their approach.
>
> In general, any benchmark that is not run through the TPC (Transaction
> Processing Performance Council) or a similar body is questionable.
>
> Their argument is that because Spark handles streaming data in micro
> batches, it inevitably introduces built-in latency by design. In contrast,
> neither Storm nor Flink has this issue (at face value).
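The built-in latency argument can be put into numbers. Assuming batches cover `[k*B, (k+1)*B)` starting at time zero, and ignoring processing time, an event arriving at time `t` waits until its batch closes at `(floor(t/B) + 1)*B`, so the buffering delay is anywhere up to `B` and about `B/2` on average for evenly spread arrivals. A tuple-at-a-time engine has no such batching wait, though it still pays processing and network latency. The function names and the 1000 ms interval below are illustrative assumptions, not Spark settings taken from the benchmark.

```python
from typing import Sequence


def microbatch_delay_ms(arrival_ms: int, interval_ms: int) -> int:
    """Wait time until the micro-batch containing this event closes.

    Assumes batch k covers [k * interval_ms, (k + 1) * interval_ms) starting
    at t = 0, and ignores the time spent actually processing the batch.
    """
    batch_end_ms = (arrival_ms // interval_ms + 1) * interval_ms
    return batch_end_ms - arrival_ms


def mean_delay_ms(arrivals_ms: Sequence[int], interval_ms: int) -> float:
    """Average buffering delay over a set of arrival times."""
    return sum(microbatch_delay_ms(t, interval_ms) for t in arrivals_ms) / len(arrivals_ms)


if __name__ == "__main__":
    interval_ms = 1000                        # hypothetical 1-second batch interval
    arrivals_ms = list(range(0, 10_000, 10))  # one event every 10 ms for 10 s
    print(f"mean buffering delay with {interval_ms} ms batches: "
          f"{mean_delay_ms(arrivals_ms, interval_ms)} ms")
    # A tuple-at-a-time engine has no batching wait, so its equivalent figure is 0.
```

This prints a mean buffering delay of 505.0 ms, roughly half the batch interval, which is the latency floor the micro-batch criticism refers to. Shrinking the interval lowers it, at the cost of scheduling more, smaller batches.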
>
> In addition, as we already know, Spark has far more capabilities than
> Flink (I know nothing about Storm). So it really boils down to the business
> SLA when choosing which tool to deploy for a given use case. IMO, Spark's
> micro-batching approach is probably OK for 99% of use cases. If Spark had
> built-in libraries for CEP (I am searching for them), I would not
> bother with Flink.
>
> HTH
>
>
> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
> [email protected]> wrote:
>
> You have probably read this benchmark from Yahoo. Any comments from the Spark side?
>
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>
>
> On 17 Apr 2016, at 12:41, andy petrella <[email protected]> wrote:
>
> Just adding one thing to the mix: `that the latency for streaming data is
> eliminated` is insane :-D
>
> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
> [email protected]> wrote:
>
> It seems that Flink argues that the latency for streaming data is
> eliminated, whereas with Spark RDDs there is this latency.
>
> I noticed that Flink does not support an interactive shell like the Spark
> shell, where you can add jars to do Kafka testing. The advice was to
> add the streaming Kafka jar file to the CLASSPATH, but that does not work.
>
> Most Flink documentation is also rather sparse, with the usual word-count
> example, which is not exactly what you want.
>
> Anyway, I will look at it further. I have a Spark Scala streaming Kafka
> program that works fine in Spark, and I want to recode it in Scala for
> Flink with Kafka, but I am having difficulty importing and testing libraries.
>
> Cheers
>
>
> On 17 April 2016 at 02:41, Ascot Moss <[email protected]> wrote:
>
> I compared both last month; it seems to me that Flink's machine learning
> library (FlinkML) is not yet ready.
>
> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
> [email protected]> wrote:
>
> Thanks Ted. I was wondering if someone is using both :)
>
>
> On 16 April 2016 at 17:08, Ted Yu <[email protected]> wrote:
>
> Looks like this question is more relevant on flink mailing list :-)
>
> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <
> [email protected]> wrote:
>
> Hi,
>
> Has anyone used Apache Flink instead of Spark, by any chance?
>
> I am interested in its set of libraries for Complex Event Processing.
>
> Frankly, I don't know whether it offers far more than Spark does.
>
> Thanks
>
> --
> andy