Yes, mostly regarding Spark partitioning and the use of groupByKey instead of reduceByKey. However, Flink extended the benchmark here: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
So I was curious whether the Spark team plans to do something similar.
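For context, the groupByKey criticism comes down to map-side combining: reduceByKey pre-aggregates within each partition before the shuffle, while groupByKey ships every record across the network first. A minimal sketch of the difference, using plain Scala collections to stand in for RDD partitions (the partition contents and counts here are made up for illustration, not taken from the benchmark):

```scala
object CombineSketch {
  // Two "partitions" of (key, count) pairs, as a streaming count job would produce.
  val partitions: Seq[Seq[(String, Int)]] = Seq(
    Seq("ad" -> 1, "view" -> 1, "ad" -> 1),
    Seq("view" -> 1, "ad" -> 1)
  )

  // groupByKey-style: all 5 records cross the "shuffle" before any aggregation.
  def viaGroupByKey: Map[String, Int] =
    partitions.flatten
      .groupBy(_._1)
      .map { case (k, vs) => k -> vs.map(_._2).sum }

  // reduceByKey-style: each partition combines locally first, so at most one
  // record per key per partition (4 here) crosses the shuffle.
  def viaReduceByKey: Map[String, Int] = {
    val locallyCombined: Seq[Map[String, Int]] =
      partitions.map(_.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum })
    locallyCombined.flatten
      .groupBy(_._1)
      .map { case (k, vs) => k -> vs.map(_._2).sum }
  }

  def main(args: Array[String]): Unit = {
    // Same result either way; only the shuffle volume differs.
    assert(viaGroupByKey == viaReduceByKey)
    assert(viaGroupByKey == Map("ad" -> 3, "view" -> 2))
  }
}
```

The results are identical; the difference is purely how many records hit the network, which is exactly what the PRs against the benchmark code were pointing at.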
> On 17 Apr 2016, at 15:33, Silvio Fiorito <[email protected]> wrote:
>
> Actually there were multiple responses to it on the GitHub project, including
> a PR to improve the Spark code, but they weren't acknowledged.
>
> From: Ovidiu-Cristian MARCU
> Sent: Sunday, April 17, 2016 7:48 AM
> To: andy petrella
> Cc: Mich Talebzadeh; Ascot Moss; Ted Yu; user @spark
> Subject: Re: Apache Flink
>
> You probably read this benchmark at Yahoo, any comments from Spark?
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>
>> On 17 Apr 2016, at 12:41, andy petrella <[email protected]> wrote:
>>
>> Just adding one thing to the mix: `that the latency for streaming data is
>> eliminated` is insane :-D
>>
>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <[email protected]> wrote:
>>
>> It seems that Flink argues that the latency for streaming data is
>> eliminated, whereas with Spark RDDs there is this latency.
>>
>> I noticed that Flink does not support an interactive shell much like the
>> Spark shell, where you can add jars to do Kafka testing. The advice was to
>> add the streaming Kafka jar file to the CLASSPATH, but that does not work.
>>
>> Most Flink documentation is also rather sparse, with the usual word-count
>> example, which is not exactly what you want.
>>
>> Anyway, I will have a look at it further. I have a Spark Scala streaming
>> Kafka program that works fine in Spark, and I want to recode it in Scala
>> for Flink with Kafka, but I have difficulty importing and testing the
>> libraries.
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> http://talebzadehmich.wordpress.com
>>
>> On 17 April 2016 at 02:41, Ascot Moss <[email protected]> wrote:
>>
>> I compared both last month; it seems to me that Flink's MLlib equivalent is
>> not yet ready.
>>
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <[email protected]> wrote:
>>
>> Thanks Ted. I was wondering if someone is using both :)
>>
>> On 16 April 2016 at 17:08, Ted Yu <[email protected]> wrote:
>>
>> Looks like this question is more relevant on the Flink mailing list :-)
>>
>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <[email protected]> wrote:
>>
>> Hi,
>>
>> Has anyone used Apache Flink instead of Spark, by any chance?
>>
>> I am interested in its set of libraries for Complex Event Processing.
>>
>> Frankly, I don't know if it offers far more than Spark offers.
>>
>> Thanks
>>
>> --
>> andy
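On the jar-loading difficulty quoted above: with Spark, the Kafka integration is usually pulled into the shell with --packages rather than by editing CLASSPATH. A hedged sketch (the artifact coordinates are an example for Spark 1.6.1 / Scala 2.10; match them to your own Spark and Scala versions):

```shell
# Start spark-shell with the Kafka streaming integration fetched from
# Maven Central; transitive dependencies are resolved automatically.
# Coordinates shown are an example for Spark 1.6.1 / Scala 2.10.
spark-shell --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1
```

For Flink, the usual route is different: the Kafka connector (e.g. an artifact along the lines of org.apache.flink:flink-connector-kafka-0.8_2.10, again an assumed coordinate) is declared as an sbt or Maven dependency of a packaged job, not loaded into an interactive shell.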
