Re: Apache Flink

Ovidiu-Cristian MARCU Sun, 17 Apr 2016 08:42:06 -0700

For the streaming case (DataStream API) Flink is fault tolerant; for the batch 
case (DataSet API) it is not yet, as far as my research into their platform shows.


> On 17 Apr 2016, at 17:03, Koert Kuipers <ko...@tresata.com> wrote:
> 
> i never found much info that flink was actually designed to be fault 
> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that 
> doesn't bode well for large scale data processing. spark was designed with 
> fault tolerance in mind from the beginning.
> 
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
> 
> I read the benchmark published by Yahoo. Obviously they already use Storm and 
> are inevitably very familiar with that tool. To start with, although these 
> benchmarks were somewhat interesting IMO, they lend themselves to an assurance 
> that the tool chosen for their platform is still the best choice. So inevitably 
> the benchmarks and the tests were done primarily to support their approach.
> 
> In general anything which is not done through the TPC Council or a similar body is 
> questionable.
> Their argument is that because Spark handles data streaming in micro batches, 
> it inevitably introduces built-in latency by design. In 
> contrast, both Storm and Flink do not (at face value) have this issue.
> 
> In addition, as we already know, Spark has far more capabilities compared to 
> Flink (I know nothing about Storm). So it really boils down to the business SLA 
> when choosing which tool to deploy for your use case. IMO Spark's micro 
> batching approach is probably OK for 99% of use cases. If we had built-in 
> libraries for CEP in Spark (I am searching for them), I would not bother with 
> Flink.
> 
> HTH
> 
> 
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
> 
> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
> You have probably read this benchmark from Yahoo. Any comments from the Spark side?
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
> 
> 
>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>> 
>> Just adding one thing to the mix: `that the latency for streaming data is 
>> eliminated` is insane :-D
>> 
>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>  It seems that Flink argues that the latency for streaming data is 
>> eliminated whereas with Spark RDD there is this latency.
>> 
>> I noticed that Flink does not support an interactive shell like the Spark 
>> shell, where you can add jars to do Kafka testing. The advice was to 
>> add the streaming Kafka jar file to the CLASSPATH, but that does not work.
>> 
>> Most Flink documentation is also rather sparse, with the usual word count 
>> example, which is not exactly what you want.
>> 
>> Anyway, I will look at it further. I have a Scala streaming Kafka program 
>> that works fine in Spark, and I want to recode it in Scala for Flink with 
>> Kafka, but I am having difficulty importing and testing the libraries.
>> 
>> Cheers
>> 
>> Dr Mich Talebzadeh
>> 
>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>> I compared both last month; it seems to me that Flink's machine learning library (FlinkML) is not yet ready.
>> 
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> Thanks Ted. I was wondering if someone is using both :)
>> 
>> Dr Mich Talebzadeh
>> 
>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>> Looks like this question is more relevant to the Flink mailing list :-)
>> 
>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> Hi,
>> 
>> Has anyone used Apache Flink instead of Spark, by any chance?
>> 
>> I am interested in its set of libraries for Complex Event Processing.
>> 
>> Frankly I don't know if it offers far more than Spark offers.
>> 
>> Thanks
>> 
>> Dr Mich Talebzadeh
>> 
>> 
>> 
>> 
>> -- 
>> andy
> 
> 
>
