Re: Benchmark results between Flink and Spark

Jan-Paul Bultmann Mon, 06 Jul 2015 00:15:13 -0700

Sorry, that should be shortest path, and diameter of the graph.
I shouldn't write emails before I get my morning coffee...


> On 06 Jul 2015, at 09:09, Jan-Paul Bultmann <janpaulbultm...@me.com> wrote:
> 
> I would guess the opposite is true for highly iterative benchmarks (common in 
> graph processing and data-science).
> 
> Spark has a pretty large overhead per iteration; more optimisations and 
> planning only make this worse.
> 
> Sure, people have implemented things like Dijkstra's algorithm in Spark
> (a problem where the number of iterations is bounded by the circumference of 
> the input graph),
> but all the datasets I've seen it running on had a very small circumference 
> (which is common for e.g. social networks).
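
A minimal Python sketch of the point above, read with the correction at the top of this message (shortest path, diameter). It runs synchronous relaxation rounds on an unweighted graph, roughly how an iterative dataflow job would proceed, and counts the rounds until nothing changes. The names `sssp_rounds`, `path_graph` and `star_graph` and the example graphs are illustrative assumptions, not taken from the benchmark or from either framework.

```python
def sssp_rounds(adj, source):
    """Synchronous single-source shortest paths on an unweighted graph.

    Returns (dist, rounds), where rounds counts the passes over the
    frontier that discovered at least one new vertex.
    """
    dist = {source: 0}
    frontier = {source}
    rounds = 0
    while frontier:
        next_frontier = set()
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:
                    # First time v is reached: its distance is final.
                    dist[v] = dist[u] + 1
                    next_frontier.add(v)
        if next_frontier:
            rounds += 1
        frontier = next_frontier
    return dist, rounds


def path_graph(n):
    """Vertices 0..n-1 in a line: the diameter is n - 1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star_graph(n):
    """Vertex 0 connected to every other vertex: the diameter is 2."""
    adj = {0: list(range(1, n))}
    for i in range(1, n):
        adj[i] = [0]
    return adj
```

On `path_graph(100)` starting from vertex 0 the loop needs 99 rounds, while on `star_graph(100)` starting from a leaf it needs 2, regardless of how many vertices the star has. Small-diameter inputs such as social networks therefore hide most of the per-iteration overhead.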
> 
> Take Spark SQL, for example. Catalyst is a really good query optimiser, but it 
> introduces significant overhead.
> Since Spark has no iterative semantics of its own (unlike Flink),
> one has to materialise the intermediate DataFrame at each iteration boundary 
> to determine whether a termination criterion has been reached.
> This causes a huge amount of planning, especially since it looks like 
> Catalyst will try to optimise the dependency graph
> regardless of caching, and that dependency graph grows with the number of 
> iterations and thus with the size of the input dataset.
> 
> In Flink, on the other hand, you can describe your entire iterative program 
> through transformations without ever calling an action.
> This means that the optimiser only has to do planning once.
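
A toy Python model of the contrast described above. It is not Spark or Flink code: `Plan`, `driver_loop_planning_cost` and `native_iteration_planning_cost` are hypothetical names, and the model simply assumes what the message suspects, namely that each action makes the planner walk the whole lineage built so far, regardless of caching. Nothing here is measured.

```python
class Plan:
    """A lazily built chain of transformations (toy lineage)."""

    def __init__(self, parent=None):
        self.parent = parent

    def transform(self):
        """Append one transformation; nothing is executed or planned."""
        return Plan(self)

    def depth(self):
        """Number of transformations a planner would have to visit."""
        d, node = 0, self
        while node.parent is not None:
            d += 1
            node = node.parent
        return d


def driver_loop_planning_cost(iterations):
    """Loop in the driver with one action per iteration.

    Assumption: every action re-plans the full lineage built so far.
    """
    plan = Plan()
    visited = 0
    for _ in range(iterations):
        plan = plan.transform()   # lineage grows by one step
        visited += plan.depth()   # action: planner walks all of it
    return visited


def native_iteration_planning_cost(iterations):
    """Loop expressed inside the plan: the step function is planned once."""
    step = Plan().transform()
    return step.depth()           # independent of the iteration count
```

Under that assumption the driver loop visits 1 + 2 + ... + n plan nodes over n iterations (55 for n = 10, 5050 for n = 100), while the in-plan loop stays constant. The model says nothing about execution time, only about how planning work could scale.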
> 
> Just my 2 cents :)
> Cheers, Jan
> 
>> On 06 Jul 2015, at 06:10, n...@reactor8.com wrote:
>> 
>> Maybe Flink benefits from some of the points they outline here:
>>  
>> http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
>>  
>> Re-running the benchmarks with the 1.5/Tungsten line would probably close 
>> the gap a bit (or a lot), with Spark moving towards a similar style of 
>> off-heap memory management and more planning optimizations.
>>  
>>  
>> From: Jerry Lam [mailto:chiling...@gmail.com] 
>> Sent: Sunday, July 5, 2015 6:28 PM
>> To: Ted Yu
>> Cc: Slim Baltagi; user
>> Subject: Re: Benchmark results between Flink and Spark
>>  
>> Hi guys,
>>  
>> I just read the paper too. There is not much information on why Flink 
>> is faster than Spark for data-science workloads in the benchmark. From my 
>> point of view, it is very difficult to generalize the conclusion of a 
>> benchmark. How much experience the authors have with Spark in comparison to 
>> Flink is one of the immediate questions I have. It would be great if they 
>> made the benchmark software available somewhere for other people to 
>> experiment with.
>>  
>> just my 2 cents,
>>  
>> Jerry
>>  
>> On Sun, Jul 5, 2015 at 4:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> There was no mention of the versions of Flink and Spark used in the 
>>> benchmarking.
>>>  
>>> The cluster is also quite small.
>>>  
>>> Cheers
>>>  
>>> On Sun, Jul 5, 2015 at 10:24 AM, Slim Baltagi <sbalt...@gmail.com> wrote:
>>>> Hi
>>>> 
>>>> Apache Flink outperforms Apache Spark in processing machine learning & 
>>>> graph
>>>> algorithms and relational queries but not in batch processing!
>>>> 
>>>> The results were published in the proceedings of the 18th International
>>>> Conference, Business Information Systems 2015, Poznań, Poland, June 24-26,
>>>> 2015.
>>>> 
>>>> Thanks to our friend Google, Chapter 3: 'Evaluating New Approaches of Big
>>>> Data Analytics Frameworks' by Norman Spangenberg, Martin Roth and Bogdan
>>>> Franczyk is available for preview at http://goo.gl/WocQci on pages 28-37.
>>>> 
>>>> Enjoy!
>>>> 
>>>> Slim Baltagi
>>>> http://www.SparkBigData.com
