Sorry, that should be "shortest path" and "diameter" of the graph (not "circumference"). I shouldn't write emails before I get my morning coffee...
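To make the corrected claim concrete, here is a minimal sketch in plain Python (deliberately not Spark or Flink code) of a frontier-based shortest-path computation on an unweighted, undirected, connected graph. The helper names `shortest_paths` and `diameter` are made up for illustration. Under those assumptions, the loop runs at most diameter + 1 times, where the final pass only confirms that nothing changed.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def shortest_paths(edges: Iterable[Edge], source: Hashable) -> Tuple[Dict[Hashable, int], int]:
    """Return (hop distances from source, number of loop passes).

    Assumes an unweighted, undirected graph given as an edge list.
    """
    adjacency: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    dist = {source: 0}
    frontier = {source}
    passes = 0
    # Each pass is one "iteration"; the loop condition is the termination check.
    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    next_frontier.add(neighbour)
        frontier = next_frontier
        passes += 1
    return dist, passes


def diameter(edges: Iterable[Edge]) -> int:
    """Longest shortest path between any two nodes (assumes a connected graph)."""
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    return max(max(shortest_paths(edges, s)[0].values()) for s in nodes)


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]          # a chain: large diameter for its size
    star = [(0, i) for i in range(1, 6)]     # a hub with leaves: small diameter
    for name, graph in (("path", path), ("star", star)):
        _, passes = shortest_paths(graph, graph[0][1])
        print(name, "diameter:", diameter(graph), "passes:", passes)
```

Each pass of the `while` loop corresponds to one iteration boundary, which is the point where, as described in the quoted message below, the intermediate result has to be materialised in Spark to test the termination criterion. On a small-diameter graph the number of such boundaries stays small no matter how many nodes there are.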
> On 06 Jul 2015, at 09:09, Jan-Paul Bultmann <janpaulbultm...@me.com> wrote:
>
> I would guess the opposite is true for highly iterative benchmarks
> (common in graph processing and data science).
>
> Spark has a pretty large overhead per iteration; more optimisations and
> planning only make this worse.
>
> Sure, people have implemented things like Dijkstra's algorithm in Spark
> (a problem where the number of iterations is bounded by the
> circumference of the input graph),
> but all the datasets I've seen it running on had a very small
> circumference (which is common for e.g. social networks).
>
> Take Spark SQL for example. Catalyst is a really good query optimiser,
> but it introduces significant overhead.
> Since Spark has no iterative semantics of its own (unlike Flink),
> one has to materialise the intermediary DataFrame at each iteration
> boundary to determine whether a termination criterion has been reached.
> This causes a huge amount of planning, especially since it looks like
> Catalyst will try to optimise the dependency graph regardless of
> caching. This dependency graph grows with the number of iterations,
> and thus with the size of the input dataset.
>
> In Flink, on the other hand, you can describe your entire iterative
> program through transformations without ever calling an action.
> This means that the optimiser will only have to do planning once.
>
> Just my 2 cents :)
> Cheers, Jan
>
>> On 06 Jul 2015, at 06:10, n...@reactor8.com wrote:
>>
>> Maybe Flink benefits from some of the points they outline here:
>>
>> http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
>>
>> Probably, if they re-ran the benchmarks with the 1.5/Tungsten line, it
>> would close the gap a bit (or a lot), with Spark moving towards a
>> similar style of off-heap memory management and more planning
>> optimizations.
>>
>>
>> From: Jerry Lam [mailto:chiling...@gmail.com]
>> Sent: Sunday, July 5, 2015 6:28 PM
>> To: Ted Yu
>> Cc: Slim Baltagi; user
>> Subject: Re: Benchmark results between Flink and Spark
>>
>> Hi guys,
>>
>> I just read the paper too. There is not much information regarding why
>> Flink is faster than Spark for data-science types of workloads in the
>> benchmark. From my point of view, it is very difficult to generalize
>> the conclusion of a benchmark. How much experience the authors have
>> with Spark in comparison to Flink is one of the immediate questions I
>> have. It would be great if they made the benchmark software available
>> somewhere for other people to experiment with.
>>
>> just my 2 cents,
>>
>> Jerry
>>
>> On Sun, Jul 5, 2015 at 4:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> There was no mention of the versions of Flink and Spark used in
>>> the benchmarking.
>>>
>>> The size of the cluster is quite small.
>>>
>>> Cheers
>>>
>>> On Sun, Jul 5, 2015 at 10:24 AM, Slim Baltagi <sbalt...@gmail.com> wrote:
>>>> Hi
>>>>
>>>> Apache Flink outperforms Apache Spark in processing machine learning
>>>> & graph algorithms and relational queries, but not in batch
>>>> processing!
>>>>
>>>> The results were published in the proceedings of the 18th International
>>>> Conference, Business Information Systems 2015, Poznań, Poland, June 24-26,
>>>> 2015.
>>>>
>>>> Thanks to our friend Google, Chapter 3: 'Evaluating New Approaches of Big
>>>> Data Analytics Frameworks' by Norman Spangenberg, Martin Roth and Bogdan
>>>> Franczyk is available for preview at http://goo.gl/WocQci on pages 28-37.
>>>>
>>>> Enjoy!
>>>>
>>>> Slim Baltagi
>>>> http://www.SparkBigData.com
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Benchmark-results-between-Flink-and-Spark-tp23626.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org