Happy new year everyone :)

I’m currently working on a paper about Flink. I already got some 
recommendations for general papers with details about Flink, which have helped 
me a lot. Now that I have read them, I’m particularly interested in the speedup 
capabilities of the Flink framework: how "far" can it scale efficiently?

Amdahl's law states that the achievable speedup is limited by the 
non-parallelizable part of the processing. In practice, the time for 
communication between the nodes etc. adds to that part, so parallelization is 
only efficient as long as this overhead doesn’t "eat up" the speed gains 
(otherwise you get parallel slowdown). 
Of course, the communication overhead is mostly caused by the implementation of 
the application, but the framework's specific solution for the communication 
between the nodes has a noticeable effect as well.
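
To make this trade-off concrete, here is a minimal sketch in Python. The first 
function is the standard Amdahl formula. The second adds a communication cost 
that grows linearly with the number of workers; that linear cost model and the 
names serial_fraction and overhead_per_worker are assumptions for illustration 
only, not something measured on Flink or Spark.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def speedup_with_overhead(serial_fraction: float, workers: int,
                          overhead_per_worker: float) -> float:
    """Speedup when communication adds a cost per additional worker.

    Assumption (hypothetical model): runtime is normalized to 1.0 on a
    single worker, and each extra worker adds a fixed communication cost.
    """
    runtime = (serial_fraction
               + (1.0 - serial_fraction) / workers
               + overhead_per_worker * (workers - 1))
    return 1.0 / runtime


if __name__ == "__main__":
    # Illustrative values only: 10% serial work, 0.1% overhead per worker.
    for n in (1, 10, 30, 100, 300):
        ideal = amdahl_speedup(0.1, n)
        real = speedup_with_overhead(0.1, n, 0.001)
        print(f"{n:4d} workers: ideal {ideal:5.2f}x, with overhead {real:5.2f}x")
```

Under these assumed values the ideal curve flattens out below 10x, while the 
curve with overhead peaks at around 30 workers and then falls again. That 
falling part is the parallel slowdown mentioned above.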

After studying the papers listed below, it looks as though, although Flink's 
performance is better in many cases, its possible speedup is about equal to 
that of Spark.
1. Spark versus Flink - Understanding Performance in Big Data Analytics 
Frameworks | https://hal.inria.fr/hal-01347638/document
2. Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink | 
https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141.pdf
3. Thrill - High-Performance Algorithmic Distributed Batch Data Processing with 
C++ | 
https://panthema.net/2016/0816-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP/1608.05634v1.pdf

Does anyone have …
… more information (or data) on the speedup of Flink applications? 
… experience (or data) with Flink in an extremely parallelized environment?
… detailed information on how the nodes communicate, especially when they are 
waiting for one another's task results?

Thank you very much for your time & answers!
Hanna
