Hi,

My name is Thalita and I am a PhD student researching big data processing using 
a multi-cloud architecture. I am currently evaluating the advantages of using 
an intermediate framework such as Beam in order to achieve framework 
agnosticism for processing code (write the code once, use any framework to run 
it).

While running some experiments to assess the "cost" of using an additional 
framework such as Beam, as opposed to using the runner's SDK directly, I have 
come across some bizarre findings. I produced equivalent processing pipelines 
using Beam and the Flink SDK. I then ran them on my Flink cluster (same 
infrastructure, same settings), and found that the Beam pipeline performed 
significantly better than the Flink pipeline.

There are more details of this experiment here:
https://stackoverflow.com/questions/52182558/how-flink-and-beam-sdks-handle-windowing-which-is-more-efficient

I should be grateful for any insight from you guys on:

1. Whether I am comparing like for like (are the code samples equivalent?).

2. Any possible explanations for what I am observing.

3. Your own experiences using Beam vs. using the runner's SDK directly.
   Are they different from mine?

Thank you very much for your help. Any advice you can give me is very much
appreciated :)

Best wishes,

Thalita