Hi Thalita,
Your code examples for Beam/Flink are not equivalent.
For the Beam version, you load data from Kafka, window it, produce a
KV<Window, KafkaRecord>, group by the window, and finally process the
groupings.
For the Flink version, you load from Kafka, parse the Kafka JSON, then
key the stream by the id from the record, window it, and process.
The jobs are semantically very different because in Beam you group by
the interval window, whereas in Flink you group by a key extracted from
the Kafka record.
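To make the difference concrete, here is a rough sketch of the two
grouping patterns (not your actual code: the record format, the field
used as the id, and the small in-memory sources standing in for Kafka
are placeholders):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.joda.time.Duration;

public class GroupingComparison {

  // Beam pattern: the window itself is used as the grouping key, so ALL
  // records that fall into the same 10-second window end up in one group.
  static void beamGroupByWindow(Pipeline p) {
    PCollection<String> records =
        p.apply(Create.of("a", "b", "c")); // placeholder for the Kafka source

    records
        .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(10))))
        .apply(ParDo.of(new DoFn<String, KV<IntervalWindow, String>>() {
          @ProcessElement
          public void process(ProcessContext c, IntervalWindow window) {
            c.output(KV.of(window, c.element()));
          }
        }))
        .setCoder(KvCoder.of(IntervalWindow.getCoder(), StringUtf8Coder.of()))
        .apply(GroupByKey.<IntervalWindow, String>create());
    // Downstream sees one Iterable<String> per window.
  }

  // Flink pattern: records are keyed by an id taken from the payload and
  // THEN windowed, so every (id, window) combination forms its own group.
  static void flinkKeyByIdThenWindow(StreamExecutionEnvironment env) {
    DataStream<String> records =
        env.fromElements("id1:a", "id2:b", "id1:c"); // placeholder for the Kafka source

    records
        .keyBy(r -> r.split(":")[0])                                 // key from the record
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .reduce((a, b) -> a + "," + b);                              // placeholder aggregation
  }

  public static void main(String[] args) throws Exception {
    Pipeline p = Pipeline.create();
    beamGroupByWindow(p);
    p.run().waitUntilFinish();

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    flinkKeyByIdThenWindow(env);
    env.execute("keyby-sketch");
  }
}

In the first sketch the grouping step can only fan out across the number
of distinct windows; in the second it can fan out across the number of
distinct ids per window, so the two jobs distribute work quite
differently even before any other factors come into play.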
Best,
Max
On 07.09.18 10:41, Vergilio, Thalita wrote:
Hi,
My name is Thalita and I am a PhD student researching big data
processing using a multi-cloud architecture. I am currently evaluating
the advantages of using an intermediate framework such as Beam in order
to achieve framework agnosticism for processing code (write the code
once, use any framework to run it).
While running some experiments to assess the “cost” of using an
additional framework such as Beam, as opposed to using the runner’s SDK
directly, I have come across some bizarre findings. I produced
equivalent processing pipelines using Beam and the Flink SDK. I then ran
them on my Flink cluster (same infrastructure, same settings), and found
that the Beam pipeline performed significantly better than the Flink
pipeline.
There are more details of this experiment here:
https://stackoverflow.com/questions/52182558/how-flink-and-beam-sdks-handle-windowing-which-is-more-efficient
I should be grateful for any insight from you guys on:
1. Whether I am comparing like for like (are the code samples equivalent?).
2. Any possible explanations for what I am observing.
3. Your own experiences using Beam vs. using the runner's SDK directly.
Are they different from mine?
Thank you very much for your help. Any advice you can give me is much
appreciated. :)
Best wishes,
Thalita