Hi Beamers,

I was wondering if anyone had any further thoughts on this. I have updated the 
Stack Overflow question with more information:
https://stackoverflow.com/questions/52182558/how-flink-and-beam-sdks-handle-windowing-which-is-more-efficient

Basically, it looks like differences in serialisation could be causing 
performance issues in the Flink scenario. The Beam example works with a 
collection of KafkaRecord<byte[], byte[]> up until it has to calculate the PUE 
for the window. In the Flink example, deserialisation into ObjectNode objects 
occurs much earlier in the process, before the stream is even windowed. What do 
you reckon?

Another difference is that the Flink SDK uses Jackson under the hood to 
deserialise the Kafka stream into ObjectNode objects. In the Beam example, I 
wrote my own deserialiser using Gson. I don't think this alone could have 
caused the big difference in performance though.

Finally, what Max highlighted earlier: the Flink example requires a keyed 
stream before a windowing function can be applied. This could have a 
performance impact. Is there a way around it using the Flink SDK?

Many thanks,

Thalita

-----Original Message-----
From: Vergilio, Thalita
Sent: 10 September 2018 17:07
To: 'Maximilian Michels' <[email protected]>; [email protected]
Cc: Mullier, Duncan <[email protected]>; Ramachandran, Muthu 
<[email protected]>
Subject: RE: Comparison between Beam (with Flink runner) and using the Flink 
SDK directly

Hi Max,

Thank you very much for that. Is there a way that I could group by window in 
Flink? It did occur to me that I should be doing that but, as far as I 
understand, Flink requires a keyed stream before a windowing function can be 
applied. I couldn't think of a way around it.

Many thanks,

Thalita

-----Original Message-----
From: Maximilian Michels <[email protected]>
Sent: 10 September 2018 16:48
To: [email protected]; Vergilio, Thalita <[email protected]>
Cc: Mullier, Duncan <[email protected]>; Ramachandran, Muthu 
<[email protected]>
Subject: Re: Comparison between Beam (with Flink runner) and using the Flink 
SDK directly

Hi Thalita,

Your code examples for Beam/Flink are not equivalent.

For the Beam version, you load data from Kafka, then you window, produce a 
KV<Window, KafkaRecord>, group by the window, and finally process the groupings.

For Flink, you load from Kafka, parse the Kafka Json, then window and keyby the 
id from Kafka, followed by the processing.

The jobs are semantically very different because in Beam you group by an 
interval window, but in Flink you group by a key extracted from Kafka.

Best,
Max

On 07.09.18 10:41, Vergilio, Thalita wrote:
> Hi,
>
> My name is Thalita and I am a PhD student researching big data
> processing using a multi-cloud architecture. I am currently evaluating
> the advantages of using an intermediate framework such as Beam in
> order to achieve framework agnosticism for processing code (write the
> code once, use any framework to run it).
>
> While running some experiments to assess the "cost" of using an
> additional framework such as Beam, as opposed to using the runner's
> SDK directly, I have come across some bizarre findings. I produced
> equivalent processing pipelines using Beam and the Flink SDK. I then
> ran them on my Flink cluster (same infrastructure, same settings), and
> found that the Beam pipeline performed significantly better than the
> Flink pipeline.
>
> There are more details of this experiment here:
>
> https://stackoverflow.com/questions/52182558/how-flink-and-beam-sdks-h
> andle-windowing-which-is-more-efficient
>
> I should be grateful for any insight from you guys on:
>
> 1.Whether I am comparing like for like (are the code samples equivalent?).
>
> 2.Any possible explanations for what I am observing.
>
> 3.Your own experiences using Beam vs. using the runner's SDK directly.
> Are they different from mine?
>
> Thank you very much for your help. Any advice you can give me is much
> much appreciated J
>
> Best wishes,
>
> Thalita
>
> To view the terms under which this email is distributed, please go
> to:- http://disclaimer.leedsbeckett.ac.uk/disclaimer/disclaimer.html
>
To view the terms under which this email is distributed, please go to:-
http://disclaimer.leedsbeckett.ac.uk/disclaimer/disclaimer.html

Reply via email to