Hey!
My name is Teodor Spæren and I'm writing a master's thesis investigating
the performance overhead of using Beam instead of using the underlying
systems directly. My focus has been on Flink and I've made a discovery
about some unnecessary copying between operators in the Flink
runner[1][2]. I wrote a fix for this, which has been accepted and merged
and will be in the upcoming 2.26.0 release[3].
I'm writing this email to ask if anyone on these mailing lists would be
willing to send me some results from applying this option once the new
version of Beam is released. Anything would be very much appreciated:
stories, screenshots of performance monitoring before and after, hard
numbers, anything! If you include the cluster size and the workload, that
would be awesome too! My master's thesis is set to be completed this coming
summer, so there is no real hurry :)
The thesis will be freely accessible[4] and I hope that these
findings will be of help to the Beam community. If anyone wishes to
submit stories but remain anonymous, that is also OK :)
The best way to contact me is to reply to this email here, or to write to
teod...@mail.uio.no.
Any help is appreciated, thanks for your attention!
Best regards,
Teodor Spæren
[1]: https://lists.apache.org/thread.html/r24129dba98782e1cf4d18ec738ab9714dceb05ac23f13adfac5baad1%40%3Cdev.beam.apache.org%3E
[2]: https://issues.apache.org/jira/browse/BEAM-11146
[3]: https://github.com/apache/beam/pull/13240
[4]: https://www.duo.uio.no/