Hi all.

I'm curious whether anyone has compared the performance of a pipeline
that uses CombineByKey with one that uses a stateful DoFn with combining
state. [1]

More specifically, if I had a pipeline that had a CombineByKey configured
with early firings every N minutes, and I replaced the CBK with a stateful
DoFn with combining state and a timer that fired every N minutes instead,
would there be a (significant?) performance difference?  I'm using
Dataflow (with Streaming Engine), but I'd be curious about other runners
as well.
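
To make the comparison concrete, here is a rough sketch of the stateful
variant I have in mind. The class name PeriodicSumFn, the state and timer
ids, and the choice of Sum.ofLongs() as the combiner are all placeholders
for illustration, not taken from a real pipeline. It also assumes the
accumulator coder can be inferred; if it cannot, the
StateSpecs.combining(coder, fn) overload would be needed instead.

```java
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.state.CombiningState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

/** Hypothetical stand-in for the CBK: sums per key and emits every `period`. */
class PeriodicSumFn extends DoFn<KV<String, Long>, KV<String, Long>> {
  private final Duration period;

  PeriodicSumFn(Duration period) {
    this.period = period;
  }

  // Combining state holding the running per-key sum.
  // Assumption: the accumulator coder is inferred for Sum.ofLongs().
  @StateId("sum")
  private final StateSpec<CombiningState<Long, long[], Long>> sumSpec =
      StateSpecs.combining(Sum.ofLongs());

  // The key is remembered in state so the timer callback can emit it.
  @StateId("key")
  private final StateSpec<ValueState<String>> keySpec =
      StateSpecs.value(StringUtf8Coder.of());

  // Processing-time timer, analogous to processing-time early firings.
  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void processElement(
      @Element KV<String, Long> element,
      @StateId("sum") CombiningState<Long, long[], Long> sum,
      @StateId("key") ValueState<String> key,
      @TimerId("flush") Timer flush) {
    sum.add(element.getValue());
    // Arm the timer only for the first element of a key; re-setting it on
    // every element would keep pushing the firing time into the future.
    if (key.read() == null) {
      key.write(element.getKey());
      flush.offset(period).setRelative();
    }
  }

  @OnTimer("flush")
  public void onFlush(
      OnTimerContext context,
      @StateId("sum") CombiningState<Long, long[], Long> sum,
      @StateId("key") ValueState<String> key,
      @TimerId("flush") Timer flush) {
    // Emit the running total without clearing it, which mirrors accumulating
    // panes; calling sum.clear() here would mirror discarding panes instead.
    context.output(KV.of(key.read(), sum.read()));
    // Re-arm so the timer keeps firing every `period` for this key.
    flush.offset(period).setRelative();
  }
}
```

This would be applied with something like
ParDo.of(new PeriodicSumFn(Duration.standardMinutes(5))) on a keyed
PCollection. Note that the looping timer means per-key state is never
released in the global window, which may itself affect the comparison.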

If no one has tried this, I might run a benchmark myself; I'd be very
interested to see the results.

[1]
https://beam.apache.org/releases/javadoc/2.11.0/org/apache/beam/sdk/state/CombiningState.html
