Hi everyone,

My team is developing a streaming pipeline for analytics on top of market data. The ultimate goal is to handle tens of millions of events per second, distributed across the cluster by the unique ID of each financial instrument. Unfortunately, we are struggling to achieve acceptable performance. As far as I can see, Flink forcibly breaks operator chaining whenever it encounters a job graph node with multiple inputs. This severely affects performance, because a network boundary is enforced and every event is serialised and deserialised.
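For concreteness, here is roughly the kind of two-input wiring I mean (a sketch only, not our actual job; the class and field names Event, Rule, Alert, instrumentId are placeholders):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

MapStateDescriptor<String, Rule> rulesDesc =
    new MapStateDescriptor<>("rules", Types.STRING, Types.POJO(Rule.class));

DataStream<Event> events = ...;                        // from a Kafka source
BroadcastStream<Rule> rules = ruleStream.broadcast(rulesDesc);

DataStream<Alert> out = events
    .keyBy(e -> e.instrumentId)                        // partition by instrument ID
    .connect(rules)                                    // two-input node: chaining breaks here
    .process(new KeyedBroadcastProcessFunction<String, Event, Rule, Alert>() {
        @Override
        public void processElement(Event e, ReadOnlyContext ctx,
                                   Collector<Alert> out) throws Exception {
            // ... evaluate e against the current broadcast rules ...
        }

        @Override
        public void processBroadcastElement(Rule r, Context ctx,
                                            Collector<Alert> out) throws Exception {
            ctx.getBroadcastState(rulesDesc).put(r.id, r);
        }
    });
```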
From the pipeline graph perspective, the requirements are:

* Read data from multiple Kafka topics that feed different nodes in the graph.
* Broadcast a set of dynamic rules to the pipeline.

The cleanest way to achieve the first goal is a series of KeyedCoProcessFunction operators. That design didn't work for us because the SerDe overhead added by the broken chains was too high; we had to completely flatten the pipeline instead. Unfortunately, I can't find any way to solve the second problem: as soon as a broadcast stream is introduced into the pipeline, performance tanks. Is there any technique I could use to preserve the chaining?

Kind regards,
Viacheslav