Hi,
As you noticed, Flink does currently not put Source-X and Throttler-X (for some
X) in the same task slot (TaskManager). In the low-level execution system,
there are two connection patterns: ALL_TO_ALL and POINTWISE. Flink will only
schedule Source-X and Throttler-X on the same slot when
Hi Kostas, Aljoscha,
To answer Kostas’s concern, the algorithm works this way:
Let’s say we have two sources Source-0 and Source-1. Source-0 is slow and
Source-1 is fast. Sources read from Kafka at different paces. Threshold is 10
time units.
1st cycle: Source-0 sends records with timestamp
To quickly make Kostas' intuition concrete: it's currently not possible to have
watermarks broadcast but the data be locally forwarded. The reason is that
watermarks and data travel in the same channels so if the watermark needs to be
broadcast there needs to be an n to m (in this case m == n)
Hi Yunus,
I see. Currently I am not sure that you can simply broadcast the watermark
only, without
having a shuffle.
But one thing to notice about your algorithm is that, I am not sure if your
algorithm solves
the problem you encounter.
Your algorithm seems to prioritize the stream with the
Hi Kostas,
Yes, you have summarized well. I want to only forward the data to the next
local operator, but broadcast the watermark through the cluster.
- I can’t set parallelism of taskB to 1. The stream is too big for that. Also,
the data is ordered at each partition. I don’t want to change