Suppose I have a data stream of tuples <tick: Int, key: Int, Value: Double> with the sequence of ticks being 1,2,3,…. for each separate k.
I understand and keyBy(2) will partition the stream so each partition has the same key in each tuple. I now have a sequence of functions to apply to the streams say f(),g() and h() in that order. With parallelism set to 1 then each partition-stream passes through f then g then h (f | g | h) in order of tick. I want to run each partition-stream in parallel, setting parallelism in the Web GUI. My question is how do I ensure each partition stream passes through a fixed sequence (f | g | h) rather than if parallelism is p running p instances each of f g & h with no guarantee that each partition-stream flows through a unique set of three instances in tick-order, especially if p is greater than the largest value of key. A typical use case would be to maintain a moving average over each key I need to remove the crossover in the middle box, so [1] -> [1] -> [1] and [2] -> [2] -> [2], instead of [1] -> [1] -> [1 or 2] . Nick