Suppose I have a data stream of tuples <tick: Int, key: Int, value: Double>,
with the ticks running 1, 2, 3, ... for each separate key.

I understand that keyBy(2) will partition the stream so that all tuples in a
partition share the same key. I now have a sequence of functions to apply to
the streams, say f(), g() and h(), in that order.

With parallelism set to 1, each partition-stream passes through f, then g,
then h (f | g | h) in order of tick.

I want to run the partition-streams in parallel, setting the parallelism in
the Web GUI.

My question is: how do I ensure that each partition-stream passes through a
fixed sequence (f | g | h)? With parallelism p, I am concerned that p
instances each of f, g and h will run with no guarantee that each
partition-stream flows through a unique set of three instances in tick order,
especially if p is greater than the largest value of key.
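
To make the behaviour I am after concrete, here is a minimal plain-Python
sketch. It is not Flink code: route(), run() and the "key % p" partitioner are
hypothetical stand-ins for whatever the framework does internally, and the
stage labels f0, g0, h0, ... only name which instance handled a tuple.

```python
from collections import defaultdict


def route(key: int, p: int) -> int:
    # Hypothetical partitioner (an assumption, not the framework's real hashing):
    # the key alone decides which of the p pipelines a tuple goes to.
    return key % p


def run(records, p):
    """Send each (tick, key, value) tuple through one fixed f | g | h chain.

    Returns {key: [(tick, (f_instance, g_instance, h_instance)), ...]}
    in the order the tuples were processed.
    """
    trace = defaultdict(list)
    for tick, key, _value in records:
        i = route(key, p)
        # All three stages belong to pipeline i, so a tuple never crosses
        # over to another pipeline's g or h.
        stages = (f"f{i}", f"g{i}", f"h{i}")
        trace[key].append((tick, stages))
    return dict(trace)


# Two keys, parallelism 4 (greater than the largest key).
records = [(1, 1, 0.5), (1, 2, 1.5), (2, 1, 0.7), (2, 2, 1.1), (3, 1, 0.9)]
trace = run(records, p=4)
for key, steps in sorted(trace.items()):
    print(key, steps)
```

In this sketch every tuple with key 1 sees the same three instances, in tick
order, and the pipelines that receive no key simply stay idle.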

A typical use case would be to maintain a moving average over each key.
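
As an illustration of that use case, here is a small plain-Python sketch of a
per-key moving average. The class name KeyedMovingAverage and the window
parameter are my own assumptions for illustration, not part of any Flink API.
It only gives correct results if all tuples for a key reach the same instance
in tick order, which is exactly the guarantee I am asking about.

```python
from collections import defaultdict, deque


class KeyedMovingAverage:
    """Keeps the last `window` values per key and reports their mean."""

    def __init__(self, window: int):
        self.window = window
        # One bounded buffer per key; old values fall off automatically.
        self.buffers = defaultdict(lambda: deque(maxlen=window))

    def update(self, tick: int, key: int, value: float):
        buf = self.buffers[key]
        buf.append(value)
        return tick, key, sum(buf) / len(buf)


ma = KeyedMovingAverage(window=2)
for record in [(1, 1, 2.0), (1, 2, 10.0), (2, 1, 4.0), (3, 1, 6.0)]:
    print(ma.update(*record))
```

Because the state is held per key, interleaving tuples from different keys
does not disturb the averages, but splitting one key across two instances
would.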



I need to remove the crossover in the middle box, so [1] -> [1] -> [1] and
[2] -> [2] -> [2], instead of [1] -> [1] -> [1 or 2].

Nick
