Hi Adam, sorry for the late reply. Introducing a global state is something that should be avoided as it introduces bottlenecks and/or concurrency/order issues. Broadcasting the state between different subtasks will also bring a loss in performance since each state change has to be shared with every other subtask. Ideally, you might be able to reconsider the design of your pipeline.
Is there a specific reason that prevents you from doing the merging on a single instance? Best, Matthias On Wed, Sep 16, 2020 at 11:21 PM Adam Atrea <atreaa...@gmail.com> wrote: > > Hi, > > I am pretty new to Flink and I'm trying to implement - which seems to me - > a pretty basic pattern: > > Say I have a single stream of *Price *objects encapsulating a price value > and a symbol (for example A to Z) they are emitted at a very random > interval all day - could be 10000 /day or once a week. > > *..(price= .22, symbol = 'B')...(price= .12 , symbol = 'C').. (price= .12 > , symbol = 'A'). .(price= .22, symbol = 'Z')...* > > I want to define an operator that should emit a single object only once > when ALL symbols have been received (not before) - the object will include > all the received prices. > If a price is received again for the same symbol it will reemit the same > object with the updated price and all previous prices will stay the same. > > If the stream is keyed by symbol I understand that using MapState state > will not help because the state is local to the partition - each task will > need to know that all symbols have been received. > > Basically I'm looking for a global state for a single keyed stream - a > state accessible by all parallel tasks - I have used the broadcast pattern > but I understand this is when connecting 2 streams - is there a way to do > it without forcing parallelism to 1. > > Thanks in advance for your assistance > > Adam > > >