Hi Adam,
Sorry for the late reply. Introducing global state is something that
should be avoided, as it introduces bottlenecks and/or concurrency and
ordering issues. Broadcasting the state between the subtasks would also
cost performance, since each state change has to be shared with every
other subtask. Ideally, you could reconsider the design of your
pipeline.

Is there a specific reason that prevents you from doing the merging on a
single instance?
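
To make the question concrete, here is a minimal sketch of what merging on
a single instance could look like. It is plain Java rather than Flink API
code, and the names PriceMerger, onPrice and expectedSymbols are made up
for illustration. It also assumes the full set of symbols is known up
front. In a Flink job the same logic could live in one operator that sees
all prices, for example by running only that operator with parallelism 1
or by keying the stream on a constant, with the map held in keyed state.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: keeps the latest price per symbol on one instance. */
final class PriceMerger {
    // Assumption: the complete set of symbols is known when the job starts.
    private final Set<String> expectedSymbols;
    // Latest price seen for each symbol so far.
    private final Map<String, Double> latestPrices = new TreeMap<>();

    PriceMerger(Set<String> expectedSymbols) {
        this.expectedSymbols = new HashSet<>(expectedSymbols);
    }

    /**
     * Records a price. Returns a snapshot of all prices once every expected
     * symbol has been seen at least once, and an empty Optional before that.
     * A later price for a known symbol re-emits the snapshot with only that
     * symbol's price changed.
     */
    Optional<Map<String, Double>> onPrice(String symbol, double price) {
        if (!expectedSymbols.contains(symbol)) {
            return Optional.empty(); // ignore symbols outside the expected set
        }
        latestPrices.put(symbol, price);
        if (latestPrices.size() < expectedSymbols.size()) {
            return Optional.empty(); // still waiting for some symbols
        }
        // Copy so callers cannot mutate the internal state.
        return Optional.of(new TreeMap<>(latestPrices));
    }
}
```

Only the merge step is funneled through one instance here; upstream
parsing and downstream sinks could still run in parallel.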

Best,
Matthias

On Wed, Sep 16, 2020 at 11:21 PM Adam Atrea <atreaa...@gmail.com> wrote:

>
> Hi,
>
> I am pretty new to Flink and I'm trying to implement - which seems to me -
> a pretty basic pattern:
>
> Say I have a single stream of *Price* objects encapsulating a price value
> and a symbol (for example A to Z). They are emitted at a very random
> interval all day - could be 10000/day or once a week.
>
> *..(price= .22, symbol = 'B')...(price= .12 , symbol = 'C').. (price= .12
> , symbol = 'A'). .(price= .22, symbol = 'Z')...*
>
> I want to define an operator that should emit a single object only once
> when ALL symbols have been received (not before) - the object will include
> all the received prices.
> If a price is received again for the same symbol it will reemit the same
> object with the updated price and all previous prices will stay the same.
>
> If the stream is keyed by symbol I understand that using MapState state
> will not help because the state is local to the partition - each task will
> need to know that all symbols have been received.
>
> Basically I'm looking for a global state for a single keyed stream - a
> state accessible by all parallel tasks. I have used the broadcast pattern,
> but I understand this is for connecting 2 streams. Is there a way to do
> it without forcing parallelism to 1?
>
> Thanks in advance for your assistance
>
> Adam
>
>
>
