Hi Miguel,

Your initial idea doesn't sound bad, but why do you want to key by timestamp? Usually, you can simply key your stream by a custom key and buffer the events in a ListState until a watermark arrives.
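
To make the buffering idea concrete, here is a rough plain-Java sketch of the logic only. It deliberately uses no Flink API: the names EventTimeBuffer, Event, add and onWatermark are made up for illustration. In a real job the buffer would presumably live in keyed state (e.g. a ListState) and the release step would be driven by an event-time timer rather than called by hand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustration only: buffers events and releases them in event-time order. */
class EventTimeBuffer {

    /** Hypothetical event type: an event-time timestamp plus a payload. */
    static final class Event {
        final long timestamp;
        final String payload;

        Event(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // Min-heap on the timestamp, so the oldest buffered event is always at the head.
    // Events with equal timestamps may be released in any relative order.
    private final PriorityQueue<Event> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestamp));

    /** Buffers an event that arrived in arbitrary order. */
    void add(Event event) {
        buffer.add(event);
    }

    /**
     * Releases, in ascending timestamp order, every buffered event whose
     * timestamp is less than or equal to the given watermark. Later events
     * stay in the buffer until a higher watermark arrives.
     */
    List<Event> onWatermark(long watermark) {
        List<Event> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }
}
```

If you add events with timestamps 30, 10 and 20 and then call onWatermark(20), you get back the events for 10 and 20 in that order, while the event for 30 remains buffered until a later watermark.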

But if you really want to have some kind of global event-time order, you have two choices:

- either a single operator with parallelism 1 that performs the ordering
- or you send every event to every operator instance using the broadcast state pattern [1]

It is guaranteed that the watermark will reach the downstream operator or sink after all of those events. Watermarks are synchronized across all parallel operator instances. You can store a Map without checkpointing it, but this means that you have to make sure the map is initialized again during recovery.

Regards,
Timo


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/state/broadcast_state.html

On 30.04.21 11:37, Miguel Araújo wrote:
Hi everyone,

I have a KeyedProcessFunction whose events I would like to process in event-time order. My initial idea was to use a Map keyed by timestamp and, when a new event arrives, iterate over the Map to process events older than the current watermark.

The issue is that I obviously can't use a MapState, as my stream is keyed, so the map would be scoped to the current key. Is using a "regular" (i.e., not checkpointed) Map an option, given that its content will be recreated by the replay of the events on a restart? Is it guaranteed that the watermark that triggered the processing of multiple events (and their subsequent push downstream) is not sent downstream before these events themselves?

Thanks,
Miguel
