Hi Miguel,
your initial idea sounds reasonable, but why do you want to key by
timestamp? Usually, you can simply key your stream by a custom key and
store the events in a ListState until a watermark comes in.
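A minimal sketch of that buffering idea in plain Java (not the Flink API; EventTimeBuffer and its method names are made up for illustration). In an actual KeyedProcessFunction the buffer would presumably live in keyed state such as ListState, and the flush would run in onTimer() once an event-time timer fires:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: stands in for the per-key buffer of one key.
final class EventTimeBuffer<E> {
    // timestamp -> events carrying that timestamp; TreeMap keeps timestamps sorted
    private final TreeMap<Long, List<E>> buffer = new TreeMap<>();

    // Called for every incoming event: only buffer, do not process yet.
    void add(long timestamp, E event) {
        buffer.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(event);
    }

    // Called when the watermark advances: removes and returns all events
    // with timestamp <= watermark, in ascending timestamp order.
    List<E> flushUpTo(long watermark) {
        List<E> ready = new ArrayList<>();
        Map<Long, List<E>> head = buffer.headMap(watermark, true);
        for (List<E> events : head.values()) {
            ready.addAll(events);
        }
        head.clear(); // headMap is a view, so this also removes from the buffer
        return ready;
    }
}
```

Events with the same timestamp come out in arrival order; events newer than the watermark stay buffered until a later flush.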
But if you really want to have some kind of global event-time order, you
have two choices:
- either a single operator with parallelism 1 that performs the ordering
- or you send every event to every operator instance using the
broadcast state pattern [1]
It is guaranteed that the watermark will reach the downstream operator
or sink only after all events emitted before it. Watermarks are
synchronized across all parallel operator instances. You can keep the
Map uncheckpointed, but this means that you have to ensure that the map
is initialized again during recovery.
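To illustrate the watermark synchronization mentioned above: as far as I understand it, an operator with several inputs only advances its own event-time clock to the minimum of the watermarks seen on all inputs. A rough plain-Java sketch of that rule (WatermarkTracker is a made-up name, not a Flink class, and it assumes at least one input channel):

```java
import java.util.Arrays;

// Hypothetical illustration of combining watermarks from parallel inputs.
final class WatermarkTracker {
    private final long[] channelWatermarks;
    private long current = Long.MIN_VALUE;

    WatermarkTracker(int numChannels) {
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    // Records a watermark from one input channel.
    // Returns true if the combined (minimum) watermark advanced.
    boolean onWatermark(int channel, long watermark) {
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        long min = Arrays.stream(channelWatermarks).min().getAsLong();
        if (min > current) {
            current = min;
            return true;
        }
        return false;
    }

    long currentWatermark() {
        return current;
    }
}
```

The combined watermark stays put until the slowest input has caught up, which is why a downstream operator does not see time advance past events that are still in flight on another channel.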
Regards,
Timo
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/state/broadcast_state.html
On 30.04.21 11:37, Miguel Araújo wrote:
Hi everyone,
I have a KeyedProcessFunction whose events I would like to process in
event-time order.
My initial idea was to use a Map keyed by timestamp and, when a new
event arrives, iterate over the Map to process events older than the
current watermark.
The issue is that I obviously can't use a MapState, as my stream is
keyed, so the map would be scoped to the current key.
Is using a "regular" (i.e., not checkpointed) Map an option, given that
its content will be recreated by the replay of the events on a restart?
Is it guaranteed that the watermark that triggered the processing of
multiple events (and their subsequent push downstream) is not sent
downstream before these events themselves?
Thanks,
Miguel