This is a conceptual question, and the answer will probably depend on the specific use case.
But to make the context clear, I'm joining multiple input streams by a common (entity) id. The inputs represent partial/disjoint updates for the same entity, e.g., device interfaces, device VMs, device metrics, device vendor information, etc. I've personally worked on multiple such assemblers, and the event time strategy was never entirely clear to me. That is, how should the inputs' event times be propagated to the output, the assembled entity? Conceptually, these services emit entity snapshots at a given time.

The strategies that I've commonly found in practice:

1. Use the max timestamp seen among the inputs. This has some nice properties, like being monotonically increasing, but it can hide lag/delay in one of the inputs.
2. Use the timestamp of the latest processed event. This is how Flink would propagate the timestamp by default, e.g., when using a CoProcessFunction. This approach does not look semantically consistent, and the output timestamp wouldn't be monotonically increasing.
3. Use the watermark in conjunction with event time timers, for example to emit periodic snapshots. This would recover monotonicity at the expense of extra latency, exacerbated by the fact that watermarks are not calculated per key. On the flip side, it would immediately show if one source is delayed, which might be desirable, or not!

I'd also like to take the chance to ask whether there is anything on the roadmap for supporting per-key watermarks along the lines of this:

- https://github.com/TawfikYasser/Keyed-Watermarks-in-Apache-Flink

Thanks!
Salva
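
P.S. In case it helps to make the three strategies concrete, here is a rough, Flink-free sketch of the state a single key would hold and the output timestamp each strategy would produce. Everything in it is hypothetical and only for illustration: the class name `EventTimeStrategies`, the method names, and the input names are made up, and none of it is Flink API. In particular, strategy 3 is only approximated as "the minimum over inputs of the highest timestamp seen on that input", i.e. a per-key stand-in for a watermark that assumes no out-of-orderness bound and no idle inputs. In an actual job the watermark is computed per operator instance rather than per key, which is exactly the limitation mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one assembler key: tracks event times per input (not Flink API). */
final class EventTimeStrategies {
    // Highest event time seen so far on each input, e.g. "interfaces", "metrics".
    private final Map<String, Long> maxPerInput = new HashMap<>();
    // Event time of whichever update was processed most recently.
    private long lastProcessed = Long.MIN_VALUE;

    EventTimeStrategies(String... inputs) {
        for (String input : inputs) {
            maxPerInput.put(input, Long.MIN_VALUE);
        }
    }

    /** Record one partial update arriving on the given input. */
    void onEvent(String input, long eventTime) {
        maxPerInput.merge(input, eventTime, Long::max);
        lastProcessed = eventTime;
    }

    /** Strategy 1: max timestamp seen among the inputs. */
    long maxSeen() {
        return maxPerInput.values().stream()
                .mapToLong(Long::longValue).max().orElse(Long.MIN_VALUE);
    }

    /** Strategy 2: timestamp of the latest processed event. */
    long latestProcessed() {
        return lastProcessed;
    }

    /** Strategy 3 (approximation): held back by the slowest input, like a per-key watermark. */
    long slowestInput() {
        return maxPerInput.values().stream()
                .mapToLong(Long::longValue).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        EventTimeStrategies s = new EventTimeStrategies("interfaces", "metrics");
        s.onEvent("interfaces", 100L);
        s.onEvent("metrics", 40L);    // this input lags behind
        s.onEvent("interfaces", 90L); // late, out-of-order update
        // prints "max=100 latest=90 slowest=40"
        System.out.println("max=" + s.maxSeen()
                + " latest=" + s.latestProcessed()
                + " slowest=" + s.slowestInput());
    }
}
```

With these three events, strategy 1 reports 100 and hides that "metrics" is stuck at 40, strategy 2 moves backwards from 40 to 90 relative to the earlier 100, and the strategy 3 approximation stays at 40 until the lagging input catches up.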
