> Regarding event-time processing and watermarking, my understanding is that
> if an event arrives late, after the allowed lateness period, it will be
> dropped, even though I think this contradicts exactly-once semantics.
>
> Yes, allowed lateness is a compromise between exactly-once semantics and
> an acceptable delay for the streaming application. Flink cannot guarantee
> that all data sources produce data without lateness, and handling that is
> beyond what a streaming system should do. If you really need exactly-once
> results with event-time processing in this scenario, I suggest running a
> batch job later that consumes all the data sources and using its result as
> the credible one. For processing-time data, using Flink to generate a
> credible result is enough.
>

The default behavior is to drop late events, but you can tolerate as much
lateness as you need via `allowedLateness()` (a window parameter), which
re-triggers the window computation to also take late events into account. Of
course, memory consumption grows as the allowed lateness increases, so in
streaming scenarios you usually settle for a sensible trade-off, as Yun Tang
mentioned. To selectively store late events for further processing, you can
use a custom `ProcessFunction` that sends late events to a side output and
store them somewhere (e.g., HDFS).
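To make this concrete, here is a minimal Java sketch of the windowed variant,
using Flink's built-in `sideOutputLateData()` on a windowed stream (an
alternative to wiring up a custom `ProcessFunction` yourself). The `Event`
type, its key field, `MyAggregate`, and `buildHdfsSink()` are hypothetical
placeholders, and the stream is assumed to already have timestamps and
watermarks assigned:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// Tag for events that arrive after the allowed lateness has expired.
// The anonymous subclass ({}) is needed so Flink can capture the type.
final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

SingleOutputStreamOperator<Result> results = events
    .keyBy(e -> e.key)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .allowedLateness(Time.minutes(10))  // re-fires the window for events up to 10 min late
    .sideOutputLateData(lateTag)        // events later than that go to the side output
    .aggregate(new MyAggregate());      // hypothetical aggregation function

// Persist the too-late events instead of silently dropping them (e.g., to HDFS).
results.getSideOutput(lateTag)
       .addSink(buildHdfsSink());       // hypothetical sink factory
```

Note that every late-but-allowed event causes the window to fire again with an
updated result, so downstream consumers must be able to handle these updates.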

Best regards,
Alessandro
