Hi,

this depends on how you generate watermarks [1].
You could generate watermarks with a four hour delay and be fine (at the
cost of four hours of latency), or add a check that never advances the
watermark by more than x minutes at a time.
These considerations are quite use-case specific, so it's hard to give
advice that applies to all cases.
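To make the two options concrete, here is a small standalone Java simulation. It is not Flink code: the class WatermarkSketch, the method lateEvents(), and the parameter values are all hypothetical. It replays the two event orders from your question and updates the watermark after every event. The watermark trails the highest timestamp seen by delaySec, which is similar in spirit to Flink's BoundedOutOfOrdernessTimestampExtractor. It can optionally be limited so it never advances by more than maxStepSec per event. An event counts as late if its timestamp is at or below the watermark when it arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Flink API. Timestamps are seconds since midnight.
class WatermarkSketch {

    static long t(int h, int m, int s) {
        return h * 3600L + m * 60L + s;
    }

    // Arrival order from the question: position i holds e(i+1), e1 arrives first.
    static final long[] FIRST_CASE = { t(12, 3, 1), t(12, 4, 21), t(15, 0, 22), t(12, 4, 33) };
    static final long[] WORSE_CASE = { t(15, 0, 22), t(12, 4, 21), t(12, 3, 1), t(12, 4, 33) };

    /**
     * Replays events in arrival order and returns the 0-based positions of
     * events whose timestamp is at or below the watermark when they arrive.
     * The watermark trails the highest timestamp by delaySec, may advance by
     * at most maxStepSec per event, and never decreases.
     */
    static List<Integer> lateEvents(long[] arrivals, long delaySec, long maxStepSec) {
        List<Integer> late = new ArrayList<>();
        long watermark = 0;
        boolean started = false;
        for (int i = 0; i < arrivals.length; i++) {
            long ts = arrivals[i];
            if (started && ts <= watermark) {
                late.add(i); // the watermark already promised no events at or before ts
            }
            long candidate = ts - delaySec;
            if (!started) {
                watermark = candidate; // the first event initializes the watermark, uncapped
                started = true;
            } else {
                // Limit the advance; the min() keeps watermark + step from overflowing.
                long step = Math.min(maxStepSec, Long.MAX_VALUE - Math.max(watermark, 0));
                long capped = Math.min(candidate, watermark + step);
                watermark = Math.max(watermark, capped); // monotonic: never move backwards
            }
        }
        return late;
    }

    public static void main(String[] args) {
        long noCap = Long.MAX_VALUE;
        System.out.println("worse case, no delay: " + lateEvents(WORSE_CASE, 0, noCap));
        System.out.println("worse case, 4h delay: " + lateEvents(WORSE_CASE, 4 * 3600, noCap));
        System.out.println("first case, no delay: " + lateEvents(FIRST_CASE, 0, noCap));
        System.out.println("first case, 1 min delay + 1 min cap: " + lateEvents(FIRST_CASE, 60, 60));
    }
}
```

With these inputs, the worse case without delay reports [1, 2, 3] (e2, e3, and e4 are late), while a four hour delay reports []. In the first case, a plain maximum-based watermark makes e4 late, and a one minute delay plus a one minute cap keeps it on time. In this sketch a cap cannot help when the outlier is the very first event, because that event initializes the watermark; the delay covers that case.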

There are also different strategies for handling late data in windows.
You can drop it (default behavior), update previously emitted results
(allowed lateness) [2], or emit it to a side output [3].
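The three options can be sketched the same way. The following Java simulation is hypothetical: LateDataSketch, classify(), and route() are illustrative names, not Flink API. It classifies one element for a tumbling event-time window, assuming the default event-time trigger behavior described in [2]. Under that behavior, a window fires once the watermark passes its last timestamp. The window is then kept for allowedLateness more, and late elements arriving in that period still update it. After that, an element is dropped unless a side output is configured, which is what sideOutputLateData() does in [3].

```java
// Hypothetical simulation of how a tumbling event-time window treats a late
// element; names are illustrative, not Flink API. Times are seconds.
class LateDataSketch {

    enum Outcome { ON_TIME, LATE_REFIRE, TOO_LATE }

    /** Classifies an element for a tumbling window of windowSize seconds. */
    static Outcome classify(long ts, long watermark, long windowSize, long allowedLateness) {
        long windowStart = ts - Math.floorMod(ts, windowSize);
        long maxTs = windowStart + windowSize - 1; // last timestamp belonging to the window
        if (maxTs > watermark) {
            return Outcome.ON_TIME;      // window has not fired yet
        }
        if (maxTs + allowedLateness > watermark) {
            return Outcome.LATE_REFIRE;  // window is still kept and fires again
        }
        return Outcome.TOO_LATE;         // window is past its allowed lateness
    }

    /** Describes what happens to an element with the given outcome. */
    static String route(Outcome outcome, boolean sideOutputConfigured) {
        switch (outcome) {
            case ON_TIME:     return "added to window";
            case LATE_REFIRE: return "added to window, updated result emitted";
            default:          return sideOutputConfigured ? "emitted to side output" : "dropped";
        }
    }

    public static void main(String[] args) {
        long watermark = 15 * 3600 + 22;   // 15:00:22
        long e2 = 12 * 3600 + 4 * 60 + 21; // 12:04:21
        long minute = 60, threeHours = 3 * 3600;
        System.out.println(route(classify(e2, watermark, minute, 0), false));
        System.out.println(route(classify(e2, watermark, minute, 0), true));
        System.out.println(route(classify(e2, watermark, minute, threeHours), false));
    }
}
```

For e2 (12:04:21) with one minute windows and the watermark at 15:00:22, the element is dropped with zero allowed lateness. It goes to the side output if one is configured. With three hours of allowed lateness, it updates its window again.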

Flink is quite flexible when dealing with watermarks and late data.

Best, Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_timestamps_watermarks.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#allowed-lateness
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#getting-late-data-as-a-side-output

2017-12-12 10:16 GMT+01:00 Jinhua Luo <luajit...@gmail.com>:

> Hi All,
>
> The watermark is monotonically increasing in a stream, correct?
>
> Given an extremely out-of-order stream, e.g.
> e4 (12:04:33) --> e3 (15:00:22) --> e2 (12:04:21) --> e1 (12:03:01)
>
> Here e1 appears first, so the watermark starts at 12:03:01 and e3 is an
> early event. It would be placed in another window and fired
> individually, correct? If so, the result is not bad.
>
> The worse case is:
>
> e4 (12:04:33) --> e3 (12:03:01) --> e2 (12:04:21) --> e1 (15:00:22)
>
>
> Then e2, e3, and e4 would be considered late events and discarded? And
> the watermark would be set to a wrong value permanently?
>
> So the stream must not be that far out of order, otherwise Flink cannot
> handle it well?
>
