Hey,

If you are worried about increased amount of buffered data by the
WindowOperator if watermarks/event time is not progressing uniformly across
multiple sources, then there is little you can do currently. FLIP-27 [1]
will allow us to address this problem in more generic way. What you can
currently do is one of two things:

1. Implement a custom throttling function/operator sitting after the
sources, that would throttle the sources. If you chain it with the source
function, it's relatively ok solution. Note, while you are blocking
execution, you will be blocking for example checkpoints from happening. So
it's better to sleep 10 ms per every record, compared to sleep 10 seconds
once every 1000 records.
2. Throttle the sources themselves (you would need to modify or write your
custom sources).

But in both cases you need to manually track the event time, and manually
make decision which source should be throttled and by how much.

Best regards, Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-EventTimeAlignment

śr., 16 wrz 2020 o 04:17 hao kong <h...@lemonbox.me> napisał(a):

> Hello guys,
>
> I have a job with multiple Kafka sources. They all contain certain
> historical data. If you use the events-time window, it will cause sources
> with less data to cover more sources through water mark.
>
>
> I can think of a solution, Implement a scheduler in the source phase, But
> it is quite complicated to implement. Are ther otherbetter solutions?
>
>
> Any suggestions?
> Thanks!
>
>
>

Reply via email to