I have a job with more than 50 tasks. I want to split it into multiple
jobs that exchange data through Kafka, but what about the watermark?

Has anyone done something similar and solved this problem?

Here is an example:
The original job: kafkaStream1(src-topic) => xxxProcess => xxxWindow1 ==>
xxxWindow2 ==> resultSinkToKafka(result-topic).

The new job1: kafkaStream1(src-topic) => xxxProcess => xxxWindow1 ==>
resultSinkToKafka(mid-topic).
The new job2: kafkaStream1(mid-topic) => xxxWindow2 ==>
resultSinkToKafka(result-topic).

The watermarks for window1 and window2 are now handled in two separate jobs.
This seems to work, but it introduces a 5-minute delay for window2 (both
windows have a 5-minute cycle).

The key problem is that the window cycle is 5 minutes, so window2 lags
by a full 5 minutes.
If the watermark could be transferred between jobs, this would no longer be
a problem.
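To show where the 5-minute lag comes from, here is a toy simulation of job2. It assumes Apache Flink's default event-time behavior: a window result is stamped with the window's last millisecond (end - 1), a bounded-out-of-orderness watermark is computed as max seen timestamp - bound - 1, and an event-time window fires once the watermark reaches its last millisecond. It also assumes job2 derives its watermark only from the records in mid-topic, one result per window1 cycle, and live data (window1 emits exactly when its window closes). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitJobWatermarkDelay {
    // Both windows are 5-minute tumbling windows, in milliseconds.
    static final long WINDOW_MS = 5 * 60_000L;

    /**
     * Toy model of job2 (assumptions as stated above). Returns, for every
     * window2 that fires, how long after its window end it fired (ms).
     */
    static List<Long> job2FiringDelays(int window1Results, long outOfOrdernessMs) {
        List<Long> delays = new ArrayList<>();
        long maxTs = Long.MIN_VALUE;
        int nextWindow2 = 0; // index of the oldest window2 that has not fired yet
        for (int k = 0; k < window1Results; k++) {
            long arrival = (k + 1) * WINDOW_MS; // window1 [kW, (k+1)W) emits when it closes
            long ts = (k + 1) * WINDOW_MS - 1;  // the result carries the window's last millisecond
            maxTs = Math.max(maxTs, ts);
            // Job2 can only derive its watermark from timestamps it has seen in mid-topic.
            long watermark = maxTs - outOfOrdernessMs - 1;
            // Fire every window2 whose last millisecond is covered by the watermark.
            while ((nextWindow2 + 1) * WINDOW_MS - 1 <= watermark) {
                long windowEnd = (nextWindow2 + 1) * WINDOW_MS;
                delays.add(arrival - windowEnd);
                nextWindow2++;
            }
        }
        return delays;
    }

    public static void main(String[] args) {
        for (long d : job2FiringDelays(4, 0)) {
            System.out.println("window2 fired " + d / 60_000 + " min after its end");
        }
    }
}
```

Running `main` prints "window2 fired 5 min after its end" three times. The watermark produced by the result of window1 [0, 5min) sits just below that result's own timestamp, so window2 [0, 5min) can only fire when the next window1 result arrives, one full cycle later. The fourth result closes nothing because no later record follows it. In the single original job, the watermark that closed window1 flows on to window2 and already covers it, so window2 fires immediately. That is why forwarding the watermark between jobs would remove the delay.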
