[Question] Wait.on() transform in a streaming pipeline - window not closing until new data arrives?

Paweł Paprota Tue, 12 Dec 2023 23:33:19 -0800

Hello all,

I have a streaming pipeline with a JmsIO source and downstream there are some 
custom transforms that include the Wait.on() transform. This is done to wait on 
certain things like writing to database before publishing messages to JMS at 
the end of my pipeline.


This custom transform is also used in a batch pipeline and it works perfectly, 
waiting works as expected and all is good.

However, as I mentioned, I am now using this custom transform in a streaming 
pipeline and it appears that the Wait.on() never "finishes" until the watermark 
gets past the end of the window. (I am using a window because the custom 
transform has GroupByKey in it; and also it uses FileIO.writeDynamic which uses 
GroupByKey internally it seems.)

I realize this is how Wait.on() (and the trigger Never.ever() used inside) are 
supposed to work but then I wonder how to make it work in a streaming setup? I 
tried with different window types and triggers, but that doesn't seem to make 
any difference.

The only thing that "unblocks" the Wait.on() is simply moving the watermark 
further - by pushing more data to the JMS source. In my use case, the source is 
not always producing data constantly, sometimes it is just a big burst of 
messages once a day. If those messages fit into one window, processing would 
hold them until next batch of data arrives (possibly, next day) which is not 
desired - I would like that it basically works as in batch mode, so the whole 
custom transform is executed for each window - together with Wait.on() working 
and expected, and subsequent transforms after waiting.

Does it make sense or am I missing something in the general design of 
windowing/triggers/watermark/Wait.on/...?

-- 
Paweł

[Question] Wait.on() transform in a streaming pipeline - window not closing until new data arrives?

Reply via email to