Thanks Kenn for the clear explanation. Very helpful. I am trying to read a small BQ table as a side input and refresh it every 24 hours or so, but I still want the main stream to be processed during that time. Is there a better way to do this than having a 24 hour window with 1 minute triggers on the side input? Maybe just restarting the job every 24 hours and reading the side input on setup would be the best option.
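
To make the "read it on setup" idea concrete, here is a rough sketch of what I have in mind, written as plain Python with no Beam calls. It assumes the table is small enough to keep in memory on each worker. The names RefreshingLookup, loader, max_age_seconds and clock are made up for illustration and are not part of any Beam API; loader stands in for whatever function would actually query the BQ table.

```python
import time
from typing import Any, Callable, Optional


class RefreshingLookup:
    """Caches a small lookup table and reloads it once it is too old.

    Hypothetical helper, not a Beam API. `loader` is assumed to be a
    function that fetches the whole table (for example from BigQuery).
    """

    def __init__(
        self,
        loader: Callable[[], Any],
        max_age_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock
        self._table: Any = None
        self._loaded_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached table, reloading it first if it has expired."""
        now = self._clock()
        expired = (
            self._loaded_at is None
            or now - self._loaded_at >= self._max_age
        )
        if expired:
            # Processing-time refresh: the loader runs at most once per
            # max_age_seconds, on whichever call first notices the expiry.
            self._table = self._loader()
            self._loaded_at = now
        return self._table
```

The thought would be to create one of these per DoFn instance in setup() and call get() from process(), so the main stream never waits on a side input window and the job does not need a restart. The refresh runs on processing time rather than event time, and each worker reloads independently, so workers could briefly see different versions of the table.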

On Tue, 5 Jan 2021 at 17:53, Kenneth Knowles <[email protected]> wrote:

> You have it basically right. However, there are a couple minor
> clarifications:
>
> 1. A particular window on the side input is not "ready" until there has
> been some element output to it (or it has expired, which will make it the
> default value). Main input elements will wait for the side input to be
> ready. If you configure triggering on the side input, then the first
> triggering will make it "ready". Of course, this means that the value you
> will read will be an incomplete view of the data. If you have a 24 hour
> window with triggering set up then the value that is read will be whatever
> the most recent trigger is, but with some caching delay.
> 2. None of the "time" that you are talking about is real time. It is all
> event time so it is controlled by the side input and main input watermarks.
> Of course in streaming these are usually close to real time so yes on
> average what you describe is probably right.
>
> It sounds like you want a side input with a trigger on it, if you want to
> read it before you have all the data. This is highly nondeterministic so
> you want to be sure that you do not require exact answers on the side input.
>
> Kenn
>
> On Tue, Jan 5, 2021 at 6:56 AM Manninger, Matyas <
> [email protected]> wrote:
>
>> Dear Beam users,
>>
>> Can someone clarify for me how side input works in streaming? If I use a
>> stream as a side input to my main stream, each element will be paired with
>> a side input from the according time window. Does this mean that the
>> element will not be processed until the appropriate window on the side
>> input stream is closed? So if my side input is windowed into 24 hour
>> windows will my elements from the main stream be processed only every 24
>> hours? If not, then if the window is triggered for the side input at 12:00
>> and the input actually only arrives at 12:05 then all elements from the
>> main stream processed between 12:00 and 12:05 will be matched with an empty
>> side input?
>>
>> Any clarification is appreciated.
>>
>> Best regards,
>> Matyas
