If you actually want to re-read the BQ table, then you need something more, following the pattern here: https://beam.apache.org/documentation/patterns/side-inputs/. There are two variations on that page. Neither uses triggers; instead, the read from BigQuery at the beginning of each 24 hours is used by all of the main input elements for the whole 24 hour window of the main input. The general pattern is PeriodicImpulse --> ParDo(convert impulse to read spec) --> ReadAll.
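As a rough sketch of that pattern in the Python SDK (not a tested pipeline): the table name and the helper names `to_read_spec` and `build_side_input` are hypothetical, chosen only for illustration. It assumes a Beam release that ships `PeriodicImpulse`, `ReadFromBigQueryRequest` and `ReadAllFromBigQuery`, and a pipeline with a temp location configured for the BigQuery export.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def to_read_spec(_impulse, table):
    """Convert one impulse into keyword arguments for a BigQuery read request.

    The impulse value itself is ignored here; it only signals "read again now".
    `table` is a hypothetical `project.dataset.table` name.
    """
    return {"query": f"SELECT * FROM `{table}`", "use_standard_sql": True}


def build_side_input(pipeline, table, refresh_seconds=SECONDS_PER_DAY):
    """PeriodicImpulse --> ParDo(convert impulse to read spec) --> ReadAll."""
    # Beam is imported inside the function so that the pure helper above
    # can be exercised without the SDK installed.
    import apache_beam as beam
    from apache_beam.io.gcp.bigquery import (
        ReadAllFromBigQuery,
        ReadFromBigQueryRequest,
    )
    from apache_beam.transforms.periodicsequence import PeriodicImpulse

    return (
        pipeline
        # Emits one element per interval; apply_windowing=True places each
        # impulse into a fixed window of the same length.
        | "Impulse" >> PeriodicImpulse(
            fire_interval=refresh_seconds, apply_windowing=True)
        # Each impulse becomes a request describing what to read.
        | "ToReadSpec" >> beam.Map(
            lambda impulse: ReadFromBigQueryRequest(
                **to_read_spec(impulse, table)))
        # Executes each incoming request and emits the resulting rows.
        | "ReadAll" >> ReadAllFromBigQuery()
    )
```

The resulting PCollection could then be handed to the main stream as a side input, for example via `beam.pvalue.AsList(...)`, with the main input windowed into fixed windows of the same length so that each main input window maps onto one read of the table.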
I realize you did not specify what language you are using. The ReadAll transform only exists for BigQuery in Python right now, and it is not yet easy to use it from Java (plus you may not want to).

Kenn

On Thu, Jan 7, 2021 at 12:36 AM Manninger, Matyas <[email protected]> wrote:

> Thanks Kenn for the clear explanation. Very helpful. I am trying to read a
> small BQ table as side input and refresh it every 24 hours or so but I
> still want the main stream to be processed during that time. Is there a
> better way to do this than have a 24 hour window with 1 minute triggers on
> the side input? Maybe just restarting the job every 24 hours and reading the
> side input on setup would be the best option.
>
> On Tue, 5 Jan 2021 at 17:53, Kenneth Knowles <[email protected]> wrote:
>
>> You have it basically right. However, there are a couple of minor
>> clarifications:
>>
>> 1. A particular window on the side input is not "ready" until there has
>> been some element output to it (or it has expired, which will make it the
>> default value). Main input elements will wait for the side input to be
>> ready. If you configure triggering on the side input, then the first
>> triggering will make it "ready". Of course, this means that the value you
>> will read will be an incomplete view of the data. If you have a 24 hour
>> window with triggering set up then the value that is read will be whatever
>> the most recent trigger is, but with some caching delay.
>> 2. None of the "time" that you are talking about is real time. It is all
>> event time so it is controlled by the side input and main input watermarks.
>> Of course in streaming these are usually close to real time so yes on
>> average what you describe is probably right.
>>
>> It sounds like you want a side input with a trigger on it, if you want to
>> read it before you have all the data. This is highly nondeterministic so
>> you want to be sure that you do not require exact answers on the side input.
>>
>> Kenn
>>
>> On Tue, Jan 5, 2021 at 6:56 AM Manninger, Matyas <[email protected]> wrote:
>>
>>> Dear Beam users,
>>>
>>> Can someone clarify for me how side input works in streaming? If I use a
>>> stream as a side input to my main stream, each element will be paired with
>>> a side input from the corresponding time window. Does this mean that the
>>> element will not be processed until the appropriate window on the side
>>> input stream is closed? So if my side input is windowed into 24 hour
>>> windows will my elements from the main stream be processed only every 24
>>> hours? If not, then if the window is triggered for the side input at 12:00
>>> and the input actually only arrives at 12:05 then all elements from the
>>> main stream processed between 12:00 and 12:05 will be matched with an empty
>>> side input?
>>>
>>> Any clarification is appreciated.
>>>
>>> Best regards,
>>> Matyas
