Dear Kenn,

Thanks again. That pattern was my initial plan, but there seems to be a bug
in the Python API: in periodicsequence.py, line 42 reads "total_outputs =
math.ceil((end - start) / interval)". Here end, start, and interval are all
Durations, and the / operator is not defined for the Duration class. I
already wrote an email about this to this list but didn't get a
satisfactory answer on how to work around the issue, so now I am trying to
get around using PeriodicImpulse. If you have any other suggestions I would
greatly appreciate them.
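
For reference, here is a minimal illustration of the failure (the Duration
values are made up, and the last line is only my guess at how the division
could be expressed, using the underlying micros instead of the Duration
objects themselves):

    import math
    from apache_beam.utils.timestamp import Duration

    start = Duration.of(0)
    end = Duration.of(3600)
    interval = Duration.of(60)

    # Line 42 as it is today; fails because '/' is not defined for Duration:
    # total_outputs = math.ceil((end - start) / interval)

    # Dividing the micros of the two Durations instead:
    total_outputs = math.ceil((end - start).micros / interval.micros)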

On Thu, 7 Jan 2021 at 18:29, Kenneth Knowles <[email protected]> wrote:

> Actually, if you want to re-read the BQ table then you need something
> more, following the pattern here:
> https://beam.apache.org/documentation/patterns/side-inputs/. There are
> two variations on that page, and they do not use triggers; instead, the
> read from BigQuery at the beginning of the 24 hours is used by all of the
> main input elements for the whole 24-hour window of the main input. The
> general pattern is PeriodicImpulse --> ParDo(convert impulse to read spec)
> --> ReadAll
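>
> Very roughly, in the Python SDK that could look something like the sketch
> below (untested; the table name is a placeholder and the exact import
> paths may differ between SDK versions):
>
>   import apache_beam as beam
>   from apache_beam.io.gcp.bigquery import (ReadAllFromBigQuery,
>                                            ReadFromBigQueryRequest)
>   from apache_beam.transforms.periodicsequence import PeriodicImpulse
>
>   with beam.Pipeline() as p:
>     # Fire once per 24 hours and turn each firing into a fresh table read.
>     side = (
>         p
>         | PeriodicImpulse(fire_interval=24 * 60 * 60, apply_windowing=True)
>         | beam.Map(
>             lambda _: ReadFromBigQueryRequest(table='project:dataset.table'))
>         | ReadAllFromBigQuery())
>
>     # The result is then passed as a side input to the main ParDo, e.g.
>     # main | beam.Map(process_fn, lookup=beam.pvalue.AsList(side))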
>
> I realize you did not specify what language you are using. The ReadAll
> transform only exists for BigQuery in Python right now, and it is not yet
> easy to use it from Java (plus you may not want to).
>
> Kenn
>
> On Thu, Jan 7, 2021 at 12:36 AM Manninger, Matyas <
> [email protected]> wrote:
>
>> Thanks Kenn for the clear explanation. Very helpful. I am trying to read
>> a small BQ table as a side input and refresh it every 24 hours or so, but I
>> still want the main stream to be processed during that time. Is there a
>> better way to do this than having a 24-hour window with 1-minute triggers on
>> the side input? Maybe just restarting the job every 24 hours and reading the
>> side input on setup would be the best option.
>>
>> On Tue, 5 Jan 2021 at 17:53, Kenneth Knowles <[email protected]> wrote:
>>
>>> You have it basically right. However, there are a couple of minor
>>> clarifications:
>>>
>>> 1. A particular window on the side input is not "ready" until there has
>>> been some element output to it (or it has expired, which will make it the
>>> default value). Main input elements will wait for the side input to be
>>> ready. If you configure triggering on the side input, then the first
>>> triggering will make it "ready". Of course, this means that the value you
>>> read will be an incomplete view of the data. If you have a 24-hour window
>>> with triggering set up, then the value that is read will be whatever the
>>> most recent trigger firing produced, but with some caching delay.
>>> 2. None of the "time" that you are talking about is real time. It is all
>>> event time, so it is controlled by the side input and main input watermarks.
>>> Of course, in streaming these are usually close to real time, so yes, on
>>> average what you describe is probably right.
>>>
>>> It sounds like you want a side input with a trigger on it, if you want
>>> to read it before you have all the data. This is highly nondeterministic, so
>>> you want to be sure that you do not require exact answers on the side input.
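>>>
>>> To make that concrete, a rough sketch of triggering on the side input in
>>> Python could look like the following (beam.Create stands in for however
>>> the side input is actually produced, and the 1-minute firing and the
>>> accumulation mode are only examples):
>>>
>>>   import apache_beam as beam
>>>   from apache_beam.transforms import trigger, window
>>>
>>>   with beam.Pipeline() as p:
>>>     side = (
>>>         p
>>>         | beam.Create([{'key': 'value'}])
>>>         | beam.WindowInto(
>>>             window.FixedWindows(24 * 60 * 60),
>>>             trigger=trigger.Repeatedly(trigger.AfterProcessingTime(60)),
>>>             accumulation_mode=trigger.AccumulationMode.DISCARDING))
>>>
>>>     # Each main input element then sees whatever the latest firing produced:
>>>     # main | beam.Map(process_fn, lookup=beam.pvalue.AsList(side))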
>>>
>>> Kenn
>>>
>>> On Tue, Jan 5, 2021 at 6:56 AM Manninger, Matyas <
>>> [email protected]> wrote:
>>>
>>>> Dear Beam users,
>>>>
>>>> Can someone clarify for me how side inputs work in streaming? If I use a
>>>> stream as a side input to my main stream, each element will be paired with
>>>> a side input from the corresponding time window. Does this mean that the
>>>> element will not be processed until the appropriate window on the side
>>>> input stream is closed? So if my side input is windowed into 24-hour
>>>> windows, will my elements from the main stream be processed only every 24
>>>> hours? And if not, then if the window is triggered for the side input at
>>>> 12:00 and the input actually only arrives at 12:05, will all elements from
>>>> the main stream processed between 12:00 and 12:05 be matched with an empty
>>>> side input?
>>>>
>>>> Any clarification is appreciated.
>>>>
>>>> Best regards,
>>>> Matyas
>>>>
>>>
