That is an unfortunate bug. I found the JIRA.

It is actually just a transform built out of other primitives. You could
use it as the basis for your own version that works, until the fix has been
released.

Kenn

On Fri, Jan 8, 2021 at 12:08 AM Manninger, Matyas <
[email protected]> wrote:

> Dear Kenn,
>
> Thanks again, that pattern was my initial plan but there seems to be a bug
> in the python API in the periodicsequence.py on line 42 "total_outputs =
> math.ceil((end - start) / interval)". Here end start and interval are all
> Durations and the / operator is not defined for the Duration class. I
> already wrote an email about this to this list but I didn't get a
> satisfactory answer on how to go around this issue so now I am trying to go
> around using PeriodicImpulse. If you have any other suggestions I would
> highly appreciate it.
>
> On Thu, 7 Jan 2021 at 18:29, Kenneth Knowles <[email protected]> wrote:
>
>> Actually, if you want to actually re-read the BQ table then you need
>> something more, following the pattern here:
>> https://beam.apache.org/documentation/patterns/side-inputs/. There are
>> two variations on the page there, and these do not use triggers but instead
>> the read from BigQuery at the beginning of the 24 hours is used by all of
>> the main input elements for the whole 24 hour window of the main input. The
>> general pattern is PeriodImpulse --> ParDo(convert impulse to read spec)
>> --> ReadAll
>>
>> I realize you did not specify what language you are using. The ReadAll
>> transform only exists for BigQuery in Python right now, and it is not yet
>> easy to use it from Java (plus you may not want to).
>>
>> Kenn
>>
>> On Thu, Jan 7, 2021 at 12:36 AM Manninger, Matyas <
>> [email protected]> wrote:
>>
>>> Thanks Kenn for the clear explanation. Very helpful. I am trying to read
>>> a small BQ table as side input and refresh it every 24 hours or so but I
>>> still want to main stream to be processed during that time. Is there a
>>> better way to do this than have a 24 hour window with 1 minute triggers on
>>> the side input? Maybe just restarting the job every 24 hour and reading the
>>> side input on setup would be the best option.
>>>
>>> On Tue, 5 Jan 2021 at 17:53, Kenneth Knowles <[email protected]> wrote:
>>>
>>>> You have it basically right. However, there are a couple minor
>>>> clarifications:
>>>>
>>>> 1. A particular window on the side input is not "ready" until there has
>>>> been some element output to it (or it has expired, which will make it the
>>>> default value). Main input elements will wait for the side input to be
>>>> ready. If you configure triggering on the side input, then the first
>>>> triggering will make it "ready". Of course, this means that the value you
>>>> will read will be incomplete view of the data. If you have a 24 hour window
>>>> with triggering set up then the value that is read will be whatever the
>>>> most recent trigger is, but with some caching delay.
>>>> 2. None of the "time" that you are talking about is real time. It is
>>>> all event time so it is controlled by the side input and main input
>>>> watermarks. Of course in streaming these are usually close to real time so
>>>> yes on average what you describe is probably right.
>>>>
>>>> It sounds like you want a side input with a trigger on it, if you want
>>>> to read it before you have all the data. This is highly nondeterministic so
>>>> you want to be sure that you do not require exact answers on the side 
>>>> input.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Jan 5, 2021 at 6:56 AM Manninger, Matyas <
>>>> [email protected]> wrote:
>>>>
>>>>> Dear Beam users,
>>>>>
>>>>> Can someone clarify me how side input works in streaming? If I use a
>>>>> stream as a side input to my main stream, each element will be paired with
>>>>> a side input from the according time window. does this mean that the
>>>>> element will not be processed until the appropriate window on the side
>>>>> input stream is closed? So if my side input is windowed into 24 hour
>>>>> windows will my elements from the main stream be processed only every 24
>>>>> hour? If not, then if the window is triggered for the sideinput at 12:00
>>>>> and the input actually only arrives at 12:05 then all elements from the
>>>>> main stream processed between 12:00 and 12:05 will be matched with an 
>>>>> empty
>>>>> sideinput?
>>>>>
>>>>> Any clarification is appreciated.
>>>>>
>>>>> Best regards,
>>>>> Matyas
>>>>>
>>>>

Reply via email to