As far as I know, that behavior is not specified. I do not think that
Dataflow streaming supports this sort of updating to side inputs, though
I've added Slava who might have more to add.

If updating side inputs is really not supported in Dataflow, you may be
able to use a LoadingCache, like so:
https://lists.apache.org/thread.html/%[email protected]%3E
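
In case the link does not resolve, here is a minimal sketch of the idea. It is a JDK-only stand-in for a Guava LoadingCache configured to expire entries after a fixed time, not the code from the linked thread. The class name RefreshingLookup, the loader function and the injectable clock are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a time-expiring loading cache: a value is
// (re)loaded through `loader` when it is missing or older than `ttlMillis`.
final class RefreshingLookup<K, V> {

  // A cached value together with the time it was loaded.
  private static final class Entry<V> {
    final V value;
    final long loadedAtMillis;

    Entry(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Function<K, V> loader;   // e.g. a call that fetches a BQ schema (assumption)
  private final long ttlMillis;
  private final LongSupplier clock;      // injectable so tests can control time

  RefreshingLookup(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  V get(K key) {
    long now = clock.getAsLong();
    // compute() runs atomically per key, so concurrent callers do not
    // trigger duplicate reloads of the same entry.
    Entry<V> entry = entries.compute(key, (k, old) ->
        (old == null || now - old.loadedAtMillis >= ttlMillis)
            ? new Entry<>(loader.apply(k), now)
            : old);
    return entry.value;
  }
}
```

The lookup would be created once per DoFn instance (for example in a @Setup method) and queried from @ProcessElement in place of the side input, passing System::currentTimeMillis as the clock. Each worker then refreshes on its own schedule, so different workers may briefly see different versions.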

Best
-P.

On Tue, May 29, 2018 at 2:36 PM Carlos Alonso <[email protected]> wrote:

> Hi Lukasz, many thanks for your responses.
>
> I'm actually using them, but I don't think I'm managing to synchronise the
> following steps:
> 1: The side input gets its first value (v1)
> 2: The main stream gets that side input applied and finds that v1 value
> 3: The side one gets updated (v2)
> 4: The main stream gets the side input applied again and finds the v2
> value (along with v1, as this is a multimap)
>
> Regards
>
> On Tue, May 29, 2018 at 10:57 PM Lukasz Cwik <[email protected]> wrote:
>
>> Your best bet is to use TestStream [1], as it is used to validate
>> window/triggering behavior. Note that the transform requires special
>> runner-based execution and currently only works with the DirectRunner. All
>> examples are marked with the JUnit category "UsesTestStream", for example
>> [2].
>>
>> 1:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java
>> 2:
>> https://github.com/apache/beam/blob/0cbcf4ad1db7d820c5476d636f3a3d69062021a5/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L69
>>
>>
>> On Tue, May 29, 2018 at 1:05 PM Carlos Alonso <[email protected]>
>> wrote:
>>
>>> Hi all!!
>>>
>>> Basically that's what I'm trying to do. I'm building a pipeline that has
>>> a refreshing, multimap side input (BQ schemas) that I then apply to the
>>> main stream of data (records that are ultimately saved to the corresponding
>>> BQ table).
>>>
>>> My job, although streaming in nature, runs on the global window,
>>> and I want to unit test that the side input refreshes and that the updates
>>> are successfully applied.
>>>
>>> I'm using scio and I can't seem to simulate that refreshing behaviour.
>>> These are the relevant bits of the code:
>>> https://gist.github.com/calonso/87d392a65079a66db75a78cb8d80ea98
>>>
>>> The way I understand it, the side collection is refreshed before
>>> it is accessed, so when accessed it already contains the final (updated)
>>> snapshot of the schemas. Is that true? If so, how can I simulate
>>> that synchronisation? I'm using processing times as I thought that could be
>>> the way to go, but obviously something is wrong there.
>>>
>>> Many thanks!!
>>>
