Most streaming pipelines handle this by accessing the datastore directly or
by using stateful DoFns. A side input of that size doesn't scale well, even
on Dataflow, although users such as yourself would benefit from it.

In this thread[1], a user has a similar problem and I describe a solution
based on a stateful DoFn approach.
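
Roughly, the shape of that approach in the Python SDK looks like the sketch
below: buffer articles per shard key in state and flush them in batched
lookups against the external store. Note that fetch_assets, the asset_id
field, the random shard keying and the size/latency thresholds are just
placeholders for illustration, not the exact code from that thread:

import random

import apache_beam as beam
from apache_beam.coders import PickleCoder, VarIntCoder
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import (
    BagStateSpec, CombiningValueStateSpec, TimerSpec, on_timer)
from apache_beam.utils.timestamp import Duration, Timestamp

MAX_BATCH = 500      # flush after this many buffered articles...
MAX_WAIT_SECS = 10   # ...or at most this long after the first one


def fetch_assets(asset_ids):
  """Placeholder for a batched RPC against the asset store (e.g. Bigtable).

  Should return a dict {asset_id: asset_info}.
  """
  raise NotImplementedError


class BatchedEnrichFn(beam.DoFn):
  """Buffers articles per shard key and enriches them in batched lookups."""

  BUFFER = BagStateSpec('buffer', PickleCoder())
  COUNT = CombiningValueStateSpec('count', VarIntCoder(), combine_fn=sum)
  FLUSH = TimerSpec('flush', TimeDomain.REAL_TIME)

  def process(self,
              element,  # (shard_key, article_dict)
              buffer=beam.DoFn.StateParam(BUFFER),
              count=beam.DoFn.StateParam(COUNT),
              flush=beam.DoFn.TimerParam(FLUSH)):
    _, article = element
    buffer.add(article)
    count.add(1)
    n = count.read()
    if n >= MAX_BATCH:
      yield from self._flush(buffer, count)
    elif n == 1:
      # First element of a new batch: bound its latency with a
      # processing-time timer.
      flush.set(Timestamp.now() + Duration(seconds=MAX_WAIT_SECS))

  @on_timer(FLUSH)
  def on_flush(self,
               buffer=beam.DoFn.StateParam(BUFFER),
               count=beam.DoFn.StateParam(COUNT)):
    yield from self._flush(buffer, count)

  def _flush(self, buffer, count):
    articles = list(buffer.read())
    buffer.clear()
    count.clear()
    if not articles:
      return
    # One batched lookup for all distinct asset ids in the buffer.
    assets = fetch_assets({a['asset_id'] for a in articles})
    for a in articles:
      yield dict(a, asset=assets.get(a['asset_id']))


def enrich(articles):
  # Key by a small random shard so lookups batch across asset ids,
  # then apply the stateful DoFn.
  return (articles
          | 'ShardKey' >> beam.Map(lambda a: (random.randint(0, 9), a))
          | 'BatchedEnrich' >> beam.ParDo(BatchedEnrichFn()))

If per-element lookups against the store are fast enough for you, a plain
DoFn that calls the client directly (with the connection created in setup())
is simpler, at the cost of more RPCs.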

1:
https://lists.apache.org/thread.html/rdfae2bfe895e5ee38bbd5014e8c293a313b61e9d63a3484acd4e9864%40%3Cuser.beam.apache.org%3E


On Wed, May 13, 2020 at 11:28 AM Kaymak, Tobias <[email protected]>
wrote:

> Hi,
>
> First of all, thank you for the Beam webinar sessions this month. They are
> super helpful, especially for getting people excited and on-boarded with
> Beam!
>
> We are currently trying to promote Beam with more use cases within our
> company and are tackling a problem where we have to join a stream of
> articles with asset information. The asset information, as a table, is
> 2 TiB+ in size, so we think the only way to enrich the stream would be to
> put it in a fast lookup store so that the (batched) RPC pattern could be
> applied. (In Google Cloud product terms, something like Bigtable or a
> similarly fast and large key/value store.)
>
> Is there an alternative that we could try? Maintaining that additional
> data store would add overhead we are looking to avoid. :)
>
> Best,
> Tobi
>
