Hi everyone!!

I'm building a pipeline that streams data into BigQuery (BQ), and I'm using
the "Slowly-changing lookup cache" pattern described here:
https://cloud.google.com/blog/big-data/2017/06/guide-to-common-cloud-dataflow-use-case-patterns-part-1
to hold and periodically refresh the table schemas (they may change from
time to time).
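For context, my refresh side input looks roughly like this (a sketch of the pattern from the blog post, not my exact code; `loadSchemas()` stands in for my own code that re-reads the schemas from BQ):

```java
// Periodically rebuild a side-input view holding the current table schemas.
PCollectionView<Map<String, TableSchema>> schemaView = pipeline
    // Emit one element every 5 minutes to trigger a refresh.
    .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
    // Re-window into the global window, firing on each new element.
    .apply(Window.<Long>into(new GlobalWindows())
        .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
        .discardingFiredPanes())
    // On each tick, re-fetch the schemas (loadSchemas() is my placeholder).
    .apply(ParDo.of(new DoFn<Long, Map<String, TableSchema>>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        c.output(loadSchemas());
      }
    }))
    .apply(View.asSingleton());

// The main streaming write then reads the view as a side input, e.g.:
//   ParDo.of(new WriteToBqFn()).withSideInputs(schemaView)
```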

Now I'd like to understand how that refresh is scheduled on a distributed
runner. Which worker runs that code? One arbitrary node? A single node, but
always the same one? All nodes?

Also, what guarantees does GenerateSequence give in terms of timing
precision? I have it configured to generate 1 element every 5 minutes, and
most of the time it fires exactly on schedule, but sometimes it doesn't...
Is that expected?

Regards
