Regarding ordering: anything that requires inputs to arrive in a specific
order will be problematic in Beam due to the nature of parallel processing;
you will always get race conditions.
Assuming you are still intending to Flatten the BigQuery and Pub/Sub
PCollections, using Wait.on before flattening …
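A minimal sketch of the Wait.on-then-Flatten idea (Beam Java SDK). The Create.of inputs stand in for the real BigQuery and Pub/Sub reads, and all names here are illustrative; note that Wait.on gates *when* the second branch is processed, it does not reorder elements:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.Wait;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class WaitThenFlatten {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Stand-ins for the real BigQueryIO / PubsubIO reads.
    PCollection<String> bq = p.apply("Bq", Create.of("state-a", "state-b"));
    PCollection<String> stream = p.apply("Stream", Create.of("event-1", "event-2"));

    // Wait.on delays processing of 'stream' until the 'bq' signal
    // collection is done (its watermark advances past the window end).
    PCollection<String> gated = stream.apply("WaitOnBq", Wait.on(bq));

    PCollectionList.of(bq).and(gated).apply(Flatten.pCollections());
    p.run().waitUntilFinish();
  }
}
```

Within a single runner bundle, elements of `gated` can still arrive interleaved in arbitrary order after flattening, which is why the ordering caveat above still applies.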
The number of keys/data in BQ would not be constant and would grow over time.
A rough estimate would be around 300k keys with an average size of 5 KB per
key. Both the count of the keys and the size of each key would be feature
dependent (based on the upstream pipelines), and we won't have control over
this.
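A back-of-the-envelope check of the estimate above (300k keys at ~5 KB each); the result is the working-set size that would have to fit as a side input on each worker, before any growth over time:

```java
public class SideInputSizing {
  public static void main(String[] args) {
    long keys = 300_000L;           // estimated key count from the thread
    long avgValueBytes = 5 * 1024L; // ~5 KB average per key

    long totalBytes = keys * avgValueBytes;
    double totalGib = totalBytes / (double) (1L << 30);
    // ~1.43 GiB today, growing with the upstream pipelines.
    System.out.printf("%.2f GiB%n", totalGib);
  }
}
```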
I'm also curious how much you depend on ordering to get the state contents
right. The ordering of the side input will be arbitrary, and even the
streaming input can have plenty of out-of-order messages. So I want to
think about what the data dependencies are that result in the requirement
of ordering. Or …
AFAIK Dependabot doesn't have a great replacement for this. I'm not sure
why the dependency reports stopped, but we could probably try to fix them;
it looks like they stopped working in October:
https://lists.apache.org/list?dev@beam.apache.org:2021-10:dependency%20report.
We still have the job which …
This is your daily summary of Beam's current high priority issues that may need
attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and
expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/24776 [Bug]: Race condition …
In the case that the data is too large for a side input, you could do the
same by reassigning the timestamps of the BQ input to
BoundedWindow.TIMESTAMP_MIN_VALUE (you would have to do that in a
stateful DoFn with a timer having its outputTimestamp set to
TIMESTAMP_MIN_VALUE to hold the watermark), or using sp…
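A rough sketch of the stateful-DoFn approach just described (Beam Java SDK): buffer the BQ elements in state, and set an event-time timer whose outputTimestamp is TIMESTAMP_MIN_VALUE so the output watermark is held until the timer fires and re-emits everything at the minimum timestamp. The class and parameter names are illustrative, and exact timer-firing semantics depend on the runner's watermark behavior for the bounded input:

```java
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;

// Shifts every buffered element to TIMESTAMP_MIN_VALUE, holding the
// output watermark there until the flush timer fires.
class ShiftToMinTimestamp extends DoFn<KV<String, String>, KV<String, String>> {

  @StateId("buffer")
  private final StateSpec<BagState<KV<String, String>>> bufferSpec = StateSpecs.bag();

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      @Element KV<String, String> e,
      @StateId("buffer") BagState<KV<String, String>> buffer,
      @TimerId("flush") Timer flush) {
    buffer.add(e);
    // The timer's outputTimestamp holds the output watermark at
    // TIMESTAMP_MIN_VALUE until the buffered elements are emitted.
    flush
        .withOutputTimestamp(BoundedWindow.TIMESTAMP_MIN_VALUE)
        .set(BoundedWindow.TIMESTAMP_MIN_VALUE);
  }

  @OnTimer("flush")
  public void onFlush(
      @StateId("buffer") BagState<KV<String, String>> buffer,
      OutputReceiver<KV<String, String>> out) {
    for (KV<String, String> e : buffer.read()) {
      out.outputWithTimestamp(e, BoundedWindow.TIMESTAMP_MIN_VALUE);
    }
    buffer.clear();
  }
}
```

After this, the BQ elements sort "before" every streaming element in event time, so a downstream Flatten sees them as the earliest possible input.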