Hi Robert,
Thanks for your confirmation that Fn API is already ready for supporting
the MultimapUserState use cases, really appreciate it! And totally agree
that how to integrate it depends on the runner's implementation.
A follow-up question:
- is there any runner that already implemented
The runners should be able to support Multimap User State portably over the
FnApi already.
https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L937
How that's supported on each SDK is a different matter though.
On
Appreciate it if anyone can help confirm and share thoughts.
On Wed, Feb 22, 2023 at 11:46 PM Alan Zhang wrote:
> Hi Beam devs.
>
> According to the Fn State API design doc[1], the state type
> MultimapUserState is intended for supporting MapState/SetState. And the
> implementation[2] for this
My suggestion is to try to solve the problem in terms of what you want to
compute. Instead of trying to control the operational aspects like "read
all the BQ before reading Pubsub" there is presumably some reason that the
BQ data naturally "comes first", for example if its timestamps are earlier
First PCollections are completely unordered, so there is no guarantee on
what order you'll see events in the flattened PCollection.
There may be ways to process the BigQuery data in a separate transform
first, but it depends on the structure of the data. How large is the
BigQuery table? Are you
Yes, this is a streaming pipeline.
Some more details about existing implementation v/s what we want to achieve.
Current implementation:
Reading from pub-sub:
Pipeline input = Pipeline.create(options);
PCollection pubsubStream = input.apply("Read From Pubsub",
This is your daily summary of Beam's current high priority issues that may need
attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and
expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/24776 [Bug]: Race