On Thu, Jul 25, 2019 at 6:34 PM rahul patwari
wrote:
>
> So, If an RPC call has to be performed for a batch of Rows(PCollection),
> instead of each Row, the recommended way is to batch the Rows in
> startBundle() of
> DoFn(https://stackoverflow.com/questions/49094781/yield-results-in-finish-bun
Yes, but GroupIntoBatches works on KV, and we are working with
PCollection<Row> throughout our pipeline.
We can convert Row to KV, but we only have a few keys and a bounded
PCollection. With global windows and only a few keys, the opportunity for
parallelism is limited to [number of keys] with Stateful ParDo
Have you looked at the GroupIntoBatches transform?
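
For readers unfamiliar with the transform: GroupIntoBatches takes keyed input and emits, per key, groups of at most a given size. The following is only a plain-Java sketch of that per-key behavior, not the Beam implementation. It does not use the Beam SDK, and the class name GroupIntoBatchesSketch and method batchPerKey are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mimics per-key batching on an in-memory list.
final class GroupIntoBatchesSketch {

    // Groups values by key into batches of at most batchSize elements.
    static <K, V> List<Map.Entry<K, List<V>>> batchPerKey(
            List<Map.Entry<K, V>> input, int batchSize) {
        Map<K, List<V>> open = new LinkedHashMap<>();   // batches still filling up
        List<Map.Entry<K, List<V>>> out = new ArrayList<>();
        for (Map.Entry<K, V> kv : input) {
            List<V> batch = open.computeIfAbsent(kv.getKey(), k -> new ArrayList<>());
            batch.add(kv.getValue());
            if (batch.size() == batchSize) {            // full batch: emit it
                out.add(Map.entry(kv.getKey(), batch));
                open.remove(kv.getKey());
            }
        }
        // Emit whatever is left once the (bounded) input is exhausted.
        for (Map.Entry<K, List<V>> rest : open.entrySet()) {
            out.add(Map.entry(rest.getKey(), rest.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("k1", 1), Map.entry("k2", 2),
                Map.entry("k1", 3), Map.entry("k1", 4));
        System.out.println(batchPerKey(input, 2));
    }
}
```

Note that batches never mix keys, which is why the number of distinct keys bounds how many batches can be built concurrently.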
On Thu, Jul 25, 2019 at 9:34 AM rahul patwari
wrote:
> So, If an RPC call has to be performed for a batch of
> Rows(PCollection), instead of each Row, the recommended way is to
> batch the Rows in startBundle() of DoFn(
> https://stackoverflow.c
So, if an RPC call has to be performed for a batch of
Rows (PCollection<Row>) instead of for each Row, is the recommended way to
batch the Rows in startBundle() of a DoFn (
https://stackoverflow.com/questions/49094781/yield-results-in-finish-bundle-from-a-custom-dofn/49101711#49101711)?
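
To make the question concrete, here is a minimal plain-Java sketch of bundle-level batching. It is an illustration under stated assumptions, not the approach from the linked answer and not Beam code: the method names only mirror Beam's @StartBundle / @ProcessElement / @FinishBundle lifecycle, and BatchingFnSketch, maxBatchSize, and the rpc callback are hypothetical. In this sketch the buffer is created at bundle start, filled per element, and flushed when it is full or when the bundle finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java stand-in for a DoFn; it does not depend on the Beam SDK.
final class BatchingFnSketch<T> {
    private final int maxBatchSize;          // hypothetical tuning knob
    private final Consumer<List<T>> rpc;     // hypothetical batched RPC client
    private List<T> buffer;

    BatchingFnSketch(int maxBatchSize, Consumer<List<T>> rpc) {
        this.maxBatchSize = maxBatchSize;
        this.rpc = rpc;
    }

    // Start of a bundle: begin with an empty buffer.
    void startBundle() {
        buffer = new ArrayList<>();
    }

    // Per element: buffer it, and flush as soon as the batch is full.
    void processElement(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // End of a bundle: send whatever is still buffered.
    void finishBundle() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        rpc.accept(new ArrayList<>(buffer)); // pass a copy, then reuse the buffer
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> calls = new ArrayList<>();
        BatchingFnSketch<String> fn = new BatchingFnSketch<>(2, calls::add);
        fn.startBundle();
        for (String row : List.of("a", "b", "c")) {
            fn.processElement(row);
        }
        fn.finishBundle();
        System.out.println(calls); // prints [[a, b], [c]]
    }
}
```

Batches built this way never span bundles, so the batch sizes actually seen depend on how the runner chooses bundle boundaries.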
I thought Stateful and
Though it's not obvious in the name, Stateful ParDos can only be
applied to keyed PCollections, similar to GroupByKey. (You could,
however, assign every element to the same key and then apply a
Stateful DoFn, though in that case all elements would get processed on
the same worker.)
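
The single-key workaround above can be generalized by spreading elements over a small, fixed number of synthetic keys, so that a keyed step could run on up to that many workers. This is not proposed anywhere in this thread; it is an assumption added for illustration, sketched in plain Java without the Beam SDK. ShardKeySketch, assignShardKeys, and numShards are hypothetical names. With numShards = 1 it reduces to the single-key case described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: derives a synthetic shard key from each element's hash.
final class ShardKeySketch {

    // Returns the elements grouped by shard key in the range [0, numShards).
    static <T> Map<Integer, List<T>> assignShardKeys(List<T> elements, int numShards) {
        Map<Integer, List<T>> byKey = new TreeMap<>();
        for (T element : elements) {
            // floorMod keeps the key non-negative even for negative hash codes.
            int key = Math.floorMod(element.hashCode(), numShards);
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(assignShardKeys(rows, 1)); // one key: everything together
        System.out.println(assignShardKeys(rows, 4)); // four keys: four groups
    }
}
```

The trade-off is that per-key state and batches are then per shard rather than per original key, which only fits when the downstream step does not need all elements of a key in one place.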
On Thu, Jul 25,