Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread Robert Bradshaw
On Thu, Jul 25, 2019 at 6:34 PM rahul patwari wrote:
>
> So, If an RPC call has to be performed for a batch of Rows(PCollection),
> instead of each Row, the recommended way is to batch the Rows in
> startBundle() of
> DoFn(https://stackoverflow.com/questions/49094781/yield-results-in-finish-bun

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
Yes, but GroupIntoBatches works on KV elements, and we work with a PCollection of Rows throughout our pipeline. We can convert each Row to a KV, but we only have a few keys and a bounded PCollection. Because we use global windows and have only a few keys, the opportunity for parallelism is limited to [No. of keys] with Stateful ParDo

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread Reuven Lax
Have you looked at the GroupIntoBatches transform?

On Thu, Jul 25, 2019 at 9:34 AM rahul patwari wrote:
> So, If an RPC call has to be performed for a batch of
> Rows(PCollection), instead of each Row, the recommended way is to
> batch the Rows in startBundle() of DoFn(
> https://stackoverflow.c

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
So, if an RPC call has to be performed for a batch of Rows (PCollection) instead of for each Row, is the recommended way to batch the Rows in startBundle() of DoFn (https://stackoverflow.com/questions/49094781/yield-results-in-finish-bundle-from-a-custom-dofn/49101711#49101711)? I thought Stateful and
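
For readers following the question above, here is a minimal sketch of bundle-scoped batching. It is plain Java, not the Beam API: the class `BundleBatcher`, the `maxBatchSize` parameter, and the `rpc` callback are hypothetical names introduced only for illustration. The method names mirror the DoFn lifecycle hooks (`@StartBundle`, `@ProcessElement`, `@FinishBundle`). In this sketch the buffer is only reset in `startBundle()`; elements are buffered in `processElement()` and any remainder is flushed in `finishBundle()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java stand-in for a DoFn that batches elements within one bundle.
// This is an illustration of the lifecycle only; it does not use Beam classes.
class BundleBatcher<T> {
    private final int maxBatchSize;        // hypothetical tuning knob
    private final Consumer<List<T>> rpc;   // hypothetical batched RPC call
    private List<T> buffer;

    BundleBatcher(int maxBatchSize, Consumer<List<T>> rpc) {
        this.maxBatchSize = maxBatchSize;
        this.rpc = rpc;
    }

    // Mirrors @StartBundle: only resets the buffer, no elements are seen yet.
    void startBundle() {
        buffer = new ArrayList<>();
    }

    // Mirrors @ProcessElement: buffer the element, flush once the batch is full.
    void processElement(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Mirrors @FinishBundle: flush whatever is left so nothing is dropped.
    void finishBundle() {
        flush();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            rpc.accept(new ArrayList<>(buffer)); // hand over a copy of the batch
            buffer.clear();
        }
    }
}
```

Because the buffer lives only for the duration of a bundle, this approach needs no key, but batch sizes depend on how the runner chooses to split the input into bundles.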

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread Robert Bradshaw
Though it's not obvious in the name, Stateful ParDos can only be applied to keyed PCollections, similar to GroupByKey. (You could, however, assign every element to the same key and then apply a Stateful DoFn, though in that case all elements would get processed on the same worker.)

On Thu, Jul 25,
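
To make the keying point concrete, here is a small plain-Java sketch (again not the Beam API) of per-key batching. The class `KeyedBatching`, the method `batchPerKey`, and its parameters are hypothetical names chosen for illustration. It groups elements by a key function and cuts each group into batches; the number of independent groups, and so the upper bound on parallel workers for keyed state, equals the number of distinct keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeyedBatching {
    // Groups elements by key, then splits each group into batches of at most
    // batchSize elements. Each map entry is one independent unit of keyed work.
    static <K, V> Map<K, List<List<V>>> batchPerKey(
            List<V> elements, Function<V, K> keyFn, int batchSize) {
        Map<K, List<List<V>>> batches = new LinkedHashMap<>();
        for (V element : elements) {
            List<List<V>> perKey =
                batches.computeIfAbsent(keyFn.apply(element), k -> new ArrayList<>());
            // Start a new batch when there is none yet or the last one is full.
            if (perKey.isEmpty() || perKey.get(perKey.size() - 1).size() >= batchSize) {
                perKey.add(new ArrayList<>());
            }
            perKey.get(perKey.size() - 1).add(element);
        }
        return batches;
    }
}
```

With a constant key function such as `r -> "all"`, the result has a single entry, which corresponds to the case where all elements are processed on the same worker. With a key function that yields only a few distinct values, the map has only that many entries regardless of how many elements there are.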