Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-25 Thread Matthew Darwin
Thanks Ahmet. Looking at the source code, with my untrained python eyes, I think if the intention is to include the message id and the publish time in the attributes attribute of the PubSubMessage type, then the protobuf mapping is missing something:- @staticmethod def _from_proto_str(proto_ms

Re: applying keyed state on top of stream from co-groupByKey output

2019-07-25 Thread Kenneth Knowles
Thanks for the very detailed question! I have written up an answer and I suggest we continue discussion there. Kenn On Tue, Jul 23, 2019 at 9:11 PM Kiran Hurakadli wrote: > > Hi All, > I am trying to merge 2 data streams using coGroupByKey and applying > stateful > ParDo. Input to the cogroup

Re: How to merge two streams and perform stateful operations on merged stream using Apache Beam

2019-07-25 Thread Kenneth Knowles
Is this the same question as your other email about your StackOverflow question? If it is, then please see my answer on StackOverflow. If it is not, can you explain a little more? Kenn On Wed, Jul 24, 2019 at 10:48 PM Kiran Hurakadli wrote: > I have 2 kafka streams , i want to merge by some ke

Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
Hi, https://beam.apache.org/blog/2017/02/13/stateful-processing.html gives an example of assigning an arbitrary-but-consistent index to each element on a per key-and-window basis. If the Stateful ParDo is applied on a Non-Keyed PCollection, say, PCollection with Fixed Windows, the state is maint

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread Robert Bradshaw
Though it's not obvious in the name, Stateful ParDos can only be applied to keyed PCollections, similar to GroupByKey. (You could, however, assign every element to the same key and then apply a Stateful DoFn, though in that case all elements would get processed on the same worker.) On Thu, Jul 25,

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
So, If an RPC call has to be performed for a batch of Rows(PCollection), instead of each Row, the recommended way is to batch the Rows in startBundle() of DoFn( https://stackoverflow.com/questions/49094781/yield-results-in-finish-bundle-from-a-custom-dofn/49101711#49101711)? I thought Stateful and

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread rahul patwari
Yes. But, GroupIntoBatches works on KV. We are working on PCollection throughout our pipeline. We can convert Row to KV. But, we only have a few keys and a Bounded PCollection. As we have Global windows and a few keys, the opportunity for parallelism is limited to [No. of keys] with Stateful ParDo

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-25 Thread Ahmet Altay
Your analysis and suggestion looks correct to me. Thank you for reporting this. I filed https://issues.apache.org/jira/browse/BEAM-7819 to track this. On Thu, Jul 25, 2019 at 4:51 AM Matthew Darwin < matthew.dar...@carfinance247.co.uk> wrote: > Thanks Ahmet. Looking at the source code, with my un

Re: applying keyed state on top of stream from co-groupByKey output

2019-07-25 Thread Rui Wang
I also added an alternatively solution to your example to the SO question. -Rui On Thu, Jul 25, 2019 at 9:02 AM Kenneth Knowles wrote: > Thanks for the very detailed question! I have written up an answer and I > suggest we continue discussion there. > > Kenn > > On Tue, Jul 23, 2019 at 9:11 PM

Re: How to merge two streams and perform stateful operations on merged stream using Apache Beam

2019-07-25 Thread Kiran Hurakadli
Yeah !This is same question .. thank you for you detailed explanation On Thu, 25 Jul 2019 at 9:32 PM, Kenneth Knowles wrote: > Is this the same question as your other email about your StackOverflow > question? If it is, then please see my answer on StackOverflow. If it is > not, can you explain

Re: applying keyed state on top of stream from co-groupByKey output

2019-07-25 Thread Kiran Hurakadli
Thank you for your response On Thu, 25 Jul 2019 at 9:32 PM, Kenneth Knowles wrote: > Thanks for the very detailed question! I have written up an answer and I > suggest we continue discussion there. > > Kenn > > On Tue, Jul 23, 2019 at 9:11 PM Kiran Hurakadli > wrote: > >> >> Hi All, >> I am try

Re: applying keyed state on top of stream from co-groupByKey output

2019-07-25 Thread Kiran Hurakadli
Thank you On Thu, 25 Jul 2019 at 10:51 PM, Rui Wang wrote: > I also added an alternatively solution to your example to the SO question. > > > -Rui > > On Thu, Jul 25, 2019 at 9:02 AM Kenneth Knowles wrote: > >> Thanks for the very detailed question! I have written up an answer and I >> suggest

Re: Stateful ParDo on Non-Keyed PCollection

2019-07-25 Thread Robert Bradshaw
On Thu, Jul 25, 2019 at 6:34 PM rahul patwari wrote: > > So, If an RPC call has to be performed for a batch of Rows(PCollection), > instead of each Row, the recommended way is to batch the Rows in > startBundle() of > DoFn(https://stackoverflow.com/questions/49094781/yield-results-in-finish-bun

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-25 Thread Pablo Estrada
The link is here: https://www.youtube.com/watch?v=xpIpEO4PUDo This is still happening. On Thu, Jul 25, 2019 at 2:55 PM Innocent Djiofack wrote: > Did I miss the link or this was postponed? > > On Tue, Jul 23, 2019 at 3:05 PM Austin Bennett < > whatwouldausti...@gmail.com> wrote: > >> Pablo, >> >

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-25 Thread sridhar inuog
Thanks, Pablo for organizing this session. I found it useful. On Thu, Jul 25, 2019 at 4:56 PM Pablo Estrada wrote: > The link is here: https://www.youtube.com/watch?v=xpIpEO4PUDo > This is still happening. > > On Thu, Jul 25, 2019 at 2:55 PM Innocent Djiofack > wrote: > >> Did I miss the link o

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-25 Thread Pablo Estrada
Thanks for those who tuned in : ) - I feel like I might have spent too long fiddling with Python code, and not long enough doing setup, testing, etc. I will try to do another one where I just test / setup the environment / lint checks etc. Here are links for: Setting up the Python environment: htt

Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-25 Thread Andres Angel
Awesome Pablo thanks so much!!! AU On Thu, Jul 25, 2019 at 7:48 PM Pablo Estrada wrote: > Thanks for those who tuned in : ) - I feel like I might have spent too > long fiddling with Python code, and not long enough doing setup, testing, > etc. I will try to do another one where I just test