Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread deklanw
Yes, it works but I think it has the same problem. It's a lot slower so it took me hours of running it, but by the end the memory usage was high and the CPU about 100% so it seems to be the same problem. Worth noting perhaps that when I use the DirectRunner I have to turn enforceImmutability of

Re: send Avro messages to Kafka using Beam

2019-06-03 Thread Yohei Onishi
I think that might be possible too. Yohei Onishi On Tue, Jun 4, 2019 at 9:00 AM Chamikara Jayalath wrote: > > > On Mon, Jun 3, 2019 at 5:14 PM Yohei Onishi wrote: > >> Hi Nicolas, >> >> Are you running your job on Dataflow? According to GCP support, Dataflow >> currently does not support Schem

Re: send Avro messages to Kafka using Beam

2019-06-03 Thread Chamikara Jayalath
On Mon, Jun 3, 2019 at 5:14 PM Yohei Onishi wrote: > Hi Nicolas, > > Are you running your job on Dataflow? According to GCP support, Dataflow > currently does not support Schema Registry. But we can still use Schema > Registry to serialize / deserialize your message using > custom KafkaAvroSeriali

Re: send Avro messages to Kafka using Beam

2019-06-03 Thread Yohei Onishi
Hi Nicolas, Are you running your job on Dataflow? According to GCP support, Dataflow currently does not support Schema Registry. But we can still use Schema Registry to serialize / deserialize your message using a custom KafkaAvroSerializer. In my case I implemented my custom KafkaAvroDeserializer t

Re: How to build a beam python pipeline which does GET/POST request to API's

2019-06-03 Thread Ankur Goenka
Looking at your use case, the whole processing logic seems to be very custom. I would recommend using ParDos to express your use case. If the processing for an individual dictionary is expensive then you can potentially use a reshuffle operation to distribute the updating of the dictionary over multipl

RE: How to build a beam python pipeline which does GET/POST request to API's

2019-06-03 Thread Anjana Pydi
Hi Ankur, Thanks for the reply. Please find my responses in the mail below. Thanks, Anjana From: Ankur Goenka [goe...@google.com] Sent: Monday, June 03, 2019 11:01 AM To: user@beam.apache.org Subject: Re: How to build a beam python pipeline which does GET/POST reques

Re: How to build a beam python pipeline which does GET/POST request to API's

2019-06-03 Thread Ankur Goenka
Thanks for providing more information. Some follow-up questions/comments: 1. Call an API which would provide a dictionary as response. Question: Do you need to make multiple of these API calls? If yes, what distinguishes API call1 from call2? If it's the input to the API, then can you provide the in

Re: Writing and serializing a custom WindowFn

2019-06-03 Thread Chad Dombrova
Hi Robert, Thanks for the response. As you've discovered, fully custom merging window fns are not yet > supported portably, though this is on our todo list. > > https://issues.apache.org/jira/browse/BEAM-5601 > Thanks for linking me to that. I've watched it and voted for it, and maybe I'll even

Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread Rui Wang
Ha, sorry, I was only reading the screenshots and ignored your other comments. So the count fn indeed worked. Can I ask if your SQL pipeline works on the direct runner? -Rui On Mon, Jun 3, 2019 at 10:39 AM Rui Wang wrote: > BeamSQL actually only converts SELECT COUNT(*) query to the Java pipeline > that ca

Re: SQL massively more resource-intensive? Memory leak?

2019-06-03 Thread Rui Wang
BeamSQL actually only converts the SELECT COUNT(*) query to a Java pipeline that calls Java's builtin Count[1] transform. Could you implement your pipeline with the Count transform to see whether this memory issue still exists? By doing so we could narrow down the problem a bit. If using Java directly without

Re: Windows were processed out of order

2019-06-03 Thread Robert Bradshaw
Yep, that's correct. On Mon, Jun 3, 2019 at 2:06 PM Juan Carlos Garcia wrote: > > Hi Robert, > > The elements of a PCollection are unordered. >> Yes this is something known > and understood given the nature of a PCollection. > > So that means that when we are doing a replay of past data (we rew

Re: Windows were processed out of order

2019-06-03 Thread Juan Carlos Garcia
Hi Robert, *The elements of a PCollection are unordered.* >> Yes this is something known and understood given the nature of a PCollection. So that means that when we are doing a replay of past data (we rewind our Kafka consumer groups), in 1h of processing time, there might be multiple 1h window

Re: Windows were processed out of order

2019-06-03 Thread Robert Bradshaw
The elements of a PCollection are unordered. This includes the results of a GBK: there is no promise that the output will be processed in any particular order (for example, windows ordered by timestamp). DoFns, especially ones with side effects, should be written with this in mind. (There is actually ongoing dis

Windows were processed out of order

2019-06-03 Thread Juan Carlos Garcia
Hi Folks, My team and I have a situation that we cannot explain and would like to hear your thoughts. We have a pipeline which enriches the incoming messages and writes them to BigQuery. The pipeline looks like this: Apache Beam 2.12.0 / GCP Dataflow - - ReadFromKafka (with withCreateTime and 1

Re: Writing and serializing a custom WindowFn

2019-06-03 Thread Robert Bradshaw
Hi Chad! As you've discovered, fully custom merging window fns are not yet supported portably, though this is on our todo list. https://issues.apache.org/jira/browse/BEAM-5601 This involves calling back into the SDK to perform the actual merging logic (and also, for full generality, being able