Re: Help with adding python package dependencies when executing python pipeline

2018-07-03 Thread OrielResearch Eila Arich-Landkof
Based on https://stackoverflow.com/questions/44423769/how-to-use-google-cloud-storage-in-dataflow-pipeline-run-from-datalab I tried this:

options = PipelineOptions(flags=["--requirements_file", "./requirements.txt"])

The requirements file was generated by:

pip freeze > requirements.txt

But it fi…

Re: Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread Eugene Kirpichov
Bundle boundaries are unspecified, dependent on the runner and the particular circumstances during this particular execution, and are generally unrelated to windowing or to the data contents themselves. They have no semantic meaning: everything would still work exactly the same way even if every eleme…
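The invariance described above can be shown with a stdlib-only toy simulation (hypothetical names, not real Beam code): the same elements split into differently sized bundles produce identical output, and only the number of StartBundle/FinishBundle calls differs.

```python
# Toy DoFn-like class: setup runs once per instance, start_bundle/finish_bundle
# once per runner-chosen bundle, process once per element.
class CountingFn:
    def __init__(self):
        self.bundles = 0
        self.out = []

    def setup(self):           # once per instance
        pass

    def start_bundle(self):    # once per bundle; bundling is runner-chosen
        self.bundles += 1

    def process(self, elem):
        self.out.append(elem * 2)

    def finish_bundle(self):   # natural flush point, once per bundle
        pass

def run(fn, elements, bundle_size):
    """Toy 'runner' that chops elements into bundles of a given size."""
    fn.setup()
    for i in range(0, len(elements), bundle_size):
        fn.start_bundle()
        for e in elements[i:i + bundle_size]:
            fn.process(e)
        fn.finish_bundle()
    return fn.out

elems = [1, 2, 3, 4, 5]
a, b = CountingFn(), CountingFn()
# Two different bundlings of the same data: results are identical.
assert run(a, elems, bundle_size=2) == run(b, elems, bundle_size=5)
print(a.bundles, b.bundles)  # → 3 1
```

Note the output is the same either way; a correct DoFn must not depend on where bundle boundaries fall.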

Re: Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread eduardo . morales
Thanks. It is much clearer now... However, the code comments don't mention how often {Start,Finish}Bundle are called. What constitutes a batch? If I am using a window of 1 minute, can I expect {Start,Finish}Bundle to be called every minute? In other words, will the window produce a batch of my data? On

Re: Help with adding python package dependencies when executing python pipeline

2018-07-03 Thread OrielResearch Eila Arich-Landkof
Thank you. Where do I add the reference to requirements.txt? Can I do it from the pipeline options code? On Tue, Jul 3, 2018 at 5:13 PM, Lukasz Cwik wrote: > Take a look at https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ > > On Tue, Jul 3, 2018 at 2:09 PM OrielResearch

Re: Help with adding python package dependencies when executing python pipeline

2018-07-03 Thread Lukasz Cwik
Take a look at https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ On Tue, Jul 3, 2018 at 2:09 PM OrielResearch Eila Arich-Landkof <e...@orielresearch.org> wrote: > Hello all, > > > I am using the python code to run my pipeline, similar to the following: > > options = Pipeli

Help with adding python package dependencies when executing python pipeline

2018-07-03 Thread OrielResearch Eila Arich-Landkof
Hello all, I am using the python code to run my pipeline, similar to the following:

options = PipelineOptions()
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'my-project-id'
google_cloud_options.job_name = 'myjob'
google_cloud_options.staging_location = 'g…

Re: Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread Eugene Kirpichov
Hi Eduardo, These differences are described by the link I sent ( https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L465-L666) - it documents what kind of things it's best to do in each method. Please let me know if something is still un

Re: Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread eduardo . morales
FinishBundle() does the job. Should I keep using Setup()? What is the difference between Setup() and StartBundle()? Thanks again. On 2018/07/03 20:10:21, Henning Rohde wrote: > Teardown has very loose guarantees on when it's called and you essentially > can't rely on it. Currently, for Go on

Re: Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread Eugene Kirpichov
Hi Eduardo, Henning is right - the specific guarantees around Setup/Teardown vs. StartBundle/FinishBundle are currently described best in the Java SDK documentation: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java (see documentation t

Re: Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread Henning Rohde
Teardown has very loose guarantees on when it's called and you essentially can't rely on it. Currently, for Go on non-direct runners, we hang on to the bundle descriptors forever and never destroy them (and in turn never call Teardown). Even if we didn't, failures/restarts could cause Teardown to n
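Given that Teardown may never run, the flush-on-FinishBundle pattern recommended in this thread can be sketched as follows (stdlib-only Python with hypothetical names, standing in for the Go code from the original question): buffer writes per bundle and flush them in finish_bundle, which runs for every successfully committed bundle, rather than in teardown.

```python
# Buffered-writer sketch: flush in finish_bundle, not teardown.
class BufferedWriterFn:
    def __init__(self, sink):
        self.sink = sink     # stand-in for a real external writer/pool
        self.buffer = []

    def start_bundle(self):
        self.buffer = []

    def process(self, elem):
        self.buffer.append(elem)

    def finish_bundle(self):
        # Flush here: unlike teardown, this is tied to bundle completion,
        # so no committed bundle's data is left unwritten.
        self.sink.extend(self.buffer)
        self.buffer = []

sink = []
fn = BufferedWriterFn(sink)
fn.start_bundle()
for e in ["a", "b", "c"]:
    fn.process(e)
fn.finish_bundle()
print(sink)  # → ['a', 'b', 'c']
```

Setup remains the right place for expensive one-time initialization (such as creating the pool itself); only the flush moves from Teardown to FinishBundle.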

Re: How Beam's helped our bottom line...

2018-07-03 Thread Rafael Fernandez
Wow, what a nice thing to read! Thanks for sharing, Peter! On Tue, Jul 3, 2018 at 9:50 AM Peter Mueller wrote: > Hi Gris and everyone, > Not a very techie post, but thought I'd contribute a little on the > economic case for open-source, and Apache Beam in particular. > > TLDR; We're finding tha

Go SDK: Teardown() not being called with dataflow runner

2018-07-03 Thread eduardo . morales
Essentially I have the following code:

type Writer struct {
    Pool WriterPool
}
func (w *Writer) Setup() {
    w.Pool = Init()
}
func (w *Writer) ProcessElement(ctx, elem Elem) {
    w.Pool.Add(elem)
}
func (w *Writer) Teardown() {
    w.Pool.Write()
    w.Pool.Close()
}
beam.ParDo0(scope, &Writer{}, e…

How Beam's helped our bottom line...

2018-07-03 Thread Peter Mueller
Hi Gris and everyone, Not a very techie post, but thought I'd contribute a little on the economic case for open-source, and Apache Beam in particular. TLDR; We're finding that, at critical points in our sales cycle, our customers are choosing us precisely because we offer a 'call option' on futur

Re: BigQueryIO.write and Wait.on

2018-07-03 Thread Eugene Kirpichov
Awesome!! Thanks for the heads up, very exciting, this is going to make a lot of people happy :) On Tue, Jul 3, 2018, 3:40 AM Carlos Alonso wrote: > + d...@beam.apache.org > > Just a quick email to let you know that I'm starting developing this. > > On Fri, Apr 20, 2018 at 10:30 PM Eugene Kirpic

Re: BigQueryIO.write and Wait.on

2018-07-03 Thread Carlos Alonso
+ d...@beam.apache.org Just a quick email to let you know that I'm starting developing this. On Fri, Apr 20, 2018 at 10:30 PM Eugene Kirpichov wrote: > Hi Carlos, > > Thank you for expressing interest in taking this on! Let me give you a few > pointers to start, and I'll be happy to help everyw