Re: Migrating to Gradle: Community Fixit day

2018-03-23 Thread Jean-Baptiste Onofré
Ready to help with the artifact/release push and verification. April 3rd works for me (not the 28th). Regards JB On 03/23/2018 10:00 PM, Reuven Lax wrote: > Hi, > > Late last November we voted on migrating our build process from Maven to > Gradle. > The vote at the time was specifically about increment

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Holden Karau
On Fri, Mar 23, 2018 at 7:00 PM Eugene Kirpichov wrote: > On Fri, Mar 23, 2018 at 6:49 PM Holden Karau wrote: > >> On Fri, Mar 23, 2018 at 6:20 PM Eugene Kirpichov >> wrote: >> >>> On Fri, Mar 23, 2018 at 6:12 PM Holden Karau >>> wrote: >>> On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Eugene Kirpichov
On Fri, Mar 23, 2018 at 6:49 PM Holden Karau wrote: > On Fri, Mar 23, 2018 at 6:20 PM Eugene Kirpichov > wrote: > >> On Fri, Mar 23, 2018 at 6:12 PM Holden Karau >> wrote: >> >>> On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov >>> wrote: >>> Reviving this thread. I think SDF is a pretty

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Holden Karau
On Fri, Mar 23, 2018 at 6:20 PM Eugene Kirpichov wrote: > On Fri, Mar 23, 2018 at 6:12 PM Holden Karau wrote: > >> On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov >> wrote: >> >>> Reviving this thread. I think SDF is a pretty big risk for Spark runner >>> streaming. Holden, is it correct that

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Eugene Kirpichov
On Fri, Mar 23, 2018 at 6:12 PM Holden Karau wrote: > On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov > wrote: > >> Reviving this thread. I think SDF is a pretty big risk for Spark runner >> streaming. Holden, is it correct that Spark appears to have no way at all >> to produce an infinite DStr

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Holden Karau
On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov wrote: > Reviving this thread. I think SDF is a pretty big risk for Spark runner > streaming. Holden, is it correct that Spark appears to have no way at all > to produce an infinite DStream from a finite RDD? Maybe we can somehow > dynamically crea

Re: Beam Summit - IO brainstorming

2018-03-23 Thread Chamikara Jayalath
Thanks JB for detailed notes. On Fri, Mar 23, 2018 at 2:43 PM Eugene Kirpichov wrote: > Hi! Thanks for the notes. > > On Fri, Mar 23, 2018 at 3:07 AM Jean-Baptiste Onofré > wrote: > >> Hi all, >> >> Sorry for the delay, but I got issues with my e-mail provider (I was not >> able to >> send e-ma

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Eugene Kirpichov
Reviving this thread. I think SDF is a pretty big risk for Spark runner streaming. Holden, is it correct that Spark appears to have no way at all to produce an infinite DStream from a finite RDD? Maybe we can somehow dynamically create a new DStream for every initial restriction, said DStream being

Re: Migrating to Gradle: Community Fixit day

2018-03-23 Thread Henning Rohde
I can help with at least the Go and docker aspects and prefer April 3rd as well. On Fri, Mar 23, 2018 at 2:08 PM Reuven Lax wrote: > Yes, due to time-zone differences between participants, a one-day fixit > will probably actually last two days :) > > > On Fri, Mar 23, 2018 at 2:03 PM Romain Man

Re: Beam Summit - IO brainstorming

2018-03-23 Thread Eugene Kirpichov
Hi! Thanks for the notes. On Fri, Mar 23, 2018 at 3:07 AM Jean-Baptiste Onofré wrote: > Hi all, > > Sorry for the delay, but I got issues with my e-mail provider (I was not > able to > send e-mails :( ). > > Last week during Beam Summit, I had the chance to participate in the IO > brainstorming

Re: Migrating to Gradle: Community Fixit day

2018-03-23 Thread Reuven Lax
Yes, due to time-zone differences between participants, a one-day fixit will probably actually last two days :) On Fri, Mar 23, 2018 at 2:03 PM Romain Manni-Bucau wrote: > Hi Reuven > > I can try to help on the 3rd (don't forget you are in the future for me so > you may need to launch it on the

Re: Migrating to Gradle: Community Fixit day

2018-03-23 Thread Lukasz Cwik
I would help out and would prefer April 3rd over March 28th. On Fri, Mar 23, 2018 at 2:00 PM Reuven Lax wrote: > Hi, > > Late last November we voted on migrating our build process from Maven to > Gradle. The vote at the time was specifically about incremental migration, > with the statement tha

Re: Migrating to Gradle: Community Fixit day

2018-03-23 Thread Romain Manni-Bucau
Hi Reuven I can try to help on the 3rd (don't forget you are in the future for me, so you may need to launch it on the 2nd for you) but not on the 28th :(. Romain Manni-Bucau @rmannibucau

Migrating to Gradle: Community Fixit day

2018-03-23 Thread Reuven Lax
Hi, Late last November we voted on migrating our build process from Maven to Gradle. The vote at the time was specifically about incremental migration, with the statement that as each specific process was migrated to Gradle we would stop maintaining Maven for that process. The vote concluded with

Re: [PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Ismaël Mejía
Nice, it is great to see a good amount of support and enthusiasm on this. I just want to remind everyone that the whole idea and code donation come from Romain Manni-Bucau. I just did some ‘mise-en-forme’ plus ValueProviders. All credit to Romain! Eugene, thanks a lot for the feedback. I would like to get

Re: [PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Thomas Weise
+1, nice! On Fri, Mar 23, 2018 at 4:03 AM, Ismaël Mejía wrote: > This is a really simple proposal to add an extension with transforms > that package the Java Scripting API (JSR-223) [1] to allow users to > specialize some transforms via a scripting language. This work was > initially created by

Re: [PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Eugene Kirpichov
Ismael - thanks, adding scripting language support to Beam is an awesome idea and we should absolutely do it. However, I think the current proposal can be made significantly more general, and it would benefit from a formal design discussion. E.g. a couple of points I can think of, that seem very i

Re: Unbounded source translation for portable pipelines

2018-03-23 Thread Eugene Kirpichov
Luke is right - unbounded sources should go through SDF. I am currently working on adding such support to Fn API. The relevant document is s.apache.org/beam-breaking-fusion (note: it focuses on a much more general case, but also considers in detail the specific case of running unbounded sources on

Re: Unbounded source translation for portable pipelines

2018-03-23 Thread Lukasz Cwik
Using impulse is a precursor for both bounded and unbounded SDF. This JIRA tracks the work needed to add support for unbounded SDF using portability APIs: https://issues.apache.org/jira/browse/BEAM-2939 On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise wrote: > So for streaming, we will

Re: executing the pipeline from datalab

2018-03-23 Thread Ahmet Altay
+ user, dev to bcc Eila, Is it possible that you are using an old version? I remember pending was missing in the dictionary and was added later. If that is not the reason, could you file a JIRA issue? Thank you, Ahmet On Fri, Mar 23, 2018 at 6:15 AM, Jean-Baptiste Onofré wrote: > Hi Eila, >

Re: [PROPOSAL] Python 3 support

2018-03-23 Thread Ahmet Altay
Thank you Robbe. I reviewed the document and it looks reasonable to me. I will touch on some points that were not mentioned: - Runners exercise different code paths. Doing auto conversions and focusing on DirectRunner is not enough. It is worthwhile to run things on DataflowRunner as well. This can be

Re: Unbounded source translation for portable pipelines

2018-03-23 Thread Thomas Weise
So for streaming, we will need the Impulse translation for bounded input, identical with batch, and then in addition to that support for SDF? Any pointers on what's involved in adding the SDF support? Is it runner specific? Does the ULR cover it? On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik wrote

Re: Unbounded source translation for portable pipelines

2018-03-23 Thread Lukasz Cwik
All "sources" in portability will use splittable DoFns for execution. Specifically, runners will need to be able to checkpoint unbounded sources to get a minimum viable pipeline working. For bounded pipelines, a DoFn can read the contents of a bounded source. On Fri, Mar 23, 2018 at 10:52 AM Tho
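The checkpointing requirement mentioned in this message can be illustrated with a toy model in plain Python. This is not Beam's API: `OffsetRestriction`, `process_with_checkpoint`, and `max_records` are hypothetical names chosen for the sketch. It only shows the general idea of processing part of a range and handing back the unprocessed remainder (the residual) so that work can resume later. An unbounded source would have no upper bound; a finite range is assumed here so the loop terminates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OffsetRestriction:
    """Half-open range [start, stop) of offsets still to be read (hypothetical type)."""
    start: int
    stop: int


def process_with_checkpoint(
    restriction: OffsetRestriction, max_records: int
) -> Tuple[List[int], Optional[OffsetRestriction]]:
    """Read at most max_records offsets, then return the residual restriction."""
    end = min(restriction.start + max_records, restriction.stop)
    outputs = list(range(restriction.start, end))
    # The residual is whatever was not read; None means the range is exhausted.
    residual = OffsetRestriction(end, restriction.stop) if end < restriction.stop else None
    return outputs, residual


# A runner-like loop: keep resuming from the residual until nothing is left.
pending = OffsetRestriction(0, 10)
seen = []
while pending is not None:
    batch, pending = process_with_checkpoint(pending, max_records=4)
    seen.extend(batch)
print(seen)
```

Each call returns after a bounded amount of work, which is what lets a runner persist the residual and schedule the rest later instead of blocking on a source that never ends.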

Re: How to decide how much quota do I need

2018-03-23 Thread Ahmet Altay
+ user, dev to bcc Eila, there is some information here: https://cloud.google.com/dataflow/quotas on quotas in general. Specifically for in-use IP addresses, you can look at autoscaling messages and see what autoscaling was trying to upscale to. It is also possible to use large machine types (e.g

Re: Python PostCommit Broken

2018-03-23 Thread Ahmet Altay
https://issues.apache.org/jira/browse/BEAM-3922 is the JIRA for tracking this. On Fri, Mar 23, 2018 at 10:51 AM, Pablo Estrada wrote: > Hello everyone, > I see that the Python PostCommit has been broken for a couple days. Is > there a PR / JIRA to track this? > See breakage: https://builds.apach

Unbounded source translation for portable pipelines

2018-03-23 Thread Thomas Weise
Hi, I'm looking at the portable pipeline translation for streaming. I understand that for batch pipelines, it is sufficient to translate Impulse. What is the intended path to support unbounded sources? The goal here is to get a minimum translation working that will allow streaming wordcount exec

Python PostCommit Broken

2018-03-23 Thread Pablo Estrada
Hello everyone, I see that the Python PostCommit has been broken for a couple days. Is there a PR / JIRA to track this? See breakage: https://builds.apache.org/job/beam_PostCommit_Python_Verify/4472/console Best -P. -- Got feedback? go/pabloem-feedback

Re: Gradle status

2018-03-23 Thread Scott Wegner
Thanks for organizing, Reuven. I too would like to see us move back to a single build system to reduce complexity. Count me in for the fixit. On Thu, Mar 22, 2018 at 11:27 PM Reuven Lax wrote: > I'll send an email tomorrow with a few proposed dates and set up a > burndown list of tasks. > > Reuv

[PROPOSAL] Python 3 support

2018-03-23 Thread Robbe Sneyders
Hello everyone, In the next month(s), my colleague Matthias and I will commit a lot of time and effort to Python 3 support for Beam and we would like to discuss the best way to go forward with this. We have drawn up a document [1] with a high level outline of the proposed approach and would like

Re: [PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Tyler Akidau
+1, I like it. Thanks! On Fri, Mar 23, 2018 at 9:03 AM Ahmet Altay wrote: > Thank you Ismaël, this looks really cool. > > On Fri, Mar 23, 2018 at 5:33 AM, Jean-Baptiste Onofré > wrote: > >> Hi, >> >> it sounds like a very good extension mechanism to PTransform. >> >> +1 >> >> Regards >> JB >> >

Re: [PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Ahmet Altay
Thank you Ismaël, this looks really cool. On Fri, Mar 23, 2018 at 5:33 AM, Jean-Baptiste Onofré wrote: > Hi, > > it sounds like a very good extension mechanism to PTransform. > > +1 > > Regards > JB > > On 03/23/2018 12:03 PM, Ismaël Mejía wrote: > > This is a really simple proposal to add an ex

Re: executing the pipeline from datalab

2018-03-23 Thread Jean-Baptiste Onofré
Hi Eila, can you please address this kind of question to the user mailing list? Thanks! Regards JB On 03/23/2018 02:08 PM, OrielResearch Eila Arich-Landkof wrote: > Hello all, > > When I run the pipeline with 4 samples (very small dataset), I don't get any > error on DirectRunner or DataflowR

executing the pipeline from datalab

2018-03-23 Thread OrielResearch Eila Arich-Landkof
Hello all, When I run the pipeline with 4 samples (very small dataset), I don't get any error on DirectRunner or DataflowRunner. When I run it with a 50-sample dataset, I get the following error from run.wait_until_finish(). What does this error mean? Thanks, Eila KeyError Traceback (most recen

How to decide how much quota do I need

2018-03-23 Thread OrielResearch Eila Arich-Landkof
Hello, I am getting the following error: Autoscaling: Unable to reach resize target in zone us-central1-f. QUOTA_EXCEEDED: Quota 'IN_USE_ADDRESSES' exceeded. Limit: 8.0 in region us-central1. I asked for additional quota. My question is: are there any hints as to how much quota I need? Thanks,
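One way to read the error quoted in this message, sketched in Python: the text names the exhausted quota, its limit, and the region, and those can be pulled out mechanically before deciding how much to request. The function name `parse_quota_error` and the regular expression are illustrative assumptions based only on the single message quoted here, not on any documented log format.

```python
import re

# Pattern inferred from the one error message quoted in this post; the real
# log format may vary, so treat it as an assumption.
QUOTA_RE = re.compile(
    r"QUOTA_EXCEEDED: Quota '(?P<quota>[A-Z_]+)' exceeded\. "
    r"Limit: (?P<limit>[0-9.]+) in region (?P<region>[a-z0-9-]+)"
)


def parse_quota_error(message):
    """Return (quota_name, limit, region), or None if the message does not match."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None
    return match.group("quota"), float(match.group("limit")), match.group("region")


message = (
    "Autoscaling: Unable to reach resize target in zone us-central1-f. "
    "QUOTA_EXCEEDED: Quota 'IN_USE_ADDRESSES' exceeded. "
    "Limit: 8.0 in region us-central1."
)
print(parse_quota_error(message))
```

The extracted limit only says what the current ceiling is; how much to request still depends on the worker count autoscaling was trying to reach, which has to come from the autoscaling messages themselves.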

Re: Beam Summit - IO brainstorming

2018-03-23 Thread Jean-Baptiste Onofré
Thanks for the update, Eila! Much appreciated. Regards JB On 03/23/2018 12:57 PM, OrielResearch Eila Arich-Landkof wrote: > Hi All, > > Cham and I were trying to initiate the HDF5 support with the HDF5 team. > It > seems that their forum might be able to provide the required support. > I

Re: [PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Jean-Baptiste Onofré
Hi, it sounds like a very good extension mechanism to PTransform. +1 Regards JB On 03/23/2018 12:03 PM, Ismaël Mejía wrote: > This is a really simple proposal to add an extension with transforms > that package the Java Scripting API (JSR-223) [1] to allow users to > specialize some transforms v

Re: Beam Summit - IO brainstorming

2018-03-23 Thread OrielResearch Eila Arich-Landkof
Hi All, Cham and I were trying to initiate the HDF5 support with the HDF5 team. It seems that their forum might be able to provide the required support. I have created a ticket on their system, https://forum.hdfgroup.org/, and will follow up after that to make sure that this is not being forgo

[PROPOSAL] Scripting extension based on Java JSR-223

2018-03-23 Thread Ismaël Mejía
This is a really simple proposal to add an extension with transforms that package the Java Scripting API (JSR-223) [1] to allow users to specialize some transforms via a scripting language. This work was initially created by Romain [2] and I just took it with his authorization and refined it to mak

Re: Apache Beam 2.4.0 release process retrospective and automation possibilities

2018-03-23 Thread Romain Manni-Bucau
2018-03-23 9:52 GMT+01:00 Robert Bradshaw: > To put this in context, this was a brain dump of some of the things I > encountered while doing the release. Were I to do a release again, it would > be a lot easier (though still not ideal). > > At a high level, rather than focusing on steps, I thin

Re: Apache Beam 2.4.0 release process retrospective and automation possibilities

2018-03-23 Thread Robert Bradshaw
To put this in context, this was a brain dump of some of the things I encountered while doing the release. Were I to do a release again, it would be a lot easier (though still not ideal). At a high level, rather than focusing on steps, I think it's more interesting to focus on why we need a huma

Build failed in Jenkins: beam_Release_NightlySnapshot #722

2018-03-23 Thread Apache Jenkins Server
See Changes: [alex] Correct BigQuery.write JavaDoc example [andreas.ehrencrona] [BEAM-2264] Credentials were not being reused between GCS calls [coheigea] Remove "i == numSplits" condition, which ca

Re: Beam Summit - IO brainstorming

2018-03-23 Thread Romain Manni-Bucau
+1000 for record metadata (Camel headers). For Python, it could be interesting to "just" generate a Python-styled IO API and use Jython under the hood, letting Python users code the way they know while reusing the whole Beam ecosystem, including runners! The other way around implies a lot of work for the community

Beam Summit - IO brainstorming

2018-03-23 Thread Jean-Baptiste Onofré
Hi all, Sorry for the delay, but I got issues with my e-mail provider (I was not able to send e-mails :( ). Last week during Beam Summit, I had the chance to participate in the IO brainstorming session. Here are the meeting notes: 1. IOs set We now have a decent number of IOs in Beam, and new are