Re: Apache Beam, version 2.2.0

2017-12-04 Thread Jean-Baptiste Onofré
+1, and sorry again, I thought we had a consensus. Regards JB On 12/05/2017 07:10 AM, Kenneth Knowles wrote: +1 to the poll and also to Reuven's point. Those without a support contract would have been using JDK 7 without security updates for years. IMO it seems harmful, as a netizen, to

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Kenneth Knowles
+1 to the poll and also to Reuven's point. Those without a support contract would have been using JDK 7 without security updates for years. IMO it seems harmful, as a netizen, to encourage its use/existence. If there's no noise from the prior thread, then I would assume no one on user@ has any

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Ahmet Altay
Thank you Reuven! I tweeted the release announcement on Beam's account. On Mon, Dec 4, 2017 at 9:49 PM, Reuven Lax wrote: > Technically it's a backwards-incompatible change, however if we are > convinced the risk is low we could do it. > > As mentioned on the original thread,

Re: How to cope with Maven transient network issues?

2017-12-04 Thread Romain Manni-Bucau
On Dec 5, 2017 at 06:20, "Eugene Kirpichov" wrote: On Mon, Dec 4, 2017 at 1:45 PM Romain Manni-Bucau wrote: > > > On Dec 4, 2017 at 21:45, "Eugene Kirpichov" wrote: > > Romain - as far as I understand, Maven *does* have a

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Reuven Lax
Technically it's a backwards-incompatible change, however if we are convinced the risk is low we could do it. As mentioned on the original thread, it's not clear that all Beam users read user@ - e.g. most Dataflow users definitely do not. I think we need to separately reach out to users of each

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Eugene Kirpichov
On the original thread https://lists.apache.org/thread.html/2e1890c62d9f022f09b20e9f12f130fe9f1042e391979087f725d2e0@%3Cuser.beam.apache.org%3E , Robert and Ismaël were in favor of no major version change [Ismaël said: *Also I am afraid that if we wait until we have enough changes to switch Beam

Re: How to cope with Maven transient network issues?

2017-12-04 Thread Eugene Kirpichov
On Mon, Dec 4, 2017 at 1:45 PM Romain Manni-Bucau wrote: > > > On Dec 4, 2017 at 21:45, "Eugene Kirpichov" wrote: > > Romain - as far as I understand, Maven *does* have a retry strategy, but > it is a poor retry strategy and there is no way to tweak

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Reuven Lax
We should bring this up on the Beam 3.0 thread. Since it's technically a backwards-incompatible change, it might make a good item for Beam 3.0. Reuven On Mon, Dec 4, 2017 at 8:20 PM, Jean-Baptiste Onofré wrote: > My apologies, I thought we had a consensus already. > >

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Jean-Baptiste Onofré
My apologies, I thought we had a consensus already. Regards JB On 12/04/2017 11:22 PM, Eugene Kirpichov wrote: Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot of exciting things indeed. Regarding Java 8: I thought our consensus was to have the release notes say that

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Ben Chambers
This would be absolutely great! It seems somewhat similar to the changes that were made to the BigQuery sink to support WriteResult ( https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.java ). I find it

Re: Guarding against unsafe triggers at construction time

2017-12-04 Thread Robert Bradshaw
On Mon, Dec 4, 2017 at 3:19 PM, Eugene Kirpichov wrote: > Hi, > > After a recent investigation of a data loss bug caused by unintuitive > behavior of some kinds of triggers, we had a discussion about how we can > protect against future issues like this, and I summarized it

Re: Guarding against unsafe triggers at construction time

2017-12-04 Thread Thomas Groh
I'm in favor of option 3 in all cases; in favor of option 2 if it's considered to be "more backwards-compatible" than option 1; Option 1 I'm in favor of at minimum making the continuation trigger for all triggers be nonterminating. I'm not super bothered by how we accomplish that, so long as we

Re: Guarding against unsafe triggers at construction time

2017-12-04 Thread Raghu Angadi
I have been thinking about this since last week's discussions about buffering in sinks and was reading https://s.apache.org/beam-sink-triggers. It says BEAM-3169 is an example of a bug caused by misunderstanding of trigger semantics. - I would like to know which part of the (documented) trigger

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Eugene Kirpichov
It makes sense to consider how this maps onto existing kinds of sinks. E.g.: - Something that just makes an RPC per record, e.g. MqttIO.write(): that will emit 1 result per bundle (either a bogus value or number of records written) that will be Combine'd into 1 result per pane of input. A user

Re: Guarding against unsafe triggers at construction time

2017-12-04 Thread Kenneth Knowles
My own thoughts inline on the three ideas discussed. On Mon, Dec 4, 2017 at 3:19 PM, Eugene Kirpichov wrote: > > Continuation triggers are still worse. For context: continuation trigger > is the trigger that's set on the output of a GBK and controls further > aggregation of

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Eugene Kirpichov
I agree that the proper API for enabling the use case "do something after the data has been written" is to return a PCollection of objects where each object represents the result of writing some identifiable subset of the data. Then one can apply a ParDo to this PCollection, in order to "do

Guarding against unsafe triggers at construction time

2017-12-04 Thread Eugene Kirpichov
Hi, After a recent investigation of a data loss bug caused by unintuitive behavior of some kinds of triggers, we had a discussion about how we can protect against future issues like this, and I summarized it in https://issues.apache.org/jira/browse/BEAM-3288 . Copying here: Current Beam trigger

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Lukasz Cwik
I also believe we were still in the investigatory phase for dropping support for Java 7. On Mon, Dec 4, 2017 at 2:22 PM, Eugene Kirpichov wrote: > Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot > of exciting things indeed. > > Regarding Java 8: I

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Eugene Kirpichov
Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot of exciting things indeed. Regarding Java 8: I thought our consensus was to have the release notes say that we're *considering* going Java8-only, and use that to get more opinions from the user community - but I can't find

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Vilhelm von Ehrenheim
I'm super excited about this release! Great work everyone involved! On Mon, Dec 4, 2017 at 10:58 AM, Jean-Baptiste Onofré wrote: > Just an important note that we forgot to mention. > > !! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7 > !! > > Starting

Re: How to cope with Maven transient network issues?

2017-12-04 Thread Romain Manni-Bucau
On Dec 4, 2017 at 21:45, "Eugene Kirpichov" wrote: Romain - as far as I understand, Maven *does* have a retry strategy, but it is a poor retry strategy and there is no way to tweak it. In particular, if it uses https://maven.apache.org/guides/mini/guide-http-settings.html ,

Re: Schema-Aware PCollections

2017-12-04 Thread Kenneth Knowles
Nice. Commented a bit on the doc. +1 to working up the Python, Go, portability implications. Kenn On Thu, Nov 30, 2017 at 1:06 PM, Reuven Lax wrote: > Thanks! > > > On Thu, Nov 30, 2017 at 11:25 AM, Holden Karau > wrote: > >> Rocking, I'll start

Re: How to cope with Maven transient network issues?

2017-12-04 Thread Eugene Kirpichov
Romain - as far as I understand, Maven *does* have a retry strategy, but it is a poor retry strategy and there is no way to tweak it. In particular, if it uses https://maven.apache.org/guides/mini/guide-http-settings.html , that means it uses Apache Http Client 4.1.2 whose default retry settings

Re: How to cope with Maven transient network issues?

2017-12-04 Thread Romain Manni-Bucau
2017-12-04 18:58 GMT+01:00 Kenneth Knowles : > On Sat, Dec 2, 2017 at 3:06 AM, Romain Manni-Bucau > wrote: >> >> Need to check but if plugin dependencies were not tuned it should be the >> default with a retry of 3 IIRC. >> >> But aren't you sure it was a

Re: [PROPOSAL] Beam Go SDK feature branch

2017-12-04 Thread Henning Rohde
Thanks everyone. Appreciate the feedback! Henning On Sat, Dec 2, 2017 at 2:03 PM, Ankur Chauhan wrote: > Hi > > This is amazing. Having used the Java SDK for over two years now and recently > transitioned to writing Go for a bunch of microservices, I really like the >

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Robert Bradshaw
+1 At the very least an empty PCollection could be produced with no promises about its contents but the ability to be followed (e.g. as a side input), which is forward compatible with whatever actual metadata one may decide to produce in the future. On Mon, Dec 4, 2017 at 11:06 AM, Kenneth

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Kenneth Knowles
+dev@ I am in complete agreement with Luke. Data dependencies are easy to understand and a good way for an IO to communicate and establish causal dependencies. Converting an IO from PDone to real output may spur further useful thoughts based on the design decisions about what sort of output is

Re: How to cope with Maven transient network issues?

2017-12-04 Thread Kenneth Knowles
On Sat, Dec 2, 2017 at 3:06 AM, Romain Manni-Bucau wrote: > Need to check but if plugin dependencies were not tuned it should be the > default with a retry of 3 IIRC. > > But aren't you sure it was a repo/server issue any client can't solve? > Not sure what you mean here.

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Jean-Baptiste Onofré
Just an important note that we forgot to mention. !! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7 !! Starting from Beam 2.3.0, the Spark runner will work only with Spark 2.x and we will focus only on Java 8. Regards JB On 12/04/2017 10:15 AM, Jean-Baptiste Onofré