Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-10 Thread Jean-Baptiste Onofré
Gentle reminder on this thread. Thanks! Regards JB On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote: Hi everyone, Please review and vote on the release candidate #3 for the version 2.1.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific

Re: [PROPOSAL] Merge gearpump-runner to master

2017-08-10 Thread Manu Zhang
Hi Paul, The latest master compiles fine for me. Could you check again? You may also want to check out the contribution guide. In short, the Apache way is to file a JIRA issue and submit a GitHub

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-10 Thread Eugene Kirpichov
I think this is the essence of the guidance: in such cases, the caller should indeed pass a coder to the PTransform. This might seem trivial if the only thing the PTransform will do is set it on the output collection, but it allows the transform to evolve in case it ever needs to create an interme
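The guidance above can be illustrated with a minimal sketch. It does not use the Beam API: `JsonCoder`, `Collection`, and `ParseWith` are hypothetical stand-ins, and the only point is that a transform whose output type depends on a user-supplied function takes the coder from its caller, which also lets it give a coder to any intermediate collection it creates later.

```python
import json


class JsonCoder:
    """Hypothetical coder: encodes any JSON-serializable value to bytes."""

    def encode(self, value):
        return json.dumps(value, sort_keys=True).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class Collection:
    """Hypothetical stand-in for a PCollection: elements plus their coder."""

    def __init__(self, elements, coder):
        self.elements = list(elements)
        self.coder = coder


class ParseWith:
    """Hypothetical transform whose output type is known only to the caller."""

    def __init__(self, fn, output_coder):
        if output_coder is None:
            raise ValueError("ParseWith requires an output coder from the caller")
        self.fn = fn
        self.output_coder = output_coder

    def expand(self, input_collection):
        # An intermediate collection needs a coder too. Because the caller
        # supplied one, the transform can use it here without an API change.
        intermediate = Collection(
            (self.fn(e) for e in input_collection.elements), self.output_coder
        )
        # Round-trip each element through the coder, loosely imitating what
        # a runner might do when it materializes the collection.
        decoded = [
            self.output_coder.decode(self.output_coder.encode(e))
            for e in intermediate.elements
        ]
        # The transform sets the coder on its own output.
        return Collection(decoded, self.output_coder)
```

A caller would write something like `ParseWith(lambda s: [int(x) for x in s.split(",")], JsonCoder()).expand(lines)`; the transform itself never has to guess the element type.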

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-10 Thread Reuven Lax
Interestingly I've seen examples of PTransforms where the transform itself is unable to easily set its own coder. This happens when the transform is parametrized in such a way that its output coder is not determinable except by the caller of the PTransform. The caller can of course pass a coder into

Re: Style of messages for checkArgument/checkNotNull in IOs

2017-08-10 Thread Eugene Kirpichov
https://beam.apache.org/contribute/ptransform-style-guide/#validation now includes the new guidance. It also includes updated guidance on what to put in expand() vs. validate() (TL;DR: validate() is almost always unnecessary. Put almost all validation in expand()) On Fri, Jul 28, 2017 at 11:56 AM
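As a rough sketch of the TL;DR above (no separate validate() step, checks performed in expand()), the following uses hypothetical names only: `check_argument` is a small Python analogue of a checkArgument-style helper, and `ReadFromTable` is not a real Beam transform. The error messages are illustrative and are not quoted from the style guide.

```python
def check_argument(condition, message):
    """Hypothetical analogue of a checkArgument-style precondition helper."""
    if not condition:
        raise ValueError(message)


class ReadFromTable:
    """Hypothetical IO transform used only to show where validation lives."""

    def __init__(self, table=None, batch_size=100):
        self.table = table
        self.batch_size = batch_size

    def with_table(self, table):
        # Builder-style method: returns a new, more fully configured transform.
        return ReadFromTable(table, self.batch_size)

    def with_batch_size(self, batch_size):
        return ReadFromTable(self.table, batch_size)

    def expand(self, pipeline_input):
        # All checks run here, when the transform is applied, so there is
        # no separate validate() method on this class.
        check_argument(self.table is not None, "with_table() is required")
        check_argument(
            self.batch_size > 0,
            f"batch_size must be positive, but was: {self.batch_size}",
        )
        # Placeholder output standing in for the rows that would be read.
        return [f"{self.table}:{i}" for i in range(3)]
```

Applying an unconfigured `ReadFromTable()` fails inside `expand()` with the "with_table() is required" message, while `ReadFromTable().with_table("users")` expands normally.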

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-10 Thread Eugene Kirpichov
I've updated the guidance in PTransform Style Guide on setting coders https://beam.apache.org/contribute/ptransform-style-guide/#coders according to this discussion. https://github.com/apache/beam-site/pull/279 On Thu, Aug 3, 2017 at 6:27 PM Robert Bradshaw wrote: > On Thu, Aug 3, 2017 at 6:08 P

Re: beam-site issues with Jenkins and MergeBot

2017-08-10 Thread Jason Kuster
Investigating mergebot outage currently. Apologies for the downtime. On Wed, Aug 9, 2017 at 9:55 PM, Eugene Kirpichov wrote: > Indeed beam-site is at https://gitbox.apache.org/repos/asf/beam-site.git > now. > > However, Mergebot app

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Reuven Lax
On Thu, Aug 10, 2017 at 1:07 PM, Kenneth Knowles wrote: > > > >- Does it also imply fixed length and content for value > iterators? > > > > > > The concept of "value iterator" brings up a nit. > > First, there is no such concept in the Beam model, and I don't think there > should be. I don't

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Kenneth Knowles
> > >- Does it also imply fixed length and content for value iterators? > > > The concept of "value iterator" brings up a nit. First, there is no such concept in the Beam model, and I don't think there should be. I don't think we should special case GBK if we can avoid it. If a PCollection c

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Reuven Lax
On Thu, Aug 10, 2017 at 11:18 AM, Thomas Groh wrote: > I think it must imply fixed content >s - making a decision based > on the contents of an iterable assuming the Iterable is deterministic seems > an acceptable use of the API, and that requires the contents to be > identical through failures.

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Thomas Groh
I think it must imply fixed content >s - making a decision based on the contents of an iterable assuming the Iterable is deterministic seems an acceptable use of the API, and that requires the contents to be identical through failures. This does imply that (assuming this is reading directly from th

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Reuven Lax
It means that single element replay is stable. On Thu, Aug 10, 2017 at 10:56 AM, Raghu Angadi wrote: > Can we define what exactly is meant by deterministic/stable/replayable > etc? > >- Does it imply a fixed order? If yes, it implies fixed order of >processElement() invocations, right? A

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Raghu Angadi
Can we define what exactly is meant by deterministic/stable/replayable etc? - Does it imply a fixed order? If yes, it implies fixed order of processElement() invocations, right? Are there any qualifiers (within a window+key etc)? - Does it also imply fixed length and content for value

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Ben Chambers
I think it only makes sense in places where a user might reasonably require stable input to ensure idempotency of side-effects. It also only makes sense in places where a runner could reasonably provide such a guarantee. A given Combine is unlikely to have side effects so it is less likely to bene

Re: Exactly-once Kafka sink

2017-08-10 Thread Raghu Angadi
On Thu, Aug 10, 2017 at 5:15 AM, Aljoscha Krettek wrote: > Ah, also regarding your earlier mail: I didn't know if many people were > using Kafka with Dataflow, thanks for that clarification! :-) > > Also, I don't think that the TwoPhaseCommit Sink of Flink would work in a > Beam context, I was ju

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Reuven Lax
I don't think it really makes sense to do this on Combine. And I agree with you, it doesn't make sense on composites either. On Thu, Aug 10, 2017 at 9:19 AM, Scott Wegner wrote: > Does requires-stable-input only apply to ParDo transforms? > > I don't think it would make sense to annotate to c

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Scott Wegner
Does requires-stable-input only apply to ParDo transforms? I don't think it would make sense to annotate to composite, because checkpointing should happen as close to the side-effecting operation as possible, since upstream transforms within a composite could introduce non-determinism. So it's the

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Tyler Akidau
+1 to the annotation idea, and to having it on processTimer. -Tyler On Thu, Aug 10, 2017 at 2:15 AM Aljoscha Krettek wrote: > +1 to the annotation approach. I outlined how implementing this would work > in the Flink runner in the Thread about the exactly-once Kafka Sink. > > > On 9. Aug 2017, a

Re: streaming output in just one files

2017-08-10 Thread Reuven Lax
On Thu, Aug 10, 2017 at 8:29 AM, Reuven Lax wrote: > This is how the file sink has always worked in Beam. If no sharding is > specified, then this means runner-determined sharding, and by default that > is one file per bundle. If Flink has small bundles, then I suggest using > the withNumShards m
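The sharding behaviour described above can be modelled in a few lines. This is a toy sketch, not the file sink's implementation: `plan_output_files` is a hypothetical function, and the round-robin assignment used for a fixed shard count is an assumption chosen only to keep the example short.

```python
def plan_output_files(bundles, num_shards=None):
    """Return the contents of each output file for a list of bundles.

    num_shards=None models runner-determined sharding, which by default
    means one file per bundle. A positive num_shards models a fixed shard
    count, in the spirit of a withNumShards-style setting.
    """
    if num_shards is None:
        # One output file per bundle: many small bundles give many small files.
        return [list(bundle) for bundle in bundles]
    if num_shards <= 0:
        raise ValueError(f"num_shards must be positive, but was: {num_shards}")
    shards = [[] for _ in range(num_shards)]
    index = 0
    for bundle in bundles:
        for element in bundle:
            # Assumption: simple round-robin assignment of elements to shards.
            shards[index % num_shards].append(element)
            index += 1
    return shards
```

With four single-element bundles, the default plan yields four files, while `num_shards=1` yields a single file containing all four elements.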

Re: Exactly-once Kafka sink

2017-08-10 Thread Aljoscha Krettek
Ah, also regarding your earlier mail: I didn't know if many people were using Kafka with Dataflow, thanks for that clarification! :-) Also, I don't think that the TwoPhaseCommit Sink of Flink would work in a Beam context, I was just posting that for reference. Best, Aljoscha > On 10. Aug 2017,

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Aljoscha Krettek
+1 to the annotation approach. I outlined how implementing this would work in the Flink runner in the Thread about the exactly-once Kafka Sink. > On 9. Aug 2017, at 23:03, Reuven Lax wrote: > > Yes - I don't think we should try and make any deterministic guarantees > about what is in a bundle.

Re: Exactly-once Kafka sink

2017-08-10 Thread Aljoscha Krettek
@Raghu: Yes, exactly, that's what I thought about this morning, actually. These are the methods of an operator that are relevant to checkpointing: class FlinkOperator() { open(); snapshotState(); notifySnapshotComplete(); initializeState(); } Input would be buffered in state, would be ch
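The message above is cut off, so the following is only one plausible reading of the buffering idea, written as a plain Python model rather than Flink code. `BufferingOperator`, its method names, and the rule "release buffered input to the side-effecting function only after the snapshot containing it is reported complete" are all assumptions made for illustration.

```python
class BufferingOperator:
    """Hypothetical operator that holds input until a snapshot completes."""

    def __init__(self, side_effect):
        self.side_effect = side_effect
        self.buffer = []   # elements received since the last snapshot
        self.pending = {}  # checkpoint id -> elements captured by that snapshot

    def open(self):
        pass  # nothing to set up in this toy model

    def process_element(self, element):
        # Do not act yet: the element is only buffered in state.
        self.buffer.append(element)

    def snapshot_state(self, checkpoint_id):
        # Freeze the buffered elements under this checkpoint id and return
        # a copy of everything that still awaits processing.
        self.pending[checkpoint_id] = self.buffer
        self.buffer = []
        return {cid: list(elems) for cid, elems in self.pending.items()}

    def notify_snapshot_complete(self, checkpoint_id):
        # The snapshot is durable, so a retry would see exactly these
        # elements again. Only now is the side effect performed.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            for element in self.pending.pop(cid):
                self.side_effect(element)

    def initialize_state(self, restored):
        # After a failure, resume from the snapshot's frozen elements.
        self.pending = {cid: list(elems) for cid, elems in restored.items()}
        self.buffer = []
```

In this model, an operator restored from a snapshot replays the same elements to the side effect as the original would have, which is the stability property the "requires deterministic input" thread discusses.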

Jenkins build is back to normal : beam_Release_NightlySnapshot #499

2017-08-10 Thread Apache Jenkins Server
See