Re: IO Integration tests - concrete proposal

2017-01-25 Thread Etienne Chauchot
Hey Stephen, That seems perfect! Another thing, more about software design, maybe you could add in the guide comments what has been discussed on the ML about making standard the use of: - IOService interface in UT and IT, - implementations EmbeddedIOService and MockIOService for UT - imp

Re: How to implement Timer in runner

2017-01-25 Thread Aljoscha Krettek
Hi Jingsong, you're right, it is indeed somewhat tricky to find a good data structure for out-of-core timers. That's why we have them in memory in Flink for now and that's also why I'm afraid I don't have any good advice for you right now. We're aware of the problem in Flink but we're not yet worki

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Flink #1424

2017-01-25 Thread Aljoscha Krettek
I'm investigating (using git bisect). On Tue, 24 Jan 2017 at 05:40 Kenneth Knowles wrote: > It looks like there was some downtime in postcommit coverage. So I made a > guess at the PR that might have caused this and started a postcommit run: > > https://builds.apache.org/job/beam_PostCommit_Java

Re: IO Integration tests - concrete proposal

2017-01-25 Thread Jean-Baptiste Onofré
Hi It's what I mentioned in a previous email yup. It should refer to an "IO Writing Guide" describing the purpose of service interface, fake/mock, ... I will tackle that in a PR. Regards JB On Jan 25, 2017, 09:54, at 09:54, Etienne Chauchot wrote: >Hey Stephen, > >That seems perfect! > >Another

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Flink #1424

2017-01-25 Thread Kenneth Knowles
Closing the loop here, I think https://github.com/apache/beam/pull/1839 fixed it up. I'll be better about filing JIRA and assigning so there's no parallel work next time. On Wed, Jan 25, 2017 at 6:29 AM, Aljoscha Krettek wrote: > I'm investigating (using git bisect). > > On Tue, 24 Jan 2017 at 0

Conceptually, what are bundles?

2017-01-25 Thread Matthew Jadczak
Hi, I’m a finalist CompSci student at the University of Cambridge, and for my final project/dissertation I am writing an implementation of the Beam SDK in Elixir [1]. Given that the Beam project is obviously still very much WIP, it’s still somewhat difficult to find good conceptual overviews of

Re: Conceptually, what are bundles?

2017-01-25 Thread Robert Bradshaw
Bundles are simply the unit of commitment (retry) in the Beam SDK. They're not really a model concept, but do leak from the implementation into the API as it's not feasible to checkpoint every individual process call, and this allows some state/compute/... to be safely amortized across elements (ei
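To make "unit of commitment (retry)" concrete, here is a minimal sketch in plain Python. It is not Beam's API: `run_bundle` and `flaky_double` are hypothetical names, and the retry policy (a fixed number of attempts) is an assumption made only for illustration. The idea it shows is that outputs are staged per bundle and committed only if every element in the bundle succeeds; on failure the whole bundle is retried and the partial output is discarded.

```python
def run_bundle(elements, process, max_attempts=3):
    """Toy model: process a bundle atomically, retrying the whole bundle on failure."""
    last_error = None
    for _ in range(max_attempts):
        staged = []  # outputs are staged, not visible until the bundle commits
        try:
            for element in elements:
                staged.extend(process(element))
        except Exception as error:
            # Discard the partial output and retry the entire bundle.
            last_error = error
            continue
        return staged  # commit: all elements succeeded
    raise RuntimeError(f"bundle failed after {max_attempts} attempts") from last_error


calls = {"count": 0}


def flaky_double(x):
    """Hypothetical per-element function that fails once, on its second call."""
    calls["count"] += 1
    if calls["count"] == 2:
        raise ValueError("transient failure")
    return [x * 2]


committed = run_bundle([1, 2, 3], flaky_double)
```

In this sketch the first attempt fails on the second element, so its partial output is dropped and all three elements are processed again. Checkpointing per bundle rather than per call is what lets setup cost be shared across the elements of a bundle.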

Re: Conceptually, what are bundles?

2017-01-25 Thread Thomas Groh
I have a couple of points in addition to what Robert said. Runners are permitted to determine bundle sizes as appropriate to their implementation, so long as bundles are atomically committed. The contents of a PCollection are independent of the bundling of that PCollection. Runners can process all
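The point that the contents of a PCollection do not depend on how it is bundled can be illustrated with a small sketch. This is plain Python rather than Beam code; `split_into_bundles` and `apply_element_wise` are hypothetical helpers, and a list stands in for a PCollection. Results are sorted before comparison on the assumption that element order is not part of the contents.

```python
def split_into_bundles(elements, bundle_size):
    """Partition a list into consecutive bundles of at most bundle_size elements."""
    return [elements[i:i + bundle_size] for i in range(0, len(elements), bundle_size)]


def apply_element_wise(bundles, fn):
    """Apply fn to every element, one bundle at a time, and collect the outputs."""
    outputs = []
    for bundle in bundles:
        for element in bundle:
            outputs.append(fn(element))
    return outputs


elements = [1, 2, 3, 4, 5]
# Three different bundlings a runner might choose for the same input.
results = {
    size: sorted(apply_element_wise(split_into_bundles(elements, size), lambda x: x * x))
    for size in (1, 2, 5)
}
```

Every bundle size yields the same set of output elements, which is the property that leaves a runner free to pick bundle sizes.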

Re: Better developer instructions for using Maven?

2017-01-25 Thread Dan Halperin
Here is my summary of the threads: Overwhelming agreement: - rename `release` to something more appropriate. - add `checkstyle` to the default build (it's basically a compile error) - add more information to contributor guide Reasonable agreement - don't update the github instructions to make p

Re: Conceptually, what are bundles?

2017-01-25 Thread Amit Sela
On Wed, Jan 25, 2017 at 8:23 PM Thomas Groh wrote: > I have a couple of points in addition to what Robert said > > Runners are permitted to determine bundle sizes as appropriate to their > implementation, so long as bundles are atomically committed. The contents > of a PCollection are independent

Re: Better developer instructions for using Maven?

2017-01-25 Thread Jason Kuster
+1 On Wed, Jan 25, 2017 at 10:38 AM, Dan Halperin wrote: > Here is my summary of the threads: > > Overwhelming agreement: > > - rename `release` to something more appropriate. > - add `checkstyle` to the default build (it's basically a compile error) > - add more information to contributor guide

Re: Conceptually, what are bundles?

2017-01-25 Thread Eugene Kirpichov
One more thing. I think ideally, bundles should not leak into the model at all - e.g. ideally, startBundle/finishBundle methods in DoFn should not exist. They interact poorly with windowing. The proper way to address what is commonly done in these methods is either Setup/Teardown methods, or a (to

Re: Better developer instructions for using Maven?

2017-01-25 Thread Jean-Baptiste Onofré
+1 It sounds good to me. Thanks Dan ! Regards JB On Jan 25, 2017, 19:39, at 19:39, Dan Halperin wrote: >Here is my summary of the threads: > >Overwhelming agreement: > >- rename `release` to something more appropriate. >- add `checkstyle` to the default build (it's basically a compile >erro

Re: Conceptually, what are bundles?

2017-01-25 Thread Kenneth Knowles
There's actually not a JIRA filed beyond BEAM-25 for what Eugene is referring to. Context: Prior to windowing and streaming it was safe to buffer elements in @ProcessElement and then actually perform output in @FinishBundle. This pattern is only suitable for global windowing, flushing to external s
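The buffering pattern described here (collect elements in the per-element call, emit them when the bundle finishes) can be sketched as follows. This is a plain-Python toy, not Beam's DoFn API: `BufferingFn`, `run_bundles`, and the list used as an external sink are all hypothetical, and the sketch assumes a single global window, which is the case the message calls out as the one the pattern suits.

```python
class BufferingFn:
    """Toy stand-in for a DoFn that buffers per element and flushes per bundle."""

    def __init__(self, sink):
        self.sink = sink    # hypothetical external sink, modeled as a list of batches
        self.buffer = []

    def start_bundle(self):
        self.buffer = []

    def process(self, element):
        # No output here: the element is only buffered.
        self.buffer.append(element)

    def finish_bundle(self):
        # One batched write per bundle instead of one write per element.
        if self.buffer:
            self.sink.append(list(self.buffer))
        self.buffer = []


def run_bundles(fn, bundles):
    """Drive the lifecycle callbacks the way a runner would, bundle by bundle."""
    for bundle in bundles:
        fn.start_bundle()
        for element in bundle:
            fn.process(element)
        fn.finish_bundle()


sink = []
run_bundles(BufferingFn(sink), [["a", "b"], ["c"]])
```

The buffer here carries no window information, so everything flushed at the end of a bundle is treated alike. That is one way to see why the pattern is limited to global windowing.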

Re: Better developer instructions for using Maven?

2017-01-25 Thread Kenneth Knowles
+1 On Jan 25, 2017 11:15, "Jean-Baptiste Onofré" wrote: > +1 > > It sounds good to me. > > Thanks Dan ! > > Regards > JB > > On Jan 25, 2017, 19:39, at 19:39, Dan Halperin > wrote: > >Here is my summary of the threads: > > > >Overwhelming agreement: > > > >- rename `release` to something more

Re: Conceptually, what are bundles?

2017-01-25 Thread Matthew Jadczak
Thanks! So if I’m understanding right, with a greenfield implementation that does not have to worry about actual interop with other Beam SDKs/runners in the near future, implementing setup/teardown callbacks as well as the state/timer API [1] for DoFns, and handling any committing and retrying a

Re: Conceptually, what are bundles?

2017-01-25 Thread Lukasz Cwik
Apache Beam is attempting to reduce the amount of work to create an SDK by allowing one to use a Runner written within a different language. Within the Apache Beam technical vision [1] we discuss a world where an SDK is made portable by using a common pipeline representation (part of Runner Api [2]

Re: Conceptually, what are bundles?

2017-01-25 Thread Matthew Jadczak
Sure, if I wasn’t clear, what I meant was that I would indeed like to align my code with the Beam technical vision, as opposed to current implementation details. Since this is a university project I cannot release the source code until July most likely, so I am trying to align the high-level API

Re: Conceptually, what are bundles?

2017-01-25 Thread Lukasz Cwik
No misunderstanding. On Wed, Jan 25, 2017 at 5:17 PM, Matthew Jadczak wrote: > Sure, if I wasn’t clear, what I meant was that I would indeed like to > align my code with the Beam technical vision, as opposed to current > implementation details. Since this is a university project I cannot release

Re: IO Integration tests - concrete proposal

2017-01-25 Thread Stephen Sisk
hi JB! "IO Writing Guide" sounds like BEAM-1025 (User guide - "How to create Beam IO Transforms") that I've been working on. Let me pull together the stuff I've been working on into a draft that folks can take a look at. I had an earlier draft that was more focused on sources/sinks but since we're

Default Timestamp and Watermark

2017-01-25 Thread Shen Li
Hi, When reading from a source with no timestamp specified on elements, what should be the default timestamp? I presume that it should be 0 as I saw PAssertTest trying to set timestamps to very small values with 0 allowed timestamp skew. Is that right? What about the default watermark policy? If

Re: IO Integration tests - concrete proposal

2017-01-25 Thread Jean-Baptiste Onofré
Hi Stephen Yup it sounds good. My proposal is just to document a bit the best practices for IO. Thanks ! Regards JB On Jan 26, 2017, 02:25, at 02:25, Stephen Sisk wrote: >hi JB! > >"IO Writing Guide" sounds like BEAM-1025 (User guide - "How to create >Beam >IO Transforms") that I've been wor