Re: Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Apex #363

2017-01-31 Thread Jason Kuster
This seems like it could be a legitimate flake. Expected: <1970-01-01T00:09:59.999Z> but: was <2017-02-01T01:38:42.261Z> Anyone with more knowledge about the apex runner have any ideas? On Tue, Jan 31, 2017 at 5:48 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: How to fire the global window when using GroupAlsoByWindowViaWindowSetDoFn?

2017-01-31 Thread Kenneth Knowles
Sorry, that is BoundedWindow.TIMESTAMP_MAX_VALUE. On Tue, Jan 31, 2017 at 8:37 PM, Kenneth Knowles wrote: > Hi Shen, > > Your runner should advance the watermark for the PCollection coming out of > the BoundedSource to BoundedWindow.MAX_TIMESTAMP, which is "positive > infinity" and indicates tha

Re: How to fire the global window when using GroupAlsoByWindowViaWindowSetDoFn?

2017-01-31 Thread Kenneth Knowles
Hi Shen, Your runner should advance the watermark for the PCollection coming out of the BoundedSource to BoundedWindow.MAX_TIMESTAMP, which is "positive infinity" and indicates that even the global window has fired/expired (for the global window these are the same instant). Kenn On Tue, Jan 31,
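A minimal sketch of the idea in the reply above, written as a toy model rather than Beam code. It assumes a runner that buffers global-window elements and emits them once the input watermark reaches a "positive infinity" value. The class `GlobalWindowSketch`, its methods, and the use of `Long.MAX_VALUE` as a stand-in for BoundedWindow.TIMESTAMP_MAX_VALUE are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a runner-side buffer for the global window; not Beam code. */
final class GlobalWindowSketch {
    /** Hypothetical stand-in for BoundedWindow.TIMESTAMP_MAX_VALUE ("positive infinity"). */
    static final long TIMESTAMP_MAX_VALUE = Long.MAX_VALUE;

    private final List<String> buffered = new ArrayList<>();
    private final List<String> fired = new ArrayList<>();
    private long watermark = Long.MIN_VALUE;

    /** Buffers an element; nothing is emitted at this point. */
    void processElement(String element) {
        buffered.add(element);
    }

    /** Called by the (hypothetical) runner whenever the input watermark moves forward. */
    void advanceWatermark(long newWatermark) {
        // Watermarks never move backwards.
        watermark = Math.max(watermark, newWatermark);
        // Simplification: the global window fires only at the maximum timestamp.
        if (watermark >= TIMESTAMP_MAX_VALUE && !buffered.isEmpty()) {
            fired.addAll(buffered);
            buffered.clear();
        }
    }

    /** Elements emitted so far. */
    List<String> fired() {
        return fired;
    }
}
```

In this model, a depleted bounded source is signalled by calling `advanceWatermark(GlobalWindowSketch.TIMESTAMP_MAX_VALUE)`; any smaller watermark leaves the buffered elements untouched.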

How to fire the global window when using GroupAlsoByWindowViaWindowSetDoFn?

2017-01-31 Thread Shen Li
Hi, My runner is translating GroupByKey using GroupAlsoByWindowViaWindowSetDoFn. Say I have a BoundedSource with five tuples all placed into a global window. When the source is depleted, how should the runner notify the downstream GroupByKey(GroupAlsoByWindowViaWindowSetDoFn) that it should fire t

Re: Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Spark #807

2017-01-31 Thread Kenneth Knowles
Issue communicating with Maven central. On Tue, Jan 31, 2017 at 7:12 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See RunnableOnService_Spark/807/changes> > > Changes: > > [kirpichov] Removes inputProvider() and outputRecei

Re: Let's make Beam transforms comply with PTransform Style Guide

2017-01-31 Thread Eugene Kirpichov
On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin wrote: > On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > Hello, > > > > The PTransform Style Guide is live > > https://beam.apache.org/contribute/ptransform-style-guide/ - a natural > > next > > step is

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Aljoscha Krettek
I opened this PR with three revert commits: https://github.com/apache/beam/pull/1883 I also started PostCommit runs for this: - https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/2486/ - https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Flink/1

Re: Doesn't PAssertTest.runExpectingAssertionFailure need to call waitUntilFinish?

2017-01-31 Thread Shen Li
Hi Dan, Thanks a lot for the explanation. :) Best, Shen On Tue, Jan 31, 2017 at 4:19 PM, Dan Halperin wrote: > Hi Shen, > > Great question. The trick is that the `pipeline` object is an instance of > TestPipeline [0], for which p.run() is the same as > p.run().waitUntilFinish(). > > It might

Re: Doesn't PAssertTest.runExpectingAssertionFailure need to call waitUntilFinish?

2017-01-31 Thread Dan Halperin
Hi Shen, Great question. The trick is that the `pipeline` object is an instance of TestPipeline [0], for which p.run() is the same as p.run().waitUntilFinish(). It might be documentationally better to use p.run().waitUntilFinish() to be consistent with real runners, or add a method to TestPipelin
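The difference between a run() that returns immediately and one that behaves like run().waitUntilFinish() can be sketched with a toy model. This is not how TestPipeline is implemented; `BlockingRunSketch`, `Result`, `runAsync`, and `runAndWait` are hypothetical names, and the "job" is just a `Runnable` executed on a background thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Toy model of asynchronous vs. blocking pipeline runs; not Beam code. */
final class BlockingRunSketch {
    /** Hypothetical stand-in for a PipelineResult. */
    static final class Result {
        private final CompletableFuture<Void> job;

        Result(CompletableFuture<Void> job) {
            this.job = job;
        }

        /** Blocks until the job ends and rethrows an AssertionError raised inside it. */
        void waitUntilFinish() {
            try {
                job.join();
            } catch (CompletionException e) {
                if (e.getCause() instanceof AssertionError) {
                    throw (AssertionError) e.getCause();
                }
                throw e;
            }
        }
    }

    /** Models an asynchronous runner: returns at once, so a failure is not yet visible. */
    static Result runAsync(Runnable job) {
        return new Result(CompletableFuture.runAsync(job));
    }

    /** Models a run() that is equivalent to run().waitUntilFinish(). */
    static Result runAndWait(Runnable job) {
        Result result = runAsync(job);
        result.waitUntilFinish();
        return result;
    }
}
```

With `runAndWait`, an AssertionError thrown by the job reaches the caller; with `runAsync` alone, the caller sees it only after calling `waitUntilFinish()` on the returned result.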

Doesn't PAssertTest.runExpectingAssertionFailure need to call waitUntilFinish?

2017-01-31 Thread Shen Li
Hi, In the PAssertTest, doesn't it need to append a "waitUntilFinish()" to the "pipeline.run()" (please see the link below)? Otherwise, the runner may return the PipelineResult immediately without actually kicking off the execution, and therefore the AssertionError won't be thrown. Or did I miss a

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Aljoscha Krettek
Agreed, since it's a regression. Let's hope that the transitive closure of "revert those two commits" doesn't get too large. I'll check out the release-0.5.0 branch and see where we get with reverting. On Tue, 31 Jan 2017 at 19:28 Kenneth Knowles wrote: I agree. -1 and let's do the smartest thing

Re: TextIO binary file

2017-01-31 Thread Eugene Kirpichov
+1 to Robert. Either this will be a Beam-specific file format (and then nothing except Beam will be able to read it - which I doubt is what you want), or it is an existing well-known file format and then we should just develop an IO for it. Note that any file format that involves encoding elements

Re: TextIO binary file

2017-01-31 Thread Robert Bradshaw
On Tue, Jan 31, 2017 at 12:04 PM, Aviem Zur wrote: > +1 on what Stas said. > I think there is value in not having the user write a custom IO for a > protocol they use which is not covered by Beam IOs. Plus having them deal > with not only the encoding but also the IO part is not ideal. > I think h

Re: TextIO binary file

2017-01-31 Thread Aviem Zur
+1 on what Stas said. I think there is value in not having the user write a custom IO for a protocol they use which is not covered by Beam IOs. Plus having them deal with not only the encoding but also the IO part is not ideal. I think having a basic FileIO that can write to the Filesystems support

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Kenneth Knowles
Awesome! On Tue, Jan 31, 2017 at 9:38 AM, Ahmet Altay wrote: > Thank you Prabeesh and Sergio for fixing those! > > On Tue, Jan 31, 2017 at 4:51 AM, Jean-Baptiste Onofré > wrote: > > > Awesome, thanks Sergio ! Much appreciated ;) > > > > Regards > > JB > > > > > > On 01/31/2017 01:42 PM, Sergio

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Kenneth Knowles
I agree. -1 and let's do the smartest thing to undo the regression. Those two commits are not sufficient to restore late data dropping. You'll also need to revert the switch of the Flink runner to use new DoFn, maybe more. On Tue, Jan 31, 2017 at 10:21 AM, Jean-Baptiste Onofré wrote: > Basicall

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Jean-Baptiste Onofré
Basically, my question is: is it a regression ? If yes, definitely a -1 and we should cancel the release. Correct me if I'm wrong, but the commits in the LateDataDroppingDoFnRunner introduced a regression. So, I would cancel this vote and revert the two commits for RC2. WDYT ? Regards JB O

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Dan Halperin
Should we revert the CLs that lost the functionality? I'd really rather not ship a release with such a functional regression. On Tue, Jan 31, 2017 at 10:07 AM, Jean-Baptiste Onofré wrote: > Fair enough. Let's do that. > > Thanks ! > > Regards > JB > > > On 01/31/2017 06:58 PM, Aljoscha Krett

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Jean-Baptiste Onofré
Fair enough. Let's do that. Thanks ! Regards JB On 01/31/2017 06:58 PM, Aljoscha Krettek wrote: I'm not sure. Properly fixing this will take some time, especially since we have to add tests to prevent breakage from happening in the future. Plus, if my analysis is correct other runners might als

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Aljoscha Krettek
I'm not sure. Properly fixing this will take some time, especially since we have to add tests to prevent breakage from happening in the future. Plus, if my analysis is correct other runners might also not have proper late data dropping and it's fine to have a release with some missing features. (The

Re: TextIO binary file

2017-01-31 Thread Stas Levin
I believe the motivation is to have an abstraction that allows one to write stuff to a file in a way that is agnostic to the coder. If one needs to write a non-Avro protocol to a file, and this particular protocol does not meet the assumption made by TextIO, one might need to duplicate the file IO

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Ahmet Altay
Thank you Prabeesh and Sergio for fixing those! On Tue, Jan 31, 2017 at 4:51 AM, Jean-Baptiste Onofré wrote: > Awesome, thanks Sergio ! Much appreciated ;) > > Regards > JB > > > On 01/31/2017 01:42 PM, Sergio Fernández wrote: > >> PR #1879 provides the basics: https://github.com/apache/beam/pul

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Jean-Baptiste Onofré
Hi Aljoscha, so you propose to cancel this vote to prepare a RC2 ? Regards JB On 01/31/2017 05:06 PM, Aljoscha Krettek wrote: It's not just an issue with the Flink Runner, if I'm not mistaken. Flink had late-data dropping via the LateDataDroppingDoFnRunner (which got "disabled" by the two com

Re: TextIO binary file

2017-01-31 Thread Eugene Kirpichov
Could you clarify why it would be useful to write objects to files using Beam coders, as opposed to just using e.g. AvroIO? Coders (should) make no promise as to what their wire format is, so such files could be read back only by other Beam pipelines using the same IO. On Tue, Jan 31, 2017 at 2:4

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Aljoscha Krettek
It's not just an issue with the Flink Runner, if I'm not mistaken. Flink had late-data dropping via the LateDataDroppingDoFnRunner (which got "disabled" by the two commits I mention in the issue) while I think that the Apex and Spark Runners might not have had dropping in the first place. (Not sur

Re: Projects for Google Summer of Code 2017

2017-01-31 Thread Kenneth Knowles
I think this is a great idea. I also participated in GSOC once. I've been particularly interested in coming up with great new applications of Beam to new domains. In chatting with professors at the University of Washington, I've learned that scholars of many fields would really like to explore new

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Jean-Baptiste Onofré
Awesome, thanks Sergio ! Much appreciated ;) Regards JB On 01/31/2017 01:42 PM, Sergio Fernández wrote: PR #1879 provides the basics: https://github.com/apache/beam/pull/1879 On Tue, Jan 31, 2017 at 1:33 PM, Jean-Baptiste Onofré wrote: No, that's fine as soon as we clearly document the prer

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Sergio Fernández
PR #1879 provides the basics: https://github.com/apache/beam/pull/1879 On Tue, Jan 31, 2017 at 1:33 PM, Jean-Baptiste Onofré wrote: > No, that's fine as soon as we clearly document the prerequisite for the > build. IMHO, we should provide quick BUILDING instructions in the README.md. > > Regards

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Jean-Baptiste Onofré
No, that's fine as soon as we clearly document the prerequisite for the build. IMHO, we should provide quick BUILDING instructions in the README.md. Regards JB On 01/31/2017 01:24 PM, Sergio Fernández wrote: Originally we integrate the build in Maven with the default profile. Do you feel like

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Sergio Fernández
Originally we integrated the build in Maven with the default profile. Do you feel like it'd be better to have it under a separate profile or so? On Tue, Jan 31, 2017 at 11:07 AM, Jean-Baptiste Onofré wrote: > Just to be clear, the prerequisite to be able to build the Python SDK are: > > apt-get

Re: TextIO binary file

2017-01-31 Thread Aviem Zur
Looks like Eugene addressed this in the following ticket: https://issues.apache.org/jira/browse/BEAM-1354 Just added a bullet regarding updating the javadoc. On Tue, Jan 31, 2017 at 12:47 PM Aviem Zur wrote: > So If I understand the general agreement is that TextIO should not support > anything

Re: TextIO binary file

2017-01-31 Thread Aviem Zur
So if I understand correctly, the general agreement is that TextIO should not support anything but lines from files as strings. I'll go ahead and file a ticket that says the Javadoc should be changed to reflect this and the `withCoder` method should be removed. Is there merit for Beam to supply an IO which does

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Jean-Baptiste Onofré
Just to be clear, the prerequisites for building the Python SDK are: apt-get install python-setuptools and apt-get install python-pip. They are also required by the default "regular" build. Regards JB On 01/31/2017 11:02 AM, Jean-Baptiste Onofré wrote: Just one thing I noticed (and can be helpfu

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Jean-Baptiste Onofré
Just one thing I noticed (and can be helpful for others): to build Beam we now need python setuptools installed. For instance, on Ubuntu, you have to do: apt-get install python-setuptools Same for the pip distribution. I guess (if not already done), we have to update README/Building instruct

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Jean-Baptiste Onofré
Awesome ! Great work guys ! Regards JB On 01/31/2017 08:10 AM, Ahmet Altay wrote: Hi all, This merge is completed. Python SDK is now officially part of the master branch! Thank you all for the support. Please open an issue, if you notice a reference to the now obsolete python-sdk branch in th

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Sergio Fernández
great! On Tue, Jan 31, 2017 at 8:10 AM, Ahmet Altay wrote: > Hi all, > > This merge is completed. Python SDK is now officially part of the master > branch! Thank you all for the support. Please open an issue, if you notice > a reference to the now obsolete python-sdk branch in the documentation.

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Prabeesh K.
https://issues.apache.org/jira/browse/BEAM-1360 On 31 January 2017 at 12:12, Prabeesh K. wrote: > https://issues.apache.org/jira/browse/BAHIR-86 > > On 31 January 2017 at 11:10, Ahmet Altay wrote: > >> Hi all, >> >> This merge is completed. Python SDK is now officially part of the master >> bra

Re: [DISCUSS] Python SDK status and next steps

2017-01-31 Thread Prabeesh K.
https://issues.apache.org/jira/browse/BAHIR-86 On 31 January 2017 at 11:10, Ahmet Altay wrote: > Hi all, > > This merge is completed. Python SDK is now officially part of the master > branch! Thank you all for the support. Please open an issue, if you notice > a reference to the now obsolete pyt