Re: Magic number explanation in ParDoTest.java

2019-01-22 Thread Kenneth Knowles
OK I dug in to the code. 1. This test should be using TestStream to control the watermark if it is to be a reliable test, since it sets a relative timer. Sorry I missed that in review. It is surprising that this test works on any runner, much less multiple. I wonder if it is blacklisted for most.

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Kenneth Knowles
When executed over the portable APIs, it will be primarily the Java SDK harness that makes all of these decisions. If we wanted runners to have some insight into it we would have to add it to the Beam model protos. I don't have any suggestions there, so I would leave it out of this discussion

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
I don't have a strong opinion on the resolution of the futures with regard to @FinishBundle invocation. Leaving it unspecified does give runners more room to implement it with their own support. Optimization is also another great point. Fuse seems pretty complex to me too if we need to find a

Re: Magic number explanation in ParDoTest.java

2019-01-22 Thread Sam Rohde
Thanks for digging up the PR. I'm still confused as to why that magic number is still there though. Why is there an expectation that the timestamp from the timer is *exactly* 1774ms past BoundedWindow.TIMESTAMP_MIN_VALUE? On Tue, Jan 22, 2019 at 10:43 AM Kenneth Knowles wrote: > The commit

Re: [PROPOSAL] Prepare Beam 2.10.0 release

2019-01-22 Thread Kenneth Knowles
OK. There is just one release blocker remaining; https://issues.apache.org/jira/browse/BEAM-6354 I have no insights yet, but I am bisecting. It was healthy in 2.9.0. Kenn On Tue, Jan 22, 2019 at 9:38 AM Scott Wegner wrote: > The rollback for BEAM-6352 is now in and cherry-picked into the

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Kenneth Knowles
On Tue, Jan 22, 2019, 17:23 Reuven Lax > > On Tue, Jan 22, 2019 at 5:08 PM Xinyu Liu wrote: > >> @Steve: it's good to see that this is going to be useful in your use >> cases as well. Thanks for sharing the code from Scio! I can see in your >> implementation that waiting for the future

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
I can speak from Samza's perspective: Samza only commits the messages once the async callbacks have completed. So if there are any failures, it will recover from the last checkpoint and reprocess the messages whose completion we haven't received. So no data is lost. The "Guaranteed Semantics"
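The commit-after-completion idea described above can be sketched with plain JDK futures. This is only an illustration under assumptions: the class and method names are hypothetical, and it is not Samza's actual checkpointing code. It shows a "ready to commit" signal that stays pending until every in-flight callback has completed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration; not Samza's or Beam's implementation.
public class CommitAfterCallbacksSketch {

  // Returns a future that completes only once every in-flight callback has completed.
  public static CompletableFuture<Void> readyToCommit(List<CompletableFuture<Void>> inFlight) {
    return CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0]));
  }

  public static void main(String[] args) {
    CompletableFuture<Void> first = new CompletableFuture<>();
    CompletableFuture<Void> second = new CompletableFuture<>();
    CompletableFuture<Void> commit = readyToCommit(Arrays.asList(first, second));

    // Only one callback has finished, so the commit signal is still pending.
    first.complete(null);
    System.out.println("commit allowed: " + commit.isDone());

    // Once the last callback finishes, the commit signal completes.
    second.complete(null);
    System.out.println("commit allowed: " + commit.isDone());
  }
}
```

If any callback fails, the combined future completes exceptionally rather than normally, so a caller following this sketch would skip the commit and fall back to reprocessing from the last checkpoint.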

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Udi Meiri
Alex, the only way to implement my suggestion #1 (that I know of) would be to write to a file and read it back. I don't have a good example for #2. Eugene's suggestion no. 1 seems like a good idea. There are some example

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Reuven Lax
On Tue, Jan 22, 2019 at 5:08 PM Xinyu Liu wrote: > @Steve: it's good to see that this is going to be useful in your use cases > as well. Thanks for sharing the code from Scio! I can see in your > implementation that waiting for the future completion is part of the > @FinishBundle. We are

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Eugene Kirpichov
Yeah, the "List expected" is constructed from Files.getLastModifiedTime() calls before the files are actually modified, so the code is basically unconditionally broken rather than merely flaky. There are several easy options: 1) Use PAssert.that().satisfies() instead of .contains(), and use
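The failure mode described above can be reproduced outside Beam with a small JDK-only sketch. The class name, file name prefix, and timestamps below are hypothetical, and the later write is simulated by setting the modification time explicitly so the result does not depend on filesystem timing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical illustration of an expectation captured too early; not the FileIOTest code.
public class StaleExpectationSketch {

  // Returns true if a timestamp captured before a modification still matches afterwards.
  public static boolean snapshotStillMatches() {
    try {
      Path file = Files.createTempFile("watch-sketch", ".txt");
      try {
        Files.setLastModifiedTime(file, FileTime.fromMillis(1_500_000_000_000L));
        // The "expected" value is read before the file is modified.
        FileTime expected = Files.getLastModifiedTime(file);

        // Simulate the later write by moving the modification time forward 10 seconds.
        Files.setLastModifiedTime(file, FileTime.fromMillis(1_500_000_010_000L));
        FileTime actual = Files.getLastModifiedTime(file);

        return expected.equals(actual);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("snapshot still matches: " + snapshotStillMatches());
  }
}
```

Because the expected value is fixed before the modification, the comparison fails every time the file changes afterwards, which is why the assertion is broken by construction rather than racy.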

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
@Steve: it's good to see that this is going to be useful in your use cases as well. Thanks for sharing the code from Scio! I can see in your implementation that waiting for the future completion is part of the @FinishBundle. We are thinking of taking advantage of the underlying runner async

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Alex Amato
Thanks Udi, is there a good example for either of these? #1 - seems like you have to rewrite your assertion logic without the PAssert? Is there some way to capture the pipeline output and iterate over it? The pattern I have seen for this in the past also has thread safety issues (Using a DoFn at

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Udi Meiri
Some options: - You could wait to assert until after p.waitForFinish(). - You could PAssert using SerializableMatcher and allow any lastModifiedTime. On Tue, Jan 22, 2019 at 3:56 PM Alex Amato wrote: > +Jeff, Eugene, > > Hi Jeff and Eugene, > > I've noticed that Jeff's PR >

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Ruoyun Huang
+1, getting the same issue as well. I saw there were @Ignore annotations on those tests before. If it is not critical and just caused by how we test, does it make sense to put those @Ignores back until it's resolved? On Tue, Jan 22, 2019 at 3:35 PM Alex Amato wrote: > I've seen this fail in a

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Alex Amato
+Jeff, Eugene, Hi Jeff and Eugene, I've noticed that Jeff's PR introduced a race condition in this test, but it's not clear exactly how to add Jeff's test check in a thread-safe way. I believe this to be the source

FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Alex Amato
I've seen this fail in a few different PRs for different contributors, and it's causing some issues during the presubmit process. This is a multithreaded test with a lot of sleeps, so it looks a bit suspicious as the source of the problem.

Re: Cross-language pipelines

2019-01-22 Thread Kenneth Knowles
Nice! If I recall correctly, there was mostly concern about how to launch and manage the expansion service (Docker? Vendor-specific? Etc). Does this PR take a position on that question? Kenn On Tue, Jan 22, 2019 at 1:44 PM Chamikara Jayalath wrote: > > > On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri

Re: :beam-sdks-python:docs fails with docs invocation failure

2019-01-22 Thread Valentyn Tymofieiev
Hi, I just opened https://issues.apache.org/jira/browse/BEAM-6489, and plan to look into this. On Tue, Jan 22, 2019 at 3:13 PM Mikhail Gryzykhin wrote: > Hi everyone, > > I see python precommit tests fail with > > no such option: --process-dependency-links > > > Supposedly when invoking pip

:beam-sdks-python:docs fails with docs invocation failure

2019-01-22 Thread Mikhail Gryzykhin
Hi everyone, I see python precommit tests fail with no such option: --process-dependency-links Supposedly when invoking pip install. Examples: https://builds.apache.org/job/beam_PreCommit_Python_Commit/3752/console https://builds.apache.org/job/beam_PreCommit_Python_Phrase/152/console More

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Steve Niemitz
I'd love to see something like this as well. Also +1 to process(@Element InputT element, @Output OutputReceiver>). I don't know if there's much benefit to passing a future in, since the framework itself could hook up the process function to complete when the future completes. I feel like I've

Re: [DISCUSSION] ParDo Async Java API

2019-01-22 Thread Kenneth Knowles
If the input is a CompletionStage then the output should also be a CompletionStage, since all you should do is async chaining. We could enforce this by giving the DoFn an OutputReceiver(CompletionStage). Another possibility that might be even more robust against poor future use could be
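The "async chaining" point above can be sketched with the JDK's own CompletionStage API. This is a minimal illustration under assumptions: the class and the processAsync method are hypothetical names, not part of Beam or of the proposal. It only shows that a pending input can be turned into a pending output without blocking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Beam.
public class AsyncChainingSketch {

  // Takes a pending input and returns a pending output without blocking on either.
  public static <InputT, OutputT> CompletionStage<OutputT> processAsync(
      CompletionStage<InputT> element, Function<InputT, OutputT> fn) {
    // thenApply registers fn to run once the input stage completes.
    return element.thenApply(fn);
  }

  public static void main(String[] args) {
    CompletableFuture<String> input = new CompletableFuture<>();
    CompletionStage<Integer> output = processAsync(input, String::length);

    // The input is still pending, so the chained output is pending too.
    System.out.println("done before input: " + output.toCompletableFuture().isDone());

    input.complete("beam");
    System.out.println("result: " + output.toCompletableFuture().join());
  }
}
```

The design choice being illustrated is that the user function never calls get() or join() on its input; it only attaches a continuation, which leaves the decision of when to wait to whatever framework holds the resulting stage.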

Re: Cross-language pipelines

2019-01-22 Thread Chamikara Jayalath
On Tue, Jan 22, 2019 at 11:35 AM Udi Meiri wrote: > Also debuggability: collecting logs from each of these systems. > Agree. > > On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath > wrote: > >> Thanks Robert. >> >> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw >> wrote: >> >>> Now that we

Re: How to use "PortableRunner" in Python SDK?

2019-01-22 Thread Heejong Lee
You can also try without the --streaming option. There's a separate streaming wordcount example in the same directory. If you want to look into the output files, it would be easier to use an external target like gs:// instead of a local file. python -m apache_beam.examples.wordcount --input=/etc/profile

Re: How to use "PortableRunner" in Python SDK?

2019-01-22 Thread Ankur Goenka
Hi Jun, This error can be caused by a different Flink version. Please make sure that you are using Flink 1.5.6 for the commands you mentioned. Thanks, Ankur On Tue, Jan 22, 2019 at 11:44 AM junwa...@gmail.com wrote: > Hello, > > I tried to follow the instructions at >

Re: How to use "PortableRunner" in Python SDK?

2019-01-22 Thread Ankur Goenka
Hi Jun, Thanks for reporting the error. This seems to be because of a mismatch between the SDKHarness (Docker image) and the Python SDK. As we are actively developing, this can happen. Can you please retry after rebuilding the docker images and the Python sdk from master and install the python sdk to your

[DISCUSSION] ParDo Async Java API

2019-01-22 Thread Xinyu Liu
Hi, guys, As more users try out Beam running on the SamzaRunner, we got a lot of asks for an asynchronous processing API. There are a few reasons for these asks: - The users here are experienced in asynchronous programming. With async frameworks such as Netty and ParSeq and libs like async

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-22 Thread Reuven Lax
Ah yes, Kenn is correct, and I forgot we made that change. To clarify - Beam does not expose late elements as a concept, rather it exposes late panes on its triggering API. The reason we made the change was not just because we wanted to include as much data as possible, but also because we wanted

Re: Our jenkins beam1 server is down

2019-01-22 Thread Yifan Zou
The inventory test on the beam1 passed. The beam1 is back to normal. https://builds.apache.org/job/beam_Inventory_beam1/303/ On Tue, Jan 22, 2019 at 11:41 AM Yifan Zou wrote: > Thanks for reporting the failures. Just disconnect and reconnect beam1. I > am creating a PR that force run a job on

Re: Confluence wiki edit access request

2019-01-22 Thread Udi Meiri
bump On Fri, Jan 18, 2019 at 1:57 PM Udi Meiri wrote: > username: udim > > Thanks! >

Re: How to use "PortableRunner" in Python SDK?

2019-01-22 Thread junwan01
I downgraded Flink from 1.7.1 to 1.5.6 and was able to go further, but it still fails; here is the latest error from Flink. Thanks! The job cmd I launched: python -m apache_beam.examples.wordcount --input=/etc/profile --output=/tmp/py-wordcount-direct --runner=PortableRunner

Re: How to use "PortableRunner" in Python SDK?

2019-01-22 Thread junwan01
Hello, I tried to follow the instructions at https://beam.apache.org/roadmap/portability/#python-on-flink, 1. I installed Flink local cluster, and followed their SocketWindowWordCount example and confirmed the cluster works properly. 2. Start Flink job server: ./gradlew

Re: Our jenkins beam1 server is down

2019-01-22 Thread Yifan Zou
Thanks for reporting the failures. I just disconnected and reconnected beam1. I am creating a PR that forces a job to run on that agent to verify. On Tue, Jan 22, 2019 at 11:08 AM Ankur Goenka wrote: > Beam 1 seems to be down again > >

Re: Cross-language pipelines

2019-01-22 Thread Udi Meiri
Also debuggability: collecting logs from each of these systems. On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath wrote: > Thanks Robert. > > On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw > wrote: > >> Now that we have the FnAPI, I started playing around with support for >> cross-language

Re: Our jenkins beam1 server is down

2019-01-22 Thread Ankur Goenka
Beam 1 seems to be down again https://builds.apache.org/job/beam_PreCommit_Portable_Python_Phrase/88/console https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink_PR/141/console On Tue, Jan 22, 2019 at 10:53 AM Yifan Zou wrote: > The beam1 and 14 are back and building. > > On Thu, Jan

Re: Our jenkins beam1 server is down

2019-01-22 Thread Yifan Zou
The beam1 and 14 are back and building. On Thu, Jan 17, 2019 at 7:04 AM Ismaël Mejía wrote: > Thanks Yifan for taking care. > > On Thu, Jan 17, 2019 at 1:24 AM Yifan Zou wrote: > > > > Yes, beam14 is offline as well. We're on it. > > > > On Wed, Jan 16, 2019 at 4:11 PM Ruoyun Huang wrote: >

Re: Cross-language pipelines

2019-01-22 Thread Chamikara Jayalath
Thanks Robert. On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw wrote: > Now that we have the FnAPI, I started playing around with support for > cross-language pipelines. This will allow things like IOs to be shared > across all languages, SQL to be invoked from non-Java, TFX tensorflow >

Re: Magic number explanation in ParDoTest.java

2019-01-22 Thread Kenneth Knowles
The commit comes from this PR: https://github.com/apache/beam/pull/2273 Kenn On Tue, Jan 22, 2019 at 10:21 AM Sam Rohde wrote: > Hi all, > > Does anyone have context why there is a magic number of "1774" > milliseconds in the ParDoTest.java on line 2618? This is in > the

Re: Proposal: Portability SDKHarness Docker Image Release with Beam Version Release.

2019-01-22 Thread Mark Liu
+1 to having an official Beam released container image. Also, I would propose adding a verification step to (or after) the release process to do a smoke check. Python has a ValidatesContainer test that runs a basic pipeline using a newly built container for verification. Other sdk languages can do similar

Magic number explanation in ParDoTest.java

2019-01-22 Thread Sam Rohde
Hi all, Does anyone have context on why there is a magic number of "1774" milliseconds in ParDoTest.java on line 2618? This is in the testEventTimeTimerAlignBounded method. File at master:

Re: [PROPOSAL] Prepare Beam 2.10.0 release

2019-01-22 Thread Scott Wegner
The rollback for BEAM-6352 is now in and cherry-picked into the release branch. On Fri, Jan 18, 2019 at 9:04 AM Scott Wegner wrote: > For BEAM-6352, I have a rollback ready for review: > https://github.com/apache/beam/pull/7540 > Conversation about the decision to rollback vs. roll-forward for

Cross-language pipelines

2019-01-22 Thread Robert Bradshaw
Now that we have the FnAPI, I started playing around with support for cross-language pipelines. This will allow things like IOs to be shared across all languages, SQL to be invoked from non-Java, TFX tensorflow transforms to be invoked from non-Python, etc. and I think is the next step in

Re: [DISCUSSION] UTests and embedded backends

2019-01-22 Thread Robert Bradshaw
On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote: > > Robert - you meant this as a mostly-automatic thing that we would engineer, > yes? Yes, something like TestPipeline that buffers up the pipelines and then executes on class teardown (details TBD). > A lighter-weight fake, like using