Re: Merge of vendored Guava (Some PRs need a rebase)

2019-01-15 Thread Łukasz Gajowy
Great news. Thanks all for this work! +1 to enforcing this at the dependency level, as Kenn suggested. Łukasz On Tue, 15 Jan 2019 at 01:18, Kenneth Knowles wrote: > We can enforce at the dependency level, since it is a compile error. I > think some IDEs and build tools may allow the compile-time cl

Re: Beam JobService Problem

2019-01-15 Thread Robert Bradshaw
On Tue, Jan 15, 2019 at 1:19 AM Ankur Goenka wrote: > > Thanks Sam for bringing this to the list. > > As preparation_ids are not reusable, having preparation_id and job_id same > makes sense to me for Flink. I think we change the protocol and only have one kind of ID. As well as solving the prob

Re: [PROPOSAL] Prepare Beam 2.10.0 release

2019-01-15 Thread Ismaël Mejía
There is also another issue, after the 2.10.0 branch cut some identifier in the build was not changed and the Apache Beam Snapshots keep generating SNAPSHOTS for 2.10.0 instead of the now current 2.11.0-SNAPSHOT. Can somebody PTAL? On Thu, Jan 3, 2019 at 6:17 PM Maximilian Michels wrote: > > Than

[spark runner based on dataset POC] your opinion

2019-01-15 Thread Etienne Chauchot
Hi guys, regarding the new (written from scratch) Spark runner POC based on the Dataset API, I was able to make a big step forward: it can now run a first batch pipeline with a source! See https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming/src/

Re: Joining an Unbounded Source with a Bounded Source

2019-01-15 Thread Pierre Bailly Ferry
Hello Kenneth, Thank you so much for your answer. I think I'm going to implement the side input method. However, on the Spark Runner, the MapView is a simple HashMap[1], so I will have to put a lot of memory on my different spark executors. For the life cycle of MySQL data, currently I focus on
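To make the memory concern concrete, here is a minimal sketch in plain Python (not the Beam or Spark API) of what a map-style side input join amounts to: every streamed element is enriched by a lookup in a map that is held fully in memory. The names `join_with_side_input`, `stream_events`, and `lookup` are hypothetical and chosen only for illustration.

```python
def join_with_side_input(stream_events, lookup):
    """Enrich each (key, payload) event with the row stored for its key.

    `lookup` stands in for the materialized side input: a plain dict that
    must fit entirely in the memory of whichever worker performs the join.
    Keys missing from the map yield None instead of raising.
    """
    for key, payload in stream_events:
        yield key, payload, lookup.get(key)


# Hypothetical data: rows loaded once from the bounded source, then streamed events.
users = {"u1": {"name": "Ada"}}
events = [("u1", "click"), ("u9", "view")]
enriched = list(join_with_side_input(events, users))
```

Because the whole map is resident on every worker that runs the join, memory use grows with the size of the bounded data set rather than with the stream.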

Re: [PROPOSAL] Prepare Beam 2.10.0 release

2019-01-15 Thread Kenneth Knowles
I'm on it. On Tue, Jan 15, 2019 at 8:10 AM Ismaël Mejía wrote: > There is also another issue, after the 2.10.0 branch cut some > identifier in the build was not changed and the Apache Beam Snapshots > keep generating SNAPSHOTS for 2.10.0 instead of the now current > 2.11.0-SNAPSHOT. Can somebody

Re: Joining an Unbounded Source with a Bounded Source

2019-01-15 Thread Alexey Romanenko
In case of joining bounded and unbounded sources with CoGroupByKey, I can guess that all data from bounded source (MySQL in this case) just comes into only one (first) window because all elements have the same timestamp (minimum timestamp or "-infinity" used for bounded sources, afaik). So, in o
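The guess above can be illustrated with a toy model in plain Python (not the Beam API). It assumes fixed windows and a single shared minimum timestamp for all bounded elements; `MIN_TIMESTAMP`, `fixed_window_start`, and `co_group_by_key` are hypothetical stand-ins, not Beam identifiers.

```python
MIN_TIMESTAMP = -10**15  # hypothetical stand-in for the minimum timestamp


def fixed_window_start(timestamp, size):
    """Start of the fixed window of length `size` that contains `timestamp`."""
    return timestamp - (timestamp % size)


def co_group_by_key(left, right, size):
    """Group (key, value, timestamp) triples by (key, window start)."""
    groups = {}
    for side, elements in (("left", left), ("right", right)):
        for key, value, ts in elements:
            window = fixed_window_start(ts, size)
            slot = groups.setdefault((key, window), {"left": [], "right": []})
            slot[side].append(value)
    return groups


# Bounded rows all carry the same minimum timestamp; stream events carry event time.
bounded = [("u1", "row-a", MIN_TIMESTAMP), ("u2", "row-b", MIN_TIMESTAMP)]
stream = [("u1", "click", 125), ("u2", "view", 190)]
groups = co_group_by_key(bounded, stream, size=60)
bounded_windows = {window for (key, window), g in groups.items() if g["left"]}
```

In this model every bounded row lands in one window far in the past, so no group ever holds values from both sides, even though the keys match.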

Re: [PROPOSAL] Prepare Beam 2.10.0 release

2019-01-15 Thread Kenneth Knowles
As a heads up, I did not realize that the release guide specified a custom process for starting a release branch. It makes sense; cut_release_branch.sh consolidates knowledge about all the places the version is hardcoded in the codebase. To keep the history simple, I will re-cut the release branch

Re: Add all tests to release validation

2019-01-15 Thread Sam Rohde
+Boyuan Zhang, who is modifying the rc validation script. I'm thinking of a small change to the proposed process, brought to my attention by Boyuan. Instead of running the additional validation tests during the rc validation, run the tests and the proposed process after the release branch has bee

Re: Beam JobService Problem

2019-01-15 Thread Sam Rohde
On Tue, Jan 15, 2019 at 5:23 AM Robert Bradshaw wrote: > On Tue, Jan 15, 2019 at 1:19 AM Ankur Goenka wrote: > > > > Thanks Sam for bringing this to the list. > > > > As preparation_ids are not reusable, having preparation_id and job_id > same makes sense to me for Flink. > > I think we change t

Re: Add all tests to release validation

2019-01-15 Thread Kenneth Knowles
Since you brought up the entirety of the process, I would suggest moving the release branch cut up like so: - Decide to release - Create a new version in JIRA - Find a recent green commit (according to postcommit) - Create a release branch from that commit - Bump the version on master (green

Apache Beam Newsletter - January 2019

2019-01-15 Thread Rose Nguyen
January 2019 | Newsletter What’s been done -- Apache Beam 2.9.0 released (by: many contributors) - Download the release here. - See the blog post

TestDirectRunner for Java?

2019-01-15 Thread Udi Meiri
Hi, I want to use DirectRunner for a new IT I'm writing, since it's testing I/O code that's runner agnostic. The problem is that DirectRunner doesn't have a TestDataflowRunner analog, so features like OnSuccessMatcher aren't available. Any objections to adding a TestDirectRunner class?

I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Alex Amato
I noticed a lot of files got added to one of my PRs when I ran spotlessApply. Perhaps the rules for spotless were changed but not applied to the branch? I created a PR for this, if anyone would like to merge it. https://github.com/apache/beam/pull/7527/files Or feel free to make your own PR and me

Re: I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Kenneth Knowles
For context, previously the version of google-java-format used by spotless was dynamic and arbitrary. Recently it was pinned in https://github.com/apache/beam/pull/7505/files so we wouldn't get unpleasant surprises. Of course, now we have an unpleasant surprise. Very suspicious that the check passe

Re: I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Kenneth Knowles
Reuven also hit this and opened https://github.com/apache/beam/pull/7523. I just cloned master and spotlessCheck passed but spotlessApply was not a noop. It seems a bug has been introduced. We do have paddedCell turned on to let spotless deal with non-idempotence (i.e. bugs) in the google-java-form
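For readers unfamiliar with the term, non-idempotence means that applying the formatter a second time changes the output again. The sketch below is plain Python with a deliberately buggy toy formatter; it does not model spotless, paddedCell, or google-java-format, and the names `toy_format` and `is_idempotent` are hypothetical.

```python
def toy_format(text):
    """Deliberately buggy formatter: collapses double spaces in a single pass only."""
    return text.replace("  ", " ")


def is_idempotent(fmt, text):
    """True if formatting already-formatted text leaves it unchanged."""
    once = fmt(text)
    return fmt(once) == once


clean = is_idempotent(toy_format, "a b")         # nothing to collapse
unstable = is_idempotent(toy_format, "a    b")   # four spaces need two passes
```

A check that only compares a file against some acceptable formatted result can pass while a fresh apply still rewrites the file, which is one way a check and an apply could disagree for a non-idempotent formatter.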

Re: TestDirectRunner for Java?

2019-01-15 Thread Kenneth Knowles
Since it is primarily for testing, how about just making it use the existing one in the pipeline options? I'm honestly a bit lost as to what the use case was when that was introduced, versus waiting for termination and running the assertion more directly. Can you enlighten me? Kenn On Tue, Jan 15

Python Flink tests failing on Jenkins

2019-01-15 Thread Reuven Lax
It seems to be failing while setting up the virtualenv. Anyone else seeing this? *18:15:03* Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/pluggy'

Re: Python Flink tests failing on Jenkins

2019-01-15 Thread Ahmet Altay
+Robert Bradshaw Is it this test suite: https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Python_ValidatesRunner_Flink_Commit/ ? There is a recent change related to that, https://github.com/apache/beam/pull/7514, and the tests seem to have been failing since then. Might be a configurati

Re: I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Alex Amato
Hmm, it seems that presubmit failed in my PR as well (https://github.com/apache/beam/pull/7527/files), due to some virtualenv setup issue. Was there another change at the same time? Or is it caused by spotless? https://scans.gradle.com/s/sehczxwqru3ny/console-log?task=:beam-sdks-python:s

Re: I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Reuven Lax
BTW I just submitted all the spotless changes, so please rebase. The python failure appears unrelated, and there's another thread on dev about this. On Tue, Jan 15, 2019 at 6:40 PM Alex Amato wrote: > Hmm, and it seems that presubmit failed in my PR as well? In my > https://github.com/apache/b

gradle clean causes long-running python installs

2019-01-15 Thread Kenneth Knowles
A global `./gradlew clean` runs various `setupVirtualEnv` tasks that invoke things such as `setup.py bdist_wheel for grpcio-tools`. Overall it took 4 minutes. Is this intended? Kenn

Re: I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Kenneth Knowles
I had a suspicion, so I confirmed that paddedCell is the culprit. Details are on https://issues.apache.org/jira/browse/BEAM-6447, and I turn it off in https://github.com/apache/beam/pull/7531. But it looks like it was quite deliberately and recently turned on at https://github.com/apache/beam/pull/7390 (b

Re: I think spotless needs to be applied and merged into master (PR inside)

2019-01-15 Thread Kenneth Knowles
Also filed https://github.com/diffplug/spotless/issues/338 Kenn On Tue, Jan 15, 2019 at 8:38 PM Kenneth Knowles wrote: > I had a suspicion so I confirmed that paddedCell is the culprit. Details > on https://issues.apache.org/jira/browse/BEAM-6447 and turn it off on > https://github.com/apache/b

Re: gradle clean causes long-running python installs

2019-01-15 Thread Manu Zhang
I have the same question. Sometimes even `./gradlew clean` fails due to failure of `setupVirtualEnv` tasks. Manu Zhang On Jan 16, 2019, 12:22 PM +0800, Kenneth Knowles , wrote: > A global `./gradlew clean` runs various `setupVirtualEnv` tasks that invoke > things such as `setup.py bdist_wheel fo