Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Romain Manni-Bucau
On Wed, Sep 12, 2018 at 00:37, Lukasz Cwik wrote: > I was unaware that users would use multiple versions of Apache Beam on the > classpath at the same time. In that case I don't believe shading is > something that will be their number one problem since we don't have a > stable API surface

Re: [Proposal] Creating a reproducible environment for Beam Jenkins Tests

2018-09-11 Thread Yifan Zou
Thanks all. I am struggling with the missing buildscan reports when running jobs with containers. I believe it is a big disadvantage to use docker if the buildscan doesn't show up. I will keep updating my progress in this thread. In the meanwhile, any comments, suggestions and objections are still

Jenkins build is back to normal : beam_Release_Gradle_NightlySnapshot #169

2018-09-11 Thread Apache Jenkins Server
See

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Lukasz Cwik
I was unaware that users would use multiple versions of Apache Beam on the classpath at the same time. In that case I don't believe shading is something that will be their number one problem since we don't have a stable API surface between internal Apache Beam components. For users who aren't

Re: Spotless broken on master

2018-09-11 Thread Andrew Pilloud
I don't think spotless is included in the default test target. Jenkins runs a more expanded ':javaPreCommit' gradle target. Andrew On Tue, Sep 11, 2018 at 2:32 PM Ismaël Mejía wrote: > Mmm this is weird, I tested this locally and passed without issue, I > am wondering how could this happen. >

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Romain Manni-Bucau
I understand, Lukasz, but it makes using shading properly nearly impossible, since this warning is not just something you can ignore but something you have to fix because it can hide bugs. I get the "it is ok while you have a single Beam version" point, but why would you have only Beam in your classpath,

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Lukasz Cwik
Romain, the beam-model-fn-execution-2.7.0.jar, beam-model-job-management-2.7.0.jar, beam-model-pipeline-2.7.0.jar have duplicates of the same classes to satisfy their dependencies (gRPC and protobuf and their transitive dependencies). Producing a separate artifact is still not done to prevent the

Re: Spotless broken on master

2018-09-11 Thread Ismaël Mejía
Mmm this is weird, I tested this locally and passed without issue, I am wondering how could this happen. Thanks anyway for the quick fix.

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Romain Manni-Bucau
BTW, did you notice that doing a shade now logs something like: [WARNING] beam-model-fn-execution-2.7.0.jar, beam-model-job-management-2.7.0.jar, beam-model-pipeline-2.7.0.jar define 6660 overlapping classes: [WARNING] -
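Romain's warning and Lukasz's explanation come down to the same class files (the bundled gRPC and protobuf copies) being packaged in several `beam-model-*` jars. A minimal Java sketch of the kind of overlap check behind such a shade warning, assuming each jar's entry names are already available as a list; the `OverlapCheck` class is hypothetical and this is not how the maven-shade-plugin itself is implemented:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: finds .class entries that appear in more than one jar. */
final class OverlapCheck {
  private OverlapCheck() {}

  /**
   * @param jarEntries map from jar file name to the entry names it contains
   * @return map from each duplicated class entry to the sorted set of jars that define it
   */
  static Map<String, Set<String>> overlappingClasses(Map<String, List<String>> jarEntries) {
    // class entry name -> jars that contain it (sorted for stable output)
    Map<String, Set<String>> owners = new TreeMap<>();
    for (Map.Entry<String, List<String>> jar : jarEntries.entrySet()) {
      for (String entry : jar.getValue()) {
        if (entry.endsWith(".class")) {
          owners.computeIfAbsent(entry, k -> new TreeSet<>()).add(jar.getKey());
        }
      }
    }
    // Keep only classes defined by two or more jars.
    owners.values().removeIf(jars -> jars.size() < 2);
    return owners;
  }
}
```

For real jars, the entry names could be collected with `java.util.jar.JarFile#entries()` and `JarEntry#getName()`. Any class reported here is one where only a single copy survives in a shaded jar, which is why Romain argues the warning can hide bugs when the copies differ.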

Spotless broken on master

2018-09-11 Thread Andrew Pilloud
Looks like the Java PreCommit is broken due to a commit manually merged to master. Thanks to Huygaa for finding it in our unstable tests. Fix is here, I will merge when tests pass: https://github.com/apache/beam/pull/6364 Andrew

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Jean-Baptiste Onofré
I'm taking the Spark runner one. Regards JB On 11/09/2018 21:15, Ahmet Altay wrote: Could anyone else help with looking at these issues earlier? On Tue, Sep 11, 2018 at 12:03 PM, Romain Manni-Bucau wrote: I'm running this main [1] through this IT [2]. Was

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Charles Chen
The SparkRunner validation test (here: https://beam.apache.org/contribute/release-guide/#run-validation-tests) passes on my machine. It looks like we are likely missing test coverage where Romain is hitting issues. On Tue, Sep 11, 2018 at 12:15 PM Ahmet Altay wrote: > Could anyone else help

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Ahmet Altay
Could anyone else help with looking at these issues earlier? On Tue, Sep 11, 2018 at 12:03 PM, Romain Manni-Bucau wrote: > I'm running this main [1] through this IT [2]. It had been working fine for ~1 > year but 2.7.0 broke it. I didn't investigate further but can have a look later > this month if it

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Romain Manni-Bucau
I'm running this main [1] through this IT [2]. It had been working fine for about a year but 2.7.0 broke it. I didn't investigate further but can have a look later this month if it helps. [1]

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Charles Chen
Romain: can you give more details on the failure you're encountering, i.e. how you are performing this validation? On Tue, Sep 11, 2018 at 9:36 AM Jean-Baptiste Onofré wrote: > Hi, > > weird, I didn't have it on Beam samples. Let me try to reproduce and I > will create the Jira. > > Regards >

JIRA permissions request

2018-09-11 Thread Connell O'Callaghan
Hi dev@, There are quite a few efforts in flight that have a lot of identified work that needs a bit of project management to better communicate what is being worked on and in what order across the community -- Portability framework, portable runners, and SQL being examples that come to mind.

Re: PTransforms and Fusion

2018-09-11 Thread Henning Rohde
Empty pipelines have neither subtransforms nor a spec, which is what I don't think is useful, especially given the only use case (which is really "nop") would create non-timer loops in the representations. I'd rather have a well-known nop primitive instead. Even now, for the A example, I don't
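Henning's position can be read as a simple well-formedness rule: a transform is either a composite (it has subtransforms) or a primitive (it has a spec), and a "nop" is expressed as a well-known primitive rather than as an empty transform. A minimal Java sketch of that rule over a simplified, hypothetical transform model (not the Beam pipeline protos); the nop URN below is made up for illustration and is not a real Beam URN:

```java
import java.util.List;

/** Simplified, hypothetical model of a pipeline transform. */
final class TransformNode {
  final String specUrn; // null when the transform has no spec
  final List<String> subtransformIds;

  TransformNode(String specUrn, List<String> subtransformIds) {
    this.specUrn = specUrn;
    this.subtransformIds = List.copyOf(subtransformIds);
  }

  /** A transform must be a composite (has subtransforms) or a primitive (has a spec). */
  boolean isWellFormed() {
    return specUrn != null || !subtransformIds.isEmpty();
  }
}

/** Hypothetical well-known no-op primitive, used instead of an empty transform. */
final class Nop {
  // Hypothetical URN for illustration only; not defined by Beam.
  static final String URN = "example:transform:nop:v1";

  private Nop() {}

  static TransformNode create() {
    return new TransformNode(URN, List.of());
  }
}
```

Under this rule an empty transform is rejected outright, so there is no whitelist to maintain; the trade-off Robert raises in the same thread is whether some empty transforms should instead remain legal.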

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Jean-Baptiste Onofré
Hi, weird, I didn't have it on Beam samples. Let me try to reproduce and I will create the Jira. Regards JB On 11/09/2018 11:44, Romain Manni-Bucau wrote: -1, seems spark integration is broken (tested with spark 2.3.1 and 2.2.1): 18/09/11 11:33:29 WARN TaskSetManager: Lost task 0.0 in

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Maximilian Michels
Could we still include some fixes for the RC2? I just discovered two JIRA issues which were not properly marked with "Fix Version". https://issues.apache.org/jira/browse/BEAM-5239 https://issues.apache.org/jira/browse/BEAM-5246 They are not show-stoppers, so also fine with me if we don't

Re: [portability] metrics interrogations

2018-09-11 Thread Robert Bradshaw
On Mon, Sep 10, 2018 at 11:07 AM Etienne Chauchot wrote: > Hi all, > > @Luke, @Alex I have a general question related to metrics in the Fn API: > as the communication between runner harness and SDK harness is done on a > bundle basis. When the runner harness sends data to the sdk harness to >

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-11 Thread Thomas Weise
I'm in favor of a combination of 2) and 3): New module "hadoop-mapreduce-format" ("hadoop-format" does not sufficiently qualify what it is). Turn existing " hadoop-input-format" into a proxy for new module for backward compatibility (marked deprecated and removed in next major version). I don't
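Thomas's backward-compatibility step (keep the old module as a deprecated proxy for the new one, then remove it in the next major version) is a thin delegation layer. A minimal Java sketch of that pattern; both class names and the `describeRead` method are hypothetical placeholders, not the actual Beam Hadoop IO API:

```java
/** Hypothetical entry point in the proposed new module. */
final class MapReduceFormatIO {
  private MapReduceFormatIO() {}

  static String describeRead(String inputFormatClassName) {
    return "read using " + inputFormatClassName;
  }
}

/**
 * Hypothetical legacy entry point kept for backward compatibility.
 *
 * @deprecated use {@link MapReduceFormatIO}; to be removed in the next major version.
 */
@Deprecated
final class LegacyInputFormatIO {
  private LegacyInputFormatIO() {}

  /** Forwards to the new module so existing callers keep compiling and behave identically. */
  static String describeRead(String inputFormatClassName) {
    return MapReduceFormatIO.describeRead(inputFormatClassName);
  }
}
```

Keeping the old class as pure delegation means there is only one implementation to maintain, while users of the old module get a deprecation warning at compile time.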

Re: How to implement repartition.

2018-09-11 Thread Robert Bradshaw
Does Reshuffle do what you want? On Tue, Sep 11, 2018, 3:46 PM devinduan(段丁瑞) wrote: > Hi all: > I recently started studying Beam on the Spark runner. > I want to implement a method *repartition* similar to Spark > *rdd.repartition()*, but I can't find a solution. > Could anyone help

How to implement repartition.

2018-09-11 Thread 段丁瑞
Hi all: I recently started studying Beam on the Spark runner. I want to implement a method repartition similar to Spark rdd.repartition(), but I can't find a solution. Could anyone help me? Thanks for your reply. devin.
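Robert's suggestion, Beam's `Reshuffle` transform, redistributes elements through a shuffle; pairing each element with a random key is one way to spread elements evenly, which is also the effect `rdd.repartition()` aims for. The plain-Java sketch below only illustrates that random-key idea on an in-memory list; it is not Beam code, not how the Spark runner implements `Reshuffle`, and the `numPartitions` and `seed` parameters are assumptions made to mirror `rdd.repartition(n)`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Plain-Java illustration of redistributing elements by random key (hypothetical helper). */
final class RandomKeyRepartition {
  private RandomKeyRepartition() {}

  static <T> List<List<T>> repartition(List<T> input, int numPartitions, long seed) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be > 0");
    }
    List<List<T>> partitions = new ArrayList<>(numPartitions);
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    Random random = new Random(seed); // fixed seed so the sketch is reproducible
    for (T element : input) {
      // A random key picks the target partition, so input ordering or skew does not matter.
      partitions.get(random.nextInt(numPartitions)).add(element);
    }
    return partitions;
  }
}
```

Every element lands in exactly one partition, so the total count is preserved; only the placement changes.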

Re: [Proposal] Creating a reproducible environment for Beam Jenkins Tests

2018-09-11 Thread Alexey Romanenko
+1 Great feature that should help with complicated error cases. > On 11 Sep 2018, at 03:39, Henning Rohde wrote: > > +1 Nice proposal. It will help eradicate some of the inflexibility and > frustrations with Jenkins. > > On Wed, Sep 5, 2018 at 2:30 PM Yifan Zou >

Re: [PROPOSAL] Test performance of basic Apache Beam operations

2018-09-11 Thread Alexey Romanenko
I agree that we can benefit from having two types of performance tests (low and high level) that could complement each other. Can we detect a regression (if any) automatically and send a report about that? Sorry if we already do that for Nexmark. > On 11 Sep 2018, at 11:29, Etienne Chauchot
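Alexey asks whether a regression could be detected automatically and reported. A minimal Java sketch of one simple approach, assuming run times from previous runs are available as a baseline and that a fixed relative tolerance is acceptable; both the method and the threshold are assumptions, and a real setup might prefer medians or variance-aware statistics to cope with noisy CI machines:

```java
import java.util.List;

/** Hypothetical check: flags a run that is slower than the baseline mean by more than a tolerance. */
final class RegressionCheck {
  private RegressionCheck() {}

  /**
   * @param baselineSeconds run times of previous runs
   * @param currentSeconds run time of the new run
   * @param tolerance allowed relative slowdown, e.g. 0.1 for 10%
   */
  static boolean isRegression(List<Double> baselineSeconds, double currentSeconds, double tolerance) {
    if (baselineSeconds.isEmpty()) {
      return false; // nothing to compare against yet
    }
    double mean = baselineSeconds.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    return currentSeconds > mean * (1.0 + tolerance);
  }
}
```

With a baseline of 10 s and a 10% tolerance, a 12 s run is flagged while a 10.5 s run is not; the flagged result is what a job would then send as a report.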

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-11 Thread Alexey Romanenko
Dharmendra, For now, you can’t write with Hadoop MapReduce OutputFormat. However, you can use FileIO or TextIO to write to HDFS, these IOs support different file systems. > On 11 Sep 2018, at 11:11, dharmendra pratap singh > wrote: > > Hello Team, > Does this mean, as of today we can read

Re: Gradle Races in beam-examples-java, beam-runners-apex

2018-09-11 Thread Maximilian Michels
Do we have inotifywait available on Travis and could set it up to log concurrent access to the relevant Jar files? On 10.09.18 22:41, Lukasz Cwik wrote: I had originally suggested to use some Linux kernel tooling such as inotifywait[1] to watch what is happening. It is likely that we have

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-11 Thread Romain Manni-Bucau
-1, seems spark integration is broken (tested with spark 2.3.1 and 2.2.1): 18/09/11 11:33:29 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, RMANNIBUCAU, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field

Re: [portability] metrics interrogations

2018-09-11 Thread Etienne Chauchot
On Monday, September 10, 2018 at 09:42 -0700, Lukasz Cwik wrote: > Alex is out on vacation for the next 3 weeks. > Alex had proposed the types of metrics[1] but not the exact protocol as to > what the SDK and runner do. I could > envision Alex proposing that the SDK harness only sends diffs or
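Lukasz mentions that the SDK harness might send only diffs for metrics reported per bundle. A minimal Java sketch of how a runner could aggregate counters under that option, plus the conversion a runner would need if the SDK sent cumulative values instead; the class and method names are hypothetical and this is not the Fn API protocol:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical runner-side aggregation of counter updates coming from an SDK harness. */
final class CounterAggregator {
  private final Map<String, Long> committed = new HashMap<>();

  /** Delta style: each report carries only the increment since the previous report. */
  void applyDelta(String counter, long delta) {
    committed.merge(counter, delta, Long::sum);
  }

  /** Cumulative style: turn a bundle's running total into a delta using the last value seen. */
  static long toDelta(long previousCumulative, long currentCumulative) {
    return currentCumulative - previousCumulative;
  }

  long total(String counter) {
    return committed.getOrDefault(counter, 0L);
  }
}
```

With deltas, the runner can simply sum reports as bundles complete; with cumulative values it must remember the last value per bundle, which is the extra state the diff-based option avoids.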

Re: [PROPOSAL] Test performance of basic Apache Beam operations

2018-09-11 Thread Etienne Chauchot
Hi Lukasz, Well, having low-level byte[]-based pure performance tests makes sense. And having a high-level realistic model (the Nexmark auction system) also makes sense, to avoid testing unrealistic pipelines as you describe. Having common code between the two seems difficult as both the architecture and

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-11 Thread dharmendra pratap singh
Hello Team, Does this mean that, as of today, we can read from Hadoop FS but can't write to Hadoop FS using the Beam HDFS API? Regards Dharmendra On Thu, Sep 6, 2018 at 8:54 PM Alexey Romanenko wrote: > Hello everyone, > > I’d like to discuss the following topic (see below) with the community since > the

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #168

2018-09-11 Thread Apache Jenkins Server
See Changes: [robertwb] Update container versions of NumPy and TensorFlow. [lcwik] [BEAM-5149] Add support for the Java SDK harness to merge windows. [lcwik] Address PR comments. [lcwik]

Re: PTransforms and Fusion

2018-09-11 Thread Robert Bradshaw
For (A), it really boils down to the question of what is a legal pipeline. A1 takes the position that all empty transforms must be on a whitelist (which implies B1, unless we make the whitelist extensible, which starts to sound a lot like B3). Presumably if we want to support B2, we cannot remove