Re: AutoValue objects can now be used in Beam!

2019-01-08 Thread Kenneth Knowles
Nice! This has been requested many times for years. Great to finally have it so we can encourage good Java practices. Kenn On Tue, Jan 8, 2019 at 1:35 PM Pablo Estrada wrote: > Very cool. Thanks Jeff & Reuven : ) > > On Tue, Jan 8, 2019 at 12:00 PM Reuven Lax wrote: > >> Thanks to help from Je

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Kenneth Knowles
I think @Internal would be a reasonable annotation to exempt from documentation, as that means it is explicitly *not* part of the actual public API, as Ismaël alluded to. (I'm still on the docs-on-private-too side of things, but realize that's an extreme position) It is a shame that we chose black

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Ruoyun Huang
To Ismael's question: When applying such a check (i.e. public methods with >30 LoC), our code base shows 115 violations in total. Thanks for the feedback everyone. As some of you mentioned already, suppressing the warning is always available whenever a contributor/reviewer feels it appropriate, instead of bee

Re: AutoValue objects can now be used in Beam!

2019-01-08 Thread Pablo Estrada
Very cool. Thanks Jeff & Reuven : ) On Tue, Jan 8, 2019 at 12:00 PM Reuven Lax wrote: > Thanks to help from Jeff Klukas, I just merged PR/7334 which adds schema > support to AutoValue objects. > > in particular, since Beam knows how to encode any type with a schema, this > means that AutoValue o

[DISCUSS] (Forked thread) Beam issue triage & assignees

2019-01-08 Thread Kenneth Knowles
Forking discussion on - Some folks have 100+ issues assigned - Jira components might not be the best way to get things triaged I just ran a built-in Jira report to summarize how many tickets are claimed by different users [1]. It does look like component owners tend to accumulate issues and the

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Burke
Schemas allow the runner to know the structure of the data they're manipulating, so if a value is schema encoded, then the runner can manipulate it, including selection and aggregation. In essence, it allows a beam to handle the "common and boring but useful" parts of pipelines agnostic of an SDK l

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Reuven Lax
I wonder if we could do this _only_ over the FnApi. The FnApi already does batching, I believe. What if we made schemas a fundamental part of our protos, and had no SchemaCoder? The FnApi could then batch up a bunch of rows and encode using Arrow before sending over the wire to the harness. Of cours

AutoValue objects can now be used in Beam!

2019-01-08 Thread Reuven Lax
Thanks to help from Jeff Klukas, I just merged PR/7334 which adds schema support to AutoValue objects. In particular, since Beam knows how to encode any type with a schema, this means that AutoValue objects can now be used inside of PCollections, which has been a long-requested feature. The simpl

Re: Add code quality checks to pre-commits.

2019-01-08 Thread Mikhail Gryzykhin
Hi everyone, I've summarized the discussion so far into a proposal doc, so that it is easier to comment on. Please add your comments. Link explicitly: https://docs.google.com/document/d/1YbV18mrHujmiLBtadS1WzCVeiI3Lo

[Call for items] January Beam Newsletter

2019-01-08 Thread Rose Nguyen
Hi Beamers: Time for the *first* newsletter of the year! 🎉 *Add to [1] the highlights from December to now (or planned events and talks) that you want to share by 01/11 11:59 p.m. PDT.* We will collect the notes via Google docs but send out the final version directly to the user mailing list. If

Re: Add all tests to release validation

2019-01-08 Thread Ahmet Altay
On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles wrote: > > > On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner wrote: > >> For reference, there are currently 34 unresolved JIRA issues under the >> test-failures component [1]. >> >> [1] >> https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Kenneth Knowles
On Tue, Jan 8, 2019 at 7:44 AM Robert Bradshaw wrote: > On Tue, Jan 8, 2019 at 4:32 PM Reuven Lax wrote: > > > > Also while columnar can be a large perf win, I suspect that we currently > have lower-hanging fruit to optimize when it comes to performance. > > It's probably a bigger win for Python

Re: Add code quality checks to pre-commits.

2019-01-08 Thread Michael Luckey
Currently I'd opt for just enabling jacocoTestReport on javaPreCommit. Although this will disable the build cache, it seems we could consider that to be effectively disabled anyway. E.g. looking into https://scans.gradle.com/s/eglskrakojhrm/performance/buildCache we see a local cache miss rate of

Re: Add code quality checks to pre-commits.

2019-01-08 Thread Maximilian Michels
I don't think it would be unreasonable to disable parallel build for code coverage if necessary, perhaps in a separate Jenkins job if it significantly increases job time. +1 A separate coverage job would be helpful. It would be fine if it ran longer than the regular PreCommit. On 08.01.19 1

Re: Add all tests to release validation

2019-01-08 Thread Kenneth Knowles
On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner wrote: > For reference, there are currently 34 unresolved JIRA issues under the > test-failures component [1]. > > [1] > https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Ismaël Mejía
-0 Same comments as Robert. I am particularly worried about how this affects contributors, in particular casual ones. Even if the intended idea is good, I am also worried that people will just write poor comments to get rid of the annoyance. Have you already estimated how hard is the current codebase impac

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Kenneth Knowles
+1 I even thought this was already on (at some point). On Tue, Jan 8, 2019 at 8:01 AM Scott Wegner wrote: > I would even propose applying this to non-public methods, but I suspect > that would be more controversial. > I also would support this. It will improve code quality as well. Often missin

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Scott Wegner
+1. I didn't realize checkstyle had the ability to apply policy based on line count. Line count is a fitting heuristic for complexity, so the policy is "complex methods need documentation". I would even propose applying this to non-public methods, but I suspect that would be more controversial. I s

Re: Add all tests to release validation

2019-01-08 Thread Scott Wegner
+1; this essentially converts flaky automated tests into manual release tests until the automation gets fixed. It's an improvement over the current behavior of simply disabling tests, because when tests are disabled the quality signal is lost. This also creates a stronger incentive to fix tests: fi

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Tue, Jan 8, 2019 at 4:32 PM Reuven Lax wrote: > > I agree with this, but I think it's a significant rethinking of Beam that I > didn't want to couple to schemas. In addition to rethinking the API, it might > also require rethinking all of our runners. We're already marshaling (including batc

Re: Add code quality checks to pre-commits.

2019-01-08 Thread Scott Wegner
I started a PR to see what it would take to upgrade to Gradle 5.0: https://github.com/apache/beam/pull/7402 It seems the main blocker is gogradle plugin compatibility for the Go SDK. The gogradle project is actively working on compatibility, so perhaps we could check back in a month or so: https:/

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Reuven Lax
I agree with this, but I think it's a significant rethinking of Beam that I didn't want to couple to schemas. In addition to rethinking the API, it might also require rethinking all of our runners. Also while columnar can be a large perf win, I suspect that we currently have lower-hanging fruit to

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Fri, Jan 4, 2019 at 12:54 AM Reuven Lax wrote: > > I looked at Apache Arrow as a potential serialization format for Row coders. > At the time it didn't seem a perfect fit - Beam's programming model is > record-at-a-time, and Arrow is optimized for large batches of records (while > Beam has a

Re: [Go SDK] User Defined Coders

2019-01-08 Thread Robert Bradshaw
On Fri, Jan 4, 2019 at 7:05 PM Kenneth Knowles wrote: > > On Thu, Jan 3, 2019 at 4:33 PM Reuven Lax wrote: >> >> If a user wants custom encoding for a primitive type, they can create a >> byte-array field and wrap that field with a Coder I don't think the primary use of coders is a custom encod

Re: Enforce javadoc comments in public methods?

2019-01-08 Thread Robert Bradshaw
With the clarification that we're looking at the intersection of public + "big", I think this is a great idea. We should make it clear that this is a lower bar--many private or shorter methods merit documentation as well (but that's harder to automatically detect). The one difficulty with a thresho