Re: [VOTE] Release 0.4.0-incubating, release candidate #3

2016-12-19 Thread Dan Halperin
the project, plus example ITs across all runners >> [2], [3]. >> * All integration tests on the Apex runner [4]. >> * All integration tests on the Flink runner [5]. >> * All integration tests on the Spark runner [6]. >> * All integration tests on the Dataflow runner [7].

Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Dan Halperin
I think JB and Davor are right, have heard other supporting votes, and I haven't heard any specific disagreement. My understanding of rules/procedure is that JB as release manager is free to cancel the vote right now and begin RC2 when he is ready. On Thu, Dec 15, 2016 at 11:36 AM, Jean-Baptiste

Re: [component] tag in JIRA tickets

2016-12-15 Thread Dan Halperin
Amit, I think you bring up a wonderful point. Release notes are hard to grok right now. I wonder if we can expose the component name (which issues are already tagged with) in a custom release notes template? https://developer.atlassian.com/jiradev/jira-platform/jira-architecture/jira-templates-a

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1805

2016-12-13 Thread Dan Halperin
If you look at the console output, we are retrying: [WARNING] Upload attempt failed, sleeping before retrying staging of classpath: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_RunnableOnService_Dataflow/.repository/com/google/auth/google-auth-library-credentials/0.6.0/google-auth-l

Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Dan Halperin
Update: we think we've knocked off all the 0.4.0-incubating blockers, including postponing some. JB is going to start the release process soon! On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré wrote: > Very good point Frances. > > Definitely something we have to do. > > Regards > JB > > > O

Re: Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-13 Thread Dan Halperin
to > Jenkins, should reveal more. > > Meanwhile - any known issues with Jenkins or Maven Central? Status > dashboard for Maven Central doesn't look unhappy. > > On Mon, Dec 12, 2016 at 6:25 PM, Dan Halperin > > wrote: > > > From the "bad run", the Maven pa

Re: Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-12 Thread Dan Halperin
From the "bad run", the Maven part took 35 minutes and presumably the rest is Jenkins / Maven / downloading overhead. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Beam :: Parent .. SUCCESS

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Dan Halperin
I did not expect this to be merged until after we'd confirmed there were no more major changes to be made, or that they were all "ready to go". Are there any? If so, we should block the next release. On Fri, Dec 9, 2016 at 1:58 AM, Kenneth Knowles wrote: > Thanks all! This has been done. > > On

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Dan Halperin
+user@, because this is a user-impacting change and they might not all be paying attention to the dev@ list. +1 I'm mildly reluctant because this will break all users that have written composite transforms -- and I'm the jerk that filed the issue (a few times now, on different iterations of the S

Meet up at Strata+Hadoop World in Singapore

2016-11-29 Thread Dan Halperin
Hey folks, Who will be attending Strata+Hadoop World next week in Singapore? Tyler and I will be there, giving a Beam tutorial [0] and some talks [2,3]. I'd love to sync in person with anyone who wants to talk Beam. Please reach out to me directly if you'd like to meet. Thanks! Dan [0] http://c

Re: UnboundedSource backlog num events

2016-11-29 Thread Dan Halperin
Hi Aviem, Another good question. There's no strong reason not to have Count in addition to Bytes. Practically, in the Dataflow runner we found bytes to be the best signal here. I won't go deeply into why, but two intuitions: * Beam is designed to enable runners to minimize the per-element overhe

Re: Question regarding UnboundedReader#getTotalBacklogBytes

2016-11-29 Thread Dan Halperin
Hi Aviem, A great question! The two backlog methods (getSplitBacklogBytes() and getTotalBacklogBytes

Re: Jenkins build became unstable: beam_PostCommit_MavenVerify #1838

2016-11-17 Thread Dan Halperin
Filed https://issues.apache.org/jira/browse/BEAM-999 on tgroh@; likely caused by https://github.com/apache/incubator-beam/pull/1254 This is flaking a little, but mostly green, so not rolling back. On Thu, Nov 17, 2016 at 1:59 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: Jenkins skipping PreCommit for a PR caused build failures on master.

2016-11-15 Thread Dan Halperin
The command is apache-rat:check, not rat:check. http://creadur.apache.org/rat/apache-rat-plugin/ rat:check is using the (very old) Maven2 plugin for RAT from the now defunct org.codehaus: https://mvnrepository.com/artifact/org.codehaus.mojo/rat-maven-plugin/1.0-alpha-3 Dan On Wed, Nov 16, 2016 a

Re: Beam dinner tonight

2016-11-15 Thread Dan Halperin
Hi Yonglong, First, sorry for spamming the dev list. It may not be the best thing to spam the entire community when 10 of us are at a conference ... but since Beam is a new project, this conference is the first time we're all meeting each other. Second, no we are not in CA currently -- we are at

Beam dinner tonight

2016-11-15 Thread Dan Halperin
Meet in the lobby by the front door at 8; we will walk somewhere good to eat. Dan

Re: Configuring Jenkins

2016-11-15 Thread Dan Halperin
Seems phenomenal! Reading between the lines of your email, it sounds like changes to Jenkins configuration will not actually be exercised on the PR that makes them. So, we still need to work out a process of how we test changes that would affect Jenkins config. (That does not take away from the f

Re: Jenkins build became unstable: beam_PostCommit_RunnableOnService_FlinkLocal #813

2016-11-11 Thread Dan Halperin
Basically, what you did is perfect IMO. To pin it down, I think I've roughly been following this procedure: 1. Notice break and email list "I'm investigating", maybe plus in some relevant people. 2. Attempt to identify cause of break, file JIRA. If cannot diagnose relatively quickly, file JIRA wi

Meet up at ApacheCon Seville

2016-11-10 Thread Dan Halperin
Hey folks, Who will be attending Apache Big Data / ApacheCon next week in Sevilla? JB and I will be there to give a Beam talk Wednesday morning; I'm around all week. I'd love to sync in person with anyone who wants to talk Beam. Please reach out to me directly if you'd like to meet. Thanks! Dan

Re: [jira] [Created] (BEAM-961) CountingInput could have starting number

2016-11-10 Thread Dan Halperin
Why not support this in a follow-on pardo that shifts the range? On Thu, Nov 10, 2016 at 1:22 PM, Kenneth Knowles (JIRA) wrote: > Kenneth Knowles created BEAM-961: > > > Summary: CountingInput could have starting number > Key: BE
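
The "follow-on pardo" idea above can be sketched in a few lines. This is a plain-Python illustration, not Beam code: `counting_input` and `shift` are hypothetical stand-ins for a zero-based counting source and a per-element step applied after it.

```python
from typing import Iterable, Iterator


def counting_input(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a source that always counts from 0 to n - 1."""
    return iter(range(n))


def shift(elements: Iterable[int], start: int) -> Iterator[int]:
    """Hypothetical stand-in for a follow-on per-element step that adds an offset."""
    for element in elements:
        yield element + start


# The source stays simple; the starting number lives in the downstream step.
shifted = list(shift(counting_input(5), start=100))
```

Under these assumptions the source needs no new option: the offset is applied after the fact by an ordinary element-wise transform.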

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-08 Thread Dan Halperin
Nice! +1 On Tue, Nov 8, 2016 at 10:06 AM, Thomas Groh wrote: > +1. Sweet (and congratulations) > > On Tue, Nov 8, 2016 at 9:57 AM, Kenneth Knowles > wrote: > > > +1, with enthusiasm. > > > > On Tue, Nov 8, 2016 at 9:16 AM, Davor Bonaci > > wrote: > > > > > +1 > > > > > > I'd treat this as an o

Re: Timer and Window behavior

2016-11-07 Thread Dan Halperin
Good bug catch! Thanks! I would add that your test reader is not at all guaranteed to work in Beam. It is only correct if the reader is never restarted from checkpoint. Otherwise, when it is restarted from checkpoint it will reset `firstStarted` and the `current` counter. To be properly correct,
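
The restart problem described above can be illustrated with a small sketch. The snippet is cut off before it says what a correct reader should do, so the approach below (carrying `firstStarted` and `current` inside the checkpoint) is an assumption, and `CheckpointMark` and `CountingReader` are hypothetical classes, not the Beam `UnboundedReader` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CheckpointMark:
    """Hypothetical checkpoint: everything needed to resume where we left off."""
    first_started: float
    current: int


class CountingReader:
    """Hypothetical test reader whose state survives a restart from checkpoint."""

    def __init__(self, now: float, checkpoint: Optional[CheckpointMark] = None):
        if checkpoint is None:
            # Fresh start: initialize state from the current time.
            self.first_started = now
            self.current = 0
        else:
            # Restart: restore state from the checkpoint instead of resetting it.
            self.first_started = checkpoint.first_started
            self.current = checkpoint.current

    def advance(self) -> int:
        self.current += 1
        return self.current

    def checkpoint(self) -> CheckpointMark:
        return CheckpointMark(self.first_started, self.current)


reader = CountingReader(now=10.0)
reader.advance()
reader.advance()
mark = reader.checkpoint()

# Simulated restart at a later time: state comes from the mark, not from "now".
resumed = CountingReader(now=99.0, checkpoint=mark)
```

A reader that ignored the checkpoint in its constructor would start again from zero at time 99.0, which is the reset behaviour the message warns about.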

Re: incubator-beam git commit: fixup! spark pom.xml: limit parallelism in runnable-on-service tests

2016-11-02 Thread Dan Halperin
org/repos/asf/incubator-beam/diff/f2637d74 > > Branch: refs/heads/spark-ros > Commit: f2637d74500bb33e6393ea446d49f2591dbe7632 > Parents: 6e1652a > Author: Dan Halperin > Authored: Wed Nov 2 09:33:28 2016 -0700 > Comm

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Dan Halperin
tely we haven't yet centralized this functionality into > TestPipeline or thereabouts. > > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin wrote: > >> +Kenn >> >> I believe this is done because if there is no output, no assertions will >> be run and tests will fail

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Dan Halperin
+Kenn I believe this is done because if there is no output, no assertions will be run and tests will fail silently. (This was a side effect of switching from side inputs to groupbykey for this flow, which enabled testing of triggers/panes/etc.) On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela wrote: >
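
The silent-failure concern above can be shown with a toy model. This is a sketch in plain Python, not the PAssert implementation: `run_assertions`, `with_default` and `expect_nonempty` are hypothetical helpers that only mimic the shape of the problem.

```python
from typing import Callable, Iterable, List


def run_assertions(groups: Iterable[List[int]], check: Callable[[List[int]], None]) -> int:
    """Apply check to each grouped output; return how many times it ran."""
    runs = 0
    for group in groups:
        check(group)
        runs += 1
    return runs


def with_default(groups: List[List[int]]) -> List[List[int]]:
    """Hypothetical defaulting step: an empty result becomes one empty group."""
    return groups if groups else [[]]


def expect_nonempty(group: List[int]) -> None:
    if not group:
        raise AssertionError("expected at least one element")


# With no groups at all, the check never runs, so a broken pipeline looks green.
silent_runs = run_assertions([], expect_nonempty)

# With the default in place, the check runs once and can actually fail.
try:
    run_assertions(with_default([]), expect_nonempty)
    caught = False
except AssertionError:
    caught = True
```

The single empty group exists only so that the check has something to run against when the pipeline produced nothing.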

Re: Podling Report Reminder - November 2016

2016-11-01 Thread Dan Halperin
Minor suggestions inline: On Tue, Nov 1, 2016 at 4:12 PM, James Malone wrote: > Howdy, > > Sorry for being delayed; here is a proposal for our podling report! > > James > > --- > > Beam > > Apache Beam is an open source, unified model and set of language-specific > SDKs for defining and execut

Re: [ANNOUNCE] Beam 0.3.0-incubating Released

2016-10-31 Thread Dan Halperin
Wow! This is awesome, thanks Aljoscha. And congrats on the first release where RC1 went out successfully ;) Dan On Mon, Oct 31, 2016 at 9:36 AM, Aljoscha Krettek wrote: > Congratulations, team! I just finalised everything for the most recent > release. The artefacts are on Maven, the website is

Re: Intro + getting started

2016-10-28 Thread Dan Halperin
Hey Nick, Awesome! Welcome. http://beam.incubator.apache.org/contribute/contribution-guide/ is the place to start (have you seen it yet? if so, send more specific questions?) Dan On Fri, Oct 28, 2016 at 1:33 PM, Nick Travers wrote: > Hi Beamers, > > I've been following along the lists for a w

Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Dan Halperin
wrote: > >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré > >> > > wrote: > >> > >> And what about use RemoveDuplicates and create an alias Distinct > >? > >> > > > >> > >I'd really like to avoid (lo

Re: Apex runner status and next steps

2016-10-27 Thread Dan Halperin
I would add (explicitly, though this may be implicit or already supported) that Apex should also be able to run the precommit WordCountIT/WindowedWordCountIT that execute on all runners. https://github.com/apache/incubator-beam/blob/master/examples/java/pom.xml#L42 and https://github.com/apache/in

Re: GitHub mirroring issue

2016-10-26 Thread Dan Halperin
(Sometimes this happens even when there is not a systemic issue: I have seen github mirroring fail if two things are merged close together, but usually the bot "magically" fixes it on the next commit.) Dan On Wed, Oct 26, 2016 at 1:40 PM, Amit Sela wrote: > Thanks! > > On Wed, Oct 26, 2016, 23:

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Dan Halperin
oking at those same threads when I reviewing the artefacts. > The release was already close to being finished so I went through with it > but if we think it's not good to have them in we should quickly cancel in > favour of a new RC without a published Kinesis connector. > > On T

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Dan Halperin
I can't tell whether it is a problem that we are distributing the beam-sdks-java-io-kinesis module [0]. Here is the dev@ discussion thread [1] and the (unanswered) relevant LEGAL thread [2]. We linked through to a Spark-related discussion [3], and here is how to disable distribution of the Kinesis

Re: [DISCUSS] Using Verbs for Transforms

2016-10-24 Thread Dan Halperin
I find "MakeDistinct" more confusing. My votes in decreasing preference: 1. Keep `RemoveDuplicates` name, ensure that important keywords are in the Javadoc. This reduces churn on our users and is honestly pretty dang descriptive. 2. Rename to `Distinct`, which is clear if you're a SQL user and li

Re: Maven Release Plugin Does Not Update Version of Archetypes

2016-10-24 Thread Dan Halperin
mit%2F1f30255edcdd9c1e445b69248191c8552724f086%23diff-4795b1d27449c01332aad192348eL111&sa=D&sntz=1&usg=AFQjCNGOYTW7DSiNZuGnOKWuHhggzsnztQ> Thinking if we can revert this part of the commit. Pei, Luke -- remember what's up? On Mon, Oct 24, 2016 at 11:17 AM, Dan Halperin wrote: > Would it unblock the rel

Re: Maven Release Plugin Does Not Update Version of Archetypes

2016-10-24 Thread Dan Halperin
Would it unblock the release to manually configure the version in the 0.3.0-release branch? On Mon, Oct 24, 2016 at 11:09 AM, Dan Halperin wrote: > Correct issue link: https://issues.apache.org/jira/browse/BEAM-806 > > No answers, but looking around. > > On Mon, Oct 24, 2

Re: Maven Release Plugin Does Not Update Version of Archetypes

2016-10-24 Thread Dan Halperin
Correct issue link: https://issues.apache.org/jira/browse/BEAM-806 No answers, but looking around. On Mon, Oct 24, 2016 at 10:10 AM, Aljoscha Krettek wrote: > Hi, > are there any Maven mavens who happen to know how > https://issues.apache.org/jira/browse/BEAM-108 can be fixed? By the way, > the

Re: Start of release 0.3.0-incubating

2016-10-24 Thread Dan Halperin
http://karaf.apache.org/download.html#container-schedule for instance). > > Just my $0.01 ;) > > Regards > JB > > > On 10/20/2016 06:30 PM, Dan Halperin wrote: > >> Hi JB, >> >> This is a great discussion to have! IMO, there's no special functionality

Tracking backward-incompatible changes for Beam

2016-10-20 Thread Dan Halperin
Hey everyone, In the Beam codebase, we’ve improved, rewritten, or deleted many APIs. While this has improved the model and gives us great freedom to experiment, we are also causing churn on users authoring Beam libraries and pipelines. To really kick off Beam as something users can depend on, we

Re: Release Guide

2016-10-20 Thread Dan Halperin
Now published at http://beam.incubator.apache.org/contribute/release-guide/ Thanks! Dan On Thu, Oct 20, 2016 at 10:06 AM, Kenneth Knowles wrote: > This is really nice. Very readable and streamlined. > > On Thu, Oct 20, 2016 at 7:44 AM Aljoscha Krettek > wrote: > > > Hi, > > thanks for taking t

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Dan Halperin
Onofré wrote: > +1 > > Thanks Aljosha !! > > Do you mind to wait the week end or Monday to start the release ? I would > like to include MqttIO if possible. > > Thanks ! > Regards > JB > > > > On Oct 20, 2016, 18:07, at 18:07, Dan Halperin > wrote:

Start of release 0.3.0-incubating

2016-10-20 Thread Dan Halperin
On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote: > Hi, > thanks for taking the time and writing this extensive doc! > > If no-one is against this I would like to be the release manager for the > next (0.3.0-incubating) release. I would work with the guide and update it > with anything t

Re: Placement of temporary files by FileBasedSink

2016-10-20 Thread Dan Halperin
This thread is conflating many issues. * Putting temp files where they will not match the glob for the desired output files * Dealing with eventually-consistent filesystems (s3, GCS, ...) * Properly cleaning up all temp files They all need to get solved, but for now I think we only need to solve
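
The first issue in the list above, temp files matching the glob for the desired output files, can be illustrated with the standard library. This is a sketch under assumed names: every path and the pattern below are hypothetical, and `fnmatch` stands in for whatever matching a real filesystem or reader applies.

```python
import fnmatch

# Hypothetical pattern a downstream reader uses to find the final results.
output_glob = "/data/out/part-*"

# Hypothetical temp file written next to the outputs: it matches the output glob.
temp_alongside = "/data/out/part-00000-temp-1234"
# Hypothetical temp file in a separate directory: it does not.
temp_separate = "/data/out/.temp-1234/00000"

alongside_matches = fnmatch.fnmatchcase(temp_alongside, output_glob)
separate_matches = fnmatch.fnmatchcase(temp_separate, output_glob)
```

Keeping temp files outside the pattern means a reader that runs while the write is in progress, or after a failed cleanup, does not pick them up as results.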

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Dan Halperin
ground*: > > > > >>The KafkaIO waits (5 seconds) before starting to read, and (10 > > millis) > > > > >>between advancing the reader, which is problematic for the Spark > > runner > > > > >>as > > > > >>it might att

Re: Exploring Performance Testing

2016-10-18 Thread Dan Halperin
I think there are lots of excellent one-off performance studies, but I'm not sure how useful that is to Beam. From a test infra point of view, I'm wondering more about tracking of performance over time, identifying regressions, etc. Google has some tools like PerfKit

Re: Jenkins build became unstable: beam_PostCommit_MavenVerify #1525

2016-10-13 Thread Dan Halperin
Filed https://issues.apache.org/jira/browse/BEAM-747 On Thu, Oct 13, 2016 at 5:33 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See > >

Re: Specifying type arguments for generic PTransform builders

2016-10-13 Thread Dan Halperin
For #3 -- I think we should be VERY careful there. You need to be absolutely certain that there will never, ever be another alternative to your mandatory argument. For example, you build an option to read from a DB, so you supply a .from(String query). Then later, you want to add reading just a tab
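
The caution above about mandatory arguments can be sketched as follows. The original discussion concerns Java builders; this Python version only mirrors the shape of the concern, and `DbRead`, `from_query` and `from_table` are hypothetical names rather than any Beam IO.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DbRead:
    """Hypothetical read transform with two alternative ways to name the input."""
    query: Optional[str] = None
    table: Optional[str] = None

    def from_query(self, query: str) -> "DbRead":
        return replace(self, query=query, table=None)

    def from_table(self, table: str) -> "DbRead":
        return replace(self, table=table, query=None)

    def validate(self) -> None:
        # Exactly one of the alternatives must be set before use.
        if (self.query is None) == (self.table is None):
            raise ValueError("specify exactly one of query or table")


by_query = DbRead().from_query("SELECT 1")
by_table = DbRead().from_table("users")
```

Had the query been a required argument of the initial factory, adding `from_table` later would have meant either passing a dummy query or changing the entry point that existing users call.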

Re: Jenkins build is back to stable : beam_PostCommit_MavenVerify » Apache Beam :: Examples :: Java #1503

2016-10-12 Thread Dan Halperin
Just an FYI that the issues here were legitimate issues in an external service that have since been resolved. They were present for approximately 90 minutes in a small set of places, and we were affected :) On Tue, Oct 11, 2016 at 7:37 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote:

Re: [DISCUSS] UnboundedSource and the KafkaIO.

2016-10-07 Thread Dan Halperin
Thanks Amit! A little bit inline. On Fri, Oct 7, 2016 at 4:55 PM, Amit Sela wrote: > I started a thread about (suggesting) UnboundedSource splitId's and it > turned into an UnboundedSource/KafkaIO discussion, and I think it's best to > start over in a clear [DISCUSS] thread. > > When working on

Re: [REMINDER] Technical discussion on the mailing list

2016-10-05 Thread Dan Halperin
On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp wrote: > I just want to give a little more context to this…. I’ve been lurking on > this list for several months now reading everything that’s going on. From > Apache’s standpoint, that should be a “very good start” for getting to know > what is happ

Re: Preferred locations (or data locality) for batch pipelines.

2016-10-03 Thread Dan Halperin
bout obtaining the locations of the input splits, and passing > them to the runners to choose how to use them. > > I wonder if there's a need for that besides the Spark runner though, it's > only for batch.. I opened https://issues.apache.org/jira/browse/BEAM-673 > as >

We've hit 1000 PRs!

2016-09-26 Thread Dan Halperin
Hey folks! Just wanted to send out a note -- we've hit 1000 PRs in GitHub as of Saturday! That's a tremendous amount of work for the 7 months since PR#1. I bet we hit 2000 in much fewer than 7 months ;) Dan

Re: Preferred locations (or data locality) for batch pipelines.

2016-09-26 Thread Dan Halperin
Hi Amit, Sorry to be late to the thread, but I've been traveling. I'm not sure I fully grokked the question, but here's one attempt at an answer: In general, any options on where a pipeline is executed should be runner-specific. One example: for Dataflow, we have the zone

Re: Jenkins build failing

2016-09-26 Thread Dan Halperin
Hi JB, could you file a JIRA and follow up there? I don't want someone else to come along and start working on the same issue :) On Mon, Sep 26, 2016 at 10:04 AM, Jean-Baptiste Onofré wrote: > By the way, the issue is: > > java.lang.NoClassDefFoundError: Could not initialize class > com.google.c

Re: Issues with simple KafkaIO-read pipeline -- where to write?

2016-09-19 Thread Dan Halperin
+dev On Mon, Sep 19, 2016 at 4:37 PM, Dan Halperin wrote: > Hey folks, > > Sorry for the confusion around sinks. Let me see if I can clear things up. > > In Beam, a Source+Reader is a very integral part of the model. A source is > the root of a pipeline and it is where runn

FYI - out until Monday

2016-09-15 Thread Dan Halperin
I (along with several of my Google colleagues) will be completely off the grid through the weekend. Thanks, Dan

Re: Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1151

2016-09-15 Thread Dan Halperin
(filed https://issues.apache.org/jira/browse/BEAM-632) On Thu, Sep 15, 2016 at 12:14 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See GoogleCloudDataflow/changes> > >

Re: Anyone @scale tomorrow?

2016-09-09 Thread Dan Halperin
almost two hours afterwards. If you weren't able to attend, here is a video of the talk (warning: I haven't watched it...) https://atscaleconference.com/videos/no-shard-left-behind-apis-for-massive-parallel-efficiency/ Thanks, Dan On Tue, Aug 30, 2016 at 7:32 PM, Dan Halperin wrote: > I

Re: Warning about Kinesis IO license

2016-09-05 Thread Dan Halperin
Thanks, JB. Sorry, this is my fault for not catching this in review. It looks like other Apache products simply put the ASL dependency in a profile. E.g., https://github.com/apache/spark/blob/master/pom.xml#L2428 But, they do still distribute the modules in Maven: https://search.maven.org/#search

Anyone @scale tomorrow?

2016-08-30 Thread Dan Halperin
I'll be giving a talk at the Facebook @scale conference tomorrow. Sorry for the late notice, but if anyone is around to meet in the hallway track or have lunch or drinks, reach out. I'd love to connect. Dan

Re: Remove legacy import-order?

2016-08-24 Thread Dan Halperin
> > > > > > https://github.com/apache/incubator-beam/pull/869 > > > > > > > > > > > > > > I would need to update the second part (applying optimize > > imports) > > > > > prior > > > > > > to > > >

Re: Remove legacy import-order?

2016-08-23 Thread Dan Halperin
yeah I think that we would be SO MUCH better off if we worked with an out-of-the-box IDE. We don't even distribute an IntelliJ/Eclipse config file right now, and I'd like to not have to. But, ugh, it will mess up ongoing PRs. I guess committers could fix them in merge, or we could just make propos

Re: java.io.NotSerializableException: org.apache.kafka.common.TopicPartition

2016-08-21 Thread Dan Halperin
Explicit +Raghu On Fri, Aug 19, 2016 at 4:24 PM, Chawla,Sumit wrote: > Hi All > > I am trying to use KafkaIO as unbounded source, but the translation is > failing. I am using FlinkRunner for the pipe. It complains about > the org.apache.kafka.common.TopicPartition being not-serializable. > > p

Re: Dev environment set up

2016-08-16 Thread Dan Halperin
There is a hack that I've been using in IntelliJ, since that Maven config does not seem to be picked up correctly: If you go to Edit Configurations > Default > JUnit then you can set it to "Use classpath of module direct-runner" (the -DbeamUseDummyRunner=false may or may not also be necessary)

Re: Beam Testing Guide on Website

2016-08-12 Thread Dan Halperin
This is pretty nice. The pointer to DoFnTester, for instance, is really useful. No need to run a whole pipeline just to test your DoFn! I'll be leaving some comments in the PR. On Fri, Aug 12, 2016 at 11:46 AM, Jason Kuster < jasonkus...@google.com.invalid> wrote: > Hi Beam Community, > > I've j

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-08-12 Thread Dan Halperin
This is pretty cool! I'll be there too. (unless the hangout gets too full -- if so, I'll drop out in favor of others who aren't lucky enough to get to talk to Eugene all the time.) On Fri, Aug 12, 2016 at 4:03 PM, Andrew Psaltis wrote: > +1 I'll join > > On Friday, August 12, 2016, Aparup Banerj

Re: Jenkins build is back to normal : beam_Release_NightlySnapshot #131

2016-08-10 Thread Dan Halperin
*sigh* All I did was clear the workspace and kick the job -- it seems some intermediate build created a heap dump and then that caused all future builds to fail on Apache RAT. It would be nice to be able to prevent this type of persistent failure from happening in the future. On Wed, Aug 10, 201

[RESULT] Release Apache Beam, version 0.2.0-incubating

2016-08-07 Thread Dan Halperin
I am happy to announce that the Incubator PMC has approved version 0.2.0-incubating-RC2 of Apache Beam for release as version 0.2.0-incubating. There have been 6 binding approval votes from the IPMC: * Jean-Baptiste Onofré * John D. Ament * Justin Mclean * P. Taylor Goetz * Seetharam Venkatesh *

Re: [VOTE] Release Apache Beam, version 0.2.0-incubating

2016-08-07 Thread Dan Halperin
n 0.2.0-incubating > > > > +1 (binding) > > > > - built from source > > - “incubating” in file name > > - NOTICE and LICENSE look good > > - license headers present > > - no wayward binaries > > - signatures check out > > > > -T

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-04 Thread Dan Halperin
Can they share any substantial code? If not, they will really be separate runners. If so, would it make more sense to fork into runners/spark and runners/spark2? On Thu, Aug 4, 2016 at 9:33 AM, Ismaël Mejía wrote: > +1 > > In particular for three reasons: > > 1. The new DataSet API in spark 2

Re: [REFLECT] Beam’s Half Birthday!

2016-08-01 Thread Dan Halperin
+1 (binding? ;) On this part of the email: > > This half birthday is also a good chance to take a step back and reflect > on > > our goals for this year -- TLP graduation and the first stable release. > > Where are we on this path? What can we do better to accomplish these > > high-level goals?

[VOTE] Release Apache Beam, version 0.2.0-incubating

2016-08-01 Thread Dan Halperin
Hey folks! Here's the vote for the second release of Apache Beam: version 0.2.0-incubating. The complete staging area is available for your review, which includes: * the official Apache source release to be deployed to dist.apache.org [1], and * all artifacts to be deployed to the Maven Central R

Re: [RESULT] Release version 0.2.0-incubating

2016-07-31 Thread Dan Halperin
My apologies: a slight revision. We have 4 approving votes, including 3 binding votes. On Sun, Jul 31, 2016 at 12:29 PM, Dan Halperin wrote: > I'm happy to announce that we have unanimously approved this release. > > There are 3 binding approving votes: > * Dan Halperin > *

[RESULT] Release version 0.2.0-incubating

2016-07-31 Thread Dan Halperin
I'm happy to announce that we have unanimously approved this release. There are 4 approving votes, all binding: * Dan Halperin * Aljoscha Krettek * Jean-Baptiste Onofré * Amit Sela There are no disapproving votes. At this point, this proposal will be presented to the Apache Incubator for

Re: [VOTE] Release version 0.2.0-incubating

2016-07-31 Thread Dan Halperin
04/org/apache/beam/beam-parent/0.2.0-incubating/beam-parent-0.2.0-incubating-source-release.zip.sha1 > >> > >> Regards > >> JB > >> > >> On 07/28/2016 02:57 PM, Jean-Baptiste Onofré wrote: > >> > +1 (binding) > >> > > >>

Re: Suggestion for Writing Sink Implementation

2016-07-29 Thread Dan Halperin
The upcoming work on DoFn setup/teardown completely solves the issue in KafkaIO. You open the connection in setup, but it is used for multiple bundles. However, this is not in yet and so yes, the producer is opened/closed every bundle. BEAM-452 and https://github.com/apache/incubator-beam/pull/690
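
The difference between opening a connection per bundle and opening it once in setup can be sketched as below. This is plain Python with hypothetical names: `FakeProducer` stands in for an expensive client such as a Kafka producer, and `setup`, `process_bundle` and `teardown` only model the lifecycle idea described above, not the actual DoFn API.

```python
class FakeProducer:
    """Hypothetical stand-in for an expensive client such as a Kafka producer."""
    opened = 0  # counts how many producers were ever created

    def __init__(self):
        FakeProducer.opened += 1
        self.sent = []

    def send(self, record):
        self.sent.append(record)

    def close(self):
        pass


class WriteFn:
    """Hypothetical function object with a setup/teardown style lifecycle."""

    def setup(self):
        # Opened once per instance, then reused across bundles.
        self.producer = FakeProducer()

    def process_bundle(self, bundle):
        for record in bundle:
            self.producer.send(record)

    def teardown(self):
        self.producer.close()


fn = WriteFn()
fn.setup()
for bundle in (["a", "b"], ["c"], ["d", "e"]):
    fn.process_bundle(bundle)
fn.teardown()
```

In this model three bundles share one producer; opening and closing inside `process_bundle` instead would create three.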

[VOTE] Release version 0.2.0-incubating

2016-07-28 Thread Dan Halperin
Hey folks! I'm excited to be kicking off the first vote for the second release of Apache Beam: version 0.2.0-incubating! As with 0.1.0-incubating, we are not looking for any specific new functionality. Instead, we're continuing to execute and refine the release process, as well as making stable s

Re: Podling Report Reminder - August 2016

2016-07-27 Thread Dan Halperin
+1 on all the above. On Wed, Jul 27, 2016 at 12:07 PM, Jean-Baptiste Onofré wrote: > Hi James, > > Sure, please go ahead. > > I propose to send the draft on the mailing list for review. When reviewed, > we will add on the incubator wiki (I will help you if you don't have the > permission to do s

Re: how to work on gearpump-runner branch

2016-07-26 Thread Dan Halperin
Hi Manu, Any time you want to merge master into your branch just send a PR -- any of us will be happy to review and merge but especially Kenn and I. (Python-sdk has been doing the same.) Dan On Tue, Jul 26, 2016 at 10:13 PM, Manu Zhang wrote: > Hi JB, > > Thanks. If my PR is based on master b

Re: Jenkins build is still unstable: beam_PostCommit_RunnableOnService_SparkLocal #12

2016-07-25 Thread Dan Halperin
ink you may need to put this into the > > section of the pom for it to get plumbed in the > > needed way. In searching about, I noticed that it is an internal system > > property, not documented (why not?), so we might also set spark.ui.port=0 > > to just get an arbitrary

Re: [Discuss] Beam SDK (Java) providing a shaded jar as a dependency

2016-07-25 Thread Dan Halperin
A few reactions: * Keeping tight control over your module's API surface is *critical* for your users. For example, Hadoop added a public dependency on an old version of Guava many years ago and it has really hurt the community ever since. Google "hadoop guava pain" is pretty instructive, and this

Re: Jenkins build is still unstable: beam_PostCommit_RunnableOnService_SparkLocal #12

2016-07-25 Thread Dan Halperin
Done. We'll see if that fixes things. If not, I'll turn off the build until I have more cycles to get it fixed up. Thanks Amit. On Sat, Jul 23, 2016 at 5:16 AM, Amit Sela wrote: > Not sure what's the setup here, but there seems to be issues with the ports > for the UI. > Generally we don't need

Re: Clojure SDK

2016-07-20 Thread Dan Halperin
hey Ted, Awesome -- glad to hear it and welcome! Looking forward to working with you (and learning about Clojure in the process). Just to verify some definitions: are you planning on implementing an entirely new SDK from scratch, or are you planning on writing a Clojure wrapper for the existing J

Re: New apache_beam branch ?

2016-07-12 Thread Dan Halperin
Hi JB, Actually, this is a good time for a process question. How do we clean this up -- do we have to file an INFRA ticket? Presumably we can't/shouldn't be able to just push a deletion of the branch. Thanks, Dan On Tue, Jul 12, 2016 at 9:39 AM, Jean-Baptiste Onofré wrote: > Hi Silviu, > > th

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Dan Halperin
uses > provided - because it fits them - what about other runners ? > > Hope this clarifies some of the questions here. > > Thanks, > Amit > > On Fri, Jul 8, 2016 at 12:52 AM Dan Halperin > wrote: > > > hey folks, > > > > In general, we should optim

Re: [DISCUSS] Spark runner packaging

2016-07-07 Thread Dan Halperin
hey folks, In general, we should optimize for running on clusters rather than running locally. Examples is a runner-independent module, with non-compile-time deps on runners. Most runners are currently listed as being runtime deps -- it sounds like that works, for most cases, but might not be the

Re: Starter Jiras

2016-06-27 Thread Dan Halperin
Hi Chandni, Great to hear! We have, exactly as you proposed, been tagging issues in JIRA as "starter" or "newbie". Check out this search: https://issues.apache.org/jira/browse/BEAM-372?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in

Re: Scala DSL

2016-06-24 Thread Dan Halperin
On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin wrote: > On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi > wrote: > >> DSL is a pretty generic term.. >> > > I agree and am not married to it. Neville? > > >> The fact that scio uses Java SDK is an implementation

Re: Scala DSL

2016-06-24 Thread Dan Halperin
existing Beam java SDK. My proposition is that scio will be called the > >>>>> Scala API because in the end this is what it is. I think the > confusion > >>>>> comes from the common definition of SDK which is normally an API + a > >>>>> Runtime. In

Re: Scala DSL

2016-06-24 Thread Dan Halperin
I don't think that sdks/scala is the right place -- scio is not a Beam Scala SDK; it wraps the existing Java SDK. Some options: * sdks/java/extensions (Scio builds on the Java SDK) -- mentally vetoed since Scio isn't an extension for the Java SDK, but rather a wrapper * dsls/java/scio (Scio is a

Re: [DISCUSS] PTransform.named vs. named apply

2016-06-23 Thread Dan Halperin
A little late... but yes! +1 On Wed, Jun 22, 2016 at 11:13 PM, Aljoscha Krettek wrote: > ±1 for the named apply > > On Thu, Jun 23, 2016, 07:07 Robert Bradshaw > wrote: > > > +1, I think it makes more sense to name the application of a transform > > rather than the transform itself. (Still mull

NB: Jenkins config change for integration tests

2016-06-20 Thread Dan Halperin
As part of Thomas Groh's recent changes dropping the word `Pipeline` from runner names, we needed to change the Jenkins config. (`TestDataflowPipelineRunner` -> `TestDataflowRunner`). If you are not synced to Apache master, beam_PreCommit_MavenVerify may fail like in this build: https://builds.apa

Re: [dev] Announcing 0.1.0-incubating release

2016-06-15 Thread Dan Halperin
r Bonaci > wrote: > > > > > Hi everyone, > > > I’m happy to announce that we have completed our first release – > version > > > 0.1.0-incubating is now available [1]. > > > > > > I'm thrilled about this -- it is an exciting milestone for t

Re: [VOTE] Release version 0.1.0-incubating

2016-06-09 Thread Dan Halperin
+1 (binding) per checklist 2.1, I decompressed the source-release zip and ensured that `mvn clean verify` passed. per 3.6, I confirmed that there are no binary files. I also did a few other miscellaneous checks. On Thu, Jun 9, 2016 at 8:48 AM, Kenneth Knowles wrote: > +1 (binding) > > Confirmed

Re: DoFn Reuse

2016-06-08 Thread Dan Halperin
On Wed, Jun 8, 2016 at 10:05 AM, Raghu Angadi wrote: > Such data loss can still occur if the worker dies after finishBundle() > returns, but before the consumption is committed. If the runner is correctly implemented, there will not be data loss in this case -- the runner should retry the bundl
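
The retry argument above can be made concrete with a toy runner loop. This is a sketch under stated assumptions, not how any real runner is implemented: `run_bundle` and `flaky_commit` are hypothetical, and the `RuntimeError` simulates a worker dying after processing finishes but before the result is committed.

```python
def run_bundle(bundle, process, commit, max_attempts=3):
    """Hypothetical runner loop: retry the whole bundle until commit succeeds."""
    for attempt in range(1, max_attempts + 1):
        outputs = [process(element) for element in bundle]
        try:
            commit(outputs)
            return attempt
        except RuntimeError:
            continue  # nothing was committed, so reprocessing loses no data
    raise RuntimeError("bundle failed after retries")


committed = []
failures = {"remaining": 1}


def flaky_commit(outputs):
    # Simulates a worker dying after processing finished but before commit.
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("worker died before commit")
    committed.extend(outputs)


attempts = run_bundle([1, 2, 3], lambda x: x * 2, flaky_commit)
```

Because the inputs are only considered consumed once the commit succeeds, the failed attempt leaves nothing behind and the second attempt produces the full output.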

Re: 0.1.0-incubating release

2016-06-07 Thread Dan Halperin
>> this > >> >>>> staging, and forward to the IPMC review. > >> >>>> > >> >>>> Thanks all and especially to Davor (to support me when I bother him > >> >>>> bunch of times a day ;)). > >>

Re: One more streaming engine in OSS

2016-06-07 Thread Dan Halperin
Yep! Without having done any analysis of Heron itself, I'd say that we'd love to have a Beam-on-Heron runner as well! On Wed, May 25, 2016 at 2:30 PM, Seetharam Venkatesh < venkat...@innerzeal.com> wrote: > https://blog.twitter.com/2016/open-sourcing-twitter-heron > > More the merrier for Beam? :

Re: 0.1.0-incubating release

2016-06-07 Thread Dan Halperin
+2! This seems most concordant with other Apache products and the most future-proof. On Mon, Jun 6, 2016 at 9:35 PM, Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > > On 06/07/2016 02:49 AM, Davor Bonaci wrote: > >> After a few rounds of discussions and examining patterns of other >> proje

Dynamic work rebalancing for Beam

2016-05-18 Thread Dan Halperin
Hey folks, This morning, my colleagues Eugene & Malo posted *No shard left behind: dynamic work rebalancing in Google Cloud Dataflow*. This article discusses Cloud Dataflow’s sol

Re: BEAM-206

2016-05-17 Thread Dan Halperin
/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L554>> > for another > endpoint such as AWS S3?" - *New JIRA ticket has been created on this: > BEAM-284 <https://issues.apache.org/jira/browse/BEAM-284> Could you > please take a l
