Re: BEAM-6018: memory leak in thread pool instantiation

2018-11-08 Thread Dan Halperin
> > > On Thu, Nov 8, 2018 at 2:12 PM Udi Meiri wrote: > >> Both options risk delaying worker shutdown if the executor's shutdown() >> hasn't been called, which is I guess why the executor in GcsOptions.java >> creates daemon threads. >> > My guess (and it really is a guess at this point) is that t

Re: BEAM-6018: memory leak in thread pool instantiation

2018-11-08 Thread Dan Halperin
Hey Udi, Thanks for the commit comment. I'll try to dump any (old) mental context I have left. We were trying to find the right point in a space of: * enough parallelism to speed things up - more than

Re: [PROPOSAL] Move sorting to sdks-java-core

2018-10-17 Thread Dan Halperin
On Wed, Oct 17, 2018 at 3:44 PM Kenneth Knowles wrote: > The runner can always just depend on the sorter to do it the legacy way by > class matching; it shouldn't incur other dependency penalties... but now > that I look briefly, the sorter depends on Hadoop bits. That seems a heavy > price to pa

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Dan Halperin
From my perspective as a (non-Google) community member, huge +1. I don't see anything bad for the community about open sourcing more of the probably-most-used runner. While the DirectRunner is probably still the most referential implementation of Beam, can't hurt to see more working code. Other r

Re: Broken links to releases

2018-06-19 Thread Dan Halperin
Looks like JB removed these during release of 2.3.0: svn commit: r25111 - in /release/beam: ./ 0.1.0-incubating/ 0.2.0-incubating/ 0.3.0-incubating/ 0.4.0/ 0.5.0/ 0.6.0/ 2.0.0/ 2.1.0/ 2.1.1/ 2.2.0/ 2.3.0/ Author: jbonofre Date: Sat Feb 17 06:08:19 2018 New Revision: 25111 Log: Publish 2.3.0 rele

Re: [VOTE] Code Review Process

2018-06-01 Thread Dan Halperin
+1 -- this is encoding what I previously thought the process was and what, in practice, I think was often the behavior of committers anyway. On Fri, Jun 1, 2018 at 12:21 PM, Yifan Zou wrote: > +1 > > On Fri, Jun 1, 2018 at 12:10 PM Robert Bradshaw > wrote: > >> +1 >> >> On Fri, Jun 1, 2018 at 1

Re: Gradle Status [April 6]

2018-04-09 Thread Dan Halperin
On Sat, Apr 7, 2018 at 12:43 Reuven Lax wrote: > So if I understand correctly, we've migrated all precommit, most > postcommits, and we have a working release process using Gradle. There are > a few bugs left, but at this pace it sounds like we're close to fully > migrated. > > I know that multip

Re: Gradle status

2018-03-22 Thread Dan Halperin
ween state has lasted so long, and there is it may be time. Dan > > Thanks, > Cham > > > On Thu, Mar 22, 2018 at 10:56 AM Romain Manni-Bucau > wrote: > >> >> >> Le 22 mars 2018 18:49, "Dan Halperin" a écrit : >> >> It seems that a

Re: Gradle status

2018-03-22 Thread Dan Halperin
It seems that a few groups are talking past each other. * A sizable contingent is interested in a move to Gradle -- it shows promise, but the work is incomplete. * Another contingent is noticing the large burden of maintaining multiple build systems. FWICT, both test suites have been broken quite a l

Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Dan Halperin
Looks like it was a good talk! Why is it Google Confidential & Proprietary, though? Dan On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov wrote: > Hey all, > > The slides for my yesterday's talk at Strata San Jose https://conferences. > oreilly.com/strata/strata-ca/public/schedule/detail/63696

Re: [INFO] Build fails on GCP IO (Spanner)

2017-05-29 Thread Dan Halperin
This looks like somewhere the unit tests are inferring a project from the environment when they should not be doing so. On Mon, May 29, 2017 at 8:38 AM, Jean-Baptiste Onofré wrote: > Gonna try to purge my local m2 repo. > > Regards > JB > > > On 05/29/2017 08:05 AM, Jean-Baptiste Onofré wrote: >

Re: graph generator?

2017-05-25 Thread Dan Halperin
I think that a util that converted from the Runner API definition of a pipeline into some sort of graph format (like DOT?) would be generally useful. By using the Runner API, the tool would provide an SDK- and Runner-independent view of the pipeline. On Thu, May 25, 2017 at 10:54 AM, Jean-Baptiste Onofré
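
A minimal sketch of the idea above, assuming the pipeline has already been reduced to a list of (producer, consumer) transform-name pairs. That edge list and the `to_dot` helper are hypothetical stand-ins for illustration; they are not the Runner API definition or any Beam utility.

```python
def to_dot(edges, name="pipeline"):
    """Render (producer, consumer) name pairs as a Graphviz DOT digraph.

    `edges` is a hypothetical, simplified stand-in for the graph that a
    real tool would extract from the Runner API pipeline definition.
    """
    lines = [f"digraph {name} {{"]
    # Declare every node once, in a stable order.
    nodes = sorted({node for edge in edges for node in edge})
    for node in nodes:
        lines.append(f'  "{node}";')
    # One directed edge per producer/consumer pair.
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

Because the input is only names and edges, the same function would work whichever SDK built the pipeline, which is the independence the message is after.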

Re: Behavior of Top.Largest

2017-05-21 Thread Dan Halperin
I think this is an unrealistic request -- Python and Java workflows are completely different, and Python developer documentation is especially abysmal. (E.g., I had to have Robert sit with me to get the Python SDK to work at all on my developer machine, and even then I gave up and chmod-ed my mach

Re: First stable release completed!

2017-05-17 Thread Dan Halperin
Great job, folks. What an amazing amount of work, and I'd like to especially thank the community for participating in hackathons and extensive release validation over the last few weeks! We caught some crucial issues in time and really pushed a much better release as a result. Thanks everyone! Dan

Re: [VOTE] First stable release: release candidate #4

2017-05-15 Thread Dan Halperin
+1 In addition to the review for RC2/RC3 notes in the Acceptance Criteria doc, I've manually verified that RC4 releases staged on dist.apache.org do not include binary artifacts :). Thanks everyone! On Mon, May 15, 2017 at 11:54 PM, Pei HE wrote: > +1 > > Given users can run WordCount with inp

Re: TextIO and .withWindowedWrites() - filenamepolicy

2017-05-12 Thread Dan Halperin
>>>> Another idea - we can extend the existing pattern that >>>>> DefaultFileNamePolicy understands to include windows. >>>>> >>>>> Today it replaces SSS with the shard, and NNN with the number of >>>>> >>>> shards >>
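
A rough sketch of the pattern expansion described in the quoted text, assuming runs of `S` stand for the shard index and runs of `N` for the shard count, each zero-padded to the run length. The `W` placeholder for a window label is hypothetical and only illustrates the proposed extension; `expand_template` is not a Beam function.

```python
import re


def expand_template(template, shard, num_shards, window=None):
    """Expand a shard-name template such as "out-SSS-of-NNN".

    Runs of 'S' become the zero-padded shard index and runs of 'N' the
    zero-padded shard count. 'W' is a HYPOTHETICAL window placeholder,
    shown only to illustrate the extension discussed in the thread.
    """
    def padded(value):
        # Pad to the width of the matched run of placeholder letters.
        return lambda match: str(value).zfill(len(match.group(0)))

    out = re.sub(r"S+", padded(shard), template)
    out = re.sub(r"N+", padded(num_shards), out)
    if window is not None:
        # Substituted last so the window label itself is never re-expanded.
        out = out.replace("W", window)
    return out
```

For example, `expand_template("out-SSS-of-NNN", 3, 10)` pads both numbers to three digits. Note that this sketch treats every uppercase `S`, `N`, or `W` in the template as a placeholder, so literal text containing those letters would need escaping in a real implementation.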

Re: First stable release: Acceptance criteria

2017-05-11 Thread Dan Halperin
I'm focusing on: * user reported bugs (Avro, TextIO, MongoDb) * the actual Apache Release criteria (licensing, dependencies, etc.) On Thu, May 11, 2017 at 3:04 PM, Lukasz Cwik wrote: > I have been trying out various Python scenarios on Windows. > > On Thu, May 11, 2017 at 3:01 PM, Jason Kuster

Re: TextIO and .withWindowedWrites() - filenamepolicy

2017-05-11 Thread Dan Halperin
(we should probably throw an exception at construction time in the various FileBasedSinks if you use WindowedWrites and the default filename policy though, that's a no-brainer and it's backwards-compatible.) On Thu, May 11, 2017 at 8:41 AM, Dan Halperin wrote: > +Eugene, Reuven who

Re: TextIO and .withWindowedWrites() - filenamepolicy

2017-05-11 Thread Dan Halperin
+Eugene, Reuven who reviewed and implemented this code. They may have opinions. Note that changing the default filename policy would be backwards-incompatible, so this would either need to go into 2.0.0 (and a new RC3) or it would not go in. On Thu, May 11, 2017 at 8:36 AM, Borisa Zivkovic wrote

Re: Process for getting the first stable release out

2017-05-08 Thread Dan Halperin
>> > release branch. > >> > > >> > Davor > >> > > >> > [1] https://github.com/apache/beam/tree/release-2.0.0 > >> > > >> > On Fri, May 5, 2017 at 1:57 PM, Thomas Groh > > >> > wrote: > >> >

Re: Process for getting the first stable release out

2017-05-05 Thread Dan Halperin
I am +1 on cutting the branch, and on the sentiment that we expect the first pancake will not be ready to serve customers. On Fri, May 5, 2017 at 11:40 AM, Kenneth Knowles wrote: > On Thu, May 4, 2017 at 12:07 PM, Davor Bonaci

Re: Slack Invites

2017-05-04 Thread Dan Halperin
My understanding is that if you use something like that plugin, and they detect it, Slack will ban you from new invites entirely or otherwise punish you. They want this friction for free projects so that there's pressure to pay. On Thu, May 4, 2017 at 9:18 AM, Jesse Anderson wrote: > Is possible

Re: Status of our CI tools

2017-04-30 Thread Dan Halperin
I think the confusion to new users is much worse than any temporary loss of functionality here. +1 * 100! On Fri, Apr 28, 2017 at 11:00 PM, Mingmin Xu wrote: > +1 > Have ignored TravisCI for some time as the failures are not related with > code/test issues. > > I still hope TravisCI could work w

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

2017-04-30 Thread Dan Halperin
Hi Shen, Most runners are expected to use `UnboundedReadFromBoundedSource` (in `runners-core-construction`) to convert a BoundedSource to an UnboundedSource if that is the semantics they need. As Eugene says, I suspect you can also get similar behavior with a SplittableDoFn. Dan On Sat, Apr 29,

Re: An Update on Jenkins

2017-04-26 Thread Dan Halperin
> If not, feel free to reply to this thread ... not. :) :( On Tue, Apr 25, 2017 at 9:58 PM, Jean-Baptiste Onofré wrote: > Thanks for the update ! > > Regards > JB > > On Apr 26, 2017, 05:51, at 05:51, Jason Kuster > > wrote: > >Hey folks, > > > >There have been a couple of different issues ov

Re: Naming of Combine.Globally

2017-04-18 Thread Dan Halperin
Great discussion! As Aljoscha says, Fold, Reduce, and Combine are all intertwined and not quite identical as we use them. Another simple but perhaps coy answer is that if you read the MapReduce paper by Dean and Ghemawat that started this all, they used "Map", "Reduce", and "Combine" (see section

Re: [DISCUSSION] PAssert success/failure count validation for all runners

2017-04-17 Thread Dan Halperin
, Apr 17, 2017 at 11:14 AM, Dan Halperin wrote: > I believe Pablo's existing proposal is here: https://lists.apache. > org/thread.html/CADJfNJBEuWYhhH1mzMwwvUL9Wv2HyFc8_E=9zYBKwUgT8ca1HA@mail. > gmail.com > > The idea is that we'll stick with the current design -- aggrega

Re: [DISCUSSION] PAssert success/failure count validation for all runners

2017-04-17 Thread Dan Halperin
I believe Pablo's existing proposal is here: https://lists.apache.org/thread.html/CADJfNJBEuWYhhH1mzMwwvUL9Wv2HyFc8_E=9zybkwugt8ca...@mail.gmail.com The idea is that we'll stick with the current design -- aggregator- (but now metric)-driven validation of PAssert. Runners that do not support these

Re: Join to external table

2017-04-14 Thread Dan Halperin
Hi Jingsong, This seems like a fantastic, reusable pattern to add, and indeed it's a fairly common one. There are probably some interesting API issues too -- such as how you make a nice clean interface that works for many backends (Bigtable? HBase? Redis? Memcache? etc.), and how you let users sup

Re: [DISCUSSION] Consistent use of loggers

2017-04-12 Thread Dan Halperin
), but we can't assure that future runners could > support this. > > So it seems we're left with: > 1) Add documentation around logging in each runner. > 2) Consider enabling a binding (JUL) for direct runner profile in examples > module and maven archetypes. > 3)

Re: Adding logging for RunnableOnService/ValidatesRunner tests

2017-04-04 Thread Dan Halperin
On Mon, Apr 3, 2017 at 6:21 PM, Pablo Estrada wrote: > Hello there, > I'm running RunnableOnService tests on the DirectRunner, with 'mvn clean > verify' in runners/direct-java; and I'd like to add some logging to figure > out what's going on in some failures. My question is: > > 1. Is there a way

Re: [DISCUSSION] Consistent use of loggers

2017-04-03 Thread Dan Halperin
At this point, I'm a little unclear on what is the proposal. Can you refresh a simplified/aggregated view after this conversation? IMO I don't think the DirectRunner should depend directly on any specific logging backend (at least, not in the compile or runtime scopes). I think it should depend on

Re: Proposal on porting PAssert away from aggregators

2017-03-30 Thread Dan Halperin
Yeah, this sounds really nice. It also seems to let runners do "whatever they want" -- write to files, switch to some magic but small-scale state, etc. On Thu, Mar 30, 2017 at 10:50 PM, Jean-Baptiste Onofré wrote: > Hi Pablo, > > it sounds a good idea ! > > Regards > JB > > > On 03/31/2017 12:12

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Dan Halperin
about whatever gets ingested into the pipeline at a specific time and > > don't care (up to the point of losing data) about correctness. > > I would be happy to hear more about your use case. > > > > > semantic, each unbounded IO should try its best to rest

Re: Style: how much testing for transform builder classes?

2017-03-21 Thread Dan Halperin
https://github.com/apache/beam/commit/b202548323b4d59b11bbdf06c99d0f99e6a947ef is one example where tests of feature Bar exist but did not discover bugs that could be introduced by builders. AutoValue likely alleviates many, but not all, of these concerns - as Ismael points out. On Tue, Mar 21, 2

Re: [DISCUSSION] using NexMark for Beam

2017-03-21 Thread Dan Halperin
Not a deep response, but this is awesome! We'd really like to have some good benchmarks, and I'm excited you're updating Nexmark. This will be great! On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot wrote: > Hi all, > > Ismael and I are working on upgrading the Nexmark implementation for Beam.

Vacation for a few weeks

2017-03-02 Thread Dan Halperin
Hey folks, I wanted to give you a heads-up that I'll be offline starting tomorrow through 20th March. I think I've handled most of the questions and pull requests and JIRA issues you've sent me, but I know the community will be happy to help with urgent issues in the rest. (I also will not be ab

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Dan Halperin
Note that even "unbounded pipeline in a streaming runner".waitUntilFinish() can return, e.g., if you cancel it or terminate it. It's totally reasonable for users to want to understand and handle these cases. +1 Dan On Thu, Mar 2, 2017 at 2:53 AM, Jean-Baptiste Onofré wrote: > +1 > > Good idea

Re: First stable release: version designation?

2017-03-01 Thread Dan Halperin
A large set of Beam users will be coming from the pre-Apache technologies (aka Google Cloud Dataflow, Scio). Because Dataflow was 1.0 before Beam started, there is a lot of pre-existing documentation, Stack Overflow, etc. that refers to version 1.0 to mean what is now a year-and-a-half old release.

Re: Beam join with double stream join key

2017-02-28 Thread Dan Halperin
Hi, It looks like you may have tried to attach an image or something, but it did not come through the mailing list. Can you please try again? This is what we see: https://lists.apache.org/thread.html/f4a1ce5291428a70ecd54d3eefff56daf2f32b7a558f575eddc3729e@%3Cdev.beam.apache.org%3E Dan On Tue,

Re: Release 0.6.0

2017-02-27 Thread Dan Halperin
Sounds great to me! On Mon, Feb 27, 2017 at 2:10 PM, Sourabh Bajaj < sourabhba...@google.com.invalid> wrote: > +1 for the new release > > On Mon, Feb 27, 2017 at 2:06 PM Davor Bonaci wrote: > > > +1 -- let's get it started! > > > > On Mon, Feb 27, 2017 at 2:01 PM, Ahmet Altay > > wrote: > > > >

Re: Enforcer Rule- JDK1.7 for Beam

2017-02-27 Thread Dan Halperin
I think there are a few separable questions: 1. Can the module itself be used with Java7 only, and can we enforce this? 2. When testing, can we use Java 8 dependencies and how? I think that the automated enforcements are useful to ensure property #1, assuming we want the module to work in Java7.

Re: Merge HadoopInputFormatIO and HDFSIO in a single module

2017-02-16 Thread Dan Halperin
n > HdfsIO's > > own class and method names. AvroHdfsFileSource etc would work just as > well > > with new IO. > > > > On Thu, Feb 16, 2017 at 8:17 AM, Dan Halperin > > > > > wrote: > > > > > (And I think renaming to HadoopIO doesn&

Re: Merge HadoopInputFormatIO and HDFSIO in a single module

2017-02-16 Thread Dan Halperin
Chiming in a bit late, but here's my 2 cents. HdfsFileSystem vs Hadoop*InputFormatIO is a red herring: * HdfsFileSystem is for file-format-specific, Beam-native, parsers of files. It will make TextIO, AvroIO, etc., work for files that happen to be located at hdfs:// URIs. * This is complementa

We've hit 2000 PRs!

2017-02-16 Thread Dan Halperin
eally empowering users to build portable, long-lived, fast data processing pipelines. Thanks everyone for making this community and keeping this project really fun :) Dan On Mon, Sep 26, 2016 at 2:47 PM, Dan Halperin wrote: > Hey folks! > > Just wanted to send out a note -- we've hit

Re: Better developer instructions for using Maven?

2017-02-10 Thread Dan Halperin
On Fri, Feb 10, 2017 at 7:42 AM, Kenneth Knowles wrote: > On Feb 10, 2017 07:36, "Dan Halperin" wrote: > > Before we added checkstyle it was under a minute. Now it's over five? > That's awful IMO > > > Checkstyle didn't cause all that, did it? >

Re: Better developer instructions for using Maven?

2017-02-10 Thread Dan Halperin
Regards > > >> JB > > >> > > >> On Feb 10, 2017, 07:51, at 07:51, Aviem Zur > > >wrote: > > >> >Can we consider adding rat-plugin and findbugs to the default verify > > >> >phase? > > >> >Currently they only run w

Re: Should you always have a separate PTransform class for a new transform?

2017-02-07 Thread Dan Halperin
I am generally persuaded to at least change my number to something like 0 :). These are pretty reasonable perspectives, especially pointing out that withSideInputs is pretty useless in Count ;) On Tue, Feb 7, 2017 at 10:04 PM, Kenneth Knowles wrote: > On Tue, Feb 7, 2017 at 8:43 PM, Eugene Kirp

Re: Should you always have a separate PTransform class for a new transform?

2017-02-07 Thread Dan Halperin
A little bit more inline: On Tue, Feb 7, 2017 at 5:15 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > I was auditing Beam for violations of PTransform style guide > https://beam.apache.org/contribute/ptransform-style-guide/ and came across > another style point that deser

Re: Should you always have a separate PTransform class for a new transform?

2017-02-07 Thread Dan Halperin
I'll agree with the "Cons" by referencing back to this thread: https://lists.apache.org/thread.html/caa8k_flvcmx+tyksxdmcxxe9y_zyohe4ovht9f2jb1wckob...@mail.gmail.com On Tue, Feb 7, 2017 at 5:15 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > I was auditing Beam for viol

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #2

2017-02-05 Thread Dan Halperin
+1 * I ran my own usual sanity check pipelines, which passed. * New to this Beam release, I also ran some additional Google-internal tests. * Verified module list: new modules are io-elasticsearch and io-mqtt ; and verified licenses for the direct, non-Apache dependencies in pom.xml * mvn apache-r

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-02-02 Thread Dan Halperin
t;>> On Tue, 31 Jan 2017 at 19:28 Kenneth Knowles >>>> wrote: >>>> >>>> I agree. -1 and let's do the smartest thing to undo the regression. >>>> >>>> Those two commits are not sufficient to restore late data dropping. >&

Re: Doesn't PAssertTest.runExpectingAssertionFailure need to call waitUntilFinish?

2017-01-31 Thread Dan Halperin
Hi Shen, Great question. The trick is that the `pipeline` object is an instance of TestPipeline [0], for which p.run() is the same as p.run().waitUntilFinish(). It might be documentationally better to use p.run().waitUntilFinish() to be consistent with real runners, or add a method to TestPipelin

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-31 Thread Dan Halperin
ssing suite of Jenkins jobs: >>>>> * https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/6870/ >>>>> * https://builds.apache.org/job/beam_PostCommit_Java_MavenInst >>>>> all/2474/ >>>>> * >>>>> >>>>> >>>>> htt

Re: Let's make Beam transforms comply with PTransform Style Guide

2017-01-30 Thread Dan Halperin
On Mon, Jan 30, 2017 at 7:56 PM, Dan Halperin wrote: > On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > >> Hello, >> >> The PTransform Style Guide is live >> https://beam.apache.org/contribute/ptransform-style-g

Re: Let's make Beam transforms comply with PTransform Style Guide

2017-01-30 Thread Dan Halperin
On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > The PTransform Style Guide is live > https://beam.apache.org/contribute/ptransform-style-guide/ - a natural > next > step is to audit Beam libraries for compliance and file JIRAs for places > that

Re: [VOTE] Apache Beam, version 0.5.0, release candidate #1

2017-01-30 Thread Dan Halperin
I am worried about https://issues.apache.org/jira/browse/BEAM-1346 for RC1 and would at least wait for resolution there before proceeding. On Mon, Jan 30, 2017 at 3:48 AM, Jean-Baptiste Onofré wrote: > Good catch for the PPMC, I'm upgrading the email template in the release > guide (it was a cop

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #2473

2017-01-30 Thread Dan Halperin
Hey folks, It looks like the python-sdk -> master merge went bad and, unfortunately, we have it configured to email anyone who ever contributed a commit to the merge, which I think devolves to "anyone who ever committed to that branch". I've disabled further emails in this job's configuration for

Re: TextIO binary file

2017-01-30 Thread Dan Halperin
Stas' comment is the right one. The "canonical" use of TextIO is using something like a TextualIntegerCoder, but that should almost certainly be replaced with TextIO.Read |

Re: Pipeline graph reflection

2017-01-29 Thread Dan Halperin
Thomas is working on this pretty explicitly. Beam needs this for the Runner/Fn APIs -- except, probably, the unique IDs will be numbers or hashes so that they are more useable than long strings. The code to check whether names are unique, etc., is actually in the SDK core right now. See, e.g., htt

Re: Consistent Placement

2017-01-27 Thread Dan Halperin
r these transforms. Dan > > > On Fri, Jan 27, 2017 at 11:30 AM Dan Halperin > > wrote: > > > Hi Jesse, can you specifically say which functions on Combine and Count > > you're thinking of? I believe these transforms are consistent with the > > "p

Re: Consistent Placement

2017-01-27 Thread Dan Halperin
Hi Jesse, can you specifically say which functions on Combine and Count you're thinking of? I believe these transforms are consistent with the "principle of least visibility" -- make nothing more public than it needs to be. Look at Combine.globally

Re: Better developer instructions for using Maven?

2017-01-25 Thread Dan Halperin
g >>>>> others to look at it. I think this should be our criteria (i.e. what >>>>> will a new but maven-savvy user run before pushing their code). >>>>> >>>>> As long as the pre-commit hooks still check everything I'm ok with >>>

Re: How to implement Timer in runner

2017-01-24 Thread Dan Halperin
Hi Jingsong, Sorry for the delayed response; this email ended up being misclassified by my mail server and I missed it. Maybe Kenn or Aljoscha has suggestions on how runners can best implement timers? Dan On Thu, Jan 19, 2017 at 9:55 PM, lzljs3620320 wrote: > Hi there, > I'm working on the bea

Re: Subscription to beam project

2017-01-23 Thread Dan Halperin
+original mailer, assuming he is not on dev@... On Sun, Jan 22, 2017 at 7:31 PM, Davor Bonaci wrote: > Welcome! Please check out the support page [1] with all mailing lists and > subscribe links. > > [1] https://beam.apache.org/get-started/support/ > > On Sat, Jan 21, 2017 at 11:59 PM, Ritesh Ka

Re: [VOTE] Merge Python SDK to the master branch

2017-01-20 Thread Dan Halperin
[X] +1, Merge python-sdk branch to master after the 0.5.0 release, and release it in the subsequent minor release. Thanks and woo! On Fri, Jan 20, 2017 at 12:00 PM, Jean-Baptiste Onofré wrote: > +1 to merge Python SDK after 0.5.0 release. > > Regards > JB > > > On 01/20/2017 06:03 PM, Ahmet Alt

Re: Runner-provided ValueProviders

2017-01-20 Thread Dan Halperin
I think this was initially motivated by BEAM-758. Copying from that issue: In the forthcoming runner API, a user will be able to save a pipeline to JSON and then run it repeatedly. Many pieces of code (e.g., BigQueryIO.Read or Write) rely o

Re: [DISCUSS] Python SDK status and next steps

2017-01-19 Thread Dan Halperin
I do not think that Python SDK yet meets the bar [1] for implementing the Beam model -- supporting Unbounded data is very important. That said, given the committed and sustained set of contributors, it generally makes sense to me to make an exception in anticipation of these features being fleshed

Re: Beam Fn API

2017-01-19 Thread Dan Halperin
"relatively little extra work" once the base APIs are implemented. On Thu, Jan 19, 2017 at 11:26 PM, Dan Halperin wrote: > This is an extremely ambitious part of the technical vision. I think it's > a lot of work, but well worth it -- Python-SDK-on-Java-runner with >

Re: Beam Fn API

2017-01-19 Thread Dan Halperin
This is an extremely ambitious part of the technical vision. I think it's a lot of work, but well worth it -- Python-SDK-on-Java-runner with relatively extra work? I don't care what the overhead is, this is making the impossible possible. On Thu, Jan 19, 2017 at 3:56 PM, Lukasz Cwik wrote: > I h

Re: Composite Types and the Runner API

2017-01-19 Thread Dan Halperin
skimmed doc and PR, +1. On Tue, Jan 17, 2017 at 4:26 PM, Lukasz Cwik wrote: > +1 since this brings us closer to a portability story. > > On Tue, Jan 17, 2017 at 3:10 PM, Jean-Baptiste Onofré > wrote: > > > +1 > > > > It makes sense. > > > > Thanks ! > > Regards > > JB > > > > > > On 01/17/2017

Re: Better developer instructions for using Maven?

2017-01-05 Thread Dan Halperin
to run mvn verify before commits. > > On Thu, Jan 5, 2017 at 9:25 AM Dan Halperin > wrote: > > > Several folks seem to have been confused after BEAM-246, where we moved > the > > "slow things" into the release profile. I've started a discussion with >

Better developer instructions for using Maven?

2017-01-05 Thread Dan Halperin
Several folks seem to have been confused after BEAM-246, where we moved the "slow things" into the release profile. I've started a discussion with https://github.com/apache/beam/pull/1740 to see if there are things we can do to fill these gaps. Would love folks to chime in with opinions. Dan On

Re: [VOTE] Release 0.4.0, release candidate #1

2016-12-29 Thread Dan Halperin
* mvn verify passes with and without network enabled * mvn apache-rat:check passes * mvn verify passes with -Prelease * release signature properly signed by JB (using the KEYS file as the keyring) * No binary files [one false positive empty file in ./runners/core-java/src/test/java/.placeholder we

Re: PCollection to PCollection Conversion

2016-12-29 Thread Dan Halperin
be used in development. Some of > our assumptions will break down when programmers aren't the ones using > Beam. I can see from the user traffic already that not everyone using Beam > is a programmer and they'll need classes like this to be productive. > On Thu, Dec 29, 2016

Re: Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Spark #574

2016-12-29 Thread Dan Halperin
Manual build by me in release testing -- I entered the wrong tag. Please ignore. On Thu, Dec 29, 2016 at 2:30 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See RunnableOnService_Spark/574/> > > --

Re: PCollection to PCollection Conversion

2016-12-29 Thread Dan Halperin
On Thu, Dec 29, 2016 at 1:36 PM, Jesse Anderson wrote: > I prefer JB's take. I think there should be three overloaded methods on the > class. I like Vikas' name ToString. The methods for a simple conversion > should be: > > ToString.strings() - Outputs the .toString() of the objects in the > PCol

Re: Running a Specific Test

2016-12-29 Thread Dan Halperin
n that command line worked. Thanks! > > On Thu, Dec 29, 2016 at 11:23 AM Stas Levin wrote: > > > I believe you raise a good point :) > > > > On Thu, Dec 29, 2016 at 9:00 PM Dan Halperin > > > wrote: > > > > > I suspect -- but may be wrong -- that

Re: Running a Specific Test

2016-12-29 Thread Dan Halperin
I suspect -- but may be wrong -- that the command line Stas gives will use the *installed* version of beam-sdks-java-core. If you are iterating on a @NeedsRunner test in the SDK core, you will either need to reinstall it over and over again, or use `-am` to force recompilation of the core. Here is

Re: PipelineResult state management

2016-12-27 Thread Dan Halperin
Right now we're using PipelineResult as a sort of hybrid Future (e.g, waitUntilFinish is like Future.get()), and I think Stas has a reasonable point that this is confusing. I know that figuring out the PipelineResult API for real is part of the pre-stable-release work -- sounds like the community
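
To make the "hybrid Future" comparison concrete, here is a small sketch of a result object that wraps a standard-library future. `SketchResult`, its state names, and `wait_until_finish` are illustrative assumptions modeled on the description above; they are not Beam's PipelineResult API.

```python
from concurrent.futures import CancelledError, ThreadPoolExecutor, TimeoutError


class SketchResult:
    """HYPOTHETICAL stand-in for a pipeline result backed by a Future."""

    def __init__(self, future):
        self._future = future

    def get_state(self):
        # Non-blocking status query, the part that is unlike Future.get().
        if self._future.cancelled():
            return "CANCELLED"
        if not self._future.done():
            return "RUNNING"
        return "FAILED" if self._future.exception() else "DONE"

    def wait_until_finish(self, timeout=None):
        # Blocking wait, the part that behaves like Future.get().
        try:
            self._future.result(timeout=timeout)
        except TimeoutError:
            return None  # still running when the timeout expired
        except CancelledError:
            pass  # terminal state is reported by get_state() below
        except Exception:
            pass  # job failure is reported as a state, not re-raised
        return self.get_state()
```

The sketch shows the two roles that get mixed: a blocking wait that returns a terminal state, and a status query that can be polled at any time. Whether failures should surface as a returned state or as a raised exception is exactly the kind of API question the message leaves open.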