Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Etienne Chauchot
Hi everyone, I'm reopening this subject because, IMHO, it is important to unify pipelines termination semantics in the model. Here are the differences I have observed in streaming pipelines termination: - direct runner: when the output watermarks of all of its PCollections progress to +infin

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Jean-Baptiste Onofré
Hi Etienne, First, the termination vs triggering wording is pretty important. In streaming, a pipeline basically never terminates. Maybe I don't understand what you mean, but, taking the direct runner for example, the pipeline doesn't terminate (it's what I just tested). IMHO, all behaviors

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Jean-Baptiste Onofré
OK, now I understand: you are talking about waitUntilFinish(), whereas I was thinking about a simple run(). IMHO spark and flink sound like the most logic behavior for a streaming pipeline. Regards JB On 05/10/2017 10:20 AM, Etienne Chauchot wrote: Hi everyone, I'm reopening this subject be

Re: [DISCUSS] Remove TimerInternals.deleteTimer(*) and Timer.cancel()

2017-05-10 Thread Aljoscha Krettek
Thanks for taking care of that, Kenn! > On 10. May 2017, at 05:45, Kenneth Knowles wrote: > > I think there's been adequate time for at least a quick comment like "hey, > I use that [in Beam or similar] and I need it!" > > Filed https://issues.apache.org/jira/browse/BEAM-2245 and will address i

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Aljoscha Krettek
Hi, A bit of clarification, the Flink Runner does not terminate a Job when the timeout is reached in waitUntilFinish(Duration). When we reach the timeout we simply return and keep the job running. I thought that was the expected behaviour. Regarding job termination, I think it’s easy to change

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Thomas Groh
I think that generally this is actually less of a big deal than suggested, for a pretty simple reason: All bounded pipelines are expected to terminate when they are complete. Almost all unbounded pipelines are expected to run until explicitly shut down. As a result, shutting down an unbounded pip

Direct runner doesn't seem to finalize checkpoint "quickly"

2017-05-10 Thread Jean-Baptiste Onofré
Hi Beamers, I'm working on some fixes in the JmsIO and MqttIO. Those two IOs behave in a similar way on the reading side: - they consume messages from a JMS or MQTT broker - the "pending" messages are stored in the checkpoint mark. When a new message is added to the checkpoint, we compare the

Re: Direct runner doesn't seem to finalize checkpoint "quickly"

2017-05-10 Thread Thomas Groh
I'm going to start with number two, because it's got an easy answer: When performing an unbounded read, the DirectRunner will finalize a checkpoint after it has completed a subsequent read from that split where at least one element was read. A bounded read from an unbounded source will never be fin

Re: Direct runner doesn't seem to finalize checkpoint "quickly"

2017-05-10 Thread Jean-Baptiste Onofré
Thanks Thomas, Let me try what you are proposing. I keep you posted. Regards JB On 05/10/2017 06:33 PM, Thomas Groh wrote: I'm going to start with number two, because it's got an easy answer: When performing an unbounded read, the DirectRunner will finalize a checkpoint after it has completed

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Kenneth Knowles
+1 to what Thomas said: At +infinity all data is droppable so, philosophically, leaving the pipeline up is just burning CPU. Since TestStream is a ways off for most runners, we should make sure we have tests for other sorts of unbounded-but-finite collections to track which runners this works on.

Re: Help needed with WordCount commands

2017-05-10 Thread Hadar Hod
Thanks to everyone who helped out with this PR[1]. We still need to verify the directions for the Apex, Flink, and Spark runners. If folks could take a look and just comment on the PR letting me know if the directions are correct, that'd be really great. Otherwise, I'll submit the PR with TODOs for

[PROPOSAL] Apache Hive connector

2017-05-10 Thread Madhusudan Borkar
Hi all, Thank you for your response to the earlier proposal. Taking into account all the suggestions, we are making a new proposal for Hive connector. Please, let us know your feedback. [1] https://docs.google.com/document/d/1aeQRLXjVr38Z03_zWkHO9YQhtnj0jHoCfhsSNm-wxtA/edit?usp=sharing [2] https:

Re: [PROPOSAL] Running Splittable DoFn via Source API

2017-05-10 Thread Eugene Kirpichov
Hi, Aljoscha - can you clarify your concern? Basically, previously the plan was: 1. Wait for runners to fully support SDF (hard) 2. Implement existing sources as SDF 3. Implement new APIs and now it's: 0. Implement adapter (easy) 1. Implement existing sources as SDF 2. Wait for runners to have b

Re: Towards a spec for robust streaming SQL, Part 1

2017-05-10 Thread Tyler Akidau
On Tue, May 9, 2017 at 3:06 PM Fabian Hueske wrote: > Hi Tyler, > > thank you very much for this excellent write-up and the super nice > visualizations! > You are discussing a lot of the things that we have been thinking about as > well from a different perspective. > IMO, yours and the Flink mod

First stable release: release candidate #2

2017-05-10 Thread Davor Bonaci
The release candidate #2 for the version 2.0.0 has been built. I believe this is a pretty good candidate for release -- we have (basically) cleared the entire list of blocking changes [8]! I'd like to ask everyone to give this candidate a try -- this is the last opportunity we have to fix any cri

Re: First stable release: release candidate #1

2017-05-10 Thread Davor Bonaci
This RC is now obsolete -- see RC #2 information in a separate thread [1]. Davor [1] https://lists.apache.org/thread.html/f4f13f8b4768d0597e7f0c60a0b4d4358c4b8cbc4af86f17f58a1a4d@%3Cdev.beam.apache.org%3E On Mon, May 8, 2017 at 1:59 PM, Davor Bonaci wrote: > The release candidate #1 for the ve

Re: First stable release: Acceptance criteria

2017-05-10 Thread Davor Bonaci
Just a quick remainder to consider to consider contributing here. We are now at 6 criteria -- thanks! On Tue, May 9, 2017 at 2:29 AM, Aljoscha Krettek wrote: > Thanks for starting this document! > > I added a criterion and also verified it on the current RC. > > Best, > Aljoscha > > > On 8. May