Re: Pipeline termination in the unified Beam model

2017-05-23 Thread Etienne Chauchot
@Thomas, @Aljoscha, @Kenneth, @JB, I was indeed talking about whether or not waitUntilFinish(duration) is blocking on an unbounded-but-finite collection. Thanks for your comments, they make a lot of sense. Etienne Le 10/05/2017 à 19:15, Kenneth Knowles a écrit : +1 to what Thomas said: At

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Kenneth Knowles
+1 to what Thomas said: At +infinity all data is droppable so, philosophically, leaving the pipeline up is just burning CPU. Since TestStream is a ways off for most runners, we should make sure we have tests for other sorts of unbounded-but-finite collections to track which runners this works on.

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Thomas Groh
I think that generally this is actually less of a big deal than suggested, for a pretty simple reason: All bounded pipelines are expected to terminate when they are complete. Almost all unbounded pipelines are expected to run until explicitly shut down. As a result, shutting down an unbounded

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Aljoscha Krettek
Hi, A bit of clarification, the Flink Runner does not terminate a Job when the timeout is reached in waitUntilFinish(Duration). When we reach the timeout we simply return and keep the job running. I thought that was the expected behaviour. Regarding job termination, I think it’s easy to change

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Jean-Baptiste Onofré
OK, now I understand: you are talking about waitUntilFinish(), whereas I was thinking about a simple run(). IMHO spark and flink sound like the most logic behavior for a streaming pipeline. Regards JB On 05/10/2017 10:20 AM, Etienne Chauchot wrote: Hi everyone, I'm reopening this subject

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Etienne Chauchot
Hi everyone, I'm reopening this subject because, IMHO, it is important to unify pipelines termination semantics in the model. Here are the differences I have observed in streaming pipelines termination: - direct runner: when the output watermarks of all of its PCollections progress to

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Aljoscha Krettek
BEAM-593 is blocked by Flink issues: - https://issues.apache.org/jira/browse/FLINK-2313: Change Streaming Driver Execution Model - https://issues.apache.org/jira/browse/FLINK-4272: Create a JobClient for job control and monitoring where the second is kind of a duplicate of the first one.

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Stas Levin
Ted, the timeout is needed mostly for testing purposes. AFAIK there is no easy way to express the fact a source is "done" in a Spark native streaming application. Moreover, the Spark streaming "native" flow can either "awaitTermination()" or "awaitTerminationOrTimeout(...)". If you

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Ted Yu
Why is the timeout needed for Spark ? Thanks > On Apr 18, 2017, at 3:05 AM, Etienne Chauchot wrote: > > +1 on "runners really terminate in a timely manner to easily programmatically > orchestrate Beam pipelines in a portable way, you do need to know whether > the pipeline

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Etienne Chauchot
+1 on "runners really terminate in a timely manner to easily programmatically orchestrate Beam pipelines in a portable way, you do need to know whether the pipeline will finish without thinking about the specific runner and its options" As an example, in Nexmark, we have streaming mode tests,

Re: Pipeline termination in the unified Beam model

2017-04-16 Thread Aviem Zur
+1 To help integrate this we can start by adding `ValidatesRunner` tests with a new category and run it only with runners which adhere to the rules mentioned, and eventually in all runners. On Fri, Mar 3, 2017 at 12:46 AM Amit Sela wrote: > +1 on Eugene's words - this

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Eugene Kirpichov
OK, I'm glad everybody is in agreement on this. I raised this point because we've been discussing implementing this behavior in the Dataflow streaming runner, and I wanted to make sure that people are okay with it from a conceptual point of view before proceeding. On Thu, Mar 2, 2017 at 10:27 AM

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Kenneth Knowles
Isn't this already the case? I think semantically it is an unavoidable conclusion, so certainly +1 to that. The DirectRunner and TestDataflowRunner both have this behavior already. I've always considered that a streaming job running forever is just [very] suboptimal shutdown latency :-) Some

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Dan Halperin
Note that even "unbounded pipeline in a streaming runner".waitUntilFinish() can return, e.g., if you cancel it or terminate it. It's totally reasonable for users to want to understand and handle these cases. +1 Dan On Thu, Mar 2, 2017 at 2:53 AM, Jean-Baptiste Onofré wrote:

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Jean-Baptiste Onofré
+1 Good idea !! Regards JB On 03/02/2017 02:54 AM, Eugene Kirpichov wrote: Raising this onto the mailing list from https://issues.apache.org/jira/browse/BEAM-849 The issue came up: what does it mean for a pipeline to finish, in the Beam model? Note that I am deliberately not talking about

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Stas Levin
+1! I think it's a very cool way to abstract away the batch vs. streaming dissonance from the Beam model. It does require that practitioners are *educated* to think this way as well. I believe that nowadays the terms "batch" and "streaming" are so deeply rooted, that they play a key role in the

Re: Pipeline termination in the unified Beam model

2017-03-01 Thread Thomas Groh
+1 I think it's a fair claim that a PCollection is "done" when it's watermark reaches positive infinity, and then it's easy to claim that a Pipeline is "done" when all of its PCollections are done. Completion is an especially reasonable claim if we consider positive infinity to be an actual

Pipeline termination in the unified Beam model

2017-03-01 Thread Eugene Kirpichov
Raising this onto the mailing list from https://issues.apache.org/jira/browse/BEAM-849 The issue came up: what does it mean for a pipeline to finish, in the Beam model? Note that I am deliberately not talking about "batch" and "streaming" pipelines, because this distinction does not exist in the