@Thomas, @Aljoscha, @Kenneth, @JB, I was indeed talking about whether or
not waitUntilFinish(duration) is blocking on an unbounded-but-finite
collection.
Thanks for your comments, they make a lot of sense.
Etienne
On 10/05/2017 at 19:15, Kenneth Knowles wrote:
+1 to what Thomas said: At +infinity all data is droppable so,
philosophically, leaving the pipeline up is just burning CPU. Since
TestStream is a ways off for most runners, we should make sure we have
tests for other sorts of unbounded-but-finite collections to track which
runners this works on.
I think that generally this is actually less of a big deal than suggested,
for a pretty simple reason:
All bounded pipelines are expected to terminate when they are complete.
Almost all unbounded pipelines are expected to run until explicitly shut
down.
As a result, shutting down an unbounded
Hi,
A bit of clarification, the Flink Runner does not terminate a Job when the
timeout is reached in waitUntilFinish(Duration). When we reach the timeout we
simply return and keep the job running. I thought that was the expected
behaviour.
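The semantics described above can be sketched as a toy model in plain Java. This is illustrative only — the FakeJob class is invented for the example and is not Beam or Flink code: a bounded wait returns null when the timeout elapses and deliberately leaves the job running, matching the Flink runner behaviour described in this message.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy stand-in for a running job; invented for illustration only. */
class FakeJob {
  private final Instant finishesAt;
  private boolean cancelled = false;

  FakeJob(Duration runtime) {
    this.finishesAt = Instant.now().plus(runtime);
  }

  boolean isTerminal() {
    return cancelled || !Instant.now().isBefore(finishesAt);
  }

  void cancel() {
    cancelled = true;
  }

  /**
   * Models the semantics described in the thread: block until the job reaches
   * a terminal state or the timeout elapses. On timeout, return null and keep
   * the job running -- do NOT cancel it.
   */
  String waitUntilFinish(Duration timeout) {
    Instant deadline = Instant.now().plus(timeout);
    while (Instant.now().isBefore(deadline)) {
      if (isTerminal()) {
        return cancelled ? "CANCELLED" : "DONE";
      }
      try {
        Thread.sleep(5);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return null;
      }
    }
    return null; // timed out; the job is still running
  }
}
```

A caller that wants the job gone after the wait must pair this with an explicit cancel() when null comes back, which is exactly the design choice the runners in this thread differ on.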
Regarding job termination, I think it’s easy to change
OK, now I understand: you are talking about waitUntilFinish(), whereas I was
thinking about a simple run().
IMHO the Spark and Flink behavior sounds like the most logical one for a
streaming pipeline.
Regards
JB
On 05/10/2017 10:20 AM, Etienne Chauchot wrote:
Hi everyone,
I'm reopening this subject because, IMHO, it is important to unify
pipelines termination semantics in the model. Here are the differences I
have observed in streaming pipelines termination:
- direct runner: when the output watermarks of all of its PCollections
progress to
BEAM-593 is blocked by Flink issues:
- https://issues.apache.org/jira/browse/FLINK-2313: Change Streaming Driver
Execution Model
- https://issues.apache.org/jira/browse/FLINK-4272: Create a JobClient for job
control and monitoring
the second of which is largely a duplicate of the first.
Ted, the timeout is needed mostly for testing purposes.
AFAIK there is no easy way to express the fact that a source is "done" in a
Spark native streaming application.
Moreover, the Spark streaming "native" flow can either "awaitTermination()"
or "awaitTerminationOrTimeout(...)". If you
Why is the timeout needed for Spark ?
Thanks
> On Apr 18, 2017, at 3:05 AM, Etienne Chauchot wrote:
+1 on "runners really terminate in a timely manner to easily
programmatically orchestrate Beam pipelines in a portable way, you do
need to know whether
the pipeline will finish without thinking about the specific runner and
its options"
As an example, in Nexmark, we have streaming mode tests,
+1
To help integrate this we can start by adding `ValidatesRunner` tests with
a new category, run them only with runners that adhere to the rules
mentioned, and eventually run them with all runners.
On Fri, Mar 3, 2017 at 12:46 AM Amit Sela wrote:
> +1 on Eugene's words - this
OK, I'm glad everybody is in agreement on this. I raised this point because
we've been discussing implementing this behavior in the Dataflow streaming
runner, and I wanted to make sure that people are okay with it from a
conceptual point of view before proceeding.
On Thu, Mar 2, 2017 at 10:27 AM
Isn't this already the case? I think semantically it is an unavoidable
conclusion, so certainly +1 to that.
The DirectRunner and TestDataflowRunner both have this behavior already.
I've always considered that a streaming job running forever is just [very]
suboptimal shutdown latency :-)
Some
Note that even "unbounded pipeline in a streaming runner".waitUntilFinish()
can return, e.g., if you cancel it or terminate it. It's totally reasonable
for users to want to understand and handle these cases.
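The point that waitUntilFinish() can return for several reasons, and that callers should handle each, can be illustrated with a small helper. The helper and its state strings are invented for this sketch; in the real Beam API the states come from PipelineResult.State, and a bounded wait may return null on timeout.

```java
/**
 * Illustrative helper, not part of the Beam API: classify what a bounded
 * waitUntilFinish-style call came back with.
 */
final class WaitOutcome {
  static String describe(String stateOrNull) {
    if (stateOrNull == null) {
      // The wait timed out: the job is still running, so the caller must
      // decide whether to keep waiting or cancel explicitly.
      return "still running";
    }
    switch (stateOrNull) {
      case "DONE":
        return "terminated normally";
      case "FAILED":
        return "terminated with an error";
      case "CANCELLED":
        return "terminated by cancellation";
      default:
        return "non-terminal: " + stateOrNull;
    }
  }
}
```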
+1
Dan
On Thu, Mar 2, 2017 at 2:53 AM, Jean-Baptiste Onofré
wrote:
+1
Good idea !!
Regards
JB
On 03/02/2017 02:54 AM, Eugene Kirpichov wrote:
Raising this onto the mailing list from
https://issues.apache.org/jira/browse/BEAM-849
The issue came up: what does it mean for a pipeline to finish, in the Beam
model?
Note that I am deliberately not talking about
+1!
I think it's a very cool way to abstract away the batch vs. streaming
dissonance from the Beam model.
It does require that practitioners are *educated* to think this way as well.
I believe that nowadays the terms "batch" and "streaming" are so deeply
rooted, that they play a key role in the
+1
I think it's a fair claim that a PCollection is "done" when its watermark
reaches positive infinity, and then it's easy to claim that a Pipeline is
"done" when all of its PCollections are done. Completion is an especially
reasonable claim if we consider positive infinity to be an actual
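That definition can be written down as a toy model. This is plain Java invented for illustration, with +infinity modelled as Long.MAX_VALUE rather than the SDK's actual end-of-time timestamp: a collection is "done" when its watermark reaches +infinity, and a pipeline is "done" when every one of its collections is done.

```java
import java.util.Map;

/** Toy model of the "watermark at +infinity" definition; invented for illustration. */
final class WatermarkModel {
  /** Stand-in for positive infinity on the event-time axis. */
  static final long END_OF_TIME = Long.MAX_VALUE;

  /** A collection is done once its watermark has advanced to end of time. */
  static boolean collectionDone(long watermarkMillis) {
    return watermarkMillis == END_OF_TIME;
  }

  /** A pipeline is done when all of its collections are done. */
  static boolean pipelineDone(Map<String, Long> watermarksByCollection) {
    return watermarksByCollection.values().stream()
        .allMatch(WatermarkModel::collectionDone);
  }
}
```

Under this model a bounded source and an unbounded-but-finite source terminate the same way: each advances its watermark to END_OF_TIME once it has no more data, with no batch-vs-streaming special case.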
Raising this onto the mailing list from
https://issues.apache.org/jira/browse/BEAM-849
The issue came up: what does it mean for a pipeline to finish, in the Beam
model?
Note that I am deliberately not talking about "batch" and "streaming"
pipelines, because this distinction does not exist in the