Re: API behavior with data sinks (lazy) and eager operations

2015-01-27 Thread Max Michels
Let's make it clear that count/collect-type actions execute the plan up to that point (including the data sinks). From a user perspective, this seems most logical to me. The user might even rely on the data generated by the sinks. On Mon, Jan 19, 2015 at 11:46 AM, Fabian Hueske wrote: > Thi

Re: API behavior with data sinks (lazy) and eager operations

2015-01-19 Thread Fabian Hueske
This is a difficult question. A program might also later refer to some intermediate data set that would already have been computed if sinks are executed together with the count() call, and that would then need to be computed again. Also, what do we do with sinks that are not connected with the collected or counted

Re: API behavior with data sinks (lazy) and eager operations

2015-01-19 Thread Till Rohrmann
I agree with Ufuk that it depends on how much both subgraphs and also future subgraphs overlap. It is conceivable that the user will reuse subgraphs of an already computed data sink after calling collect(). Then we would also have to re-execute parts of the dataflow graph. I guess we easily find e

Re: API behavior with data sinks (lazy) and eager operations

2015-01-19 Thread Ufuk Celebi
I think this depends on how much the two subgraphs overlap. But in general, I agree that the first approach seems more desirable from the runtime view (multiple consumers at the branch point). On Mon, Jan 19, 2015 at 10:59 AM, Robert Metzger wrote: > I would also execute the sinks immediat

Re: API behavior with data sinks (lazy) and eager operations

2015-01-19 Thread Robert Metzger
I would also execute the sinks immediately. I think it's a corner case because the sinks are usually the last thing in a plan and all print() or collect() statements are earlier in the plan. print() should go to the client command line, yes. On Mon, Jan 19, 2015 at 1:42 AM, Stephan Ewen wrote: >

API behavior with data sinks (lazy) and eager operations

2015-01-18 Thread Stephan Ewen
Hi there! With the upcoming more interactive extensions to the API (operations that go back to the client from a program and need to be eagerly evaluated) we need to define how different actions should behave. Currently, nothing gets executed until the "env.execute()" call is made. That allows to
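
The two behaviors debated in this thread can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyEnv`, `ToyDataSet`, the `run_pending_sinks_on_eager_call` flag, and the `run` helper are hypothetical names invented for illustration and are not part of Flink's API. Sinks are registered lazily, `count()` is eager, and the flag decides whether an eager call also runs the sinks declared before it.

```python
class ToyEnv:
    """Toy stand-in for an execution environment (hypothetical, not Flink)."""

    def __init__(self, run_pending_sinks_on_eager_call):
        self.run_pending_sinks_on_eager_call = run_pending_sinks_on_eager_call
        self.pending_sinks = []  # sinks declared but not yet executed
        self.log = []            # what actually ran, in order

    def from_elements(self, *items):
        return ToyDataSet(self, lambda: list(items))

    def execute(self):
        # Run every sink that is still pending, then clear the list.
        for name, compute in self.pending_sinks:
            compute()
            self.log.append("sink:" + name)
        self.pending_sinks = []


class ToyDataSet:
    def __init__(self, env, compute):
        self.env = env
        self._compute = compute

    def map(self, fn):
        parent = self._compute
        return ToyDataSet(self.env, lambda: [fn(x) for x in parent()])

    def write(self, name):
        # Lazy: only registers the sink, nothing is computed yet.
        self.env.pending_sinks.append((name, self._compute))

    def count(self):
        # Eager: must produce a result for the client right away.
        if self.env.run_pending_sinks_on_eager_call:
            self.env.execute()
        n = len(self._compute())  # this toy recomputes; it shares no results
        self.env.log.append("count")
        return n


def run(run_pending_sinks_on_eager_call):
    env = ToyEnv(run_pending_sinks_on_eager_call)
    data = env.from_elements(1, 2, 3).map(lambda x: x * 2)
    data.write("out")  # sink declared before the eager call
    n = data.count()   # eager operation
    env.execute()      # explicit execute at the end of the program
    return n, env.log
```

With the flag set, the sink runs as part of the `count()` call and the final `execute()` has nothing left to do; without it, the sink waits until `execute()`. The toy recomputes the data set for every consumer, which is the recomputation cost several replies raise; a real runtime could share the work at the branch point.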