Should collect() and count() be treated as data sinks?

2015-04-02 Thread Felix Neutatz
Hi,

I have run the following program:

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
List l = Arrays.asList(new Tuple1(1L));
TypeInformation t = TypeInfoParser.parse("Tuple1");
DataSet<Tuple1<Long>> data = env.fromCollection(l, t);
long value = data.count();
System.out.prin

Re: Should collect() and count() be treated as data sinks?

2015-04-02 Thread Alexander Alexandrov
I have a similar issue here: I would like to run a dataflow up to a particular point and materialize (in memory) the intermediate result. Is this possible at the moment?

Regards,
Alex

2015-04-02 17:33 GMT+02:00 Felix Neutatz:
> Hi,
>
> I have run the following program:
>
> final ExecutionEnvir

Re: Should collect() and count() be treated as data sinks?

2015-04-02 Thread Maximilian Michels
Hi Felix,

count() defines a sink through the DiscardingOutputFormat. The error you're seeing occurs because the execution of the plan is already triggered within the count() method. When you call env.execute() again, the plan has already been cleared from the ExecutionEnvironment and it fails to execu
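The behavior described in this reply can be illustrated with a small toy model. This is only a sketch under stated assumptions: MiniEnv, MiniDataSet and EagerCountDemo are hypothetical stand-ins written for this illustration, not Flink classes, and the exception message is made up. The model only shows why an execute() call that follows an eager count() finds an empty plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an execution environment; NOT Flink's implementation.
class MiniEnv {
    // Sinks registered since the last execution (the pending "plan").
    private final List<Runnable> sinks = new ArrayList<>();

    void addSink(Runnable sink) {
        sinks.add(sink);
    }

    // Runs every pending sink and clears the plan; rejects an empty plan.
    void execute() {
        if (sinks.isEmpty()) {
            throw new IllegalStateException("plan is empty: no sinks to execute");
        }
        List<Runnable> pending = new ArrayList<>(sinks);
        sinks.clear();
        pending.forEach(Runnable::run);
    }
}

// Hypothetical stand-in for a data set bound to an environment.
class MiniDataSet<T> {
    private final MiniEnv env;
    private final List<T> elements;

    MiniDataSet(MiniEnv env, List<T> elements) {
        this.env = env;
        this.elements = elements;
    }

    // Eager: registers a sink that only counts, then triggers execution at once.
    long count() {
        long[] counter = {0L};
        env.addSink(() -> counter[0] = elements.size());
        env.execute(); // the plan is consumed here
        return counter[0];
    }
}

class EagerCountDemo {
    // Returns true if execute() is rejected after count() has consumed the plan.
    static boolean secondExecuteFails() {
        MiniEnv env = new MiniEnv();
        MiniDataSet<Long> data = new MiniDataSet<>(env, Arrays.asList(1L));
        data.count();
        try {
            env.execute();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second execute() rejected: " + secondExecuteFails());
    }
}
```

In this model the driver has two ways to avoid the failure: drop the trailing execute() call, or register another sink after count() so that the plan is no longer empty.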

Re: Should collect() and count() be treated as data sinks?

2015-04-02 Thread Aljoscha Krettek
In my opinion it should not be handled like print. The idea behind count()/collect() is that they immediately return the result which can then be used in further Flink operations. Right now, this is not properly/efficiently implemented but once we have support for intermediate results on this leve

Re: Should collect() and count() be treated as data sinks?

2015-04-06 Thread Stephan Ewen
count() and collect() need to immediately trigger an execution, because the driver program cannot proceed otherwise. They are "eager". Regular sinks are "lazy": they wait until the program is triggered anyway.

BTW: Should "print()" be also an "eager" statement? I think it needs to be, if we want

Re: Should collect() and count() be treated as data sinks?

2015-04-07 Thread Alexander Alexandrov
> Should "print()" be also an "eager" statement?

I would expect this to be the case as I can only imagine an implementation of print() via collect().

2015-04-06 14:37 GMT+02:00 Stephan Ewen:
> count() and collect() need to immediately trigger an execution, because the
> driver program cannot pr

Re: Should collect() and count() be treated as data sinks?

2015-04-07 Thread Maximilian Michels
On Mon, Apr 6, 2015 at 2:37 PM, Stephan Ewen wrote:
> BTW: Should "print()" be also an "eager" statement? I think it needs to be,
> if we want to print to the driver's std out

Yes, if we change print() to print on the Client, then it needs to execute eagerly.

On Thu, Apr 2, 2015 at 6:59 PM, Al

Re: Should collect() and count() be treated as data sinks?

2015-04-07 Thread Stephan Ewen
For the sake of prototyping, can you use a util that simply materializes the intermediate result in a file system (using typeInfo input and output formats)?

On Tue, Apr 7, 2015 at 6:21 PM, Maximilian Michels wrote:
> On Mon, Apr 6, 2015 at 2:37 PM, Stephan Ewen wrote:
>
> > BTW: Should "print(
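The prototyping workaround suggested in this reply, writing the intermediate result to a file and reading it back in a later step, could look roughly like the sketch below. It is an assumption-laden illustration: FileMaterializer is a hypothetical helper, and it uses plain Java serialization and a temporary file in place of Flink's TypeInfo-based input and output formats, which the thread mentions but does not show.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; stands in for a sink/source pair backed by a file system.
class FileMaterializer {

    // "Sink" side: writes the record count followed by each record to a temp file.
    static <T extends Serializable> Path write(List<T> records) throws IOException {
        Path file = Files.createTempFile("intermediate-", ".bin");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeInt(records.size());
            for (T record : records) {
                out.writeObject(record);
            }
        }
        return file;
    }

    // "Source" side: reads the records back in the order they were written.
    static <T> List<T> read(Path file) throws IOException, ClassNotFoundException {
        List<T> records = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                @SuppressWarnings("unchecked")
                T record = (T) in.readObject();
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        // First "job": materialize an intermediate result.
        Path file = write(Arrays.asList(1L, 2L, 3L));
        // Second "job": pick the result up again and continue from there.
        List<Long> restored = read(file);
        System.out.println(restored);
        Files.delete(file);
    }
}
```

The design trade-off is the one implied in the thread: the file round trip is slower than keeping the result in memory, but it needs no support for intermediate results in the runtime.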