Re: Logs meaning states

2015-06-29 Thread Stephan Ewen
Ah, thank you! - If you create a data set from a Java/Scala collection, this data source has the parallelism one. - The map function is chained to that source, so it runs with parallelism one as well. - To run it with a higher parallelism, use "setParallelism(...)" on the mapFunction, or call "

Re: Logs meaning states

2015-06-29 Thread Juan Fumero
Yeah, sorry. I would like to do something simple like this, but using Java Threads. DataSet> input = env.fromCollection(in); DataSet output = input.map(new HighWorkLoad()); ArrayList result = output.consume(); // ? like collect but in parallel, some operation that consumes the pipeline. return r

Re: Logs meaning states

2015-06-29 Thread Stephan Ewen
It is not quite easy to understand what you are trying to do. Can you post your program here? Then we can take a look and give you a good answer... On Mon, Jun 29, 2015 at 3:47 PM, Juan Fumero < juan.jose.fumero.alfo...@oracle.com> wrote: > Is there any other way to apply the function in paralle

Re: Logs meaning states

2015-06-29 Thread Juan Fumero
Is there any other way to apply the function in parallel and return the result to the client in parallel? Thanks Juan On Mon, 2015-06-29 at 15:01 +0200, Stephan Ewen wrote: > In general, avoid collect if you can. Collect brings data top the > client, where the computation is not parallel any mor

Re: Logs meaning states

2015-06-29 Thread Stephan Ewen
In general, avoid collect if you can. Collect brings data top the client, where the computation is not parallel any more. Try to do as much on the DataSet as possible. On Mon, Jun 29, 2015 at 2:58 PM, Juan Fumero < juan.jose.fumero.alfo...@oracle.com> wrote: > Hi Stephan, > so should I use ano

Re: Logs meaning states

2015-06-29 Thread Juan Fumero
Hi Stephan, so should I use another method instead of collect? It seems multithread is not working with this. Juan On Mon, 2015-06-29 at 14:51 +0200, Stephan Ewen wrote: > Hi Juan! > > > This is an artifact of a workaround right now. The actual collect() > logic happens in the flatMap() a

Re: Logs meaning states

2015-06-29 Thread Stephan Ewen
Hi Juan! This is an artifact of a workaround right now. The actual collect() logic happens in the flatMap() and the sink is a dummy that executes nothing. The flatMap writes the data to be collected to the "accumulator" that delivers it back. Greetings, Stephan On Mon, Jun 29, 2015 at 2:30 PM,

Logs meaning states

2015-06-29 Thread Juan Fumero
Hi, I am starting with Flink. I have tried to look for the documentation but I havent found it clear. I wonder the difference between these two states: FlatMap RUNNING vs DataSink RUNNIG. FlatMap is doing data any data transformation? Compilation? In which point is actually executing the f