Re: Getting elements from DataStream

2016-04-11 Thread Stephan Ewen
Hi! The equivalent does not yet exist on DataStream. Stephan On Mon, Apr 11, 2016 at 3:42 PM, subash basnet wrote: > Hello all, > > Getting certain number of elements from DataSet is possible as below: > *DataSet centroids = newDataSet.map(new >

Re: Limit buffer size for a job

2016-04-11 Thread Stephan Ewen
Hi! Ufuk's suggestion explains how to buffer less between Flink operators. Is that what you were looking for, or are you looking for a way to fetch more fine grained in the source from the message queue? What type of source are you using? Greetings, Stephan On Mon, Apr 11, 2016 at 5:02 PM,

Re: Limit buffer size for a job

2016-04-11 Thread Ufuk Celebi
Hey Andrew, take a look at this here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/index.html#controlling-latency Does this help? – Ufuk On Thu, Apr 7, 2016 at 3:04 PM, Andrew Ge Wu wrote: > Hi guys > > We have a prioritized queue, where

Getting elements from DataStream

2016-04-11 Thread subash basnet
Hello all, Getting certain number of elements from DataSet is possible as below: *DataSet centroids = newDataSet.map(new TupleCentroidConverter()).first(3);* How could I get elements as above in DataStream *DataStream centroids = newDataStream.map(new TupleCentroidConverter()).???* Best

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Robert Metzger
Flink's DataStream API also allows reading files from disk (local, hdfs, etc.). So you don't have to set up Kafka to make this work (If you have it already, you can of course use it). On Mon, Apr 11, 2016 at 11:08 AM, Ufuk Celebi wrote: > On Mon, Apr 11, 2016 at 10:26 AM, Raul

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Ufuk Celebi
On Mon, Apr 11, 2016 at 10:26 AM, Raul Kripalani wrote: > Would appreciate the feedback of the community. Even if it's to inform that > currently this iterative, batch, windowed approach is not possible, that's > ok! Hey Raul! What you describe should work with Flink. This is

Re: varying results: local VS cluster

2016-04-11 Thread Stephan Ewen
Just to make sure: Most numeric programs produce varying results across different execution. If the algorithm is correct, they should converge towards the same solution, but it is very common that the exact solution differs. On Mon, Apr 11, 2016 at 10:16 AM, Aljoscha Krettek

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Raul Kripalani
Hello, Perhaps the description of use case wasn't clear enough? Please let me know. Would appreciate the feedback of the community. Even if it's to inform that currently this iterative, batch, windowed approach is not possible, that's ok! Cheers, *Raúl Kripalani* PMC & Committer @ Apache

Re: varying results: local VS cluster

2016-04-11 Thread Aljoscha Krettek
Hi, could you please provide a minimal example input and maybe also the output for parallelism=5 and parallelism=1 so that we can check. -- aljoscha On Mon, 4 Apr 2016 at 09:52 Lydia Ickler wrote: > Hi all, > > I have an issue regarding execution on 1 machine VS 5

Re: BulkIteration and BroadcastVariables

2016-04-11 Thread Aljoscha Krettek
Hi, it is not possible to change broadcast variables. Internally they are also just a dataset that get's streamed through on an additional input of an operator. -- aljoscha On Wed, 30 Mar 2016 at 17:34 Lydia Ickler wrote: > Hi all, > I have a question regarding the