Regarding source Parallelism and usage of coflatmap transformation

2016-05-05 Thread Biplob Biswas
Hi, I have 2 different questions that influence each other in a way. *1)* I am getting a stream of tuples from a data generator using the following statement: "env.addSource(new DataStreamGenerator(filePath));" This generator reads a line from the file and splits it into different

Writing Intermediates to disk

2016-05-05 Thread Paschek, Robert
Hi Mailing List, I want to write and read intermediates to/from disk. The following foo code snippet may illustrate my intention: public void mapPartition(Iterable tuples, Collector out) { for (T tuple : tuples) { if (Condition)
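The truncated snippet hints at a common pattern: iterate a partition's tuples and spill the ones matching a condition to a file. A plain-Java sketch of that pattern (no Flink dependency; the `keep:` condition and method names are illustrative, not from the original mail):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Plain-Java sketch of the pattern the snippet hints at: walk a partition's
// tuples and spill the ones matching a condition to a file on disk.
class SpillDemo {
    static Path spillMatching(List<String> tuples, Path target) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(target)) {
            for (String tuple : tuples) {
                if (tuple.startsWith("keep:")) {   // stand-in for "if (Condition)"
                    w.write(tuple);
                    w.newLine();
                }
            }
        }
        return target;
    }
}
```

In a real Flink `mapPartition`, the non-spilled tuples would typically still go to the `Collector`, and the file path would be derived from the subtask index so parallel instances do not clash.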

Re: Restart Flink in Yarn

2016-05-05 Thread Robert Metzger
Hi Dominic, I'm sorry that you ran into this issue. What do you mean by "flink streaming routes" ? Regarding the second question: "Now I want to restart these routes to continue their work from the last checkpoint. What can i do?" I think the feature you are looking for are savepoints:

Re: Flink + Kafka + Scalabuff issue

2016-05-05 Thread Robert Metzger
Hi Alex, thanks for the update. I'm happy to hear you were able to resolve the issue. How are the other ad-hoc streaming pipelines set up? Maybe these pipelines use a different threading model than Flink. In Flink, we often have many instances of the same serializer running in the same JVM. Maybe

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Robert Schmidtke
The TMs request the buffers in batches, so your 384 were requested, but only 200 were left in the pool. This means your overall pool size is too small. Here is the relevant section from the documentation:

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Robert Metzger
The default value of taskmanager.network.numberOfBuffers is 2048. I would recommend using a multiple of that value, for example 16384 (given that you have enough memory per TaskManager). I recommend checking out these slides I created a while ago. They explain what the network buffers are needed
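The Flink documentation of that era suggested sizing the buffer pool at roughly `slots-per-TM^2 * #TMs * 4`; combining that heuristic with Robert's "use a multiple of the default" advice gives a small sizing sketch (the rounding step is my combination of the two suggestions, not an official formula):

```java
// Heuristic sketch for sizing taskmanager.network.numberOfBuffers,
// based on the 1.0-era docs formula slotsPerTm^2 * taskManagers * 4,
// rounded up to a multiple of the default pool size as suggested in
// this thread. Treat as an estimate, not an exact rule for every version.
class BufferSizing {
    static final int DEFAULT_BUFFERS = 2048; // default numberOfBuffers

    static int recommended(int slotsPerTm, int taskManagers) {
        int raw = slotsPerTm * slotsPerTm * taskManagers * 4;
        // round up to the next multiple of the default
        return ((raw + DEFAUL_FIX(raw)) / DEFAULT_BUFFERS) * DEFAULT_BUFFERS;
    }

    private static int DEFAUL_FIX(int raw) { return DEFAULT_BUFFERS - 1; }
}
```

For the cluster in this thread (96 slots per TaskManager, 4 TaskManagers) this yields 96 * 96 * 4 * 4 = 147456 buffers, which shows why the stock 2048 was nowhere near enough for parallelism 384.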

Re: Accessing elements from DataStream

2016-05-05 Thread Robert Metzger
Yes. Here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#example-program On Thu, May 5, 2016 at 1:16 PM, Piyush Shrivastava wrote: > Hi Robert, > > Can you share an example where flatmap is used to access elements? > > Thanks and

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Punit Naik
Yes, I followed it and changed it to 298, but again it said the same thing. The only change was that it now said "required 298, but only 200 available". Why did it say that? On Thu, May 5, 2016 at 4:50 PM, Robert Metzger wrote: > Hi, > > I think you've chosen a good initial

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Robert Metzger
Hi, I think you've chosen a good initial value for the parallelism. The higher the parallelism, the more network buffers are needed. I would follow the recommendation from the exception and increase the number of network buffers. On Thu, May 5, 2016 at 11:23 AM, Punit Naik

Re: Accessing elements from DataStream

2016-05-05 Thread Piyush Shrivastava
Hi Robert, Can you share an example where flatmap is used to access elements? Thanks and Regards, Piyush Shrivastava http://webograffiti.com On Thursday, 5 May 2016 4:45 PM, Robert Metzger wrote: Hi, you can just use a flatMap() on a DataStream to access individual

Re: Accessing elements from DataStream

2016-05-05 Thread Robert Metzger
Hi, you can just use a flatMap() on a DataStream to access individual elements from a stream. On Thu, May 5, 2016 at 1:00 PM, Piyush Shrivastava wrote: > Hi all, > > Can we access individual elements from a DataStream through an iterator > like we can in a WindowedStream
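Flink's `FlatMapFunction<T, O>` hands the function one element at a time together with a `Collector` for emitting results. A dependency-free sketch of that shape (the `Collector` and `FlatMapFunction` interfaces below are local stand-ins mirroring Flink's, and `FlatMapDemo.run` is my illustrative driver):

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch of the FlatMapFunction/Collector access pattern:
// flatMap sees one element at a time and may emit zero or more results.
interface Collector<O> { void collect(O record); }

interface FlatMapFunction<T, O> { void flatMap(T value, Collector<O> out); }

class FlatMapDemo {
    // apply a flat-map function to each element, gathering emitted records
    static <T, O> List<O> run(List<T> input, FlatMapFunction<T, O> fn) {
        List<O> out = new ArrayList<>();
        for (T value : input) fn.flatMap(value, out::add);
        return out;
    }
}
```

In a real job the same function body would be passed to `DataStream.flatMap(...)`; the point is only that each element is accessible individually inside `flatMap`.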

Re: Any way for Flink Elasticsearch connector reflecting IP change of Elasticsearch cluster?

2016-05-05 Thread Robert Metzger
Hi, the Kafka connector is able to handle leader changes. On Wed, May 4, 2016 at 5:46 PM, Fabian Hueske wrote: > Sorry, I confused the mail threads. We're already on the user list :-) > Thanks for the suggestion. > > 2016-05-04 17:35 GMT+02:00 Fabian Hueske

Accessing elements from DataStream

2016-05-05 Thread Piyush Shrivastava
Hi all, Can we access individual elements from a DataStream through an iterator like we can in a WindowedStream with the apply function? I am able to access the elements of a WindowedStream using the apply function with the Iterable and Collector interfaces: val ds = ws.apply((K, W, input:
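The contrast the thread turns on: a window function receives a whole pane as an `Iterable`, while `flatMap` only ever sees one element. A plain-Java sketch of the window-side access pattern (no Flink; keys map to their pane's elements, and the "collector" is just the result map):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch (plain Java, no Flink) of the WindowFunction access pattern:
// each key's pane arrives as an Iterable that can be walked with a loop,
// mimicking ws.apply((key, window, input, out) -> ...).
class WindowApplyDemo {
    static Map<String, Integer> sumPanes(Map<String, List<Integer>> panes) {
        Map<String, Integer> out = new HashMap<>();
        panes.forEach((key, input) -> {
            int sum = 0;
            for (int v : input) sum += v;   // iterate the pane's elements
            out.put(key, sum);
        });
        return out;
    }
}
```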

Re: Regarding Broadcast of datasets in streaming context

2016-05-05 Thread Biplob Biswas
This is exactly what I am confused about. If I understand it correctly, each of the map functions in the co-flat-map would receive one tuple at a time, so if I have a DataStream of centroids, they would arrive one at a time on the partitions, and that would defeat the purpose.
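The usual answer to this concern is that a co-flat-map accumulates the centroids in state as they trickle in on one input, and uses the collected set whenever a data point arrives on the other input. A dependency-free sketch of that pattern (class and method names are illustrative, not Flink's API; a real `CoFlatMapFunction` would emit via a `Collector` and keep the list in managed state):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the co-flat-map pattern discussed here: centroids arrive one
// at a time on input 1 and are accumulated; data points on input 2 are
// matched against whatever centroids have been collected so far.
class CentroidJoiner {
    private final List<Double> centroids = new ArrayList<>();

    void flatMap1(double centroid) {          // centroid-stream input
        centroids.add(centroid);
    }

    double flatMap2(double point) {           // data-point input: nearest centroid
        double best = Double.NaN, bestDist = Double.POSITIVE_INFINITY;
        for (double c : centroids) {
            double d = Math.abs(point - c);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }
}
```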

Re: Flink - start-cluster.sh

2016-05-05 Thread Punit Naik
Yes. On Thu, May 5, 2016 at 3:04 PM, Flavio Pompermaier wrote: > Do you run the start-cluster.sh script with the same user having the ssh > passwordless login? > > > On Thu, May 5, 2016 at 11:03 AM, Punit Naik > wrote: > >> Okay, so it was a

Window

2016-05-05 Thread toletum
Hi! I want a window that should be fired every 5 seconds, whether or not there are events. env.socketTextStream("192.168.1.101", ) .map(new mapper()) .keyBy(0) .timeWindow(Time.seconds(5)) .apply(new MyRichWindowFunction()) It works if the window has

Re: Flink - start-cluster.sh

2016-05-05 Thread Flavio Pompermaier
Do you run the start-cluster.sh script with the same user having the ssh passwordless login? On Thu, May 5, 2016 at 11:03 AM, Punit Naik wrote: > Okay, so it was a configuration mistake on my part. but still for me the > start-cluster.sh command won't work. It only

How to choose the 'parallelism.default' value

2016-05-05 Thread Punit Naik
Hello, I was running a program with a 'parallelism.default' of 384, as I read in the documentation on Flink's official page that 'parallelism.default' is "the total number of CPUs in the cluster". I have four machines with 96 cores on each of them, so 96*4=384. But the program threw an error saying:

Re: Flink - start-cluster.sh

2016-05-05 Thread Punit Naik
Okay, so it was a configuration mistake on my part, but the start-cluster.sh command still won't work for me. It only starts the Jobmanager on the master node. Therefore I had to manually start Taskmanagers on every node, and then it worked fine. Anyone familiar with this issue? On Wed, May 4,

Re: Discussion about a Flink DataSource repository

2016-05-05 Thread Flavio Pompermaier
Hi Fabian, thanks for your detailed answer, as usual ;) I think that an external service is OK; actually, I wasn't aware of the TableSource interface. As you said, a utility to serialize and deserialize them would be very helpful and would ease this. However, registering metadata for a