Regarding source Parallelism and usage of coflatmap transformation

2016-05-05 Thread Biplob Biswas
Hi, I have 2 different questions that influence each other in a way. *1)* I am getting a stream of tuples from a data generator using the following statement: "env.addSource(new DataStreamGenerator(filePath));" This generator reads a line from the file and splits it into different

Writing Intermediates to disk

2016-05-05 Thread Paschek, Robert
Hi Mailing List, I want to write and read intermediates to/from disk. The following foo code snippet may illustrate my intention: public void mapPartition(Iterable tuples, Collector out) { for (T tuple : tuples) { if (Condition)
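The truncated snippet hints at a common pattern: iterate a partition's tuples and spill the ones matching a condition to a file. A plain-Java sketch of that pattern (no Flink dependency; the `keep:` condition and method names are illustrative, not from the original mail):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Plain-Java sketch of the pattern the snippet hints at: walk a partition's
// tuples and spill the ones matching a condition to a file on disk.
class SpillDemo {
    static Path spillMatching(List<String> tuples, Path target) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(target)) {
            for (String tuple : tuples) {
                if (tuple.startsWith("keep:")) {   // stand-in for "if (Condition)"
                    w.write(tuple);
                    w.newLine();
                }
            }
        }
        return target;
    }
}
```

In a real Flink `mapPartition`, the non-spilled tuples would typically still go to the `Collector`, and the file path would be derived from the subtask index so parallel instances do not clash.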

Re: Restart Flink in Yarn

2016-05-05 Thread Robert Metzger
Hi Dominic, I'm sorry that you ran into this issue. What do you mean by "flink streaming routes" ? Regarding the second question: "Now I want to restart these routes to continue their work from the last checkpoint. What can i do?" I think the feature you are looking for are savepoints:

Re: Flink + Kafka + Scalabuff issue

2016-05-05 Thread Robert Metzger
Hi Alex, thanks for the update. I'm happy to hear you were able to resolve the issue. How are the other ad-hoc streaming pipelines set up? Maybe these pipelines use a different threading model than Flink. In Flink, we often have many instances of the same serializer running in the same JVM. Maybe

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Robert Schmidtke
The TMs request the buffers in batches, so your 384 were requested, but only 200 were left in the pool. This means your overall pool size is too small. Here is the relevant section from the documentation:

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Robert Metzger
The default value of taskmanager.network.numberOfBuffers is 2048. I would recommend using a multiple of that value, for example 16384 (given that you have enough memory per TaskManager). I recommend checking out these slides I created a while ago. They explain what the network buffers are needed
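The Flink documentation of that era suggested sizing the buffer pool at roughly `slots-per-TM^2 * #TMs * 4`; combining that heuristic with Robert's "use a multiple of the default" advice gives a small sizing sketch (the rounding step is my combination of the two suggestions, not an official formula):

```java
// Heuristic sketch for sizing taskmanager.network.numberOfBuffers,
// based on the 1.0-era docs formula slotsPerTm^2 * taskManagers * 4,
// rounded up to a multiple of the default pool size as suggested in
// this thread. Treat as an estimate, not an exact rule for every version.
class BufferSizing {
    static final int DEFAULT_BUFFERS = 2048; // default numberOfBuffers

    static int recommended(int slotsPerTm, int taskManagers) {
        int raw = slotsPerTm * slotsPerTm * taskManagers * 4;
        // round up to the next multiple of the default
        return ((raw + DEFAUL_FIX(raw)) / DEFAULT_BUFFERS) * DEFAULT_BUFFERS;
    }

    private static int DEFAUL_FIX(int raw) { return DEFAULT_BUFFERS - 1; }
}
```

For the cluster in this thread (96 slots per TaskManager, 4 TaskManagers) this yields 96 * 96 * 4 * 4 = 147456 buffers, which shows why the stock 2048 was nowhere near enough for parallelism 384.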

Re: Accessing elements from DataStream

2016-05-05 Thread Robert Metzger
Yes. Here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#example-program On Thu, May 5, 2016 at 1:16 PM, Piyush Shrivastava wrote: > Hi Robert, > > Can you share an example where flatmap is used to access elements? > > Thanks and

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Punit Naik
Yes, I followed it and changed it to 298, but again it said the same thing. The only change was that it now said "required 298, but only 200 available". Why did it say that? On Thu, May 5, 2016 at 4:50 PM, Robert Metzger wrote: > Hi, > > I think you've chosen a good initial

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Robert Metzger
Hi, I think you've chosen a good initial value for the parallelism. The higher the parallelism, the more network buffers are needed. I would follow the recommendation from the exception and increase the number of network buffers. On Thu, May 5, 2016 at 11:23 AM, Punit Naik

Re: Accessing elements from DataStream

2016-05-05 Thread Piyush Shrivastava
Hi Robert, Can you share an example where flatmap is used to access elements? Thanks and Regards, Piyush Shrivastava http://webograffiti.com On Thursday, 5 May 2016 4:45 PM, Robert Metzger wrote: Hi, you can just use a flatMap() on a DataStream to access individual

Re: Accessing elements from DataStream

2016-05-05 Thread Robert Metzger
Hi, you can just use a flatMap() on a DataStream to access individual elements from a stream. On Thu, May 5, 2016 at 1:00 PM, Piyush Shrivastava wrote: > Hi all, > > Can we access individual elements from a DataStream through an iterator > like we can in a WindowedStream
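Flink's `FlatMapFunction<T, O>` hands the function one element at a time together with a `Collector` for emitting results. A dependency-free sketch of that shape (the `Collector` and `FlatMapFunction` interfaces below are local stand-ins mirroring Flink's, and `FlatMapDemo.run` is my illustrative driver):

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch of the FlatMapFunction/Collector access pattern:
// flatMap sees one element at a time and may emit zero or more results.
interface Collector<O> { void collect(O record); }

interface FlatMapFunction<T, O> { void flatMap(T value, Collector<O> out); }

class FlatMapDemo {
    // apply a flat-map function to each element, gathering emitted records
    static <T, O> List<O> run(List<T> input, FlatMapFunction<T, O> fn) {
        List<O> out = new ArrayList<>();
        for (T value : input) fn.flatMap(value, out::add);
        return out;
    }
}
```

In a real job the same function body would be passed to `DataStream.flatMap(...)`; the point is only that each element is accessible individually inside `flatMap`.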

Re: Any way for Flink Elasticsearch connector reflecting IP change of Elasticsearch cluster?

2016-05-05 Thread Robert Metzger
Hi, the Kafka connector is able to handle leader changes. On Wed, May 4, 2016 at 5:46 PM, Fabian Hueske wrote: > Sorry, I confused the mail threads. We're already on the user list :-) > Thanks for the suggestion. > > 2016-05-04 17:35 GMT+02:00 Fabian Hueske

Accessing elements from DataStream

2016-05-05 Thread Piyush Shrivastava
Hi all, Can we access individual elements from a DataStream through an iterator like we can in a WindowedStream with the apply function? I am able to access the elements of a WindowedStream using the apply function with the Iterable and Collector interfaces: val ds = ws.apply((K, W, input:
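The contrast the thread turns on: a window function receives a whole pane as an `Iterable`, while `flatMap` only ever sees one element. A plain-Java sketch of the window-side access pattern (no Flink; keys map to their pane's elements, and the "collector" is just the result map):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch (plain Java, no Flink) of the WindowFunction access pattern:
// each key's pane arrives as an Iterable that can be walked with a loop,
// mimicking ws.apply((key, window, input, out) -> ...).
class WindowApplyDemo {
    static Map<String, Integer> sumPanes(Map<String, List<Integer>> panes) {
        Map<String, Integer> out = new HashMap<>();
        panes.forEach((key, input) -> {
            int sum = 0;
            for (int v : input) sum += v;   // iterate the pane's elements
            out.put(key, sum);
        });
        return out;
    }
}
```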

Re: Regarding Broadcast of datasets in streaming context

2016-05-05 Thread Biplob Biswas
This is exactly what I am confused about. If I understand it correctly, each of the map functions in the co-flat-map would receive one tuple at a time, so if I have a DataStream of centroids, they would arrive one at a time on the partitions, and that would defeat the purpose.
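The usual answer to this concern is that a co-flat-map accumulates the centroids in state as they trickle in on one input, and uses the collected set whenever a data point arrives on the other input. A dependency-free sketch of that pattern (class and method names are illustrative, not Flink's API; a real `CoFlatMapFunction` would emit via a `Collector` and keep the list in managed state):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the co-flat-map pattern discussed here: centroids arrive one
// at a time on input 1 and are accumulated; data points on input 2 are
// matched against whatever centroids have been collected so far.
class CentroidJoiner {
    private final List<Double> centroids = new ArrayList<>();

    void flatMap1(double centroid) {          // centroid-stream input
        centroids.add(centroid);
    }

    double flatMap2(double point) {           // data-point input: nearest centroid
        double best = Double.NaN, bestDist = Double.POSITIVE_INFINITY;
        for (double c : centroids) {
            double d = Math.abs(point - c);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }
}
```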

Re: Flink - start-cluster.sh

2016-05-05 Thread Punit Naik
Yes. On Thu, May 5, 2016 at 3:04 PM, Flavio Pompermaier wrote: > Do you run the start-cluster.sh script with the same user having the ssh > passwordless login? > > > On Thu, May 5, 2016 at 11:03 AM, Punit Naik > wrote: > >> Okay, so it was a

Window

2016-05-05 Thread toletum
Hi! I want a window that should be fired every 5 seconds, whether or not there are events. env.socketTextStream("192.168.1.101", ) .map(new mapper()) .keyBy(0) .timeWindow(Time.seconds(5)) .apply(new MyRichWindowFunction()) It works if the window has

Re: Flink - start-cluster.sh

2016-05-05 Thread Flavio Pompermaier
Do you run the start-cluster.sh script with the same user having the ssh passwordless login? On Thu, May 5, 2016 at 11:03 AM, Punit Naik wrote: > Okay, so it was a configuration mistake on my part. but still for me the > start-cluster.sh command won't work. It only

How to choose the 'parallelism.default' value

2016-05-05 Thread Punit Naik
Hello, I was running a program with a 'parallelism.default' of 384, as I read in the documentation on Flink's official page that 'parallelism.default' is "the total number of CPUs in the cluster". I have four machines with 96 cores on each of them, so 96*4=384. But the program threw an error saying:

Re: Flink - start-cluster.sh

2016-05-05 Thread Punit Naik
Okay, so it was a configuration mistake on my part, but the start-cluster.sh command still won't work for me. It only starts the Jobmanager on the master node. Therefore I had to manually start Taskmanagers on every node, and then it worked fine. Anyone familiar with this issue? On Wed, May 4,

Re: Discussion about a Flink DataSource repository

2016-05-05 Thread Flavio Pompermaier
Hi Fabian, thanks for your detailed answer, as usual ;) I think that an external service is OK; actually, I wasn't aware of the TableSource interface. As you said, a utility to serialize and deserialize them would be very helpful and would ease this. However, registering metadata for a