Re: [Discussion] Query Regarding Operator chaining

2016-07-14 Thread Robert Metzger
Aljoscha is right. Multiple consumers in the same consumer group cannot read from the same partition. You'll need to create a Kafka topic with more partitions to get higher parallelism. On Wed, Jul 6, 2016 at 10:45 AM, Aljoscha Krettek wrote: > Hi, > unfortunately the reading of one Kafka part

Re: [Discussion] Query Regarding Operator chaining

2016-07-06 Thread Aljoscha Krettek
Hi, unfortunately the reading of one Kafka partition cannot be split among several parallel instances of the Kafka source. So if you have only 2 partitions, your reading parallelism is limited to 2. You are right that this can lead to bad performance and underutilization. The only solution I see
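
The limit described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `PartitionAssignmentSketch` and the modulo assignment rule are hypothetical and are not taken from Flink's Kafka connector. The only point it shows is that a partition is owned by exactly one source subtask, so the number of busy subtasks can never exceed the number of partitions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Flink's real assignment logic):
 * each Kafka partition is owned by exactly one source subtask.
 */
final class PartitionAssignmentSketch {

    /** Assigns partition p to subtask (p % parallelism); an assumed scheme for illustration. */
    static List<List<Integer>> assign(int partitions, int parallelism) {
        List<List<Integer>> bySubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            bySubtask.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            bySubtask.get(p % parallelism).add(p);
        }
        return bySubtask;
    }

    /** Counts the subtasks that own at least one partition and therefore read data. */
    static int activeSubtasks(int partitions, int parallelism) {
        int active = 0;
        for (List<Integer> owned : assign(partitions, parallelism)) {
            if (!owned.isEmpty()) {
                active++;
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // 2 partitions, source parallelism 56: only 2 subtasks have anything to read.
        System.out.println(activeSubtasks(2, 56));
        // 56 partitions, source parallelism 56: every subtask owns one partition.
        System.out.println(activeSubtasks(56, 56));
    }
}
```

In this model the remaining 54 subtasks in the first case stay idle, which matches the underutilization discussed in the thread.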

Re: [Discussion] Query Regarding Operator chaining

2016-07-05 Thread Vinay Patil
Hi, the rebalance actually distributes the data to all the task managers, and now all TMs are being utilized. You were right, I am seeing two boxes (tasks) now. I have one question regarding the task slots: for the source the parallelism is set to 56, now when we see on the UI and click on source

Re: [Discussion] Query Regarding Operator chaining

2016-07-04 Thread Vinay Patil
Thanks a lot guys, this helps me understand better. Regards, Vinay Patil On Mon, Jul 4, 2016 at 8:43 PM, Stephan Ewen wrote: > Just to be sure: Each *subtask* has one thread - so for each task, there > are as many parallel threads (distributed across nodes) as your parallelism > indicates. > > F

Re: [Discussion] Query Regarding Operator chaining

2016-07-04 Thread Stephan Ewen
Just to be sure: Each *subtask* has one thread - so for each task, there are as many parallel threads (distributed across nodes) as your parallelism indicates. For most cases, having long chains and then a higher parallelism is a good choice. Cases where individual functions (MapFunction, etc.) do

Re: [Discussion] Query Regarding Operator chaining

2016-07-04 Thread Aljoscha Krettek
Hi, chaining is useful to minimize communication overhead. But in your case you might benefit more from having good cluster utilization. There seems to be a tradeoff. Maybe you can run some easy tests to see how it behaves for you. Cheers, Aljoscha On Mon, 4 Jul 2016 at 16:28 Vinay Patil wrote:

Re: [Discussion] Query Regarding Operator chaining

2016-07-04 Thread Vinay Patil
Thanks. So is operator chaining useful in terms of utilizing the resources, or should we keep chaining to a minimum, say 3-4 operators, and disable it beyond that? I am worried because I am seeing all the operators in one box on the Flink UI. Regards, Vinay Patil On Mon, Jul 4, 2016 at 7:13 PM, Aljo

Re: [Discussion] Query Regarding Operator chaining

2016-07-04 Thread Aljoscha Krettek
Hi, this is true, yes. If the number of Kafka partitions is less than the parallelism then some of the sources might not be utilized. If you insert a rebalance after the sources you should be able to utilize all the downstream operations equally. Cheers, Aljoscha On Mon, 4 Jul 2016 at 11:13 Vinay
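
The effect of the rebalance suggested above can be sketched with a small simulation. This is a hedged illustration only: `RebalanceSketch`, `forward`, and `roundRobin` are hypothetical names, the record counts are made up, and the code models round-robin distribution in plain Java rather than calling any Flink API.

```java
/**
 * Toy simulation (hypothetical, no Flink dependency) comparing a one-to-one
 * connection with round-robin redistribution after the sources.
 */
final class RebalanceSketch {

    /** One-to-one forwarding: downstream subtask i receives only what source i produced. */
    static long[] forward(long[] recordsPerSource) {
        return recordsPerSource.clone();
    }

    /** Round-robin: every source cycles through all downstream subtasks, one record at a time. */
    static long[] roundRobin(long[] recordsPerSource, int downstreamParallelism) {
        long[] received = new long[downstreamParallelism];
        for (long records : recordsPerSource) {
            int next = 0; // each source keeps its own position in the cycle
            for (long r = 0; r < records; r++) {
                received[next]++;
                next = (next + 1) % downstreamParallelism;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // Assumed scenario: 8 source subtasks, but only 3 own a Kafka partition.
        long[] produced = {400, 400, 400, 0, 0, 0, 0, 0};
        System.out.println(java.util.Arrays.toString(forward(produced)));
        System.out.println(java.util.Arrays.toString(roundRobin(produced, 8)));
    }
}
```

Without redistribution, five of the eight downstream subtasks in this scenario receive nothing. With round-robin, each of the eight receives 150 records, at the cost of sending records between subtasks instead of keeping them in one chain.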

Re: [Discussion] Query Regarding Operator chaining

2016-07-04 Thread Vinay Patil
Just an update: the task will be executed by multiple threads, my bad, I asked the wrong way. Can you please clarify the other points? Out of 8 nodes, only 3 are being utilized for reading the data from Kafka. Does that mean the number of Kafka partitions is too low? What if we use rescal

[Discussion] Query Regarding Operator chaining

2016-07-01 Thread Vinay Patil
Hi, According to the documentation: "Each task is executed by one thread. Chaining operators together into tasks is a useful optimization: it reduces the overhead of thread-to-thread handover and buffering, and increases overall throughput while decreasing latency." So does it mean that the