Re: Flink Kafka more consumers than partitions

2016-09-05 Thread Maximilian Michels
Thanks for letting us know! On Sat, Sep 3, 2016 at 12:42 PM, neo21 zerro wrote: > Hi all, > > It turns out that there were other factors influencing my performance tests. > (actually hbase) > Hence, more consumers than partitions in Flink was not the problem. > Thanks for the help! > > > On Wedne

Re: Flink Kafka more consumers than partitions

2016-09-03 Thread neo21 zerro
Hi all, It turns out that there were other factors influencing my performance tests. (actually hbase) Hence, more consumers than partitions in Flink was not the problem. Thanks for the help! On Wednesday, August 3, 2016 5:42 PM, neo21 zerro wrote: Hello, I've tried to increase the ne

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread neo21 zerro
Hello, I've tried to increase the network buffers but I didn't get any performance improvement. However, I have to re-run some tests just to be sure that the testing was not influenced by other factors. Will get back with more info. Thanks for the help for now. On Wednesday, August 3, 201

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread neo21 zerro
It's the default, ProcessingTime. On Wednesday, August 3, 2016 12:07 PM, Stephan Ewen wrote: Hi! Are you running on ProcessingTime or on EventTime? Thanks, Stephan On Wed, Aug 3, 2016 at 11:57 AM, neo21 zerro wrote: Hi guys, Thanks for getting back to me. So to clarify: Topolog

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread neo21 zerro
Hi Stephan, Yes, I use keyBy between the source and the window operator. Interesting theory, will try it out and get back to you :) Thanks! On Wednesday, August 3, 2016 12:14 PM, Stephan Ewen wrote: Do you use a keyBy() between the source and the window operator? One thing I can thi

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread Stephan Ewen
Do you use a keyBy() between the source and the window operator? One thing I can think of is the following: - With the higher source parallelism, you have more logical connections (each source rebalances across all window operators). - With source parallelism 20, you have 20 * 160 = 3200 logi
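The connection count in the message above can be reproduced with a small sketch. It assumes an all-to-all exchange between source and window subtasks (as after a keyBy()), and it assumes the window parallelism stays at 160 when the source parallelism is raised to 40; the function name is hypothetical and not part of any Flink API.

```python
def logical_connections(source_parallelism: int, window_parallelism: int) -> int:
    """Number of logical channels in an all-to-all exchange.

    Every source subtask can send to every window subtask, so the count
    is the product of the two parallelisms.
    """
    return source_parallelism * window_parallelism


# Figures from the thread: 20 sources feeding 160 window subtasks.
connections_20 = logical_connections(20, 160)

# Assumption: same window parallelism, source parallelism raised to 40.
connections_40 = logical_connections(40, 160)

print(connections_20, connections_40)
```

Doubling the source parallelism doubles the number of logical connections even though the extra source subtasks read no Kafka partitions.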

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread Stephan Ewen
Hi! Are you running on ProcessingTime or on EventTime? Thanks, Stephan On Wed, Aug 3, 2016 at 11:57 AM, neo21 zerro wrote: > Hi guys, > > Thanks for getting back to me. > > So to clarify: > Topology wise flink kafka source (does avro deserialization and small > map) -> window operator whi

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread neo21 zerro
Hi guys, Thanks for getting back to me. So to clarify: Topology wise flink kafka source (does avro deserialization and small map) -> window operator which does batching for 3 seconds -> hbase sink Experiments: 1. flink source: parallelism 40 (20 idle tasks) -> window operator: parallel

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread Sameer Wadkar
What is the parallelism of the sink or the operator which writes to the sinks in the first case? HBase puts are constrained by the following: 1. How your regions are distributed. Are you pre-splitting your regions for the table? Do you know the number of regions your HBase tables are split into?

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread Stephan Ewen
Hi! That is interesting, indeed. The idle sources should not create back pressure. In fact, sources cannot create back pressure, because back pressure pressures backwards and there is nothing backwards from the sources ;-) Do you also adjust the parallelism of the operator that interacts with HBas

Flink Kafka more consumers than partitions

2016-08-03 Thread neo21 zerro
Hello everybody, I'm using Flink Kafka consumer 0.8.x with Kafka 0.8.2 and Flink 1.0.3 on YARN. In Kafka I have a topic which has 20 partitions, and my Flink topology reads from Kafka (source) and writes to HBase (sink). When: 1. flink source has parallelism set to 40 (20 of the tasks are
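To illustrate why a source parallelism of 40 against a 20-partition topic leaves 20 subtasks idle, here is a minimal sketch. It assumes a simple round-robin assignment of partitions to subtasks purely for illustration; this is not the Flink Kafka consumer's actual assignment logic, and the function name is hypothetical.

```python
def assign_partitions(num_partitions: int, parallelism: int) -> dict:
    """Distribute partition ids over source subtasks round-robin.

    Illustrative assumption only: partition p goes to subtask p % parallelism.
    A partition is read by exactly one subtask, so any subtask beyond the
    partition count receives nothing.
    """
    assignment = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


# Setup from the thread: 20 partitions, source parallelism 40.
assignment = assign_partitions(20, 40)
idle_subtasks = [s for s, parts in assignment.items() if not parts]

print(len(idle_subtasks))
```

With parallelism equal to the partition count, every subtask in this sketch gets exactly one partition and none is idle.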