In my tests, I am using around 24 consumer groups. I never call
consumer.close() or consumer.unsubscribe() until the application is shutting
down.

So the consumers never leave, but new consumer instances do get created as the
parallel requests pile up. Also, I am reusing consumer instances if they are
idle (i.e. not serving any consume request). So with 9 partitions, I do 9
consume requests in parallel every second under the same consumer group.

So to summarize, I have the following test setup: 3 Kafka brokers, 2 Zookeeper
nodes, 1 topic, 9 partitions, 24 consumer groups and 9 consume requests at a
time.
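
A minimal sketch of the reuse scheme described above, with no Kafka
dependency. IdlePool, acquire(), release() and createdCount() are hypothetical
names used only for illustration, and the pooled type T stands in for a
consumer instance. An idle instance is handed out if one exists; otherwise a
new one is created.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical pool: reuse an idle instance if one exists, otherwise create a new one. */
final class IdlePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final AtomicInteger created = new AtomicInteger();
    private final Supplier<T> factory;

    IdlePool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Take an idle instance, or create one when every existing instance is busy. */
    T acquire() {
        T instance = idle.poll();
        if (instance == null) {
            created.incrementAndGet();
            instance = factory.get();
        }
        return instance;
    }

    /** Mark an instance idle again once its consume request has finished. */
    void release(T instance) {
        idle.offer(instance);
    }

    /** Total number of instances created so far. */
    int createdCount() {
        return created.get();
    }
}
```

With subscribe(), every instance the factory creates is a new member of the
group, so each creation would be expected to trigger a rebalance for that
group. createdCount() is a cheap way to see how often that happens during a
test run.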


________________________________
From: Dana Powers <dana.pow...@gmail.com>
Sent: 19 June 2016 10:45
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

Is your test reusing a group name? And if so, are your consumer instances
gracefully leaving? This may cause subsequent 'rebalance' operations to
block until those old consumers check in or the session timeout happens
(30 secs).

-Dana
On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com>
wrote:

> I am using the group management feature of Kafka 0.9 to handle partition
> assignment to consumer instances. I use the subscribe() API to subscribe to
> the topic I am interested in reading data from. I have an environment
> where I have 3 Kafka brokers with a couple of Zookeeper nodes. I created
> a topic with 9 partitions. The performance tests attempt to send 9
> parallel poll() requests to the Kafka brokers every second. The results
> show that each poll() operation takes around 30 seconds the first time
> it polls and returns 0 records. Also, when I print the partition
> assignment of this consumer instance, I see no partitions assigned to it.
> The next poll() does return quickly (~10-20 ms) with data and some
> partitions assigned to it.
>
> With each consumer taking 30 seconds, the performance tests report very
> low throughput, since I run the tests for around 1000 seconds, out of which
> I produce messages on the topic for the complete duration and I start the
> parallel consume requests after 400 seconds. So out of 400 seconds, with 9
> consumers taking 30 seconds each, around 270 seconds are spent in the
> first poll without any data. Is it because of the re-balance operation
> that the consumers are blocked in poll()? What is the best way to use
> poll() if I have to serve many parallel requests per second? Should I
> prefer manual assignment of partitions in this case instead of relying on
> re-balance?
>
>
> Regards,
>
> Rohit Sardesai
>
>
