Turn on GC logging (verbose time stamps) and see how long your pauses are. 
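
GC logging itself is switched on with JVM flags (on HotSpot JVMs of this era, for example -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps). As a rough in-process cross-check, a minimal sketch like the one below reads the standard GarbageCollectorMXBean counters. GcStats is a hypothetical helper name, not part of Kafka, and the numbers are cumulative totals, so the GC log is still what shows individual pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Kafka): reports cumulative GC activity for this JVM.
final class GcStats {

    /** Total milliseconds all collectors have spent collecting since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print one line per collector: how often it ran and its accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

Logging these totals periodically from the consumer and comparing them against the times of the rebalances would show whether GC is a plausible suspect at all.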

Sure, try increasing the timeout to see if it fixes the problem, but I would 
hesitate to make that change permanent until you understand the problem better. 
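
For that experiment, the setting in the 0.8.x high-level consumer is zookeeper.session.timeout.ms (I believe the default is 6000 ms). The sketch below only builds the Properties object, so it runs without Kafka on the classpath. ConsumerTimeoutConfig and the host, group, and 30000 ms value are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

// Hypothetical helper: builds consumer properties with a raised ZooKeeper session timeout.
final class ConsumerTimeoutConfig {

    static Properties build(String zkConnect, String groupId, int sessionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", zkConnect);
        props.setProperty("group.id", groupId);
        // How long ZooKeeper waits without a heartbeat before expiring the session,
        // which in turn triggers a consumer rebalance.
        props.setProperty("zookeeper.session.timeout.ms", Integer.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        Properties props = build("localhost:2181", "example-group", 30000);
        System.out.println(props.getProperty("zookeeper.session.timeout.ms"));
    }
}
```

In a real application these properties would be passed to the consumer's configuration in the usual way. A longer timeout also delays detection of consumers that really have died, which is one reason to treat it as a diagnostic rather than a fix.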

You could also profile your consumer to see where it is spending its time.  
Perhaps you can make your message consumption quicker. I am sure the core 
committers would also have some ideas.
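
Short of a full profiler, a crude first step is to time each message handler call and count the ones that run long. The sketch below assumes a single consumer thread and is not thread-safe. SlowMessageTimer and the threshold are hypothetical, and the Runnable stands in for whatever your per-message processing is.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: times each handler invocation and counts the slow ones.
// Not thread-safe; intended to be used from a single consumer thread.
final class SlowMessageTimer {
    private final long thresholdNanos;
    private long slowCount = 0;
    private long maxNanos = 0;

    SlowMessageTimer(long thresholdMs) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMs);
    }

    /** Runs the handler and records how long it took, even if it throws. */
    void time(Runnable handler) {
        long start = System.nanoTime();
        try {
            handler.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed > maxNanos) {
                maxNanos = elapsed;
            }
            if (elapsed > thresholdNanos) {
                slowCount++;
            }
        }
    }

    long slowCount() {
        return slowCount;
    }

    long maxMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxNanos);
    }

    public static void main(String[] args) {
        SlowMessageTimer timer = new SlowMessageTimer(20);
        timer.time(() -> { /* fast message: nothing to do */ });
        timer.time(() -> {
            try {
                Thread.sleep(100); // simulate a slow write to a back-end system
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("slow=" + timer.slowCount() + " maxMs=" + timer.maxMillis());
    }
}
```

If the slow calls line up with the rebalance times, that points at the processing path (or at GC during it) rather than at the consumer itself.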

Philip

----------------------------------
http://www.philipotoole.com

> On Aug 7, 2014, at 3:06 PM, Jason Rosenberg <j...@squareup.com> wrote:
> 
> Yeah, it's possible that's happening (but no smoking gun).  The main thing 
> I'm seeing is that when it actually takes the time to process messages, it 
> takes longer to get back to the ConsumerIterator for the next message.  That 
> alone seems to be the problem (does that make any sense)?  I would have 
> thought the zk listeners are in separate async threads (and that's what it 
> looks like looking at the kafka consumer code).
> 
> Maybe I should increase the zk session timeout and see if that helps.
> 
> 
>> On Thu, Aug 7, 2014 at 2:56 PM, Philip O'Toole 
>> <philip.oto...@yahoo.com.invalid> wrote:
>> A big GC pause in your application, for example, could do it.
>> 
>> Philip
>> 
>>  
>> -----------------------------------------
>> http://www.philipotoole.com
>> 
>> 
>> On Thursday, August 7, 2014 11:56 AM, Philip O'Toole 
>> <philip.oto...@yahoo.com> wrote:
>> 
>> 
>> 
>> I think the question is what in your consuming application could cause it 
>> not to check in with ZK for longer than the timeout.
>> 
>>  
>> -----------------------------------------
>> http://www.philipotoole.com
>> 
>> 
>> On Thursday, August 7, 2014 8:16 AM, Jason Rosenberg <j...@squareup.com> 
>> wrote:
>> 
>> 
>> 
>> Well, it's possible that when processing, it might take longer than the
>> zookeeper timeout to process a message, intermittently.  Would that cause a
>> zookeeper timeout?
>> 
>> (btw I'm using 0.8.1.1).
>> 
>> 
>> 
>> On Thu, Aug 7, 2014 at 2:30 AM, Clark Haskins
>> <chask...@linkedin.com.invalid> wrote:
>> 
>> > Is your application possibly timing out its zookeeper connection during
>> > consumption while doing its processing, thus triggering the rebalance?
>> >
>> > -Clark
>> >
>> > On 8/6/14, 11:18 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
>> >
>> > >We've noticed that some of our consumers are more likely to repeatedly
>> > >trigger rebalancing when the app is consuming messages more slowly (e.g.
>> > >persisting data to back-end systems, etc.).
>> > >
>> > >If on the other hand we 'fast-forward' the consumer (which essentially
>> > >means we tell it to consume but do nothing with the messages until all
>> > >caught up), it will never decide to do a rebalance during this time.  So it
>> > >can go hours without rebalancing while fast forwarding and consuming super
>> > >fast, while during normal processing, it might decide to rebalance every
>> > >minute or so.
>> > >
>> > >Is there any simple explanation for this?
>> > >
>> > >Usually the trigger for rebalance logged is that a "topic info for path X
>> > >has changed to Y, triggering rebalance".
>> > >
>> > >Thanks for any ideas.
>> > >
>> > >We'd like to reduce the rebalancing, as it essentially slows down
>> > >consumption each time it happens.
>> > >
>> > >Thanks
>> > >
>> > >Jason
>> >
>> >
> 
