The problem is not the fact that the timeout exceptions are being thrown.
We have tried with and without the timeout setting and, in both cases, we
end up with threads that are stalled and not consuming data. Thus the
problem is that consumers remain registered but are not consuming, and no
rebalancing is triggered. We suspected a problem with ZooKeeper, but we ran
smoke and latency tests and got reasonable results.
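One way to confirm that ownership is still assigned while the offsets stay
frozen is to inspect the group's znodes directly. Below is a minimal sketch
using the plain ZooKeeper Java client, assuming the /consumers layout used
by the 0.7 high-level consumer (the connection string, group id and topic
are placeholders):

import org.apache.zookeeper.ZooKeeper;

public class OwnershipCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; no watcher is needed for a one-off check.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 6000, null);
        String group = "mygroup";   // placeholder group id
        String topic = "mytopic";   // placeholder topic
        // The 0.7 high-level consumer records partition ownership under
        // /consumers/<group>/owners/<topic>/<brokerId-partitionId> and
        // committed offsets under /consumers/<group>/offsets/<topic>/...
        for (String part : zk.getChildren("/consumers/" + group + "/owners/" + topic, false)) {
            byte[] owner  = zk.getData("/consumers/" + group + "/owners/" + topic + "/" + part, false, null);
            byte[] offset = zk.getData("/consumers/" + group + "/offsets/" + topic + "/" + part, false, null);
            System.out.println(part + " owned by " + new String(owner) + ", offset " + new String(offset));
        }
        zk.close();
    }
}

If the owner znodes are present but the offsets never advance, the group
still believes the partitions are owned even though nothing is being
consumed.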

-drew


----- Original Message -----
From: Jun Rao <jun...@gmail.com>
To: "users@kafka.apache.org" <users@kafka.apache.org>
Sent: 8/13/2013 10:17 PM
Subject: Re: Kafka Consumer Threads Stalled



If you don't want to see ConsumerTimeoutException, just set
consumer.timeout.ms to -1. If you do need consumer.timeout.ms greater than
0, make sure that on ConsumerTimeoutException, your consumer thread loops
back and calls hasNext() on the iterator to resume consumption.
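
For example, a minimal sketch against the 0.7 high-level Java consumer API
(zk.connect, groupid and the topic name below are placeholders, not values
from this thread):

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.Message;

public class ResumingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "zkhost:2181");   // placeholder
        props.put("groupid", "mygroup");          // placeholder
        props.put("consumer.timeout.ms", "5000");
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put("mytopic", 1);               // placeholder topic, one stream
        KafkaStream<Message> stream =
            connector.createMessageStreams(topicMap).get("mytopic").get(0);

        ConsumerIterator<Message> it = stream.iterator();
        while (true) {
            try {
                while (it.hasNext()) {
                    Message message = it.next();
                    // process the message here
                }
            } catch (ConsumerTimeoutException e) {
                // No message arrived within consumer.timeout.ms. Do not
                // recreate the iterator; loop and call hasNext() on the
                // same iterator to resume consumption.
            }
        }
    }
}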

Thanks,

Jun


On Tue, Aug 13, 2013 at 4:57 PM, Drew Daugherty <
drew.daughe...@returnpath.com> wrote:

> Hi,
>
> We are using ZooKeeper 3.3.6 with Kafka 0.7.2. We have a topic with 8
> partitions on each of 3 brokers, which we are consuming with a
> multi-threaded consumer group.  We are using the following settings for our
> consumers:
> zk.connectiontimeout.ms=12000000
> fetch_size=52428800
> queuedchunks.max=6
> consumer.timeout.ms=5000
>
> Our brokers have the following configuration:
> socket.send.buffer=1048576
> socket.receive.buffer=1048576
> max.socket.request.bytes=104857600
> log.flush.interval=10000
> log.default.flush.interval.ms=1000
> log.default.flush.scheduler.interval.ms=1000
> log.retention.hours=4
> log.file.size=536870912
> enable.zookeeper=true
> zk.connectiontimeout.ms=6000
> zk.sessiontimeout.ms=6000
> max.message.size=52428800
>
> We are noticing that after the consumer runs for a short while, some
> threads stop consuming and start throwing the following timeout exceptions:
> kafka.consumer.ConsumerTimeoutException
>         at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:66)
>         at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:32)
>         at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
>         at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
>
> When this happens, message consumption on the affected partitions doesn't
> recover but stalls, and the consumer offset remains frozen.  The exceptions
> also continue to be thrown in the logs, as the thread logic logs the error
> and then tries to create another iterator from the stream and consume from
> it.  We also notice that consumption tends to freeze on two of the three
> brokers, but there is one that always seems to keep the consumers fed.  Are
> there settings or logic we can use to avoid or recover from such
> exceptions?
>
> -drew
>
