Re: understanding OffsetOutOfRangeException's....

Jason Rosenberg Tue, 07 Jan 2014 18:48:22 -0800

So, sometimes I just get the WARN from the ConsumerFetcherThread (as
previously noted, above), e.g.:


2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11]
consumer.ConsumerFetcherThread -
[ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970
for partition [mypartition,0] out of range; reset offset to
16175326044

More recently, I also see the following log line (not sure why I
didn't see it previously), coming from the ConsumerIterator:

2014-01-08 02:31:47,681 ERROR [myconsumerthread-0]
consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match
fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598:
consumed offset = 16163904970;
 Consumer may lose data

Why would I not see this second ERROR every time there's a
corresponding WARN on the FetcherThread for an offset reset?

Should I only be concerned about possible lost data if I see the
second ERROR log line?

Jason

On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <j...@squareup.com> wrote:
> But I assume this is not something you'd normally want to log (every
> incoming producer request?).  Maybe just for debugging?  Or is it only
> for consumer fetch requests?
>
> On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>> TRACE is lower than INFO so INFO level request logging would also be
>> recorded.
>>
>> You can check for "Completed XXX request" in the log files to check the
>> request info with the correlation id.
>>
>> Guozhang
>>
>>
>> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg <j...@squareup.com> wrote:
>>
>>> Hmmm, it looks like I'm enabling all logging at INFO, and the request
>>> logging is only done at TRACE (why is that?).
>>>
>>> I suppose one wouldn't normally want to see request logs, so by default,
>>> they aren't enabled?
>>>
>>>
>>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao <jun...@gmail.com> wrote:
>>>
>>> > Did you enable request log? It logs the ip of every request.
>>> >
>>> > Thanks,
>>> >
>>> > Jun
>>> >
>>> >
>>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <j...@squareup.com> wrote:
>>> >
>>> > > Hi Guozhang,
>>> > >
>>> > > I'm not sure I understand your first answer.  I don't see anything
>>> > > regarding the correlation id, elsewhere in the broker logs.....They only
>>> > > show up in those ERROR messages....
>>> > >
>>> > > I do see correlation id's in clients, but not on the broker.....
>>> > >
>>> > > Jason
>>> > >
>>> > >
>>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>> > >
>>> > > > Jason,
>>> > > >
>>> > > > You can search the correlation id in the public access log on the servers
>>> > > > to get the consumer information.
>>> > > >
>>> > > > As for logging, I agree that we should use the same level on both sides.
>>> > > > Could you file a jira for this?
>>> > > >
>>> > > > Guozhang
>>> > > >
>>> > > >
>>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <j...@squareup.com> wrote:
>>> > > >
>>> > > > > In our broker logs, we occasionally see errors like this:
>>> > > > >
>>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] server.KafkaApis -
>>> > > > > [KafkaApi-45] Error when processing fetch request for partition [mytopic,0]
>>> > > > > offset 204243601 from consumer with correlation id 130341
>>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset 204243601 but we
>>> > > > > only have log segments in the range 204343397 to 207423640.
>>> > > > >
>>> > > > > I assume this means there's a consumer that has fallen behind consuming
>>> > > > > messages, and the log retention policy has removed messages before they
>>> > > > > could be consumed by the consumer.
>>> > > > >
>>> > > > > However, I'm not 100% sure which consumer it is, and it looks like the only
>>> > > > > info we have is the correlation id of the consumer, e.g.:
>>> > > > >
>>> > > > > "from consumer with correlation id 130341"
>>> > > > >
>>> > > > > Is there a way to know which consumer this refers to?  It seems there are
>>> > > > > far more correlation id's than there are consumers.  Would it be possible
>>> > > > > to provide a bit more descriptive error message here, so we can immediately
>>> > > > > know which consumer is falling behind?
>>> > > > >
>>> > > > > We do see a corresponding entry in the consumer logs too:
>>> > > > >
>>> > > > > 2013-12-23 05:02:08,797  WARN
>>> > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45]
>>> > > > > consumer.ConsumerFetcherThread -
>>> > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45],
>>> > > > > Current offset 204243601 for partition [mytopic,0] out of range; reset
>>> > > > > offset to 204343397
>>> > > > >
>>> > > > > But it would be nice to be able to also use the broker log to quickly find
>>> > > > > consumers with issues.
>>> > > > >
>>> > > > > Also, I'm not sure why the event is logged as an ERROR in the broker, but
>>> > > > > as a WARN in the consumer?
>>> > > > >
>>> > > > > Jason
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > -- Guozhang
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>>
>> --
>> -- Guozhang