Joe,

I'm creating separate connectors, each with its own streams and a separate
thread pool to process them.
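
For concreteness, the setup looks roughly like this (a minimal sketch against
the 0.8 high-level consumer API; the class name, filter regex, and
stream/thread counts are illustrative, not the actual app code):

import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.consumer.Whitelist;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class PerFilterConsumer {

  // props must contain at least zookeeper.connect and group.id.
  static void startConnectorFor(Properties props, String filterRegex, int numStreams) {
    // A separate connector instance per filter (a separate create call for each).
    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

    // Streams for this connector only; createMessageStreamsByFilter is called
    // exactly once per connector.
    List<KafkaStream<byte[], byte[]>> streams =
        connector.createMessageStreamsByFilter(new Whitelist(filterRegex), numStreams);

    // A dedicated thread pool per connector to drain its streams.
    ExecutorService pool = Executors.newFixedThreadPool(numStreams);
    for (final KafkaStream<byte[], byte[]> stream : streams) {
      pool.submit(new Runnable() {
        public void run() {
          for (MessageAndMetadata<byte[], byte[]> msg : stream) {
            // process msg.message() here
          }
        }
      });
    }
  }
}

Each filter gets its own connector, its own streams, and its own thread pool,
so nothing is shared between them.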

This appears to be working well (i.e. each connector seems to be correctly
processing data).

The only issue is the extra ERROR log message I'm seeing from the
ConsumerIterator....

Jason


On Wed, Jan 8, 2014 at 1:59 PM, Joe Stein <joe.st...@stealth.ly> wrote:

> Thinking/typing out loud here: not sure if this is the problem, but it
> could be, so I figured I'd throw it out there.
>
> The ZookeeperConsumerConnector has a messageStreamCreated atomic boolean
> that stops more than one message stream by filter from being created on a
> given connector at once...
>
> Do you have separately instantiated objects of your ConsumerConnector (so
> calling ConsumerConnector.create for each one), or do you have one
> ConsumerConnector.create and then call createMessageStreams multiple times
> on it (which would cause a RuntimeException, so probably not, but maybe
> your code is swallowing that)?  If the latter, that could explain why it is
> lagging behind unexpectedly, since really only one ConsumerConnector is
> running....
>
> Again, you might not be doing this, but I figured I'd throw it out there in
> case you were.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
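
For reference, the two patterns Joe describes look roughly like this (a
minimal sketch against the 0.8 high-level consumer javaapi; the class, method,
and topic names here are made up for illustration):

import java.util.Collections;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ConnectorPatterns {

  // Pattern 1: a fresh connector per stream set; each connector creates its
  // streams exactly once (this is the setup described above).
  static ConsumerConnector connectorPerStreamSet(Properties props, String topic) {
    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    connector.createMessageStreams(Collections.singletonMap(topic, 1));
    return connector;
  }

  // Pattern 2: one connector reused for multiple createMessageStreams calls.
  // The second call trips the connector's "streams already created" guard and
  // throws a RuntimeException, which is easy to miss if the caller swallows it.
  static void reusedConnector(Properties props) {
    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    connector.createMessageStreams(Collections.singletonMap("topic-a", 1));
    connector.createMessageStreams(Collections.singletonMap("topic-b", 1)); // throws here
  }
}

If that RuntimeException from the second call were swallowed, only the first
set of streams would actually be consuming, which would look like unexplained
lag.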
>
> On Wed, Jan 8, 2014 at 1:44 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > Yes, it's happening continuously, at the moment (although I'm expecting
> > the consumer to catch up soon)....
> >
> > It seemed to start happening after I refactored the consumer app to use
> > multiple consumer connectors in the same process (each one has a separate
> > topic filter, so there should be no overlap between them).  All using the
> > same consumer group.
> >
> > Could it be a thread-safety issue in the ZookeeperConsumerConnector
> > (seems unlikely)?
> >
> > Jason
> >
> >
> > On Wed, Jan 8, 2014 at 1:04 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Normally, if the consumer can't keep up, you should just see the
> > > OffsetOutOfRangeException warning. The offset mismatch error should
> > > never happen. It could be that OffsetOutOfRangeException exposed a bug.
> > > Do you think you can reproduce this easily?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Jan 7, 2014 at 9:29 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > >
> > > > Jun,
> > > >
> > > > I'm not sure I understand your question, wrt produced data?
> > > >
> > > > But yes, in general, I believe the consumer is not keeping up with
> > > > the broker's deleting the data.  So it's trying to fetch the next
> > > > batch of data, but its last offset is no longer there, etc.  So
> > > > that's the reason for the WARN message, in the fetcher thread.
> > > >
> > > > I'm just not sure I understand then why we don't always see the
> > > > ConsumerIterator error also, because won't there always be missing
> > > > data detected there?  Why sometimes and not always?  What's the
> > > > difference?
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, Jan 8, 2014 at 12:07 AM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > The WARN and ERROR may not be completely correlated. Could it be
> > > > > that the consumer is slow and couldn't keep up with the produced
> > > > > data?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Jan 7, 2014 at 6:47 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > >
> > > > > > So, sometimes I just get the WARN from the ConsumerFetcherThread
> > > > > > (as previously noted, above), e.g.:
> > > > > >
> > > > > > 2014-01-08 02:31:47,394  WARN [ConsumerFetcherThread-myconsumerapp-11] consumer.ConsumerFetcherThread - [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970 for partition [mypartition,0] out of range; reset offset to 16175326044
> > > > > >
> > > > > > More recently, I also see the following log line (not sure why I
> > > > > > didn't see it previously), coming from the ConsumerIterator:
> > > > > >
> > > > > > 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0] consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598: consumed offset = 16163904970;
> > > > > >  Consumer may lose data
> > > > > >
> > > > > > Why would I not see this second ERROR every time there's a
> > > > > > corresponding WARN on the FetcherThread for an offset reset?
> > > > > >
> > > > > > Should I only be concerned about possible lost data if I see the
> > > > > > second ERROR log line?
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > > On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > > > But I assume this is not normally something you'd want to log
> > > > > > > (every incoming producer request?).  Maybe just for debugging?
> > > > > > > Or is it only for consumer fetch requests?
> > > > > > >
> > > > > > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > >> TRACE is lower than INFO, so INFO level request logging would
> > > > > > >> also be recorded.
> > > > > > >>
> > > > > > >> You can check for "Completed XXX request" in the log files to
> > > > > > >> check the request info with the correlation id.
> > > > > > >>
> > > > > > >> Guozhang
> > > > > > >>
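
As an aside, the request logger mentioned here is, as far as I can tell, the
broker-side "kafka.request.logger" (it defaults to WARN in the
log4j.properties that ships with the broker, if I remember right). Bumping it
to TRACE is what makes the per-request "Completed ..." lines, including the
correlation id and client address, show up. A minimal log4j 1.x sketch;
normally you'd just edit the broker's log4j.properties instead:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class EnableRequestLogging {
  public static void main(String[] args) {
    // Raise the broker's request logger to TRACE so each request is logged on
    // completion. Equivalent to setting
    // log4j.logger.kafka.request.logger=TRACE in the broker's log4j.properties.
    Logger.getLogger("kafka.request.logger").setLevel(Level.TRACE);
  }
}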
> > > > > > >>
> > > > > > >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > > >>
> > > > > > >>> Hmmm, it looks like I'm enabling all logging at INFO, and the
> > > > > > >>> request logging is only done at TRACE (why is that?).
> > > > > > >>>
> > > > > > >>> I suppose one wouldn't normally want to see request logs, so by
> > > > > > >>> default, they aren't enabled?
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> > On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>> > Did you enable the request log? It logs the ip of every request.
> > > > > > >>> >
> > > > > > >>> > Thanks,
> > > > > > >>> >
> > > > > > >>> > Jun
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > > >>> >
> > > > > > >>> > > Hi Guozhang,
> > > > > > >>> > >
> > > > > > >>> > > I'm not sure I understand your first answer.  I don't see
> > > > > > >>> > > anything regarding the correlation id, elsewhere in the
> > > > > > >>> > > broker logs.....They only show up in those ERROR messages....
> > > > > > >>> > >
> > > > > > >>> > > I do see correlation id's in clients, but not on the broker.....
> > > > > > >>> > >
> > > > > > >>> > > Jason
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > >>> > >
> > > > > > >>> > > > Jason,
> > > > > > >>> > > >
> > > > > > >>> > > > You can search the correlation id in the public access
> > > > > > >>> > > > log on the servers to get the consumer information.
> > > > > > >>> > > >
> > > > > > >>> > > > As for logging, I agree that we should use the same level
> > > > > > >>> > > > on both sides.  Could you file a jira for this?
> > > > > > >>> > > >
> > > > > > >>> > > > Guozhang
> > > > > > >>> > > >
> > > > > > >>> > > >
> > > > > > >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > > >>> > > >
> > > > > > >>> > > > > In our broker logs, we occasionally see errors like this:
> > > > > > >>> > > > >
> > > > > > >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] server.KafkaApis - [KafkaApi-45] Error when processing fetch request for partition [mytopic,0] offset 204243601 from consumer with correlation id 130341
> > > > > > >>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset 204243601 but we only have log segments in the range 204343397 to 207423640.
> > > > > > >>> > > > >
> > > > > > >>> > > > > I assume this means there's a consumer that has fallen
> > > > > > >>> > > > > behind consuming messages, and the log retention policy
> > > > > > >>> > > > > has removed messages before they could be consumed by
> > > > > > >>> > > > > the consumer.
> > > > > > >>> > > > >
> > > > > > >>> > > > > However, I'm not 100% sure which consumer it is, and it
> > > > > > >>> > > > > looks like the only info we have is the correlation id
> > > > > > >>> > > > > of the consumer, e.g.:
> > > > > > >>> > > > >
> > > > > > >>> > > > > "from consumer with correlation id 130341"
> > > > > > >>> > > > >
> > > > > > >>> > > > > Is there a way to know which consumer this refers to?
> > > > > > >>> > > > >  It seems there are far more correlation id's than there
> > > > > > >>> > > > > are consumers.  Would it be possible to provide a bit
> > > > > > >>> > > > > more descriptive error message here, so we can
> > > > > > >>> > > > > immediately know which consumer is falling behind?
> > > > > > >>> > > > >
> > > > > > >>> > > > > We do see a corresponding entry in the consumer logs too:
> > > > > > >>> > > > >
> > > > > > >>> > > > > 2013-12-23 05:02:08,797  WARN [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45] consumer.ConsumerFetcherThread - [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45], Current offset 204243601 for partition [mytopic,0] out of range; reset offset to 204343397
> > > > > > >>> > > > >
> > > > > > >>> > > > > But it would be nice to be able to also use the broker
> > > > > > >>> > > > > log to quickly find consumers with issues.
> > > > > > >>> > > > >
> > > > > > >>> > > > > Also, I'm not sure why this event is logged as an ERROR
> > > > > > >>> > > > > in the broker but as a WARN in the consumer?
> > > > > > >>> > > > >
> > > > > > >>> > > > > Jason
> > > > > > >>> > > > >
> > > > > > >>> > > >
> > > > > > >>> > > >
> > > > > > >>> > > >
> > > > > > >>> > > > --
> > > > > > >>> > > > -- Guozhang
> > > > > > >>> > > >
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> -- Guozhang
> > > > > >
> > > > >
> > > >
> > >
> >
>
