So, sometimes I just get the WARN from the ConsumerFetcherThread (as previously noted, above), e.g.:
2014-01-08 02:31:47,394 WARN [ConsumerFetcherThread-myconsumerapp-11] consumer.ConsumerFetcherThread - [ConsumerFetcherThread-myconsumerapp-11], Current offset 16163904970 for partition [mypartition,0] out of range; reset offset to 16175326044 More recently, I see these in the following log line (not sure why I didn't see it previously), coming from the ConsumerIterator: 2014-01-08 02:31:47,681 ERROR [myconsumerthread-0] consumer.ConsumerIterator - consumed offset: 16163904970 doesn't match fetch offset: 16175326044 for mytopic:0: fetched offset = 16175330598: consumed offset = 16163904970; Consumer may lose data Why would I not see this second ERROR everytime there's a corresponding WARN on the FetcherThread for an offset reset? Should I only be concerned about possible lost data if I see the second ERROR log line? Jason On Tue, Dec 24, 2013 at 3:49 PM, Jason Rosenberg <j...@squareup.com> wrote: > But I assume this would not be normally you'd want to log (every > incoming producer request?). Maybe just for debugging? Or is it only > for consumer fetch requests? > > On Tue, Dec 24, 2013 at 12:50 PM, Guozhang Wang <wangg...@gmail.com> wrote: >> TRACE is lower than INFO so INFO level request logging would also be >> recorded. >> >> You can check for "Completed XXX request" in the log files to check the >> request info with the correlation id. >> >> Guozhang >> >> >> On Mon, Dec 23, 2013 at 10:46 PM, Jason Rosenberg <j...@squareup.com> wrote: >> >>> Hmmm, it looks like I'm enabling all logging at INFO, and the request >>> logging is only done at TRACE (why is that?). >>> >>> I suppose one wouldn't normally want to see request logs, so by default, >>> they aren't enabled? >>> >>> >>> On Mon, Dec 23, 2013 at 11:46 PM, Jun Rao <jun...@gmail.com> wrote: >>> >>> > Did you enable request log? It logs the ip of every request. >>> > >>> > Thanks, >>> > >>> > Jun >>> > >>> > >>> > On Mon, Dec 23, 2013 at 3:52 PM, Jason Rosenberg <j...@squareup.com> >>> wrote: >>> > >>> > > Hi Guozhang, >>> > > >>> > > I'm not sure I understand your first answer. I don't see anything >>> > > regarding the correlation id, elsewhere in the broker logs.....They >>> only >>> > > show up in those ERROR messages.... >>> > > >>> > > I do see correlation id's in clients, but not on the broker..... >>> > > >>> > > Jason >>> > > >>> > > >>> > > On Mon, Dec 23, 2013 at 6:46 PM, Guozhang Wang <wangg...@gmail.com> >>> > wrote: >>> > > >>> > > > Jason, >>> > > > >>> > > > You can search the correlation id in the public access log on the >>> > servers >>> > > > to get the consumer information. >>> > > > >>> > > > As for logging, I agree that we should use the same level on both >>> > sides. >>> > > > Could you file a jira for this? >>> > > > >>> > > > Guozhang >>> > > > >>> > > > >>> > > > On Mon, Dec 23, 2013 at 3:09 PM, Jason Rosenberg <j...@squareup.com> >>> > > wrote: >>> > > > >>> > > > > In our broker logs, we occasionally see errors like this: >>> > > > > >>> > > > > 2013-12-23 05:02:08,456 ERROR [kafka-request-handler-2] >>> > > server.KafkaApis >>> > > > - >>> > > > > [KafkaApi-45] Error when processing fetch request for partition >>> > > > [mytopic,0] >>> > > > > offset 204243601 from consumer with correlation id 130341 >>> > > > > kafka.common.OffsetOutOfRangeException: Request for offset >>> 204243601 >>> > > but >>> > > > we >>> > > > > only have log segments in the range 204343397 to 207423640. >>> > > > > >>> > > > > I assume this means there's a consumer that has fallen behind >>> > consuming >>> > > > > messages, and the log retention policy has removed messages before >>> > they >>> > > > > could be consumed by the consumer. >>> > > > > >>> > > > > However, I'm not 100% which consumer it is, and it looks like the >>> > only >>> > > > info >>> > > > > we have is the correlation id of the consumer, e.g.: >>> > > > > >>> > > > > "from consumer with correlation id 130341" >>> > > > > >>> > > > > Is there a way to know which consumer this refers to? It seems >>> there >>> > > are >>> > > > > far more correlation id's than there are consumers. Would it be >>> > > possible >>> > > > > to provide a bit more descriptive error message here, so we can >>> > > > immediately >>> > > > > know which consumer is falling behind? >>> > > > > >>> > > > > We do see a corresponding entry in the consumer logs too: >>> > > > > >>> > > > > 2013-12-23 05:02:08,797 WARN >>> > > > > [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45] >>> > > > > consumer.ConsumerFetcherThread - >>> > > > > >>> [ConsumerFetcherThread-myconsumergroup-1387353494862-7aa0c61d-0-45], >>> > > > > Current offset 204243601 for partition [mytopic,0] out of range; >>> > reset >>> > > > > offset to 204343397 >>> > > > > >>> > > > > But it would be nice to be able to also use the broker log to >>> quickly >>> > > > find >>> > > > > consumers with issues. >>> > > > > >>> > > > > Also, I'm not sure, why is logging the event as an ERROR in the >>> > broker, >>> > > > but >>> > > > > a WARN in the consumer? >>> > > > > >>> > > > > Jason >>> > > > > >>> > > > >>> > > > >>> > > > >>> > > > -- >>> > > > -- Guozhang >>> > > > >>> > > >>> > >>> >> >> >> >> -- >> -- Guozhang