The reason is the way each lag is calculated. MaxLag (and the
FetcherLagMetrics) are calculated by the consumer itself, as the
difference between the offset it knows it has reached and the offset that
the broker reports as the end of the partition. The offset checker,
however, uses the last offset that the consumer committed. Depending on
your configuration, this is somewhere behind where the consumer actually
is. For example, if your commit interval is set to 10 minutes, the offset
used by the offset checker can be up to 10 minutes behind the consumer's
actual position.
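
To make the difference concrete, here is a minimal sketch of the two
calculations. It does not call Kafka at all; the function names and the
offsets are made up for illustration, and only the arithmetic matters.

```python
def consumer_reported_lag(log_end_offset: int, fetch_position: int) -> int:
    """Lag as the consumer sees it (the MaxLag style of calculation):
    end of the partition minus the offset the consumer has reached."""
    return max(0, log_end_offset - fetch_position)


def committed_offset_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag as an offset checker sees it:
    end of the partition minus the last offset the consumer committed."""
    return max(0, log_end_offset - committed_offset)


# Hypothetical offsets for one partition at one moment in time.
log_end_offset = 1_000_000   # end of the partition according to the broker
fetch_position = 999_900     # where the consumer actually is
committed_offset = 940_000   # last commit, made some time ago

print(consumer_reported_lag(log_end_offset, fetch_position))
print(committed_offset_lag(log_end_offset, committed_offset))
```

With these assumed numbers the consumer-side figure is small while the
committed-offset figure is much larger, which is the pattern described
in the question.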

So while MaxLag may be more up to date at any given time, it is actually
less reliable. Because MaxLag relies on the consumer to report it, if the
consumer breaks, you will not see an accurate lag number. This is why,
when we check consumer lag, we use an external process that reads the
committed consumer offsets. This lets us catch a broken consumer as well
as an active consumer that is just falling behind.
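
As a rough sketch of what such an external check could look like (this
is not our actual tool; the Sample type, the classify function and the
state names are hypothetical), compare two samples of the committed
offset and the log end offset for a partition. It assumes the samples
are taken further apart than the commit interval, so a healthy consumer
has had a chance to commit in between.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    committed_offset: int  # last offset the consumer committed
    log_end_offset: int    # end of the partition according to the broker


def classify(prev: Sample, curr: Sample) -> str:
    """Classify a consumer from two samples taken some time apart."""
    prev_lag = prev.log_end_offset - prev.committed_offset
    curr_lag = curr.log_end_offset - curr.committed_offset

    # No new commits while messages are still waiting: the consumer is
    # broken or stuck, even though it is no longer reporting any lag itself.
    if curr.committed_offset == prev.committed_offset and curr_lag > 0:
        return "stalled"

    # Commits are advancing, but the gap to the log end keeps growing.
    if curr_lag > prev_lag:
        return "falling behind"

    return "ok"


print(classify(Sample(100, 150), Sample(100, 300)))
print(classify(Sample(100, 150), Sample(200, 400)))
print(classify(Sample(100, 150), Sample(290, 300)))
```

Because the check only reads committed offsets and log end offsets, it
keeps working when the consumer process itself has died.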

-Todd


On Fri, Feb 13, 2015 at 9:34 PM, tao xiao <xiaotao...@gmail.com> wrote:

> Thanks Joel. But I have found that both MaxLag and FetcherLagMetrics are
> always much smaller than the lag shown in offset checker. Any reason?
>
> On Sat, Feb 14, 2015 at 7:22 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > There are FetcherLagMetrics that you can take a look at. However, it
> > is probably easiest to just monitor MaxLag as that reports the maximum
> > of all the lag metrics.
> >
> > On Fri, Feb 13, 2015 at 05:03:28PM +0800, tao xiao wrote:
> > > Hi team,
> > >
> > > Is there a metric that shows the consumer lag of a particular consumer
> > > group? similar to what offset checker provides
> > >
> > > --
> > > Regards,
> > > Tao
> >
> >
>
>
> --
> Regards,
> Tao
>
