Re: Monitoring offset lag

Tom Dearman Fri, 08 Jul 2016 09:21:06 -0700

When you say ‘for the first partition’, do you literally mean partition zero, or 
do you mean any partition?  It is true that when I had only one user there were 
only messages on partition 15, but the second user happened to go to partition 
zero.  Is it the case that partition zero must have a consumer commit?
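
To make the question concrete, here is a minimal Python sketch of per-partition 
lag when the monitor only knows about offsets it has actually seen committed.  
This is an illustration, not Burrow's actual logic: the function name 
partition_lag, the 16-partition layout and the sample offsets are all 
hypothetical.

```python
from typing import Dict, Optional


def partition_lag(
    end_offsets: Dict[int, int],
    committed: Dict[int, int],
) -> Dict[int, Optional[int]]:
    """Return lag per partition; None where no commit has been seen."""
    lag: Dict[int, Optional[int]] = {}
    for partition, end in sorted(end_offsets.items()):
        if partition in committed:
            # Clamp at zero: a commit can be newer than the sampled end offset.
            lag[partition] = max(end - committed[partition], 0)
        else:
            # No commit observed for this partition, so lag is unknown.
            lag[partition] = None
    return lag


# Hypothetical topic with 16 partitions where only partition 15 has messages
# (assumed layout; the real topic may differ).
end_offsets = {p: 0 for p in range(16)}
end_offsets[15] = 2
committed = {15: 1}  # the only commit the monitor has seen

result = partition_lag(end_offsets, committed)
print(result[15])  # 1
print(result[0])   # None
```

In this sketch partition 15 has a known lag, while partition zero is simply 
unknown until a commit for it is observed.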


> On 8 Jul 2016, at 17:16, Todd Palino <[email protected]> wrote:
> 
> If you open up an issue on the project, I'd be happy to dig into this in
> more detail if needed. Excluding the ZK offset checking, Burrow doesn't
> enumerate consumer groups - it learns about them from offset commits. It
> sounds like maybe your consumer had not committed offsets for the first
> partition (at least not after Burrow was started).
> 
> -Todd
> 
> On Friday, July 8, 2016, Tom Dearman <[email protected]> wrote:
> 
>> Todd,
>> 
>> Thanks for that; I am taking a look.
>> 
>> Is there a bug whereby, if you only have a couple of messages on a topic,
>> both with the same key, Burrow doesn’t return correct info?  I was finding
>> that http://localhost:8100/v2/kafka/betwave/consumer was returning a
>> response with empty consumers until I put on another message with a
>> different key, i.e. a minimum of 2 partitions with something in them.  I
>> know this is not very like production, but on my local setup I was only
>> testing with one user, so just one partition was filled.
>> 
>> Tom
>>> On 6 Jul 2016, at 18:08, Todd Palino <[email protected]> wrote:
>>> 
>>> Yeah, I've written dissertations at this point on why MaxLag is flawed. We
>>> also used to use the offset checker tool, and later something similar that
>>> was a little easier to slot into our monitoring systems. Problems with all
>>> of these are why I wrote Burrow (https://github.com/linkedin/Burrow).
>>> 
>>> For more details, you can also check out my blog post on the release:
>>> 
>>> https://engineering.linkedin.com/apache-kafka/burrow-kafka-consumer-monitoring-reinvented
>>> 
>>> -Todd
>>> 
>>> On Wednesday, July 6, 2016, Tom Dearman <[email protected]> wrote:
>>> 
>>>> I recently had a problem in production which I believe was a
>>>> manifestation of KAFKA-2978 (Topic partition is not sometimes consumed
>>>> after rebalancing of consumer group).  This is fixed in 0.9.0.1 and we
>>>> will upgrade our client soon.  However, it made me realise that I didn’t
>>>> have any monitoring set up for this.  The only thing I can find as a
>>>> metric is
>>>> kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+),
>>>> which, if I understand correctly, is the max lag of any partition that
>>>> that particular consumer is consuming.
>>>> 1. If I had been monitoring this, and my consumer was suffering from
>>>> the issue in KAFKA-2978, would I actually have been alerted?  That is,
>>>> since the consumer would think it is consuming correctly, would it not
>>>> have updated the metric?
>>>> 2. There is another way to see offset lag: running the command
>>>> /usr/bin/kafka-consumer-groups --new-consumer --bootstrap-server
>>>> 10.10.1.61:9092 --describe --group consumer_group_name and parsing the
>>>> response.  Is it safe or advisable to do this?  I like the fact that it
>>>> tells me each partition's lag, although it is not available if no
>>>> consumer from the group is currently consuming.
>>>> 3. Is there a better way of doing this?
>>> 
>>> 
>>> 
>>> --
>>> *Todd Palino*
>>> Staff Site Reliability Engineer
>>> Data Infrastructure Streaming
>>> 
>>> 
>>> 
>>> linkedin.com/in/toddpalino
>> 
>> 
> 
> -- 
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
> 
> 
> 
> linkedin.com/in/toddpalino
