I worked on a project that needed backpressure for a variety of reasons. We 
ended up needing to compute lag on all partitions, which was messy because of 
the boilerplate implied by Filipp's third option. I looked for a simple Kafka 
client API to get lags but did not find one.
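For what it's worth, once you have the two offset maps (from the consumer API's committed() and endOffsets()), the per-partition arithmetic of that third option is simple. A minimal sketch, using plain maps keyed by partition number in place of TopicPartition and made-up offsets for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class LagCheck {
    /** Lag per partition = end offset minus committed offset, floored at 0. */
    static Map<Integer, Long> computeLags(Map<Integer, Long> committed,
                                          Map<Integer, Long> endOffsets) {
        Map<Integer, Long> lags = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            // A partition with no committed offset counts as fully lagging.
            long c = committed.getOrDefault(e.getKey(), 0L);
            lags.put(e.getKey(), Math.max(0L, e.getValue() - c));
        }
        return lags;
    }

    public static void main(String[] args) {
        Map<Integer, Long> committed = Map.of(0, 100L, 1, 250L);
        Map<Integer, Long> end = Map.of(0, 120L, 1, 250L);
        System.out.println(computeLags(committed, end)); // prints {0=20, 1=0}
    }
}
```

The boilerplate in practice is fetching those two maps for every partition and handling partitions that have no committed offset yet; the arithmetic itself is the easy part.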

Filipp, I used your third option and tried to measure its performance impact. 
It turned out to be tricky to measure, and I did not find any. Maybe there is 
none, which would be good!

Lag data are also available via JMX, but getting that data seemed more complex 
than getting "committed and end" offsets. I'll probably try it at some point. 
One factor is whether JMX queries have different failure modes than the Kafka 
protocol.

Over the past couple of years, changes in AdminClient and its return types 
seem to be moving toward making lags easier to query, but it's hard to tell. 
For example, AdminClient has had a metrics() method since Kafka 2.1.0; see 
KIP-324 and KAFKA-6986. The catch is that MetricName and the metric names 
themselves are not clearly documented parts of the API. Section 6.6 of the 
Kafka docs, "Monitoring", lists the metrics, but how to use them is a bit 
undocumented IMO. You can probably connect the dots with some experimenting.

Also there are a few MetricName search hits that may help:
http://cloudurable.com/blog/kafka-tutorial-kafka-producer-advanced-java-examples/index.html
https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/

You may be able to use JMX metrics from the Monitoring section like these:

kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id} 
Attribute: records-lag-max

kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"
(see "records-lag")
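From inside the consumer's own JVM, querying those MBeans only needs the standard javax.management API. A sketch under the assumption that a KafkaConsumer is running in the same process; the "orders" topic and "my-client" client-id are placeholders, and on a JVM with no consumer the query simply matches nothing:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LagViaJmx {
    public static void main(String[] args) throws Exception {
        // Pattern for the per-partition fetch-manager MBeans; the quoted "*"
        // matches any partition number.
        ObjectName pattern = new ObjectName(
            "kafka.consumer:type=consumer-fetch-manager-metrics,"
            + "partition=\"*\",topic=\"orders\",client-id=\"my-client\"");

        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<ObjectName> names = server.queryNames(pattern, null);
        for (ObjectName name : names) {
            // Only present while a KafkaConsumer is registered in this JVM.
            Object lag = server.getAttribute(name, "records-lag");
            System.out.println(name.getKeyProperty("partition") + " lag=" + lag);
        }
    }
}
```

Querying a remote consumer works the same way over a JMXConnector, which is where the different failure modes (connection setup, JMX port exposure) come in.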

IMO there is an opportunity to write Kafka project docs that make getting lag 
clearer. The metrics would also be easier to use if all valid metrics were 
exposed as API constants in MetricName.

Bill

-----Original Message-----
From: Peter Bukowinski [mailto:pmb...@gmail.com] 
Sent: Tuesday, February 19, 2019 11:01 AM
To: users@kafka.apache.org
Subject: Re: Lag checking from producer

From your description, it sounds like kafka may be ill-suited for your project. 
A backpressure mechanism essentially requires producers to be aware of 
consumers and that is counter to Kafka’s design. Also, it sounds like your 
producers are logical (if not actual) consumers of data generated by the 
consumers. I see a couple options:

1. Kafka has a quota system which can rate-limit producers. If you can predict 
the rate at which your consumers can ingest data from the Kafka cluster, 
keeping your producers to that rate would be more kafka-esque (haha) than 
bolting on a separate on/off flow mechanism.
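As a hedged illustration of that first option: client quotas can be set with the kafka-configs tool. The client name and rate below are made up, and depending on your Kafka version the tool talks to ZooKeeper or to the brokers directly; this is the ZooKeeper form current around Kafka 2.1:

```shell
# Hypothetical example: cap the producer with client.id=spender
# to ~1 MB/s per broker (producer_byte_rate is in bytes/second).
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type clients --entity-name spender \
  --add-config 'producer_byte_rate=1048576'
```

The broker then throttles that client by delaying its responses, so no producer-side code change is needed.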

2. If you need more control than a rate limiter, then you should probably 
introduce a new topic that your “consumers” produce into and that your 
“producers” consume from. If your producers then depend on new messages being 
available in this topic before they can produce new data, you can have a direct 
rate link between both sides.

Just spitballing, here. :)

—
Peter

> On Feb 19, 2019, at 9:25 AM, Filipp Zhinkin <filipp.zhin...@gmail.com> wrote:
> 
> Hi,
> 
> thank you for the reply!
> 
> I'm developing a system where producers spend money every time a
> request arrives.
> Consumers account for the money spent using data from the producers
> as well as a few other sources.
> Consumers are also responsible for calculating statistics that affect
> the policies producers use to spend money.
> 
> As a result, if consumers are temporarily lagging, then producers
> neither know whether they can spend money nor how best to do it.
> 
> There is some lag value that producers can tolerate, but if the lag
> grows beyond it, producers have to stop.
> 
> Thanks,
> Filipp.
> 
> On Tue, Feb 19, 2019 at 7:44 PM Javier Arias Losada
> <javier.ari...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> could you please be more specific on your use case?
>> One of the theoretical advantages of a system like Kafka is that you can
>> decouple producers and consumers, so you don't need to do backpressure.
>> A different topic is how to handle lagging consumers; in that scenario you
>> could scale up your service, etc.
>> 
>> Best.
>> 
>> On Tue, Feb 19, 2019 at 15:43, Filipp Zhinkin 
>> (<filipp.zhin...@gmail.com>)
>> wrote:
>> 
>>> Hi!
>>> 
>>> I'm trying to implement a backpressure mechanism that asks producers to
>>> stop doing any work when consumers are not able to process all
>>> messages in time (producers require statistics calculated by consumers
>>> in order to answer client requests; when consumers are lagging behind,
>>> we have to stop producers from making any responses).
>>> 
>>> I see several ways to implement it:
>>> - compute lag on the consumer side and store it somewhere (zk, some db, etc);
>>> - use a separate service like Burrow;
>>> - compute lag on every producer by getting committed and end offsets
>>> for every partition via the consumer API.
>>> 
>>> Are there any downsides to the latter approach? Would it negatively
>>> impact broker performance?
>>> 
>>> Thanks,
>>> Filipp.
>>> 
