These ideas are specific to Samza, and YMMV in how they apply to other
processing frameworks, but we use a couple of custom tools to keep tabs on
processing lag:

- one is a produce/consume timestamp comparison tool, which writes message
production timestamps out to ZooKeeper on a per-partition basis; in our
stream processor, Samza, we then write out a consumption timestamp for the
same partition to ZooKeeper, and we use the differences (with some
compensation for partition offset differences) to see what our processing
lag is;
- we also have a custom offset/checkpoint comparison tool, which ingests
Samza's checkpoint topic and compares it against the latest offsets on a
per-partition basis to show how far behind each partition is in processing
messages (this also doubles as a checkpoint-properties file generator, which
we can use to rebuild the checkpoint topic if it gets too large).
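
To make the two measurements above concrete, here is a rough sketch of the
arithmetic only. It is not our actual tooling: the function names and the
plain dicts standing in for ZooKeeper nodes, the checkpoint topic, and the
broker's latest offsets are all made up for illustration, and it assumes
both timestamps for a partition refer to the same message (our real tool
compensates for offset differences, which this skips).

```python
from typing import Dict


def offset_lag(latest: Dict[int, int], checkpointed: Dict[int, int]) -> Dict[int, int]:
    """Messages still to be processed, per partition.

    `latest` maps partition -> newest offset on the broker;
    `checkpointed` maps partition -> offset recorded in the checkpoint.
    Both are assumed to be counted the same way (hypothetical inputs).
    """
    lag = {}
    for partition, newest in latest.items():
        # Assumption: a partition with no checkpoint has processed nothing.
        done = checkpointed.get(partition, 0)
        # Clamp at zero so a stale "latest" reading never reports negative lag.
        lag[partition] = max(newest - done, 0)
    return lag


def time_lag_ms(produced_ms: Dict[int, int], consumed_ms: Dict[int, int]) -> Dict[int, int]:
    """Consumption timestamp minus production timestamp, per partition.

    Only partitions that have both timestamps are reported.
    """
    return {
        partition: max(consumed_ms[partition] - produced, 0)
        for partition, produced in produced_ms.items()
        if partition in consumed_ms
    }


if __name__ == "__main__":
    print(offset_lag({0: 120, 1: 75}, {0: 100, 1: 75}))
    print(time_lag_ms({0: 1_000, 1: 2_000}, {0: 1_450, 1: 2_030}))
```

The first number answers "how many messages behind are we?" and the second
answers "how long does a message wait before it is processed?"; the two can
disagree when traffic is bursty, which is why we track both.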

These two tools have been invaluable in helping us monitor our Samza
processing clusters.

Cheers,
Malcolm McFarland
Cavulus


On Mon, May 11, 2020 at 5:23 PM Eleanore Jin <eleanore....@gmail.com> wrote:

> Hi community,
>
> I just wonder what is the difference between the consumer lag reported by
> Kafka client and the consumer lag reported by burrow?
>
> Thanks a lot!
> Eleanore
>
