Hi there,

For our three-node ZooKeeper (3.9.1) deployment we monitor a couple of
ZooKeeper metrics via Prometheus using the JMX exporter.
After a node restart, the AvgRequestLatency metric of the ZK process
running on that node sporadically exceeds the configured alerting
threshold. That's fine and can probably be attributed to the load on the
server during the restart.
What's unexpected, though, is that the metric's value never decreases,
even after a few hours, resulting in a false positive alert. My impression
is that the metric represents some kind of maximum or worst-case request
latency rather than a moving average.
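
To make clear what I mean, here is a small Python sketch comparing a running
maximum, an average over all samples since start, and a moving average over
a fixed window. The sample latencies, the window size and the function names
are made up for illustration, and the sketch says nothing about how ZooKeeper
actually computes AvgRequestLatency internally.

```python
from collections import deque


def running_max(samples):
    """Largest value seen so far; can never decrease."""
    out, current = [], float("-inf")
    for s in samples:
        current = max(current, s)
        out.append(current)
    return out


def cumulative_mean(samples):
    """Mean over all samples since start; a spike decays only slowly."""
    out, total = [], 0.0
    for n, s in enumerate(samples, start=1):
        total += s
        out.append(total / n)
    return out


def moving_average(samples, window):
    """Mean over the last `window` samples; a spike drops out completely."""
    out, buf = [], deque(maxlen=window)
    for s in samples:
        buf.append(s)
        out.append(sum(buf) / len(buf))
    return out


# Hypothetical latencies in ms: two slow requests during the restart,
# followed by normal traffic.
samples = [500.0, 400.0] + [2.0] * 8

print(running_max(samples)[-1])
print(cumulative_mean(samples)[-1])
print(moving_average(samples, window=4)[-1])
```

With these made-up numbers the running maximum stays at the restart spike
forever, the average since start comes down only gradually, and the moving
average is back at the normal level once the spike has left the window. What
I observe looks like the first case, while I expected something like the
third.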

Any ideas?

Thanks,
Thilo
