Hi there, for our three-node ZooKeeper (3.9.1) deployment we monitor a couple of ZooKeeper metrics via Prometheus using the JMX exporter. After a node restart, the AvgRequestLatency metric of the ZK process running on that node sporadically exceeds the configured alerting threshold. That in itself is fine and can probably be attributed to the load on the server during the restart. What's unexpected, though, is that the metric's value never decreases, even after a few hours, which results in a false positive alert. My impression is that the metric represents a kind of maximum or worst-case request latency rather than a moving average.
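To make the distinction concrete, here is a minimal sketch of one possible explanation. It assumes, without my having checked the ZooKeeper source, that the metric is a since-start average (total latency divided by request count) rather than a windowed one. The latency values, the window size, and the function names are all made up for illustration. Under that assumption a few slow requests right after the restart would keep the value elevated for a long time, especially on a node that serves little traffic.

```python
from collections import deque


def cumulative_average(samples):
    """Average over all samples seen so far (total / count)."""
    total = 0.0
    averages = []
    for count, sample in enumerate(samples, start=1):
        total += sample
        averages.append(total / count)
    return averages


def moving_average(samples, window):
    """Average over only the most recent `window` samples."""
    recent = deque(maxlen=window)
    averages = []
    for sample in samples:
        recent.append(sample)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical latencies in ms: 10 slow requests during the restart,
# followed by 1000 fast requests under normal load.
latencies = [500.0] * 10 + [2.0] * 1000

cumulative = cumulative_average(latencies)
windowed = moving_average(latencies, window=100)

print(f"cumulative: {cumulative[-1]:.2f} ms, moving: {windowed[-1]:.2f} ms")
```

In this sketch the moving average is back at the normal latency as soon as the slow requests leave the window, while the cumulative average is still several times higher after 1000 fast requests. It does decrease, only slowly, so this would match what I see only if the decline is too small to notice at our request rate.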
Any ideas? Thanks, Thilo
