Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Enrico Olivelli
Sorry for top posting. I have never seen Prometheus have such a significant impact in other applications. The first things that come to mind: - collect a couple of dumps with some perf tool and dig into the problem - verify that we have the latest version of the Prometheus client - tune the

Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Ted Dunning
Batching metrics reporting is very similar to option (c) but with locking like option (a). That can usually be made faster by passing a reference to the metrics accumulator to the reporting thread, which can do the batch update without locks. This usually requires ping-pong metrics accumulators so that a
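
[Editor's sketch] The ping-pong hand-off described above could look roughly like the following. This is a minimal illustration under stated assumptions, not code from ZooKeeper or the Prometheus client: the class PingPongMetrics and its methods record, flip, and drain are hypothetical names, and the sketch assumes exactly one worker thread and one reporting thread per pair of accumulators.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: one worker thread writes, one reporting thread drains.
final class PingPongMetrics {
    // Plain (non-atomic) fields: only one thread owns an accumulator at a time.
    static final class Accumulator {
        long count;
        long sum;
        void reset() { count = 0; sum = 0; }
    }

    // Touched only by the worker thread.
    private Accumulator active = new Accumulator();
    // Full accumulator waiting for the reporter, or null.
    private final AtomicReference<Accumulator> ready = new AtomicReference<>();
    // Emptied accumulator returned by the reporter, or null.
    private final AtomicReference<Accumulator> free =
            new AtomicReference<>(new Accumulator());

    // Worker thread: hot path, no locks and no atomic operations.
    void record(long value) {
        active.count++;
        active.sum += value;
    }

    // Worker thread: hand the active accumulator to the reporter.
    // Returns false if the reporter has not yet returned the spare one.
    boolean flip() {
        Accumulator spare = free.getAndSet(null);
        if (spare == null) {
            return false;
        }
        ready.set(active);   // publishes the plain writes made so far
        active = spare;
        return true;
    }

    // Reporting thread: read the handed-off values, then give the buffer back.
    // Returns {count, sum}, or null if nothing has been handed off.
    long[] drain() {
        Accumulator full = ready.getAndSet(null);
        if (full == null) {
            return null;
        }
        long[] snapshot = { full.count, full.sum };
        full.reset();
        free.set(full);
        return snapshot;
    }
}
```

The worker would call flip() at the end of each batch or reporting period. The only cross-thread synchronization is the pair of AtomicReference hand-offs, so record() itself stays uncontended.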

Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Ted Dunning
There are three patterns I have seen: a) a shared object that all threads update with locks (I think that this is what is causing the slowdown) b) a message queue to a separate thread with a metrics object. I think that this is what you suggested last. This can be higher performance than (a) because a

Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Li Wang
Batching metrics reporting can help. For example, in the CommitProcessor, increasing the maxCommitBatchSize helps improve the performance of write operations. On Mon, Apr 26, 2021 at 9:21 PM Li Wang wrote: > Yes, I am thinking that handling metrics reporting in a separate thread, > so it d

Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Li Wang
Yes, I am thinking that handling metrics reporting in a separate thread, so it doesn't impact the "main" thread. Not sure about the idea of merging at the end of a reporting period. Can you elaborate a bit on it? Thanks, Li On Mon, Apr 26, 2021 at 9:11 PM Ted Dunning wrote: > Would it help to

Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Ted Dunning
Would it help to keep per thread metrics that are either reported independently or are merged at the end of a reporting period? On Mon, Apr 26, 2021 at 8:51 PM Li Wang wrote: > Hi Community, > > I've done further investigation on the issue and found the following > > 1. The perf of the read op
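
[Editor's sketch] One way to approximate "per-thread metrics merged at the end of a reporting period" in Java is the JDK's java.util.concurrent.atomic.LongAdder, which spreads updates across internal cells under contention and sums them on read. The sketch below is an illustration only: the class StripedLatencyStats and its methods are hypothetical, it covers counts and totals rather than the quantile tracking discussed in this thread, and it is not how ZooKeeper or the Prometheus client implements metrics.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: contention-friendly count and total for a latency metric.
final class StripedLatencyStats {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalMicros = new LongAdder();

    // Called by worker threads; LongAdder avoids a single contended lock or CAS.
    void record(long micros) {
        count.increment();
        totalMicros.add(micros);
    }

    // Called by the reporting thread once per period; returns {count, totalMicros}.
    // Note: with concurrent writers the two values are not read atomically
    // together, so they can differ slightly from each other.
    long[] snapshotAndReset() {
        return new long[] { count.sumThenReset(), totalMicros.sumThenReset() };
    }

    // Small usage example: several threads record, then one merged snapshot.
    static long[] demo(int threads, int perThread) {
        StripedLatencyStats stats = new StripedLatencyStats();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    stats.record(2);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats.snapshotAndReset();
    }
}
```

Merging happens only when the reporter calls snapshotAndReset(), so worker threads never wait on each other on the recording path.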

Re: Performance impact of enabling Prometheus Metrics

2021-04-26 Thread Li Wang
Hi Community, I've done further investigation on the issue and found the following: 1. Read performance decreased due to lock contention in the Prometheus TimeWindowQuantiles APIs. 3 out of 4 CommitProcWorker threads were blocked on the TimeWindowQuantiles.insert() API when t

Re: write performance issue in 3.6.2

2021-04-26 Thread Li Wang
Hi Srikant, 1. Have you tried to run the test without enabling Prometheus metrics? What I observed is that enabling Prometheus has a significant performance impact (about 40%-60% degradation). 2. In addition to the session expiry errors and max latency increasing issue, did you see any issue with throug

Re: write performance issue in 3.6.2

2021-04-26 Thread Li Wang
Hi Michael, Thanks for your reply. 1. The workload is 500 concurrent users creating nodes with a data size of 4 bytes. 2. It's a pure write workload. 3. The perf issue is that under the same load, there were many session-expired and connection-loss errors when using ZK 3.6.2 but no such errors in ZK 3.4.14. T