Sorry for top posting.
I have never seen Prometheus have such a significant impact in any other
application.
The first things that come into my mind:
- collect a couple of dumps with some perf tool and dig into the problem
- verify that we have the latest version of Prometheus client
- tune the
Batching metrics reporting is very similar to option (c) but with locking
like option (a). That can usually be made faster by passing a reference to
the metrics accumulator to the reporting thread, which can do the batch
update without locks. This usually requires ping-pong metrics accumulators so
that a fresh accumulator can collect new samples while the previous one is
being reported.
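The ping-pong accumulator idea can be sketched as follows. This is an illustrative sketch, not ZooKeeper or Prometheus code: the class and method names are made up, and a production version would additionally need to wait out in-flight writers after the swap (e.g. an epoch scheme), which is omitted here.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;

// Sketch of ping-pong metrics accumulators: worker threads add to the
// "active" accumulator; the reporter thread swaps in the spare one and
// drains the previous batch without holding a lock across the report.
class PingPongMetrics {
    static final class Accumulator {
        final LongAdder count = new LongAdder();
        final LongAdder sum = new LongAdder();
        void record(long value) { count.increment(); sum.add(value); }
        void reset() { count.reset(); sum.reset(); }
    }

    private final AtomicReference<Accumulator> active =
            new AtomicReference<>(new Accumulator());
    private Accumulator spare = new Accumulator();

    // Hot path: no synchronized blocks, only striped LongAdder updates.
    void record(long value) { active.get().record(value); }

    // Reporter thread: swap accumulators, then read the quiesced batch.
    // (A real implementation must also wait for in-flight record() calls
    // on the swapped-out batch before reading it.)
    long[] drain() {
        Accumulator batch = active.getAndSet(spare);
        long[] out = { batch.count.sum(), batch.sum.sum() };
        batch.reset();
        spare = batch;
        return out;
    }
}
```

The point of the swap is that the reporter never contends with workers on the accumulator it is reading: workers have already moved on to the other one.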
There are three patterns I have seen:
a) shared object that all threads update with locks (I think that this is
what is causing the slowdown)
b) message queue to a separate thread with a metrics object. I think that
this is what you suggested last. This can be higher performance than (a)
because a single consumer thread owns the metrics object and can update it
without lock contention.
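Pattern (b) can be sketched roughly like this. The names are hypothetical, not ZooKeeper APIs; the key property is that only the consumer thread ever touches the metric fields, so those updates need no lock (the bounded queue itself does a short lock-protected hand-off).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of pattern (b): request threads enqueue raw samples; one
// dedicated thread owns the metrics state, so its updates are single-writer.
class QueuedMetrics implements AutoCloseable {
    private final BlockingQueue<Long> samples = new ArrayBlockingQueue<>(10_000);
    private final Thread consumer;
    private volatile long count;   // written only by the consumer thread
    private volatile long sum;     // written only by the consumer thread

    QueuedMetrics() {
        consumer = new Thread(() -> {
            try {
                while (true) {
                    long v = samples.take();  // blocks until a sample arrives
                    sum += v;                 // single-writer: no lock needed
                    count++;                  // volatile write publishes sum too
                }
            } catch (InterruptedException e) {
                // close() interrupts us: exit the loop
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }

    // Hot path: offer() drops the sample on overload rather than blocking
    // the request thread.
    void record(long value) { samples.offer(value); }

    long count() { return count; }
    long sum()   { return sum; }

    // Test helper: spin until `n` samples have been consumed or timeout.
    boolean awaitCount(long n, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (count < n && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
        return count >= n;
    }

    @Override public void close() { consumer.interrupt(); }
}
```

The trade-off versus (a) is latency and possible sample loss under overload, in exchange for keeping locks and contention off the request path.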
Batching metrics reporting can help. For example, in the CommitProcessor,
increasing maxCommitBatchSize helps improve the performance of write
operations.
On Mon, Apr 26, 2021 at 9:21 PM Li Wang wrote:
Yes, I am thinking that handling metrics reporting in a separate thread, so
it doesn't impact the "main" thread.
Not sure about the idea of merging at the end of a reporting period. Can
you elaborate a bit on it?
Thanks,
Li
On Mon, Apr 26, 2021 at 9:11 PM Ted Dunning wrote:
Would it help to keep per thread metrics that are either reported
independently or are merged at the end of a reporting period?
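The per-thread-metrics idea could look something like the sketch below (illustrative names, not an existing API): each thread increments its own counter, so the hot path never contends on a shared lock, and the reporter merges all per-thread counters at the end of the reporting period.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of per-thread counters merged at reporting time. Each thread
// lazily registers a private AtomicLong; increments are uncontended.
class PerThreadCounter {
    private final List<AtomicLong> all = new CopyOnWriteArrayList<>();
    private final ThreadLocal<AtomicLong> local = ThreadLocal.withInitial(() -> {
        AtomicLong c = new AtomicLong();
        all.add(c);           // register so the reporter can find it
        return c;
    });

    // Hot path: CAS on a thread-private cell, no shared-lock contention.
    void increment() { local.get().incrementAndGet(); }

    // Reporter: merge all per-thread counters at the end of the period.
    long merge() {
        long total = 0;
        for (AtomicLong c : all) total += c.get();
        return total;
    }
}
```

This is essentially the striping trick that `java.util.concurrent.atomic.LongAdder` applies internally for plain counters; the harder part, and presumably the sticking point here, is doing the same for quantile sketches, whose merge is not a simple sum.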
On Mon, Apr 26, 2021 at 8:51 PM Li Wang wrote:
Hi Community,
I've done further investigation on the issue and found the following
1. The performance of the read operation decreased due to lock contention
in the Prometheus TimeWindowQuantiles APIs. 3 out of 4 CommitProcWorker
threads were blocked on the TimeWindowQuantiles.insert() API when t
Hi Srikant,
1. Have you tried to run the test without enabling Prometheus metrics? What
I observed is that enabling Prometheus has a significant performance impact
(about 40%-60% degradation).
2. In addition to the session expiry errors and the max latency increase,
did you see any issue with throughput?
Hi Michael,
Thanks for your reply.
1. The workload is 500 concurrent users creating nodes with data size of 4
bytes.
2. It's pure write
3. The perf issue is that under the same load, there were many
session-expired and connection-loss errors with ZK 3.6.2 but no such errors
with ZK 3.4.14.
T