Denis, I measured the impact of metrics collection on my laptop: it's about
5 seconds on each node to collect metrics for 1000 caches (all caches in
one cache group) with 32000 partitions. All this time tcp-disco-msg-worker
is blocked.
Guys, thanks for your proposals, I've filed a ticket [1].
[1]
Hi,
One of the problems with metrics is their huge size when a lot of caches
are started on a node (for example, I have seen 7000 caches).
We have to think about how to compact them.
Not all metrics change frequently, so we may store them locally and send
over the wire only the difference from the previous collection.
And think
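A minimal sketch of that delta idea (a hypothetical helper; the names below
are illustrative, not actual Ignite internals):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember the last snapshot sent over the wire and
// put only the metrics whose values changed into the next message.
class MetricsDeltaEncoder {
    // Last snapshot sent over the wire: metric name -> value.
    private final Map<String, Long> lastSent = new HashMap<>();

    // Returns only the metrics that differ from the previously sent snapshot.
    Map<String, Long> delta(Map<String, Long> current) {
        Map<String, Long> diff = new HashMap<>();

        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long prev = lastSent.put(e.getKey(), e.getValue());

            if (!e.getValue().equals(prev))
                diff.put(e.getKey(), e.getValue());
        }

        return diff;
    }
}

The receiver would then apply the diff to its locally stored copy of the
remote node's metrics.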
Hi Alex,
Agree with you. Most of the time this distribution of metrics is not
needed. In the future we will have more and more information which
potentially needs to be shared between nodes, e.g. IO statistics, SQL
statistics for the query optimizer, SQL execution history, etc. We need
common mechanics for this.
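One possible shape for such common mechanics (purely a sketch; nothing like
this interface exists in Ignite today):

import java.util.Map;

// Hypothetical SPI: each subsystem registers a provider, and a single common
// mechanism decides when and how to ship the collected data to other nodes.
interface ClusterStatisticsProvider {
    // Unique name of the statistics group, e.g. "cache-metrics" or "sql-stats".
    String name();

    // Snapshot of the statistics to distribute; serialization and delivery
    // would be handled by the common mechanism, not by each subsystem.
    Map<String, Object> collect();
}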
Hi, Alex.
IMO:
1. Metrics distribution through discovery requires refactoring.
2. Local cache metrics should be available (if configured) on each node.
3. There must be an opportunity to configure metrics at runtime (a
configuration sketch follows this message).
Thanks.
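For reference, per-cache statistics collection can be toggled in the static
configuration; a minimal sketch (the cache name "myCache" is illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CacheStatisticsExample {
    public static void main(String[] args) {
        // Enable metrics (statistics) collection for one particular cache.
        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setStatisticsEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Cache metrics are now collected for "myCache" on this node.
        }
    }
}

Point 3 would mean being able to flip this flag at runtime rather than only
at node startup.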
Alex,
Did you measure the impact of metrics collection? What is the overhead you
are trying to avoid?
Just to make it clear, metrics update messages
(TcpDiscoveryMetricsUpdateMessage) are used as heartbeats, so they are sent
anyway, even if no metrics are distributed between nodes.
Denis
Tue, Dec 4, 2018 at 12:46, Alex Plehanov
Hi Igniters,
In the current implementation, cache metrics are collected on each node and
sent across the whole cluster in a discovery message
(TcpDiscoveryMetricsUpdateMessage) at the configured frequency
(MetricsUpdateFrequency, 2 seconds by default), even if no one requested
them.
If there are a lot of caches started on a node, these messages become very
large and metrics collection takes a significant amount of time.
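For context, that frequency is the standard IgniteConfiguration knob; a
minimal sketch of how it is set:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class MetricsFrequencyExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Discovery metrics update messages are sent with this frequency
        // (2000 ms is the default).
        cfg.setMetricsUpdateFrequency(2000);

        try (Ignite ignite = Ignition.start(cfg)) {
            // The node now periodically sends TcpDiscoveryMetricsUpdateMessage
            // at the configured rate.
        }
    }
}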