Hi there,

I'm currently working on using the Prometheus exporter to provide some
detailed insights for our SolrCloud clusters.

Using the provided template killed our Prometheus server, as well as the
exporter, due to the size of our clusters *(each cluster is around 96 nodes,
~300 collections with 3-way replication and 16 shards)*. You can imagine
the amount of data that comes through /admin/metrics when it isn't filtered
down first.

I've begun writing my own template to reduce the amount of data being
requested. It's working fine, and I'm starting to build some nice
graphs in Grafana.
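
For illustration, this is roughly the kind of filtered request I mean. It is
only a minimal sketch: it assumes the `group` and `prefix` parameters from the
Solr metrics API documentation, the host is hypothetical, and `metrics_url` is
a helper name made up for this example.

```python
from urllib.parse import urlencode


def metrics_url(base_url, group, prefixes):
    """Build a filtered /admin/metrics URL.

    base_url and the chosen filters are illustrative, not our real setup.
    """
    params = {
        "group": group,                 # e.g. "jvm" or "node"
        "prefix": ",".join(prefixes),   # only metrics starting with these names
        "wt": "json",
    }
    return f"{base_url}/admin/metrics?{urlencode(params)}"


# Request only the GC metrics from the JVM registry.
print(metrics_url("http://localhost:8983/solr", "jvm", ["gc."]))
```

Requesting one registry and a few prefixes at a time keeps each response
small, compared with pulling everything from /admin/metrics in one go.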

The only difficulty I'm having is finding decent documentation on the
metrics themselves. I have been using these resources:

metrics reporting - metrics-api
<https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
monitoring solr with prometheus and grafana
<https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>

but they lack information on most metrics.

For example:

"ADMIN./admin/collections.totalTime":6715327903,

I understand this is a counter; however, I'm not sure what unit it is
in when displaying it, for example:

[image: image.png]

This shows a latency of 1 mil, and I'm not sure whether that means
milliseconds, a million of some other unit, etc.
Another example would be the GC metrics:

      "gc.ConcurrentMarkSweep.count":7,
      "gc.ConcurrentMarkSweep.time":1247,
      "gc.ParNew.count":16759,
      "gc.ParNew.time":884173,

When displayed, these don't give a clear indication of what the unit is:

[image: image.png]
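
As a rough sanity check on the GC numbers above, dividing each time by its
count gives an average per collection. This is only a sketch: it assumes both
values are cumulative totals since JVM start, which I haven't confirmed, and
`avg_per_collection` is just a helper name for this example.

```python
# Values copied from the /admin/metrics output above.
gc_metrics = {
    "gc.ConcurrentMarkSweep.count": 7,
    "gc.ConcurrentMarkSweep.time": 1247,
    "gc.ParNew.count": 16759,
    "gc.ParNew.time": 884173,
}


def avg_per_collection(metrics, collector):
    """Average time per collection, in whatever unit the time metric uses."""
    count = metrics[f"gc.{collector}.count"]
    total = metrics[f"gc.{collector}.time"]
    # Guard against a collector that has not run yet.
    return total / count if count else 0.0


for name in ("ConcurrentMarkSweep", "ParNew"):
    print(name, round(avg_per_collection(gc_metrics, name), 2))
```

That works out to roughly 178 per ConcurrentMarkSweep collection and roughly
53 per ParNew collection. Those would be plausible pause times if the unit is
milliseconds, but that is a guess on my part.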

If anyone has any advice or guidance, that would be greatly
appreciated. If there isn't documentation for the API, then this is
also something I'd look into helping contribute.

Thanks,

-- 

Richard Goodman
