Hi there,

I'm currently working on using the Prometheus exporter to provide some detailed insights for our Solr Cloud clusters.
Using the provided template killed our Prometheus server, as well as the exporter, due to the size of our clusters (each cluster is around 96 nodes, ~300 collections with 3-way replication and 16 shards), so you can imagine the amount of data that comes through /admin/metrics when it isn't filtered down first.

I've begun writing my own template to reduce the amount of data being requested. It's working fine, and I'm starting to build some nice graphs in Grafana.

The only difficulty I'm having is finding decent documentation on the metrics themselves. I have been using these resources:

  metrics reporting - metrics-api
  <https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>

  monitoring solr with prometheus and grafana
  <https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>

but they lack information on most metrics. For example:

  "ADMIN./admin/collections.totalTime":6715327903,

I understand this is a counter, but I'm not sure what unit it is in. For example, when displaying it:

[image: image.png]

This shows a latency of "1mil", and I'm not sure whether that means milliseconds, a million of something, etc.

Another example would be the GC metrics:

  "gc.ConcurrentMarkSweep.count":7,
  "gc.ConcurrentMarkSweep.time":1247,
  "gc.ParNew.count":16759,
  "gc.ParNew.time":884173,

When displayed, these don't give a clear indication of what the unit is:

[image: image.png]

If anyone has any advice or guidance, it would be greatly appreciated. If there isn't documentation for the API, then that is also something I'll look into helping contribute.

Thanks,
--
Richard Goodman