Hi Richard,

We collect metrics via JMX rather than the API, but I believe the two expose the same metrics (I did not verify this in the code). You can see how we turned those metrics into reports/charts, or even use our agent to send data to Prometheus: https://github.com/sematext/sematext-agent-integrations/tree/master/solr
You can also find links to some Solr metrics blog posts in this repo. If you find that managing your own monitoring stack is overwhelming, you can try our Solr integration.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 7 Oct 2019, at 12:40, Richard Goodman <richa...@brandwatch.com> wrote:
>
> Hi there,
>
> I'm currently working on using the prometheus exporter to provide some
> detailed insights for our Solr Cloud clusters.
>
> Using the provided template killed our prometheus server, as well as the
> exporter, due to the size of our clusters (each cluster is around 96 nodes,
> ~300 collections with 3-way replication and 16 shards), so you can imagine the
> amount of data that comes through /admin/metrics when it is not filtered down
> first.
>
> I've begun writing my own template to reduce the amount of data
> being requested and it's working fine, and I'm starting to build some nice
> graphs in Grafana.
>
> The only difficulty I'm having with this is that I'm struggling to find decent
> documentation on the metrics themselves. I was using the resources metrics
> reporting - metrics-api
> <https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
> and monitoring solr with prometheus and grafana
> <https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
> but there is a lack of information on most metrics.
>
> For example:
> "ADMIN./admin/collections.totalTime":6715327903,
> I understand this is a counter, however, I'm not sure what unit this would be
> represented in when displaying it, for example:
>
> A latency of 1mil, not sure if this means milliseconds, million, etc.
>
> Another example would be the GC metrics:
> "gc.ConcurrentMarkSweep.count":7,
> "gc.ConcurrentMarkSweep.time":1247,
> "gc.ParNew.count":16759,
> "gc.ParNew.time":884173,
> Which, when displayed, doesn't give the clearest insight as to what the unit
> is.
>
> If anyone has any advice / guidance, that would be greatly appreciated. If
> there isn't documentation for the API, then this would also be something I'll
> look into helping contribute to.
>
> Thanks,
> --
> Richard Goodman
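
To make the unit question above concrete, here is a minimal Python sketch that converts the sample values from the quoted message into labelled units. It rests on two assumptions that this thread does not confirm and that should be checked against your Solr version: that `totalTime` is a cumulative counter in nanoseconds, and that `gc.*.time` is cumulative milliseconds (the unit the JVM's GarbageCollectorMXBean uses for collection time). The function names are hypothetical and not part of Solr or the exporter.

```python
# ASSUMPTION: totalTime is cumulative nanoseconds (not confirmed in this thread).
NANOS_PER_SECOND = 1_000_000_000
# ASSUMPTION: gc.*.time is cumulative milliseconds, as reported by the JVM.
MILLIS_PER_SECOND = 1_000

# Sample values copied from the quoted message.
SAMPLE = {
    "ADMIN./admin/collections.totalTime": 6715327903,
    "gc.ConcurrentMarkSweep.count": 7,
    "gc.ConcurrentMarkSweep.time": 1247,
    "gc.ParNew.count": 16759,
    "gc.ParNew.time": 884173,
}


def total_time_seconds(raw_nanos):
    """Convert a cumulative totalTime counter to seconds (assumes nanoseconds)."""
    return raw_nanos / NANOS_PER_SECOND


def gc_summary(metrics, collector):
    """Summarise one collector's cumulative count/time pair (assumes milliseconds)."""
    count = metrics[f"gc.{collector}.count"]
    time_ms = metrics[f"gc.{collector}.time"]
    return {
        "count": count,
        "total_seconds": time_ms / MILLIS_PER_SECOND,
        # Guard against a collector that has never run.
        "avg_ms_per_collection": time_ms / count if count else 0.0,
    }


if __name__ == "__main__":
    key = "ADMIN./admin/collections.totalTime"
    print(f"{key}: {total_time_seconds(SAMPLE[key]):.3f} s")
    for collector in ("ConcurrentMarkSweep", "ParNew"):
        print(collector, gc_summary(SAMPLE, collector))
```

Since both values are cumulative counters, a graph would normally plot their rate of change over time (for example with `rate()` in Prometheus) rather than the raw number, which may be why the raw values are hard to read on a chart.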