Hello! We have a SolrCloud (7.4) cluster consisting of 90+ hosts (each running multiple Solr nodes, e.g. on ports 8983, 8984, 8985), numerous shards (each with several replicas) and numerous collections.
I was given the task of summing up the total on-disk index size of a certain collection. At first I did it manually by copy-pasting from the web interface (port 8983, Cloud > Nodes tab); there were thousands of lines and it took several hours. This task clearly needs automating. I have read the API documentation and googled, but still no luck. Any solution might also help somebody else in the future.

What I tried:

1) If I poll one of the Solr nodes via

    http://solrhost1.somecorporatesite.org:8983/solr/admin/metrics?wt=JSON&prefix=INDEX

I get output like this (**cores.json**):

    "responseHeader":{
      "status":0,
      "QTime":2004},
    "metrics":{
      "solr.core.collectionname1-2020-12-05.shard12.replica_n240":{
        "INDEX.size":"456 bytes",
        "INDEX.sizeInBytes":456},
      "solr.core.collectionname2-2020-12-04.shard74.replica_n650":{
        "INDEX.size":"2.88 GB",
        "INDEX.sizeInBytes":3088933801},
      ...

and so on. This is what I need, BUT it only covers the cores on the one (local) node I polled, and there are more than 200 of them.

2) I can get a list of all collections, shards and replicas via

    http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json

and it looks like this (**collections.json**):

    "responseHeader":{
      "status":0,
      "QTime":184},
    "cluster":{
      "collections":{
        "collectionname1":{
          "pullReplicas":"0",
          "replicationFactor":"1",
          "shards":{
            "shard1":{
              "range":"800000000-80e0ffff",
              "state":"active",
              "replicas":{
                "core_node67":{
                  "core":"collectionname123-2020-11-30_shard1_replica_n54",
                  "node_name":"solrhost99.somecorporatesite.org:8985/solr",
                  "state":"active",
                  "type":"NRT",
                  "force_set_state":"false",
                  "leader":"true"},
                "core_node548":{
                  "core":"collectionname223-2020-11-29_shard1_replica_n448",
                  "node_name":"solrhost77.somecorporatesite.org:8984/solr",
                  "state":"active",
                  "type":"NRT",
                  "force_set_state":"false"}}},
            "shard2":{
              "range": ...

and so on, 117,156 lines in total.

The question is: how can I insert the INDEX.size fields into the second output (clusterstatus), so that I can sum the disk space used by the indices? In other words, I need the corresponding INDEX.size fields in the replica sections of **collections.json**.

Currently the whole Solr system consumes 100 TB+ and is still growing, and we need to know its growth rate.

Many thanks in advance!
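
For what it's worth, the join I have in mind could be sketched roughly like this in Python. This is only a sketch under assumptions taken from the sample outputs above: that a metrics key `solr.core.<collection>.<shard>.<replica>` corresponds to the core name `<collection>_<shard>_<replica>` in clusterstatus, and that `node_name` can be turned into a URL by prefixing `http://`. The function names (`registry_to_core`, `sizes_by_core`, `attach_index_sizes`, `collect`) are hypothetical helpers, not part of any Solr API.

```python
import json
from urllib.request import urlopen


def registry_to_core(registry_name):
    """Turn 'solr.core.<collection>.<shard>.<replica>' into '<collection>_<shard>_<replica>'.

    Assumption: this naming correspondence holds, as the sample outputs suggest.
    """
    rest = registry_name[len("solr.core."):]
    parts = rest.rsplit(".", 2)  # split from the right so dots in collection names survive
    if len(parts) != 3:
        return rest  # not in the expected three-part form; leave it unchanged
    return "_".join(parts)


def sizes_by_core(metrics_response):
    """Map core name -> INDEX.sizeInBytes for one node's /admin/metrics response."""
    sizes = {}
    for registry, values in metrics_response.get("metrics", {}).items():
        if registry.startswith("solr.core.") and "INDEX.sizeInBytes" in values:
            sizes[registry_to_core(registry)] = values["INDEX.sizeInBytes"]
    return sizes


def attach_index_sizes(cluster_status, metrics_by_node):
    """Write INDEX.sizeInBytes into each replica (in place); return total bytes per collection."""
    totals = {}
    for coll_name, coll in cluster_status["cluster"]["collections"].items():
        total = 0
        for shard in coll["shards"].values():
            for replica in shard["replicas"].values():
                node_sizes = metrics_by_node.get(replica["node_name"], {})
                size = node_sizes.get(replica["core"])
                if size is not None:  # replicas without a matching metric are skipped
                    replica["INDEX.sizeInBytes"] = size
                    total += size
        totals[coll_name] = total
    return totals


def fetch_json(url):
    """Fetch a URL and parse the body as JSON."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def collect(base_url):
    """Fetch clusterstatus once, then poll every node that hosts a replica."""
    status = fetch_json(f"{base_url}/admin/collections?action=clusterstatus&wt=json")
    nodes = {
        replica["node_name"]
        for coll in status["cluster"]["collections"].values()
        for shard in coll["shards"].values()
        for replica in shard["replicas"].values()
    }
    # Assumption: node_name looks like 'host:port/solr', as in the output above.
    metrics_by_node = {
        node: sizes_by_core(fetch_json(f"http://{node}/admin/metrics?wt=json&prefix=INDEX"))
        for node in nodes
    }
    return status, attach_index_sizes(status, metrics_by_node)
```

Calling `collect("http://localhost:8983/solr")` would then return the enriched clusterstatus plus a dict of total bytes per collection. Note that the totals count every replica, so a shard with three replicas contributes three times, which seems right for disk usage. I have not verified this against our cluster, so corrections are welcome.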