Hi

We are using Solr 7.2.1 in SolrCloud mode with 35 collections and a 1-node
ZK ensemble (in the lower environment; we will have a 3-node ensemble) in
AWS. We are testing whether we can run an async SolrCloud backup (
https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
every time we create a new collection or update an existing collection.
Each collection has 8 shards and 1 replica, and there are two Solr nodes.
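
For concreteness, the kind of async request we mean can be sketched as
below. This is only an illustration: the host, collection name, backup
name, EFS path and request id are placeholders, not our real values, and
the helper function names are made up. The BACKUP and REQUESTSTATUS
actions and the async / requestid parameters are the ones from the
Collections API page linked above.

```python
from urllib.parse import urlencode


def backup_url(base_url, collection, backup_name, location, request_id):
    """Build the Collections API URL that starts an async BACKUP."""
    params = {
        "action": "BACKUP",
        "name": backup_name,       # name of the backup to create
        "collection": collection,  # collection to back up
        "location": location,      # shared path visible to all nodes (EFS here)
        "async": request_id,       # makes the call return immediately
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def status_url(base_url, request_id):
    """Build the URL that polls the status of an async request."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    base = "http://localhost:8983/solr"
    print(backup_url(base, "collection1", "collection1-backup",
                     "/mnt/efs/backups", "backup-collection1-001"))
    print(status_url(base, "backup-collection1-001"))
```

The request id passed in async is what we would later hand to
REQUESTSTATUS to see whether a given backup finished.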

For the largest collection (index size of 80 GB), we see that a BACKUP to
the EFS drive takes about 10 minutes. The application issues a lot of /get
(real-time get) requests. We are seeing that read (get) performance
degrades significantly (2x) while a BACKUP is running in parallel.

Is there any way to tune the system so that reads do not suffer?

Are there any other best practices? For example, should we run backups
during off-peak load?

Is there a way to keep track of which collections are already backed up?
