For backups to an offsite data center, I need each core's configuration to 
replicate to a consistently defined backup directory on a NetApp filer. A 
snapshot of the NetApp filer can be invoked manually, and SnapMirror will copy 
the data to the offsite data center, where it will be mounted.
A comparison script at the offsite data center can then rsync the data to the 
local filesystem and signal Solr to reload the core.
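A minimal sketch of that comparison script, assuming the mirrored index for a 
core is mounted as a plain directory and that the reload goes through the 
CoreAdmin API's RELOAD action. The base URL, paths, and core name are 
hypothetical placeholders. The comparison uses Python's filecmp.dircmp, which 
only looks at the top level of a directory (a Lucene index directory is 
normally flat). With dry_run=True it only returns the planned steps.

```python
import filecmp
import subprocess
import urllib.parse
import urllib.request

# Hypothetical Solr base URL; replace with the offsite node's address.
SOLR_BASE = "http://localhost:8983/solr"


def index_differs(snapshot_dir: str, local_dir: str) -> bool:
    """True if the mirrored and local index directories differ.

    filecmp.dircmp compares only the top level, which is enough for a
    flat Lucene index directory.
    """
    cmp = filecmp.dircmp(snapshot_dir, local_dir)
    return bool(cmp.left_only or cmp.right_only or cmp.diff_files or cmp.funny_files)


def rsync_command(src: str, dst: str) -> list:
    # Trailing slash on src copies the directory's contents; --delete drops
    # segment files that no longer exist in the snapshot.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dst]


def reload_url(base: str, core: str) -> str:
    # CoreAdmin RELOAD reopens the core so it picks up the new index files.
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{base}/admin/cores?{query}"


def sync_and_reload(snapshot_dir, local_dir, core, base=SOLR_BASE, dry_run=True):
    """Return the planned steps; execute them only when dry_run is False."""
    if not index_differs(snapshot_dir, local_dir):
        return []
    cmd = rsync_command(snapshot_dir, local_dir)
    url = reload_url(base, core)
    if not dry_run:
        subprocess.run(cmd, check=True)
        with urllib.request.urlopen(url) as resp:
            resp.read()
    return [cmd, url]
```

Run from cron against each core's mirrored directory, an unchanged mirror 
produces no work, so the core is only reloaded when the snapshot actually 
brought new files.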

Since we have less than 30 GB of index data, fewer than 10 million documents, 
and about 2 queries per second (QPS), we think SolrCloud doesn't make sense for 
us at this time.

I'm wondering whether SolrCloud would have any advantage for me if I set this 
up as two SolrCloud clusters, with replication from the master cloud to the 
slave cloud. More specifically, without SolrCloud I see a need to modify the 
solrconfig.xml of each core to ensure that the ReplicationHandler is defined 
and has the right backup parameters. With SolrCloud, I would hope for some way 
to set the backup parameters globally, for all cores.
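Without SolrCloud, one way to approximate a "global" backup setting, without 
editing backup parameters in every solrconfig.xml, might be to drive the 
ReplicationHandler's backup command from outside and derive each core's 
location from a single root directory. This sketch assumes every core already 
exposes a /replication handler, lists cores through the CoreAdmin STATUS 
action, and uses a hypothetical NetApp mount point; the backup parameters 
should be checked against the reference guide for the Solr version in use.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the Solr base URL and the NetApp mount point.
SOLR_BASE = "http://localhost:8983/solr"
BACKUP_ROOT = "/mnt/netapp/solr-backup"


def core_names(status_json: dict) -> list:
    """Core names from a CoreAdmin STATUS response (wt=json), sorted."""
    return sorted(status_json.get("status", {}))


def backup_url(base: str, core: str, root: str) -> str:
    """ReplicationHandler backup request that writes under <root>/<core>."""
    query = urllib.parse.urlencode({"command": "backup", "location": f"{root}/{core}"})
    return f"{base}/{core}/replication?{query}"


def plan_backups(status_json: dict, base: str = SOLR_BASE, root: str = BACKUP_ROOT) -> list:
    # One backup URL per core, all sharing a single, consistently defined root.
    return [backup_url(base, core, root) for core in core_names(status_json)]


def backup_all(base: str = SOLR_BASE, root: str = BACKUP_ROOT) -> None:
    """Fetch the core list, then trigger a backup of every core."""
    with urllib.request.urlopen(f"{base}/admin/cores?action=STATUS&wt=json") as resp:
        status = json.load(resp)
    for url in plan_backups(status, base, root):
        with urllib.request.urlopen(url) as resp:
            resp.read()
```

The backup root lives in one script rather than in each solrconfig.xml. It may 
be worth confirming (for example with the handler's command=details) that each 
backup has finished before invoking the NetApp snapshot.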

I know about Sematext's post and the issue (with sub-tasks) in this general 
area, but haven't had time to work through all of it to understand whether 
SolrCloud offers me any value over non-cloud Solr in this configuration.

Another architecture would be to take an LVM snapshot after each commit, mount 
it on the master node (or single-node cloud), and rsync it to the NetApp for 
both backup and fault tolerance. A signal file on the NetApp would then 
trigger the slave to rsync and reload.
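The LVM variant could be sketched roughly as follows: the master runs a fixed 
command sequence after a commit and then publishes a token in a signal file, 
and the slave compares that token with the last one it applied. The volume 
names, mount points, snapshot size, data subdirectory, and READY file name are 
all hypothetical. master_plan only builds the command lists; running them 
(for example with subprocess.run(cmd, check=True)) would need root.

```python
import os
from pathlib import Path
from typing import Optional

# All names below are hypothetical placeholders.
SOLR_LV = "/dev/vg0/solr"          # logical volume holding the Solr data
SNAP_NAME = "solr_snap"
SNAP_MOUNT = "/mnt/solr_snap"
NETAPP_DIR = "/mnt/netapp/solr-backup"
SIGNAL = "READY"


def master_plan(lv=SOLR_LV, snap=SNAP_NAME, mnt=SNAP_MOUNT, dest=NETAPP_DIR, size="5G"):
    """Commands the master would run after a commit (built, not executed)."""
    snap_dev = f"{os.path.dirname(lv)}/{snap}"
    return [
        ["lvcreate", "--snapshot", "--size", size, "--name", snap, lv],
        ["mount", "-o", "ro", snap_dev, mnt],
        # rsync into a subdirectory so --delete never removes the signal file.
        ["rsync", "-a", "--delete", mnt + "/", f"{dest}/data"],
        ["umount", mnt],
        ["lvremove", "-f", snap_dev],
    ]


def write_signal(dest: str, token: str) -> None:
    """Publish a token; writing then renaming avoids exposing a partial file."""
    tmp = Path(dest) / (SIGNAL + ".tmp")
    tmp.write_text(token)
    os.replace(tmp, Path(dest) / SIGNAL)


def pending_token(dest: str, state_file: str) -> Optional[str]:
    """Slave side: return the new token if it differs from the last applied one."""
    sig = Path(dest) / SIGNAL
    if not sig.exists():
        return None
    token = sig.read_text().strip()
    state = Path(state_file)
    last = state.read_text().strip() if state.exists() else ""
    return token if token and token != last else None


def mark_applied(state_file: str, token: str) -> None:
    # Record the token only after the slave's rsync and reload succeed.
    Path(state_file).write_text(token)
```

Using a token such as a commit timestamp, rather than the mere presence of the 
file, lets the slave tell a new snapshot from one it has already applied 
without having to delete anything on the NetApp.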

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH
