On 8/29/2018 7:17 AM, Pure Host - Wolfgang Freudenberger wrote:
I am currently restructuring a big-data cloud with 1000+ collections on SolrCloud. The data is stored on 4 shards without a replica. This data is deprecated and read-only for some purpose, so I want to migrate it to a new cloud with 1 shard and 1 replica.

If you have no replicas, then you have no data to query. You can create a collection with zero replicas, but then you must specifically add a replica before you can actually use it.

I think you probably mean that you are going from a one-replica install (replicationFactor=1) to a two-replica install.  The leaders are also replicas.

Is there an "easy" way to merge the shards? Or do I have to read/write copy from the old to the new cloud?

The Collections API does not yet have a way to merge shards.  An issue has been created, but it hasn't been implemented yet.  I do not know when that might happen:

https://issues.apache.org/jira/browse/SOLR-9407

The CoreAdmin API does have an option to merge indexes -- but when running in cloud mode, the CoreAdmin API is an expert API and should not normally be used.
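
For reference, a rough sketch of what a CoreAdmin merge request looks like. It only builds the URL for the MERGEINDEXES action and does not send anything. The host, the core name "merged_core" and the index directory paths are made-up placeholders, and the target core is assumed to exist already on a standalone (non-cloud) Solr.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, index_dirs):
    """Build a CoreAdmin MERGEINDEXES URL (sketch only; nothing is sent)."""
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    # One indexDir parameter per source index directory to merge in.
    params += [("indexDir", path) for path in index_dirs]
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Placeholder host, core name and paths; adjust to the real install.
url = merge_indexes_url(
    "http://localhost:8983/solr",
    "merged_core",
    ["/data/shard1/index", "/data/shard2/index"],
)
print(url)
```

The source indexes should not be receiving updates while the merge runs, and a commit on the target core is needed afterwards before the merged documents become visible.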

The way I would handle this in reality is to re-index the data onto the new cloud.  Anytime I upgrade Solr or build something new, I index from scratch.  It works better that way. You should always be prepared to reindex your data from scratch -- it's a common need with a search engine.
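
If the original data source is not available, the "read/write copy" from the question might look roughly like the sketch below, which pages through the old collection with cursorMark and posts each page to the new one. This is an illustration, not a tested tool. It assumes the uniqueKey field is named "id" and that every field you need is stored (or has docValues), since anything else cannot be read back out of Solr. The collection URLs and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_get_json(url):
    """Fetch a URL and decode the JSON response."""
    with urlopen(url) as resp:
        return json.load(resp)


def http_post_json(url, payload):
    """POST a JSON payload (a list of documents or an update command)."""
    req = Request(url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def copy_collection(src_url, dst_url, rows=500,
                    get=http_get_json, post=http_post_json):
    """Copy all documents from src to dst; returns the number copied."""
    cursor, copied = "*", 0
    while True:
        # cursorMark paging requires a sort that includes the uniqueKey
        # field (assumed here to be "id").
        query = urlencode({"q": "*:*", "sort": "id asc", "rows": rows,
                           "wt": "json", "cursorMark": cursor})
        result = get(f"{src_url}/select?{query}")
        docs = result["response"]["docs"]
        for doc in docs:
            # Drop the old _version_ so the new cloud assigns its own.
            doc.pop("_version_", None)
        if docs:
            post(f"{dst_url}/update", docs)
            copied += len(docs)
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: no more results
            break
        cursor = next_cursor
    post(f"{dst_url}/update", {"commit": {}})
    return copied
```

A call would look like copy_collection("http://oldhost:8983/solr/coll1", "http://newhost:8983/solr/coll1"), with both URLs being placeholders. If the schema has stored copyField targets, those would need to be excluded as well, or their values end up duplicated.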

If reindexing is more difficult for you, and you're not upgrading Solr, then you could try this:  Copy cores from the older SolrCloud install to a standalone server, merge the indexes there, build the collection in the new cloud, and replace the index for that collection with the merged index.
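
For the "build the collection in the new cloud" step, here is a small sketch of the Collections API CREATE request for a single-shard collection. Again it only builds the URL. The collection name, the configset name and the host are placeholders. Starting with replicationFactor=1 and adding the second replica (ADDREPLICA) only after the merged index is in place is one plausible order, which is my assumption rather than something verified here.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name, replication_factor=1):
    """Build a Collections API CREATE URL for a one-shard collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": replication_factor,
        # Name of a configset already uploaded to ZooKeeper (placeholder).
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host, collection name and configset name.
create_url = create_collection_url(
    "http://localhost:8983/solr", "archive", "archive_conf")
print(create_url)
```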

Thanks,
Shawn
