On 8/29/2018 7:17 AM, Pure Host - Wolfgang Freudenberger wrote:
> I am currently restructuring a big-data cloud with 1000+ collections
> on a SolrCloud. The data is stored on 4 shards without a replica.
> This data is deprecated and read-only for some purposes, so I want to
> migrate it to a new cloud with 1 shard and 1 replica.
If you have no replicas, then you have no data to query. You can create
a collection with zero replicas, but then you must specifically add a
replica before you can actually use it.

I think you probably mean that you are going from a one-replica install
(replicationFactor=1) to a two-replica install (replicationFactor=2).
A shard leader is itself a replica, so it counts toward that number.
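
To make the replica counting concrete, here is a minimal sketch that builds a Collections API CREATE request for one shard with two replicas (the leader plus one more). The base URL and the collection name "archive_2018" are placeholders, and depending on your Solr version you may also need to name a configset.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards=1, replication_factor=2):
    """Build a Collections API CREATE URL.

    replicationFactor counts every copy of a shard, including the leader,
    so replication_factor=2 means "the leader plus one additional replica".
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
print(create_collection_url("http://localhost:8983/solr", "archive_2018"))
```

Sending that URL with curl or a browser should be enough for a quick test; with 1000+ collections you would loop over the names.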
> Is there an "easy" way to merge the shards? Or do I have to read/write
> copy from the old to the new cloud?
The Collections API does not yet have a way to merge shards. An issue
has been created, but it hasn't been implemented yet. I do not know
when that might happen:
https://issues.apache.org/jira/browse/SOLR-9407
The CoreAdmin API does have an option to merge indexes -- but when
running in cloud mode, the CoreAdmin API is an expert API and should not
normally be used.
The way I would handle this in reality is to re-index the data onto the
new cloud. Anytime I upgrade Solr or build something new, I index from
scratch. It works better that way. You should always be prepared to
reindex your data from scratch -- it's a common need with a search engine.
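
If the original source data is gone, the read/write copy mentioned in the question might look roughly like the sketch below. It pages through an old collection with cursorMark, which needs a sort on the uniqueKey field. This only gives you a usable copy if every field you care about is stored (or has docValues); I am assuming the uniqueKey is "id", and the helper names here are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_docs(select_url, unique_key="id", rows=500, fetch=http_fetch):
    """Yield every document from a collection using cursorMark paging.

    select_url is the /select handler of the old collection, for example
    http://oldhost:8983/solr/mycollection/select (placeholder).
    """
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": "*:*",
            "sort": f"{unique_key} asc",  # cursors require a uniqueKey sort
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        data = fetch(f"{select_url}?{urlencode(params)}")
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
```

Each document would then be posted to the new collection's update handler, after dropping the _version_ field so the new cloud assigns its own.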
If reindexing is more difficult for you, and you're not upgrading Solr,
then you could try this: Copy cores from the older SolrCloud install to
a standalone server, merge the indexes there, build the collection in
the new cloud, and replace the index for that collection with the merged
index.
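
For the merge step on the standalone server, a minimal sketch of the CoreAdmin MERGEINDEXES request is below. The target core has to exist already, the source index directories must not be in use by a running core while they are merged, and the paths and core name are placeholders.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, index_dirs):
    """Build a CoreAdmin MERGEINDEXES URL.

    index_dirs are the copied index directories of the four old shards;
    each one is passed as its own indexDir parameter.
    """
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Placeholder paths for the four shard indexes copied from the old cloud.
shard_dirs = [f"/var/solr/copied/shard{n}/data/index" for n in range(1, 5)]
print(merge_indexes_url("http://localhost:8983/solr", "merged", shard_dirs))
```

After the merge, issue a commit on the target core so the merged segments become visible before you copy the index into the new collection.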
Thanks,
Shawn