Basically, we have one environment with a large number of Solr nodes (~100) and another with fewer Solr nodes (~10). In the “big” environment, we have lots of smaller cores (around 3 GB each), and in the smaller environment, we have fewer, bigger cores (around 30 GB each). We transfer data between these two environments about once per month. We’ve traditionally followed the model of 1 core per Solr node, so we typically reindex when we move between environments, which takes about 2 days, whereas Solr’s BACKUP and RESTORE APIs each typically take only a few minutes to run.

I’m planning to investigate performance differences between having several small cores on a single Solr node vs. having one big core on each node. In the meantime, however, I was interested to see if it would be possible, at least in the short term, to replace our current procedure with the following:

1) BACKUP the collection in the big environment
2) RESTORE the collection in the small environment
3) MIGRATE the collection in the small environment to another collection in the same environment with 1 shard per Solr node.
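To make the three steps concrete, here is a rough sketch of the Collections API requests involved. It only builds the URLs and does not send anything. The host names, backup name, backup location, and the `derp_restored` / `derp_merged` collection names are placeholders made up for illustration; the `derp` collection and the `split.key=DERP/0!` value are the ones from the earlier MIGRATE experiments in this thread. Whether MIGRATE can really move a whole collection this way is the open question, so treat step 3 as untested.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Solr Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Placeholder values (assumptions, not taken from a real cluster).
BIG_ENV = "http://big-solr:8983/solr"
SMALL_ENV = "http://small-solr:8983/solr"
BACKUP_LOCATION = "/mnt/shared/solr-backups"  # assumed reachable from both environments

# 1) BACKUP the collection in the big environment.
backup_url = collections_api_url(
    BIG_ENV,
    "BACKUP",
    {"name": "derp_backup", "collection": "derp", "location": BACKUP_LOCATION},
)

# 2) RESTORE it under a new collection name in the small environment.
restore_url = collections_api_url(
    SMALL_ENV,
    "RESTORE",
    {"name": "derp_backup", "collection": "derp_restored", "location": BACKUP_LOCATION},
)

# 3) MIGRATE into a pre-created target collection (1 shard per node).
#    MIGRATE requires a split.key; "DERP/0!" is the one variant that ran
#    without ERRORs in the earlier test.
migrate_url = collections_api_url(
    SMALL_ENV,
    "MIGRATE",
    {
        "collection": "derp_restored",
        "target.collection": "derp_merged",
        "split.key": "DERP/0!",
    },
)

for url in (backup_url, restore_url, migrate_url):
    print(url)
```

Each URL could then be requested with curl or any HTTP client, one step at a time, checking the response before moving on.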
I’ve also heard mention of an API to combine shards (https://github.com/bloomreach/solrcloud-rebalance-api and https://issues.apache.org/jira/browse/SOLR-9241). Doesn’t seem like there’s been any development on integrating this work into the official Solr distribution, but this also seems like it would probably solve my requirements.

Let me know if anything is still unclear.

Thanks,
Matthew

On 6/25/18, 1:38 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

On 6/22/2018 12:14 PM, Matthew Faw wrote:
> So I’ve tried running MIGRATE on solr 7.3.1 using the following parameters:
> 1) “split.key=”
> 2) “split.key=!”
> 3) “split.key=DERP_”
> 4) “split.key=DERP/0!”
>
> For 1-3, I am seeing the same ERRORs you see. For 4, I do not see any ERRORs.
>
> Interestingly, I’m seeing this WARN message for all 4 scenarios:
>
> org.apache.solr.common.SolrException: SolrCore not found:split_shard2_temp_shard2_shard1_replica_n3 in [derp_shard1_replica_n1, derp_shard2_replica_n6, herp_shard1_replica_n1, herp_shard2_replica_n6]

I saw something similar as well. I think the way that MIGRATE works internally is to copy data from the source collection to a temporary index, and then from there to the final target.

I think I've figured out why split.key is required. The entire reason the MIGRATE API was created was for people who use route keys to split one of those route keys into a separate collection. It does not appear to have been intended for handling everything in a collection, but only for splitting indexes where such keys are in use.

https://issues.apache.org/jira/browse/SOLR-5308

With id values like DERP_3e5bc047f13f6c562f985f00 you're not using routing prefix keys, so I think you probably aren't able to use the MIGRATE API at all.

So let's back up a couple of steps so we can find you a workable solution.

Is this a one-time migration that you're trying to do, or are you expecting to do this frequently? What requirement are you trying to satisfy by copying data from one collection to another, and what are the details of the requirement?

Thanks,
Shawn