Basically, we have one environment with a large number of Solr nodes (~100)
and another with fewer Solr nodes (~10).  In the “big” environment we have
many smaller cores (around 3 GB each), and in the smaller environment we have
fewer, bigger cores (around 30 GB each).  We transfer data between the two
environments roughly once per month.  We’ve traditionally followed the model
of one core per Solr node, so we typically reindex when we move between
environments, which takes about 2 days, whereas Solr’s BACKUP and RESTORE
APIs each take only a few minutes to run.  I’m planning to investigate the
performance difference between having several small cores on a single Solr
node versus one big core on each node.  In the meantime, however, I wanted to
see whether it would be possible, at least in the short term, to replace our
current procedure with the following (a rough sketch of the API calls follows
the list):
1) BACKUP the Solr collection in the big environment
2) RESTORE the collection in the small environment
3) MIGRATE the collection in the small environment to another collection in
the same environment, with one shard per Solr node.
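
For concreteness, the calls I have in mind look roughly like this.  The host,
collection, backup, and location names are placeholders, and I haven’t
verified every parameter against 7.3.1, so treat this as a sketch rather than
the exact commands we’d run:

  # 1) In the big environment, back up the collection to a shared filesystem
  curl 'http://big-solr:8983/solr/admin/collections?action=BACKUP&name=derp_backup&collection=derp&location=/mnt/solr_backups'

  # 2) In the small environment, restore that backup into a new collection
  curl 'http://small-solr:8983/solr/admin/collections?action=RESTORE&name=derp_backup&location=/mnt/solr_backups&collection=derp'

  # 3) Migrate into a second collection created with one shard per node
  #    (the split.key question discussed below is the part I'm unsure about)
  curl 'http://small-solr:8983/solr/admin/collections?action=MIGRATE&collection=derp&target.collection=derp_resharded&split.key=DERP/0!'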

I’ve also heard mention of an API to combine shards
(https://github.com/bloomreach/solrcloud-rebalance-api and
https://issues.apache.org/jira/browse/SOLR-9241).  There doesn’t seem to have
been any progress on integrating that work into the official Solr
distribution, but it also looks like it would solve my requirements.

Let me know if anything is still unclear.

Thanks,
Matthew

On 6/25/18, 1:38 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

    On 6/22/2018 12:14 PM, Matthew Faw wrote:
    > So I’ve tried running MIGRATE on solr 7.3.1 using the following parameters:
    > 1) “split.key=”
    > 2) “split.key=!”
    > 3) “split.key=DERP_”
    > 4) “split.key=DERP/0!”
    >
    > For 1-3, I am seeing the same ERRORs you see.  For 4, I do not see any ERRORs.
    >
    > Interestingly, I’m seeing this WARN message for all 4 scenarios:
    >
    > org.apache.solr.common.SolrException: SolrCore not
    > found:split_shard2_temp_shard2_shard1_replica_n3 in [derp_shard1_replica_n1,
    > derp_shard2_replica_n6, herp_shard1_replica_n1, herp_shard2_replica_n6]

    I saw something similar as well.  I think the way that MIGRATE works
    internally is to copy data from the source collection to a temporary
    index, and then from there to the final target.

    I think I've figured out why split.key is required.  The entire reason
    the MIGRATE API was created was to let people who use route keys split
    the documents under one of those keys into a separate collection.  It
    does not appear to have been intended for handling an entire collection,
    but only for splitting indexes where such keys are in use.

    
    https://issues.apache.org/jira/browse/SOLR-5308

    With id values like DERP_3e5bc047f13f6c562f985f00 you're not using
    routing prefix keys, so I don't think you'll be able to use the
    MIGRATE API at all.
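
    To illustrate with made-up ids: with the compositeId router, a routing
    prefix is only recognized when it is separated from the rest of the id
    by "!", and that prefix is what MIGRATE's split.key matches on.

        id = DERP!3e5bc047f13f6c562f985f00   -> routing prefix "DERP", so
                                                split.key=DERP! would match it
        id = DERP_3e5bc047f13f6c562f985f00   -> no "!" separator, so there is
                                                no prefix for split.key to match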

    So let's back up a couple of steps so we can find you a workable
    solution.  Is this a one-time migration that you're trying to do, or are
    you expecting to do this frequently?  What requirement are you trying to
    satisfy by copying data from one collection to another, and what are the
    details of the requirement?

    Thanks,
    Shawn


