Would anyone be able to confirm whether this is indeed a Solr bug in the interaction between restore and autoscaling?
I have done some local testing and found the following pattern. A replica-count autoscaling policy such as

  {"replica": "<2", "shard": "#EACH", "node": "#ANY"}

breaks Solr restore, because for some reason the restore code requires double the replica headroom to restore from an existing backup:

- If 1 replica exists on a node in the backup, restore with autoscaling requires a rule that allows 2 replicas on any node.
- If 2 replicas exist on a node in the backup, restore requires a rule that allows 4 replicas on any node.
- If 3 replicas exist on a node in the backup, restore requires a rule that allows 6 replicas on any node.

NOTE: When given double the room, the collection comes up exactly as it was before the backup, so nothing is actually duplicated after the restore. It seems the current restore code is bugged in that it demands more room than it actually needs, whenever a replica-count autoscaling policy is in place.

Rajeswari posted a separate reply to this thread that led me to another discovery:
https://lucene.472066.n3.nabble.com/Re-CAUTION-Re-Solr-7-7-restore-issue-tp4450714.html

In it, they reference the legacy rule-based replica placement documentation:
https://lucene.apache.org/solr/guide/7_6/rule-based-replica-placement.html

After some more local testing, I found that adding the same replica-count constraint as a rule on the collection somehow allows Solr restore to work as intended. Example below:

  collection rule:    replica:<2,node:*
  autoscaling policy: {"replica": "<2", "node": "#ANY"}

When both are in place, restore functionality finally works. Ideally, we should not have to do anything extra beyond setting the original autoscaling replica-count policy.
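For completeness, here is a sketch of the other workaround mentioned below (clearing the cluster policy, restoring, then re-instating it). The host, backup name, and location follow the repeatable steps later in this mail; my assumption is that posting an empty list to set-cluster-policy removes the policy, which matched what I saw locally.

```shell
# Clear the cluster policy so restore can place replicas freely
curl -X POST "http://localhost:8983/solr/admin/autoscaling" --data-binary \
'{"set-cluster-policy": []}'

# Restore from the existing backup
curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackupName&location=/choose/location/&collection=gettingstarted'

# Re-instate the original replica-count policy
curl -X POST "http://localhost:8983/solr/admin/autoscaling" --data-binary \
'{"set-cluster-policy": [{"replica": "<2","shard": "#EACH","node": "#ANY"}]}'
```

These are cluster-admin commands against a live SolrCloud instance, so adjust the URL and locations to your setup.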
But as of right now, it appears there are two workarounds: either remove the cluster policy for the duration of the restore, or add a legacy collection rule in addition to the autoscaling policy. Please let me know if something crucial is being missed; otherwise I hope the above can help in tracking down the actual bug. Thanks.

Repeatable steps if you want to test locally using the Solr tutorial:

./bin/solr stop -all ; rm -Rf example/cloud/
./bin/solr start -e cloud
(choose 1 node for gettingstarted, with 1 shard and 1 replica)

curl -X POST "http://localhost:8983/solr/admin/autoscaling" --data-binary \
'{"set-cluster-policy": [{"replica": "<2","shard": "#EACH","node": "#ANY"}]}'

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackupName&collection=gettingstarted&location=/choose/location/'

curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=gettingstarted'

curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackupName&location=/choose/location/&collection=gettingstarted'

(use this before the backup, and then restore works)
curl 'http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=gettingstarted&rule=shard:*,replica:<2,node:*'

Koen De Groote wrote
> I also ran into this while researching cluster policies. Solr 7.6
>
> Except same situation: introduce a rule to control placement of
> collections. Backup. Delete. Restore. Solr complains it can't do it.
>
> I don't need them just yet, so I stopped there, but reading this is quite
> disturbing.
>
> Does deleting the rule, restoring, and then immediately re-instating the
> rule work?
>
> On Wed, Oct 9, 2019 at 6:33 AM Natarajan, Rajeswari <
> rajeswari.natarajan@ > wrote:
>
>> I am also facing the same issue. With Solr 7.6, restore fails with the
>> below rule.
>> Would like to place one replica per node, with the rule below:
>>
>> "set-cluster-policy": [{
>>   "replica": "<2",
>>   "shard": "#EACH",
>>   "node": "#ANY"
>> }]
>>
>> Without the rule the restore works, but we need this rule. Any
>> suggestions to overcome this issue?
>>
>> Thanks,
>> Rajeswari
>>
>> On 7/12/19, 11:00 AM, "Mark Thill" < mark.thill@ > wrote:
>>
>>   I have a 4-node cluster. My goal is to have 2 shards with two replicas
>>   each, allowing only 1 core on each node. I have a cluster policy
>>   set to:
>>
>>   [{"replica":"2", "shard": "#EACH", "collection":"test",
>>   "port":"8983"},{"cores":"1", "node":"#ANY"}]
>>
>>   I then manually create a collection with:
>>
>>   name: test
>>   config set: test
>>   numShards: 2
>>   replicationFact: 2
>>
>>   This works and I get a collection that looks like what I expect. I
>>   then back up this collection. But when I try to restore the
>>   collection, it fails and says:
>>
>>   "Error getting replica locations : No node can satisfy the rules"
>>   [{"replica":"2", "shard": "#EACH", "collection":"test",
>>   "port":"8983"},{"cores":"1", "node":"#ANY"}]
>>
>>   If I set my cluster-policy rules back to [] and try to restore, it
>>   then successfully restores my collection exactly how I expect it to
>>   be. It appears that having any cluster-policy rules in place is
>>   affecting my restore, but the "error getting replica locations" is
>>   strange.
>>
>>   Any suggestions?
>>
>>   mark < mark.thill@ >

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html