On 11/9/2021 9:50 AM, Nick Vladiceanu wrote:
We should definitely keep things simple. However, I still doubt that the original purpose of the “firstSearcher” queries is satisfied when a brand new core is created: yes, it works when you RELOAD a core or restart Solr, but it's definitely not working as intended when creating a brand new core, and I think this has to be addressed.
I don't see the problem. It's doing exactly what it is configured to do -- when the first searcher is opened, it runs firstSearcher queries. Those queries will complete in practically no time at all when the core is brand new and has no index yet.
Btw, a workaround to make the “firstSearcher” queries run is to issue the REQUESTRECOVERY action immediately after the brand new core is created: https://solr.apache.org/guide/8_10/coreadmin-api.html#requestrecovery-parameters . This will force the core to fetch the index from the leader “immediately” and rerun the firstSearcher queries afterwards, so they run against an index with data. We’re testing this solution right now, and the results are as expected: caches are warmed up, and there is no impact on cluster performance when a brand new core is created.
You've added something here that you didn't mention before -- the fact that there is replication involved.
Replication adds a wrinkle, and because of that, I think that replication is the place to think about this. Putting on my coder hat... it might be very difficult to determine when a replicated index should run firstSearcher queries instead of newSearcher.
And then there is probably another elephant in the room that you didn't mention before -- SolrCloud.
I think REQUESTRECOVERY is only a valid operation when running in SolrCloud mode. And if you are running SolrCloud, you should NOT be using the CoreAdmin API for ANYTHING. Everything you do with indexes in SolrCloud should be done with the Collections API. Using the CoreAdmin API when in cloud mode is a recipe for disaster. It WILL cause problems. We've tried to help people who have screwed up their SolrCloud collections by trying to do operations with CoreAdmin. It does not go well.
If you are adding additional replicas, use the Collections API to do that, and then just reload the collection after things stabilize. Behind the scenes, that will reload all the cores that make up the collection, which will run all the firstSearcher queries on EVERY replica.
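As a rough sketch of that sequence, the two Collections API requests (ADDREPLICA, then RELOAD) could be built like this. The base URL, the collection name "mycollection", the shard name "shard1", and both helper function names are placeholders I made up for illustration, so adjust them for your cluster. The script only prints the URLs; actually sending them, and waiting for the new replica to become active in between, is up to you.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL such as .../admin/collections?action=RELOAD&name=x."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def send(url, timeout=30):
    """Send the request and return the raw response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed host and port

    # Step 1: add the replica through the Collections API, not CoreAdmin.
    add_replica = collections_api_url(
        base, "ADDREPLICA", collection="mycollection", shard="shard1"
    )

    # Step 2: once the new replica is active, reload the whole collection
    # so every core opens a fresh searcher and runs firstSearcher queries.
    reload_collection = collections_api_url(base, "RELOAD", name="mycollection")

    print(add_replica)
    print(reload_collection)
    # To actually run them: send(add_replica), wait for things to
    # stabilize, then send(reload_collection).
```

Note that RELOAD touches every replica in the collection, not just the new one, so all of them will rerun their warming queries.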
Thanks, Shawn
