> On 8. Nov 2021, at 11:45 PM, Shawn Heisey wrote:
>
> On 11/8/21 2:05 PM, Nick Vladiceanu wrote:
>> Ok, makes sense. However, when the core is initially created, the data is
>> not yet there. Running the firstSearcher queries against empty index won’t
>> have any beneficial effects when it comes to cache warming. Is there any way
>> to open the first searcher after the data is pulled from the leader, and
>> therefore, run the warmup queries? What’s the point of opening the first
>> searcher when initially the core is created, if there is no data?
>
> For the purposes of things like warming queries, the searcher isn't aware
> that the index is empty when it starts. It just knows when it is the first
> searcher, and when that is the case, it runs any configured firstSearcher
> queries. Making it aware of something like that for the purposes of avoiding
> such queries is possible, but that would add a lot of complexity. Bugs are
> more likely as the code gets more complex. And I would strongly argue that
> any benefits of added complexity in such an important piece of code do not
> outweigh the risks.
>
> If this really concerns you, just have your indexing software reload the core
> after the index is built, so firstSearcher queries are executed again. If
> the list of firstSearcher queries takes a long time to run, just set
> useColdSearcher to true, and the searcher will be made ready for queries
> before the warming queries are executed. I don't remember where in
> solrconfig.xml that config is.
>
> What I will generally recommend that people do is define a set of queries in
> firstSearcher that will do initial warming on a completely cold index, set
> useColdSearcher to true, and mostly rely on cache autowarming after that. If
> cache autowarming doesn't do a good job, then there are some possible
> remedies:
>
> 1) Add more memory so the OS can cache the index better.
> 2) Change the cache autowarming config.
> 3) Define some queries in newSearcher.
> newSearcher is usually a smaller list than firstSearcher.
>
> A lot of performance issues are cured by adding more memory so the OS can
> cache the index better. Good index caching is critical to getting good
> performance out of any Lucene based software, which includes Solr. Note that
> I am not talking about heap size -- I am talking about memory that is not
> allocated to any program.
>
> When building the list of firstSearcher queries, the idea is to begin the
> process of populating the OS disk cache and Solr's caches, not to run every
> possible query variation users are likely to create. If you end up with more
> than a handful of queries in that list, it's probably too long.
>
> Thanks,
> Shawn
>
>

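Shawn's first suggestion, having the indexing software reload the core once the index is built so the firstSearcher queries run again, corresponds to the CoreAdmin RELOAD action. The sketch below is illustrative only: the base URL and core name are assumptions, and in SolrCloud mode the Collections API is generally the preferred way to reload, so treat this as a standalone-core example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(solr_base: str, core: str) -> str:
    """Build a CoreAdmin RELOAD URL for one core.

    solr_base is assumed to look like "http://localhost:8983/solr".
    """
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def reload_core(solr_base: str, core: str, timeout: float = 60.0) -> int:
    """Send the RELOAD request and return the HTTP status code.

    Intended to be called by the indexing software after the index is
    built, so a fresh first searcher (and its warming queries) is opened.
    """
    with urlopen(core_reload_url(solr_base, core), timeout=timeout) as resp:
        return resp.status
```

Usage would be something like `reload_core("http://localhost:8983/solr", "mycore")` at the end of an indexing run, where `mycore` is a hypothetical core name. Keeping URL construction in its own function makes it easy to log or inspect the exact request before sending it.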
Thanks, Shawn, for the comprehensive explanation. We should definitely keep things simple; however, I still think the original purpose of the “firstSearcher” queries is not fulfilled when a brand-new core is created. Yes, it works when you RELOAD a core or restart Solr, but it is definitely not working as intended when a brand-new core is created, and I think this has to be addressed. One solution would be to make IndexFetcher run after the core is created and before the first searcher is opened. That is just a change in ordering (perhaps a bit more than that, but I assume no radical changes to the logic).

By the way, a workaround that makes the “firstSearcher” queries run is to issue the REQUESTRECOVERY action immediately after the brand-new core is created: https://solr.apache.org/guide/8_10/coreadmin-api.html#requestrecovery-parameters . This forces the core to fetch the index from the leader “immediately” and then rerun the firstSearcher queries, this time against an index that contains data. We're testing this solution right now, and the results are as expected: caches are warmed up, and creating a brand-new core has no impact on cluster performance.
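The REQUESTRECOVERY workaround above is a single CoreAdmin call issued right after the new core exists. A minimal sketch, assuming the request is sent to the Solr node that hosts the new core (CoreAdmin actions act on cores local to the node that receives them) and using a hypothetical node URL and replica core name:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def request_recovery_url(solr_base: str, core: str) -> str:
    """Build a CoreAdmin REQUESTRECOVERY URL for one core.

    solr_base should point at the node hosting the core, e.g.
    "http://solr-node-2:8983/solr" (hypothetical host).
    """
    query = urlencode({"action": "REQUESTRECOVERY", "core": core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def request_recovery(solr_base: str, core: str, timeout: float = 30.0) -> int:
    """Ask the core to go into recovery so it fetches the index from the
    shard leader; returns the HTTP status code of the request."""
    with urlopen(request_recovery_url(solr_base, core), timeout=timeout) as resp:
        return resp.status
```

In the setup described above this would be called once per newly created replica, e.g. `request_recovery("http://solr-node-2:8983/solr", "mycoll_shard1_replica_n3")`, where both the host and the core name are placeholders.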
