On 11/8/21 2:05 PM, Nick Vladiceanu wrote:
> Ok, makes sense. However, when the core is initially created, the data is not
> yet there. Running the firstSearcher queries against empty index won’t have any
> beneficial effects when it comes to cache warming. Is there any way to open the
> first searcher after the data is pulled from the leader, and therefore, run the
> warmup queries? What’s the point of opening the first searcher when initially
> the core is created, if there is no data?

For the purposes of things like warming queries, the searcher isn't
aware that the index is empty when it starts. It just knows when it is
the first searcher, and when that is the case, it runs any configured
firstSearcher queries. Making it aware that the index is empty, so it
could skip those queries, is possible, but that would add a lot of
complexity. Bugs are more likely as the code gets more complex. And
I would strongly argue that any benefits of added complexity in such an
important piece of code do not outweigh the risks.

If this really concerns you, just have your indexing software reload the
core after the index is built, so firstSearcher queries are executed
again. If the list of firstSearcher queries takes a long time to run,
just set useColdSearcher to true, and the searcher will be made ready
for queries before the warming queries are executed. I don't remember
where in solrconfig.xml that config is.
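A minimal sketch of that reload step, assuming the indexing software is
written in Python and talks to Solr's CoreAdmin API over HTTP. The base
URL, the core name, and both function names are placeholders for
illustration. In SolrCloud the Collections API RELOAD action is probably
the more natural route, which this sketch does not cover.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_reload_url(solr_base_url, core_name):
    """Build a CoreAdmin RELOAD URL for one core.

    solr_base_url is assumed to look like "http://localhost:8983/solr".
    """
    query = urlencode({"action": "RELOAD", "core": core_name})
    return f"{solr_base_url.rstrip('/')}/admin/cores?{query}"


def reload_core(solr_base_url, core_name, timeout=30):
    """Ask Solr to reload the core and return the HTTP status code.

    Meant to be called once the index build has finished, so that the
    firstSearcher queries run again against real data.
    """
    url = build_reload_url(solr_base_url, core_name)
    with urlopen(url, timeout=timeout) as response:
        return response.status
```

The indexing job would call something like
reload_core("http://localhost:8983/solr", "mycore") as its last step,
where "mycore" stands in for the real core name.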

What I generally recommend is to define a set of queries in
firstSearcher that will do initial warming on a completely cold index,
set useColdSearcher to true, and mostly rely on cache
autowarming after that. If cache autowarming doesn't do a good job,
then there are some possible remedies:

1) Add more memory so the OS can cache the index better.
2) Change the cache autowarming config.
3) Define some queries in newSearcher.

newSearcher is usually a smaller list than firstSearcher.

A lot of performance issues are cured by adding more memory so the OS
can cache the index better. Good index caching is critical to getting
good performance out of any Lucene-based software, which includes Solr.
Note that I am not talking about heap size -- I am talking about memory
that is not allocated to any program.
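One rough way to see whether the OS has room to cache the index is to
compare the index size on disk with the memory the kernel reports as
available. The sketch below is only an illustration: the function names
are made up, the MemAvailable parsing assumes Linux's /proc/meminfo
format, and MemAvailable is only an approximation of "memory not
allocated to any program".

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text.

    Returns None if the field is missing (assumption: values are in kB).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    return None


def cache_coverage(index_bytes, available_bytes):
    """Fraction of the index that could fit in available memory, capped at 1.0."""
    if index_bytes == 0:
        return 1.0
    return min(1.0, available_bytes / index_bytes)
```

On a Linux host, the text for parse_mem_available would come from
reading /proc/meminfo. A coverage value well below 1.0 suggests the OS
cannot hold much of the index in its disk cache.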
When building the list of firstSearcher queries, the idea is to begin
the process of populating the OS disk cache and Solr's caches, not to
run every possible query variation users are likely to create. If you
end up with more than a handful of queries in that list, it's probably
too long.

Thanks,
Shawn